Traffic Assignment of Urban Road Based on Heterogeneous Graph Neural Networks

Xiao, Guangnian; Xia, Tong; Chen, Xinqiang; Ni, Anning

doi:10.3390/su18105044


¹ School of Economics and Management, Shanghai Maritime University, Shanghai 201306, China

² Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 201306, China

³ School of Ocean and Civil Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

* Author to whom correspondence should be addressed.

Sustainability 2026, 18(10), 5044; https://doi.org/10.3390/su18105044 (registering DOI)

Submission received: 5 April 2026 / Revised: 2 May 2026 / Accepted: 11 May 2026 / Published: 17 May 2026


Abstract

Traffic assignment is crucial for urban traffic regulation and management. This study proposes a heterogeneous graph neural network that integrates Transformer-based multi-head self-attention for traffic assignment in urban road networks. The model builds a heterogeneous graph with both physical road links and virtual origin–destination (OD) links. It features a dual-encoder structure: the V-Encoder and the R-Encoder. The V-Encoder employs Transformer multi-head self-attention to capture long-range spatial relationships between origin and destination nodes. In contrast, the R-Encoder aggregates local topological features to characterize the transmission of flow across road segments. A combined loss function that includes flow conservation constraints is designed to ensure predictions are both accurate and physically realistic. Experiments on the Sioux Falls and EMA networks demonstrate that the method outperforms baseline models under various congestion conditions, exhibiting high accuracy and efficiency. Ablation tests show that Transformer multi-head self-attention is vital for performance enhancement. The approach also remains robust under abnormal conditions, such as incomplete OD demands, making it a practical solution for efficient, low-carbon, and sustainable traffic management.

Keywords: traffic assignment; heterogeneous graph neural network; transformer multi-head attention; flow conservation; long-distance spatial relationship

1. Introduction

Traffic assignment is the core of traffic network analysis and urban mobility management, and underpins the full process of road network planning, signal control optimization, congestion alleviation, emergency evacuation, and sustainable urban transportation systems [1,2]. Solving the Traffic Assignment Problem (TAP) effectively reveals spatiotemporal flow patterns and supply-demand coupling characteristics, providing critical support for congestion governance, network optimization, resilience evaluation, and sustainable mobility [3]. Accurate traffic assignment results not only serve as an essential prerequisite for low-carbon and economic scheduling [4], but also provide quantitative support for the implementation of carbon pricing policies [5]. Meanwhile, large-scale vehicle-to-grid (V2G) adoption relies on precise traffic assignment and real-time network sensing, underscoring its value in sustainable low-carbon transportation systems [6]. Current mainstream research in this field is mainly divided into two classic paradigms: User Equilibrium (UE) and System Optimum (SO). The UE paradigm conforms to realistic travel decision-making mechanisms: travelers select paths to minimize individual travel costs and, through repeated route competition, eventually reach a Nash equilibrium in which no traveler can reduce their cost by switching paths [7]. Game theory has been widely applied to analyze behavioral competition and decision interactions in travel service platforms [8], validating its rationality in traffic systems. Due to its high consistency with actual traffic flow evolution characteristics, UE has become the dominant research framework in urban traffic flow modeling and equilibrium solution studies [9].

Traditional solutions mainly rely on numerical iterative optimization algorithms such as the Frank–Wolfe (FW) method, the gradient projection algorithm, and the incremental assignment method. Although these models possess clear physical logic, they have some limitations. First, they suffer from long iteration times and difficult convergence in large-scale road networks. In addition, they generate excessive computational overhead under high-dimensional Origin–Destination (OD) demands, which makes them inadequate for real-time management and rapid prediction in ultra-large-scale urban road networks [10,11]. With the rapid development of big data perception technologies and deep learning theories, the extraction of microscopic traffic parameters has become more efficient and robust even under complex interference [12], and intelligent models have become advanced approaches to address the bottlenecks of traditional TAP solutions. These models include Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), and Transformers, which possess powerful nonlinear fitting capabilities, high-dimensional feature extraction, and end-to-end fast inference advantages [13]. Nevertheless, CNNs are constrained by grid-based modeling structures. They cannot effectively capture the non-Euclidean topological features of traffic networks, leading to an inadequate characterization of correlations among intersections, physical road links, and travel paths [14]. Existing homogeneous GNNs primarily focus on the topological modeling of physical road links and only capture adjacent connectivity relationships, ignoring the potential long-range supply–demand dependencies between OD pairs. This issue inevitably leads to remote node feature attenuation and low efficiency of long-distance information propagation [15].

Furthermore, most data-driven deep learning models for traffic assignment fail to fully integrate inherent physical mechanisms. These methods rely solely on data fitting without embedding fundamental traffic physical rules, such as traffic flow conservation, path-flow coupling, and link capacity constraints [16]. Consequently, their outputs may violate real-world traffic constraints and lead to unreasonable traffic assignments. In addition, these models generalize poorly: their inference accuracy declines significantly under OD demand fluctuations, road network capacity disturbances, temporary link closures, and other out-of-distribution scenarios, which limits their practical engineering applications [17].

Taking advantage of Heterogeneous Graph Neural Networks (HetGNN) in modeling multi-type nodes and multi-semantic edges, this study constructs an end-to-end surrogate model for traffic assignment. The model integrates the Transformer multi-head self-attention mechanism, adopts a dual-encoder architecture with feature decoupling, and embeds the physical constraint of traffic flow conservation into model training. The main contributions of this study are summarized as follows: (1) To overcome the drawbacks that homogeneous graph neural networks fail to model heterogeneous traffic elements and that pure Transformer methods lack prior knowledge of road network topology, a multi-semantic heterogeneous traffic graph is constructed by fusing physical road links with virtual OD auxiliary edges. The Transformer multi-head self-attention mechanism [18] is introduced to address the insufficient capture of global long-range dependencies in conventional models. (2) A dual-encoder architecture coupled with adaptive graph attention is proposed to decouple the modeling of global OD demand features from that of local road network topological features, thereby realizing differentiated feature propagation. Compared with single-encoder structures and single attention mechanisms, it significantly improves the representation capability for complex traffic patterns. (3) The node-level traffic flow conservation principle is embedded into the training process through a composite loss function, which balances prediction accuracy and the physical plausibility of traffic flow. This addresses the practical difficulty that existing models adapt poorly to incomplete OD demand and road network capacity disturbances, and enhances the model’s robustness.

2. Literature Review

The traffic assignment problem aims to reveal the spatial distribution patterns of network traffic flow and provide reliable decision support for traffic management and sustainable urban development. Existing research has largely focused on the optimization of traffic assignment models and the integration of deep learning techniques into transportation studies. This section provides a systematic literature review of recent advances in these areas.

2.1. Traffic Assignment Problem

Traditional traffic assignment methods are grounded in mathematical programming and variational inequality theories, with an emphasis on improving algorithmic convergence efficiency and computational performance. The core challenge lies in solving two classic paradigms: UE and SO [7]. UE assumes that travelers independently choose routes to minimize their own travel costs, leading to a Nash equilibrium state in which no traveler can improve their outcome by unilaterally switching paths: a behavioral assumption that aligns well with observed traffic patterns. In contrast, SO aims to minimize the total travel time across the network, reflecting the needs of global network regulation [19].

For the UE problem, the FW algorithm remains a classic solution approach, and numerous improvements have since been proposed. Lee et al. [20] proposed a conjugate gradient projection algorithm that optimizes the gradient update strategy to enhance convergence efficiency in large-scale networks but still relies on complete OD data. Fukushima [21] designed a modified FW algorithm that reduces the number of iterations through line search step size optimization. Babazadeh et al. [22] introduced a reduced-gradient algorithm to lower complexity, which is only suitable for small to medium networks.

However, traditional optimization algorithms generally rely on strong idealized assumptions, including fully known OD demands, perfectly rational drivers, and error-free road network information. In scenarios characterized by incomplete or noisy demand data and dynamically changing network conditions, these algorithms tend to underestimate link flows and yield traffic assignment results that deviate from real-world scenarios [23]. Moreover, their high computational complexity necessitates numerous iterations to achieve convergence in large-scale networks, limiting their applicability to real-time traffic management and the sustainable operation of smart cities [24].

In recent years, the rapid advancement of big data sensing technologies and deep learning has provided new ideas for traffic assignment. Data-driven methods [25] have gradually emerged as a mainstream alternative to traditional model-driven paradigms [26]. These approaches do not require complex physical assumptions; instead, they learn potential nonlinear relationships among OD demands, network topology, link attributes, and traffic flows, enabling end-to-end mapping. This capability significantly improves the real-time performance and dynamic adaptability of traffic assignment [15]. For instance, Zhang et al. [27] constructed a direct mapping model from OD demand to link flow using a multi-layer perceptron (MLP) to capture high-dimensional feature interactions, but the model ignores network topology. Rahman and Hasan [28] employed a graph convolutional network (GCN) to extract topological features of road networks and built a data-driven model, demonstrating its applicability on the Sioux Falls network, although its homogeneous nature fails to capture correlations between OD pairs and road links.

Nevertheless, existing deep learning models still exhibit limitations. CNNs are constrained by grid-structured modeling, making it difficult to adapt to the non-Euclidean topology of traffic networks and to fully capture the relationships among nodes, links, and paths [29]. Sequence models such as Long Short-Term Memory (LSTM) networks can capture temporal dynamics but fail to handle spatial non-Euclidean structures, leading to a lack of global rationality in assignment results [30]. Traditional GNNs are mostly homogeneous graph models that consider only physical road adjacency, overlooking potential dependencies between OD node pairs. This limitation often results in long-distance feature attenuation and inefficient information propagation [31]. Furthermore, most existing models prioritize fitting accuracy over constraint satisfaction; they do not incorporate basic physical principles such as traffic flow conservation and link capacity constraints. Consequently, some outputs violate traffic engineering laws and exhibit insufficient physical plausibility [32].

2.2. Applications of Heterogeneous Graph Neural Networks in Transportation

Network analysis has been widely applied to various transportation systems, such as port clusters, to reveal the topological connections and coordinated development characteristics of regional transportation networks [33]. In contrast to homogeneous graph models, HetGNNs model multiple node types and multi-semantic edges separately and apply customized feature aggregation mechanisms. This enables the deep fusion of heterogeneous traffic elements and compensates for the structural deficiencies of conventional graph models [34]. The basic theory of heterogeneous graph learning is now well established. Representative methods, including heterogeneous graph attention networks and metapath aggregated graph neural networks, adopt strategies such as meta-path constraints, semantic-level attention weighting, and multi-dimensional fine-grained aggregation to realize hierarchical modeling of different node categories and their relationships [35,36,37], which provides solid technical support for the modeling of complex transportation systems.

In traffic assignment research, introducing heterogeneous graph structures to improve global correlation mining has become an active research trend. However, existing studies still fall short of two core requirements: long-range dependence optimization and physical constraint coupling. Liu and Meidani [38] first introduced heterogeneous graphs into UE traffic assignment modeling, distinguishing physical road links from virtual OD connections and embedding basic traffic flow conservation constraints. Nevertheless, the model adopts a simplistic long-range information transmission mechanism, resulting in limited capability to capture cross-regional travel correlations. Tian et al. [39] constructed a heterogeneous graph with multi-type nodes and adopted edge-order-aware message passing to characterize the superposition effect of traffic flows. However, such models suffer from high computational complexity and cannot adapt to practical scenarios with incomplete OD demand data. Liu and Meidani [40] aggregated multi-source travel demands with identical travel purposes via virtual nodes to improve the model’s robustness, but neglected the dynamic evolution of road networks and congestion boundary constraints. Meanwhile, several studies have enhanced the engineering practicality of models through dynamic adjacency matrix updating and flow threshold constraints [41,42]. Nonetheless, most of them make only partial improvements rather than starting from the underlying architecture of heterogeneous graphs to systematically address the common challenges of long-range feature decay, heterogeneous semantic separation, and insufficient embedding of physical constraints.

2.3. Research Gaps and Summary

Although remarkable progress has been made in optimizing traffic assignment algorithms and applying deep learning, critical research gaps remain. The generalization capability of current deep learning models under out-of-distribution scenarios, such as dynamic network topology changes and incomplete OD demand data, remains to be rigorously evaluated. Furthermore, while existing graph neural networks have improved heterogeneous feature fusion, the long-range dependency problem, a key limitation of traditional GNNs, has not received sufficient attention. Therefore, constructing a traffic assignment model that integrates long-range dependency modeling with heterogeneous feature fusion while maintaining strong generalization performance has become a core research direction for building intelligent and sustainable transportation systems.

3. Technical Background

This section presents the core theory of urban road traffic assignment, the fundamental principles of graph neural networks, and the core principle of Transformer multi-head self-attention.

3.1. Core Theory of Traffic Assignment

The traffic system contains complex heterogeneous interactions among OD demands, road links, and network topologies. Given a traffic network represented as a graph G = (V, E), nodes V correspond to travel demand points, while edges E denote physical road connections and implicit OD correlation relationships. As the core of transportation research, traffic assignment aims to reasonably distribute OD demands to multiple routes and links following travelers’ route choice principles, and further obtain key indicators, including link flow patterns and travel impedance. Relevant research relies on reasonable abstract assumptions of network topological characteristics and traveler route choice behaviors [7]. In general, the traffic assignment problem can be formulated as a constrained nonlinear optimization problem, and its general mathematical expression is presented as follows:

\min_{f} \sum_{e \in E} Z_{e}(f_{e})

(1)

where $Z_{e}(f_{e})$ is the cost function of link e, whose value is determined by the link flow $f_{e}$; different assignment criteria correspond to different forms of the cost function. The minimization is subject to the constraints in Equations (2)–(5):

\sum_{k \in K_{r s}} f_{k}^{r s} = q_{r s}, \forall r, s \in V

(2)

f_{k}^{r s} \geq 0, \forall k \in K_{r s}, \forall r, s \in V

(3)

x_{e} = \sum_{r, s \in V} \sum_{k \in K_{r s}} f_{k}^{r s} ζ_{e, k}^{r s}, \forall e \in E

(4)

t_{e} = t_{e} (x_{e}), \forall e \in E

(5)

where $f_{k}^{rs}$ denotes the flow of the k-th feasible path between the OD pair (r, s), and $K_{rs}$ represents the set of all feasible paths between the OD pair (r, s). Equation (2) is the flow conservation constraint, indicating that the sum of the flows of all feasible paths for a given OD pair equals the total travel demand of that OD pair. Equation (3) is the non-negativity constraint, which requires the flow of each path to be non-negative. $\zeta_{e,k}^{rs}$ is a 0–1 incidence variable: if link e belongs to the k-th path between the OD pair (r, s), then $\zeta_{e,k}^{rs} = 1$; otherwise, it equals 0. Equation (4) is the link flow aggregation constraint, meaning that the total flow of a link equals the sum of the flows of all paths passing through it, thereby establishing the mapping between path flows and link flows. The variable $x_{e}$ represents the total flow on link e. Equation (5) describes that the travel impedance of a link is a function of its traffic flow. It captures the traffic characteristic of “higher flow leading to increased congestion and travel time,” serving as the fundamental basis for travelers’ route choice based on impedance in UE assignment. Implemented via the BPR function, this relationship ensures that traffic assignment results align with actual congestion characteristics of road networks.
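The constraints in Equations (2)–(5) can be illustrated numerically with a minimal sketch. The three-link, two-path network, the demand, and the link attributes below are hypothetical values chosen for illustration. The link impedance uses the common Bureau of Public Roads (BPR) form with the conventional coefficients 0.15 and 4, which are an assumption here because the text does not state the coefficients used.

```python
import numpy as np

# Hypothetical toy network: 3 links and 2 feasible paths for one OD pair (r, s).
# zeta[e, k] = 1 if link e lies on path k (the incidence variable in Eq. (4)).
zeta = np.array([
    [1, 0],   # link 0 is used by path 0 only
    [0, 1],   # link 1 is used by path 1 only
    [1, 1],   # link 2 is shared by both paths
])
path_flow = np.array([600.0, 400.0])  # f_k^{rs}, assumed values (veh/h)
demand = 1000.0                       # q_rs, assumed value (veh/h)

# Eq. (2): path flows of the OD pair sum to its demand.
assert np.isclose(path_flow.sum(), demand)
# Eq. (3): path flows are non-negative.
assert np.all(path_flow >= 0)

# Eq. (4): link flow is the sum of the flows of all paths using the link.
link_flow = zeta @ path_flow          # -> [600, 400, 1000]


def bpr_travel_time(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
    """Eq. (5) in BPR form: t = t0 * (1 + alpha * (x / c) ** beta).

    alpha and beta are the conventional defaults, assumed for this sketch.
    """
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


free_flow_time = np.array([10.0, 12.0, 5.0])   # minutes, assumed
capacity = np.array([800.0, 800.0, 1000.0])    # veh/h, assumed
link_time = bpr_travel_time(link_flow, free_flow_time, capacity)

# Path impedance is the sum of the travel times of the links on each path.
path_time = zeta.T @ link_time
print(link_flow, link_time, path_time)
```

In this sketch the two path times differ, so the assumed path flows are not a user equilibrium; a UE solver would shift flow toward the cheaper path until the used paths have equal impedance.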

3.2. Graph Neural Network Technology

Graph data is represented by an adjacency matrix $A \in \mathbb{R}^{|V| \times |V|}$, a node feature matrix $X \in \mathbb{R}^{|V| \times d}$ (d denotes the node feature dimension), and an edge feature matrix $X_{e} \in \mathbb{R}^{|E| \times d_{e}}$ ($d_{e}$ represents the edge feature dimension) [28]. To avoid numerical imbalance during feature aggregation, a normalized adjacency matrix is constructed as follows:

\hat{A} = D^{-\frac{1}{2}} (A + I) D^{-\frac{1}{2}}

(6)

where D denotes the degree matrix (with node degrees on the diagonal), and I is the identity matrix, which is used to preserve nodes’ self-features.

GCNs [43,44] update node features through spatial neighbor feature aggregation, with the core formula expressed as:

H^{(l+1)} = \sigma(\hat{A} H^{(l)} W^{(l)})

(7)

where $H^{(l)}$ is the node feature matrix of the l-th layer, $W^{(l)}$ denotes the learnable weight matrix, and $\sigma(\cdot)$ represents the nonlinear activation function. GCNs can effectively capture local spatial dependencies of road networks; however, they assign equal weights to all neighboring nodes and cannot distinguish differences in feature contributions.
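A minimal NumPy sketch of Equations (6) and (7) follows. It assumes that D is the degree matrix of A + I (the usual convention for this normalization) and uses ReLU as the activation; both choices, the four-node path graph, and the random features are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np


def normalized_adjacency(A):
    """Eq. (6): A_hat = D^{-1/2} (A + I) D^{-1/2}.

    Assumption: D holds the node degrees of A + I (self-loops included).
    """
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale row i by d_i^{-1/2} and column j by d_j^{-1/2}.
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def gcn_layer(A_hat, H, W):
    """Eq. (7): H_next = sigma(A_hat @ H @ W), with sigma = ReLU (assumed)."""
    return np.maximum(A_hat @ H @ W, 0.0)


# Hypothetical undirected path graph 0 - 1 - 2 - 3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
H0 = rng.normal(size=(4, 3))   # node features, d = 3
W0 = rng.normal(size=(3, 2))   # learnable weights (random here)

A_hat = normalized_adjacency(A)
H1 = gcn_layer(A_hat, H0, W0)
print(A_hat.round(3))
print(H1.shape)
```

Each row of `A_hat` mixes a node with its direct neighbors using weights fixed by the degrees alone, which is the equal-weighting limitation noted above.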

To better adapt to the complex heterogeneous characteristics of traffic systems (including diverse nodes such as OD demand points and road segments, as well as multiple types of edges), the graph structure needs to be further extended to a heterogeneous graph. The construction of the traffic heterogeneous graph is illustrated in Figure 1.

In Figure 1, the left subgraph illustrates the traditional physical road network structure, consisting of link nodes (intersections) and real links (physical roads). This graph reflects only the direct “intersection-road-intersection” connections of the actual urban road network. It models the local topological dependencies among road segments and the spatial correlations between adjacent nodes, but fails to capture the long-range travel demand correlations between non-adjacent OD nodes. Its core limitation is that it can characterize traffic flow transmission only between adjacent intersections and cannot describe the implicit correlations that travel demand creates between regions that are not directly connected.

The right subgraph extends the physical road network by introducing OD nodes and virtual links (virtual OD edges). Real links are represented by solid arrows, which capture the traversal relationships of physical roads and are responsible for propagating local traffic flow information. Virtual links are represented by dashed arrows; they do not correspond to actual roads but directly connect OD nodes with travel demand associations, enabling the model to capture long-distance travel demand dependencies between OD pairs.

By constructing this heterogeneous graph structure, the framework achieves deep integration of two types of heterogeneous information: physical network topology and OD travel demand. It preserves the ability to characterize local traffic flow propagation through physical links while enabling global feature propagation of long-distance OD demand via virtual links, thereby effectively overcoming the limitation of traditional homogeneous graph models that fail to accommodate the heterogeneity of traffic networks.
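The construction described above can be sketched as a small data structure that keeps real and virtual links in separate edge sets. The class name, field names, and toy values are hypothetical, and the rule of adding a virtual link only for OD pairs with positive demand is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroTrafficGraph:
    """Hypothetical container for the two edge types of the heterogeneous graph."""
    num_nodes: int
    # Real links: (tail, head, capacity, free_flow_time)
    real_edges: list = field(default_factory=list)
    # Virtual links: (origin, destination); they carry no inherent features
    virtual_edges: list = field(default_factory=list)
    # OD demand kept separately, keyed by (origin, destination)
    demand: dict = field(default_factory=dict)


def build_hetero_graph(num_nodes, links, od_demand):
    """Build the graph from physical links and an OD demand table.

    links:     {(u, v): (capacity, free_flow_time)}
    od_demand: {(origin, destination): demand}
    """
    graph = HeteroTrafficGraph(num_nodes=num_nodes)
    for (u, v), (capacity, fft) in links.items():
        graph.real_edges.append((u, v, capacity, fft))
    for (o, d), q in od_demand.items():
        # Assumption: only OD pairs with positive demand get a virtual link.
        if q > 0 and o != d:
            graph.virtual_edges.append((o, d))
            graph.demand[(o, d)] = q
    return graph


# Toy corridor 0 -> 1 -> 2 -> 3 with one active OD pair (0, 3).
links = {(0, 1): (800.0, 10.0), (1, 2): (800.0, 12.0), (2, 3): (1000.0, 5.0)}
od_demand = {(0, 3): 500.0, (1, 3): 0.0}
graph = build_hetero_graph(4, links, od_demand)
print(graph.real_edges)
print(graph.virtual_edges)
```

Here nodes 0 and 3 are three hops apart on real links but become direct neighbors through the virtual link, which is what allows demand information to travel between them in a single propagation step.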

3.3. Fundamentals of the Transformer Multi-Head Self-Attention Mechanism

The core advantage of the Transformer lies in breaking the limitation of local dependencies through the multi-head self-attention mechanism [45], thereby enabling efficient interaction of long-range node features. Its core principles include Query (Q), Key (K), and Value (V) mapping and multi-head attention aggregation, which satisfy the requirement of capturing spatial relationships among OD nodes in traffic scenarios [46]. The QKV mapping layer projects node features into Q, K, and V vectors through linear transformations, enabling dimension alignment and feature enhancement.

Q = h W_{Q}, \quad K = h W_{K}, \quad V = h W_{V}

(8)

where h denotes the raw embedding of node features, and $W_{Q}$, $W_{K}$, and $W_{V}$ are learnable linear transformation matrices.

Multi-head attention learns feature correlations across different dimensions through parallel attention heads, then concatenates and fuses the outputs of all heads to improve the comprehensiveness of long-range feature capture:

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{Head}_{1}, \mathrm{Head}_{2}, \dots, \mathrm{Head}_{h}) W_{O}

(9)

\mathrm{Head}_{i} = \mathrm{Attention}(Q W_{Q_{i}}, K W_{K_{i}}, V W_{V_{i}})

(10)

where h is the number of attention heads (not to be confused with the node embedding h in Equation (8)); $W_{Q_{i}}$, $W_{K_{i}}$, and $W_{V_{i}}$ are the dedicated linear projection matrices for the i-th attention head; and $W_{O}$ represents the final output projection matrix. The attention function calculates feature similarity via the scaled dot product:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{K}}}\right) V

(11)

where $d_{K}$ denotes the dimension of the Q/K vectors, which is fixed by the linear projection layer and the node embedding size of the network. In a high-dimensional feature space, the raw inner product of query and key grows in magnitude as the feature dimension increases. Dividing by $\sqrt{d_{K}}$ is the standard scaling strategy: it constrains the variance of the similarity scores, limits the input range of the softmax, and helps avoid vanishing gradients and numerical divergence.
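Equations (8)–(11) can be sketched in a few lines of NumPy. As a simplification, the shared projection of Equation (8) and the per-head projection of Equation (10) are folded into a single per-head matrix; the sizes (5 nodes, embedding size 16, 4 heads) and the random weights are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Eq. (11): softmax(Q K^T / sqrt(d_K)) V; also returns the weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights


def multi_head_attention(h, W_q, W_k, W_v, W_o):
    """Eqs. (9)-(10): run each head in turn, concatenate, project with W_O.

    W_q, W_k, W_v have shape (n_heads, d_model, d_head); one combined
    projection per head is used here (a simplification of Eqs. (8) and (10)).
    """
    heads = []
    for Wq_i, Wk_i, Wv_i in zip(W_q, W_k, W_v):
        out, _ = scaled_dot_product_attention(h @ Wq_i, h @ Wk_i, h @ Wv_i)
        heads.append(out)
    return np.concatenate(heads, axis=-1) @ W_o


rng = np.random.default_rng(0)
n_nodes, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads

h = rng.normal(size=(n_nodes, d_model))            # node embeddings
W_q = rng.normal(size=(n_heads, d_model, d_head))
W_k = rng.normal(size=(n_heads, d_model, d_head))
W_v = rng.normal(size=(n_heads, d_model, d_head))
W_o = rng.normal(size=(d_model, d_model))

out = multi_head_attention(h, W_q, W_k, W_v, W_o)
_, weights = scaled_dot_product_attention(h @ W_q[0], h @ W_k[0], h @ W_v[0])
print(out.shape)             # (5, 16)
print(weights.sum(axis=1))   # each row of attention weights sums to 1
```

Because every node attends to every other node, the attention weights form a dense n × n matrix, in contrast to the neighbor-only aggregation of a graph convolution.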

In the traffic assignment scenario, this mechanism can capture multi-dimensional spatial dependencies, such as travel demand correlations between different OD pairs and congestion propagation effects among road segments. Each attention head independently learns correlation patterns of a specific semantic dimension. Parallel multi-head computation enables multi-perspective mining of global spatial features, compensating for the limitation of conventional graph convolutions that only aggregate local neighbor features. It provides the core technical support for propagating long-range OD demand features in the virtual encoder and significantly enhances the model’s ability to capture global dependencies in complex traffic networks.

4. Architecture of the Dual Encoder Heterogeneous Graph Neural Network for Traffic Assignment

To fully capture the core influencing factors of traffic assignment, this study designs a two-stage feature extraction module that integrates the advantages of the Transformer multi-head self-attention mechanism and heterogeneous graph modeling. The Virtual Encoder (V-Encoder) extracts long-range correlations of OD demands, and the Real Encoder (R-Encoder) extracts local topological dependencies of the road network; together they realize the deep fusion of heterogeneous information.

4.1. Virtual Encoder (V-Encoder)

The V-Encoder takes virtual OD association edges as the carrier of feature propagation. Its core objective is to transmit long-range OD demand information and overcome the difficulty of capturing spatial dependencies among non-adjacent nodes. Its design deeply incorporates the Transformer multi-head self-attention mechanism with customized adaptations for traffic scenarios. The detailed workflow is described as follows.

A Transformer multi-head design with eight parallel attention heads is adopted, so that each head independently learns long-range demand correlation features from a different dimension, which enhances the feature representation capability of the model. The input feature of each attention head is obtained by linearly projecting the output feature of the (L − 1)-th layer to the dimension of the current head, ensuring consistent feature dimensions for parallel computation. Because virtual edges lack inherent features, a Feed Forward Network (FFN) is applied to the concatenated node embeddings $x_{u}^{L,i}$ and $x_{v}^{L,i}$ of nodes u and v in the i-th attention head of the L-th layer to generate edge-level adaptive weights [47]. The calculation formula is:

\beta_{e}^{L,i} = \mathrm{FFN}([x_{u}^{L,i} \oplus x_{v}^{L,i}]; W_{\beta}^{L,i}, b_{\beta}^{L,i})

(12)

where $x_{u}^{L,i}$ and $x_{v}^{L,i}$ represent the respective feature embeddings of node u and node v assigned to the i-th attention head in the L-th layer of the V-Encoder. The operator $\oplus$ denotes the concatenation of two node feature vectors, which aggregates the latent information of paired nodes. The concatenated joint feature is fed into a position-wise feed-forward network with a sigmoid activation function in the hidden layer, which maps the joint feature to a scalar adaptive weight to mine node-pair correlation and generate edge attributes for virtual links. $W_{\beta}^{L,i}$ and $b_{\beta}^{L,i}$ are the layer-wise and head-specific learnable weight matrices and bias terms of the FFN, respectively. The output $\beta_{e}^{L,i}$ serves as the adaptive edge weight to compensate for the missing inherent features of synthetic virtual edges.

Subsequently, following the Query-Key mapping logic of the Transformer, node features are linearly transformed to obtain the query $q_{u}$ and key $k_{v}$. The feature similarity is scaled by the feature dimension $d_{L}$ and modulated by the adaptive edge weight to obtain the attention score of the virtual edge e = (u, v):

s_{e}^{L,i} = \exp\left(\frac{q_{u}^{L,i} \cdot k_{v}^{L,i}}{\sqrt{d_{L}}} \cdot \beta_{e}^{L,i}\right)

(13)

in which $d_{L}$ denotes the feature dimension of the L-th layer; $q_{u}$ and $k_{v}$ are derived from node features via Transformer linear transformation layers, enabling effective long-range feature interaction. The features of outgoing neighboring nodes of node u are then weighted and aggregated based on the computed attention scores, and the attention weights are normalized by summing over all outgoing neighbors of u. After layer normalization and two-layer FFN processing, residual connections are used to fuse with the original feature $x_{u}^{L}$, yielding the feature update result of a single attention head:

x_{u}^{L+1,i} = x_{u}^{L,i} + \mathrm{LayerNorm}\left(\mathrm{FFN}\left(\sum_{v \in N_{o}(u)} \frac{s_{(u,v)}^{L,i} \cdot v_{v}^{L,i}}{\sum_{v' \in N_{o}(u)} s_{(u,v')}^{L,i}}\right)\right)

(14)

Finally, the output features of all attention heads are concatenated to obtain the final output $U_{n}^{0}$ of the V-Encoder, i.e., the long-range correlation features of nodes. The core components of the Virtual Encoder are illustrated in Figure 2.

The FFN learns the correlation features of OD node pairs to generate adaptive weights $\beta_{e}$, which are fused with standard attention scores to realize the modeling of long-distance travel demands.
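One attention head of the V-Encoder update in Equations (12)–(14) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter names, the hidden width, the random initialization, the use of sigmoid in the update FFN, and the handling of nodes with no outgoing virtual edge (their aggregated message is left at zero) are all hypothetical choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned scale)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def v_encoder_head(x, virtual_edges, p):
    """One head of the V-Encoder update over virtual edges (u, w)."""
    n, d = x.shape
    q, k, v = x @ p["W_q"], x @ p["W_k"], x @ p["W_v"]
    agg = np.zeros_like(x)
    norm = np.zeros(n)
    for u, w in virtual_edges:
        # Eq. (12): scalar adaptive weight from the concatenated embeddings.
        pair = np.concatenate([x[u], x[w]])
        hidden = sigmoid(pair @ p["W_b1"] + p["b_b1"])
        beta = float(hidden @ p["W_b2"] + p["b_b2"])
        # Eq. (13): exponentiated, scaled similarity modulated by beta.
        s = np.exp(q[u] @ k[w] / np.sqrt(d) * beta)
        agg[u] += s * v[w]
        norm[u] += s
    # Eq. (14): normalize over the outgoing neighbors of each node.
    has_out = norm > 0
    agg[has_out] /= norm[has_out][:, None]
    # Two-layer FFN (sigmoid assumed), layer norm, residual connection.
    ffn = sigmoid(agg @ p["W_f1"]) @ p["W_f2"]
    return x + layer_norm(ffn)


rng = np.random.default_rng(0)
n, d, d_hidden = 4, 8, 16
x = rng.normal(size=(n, d))
params = {
    "W_q": rng.normal(scale=0.1, size=(d, d)),
    "W_k": rng.normal(scale=0.1, size=(d, d)),
    "W_v": rng.normal(scale=0.1, size=(d, d)),
    "W_b1": rng.normal(scale=0.1, size=(2 * d, d_hidden)),
    "b_b1": np.zeros(d_hidden),
    "W_b2": rng.normal(scale=0.1, size=d_hidden),
    "b_b2": 0.0,
    "W_f1": rng.normal(scale=0.1, size=(d, d_hidden)),
    "W_f2": rng.normal(scale=0.1, size=(d_hidden, d)),
}
virtual_edges = [(0, 3), (0, 2), (1, 3)]   # hypothetical OD pairs
x_next = v_encoder_head(x, virtual_edges, params)
print(x_next.shape)
```

In a full model, eight such heads would run in parallel and their outputs would be concatenated, as described above.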

4.2. Real Encoder (R-Encoder)

The R-Encoder adopts real road edges of the network as the feature propagation carrier, focusing on extracting local topological features of the road network to compensate for the deficiency of the V-Encoder in modeling physical road structures. Its overall architecture is similar to that of the V-Encoder, including multi-head attention, a feed-forward network, layer normalization, and residual connections; the core difference is that the attention mechanism is adapted to the inherent attributes of real edges. Following the same eight-head attention setting as the V-Encoder, the node features are linearly projected to generate query, key, and value vectors for each attention head. The detailed implementation is as follows. First, the standardized real-edge features $\beta_{e,p}^{r}$ (e.g., link capacity and free-flow travel time) are integrated into the attention score calculation with scaled dot-product normalization, which is formulated as:

s_{e}^{M,j} = \exp\left(\sum_{p=1}^{P} \frac{q_{u}^{M,j} \cdot k_{v}^{M,j}}{\sqrt{d_{L}}} \cdot \beta_{e,p}^{r}\right)

(15)

where P is the dimension of real-edge features, M is the layer index of the R-Encoder, $d_{L}$ is the feature dimension of the query/key vectors, used for stable similarity calculation, and $q_{u}^{M,j}$ and $k_{v}^{M,j}$ denote the query and key vectors of nodes u and v in the j-th attention head of the M-th layer, respectively.

Second, the R-Encoder aggregates features only over direct adjacent nodes connected by real edges: it computes a weighted summation of neighbor node features based on attention scores, and normalizes the weights over all outgoing neighbors to obtain stable attention-weighted node representations. The weighted features are further processed by a two-layer FFN with sigmoid activation, followed by layer normalization and residual connection to fuse the original node features, so as to complete the feature update of each attention head. The outputs of all attention heads are then concatenated to form the comprehensive node embedding of the current layer.
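As a minimal sketch of Equation (15) and the neighbor-wise normalization described above, the score of one real edge and the resulting attention weights could be computed as follows. The function names `real_edge_score` and `normalise` are hypothetical, and the vectors are plain Python lists rather than tensors.

```python
import math

def real_edge_score(q_u, k_v, beta_e):
    """Unnormalised attention score of one real edge, following Eq. (15).

    q_u, k_v: query and key vectors of the end nodes for one head
    beta_e:   standardised real-edge features (length P)
    """
    # Scaled dot-product similarity between query and key.
    sim = sum(a * b for a, b in zip(q_u, k_v)) / math.sqrt(len(q_u))
    # Sum over the P edge features, then exponentiate.
    return math.exp(sum(sim * b for b in beta_e))

def normalise(scores):
    """Normalise scores over all outgoing neighbours of one node."""
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}
```

Since the similarity term does not depend on p, the exponent equals the scaled dot product multiplied by the sum of the edge features, so edges with larger standardized attributes amplify the query-key similarity.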

Finally, after feature extraction and propagation through two stacked R-Encoder layers, the final node embedding

O \in R^{| V | \times N_{h}}

is output, which integrates long-range OD demand correlations and local road network topological dependencies (

N_{h}

= 64 denotes the hidden dimension). This embedding serves as the core input for the subsequent link flow prediction task.

This module achieves complementary advantages through the collaborative dual encoder architecture. The V-Encoder overcomes the challenge of long-range OD demand feature propagation through the Transformer multi-head self-attention mechanism, while the R-Encoder strengthens the capture of local topological dependencies. In combination, they encompass the key influencing factors of traffic assignment. Meanwhile, an adaptive attention mechanism is introduced to deeply fuse edge-level adaptive weights with inherent real-edge features, enabling the model to dynamically adjust the importance of nodes and links and adapt to complex traffic scenarios (e.g., high-demand OD pairs and critical road segments), which helps sustain stable and efficient network performance.

4.3. Traffic Assignment Prediction and Loss Function Design

Based on the node and edge features extracted by the dual encoders, the prediction logic for traffic assignment is designed as follows. First, edge embedding features are constructed. The start-node embedding

o_{u}

, end-node embedding

o_{v}

, and the standardized physical feature of real edges

β_{e}^{r}

are integrated through feature concatenation to obtain edge-level embeddings that fuse the spatial characteristics of nodes, OD demand correlations, and inherent edge attributes:

f_{e} = o_{u} ⨁ o_{v} ⨁ β_{e}^{r}

(16)

A two-layer FFN (consistent with the FFN structure in the dual encoders) is adopted to map the edge embedding to the flow-capacity ratio

{\tilde{α}}_{e} = \mathrm{FFN} (f_{e}, W_{o}, b_{o})

, where

W_{o}, b_{o}

are learnable parameters of the FFN. The FFN consists of a hidden linear layer with sigmoid activation and an output linear layer, which maintains structural consistency with the feature mapping networks in the V-Encoder and R-Encoder. The scaled sigmoid activation function is adopted in the output layer to constrain the predicted flow-capacity ratio within a reasonable range of (0, 1.2). This range is set based on actual traffic characteristics, allowing road links to operate in an oversaturated state where the actual traffic flow can moderately exceed the designed link capacity, with an upper bound of 1.2. Finally, combined with the known design capacity

c_{e}

of each link, the predicted link flow is derived as

\tilde{f_{e}} = {\tilde{α}}_{e} \cdot c_{e}

, realizing the conversion from relative operational status to absolute flow values.
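The prediction head can be illustrated with a short sketch: concatenate the embeddings as in Equation (16), pass them through a two-layer FFN, bound the output with a scaled sigmoid, and multiply by capacity. The names `edge_embedding`, `scaled_sigmoid`, and `predict_link_flow` are hypothetical, and a single scalar output unit is an assumption.

```python
import math

ALPHA_MAX = 1.2  # upper bound of the flow-capacity ratio stated in the text

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def scaled_sigmoid(z, upper=ALPHA_MAX):
    # Maps any real logit into the range (0, upper).
    return upper * sigmoid(z)

def edge_embedding(o_u, o_v, beta_e):
    # Eq. (16): concatenation of start-node, end-node, and edge features.
    return list(o_u) + list(o_v) + list(beta_e)

def predict_link_flow(o_u, o_v, beta_e, capacity, w1, b1, w2, b2):
    """Return (predicted flow-capacity ratio, predicted link flow)."""
    f_e = edge_embedding(o_u, o_v, beta_e)
    hidden = [sigmoid(sum(w * x for w, x in zip(row, f_e)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    alpha = scaled_sigmoid(logit)
    return alpha, alpha * capacity
```

Predicting the bounded ratio first and rescaling by capacity keeps the output on a comparable scale across links with very different capacities.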

To balance prediction accuracy and traffic flow physical rationality, we construct a composite total loss function

L_{total}

that integrates supervised loss and flow conservation loss [48]. The corresponding weight coefficients are optimized via validation set calibration. The supervised loss

L_{s}

measures the deviation between predicted and ground-truth values, including flow loss

L_{f}

and flow-capacity ratio loss

L_{α}

:

L_{s} = L_{f} + L_{α} = \frac{1}{|E_{r}|} \sum_{e \in E_{r}} \| f_{e} - \tilde{f_{e}} \| + \frac{1}{|E_{r}|} \sum_{e \in E_{r}} \| α_{e} - {\tilde{α}}_{e} \|

(17)

The flow conservation loss

L_{c}

ensures that the predicted flows satisfy node-level conservation constraints and penalizes discrepancies between inflow and outflow:

L_{c} = \sum_{i \in V} \left| \sum_{k \in N_{i} (i)} \tilde{f_{k i}} - \sum_{j \in N_{o} (i)} \tilde{f_{i j}} - ∆ f_{i} \right|

(18)

where

∆ f_{i}

denotes the flow difference at node i (equal to the total demand for OD nodes and zero for non-OD nodes). The total loss function integrates all terms via weighted summation:

L_{total} = w_{α} \cdot L_{α} + w_{f} \cdot L_{f} + w_{c} \cdot L_{c}

(19)
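Equations (17)–(19) can be sketched for a single sample as follows. Several points are assumptions made for illustration: the norms are taken as absolute errors, ∆f_i is signed so that a pure origin has a negative value and a pure destination a positive one (matching the inflow-minus-outflow form of Equation (18)), and the unit weights are placeholders because the calibrated values are not reported here. The function name `composite_loss` is hypothetical.

```python
def composite_loss(edges, true_flow, pred_alpha, capacity, delta_f,
                   w_alpha=1.0, w_f=1.0, w_c=1.0):
    """Return (L_total, L_f, L_alpha, L_c) for one demand sample.

    edges:      list of (u, v) real links
    true_flow:  ground-truth link flows f_e
    pred_alpha: predicted flow-capacity ratios
    capacity:   link capacities c_e
    delta_f:    {node: net flow difference}, must list every node
    """
    pred_flow = [a * c for a, c in zip(pred_alpha, capacity)]
    true_alpha = [f / c for f, c in zip(true_flow, capacity)]
    n = len(edges)

    # Eq. (17): mean absolute errors on flows and on ratios.
    loss_f = sum(abs(f - p) for f, p in zip(true_flow, pred_flow)) / n
    loss_alpha = sum(abs(a - p) for a, p in zip(true_alpha, pred_alpha)) / n

    # Eq. (18): |inflow - outflow - delta_f| summed over nodes.
    imbalance = {node: -d for node, d in delta_f.items()}
    for (u, v), p in zip(edges, pred_flow):
        imbalance[v] += p   # flow entering v
        imbalance[u] -= p   # flow leaving u
    loss_c = sum(abs(x) for x in imbalance.values())

    # Eq. (19): weighted sum (weights here are placeholders).
    total = w_alpha * loss_alpha + w_f * loss_f + w_c * loss_c
    return total, loss_f, loss_alpha, loss_c
```

For a chain A→B→C carrying 10 units from A to C, predicting 10 on the first link and 8 on the second leaves node B with a surplus of 2 and node C with a deficit of 2, so the conservation term equals 4 even though the flow error on the first link is zero.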

The overall framework of the proposed dual-encoder heterogeneous graph neural network for traffic assignment is illustrated in Figure 3. At the feature extraction stage, a two-stage collaborative structure is adopted: the V-Encoder leverages a customized Transformer multi-head self-attention mechanism combined with an adaptive edge weight generation strategy to capture long-range pairwise demand correlations through virtual OD links. The R-Encoder focuses on the real road network topology, incorporates inherent physical road attributes into the attention calculation, and strengthens the modeling of local flow propagation and congestion spreading. At the prediction and optimization stage, based on the fused features output by the dual encoders, volume-to-capacity ratio prediction and traffic flow estimation are realized through edge embedding concatenation and FFN mapping. A composite loss function combining supervised learning and node flow conservation constraints is constructed. Because the conservation term acts as a penalty rather than a hard constraint, it promotes prediction accuracy while encouraging the predicted flows to respect the physical laws of traffic flow propagation.

5. Experiments and Conclusions

All experiments are implemented using Python 3.10, PyTorch 2.5.1, and the Deep Graph Library for heterogeneous graph modeling. Model training is conducted on a workstation with an NVIDIA RTX 3080 GPU and 16 GB of DDR4-3200 RAM, with CUDA acceleration enabled. The dataset is divided into training, validation, and test sets at a ratio of 8:1:1. The initial learning rate is set to 0.001 and decayed to 1 × 10⁻⁶ using a cosine annealing scheduler. The batch size is fixed at 128, and an early stopping strategy is adopted: the model parameters that perform best on the validation set are retained to limit overfitting and improve generalization. The Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and coefficient of determination (R²) are adopted as evaluation metrics to quantify the model’s prediction performance. The mathematical definitions of these metrics are given as follows:

MAE = \frac{1}{N} \sum_{i = 1}^{N} |\hat{y_{i}} - y_{i}|

(20)

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(\hat{y_{i}} - y_{i})}^{2}}

(21)

R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(\hat{y_{i}} - y_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}

(22)
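For reference, Equations (20)–(22) translate directly into a few lines of Python. The function names are hypothetical, and the inputs are assumed to be equal-length sequences of ground-truth and predicted link flows.

```python
import math

def mae(y_true, y_pred):
    # Eq. (20): mean absolute error.
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Eq. (21): root mean square error.
    return math.sqrt(
        sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

def r2(y_true, y_pred):
    # Eq. (22): one minus residual sum of squares over total sum of squares.
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

With ground truth (1, 2, 3) and predictions (1, 2, 4), the residual sum of squares is 1 and the total sum of squares is 2, giving MAE = 1/3, RMSE = √(1/3), and R² = 0.5.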

Although the proposed data-driven traffic assignment method is designed for real-world traffic scenarios, the current coverage density and data availability of traffic sensing technologies still cannot meet the requirements for large-scale and multi-scenario model validation. Therefore, a standardized synthetic dataset is generated based on the UE theory to ensure the rigor and comparability of experimental verification. The UE assignment results are used as the ground truth data to validate the learning effectiveness and generalization capability of the data-driven model.

Two widely recognized standard benchmark networks in the transportation field are selected for experiments: the Sioux Falls network and the East Massachusetts (EMA) network, as illustrated in Figure 4. The Sioux Falls network (Figure 4a) is a classic grid-like test case with 24 nodes and 76 directed links. The EMA network (Figure 4b) is a larger real-world network with 74 nodes and 258 directed links, representing a more realistic and complex urban road structure. The numbers in the diagrams denote the unique IDs of nodes and links. These two networks cover small- and medium-scale urban road topologies with distinct scales, road layouts, and travel demand characteristics, which allows the adaptability and generalization performance of the proposed model to be verified under diverse urban traffic conditions. Their topological structures, link attributes, and initial OD demand data are obtained from the open-source Transportation Networks for Research repository maintained by the Transportation Networks for Research Core Team [49]. This open dataset has been widely adopted and repeatedly validated in long-term academic research, serving as a classic benchmark for evaluating traffic assignment algorithms. The core characteristics of the two networks are summarized in Table 1.

5.1. Experimental Design and Overall Performance Analysis

To verify the overall performance and scenario adaptability of the proposed HetGNN, the study designs systematic experiments from three perspectives: overall performance comparison, effectiveness validation of core modules, and anti-interference capability under abnormal scenarios. To fully cover complex practical traffic conditions such as demand fluctuations, congestion disparities, and missing data, and to enhance the authenticity and comprehensiveness of the experiments, multi-dimensional data expansion and scenario-based processing are implemented on the original datasets (the expanded and processed data are independently generated in this study), as detailed below:

Randomization of OD demands: The original dataset provides a fixed initial OD demand matrix. Each travel volume in this matrix is multiplied by an independent random scaling factor drawn from the uniform distribution U(0.1, 1.0), generating 5000 differentiated demand samples. This fluctuation range follows mainstream traffic demand simulation studies and reflects the variation of urban travel demand across peak hours, off-peak hours, and different dates, so that model performance is validated under a wide range of demand intensities.
Construction of multi-level congestion scenarios: The original road network dataset only provides basic link capacity data and does not include pre-defined congestion levels. Based on the volume-to-capacity ratio (V/C ratio) of road segments (calculated using the original link capacity and the generated OD demand data), three typical traffic scenarios are defined in this study. The V/C classification thresholds adopt widely recognized traffic operation evaluation criteria and follow the classical congestion division standards in existing traffic network research, providing a quantitative grading of network operation status: Free-flow scenario (V/C < 0.6, where the entire network remains in a smooth free-flow state); Moderate congestion scenario (0.5 ≤ V/C ≤ 1.0, where supply and demand remain balanced and critical links approach saturation); Severe congestion scenario (V/C > 1.0, representing oversaturated network operation with flow accumulation and congestion propagation on major road segments). Figure 5 presents the boxplot distribution of link V/C ratios (calculated from the original and generated data) for the Sioux Falls network under the three scenarios, showing the operational differences among the congestion levels.
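The demand randomization step can be sketched as follows. This is a hedged illustration rather than the generation script used in the study: the names `sample_od_matrices` and `vc_ratios` are hypothetical, the OD matrix is represented as a dictionary keyed by (origin, destination), and the fixed seed is an assumption added for reproducibility. The scenario thresholds are deliberately not hard-coded, since the stated free-flow and moderate ranges share the interval between 0.5 and 0.6.

```python
import random

def sample_od_matrices(base_od, n_samples=5000, low=0.1, high=1.0, seed=0):
    """Scale every OD entry by an independent U(low, high) factor.

    base_od: {(origin, destination): demand}
    Returns a list of n_samples scaled OD dictionaries.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        samples.append({
            pair: demand * rng.uniform(low, high)
            for pair, demand in base_od.items()
        })
    return samples

def vc_ratios(link_flow, capacity):
    """Volume-to-capacity ratio of each link, used to grade congestion."""
    return [f / c for f, c in zip(link_flow, capacity)]
```

Each sampled matrix would then be assigned with a UE solver to obtain the ground-truth link flows from which the V/C ratios are computed.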

Figure 5 illustrates the differences in network operational characteristics of the Sioux Falls network under various congestion states. In the non-congested condition, the link V/C ratios are generally concentrated below 0.6 with a low median and compact distribution, indicating that the entire network operates smoothly under free flow conditions. In the moderate congestion condition, the overall V/C distribution shifts upward; most links fall within the range of 0.5 to 1.0, and several critical links approach the capacity threshold, leading to a balanced supply-demand status across the network. In the severe congestion scenario, the V/C ratios increase significantly, with numerous links exceeding 1.0 and showing a much higher distribution dispersion. Severe congestion and flow accumulation occur on core links, placing the network under high-load operation.

The overall performance comparison experiments are conducted under ideal baseline conditions with complete OD demands and no network disturbances. The main objective is to verify the core assignment capability of the proposed HetGNN under standard operating conditions. All variables are strictly controlled to ensure full OD information integrity, stable link capacity, and fixed network topology, focusing solely on the model’s ability to learn the essential mapping relationships of traffic assignment. Two key evaluation categories are adopted: flow prediction accuracy and feature fitting performance. The proposed HetGNN is compared with baseline models on the Sioux Falls and EMA networks with different complexity levels. MAE and RMSE are used to quantify deviations between predicted and ground-truth flows, while R² is adopted to evaluate the model’s explanatory power regarding flow distribution patterns. The experimental results are presented in Table 2.

Table 2 presents the experimental results for the two road networks of different scales, Sioux Falls and EMA. It shows that the proposed HetGNN model achieves better performance than the baseline models GAT and T-GCN in terms of flow prediction accuracy under the experimental settings of this study. Across free-flow, moderate congestion, and severe congestion conditions, HetGNN maintains lower MAE and RMSE values, as well as higher R² scores, indicating improved fitting performance for traffic assignment. In the congested scenario of the Sioux Falls network, HetGNN achieves an MAE of 48.65, an RMSE of 74.38, and an R² of 0.9890. Compared with GAT, the MAE and RMSE are reduced by 23.6% and 22.4%, while the R² increases by 2.1%. Relative to T-GCN, the MAE and RMSE decrease by 13.9% and 12.7%, and the R² rises by 1.3%. For the larger-scale EMA network under congested conditions, HetGNN still yields lower MAE and RMSE than the baseline models, with the R² remaining above 0.97. As the congestion level intensifies and the network scale expands, its performance advantage remains stable, showing good adaptability across the tested scenarios.

To verify the statistical significance of the performance differences, paired t-tests based on 5000 groups of OD demand samples are conducted. The results demonstrate that the performance improvements of HetGNN compared with GAT and T-GCN are statistically significant (p < 0.05). Further error analysis based on multi-scenario MAE and RMSE results shows that prediction errors are closely related to traffic congestion levels. Higher prediction errors appear in oversaturated and severely congested scenarios with complex traffic fluctuations, resulting in increased MAE and RMSE values. In contrast, the model achieves more accurate outputs and lower error metrics under stable free-flow conditions.
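A paired t-test of this kind compares the two models on the same samples. The sketch below computes only the test statistic and degrees of freedom with the standard library; the function name `paired_t_statistic` is hypothetical, and pairing per-sample errors of HetGNN with those of a baseline is an assumption about how the test was set up.

```python
import math
import statistics

def paired_t_statistic(errors_model, errors_baseline):
    """Return (t statistic, degrees of freedom) for paired error samples.

    A positive t indicates that the baseline errors are larger on average.
    """
    diffs = [b - m for m, b in zip(errors_model, errors_baseline)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)        # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1
```

The statistic is then compared with the critical value of the t distribution at the chosen significance level (0.05 in this study) to decide whether the mean difference is significant.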

The computational efficiency is summarized in Table 3, where Train (min) denotes the model training time in minutes and Infer (s) represents the single-network inference time in seconds.

In the small-scale Sioux Falls road network shown in Table 3, the training time of HetGNN ranges from 21.5 to 22.1 min, and the inference time for a single network is between 0.13 and 0.14 s. Both metrics are lower than those of GAT and T-GCN, showing competitive performance in inference efficiency. In the medium-to-large-scale EMA road network, although the training time of HetGNN (33.3–34.2 min) is slightly longer than that of T-GCN, it is shorter than that of GAT. Its single-network inference time (0.20–0.21 s) is close to T-GCN and outperforms GAT (0.25–0.26 s) to a certain extent. Overall, the proposed model achieves a reasonable trade-off between prediction accuracy and computational efficiency under the experimental settings of this study, which is essential for real-world sustainable traffic management.

Figure 6 and Figure 7 present the coefficient of determination (R²) of the flow predictions generated by the HetGNN model under non-congested, moderately congested, and severely congested conditions for the Sioux Falls and EMA traffic networks, respectively.

For the Sioux Falls network, the R² values of HetGNN reach 0.9999, 0.9954, and 0.9890 under the three traffic conditions, which are higher than those of GAT and T-GCN. Even in congested conditions, the model maintains a high fitting accuracy; the scatter points closely align with the fitting line without obvious discrete deviations. For the larger and more topologically complex EMA network, the R² values of HetGNN are 0.9914, 0.9841, and 0.9725 under the three conditions, also outperforming GAT and T-GCN. As the congestion level increases, the decline in R² is smaller than that of the baseline models, indicating better robustness under varying traffic conditions. The decrease in R² under severe congestion is mainly attributed to the intensified nonlinearity and increased uncertainty of traffic flow on oversaturated road segments, which is a common challenge in traffic assignment tasks.

To explore the effects of key hyperparameters on the prediction performance of the proposed HetGNN model, this section conducts a sensitivity analysis on three representative core hyperparameters using a controlled variable method, with all other experimental conditions and model parameters held constant. Notably, all experiments are carried out under the uncongested condition. These three hyperparameters correspond to three distinct dimensions: model architecture, core mechanism, and training strategy. This selection enables an assessment of the model’s sensitivity to different categories of hyperparameters. Adopting Mean Absolute Error as the quantitative evaluation metric, the sensitivity level of each parameter is determined by comparing the magnitude of error fluctuations across different parameter settings. The results are presented in Figure 8. The red circles mark the optimal hyperparameter settings with the lowest MAE for each dataset.

The number of hidden units determines the model’s feature representation capability and nonlinear fitting capability. As shown in Figure 8a, as the number of hidden units gradually increases from 8 to 32, the MAE for both the Sioux Falls and EMA road networks exhibits a significant downward trend, with prediction accuracy improving progressively. After exceeding the optimal value of 32, the number of redundant parameters in the network increases, leading to a decline in the model’s generalization ability. Overall, the magnitude of error fluctuations induced by adjusting the number of hidden units is the largest among the three hyperparameters, indicating that the proposed HetGNN model is most sensitive to the number of hidden units.

The number of multi-head attention heads governs the feature aggregation efficiency of the dual encoders and their ability to capture long-distance dependencies. As shown in Figure 8b, the MAE for both road networks continues to decrease as the number of attention heads increases from 1 to 8. Once the number of heads exceeds the optimal value of 8, further increases fail to enhance feature representation capability, and the error only shows a slight rebound. By comparison, the magnitude of error fluctuations under different attention head settings is significantly smaller than that caused by the number of hidden units, classifying it as a moderately sensitive parameter.

As a training-phase hyperparameter, batch size primarily affects the model’s training convergence speed and computational load on hardware, with a limited impact on the model’s inherent feature learning and inference logic. As shown in Figure 8c, when the batch size is adjusted within a reasonable range, the MAE curves for both road networks remain generally flat, with only minor fluctuations in errors. This indicates that batch size does not significantly interfere with the traffic flow modeling process of HetGNN, and the model is least sensitive to batch size.

Although the proposed HetGNN model achieves robust and stable traffic prediction in small and medium-scale road networks under regular and congested conditions, it presents evident performance limitations and prediction degradation in large-scale complex networks such as Anaheim under severe congestion.

For the small-scale Sioux Falls network, the model obtains a high R² of 0.9890 in congested scenarios. The medium-scale EMA network still maintains satisfactory fitting performance with an R² of 0.9725. By contrast, the model performance degrades progressively with the expansion of network scale and congestion severity in the Anaheim network. Its R² declines from 0.8748 in uncongested conditions and 0.6449 under moderate congestion to only 0.4639 in severe congestion, with a reduction of 53.5% compared with the Sioux Falls network. In this case, the model cannot accurately capture the spatiotemporal distribution characteristics of traffic flow.

The Anaheim network contains a large number of nodes and edges with intricate topological structures, and congested traffic flow shows strong nonlinearity and time-varying disorder. When applied to large-scale congested networks, the dual encoders suffer from limited long-range feature propagation, and the multi-head attention mechanism fails to capture cross-regional traffic dependencies effectively. Consequently, the heterogeneous graph aggregation and mapping capacity is weakened, restricting the model’s applicability to complex traffic patterns in large-scale congested road networks.

5.2. Ablation Experiments on Core Modules

To analyze the functional mechanisms and performance contributions of the three core components in the proposed HetGNN, namely virtual OD edges (Ev), the Transformer multi-head self-attention mechanism (MHA), and the dual encoder feature aggregation architecture (V-Encoder and R-Encoder), systematic ablation experiments are designed. Using the controlled variable method, in which core modules are removed one by one, three ablation models are constructed and compared with the original model on the EMA network to ensure the validity and interpretability of the experimental results.

The definitions and design principles of each ablation model are described as follows:

Ablation Model 1 (HetGNN-ΔEv): Virtual OD edges in the heterogeneous graph are removed, retaining only physical road edges to form a homogeneous graph. This model verifies the essential role of virtual OD edges in enabling long-range propagation between OD demands and topological features.

Ablation Model 2 (HetGNN-ΔMHA): The Transformer multi-head self-attention mechanism inside the V-Encoder is replaced with conventional single-head attention for feature aggregation. This setup quantifies the performance gain brought by multi-head attention in mining multi-dimensional long-range demand correlations and boosting feature representation capability.

Ablation Model 3 (HetGNN-ΔEncoder): The dual-encoder collaborative framework is eliminated. Instead, a single attention encoder integrating both physical edges and virtual OD edges is adopted for global feature aggregation. This model validates the functional complementarity between the V-Encoder (capturing long-range demand correlations) and the R-Encoder (capturing local topological dependencies).

The ablation experiments adopt prediction accuracy (MAE, RMSE) and goodness of fit (R²) as key evaluation indicators to compare the original model with the ablation variants. The detailed results are summarized in Table 4. Comparisons of MAE and RMSE among HetGNN and its ablation versions on the EMA network are presented in Figure 9, revealing the contribution of each core module to the overall model performance.

The ablation results show that the virtual OD edges, multi-head self-attention, and dual-encoder architecture are three core components of HetGNN, and removing any module greatly degrades model performance. In the EMA network, the full HetGNN obtains the optimal prediction. The model without virtual OD edges suffers the most obvious performance drop, which highlights the importance of long-range OD correlations for traffic assignment. Removing MHA or the dual-encoder structure also raises prediction errors, indicating that multi-head attention captures complex feature relationships, and the dual-encoder design effectively fuses global OD dependence and local topological information. In addition, although deleting individual modules can slightly reduce inference time, the obvious accuracy loss cannot be ignored. The complete HetGNN finally realizes a reasonable balance between prediction accuracy and computational efficiency.

5.3. Robustness Analysis Under Abnormal Traffic Scenarios

Practical road network operations are frequently affected by detector failures, data transmission interruptions, and missing acquisition information, often leading to incomplete OD demand data. To fully validate the anti-interference capability and generalization performance of the proposed HetGNN under incomplete information, comparative experiments for abnormal scenarios are conducted under stable, free-flow traffic conditions. A random masking strategy is adopted to gradually zero out entries in the original OD demand matrix, setting three missing levels of 20%, 30%, and 40% to simulate data loss conditions ranging from mild to severe. Under unified experimental settings and evaluation metrics, the flow prediction and assignment performance of HetGNN are compared with baseline models, including GAT and T-GCN, under incomplete demand inputs, so as to quantify the influence of different missing ratios on model accuracy.
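The random masking strategy can be illustrated with a short sketch. The function name `mask_od` is hypothetical, and two details are assumptions: the missing ratio is applied to the non-zero OD pairs only, and a fixed seed is used so that the same mask can be reproduced across models.

```python
import random

def mask_od(od, missing_ratio, seed=0):
    """Zero out a random fraction of the non-zero OD entries.

    od:            {(origin, destination): demand}
    missing_ratio: fraction of non-zero entries to remove, e.g. 0.2
    """
    rng = random.Random(seed)
    nonzero = sorted(pair for pair, d in od.items() if d > 0)
    k = round(missing_ratio * len(nonzero))
    masked = set(rng.sample(nonzero, k))
    return {pair: (0.0 if pair in masked else d) for pair, d in od.items()}

# Three missing levels used in the robustness experiments.
MISSING_LEVELS = (0.2, 0.3, 0.4)
```

Applying the same seed for every model keeps the comparison fair, because all models then see exactly the same incomplete demand matrices.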

The quantitative results are summarized in Table 5, illustrating error variation patterns across multiple scenarios. Meanwhile, visualized comparisons including the MAE curves under different OD missing ratios for the Sioux Falls network are presented in Figure 10. These results further highlight that, empowered by its heterogeneous graph structure and dual-encoder feature completion mechanism, the proposed HetGNN can maintain stable, high-accuracy traffic assignment even when key demand information is incomplete.

Under partial OD demand missing scenarios, the proposed model demonstrates better robustness and anti-interference capability than baseline models. As the OD missing ratio increases, the model’s error grows at a slower rate than baseline models, maintaining relatively low prediction errors across all missing ratio levels. This outcome confirms that the model’s heterogeneous graph structure and dual-encoder design can effectively capture the intrinsic correlation between OD demands and network flow. Even with incomplete OD information, the model can supplement missing data through global feature propagation, ensuring stable performance under abnormal data conditions.

6. Conclusions and Future Work

This study develops a HetGNN integrated with the Transformer multi-head self-attention mechanism for urban road traffic assignment. The model adopts a dual-encoder architecture to decouple and fuse long-range OD demand correlations and local topological features of road networks. A composite loss function embedded with flow conservation constraints is adopted to promote the physical rationality of the prediction outputs. Experimental validations demonstrate that the proposed model outperforms the baseline models in prediction accuracy under various congestion conditions and in robustness against incomplete OD demands, while maintaining competitive computational efficiency. Nevertheless, the model suffers from significant performance degradation in the severely congested scenarios of the large-scale Anaheim road network.

Future research will target the core limitations observed in large-scale congested networks, namely inefficient long-range feature propagation and weak mining of cross-regional traffic dependencies. A hierarchical heterogeneous graph topology and an adaptive dynamic attention mechanism will be designed to strengthen global feature aggregation and local topology perception for large-scale complex road networks. The feature propagation and fusion strategies of the dual encoders will be further refined to improve the fitting accuracy of the model for strongly nonlinear traffic flow under severe congestion. In addition, lightweight model optimization will be carried out to reduce the computational overhead of large-scale network inference, so as to improve the prediction performance and engineering applicability of the model in complex large-scale urban road networks [50]. Furthermore, accurate traffic flow modeling and intelligent path guidance help mitigate traffic congestion, cut carbon emissions and fossil fuel consumption, and promote the low-carbon and sustainable transformation of urban transportation [51].

Author Contributions

Conceptualization, G.X.; formal analysis, G.X. and T.X.; investigation, G.X. and T.X.; methodology, G.X. and T.X.; resources, G.X., X.C. and A.N.; software, T.X.; supervision, G.X., X.C. and A.N.; validation, T.X.; writing—original draft, T.X.; writing—review and editing, G.X. and T.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

The authors gratefully acknowledge the support from the School of Economics and Management, Shanghai Maritime University.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Construction of a traffic heterogeneous graph incorporating virtual OD links.

Figure 2. Core components of the Virtual Encoder (V-Encoder).

Figure 3. Architecture of the dual-encoder heterogeneous graph neural network for traffic assignment.

Figure 4. Traffic network diagrams of Sioux Falls (a) and EMA (b).

Figure 5. Boxplots of link V/C ratios for the Sioux Falls network under different congestion conditions.

Figure 6. Correlation coefficients of HetGNN in Sioux Falls under different congestion conditions.

Figure 7. Correlation coefficients of HetGNN in EMA under different congestion conditions.

Figure 8. Sensitivity analysis of key hyperparameters.

Figure 9. Ablation analysis of HetGNN components on the EMA network.

Figure 10. MAE comparison of HetGNN under partially missing OD demand in Sioux Falls.

Table 1. Characteristics of the Sioux Falls and EMA networks.

Network	Nodes	Links	OD Pairs	Description
Sioux Falls	24	76	528	A small-scale network with a simple and regular topological structure.
EMA	74	258	1113	A medium-scale network simulating the traffic connection between urban core areas and suburban regions.

Table 2. Overall performance comparison of all models on the datasets.

Network	Model	Non-Congestion MAE	Non-Congestion RMSE	Non-Congestion R²	Moderate Congestion MAE	Moderate Congestion RMSE	Moderate Congestion R²	Congestion MAE	Congestion RMSE	Congestion R²
Sioux Falls	GAT	42.86	65.32	0.9872	51.39	78.45	0.9795	63.71	95.88	0.9683
Sioux Falls	T-GCN	38.57	58.14	0.9921	45.23	69.77	0.9864	56.42	85.19	0.9765
Sioux Falls	HetGNN	34.91	52.76	0.9999	39.87	61.29	0.9954	48.65	74.38	0.9890
EMA	GAT	59.43	90.17	0.9786	72.85	109.62	0.9654	89.37	134.59	0.9482
EMA	T-GCN	52.11	79.24	0.9853	63.47	96.83	0.9748	78.69	118.75	0.9597
EMA	HetGNN	46.99	70.83	0.9914	55.32	84.76	0.9841	68.94	104.52	0.9725
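
The MAE, RMSE, and R² columns in Table 2 are standard regression metrics over predicted versus reference link flows. The following is a minimal sketch of how such metrics are commonly computed. It assumes that R² denotes the coefficient of determination; the exact definition used in the experiments may differ. The function name regression_metrics and the sample flows are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired sequences of link flows."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    # Assumed definition: coefficient of determination.
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical link flows, for illustration only (not data from the paper).
true_flows = [1000.0, 2000.0, 3000.0, 4000.0]
pred_flows = [1010.0, 1980.0, 3030.0, 3960.0]
mae, rmse, r2 = regression_metrics(true_flows, pred_flows)
print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.4f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data, which matches the ordering of the two columns in every row of Table 2.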

Table 3. Computational efficiency comparison of different models on the datasets.

Network	Model	Non-Congestion Train (min)	Non-Congestion Infer (s)	Moderate Congestion Train (min)	Moderate Congestion Infer (s)	Congestion Train (min)	Congestion Infer (s)
Sioux Falls	GAT	24.3	0.17	24.7	0.18	25.1	0.18
Sioux Falls	T-GCN	23.1	0.16	23.5	0.16	23.8	0.17
Sioux Falls	HetGNN	21.5	0.13	21.8	0.13	22.1	0.14
EMA	GAT	41.6	0.25	42.3	0.26	42.9	0.26
EMA	T-GCN	31.2	0.19	31.7	0.19	32.1	0.20
EMA	HetGNN	33.3	0.20	33.8	0.20	34.2	0.21

Table 4. Quantitative ablation results of HetGNN components on the EMA network.

Model	MAE	RMSE	R²	Infer (s)
HetGNN	46.99	70.83	0.9914	0.20
HetGNN-ΔEv	113.27	154.49	0.9536	0.17
HetGNN-ΔMHA	76.22	113.33	0.9758	0.18
HetGNN-ΔEncoder	68.54	99.94	0.9805	0.25

Table 5. Results of the test with partially missing OD demand.

Missing Ratio	Model	Sioux Falls MAE	Sioux Falls RMSE	EMA MAE	EMA RMSE
20%	GAT	48.35	73.69	66.81	101.45
20%	T-GCN	43.29	65.47	59.43	89.97
20%	HetGNN	38.76	58.52	52.18	78.94
30%	GAT	54.17	82.53	74.26	112.83
30%	T-GCN	48.92	73.86	66.58	99.34
30%	HetGNN	40.12	60.78	54.73	82.61
40%	GAT	59.72	91.06	79.64	120.97
40%	T-GCN	53.85	81.29	71.92	106.58
40%	HetGNN	42.35	63.94	57.41	86.83
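
One way to read Table 5 is to ask how much each model's MAE grows as the missing ratio rises from 20% to 40%. The short sketch below does that arithmetic on the MAE values copied from the table. The relative-increase measure and the names MAE, relative_increase, and degradation are illustrative choices, not quantities reported in the paper.

```python
# MAE values copied from Table 5, keyed by network, model, and missing ratio (%).
MAE = {
    "Sioux Falls": {
        "GAT": {20: 48.35, 30: 54.17, 40: 59.72},
        "T-GCN": {20: 43.29, 30: 48.92, 40: 53.85},
        "HetGNN": {20: 38.76, 30: 40.12, 40: 42.35},
    },
    "EMA": {
        "GAT": {20: 66.81, 30: 74.26, 40: 79.64},
        "T-GCN": {20: 59.43, 30: 66.58, 40: 71.92},
        "HetGNN": {20: 52.18, 30: 54.73, 40: 57.41},
    },
}


def relative_increase(low, high):
    """Percentage increase from `low` to `high`."""
    return 100.0 * (high - low) / low


# Relative MAE growth between the 20% and 40% missing-ratio settings.
degradation = {
    (network, model): relative_increase(by_ratio[20], by_ratio[40])
    for network, models in MAE.items()
    for model, by_ratio in models.items()
}

for (network, model), pct in degradation.items():
    print(f"{network:12s} {model:7s} MAE +{pct:.1f}% from 20% to 40% missing")
```

On these figures, the MAE of HetGNN grows by roughly 9 to 10% on both networks, while the MAE of GAT and T-GCN grows by roughly 19 to 24%.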
