Modeling Road User Interactions with Dynamic Graph Attention Networks for Traffic Crash Prediction

Ma, Shihan; Yang, Jidong J.

doi:10.3390/app16031260


by Shihan Ma and Jidong J. Yang *

Smart Mobility and Infrastructure Laboratory, College of Engineering, University of Georgia, Athens, GA 30602, USA

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(3), 1260; https://doi.org/10.3390/app16031260

Submission received: 26 December 2025 / Revised: 19 January 2026 / Accepted: 21 January 2026 / Published: 26 January 2026

(This article belongs to the Special Issue Advances in Vehicle Dynamics and Road Safety: Technologies, Simulations and Applications, 2nd Edition)


Abstract

This paper presents a novel deep learning framework for traffic crash prediction that leverages graph-based representations to model complex interactions among road users. At its core is a dynamic Graph Attention Network (GAT), which abstracts road users and their interactions as evolving nodes and edges in a spatiotemporal graph. Each node represents an individual road user, characterized by its state as features, such as location and velocity. A node-wise Long Short-Term Memory (LSTM) network is employed to capture the temporal evolution of these features. Edges are dynamically constructed based on spatial and temporal proximity, existing only when distance and time thresholds are met for modeling interaction relevance. The GAT learns attention-weighted representations of these dynamic interactions, which are subsequently used by a classifier to predict the risk of a crash. Experimental results demonstrate that the proposed GAT-based method achieves 86.1% prediction accuracy, highlighting its effectiveness for proactive collision risk assessment and its potential to inform real-time warning systems and preventive safety interventions.

Keywords:

traffic collision prediction; road user interaction modeling; deep learning; graph neural networks; graph attention networks; graph representation learning; long short-term memory

1. Introduction

Traffic crashes remain a major concern for all road users, often resulting in serious injuries, fatalities, and significant economic costs. Beyond the immediate impact, these incidents can trigger secondary crashes, traffic congestion, and widespread disruptions. Accurate prediction of traffic collisions is essential for enhancing road safety and improving traffic management. As vehicular traffic continues to grow, the need for reliable crash prediction becomes increasingly critical to reducing injuries and fatalities. Timely collision prediction systems can provide drivers and safety mechanisms with crucial lead time to react, potentially preventing crashes and saving lives. Advanced driver-assistance systems, such as automated emergency braking and driver alertness monitoring [1,2], can greatly benefit from predictive models that forecast imminent collisions. Consequently, the development of accurate traffic crash prediction methods has attracted growing attention from the research community [3,4,5,6,7,8].

Collision avoidance systems have become a focal point in traffic safety research. These systems leverage a range of sensors to detect potential hazards around a vehicle and either alert the driver or automatically initiate preventive maneuvers. For example, Al-Smadi et al. [1] proposed an intelligent collision avoidance and safety system that uses ultrasonic sensors to detect imminent crashes by measuring inter-vehicle distances and activating safety protocols when a collision trajectory is identified. Similarly, Tan et al. [2] investigated the effectiveness of Automatic Emergency Braking (AEB) systems, reporting a significant reduction in accident severity, fatalities, and injuries following their deployment. Recent innovations have further advanced the capabilities of these systems. Adaptive collision avoidance switching systems [9] have been introduced to enhance system flexibility under varying traffic scenarios. Additionally, the integration of laser radar with Controller Area Network (CAN) systems has enabled more comprehensive in-vehicle communication and real-time collision warnings [10].

Timely collision prediction also offers substantial benefits for traffic management. By anticipating potential collisions, traffic control systems can reroute traffic in real time, mitigating congestion and improving the overall efficiency of road networks. This not only shortens travel times for road users but also reduces the economic losses associated with traffic delays caused by crashes. Furthermore, integrating collision prediction into traffic management enables the development of more effective control strategies that promote smoother traffic flow and help lower vehicle emissions, contributing to broader environmental goals [11]. A growing body of research has focused on predicting traffic collisions to enhance both safety and operational efficiency [12,13,14,15,16]. In this context, machine learning has emerged as a powerful tool, evolving from traditional statistical methods to advanced models capable of capturing non-linear patterns and complex spatiotemporal dynamics in traffic environments [4,7,11,12,13,14,15,16,17,18,19,20,21,22,23].

Deep learning techniques have further advanced traffic crash prediction by enabling the analysis of complex, high-dimensional, heterogeneous data. Prior research has consistently demonstrated the ability of deep learning models to capture intricate patterns in multi-dimensional traffic data and to model complex spatial and temporal relationships within traffic networks [3,4,5,6,7,8,24,25,26,27,28,29]. For example, Chavan et al. [3] introduced COLLIDE-PRED, a real-time collision prediction system that leverages surveillance video and computer vision techniques. Wang et al. [5] highlighted the robustness and high accuracy of deep learning models across diverse traffic scenarios and environmental conditions. Other noteworthy approaches include the integration of Long Short-Term Memory (LSTM) networks with Gradient Boosting algorithms [25], the application of Convolutional Neural Networks (CNNs) [27], and various hybrid approaches [30,31,32,33], all of which have demonstrated enhanced performance and adaptability in traffic safety analysis.

To address the challenges of traffic crash prediction more comprehensively, recent models have increasingly adopted hybrid and mobile approaches, combining complementary techniques to boost performance and adaptability. For example, the RFCNN model [30] integrates Random Forest and CNN to improve the prediction of crash severity. The Hetero-ConvLSTM [31] captures both spatial dependencies and temporal dynamics in traffic data, effectively modeling evolving traffic patterns. In parallel, the emergence of mobile deep-learning architectures has opened new avenues for real-time, edge-based crash prediction. Notably, models based on the MobileNet architecture [32] have demonstrated the feasibility of performing on-device crash severity prediction, offering a lightweight yet powerful solution suitable for in-vehicle deployment. These advancements highlight the growing potential of hybrid and mobile deep-learning-based systems to support proactive, scalable, and context-aware traffic safety interventions.

Traditional traffic crash prediction methods often rely on historical accident records and static variables, such as road conditions, aggregated traffic volumes, and weather data, which may fail to capture the dynamic, real-time nature of traffic environments. In contrast, the emergence of Graph Attention Networks (GATs) [34] offers a powerful new paradigm for enhancing predictive accuracy by explicitly modeling interactions among nodes. The GAT architecture leverages an attention mechanism to selectively weigh the importance of neighboring nodes and their features, allowing the model to focus on the most relevant and influential interactions. GATs have been applied for speed and traffic forecasting [35,36]. A key strength of GATs lies in their ability to integrate and process rich, heterogeneous data [37]. Moreover, GATs can effectively incorporate contextual information from various sources, including road geometry, traffic signals, and environmental conditions [38,39]. This enables the model to understand the richer context, leading to more accurate predictions.

Recent studies have demonstrated the effectiveness of GATs across various domains, particularly in their ability to process multi-dimensional data and uncover complex relationships within structured environments. Their application to traffic systems has shown great promise in modeling dynamic environments such as traffic on road networks [40,41,42,43,44]. Leveraging GATs for traffic collision prediction not only enhances predictive performance but also offers a flexible framework for incorporating emerging technologies and diverse data sources into intelligent traffic safety and management systems.

Evolving technologies, such as Advanced Driver-Assistance Systems (ADAS) and Connected and Autonomous Vehicles (CAV), continually produce detailed, real-time data on vehicle speed, position, trajectory, and driver behavior. GATs are well suited to synthesize these rich, high-resolution data sources into a dynamic graph representation of the traffic scene, enabling the model to capture transient, subtle, and often critical interactions that traditional methods fail to capture.

In this study, we propose a novel GAT-based framework for predicting potential traffic collisions, representing the first work to leverage graph structures for modeling real-time interactions among road users. By modeling vehicles and other road users as nodes and their interactions as edges on a dynamic graph, our approach learns to recognize spatiotemporal patterns and interdependencies of road users that frequently precede collisions. This interaction-centric perspective enables more context-aware and timely assessments of collision risk and supports the development of more responsive safety interventions and traffic management strategies.

2. Dataset Description

The data used in this study come from the open-source DeepAccident dataset [45], which was created with the CARLA simulator [46] to reflect diverse real-world traffic collision scenarios. The dataset spans seven CARLA town maps, each including several traffic intersections where accidents were simulated and collision data were collected. The seven town maps cover a wide range of scenes, such as urban streets, highways, and rural areas. Signalized intersections are of two types, four-way and three-way. Unsignalized intersections include four-way and three-way intersections as well as merging-ramp intersections on the expressway. An example of an unsignalized four-way intersection from the CARLA ‘Town03’ map is shown in Figure 1.

The DeepAccident dataset contains a total of 691 simulated scenarios, of which only 196 involve actual collisions. We therefore used only these 196 collision scenarios for model training and evaluation.

The collision scenarios were captured by both video frames and ground truth labels, where the frequency of videos is 10 frames per second. In this study, we only used the ground truth annotations of vehicles for both training and testing our collision prediction model. The ground truth annotations for collision videos include Frame ID (starting from index number 0), Object ID (individually assigned for each traffic object), Object Category (including Car, Van, Truck, Cyclist, and Pedestrian), Object Location (coordinates x and y in meters), and Object Velocity (absolute speed in meters per second). It should be noted that all collisions in the dataset are between two road users.

The collision events were categorized into 8 types: Vehicle–Pedestrian Collision, Rear End, Switch Lane, T-Bone, Opposite Frontal Impact, Opposite Merging, Angle Merging, and Right-Vehicle-Turn-Left. Specifically, Vehicle–Pedestrian Collision refers to an impact between a motor vehicle and a pedestrian. Rear End collisions occur when the front of a following vehicle strikes the rear of a leading vehicle, often due to sudden braking by the leading vehicle. Switch Lane collisions occur when a vehicle changes lanes without yielding to another vehicle already traveling in the target lane.

A T-Bone collision occurs at an intersection when one vehicle strikes the side of another vehicle while both vehicles are traveling straight through the intersection. Opposite Frontal Impact and Opposite Merging collisions involve vehicles approaching from opposite directions. When one vehicle travels straight while the opposing vehicle travels straight or makes a left turn, the resulting head-on crash is classified as an Opposite Frontal Impact. When both vehicles turn into the same roadway from opposite directions, the collision is classified as an Opposite Merging collision.

Angle Merging and Right-Vehicle-Turn-Left collisions involve vehicles approaching from perpendicular directions. An Angle Merging collision occurs when both vehicles merge into the same roadway. A Right-Vehicle-Turn-Left collision occurs when one vehicle travels straight or turns left, while another vehicle approaching from its right side makes a left turn, leading to a collision.

Figure 2 illustrates the representative trajectories of colliding vehicles for all collision types except Vehicle–Pedestrian Collision and Rear End. Each pair of blue and orange arrows represents a collision scenario between two vehicles.

These collision categories provide insightful information on colliding vehicle behaviors. Therefore, training and testing datasets were created for each collision type, resulting in 161 collision scenes for training and 35 collision scenes for testing. The distribution of collision scenes among collision types is shown in Figure 3.

Angle Merging and Right-Vehicle-Turn-Left have the greatest numbers of collision scenes, while Vehicle–Pedestrian has only 8 collision scenes in total. Splitting the dataset by collision category ensures that the testing set shares a similar collision type distribution with the training set.
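A per-category split of this kind can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_by_type`, the `(scene_id, collision_type)` layout, the test fraction of 0.18 (roughly 35/196), the random seed, and the Rear End count in the toy data are all assumptions made for the example.

```python
import random
from collections import defaultdict

def split_by_type(scenes, test_fraction=0.18, seed=0):
    """Split (scene_id, collision_type) pairs into train/test per collision type.

    test_fraction and seed are assumed values, not taken from the paper.
    """
    by_type = defaultdict(list)
    for scene_id, ctype in scenes:
        by_type[ctype].append(scene_id)

    rng = random.Random(seed)
    train, test = [], []
    for ctype in sorted(by_type):
        ids = by_type[ctype][:]
        rng.shuffle(ids)
        # Keep at least one test scene per type so rare types are represented.
        n_test = max(1, round(len(ids) * test_fraction))
        test += [(i, ctype) for i in ids[:n_test]]
        train += [(i, ctype) for i in ids[n_test:]]
    return train, test

# Toy data: 8 Vehicle-Pedestrian scenes (as reported) and a hypothetical 20 Rear End scenes.
scenes = [(f"vp{i}", "Vehicle-Pedestrian") for i in range(8)]
scenes += [(f"re{i}", "Rear End") for i in range(20)]
train, test = split_by_type(scenes)
```

Because the split is performed within each collision type, both subsets contain every type in roughly the same proportion.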

3. Dynamic Graph Attention Networks

For the traffic collision prediction task, we construct a graph representation that captures the dynamic states and interactions among road users across sequential frames in a traffic video. It should be noted that the proposed method is not restricted to video data and can be applied to CAV data. In this study, we used the public dataset extracted from simulated videos for demonstrating our approach. Specifically, for a given reference frame at timestamp T, we consider a sequence of preceding frames within a defined temporal window to model the evolving scene context. Within this window, a dynamic graph is constructed where nodes represent individual road users and edges encode their spatiotemporal relationships. This graph is processed by a GAT, which learns informative embeddings for each node by attending to relevant neighboring nodes and capturing their dynamics. The learned embeddings encode both motion patterns and inter-user interactions that could signal potential collision risk. Finally, the embeddings are passed to a binary classifier to predict the likelihood of a collision. The overall process is illustrated in Figure 4.

GATs represent a special neural network architecture designed to handle graph-structured data by leveraging an attention mechanism to model relationships and interactions between nodes. The core structure of our GAT includes multiple layers, each consisting of nodes that represent road user entities (e.g., vehicles and pedestrians) and edges that represent relationships (e.g., interactions between road users). In each layer, an attention mechanism computes the importance of neighboring nodes, allowing the network to focus on the most relevant neighbors based on both node and edge attributes. As shown in Figure 5, this mechanism involves computing attention coefficients for each pair of connected nodes, normalizing them, and using these coefficients to weigh the influence of neighboring nodes. The attention coefficients are computed by Equation (1).

$$a_{ij} = \mathrm{softmax}_{j}(s_{ij}) = \frac{\exp(s_{ij})}{\sum_{k \in N_{i}} \exp(s_{ik})} \tag{1}$$

where

$N_{i}$ : the set of neighbors for Node $i$
$a_{i j}$ : attention weights between Node $i$ and Node $j$
$s_{i j}$ : attention scores between Node $i$ and Node $j$

The attention score $s_{ij}$ is computed by Equation (2).

$$s_{ij} = \mathrm{LeakyReLU}\left(a_{n}^{\top} [W h_{i} \,\Vert\, W h_{j}] + a_{e}^{\top} e_{ij}\right) \tag{2}$$

where

$e_{ij}$ : edge attributes between Node $i$ and Node $j$
$W$ : attention weight matrix
$h_{i}$ : embedding for Node $i$
$a_{n}^{\top}$ , $a_{e}^{\top}$ : learnable parameters for nodes and edges

The attention scores are obtained from both node embeddings and edge attributes (Equation (2)), so that critical interactions receive larger weights during aggregation. This structure enables the GAT to adjust dynamically to complex and changing environments, making it particularly effective for applications such as traffic collision prediction, where the relationships between nodes (road users) are crucial and constantly evolving. The computation of embeddings for an example node, Node 1 (N1), is illustrated in Figure 5.

Figure 5. Computing the embeddings for Node 1 with the multi-head self-attention mechanism. (a) Road user graph representation; (b) Concatenating multi-head self-attentions.
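To make Equations (1) and (2) concrete, the following minimal sketch computes the attention weights of a single node in plain Python. The helper names (`leaky_relu`, `matvec`, `dot`, `attention_weights`), the LeakyReLU negative slope of 0.2, and all numeric inputs are illustrative assumptions rather than values from this study; a single attention head is shown, and in practice these operations would be performed by a graph learning library.

```python
import math

def leaky_relu(x, negative_slope=0.2):
    # negative_slope = 0.2 is an assumed value, not taken from the paper.
    return x if x > 0 else negative_slope * x

def matvec(W, h):
    # Plain matrix-vector product W h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_weights(h_i, neighbors, W, a_n, a_e):
    """Return the weights a_ij over the neighbors of node i.

    neighbors: list of (h_j, e_ij) pairs, one per neighbor j in N_i.
    """
    Wh_i = matvec(W, h_i)
    scores = []
    for h_j, e_ij in neighbors:
        concat = Wh_i + matvec(W, h_j)  # [W h_i || W h_j]
        scores.append(leaky_relu(dot(a_n, concat) + dot(a_e, e_ij)))  # Eq. (2)
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]  # Eq. (1)

# Toy inputs (hypothetical): 2-dimensional embeddings, edge attributes (D, t).
W = [[1.0, 0.0], [0.0, 1.0]]
a_n = [0.1, 0.1, 0.1, 0.1]
a_e = [-0.05, -0.2]
h_1 = [1.0, 2.0]
neighbors = [([0.5, 1.0], [5.0, 1.0]), ([2.0, 0.0], [18.0, 1.4])]
weights = attention_weights(h_1, neighbors, W, a_n, a_e)
```

With these toy parameters, the neighbor that is closer in space and time receives the larger weight, and the weights over the neighborhood sum to one.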

3.1. Graph Generation

A graph $G$ is composed of a set of nodes $V$ (also called vertices) and a set of edges $E$. The nodes represent entities, while the edges represent relationships between these entities. Formally, a graph is denoted as $G = (V, E)$. Each node $v \in V$ can represent various entities depending on the application, such as individuals in a social network or intersections/segments in a road network. Each edge $e \in E$ represents a relationship between a pair of nodes, which can be directed or undirected, and may have associated weights to indicate the strength or capacity of the connection. In the context of traffic crash prediction, the graph can be structured with nodes representing road users, and edges representing interactions between vehicles (e.g., proximity, communication). The adjacency matrix $A$ typically represents the connection status between nodes, with non-zero entries indicating the presence of edges. Each node or edge can have associated features.

For traffic collision prediction, each node represents a road user, and each edge encodes the interaction between a pair of road users. Node features describe object-level characteristics, including spatial location $(x, y)$, object size $(w, h)$, velocity components $(v_{x}, v_{y})$, and object category (i.e., Car, Van, Truck, Motorcycle, Cyclist, or Pedestrian).

The adjacency matrix, which determines the existence of edges between nodes, is constructed based on two spatiotemporal proximity criteria that must be satisfied simultaneously. Specifically, an edge is established between two nodes if (1) the minimum Euclidean distance $D$ between the corresponding road users is smaller than a predefined spatial threshold $D_{\mathrm{threshold}}$, and (2) the time $t$ required for the two road users to reach this minimum distance is less than a predefined temporal threshold $t_{\mathrm{threshold}}$. When both criteria are met, the interaction is considered collision-relevant and an edge is created in the graph. Let $\vec{\Delta d}$ and $\vec{\Delta v}$ denote the relative position and relative velocity vectors between two road users, respectively. The angle θ between these vectors characterizes their interaction state, indicating whether the two road users are approaching or departing, as illustrated in Figure 6.

Specifically, when θ is acute, the projection of the relative velocity onto the relative position vector is positive, implying that the two road users are moving closer to each other and will reach their minimum separation at a future time (t > 0 in Equation (5)). In contrast, when θ is a right angle or obtuse, the projection is non-positive, indicating that the minimum distance has already been reached and the road users are moving apart.

It is worth noting that when $|\vec{\Delta v}| = 0$, the two road users are relatively stationary; consequently, their mutual distance remains constant over time. Based on the geometric relationships illustrated in Figure 6, the minimum distance $D$ is computed by Equation (3) or Equation (4), depending on whether the relative speed satisfies $|\vec{\Delta v}| = 0$. The corresponding time $t$ to reach this minimum distance is calculated by Equation (5).

If $|\vec{\Delta v}| = 0$,

$$D = |\vec{\Delta d}| = \sqrt{\Delta d_{x}^{2} + \Delta d_{y}^{2}}; \tag{3}$$

If $|\vec{\Delta v}| \neq 0$,

$$D = |\vec{\Delta d}| \cdot \sin θ = |\vec{\Delta d}| \cdot \frac{|\vec{\Delta d} \times \vec{\Delta v}|}{|\vec{\Delta d}| \cdot |\vec{\Delta v}|} = \frac{|\Delta d_{x} \cdot \Delta v_{y} - \Delta d_{y} \cdot \Delta v_{x}|}{\sqrt{\Delta v_{x}^{2} + \Delta v_{y}^{2}}}, \tag{4}$$

$$t = \frac{|\vec{\Delta d}| \cdot \cos θ}{|\vec{\Delta v}|} = \frac{|\vec{\Delta d}| \cdot \frac{\vec{\Delta d} \cdot \vec{\Delta v}}{|\vec{\Delta d}| \cdot |\vec{\Delta v}|}}{|\vec{\Delta v}|} = \frac{\Delta d_{x} \cdot \Delta v_{x} + \Delta d_{y} \cdot \Delta v_{y}}{\Delta v_{x}^{2} + \Delta v_{y}^{2}}. \tag{5}$$
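As an illustrative check of Equations (4) and (5), consider the hypothetical values $\vec{\Delta d} = (12, 5)$ m and $\vec{\Delta v} = (4, 0)$ m/s, which are chosen for easy arithmetic and are not drawn from the dataset:

$$D = \frac{|12 \cdot 0 - 5 \cdot 4|}{\sqrt{4^{2} + 0^{2}}} = \frac{20}{4} = 5 \text{ m}, \qquad t = \frac{12 \cdot 4 + 5 \cdot 0}{4^{2} + 0^{2}} = \frac{48}{16} = 3 \text{ s}.$$

Here $\cos θ = 48 / (13 \cdot 4) = 12/13 > 0$, so θ is acute and the two road users are approaching: they would close to a minimum separation of 5 m after 3 s if both kept their current velocities.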

For graph construction, an edge is created between $N_{1}$ and $N_{2}$ if both criteria, $D < D_{θ}$ and $t < t_{θ}$, are satisfied, where $D_{θ}$ and $t_{θ}$ are the predefined spatial and temporal thresholds, respectively (i.e., $D_{\mathrm{threshold}}$ and $t_{\mathrm{threshold}}$; the subscript θ here denotes a threshold, not the angle θ defined earlier). These thresholds regulate the sparsity of the interaction graph. Larger values of $D_{θ}$ and $t_{θ}$ produce a denser graph by introducing more edges; however, many of these edges may correspond to spatiotemporally distant interactions and thus introduce inference noise, as such road users are unlikely to collide. Conversely, smaller thresholds yield a sparser graph by retaining only the most relevant interactions. In the limiting case, sufficiently small values of $D_{θ}$ and $t_{θ}$ preserve only edges associated with imminent or highly probable collision scenarios.
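The edge-construction rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `closest_approach` and `build_edges` and the `users` dictionary layout are hypothetical. The sign convention is also assumed, since the text does not state it: the relative position points from user i to user j and the relative velocity is that of i with respect to j, which makes Equation (5) positive for approaching pairs. For relatively stationary pairs, where Equation (5) is undefined, t is set to 0 by assumption.

```python
import math
from itertools import combinations

def closest_approach(p_i, v_i, p_j, v_j):
    """Return (D, t) following Equations (3)-(5).

    Assumed convention: delta_d = p_j - p_i and delta_v = v_i - v_j,
    so that t > 0 when the two users are approaching.
    """
    ddx, ddy = p_j[0] - p_i[0], p_j[1] - p_i[1]
    dvx, dvy = v_i[0] - v_j[0], v_i[1] - v_j[1]
    speed_sq = dvx * dvx + dvy * dvy
    if speed_sq == 0.0:
        # Relatively stationary: distance is constant (Eq. 3); t = 0 is an assumption.
        return math.hypot(ddx, ddy), 0.0
    D = abs(ddx * dvy - ddy * dvx) / math.sqrt(speed_sq)  # Eq. (4)
    t = (ddx * dvx + ddy * dvy) / speed_sq                # Eq. (5)
    return D, t

def build_edges(users, d_threshold, t_threshold):
    """Return {(i, j): (D, t)} for every pair meeting both criteria.

    users: {user_id: ((x, y), (vx, vy))}  (hypothetical layout)
    """
    edges = {}
    for i, j in combinations(sorted(users), 2):
        D, t = closest_approach(*users[i], *users[j])
        if D < d_threshold and t < t_threshold:
            edges[(i, j)] = (D, t)  # D and t double as edge attributes
    return edges

# Toy scene (hypothetical): A and B approach head-on in adjacent lanes; C is parked far away.
users = {
    "A": ((0.0, 0.0), (10.0, 0.0)),
    "B": ((20.0, 2.0), (-10.0, 0.0)),
    "C": ((0.0, 100.0), (0.0, 0.0)),
}
edges = build_edges(users, d_threshold=20.0, t_threshold=1.5)
```

The sketch applies the two criteria literally, so a departing pair with negative t would also pass the temporal test. Whether such pairs are filtered out is not specified in the text, and an additional check such as `t >= 0` may be appropriate in practice.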

Beyond encoding the existence of edges through the adjacency matrix, D and t are further incorporated as continuous edge attributes. These edge features explicitly quantify the spatiotemporal proximity between interacting road users, enabling the GAT to differentiate varying levels of collision risk rather than treating all edges as equally informative. By embedding both geometric and temporal cues into edge representations, the model can more effectively learn risk-sensitive interaction patterns and capture the heterogeneous nature of potential collision scenarios.

3.2. Consideration of Sequential Information

In this study, two representation strategies are evaluated. The first strategy uses a single frame, in which graph features are extracted solely from the video frame at timestamp T, denoted as F^T. As illustrated in Figure 7, node features, the adjacency matrix, and edge features are all derived from this single frame, resulting in a purely spatial interaction graph at time T.

The second strategy explicitly incorporates temporal dynamics by leveraging a short sequence of four consecutive frames, including the previous three frames $F^{T-3}$, $F^{T-2}$, $F^{T-1}$, and the current frame $F^{T}$. Graph nodes are defined based on the road users detected in the current frame, ensuring a consistent node set for prediction at timestamp T. For each node, the corresponding locations and velocities at timestamps T-3, T-2, T-1, and T are concatenated to form an input sequence to a node-wise Long Short-Term Memory (LSTM) network, which is applied to each node separately. Each time step in the sequence represents the road user’s state at a specific timestamp and includes location and velocity as features.

The LSTM produces an output sequence with the same dimensionality as the input (dimension = 6), effectively encoding the spatiotemporal evolution of each road user. The resulting LSTM embeddings are then concatenated with the object category to construct the final node features of the graph, as shown in Figure 8. The adjacency matrix and edge features are still constructed from the current frame F^T. Through this integration of LSTM-based temporal encoding, sequential motion information is embedded into the graph representation, enabling more expressive modeling of dynamic interactions among road users.
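The assembly of per-node input sequences can be sketched as follows. The `frames` layout, the function name `build_node_sequences`, and the four-value state `(x, y, vx, vy)` are hypothetical and used only for illustration; the exact six-dimensional feature layout is not reproduced here. The text does not describe how road users missing from earlier frames are handled, so the back-filling rule below is an assumption.

```python
def build_node_sequences(frames, T, window=4):
    """Collect each current-frame road user's states at T-window+1, ..., T.

    frames: {frame_index: {object_id: (x, y, vx, vy)}}  (hypothetical layout)
    Returns {object_id: [state at T-3, ..., state at T]} for window=4.
    """
    sequences = {}
    for obj_id, current_state in frames[T].items():  # nodes come from frame T only
        seq = []
        last_known = current_state
        for k in range(T, T - window, -1):  # walk backward: T, T-1, ...
            # Assumption: if the user is absent from frame k, reuse the
            # state from the nearest later frame in which it was observed.
            state = frames.get(k, {}).get(obj_id, last_known)
            seq.append(state)
            last_known = state
        sequences[obj_id] = seq[::-1]  # chronological order
    return sequences

# Toy annotations (hypothetical): object 2 first appears in frame 9.
frames = {
    7: {1: (0.0, 0.0, 5.0, 0.0)},
    8: {1: (0.5, 0.0, 5.0, 0.0)},
    9: {1: (1.0, 0.0, 5.0, 0.0), 2: (10.0, 3.0, 0.0, -2.0)},
    10: {1: (1.5, 0.0, 5.0, 0.0), 2: (10.0, 2.8, 0.0, -2.0)},
}
seqs = build_node_sequences(frames, T=10)
```

Each resulting sequence has one entry per time step and could then be passed to a recurrent encoder such as `torch.nn.LSTM`, applied to every node separately.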

3.3. Graph Attention Modeling

Following graph construction, a Graph Attention Network (GAT) is employed to learn latent, risk-aware representations of road users and their interactions. The GAT leverages attention mechanisms to assign adaptive importance weights to neighboring nodes, enabling the model to selectively emphasize interactions that are more relevant to collision risk. As shown in Figure 4, the node-level embeddings produced by the GAT are aggregated into a graph-level representation, which is subsequently fed into a binary classification head to predict whether a collision would occur.

To enable effective discrimination between collision-prone and non-collision scenarios, a contrastive labeling strategy is adopted in which safe and dangerous graphs are constructed from the same video sequence but at different temporal offsets relative to the collision event. This design minimizes scene-specific confounding factors (e.g., road geometry and lighting conditions) and encourages the model to focus on interaction dynamics that evolve as a collision becomes imminent.

The collision prediction task is formulated as a binary classification problem, where the GAT and the classification head are trained end-to-end to estimate the probability of a future collision given a graph representation at time T. A key methodological consideration is the selection of the lead time to collision, which governs the assignment of graph-level labels. The lead time must balance two competing objectives: (1) enabling sufficiently early prediction to support preventive interventions, and (2) preserving discriminative spatiotemporal cues indicative of collision risk.

Consistent with prior findings that average driver reaction time can be as high as 1.5 s [47,48], a lead time of 1.5 s is adopted in this study. Accordingly, during model training, a video frame occurring 1.5 s prior to the collision, and its corresponding interaction graph, is labeled as dangerous. A temporally earlier frame, captured 4 s before the dangerous frame, is labeled as safe. While shorter lead times typically yield higher classification accuracy due to the increased availability of collision-relevant features, they offer limited practical value for early warning. The adopted lead time therefore reflects a principled trade-off between predictive performance and operational relevance. Figure 9 illustrates the labeling procedure for safe and dangerous frames within a video sequence.
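At the dataset's frame rate of 10 frames per second, the labeling rule translates into simple frame-index arithmetic, sketched below. The function name `label_frames` and the choice to return `None` when a clip is too short to contain the safe frame are assumptions for illustration, not details given in the text.

```python
FPS = 10  # videos are recorded at 10 frames per second

def label_frames(collision_frame, lead_time_s=1.5, safe_gap_s=4.0, fps=FPS):
    """Return (dangerous_frame, safe_frame) indices for one collision clip.

    The dangerous frame lies lead_time_s before the collision; the safe
    frame lies safe_gap_s before the dangerous frame. Returns None if the
    clip starts too late to contain the safe frame (an assumed policy).
    """
    dangerous = collision_frame - round(lead_time_s * fps)
    safe = dangerous - round(safe_gap_s * fps)
    if safe < 0:
        return None
    return dangerous, safe

# Hypothetical clip in which the collision occurs at frame 80.
pair = label_frames(80)
```

For a collision at frame 80, the dangerous frame is frame 65 (15 frames earlier) and the safe frame is frame 25 (a further 40 frames earlier). Changing `lead_time_s` to 0.5 reproduces the shorter lead time examined in the sensitivity analysis.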

4. Experiments

The models were implemented using PyTorch 1.10.2 and PyTorch Geometric 1.7.2, with supporting libraries such as scikit-learn 1.0.2 and pandas 1.3.5. All models were trained and tested on an NVIDIA RTX A6000 GPU (NVIDIA, Santa Clara, CA, USA). The experimental design systematically compares the two graph construction strategies, single-frame and multi-frame (four-frame) temporal encoding, under the same modeling framework consisting of a GAT followed by a binary classification head. For clarity, the node and edge features are summarized in Table 1, while the model architecture and training settings are presented in Table 2.

In addition, we investigate the sensitivity of the proposed approach to key hyperparameters, including the preset lead time to collision and the spatiotemporal thresholds used for graph construction, namely the spatial distance threshold $D_{\mathrm{threshold}}$ and the temporal threshold $t_{\mathrm{threshold}}$. Table 3 presents a summary of the model prediction accuracy under different experimental settings.

Sequential-frame graph construction outperforms the single-frame approach across nearly all parameter settings and lead times. The accuracy gains are substantial, ranging from 4 to 10 percentage points, demonstrating that incorporating short-term temporal dynamics via sequential frames provides more discriminative information than relying on instantaneous snapshots alone. This indicates that collision precursors are inherently temporal and benefit from motion-history encoding.

With respect to the lead time to collision, prediction accuracy consistently decreases across both graph construction strategies as the lead time increases from 0.5 s to 1.5 s. This trend is expected, as shorter lead times (0.5 s) capture stronger and more explicit collision cues, such as rapidly converging trajectories. In contrast, longer lead times (1.5 s) require the model to infer risk from weaker, more uncertain interaction signals. Despite this challenge, the sequential-frame GAT model maintains relatively high accuracy even at a lead time of 1.5 s (80.1%), highlighting its robustness for early collision warning.

Regarding spatial and temporal thresholds, configurations with $D_{\mathrm{threshold}} = 20$ m and $t_{\mathrm{threshold}} = 1.5$ s yield the highest prediction accuracies (86.1%, 82.9%, and 80.1%). These settings strike a balance between interaction coverage (i.e., capturing all potentially relevant interactions) and noise suppression (i.e., excluding edges between spatiotemporally distant road users unlikely to collide).

By comparison, overly large temporal thresholds are detrimental. For instance, increasing $t_{\mathrm{threshold}}$ to 3 s consistently degrades performance for both models, which suggests that interactions too far into the future dilute collision-relevant signals and introduce inference noise. This confirms the importance of temporal selectivity in graph construction. Conversely, overly small spatial and temporal thresholds risk excluding critical contextual information. The most restrictive setting ($D_{\mathrm{threshold}} = 10$ m and $t_{\mathrm{threshold}} = 0.5$ s) results in unstable or degraded performance, particularly for longer lead times. Although such tight thresholds retain only imminent interactions, they may exclude early yet informative precursors that are essential for advance prediction.

Overall, the sequential approach demonstrates greater robustness to parameter variation, especially under longer lead times and moderate-to-large thresholds. These results demonstrate that collision risk is best captured by temporally informed, selectively connected interaction graphs.

5. Conclusions

This study proposes a novel graph-based framework for traffic collision prediction that leverages GATs to perform binary classification of potential collision events. By explicitly modeling road users as nodes and their interactions as edges, the proposed approach effectively integrates spatial and temporal user-level information into a unified graph representation. An analytical process for transforming dynamic traffic scenes into interaction graphs is developed, enabling GATs to extract latent, risk-aware features for collision prediction. By assigning adaptive attention weights to interacting road users, the GAT model selectively emphasizes high-risk interactions while suppressing less relevant ones. This adaptive focus is essential for capturing transient and rapidly evolving interactions that are often overlooked by traditional rule-based or fixed-structure models. Experimental results demonstrate that the proposed method achieves a prediction accuracy of up to 86.1%, highlighting its effectiveness in capturing collision-relevant interaction patterns. This capability enables proactive collision warnings and provides critical support for early safety interventions, with the potential to significantly reduce traffic crashes.

While the proposed approach demonstrates promising performance, several limitations point to important directions for future research. First, the current dataset is relatively small and highly imbalanced, particularly for rare yet safety-critical collision types such as vehicle–pedestrian collisions and lane-change-related crashes. This limitation restricts the model to binary collision prediction and may affect its generalizability across diverse traffic scenarios. Future efforts should focus on expanding the dataset and incorporating real-world data and a wider range of collision types, which would enable multi-class collision prediction and enhance model robustness and generalization to diverse, real-world traffic conditions. The ability to predict specific collision types would further support targeted safety interventions, inform roadway and intersection design, and assist traffic management agencies in deploying context-aware control strategies in high-risk locations.

Second, the growing availability of ADAS data and vehicle-to-everything (V2X) communication technologies provides continuous streams of high-resolution, road user-level information. These data sources can be naturally integrated into the proposed graph-based framework to construct real-time, dynamic representations of traffic scenes and further improve collision prediction accuracy.

Third, the selection of spatiotemporal thresholds for graph construction currently relies on predefined values, which may not generalize optimally across varying traffic densities, road geometries, and behavioral contexts. Future research should explore adaptive or data-driven threshold selection strategies to dynamically balance interaction coverage and noise suppression, thereby improving model scalability and performance under diverse traffic conditions.

In summary, this study establishes a strong foundation for graph-based traffic collision prediction by demonstrating the effectiveness of GATs in modeling complex, dynamic road user interactions. The proposed framework shows significant promise for advancing intelligent transportation systems and proactive safety applications. Continued methodological refinements, larger and more diverse datasets, and deeper integration with emerging sensing and communication technologies can play a pivotal role in enhancing traffic safety, improving mobility, and reducing crash-related injuries and fatalities for all road users.

Author Contributions

Conceptualization, J.J.Y.; methodology, S.M. and J.J.Y.; software, S.M.; validation, S.M. and J.J.Y.; formal analysis, S.M.; investigation, S.M. and J.J.Y.; resources, J.J.Y.; data curation, S.M.; writing—original draft preparation, S.M.; writing—review and editing, J.J.Y.; visualization, S.M.; supervision, J.J.Y.; project administration, J.J.Y.; funding acquisition, J.J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data used in the study are openly accessible at https://deepaccident.github.io/download.html (accessed on 4 April 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ADAS	Advanced Driver-Assistance Systems
CAV	Connected and Autonomous Vehicles
GNN	Graph Neural Networks
GAT	Graph Attention Networks
LSTM	Long Short-Term Memory
CNN	Convolutional Neural Networks
V2X	Vehicle-to-Everything
CARLA	Car Learning to Act

References

Al-Smadi, A.M.; Al-Ksasbeh, W.; Ababneh, M.; Al-Nsairat, M. Intelligent automobile collision avoidance and safety system. In Proceedings of the 2020 17th International Multi-Conference on Systems, Signals & Devices (SSD), Sfax, Tunisia, 20–23 July 2020; IEEE: New York, NY, USA, 2020. [Google Scholar]
Tan, H.; Zhao, F.; Hao, H.; Liu, Z.; Amer, A.A.; Babiker, H. Automatic emergency braking (AEB) system impact on fatality and injury reduction in China. Int. J. Environ. Res. Public Health 2020, 17, 917. [Google Scholar] [CrossRef] [PubMed]
Chavan, D.; Saad, D.; Chakraborty, D.B. COLLIDE-PRED: Prediction of on-road collision from surveillance videos. arXiv 2021, arXiv:2101.08463. [Google Scholar]
Xiong, X.; Chen, L.; Liang, J. A new framework of vehicle collision prediction by combining SVM and HMM. IEEE Trans. Intell. Transp. Syst. 2017, 19, 699–710. [Google Scholar] [CrossRef]
Wang, X.; Liu, J.; Qiu, T.; Mu, C.; Chen, C.; Zhou, P. A real-time collision prediction mechanism with deep learning for intelligent transportation system. IEEE Trans. Veh. Technol. 2020, 69, 9497–9508. [Google Scholar] [CrossRef]
Lin, D.-J.; Chen, M.-Y.; Chiang, H.-S.; Sharma, P.K. Intelligent traffic accident prediction model for Internet of Vehicles with deep learning approach. IEEE Trans. Intell. Transp. Syst. 2021, 23, 2340–2349. [Google Scholar] [CrossRef]
Kamenev, A.; Wang, L.; Bohan, O.B.; Kulkarni, I.; Kartal, B.; Molchanov, A.; Birchfield, S.; Nister, D.; Smolyanskiy, N. Predictionnet: Real-time joint probabilistic traffic prediction for planning, control, and simulation. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; IEEE: New York, NY, USA, 2022. [Google Scholar]
He, S.; Sadeghi, M.A.; Chawla, S.; Alizadeh, M.; Balakrishnan, H.; Madden, S. Inferring high-resolution traffic accident risk maps based on satellite imagery and GPS trajectories. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021. [Google Scholar]
Liu, G.; Bei, S.; Li, B.; Liu, T.; Daoud, W.; Tang, H.; Guo, J.; Zhu, Z. Research on collision avoidance systems for intelligent vehicles considering driver collision avoidance behaviour. World Electr. Veh. J. 2023, 14, 150. [Google Scholar] [CrossRef]
Wang, A.-P.; Chen, J.-C.; Hsu, P.-L. Intelligent CAN-based automotive collision avoidance warning system. In Proceedings of the IEEE International Conference on Networking, Sensing and Control, Taipei, Taiwan, 21–23 March 2004; IEEE: New York, NY, USA, 2004. [Google Scholar]
Oyoo, J.O.; Wekesa, J.S.; Ogada, K.O. Predicting Road Traffic Collisions Using a Two-Layer Ensemble Machine Learning Algorithm. Appl. Syst. Innov. 2024, 7, 25. [Google Scholar] [CrossRef]
Lin, Y.; Li, R. Real-time traffic accidents post-impact prediction: Based on crowdsourcing data. Accid. Anal. Prev. 2020, 145, 105696. [Google Scholar]
Hazaymeh, K.; Almagbile, A.; Alomari, A.H. Spatiotemporal analysis of traffic accidents hotspots based on geospatial techniques. ISPRS Int. J. Geo-Inf. 2022, 11, 260. [Google Scholar]
Zhang, Y.; Lu, H.; Qu, W. Geographical detection of traffic accidents spatial stratified heterogeneity and influence factors. Int. J. Environ. Res. Public Health 2020, 17, 572. [Google Scholar]
Berhanu, Y.; Alemayehu, E.; Schröder, D. Examining Car Accident Prediction Techniques and Road Traffic Congestion: A Comparative Analysis of Road Safety and Prevention of World Challenges in Low-Income and High-Income Countries. J. Adv. Transp. 2023, 2023, 6643412. [Google Scholar] [CrossRef]
Chavhan, S.; Venkataram, P. Prediction based traffic management in a metropolitan area. J. Traffic Transp. Eng. 2020, 7, 447–466. (In English) [Google Scholar]
Yu, R.; Liu, X. Study on traffic accidents prediction model based on RBF neural network. In Proceedings of the 2010 2nd International Conference on Information Engineering and Computer Science, Hangzhou, China, 25–26 December 2010; IEEE: New York, NY, USA, 2010. [Google Scholar]
Ryder, B.; Dahlinger, A.; Gahr, B.; Zundritsch, P.; Wortmann, F.; Fleisch, E. Spatial prediction of traffic accidents with critical driving events–Insights from a nationwide field study. Transp. Res. Part A Policy Pract. 2019, 124, 611–626. [Google Scholar] [CrossRef]
Budiawan, W.; Sriyanto; Saptadi, S.; Arvianto, A.; Pamuji, H.; Andarani, P. Design of traffic accident prediction model in toll road using a decision tree algorithm. Int. J. Appl. Sci. Eng. Rev. 2022, 3, 11–31. [Google Scholar] [CrossRef]
Marcillo, P.; Valdivieso Caraguay, Á.L.; Hernández-Álvarez, M. A systematic literature review of learning-based traffic accident prediction models based on heterogeneous sources. Appl. Sci. 2022, 12, 4529. [Google Scholar] [CrossRef]
Yang, G.; Zhang, Y.; Hang, J.; Feng, X.; Xie, Z.; Zhang, D.; Yang, Y. CARPG: Cross-City Knowledge Transfer for Traffic Accident Prediction via Attentive Region-Level Parameter Generation. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023. [Google Scholar]
Gao, X.; Jiang, X.; Haworth, J.; Zhuang, D.; Wang, S.; Chen, H.; Law, S. Uncertainty-Aware Probabilistic Graph Neural Networks for Road-Level Traffic Accident Prediction. arXiv 2024, arXiv:2309.05072v2. [Google Scholar]
Wang, S.; Changshun, Y.; Yong, S. A review of road traffic accident prediction methods. Am. J. Manag. Sci. Eng. 2023, 8, 73–77. [Google Scholar] [CrossRef]
Ren, H.; Song, Y.; Wang, J.; Hu, Y.; Lei, J.H. A deep learning approach to the citywide traffic accident risk prediction. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; IEEE: New York, NY, USA, 2018. [Google Scholar]
Zhang, Z.; Yang, W.; Wushour, S. Traffic Accident Prediction Based on LSTM-GBRT Model. J. Control Sci. Eng. 2020, 2020, 4206919. [Google Scholar] [CrossRef]
Zhang, S. Urban traffic accident prediction research based on meteorological data. In Proceedings of the 2022 International Conference on Machine Learning and Knowledge Engineering (MLKE), Guilin, China, 25–27 February 2022; IEEE: New York, NY, USA, 2022. [Google Scholar]
Thaduri, A.; Polepally, V.; Vodithala, S. Traffic accident prediction based on CNN model. In Proceedings of the 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 6–8 May 2021; IEEE: New York, NY, USA, 2021. [Google Scholar]
Kim, N.; Ko, S.; Kim, M.; Lee, S. Deep Learning Model for Traffic Accident Prediction Using Multiple Feature Interactions. In Proceedings of the 2024 IEEE International Conference on Big Data and Smart Computing (BigComp), Bangkok, Thailand, 18–21 February 2024; IEEE: New York, NY, USA, 2024. [Google Scholar]
Jin, Z.; Noh, B. From prediction to prevention: Leveraging deep learning in traffic accident prediction systems. Electronics 2023, 12, 4335. [Google Scholar] [CrossRef]
Manzoor, M.; Umer, M.; Sadiq, S.; Ishaq, A.; Ullah, S.; Madni, H.A.; Bisogni, C. RFCNN: Traffic accident severity prediction based on decision level fusion of machine and deep learning model. IEEE Access 2021, 9, 128359–128371. [Google Scholar] [CrossRef]
Yuan, Z.; Zhou, X.; Yang, T. Hetero-convlstm: A deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018. [Google Scholar]
Aboulola, O.I. Improving traffic accident severity prediction using MobileNet transfer learning model and SHAP XAI technique. PLoS ONE 2024, 19, e0300640. [Google Scholar] [CrossRef] [PubMed]
Girija, M.; Divya, V. Deep Learning-Based Traffic Accident Prediction: An Investigative Study for Enhanced Road Safety. EAI Endorsed Trans. Internet Things 2024, 10, 1–8. [Google Scholar] [CrossRef]
Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
Wang, B.; Wang, J. ST-MGAT: Spatio-temporal multi-head graph attention network for Traffic prediction. Phys. A Stat. Mech. Its Appl. 2022, 603, 127762. [Google Scholar]
Jin, C.; Ruan, T.; Wu, D.; Xu, L.; Dong, T.; Chen, T.; Wang, S.; Du, Y.; Wu, M. HetGAT: A heterogeneous graph attention network for freeway traffic speed prediction. J. Ambient. Intell. Humaniz. Comput. 2021, 1–12. [Google Scholar] [CrossRef]
Wang, Y.; Jing, C.; Xu, S.; Guo, T. Attention based spatiotemporal graph attention networks for traffic flow forecasting. Inf. Sci. 2022, 607, 869–883. [Google Scholar] [CrossRef]
Chen, Y.; Shu, T.; Zhou, X.; Zheng, X.; Kawai, A.; Fueda, K.; Yan, Z.; Liang, W.; Wang, K.I.-K. Graph attention network with spatial-temporal clustering for traffic flow forecasting in intelligent transportation system. IEEE Trans. Intell. Transp. Syst. 2022, 24, 8727–8737. [Google Scholar] [CrossRef]
Wang, T.; Ni, S.; Qin, T.; Cao, D. TransGAT: A dynamic graph attention residual networks for traffic flow forecasting. Sustain. Comput. Inform. Syst. 2022, 36, 100779. [Google Scholar]
Tian, K.; Guo, J.; Ye, K.; Xu, C.-Z. St-mgat: Spatial-temporal multi-head graph attention networks for traffic forecasting. In Proceedings of the 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore, MD, USA, 9–11 November 2020; IEEE: New York, NY, USA, 2020. [Google Scholar]
He, H.; Ye, K.; Xu, C.-Z. Multi-feature urban traffic prediction based on unconstrained graph attention network. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; IEEE: New York, NY, USA, 2021. [Google Scholar]
Sun, B.; Zhao, D.; Shi, X.; He, Y. Modeling global spatial–temporal graph attention network for traffic prediction. IEEE Access 2021, 9, 8581–8594. [Google Scholar] [CrossRef]
Tang, J.; Zeng, J. Spatiotemporal gated graph attention network for urban traffic flow prediction based on license plate recognition data. Comput.-Aided Civ. Infrastruct. Eng. 2022, 37, 3–23. [Google Scholar] [CrossRef]
Wang, P.; Wu, X.; He, X. Vibration-Theoretic Approach to Vulnerability Analysis of Nonlinear Vehicle Platoons. IEEE Trans. Intell. Transp. Syst. 2023, 24, 11334–11344. [Google Scholar] [CrossRef]
Wang, T.; Kim, S.; Wenxuan, J.; Xie, E.; Ge, C.; Chen, J.; Li, Z.; Luo, P. Deepaccident: A motion and accident prediction benchmark for v2x autonomous driving. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024. [Google Scholar]
Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning (PMLR), Mountain View, California, USA, 13–15 November 2017; Volume 78, pp. 1–16. [Google Scholar]
Bhanote, V.S.; Pandey, A.K.; Iqbal, A.; Swathika, O.G. Smart Vehicle Monitoring and Accident Detection System. In Smart Grids as Cyber Physical Systems: Smart Grids Paving the Way to Smart Cities; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2024; Volume 2, pp. 163–184. [Google Scholar]
Prusty, R.; Ashar, H.; Iyer, V.; Vijayan, K. Augmented Reality Navigation: A Vision for Safer Roads and Seamless Driving Experience. In Proceedings of the 2024 International Conference on Recent Advances in Electrical, Electronics, Ubiquitous Communication, and Computational Intelligence (RAEEUCCI), Chennai, India, 17–18 April 2024. [Google Scholar]

Figure 1. Unsignalized four-way intersection simulated by CARLA (https://carla.org/).

Figure 2. Trajectory illustrations of six collision types.

Figure 3. Data distribution by collision type.

Figure 4. Predictive modeling of traffic collision.

Figure 6. Illustration of \vec{\Delta d} and \vec{\Delta v} between node N_{1} and node N_{2}, where \vec{\Delta d} = (\Delta d_{x}, \Delta d_{y}) = (x_{2} - x_{1}, y_{2} - y_{1}); \vec{\Delta v} = (\Delta v_{x}, \Delta v_{y}) = (v_{x_{1}} - v_{x_{2}}, v_{y_{1}} - v_{y_{2}}).
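As a small worked example of the Figure 6 definitions, the sketch below computes the relative displacement and relative velocity for two nodes, keeping the caption's sign conventions (displacement from N_{1} to N_{2}, velocity of N_{1} minus velocity of N_{2}). The function name, the dictionary layout, and the sample values are hypothetical.

```python
def relative_state(node1, node2):
    """Return (delta_d, delta_v) for two nodes given as dicts with x, y, vx, vy."""
    delta_d = (node2["x"] - node1["x"], node2["y"] - node1["y"])
    delta_v = (node1["vx"] - node2["vx"], node1["vy"] - node2["vy"])
    return delta_d, delta_v


# Hypothetical states: N1 heads in +x at 5 m/s, N2 approaches in -x at 3 m/s.
n1 = {"x": 0.0, "y": 0.0, "vx": 5.0, "vy": 0.0}
n2 = {"x": 10.0, "y": 2.0, "vx": -3.0, "vy": 0.0}

delta_d, delta_v = relative_state(n1, n2)
# Dot product of the two vectors; positive means the gap is shrinking.
closing = delta_d[0] * delta_v[0] + delta_d[1] * delta_v[1]
```

With these sign conventions, a positive dot product of the two vectors indicates that the distance between the two road users is decreasing, as is the case for the sample values here.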

Figure 7. Graph creation from a single frame.

Figure 8. Process of sequence graph creation.

Figure 9. Annotation of safe and dangerous frames.

Table 1. Summary of Node and Edge Features.

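Element	Feature	Type	Description	Normalization
Node	x	Numeric	Object bounding box center x coordinate	L₁
Node	y	Numeric	Object bounding box center y coordinate	L₁
Node	w	Numeric	Object bounding box width	L₁
Node	h	Numeric	Object bounding box height	L₁
Node	v_x	Numeric	Velocity in x coordinate	L₁
Node	v_y	Numeric	Velocity in y coordinate	L₁
Node	Object category 1	Categorical (one-hot encoding)	Car	None
Node	Object category 2	Categorical (one-hot encoding)	Van	None
Node	Object category 3	Categorical (one-hot encoding)	Truck	None
Node	Object category 4	Categorical (one-hot encoding)	Motorcycle	None
Node	Object category 5	Categorical (one-hot encoding)	Cyclist	None
Node	Object category 6	Categorical (one-hot encoding)	Pedestrian	None
Edge	D	Numeric	Spatial separation, computed from Equations (3) and (4)	L₁
Edge	t	Numeric	Temporal separation, computed from Equation (5)	L₁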
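The node layout in Table 1 can be sketched as a 12-dimensional vector: six numeric features followed by a six-way one-hot category. The snippet below is an illustrative assumption about the ordering, and it deliberately omits the L₁ normalization step because the table does not specify how it is applied; the function name and sample values are hypothetical.

```python
CATEGORIES = ["Car", "Van", "Truck", "Motorcycle", "Cyclist", "Pedestrian"]


def node_features(x, y, w, h, vx, vy, category):
    """Build an un-normalised node vector: 6 numeric values + 6 one-hot values."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown object category: {category}")
    one_hot = [1.0 if category == name else 0.0 for name in CATEGORIES]
    return [x, y, w, h, vx, vy] + one_hot


# Hypothetical cyclist observed in one frame.
vec = node_features(12.0, 4.5, 0.6, 1.8, 3.0, -0.5, "Cyclist")
```

The resulting length of 12 matches the GAT input dimension reported in Table 2.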

Table 2. Model Architecture and Training Settings.

Model Component	Architecture	Training Settings
LSTM	Bi-directional: No; Input dimension: 6 (the 6 numeric node features in Table 1); Hidden dimension: 6; Sequence length: 4; Number of layers: 4; Output dimension: 6	Loss function: binary cross-entropy (BCE); Optimizer: Adam; Initial learning rate: 0.0004; Early stopping: training stops when the validation loss no longer decreases, and the checkpoint with the best validation result is used for final evaluation on the test dataset
GAT	Input dimension: 12 (the 6 numeric and 6 categorical node features in Table 1); Hidden dimension: 8; Number of heads: 8; Activation: Exponential Linear Unit (ELU); Number of layers: 4; Output dimension: 32
Classifier	Input dimension: 32; Hidden dimension: 1000; Activation: Rectified Linear Unit (ReLU); Output dimension: 2
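The dimensions in Table 2 chain together in a way that can be checked mechanically. The sketch below records only the input and output widths of each stage and verifies that they are compatible, under the assumption (one reading of Table 2, not stated explicitly) that the 6-dimensional LSTM output is concatenated with the 6 one-hot category features to form the 12-dimensional GAT input. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageDims:
    name: str
    input_dim: int
    output_dim: int


LSTM = StageDims("LSTM", input_dim=6, output_dim=6)
GAT = StageDims("GAT", input_dim=12, output_dim=32)
CLASSIFIER = StageDims("Classifier", input_dim=32, output_dim=2)
NUM_CATEGORIES = 6  # one-hot object categories in Table 1


def check_pipeline(lstm, gat, classifier, num_categories):
    """Raise ValueError if adjacent stages disagree on feature width."""
    if lstm.output_dim + num_categories != gat.input_dim:
        raise ValueError("GAT input must equal LSTM output plus one-hot width")
    if gat.output_dim != classifier.input_dim:
        raise ValueError("classifier input must equal GAT output")
    return [(s.name, s.input_dim, s.output_dim) for s in (lstm, gat, classifier)]


pipeline = check_pipeline(LSTM, GAT, CLASSIFIER, NUM_CATEGORIES)
```

A check of this kind is inexpensive and catches mismatched widths before any model is instantiated.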

Table 3. Comparison of the model performance (accuracy in percent).

GAT Model	D_threshold (m)	t_threshold (s)	Accuracy at 0.5 s lead time	Accuracy at 1 s lead time	Accuracy at 1.5 s lead time
Single Frame	20	1.5	80.2%	79.2%	76.0%
Single Frame	20	3	78.1%	72.9%	69.8%
Single Frame	30	1.5	79.2%	77.1%	72.6%
Single Frame	10	0.5	75.0%	71.9%	67.0%
Sequential Frames	20	1.5	86.1%	82.9%	80.1%
Sequential Frames	20	3	84.3%	80.6%	76.7%
Sequential Frames	30	1.5	85.6%	82.9%	79.2%
Sequential Frames	10	0.5	84.3%	69.0%	70.8%

Note: underline indicates the best performance.
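The accuracy gain of the sequential model over the single-frame model can be read directly from Table 3. The short script below transcribes the table values and computes the difference in percentage points for each threshold setting and lead time; the variable names are illustrative.

```python
LEAD_TIMES_S = (0.5, 1.0, 1.5)

# Accuracy (%) from Table 3, keyed by (D_threshold in m, t_threshold in s).
ACCURACY = {
    "single": {
        (20, 1.5): (80.2, 79.2, 76.0),
        (20, 3.0): (78.1, 72.9, 69.8),
        (30, 1.5): (79.2, 77.1, 72.6),
        (10, 0.5): (75.0, 71.9, 67.0),
    },
    "sequential": {
        (20, 1.5): (86.1, 82.9, 80.1),
        (20, 3.0): (84.3, 80.6, 76.7),
        (30, 1.5): (85.6, 82.9, 79.2),
        (10, 0.5): (84.3, 69.0, 70.8),
    },
}

# Gain of sequential over single-frame, in percentage points per lead time.
gains = {
    setting: tuple(
        round(seq - single, 1)
        for seq, single in zip(ACCURACY["sequential"][setting], single_row)
    )
    for setting, single_row in ACCURACY["single"].items()
}

# Threshold setting with the highest mean accuracy for the sequential model.
best_setting = max(
    ACCURACY["sequential"],
    key=lambda s: sum(ACCURACY["sequential"][s]) / len(LEAD_TIMES_S),
)
```

For the 20 m / 1.5 s setting the gains are 5.9, 3.7, and 4.1 percentage points at lead times of 0.5, 1, and 1.5 s. The only negative entry is the 10 m / 0.5 s setting at a 1 s lead time (−2.9 points), consistent with the instability reported for the most restrictive thresholds.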
