Article

Collision Risk Assessment of Lane-Changing Vehicles Based on Spatio-Temporal Feature Fusion Trajectory Prediction

1 Shanghai Tongtao Technology Co., Ltd., Shanghai 201804, China
2 School of Automotive Studies, Tongji University, Shanghai 201804, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(17), 3388; https://doi.org/10.3390/electronics14173388
Submission received: 29 July 2025 / Revised: 19 August 2025 / Accepted: 22 August 2025 / Published: 26 August 2025
(This article belongs to the Special Issue Feature Papers in Electrical and Autonomous Vehicles, Volume 2)

Abstract

Accurate forecasting of potential collision risk in dense traffic is addressed by a framework grounded in multi-vehicle trajectory prediction. A spatio-temporal fusion architecture, STGAT-EDGRU, is proposed: a Transformer encoder learns temporal motion patterns from each vehicle’s history; a boundary-aware graph attention network (GAT) models inter-vehicle interactions; and a Gated Multimodal Unit (GMU) adaptively fuses the temporal and spatial streams. Future positions are parameterized as bivariate Gaussians and decoded by a two-layer GRU. Using probabilistic trajectory forecasts for the main vehicle and its surrounding vehicles, collision probability and collision intensity are computed at each prediction instant and combined through a weighted scheme into a Collision Risk Index (CRI) that characterizes risk over the entire horizon. On HighD, average RMSE reductions of 0.02 m, 0.12 m, and 0.26 m over a GAT-Transformer baseline are achieved at the 3 s, 4 s, and 5 s horizons, respectively. In high-risk lane-change scenarios, CRI issues warnings 0.4–0.6 s earlier than a traditional collision risk index and maintains a stable response across the high-risk interval. These findings demonstrate improved long-horizon accuracy together with earlier and more reliable risk perception, and indicate practical utility for lane-change assistance, where CRI can trigger early deceleration or abort decisions, and for risk-aware motion planning in intelligent driving.

1. Introduction

Driving risk assessment is crucial for traffic safety: accurately predicting and quantifying risk allows vehicles to take preventive measures that avoid collisions or reduce accident severity [1]. With the rapid development of intelligent driving and vehicle-infrastructure cooperation technology, the information available for risk assessment of connected autonomous vehicles (CAVs) has become increasingly rich. Modeling and analyzing how vehicle behavior and the traffic situation evolve from this information, and then predicting the probability of future collisions, has therefore become a key task in CAV risk assessment.
In the early stage, research mainly focused on the spatio-temporal relationship between vehicles at a single static moment. Li et al. [2] modeled rear-end collision risk with the inverse of Time to Collision (TTC) and mapped the speed difference and acceleration difference between vehicles to risk levels. Some studies [3] introduced statistical modeling to capture temporal dynamics; for example, a failure-time survival model built on the TTC indicator can describe how risk evolves over time, extending the applicability of the traditional threshold method. As understanding of risk perception deepened, researchers began to incorporate driving intentions and behavioral rules into risk modeling, moving from pure “state judgment” to “response boundary modeling”. Liu et al. [4] used naturalistic driving data to empirically calibrate a responsibility-sensitive safety (RSS) model and optimize its response boundaries in real scenarios; Chen et al. [5] linearly mapped TTC to collision probability and built a staged risk model for overtaking scenarios. By introducing behavioral semantics and rule constraints, these methods give static risk estimation a preliminary ability to discriminate between behaviors, but they still struggle with complex and rapidly changing interactions.
To overcome these limitations, research has gradually shifted toward trajectory prediction-driven collision risk estimation, which anticipates potential interactions between vehicles by predicting future trajectory sequences and combines them with probabilistic models to estimate the risk level within the prediction window. Shangguan et al. [6] were among the first to combine LSTM-based trajectory prediction with collision risk estimation, introducing data-driven time-series modeling and using Monte Carlo simulation to extrapolate the temporal distribution of collisions. Xie et al. [7] further built a multimodal trajectory generation network that fuses speed difference, intent recognition, and environment awareness to broaden the dimensions of risk modeling, while Huang et al. [8] used a Gaussian Mixture Model (GMM) for trajectory prediction and combined it with a fuzzy inference system to achieve multi-class soft risk assessment, improving the model’s adaptability in uncertain scenarios. In recent years, the Transformer architecture has also been widely adopted for risk inference owing to its strong time-series modeling capability. Anik et al. [9] proposed the inTformer model, a Transformer network built on connected-vehicle data that directly predicts collision probability in intersection scenarios, completing the pipeline from “trajectory input” to “risk output”. Building on this model, J. Chen et al. [10] proposed RI-DiT (Risk-Informed Diffusion Transformer), which introduces explicit risk features such as TTC into long-tailed trajectory prediction to identify high-risk behaviors early, reflecting the trend toward jointly modeling trajectory prediction and risk features. Chai et al. [11] constructed GACNet (Graph Attention Cooperative Network), which uses a GAN (Generative Adversarial Network) together with GAT to generate highly interactive trajectories, and designed a conflict analysis module to identify potentially risky scenarios, strengthening the model’s ability to recognize high-risk interactions. Meng et al. [12] proposed a trajectory prediction model that combines LSTM (Long Short-Term Memory), CSP (Common Spatial Pattern), and GAT, and calculated continuous risk scores from TTC and the Minimum Distance Boundary (MDM) to dynamically model risk evolution in complex behaviors such as lane changing.
In summary, vehicle collision risk assessment has evolved from static state-judgment models, through behavior-rule-enhanced models, to dynamic probabilistic inference models based on trajectory prediction. The current trend is a shift from passive risk judgment at a single point in time toward active, forward-looking risk prediction frameworks that consider future time horizons and multi-agent interactions, in order to cope with increasingly complex and dynamic traffic environments.
However, existing research has focused primarily on static representations of vehicle behavior and risk, and often does not integrate trajectory prediction with risk assessment. Although trajectory prediction has made significant progress, it is usually decoupled from risk assessment, which limits the ability to provide dynamic, continuous risk evaluation over time. This gap, together with insufficient modeling of complex multi-vehicle interactions in real-time scenarios, motivates the collision risk assessment framework proposed here. Table 1 compares existing driving risk assessment models and summarizes their strengths and limitations in trajectory prediction and collision risk assessment. As the table shows, several models achieve good prediction accuracy for single-vehicle trajectories but do not adequately model interactions among multiple vehicles or effectively combine trajectory prediction with risk assessment. To address these problems, this paper proposes a collision risk assessment framework based on multi-vehicle trajectory prediction. The framework uses the STGAT-EDGRU model to predict multi-vehicle trajectories and outputs Gaussian distributions of the future positions of the main vehicle and its surrounding vehicles. From these predictions it calculates the collision probability and collision intensity between the main vehicle and each surrounding vehicle at every prediction instant, and a weighted fusion strategy combines them into a collision risk index (CRI), achieving deep coupling between the trajectory prediction and risk assessment processes. The contributions of this paper are as follows:
  • The STGAT-EDGRU trajectory prediction model is constructed. It uses a Transformer to extract temporal features of vehicle motion and an improved GAT to extract spatial interaction features, fuses the temporal and spatial features through a gated multimodal unit (GMU), and adopts a two-layer GRU decoder to generate future positions as 2D Gaussian distributions. Comparison experiments on the HighD dataset demonstrate the method’s significant advantage in long-horizon prediction accuracy;
  • Based on the predicted trajectories of the main vehicle and its surrounding vehicles, a collision risk index (CRI) is designed that combines collision probability and collision intensity to weigh both the likelihood and the severity of risk over the prediction horizon. In a typical high-risk lane-changing scenario, the CRI warns of potential risk about 0.4 s earlier than a traditional collision risk index and maintains stable risk tracking throughout the critical interval;
  • A collision risk assessment framework based on multi-vehicle trajectory prediction is proposed, which captures the uncertainty of multi-vehicle interaction and risk evolution in complex road scenarios. Compared with existing driving risk assessment models, the framework achieves deep coupling of trajectory prediction and collision risk assessment, providing more forward-looking risk warnings for autonomous driving systems.
This paper is organized as follows:
Section 2 reviews the related literature. Section 3 proposes STGAT-EDGRU, a vehicle trajectory prediction model that integrates a Transformer with an improved GAT, and describes the structure and function of each module, including temporal modeling, spatial interaction extraction, and the spatio-temporal feature fusion mechanism. Section 4 builds a collision risk assessment model on the predicted trajectories, combining collision probability and collision intensity into the CRI to measure potential collision risk over the prediction horizon, and validates it on a high-risk lane-changing example. Section 5 presents comparison experiments, ablation experiments, and visualization analyses on the HighD dataset to verify the effectiveness of the proposed method in trajectory prediction accuracy and risk assessment. Section 6 summarizes the work and outlines future research directions.

2. Literature Review

2.1. Trajectory Prediction

Early trajectory prediction research primarily employed physics-based models, such as the constant velocity (CV), constant acceleration (CA), and constant turn rate and acceleration (CTRA) models [13], which extrapolate future trajectories from the current motion state. However, these methods generally assume that the current motion state persists, making them ill-suited to the uncertainties and complex interactions of real-world road conditions and limiting their accuracy and robustness. To overcome these limitations, researchers proposed maneuver-based trajectory prediction methods that guide trajectory generation by identifying the driver’s intent (e.g., going straight, turning, or changing lanes). Representative methods include the Gaussian mixture model (GMM) [14] and the hierarchical mixture of experts (HME) [15], some of which incorporate uncertainty modeling into trajectory generation. However, such methods typically assume that vehicles move independently of one another and therefore do not adequately model interactions between traffic participants. In recent years, deep learning methods have shown significant advantages in trajectory prediction. Among them, encoder–decoder models based on recurrent neural networks (RNNs) and their variants have been widely adopted because of their strong temporal modeling capability. Ji et al. [16] proposed an LSTM-based model for driving intention recognition and vehicle trajectory prediction; Yang et al. [17] introduced attention mechanisms to strengthen modeling at critical moments. However, most of these methods still predict only the main vehicle’s trajectory and do not model the motion states and spatial relationships of surrounding vehicles. To model spatial interactions between vehicles, Deo et al. [18] proposed a “social pooling” mechanism that projects the motion states of adjacent vehicles onto a unified spatial grid and uses pooling and convolution operations to extract interaction features. Reference [19] modeled vehicle interactions as a graph and used graph convolutional networks (GCN) to aggregate neighbor information. Reference [20] applies data denoising techniques, including wavelet filters and moving averages, to clean traffic flow data; such preprocessing improves data quality and supports more accurate trajectory prediction in dynamic traffic scenarios. Reference [21] employs a GAT to model multi-vehicle interactions, dynamically focusing on the neighboring vehicles that most strongly influence the main vehicle. Additionally, reference [22] analyzes lane-changing behavior in detail using a large volume of vehicle trajectory data, emphasizing the importance of interaction modeling in traffic flow prediction.
Although existing trajectory prediction methods model spatial interactions between vehicles to some extent through social pooling and graph neural networks, most are still limited to static interaction representations at specific time points and fail to capture how interaction relationships evolve over the historical trajectory. As a result, their performance is constrained in highly dynamic and complex interaction scenarios such as game-like lane-change maneuvers. In addition, current mainstream models generally predict only the main vehicle’s trajectory and cannot jointly model the trajectories of surrounding vehicles, which makes it difficult to supply the surrounding-vehicle motion information that interaction-aware collision risk assessment requires.

2.2. Risk Assessment Methods

Existing risk assessment methods fall into two main categories, deterministic and probabilistic [23], distinguished by whether they account for the uncertainty of future vehicle motion. Deterministic methods are based on traditional physics-driven models [24,25,26,27,28] and use current state variables, such as the distance, speed, and acceleration between the ego vehicle and the preceding vehicle, to calculate metrics such as time to collision (TTC) [29], time to brake (TTB) [30], post-encroachment time (PET) [31], deceleration rate to avoid a crash (DRAC) [32], and time headway (THW) [33]. These methods are simple to model and computationally efficient, so they are widely adopted in engineering practice. However, these metrics essentially assume that the vehicle state remains unchanged (e.g., constant speed or constant acceleration) and therefore fail to capture the dynamic and uncertain nature of driving behavior during interactions, which makes it difficult to provide accurate decision criteria for vehicle safety [34]. For example, if a driver or automated driving system deviates from the expected inputs, the collision prediction may overlook potential risks; likewise, if the perception system’s state estimate deviates significantly in certain edge cases, the risk may be misestimated.
To address these limitations, researchers have recently proposed probabilistic risk assessment methods that incorporate uncertainty to enhance the model’s expressiveness [35]. These methods consider not only the vehicle’s current state but also uncertainties arising from potential disturbances in behavior, perception, and the environment. For instance, Noh et al. [31] proposed a Bayesian network-based risk probability model that incorporates positional uncertainty to measure safety margins more reasonably; Li et al. [32] constructed a scenario assessment module based on conditional random fields that integrates vehicle state and environmental factors into a unified model to predict potential risk levels; Wang et al. [36] introduced the “driving safety field” theory, establishing a driving risk field model inspired by physical field principles to describe the overall risk force acting on the vehicle; and Tian et al. [37] further integrated TTC into the risk field model and added vehicle geometry and heading angle information to improve the model’s adaptability. Although these methods significantly improve risk modeling, most remain at the level of static situational awareness at the current moment and cannot predict how risk will evolve over the future time horizon. Recent studies have therefore combined trajectory prediction with risk assessment: they predict the vehicle’s future trajectory, model its position uncertainty with a probability distribution, and then assess the collision probability over the entire prediction window. Compared with instantaneous risk indicators based on the current state, such methods yield more forward-looking and continuous risk assessments. For example, the RI-DiT model [10] takes predicted future trajectories as input and calculates collision risk features such as TTC at each time step from the predicted position and velocity distributions, achieving continuous collision probability assessment.
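To make the constant-state assumption behind the deterministic indicators concrete, here is a minimal sketch of three of them (TTC, THW, DRAC) for a single car-following pair, using common textbook definitions. The function names are hypothetical; `gap_m` is assumed to be the bumper-to-bumper spacing in meters and speeds are in m/s. Published variants (for example, front-to-front headway) differ in detail.

```python
import math

def time_to_collision(gap_m: float, v_follower: float, v_leader: float) -> float:
    """TTC = gap / closing speed; infinite when the follower is not closing in."""
    closing_speed = v_follower - v_leader
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed

def time_headway(gap_m: float, v_follower: float) -> float:
    """THW = gap / follower speed (one common convention)."""
    return math.inf if v_follower <= 0.0 else gap_m / v_follower

def drac(gap_m: float, v_follower: float, v_leader: float) -> float:
    """DRAC = closing_speed^2 / (2 * gap); zero when the follower is not closing in."""
    closing_speed = v_follower - v_leader
    if closing_speed <= 0.0:
        return 0.0
    return closing_speed ** 2 / (2.0 * gap_m)

# Example: 20 m gap, follower at 30 m/s, leader at 25 m/s
print(time_to_collision(20.0, 30.0, 25.0))  # 4.0 (seconds)
print(time_headway(20.0, 30.0))             # about 0.667 (seconds)
print(drac(20.0, 30.0, 25.0))               # 0.625 (m/s^2)
```

Each value is a snapshot of the current state: if either vehicle brakes or changes lanes a moment later, none of these numbers anticipates it, which is the limitation the probabilistic methods described above try to address.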
However, existing methods that combine trajectory prediction with risk assessment still treat the two processes separately, lack effective fusion modeling, and do not integrate collision probability with collision intensity, which limits the comprehensiveness of the resulting risk assessment. To address this issue, this paper proposes a collision risk assessment framework based on multi-vehicle trajectory prediction. It calculates the collision probability and collision intensity between the main vehicle and surrounding vehicles at each prediction instant from the multi-vehicle trajectory distributions produced by the trajectory prediction module, and designs a weighted fusion strategy to construct a collision risk index (CRI), achieving deep coupling between trajectory prediction and risk assessment and thereby enabling more precise risk assessment.

3. Vehicle Trajectory Prediction Model

3.1. Problem Description

The trajectory prediction problem is to infer a vehicle’s motion over a future time horizon from its observed data over a historical time horizon. To meet the practical needs of collision risk assessment, the model constructed in this paper predicts not only the motion of the main vehicle but also the trajectories of its key surrounding vehicles, supporting a more comprehensive dynamic risk assessment. Assuming that the target scenario contains $N$ vehicles and the observation time is $t_{\text{obs}}$, the trajectories $X$ of these $N$ vehicles over the historical horizon $t_h$ are given by Equation (1):
$$X = \{P_1, P_2, \ldots, P_n, \ldots, P_N\} \tag{1}$$
where $P_n$ is the historical trajectory of vehicle $n$, consisting of six features at each time step: the positions, velocities, and accelerations in the $x$ and $y$ directions, as given by Equation (2):
$$P_n = \left\{\left(x_n^{t_{\text{obs}}-t_h+1}, y_n^{t_{\text{obs}}-t_h+1}, v_{x,n}^{t_{\text{obs}}-t_h+1}, v_{y,n}^{t_{\text{obs}}-t_h+1}, a_{x,n}^{t_{\text{obs}}-t_h+1}, a_{y,n}^{t_{\text{obs}}-t_h+1}\right), \ldots, \left(x_n^{t_{\text{obs}}}, y_n^{t_{\text{obs}}}, v_{x,n}^{t_{\text{obs}}}, v_{y,n}^{t_{\text{obs}}}, a_{x,n}^{t_{\text{obs}}}, a_{y,n}^{t_{\text{obs}}}\right)\right\} \tag{2}$$
The trajectories $Y$ of the $N$ vehicles over the prediction horizon $t_f$ are given by Equation (3):
$$Y = \{\hat{P}_1, \hat{P}_2, \ldots, \hat{P}_n, \ldots, \hat{P}_N\} \tag{3}$$
where $\hat{P}_n$ is the predicted trajectory of vehicle $n$, consisting of its $x$ and $y$ positions, as given by Equation (4):
$$\hat{P}_n = \left\{\left(\hat{x}_n^{t_{\text{obs}}+1}, \hat{y}_n^{t_{\text{obs}}+1}\right), \ldots, \left(\hat{x}_n^{t_{\text{obs}}+t}, \hat{y}_n^{t_{\text{obs}}+t}\right), \ldots, \left(\hat{x}_n^{t_{\text{obs}}+t_f}, \hat{y}_n^{t_{\text{obs}}+t_f}\right)\right\} \tag{4}$$
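Following references [38,39,40,41], the predicted position of a vehicle at each time step is assumed to follow a two-dimensional Gaussian distribution, as given by Equation (5):
$$\left(\hat{x}_n^t, \hat{y}_n^t\right) \sim \mathcal{N}\left(\hat{\mu}_n^t, \hat{\sigma}_n^t, \hat{\rho}_n^t\right) \tag{5}$$
where $\hat{\mu}_n^t$ is the mean (one component each for $x$ and $y$), $\hat{\sigma}_n^t$ is the standard deviation (again one component per axis), and $\hat{\rho}_n^t$ is the correlation coefficient between the $x$ and $y$ coordinates.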
Therefore, the trajectory prediction problem in this paper is defined as follows: given the historical trajectories $X$ of the main vehicle and all of its neighboring vehicles over the horizon $t_h$ in the target scenario, predict their trajectories $Y$ over the prediction horizon $t_f$, representing each vehicle’s position at every time step of the prediction horizon by a two-dimensional Gaussian distribution.
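A bivariate Gaussian output such as Equation (5) is commonly trained by minimizing the negative log-likelihood (NLL) of the ground-truth positions. This section does not state the training loss, so the sketch below is an assumption rather than the authors’ method; the function names are hypothetical, and the mean and standard deviation are stored as `(x, y)` pairs with a scalar correlation per time step.

```python
import numpy as np

def bivariate_gaussian_nll(target, mu, sigma, rho, eps=1e-6):
    """Negative log-likelihood of 2D points under per-step bivariate Gaussians.

    target, mu, sigma: arrays of shape (..., 2) holding (x, y) components.
    rho: array of shape (...,) with the x-y correlation coefficient in (-1, 1).
    """
    dx = (target[..., 0] - mu[..., 0]) / sigma[..., 0]
    dy = (target[..., 1] - mu[..., 1]) / sigma[..., 1]
    one_minus_rho2 = np.clip(1.0 - rho ** 2, eps, None)  # guard against |rho| -> 1
    # Quadratic (Mahalanobis) term of the bivariate normal in terms of sigma_x, sigma_y, rho
    z = dx ** 2 + dy ** 2 - 2.0 * rho * dx * dy
    log_norm = np.log(2.0 * np.pi * sigma[..., 0] * sigma[..., 1] * np.sqrt(one_minus_rho2))
    return log_norm + z / (2.0 * one_minus_rho2)

def covariance(sigma_xy, rho):
    """Build the 2x2 covariance matrix from (sigma_x, sigma_y) and rho."""
    sx, sy = sigma_xy
    return np.array([[sx ** 2, rho * sx * sy], [rho * sx * sy, sy ** 2]])

# One predicted time step for one vehicle (hypothetical values, meters)
mu = np.array([[10.0, 3.5]])
sigma = np.array([[1.0, 0.5]])
rho = np.array([0.3])
target = np.array([[10.4, 3.2]])
nll = bivariate_gaussian_nll(target, mu, sigma, rho)
print(nll)
```

The `covariance` helper shows how $\hat{\sigma}_n^t$ and $\hat{\rho}_n^t$ map onto a full 2x2 covariance, which is the form a downstream step such as a collision-probability computation would typically consume.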

3.2. Model Architecture

The proposed trajectory prediction model, which fuses spatio-temporal features through a graph attention network, adopts an encoder–decoder structure comprising four modules: a trajectory encoding module, a spatial interaction feature extraction module, a spatio-temporal feature fusion module, and a decoding prediction module. The model structure is shown in Figure 1.

3.2.1. Trajectory Feature Encoding Module

The trajectory feature encoding module aims to extract discriminative temporal features from historical trajectory sequences and model the dynamic evolution patterns of vehicle motion. This module is based on the Transformer architecture, which leverages its powerful global modeling capabilities and attention mechanisms to effectively capture long-range dependencies between time steps while ensuring parallel computing efficiency. The specific computational process is described as follows:
First, as shown in Equation (6), a linear layer maps the historical trajectory $X$ into a high-dimensional space, producing the embedding $\tilde{X}$, which is better suited to the subsequent matrix operations and serves as the representation of the vehicle’s historical trajectory for the prediction task:
$$\tilde{X} = \mathrm{Linear}(X) \tag{6}$$
In the Transformer’s attention mechanism, the trajectory embedding $\tilde{X}$ is multiplied by the learnable weight matrices $W_q$, $W_k$, and $W_v$ to obtain the query matrix $Q$, the key matrix $K$, and the value matrix $V$, as shown in Equation (7):
$$Q = \tilde{X} W_q, \quad K = \tilde{X} W_k, \quad V = \tilde{X} W_v \tag{7}$$
The similarity between trajectory features at different time steps is computed as the dot product of $Q$ and $K$, scaled by $\sqrt{d_k}$ and normalized with the softmax function to obtain the attention weights; the output of the attention mechanism is the corresponding weighted sum of $V$, as shown in Equation (8):
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V \tag{8}$$
where softmax is the normalized exponential function and $d_k$ is the dimension of the key vectors.
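Equations (6) to (8) can be traced end to end on a single vehicle’s history. The NumPy sketch below is illustrative only: the history length, embedding width, random weights, and the bias-free linear embedding are assumptions, not values from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Eq. (8): softmax(Q K^T / sqrt(d_k)) V for one sequence."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (T, T) similarity between time steps
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
T, n_feat, d_model = 30, 6, 32               # hypothetical: 30 history steps, 6 features
X = rng.normal(size=(T, n_feat))             # one vehicle's history (x, y, vx, vy, ax, ay)
W_emb = rng.normal(size=(n_feat, d_model)) * 0.1
X_emb = X @ W_emb                            # Eq. (6), bias omitted
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
out, attn = scaled_dot_product_attention(X_emb @ W_q, X_emb @ W_k, X_emb @ W_v)  # Eqs. (7)-(8)
print(out.shape, attn.shape)  # (30, 32) (30, 30)
```

Row $t$ of `attn` shows how strongly time step $t$ attends to every other step of the history, which is how the encoder captures long-range temporal dependencies in one parallel operation.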
To enhance the model’s expressive power, the Transformer adopts multi-head attention: $m$ sets of projection matrices map the trajectory embedding $\tilde{X}$ into $m$ subspaces, attention is computed separately in each subspace, and the head outputs are concatenated and linearly transformed to produce the multi-head attention output, as shown in Equations (9) and (10):
$$\mathrm{head}_i = \mathrm{Attention}\left(\tilde{X} W_i^{q}, \tilde{X} W_i^{k}, \tilde{X} W_i^{v}\right) \tag{9}$$
$$\mathrm{MultiHeadAttention}(\tilde{X}) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_m)\, W^{O} \tag{10}$$
where $\mathrm{head}_i$ is the self-attention output of the $i$-th head; $m$ is the number of attention heads; $W_i^{q}$, $W_i^{k}$, and $W_i^{v}$ are the learnable projection matrices of the $i$-th head; $W^{O}$ is a linear transformation matrix that maps the concatenated head outputs to the final output space; and $\mathrm{MultiHeadAttention}(\tilde{X})$ is the output of the multi-head attention mechanism.
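A minimal NumPy sketch of Equations (9) and (10), assuming a hypothetical weight layout in which the per-head projections are stacked into arrays of shape `(m, d_model, d_head)` with `d_head = d_model // m`; the sizes and random weights are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o):
    """Eqs. (9)-(10).

    X: (T, d_model); W_q, W_k, W_v: (m, d_model, d_head) per-head projections;
    W_o: (m * d_head, d_model) output projection W^O.
    """
    heads = []
    for Wq_i, Wk_i, Wv_i in zip(W_q, W_k, W_v):  # iterate over the m heads
        Q, K, V = X @ Wq_i, X @ Wk_i, X @ Wv_i
        weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
        heads.append(weights @ V)                  # head_i, shape (T, d_head)
    return np.concatenate(heads, axis=-1) @ W_o    # Concat(head_1..head_m) W^O

rng = np.random.default_rng(1)
T, d_model, m = 30, 32, 4                          # hypothetical sizes
d_head = d_model // m
X_emb = rng.normal(size=(T, d_model))
W_q, W_k, W_v = (rng.normal(size=(m, d_model, d_head)) * 0.1 for _ in range(3))
W_o = rng.normal(size=(m * d_head, d_model)) * 0.1
Y = multi_head_attention(X_emb, W_q, W_k, W_v, W_o)
print(Y.shape)  # (30, 32)
```

Splitting `d_model` evenly across heads keeps the cost close to that of a single full-width head while letting each head learn a different temporal pattern.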
In addition, to mitigate network degradation and unstable gradients, the Transformer applies two residual connections, each followed by layer normalization, which keep the data distribution stable across layers. This yields the encoding matrix $R$ of the vehicle trajectory features in the high-dimensional space, as shown in Equations (11) and (12):
$$H = \mathrm{LayerNorm}\left(\tilde{X} + \mathrm{MultiHeadAttention}(\tilde{X})\right) \tag{11}$$
$$R = \mathrm{LayerNorm}\left(H + \mathrm{FeedForward}(H)\right) \tag{12}$$
where $\tilde{X}$ is the trajectory embedding from Equation (6), $H$ is the intermediate output of the attention sublayer, $\mathrm{LayerNorm}$ is the layer normalization function, and $\mathrm{FeedForward}$ is a feedforward layer.
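Equations (11) and (12) describe a post-norm encoder layer, where normalization is applied after each residual sum. The sketch below is an assumption-laden illustration: LayerNorm omits its learnable gain and bias, the feedforward layer is a two-layer ReLU MLP, and a projection-free single-head attention stands in for the multi-head sublayer.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (learnable gain/bias omitted)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise two-layer MLP with ReLU (an assumed FeedForward form)."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def self_attention(x):
    """Stand-in single-head self-attention without projections (illustration only)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def encoder_layer(X_emb, attn_fn, ffn_params):
    H = layer_norm(X_emb + attn_fn(X_emb))               # Eq. (11)
    return layer_norm(H + feed_forward(H, *ffn_params))  # Eq. (12): encoding R

rng = np.random.default_rng(2)
T, d_model, d_ff = 30, 32, 64                            # hypothetical sizes
X_emb = rng.normal(size=(T, d_model))
ffn = (rng.normal(size=(d_model, d_ff)) * 0.1, np.zeros(d_ff),
       rng.normal(size=(d_ff, d_model)) * 0.1, np.zeros(d_model))
R = encoder_layer(X_emb, self_attention, ffn)
print(R.shape)  # (30, 32)
```

Because each sublayer’s input is added back before normalization, the encoding $R$ keeps a direct path to the embedding even if a sublayer contributes little.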

3.2.2. Improved GAT Interactive Feature Extraction Module

In complex road scenarios, vehicles interact spatially in significant ways, for example when driving side by side, following, or changing lanes; especially in dense or high-speed traffic, the spatial relationships between vehicles strongly affect their future trajectories. Accurately modeling these spatial interactions is therefore crucial for trajectory prediction accuracy. Among graph neural networks, Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT) are common choices; both incorporate interaction information by aggregating the features of the neighbors of each central node. GAT can handle dynamic graph structures, whereas spectral GCN depends on a fixed graph Laplacian and is therefore poorly suited to graphs whose structure changes over time. For scenarios in which vehicle states change frequently, GAT offers greater flexibility, and prior work has shown that it describes vehicle-to-vehicle interactions well.
Therefore, this paper proposes an improved Graph Attention Network (GAT) interaction feature extraction module to extract local interaction features among all vehicles at the current time. The module retains GAT’s core idea of weighted aggregation over adjacent nodes, but builds a sparse adjacency structure constrained by inter-vehicle distance and computes attention locally over the vehicle pairs that fall within that distance.
At each time step, suppose there are $N$ vehicles in the scene, and let the spatio-temporal fusion feature of vehicle $j$ be $f_j$ ($j = 1, 2, \ldots, N$). For each vehicle $j$, we define a local neighborhood $\mathcal{N}(j)$ as the set of vehicles that are spatially close at that moment. Concretely, a vehicle $k$ is included in $\mathcal{N}(j)$ if its center-to-center distance from $j$ in world coordinates is within $D_{\text{close}}$; vehicles beyond $D_{\text{close}}$ are excluded from attention. This distance-constrained neighborhood yields a sparse, stable interaction graph and, over the 3–5 s prediction horizon used on HighD, consistently captures the vehicles most relevant to lane-change behavior. The threshold $D_{\text{close}} = 50$ m is fixed in this study to maintain a consistent receptive field across scenes; exploring adaptive thresholds is left for future work.
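The neighborhood rule can be expressed as a boolean adjacency mask computed from pairwise center distances. This is a minimal sketch with a hypothetical function name; it treats “within $D_{\text{close}}$” as distance less than or equal to 50 m and excludes each vehicle from its own neighborhood, which the text does not specify.

```python
import numpy as np

D_CLOSE = 50.0  # meters, the fixed threshold stated in the text

def build_neighborhoods(positions, d_close=D_CLOSE):
    """Return an (N, N) boolean mask where mask[j, k] is True if k is in N(j).

    positions: (N, 2) vehicle center coordinates (x, y) in the world frame, meters.
    Self-loops are excluded (an assumption; the text does not say).
    """
    diff = positions[:, None, :] - positions[None, :, :]  # (N, N, 2) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)                  # center-to-center distances
    mask = dist <= d_close
    np.fill_diagonal(mask, False)
    return mask

# Hypothetical snapshot: four vehicles on a multi-lane highway (3.5 m lane offsets)
positions = np.array([[0.0, 0.0], [30.0, 3.5], [80.0, 0.0], [-45.0, -3.5]])
mask = build_neighborhoods(positions)
print([np.flatnonzero(row).tolist() for row in mask])  # [[1, 3], [0], [], [0]]
```

Vehicle 2 ends up with an empty neighborhood, so any attention implementation built on this mask has to decide what to return for isolated vehicles.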
To extract the spatial interaction between each vehicle $j$ and its neighboring vehicles, this paper adopts a Transformer-style local attention mechanism. First, the feature vector $f_j$ of each vehicle is linearly transformed into a query vector $q_j$, a key vector $k_j$, and a value vector $v_j$:
$$q_j = W_q f_j, \quad k_j = W_k f_j, \quad v_j = W_v f_j \tag{13}$$
where $W_q$, $W_k$, and $W_v$ are learnable projection matrices; queries represent the main vehicle, while keys and values represent its surrounding vehicles.
For each vehicle $j$, the attention score between it and a neighboring vehicle $i$ is the dot product of the query vector $q_j$ and the key vector $k_i$:
$$e_{ji} = q_j^{\top} k_i \tag{14}$$
The attention scores for all vehicle pairs are computed at once by matrix multiplication and then normalized with softmax over the neighborhood $\mathcal{N}(j)$ to obtain the attention weights $\alpha_{ji}$ between the main vehicle and its surrounding vehicles:
$$\alpha_{ji} = \frac{\exp(e_{ji})}{\sum_{k \in \mathcal{N}(j)} \exp(e_{jk})} \tag{15}$$
The value vectors of the neighboring vehicles are aggregated with these attention weights to obtain the spatial interaction representation of vehicle $j$:
$$r_j = \sum_{i \in \mathcal{N}(j)} \alpha_{ji} v_i \tag{16}$$
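Equations (13) to (16) amount to masked attention over the distance-constrained neighborhoods. The single-head NumPy sketch below is hypothetical in its names and sizes; it uses row vectors (so `f_j @ W_q` plays the role of $W_q f_j$), applies no $1/\sqrt{d}$ scaling because Equation (14) is written without one, and returns a zero vector for vehicles with no neighbors, which the text does not specify.

```python
import numpy as np

def local_attention(F, mask, W_q, W_k, W_v):
    """Single-head distance-masked attention, Eqs. (13)-(16).

    F: (N, d) per-vehicle features f_j; mask: (N, N) boolean neighborhoods N(j).
    W_q, W_k, W_v: (d, d_a) projections, right-multiplied (row-vector convention).
    Returns r: (N, d_a) and alpha: (N, N); isolated vehicles get r_j = 0 (assumption).
    """
    Q, K, V = F @ W_q, F @ W_k, F @ W_v            # Eq. (13)
    e = Q @ K.T                                    # Eq. (14): e[j, i] = q_j^T k_i
    e = np.where(mask, e, -np.inf)                 # drop pairs outside N(j)
    has_nb = mask.any(axis=1, keepdims=True)
    e_max = np.where(has_nb, e.max(axis=1, keepdims=True), 0.0)
    w = np.where(mask, np.exp(e - e_max), 0.0)     # stable exponentials, zero off-neighborhood
    denom = np.where(has_nb, w.sum(axis=1, keepdims=True), 1.0)
    alpha = w / denom                              # Eq. (15): softmax over N(j)
    return alpha @ V, alpha                        # Eq. (16): r_j = sum_i alpha_ji v_i

rng = np.random.default_rng(3)
N, d, d_a = 4, 16, 8                               # hypothetical sizes
F = rng.normal(size=(N, d))
mask = np.array([[0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]], dtype=bool)
W_q, W_k, W_v = (rng.normal(size=(d, d_a)) * 0.1 for _ in range(3))
r, alpha = local_attention(F, mask, W_q, W_k, W_v)
print(alpha.sum(axis=1))  # rows with neighbors sum to 1; vehicle 2 has none and sums to 0
```

Masking with $-\infty$ before the softmax is what turns the hard 50 m threshold into exact zeros in $\alpha_{ji}$, so vehicles outside $\mathcal{N}(j)$ contribute nothing to $r_j$.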
In this study, a multi-head attention mechanism is adopted rather than a single-head implementation. Multiple heads project the features into different subspaces and compute attention in parallel, allowing the model to capture diverse interaction patterns among vehicles. The outputs of the individual heads are concatenated and linearly transformed, which improves the expressiveness and robustness of the spatial interaction representation.
To enhance feature representation, a residual connection and layer normalization are introduced: the original feature $f_j$ is added to the aggregation result $r_j$, and the sum is normalized:
$$\tilde{r}_j = \mathrm{LayerNorm}\left(f_j + r_j\right) \tag{17}$$
Finally, a multilayer perceptron (MLP) extracts higher-level features to obtain the final spatial interaction feature $s_j$ of vehicle $j$:
$$s_j = \mathrm{MLP}\left(\tilde{r}_j\right) \tag{18}$$
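Equations (17) and (18) close the module with a residual sum, a normalization, and an MLP. In this sketch the MLP depth, hidden width, and ReLU activation are assumptions, LayerNorm omits its learnable gain and bias, and the residual sum requires $r_j$ to have the same width as $f_j$.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def mlp(x, W1, b1, W2, b2):
    """Two-layer perceptron with ReLU; depth and width are assumptions."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def spatial_output(F, R, mlp_params):
    """Eq. (17): residual + LayerNorm, then Eq. (18): MLP -> spatial feature per vehicle."""
    R_tilde = layer_norm(F + R)  # F and R must share the same feature width
    return mlp(R_tilde, *mlp_params)

rng = np.random.default_rng(5)
N, d, hidden = 4, 16, 32         # hypothetical sizes
F = rng.normal(size=(N, d))      # input features f_j
R = rng.normal(size=(N, d))      # aggregated interaction vectors r_j
params = (rng.normal(size=(d, hidden)) * 0.1, np.zeros(hidden),
          rng.normal(size=(hidden, d)) * 0.1, np.zeros(d))
S = spatial_output(F, R, params)
print(S.shape)  # (4, 16)
```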
It should be noted that this study applies no explicit continuous distance-decay function. Instead, a hard threshold of $D_{\text{close}} = 50$ m determines the neighbors, and within this range the relative influence of each neighbor is determined entirely by the softmax-normalized attention weights.

3.2.3. Spatio-Temporal Feature Fusion Module

The temporal dimension modeling module is responsible for extracting the evolutionary patterns of vehicle motion from historical trajectories. It uses a Transformer encoder to model vehicle time-series data, simulating their continuous motion processes and physical dynamic characteristics. The spatial interaction modeling module focuses on dynamically capturing the interaction relationships between vehicles and neighboring vehicles at each frame moment, reflecting their spatial collaborative behavior under the current traffic conditions.
Therefore, this paper introduces a GMU module after the temporal and spatial modeling modules to fuse spatio-temporal features [42], effectively coupling the two types of information. The module aligns and concatenates the temporal encoding of the main vehicle’s last frame with its current spatial interaction features, constructing a more predictive and discriminative joint representation as input to the subsequent trajectory decoder. Through this fusion mechanism, the model can perceive both the evolving intentions of the vehicle itself and the real-time influence of neighboring vehicles on its behavior, generating more coherent, plausible, and interaction-aware future trajectories.
To further investigate the role of the GMU, we analyzed its gating weights. The results show that temporal features are favored over spatial features in most scenarios, reflecting the fact that the main vehicle’s historical motion provides the strongest cues for trajectory continuation. By contrast, spatial features become more dominant in interaction-intensive situations such as lane changes or dense traffic. These results indicate that the GMU adaptively rebalances the two modalities, prioritizing temporal information during stable driving while amplifying spatial information when interactions are critical.
As shown in Figure 2, at each time step of trajectory prediction, the temporal feature vector s extracted from the vehicle's historical trajectory and the spatial interaction feature vector r extracted by the improved GAT module at the current time are fed into the gated spatio-temporal fusion module, which generates the fused joint representation f used as the input to the trajectory decoder.
First, the temporal features s and spatial features r are passed through a tanh activation to extract nonlinear representations. A sigmoid (S-shaped) activation is then applied to the concatenation of the two representations to obtain the fusion gate z, which controls the relative weight of temporal and spatial features during fusion. Because the two streams are weighted by z and 1 − z, the weights at each position always sum to 1. The computation is given in Equations (19)–(21):
$$ f_s = \tanh\left(W_s \cdot s\right) $$
$$ f_r = \tanh\left(W_r \cdot r\right) $$
$$ z = \sigma\left(W_z \cdot \left[f_s, f_r\right]\right) $$
where $W_s$, $W_r$, and $W_z$ are learnable parameter matrices; $[\cdot, \cdot]$ denotes concatenation; $z$ is the fusion gate; and $\sigma$ is the sigmoid activation function.
Then, the temporal and spatial features are fused using the fusion gate:
$$ f = z \odot f_s + \left(1 - z\right) \odot f_r $$
where $\odot$ denotes the Hadamard (element-wise) product. The gating vector $z = \sigma(\cdot)$ is produced by a sigmoid activation, which maps its inputs to the interval (0, 1) and thus provides a soft weighting mechanism for adaptively balancing the temporal features $f_s$ and the spatial features $f_r$. For each feature dimension, the weights $z$ and $1 - z$ are therefore non-negative and sum to one, so $f$ is a convex combination of the two streams.
The choice of sigmoid is motivated by its suitability for continuous gating: for two inputs, $\sigma(u - v)$ is mathematically equivalent to a two-input softmax, and its gradient is bounded (maximum 0.25), which promotes stable optimization. This property is particularly beneficial in spatio-temporal fusion, where the relative importance of temporal and spatial cues can vary significantly with the driving context (e.g., lane keeping vs. lane changing).
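The two-input equivalence and the 0.25 gradient bound follow directly from the definition of the sigmoid; the short derivation below is added for clarity:

```latex
\sigma(u - v) = \frac{1}{1 + e^{-(u - v)}} = \frac{e^{u}}{e^{u} + e^{v}} = \mathrm{softmax}(u, v)_1,
\qquad
1 - \sigma(u - v) = \frac{e^{v}}{e^{u} + e^{v}} = \mathrm{softmax}(u, v)_2,
\qquad
\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr) \le \tfrac{1}{4}, \ \text{with equality at } x = 0.
```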
A potential limitation of Sigmoid is its tendency to saturate near 0 or 1 when the pre-activation magnitude is large, which may hinder gradient flow. To mitigate this effect, we initialize the gate bias to zero, adopt moderate parameter scales, and standardize the inputs to keep pre-activation values within an effective range. Under these settings, the fusion gate exhibited stable behavior and consistent convergence in our experiments.
The specific algorithm flow in the spatio-temporal fusion module is shown in Algorithm 1.
Algorithm 1: Spatio-temporal fusion algorithm
Input: temporal feature s; spatial feature r
Output: fused spatio-temporal feature f
 1. f_s = tanh(W_s · s)
 2. f_r = tanh(W_r · r)
 3. z = sigmoid(W_z · concat(f_s, f_r))
 4. f = add(multiply(z, f_s), multiply(1 − z, f_r))
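A minimal NumPy sketch of the gated fusion in Equations (19)–(22), in which the gate z weights the temporal stream f_s and 1 − z weights the spatial stream f_r. The dimensions, the random weights, and the omission of bias terms are illustrative assumptions; the 1/sqrt(fan_in) weight scaling is one way to realize the "moderate parameter scales" mentioned in the text.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gmu_fuse(s, r, W_s, W_r, W_z):
    """Gated fusion of a temporal feature s and a spatial feature r.

    W_s: (d, d_s), W_r: (d, d_r), W_z: (d, 2d). Biases are omitted, which matches
    a zero-initialized gate bias at the start of training.
    Returns the fused feature f and the gate z, both of shape (d,).
    """
    f_s = np.tanh(W_s @ s)                          # nonlinear temporal representation
    f_r = np.tanh(W_r @ r)                          # nonlinear spatial representation
    z = sigmoid(W_z @ np.concatenate([f_s, f_r]))   # gate from the concatenation [f_s, f_r]
    f = z * f_s + (1.0 - z) * f_r                   # element-wise convex combination
    return f, z


rng = np.random.default_rng(1)
d = 128
s, r = rng.normal(size=d), rng.normal(size=d)
# Weights scaled by 1/sqrt(fan_in) to keep gate pre-activations moderate
f, z = gmu_fuse(s, r,
                rng.normal(size=(d, d)) / np.sqrt(d),
                rng.normal(size=(d, d)) / np.sqrt(d),
                rng.normal(size=(d, 2 * d)) / np.sqrt(2 * d))
print(f.shape, float(z.min()) > 0.0, float(z.max()) < 1.0)  # (128,) True True
```

With W_z set to zero the gate is exactly 0.5 everywhere and the fusion reduces to a plain average of the two streams, which is a convenient sanity check.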

3.2.4. Decoding and Prediction Module

In the decoding prediction module, a two-layer GRU network is employed, consisting of an E_GRU that extracts global representations of the spatio-temporal features and a D_GRU that performs autoregressive trajectory prediction. We adopt GRUs instead of a Transformer decoder because the prediction horizon in this study (3–5 s) calls for stable step-by-step autoregressive generation rather than long-range parallel modeling. GRUs efficiently capture sequential dependencies at a lower computational cost, providing a better trade-off between accuracy, efficiency, and robustness for this task.
First, the sequence of spatio-temporal feature vectors over the historical time domain is fed into E_GRU to obtain the hidden state vector $h_{t_{obs}}$ at the observation time, as shown in Equation (23).
$$ h_t = \mathrm{E\_GRU}\left(h_{t-1}, f_t; W_e\right), \quad t = t_{obs} - t_h,\; t_{obs} - t_h + 1,\; \ldots,\; t_{obs} $$
where $h_t$ is the hidden state vector at time $t$; $h_{t-1}$ is the hidden state vector at time $t-1$, with the initial hidden state set to the zero vector; $f_t$ is the spatio-temporal feature vector at time $t$; and $W_e$ is the weight matrix of the E_GRU layer.
Then, the vehicle position coordinates at the previous time step are used as the input to D_GRU to generate the vehicle position distribution at the predicted time step, as shown in Equation (24).
$$ h_t = \mathrm{D\_GRU}\left(h_{t-1}, p_{t-1}; W_d\right), \quad t = t_{obs}+1,\; t_{obs}+2,\; \ldots,\; t_{obs}+t_f $$
where $h_t$ is the hidden state vector at time $t$; $h_{t-1}$ is the hidden state vector at time $t-1$; and $p_{t-1}$ is the vehicle position coordinate at the previous time step. During training, $p_{t-1}$ is the actual vehicle position, whereas during prediction it is the mean of the predicted position distribution at the previous time step. $W_d$ is the weight matrix of the D_GRU layer.
Finally, Equation (25) gives the predicted vehicle position distribution at time $t$, $o_t = (\hat{\mu}_t, \hat{\sigma}_t, \hat{\rho}_t)$, where $\hat{\mu}_t$ is the mean of the predicted position, $\hat{\sigma}_t$ is the standard deviation of the position distribution, and $\hat{\rho}_t$ is the correlation coefficient.
$$ o_t = \mathrm{FC}\left(h_t; W_{fc}\right), \quad t = t_{obs}+1,\; t_{obs}+2,\; \ldots,\; t_{obs}+t_f $$
where FC is a fully connected layer and $W_{fc}$ is its weight matrix.
During decoding, the inputs at the first step are the hidden state vector $h_{t_{obs}}$ at the observation time and the true coordinates $\mu_{t_{obs}}$. The output at each step is then used as the input to the next D_GRU unit, and this process is repeated until the model has output the predicted trajectory for every time point in the prediction horizon.
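The encode-then-rollout procedure of Equations (23)–(25) can be sketched with a from-scratch NumPy GRU cell. Everything beyond the text is an assumption: the gate layout follows the common PyTorch GRU convention, the weights are random, and the standard deviations and correlation are made valid with exp and tanh, which the text does not specify. Passing `p_true` switches to teacher forcing (training); otherwise the predicted mean is fed back (inference).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell with random placeholder weights (PyTorch-style gates)."""

    def __init__(self, in_dim, hid_dim, rng):
        k = 1.0 / np.sqrt(hid_dim)
        self.W = rng.uniform(-k, k, size=(3 * hid_dim, in_dim))
        self.U = rng.uniform(-k, k, size=(3 * hid_dim, hid_dim))
        self.b = np.zeros(3 * hid_dim)
        self.hid = hid_dim

    def __call__(self, h, x):
        H = self.hid
        gx, gh = self.W @ x + self.b, self.U @ h
        z = sigmoid(gx[:H] + gh[:H])                    # update gate
        r = sigmoid(gx[H:2 * H] + gh[H:2 * H])          # reset gate
        n = np.tanh(gx[2 * H:] + r * gh[2 * H:])        # candidate state
        return (1.0 - z) * n + z * h


def encode_decode(feats, p_obs, t_f, enc, dec, W_fc, p_true=None):
    """feats: (T_h, d) fused features f_t; p_obs: (2,) position at t_obs.
    Returns (t_f, 5) rows of (mu_x, mu_y, sigma_x, sigma_y, rho)."""
    h = np.zeros(enc.hid)                  # zero initial hidden state
    for f_t in feats:                      # E_GRU over the history, Eq. (23)
        h = enc(h, f_t)
    outputs, p_prev = [], p_obs
    for k in range(t_f):                   # D_GRU rollout, Eq. (24)
        h = dec(h, p_prev)
        raw = W_fc @ h                     # FC head, Eq. (25)
        mu = raw[:2]
        sigma = np.exp(raw[2:4])           # positivity via exp (assumption)
        rho = np.tanh(raw[4])              # keeps rho in (-1, 1) (assumption)
        outputs.append(np.concatenate([mu, sigma, [rho]]))
        p_prev = p_true[k] if p_true is not None else mu  # teacher forcing vs. feedback
    return np.stack(outputs)


rng = np.random.default_rng(2)
H = d = 128
enc, dec = GRUCell(d, H, rng), GRUCell(2, H, rng)
W_fc = rng.normal(scale=0.01, size=(5, H))
feats = rng.normal(size=(15, d))           # 15 history frames at 5 Hz
out = encode_decode(feats, np.zeros(2), 25, enc, dec, W_fc)
print(out.shape)  # (25, 5)
```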

4. Collision Risk Assessment Model

4.1. Two-Vehicle Collision Risk Assessment

To comprehensively measure the collision risk between two vehicles at a given moment, this paper combines collision probability and collision intensity [43] into the two-vehicle collision risk $F_{i,j}(t)$, as shown in Equation (26).
$$ F_{i,j}(t) = P_{i,j}(t) \cdot E_{i,j}(t) $$
where $P_{i,j}(t)$ is the probability of a collision between the two vehicles at time $t$, and $E_{i,j}(t)$ is the collision intensity if the two vehicles collide at time $t$.
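The position distributions of vehicle $i$ and vehicle $j$ at time $t$ obtained from the trajectory prediction model are:
$$ p_i(t) \sim \mathcal{N}\left(o_i(t)\right) = \mathcal{N}\left(\hat{\mu}_i(t), \hat{\sigma}_i(t), \hat{\rho}_i(t)\right) $$
$$ p_j(t) \sim \mathcal{N}\left(o_j(t)\right) = \mathcal{N}\left(\hat{\mu}_j(t), \hat{\sigma}_j(t), \hat{\rho}_j(t)\right) $$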
Assuming that the predicted position distributions of the two vehicles are independent of each other, the relative position of the two vehicles also follows a two-dimensional Gaussian distribution, as shown in Equation (29):
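$$ p_{i,j}(t) \sim \mathcal{N}\left(\hat{\mu}_i(t) - \hat{\mu}_j(t),\; \hat{\Sigma}_i(t) + \hat{\Sigma}_j(t)\right) $$
where $\hat{\Sigma}_k = \begin{pmatrix} \hat{\sigma}_{k,x}^2 & \hat{\rho}_k \hat{\sigma}_{k,x} \hat{\sigma}_{k,y} \\ \hat{\rho}_k \hat{\sigma}_{k,x} \hat{\sigma}_{k,y} & \hat{\sigma}_{k,y}^2 \end{pmatrix}$ is the covariance matrix assembled from $\hat{\sigma}_k(t)$ and $\hat{\rho}_k(t)$ for $k \in \{i, j\}$ (time index omitted); for independent Gaussians, the means subtract and the covariances add.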
The collision condition is defined as the distance between the two vehicles being less than a set threshold, i.e., $\lVert p_{i,j}(t) \rVert < \delta$, where $\delta$ is determined by the vehicle dimensions. In this paper, it is defined as
$$ \delta = \frac{1}{2}\sqrt{\left(L_i + L_j\right)^2 + \left(W_i + W_j\right)^2} $$
where $L_i$, $W_i$, $L_j$, and $W_j$ are the lengths and widths of vehicles $i$ and $j$, respectively.
To calculate the collision probability, the Monte Carlo method is used: $M$ points are sampled at random from the two-dimensional Gaussian distribution $p_{i,j}(t)$ of the relative position of the two vehicles, the relative distance of each sample is computed, and the number of colliding samples, $M_{hit}$, is counted. The collision probability at time $t$ is then
$$ P_{i,j}(t) = \frac{M_{hit}}{M} $$
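A minimal NumPy sketch of this Monte Carlo estimate, under the stated independence assumption: the relative position is sampled from a Gaussian whose mean is the difference of the predicted means and whose covariance is the sum of the two predicted covariances. The default sample count M = 10,000 and the example vehicle dimensions (4.5 m by 1.8 m) are illustrative assumptions.

```python
import numpy as np


def to_cov(sigma_x, sigma_y, rho):
    """Covariance matrix of a bivariate Gaussian from (sigma_x, sigma_y, rho)."""
    c = rho * sigma_x * sigma_y
    return np.array([[sigma_x ** 2, c], [c, sigma_y ** 2]])


def collision_threshold(L_i, W_i, L_j, W_j):
    """Half the diagonal of the combined footprint of the two vehicles."""
    return 0.5 * np.hypot(L_i + L_j, W_i + W_j)


def collision_probability(o_i, o_j, delta, M=10_000, rng=None):
    """Monte Carlo estimate of P_ij(t) = M_hit / M.

    o_i, o_j: (mu_x, mu_y, sigma_x, sigma_y, rho) predicted for vehicles i and j.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(o_i[:2], float) - np.asarray(o_j[:2], float)
    cov = to_cov(*o_i[2:]) + to_cov(*o_j[2:])             # covariances add under independence
    samples = rng.multivariate_normal(mean, cov, size=M)  # M relative positions
    m_hit = np.count_nonzero(np.linalg.norm(samples, axis=1) < delta)
    return m_hit / M


delta = collision_threshold(4.5, 1.8, 4.5, 1.8)           # about 4.85 m for two passenger cars
p_far = collision_probability((0, 0, 0.5, 0.3, 0.0), (30, 0, 0.5, 0.3, 0.0), delta,
                              rng=np.random.default_rng(0))
p_near = collision_probability((0, 0, 1.0, 0.5, 0.1), (3, 0.5, 1.0, 0.5, 0.1), delta,
                               rng=np.random.default_rng(0))
```

The estimate has a standard error of roughly sqrt(P(1 − P)/M), so M trades accuracy against the per-window compute budget.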
Collision intensity is closely related to vehicle speed and is defined in Equation (32):
$$ E_{i,j}(t) = e^{\arctan\left(v_i - v_j\right)} $$
where $v_i$ and $v_j$ are the travel speeds of vehicles $i$ and $j$. Since the trajectory prediction model does not predict vehicle speed, speed is obtained from the rate of change of vehicle position.
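A short sketch of the speed estimate and the intensity term. Two assumptions are made: speeds are backward finite differences of the predicted mean positions at the 5 Hz sampling rate (0.2 s steps), with the last observed position used for the first step; and, because the operator between $v_i$ and $v_j$ is lost in the extracted Equation (32), the intensity is taken here as exp(arctan(v_i − v_j)).

```python
import numpy as np

DT = 0.2  # seconds between frames after downsampling to 5 Hz


def speeds_from_positions(p_obs, mu_pred, dt=DT):
    """Backward-difference speed |mu_t - mu_{t-1}| / dt for each predicted step.

    p_obs: (2,) last observed position; mu_pred: (T, 2) predicted means.
    """
    traj = np.vstack([p_obs, mu_pred])
    return np.linalg.norm(np.diff(traj, axis=0), axis=1) / dt


def collision_intensity(v_i, v_j):
    """Assumed form exp(arctan(v_i - v_j)); bounded in (e^{-pi/2}, e^{pi/2})."""
    return np.exp(np.arctan(np.asarray(v_i, float) - np.asarray(v_j, float)))


v = speeds_from_positions(np.zeros(2), np.array([[5.0, 0.0], [10.0, 0.0]]))
print(v)  # [25. 25.]
```

The arctan keeps the intensity bounded, so a large speed difference cannot dominate the product with collision probability.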

4.2. Multi-Vehicle Collision Risk Assessment

Vehicle collision risk depends on multiple neighboring vehicles. For vehicle $j$, the collision risk $F_j(t)$ aggregates the risks caused by each vehicle $i$ ($i \in N_j$, where $N_j$ is the set of vehicles connected to vehicle $j$ in the interaction graph). To avoid bias in the risk assessment caused by differences in the number of neighboring vehicles around different main vehicles, this paper introduces a weighted average strategy: each surrounding vehicle is assigned a weight based on its collision probability, which more reasonably measures its contribution to the main vehicle's collision risk. The formulas are as follows:
$$ w_{i,j}(t) = \frac{P_{i,j}(t)}{\sum_{k \in N_j} P_{k,j}(t)} $$
$$ F_j(t) = \sum_{i \in N_j} w_{i,j}(t)\, F_{i,j}(t) $$
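A minimal sketch of the probability-weighted aggregation at one time step, combining each neighbor's pairwise risk $F_{i,j} = P_{i,j} E_{i,j}$ with its normalized weight. Returning zero when no neighbor has any collision probability is an assumption added to avoid division by zero; the text does not address that case.

```python
import numpy as np


def main_vehicle_risk(P, E):
    """Probability-weighted risk of the main vehicle at one time step.

    P, E: sequences over neighbors i in N_j of P_ij(t) and E_ij(t).
    """
    P, E = np.asarray(P, float), np.asarray(E, float)
    total = P.sum()
    if total == 0.0:              # no neighbor has any collision probability (assumed guard)
        return 0.0
    w = P / total                 # weights sum to one across neighbors
    return float(np.sum(w * P * E))   # sum of w_ij * F_ij with F_ij = P_ij * E_ij


risk = main_vehicle_risk([0.2, 0.2], [1.0, 2.0])   # equal weights: 0.5*0.2*1 + 0.5*0.2*2
```

Because the weights are normalized, adding a neighbor with zero collision probability leaves the result unchanged, which is the intended invariance to neighbor count.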

4.3. Predictive Time-Domain Collision Risk Fusion

The collision risk assessment indicators can be used to calculate the collision risk of the main vehicle at each time point in the prediction horizon. In practice, however, the prediction results must support risk warnings and real-time decision-making for the main vehicle at the observation time. This paper therefore converts the collision risk time series over the prediction horizon into a single collision risk index (CRI) that jointly reflects the probability, urgency, and severity of potential collisions within that horizon.
First, the time-weighted average risk $F_{mean}$ over all time points in the prediction horizon is calculated, as shown in Equation (35):
F m e a n = t = 1 t h 1 α t h 2.5 + 1
where $F_j(t)$ is the risk value of vehicle $j$ at time $t$, and $w_t$ is the weight of the risk value at that time.
Then, the average risk $F_{mean}$ and the maximum risk $F_{max}$ over the prediction horizon are fused to obtain the CRI, using Equation (36):
$$ \mathrm{CRI} = \alpha_1 F_{mean} + \alpha_2 F_{max} $$
where $\alpha_1$ and $\alpha_2$ are adjustable weights that set the CRI's emphasis on average risk versus maximum risk.
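A minimal sketch of the CRI fusion. The exact time weighting of Equation (35) is not recoverable from this text, so the per-step weights `w_t` are a caller-supplied, hypothetical input and are normalized to form a weighted average; $F_{max}$ is taken as the largest per-step risk over the horizon. The values of α1 and α2 are not given in the text and must be chosen by the user.

```python
import numpy as np


def collision_risk_index(F, w_t, alpha1, alpha2):
    """CRI = alpha1 * F_mean + alpha2 * F_max over the prediction horizon.

    F: (T,) per-step risk F_j(t); w_t: (T,) non-negative time weights (hypothetical).
    """
    F, w_t = np.asarray(F, float), np.asarray(w_t, float)
    F_mean = float(np.sum(w_t * F) / np.sum(w_t))   # normalized weighted average (assumed)
    F_max = float(F.max())                           # peak risk within the horizon
    return alpha1 * F_mean + alpha2 * F_max


print(round(collision_risk_index([0.1, 0.4, 0.2], [1, 1, 1], 0.5, 0.5), 4))  # 0.3167
```

The maximum term keeps a short, sharp risk peak from being diluted by low-risk steps, while the mean term rewards persistence; this is why the index can stay high across an extended high-risk interval.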

5. Model Experiment and Analysis

5.1. Data Processing

The HighD dataset was selected for model training and testing. It consists of vehicle trajectories extracted from drone videos of German highways and includes multi-dimensional features such as time frame, coordinates, speed, acceleration, driving direction, and vehicle size, recorded at 25 Hz. This study considers three types of events: lane keeping (LK), left lane change (LCL), and right lane change (LCR).
Lane change events can be divided into the lane change intention generation phase (LCI) and the lane change execution phase (LCE). Recent HighD-based analysis reports lane-changing durations of 2.88–7.32 s and speeds of 18.83–43.30 m/s, obtained via a fusion criterion of frame-interval and lateral-displacement with visual calibration [22]. These empirical ranges support our choice of a 3–5 s prediction horizon for trajectory forecasting and risk evaluation.
By determining the starting frame $t_{start}$ of the LCE, lane change events can be extracted. Previous studies [44] have provided detailed definitions and selection methods for $t_{start}$, which are not repeated here. To avoid any overlap between observation and prediction, a half-open time convention is adopted: the historical time domain is $[t_{start} - 3\,\mathrm{s},\, t_{start})$ (excluding $t_{start}$), and the prediction time domain is $[t_{start},\, t_{start} + 5\,\mathrm{s}]$ (with $t_{start}$ belonging to the prediction domain). Based on these intervals, the trajectories of the lane-changing vehicle and its surrounding vehicles that meet the selection requirements are extracted from the dataset to construct lane-change event samples. In addition, vehicles that did not change lanes are selected from the dataset, and 8 s trajectories of the main vehicle and its surrounding vehicles traveling straight are extracted and divided into historical and prediction time domains to construct lane-keeping samples. In total, 2169 LCL samples and 2385 LCR samples are constructed. Because LK events far outnumber LCL and LCR events, 3000 LK samples are randomly selected to balance the classes, and the samples are divided into training, validation, and test sets in an 8:1:1 ratio.
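The half-open split around $t_{start}$ can be sketched as index arithmetic at the native 25 Hz rate. The function name and the use of a plain list as a stand-in for per-frame records are illustrative assumptions. The text writes the prediction window as closed; this sketch drops the frame at exactly $t_{start}$ + 5 s so that the window holds 125 frames, which downsample to the 25 prediction frames retained later. That end-point choice is an assumption.

```python
FPS = 25                 # HighD recording frequency (Hz)
HIST_FRAMES = 3 * FPS    # 3 s history
PRED_FRAMES = 5 * FPS    # 5 s prediction horizon


def split_event(frames, t_start):
    """Split a per-frame trajectory around the LCE start frame t_start.

    History covers [t_start - 3 s, t_start) and excludes t_start; the prediction
    window starts at t_start. Returns None if either window would leave the track.
    """
    h0 = t_start - HIST_FRAMES
    if h0 < 0 or t_start + PRED_FRAMES > len(frames):
        return None
    history = frames[h0:t_start]
    prediction = frames[t_start:t_start + PRED_FRAMES]
    return history, prediction


frames = list(range(300))            # stand-in for one vehicle's per-frame records
history, prediction = split_event(frames, 100)
print(len(history), history[-1], len(prediction), prediction[0])  # 75 99 125 100
```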
To reduce computational cost, frame sampling is used to lower the trajectory recording frequency in the samples to 5 Hz, retaining 15 frames in the historical time domain and 25 frames in the prediction time domain. In addition, a coordinate system is established with the position of the main vehicle in the first frame of each sample as the origin, the longitudinal direction of the main vehicle's motion as the positive x-axis, and the x-axis rotated 90° clockwise as the positive y-axis, as shown in Figure 3. This unified coordinate system removes the feature differences that arise in HighD's absolute coordinate system between samples from different lanes and driving directions, thereby improving model training and prediction accuracy.
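A minimal sketch of the downsampling and the sample-centered coordinate transform. It assumes positions are given in a y-up Cartesian frame and that the main vehicle's heading angle is known; HighD's own image-style axis conventions are not reproduced here and may require a sign flip on y. Keeping every fifth frame maps 25 Hz to 5 Hz, so 75 history frames become 15 and 125 prediction frames become 25.

```python
import numpy as np

STEP = 25 // 5   # keep every 5th frame: 25 Hz -> 5 Hz


def downsample(traj):
    """traj: (T, 2) positions at 25 Hz -> positions at 5 Hz."""
    return traj[::STEP]


def to_ego_frame(traj, origin, heading):
    """Express positions in the main vehicle's sample frame.

    origin: main-vehicle position in the first frame; heading: its direction of
    travel in radians (y-up frame). x points along the heading; y is the heading
    rotated 90 degrees clockwise.
    """
    d = np.asarray(traj, float) - np.asarray(origin, float)
    c, s = np.cos(heading), np.sin(heading)
    x = d[:, 0] * c + d[:, 1] * s          # projection on the heading unit vector (c, s)
    y = d[:, 0] * s - d[:, 1] * c          # projection on the clockwise normal (s, -c)
    return np.stack([x, y], axis=1)


traj = np.array([[10.0, 5.0], [12.0, 5.0], [12.0, 4.0]])
print(to_ego_frame(traj, traj[0], 0.0))   # rows (0, 0), (2, 0), (2, 1)
```

For vehicles driving in the opposite direction, a heading of π maps their forward motion onto the positive x-axis as well, which is what makes samples from both carriageways comparable.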

5.2. Analysis of Experimental Results for Trajectory Prediction Models

5.2.1. Model Parameter Settings

The model was trained for 120 epochs with a batch size of 64, using the Adam optimizer with a learning rate of 0.0002. In the trajectory feature encoding module, a linear layer maps the trajectory features from 6 to 128 dimensions; the Transformer encoder has 2 layers, 8 attention heads, an embedding dimension of 128, and a dropout rate of 0.2. In the spatial interaction feature extraction module, the improved GAT layer uses 8 attention heads and an embedding dimension of 128. The embedding dimension of the spatio-temporal feature fusion module is 128. In the decoding prediction module, the hidden layer dimension is 128, and the fully connected layer has an input dimension of 128 and an output dimension of 5.
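For reproduction, the settings above can be collected in one configuration object. The class and field names below are hypothetical; only the values come from the text. The five output dimensions correspond to the bivariate Gaussian parameters (two means, two standard deviations, one correlation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STGATEDGRUConfig:
    """Hyperparameters listed in Section 5.2.1 (field names are illustrative)."""
    # Training
    epochs: int = 120
    batch_size: int = 64
    optimizer: str = "Adam"
    learning_rate: float = 2e-4
    # Trajectory feature encoding module (linear layer + Transformer encoder)
    input_dim: int = 6          # per-frame trajectory features before the linear layer
    embed_dim: int = 128
    transformer_layers: int = 2
    transformer_heads: int = 8
    dropout: float = 0.2
    # Spatial interaction module (improved GAT)
    gat_heads: int = 8
    gat_dim: int = 128
    # Spatio-temporal fusion module (GMU)
    fusion_dim: int = 128
    # Decoding prediction module (GRU + fully connected head)
    gru_hidden: int = 128
    fc_in: int = 128
    fc_out: int = 5             # (mu_x, mu_y, sigma_x, sigma_y, rho)


cfg = STGATEDGRUConfig()
print(cfg.embed_dim // cfg.transformer_heads)  # 16 dimensions per attention head
```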
Since the model outputs a probability distribution over the predicted vehicle position, the negative log-likelihood (NLL) loss shown in Equation (37) is used for training.
$$ LOSS = -\sum_{n=1}^{N} \sum_{t=t_{obs}+1}^{t_{obs}+t_f} \log P\left(x_n^t, y_n^t \mid \hat{\mu}_n^t, \hat{\sigma}_n^t, \hat{\rho}_n^t\right) $$
where $N$ is the total number of vehicles in the target scene, and $P(x_n^t, y_n^t \mid \hat{\mu}_n^t, \hat{\sigma}_n^t, \hat{\rho}_n^t)$ is the likelihood of the actual coordinates $(x_n^t, y_n^t)$ at time $t$ under the predicted probability distribution.
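The NLL loss above uses the standard bivariate Gaussian density. A minimal NumPy sketch follows; in a training framework the same expression would be written with that framework's tensor operations. Inputs must satisfy sigma > 0 and |rho| < 1, which the decoder's output activations are assumed to guarantee.

```python
import numpy as np


def bivariate_gaussian_nll(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Summed negative log-likelihood of points (x, y) under bivariate Gaussians.

    All arguments broadcast, e.g. arrays of shape (N, t_f) for N vehicles and t_f steps.
    Requires sigma_x, sigma_y > 0 and |rho| < 1.
    """
    dx = (x - mu_x) / sigma_x
    dy = (y - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho ** 2
    # Squared Mahalanobis distance of the standardized residual
    q = (dx ** 2 + dy ** 2 - 2.0 * rho * dx * dy) / one_minus_rho2
    log_norm = np.log(2.0 * np.pi * sigma_x * sigma_y * np.sqrt(one_minus_rho2))
    return float(np.sum(log_norm + 0.5 * q))


print(round(bivariate_gaussian_nll(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0), 4))  # 1.8379
```

Unlike an RMSE objective, this loss also penalizes over- or under-confident standard deviations, which is what makes the predicted distributions usable for the collision probability estimate.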
For validation and testing, the root mean square error (RMSE) is used to enable a side-by-side comparison with other trajectory prediction models, as shown in Equation (38):
$$ RMSE = \sqrt{\frac{1}{N \cdot t_f} \sum_{n=1}^{N} \sum_{t=t_{obs}+1}^{t_{obs}+t_f} \left[\left(\hat{x}_n^t - x_n^t\right)^2 + \left(\hat{y}_n^t - y_n^t\right)^2\right]} $$
where $N$ is the total number of vehicles in the target scene; $(\hat{x}_n^t, \hat{y}_n^t)$ is the mean of the predicted position distribution of vehicle $n$ at time $t$; and $(x_n^t, y_n^t)$ are its actual coordinates at time $t$.
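A minimal sketch of the RMSE above, plus a per-horizon variant evaluated at 1–5 s after the observation time. That the tables report RMSE at each horizon step in this way is an assumption (it is a common convention); the indexing assumes 5 Hz predictions starting at $t_{obs}$ + 0.2 s.

```python
import numpy as np


def rmse(pred, true):
    """RMSE over all vehicles and predicted steps.

    pred, true: (N, t_f, 2) predicted means and ground-truth positions.
    """
    sq = np.sum((pred - true) ** 2, axis=-1)       # squared Euclidean error per point
    return float(np.sqrt(sq.mean()))               # mean over N * t_f, then square root


def rmse_at_horizons(pred, true, steps_per_s=5, horizons_s=(1, 2, 3, 4, 5)):
    """RMSE at t_obs + k seconds (per-horizon convention assumed)."""
    sq = np.sum((pred - true) ** 2, axis=-1)
    return {k: float(np.sqrt(sq[:, k * steps_per_s - 1].mean())) for k in horizons_s}


pred = np.zeros((2, 25, 2))
true = np.zeros((2, 25, 2))
true[..., 0], true[..., 1] = 3.0, 4.0              # constant 5 m error everywhere
print(rmse(pred, true))  # 5.0
```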

5.2.2. Model Performance Analysis

This paper selects the following mainstream trajectory prediction models for comparison with the proposed STGAT-EDGRU model. Since most of these models can only predict the trajectory of a single target, the RMSE in this comparison is calculated only for the predicted trajectory of the main vehicle. The experimental results are shown in Table 2.
(1) S-LSTM [45]: an LSTM encoder-decoder that uses a fully connected social pooling layer to model vehicle-to-vehicle interactions.
(2) CS-LSTM [18]: an LSTM encoder-decoder that uses a convolutional social pooling layer to model vehicle-to-vehicle interactions.
(3) Attention-LSTM [17]: an LSTM encoder-decoder with a simple embedded attention mechanism.
(4) GAT-LSTM [21]: an LSTM encoder-decoder that uses a GAT network to model spatial interactions between vehicles.
(5) GAT-Transformer [46]: a Transformer encoder-decoder that uses a GAT network to model spatial interactions between vehicles.
From the prediction results in Table 2, we can observe that:
(1) Overall, models using the GAT module achieve significantly lower RMSE across all prediction horizons than the models without graph-based interaction modeling (S-LSTM, CS-LSTM, and Attention-LSTM). This indicates that the GAT, which applies an attention mechanism over the interaction graph, effectively captures the dynamic spatial relationships between vehicles and thereby enhances the model's overall trajectory modeling capability. (2) GAT-LSTM and GAT-Transformer both employ spatial interaction modeling, but their encoders differ: GAT-LSTM uses an LSTM to model temporal dependencies, whereas GAT-Transformer uses a Transformer based on multi-head attention. The experimental results show that the RMSE of GAT-Transformer grows more slowly over longer horizons (3–5 s), where it significantly outperforms GAT-LSTM. This indicates that the Transformer architecture has an advantage in capturing long-term dependencies in trajectory features. (3) The proposed STGAT-EDGRU model performs similarly to GAT-Transformer in short-term prediction (1–3 s). However, at the 4 s and 5 s horizons, the prediction accuracy of STGAT-EDGRU is significantly better than that of GAT-Transformer. This suggests that the gated spatio-temporal feature fusion mechanism in STGAT-EDGRU alleviates the fragmentation between temporal features and spatial interaction features, enhancing the model's ability to understand how interactive behaviors evolve in complex dynamic environments and thereby improving long-term trajectory prediction.

5.2.3. Ablation Experiment

To validate the effectiveness and necessity of the STGAT-EDGRU model structure, ablation experiments were conducted on each module of the model. Three comparison models were designed, and the RMSE was calculated only for the predicted trajectory of the main vehicle. The experimental results are shown in Table 3.
In the M1 model, the Transformer in the trajectory feature encoding module was removed, and only the linear layer was used to map the trajectory features from 6 to 128 dimensions. In the M2 model, the attention mechanism of the improved GAT in the spatial interaction feature extraction module was removed, so that each node in the interaction graph attended equally to all of its neighbors. In the M3 model, the spatio-temporal feature fusion module was removed, and the vehicle motion (temporal) features and spatial interaction features at the observation time were concatenated and fed directly into the decoding prediction module.
As shown in Table 3: (1) Compared with the original model, the M1 and M3 models show a slight decrease in short-term prediction accuracy, and their accuracy declines rapidly as the prediction horizon increases. This agrees with the results of the model comparison, indicating that both the Transformer and the spatio-temporal feature fusion mechanism effectively enhance long-term trajectory prediction. (2) The prediction accuracy of the M2 model is significantly lower across the entire prediction horizon. A likely reason is that replacing the learned attention of the improved GAT with fixed, uniform weights not only fails to extract useful spatial interaction information but also injects noise from less relevant neighbors, significantly impairing prediction performance. (3) Although M3 retains some predictive capability in the short term, it struggles to integrate temporal and spatial information effectively when faced with complex interactions and evolving movement intentions over longer horizons, leading to a significant decline in prediction accuracy.
These findings indicate that: (a) the Transformer encoder is particularly important for maintaining long-horizon accuracy; (b) the GAT attention mechanism is essential across all horizons for capturing dynamic inter-vehicle interactions; and (c) the GMU fusion module outperforms naive concatenation by better preserving complementary temporal–spatial cues. The greatest improvements in long-term prediction are achieved when the Transformer encoder and GMU fusion are jointly retained, while adaptive GAT attention ensures robust performance in diverse scenarios.

5.2.4. Analysis of Multi-Objective Trajectory Prediction Results

Figure 4 visualizes the predicted trajectories of a lane-changing main vehicle and its surrounding vehicles for a sample from the test set. In the figure, solid lines represent the actual historical trajectories of each vehicle within the observation time domain, while the two dashed lines represent the actual and model-predicted trajectories within the prediction time domain, respectively. At each time point in the prediction time domain, three points are sampled from the two-dimensional Gaussian distribution of the predicted position and marked with “*”. In this multi-vehicle trajectory prediction task, the average RMSEs of the model at prediction horizons of 1 to 5 s are 0.25, 0.38, 0.55, 0.92, and 1.74 m, respectively. Compared with the single-target prediction task for the main vehicle, the overall accuracy of multi-target trajectory prediction is slightly lower, with a more pronounced increase in error at later horizons.
The primary reasons for this are the way the training samples are constructed and the asymmetry in the interaction modeling. During training, scene samples were constructed centered on the lane-changing main vehicle, enabling the main vehicle to obtain more comprehensive historical trajectory information and interaction influences from surrounding neighboring vehicles in the input features, thereby gaining an advantage in spatio-temporal feature modeling. In contrast, the surrounding vehicles are often only within a local observation range, lacking complete interaction relationships with other vehicles, resulting in insufficient spatial context modeling capabilities. This makes their prediction results more susceptible to uncertainty disturbances, leading to relatively lower accuracy.

5.3. Collision Risk Assessment Model Case Analysis

5.3.1. Scene Design

Based on the minTTC field provided in the tracksMeta file of the HighD dataset, minTTC < 1 s was used as the screening condition to extract test scenes in which vehicles experienced high-risk moments while driving; these vehicles were taken as the main vehicles. As shown in Figure 5, the target scene involves a collision risk between Vid986 and the leading vehicle in the target lane during a right lane change. With the lane-changing vehicle Vid986 as the main vehicle, the trajectory data of the main vehicle and its surrounding vehicles were extracted and downsampled to 5 Hz by frame extraction, yielding 59 frames of trajectory data, as shown in the figure. Following the sliding time window used in risk warning systems, the window size was set to 15 frames and the step size to 1 frame, giving a total of 45 time windows, each serving as one sample.
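The window count follows from (59 − 15) / 1 + 1 = 45. A small sketch of the window enumeration (the function name is illustrative):

```python
def sliding_windows(n_frames, window=15, step=1):
    """(start, end_exclusive) frame indices of every full window."""
    return [(s, s + window) for s in range(0, n_frames - window + 1, step)]


windows = sliding_windows(59)
print(len(windows), windows[-1])  # 45 (44, 59)
```

Each window supplies the 15 history frames for one prediction, so the CRI time series in the results has one value per window.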

5.3.2. Result Analysis

For each time window, trajectory prediction is performed using the STGAT-EDGRU model proposed in this paper, and then the collision risk index (CRI) is calculated using the collision risk assessment model. The metrics selected for comparison with the CRI include the classic traffic conflict metrics ITTC (Inverse Time to Collision) [47] and THW [48], as well as LCTTC (Lane-Change Time to Collision) [49], which considers the two-dimensional collision risk of adjacent vehicles. For the ITTC, THW, and LCTTC metrics, calculations are performed using the data from the last frame of each time window, with the results shown in Figure 6.
The results show that the ITTC metric rises sharply from 7.0 s, reaches a peak of 1.41 s⁻¹ at 8.6 s, and then rapidly falls below 0.2 s⁻¹, indicating that the potential for longitudinal conflict between the main vehicle and the leading vehicle in the target lane is strongest at that moment and that the collision risk is quickly relieved once the lane change is completed. LCTTC and THW reach their minimum values of 0.675 s and 0.29 s at 8.4 s and 8.6 s, respectively, further indicating that this period is the most critical moment of the entire process, with the smallest lateral spacing and longitudinal time safety margin between the vehicles. The three traditional indicators based on physical variables successfully captured the risk peaks before and after the lane change in this case. However, they fundamentally rely on current-frame states such as speed and distance and cannot anticipate behavioral trends, which introduces a certain lag into their risk warnings.
In contrast, the CRI (Collision Risk Index) metric proposed in this paper demonstrates stronger foresight and stability in identifying risks. The CRI value reaches its peak of 0.75 at 8.0 s, leading the most dangerous points of ITTC, LCTTC, and THW by 0.4–0.6 s, indicating its ability to provide advance warning of impending high-risk states. At the same time, the CRI remains at a high level of around 0.7 between 8.0 s and 8.6 s, without experiencing significant fluctuations due to short-term state changes (such as a vehicle entering the target lane), demonstrating excellent continuous risk perception capabilities. This performance is attributed to CRI being based on multi-vehicle trajectory prediction, no longer relying on single-frame state information, but instead using the overlap of probability distributions to characterize potential conflict trends over a future period, thereby achieving greater stability and robustness.

6. Conclusions

This work proposed STGAT-EDGRU, a spatio-temporal fusion architecture for multi-vehicle trajectory prediction and collision risk assessment, together with a Collision Risk Index (CRI) computed from predicted trajectories of the main and surrounding vehicles. On the HighD dataset, for 3–5 s horizons, the model achieved average RMSE reductions of 0.02 m, 0.12 m, and 0.26 m over a GAT-Transformer baseline. In high-risk lane-change scenarios, the index issued warnings 0.4–0.6 s earlier and maintained a stable response across the high-risk interval.

6.1. Contributions

The contributions of this paper are as follows:
A unified prediction architecture in which a Transformer learns temporal motion features, a GAT models inter-vehicle interactions, a GMU fuses temporal and spatial features, and a two-layer GRU decodes bivariate (2D) Gaussian future positions.
A CRI that combines collision probability and collision intensity at each prediction instant through a weighted fusion strategy, providing earlier and more stable warnings over the prediction horizon.

6.2. Applications

Lane-change assistance: CRI can be mapped to early warnings and interventions, enabling early deceleration or lane-change abort decisions before risk peaks; thresholds can be calibrated by speed and traffic density.
Trajectory selection: The CRI can be used as a score to re-rank or filter candidate trajectories, preferring options with lower combined collision probability and collision intensity over the 3–5 s horizon.
Dataset curation and evaluation: The CRI can label high-risk intervals and mine typical LCL/LCR cases in datasets such as HighD, supporting benchmarking and ablation studies without changing the model structure.

6.3. Limitations and Future Work

Scene scope and generalization: Current experiments focus on motorway-style multi-lane traffic using HighD. Future work will evaluate urban scenes and additional datasets to quantify the generalization of STGAT-EDGRU and CRI.
Graph neighborhood specification: Inter-vehicle interactions are defined by a fixed graph neighborhood. The approach performs stably under this setting; future work will investigate adaptive neighborhood definitions and compare alternative window sizes.
Real-time deployment: The full model may challenge embedded compute. Future work will profile latency and memory, explore lighter encoder–decoder settings, and perform closed-loop tests to verify end-to-end performance.

Author Contributions

Conceptualization, H.S., N.W. and X.W.; methodology, H.S.; software, H.S.; validation, H.S., N.W. and X.W.; formal analysis, H.S.; investigation, H.S.; resources, H.S.; data curation, H.S.; writing—original draft preparation, H.S., N.W. and X.W.; writing—review and editing, H.S. and N.W.; visualization, H.S.; supervision, X.W.; project administration, X.W.; funding acquisition, N.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Natural Science Foundation of Shanghai (Grant No. 25ZR1401360).

Data Availability Statement

The data presented in this study are from the openly available dataset HighD, which can be accessed via its official repository https://www.highd-dataset.com/ (accessed on 5 November 2024).

Acknowledgments

The authors sincerely acknowledge the contributions of the editor and anonymous reviewers. Their detailed comments and academic rigor have been crucial in enhancing the scientific integrity and presentation of this study.

Conflicts of Interest

Author Hongtao Su is affiliated with Shanghai Tongtao Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
STGAT-EDGRU: Spatio-Temporal Graph Attention Transformer with Enhanced Gated Recurrent Unit
GAT: Graph Attention Network
RMSE: Root Mean Square Error
CRI: Collision Risk Index
CAVs: Connected Autonomous Vehicles
TTC: Time to Collision
RSS: Responsibility-Sensitive Safety
GMM: Gaussian Mixture Model
GACNet: Graph Attention Cooperative Network
RI-DiT: Risk-Informed Diffusion Transformer
GAN: Generative Adversarial Network
LSTM: Long Short-Term Memory
CSP: Common Spatial Pattern
MDM: Minimum Distance Boundary
GMU: Gated Multimodal Unit
CV: Constant Velocity
CA: Constant Acceleration
CTRA: Constant Turn Rate and Acceleration
HME: Hierarchical Mixture of Experts
RNNs: Recurrent Neural Networks
GCN: Graph Convolutional Network
TTB: Time to Brake
PET: Post-Encroachment Time
DRAC: Deceleration Rate to Avoid Crash
THW: Time Headway
LK: Lane Keeping
LCL: Left Lane Change
LCR: Right Lane Change
LCI: Lane Change Intention generation phase
LCE: Lane Change Execution phase
ITTC: Inverse Time to Collision
LCTTC: Lane-Change Time to Collision

References

  1. Katrakazas, C.; Quddus, M.; Chen, W.H. A new integrated collision risk assessment methodology for autonomous vehicles. Accid. Anal. Prev. 2019, 127, 61–79.
  2. Li, Y.; Wu, D.; Lee, J.; Yang, M.; Shi, Y. Analysis of the transition condition of rear-end collisions using time-to-collision index and vehicle trajectory data. Accid. Anal. Prev. 2020, 144, 105676.
  3. Li, Y.; Wu, D.; Chen, Q.; Lee, J.; Long, K. Exploring transition durations of rear-end collisions based on vehicle trajectory data: A survival modeling approach. Accid. Anal. Prev. 2021, 159, 106271.
  4. Liu, S.; Wang, X.; Hassanin, O.; Xu, X.; Yang, M.; Hurwitz, D.; Wu, X. Calibration and evaluation of responsibility-sensitive safety (RSS) in automated vehicle performance during cut-in scenarios. Transp. Res. Part C Emerg. Technol. 2021, 125, 103037.
  5. Chen, J.; Wang, K.; Xiong, Z. Collision probability prediction algorithm for cooperative overtaking based on TTC and conflict probability estimation method. Int. J. Veh. Des. 2018, 77, 195.
  6. Shangguan, A.; Xie, G.; Wang, D.; Fei, R.; Hei, X.; Ji, W. Analyzing the collision probability of autonomous vehicles at crossroads. In Proceedings of the 2020 IEEE 9th Data Driven Control and Learning Systems Conference (DDCLS), Liuzhou, China, 19–21 June 2020; IEEE: New York, NY, USA, 2020; pp. 30–35.
  7. Xie, G.; Zhang, X.; Gao, H.; Qian, L.; Wang, J.; Ozguner, U. Situational assessments based on uncertainty-risk awareness in complex traffic scenarios. Sustainability 2017, 9, 1582.
  8. Huang, C.; Hang, P.; Hu, Z.; Lv, C. Collision-probability-aware human-machine cooperative planning for safe automated driving. IEEE Trans. Veh. Technol. 2021, 70, 9752–9763.
  9. Anik, B.M.T.; Islam, Z.; Abdel-Aty, M. inTformer: A Time-Embedded Attention-Based Transformer for Crash Likelihood Prediction at Intersections Using Connected Vehicle Data. arXiv 2023.
  10. Chen, J.; Liu, P.; Zhang, Z.; Zhao, H.; Ji, Y.; Pu, Z. Risk-Informed Diffusion Transformer for Long-Tail Trajectory Prediction in the Crash Scenario. arXiv 2024, arXiv:2401.08581.
  11. Chai, J.; Liu, J.; Huang, J.; Huang, C. GACNet: Interactive Prediction of Surrounding Vehicle Behavior under High Collision Risk. Artif. Intell. Sci. Eng. 2025, 7, 2401040.
  12. Meng, D.; Xiao, W.; Zhang, L.; Zhang, Z.; Liu, Z. Vehicle Trajectory Prediction-Based Predictive Collision Risk Assessment for Autonomous Driving in Highway Scenarios. IEEE Trans. Intell. Transp. Syst. 2023, arXiv:2304.05610.
  13. Lefèvre, S.; Vasquez, D.; Laugier, C. A Survey on Motion Prediction and Risk Assessment for Intelligent Vehicles. Robomech J. 2014, 1, 1.
  14. Gao, J.; Mao, Y.C.; Li, Z.T. Trajectory prediction based on Gauss mixture time series model. Comput. Appl. 2019, 39, 2261–2270.
  15. Joseph, J.M.; Doshi-Velez, F.; Roy, N. A Bayesian Nonparametric Approach to Modeling Mobility Patterns. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI), Atlanta, GA, USA, 11–15 July 2010; AAAI Press: Washington, DC, USA, 2010; pp. 1587–1593.
  16. Ji, X.; Fei, C.; He, X.; Liu, Y. Driving intention recognition and vehicle trajectory prediction based on LSTM network. J. China Highw. Transp. 2019, 32, 34–42.
  17. Yang, S.; Chen, Y.; Cao, Y.; Wang, R.; Shi, R.; Lu, J. Lane change trajectory prediction based on spatio-temporal attention mechanism. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; IEEE: New York, NY, USA, 2022; pp. 2366–2371.
  18. Deo, N.; Trivedi, M.M. Convolutional Social Pooling for Vehicle Trajectory Prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; IEEE: New York, NY, USA, 2018; pp. 1468–1476.
  19. Wang, M.; Cai, Y.; Wang, H.; Rao, Z.; Chen, L.; Li, Y. Vehicle trajectory prediction method based on graph convolution interaction network. Automot. Eng. 2024, 46, 1863–1872.
  20. Chen, S.; Piao, L.; Zang, X.; Luo, Q.; Li, J.; Yang, J.; Rong, J. Analyzing differences of highway lane-changing behavior using vehicle trajectory data. Phys. A Stat. Mech. Its Appl. 2023, 624, 128980.
  21. Wen, H.; Zhang, X.; Huang, J.; Xu, P. Prediction of Intelligent Vehicle Trajectories Considering Dynamic Interactions. J. Transp. Syst. Eng. Inf. 2024, 24, 60–68.
  22. Chen, X.; Wu, S.; Shi, C.; Huang, Y.; Yang, Y.; Ke, R. Sensing Data Supported Traffic Flow Prediction via Denoising Schemes ANN: A Comparison. IEEE Sens. J. 2020, 20, 14317–14328.
  23. Xiong, L.; Wu, J.; Xing, X.; Chen, J. A review of autonomous vehicle driving risk assessment methods. J. Automot. Eng. 2024, 14, 745–759.
  24. Noh, S.; An, K. Decision-making framework for automated driving in highway environments. IEEE Trans. Intell. Transp. Syst. 2017, 19, 58–71.
  25. Chu, K.; Lee, M.; Sunwoo, M. Local path planning for off-road autonomous driving with avoidance of static obstacles. IEEE Trans. Intell. Transp. Syst. 2012, 13, 1599–1616.
  26. Kaempchen, N.; Schiele, B.; Dietmayer, K. Situation assessment of an autonomous emergency brake for arbitrary vehicle-to-vehicle collision scenarios. IEEE Trans. Intell. Transp. Syst. 2009, 10, 678–687.
  27. Ferguson, D.; Darms, M.; Urmson, C.; Kolski, S. Detection, prediction, and avoidance of dynamic obstacles in urban environments. In Proceedings of the 2008 IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, 4–6 June 2008; IEEE: New York, NY, USA, 2008; pp. 1149–1154.
  28. Yang, Z.; Shi, C.; Zheng, Y.; Gu, S.; Chen, F. A study on a vehicle semi-active suspension control system based on road elevation identification. PLoS ONE 2022, 17, e0269406.
  29. Tan, H.S.; Huang, J. DGPS-based vehicle-to-vehicle cooperative collision warning: Engineering feasibility viewpoints. IEEE Trans. Intell. Transp. Syst. 2006, 7, 415–428.
  30. Hillenbrand, J.; Spieker, A.M.; Kroschel, K. A multilevel collision mitigation approach—Its situation assessment, decision making, and performance tradeoffs. IEEE Trans. Intell. Transp. Syst. 2006, 7, 528–540.
  31. Noh, S. Decision-making framework for autonomous driving at road intersections: Safeguarding against collision, overly conservative behavior, and violation vehicles. IEEE Trans. Ind. Electron. 2018, 66, 3275–3286.
  32. Li, G.; Yang, Y.; Zhang, T.; Qu, X.; Cao, D.; Cheng, B.; Li, K. Risk assessment-based collision avoidance decision-making for autonomous vehicles in multi-scenarios. Transp. Res. Part C Emerg. Technol. 2021, 122, 102820.
  33. Vogel, K. A Comparison of Headway and Time to Collision as Safety Indicators. Accid. Anal. Prev. 2003, 35, 427–433.
  34. Kim, J.; Kum, D. Collision risk assessment algorithm via lane-based probabilistic motion prediction of surrounding vehicles. IEEE Trans. Intell. Transp. Syst. 2017, 19, 2965–2976.
  35. de Campos, G.R.; Runarsson, A.H.; Granum, F.; Falcone, P.; Alenljung, K. Collision avoidance at intersections: A probabilistic threat-assessment and decision-making system for safety interventions. In Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), Qingdao, China, 8–11 October 2014; IEEE: New York, NY, USA, 2014; pp. 649–654.
  36. Wang, J.; Wu, J.; Zheng, X.; Ni, D.; Li, K. Driving safety field theory modeling and its application in pre-collision warning system. Transp. Res. Part C Emerg. Technol. 2016, 72, 306–324.
  37. Tian, Y.; Pei, H.; Yan, S.; Zhang, Y. Extended driving risk field model for i-VICS and its application. J. Tsinghua Univ. (Sci. Technol.) 2022, 62, 447–457.
  38. Lambert, A.; Gruyer, D.; Saint Pierre, G. A fast Monte Carlo algorithm for collision probability estimation. In Proceedings of the 2008 10th International Conference on Control, Automation, Robotics and Vision, Hanoi, Vietnam, 17–20 December 2008; IEEE: New York, NY, USA, 2008; pp. 406–411.
  39. Ammoun, S.; Nashashibi, F. Real-time trajectory prediction for collision risk estimation between vehicles. In Proceedings of the 2009 IEEE 5th International Conference on Intelligent Computer Communication and Processing, Cluj-Napoca, Romania, 27–29 August 2009; IEEE: New York, NY, USA, 2009; pp. 417–422.
  40. Houénou, A.; Bonnifait, P.; Cherfaoui, V. Risk assessment for collision avoidance systems. In Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), Qingdao, China, 8–11 October 2014; IEEE: New York, NY, USA, 2014; pp. 386–391.
  41. Lee, K.; Peng, H. Evaluation of automotive forward collision warning and collision avoidance algorithms. Veh. Syst. Dyn. 2005, 43, 735–751.
  42. Arevalo, J.; Solorio, T.; Montes-y-Gómez, M.; González, F.A. Gated multimodal networks. Neural Comput. Appl. 2020, 32, 10209–10228.
  43. Jia, S.; Xu, J.; Wang, S.; Liu, X.; Li, G. Connected multi-vehicle crash risk assessment considering probability and intensity. PLoS ONE 2025, 20, e0313317.
  44. Yang, L.; Zhang, J.; Lyu, N.; Zhao, Q. Predicting lane change maneuver and associated collision risks based on multi-task learning. Accid. Anal. Prev. 2024, 209, 107830.
  45. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social LSTM: Human trajectory prediction in crowded spaces. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 961–971.
  46. Chen, W.; Zhang, Y.; Han, X. Vehicle Trajectory Prediction Model Based on Transformer and Graph Attention Network. J. Jilin Univ. (Eng. Ed.), 2025; Online ahead of print.
  47. Laureshyn, A.; Svensson, Å.; Hydén, C. Evaluation of traffic safety, based on micro-level behavioural data: Theoretical framework and first implementation. Accid. Anal. Prev. 2010, 42, 1637–1646.
  48. Rajaram, V.; Subramanian, S.C. Heavy vehicle collision avoidance control in heterogeneous traffic using varying time headway. Mechatronics 2018, 50, 328–340.
  49. Xie, J.; Xia, Y.; Qian, Z.; Liu, B.; Qin, Y. Lane change risk warning for interweaving zones considering intelligent connected vehicle neighbor information. J. Transp. Eng. 2023, 23, 287–300.
Figure 1. STGAT-EDGRU trajectory prediction model architecture.
Figure 2. Schematic diagram of GMU structure.
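According to the abstract, the GMU adaptively fuses the temporal (Transformer) and spatial (GAT) streams. The following plain-Python sketch follows the original GMU formulation of Arevalo et al. [42] for a single vehicle; it is not the authors' implementation, whose dimensions, bias terms, and placement in the network may differ. The helper names (gmu, matvec, random_matrix), the feature sizes, and the random weights standing in for learned parameters are all hypothetical.

```python
import math
import random

def matvec(W, x):
    """Return W @ x for a matrix W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def gmu(x_t, x_s, W_t, W_s, W_z):
    """Fuse one vehicle's temporal feature x_t and spatial feature x_s.

    GMU of Arevalo et al. [42], bias terms omitted (an assumption):
        h_t = tanh(W_t x_t),  h_s = tanh(W_s x_s)
        z   = sigmoid(W_z [x_t; x_s])
        h   = z * h_t + (1 - z) * h_s   (element-wise)
    Returns the fused vector h and the gate z.
    """
    h_t = [math.tanh(v) for v in matvec(W_t, x_t)]
    h_s = [math.tanh(v) for v in matvec(W_s, x_s)]
    z = [sigmoid(v) for v in matvec(W_z, x_t + x_s)]  # x_t + x_s concatenates the lists
    h = [zi * a + (1.0 - zi) * b for zi, a, b in zip(z, h_t, h_s)]
    return h, z

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

if __name__ == "__main__":
    rng = random.Random(0)
    d_t, d_s, d_h = 6, 4, 3                           # hypothetical feature sizes
    x_t = [rng.gauss(0.0, 1.0) for _ in range(d_t)]   # stand-in for Transformer output
    x_s = [rng.gauss(0.0, 1.0) for _ in range(d_s)]   # stand-in for GAT output
    h, z = gmu(x_t, x_s,
               random_matrix(d_h, d_t, rng),
               random_matrix(d_h, d_s, rng),
               random_matrix(d_h, d_t + d_s, rng))
    print("gate z:", [round(v, 3) for v in z])
    print("fused h:", [round(v, 3) for v in h])
```

Because each gate value lies in (0, 1), every fused component is a convex mix of the two stream candidates, so the unit can lean on the interaction (spatial) features or the motion-history (temporal) features on a per-dimension basis.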
Figure 3. Sample unified coordinate system.
Figure 4. Visualization of multi-vehicle trajectory prediction results. Historical segments are shown as light lines, ground truth as dashed lines, predictions as solid lines, and sampled points as semi-transparent markers.
Figure 5. Instance of a high-risk lane-changing scenario: lane-change snapshots (t = 4.0–9.4 s). Colored boxes denote vehicles (color indicates ID only); the vehicle labeled 986 is the main vehicle, and the others are surrounding vehicles.
Figure 6. Comparison of trends in different risk assessment indicators over time.
Table 1. Risk assessment model.
Compared aspects: trajectory prediction; collision probability; collision intensity; fusing spatio-temporal features; multi-vehicle interaction modeling.
Reference | Method Overview
[2,3] | TTC countdown risk function
[4] | RSS rule model
[5] | TTC mapping to collision probability
[6] | LSTM trajectory prediction; Monte Carlo simulation
[8] | GMM trajectory prediction; fuzzy logic
[7] | Multimodal trajectory generation network
[9] | Transformer trajectory prediction; collision probability
[10] | RI-DiT trajectory prediction; TTC risk characteristics
[11] | GAN + GAT trajectory prediction; conflict analysis module
[12] | LSTM + CSP + GAT trajectory prediction; TTC + MDM continuous risk function
This study | Transformer + GAT trajectory prediction; CRI (Collision Risk Index)
Table 2. Comparison of RMSE for different prediction models.
RMSE (m) by prediction horizon
Model | 1 s | 2 s | 3 s | 4 s | 5 s
S-LSTM | 0.31 | 0.70 | 1.29 | 2.31 | 3.47
CS-LSTM | 0.27 | 0.61 | 1.24 | 2.19 | 3.30
Attention-LSTM | 0.24 | 0.52 | 1.05 | 1.76 | 2.63
GAT-LSTM | 0.19 | 0.37 | 0.51 | 1.04 | 1.87
GAT-Transformer | 0.12 | 0.20 | 0.35 | 0.73 | 1.38
STGAT-EDGRU | 0.14 | 0.21 | 0.33 | 0.61 | 1.12
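The RMSE reductions quoted in the abstract (0.02 m, 0.12 m, and 0.26 m at 3, 4, and 5 s) follow directly from Table 2. A minimal Python check, using only the transcribed table values; the names RMSE and rmse_reduction are illustrative, not part of the authors' code.

```python
# RMSE values (m) transcribed from Table 2; columns are horizons 1-5 s.
HORIZONS_S = [1, 2, 3, 4, 5]
RMSE = {
    "GAT-Transformer": [0.12, 0.20, 0.35, 0.73, 1.38],
    "STGAT-EDGRU":     [0.14, 0.21, 0.33, 0.61, 1.12],
}

def rmse_reduction(baseline, model, table=RMSE):
    """Return {horizon_s: (absolute reduction in m, relative reduction in %)}.

    Positive values mean `model` has a lower RMSE than `baseline`.
    """
    out = {}
    for h, b, m in zip(HORIZONS_S, table[baseline], table[model]):
        diff = round(b - m, 2)                 # 2 decimals, matching the table
        pct = round(100.0 * (b - m) / b, 1)    # relative to the baseline RMSE
        out[h] = (diff, pct)
    return out

if __name__ == "__main__":
    for h, (d, pct) in rmse_reduction("GAT-Transformer", "STGAT-EDGRU").items():
        print(f"{h} s: {d:+.2f} m ({pct:+.1f} %)")
```

The same computation gives small negative values at 1 s and 2 s (-0.02 m and -0.01 m), so in Table 2 the GAT-Transformer baseline is marginally more accurate at short horizons, while the advantage of STGAT-EDGRU grows with the horizon and reaches about 18.8% at 5 s.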
Table 3. Ablation experiment results of STGAT-EDGRU.
RMSE (m) by prediction horizon
Model | 1 s | 2 s | 3 s | 4 s | 5 s
STGAT-EDGRU | 0.14 | 0.21 | 0.33 | 0.61 | 1.12
M1 | 0.22 | 0.36 | 0.49 | 0.83 | 1.65
M2 | 0.31 | 0.62 | 1.27 | 2.30 | 3.51
M3 | 0.20 | 0.35 | 0.64 | 1.08 | 1.99
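Table 3 is easier to compare as relative degradation. A short Python sketch, using only the values in the table, computes the percentage RMSE increase of each ablation variant over the full model. M1–M3 are kept as opaque labels because their configurations are not restated here, and the function names are illustrative.

```python
# RMSE values (m) transcribed from Table 3; columns are horizons 1-5 s.
ABLATION_RMSE = {
    "STGAT-EDGRU": [0.14, 0.21, 0.33, 0.61, 1.12],
    "M1":          [0.22, 0.36, 0.49, 0.83, 1.65],
    "M2":          [0.31, 0.62, 1.27, 2.30, 3.51],
    "M3":          [0.20, 0.35, 0.64, 1.08, 1.99],
}

def relative_increase(variant, full="STGAT-EDGRU", table=ABLATION_RMSE):
    """Per-horizon RMSE increase of `variant` over the full model, in percent."""
    return [round(100.0 * (v - f) / f, 1)
            for v, f in zip(table[variant], table[full])]

def mean_increase(variant, full="STGAT-EDGRU", table=ABLATION_RMSE):
    """Average of the per-horizon percentage increases."""
    inc = relative_increase(variant, full, table)
    return round(sum(inc) / len(inc), 1)

if __name__ == "__main__":
    for name in ("M1", "M2", "M3"):
        print(name, relative_increase(name), "mean:", mean_increase(name))
```

On these values every variant is worse than the full model at every horizon, and M2 degrades most at each horizon (roughly 121% to 285%), which suggests that the component removed in M2 contributes most to prediction accuracy.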
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Su, H.; Wang, N.; Wang, X. Collision Risk Assessment of Lane-Changing Vehicles Based on Spatio-Temporal Feature Fusion Trajectory Prediction. Electronics 2025, 14, 3388. https://doi.org/10.3390/electronics14173388

AMA Style

Su H, Wang N, Wang X. Collision Risk Assessment of Lane-Changing Vehicles Based on Spatio-Temporal Feature Fusion Trajectory Prediction. Electronics. 2025; 14(17):3388. https://doi.org/10.3390/electronics14173388

Chicago/Turabian Style

Su, Hongtao, Ning Wang, and Xiangmin Wang. 2025. "Collision Risk Assessment of Lane-Changing Vehicles Based on Spatio-Temporal Feature Fusion Trajectory Prediction" Electronics 14, no. 17: 3388. https://doi.org/10.3390/electronics14173388

APA Style

Su, H., Wang, N., & Wang, X. (2025). Collision Risk Assessment of Lane-Changing Vehicles Based on Spatio-Temporal Feature Fusion Trajectory Prediction. Electronics, 14(17), 3388. https://doi.org/10.3390/electronics14173388
