Sensors
  • Article
  • Open Access

23 August 2023

Holistic Spatio-Temporal Graph Attention for Trajectory Prediction in Vehicle–Pedestrian Interactions

Department of Electrical and Computer Engineering, University of Michigan-Dearborn, Dearborn, MI 48128, USA
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Navigation Filters for Autonomous Vehicles

Abstract

Ensuring that intelligent vehicles do not cause fatal collisions remains a persistent challenge due to pedestrians’ unpredictable movements and behavior. The potential for risky situations or collisions arising from even minor misunderstandings in vehicle–pedestrian interactions is a cause for great concern. Considerable research has been dedicated to the advancement of predictive models for pedestrian behavior through trajectory prediction, as well as the exploration of the intricate dynamics of vehicle–pedestrian interactions. However, it is important to note that these studies have certain limitations. In this paper, we propose a novel graph-based trajectory prediction model for vehicle–pedestrian interactions called Holistic Spatio-Temporal Graph Attention (HSTGA) to address these limitations. HSTGA first extracts vehicle–pedestrian interaction spatial features using a multi-layer perceptron (MLP) sub-network and max pooling. Then, the vehicle–pedestrian interaction features are aggregated with the spatial features of pedestrians and vehicles to be fed into the LSTM. The LSTM is modified to learn the vehicle–pedestrian interactions adaptively. Moreover, HSTGA models temporal interactions using an additional LSTM. Then, it models the spatial interactions among pedestrians and between pedestrians and vehicles using graph attention networks (GATs) to combine the hidden states of the LSTMs. We evaluate the performance of HSTGA on three different scenario datasets, including complex unsignalized roundabouts with no crosswalks and unsignalized intersections. The results show that HSTGA outperforms several state-of-the-art methods in predicting linear, curvilinear, and piece-wise linear trajectories of vehicles and pedestrians. Our approach provides a more comprehensive understanding of social interactions, enabling more accurate trajectory prediction for safe vehicle navigation.

1. Introduction

Driving in an urban environment (Figure 1) is a challenging task that is associated with heavy mixed traffic flows. In a mixed traffic flow, vehicles and vulnerable road users, such as pedestrians, bicycles, and tricycles, share the same road. As a result, vehicle–pedestrian conflicts, vehicle–vehicle conflicts, and many other critical interactions regularly occur. According to U.S. National Highway Traffic Safety Administration (NHTSA) data, in 2020, 6516 pedestrians died in traffic accidents, and almost 55,000 pedestrians were injured nationwide [1].
The conflict between pedestrians and vehicles (Figure 2) is an important safety issue, not only in the USA but everywhere in the world. This issue is even worse in developing countries. Road accidents claim over 1.3 million lives annually, which translates to more than two lives lost every minute [2]. Shockingly, around ninety percent of these tragedies happen in countries with limited resources [2]. The sad truth is that road accidents are still the primary reason for the loss of young lives, specifically those aged 5 to 29, on a global scale [2]. For instance, in the United States, car accidents are unequivocally recognized as a principal catalyst of mortality [3]. In 2020, almost 40,000 individuals died as a direct consequence of car accidents [3]. Moreover, a considerable number, roughly 2.1 million individuals, were taken to hospital due to injuries sustained in those traffic accidents [3]. Pedestrians are among the most vulnerable road users (VRUs) because they lack the physical protection to reduce accident consequences [4]. It is not surprising that pedestrian conflicts with vehicles are most problematic in urban areas, since pedestrian activity is higher there. The problem of collisions between vehicles and pedestrians has been the subject of deep study for a long time [5,6,7,8,9,10,11,12].
Figure 1. Urban environment scenarios [13,14].
Figure 2. An example of vehicle–pedestrian conflicts.
The meaning of traffic conflict varies among research publications. In [15], the authors noted that operational definitions of traffic conflict could generally be categorized into two types: those based on evasive actions and those based on spatio-temporal proximity. A situation involving two or more road users, in which one user’s activity induces another user to perform an evasive move to avoid a collision, is characterized as an evasive action-based traffic conflict [16]. Pedestrian–vehicle conflicts can occur when an incoming vehicle must quickly stop or swerve to avoid a pedestrian, or when a pedestrian must take evasive action to prevent a collision. This term focuses on either the driver’s or pedestrian’s evasive actions. In contrast, proximity-based traffic conflicts are characterized as a scenario in which two or more road users are so close in space and time that there is a chance of an accident if their movements do not alter [17]. This concept suggests that the likelihood of accidents increases when road users are in close proximity to each other. Proximity can be measured using either time or space, and this conceptual definition can be put into practice by utilizing traffic detectors to measure the dimensions of time and space [18].
Numerous research studies have been conducted on conflicts between pedestrians and vehicles. However, these studies have primarily focused on examining the factors that influence such conflicts, including personal characteristics, traffic conditions, and environmental factors at crosswalks [18]. From a personal characteristics standpoint, factors such as age, gender, and disability have been investigated. For instance, the authors of [19] reported that elderly pedestrians have greater vulnerability while crossing roads as a result of a decrease in their walking capabilities. Yagil [20] identified a tendency among men to exhibit lower awareness compared to women regarding potential conflicts with vehicles when crossing roads. Tom and Granié [21] conducted an investigation focusing on gender differences in pedestrian adherence to traffic regulations, considering both signalized and unsignalized intersections. Additionally, several studies have explored factors related to traffic conditions, including variables like traffic volume and vehicle speed. Cheng [22] proposed that a high vehicle volume can lead to more severe pedestrian–vehicle conflicts because pedestrians’ protracted waiting times exceed their tolerance limits, whereas a high vehicle speed increases the chance of pedestrian–vehicle crashes. Cheng developed comprehensive models aimed at exploring the intricate associations among various variables, including pedestrian waiting time, vehicle volume, and so on. In a related study, Himanen and Kulmala [23] meticulously examined a substantial dataset consisting of 799 pedestrian–vehicle conflict incidents, ultimately identifying the most pertinent explanatory factors. These factors encompassed the distance of pedestrians from the curb, the scale of the urban environment, the number of individuals crossing simultaneously, vehicle speed, and vehicle platoon size. Additionally, researchers have extensively investigated environmental factors that contribute to pedestrian–vehicle conflicts, including city size, signal settings, road width, and lane delineation.
In the realm of autonomous vehicles (AVs), the ability to anticipate the movement of pedestrians is of paramount significance, and the consequences of neglecting it could be catastrophic. This prediction enables AVs to chart safe routes while confidently engaging in related driving tasks. Unfortunately, the intricate nature of pedestrian motion creates significant challenges for long-term trajectory prediction. It is worth noting that pedestrians’ movements are slower than those of vehicles but can change rapidly due to the complexities of human behavior. Furthermore, a pedestrian’s gait can be subjective, depending on various factors such as personal characteristics, walking objectives, and the ever-changing environment. In this paper, we focus on predicting the trajectory of pedestrians when interacting with other pedestrians and vehicles. Trajectory prediction is crucial for autonomous vehicles because it allows them to predict the movements of the surrounding road users several seconds into the future and make the right decision to avoid any critical conflicts. Achieving precise trajectory predictions requires the development of efficient algorithms that can accurately model and replicate real-world scenarios. Consequently, the design of such algorithms represents the most critical aspect of the task of accurate trajectory prediction.
To achieve precise pedestrian trajectory prediction, it is imperative to obtain accurate measurements. This task, however, is quite difficult due to a number of factors that can introduce inaccuracies in the collected data. These factors include occlusions caused by large vehicles and illumination issues like shadows and glare [24,25]. Additionally, pedestrians are physically smaller and lighter than most objects in their surroundings, and they can suddenly change their speed and direction, which further complicates trajectory prediction. This paper focuses on this challenging problem and aims to develop an efficient method for predicting pedestrian behavior via trajectory prediction. Accurate trajectory prediction assists autonomous vehicles in collision avoidance and can also be employed in smart intersections. The proposed method can also be extended to encompass the trajectory prediction of other vulnerable road users, such as bicycles, scooters, and others.
In recent years, there has been an increasing interest in developing LSTM-based methods for capturing the dynamic interactions of pedestrians. These methods utilize pooling and attention mechanisms to represent the latent motion dynamics of pedestrians in local neighborhoods or the whole scene. While pooling collects the motion dynamics of nearby pedestrians, attention assigns different importance to each pedestrian to better understand crowd behaviors based on spatial interactions. However, the temporal continuity of interactions in the crowd has been neglected in previous works. Pedestrians need to consider others’ historical movements to determine their current motion behavior and avoid potential collisions in the future, making temporal correlations of interactions important. Many other studies on predicting pedestrian trajectories have been conducted. However, most of these studies fail to take into account one of the most important factors influencing pedestrian behavior: the presence of multiple surrounding vehicles and the interaction between these vehicles and pedestrians. Although some recent studies, such as the one by Eiffert et al. [26], have attempted to incorporate such influences, they only considered a single vehicle in the presence of pedestrians. Furthermore, previous research on predicting the trajectories of heterogeneous traffic agents, such as pedestrians, has tended to focus on vehicles or motorcycles [27,28,29,30]. Additionally, it is challenging to evaluate the accuracy of pedestrian trajectory predictions due to the absence of datasets containing annotations for both pedestrian crowds and vehicles. The widely used ETH [31] and UCY [32] datasets, for example, do not include annotations for automobiles and are hence unsuitable for evaluating this task. As a result, there is a need for more research that considers the impact of various surrounding vehicles and pedestrians on pedestrian behavior, captures the spatio-temporal interactions between them, and develops more accurate algorithms for this task. Moreover, diverse datasets that contain many vehicles and pedestrians should be used to accurately investigate pedestrian trajectory prediction. To address these limitations, in this paper, we build a novel spatio-temporal graph attention network called Holistic Spatio-Temporal Graph Attention (HSTGA) for trajectory prediction in vehicle–pedestrian interactions, where the spatial and temporal interactions among pedestrians, as well as between pedestrians and vehicles, are encoded. Moreover, we use multiple datasets, including VCI-DUT [33], rounD [34], and uniD [35], which contain data on both pedestrians and vehicles. This enables the modeling of the influence of pedestrian–vehicle conflict on the accurate prediction of pedestrian (and vehicle) trajectories. This paper makes the following four contributions:
  • We develop a novel encoder–decoder interaction model called Holistic Spatio-Temporal Graph Attention (HSTGA) for trajectory prediction in vehicle–pedestrian interactions. HSTGA models pedestrian–vehicle interactions in non-signalized and non-crosswalk scenarios using a trajectory-based model for long-horizon pedestrian and vehicle trajectory prediction.
  • We develop a vehicle–pedestrian interaction feature extraction model using a multi-layer perceptron (MLP) sub-network and max pooling.
  • We develop an LSTM network to adaptively learn the vehicle–pedestrian spatial interaction.
  • We predict pedestrian and vehicle trajectories by modeling the spatio-temporal interactions between pedestrian–pedestrian, vehicle–vehicle, and vehicle–pedestrian using only the historical trajectories of pedestrians and vehicles. This approach reduces the information requirements compared to other learning-based methods.

3. Problem Definition

Assume that prior image processing has already been applied to a raw video feed to extract the position and pose of individual pedestrians and vehicles in each video frame. We assume that there are N pedestrians and M vehicles present in a video frame, represented by $p_1, p_2, \ldots, p_N$ for pedestrians and $v_1, v_2, \ldots, v_M$ for vehicles. The state of pedestrians $p_i$ ($i \in [1, N]$) and vehicles $v_j$ ($j \in [1, M]$) at time step t is denoted as follows:
$A_{obs}^{t} = [P_1^{t}, P_2^{t}, \ldots, P_N^{t}]$ (1)
$B_{obs}^{t} = [V_1^{t}, V_2^{t}, \ldots, V_M^{t}]$ (2)
where $P_i^t$ and $V_j^t$ are the lateral and longitudinal positions with the heading angles of pedestrian i and vehicle j, respectively, at time step t. The numbers of pedestrians (N) and vehicles (M) are variable in Equations (1) and (2) because different datasets/scenarios are used to evaluate this study. Equations (1) and (2) are the observed trajectories that are used as inputs to our deep learning model. $P_i^t$ and $V_j^t$ are expressed as follows:
$P_i^t = (x_i^t, y_i^t, \theta_i^t)$ (3)
$V_j^t = (x_j^t, y_j^t, \theta_j^t)$ (4)
In Equations (3) and (4), $x_i^t$, $x_j^t$, $y_i^t$, $y_j^t$, $\theta_i^t$, and $\theta_j^t$ are the position coordinates and the heading angles of pedestrians and vehicles at each time step t. The positions of the vehicles and pedestrians are relative to the world space. Using the observed trajectories $A_{obs}^t$ and $B_{obs}^t$ over the past m frames at time steps $t = 1, \ldots, T_{obs}$, our goal is to predict the future trajectories $A_f^t$ and $B_f^t$ several seconds (h frames) into the future at time steps $t = T_{obs}+1, \ldots, T_f$, as follows:
$A_f^t = [P_1^{t+h}, P_2^{t+h}, \ldots, P_N^{t+h}]$ (5)
$B_f^t = [V_1^{t+h}, V_2^{t+h}, \ldots, V_M^{t+h}]$ (6)
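As an illustration, the observed trajectories of Equations (1)–(6) can be stored as simple tensors. The following PyTorch-style sketch is not part of the original implementation; the frame and agent counts are arbitrary example values.

```python
# Illustrative layout for the observed-trajectory tensors of Equations (1)-(4).
# The frame count and agent counts below are arbitrary example values.
import torch

T_obs = 8   # number of observed frames (example value)
N = 5       # number of pedestrians in the scene
M = 2       # number of vehicles in the scene

# Each state P_i^t / V_j^t is (x, y, theta), so one frame stacks to (N, 3) / (M, 3).
A_obs = torch.zeros(T_obs, N, 3)  # pedestrian states over the observation window
B_obs = torch.zeros(T_obs, M, 3)  # vehicle states over the observation window

# State of pedestrian i at time step t, i.e., P_i^t = (x_i^t, y_i^t, theta_i^t)
i, t = 0, 3
x_i, y_i, theta_i = A_obs[t, i]
```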

4. Methodology

This section provides a general overview of the key components and architectural design of our multi-trajectory prediction model (HSTGA). We also delve into the specifics of each module within the framework.

4.1. HSTGA Overview

In order to predict the trajectories and interactions of pedestrians and vehicles within a given scene, a vehicle–pedestrian feature extraction model and a graph attention network (GAT) are employed in conjunction with two separate Long Short-Term Memory (LSTM) models, as shown in Figure 3. The first LSTM is referred to as SLSTM, where the ’S’ designates the spatial stage. The proposed SLSTM model is detailed in Section 4.3.1. It is important to note that the SLSTM discussed here is distinct from the S-LSTM introduced in [50]. This LSTM handles the individual trajectories of both vehicles and pedestrians. The GAT, situated between the SLSTM and the second model known as TLSTM, is responsible for capturing interactions between the two objects within the scene. Conversely, Temporal Long Short-Term Memory (TLSTM), where the ’T’ represents the temporal stage, is specifically designed to capture temporal interactions between vehicles and pedestrians. Both models, SLSTM and TLSTM, share the same architecture, as detailed in Section 4.3.1.
Figure 3. Illustration of the vehicle–pedestrian interaction model.

4.2. Vehicle–Pedestrian Interaction (VPI) Feature Extraction

The interaction between vehicles and pedestrians is a significant factor in predicting their future trajectories. We build upon the work of [59,60,137,138] and implement a VPI cell into the LSTM to improve trajectory prediction by encoding vehicle–pedestrian interaction features into the individual agent LSTM. The process of extracting features related to vehicle–pedestrian interactions involves two steps, and each step has two stages, as depicted in Figure 4.
Figure 4. Vehicle–pedestrian interaction feature extraction model.
The first step extracts the vehicle–pedestrian interaction feature when considering the vehicle’s spatial influence on pedestrians. This step’s feature is then used with the pedestrian’s motion state feature (spatial feature) and is fed to the SLSTM for each pedestrian. In the first stage of this step, the interaction weights between the vehicle and pedestrian are learned using their calculated relative positions. Next, a separate embedding module is used to extract the movement state of the vehicle. Finally, the two stages are combined to obtain the features related to vehicle–pedestrian interaction, which are then fed to the SLSTM for trajectory prediction. On the other hand, the second step extracts the vehicle–pedestrian interaction feature when considering the pedestrian’s spatial influence on vehicles. The resulting feature from this step is then fed with the vehicle’s motion state (spatial feature) to the SLSTM. Stages one and two of both steps are discussed below. In stage one, the vehicle–pedestrian interaction attention weights $vp_{ij}^t$ between the ith pedestrian and the jth vehicle are calculated using max pooling, as shown in Equation (7).
$vp_{ij}^t = \mathrm{Pooling}\{\mathrm{MLP}(\phi(d_{ij}^t; W_d); W_a)\}, \quad i \in \{1, \ldots, N\},\ j \in \{1, \ldots, M\}$ (7)
Here, $\mathrm{Pooling}(\cdot)$ is the pooling layer, and $\mathrm{MLP}(\cdot)$ is the multi-layer perceptron sub-network with weights $W_a$. Moreover, $\phi(\cdot)$ is the embedding layer with weights $W_d$. Finally, the relative position $d_{ij}^t$ between the pedestrian and the vehicle is calculated. Equations (3) and (4) are used to calculate the relative position using the x and y coordinates and the heading angle $\theta$, as shown in Equation (8).
$d_{ij}^t = (x_j^{veh,t} - x_i^{ped,t},\ y_j^{veh,t} - y_i^{ped,t},\ \theta_j^{veh,t} - \theta_i^{ped,t}), \quad i \in \{1, \ldots, N\},\ j \in \{1, \ldots, M\}$ (8)
To accurately predict pedestrian trajectories, we must consider the motion state of the jth vehicle and then aggregate the vehicle–pedestrian interaction weights $vp_{ij}^t$ and the vehicle motion states $m_j^{veh\_ped,t}$ to obtain the vehicle–pedestrian interaction features, or vehicle impact. We calculate the vehicle’s motion state using the equation below:
$m_j^{veh\_ped,t} = \phi(\Delta V_j^t; W_m^{veh\_ped}), \quad j \in \{1, \ldots, M\}$ (9)
In Equation (9), $\phi(\cdot)$ represents the embedding with weights $W_m^{veh\_ped}$, and $\Delta V_j^t$ is the relative position of the jth vehicle between the current and last time steps. The final step is aggregating the vehicle–pedestrian interaction weights $vp_{ij}^t$ and the vehicle motion states $m_j^{veh\_ped,t}$ as follows:
$v_i^t = AGG_{VPI}(m_j^{veh\_ped,t}, vp_{ij}^t), \quad i \in \{1, \ldots, N\},\ j \in \{1, \ldots, M\}$ (10)
Equation (10) is the vehicle–pedestrian interaction feature when considering the vehicle’s influence. This feature is then aggregated with the motion state of the individual pedestrian and fed to the SLSTM. For the vehicle–pedestrian interaction feature when considering the pedestrian’s influence, the motion state of the pedestrian $m_i^{ped\_veh,t}$ should be calculated and then aggregated with the vehicle–pedestrian interaction weights $vp_{ij}^t$ to obtain the following equation:
$p_j^t = AGG_{VPI}(m_i^{ped\_veh,t}, vp_{ij}^t), \quad i \in \{1, \ldots, N\},\ j \in \{1, \ldots, M\}$ (11)
$p_j^t$ is then aggregated with the motion state of the individual vehicle and fed to the SLSTM network. In Equations (10) and (11), $AGG_{VPI}$ represents the aggregation module stage, as shown in Figure 4.
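To make the flow of Equations (7)–(10) concrete, the sketch below re-implements the two stages under our own assumptions: the embedding and MLP sizes, the pooling order, and the use of summation as the $AGG_{VPI}$ operator are illustrative choices, not the published configuration.

```python
# Hedged sketch of the VPI feature extraction (Equations (7)-(10)).
import torch
import torch.nn as nn

class VPIFeature(nn.Module):
    def __init__(self, embed_dim=64, hidden_dim=64):
        super().__init__()
        self.embed_rel = nn.Linear(3, embed_dim)         # phi(d_ij^t; W_d), relative-position embedding
        self.mlp = nn.Sequential(                        # MLP(.; W_a)
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim))
        self.embed_motion = nn.Linear(3, hidden_dim)     # phi(Delta V_j^t; W_m), Eq. (9)

    def forward(self, ped_states, veh_states, veh_motion):
        # ped_states: (N, 3), veh_states: (M, 3) current (x, y, theta); veh_motion: (M, 3) Delta V_j^t
        d = veh_states.unsqueeze(0) - ped_states.unsqueeze(1)    # (N, M, 3) relative positions, Eq. (8)
        vp = self.mlp(self.embed_rel(d)).max(dim=1).values       # max pooling over vehicles, Eq. (7)
        m_veh = self.embed_motion(veh_motion).max(dim=0).values  # pooled vehicle motion states, Eq. (9)
        return vp + m_veh                                        # summation as AGG_VPI (assumption), Eq. (10)
```

The symmetric feature $p_j^t$ of Equation (11) follows by swapping the roles of pedestrians and vehicles in the same sketch.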

4.3. Trajectory Encoding

LSTMs have been widely used to capture the motion state of pedestrians [50,56,57,60,139]. We build upon this prior work. The way an intelligent vehicle navigates through a crowded pedestrian area is, in general, similar to how human drivers do. The vehicle must consider the movements of all surrounding pedestrians and their distances from the vehicle’s trajectory. The inherent relationship between the vehicle’s movement and its proximity to the target pedestrian is a crucial factor. Moreover, the pedestrian’s motion contributes to changing the gap between them and the vehicle. This significant observation indirectly suggests that both the vehicle’s trajectory and the gap between the vehicle and the pedestrian have a significant impact on predicting the pedestrian’s trajectory. Furthermore, the pedestrian’s trajectory and their distance from the vehicle intricately affect the vehicle’s future maneuvers. Moreover, the precise prediction of forthcoming trajectories based solely on past trajectories poses a formidable challenge, primarily due to the inherent uncertainty that accompanies future trajectories, even when past trajectories are indistinguishable. To overcome this challenge, supplementary information cues, such as pedestrian intention, vehicle speed, and global scene dynamics, play a critical role in advancing the accuracy of future trajectory prediction, as these cues exhibit strong correlations with predicting pedestrian trajectories.
Expanding on this insightful understanding and drawing inspiration from comprehensive studies [59,60,137,138], we propose the integration of an additional memory cell and dynamic rescaling of the output gate in response to changes in vehicle–pedestrian spatial interaction. We have developed a concept termed the “vehicle–pedestrian interaction (VPI) cell” to further augment the intrinsic interactions among these cues. This thoughtfully designed component aims to unravel the complex interplay between the spatial characteristics of the vehicle, the resulting changes in the pedestrian’s trajectory, and the interaction between the pedestrian’s spatial attributes and subsequent adjustments in the vehicle’s course. In our work, we propose utilizing an individual LSTM for every pedestrian and each vehicle. The architectures of the proposed Long Short-Term Memory (LSTM) and a conventional LSTM are compared in Figure 5. The initial input to the VPI cell varies based on whether the LSTM is focused on encoding the pedestrian’s or the vehicle’s trajectory. In the case of the LSTM designed for the pedestrian’s trajectory, the VPI cell’s initial input comprises a concatenation that involves gathering all the relative positions of the jth vehicle between the current and preceding time steps, in addition to the relative distance between the pedestrian and the vehicle as observed across frames. For a more comprehensive understanding, refer to Section 4.2 and Figure 4. With each successive time step, a new vehicle state and vehicle–pedestrian spatial features (including relative distance) are computed. Subsequently, the VPI component seamlessly integrates into the LSTM’s output gate. This strategic fusion facilitates the dynamic adjustment of the output responses, adeptly capturing alterations in the encoding of the pedestrian’s trajectory. Ultimately, the refined LSTM output ($h^t$) collaborates with the VPI state ($v_i^t$ or $p_j^t$), as elaborated upon in Section 4.2. These merged states then proceed to the neuron of the subsequent step, ensuring the seamless continuity of information flow.
Figure 5. (a) The structure of a standard LSTM neuron. (b) The structure of our proposed LSTM.
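The following is a hedged sketch of the idea behind the VPI cell in Figure 5b: a standard LSTM whose output-gate response is rescaled by the vehicle–pedestrian interaction feature. The specific gating form (a sigmoid-scaled element-wise product) is our assumption for illustration, not the exact published design.

```python
# Sketch of an LSTM cell whose output is dynamically rescaled by a VPI feature.
import torch
import torch.nn as nn

class VPILSTMCell(nn.Module):
    def __init__(self, input_dim, hidden_dim, vpi_dim):
        super().__init__()
        self.cell = nn.LSTMCell(input_dim, hidden_dim)
        self.vpi_gate = nn.Linear(vpi_dim, hidden_dim)   # maps the VPI feature to a rescaling factor

    def forward(self, x_t, state, vpi_t):
        h, c = self.cell(x_t, state)                     # ordinary LSTM update
        scale = torch.sigmoid(self.vpi_gate(vpi_t))      # dynamic rescaling driven by the VPI feature
        return h * scale, c                              # output adjusted to the current interaction
```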

4.3.1. Pedestrian Trajectory Encoding

The implementation comprises two steps, as follows:
  • We first calculate each pedestrian’s relative position and pose to the previous time step.
    $\Delta x_i^t = x_i^t - x_i^{t-1}$ (12)
    $\Delta y_i^t = y_i^t - y_i^{t-1}$ (13)
    For the relative pose:
    $\Delta \theta_i^t = \theta_i^t - \theta_i^{t-1}$ (14)
  • The calculated relative positions and pose are then embedded into a fixed-length vector $e_i^{ped,t}$ for every time step, which is called the spatial feature of the pedestrian.
    $e_i^{ped,t} = \phi(\Delta x_i^t, \Delta y_i^t, \Delta \theta_i^t; W_e^{ped})$ (15)
    where $\phi(\cdot)$ is an embedding function, and $W_e^{ped}$ is the embedding weight. This vector $e_i^{ped,t}$ is the input to the SLSTM cell. Then, this vector is aggregated with the vehicle–pedestrian interaction feature $v_i^t$ from Equation (10) and then fed to the SLSTM hidden state.
    $m_i^{ped,t} = \mathrm{SLSTM}(m_i^{t-1}, e_i^{t}, v_i^{t-1}; W_m^{ped})$ (16)
    where $m_i^{ped,t}$ is the hidden state of the SLSTM at time step t, and $W_m^{ped}$ is the weight of the SLSTM cell. A minimal sketch of this step is given below.
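The sketch below re-implements the spatial-feature embedding and SLSTM update of Equations (12)–(16) under our own assumptions; the 256-dimensional embedding and 64-dimensional hidden state follow Section 5, while concatenating the VPI feature with the embedded input is an illustrative choice.

```python
# Hedged sketch of the pedestrian spatial-feature embedding and SLSTM update.
import torch
import torch.nn as nn

embed = nn.Linear(3, 256)            # phi(.; W_e^ped) with the 256-d embedding from Section 5, Eq. (15)
slstm = nn.LSTMCell(256 + 64, 64)    # SLSTM; the 64-d VPI feature is concatenated to its input (assumption)

def slstm_step(pos_t, pos_prev, vpi_prev, state=None):
    # pos_t, pos_prev: (B, 3) tensors of (x, y, theta); vpi_prev: (B, 64) VPI feature v_i^{t-1}
    delta = pos_t - pos_prev                                    # relative position and pose, Eqs. (12)-(14)
    e_t = embed(delta)                                          # spatial feature e_i^{ped,t}, Eq. (15)
    h, c = slstm(torch.cat([e_t, vpi_prev], dim=-1), state)    # hidden-state update, Eq. (16)
    return h, c
```

The vehicle trajectory encoding of Section 4.3.2 follows the same pattern with $p_j^t$ in place of $v_i^t$.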

4.3.2. Vehicle Trajectory Encoding

The methodology for encoding vehicle trajectories is identical to that of pedestrian trajectories. The following two steps are followed:
  • We first calculate each vehicle’s relative position and pose to the previous time step.
    $\Delta x_j^t = x_j^t - x_j^{t-1}$ (17)
    $\Delta y_j^t = y_j^t - y_j^{t-1}$ (18)
    For the relative pose:
    $\Delta \theta_j^t = \theta_j^t - \theta_j^{t-1}$ (19)
  • The calculated relative positions and pose are then embedded into a fixed-length vector $e_j^{veh,t}$ for every time step, which is called the spatial feature of the vehicle.
    $e_j^{veh,t} = \phi(\Delta x_j^t, \Delta y_j^t, \Delta \theta_j^t; W_e^{veh})$ (20)
    where $\phi(\cdot)$ is an embedding function, and $W_e^{veh}$ is the embedding weight. This vector $e_j^{veh,t}$ is the input to the SLSTM cell. Then, this vector is aggregated with the vehicle–pedestrian interaction feature $p_j^t$ from Equation (11) and then fed to the SLSTM hidden state.
    $m_j^{veh,t} = \mathrm{SLSTM}(m_j^{t-1}, e_j^{t}, p_j^{t-1}; W_m^{veh})$ (21)
    where $m_j^{veh,t}$ is the hidden state of the SLSTM at time step t, and $W_m^{veh}$ is the weight of the SLSTM cell.

4.4. Interaction Modeling and Prediction

Employing one LSTM with the VPI feature extraction model for each pedestrian and vehicle trajectory fails to capture the intricate and temporal interactions between humans and vehicles. To address this shortcoming and enable more information sharing across different pedestrians and vehicles in crowded environments, we propose treating pedestrians and vehicles as nodes on a directed graph and utilizing the recent advances in graph neural networks (GNNs). By assigning varying levels of importance to different nodes, graph attention network (GAT) models enable us to aggregate information from neighbors. Thus, we adopt a GAT as the sharing mechanism in our approach. As demonstrated in Figure 6, pedestrians and vehicles are represented as nodes in the graph, and the GAT serves as the sharing mechanism. Moreover, Figure 5 presents an illustration of the expected ways pedestrians and vehicles interact when sharing road spaces. In situations where a pedestrian or vehicle is trying to move through an environment with other moving pedestrians and vehicles, it becomes crucial for the pedestrian or vehicle to take into account all the other surrounding objects. This consideration is necessary to ensure safe movement and make correct decisions about how to effectively navigate within that specific situation.
Figure 6. Interaction as a directed graph. Pedestrians and vehicles are nodes. The edges are the interactions between these objects.
A graph attention network (GAT) is designed to process graph-structured data and compute node features by attending to the features of their neighboring nodes based on a self-attention mechanism [140]. Multiple graph attention layers can be stacked to form a complete GAT model [140]. A single graph attention layer is illustrated in Figure 7.
Figure 7. Graph attention network [140].
The input of the graph attention layer is $h = \{h_1, h_2, \ldots, h_{NO}\}$, where $h_i \in \mathbb{R}^{F}$, $NO$ is the number of nodes, and $F$ is the feature dimension of each node.
The output is $h' = \{h'_1, h'_2, \ldots, h'_{NO}\}$, where $h'_i \in \mathbb{R}^{F'}$. $F$ and $F'$ can be unequal.
During the observation period, $m_i^{ped,t}$ for $t = 1, \ldots, T_{obs}$ is fed to the graph attention layer. The coefficients of the attention mechanism for the node pair $(i, j)$ can be computed by:
$\alpha_{ij}^{t} = \dfrac{\exp\left(\mathrm{LeakyReLU}\left(a^{T}\left[W m_i^{ped,t} \,\|\, W m_j^{veh,t}\right]\right)\right)}{\sum_{k \in NO} \exp\left(\mathrm{LeakyReLU}\left(a^{T}\left[W m_i^{ped,t} \,\|\, W m_k^{veh,t}\right]\right)\right)}$ (22)
where $\|$ is the concatenation operation, $\{\cdot\}^T$ represents transposition, $\alpha_{ij}^t$ is the attention coefficient of node j to node i at time step t, and $NO$ represents the neighbors of node i on the graph. The weight matrix $W \in \mathbb{R}^{F' \times F}$ is an important element in Equation (22). It represents the shared linear transformation applied to every node. The dimension of the weight matrix $W$ depends on the dimensions of the input and output of the graph attention network. $F$ is the dimension of $m_i^{ped,t}$, and $F'$ is the dimension of the output. The vector $a \in \mathbb{R}^{2F'}$ in Equation (22) is defined as the weight vector of a single-layer feedforward neural network. A softmax over the LeakyReLU-activated scores is used to normalize the attention coefficients across the neighbors of node i. Equation (23) defines the output of one graph attention layer for node i at time step t after normalizing the attention coefficients.
$\hat{m}_i^{ped,t} = \sigma\left(\sum_{j \in NO} \alpha_{ij}^t W m_j^{veh,t}\right)$ (23)
In Equation (23), $\sigma$ is the nonlinear activation function. Moreover, $W$ is the weight matrix of the shared linear transformation from Equation (22). $\hat{m}_i^{ped,t}$, obtained following the application of two graph attention layers, incorporates the collective internal state of pedestrian i at time step t.
Moreover, the output of one graph attention layer for node j at t is given by:
$\hat{m}_j^{veh,t} = \sigma\left(\sum_{i \in NO} \alpha_{ij}^t W m_i^{ped,t}\right)$ (24)
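A single-head graph attention layer of the form used in Equations (22)–(24) can be sketched as follows; the adjacency handling, the choice of sigmoid as the nonlinearity $\sigma$, and the feature dimensions are assumptions made for illustration.

```python
# Hedged sketch of one graph attention layer (Eqs. (22)-(23)).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, f_in, f_out):
        super().__init__()
        self.W = nn.Linear(f_in, f_out, bias=False)    # shared linear transformation W
        self.a = nn.Linear(2 * f_out, 1, bias=False)   # attention weight vector a in R^{2F'}

    def forward(self, h, adj):
        # h: (NO, f_in) node features; adj: (NO, NO) adjacency, 1 where j is a neighbour of i
        Wh = self.W(h)                                                   # (NO, F')
        n = Wh.size(0)
        pairs = torch.cat([Wh.unsqueeze(1).expand(-1, n, -1),
                           Wh.unsqueeze(0).expand(n, -1, -1)], dim=-1)   # all (i, j) feature pairs
        e = F.leaky_relu(self.a(pairs).squeeze(-1))                      # attention logits
        e = e.masked_fill(adj == 0, float('-inf'))                       # keep only neighbours
        alpha = torch.softmax(e, dim=-1)                                 # normalized coefficients, Eq. (22)
        return torch.sigmoid(alpha @ Wh)                                 # aggregated output, Eq. (23); sigma assumed sigmoid
```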
To capture the temporal correlations between interactions, another LSTM, called TLSTM, is used, as shown below:
$g_i^{ped,t} = \mathrm{TLSTM}(g_i^{ped,t-1}, \hat{m}_i^{ped,t}; W_g^{ped})$ (25)
$g_j^{veh,t} = \mathrm{TLSTM}(g_j^{veh,t-1}, \hat{m}_j^{veh,t}; W_g^{veh})$ (26)
where $\hat{m}_i^{ped,t}$ and $\hat{m}_j^{veh,t}$ are from Equations (23) and (24). $W_g^{ped}$ and $W_g^{veh}$ are the TLSTM weights for the pedestrian and vehicle, respectively, and are shared among all the sequences. In our proposed method, SLSTM is used to model the motion pattern of each pedestrian and vehicle in the scene. Moreover, another LSTM, called TLSTM, is used to model the temporal correlations of the interactions. These two LSTMs are part of the encoder structure. Then, these two LSTMs are utilized to fuse the spatial and temporal data.
At time step $T_{obs}$, there are two hidden variables ($m_i^{ped,T_{obs}}$, $g_i^{ped,T_{obs}}$) from the two LSTMs of each pedestrian. In our implementation, these two variables are fed to two different multi-layer perceptrons ($\delta_1(\cdot)$ and $\delta_2(\cdot)$) before being concatenated:
$\bar{m}_i^{ped} = \delta_1(m_i^{T_{obs}})$ (27)
$\bar{g}_i^{ped} = \delta_2(g_i^{T_{obs}})$ (28)
$h_i^{ped} = \bar{m}_i^{ped} \,\|\, \bar{g}_i^{ped}$ (29)
Furthermore, at time step $T_{obs}$, there are also two hidden variables ($m_j^{veh,T_{obs}}$, $g_j^{veh,T_{obs}}$) for each vehicle. These two variables are fed to two different multi-layer perceptrons ($\delta_1(\cdot)$ and $\delta_2(\cdot)$) before being concatenated:
$\bar{m}_j^{veh} = \delta_1(m_j^{T_{obs}})$ (30)
$\bar{g}_j^{veh} = \delta_2(g_j^{T_{obs}})$ (31)
$h_j^{veh} = \bar{m}_j^{veh} \,\|\, \bar{g}_j^{veh}$ (32)
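The fusion of the SLSTM and TLSTM hidden states through the two MLPs and concatenation (Equations (27)–(32)) can be sketched as below; the layer widths follow Section 5, while the input dimensions and the interpretation of the three-layer MLPs are assumptions.

```python
# Hedged sketch of the spatial/temporal hidden-state fusion (Eqs. (27)-(32)).
import torch
import torch.nn as nn

# delta_1 and delta_2 with the 32/64/24 and 32/64/16 hidden-node layouts from Section 5;
# the input dimensions (64 for SLSTM states, 32 for TLSTM states) also follow that section.
delta1 = nn.Sequential(nn.Linear(64, 32), nn.ReLU(),
                       nn.Linear(32, 64), nn.ReLU(),
                       nn.Linear(64, 24), nn.ReLU())
delta2 = nn.Sequential(nn.Linear(32, 32), nn.ReLU(),
                       nn.Linear(32, 64), nn.ReLU(),
                       nn.Linear(64, 16), nn.ReLU())

def fuse(m_Tobs, g_Tobs):
    # m_Tobs: SLSTM hidden state at T_obs; g_Tobs: TLSTM hidden state at T_obs
    return torch.cat([delta1(m_Tobs), delta2(g_Tobs)], dim=-1)   # h^{ped} or h^{veh}
```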
Using real-world data, our goal is to simulate pedestrians’ and vehicles’ motions and the interaction between them. Three components represent the intermediate state vector of our model, namely the hidden variables of SLSTM, the hidden variables of TLSTM, and the added noise (as shown in Figure 3). The intermediate state vector is calculated as:
$d_i^{ped,T_{obs}} = h_i^{ped} \,\|\, z$ (33)
$d_j^{veh,T_{obs}} = h_j^{veh} \,\|\, z$ (34)
where $z$ represents noise, and $h_i^{ped}$ and $h_j^{veh}$ are from Equations (29) and (32). The intermediate state vectors, $d_i^{ped,T_{obs}}$ and $d_j^{veh,T_{obs}}$, then act as the initial hidden state of the decoder LSTM (termed DLSTM). The pedestrian’s and vehicle’s predicted relative positions are shown below:
$d_i^{ped,T_{obs}+1} = \mathrm{DLSTM}(d_i^{ped,T_{obs}}, e_i^{ped,T_{obs}}; W_d^{ped})$ (35)
$d_j^{veh,T_{obs}+1} = \mathrm{DLSTM}(d_j^{veh,T_{obs}}, e_j^{veh,T_{obs}}; W_d^{veh})$ (36)
$(\Delta x_i^{ped,T_{obs}+1}, \Delta y_i^{ped,T_{obs}+1}, \Delta \theta_i^{ped,T_{obs}+1}) = \delta_3(d_i^{ped,T_{obs}+1})$ (37)
$(\Delta x_j^{veh,T_{obs}+1}, \Delta y_j^{veh,T_{obs}+1}, \Delta \theta_j^{veh,T_{obs}+1}) = \delta_3(d_j^{veh,T_{obs}+1})$ (38)
In Equations (35) and (36), $W_d$ is the weight of the Decoder Long Short-Term Memory (DLSTM). This weight plays a pivotal role in the optimization process. $e_i^{ped,T_{obs}}$ and $e_j^{veh,T_{obs}}$ are the spatial features of the pedestrian and vehicle, respectively, and are from Equations (15) and (20). In Equations (37) and (38), $\delta_3(\cdot)$ is a linear layer. Once the anticipated relative position at time step $T_{obs}+1$ is acquired, the DLSTM proceeds to compute the subsequent inputs. These inputs are determined by considering the most recent projected relative position, as outlined in Equation (15). Moreover, the process of translating relative positions into absolute positions, a crucial step in loss computation, can be accomplished with great simplicity. For the loss computation, we used the variety loss, as presented in reference [52]. The calculation of the variety loss is determined by following these steps. For every vehicle and pedestrian, the deep learning model generates many predicted trajectories by randomly sampling $z$ from a standard normal distribution with a mean of 0 and a standard deviation of 1. Subsequently, it opts for the trajectory that exhibits the least deviation from the ground truth, using this trajectory as the model’s output for loss computation:
$L_{variety}^{ped} = \min_{k^{ped}} \left\| Y_i - \hat{Y}_i^{k^{ped}} \right\|_2$ (39)
$L_{variety}^{veh} = \min_{k^{veh}} \left\| Y_j - \hat{Y}_j^{k^{veh}} \right\|_2$ (40)
In Equations (39) and (40), the variables $Y_i$, $\hat{Y}_i^{k^{ped}}$, and $k^{ped}$ correspond to the ground-truth trajectory, the predicted trajectory, and a hyperparameter, respectively. By focusing solely on the best trajectory, this particular loss function motivates the neural network to explore and encompass the range of potential outcomes aligned with the trajectory history.
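A compact sketch of the variety loss for a single agent is shown below; the tensor shapes and the use of a summed per-step L2 distance are assumptions for illustration.

```python
# Hedged sketch of the variety loss (Eqs. (39)-(40)) for one agent.
import torch

def variety_loss(pred_samples, gt):
    # pred_samples: (k, T_pred, 2) candidate trajectories sampled with different noise vectors z
    # gt:           (T_pred, 2) ground-truth trajectory
    errors = torch.norm(pred_samples - gt.unsqueeze(0), dim=-1).sum(dim=-1)  # summed L2 error per sample
    return errors.min()   # back-propagate only through the sample closest to the ground truth
```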

5. Implementation Details

In our approach, training the weights of the Holistic Spatio-Temporal Graph Attention (HSTGA) trajectory prediction model involves several key components and hyperparameters to ensure effective learning. The training process aims to minimize the difference between the model’s predicted trajectories and the ground-truth trajectories from the dataset. The following steps are followed to make sure our model is performing well:
  • The variety loss is selected, as shown in Equations (39) and (40), to quantify the difference between the predicted and actual trajectories. Moreover, we used two evaluation metrics, namely the Average Displacement Error (ADE) and Final Displacement Error (FDE), to report the prediction errors.
  • The Adam optimizer is used with a learning rate chosen to balance fast convergence against the risk of overshooting (0.01 in our implementation).
  • Standard batch training, backpropagation, weight updates, and regularization techniques are included in our model implementation.
  • Proper datasets for training and validation are an essential part of our model implementation.
  • We monitor the performance of our model and tune the hyperparameters if needed.
The training process of our model includes fine-tuning the weights of the LSTM layers and the graph attention networks (GATs) to effectively capture vehicle–pedestrian interactions and spatio-temporal dynamics. This process progressively enhances the model’s parameters to accurately predict trajectories in complex scenarios.
In our implementation, each LSTM consists of only one layer. In Equations (15) and (20), the dimensions of $e_i^{ped,t}$ and $e_j^{veh,t}$ are set to 256, and in Equations (16) and (21), the dimensions of $m_i^{ped,t}$ and $m_j^{veh,t}$ are set to 64. The weight matrix $W$ (Equation (22)) for the first graph attention layer has a dimension of 32 × 32, whereas for the second layer, it has a dimension of 32 × 64. The dimension of the attention weight vector $a$ in Equation (22) is set to 32 for the first graph attention layer and 64 for the second layer. Batch normalization is applied to the input of the graph attention layer. In Equations (25) and (26), the dimensions of $g_i^{ped,t}$ and $g_j^{veh,t}$ are set to 32. The MLP $\delta_1(\cdot)$ (Equations (27) and (30)) contains three layers with ReLU activation functions. The numbers of hidden nodes in these layers are 32, 64, and 24, respectively. Similarly, the MLP $\delta_2(\cdot)$ (Equations (28) and (31)) consists of three layers with ReLU activation functions, and the numbers of hidden nodes are 32, 64, and 16, respectively. The dimension of $z$ in Equations (33) and (34) is set to 16. We trained the network using the Adam optimizer with a learning rate of 0.01 and a batch size of 64.
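For illustration, a training-loop skeleton consistent with the stated hyperparameters (Adam, learning rate 0.01, batch size 64) is given below; the model, data, and loss are stand-in placeholders, not the HSTGA implementation.

```python
# Illustrative training skeleton; only the optimizer settings follow the text above.
import torch
import torch.nn as nn

model = nn.Linear(16, 2)                                   # placeholder for the HSTGA network
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # Adam with learning rate 0.01
batches = [(torch.randn(64, 16), torch.randn(64, 2)) for _ in range(10)]  # batch size 64

for epoch in range(5):                                     # epoch count is arbitrary here
    for x, y in batches:
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)         # the paper trains with the variety loss instead
        loss.backward()
        optimizer.step()
```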

6. Experiments

6.1. Dataset

Datasets play a crucial role in developing and assessing deep learning models. For example, researchers frequently employ the widely used ETH [31] and UCY [32] datasets to evaluate the efficacy of pedestrian trajectory prediction models. However, these datasets are not specifically designed for urban traffic scenarios. To overcome this limitation, we employed the VCI-DUT [33] and inD [141] datasets to train and evaluate our proposed HSTGA model. These datasets contain large numbers of real-world vehicle–pedestrian trajectories, encompassing various human–human, human–vehicle, and vehicle–vehicle interactions. Additionally, we compared our model against state-of-the-art pedestrian trajectory prediction models on several pedestrian datasets, including ETH, UCY, and the Stanford Drone Dataset (SDD) [142].
The VCI-DUT dataset comprises real-world pedestrian and vehicle trajectories collected from two locations on China’s Dalian University of Technology (DUT) campus, as depicted in Figure 8. The first location consists of a pedestrian crosswalk at an intersection without traffic signals, where the right of way is not prioritized for either pedestrians or vehicles. The second location is a relatively large shared space near a roundabout, where pedestrians and vehicles have free movement. Similar to the CITR dataset, the recordings were captured using a DJI Mavic Pro Drone equipped with a downward-facing camera, which was positioned high enough to go unnoticed by pedestrians and vehicles. The footage has a resolution of 1920 × 1080 with a frame rate of 23.98 fps. The dataset primarily comprises trajectories of college students leaving their classrooms and regular cars passing through the campus. The dataset comprises 17 clips of crosswalk scenarios and 11 clips of shared-space scenarios, including 1793 trajectories. Some of the clips involve multiple VCIs, i.e., more than two vehicles simultaneously interacting with pedestrians, as illustrated in Figure 8.
Figure 8. VCI-DUT Dataset with trajectories of vehicles (red dashed lines) and pedestrians (colorful solid lines). Upper: Intersection. Lower: Roundabout [33].
The second dataset utilized in this study is the inD dataset, as depicted in Figure 9. This new dataset contains naturalistic vehicle trajectories captured at intersections in Germany. Traditional data collection methods are prone to limitations such as occlusions; however, by using a drone, these obstacles are overcome. Traffic at four distinct locations was recorded, and the trajectory for each road user was extracted, along with their corresponding type. State-of-the-art computer vision algorithms were used to extract the trajectories, with positional errors typically below 10 cm. The inD dataset is applicable to numerous tasks, including road-user prediction, driver modeling, scenario-based safety validation of automated driving systems, and data-driven development of highly automated driving (HAD) system components.
Figure 9. inD dataset [141].

6.2. Evaluation Metrics

Following prior works [50,56,57,60,139], we used two error metrics to report prediction errors:
  • Average Displacement Error (ADE): The mean distance between the actual and predicted trajectories over all predicted time steps, as specified in Equation (41).
  • Final Displacement Error (FDE): The mean distance between the actual and predicted trajectories at the last predicted time step, which is expressed in Equation (42).
$ADE^{ped} = \dfrac{\sum_{i=1}^{N} \sum_{t=T_{obs}+1}^{T_f} \left\| Y_i^{ped,t} - \hat{Y}_i^{ped,t} \right\|_2}{N \times (T_f - T_{obs})}$ (41)
$FDE^{ped} = \dfrac{\sum_{i=1}^{N} \left\| Y_i^{ped,t} - \hat{Y}_i^{ped,t} \right\|_2}{N}, \quad t = T_f$ (42)
In Equations (41) and (42), N is the number of pedestrians. To find the ADE and FDE for vehicles, N is replaced with M, which is the number of vehicles.
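Both metrics can be computed directly from the predicted and ground-truth positions, as in the sketch below (tensor shapes are assumptions for illustration):

```python
# Hedged sketch of the ADE/FDE metrics (Eqs. (41)-(42)).
import torch

def ade_fde(pred, gt):
    # pred, gt: (N, T_pred, 2) predicted and ground-truth positions in metres
    dist = torch.norm(pred - gt, dim=-1)   # (N, T_pred) per-step Euclidean error
    ade = dist.mean()                      # average over all agents and predicted steps, Eq. (41)
    fde = dist[:, -1].mean()               # error at the final predicted step, Eq. (42)
    return ade, fde
```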

7. Results and Analysis

7.1. Quantitative Results

Our model has been extensively trained and evaluated using two datasets: the VCI-DUT dataset and the inD dataset. The VCI-DUT dataset consists of 17 video clips that effectively portray crosswalk scenarios and an additional 11 video clips that depict shared-space scenarios. To ensure optimal model performance, a training subset of 10% of the VCI-DUT dataset was utilized, whereas the remaining portion was exclusively employed for rigorous model evaluation. It is noteworthy that the training subset predominantly encompasses intersection scenarios, focusing on the intricate dynamics between pedestrians and vehicles in such settings.
However, it is important to highlight that our model was intentionally not trained on roundabout scenarios. This decision was based on the heightened complexity of pedestrian–vehicle interactions observed in roundabouts. By excluding roundabout scenarios from the training process, we aimed to evaluate the generalization capability of our model, specifically in the context of previously unseen and intricate scenarios, such as roundabouts. By conducting an in-depth evaluation of the proposed Holistic Spatio-Temporal Graph Attention (HSTGA) model in roundabout settings, we aim to provide valuable insights into its generalization capabilities and further contribute to the advancement of pedestrian–vehicle interaction research.
Moreover, we trained our model on additional datasets, including ETH, UCY, HOTEL, ZARA1, and ZARA2, using 40% of each dataset for training and the remainder for evaluation. We started the investigation by evaluating our model on the pedestrian-only datasets. The ADE and FDE results (in meters) for 12 time-step predictions are shown in Table 1; lower results are better. The bold font represents the best results. The proposed model outperformed the previous approaches, such as Social-LSTM [50], Social Attention [143], Social-GAN [136], CIDNN [57], STGAT [56], and Step Attention [37], in both the ADE and FDE. The results demonstrate that the use of human–human, human–vehicle, and vehicle–vehicle information improves the accuracy of pedestrian trajectory predictions.
Table 1. Quantitative results of all the baseline models and our model (in bold). Two evaluation metrics, namely the ADE and FDE, are presented (lower results are better).
Table 2 presents a comparative analysis of the factors that influence pedestrian trajectory in LSTM-based models and our proposed method. We investigated the influence of the social interaction (SI), the pedestrian–vehicle interaction (VPI), and different inputs, including the relative position (RP), the relative velocity (RV), and learning the vehicle–pedestrian interaction adaptively (LIA).
Table 2. Interaction and influencing factors of LSTM-based models and our model (in bold).
In Table 3, we demonstrate the evaluation outcomes of our method on the VCI-DUT and inD datasets and compare them with baseline techniques, including state-of-the-art DNN-based pedestrian prediction methods.
Table 3. Quantitative results on DUT and inD datasets.
  • Constant Velocity (CV) [79]: The pedestrian is assumed to travel at a constant velocity.
  • Social GAN (SGAN) [52]: A GAN architecture that uses a permutation-invariant pooling module to capture pedestrian interactions at different scales.
  • Multi-Agent Tensor Fusion (MATF) [54]: A GAN architecture that uses a global pooling layer to combine trajectory and semantic information.
  • Off-the-Sidewalk Predictions (OSP) [79]: The probabilistic interaction model introduced in [79].
As shown in Table 3, the proposed HSTGA method outperformed previous works in both the shared spaces of the DUT dataset and the unsignalized intersections of the inD dataset.

7.2. Qualitative Results

To qualitatively analyze the performance of the HSTGA model, we plotted the predicted trajectories against the ground truth. The following scenarios (Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14) show the qualitative results of our model, where pedestrians interact with vehicles in a very challenging environment. The background images presented in Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14 are screenshots extracted from the raw video. The video itself has a resolution of 1920 × 1080 pixels and operates at a frame rate of 23.98 frames per second (fps). It is important to note that the coordinates of the image extracted from the video (left image) are in terms of image pixels, whereas the predicted trajectories of the image (right image) are on a scale of meters.
Figure 10. The output trajectories of the model on the 1st scenario of the DUT dataset. Left: Visual of the scene. Right: Trajectory model and prediction. Pedestrians: red (observed trajectory), blue (ground truth), and green (predicted trajectory). Vehicles: turquoise (observed trajectory), yellow (ground truth), and pink (predicted trajectory). In the lower image, objects distinguished by a specific color are enclosed within a drawn outline and an arrow that indicates the direction of movement.
Figure 11. The output trajectories of the model on the 2nd scenario of the DUT dataset. Left: Visual of the scene. Right: Trajectory model and prediction. Pedestrians: red (observed trajectory), blue (ground truth), and green (predicted trajectory). Vehicles: turquoise (observed trajectory), yellow (ground truth), and pink (predicted trajectory). In the lower image, objects distinguished by a specific color are enclosed within a drawn outline and an arrow that indicates the direction of movement.
Figure 12. The output trajectories of the model on the 3rd scenario of the DUT dataset. Left: Visual of the scene. Right: Trajectory model and prediction. Pedestrians: red (observed trajectory), blue (ground truth), and green (predicted trajectory). Vehicles: turquoise (observed trajectory), yellow (ground truth), and pink (predicted trajectory). In the lower image, objects distinguished by a specific color are enclosed within a drawn outline and an arrow that indicates the direction of movement.
Figure 13. The output trajectories of the model on the 4th scenario of the DUT dataset. Left: Visual of the scene. Right: Trajectory model and prediction. Pedestrians: red (observed trajectory), blue (ground truth), and green (predicted trajectory). Vehicles: turquoise (observed trajectory), yellow (ground truth), and pink (predicted trajectory). In the lower image, objects distinguished by a specific color are enclosed within a drawn outline and an arrow that indicates the direction of movement.
Figure 14. The output trajectories of the model on the 5th scenario of the DUT dataset. Left: Visual of the scene. Right: Trajectory model and prediction. Pedestrians: red (observed trajectory), blue (ground truth), and green (predicted trajectory). Vehicles: turquoise (observed trajectory), yellow (ground truth), and pink (predicted trajectory). In the lower image, objects distinguished by a specific color are enclosed within a drawn outline and an arrow that indicates the direction of movement.
Our investigation involved a rigorous evaluation of the predictive capacities of our model, which entailed the prediction of future outcomes across a range of distinct time steps. Specifically, we examined the predictive accuracy at 8, 12, 14, 16, 18, 20, 22, and 24 time steps ahead. These chosen time steps were critical in assessing the model’s efficacy in forecasting future events. To illustrate our findings, we present the following figures (Figure 15 and Figure 16) that offer a comprehensive depiction of the obtained results for each designated time step. Importantly, the data presented in these figures pertain specifically to the 5th scenario, ensuring a focused and contextually relevant analysis.
Figure 15. Predicted trajectories at 8, 12, 14, and 16 time steps. Pedestrians: red (observed trajectory), blue (ground truth), and green (predicted trajectory). Vehicles: turquoise (observed trajectory), yellow (ground truth), and pink (predicted trajectory).
Figure 16. Predicted trajectories at 18, 20, 22, and 24 time steps. Pedestrians: red (observed trajectory), blue (ground truth), and green (predicted trajectory). Vehicles: turquoise (observed trajectory), yellow (ground truth), and pink (predicted trajectory).
The experimental results in Figure 15 and Figure 16 demonstrate the capability of our model in long-term trajectory prediction. These figures serve as empirical evidence, substantiating the claim that our model exhibits remarkable efficacy in predicting trajectories over extended time periods. Notably, our findings reveal that the accuracy of long-term predictions, spanning 16, 18, 20, 22, and 24 time steps, is on par with that of short-term predictions covering 8 time steps. This signifies the robustness and reliability of our model’s predictive capabilities across varying temporal horizons.

8. Conclusions

In this study, we propose a novel encoder–decoder interaction model named Holistic Spatio-Temporal Graph Attention (HSTGA) for trajectory prediction in vehicle–pedestrian interaction. HSTGA aims to predict long-horizon pedestrian and vehicle trajectories by modeling pedestrian–vehicle interactions in non-signalized and non-crosswalk scenarios. The proposed model uses a trajectory-based approach to capture the complex interactions between pedestrians and vehicles. HSTGA integrates a holistic spatio-temporal graph attention mechanism that learns the attention weights of the spatial and temporal features of pedestrians and vehicles. The proposed method outperforms state-of-the-art pedestrian trajectory prediction models on various benchmark datasets, highlighting the effectiveness of the HSTGA model. In order to effectively capture the interaction features between pedestrians and vehicles, a vehicle–pedestrian interaction feature extraction model that utilizes a multi-layer perceptron (MLP) sub-network and max pooling has been proposed. The MLP sub-network is responsible for extracting the features of both pedestrians and vehicles, whereas the max pooling operation aggregates these features into a single vector. The extracted features are then input into an LSTM network to predict the trajectories of both pedestrians and vehicles. This feature extraction model enhances the model’s ability to capture the intricate interactions between pedestrians and vehicles, resulting in heightened prediction accuracy. Compared to other methods, the proposed approach reduces both computational and data requirements, rendering it suitable for real-time applications. The MLP sub-network extracts features in parallel, reducing the overall time complexity of the model. The max pooling operation combines the features of pedestrians and vehicles into a single vector, thereby decreasing the number of input parameters required for the LSTM network. Furthermore, the proposed approach solely utilizes the historical trajectories of pedestrians and vehicles, thus eliminating the need for external data sources. Extensive evaluations conducted on diverse datasets containing numerous challenging scenarios involving the interactions between vehicles and pedestrians demonstrate the effectiveness and efficiency of the proposed approach.

Author Contributions

Conceptualization, H.A. and S.L.; Methodology, H.A. and S.L.; Software, H.A.; Validation, H.A.; Formal analysis, H.A.; Investigation, H.A. and S.L.; Resources, H.A. and S.L.; Data curation, H.A.; Writing—original draft, H.A.; Writing—review & editing, H.A. and S.L.; Visualization, H.A.; Supervision, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Two datasets are used, namely, the VCI-DUT and inD datasets. The VCI-DUT data are available in a publicly accessible repository at [https://github.com/dongfang-steven-yang/vci-dataset-dut, accessed on 19 June 2023]. The inD dataset is available upon request at [https://www.ind-dataset.com/, accessed on 19 June 2023].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pedestrian Safety | NHTSA. Available online: https://www.nhtsa.gov/road-safety/pedestrian-safety (accessed on 6 April 2023).
  2. Seize the Moment to Tackle Road Crash Deaths and Build a Safe and Sustainable Future. Available online: https://www.who.int/news/item/25-06-2023-seize-the-moment-to-tackle-road-crash-deaths-and-build-a-safe-and-sustainable-future (accessed on 14 August 2023).
  3. Ahmed, S.K.; Mohammed, M.G.; Abdulqadir, S.O.; El-Kader, R.G.A.; El-Shall, N.A.; Chandran, D.; Rehman, M.E.U.; Dhama, K. Road traffic accidental injuries and deaths: A neglected global health issue. Health Sci. Rep. 2023, 6, e1240. [Google Scholar] [CrossRef] [PubMed]
  4. Pedestrian Safety Campaign. Available online: http://txdot.gov/en/home/safety/traffic-safety-campaigns/pedestrian-safety.html (accessed on 16 April 2023).
  5. Lu, Y.; Shen, J.; Wang, C.; Lu, H.; Xin, J. Studying on the design and simulation of collision protection system between vehicle and pedestrian. Int. J. Distrib. Sens. Netw. 2020, 16, 1550147719900109. [Google Scholar] [CrossRef]
  6. Crandall, J.R.; Bhalla, K.S.; Madeley, N.J. Designing road vehicles for pedestrian protection. BMJ 2002, 324, 1145–1148. [Google Scholar] [CrossRef]
  7. Stcherbatcheff, G.; Tarriere, C.; Duclos, P.; Fayon, A.; Got, C.; Patel, A. Simulation of Collisions Between Pedestrians and Vehicles Using Adult and Child Dummies; SAE Technical Paper 751167; SAE International: Warrendale, PA, USA, 1975. [Google Scholar] [CrossRef]
  8. Ganichev, A.; Batishcheva, O. Evaluating the conflicts between vehicles and pedestrians. Transp. Res. Procedia 2020, 50, 145–151. [Google Scholar] [CrossRef]
  9. Tahmasbi-Sarvestani, A.; Mahjoub, H.N.; Fallah, Y.P.; Moradi-Pari, E.; Abuchaar, O. Implementation and Evaluation of a Cooperative Vehicle-to-Pedestrian Safety Application. IEEE Intell. Transp. Syst. Mag. 2017, 9, 62–75. [Google Scholar] [CrossRef]
  10. Gandhi, T.; Trivedi, M.M. Pedestrian Protection Systems: Issues, Survey, and Challenges. IEEE Trans. Intell. Transp. Syst. 2007, 8, 413–430. [Google Scholar] [CrossRef]
  11. Amini, R.E.; Yang, K.; Antoniou, C. Development of a conflict risk evaluation model to assess pedestrian safety in interaction with vehicles. Accid. Anal. Prev. 2022, 175, 106773. [Google Scholar] [CrossRef]
  12. Bai, S.; Legge, D.D.; Young, A.; Bao, S.; Zhou, F. Investigating External Interaction Modality and Design Between Automated Vehicles and Pedestrians at Crossings. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 1691–1696. [Google Scholar] [CrossRef]
  13. Plitt, A. New York City’s Streets Are ‘More Congested Than Ever’: Report. Curbed NY, 15 August 2019. Available online: https://ny.curbed.com/2019/8/15/20807470/nyc-streets-dot-mobility-report-congestion (accessed on 6 May 2023).
  14. Pedestrian Scramble. Wikipedia. 2 May 2023. Available online: https://en.wikipedia.org/w/index.php?title=Pedestrian_scramble&oldid=1152818953 (accessed on 6 May 2023).
  15. Zheng, L.; Ismail, K.; Meng, X. Traffic conflict techniques for road safety analysis: Open questions and some insights. Can. J. Civ. Eng. 2014, 41, 633–641. [Google Scholar] [CrossRef]
  16. Parker, M.R. Traffic Conflict Techniques for Safety and Operations: Observers Manual; Federal Highway Administration: McLean, VA, USA, 1989.
  17. Amundsen, F.; Hydén, C. (Eds.) Proceedings of the 1st Workshop on Traffic Conflicts, Oslo, Norway, November 1977.
  18. Almodfer, R.; Xiong, S.; Fang, Z.; Kong, X.; Zheng, S. Quantitative analysis of lane-based pedestrian-vehicle conflict at a non-signalized marked crosswalk. Transp. Res. Part F Traffic Psychol. Behav. 2016, 42, 468–478. [Google Scholar] [CrossRef]
  19. Liu, Y.-C.; Tung, Y.-C. Risk analysis of pedestrians’ road-crossing decisions: Effects of age, time gap, time of day, and vehicle speed. Saf. Sci. 2014, 63, 77–82. [Google Scholar] [CrossRef]
  20. Yagil, D. Beliefs, motives and situational factors related to pedestrians’ self-reported behavior at signal-controlled crossings. Transp. Res. Part F Traffic Psychol. Behav. 2000, 3, 1–13. [Google Scholar] [CrossRef]
  21. Tom, A.; Granié, M.-A. Gender differences in pedestrian rule compliance and visual search at signalized and unsignalized crossroads. Accid. Anal. Prev. 2011, 43, 1794–1801. [Google Scholar] [CrossRef] [PubMed]
  22. Cheng, G.; Wang, Y.; Li, D. Setting Conditions of Crosswalk Signal on Urban Road Sections in China. ScholarMate. 2013. Available online: https://www.scholarmate.com/A/Evu6ja (accessed on 18 April 2023).
  23. Himanen, V.; Kulmala, R. An application of logit models in analysing the behaviour of pedestrians and car drivers on pedestrian crossings. Accid. Anal. Prev. 1988, 20, 187–197. [Google Scholar] [CrossRef] [PubMed]
  24. Shetty, A.; Yu, M.; Kurzhanskiy, A.; Grembek, O.; Tavafoghi, H.; Varaiya, P. Safety challenges for autonomous vehicles in the absence of connectivity. Transp. Res. Part C Emerg. Technol. 2021, 128, 103133. [Google Scholar] [CrossRef]
  25. Iftikhar, S.; Zhang, Z.; Asim, M.; Muthanna, A.; Koucheryavy, A.; El-Latif, A.A.A. Deep Learning-Based Pedestrian Detection in Autonomous Vehicles: Substantial Issues and Challenges. Electronics 2022, 11, 21. [Google Scholar] [CrossRef]
  26. Eiffert, S.; Li, K.; Shan, M.; Worrall, S.; Sukkarieh, S.; Nebot, E. Probabilistic Crowd GAN: Multimodal Pedestrian Trajectory Prediction using a Graph Vehicle-Pedestrian Attention Network. IEEE Robot. Autom. Lett. 2020, 5, 5026–5033. [Google Scholar] [CrossRef]
  27. Chandra, R.; Bhattacharya, U.; Bera, A.; Manocha, D. TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8475–8484. [Google Scholar] [CrossRef]
  28. Chandra, R.; Bhattacharya, U.; Roncal, C.; Bera, A.; Manocha, D. RobustTP: End-to-End Trajectory Prediction for Heterogeneous Road-Agents in Dense Traffic with Noisy Sensor Inputs. arXiv 2019, arXiv:1907.08752. [Google Scholar]
  29. Chandra, R.; Guan, T.; Panuganti, S.; Mittal, T.; Bhattacharya, U.; Bera, A.; Manocha, D. Forecasting Trajectory and Behavior of Road-Agents Using Spectral Clustering in Graph-LSTMs. arXiv 2020, arXiv:1912.01118. [Google Scholar] [CrossRef]
  30. Carrasco, S.; Llorca, D.F.; Sotelo, M.Á. SCOUT: Socially-COnsistent and UndersTandable Graph Attention Network for Trajectory Prediction of Vehicles and VRUs. arXiv 2021, arXiv:2102.06361. [Google Scholar]
  31. Pellegrini, S.; Ess, A.; Schindler, K.; van Gool, L. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 261–268. [Google Scholar] [CrossRef]
  32. Lerner, A.; Chrysanthou, Y.; Lischinski, D. Crowds by Example. Comput. Graph. Forum 2007, 26, 655–664. [Google Scholar] [CrossRef]
  33. Yang, D.; Li, L.; Redmill, K.; Özgüner, Ü. Top-view Trajectories: A Pedestrian Dataset of Vehicle-Crowd Interaction from Controlled Experiments and Crowded Campus. arXiv 2019, arXiv:1902.00487. [Google Scholar]
  34. Krajewski, R.; Moers, T.; Bock, J.; Vater, L.; Eckstein, L. The rounD Dataset: A Drone Dataset of Road User Trajectories at Roundabouts in Germany. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020; pp. 1–6. [Google Scholar] [CrossRef]
  35. Bock, J.; Vater, L.; Krajewski, R.; Moers, T. Highly Accurate Scenario and Reference Data for Automated Driving. ATZ Worldw 2021, 123, 50–55. [Google Scholar] [CrossRef]
  36. Rudenko, A.; Palmieri, L.; Herman, M.; Kitani, K.M.; Gavrila, D.M.; Arras, K.O. Human Motion Trajectory Prediction: A Survey. Int. J. Robot. Res. 2020, 39, 895–935. [Google Scholar] [CrossRef]
  37. Zhang, E.; Masoud, N.; Bandegi, M.; Lull, J.; Malhan, R.K. Step Attention: Sequential Pedestrian Trajectory Prediction. IEEE Sensors J. 2022, 22, 8071–8083. [Google Scholar] [CrossRef]
  38. Kim, S.; Guy, S.J.; Liu, W.; Wilkie, D.; Lau, R.W.H.; Lin, M.C.; Manocha, D. BRVO: Predicting pedestrian trajectories using velocity-space reasoning. Int. J. Robot. Res. 2015, 34, 201–217. [Google Scholar] [CrossRef]
  39. Zanlungo, F.; Ikeda, T.; Kanda, T. Social force model with explicit collision prediction. EPL 2011, 93, 68005. [Google Scholar] [CrossRef]
  40. Martinelli, A.; Gao, H.; Groves, P.D.; Morosi, S. Probabilistic Context-Aware Step Length Estimation for Pedestrian Dead Reckoning. IEEE Sensors J. 2018, 18, 1600–1611. [Google Scholar] [CrossRef]
  41. Kang, W.; Han, Y. SmartPDR: Smartphone-Based Pedestrian Dead Reckoning for Indoor Localization. IEEE Sens. J. 2015, 15, 2906–2916. Available online: https://ieeexplore.ieee.org/document/6987239 (accessed on 5 May 2023).
  42. Indoor Trajectory Prediction Algorithm Based on Communication Analysis of Built-In Sensors in Mobile Terminals. IEEE Sens. J. 2021, 21, 21388524.
  43. Ziebart, B.D.; Ratliff, N.; Gallagher, G.; Mertz, C.; Peterson, K.; Bagnell, J.A.; Hebert, M.; Dey, A.K.; Srinivasa, S. Planning-based prediction for pedestrians. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, USA, 10–15 October 2009; pp. 3931–3936. [Google Scholar] [CrossRef]
  44. Galata, A.; Johnson, N.; Hogg, D. Learning Variable-Length Markov Models of Behavior. Comput. Vis. Image Underst. 2001, 81, 398–413. [Google Scholar] [CrossRef]
  45. Deo, N.; Trivedi, M.M. Learning and predicting on-road pedestrian behavior around vehicles. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 1–6. [Google Scholar] [CrossRef]
  46. Rehder, E.; Wirth, F.; Lauer, M.; Stiller, C. Pedestrian Prediction by Planning Using Deep Neural Networks. arXiv 2017, arXiv:1706.05904. [Google Scholar]
  47. Dendorfer, P.; Ošep, A.; Leal-Taixé, L. Goal-GAN: Multimodal Trajectory Prediction Based on Goal Position Estimation. arXiv 2020, arXiv:2010.01114. [Google Scholar]
  48. Yao, Y.; Atkins, E.; Johnson-Roberson, M.; Vasudevan, R.; Du, X. BiTraP: Bi-directional Pedestrian Trajectory Prediction with Multi-modal Goal Estimation. arXiv 2020, arXiv:2007.14558. [Google Scholar] [CrossRef]
  49. Tran, H.; Le, V.; Tran, T. Goal-driven Long-Term Trajectory Prediction. arXiv 2020, arXiv:2011.02751. [Google Scholar]
  50. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social LSTM: Human Trajectory Prediction in Crowded Spaces. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 961–971. [Google Scholar] [CrossRef]
  51. Xue, H.; Huynh, D.Q.; Reynolds, M. SS-LSTM: A Hierarchical LSTM Model for Pedestrian Trajectory Prediction. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1186–1194. [Google Scholar] [CrossRef]
  52. Gupta, A.; Johnson, J.; Fei-Fei, L.; Savarese, S.; Alahi, A. Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks. arXiv 2018, arXiv:1803.10892. [Google Scholar]
  53. Zhang, P.; Ouyang, W.; Zhang, P.; Xue, J.; Zheng, N. SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction. arXiv 2019, arXiv:1903.02793. [Google Scholar]
  54. Zhao, T.; Xu, Y.; Monfort, M.; Choi, W.; Baker, C.; Zhao, Y.; Wang, Y.; Wu, Y.N. Multi-Agent Tensor Fusion for Contextual Trajectory Prediction. arXiv 2019, arXiv:1904.04776. [Google Scholar]
  55. Nikhil, N.; Morris, B.T. Convolutional Neural Network for Trajectory Prediction. arXiv 2018, arXiv:1809.00696. [Google Scholar]
  56. Huang, Y.; Bi, H.; Li, Z.; Mao, T.; Wang, Z. STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6271–6280. [Google Scholar] [CrossRef]
  57. Xu, Y.; Piao, Z.; Gao, S. Encoding Crowd Interaction with Deep Neural Network for Pedestrian Trajectory Prediction. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5275–5284. [Google Scholar] [CrossRef]
  58. Pedestrian Trajectory Prediction Based on Deep Convolutional LSTM Network. IEEE Trans. Intell. Transp. Syst. Available online: https://ieeexplore.ieee.org/document/9043898 (accessed on 5 May 2023).
  59. Quan, R.; Zhu, L.; Wu, Y.; Yang, Y. Holistic LSTM for Pedestrian Trajectory Prediction. IEEE Trans. Image Process 2021, 30, 3229–3239. [Google Scholar] [CrossRef]
  60. Zhang, C.; Berger, C. Learning the Pedestrian-Vehicle Interaction for Pedestrian Trajectory Prediction. arXiv 2022, arXiv:2202.05334. [Google Scholar]
  61. Anvari, B.; Bell, M.G.H.; Sivakumar, A.; Ochieng, W.Y. Modelling shared space users via rule-based social force model. Transp. Res. Part C Emerg. Technol. 2015, 51, 83–103. [Google Scholar] [CrossRef]
  62. Johora, F.T.; Müller, J.P. Modeling Interactions of Multimodal Road Users in Shared Spaces. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018. [Google Scholar]
  63. Hesham, O.; Wainer, G. Advanced models for centroidal particle dynamics: Short-range collision avoidance in dense crowds. Simulation 2021, 97, 529–543. [Google Scholar] [CrossRef] [PubMed]
  64. Prédhumeau, M.; Mancheva, L.; Dugdale, J.; Spalanzani, A. An Agent-Based Model to Predict Pedestrians Trajectories with an Autonomous Vehicle in Shared Spaces. J. Artif. Intell. Res. 2021, 73. [Google Scholar] [CrossRef]
  65. Zhang, Z.; Fu, D. Modeling pedestrian–vehicle mixed-flow in a complex evacuation scenario. Phys. A Stat. Mech. Its Appl. 2022, 599, 127468. [Google Scholar] [CrossRef]
  66. Golchoubian, M.; Ghafurian, M.; Dautenhahn, K.; Azad, N.L. Pedestrian Trajectory Prediction in Pedestrian-Vehicle Mixed Environments: A Systematic Review. IEEE Trans. Intell. Transp. Syst. 2023, 1–24. [Google Scholar] [CrossRef]
  67. Helbing, D.; Molnar, P. Social Force Model for Pedestrian Dynamics. Phys. Rev. E 1995, 51, 4282–4286. [Google Scholar] [CrossRef]
  68. Yang, D.; Maroli, J.M.; Li, L.; El-Shaer, M.; Jabr, B.A.; Redmill, K.; Özguner, F.; Özguner, Ü. Crowd Motion Detection and Prediction for Transportation Efficiency in Shared Spaces. In Proceedings of the 2018 IEEE International Science of Smart City Operations and Platforms Engineering in Partnership with Global City Teams Challenge (SCOPE-GCTC), Porto, Portugal, 10–13 April 2018; pp. 1–6. [Google Scholar] [CrossRef]
  69. Borsche, R.; Meurer, A. Microscopic and macroscopic models for coupled car traffic and pedestrian flow. J. Comput. Appl. Math. 2019, 348, 356–382. [Google Scholar] [CrossRef]
  70. Yang, D.; Özgüner, Ü.; Redmill, K. A Social Force Based Pedestrian Motion Model Considering Multi-Pedestrian Interaction with a Vehicle. ACM Trans. Spat. Algorithms Syst. 2020, 6, 1–27. [Google Scholar] [CrossRef]
  71. Yang, D.; Kurt, A.; Redmill, K.; Özgüner, Ü. Agent-based microscopic pedestrian interaction with intelligent vehicles in shared space. In Proceedings of the 2nd International Workshop on Science of Smart City Operations and Platforms Engineering, Pittsburgh, PA, USA, 18–21 April 2017; pp. 69–74. [Google Scholar] [CrossRef]
  72. Anvari, B.; Bell, M.G.H.; Angeloudis, P.; Ochieng, W.Y. Long-range Collision Avoidance for Shared Space Simulation based on Social Forces. Transp. Res. Procedia 2014, 2, 318–326. [Google Scholar] [CrossRef]
  73. Yang, D.; Özgüner, Ü.; Redmill, K. Social Force Based Microscopic Modeling of Vehicle-Crowd Interaction. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1537–1542. [Google Scholar] [CrossRef]
  74. Rinke, N.; Schiermeyer, C.; Pascucci, F.; Berkhahn, V.; Friedrich, B. A multi-layer social force approach to model interactions in shared spaces using collision prediction. Transp. Res. Procedia 2017, 25, 1249–1267. [Google Scholar] [CrossRef]
  75. Johora, F.T.; Müller, J.P. On transferability and calibration of pedestrian and car motion models in shared spaces. Transp. Lett. 2021, 13, 172–182. [Google Scholar] [CrossRef]
  76. Johora, F.T.; Müller, J.P. Zone-Specific Interaction Modeling of Pedestrians and Cars in Shared Spaces. Transp. Res. Procedia 2020, 47, 251–258. [Google Scholar] [CrossRef]
  77. Zhang, L.; Yuan, K.; Chu, H.; Huang, Y.; Ding, H.; Yuan, J.; Chen, H. Pedestrian Collision Risk Assessment Based on State Estimation and Motion Prediction. IEEE Trans. Veh. Technol. 2022, 71, 98–111. [Google Scholar] [CrossRef]
  78. Jan, Q.H.; Kleen, J.M.A.; Berns, K. Self-aware Pedestrians Modeling for Testing Autonomous Vehicles in Simulation. In Proceedings of the 6th International Conference on Vehicle Technology and Intelligent Transport Systems, Prague, Czech Republic, 2–4 August 2023; pp. 577–584. Available online: https://www.scitepress.org/Link.aspx?doi=10.5220/0009377505770584 (accessed on 7 August 2023).
  79. Anderson, C.; Vasudevan, R.; Johnson-Roberson, M. Off The Beaten Sidewalk: Pedestrian Prediction In Shared Spaces For Autonomous Vehicles. arXiv 2020, arXiv:2006.00962. [Google Scholar] [CrossRef]
  80. Kabtoul, M.; Spalanzani, A.; Martinet, P. Towards Proactive Navigation: A Pedestrian-Vehicle Cooperation Based Behavioral Model. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 6958–6964. [Google Scholar] [CrossRef]
  81. Bi, H.; Fang, Z.; Mao, T.; Wang, Z.; Deng, Z. Joint Prediction for Kinematic Trajectories in Vehicle-Pedestrian-Mixed Scenes. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10382–10391. [Google Scholar] [CrossRef]
  82. Rasouli, A.; Kotseruba, I.; Kunic, T.; Tsotsos, J. PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6261–6270. [Google Scholar] [CrossRef]
  83. Santos, A.C.D.; Grassi, V. Pedestrian Trajectory Prediction with Pose Representation and Latent Space Variables. In Proceedings of the 2021 Latin American Robotics Symposium (LARS), 2021 Brazilian Symposium on Robotics (SBR), and 2021 Workshop on Robotics in Education (WRE), Natal, Brazil, 11–15 October 2021; pp. 192–197. [Google Scholar] [CrossRef]
  84. Yin, Z.; Liu, R.; Xiong, Z.; Yuan, Z. Multimodal Transformer Networks for Pedestrian Trajectory Prediction. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 7–15 August 2021; pp. 1259–1265. [Google Scholar] [CrossRef]
  85. Rasouli, A.; Rohani, M.; Luo, J. Bifold and Semantic Reasoning for Pedestrian Behavior Prediction. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 15580–15590. [Google Scholar] [CrossRef]
  86. Cheng, H.; Liao, W.; Yang, M.Y.; Sester, M.; Rosenhahn, B. MCENET: Multi-Context Encoder Network for Homogeneous Agent Trajectory Prediction in Mixed Traffic. arXiv 2020, arXiv:2002.05966. [Google Scholar]
  87. Hassan, M.A.; Khan, M.U.G.; Iqbal, R.; Riaz, O.; Bashir, A.K.; Tariq, U. Predicting humans future motion trajectories in video streams using generative adversarial network. Multimed. Tools Appl. 2021. [Google Scholar] [CrossRef]
  88. Wang, Y.; Chen, S. Multi-Agent Trajectory Prediction With Spatio-Temporal Sequence Fusion. IEEE Trans. Multimed. 2023, 25, 13–23. [Google Scholar] [CrossRef]
  89. Girase, H.; Gang, H.; Malla, S.; Li, J.; Kanehara, A.; Mangalam, K.; Choi, C. LOKI: Long Term and Key Intentions for Trajectory Prediction. arXiv 2021, arXiv:2108.08236. [Google Scholar]
  90. Li, J.; Ma, H.; Zhang, Z.; Li, J.; Tomizuka, M. Spatio-Temporal Graph Dual-Attention Network for Multi-Agent Prediction and Tracking. IEEE Trans. Intell. Transp. Syst. 2021, 23, 21954051. [Google Scholar] [CrossRef]
  91. Hu, Y.; Chen, S.; Zhang, Y.; Gu, X. Collaborative Motion Prediction via Neural Motion Message Passing. arXiv 2020, arXiv:2003.06594. [Google Scholar]
  92. Li, J.; Yang, F.; Ma, H.; Malla, S.; Tomizuka, M.; Choi, C. RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting. arXiv 2021, arXiv:2108.01316. [Google Scholar]
  93. Zhang, X.; Zhang, W.; Wu, X.; Cao, W. Probabilistic trajectory prediction of heterogeneous traffic agents based on layered spatio-temporal graph. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2021, 235, 2413–2424. [Google Scholar] [CrossRef]
  94. Su, Y.; Du, J.; Li, Y.; Li, X.; Liang, R.; Hua, Z.; Zhou, J. Trajectory Forecasting Based on Prior-Aware Directed Graph Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 16773–16785. [Google Scholar] [CrossRef]
  95. Mo, X.; Huang, Z.; Xing, Y.; Lv, C. Multi-Agent Trajectory Prediction With Heterogeneous Edge-Enhanced Graph Attention Network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21948356. [Google Scholar] [CrossRef]
  96. Men, Q.; Shum, H.P.H. PyTorch-based implementation of label-aware graph representation for multi-class trajectory prediction. Softw. Impacts 2022, 11, 100201. [Google Scholar] [CrossRef]
  97. Rainbow, B.A.; Men, Q.; Shum, H.P.H. Semantics-STGCNN: A Semantics-guided Spatial-Temporal Graph Convolutional Network for Multi-class Trajectory Prediction. arXiv 2021. [Google Scholar] [CrossRef]
  98. Li, Z.; Gong, J.; Lu, C.; Yi, Y. Interactive Behavior Prediction for Heterogeneous Traffic Participants in the Urban Road: A Graph-Neural-Network-Based Multitask Learning Framework. IEEE/ASME Trans. Mechatronics 2021, 26, 1339–1349. [Google Scholar] [CrossRef]
  99. Cai, Y.; Dai, L.; Wang, H.; Chen, L.; Li, Y.; Sotelo, M.A.; Li, Z. Pedestrian Motion Trajectory Prediction in Intelligent Driving from Far Shot First-Person Perspective Video. IEEE Trans. Intell. Transp. Syst. 2022, 23, 5298–5313. [Google Scholar] [CrossRef]
  100. Herman, M.; Wagner, J.; Prabhakaran, V.; Möser, N.; Ziesche, H.; Ahmed, W.; Bürkle, L.; Kloppenburg, E.; Gläser, C. Pedestrian Behavior Prediction for Automated Driving: Requirements, Metrics, and Relevant Features. arXiv 2021, arXiv:2012.08418. [Google Scholar] [CrossRef]
  101. Ridel, D.A.; Deo, N.; Wolf, D.; Trivedi, M.M. Understanding Pedestrian-Vehicle Interactions with Vehicle Mounted Vision: An LSTM Model and Empirical Analysis. arXiv 2019, arXiv:1905.05350. [Google Scholar]
  102. Kim, K.; Lee, Y.K.; Ahn, H.; Hahn, S.; Oh, S. Pedestrian Intention Prediction for Autonomous Driving Using a Multiple Stakeholder Perspective Model. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October–24 January 2020; pp. 7957–7962. [Google Scholar] [CrossRef]
  103. Jyothi, R.; Mahalakshmi, K.; Vaishnavi, C.K.; Apoorva, U.; Nitya, S. Driver Assistance for Safe Navigation Under Unstructured Traffic Environment. In Proceedings of the 2019 Global Conference for Advancement in Technology (GCAT), Bangalore, India, 18–20 October 2019; pp. 1–5. [Google Scholar] [CrossRef]
  104. Kerscher, S.; Balbierer, N.; Kraust, S.; Hartmannsgruber, A.; Müller, N.; Ludwig, B. Intention-based Prediction for Pedestrians and Vehicles in Unstructured Environments. In Proceedings of the 4th International Conference on Vehicle Technology and Intelligent Transport Systems, Funchal, Madeira, Portugal, 27–29 April 2018; SCITEPRESS—Science and Technology Publications: Setúbal, Portugal, 2018; pp. 307–314. [Google Scholar] [CrossRef]
  105. Golchoubian, M.; Ghafurian, M.; Azad, N.L.; Dautenhahn, K. Characterizing Structured Versus Unstructured Environments Based on Pedestrians’ and Vehicles’ Motion Trajectories. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; pp. 2888–2895. [Google Scholar] [CrossRef]
  106. Mohamed, A.; Qian, K.; Elhoseiny, M.; Claudel, C. Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction. arXiv 2020, arXiv:2002.11927. [Google Scholar]
  107. Sadeghian, A.; Kosaraju, V.; Sadeghian, A.; Hirose, N.; Rezatofighi, S.H.; Savarese, S. SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints. arXiv 2018, arXiv:1806.01482. [Google Scholar]
  108. Manh, H.; Alaghband, G. Scene-LSTM: A Model for Human Trajectory Prediction. arXiv 2019, arXiv:1808.04018. [Google Scholar]
  109. Azadani, M.N.; Boukerche, A. STAG: A novel interaction-aware path prediction method based on Spatio-Temporal Attention Graphs for connected automated vehicles. Ad. Hoc. Netw. 2023, 138, 103021. [Google Scholar] [CrossRef]
  110. Agamennoni, G.; Nieto, J.I.; Nebot, E.M. A bayesian approach for driving behavior inference. In Proceedings of the 2011 IEEE Intelligent Vehicles Symposium (IV), Baden-Baden, Germany, 5–9 June 2011; pp. 595–600. [Google Scholar] [CrossRef]
  111. Brand, M.; Oliver, N.; Pentland, A. Coupled hidden Markov models for complex action recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, PR, USA, 17–19 June 1997; pp. 994–999. [Google Scholar] [CrossRef]
  112. Gindele, T.; Brechtel, S.; Dillmann, R. A probabilistic model for estimating driver behaviors and vehicle trajectories in traffic environments. In Proceedings of the 13th International IEEE Conference on Intelligent Transportation Systems, Funchal, Portugal, 19–22 September 2010; pp. 1625–1631. [Google Scholar] [CrossRef]
  113. Liebner, M.; Baumann, M.; Klanner, F.; Stiller, C. Driver intent inference at urban intersections using the intelligent driver model. In Proceedings of the 2012 IEEE Intelligent Vehicles Symposium, Madrid, Spain, 3–7 June 2012; pp. 1162–1167. [Google Scholar] [CrossRef]
  114. Lefèvre, S.; Vasquez, D.; Laugier, C. A Survey on Motion Prediction and Risk Assessment for Intelligent Vehicles. Robomech J. 2014, 1, 1. Available online: https://robomechjournal.springeropen.com/articles/10.1186/s40648-014-0001-z (accessed on 7 May 2023). [CrossRef]
  115. Dai, S.; Li, L.; Li, Z. Modeling Vehicle Interactions via Modified LSTM Models for Trajectory Prediction. IEEE Access 2019, 7, 38287–38296. Available online: https://ieeexplore.ieee.org/document/8672889 (accessed on 7 May 2023). [CrossRef]
  116. Ma, Y.; Zhu, X.; Zhang, S.; Yang, R.; Wang, W.; Manocha, D. TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents. arXiv 2019, arXiv:1811.02146. [Google Scholar] [CrossRef]
  117. Ding, W.; Shen, S. Online Vehicle Trajectory Prediction using Policy Anticipation Network and Optimization-based Context Reasoning. arXiv 2019, arXiv:1903.00847. [Google Scholar]
  118. Koschi, M.; Althoff, M. Set-Based Prediction of Traffic Participants Considering Occlusions and Traffic Rules. IEEE Trans. Intell. Veh. 2021, 6, 249–265. [Google Scholar] [CrossRef]
  119. Ding, W.; Chen, J.; Shen, S. Predicting Vehicle Behaviors Over An Extended Horizon Using Behavior Interaction Network. arXiv 2019, arXiv:1903.00848. [Google Scholar]
  120. Deo, N.; Trivedi, M.M. Multi-Modal Trajectory Prediction of Surrounding Vehicles with Maneuver based LSTMs. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1179–1184. [Google Scholar] [CrossRef]
  121. Deo, N.; Trivedi, M.M. Convolutional Social Pooling for Vehicle Trajectory Prediction. arXiv 2018, arXiv:1805.06771. [Google Scholar]
  122. Messaoud, K.; Yahiaoui, I.; Verroust-Blondet, A.; Nashashibi, F. Attention Based Vehicle Trajectory Prediction. IEEE Trans. Intell. Veh. 2021, 6, 175–185. [Google Scholar] [CrossRef]
  123. Messaoud, K.; Yahiaoui, I.; Verroust-Blondet, A.; Nashashibi, F. Non-local Social Pooling for Vehicle Trajectory Prediction. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 975–980. [Google Scholar] [CrossRef]
  124. Diehl, F.; Brunner, T.; Le, M.T.; Knoll, A. Graph Neural Networks for Modelling Traffic Participant Interaction. arXiv 2019, arXiv:1903.01254. [Google Scholar]
  125. Li, X.; Ying, X.; Chuah, M.C. GRIP: Graph-based Interaction-aware Trajectory Prediction. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 3960–3966. [Google Scholar] [CrossRef]
  126. Azadani, M.N.; Boukerche, A. An Interaction-Aware Vehicle Behavior Prediction for Connected Automated Vehicles. In Proceedings of the ICC 2022—IEEE International Conference on Communications, Seoul, Republic of Korea, 16–20 May 2022; pp. 279–284. [Google Scholar] [CrossRef]
  127. Wu, Y.; Chen, G.; Li, Z.; Zhang, L.; Xiong, L.; Liu, Z.; Knoll, A. HSTA: A Hierarchical Spatio-Temporal Attention Model for Trajectory Prediction. IEEE Trans. Veh. Technol. 2021, 70, 11295–11307. [Google Scholar] [CrossRef]
  128. Sheng, Z.; Xu, Y.; Xue, S.; Li, D. Graph-Based Spatial-Temporal Convolutional Network for Vehicle Trajectory Prediction in Autonomous Driving. IEEE Trans. Intell. Transport. Syst. 2022, 23, 17654–17665. [Google Scholar] [CrossRef]
  129. Gao, J.; Sun, C.; Zhao, H.; Shen, Y.; Anguelov, D.; Li, C.; Schmid, C. VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation. arXiv 2020, arXiv:2005.04259. [Google Scholar]
  130. Alghodhaifi, H.; Lakshmanan, S. Autonomous Vehicle Evaluation: A Comprehensive Survey on Modeling and Simulation Approaches. IEEE Access 2021, 9, 151531–151566. [Google Scholar] [CrossRef]
  131. Alghodhaifi, H.; Lakshmanan, S. Simulation-based model for surrogate safety measures analysis in automated vehicle-pedestrian conflict on an urban environment. In Autonomous Systems: Sensors, Processing, and Security for Vehicles and Infrastructure, 2020; SPIE: San Diego, CA, USA, 2020; pp. 8–21. [Google Scholar]
  132. Lakshmanan, S.; Yan, Y.; Baek, S.; Alghodhaifi, H. Modeling and simulation of leader-follower autonomous vehicles: Environment effects. In Unmanned Systems Technology XXI; SPIE: San Diego, CA, USA, 2019; pp. 116–123. [Google Scholar] [CrossRef]
  133. Cheek, E.; Alghodhaifi, H.; Adam, C.; Andres, R.; Lakshmanan, S. Dedicated short range communications used as fail-safe in autonomous navigation. In Unmanned Systems Technology XXII; SPIE: San Diego, CA, USA, 2020; pp. 159–177. [Google Scholar] [CrossRef]
  134. Alghodhaifi, H.; Lakshmanan, S.; Baek, S.; Richardson, P. Autonomy modeling and validation in a highly uncertain environment. In Proceedings of the 2018 NDIA Ground Vehicle Systems Engineering and Technology Symposium, Novi, MI, USA, 7–9 August 2018. [Google Scholar]
  135. Alghodhaifi, H.; Lakshmanan, S. Safety model of automated vehicle-VRU conflict under uncertain weather conditions and sensors failure. In Unmanned Systems Technology XXII; SPIE: San Diego, CA, USA, 2020; pp. 56–65. [Google Scholar]
  136. Alghodhaifi, H.M. Prediction of Intelligent Vehicle-Pedestrian Conflict in a Highly Uncertain Environment. Ph.D. Thesis, University of Michigan, Ann Arbor, MI, USA, 2023. Available online: https://deepblue.lib.umich.edu/handle/2027.42/177045 (accessed on 19 June 2023).
  137. Chen, K.; Zhu, H.; Tang, D.; Zheng, K. Future pedestrian location prediction in first-person videos for autonomous vehicles and social robots. Image Vis. Comput. 2023, 134, 104671. [Google Scholar] [CrossRef]
  138. Czech, P.; Braun, M.; Kreßel, U.; Yang, B. Behavior-Aware Pedestrian Trajectory Prediction in Ego-Centric Camera Views with Spatio-Temporal Ego-Motion Estimation. Mach. Learn. Knowl. Extr. 2023, 5, 3. [Google Scholar] [CrossRef]
  139. Su, H.; Zhu, J.; Dong, Y.; Zhang, B. Forecast the plausible paths in crowd scenes. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, in IJCAI’17, Melbourne, Australia, 19–25 August 2017; AAAI Press: Palo Alto, CA, USA, 2017; pp. 2772–2778. [Google Scholar]
  140. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2018, arXiv:1710.10903. [Google Scholar]
  141. Bock, J.; Krajewski, R.; Moers, T.; Runde, S.; Vater, L.; Eckstein, L. The inD Dataset: A Drone Dataset of Naturalistic Road User Trajectories at German Intersections. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–17 November 2020; pp. 1929–1934. [Google Scholar] [CrossRef]
  142. Robicquet, A.; Sadeghian, A.; Alahi, A.; Savarese, S. Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; pp. 549–565. [Google Scholar] [CrossRef]
  143. Vemula, A.; Muelling, K.; Oh, J. Social Attention: Modeling Attention in Human Crowds. arXiv 2018, arXiv:1710.04689. [Google Scholar]