Review

Navigating Uncertainty: Advanced Techniques in Pedestrian Intention Prediction for Autonomous Vehicles—A Comprehensive Review

by Alireza Mirzabagheri 1, Majid Ahmadi 1, Ning Zhang 1, Reza Alirezaee 1, Saeed Mozaffari 2 and Shahpour Alirezaee 2,*

1 Department of Electrical and Computer Engineering, University of Windsor, 401 Sunset Avenue, Windsor, ON N9B 3P4, Canada
2 Department of Mechanical, Automotive and Material Engineering, University of Windsor, 401 Sunset Avenue, Windsor, ON N9B 3P4, Canada
* Author to whom correspondence should be addressed.
Vehicles 2025, 7(2), 57; https://doi.org/10.3390/vehicles7020057
Submission received: 27 February 2025 / Revised: 3 May 2025 / Accepted: 14 May 2025 / Published: 9 June 2025

Abstract

The World Health Organization reports approximately 1.35 million road traffic fatalities annually, with pedestrians constituting 23% of these deaths. This highlights the critical need to enhance pedestrian safety, especially given the significant role human error plays in road accidents. Autonomous vehicles present a promising solution for mitigating these fatalities by improving road safety through advanced prediction of pedestrian behavior. With the autonomous vehicle market projected to grow substantially and to offer various economic benefits, including reduced driving costs and enhanced safety, understanding and predicting pedestrian actions and intentions is essential for integrating autonomous vehicles into traffic systems effectively. Despite significant advancements, replicating human social understanding in autonomous vehicles remains challenging, particularly in predicting the complex and unpredictable behavior of vulnerable road users such as pedestrians. Moreover, the inherent uncertainty in pedestrian behavior adds another layer of complexity, requiring robust methods to quantify and manage it effectively. This review provides a structured and in-depth analysis of pedestrian intention prediction techniques, with a unique focus on how uncertainty is modeled and managed. We categorize existing approaches based on prediction duration, feature type, and model architecture, and critically examine benchmark datasets and performance metrics. Furthermore, we explore the implications of the two main uncertainty types (epistemic and aleatoric) and discuss their integration into autonomous vehicle systems. By synthesizing recent developments and highlighting the limitations of current methodologies, this paper aims to advance the understanding of pedestrian intention prediction and contribute to safer and more reliable autonomous vehicle deployment.

1. Introduction

According to the World Health Organization (WHO) report on road safety [1], approximately 1.35 million people are fatally injured in road crashes each year. Pedestrians account for 23% of all road traffic deaths globally, which is a disturbingly high percentage. As the most vulnerable road users (VRUs), pedestrians are essential participants in traffic and require robust protection. Given that human error is a significant contributor to most road traffic accidents [2], autonomous vehicles (AVs) have the potential to reduce these fatalities and improve road safety.
Autonomous vehicle technology offers numerous economic benefits, including reduced driving costs, improved fuel efficiency, and enhanced safety. By removing the human driver from the control loop, it aims to make travel less stressful while reducing human error and lowering accident rates, creating a safer traffic environment for all road users, including pedestrians. Moreover, AVs allow passengers to spend travel time on productive activities or leisure rather than monitoring road conditions [3,4].
However, a major challenge for AVs is replicating the human capacity to understand social cues and predict the behavior of VRUs, such as pedestrians, who are not protected by vehicle safety features. To ensure pedestrian safety, AVs must accurately anticipate pedestrian intentions early enough to allow for safe maneuvering. This task is challenging due to the high level of uncertainty in pedestrian behavior, which is influenced by factors such as demographics, traffic conditions, and the environment [5]. To account for this uncertainty, AVs often adopt safe driving strategies, such as reducing speed and avoiding complex interactions, which, while enhancing safety, can also disrupt traffic flow and reduce efficiency [4]. Furthermore, pedestrian intention is often shaped by the physical design of the road, visual cues like traffic lights and signage, environmental conditions like lighting or weather, and even local customs around jaywalking or right-of-way expectations. These external variables introduce added complexity to pedestrian modeling.
While action and trajectory prediction methods [6,7,8,9] can provide insights into pedestrian movements, they often fail to capture the underlying intentions due to the uncertainty and complexity of human behavior. A more accurate prediction of pedestrian intentions requires a deeper understanding of the pedestrian’s context, past behavior, and environmental factors [4].
This paper contributes a structured and comprehensive review of pedestrian intention prediction (PIP) methods, with an emphasis on the integration of uncertainty modeling. Unlike existing surveys, we offer a multi-dimensional classification based on prediction duration, feature types, and modeling approaches. Furthermore, we provide a critical analysis of how uncertainty is modeled and managed in prediction tasks, bridging a gap between theoretical modeling and real-world applicability in AV contexts.
The paper is organized as follows: Section 2 reviews existing surveys on pedestrian intention prediction methods and highlights the role of uncertainty in these models. Section 3 classifies and analyzes approaches for predicting pedestrian behavior, focusing on trajectory prediction and intention recognition models. Section 4 examines uncertainty in PIP, covering the different types of uncertainty (e.g., epistemic and aleatoric) and how they are handled. Section 5 describes key datasets and sensors used in PIP tasks. Section 6 presents evaluation metrics for assessing the performance of PIP models. Section 7 discusses the challenges in PIP, such as the complexity of human behavior, the limitations of current models, and the integration of uncertainty, and explores future research directions to address them. Finally, Section 8 concludes the study by summarizing the main findings and suggesting further advancements in the field. The organization of this paper is visually outlined in Figure 1.

2. Related Work

In the domain of AVs, predicting the intentions and behaviors of pedestrians is crucial for ensuring safe navigation in complex environments. In the following, we review key papers on pedestrian intention prediction and categorize them for a structured overview.

2.1. Pedestrian Intention and Behavior Estimation

Sharma et al. [4] comprehensively surveyed techniques for predicting pedestrian intentions in the context of AVs, emphasizing the challenges posed by the variability in pedestrian behavior and the social norms influencing road scenarios. They categorized prediction methods, reviewed datasets capturing complex human behavior in traffic environments, and conducted a comparative analysis of these approaches using benchmark datasets and evaluation metrics. Additionally, the authors identified key challenges and proposed directions for future research to enhance PIP and support safer AV implementation.
Ridel et al. [10] discuss the critical challenge of anticipating pedestrian actions for intelligent vehicles, highlighting its importance for safety. The authors review the complexities of predicting pedestrian behavior, which is influenced by diverse factors such as movement variability, occlusions, distractions, and external interactions. The paper surveys existing approaches and advancements in pedestrian intention estimation, noting progress in predicting positions shortly before crossing but emphasizing limitations in accurately forecasting when pedestrians will pause at curbs.

2.2. Trajectory Prediction in Crowd Scenarios

Korbmacher et al. [11] explored the challenges of predicting pedestrian trajectories in crowd scenarios, focusing on the influences of scene topology and pedestrian interactions. They reviewed classical knowledge-based models alongside recent deep learning approaches, driven by advancements in data science and collection technologies. Their comparison highlighted the higher accuracy of deep learning methods for local trajectory prediction while questioning the continued relevance of knowledge-based models. However, the study noted limitations in deep learning methods for large-scale simulations and capturing collective dynamics. The authors proposed hybrid approaches, combining the strengths of both methodologies, as a promising direction to address issues like the lack of explainability in deep learning models and to enhance predictive capabilities.
Sighencea et al. [12] reviewed recent advancements in pedestrian trajectory prediction, a critical aspect of computer vision in the automotive industry, particularly for advanced driver assistance systems and AVs. The study explored deep learning (DL)-based methods, noting their dependence on enhanced sensor systems and modern signal processing. It also provided an overview of key datasets, evaluation metrics, and practical applications. While acknowledging significant progress, the authors identified research gaps and proposed future directions to address the remaining challenges in pedestrian trajectory prediction.
Galvão et al. [13] reviewed state-of-the-art algorithms designed to enhance behavior prediction systems in AVs, with a focus on predicting the trajectories and intentions of both pedestrians and vehicles. Despite significant advancements, AV systems remain limited, as evidenced by collision and near-miss reports involving AVs, such as those by Google. The authors highlighted that improving prediction capabilities is crucial for preventing such incidents. Their review synthesized findings from previous literature, recent studies, and experiments conducted on established datasets, offering insights into current progress and areas for improvement in AV behavior prediction.
Kim et al. [14] introduce a Multiple Stakeholder Perspective Model (MSPM) to enhance pedestrian trajectory prediction by incorporating both driver and pedestrian viewpoints. Their model integrates data from virtual reality simulations, achieving a 4.48% reduction in short- and mid-term trajectory errors and an 11.14% reduction in long-term errors. The study highlights the importance of head orientation data for accurate trajectory forecasting.

2.3. VRU Intention Estimation and Safety

Bighashdel et al. [15] focus on the analysis of VRU behavior, emphasizing its significance for applications such as video surveillance and autonomous driving. They highlight the complexity of VRU movements, which has led to the development of diverse predictive models in the literature. The paper provides a comprehensive review of path prediction methods, categorizes these approaches from multiple perspectives, and proposes a framework to enhance the understanding of various aspects of VRU path prediction challenges.
Ahmed et al. [16] review recent advancements in pedestrian and cyclist detection and intent estimation to enhance the safety of AVs. They emphasize the importance of understanding the intentions of VRUs to prevent accidents. The study explores deep learning (DL) techniques, including Fast R-CNN, Faster R-CNN, and SSD, which have significantly advanced pedestrian detection. They highlight the growing feasibility of DL due to advancements in hardware and its application in tracking, motion modeling, and pose estimation for intent prediction. While substantial progress has been made in pedestrian detection using vision-based approaches, the authors note a need for further focus on cyclist detection. Additionally, they recommend exploring sensor fusion and advanced intent estimation methods to improve VRU safety.
Xue et al. [17] explore the complexities of estimating VRU intentions by analyzing scene dynamics from the ego-vehicle’s perspective. Their multimodal PIP framework, using attention mechanisms and a novel (MHAAdjMat)-based GCN, demonstrates superior performance over state-of-the-art models, predicting pedestrian crossing intent with high accuracy up to 2.5 s before the event.
Rasouli et al. [18] analyze various factors influencing VRU behavior and their interconnected effects on intentions. They emphasize a multimodal approach to understanding complex human psychology, enhancing AI-driven vehicles’ scene reasoning and decision-making. The study also addresses the challenges in interpreting social interactions and their impact on VRU behavior.
Pandey et al. [19] explore the challenges AVs face in achieving full autonomy, particularly in urban environments. They argue that effective communication and understanding the intentions of road users, including pedestrians, are critical for safe interactions, and they stress the intricacy of verbal and non-verbal social cues in the life-and-death task of correctly identifying whether a pedestrian intends to cross the road. The paper discusses challenges in pedestrian-autonomous vehicle interactions and proposes a novel architecture for intention identification, integrating pedestrian detection, pose estimation, and classification algorithms. Additionally, the authors review various methods for these tasks, aiming to enhance safety and interaction norms in urban driving scenarios.
Zou et al. [20] explore how roadway centerline designs and AV signaling affect pedestrian behavior at unmarked midblock crossings. Using VR simulations, they find that roadway features and AV signals significantly impact waiting and crossing times. Older pedestrians tend to wait longer, and past behaviors have limited effects. The findings suggest improvements in AV communication strategies and roadway designs to enhance pedestrian safety.

2.4. Scene Understanding and Event Reasoning

Xue et al. [21] examine the progression towards full autonomy in vehicles, emphasizing the limitations of traditional low-level vision tasks like detection, tracking, and segmentation for understanding traffic scenes. They argue that comprehensive scene understanding requires insights into the past, present, and future behaviors of traffic participants. The paper explores autonomous driving through the lens of event reasoning, reviewing literature and advancements in scene representation, event detection, and intention prediction. The authors also discuss current challenges and propose potential solutions to bridge gaps in achieving fully automated driving systems.
Zhou and Zeng [22] propose a multi-task model for AVs that handles pedestrian detection, tracking, and attribute recognition. Their model, which operates in two stages, uses low-resolution images for initial tasks and high-resolution images for detailed attribute detection. The approach, trained on multiple datasets, shows significant resource savings and accurate detection.

2.5. Specialized Approaches and Case Studies

Haque et al. [23] investigate pedestrian signal violations at urban intersections in New Delhi, using video data from 11 sites. Their study identifies key factors like pedestrian speed and waiting time, modeling these behaviors with an ANN that achieved 85% accuracy, surpassing traditional regression models. Recommendations include site-specific facility design and shorter pedestrian signals.
Razali et al. [24] propose a vision-based system for real-time pedestrian localization, body pose estimation, and intention prediction using a neural network operating at 5 fps. Their model, which utilizes a 5-block ResNet-50 network with parallel convolutional heads, achieves a 20% improvement in intention prediction precision. Despite this, the multitask approach presents trade-offs, such as a minor reduction in pose detection performance. The source code is publicly available for integration into ADAS or traffic light management systems.
Chen et al. [25] examine drivers’ recognition of pedestrian crossing intentions using eye-tracking data. They find that experienced drivers are more conservative and engage in more detailed processing. Both experienced and novice drivers are quicker to detect and respond to pedestrians intending to cross, focusing on the upper body for intention recognition. The study outlines a two-phase intention recognition process involving initial detection and detailed evaluation.

2.6. Use of Historical Road Incident Data for Road Redesign Potential

In the paper by Gkyrtis and Pomoni [26], the authors explore the use of historical road incident data to assess the potential effectiveness of various road redesigns in improving traffic safety. Their approach utilizes a data-driven methodology, which analyzes accident reports and historical traffic data to identify patterns and correlations between road features (e.g., road geometry, signage, lighting) and incident occurrence. They employed advanced statistical modeling techniques to quantify how specific design changes could impact accident rates, providing a valuable framework for future infrastructure improvements.
The findings of their study indicate that certain road features, such as improved lighting, better signage, and revised intersections, significantly reduce the likelihood of road incidents. Particularly, their analysis demonstrated that roads with poor visibility and complex intersections were prone to higher accident rates, suggesting that addressing these issues could enhance pedestrian safety.
However, they primarily focused on specific geographical regions, which may reduce the generalizability of the findings to other urban settings. Additionally, the authors note that driver behavior and external environmental factors (e.g., weather conditions or vehicle types) were not fully integrated into their analysis, which could affect the robustness of the results. Despite this, their study remains valuable for understanding how infrastructure design can be optimized to mitigate road incidents, particularly in urban environments.
This research ties closely with PIP efforts, as pedestrian safety is a critical component of overall road safety. Insights from Gkyrtis and Pomoni’s work suggest that enhancing road design could significantly lower the risk of accidents involving pedestrians, making these findings highly relevant to the development of more effective PIP models. For instance, pedestrian prediction algorithms could integrate factors like road geometry and signage quality into their models to better anticipate pedestrian movements and enhance safety measures in AVs. Moreover, their work highlights the importance of considering road design when implementing safety protocols for VRUs, which directly informs the decision-making processes of AV systems in complex traffic scenarios.

2.7. Unique Contributions of This Survey

While several existing surveys have explored aspects of PIP, they often focus on isolated components such as model architectures or datasets, without providing a holistic analysis of the field. Unlike prior works, this survey offers the most comprehensive review to date, covering a wide range of approaches while identifying critical gaps in the literature. Notably, while some studies focus on model architectures without addressing sensor modalities, others overlook the importance of datasets, limiting their applicability. Additionally, we emphasize the role of uncertainty in PIP, a crucial factor often neglected in previous studies. By integrating discussions on methodologies, sensor modalities, datasets, and uncertainty estimation, this survey provides a well-rounded perspective to guide future research in developing more reliable and safety-oriented PIP models for AVs.

3. Pedestrian Intention Prediction Approaches

The prediction of pedestrian intentions typically follows three main stages: the input stage, the feature extraction and encoding stage, and the decoding or classification stage, depending on the required output, as illustrated in Figure 2. The input stage consists of frames extracted from video sequences, which can be obtained in real time or from pre-recorded footage captured by cameras positioned at different angles. During pre-processing, these frames are analyzed to extract relevant attributes based on the specific needs of the proposed algorithm. Various feature extraction techniques encode spatial and temporal features. In the final stage, a classifier, typically neural network-based, predicts crossing intentions or forecasts trajectories. The subsequent sections present a detailed categorization of different pedestrian intention estimation approaches, covering a broad spectrum of techniques in the literature. This classification is structured around three primary factors: duration, model type, and input feature type, as depicted in Figure 3 [4].
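As a concrete, deliberately simplified illustration of these three stages, the sketch below pushes toy frames through a hand-rolled feature extractor and a logistic classifier. All names, weights, and feature choices here are hypothetical stand-ins: in practice, the extractor would be a CNN or pose-estimation backbone and the classifier a trained network.

```python
import numpy as np

def extract_features(frames):
    """Encoding stage stand-in: per-channel means of the last frame plus an
    average temporal difference as a crude motion cue. A real system would
    use a CNN or pose-estimation backbone here."""
    per_frame = np.array([f.mean(axis=(0, 1)) for f in frames])  # (T, C)
    motion = np.diff(per_frame, axis=0).mean(axis=0)             # (C,)
    return np.concatenate([per_frame[-1], motion])

def classify_intention(features, weights, bias=0.0):
    """Decoding/classification stage: logistic regression over the encoded
    features, returning P(crossing)."""
    z = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))

# Input stage: eight synthetic RGB frames of a 32x32 crop around a pedestrian.
rng = np.random.default_rng(0)
frames = [rng.random((32, 32, 3)) for _ in range(8)]

feats = extract_features(frames)
weights = rng.standard_normal(feats.shape[0])    # untrained, illustrative
p_cross = classify_intention(feats, weights)
print(f"P(crossing) = {p_cross:.3f}")
```

The division of labor, not the particular features, is the point: each published method in the sections that follow swaps in a different encoder and decoder within this same skeleton.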

3.1. Classification Based on Duration of Prediction

Pedestrian intention prediction techniques can be categorized according to the duration over which predictions are made:

3.1.1. Short-Term Prediction

Short-term predictions focus on anticipating the behavior of pedestrians or cyclists within a few seconds. These approaches typically utilize features such as head orientation or body movement patterns. For pedestrians, commonly predicted intentions include walking, stopping, crossing, and waiting; for cyclists, lane changes, turns, and stops. These methods are gaining popularity due to their practicality in real-time applications and lower computational demands [4,24,27,28,29,30].

3.1.2. Long-Term Prediction

Long-term predictions are goal-oriented, aiming to predict a pedestrian’s trajectory or final destination. These methods often incorporate contextual and environmental information to improve trajectory estimation accuracy. Long-term prediction approaches have received significant attention in recent years, particularly in the autonomous vehicle (AV) research community, as they are crucial for fostering trust in fully autonomous systems. Despite their complexity, these methods play a vital role in AV development [4,7,31,32,33,34].

3.2. Classification Based on the Selected Features

The features used in predicting pedestrian intention in traffic scenarios can be broadly classified into three main types [4]:

3.2.1. Pedestrian-Centric Features

These features are specific to the pedestrian and include pose information [5,35,36,37,38], past trajectory [39,40,41,42,43,44,45,46,47,48], and head orientation [4,49,50,51].
  • Joints/Pose
Information about a pedestrian’s joints and skeleton provides more distinct features than RGB images, especially under varying lighting conditions. The absence of body dynamics and pose data can delay predictions of changes in crossing intentions. Notably, joints related to the shoulders and legs are more significant for estimating real-time pedestrian activities, while upper body joints, like arms, contribute less to action recognition accuracy [5,35,36,37,38,52]. However, the use of pose features is limited to pedestrians and cannot be generalized to other VRUs like cyclists [4,53].
  • Trajectories
Past trajectories play a crucial role in predicting future movements or poses of pedestrians and other VRUs. Recent research suggests that incorporating the uncertainty of human actions through trajectory distribution mapping can improve prediction accuracy [4,40,54,55,56,57,58,59,60,61].
  • Head Orientation
The direction in which a pedestrian is looking is a critical cue for assessing their intention. Integrating head orientation with other features, such as leg movement, improves the accuracy of crossing intention predictions. Combining head pose data with trajectory information has also yielded promising results in predicting future pedestrian trajectories [4,49,50,51,62,63,64].
  • Displacement
The displacement of a pedestrian, observed through methods like motion history images (MHIs) from stereo data, helps differentiate between standing, stopping, and starting actions. However, displacement alone is insufficient and needs to be combined with joint information for more reliable predictions [4,35,65].
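The trajectory features above lend themselves to a small worked example. The sketch below extrapolates a short 2-D track with a constant-velocity model and attaches an isotropic Gaussian whose standard deviation grows with lead time, a minimal stand-in for the trajectory distribution mapping mentioned under Trajectories; the noise scale and growth rate are arbitrary, illustrative values, not those of any cited method.

```python
import numpy as np

def predict_distribution(track, horizon, sigma0=0.1, growth=0.05):
    """Extrapolate a past 2-D trajectory with a constant-velocity model and
    attach an isotropic Gaussian whose standard deviation grows with lead
    time; sigma0 and growth are arbitrary illustrative values."""
    velocity = np.diff(track, axis=0).mean(axis=0)   # mean step vector
    pos = track[-1]
    means, stds = [], []
    for t in range(1, horizon + 1):
        pos = pos + velocity
        means.append(pos)
        stds.append(sigma0 + growth * t)             # uncertainty grows with t
    return np.array(means), np.array(stds)

# A pedestrian walking roughly along +x at about 1 m per time step.
past = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0], [3.0, 0.1]])
means, stds = predict_distribution(past, horizon=3)
print(means)  # predicted future positions
print(stds)   # one standard deviation per future step
```

Learned predictors replace the constant-velocity mean and the hand-set variance schedule with outputs of a network, but the shape of the answer, a distribution over future positions rather than a single point, is the same.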

3.2.2. Contextual Features

Contextual features pertain to the environment and include elements like scene infrastructure, road layout, and weather conditions. These features reduce the cognitive load on intelligent driving systems and enhance the accuracy of crossing behavior predictions [4,66].
  • Social Interaction
The behavior of pedestrians is influenced by the actions of others around them. Thus, capturing social interactions, such as the movement of neighboring pedestrians, is essential for accurate intention prediction. Recent approaches incorporate the trajectories and actions of both immediate and distant neighbors to better understand the social dynamics that affect pedestrian behavior [4,7,9,39,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83].
  • Scene Information
Understanding the scene context is crucial for predicting pedestrian behavior. This includes interactions with elements like zebra crossings, intersections, and waiting areas. Integrating scene information with pedestrian trajectories has been shown to improve prediction accuracy [4,34,42,52,84,85,86,87,88,89,90,91].
  • Ego-Vehicle Information
Ego-vehicle dynamics, such as speed and displacement relative to the target pedestrian, play a significant role in predicting pedestrian intentions. The fusion of ego-vehicle features with pedestrian-specific or scene-specific data provides a more comprehensive understanding of the scene and improves the accuracy of intention predictions [4,38,40,62,92,93,94]. Figure 4 illustrates an example of contextual data annotation [95] from an image in the JAAD dataset.

3.2.3. Hybrid Features

Hybrid features combine pedestrian-centric and contextual information to enhance the understanding of human behavior in traffic environments. Many recent studies emphasize the need to integrate walking patterns, social interactions, and contextual details for more accurate predictions of crossing behavior [4,34,52,88,95,96,97,98,99,100,101].
Figure 5 illustrates various features used in predicting pedestrian intention, including localization, pose estimation, object categorization, motion patterns, and depth estimation.

3.3. Classification Based on the Type of Model

Pedestrian intention prediction models can be broadly categorized into two main types: the Knowledge-Based (KB) approach and the DL approach. This section provides an in-depth look at these two categories.

3.3.1. The Knowledge-Based Approach

In the early stages of pedestrian dynamics research, scholars primarily relied on direct observations, photographs, and time-lapse films to enhance their understanding of pedestrian behavior [103]. This understanding was instrumental in developing concepts such as level of service, designing elements for pedestrian facilities, and creating planning guidelines [104,105,106,107]. While these concepts and guidelines are valuable for understanding and managing pedestrian dynamics, they are not well-suited for predicting pedestrian flows or trajectories. As a result, researchers began developing simulation models, including force-based microscopic models [108], queuing models [109], the transition matrix model [110], and Henderson’s models [111,112], which proposed that pedestrian crowd behavior is analogous to the behavior of gases or fluids. These models, which focus on aggregated behaviors rather than individual pedestrian actions, are known as macroscopic models [11].
Today, KB pedestrian models span a range of scales, from macroscopic to mesoscopic and microscopic models, each capturing different aspects of pedestrian dynamics. Macroscopic and mesoscopic approaches are inspired by continuous fluid dynamics or gas-kinetic models, which describe dynamics at an aggregated level, while microscopic approaches focus on individual pedestrian movements. Numerous reviews in the literature explore the various modeling scales in pedestrian dynamics and the transitions between these scales [113,114,115,116,117]. Other reviews emphasize collective pedestrian dynamics [118,119,120,121] or discuss applications in layout design [11,122]. In the following section, we review KB pedestrian models with a focus on microscopic approaches and their use in predicting pedestrian trajectories.
(A)
Microscopic Pedestrian Models
Numerous researchers have focused on modeling individual pedestrian movement using various microscopic approaches. A significant advantage of microscopic models over macroscopic ones is their ability to capture various behaviors. By considering each pedestrian individually, these models can attribute specific characteristics to each agent and accommodate behavioral diversity. However, microscopic models can be computationally demanding, which limits their use in large-scale simulations [11].
Microscopic pedestrian models analyze individual behaviors and interactions among pedestrians. These interactions contribute to the emergent crowd dynamics at a macroscopic level [122]. These models are designed to replicate macroscopic features, such as fundamental diagrams or collective formations like band structures [123,124]. Such models, which focus on individual pedestrian dynamics, can predict pedestrian trajectories at various scales. The behavior of individual pedestrians is governed by specific rules based on physical, social, or psychological factors [117]. These rules are expressed through manually crafted dynamic equations based on Newton’s laws of motion. Given initial conditions such as position, velocity, and acceleration, these equations can simulate and predict future trajectories [11].
The approach for determining a pedestrian’s new position can vary depending on the model’s inputs and outputs. Models that provide new velocity or acceleration, which are then used to calculate the new position, are classified as velocity-based or acceleration-based models, respectively. Conversely, models that determine position directly through specific rules without relying on differential equations are categorized as decision-based models [11].
Acceleration-Based Models
Acceleration-based models, particularly force-based models, describe pedestrian movement through the interaction of external forces [11]. These models generally include a relaxation term towards the desired direction and an interaction term that accounts for repulsion (social force) from neighbors and obstacles [125,126]. One of the earliest force-based models was introduced in the 1970s by Hirai and Tarui [108].
The interaction forces in these models can vary in their mathematical representation: exponential in social force models [127], algebraic in centrifugal force models [128,129], or partly linear as in the optimal velocity model [130]. A vision field concept is often utilized to prioritize obstacles directly in front of the pedestrian. Since these models are of the second order, they require a fine discretization scheme and may encounter numerical challenges [131]. Many of the current developments in acceleration-based models are extensions of the social force model [11,132].
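The structure of a force-based update, a relaxation term toward the desired velocity plus exponential repulsion from neighbors, can be sketched in a few lines. The parameter values below are illustrative rather than calibrated, and the integration is a plain Euler step, so this is a sketch of the model family, not a faithful reimplementation of any published variant.

```python
import numpy as np

def social_force_step(pos, vel, goal, others, dt=0.1, tau=0.5,
                      v_des=1.3, A=2.0, B=0.3):
    """One Euler step of a minimal social force model for a single
    pedestrian: a relaxation term pulls the velocity toward the desired
    speed v_des in the goal direction, and each neighbour contributes an
    exponentially decaying repulsive force. A, B, tau are illustrative
    values, not calibrated parameters."""
    direction = goal - pos
    e = direction / np.linalg.norm(direction)
    force = (v_des * e - vel) / tau              # driving (relaxation) term
    for q in others:                             # social repulsion
        d = pos - q
        dist = np.linalg.norm(d)
        force += A * np.exp(-dist / B) * d / dist
    vel = vel + dt * force                       # second-order update
    pos = pos + dt * vel
    return pos, vel

pos, vel = np.array([0.0, 0.0]), np.array([0.0, 0.0])
goal = np.array([10.0, 0.0])
others = [np.array([1.0, 0.2])]                  # a standing neighbour nearby

for _ in range(50):
    pos, vel = social_force_step(pos, vel, goal, others)
print(pos)  # advanced toward the goal, deflected slightly below the x-axis
```

Because the state is updated through an acceleration, the model is second order, which is exactly why such models need the fine time discretization noted above.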
Velocity-Based Models
Velocity-based models, which gained prominence in the 2000s, are designed to model pedestrian dynamics using first-order differential equations [11,117]. These models focus on describing speed functions based on position differences with neighbors and obstacles. Unlike acceleration-based models, which describe inertial effects, velocity-based models are more concerned with collision avoidance, often utilizing techniques such as collision cones [11,133,134,135,136,137,138,139,140].
Extensions of these models, like the Reciprocal Velocity Obstacle (RVO) [135] and Optimal Reciprocal Collision Avoidance (ORCA) [133], have been frequently used in computer graphics to simulate crowd behavior. Other velocity-based models are derived from concepts like bearing angle [141], gradient navigation [142], or time gap variables [143,144]. These models are generally formulated as optimization problems on the ensemble of feasible trajectories that avoid collisions [11,145,146].
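The optimization view can be made concrete with a bare-bones, single-neighbor caricature of RVO/ORCA-style reasoning: candidate velocities are tested against a collision cone, and the admissible candidate closest to the preferred velocity is selected. The geometry, candidate set, and parameter values below are invented for the example.

```python
import numpy as np

def in_collision_cone(rel_pos, rel_vel, radius):
    """True if rel_vel points inside the collision cone, i.e. straight-line
    relative motion would bring two discs of combined radius into contact."""
    dist = np.linalg.norm(rel_pos)
    if dist <= radius:
        return True                               # already overlapping
    speed = np.linalg.norm(rel_vel)
    if speed < 1e-9:
        return False                              # no relative motion
    half_angle = np.arcsin(radius / dist)         # cone half-angle
    cos_a = np.dot(rel_vel, rel_pos) / (speed * dist)
    angle = np.arccos(np.clip(cos_a, -1.0, 1.0))
    return angle < half_angle

def choose_velocity(pref_vel, rel_pos, other_vel, radius, candidates):
    """First-order rule: among candidate velocities, pick the collision-free
    one closest to the preferred velocity (an optimization over feasible motions)."""
    best, best_cost = None, np.inf
    for v in candidates:
        if in_collision_cone(rel_pos, v - other_vel, radius):
            continue
        cost = np.linalg.norm(v - pref_vel)
        if cost < best_cost:
            best, best_cost = v, cost
    return best

# A pedestrian prefers to walk straight (+x) at 1.3 m/s; a standing agent
# blocks the path 2 m ahead; combined personal-space radius 0.5 m.
pref = np.array([1.3, 0.0])
candidates = [pref, np.array([1.3, 0.5]), np.array([1.3, -0.5]),
              np.array([0.0, 0.0])]
best = choose_velocity(pref, np.array([2.0, 0.0]), np.zeros(2), 0.5, candidates)
```

The straight-ahead candidate lies inside the cone and is rejected, so the pedestrian sidesteps; note that the model directly outputs a new velocity (first order) rather than an acceleration.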
Decision-Based Models and Cellular Automata
In decision-based or rule-based models, pedestrian behavior is not modeled using differential equations but instead is governed by rules or decisions that determine the new positions, velocities, and other states of agents [117]. Time is treated as a discrete variable in these models, meaning pedestrians make decisions at a future time step t + Δt based on the system’s state at time t. The time step Δt, which acts as a reaction time, has a direct physical meaning and can be used for model calibration [11].
A well-known type of decision-based model is the cellular automata (CA) model, where not only time but also space and pedestrian states (such as velocity) are discrete. In these models, pedestrians move on a lattice, typically square or hexagonal, with each cell representing a space of approximately 40 cm by 40 cm, which corresponds to a maximum density of 6.25 pedestrians per square meter [147]. The early pedestrian CA models were developed in the late 1990s [11,148,149,150,151].
In the floor field CA models, the rules and transition probabilities for moving to neighboring cells are derived from static and dynamic floor fields. The static floor field represents the pedestrian’s desired velocity, while the dynamic floor field models interactions with neighbors, inspired by the chemotaxis process observed in insects, like the use of pheromones by ants [152]. One critical aspect of CA models is handling conflicts, such as when two pedestrians simultaneously attempt to occupy the same cell. Solutions to these conflicts include priority rules, which may be random [153], or friction probabilities, where no pedestrian reaches the desired cell if a conflict arises, helping to explain clogging effects at bottlenecks [11,154].
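A minimal floor-field CA step can be sketched as follows. For brevity this sketch uses a deterministic greedy move toward the lowest static-field cell instead of the usual exponential transition probabilities, and it omits the dynamic field; conflicts are resolved by the random priority rule described above. The grid size and scenario are invented for the example.

```python
import numpy as np

def static_floor_field(shape, exit_cell):
    """Static floor field: distance of each cell to the exit (lower = better)."""
    ys, xs = np.indices(shape)
    return np.hypot(ys - exit_cell[0], xs - exit_cell[1])

def ca_step(occupied, field, rng):
    """One parallel update of a minimal floor-field CA. Each pedestrian
    proposes the free von Neumann neighbor with the lowest field value;
    conflicts (two proposals for one cell) use a random priority rule."""
    proposals = {}
    for (y, x) in occupied:
        best, best_val = (y, x), field[y, x]
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < field.shape[0] and 0 <= nx < field.shape[1]
                    and (ny, nx) not in occupied and field[ny, nx] < best_val):
                best, best_val = (ny, nx), field[ny, nx]
        proposals.setdefault(best, []).append((y, x))
    new_occupied = set()
    for target, movers in proposals.items():
        winner = movers[rng.integers(len(movers))]  # random priority rule
        new_occupied.add(target)
        for p in movers:
            if p != winner:
                new_occupied.add(p)                 # losers stay put
    return new_occupied

rng = np.random.default_rng(0)
field = static_floor_field((5, 5), exit_cell=(0, 0))
peds = {(4, 4), (4, 3)}                             # two pedestrians on the lattice
for _ in range(3):
    peds = ca_step(peds, field, rng)
```

Because both time and space are discrete, one update moves every pedestrian at most one cell, which is what ties the cell size (about 40 cm) and the time step to physical walking speeds.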
Recent advances in decision-based models have incorporated cognitive effects [155,156] and learning processes [157], further enhancing their ability to model pedestrian behavior in complex environments [11].
(B)
Trends During the Past Decades
The study of pedestrian dynamics is a relatively recent field, with foundational research and models emerging in the 1960s and 1970s [103,108,110,111]. Significant advancements, however, have primarily occurred over the past three decades. During the 2010s, there was a notable increase in experimental studies conducted in laboratory settings, focusing on various pedestrian flow scenarios such as uni-directional flow, counter-flow, bottlenecks, and intersecting flows. An extensive data archive related to these experiments is available in Germany [11,158].
In parallel with these experimental efforts, a range of KB pedestrian models, spanning from microscopic to macroscopic scales, has been developed [113,114,117,119,159]. The microscopic social force model by Helbing and Molnár is particularly prominent and widely referenced in the literature. Although traditional KB approaches, such as cellular automata and models analogous to fluid or gas dynamics, appear to have plateaued, they remain relevant. Microscopic force-based models and collision avoidance techniques continue to be significant, often serving as benchmarks for evaluating new methods, including those based on deep learning [11].
(C)
Knowledge-Based Models for Understanding and Predicting
Knowledge-based models aim to elucidate the mechanisms and fundamental parameters that govern pedestrian dynamics. A key aspect of these models is the consideration of body exclusion effects, which are responsible for phenomena such as jamming, clogging, and maximal density. KB models often rely on the fundamental diagram—a phenomenological relationship between flow and density, first highlighted in the 1960s. The shape and variability of this relationship continue to be subjects of active research [11,160,161,162,163,164,165,166].
Key parameters in KB models include the desired speed, agent size, and reaction time at microscopic scales, as well as maximal density and capacity at macroscopic scales. The number, nature, and estimation of these parameters are influenced by various factors such as flow type (e.g., uni-directional vs. bi-directional), context, and demographic characteristics like age and cultural background [118]. Simple microscopic rules can explain the macroscopic shapes of the fundamental diagram, with temporal parameters such as reaction time and time gaps being particularly relevant [11,143,167,168,169].
A significant highlight of KB models is their ability to identify self-organization phenomena and the emergence of coordinated dynamics at macroscopic scales. Multi-scale approaches help understand how individual microscopic behaviors lead to collective dynamics [123,124]. Examples of collective phenomena include lane formation [170,171], stop-and-go waves [172,173], freezing-by-heating effects [174,175], herding effects [153,174], and pattern formation at bottlenecks and intersections [104,176,177,178]. These self-organization phenomena are also observed in social systems and networks, such as opinion formation [11,179,180,181].
In the literature of statistical physics, similar phenomena are studied in non-equilibrium systems of self-driven or active particles, often referred to as active matter [114,182,183,184,185,186,187]. Understanding these complex non-linear dynamics across different scales remains a challenge and is an area of active research, particularly through data-based approaches [11,188,189,190,191,192].
In Table 1 and Table 2, we present a selection of important articles in the literature on knowledge-based pedestrian models, specifically focusing on microscopic approaches. This table highlights the key contributions of each article to the field.

3.3.2. The Deep Learning Approach

Deep learning, a subfield of machine learning, has gained significant attention due to its ability to process large datasets, recognize complex patterns, and extract meaningful insights across various domains. Unlike traditional machine learning methods that rely on manual feature engineering, deep learning utilizes deep neural networks (DNNs) with multiple hidden layers, enabling automatic hierarchical feature learning from raw data [201].
Several deep learning-based approaches have been proposed for pedestrian intention prediction. This section categorizes the most commonly used methods based on their deep neural network architectures. The primary architectures employed for this task include the following [12]:
Recurrent Neural Networks (RNNs), often in the form of Long Short-Term Memory (LSTM) networks; Convolutional Neural Networks (CNNs); Generative Adversarial Networks (GANs); and autoencoders.
Each of these architectures offers distinct advantages and challenges, which influence their application in various aspects of pedestrian intention prediction. RNNs and LSTMs are particularly valuable for capturing temporal dependencies in pedestrian behavior, CNNs excel in processing and analyzing visual data, GANs enhance the model’s ability to generate and simulate realistic pedestrian behaviors, and autoencoders contribute to efficient data representation and feature learning.
The integration of these methods into comprehensive prediction systems allows for a more nuanced understanding of pedestrian intentions, ultimately improving the safety and reliability of autonomous driving systems.
(A)
Pedestrian Behavior Prediction Using RNNs
Recurrent Neural Networks (RNNs), particularly in their basic form known as Vanilla RNNs, extend the capabilities of standard two-layer fully connected networks by incorporating feedback loops within the hidden layer (see Figure 6). This enhancement allows RNNs to process sequential data more effectively by considering both current input and information from previous time steps, which is preserved in the hidden neurons. RNNs are crucial in sequence-based predictions and have broad applications across various domains. To overcome the limitations in retaining long-term information, the Long Short-Term Memory (LSTM) architecture was introduced. Initially successful in natural language processing (NLP), LSTMs have also proven effective in pedestrian trajectory prediction [12].
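The recurrence itself is compact enough to write out directly. The sketch below runs an untrained vanilla RNN cell over a short trajectory of (x, y) positions; the weights are random placeholders, and a practical predictor would use trained LSTM or GRU layers instead, but the hidden-state feedback that carries history forward is the same.

```python
import numpy as np

rng = np.random.default_rng(42)

# A vanilla RNN cell: the hidden state feeds back into itself, so the
# state at time t depends on the entire input history up to t.
d_in, d_h = 2, 8                        # e.g. an (x, y) position per frame
W_x = rng.normal(0.0, 0.3, (d_h, d_in)) # input-to-hidden weights
W_h = rng.normal(0.0, 0.3, (d_h, d_h))  # hidden-to-hidden (the feedback loop)
b = np.zeros(d_h)

def rnn_forward(sequence):
    """Process a trajectory of shape (T, d_in) one step at a time."""
    h = np.zeros(d_h)
    for x in sequence:
        # Current input combined with the previous hidden state
        h = np.tanh(W_x @ x + W_h @ h + b)
    return h                             # summary of the whole sequence

traj = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [0.3, 0.1]])
h_final = rnn_forward(traj)
```

Because the hidden state is updated multiplicatively through `W_h` at every step, gradients through long sequences vanish or explode, which is precisely the limitation the LSTM's gated memory cell was introduced to address.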
(B)
Pedestrian Behavior Prediction Using CNNs
The convolutional neural network (CNN) is a type of deep neural network (DNN) known for its strong performance in various domains, including object classification and recognition, such as identifying handwritten digits, letters, and faces. As shown in Figure 7, a typical CNN architecture consists of multiple layers, including convolutional layers, non-linearity layers, pooling layers, dropout, batch normalization, and fully connected layers. Through the process of training and optimization, CNNs learn to extract object features. By carefully selecting the network architecture and parameters, these features can capture the most important discriminative information necessary for the accurate identification of the target objects [12].
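The core operations are easy to demonstrate in a few lines. The toy example below applies a hand-crafted vertical-edge kernel, a ReLU non-linearity, and max pooling to a synthetic 6×6 image; in a real CNN the kernels are learned during training rather than specified by hand, and the image is invented for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: the core operation of a CNN layer."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)            # non-linearity layer

def max_pool(x, size=2):
    """Non-overlapping max pooling: keeps the strongest response per patch."""
    H2, W2 = x.shape[0] // size, x.shape[1] // size
    return x[:H2 * size, :W2 * size].reshape(H2, size, W2, size).max(axis=(1, 3))

# A toy 6x6 "image" with a bright right half; the kernel responds to
# left-to-right intensity increases (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 1.0]])
feat = max_pool(relu(conv2d(image, edge_kernel)))
```

The pooled feature map fires only where the edge lies, illustrating how stacked convolution, non-linearity, and pooling layers reduce an image to discriminative features.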
(C)
Pedestrian Behavior Prediction Using Generative Adversarial Networks (GANs)
Generative adversarial networks (GANs) operate on a generator (G)–discriminator (D) framework, where the two networks are in constant competition: the generator tries to deceive the discriminator by creating fake data, while the discriminator adapts to recognize these forgeries. In a GAN setup, both models are trained simultaneously (as shown in Figure 8).
In the context of tracking, GANs help reduce the fragmentation often seen in conventional trajectory prediction models and lessen the need for computationally expensive appearance features. The generative component generates and updates candidate observations, discarding the least updated ones. To process and classify these candidate sequences, an LSTM component is used alongside the generative–discriminative model. This approach can produce highly accurate models of human behavior, especially in group settings, while being significantly more lightweight compared to traditional CNN-based solutions. Recently, GAN architectures have been employed by many researchers to achieve multi-modality in prediction outputs [12].
(D)
Pedestrian Behavior Prediction Using Autoencoders
Autoencoders are neural networks designed to learn efficient representations of data, typically for the purpose of dimensionality reduction or feature learning. In the context of pedestrian behavior prediction, autoencoders are used to encode high-dimensional trajectory data into a lower-dimensional latent space, capturing the essential features of pedestrian movements (illustrated in Figure 9). This latent representation is then decoded to predict future behavior. Autoencoders are particularly useful for handling noisy data and extracting meaningful patterns that can improve prediction accuracy.
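The encode-decode idea can be made concrete with the optimal linear autoencoder, obtained in closed form via SVD (equivalently, PCA). A neural autoencoder replaces the linear maps with learned non-linear ones, but the compression of a high-dimensional trajectory into a small latent code and its reconstruction follow the same pattern. The dataset and dimensions below are synthetic and invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dataset: 200 straight-line trajectories of 8 (x, y) points each,
# flattened into 16-D vectors. Their true degrees of freedom (speed and
# heading) are few, so a small latent space captures them.
T = 8
t = np.linspace(0.0, 1.0, T)
speeds = rng.uniform(0.5, 2.0, 200)
angles = rng.uniform(0.0, 2 * np.pi, 200)
trajs = np.stack([np.concatenate([s * t * np.cos(a), s * t * np.sin(a)])
                  for s, a in zip(speeds, angles)])      # shape (200, 16)

# Optimal *linear* autoencoder: encoder/decoder given by the top
# principal components (a stand-in for a trained neural autoencoder).
mean = trajs.mean(axis=0)
_, _, Vt = np.linalg.svd(trajs - mean, full_matrices=False)
k = 3                                     # latent dimensionality

def encode(x):
    return (x - mean) @ Vt[:k].T          # 16-D trajectory -> 3-D latent code

def decode(z):
    return z @ Vt[:k] + mean              # 3-D latent code -> 16-D trajectory

z = encode(trajs)                         # latent codes, shape (200, 3)
recon = decode(z)
err = np.mean((recon - trajs) ** 2)
```

Because the synthetic trajectories really do live on a low-dimensional manifold, the reconstruction error is essentially zero, which is the property a trained autoencoder exploits when denoising real, noisy trajectory data.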

3.3.3. Recent Advances in Deep Learning-Based Models

Xin et al. [202] tackled the challenge of long-horizon trajectory prediction for surrounding vehicles using an intention-aware LSTM architecture. Their model captures high-level spatial-temporal features of driver behavior and was trained on the NGSIM dataset, achieving precise predictions with longitudinal and lateral errors of less than 5.77 m and 0.49 m, respectively.
Lee et al. [98] introduced the DESIRE framework, which leverages deep stochastic inverse-optimal-control RNN encoders. This approach generates potential future trajectories using a conditional variational autoencoder and ranks them through an RNN model incorporating inverse optimal control. By considering scene context and multi-modal prediction, their method demonstrated strong performance on the KITTI and Stanford Drone datasets.
Zheng et al. [203] proposed a hierarchical policy model to predict both micro-actions and macro-goals. This method integrates recurrent convolutional neural networks with supervised learning and attention mechanisms. Zhan et al. [204] later extended this approach using variational RNNs.
Martinez et al. [205] developed an RNN architecture based on gated recurrent units (GRUs), which models velocities instead of absolute angles. This “residual architecture” emphasizes first-order motion derivatives, improving accuracy and generating smoother trajectory predictions.
Salzmann et al. [206] proposed a trajectory forecasting method for diverse agents, such as pedestrians and vehicles, by integrating heterogeneous data and semantic classifications. Their approach was evaluated on the ETH, UCY, and nuScenes datasets, demonstrating its effectiveness in predicting future movements across different environments.
Refer to Table 3, Table 4, Table 5 and Table 6 for a summary of the experimental results from the deep learning models discussed above for predicting pedestrian trajectories.

3.3.4. The Ensemble Approach

The ensemble approach combines KB models and DL algorithms to leverage their respective strengths. KB models are advantageous for their interpretable parameters and low data requirements, while DL algorithms offer high predictive power but require extensive data. This synergy has been applied across various fields, such as climate pattern discovery [225,226], material science [227,228], quantum chemistry [229], and biomedical imaging [11,230,231].
Three main methods exist for integrating KB and DL approaches. The first involves using KB models to improve DL algorithms. For example, KB models can generate synthetic data for training neural networks, as seen in autonomous vehicle training [232,233]. Other strategies include knowledge-guided design of neural network architectures, such as embedding the social force model into a neural network for predicting human motion [234], and using knowledge-guided loss functions to ensure outputs adhere to physical laws [11,235,236].
The second method enhances KB models using DL algorithms. This includes residual modeling, where DL corrects the errors of KB models, and parameter calibration using DL, which has been applied in vehicle trajectory prediction [237] and pedestrian motion modeling [11,238,239,240].
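The residual-modeling idea admits a compact sketch: a constant-velocity KB predictor is corrected by a data-driven term fitted to the KB model's own errors. Here ridge regression stands in for the deep network, and the "curb" deceleration scenario, step sizes, and features are all invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic walkers that decelerate as they approach x = 10 (a "curb"),
# a behavior a constant-velocity KB model cannot capture.
def simulate(x0, v0, steps=20, dt=0.1):
    xs, x, v = [x0], x0, v0
    for _ in range(steps):
        v = v + dt * (-0.8 * v * max(0.0, 1.0 - (10.0 - x) / 5.0))
        x = x + dt * v
        xs.append(x)
    return np.array(xs)

# Build (state, KB error) pairs: the residual the data-driven part must learn.
X, y = [], []
for _ in range(200):
    xs = simulate(rng.uniform(0.0, 8.0), rng.uniform(0.8, 1.6))
    v_est = np.diff(xs) / 0.1
    for i in range(1, len(xs) - 1):
        kb_pred = xs[i] + 0.1 * v_est[i - 1]   # constant-velocity prediction
        X.append([xs[i], v_est[i - 1]])
        y.append(xs[i + 1] - kb_pred)          # KB model's error (residual)
X, y = np.array(X), np.array(y)

# Stand-in for the DL corrector: ridge regression on simple features.
F = np.column_stack([X, X[:, 0] * X[:, 1], np.ones(len(X))])
w = np.linalg.solve(F.T @ F + 1e-6 * np.eye(F.shape[1]), F.T @ y)

def hybrid_predict(x, v):
    kb = x + 0.1 * v                           # KB part
    feats = np.array([x, v, x * v, 1.0])
    return kb + feats @ w                      # plus the learned residual

resid_rmse = np.sqrt(np.mean((F @ w - y) ** 2))  # hybrid residual error
kb_rmse = np.sqrt(np.mean(y ** 2))               # pure KB residual error
```

The hybrid predictor's remaining error is smaller than the KB model's alone, while the KB backbone keeps the prediction sensible even where training data are sparse.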
Finally, the ensemble approach helps address the limitations of DL algorithms in scenarios with limited data by incorporating the structured knowledge from KB models, making it a promising method for improving pedestrian intention and trajectory predictions [11].

3.3.5. Visualization of PIP Classification Systems

In the previous sections, we provided a comprehensive overview of various pedestrian intention prediction classification methods. To visualize a PIP classification system, a proposed overarching framework for a behavior prediction system is illustrated in Figure 10. This framework typically consists of several key components. Initially, a camera sensor captures RGB images that are processed by detection algorithms, which identify both static and dynamic objects on the road, such as vehicles, pedestrians, traffic lights, and road signs. Each detected object is then assigned a unique ID, enabling the system to track its past trajectories, which are essential for further analysis.
The image processing algorithms utilize the RGB images and past trajectories to generate various forms of data, including optical flow, depth, appearance, and both global and local context images. These images provide critical context, such as cropping specific areas around detected objects to focus on localized regions for deeper analysis. To better understand interactions between different traffic agents, the system employs interaction representation algorithms. These algorithms calculate distances between objects, construct graph networks, and generate grid maps that reflect these interactions. Additional features are derived from the objects’ trajectories and internal sensor data from the AV, such as steering angles and velocities.
The outputs of the perception module feed into the behavior prediction module, which includes automated feature extraction and embedding layers. These components generate feature vectors that capture the spatial and temporal properties of the inputs, which are then used to predict various aspects of object behavior, including future trajectories and intentions. Finally, these predictions are utilized by the AV’s planning module to make decisions that ensure safe navigation [13].

4. Uncertainty Measurement

While pedestrian intention prediction models have made significant advancements, they still face challenges due to the inherent uncertainty in human behavior and environmental conditions. Accurate predictions are often hindered by unpredictable pedestrian actions, sensor limitations, and varying external factors. To address these challenges, it is essential to understand and quantify uncertainty in predictions. This section explores different types of uncertainty—epistemic and aleatoric—and their implications for pedestrian intention prediction. It further discusses how uncertainty impacts autonomous vehicle decision-making and presents strategies for effectively handling uncertainty in predictive models.
Uncertainty is a common factor in many real-world scenarios, including financial investments, medical diagnoses, sports outcome predictions, and weather forecasting. In these domains, decisions are based on available data and the inherent uncertainty of that data. Machine learning and deep learning models are increasingly applied for decision-making and inference across various fields. Given the widespread adoption of artificial intelligence (AI), assessing the reliability and effectiveness of AI systems before deployment has become critical [241], as their predictions are often affected by noise and errors in model inference. Consequently, accurately and reliably representing uncertainty is essential in AI systems. Uncertainty principles play a fundamental role in machine learning [242] and deep learning, particularly in applications like pedestrian intention prediction, which are crucial for AVs [243,244,245].
Uncertainty in pedestrian intention prediction refers to the inherent unpredictability and lack of precise knowledge about a pedestrian’s future actions or movements. This uncertainty arises from various factors, including the dynamic nature of pedestrian behavior, environmental conditions, and limitations in sensor data. These challenges make it difficult to accurately predict a pedestrian’s intentions, creating significant hurdles for autonomous vehicle systems. In general, as shown in Figure 11, uncertainty in pedestrian intention prediction can be categorized into epistemic uncertainty (model uncertainty) and aleatoric uncertainty (data or environmental uncertainty).

4.1. Epistemic Uncertainty (Model Uncertainty)

Epistemic uncertainty arises from a lack of knowledge or information about the system, which can potentially be reduced by gathering more data or improving the model’s design and training. This type of uncertainty reflects the limitations in the model’s understanding of the environment and pedestrian behavior.
This type of uncertainty may arise from the following:
Limited Contextual Information
When the model lacks access to all relevant information—such as the pedestrian’s past actions, intentions, or environmental conditions—its predictions become less certain.
Incomplete Modeling
Models may not fully capture the complex dynamics of pedestrian behavior, including social interactions or the influence of external factors such as weather, road conditions, or nearby traffic signals.
For example, consider a scenario where a pedestrian approaches an intersection. If the model is unaware of nearby road signs, the pedestrian’s history of behavior, or the presence of other pedestrians, it may struggle to predict whether the pedestrian intends to cross the street or wait for a signal. This gap in knowledge results in epistemic uncertainty.

4.2. Aleatoric Uncertainty (Data or Environmental Uncertainty)

Aleatoric uncertainty arises from inherent randomness or noise in the environment or the data. Unlike epistemic uncertainty, this type of uncertainty cannot be reduced by gathering more data, as it reflects the unpredictable and stochastic nature of the real world.
This type of uncertainty may arise from the following:
Human Behavior Randomness
Pedestrian movements are often highly unpredictable, especially in dynamic environments where social interactions, distractions, or personal intentions influence behavior.
Environmental Variability
Variations in environmental factors, such as road layout, traffic conditions, and weather, can introduce additional noise into predictions. External influences like the actions of nearby pedestrians or vehicles can further complicate prediction.
Sensor Noise
Inaccuracies in sensor data, caused by occlusions, low resolution, or adverse conditions (e.g., poor lighting, rain, or fog), can distort the input data and contribute to uncertainty in predictions.
For example, consider a pedestrian who suddenly stops or changes direction due to an unexpected distraction, such as receiving a phone call or reacting to another person’s actions. Such behavior introduces randomness that makes it challenging for predictive models to accurately anticipate their future trajectory.
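Aleatoric uncertainty can also be illustrated numerically: when the spread of an outcome grows with some input, collecting more data sharpens the estimate of that spread without shrinking the spread itself. The "crossing delay vs. distance" story and all numbers below are invented; in a deep model, the per-bin statistics would be replaced by a network head that outputs a mean and a variance, trained with a Gaussian negative log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical crossing delay vs. distance to the curb: the *spread* of
# behavior grows with distance, i.e. the noise is heteroscedastic (aleatoric).
def sample(n):
    x = rng.uniform(0.0, 10.0, n)             # distance to curb [m]
    noise_std = 0.1 + 0.2 * x                 # more distant -> less predictable
    y = 2.0 + 0.5 * x + rng.normal(0.0, noise_std)
    return x, y

def fit_binned(x, y, edges):
    """Per-bin mean and std: a tiny stand-in for a network that outputs
    both a predicted value and a predicted (aleatoric) variance."""
    idx = np.digitize(x, edges) - 1
    mu = np.array([y[idx == b].mean() for b in range(len(edges) - 1)])
    sd = np.array([y[idx == b].std() for b in range(len(edges) - 1)])
    return mu, sd

edges = np.linspace(0.0, 10.0, 6)             # five bins over the input range
x, y = sample(5_000)
mu, sd = fit_binned(x, y, edges)

x_big, y_big = sample(50_000)                 # ten times more data...
_, sd_big = fit_binned(x_big, y_big, edges)
# ...yet the estimated spread stays: aleatoric noise is irreducible.
```

More data reduces the uncertainty *about* the spread (an epistemic quantity) but leaves the spread itself intact, which is exactly the epistemic/aleatoric distinction drawn above.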

4.3. Importance of Uncertainty in Pedestrian Intention Prediction

Understanding and handling uncertainty is essential because pedestrians’ intentions can change rapidly due to dynamic and unpredictable environmental factors. For AVs and pedestrian tracking systems, accounting for uncertainty provides several critical benefits:
Safer Decision-Making
Recognizing the uncertainty in predictions allows systems to adopt more cautious behaviors, minimizing the risk of accidents in complex or ambiguous scenarios.
Robustness in Real-Time Prediction
Handling uncertainty enables systems to maintain strong performance in real-time, even when they lack complete knowledge of all influencing factors, such as environmental changes or unseen pedestrian behavior patterns.
Improved Interaction with Other Road Users
By incorporating uncertainty, models can better predict how pedestrians might react to other vehicles, pedestrians, or environmental conditions. This ensures safer and more harmonious interactions among all road users.

4.4. Addressing Uncertainty in Pedestrian Intention Prediction

Deep Neural Networks (DNNs) have demonstrated remarkable success in diverse domains such as language modeling [246], speech recognition [247], and image classification [248]. However, while DNNs are widely used for image classification, their deployment in high-stakes applications like AVs [249] and healthcare [250] presents notable challenges. A key concern in these areas is the ability to estimate uncertainty, a capability not inherently provided by standard DNN training methods [251]. To address both epistemic and aleatoric uncertainty, several strategies can be employed:
Probabilistic Models
Models such as Gaussian processes or Bayesian networks can quantify uncertainty and predict a range of possible outcomes, rather than offering a single deterministic trajectory. These models provide more flexible predictions, accounting for various potential pedestrian behaviors.
Ensemble Methods
By combining multiple models with different underlying assumptions, ensemble methods help reduce overall uncertainty by averaging predictions. This approach allows the system to consider diverse scenarios and enhances robustness.
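A minimal version of the ensemble idea: fit several cheap models on bootstrap resamples and read their disagreement as epistemic uncertainty, which grows sharply outside the training distribution. Polynomial regressors stand in for neural ensemble members here, and the sine-wave data are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data only cover x in [0, 5]; beyond that range the ensemble
# members are unconstrained and free to disagree.
x_train = rng.uniform(0.0, 5.0, 200)
y_train = np.sin(x_train) + rng.normal(0.0, 0.1, x_train.size)

def fit_member(x, y, degree, rng):
    """One ensemble member: a polynomial fit on a bootstrap resample."""
    idx = rng.integers(0, x.size, x.size)
    return np.polyfit(x[idx], y[idx], degree)

# Members differ in both their bootstrap sample and their capacity,
# mimicking an ensemble of networks with different initializations.
members = [fit_member(x_train, y_train, d, rng) for d in (3, 4, 5, 4, 3)]

def ensemble_predict(x):
    preds = np.array([np.polyval(c, x) for c in members])
    # Mean prediction plus the spread across members (epistemic uncertainty)
    return preds.mean(axis=0), preds.std(axis=0)

x_in = np.array([2.5])     # inside the training range
x_out = np.array([9.0])    # far outside it
_, std_in = ensemble_predict(x_in)
_, std_out = ensemble_predict(x_out)
```

Inside the data the members agree closely; far outside, their extrapolations diverge, so a downstream planner can detect that the prediction should not be trusted.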
By incorporating these strategies, pedestrian intention prediction systems can deliver more reliable and safer predictions, especially in complex and dynamic environments. A schematic comparison of three uncertainty models, namely MC dropout, the bootstrap model, and the Gaussian Mixture Model (GMM), is presented in Figure 12. Additionally, Figure 13 shows two graphical representations of uncertainty-aware models: BNN and OoD (Out-of-Distribution). The next section provides a brief overview of various approaches to uncertainty quantification.
The hierarchical structure of PIP, including the uncertainty option, is presented as a block diagram in Figure 14. Additionally, Table 7 provides an overview of studies that utilize uncertainty approximation in PIP for various applications.

4.5. Balancing Computational Efficiency and Safety

In the context of pedestrian intention prediction, handling uncertainty requires not only accurate models but also the ability to balance computational demands with safety considerations. Achieving a balance between the two remains a significant challenge in the presence of both epistemic and aleatoric uncertainty. Several studies have explored different methods to address this trade-off:
Bayesian Neural Networks (BNNs)
These networks incorporate uncertainty to improve safety, but they are computationally expensive. Nayak et al. [260] applied BNNs for trajectory forecasting, which enhances predictive confidence but requires substantial computational resources.
Transformer-based Models
These models achieve both high accuracy and real-time performance, offering a promising solution to balancing efficiency and safety. Xie et al. [261] introduced GTransPDM, a transformer-based model that efficiently decouples spatial information and predicts pedestrian intention with both high accuracy and real-time performance. The model achieves 92% accuracy on the PIE dataset with an inference time of just 0.05 ms, making it a viable option for real-time systems.
Real-time Tracking with Kalman Filter
For efficient real-time prediction, the Kalman filter remains a widely used technique due to its low computational overhead. However, while these systems are effective in simpler environments, they may struggle in dynamic or complex scenarios. Guo et al. [262] combined Camshift with the Kalman filter for pedestrian tracking, enabling real-time operation, though it may struggle in highly dynamic settings.
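The appeal of the Kalman filter is how little computation one predict/update cycle requires. The sketch below tracks a synthetic pedestrian under a constant-velocity motion model; the noise covariances and the straight-line ground truth are illustrative choices, not values from the cited work.

```python
import numpy as np

dt = 0.1
# Constant-velocity motion model: state = [x, y, vx, vy]
F = np.eye(4); F[0, 2] = F[1, 3] = dt             # state transition
H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0     # only position is measured
Q = 0.01 * np.eye(4)                              # process noise (illustrative)
R = 0.25 * np.eye(2)                              # measurement noise (illustrative)

def kf_step(x, P, z):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict: propagate the state and its covariance
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: fuse the position measurement z
    S = H @ P @ H.T + R                           # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

def truth(k):
    # Hypothetical pedestrian walking at (1.0, 0.5) m/s
    return np.array([1.0 * k * dt, 0.5 * k * dt])

rng = np.random.default_rng(5)
x_est, P = np.zeros(4), np.eye(4)
for k in range(1, 60):
    z = truth(k) + rng.normal(0.0, 0.3, 2)        # noisy detection
    x_est, P = kf_step(x_est, P, z)
```

Each cycle costs only a handful of small matrix products, which is why the filter remains a default for real-time tracking; its weakness, as noted above, is the fixed linear motion model, which cannot represent sudden stops or turns.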
Multi-task Learning Frameworks
These frameworks combine trajectory and intention prediction, improving both efficiency and robustness. PTINet, developed by Munir and Kucner [263], simultaneously predicts pedestrian trajectory and intention. By incorporating contextual features, PTINet provides a more efficient solution while maintaining robust performance in various environments.
These techniques demonstrate that managing the trade-off between computational demands and safety is an ongoing challenge. Future research may focus on dynamic models that adjust their complexity based on context, balancing both computational efficiency and safety.

4.6. Impact of Traffic Regulations on Pedestrian and Trajectory Prediction

Traffic regulations play a critical role in shaping pedestrian and vehicle behavior, and their integration into pedestrian intention prediction models can significantly reduce uncertainty and improve the safety of AVs. Pedestrian behavior is often influenced by external regulatory factors, such as crosswalk rules, traffic light patterns, and right-of-way laws, which can serve as constraints in predicting pedestrian movements. Incorporating these regulatory constraints into prediction models helps improve their accuracy and reliability by narrowing down the possible pedestrian intentions.

4.6.1. Key Regulatory Constraints

Crosswalk Rules
Pedestrians are generally expected to cross streets at designated crosswalks. Integrating this constraint allows prediction models to assume that pedestrians will follow this pattern, especially when the AV can detect the presence of a crosswalk. This reduces epistemic uncertainty, as the model can predict pedestrian behavior more confidently when it recognizes that a pedestrian is approaching a crosswalk.
Traffic Light Patterns
Pedestrians typically obey traffic signals, waiting for the green pedestrian light to cross. Accounting for these signals in pedestrian trajectory prediction models helps to reduce uncertainty by establishing predictable patterns for when pedestrians are likely to cross roads. This is particularly useful in urban environments, where AVs can interact with multiple traffic lights and pedestrian signals.
Right-of-Way Laws
Pedestrian right-of-way laws influence when pedestrians choose to cross streets. In many jurisdictions, pedestrians have the right of way at intersections, meaning they are more likely to begin crossing when traffic is stopped. This factor can be incorporated into models to enhance predictions of pedestrian behavior in regulated traffic environments.
Speed Limits and Vehicle Behavior
Regulations governing vehicle speed also impact pedestrian safety and behavior. Pedestrians may be more likely to cross streets when vehicles are moving at lower speeds, as they feel safer. Additionally, AVs can adjust their behavior in response to speed limits, ensuring that they respect the safety of pedestrians by slowing down or stopping when necessary.

4.6.2. Incorporating Regulations into Prediction Models

Incorporating these regulations into pedestrian intention prediction models offers several advantages:
Reduction in Epistemic Uncertainty
By embedding these constraints into models, AV systems can better anticipate pedestrian behavior, reducing the uncertainty associated with human unpredictability.
Context-Aware Prediction
Regulations provide additional context for understanding pedestrian behavior. For instance, if the pedestrian is at a crosswalk or waiting at a red pedestrian light, the system can more confidently predict that they will wait or cross, respectively.
Safer Autonomous Decision-Making
By adhering to traffic regulations, AVs can make safer decisions, anticipating pedestrian movements more accurately and adjusting their behavior accordingly to avoid accidents. As pedestrian behavior often follows patterns governed by these regulations, the integration of regulatory information into prediction models can enhance the AV’s ability to safely interact with pedestrians in complex environments.
While incorporating regulations into pedestrian intention prediction models is beneficial, several challenges remain:
Variability in Regulations
Traffic regulations can vary across different regions, making it necessary for AV systems to adapt to local laws and rules.
Model Flexibility
Predictive models must be flexible enough to handle varying levels of enforcement of traffic regulations and the possibility that pedestrians may not always follow them.
Real-Time Adaptation
AVs must be capable of integrating real-time sensor data and adjusting predictions based on observed pedestrian behavior, which may deviate from expected regulatory patterns due to environmental factors, distractions, or individual pedestrian choices.
In conclusion, by incorporating regulatory constraints into pedestrian intention prediction, AV systems can make more accurate predictions, ultimately improving safety and decision-making. Future research should focus on developing more sophisticated models that integrate both traffic regulations and real-time sensor data, enabling AVs to navigate complex, regulated environments safely and efficiently.

5. Datasets

The foundation of any successful pedestrian intention prediction model lies in the quality and comprehensiveness of the datasets used for its development. In the domain of automated driving, where accurate and timely predictions of pedestrian behavior are critical for safety and efficiency, the choice of dataset becomes even more crucial. These datasets provide the raw data needed for training and testing models and shape the models’ performance and generalizability across different environments and scenarios.
This section delves into the various aspects that define a robust dataset for pedestrian intention prediction. We start by outlining the essential prerequisites that such datasets must meet to be effective in real-world applications. Following this, we review some of the most popular and widely used datasets in this field, highlighting their unique features and contributions to the advancement of pedestrian intention prediction. Additionally, we discuss the types of sensors commonly employed in creating these datasets, along with their advantages and limitations. Finally, a table summarizing key studies that have utilized these datasets is presented, offering insights into their application and impact on the research community.
Refer to Table 8 for a comprehensive summary of the datasets frequently used in pedestrian intention prediction research.

5.1. Dataset Requirements for Pedestrian Intention Prediction

The effectiveness of pedestrian intention prediction models heavily relies on the quality and characteristics of the datasets used for training and evaluation. Some of the essential prerequisites for datasets in this domain, particularly for applications in automated driving, are outlined as follows [4,275]:
Naturalistic Data
The dataset should capture traffic agents behaving naturally, uninfluenced by the method of data collection, such as visible sensors.
Size
Data-driven methods rely on large datasets for training to ensure accurate performance during testing. Consequently, the dataset should contain a sufficient number of trajectories or instances from diverse traffic agents.
Heterogeneity
The dataset should be collected from diverse locations and times, encompassing different types of road infrastructure, traffic densities, legal and implied traffic norms, and pedestrian social interactions.
Accuracy
The positional accuracy of the recorded trajectories should be high, with positioning errors not exceeding 0.1 m, to minimize errors in model predictions.

5.2. Popular Datasets for Pedestrian Intention Prediction

Several well-known datasets have been developed and widely adopted in the field of pedestrian intention prediction. These datasets offer a range of scenarios and challenges that are critical for advancing research and development in this domain. Below are descriptions of these datasets, highlighting their unique features and contributions:
JAAD (Joint Attention in Autonomous Driving)
The JAAD dataset focuses on the behavioral aspects of pedestrians in traffic scenarios, providing detailed annotations related to pedestrian intention, such as crossing and stopping. Despite its usefulness in studying pedestrian awareness, the action labels in JAAD can sometimes be ambiguous and may not always accurately represent pedestrian intentions. For example, if a pedestrian waits for an ego-vehicle to pass before crossing, this may be labeled as “not crossing”, which could be misleading [276].
PIE (Pedestrian Intention Estimation)
Created by the same team behind JAAD, the PIE dataset addresses ambiguities in JAAD by explicitly distinguishing between crossing and non-crossing intentions rather than merely annotating pedestrian actions. It provides rich visual context, including vehicle-pedestrian interactions, environmental factors, and temporal dynamics, making it a valuable resource for studying pedestrian behavior prediction in complex and dynamic settings [211].
STIP (Spatio-Temporal Intention Prediction)
The STIP dataset is notable for its multi-camera setup, which includes cameras mounted on the front, left, and right of the ego-vehicle, offering a comprehensive view of the traffic scene. This wider perspective helps in better understanding pedestrian intentions in relation to the entire environment, making STIP a valuable resource for advanced PIP models. It provides annotations for pedestrian intentions in various scenarios, thereby widening the scope of research in this field [277].
TITAN (Trajectory Information and Tracking Analysis)
TITAN provides a comprehensive collection of pedestrian trajectories in urban environments, accompanied by detailed annotations that facilitate both trajectory and intention prediction tasks. A key feature of TITAN is its unique 5-tier non-conflicting action labeling system, which ranges from basic postures to high-level contextual interpretations. This extensive annotation framework helps eliminate ambiguities in forecasting future trajectories by capturing a diverse range of pedestrian behaviors. Additionally, TITAN includes labels for transportive and communicative actions, further enhancing its utility for in-depth pedestrian behavior analysis [278].
DPDD (Driving-Pedestrian Dynamics Dataset)
The DPDD dataset is among the earliest to include annotated pedestrian actions and offers ground-truth data for pedestrian intention analysis. However, since it was recorded in a controlled environment with actors simulating pedestrian behavior, its realism is limited. Due to the absence of natural pedestrian density and common real-world occlusions, DPDD is less suitable for training models compared to more naturalistic datasets [279].
BDD100K (Berkeley DeepDrive 100K)
BDD100K is a large-scale, diverse dataset built to support a wide range of tasks, including object detection, tracking, and behavioral cue analysis. It captures a vast array of variations present in traffic scenarios, from different appearances and pose configurations of objects or people to various environmental conditions. BDD100K’s comprehensive coverage makes it an invaluable resource for training robust PIP models [280].
ETH/UCY
Although primarily used for trajectory prediction, the ETH and UCY datasets have been adapted for intention prediction by extracting pedestrian behavioral cues and contextual information from the scenes [136,281].
LOKI (Long Term and Key Intentions)
Recognizing the importance of human intention prediction for trajectory estimation, LOKI was released by Girase et al. [282] as a dataset targeting the anticipation of human intentions over both short-term and long-term temporal horizons. LOKI’s detailed annotations make it a key resource for improving the predictability of pedestrian actions in real-world scenarios [4].
PSI (Pedestrian Situated Intent)
To address limitations in existing datasets, such as the lack of human-level reasoning and the ability to handle sudden intention changes, Chen et al. [283] introduced the PSI dataset. PSI integrates visual and language cues to improve the explainability of human behavior prediction algorithms. This novel approach aims to significantly enhance the cognitive abilities of AI models in understanding pedestrian behaviors across diverse scenarios [4].
Waymo Open Dataset
The Waymo Open Dataset is a large-scale, high-quality dataset designed to advance research in autonomous driving, collected from Waymo’s fleet of self-driving vehicles. It includes a wide range of real-world driving scenarios and offers detailed sensor data such as camera images, LiDAR point clouds, and rich annotations for various road users, including pedestrians. The dataset is particularly valuable for pedestrian intention prediction as it provides comprehensive multi-sensor data from multiple perspectives around the vehicle, capturing diverse environmental conditions and complex interactions in urban and suburban settings. This makes the Waymo Open Dataset an essential resource for developing and evaluating models that aim to predict pedestrian behavior in dynamic, real-world environments [284].
While these datasets offer valuable resources for the PIP domain, they also highlight the ongoing challenges in this field. Datasets like DPDD, despite their early contributions, face limitations in realism and variability, which newer datasets like BDD100K and PSI aim to overcome. The evolution of these datasets reflects the growing complexity and demands of pedestrian intention prediction, pushing the boundaries of what AI models can achieve in understanding and predicting human behavior.

5.3. Sensors Used in Pedestrian Intention Prediction Datasets

The creation of PIP datasets relies on various sensors that capture the necessary data for analysis and model training. Each sensor type has its advantages and limitations, which can impact the quality and applicability of the dataset. Below is an overview of the commonly used sensors:
Camera
Cameras are one of the most widely used sensors for capturing pedestrian behavior in urban environments. They provide rich visual information that can be used to extract features such as body posture, gestures, and contextual cues (e.g., traffic lights, crosswalks). These features are essential for understanding pedestrian intentions and predicting their movements accurately.
Figure 15 shows an example of a camera used for this purpose, illustrating how cameras are strategically positioned to maximize coverage and monitor pedestrian dynamics effectively, ensuring the relevant data are captured for further analysis.
  • Pros: High-resolution imagery, ability to capture contextual and environmental details, and compatibility with deep learning methods for visual recognition.
  • Cons: Sensitive to lighting conditions (e.g., low light, glare), occlusions, and weather-related issues like rain or fog, which can degrade image quality.
LiDAR
LiDAR sensors, as illustrated in Figure 16 and Figure 17, emit laser pulses to measure distances and generate high-precision 3D maps of the environment. These sensors are crucial for accurately detecting the position, size, and shape of pedestrians and other objects, providing detailed spatial information that aids in understanding complex environments.
  • Pros: High accuracy in distance measurement, effective in low-light conditions, and capable of generating detailed 3D point clouds.
  • Cons: High cost, large data storage requirements, and performance can be affected by weather conditions such as heavy rain or fog.
Radar
Radar sensors use radio waves to detect the speed, distance, and movement of objects, including pedestrians, as shown in Figure 18. They are particularly effective in detecting objects in adverse weather conditions, such as heavy rain, fog, or snow, where other sensors like cameras might struggle.
  • Pros: Robust performance in all weather conditions, capable of measuring velocity directly, and lower cost compared to LiDAR.
  • Cons: Lower resolution compared to cameras and LiDAR, which can result in less detailed object detection and limited capability to capture fine-grained details.
Infrared (IR) Sensors
Infrared sensors detect the heat emitted by objects, such as pedestrians, making them useful for detection in low-visibility conditions. They are frequently used alongside other sensors to improve accuracy in challenging environments (see Figure 19).
  • Pros: Effective in low-light and nighttime conditions and capable of detecting heat signatures, which can be used to identify living beings even in poor visibility.
  • Cons: Limited resolution and range, may struggle with temperature variations in the environment, and cannot capture detailed contextual information.
GPS (Global Positioning System)
GPS sensors provide accurate location data and are often used alongside other sensors to track the movement of pedestrians and other agents over time. A sample of a GPS sensor and an illustration of how GPS works can be seen in Figure 20.
  • Pros: High accuracy in position tracking, particularly in open environments, and useful for capturing long-range trajectories.
  • Cons: Reduced accuracy in dense urban environments due to signal reflections (multipath effects) and obstructions, and cannot capture visual or contextual information.
Each of these sensors plays a critical role in the development of PIP datasets, and the choice of sensors depends on the specific requirements of the application, such as the need for high accuracy, robustness in different environmental conditions, and the level of detail required in the captured data.

6. Performance Evaluation Metrics

Evaluating the performance of pedestrian intention prediction models is essential to understanding how accurately these models can forecast pedestrian behavior. Several metrics are commonly used to quantify the deviation between predicted intentions and the ground truth. Each metric offers unique insights into different aspects of model performance, such as positional accuracy, multimodal prediction accuracy, and tracking precision. This section provides an overview of key performance evaluation metrics used in this field, starting with error-based metrics that assess positional accuracy, followed by metrics for multimodal predictions, and concluding with metrics for tracking performance.

6.1. Average Displacement Error (ADE)

ADE measures the average discrepancy between predicted and ground truth intention coordinates over all future time steps. It is computed as the L2 norm of the difference between predicted and ground truth coordinates, averaged across all time steps and pedestrians. Given $z_{t,i}$ and $\hat{z}_{t,i}$ as the ground truth and predicted coordinates at future time step $t$ for the $i$-th pedestrian, ADE is calculated by

$$\mathrm{ADE} = \frac{1}{P \,(T - m)} \sum_{i=1}^{P} \sum_{t=m+1}^{T} \left\| \hat{z}_{t,i} - z_{t,i} \right\|_2$$

where $P$ denotes the number of pedestrians, $T$ is the total number of time steps considered, and $m$ is the last observed time step, so that predictions span steps $m+1$ through $T$.

6.2. Final Displacement Error (FDE)

FDE computes the L2 norm of the error between the predicted and ground truth intention coordinates only at the final time step. It provides insight into the accuracy of predictions at the end of the prediction horizon. The FDE is calculated as

$$\mathrm{FDE} = \frac{1}{P} \sum_{i=1}^{P} \left\| \hat{z}_{T,i} - z_{T,i} \right\|_2$$

where $z_{T,i}$ and $\hat{z}_{T,i}$ represent the ground truth and predicted coordinates at the last time step, respectively.
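As a concrete illustration, both displacement errors can be computed in a few lines of NumPy. This is a minimal sketch, assuming predictions and ground truth are already aligned as (P, T, 2) arrays of 2D coordinates; the function name is ours, not from any benchmark toolkit.

```python
import numpy as np

def ade_fde(pred: np.ndarray, gt: np.ndarray):
    """Compute ADE and FDE for predicted vs. ground-truth coordinates.

    pred, gt: arrays of shape (P, T, 2) -- P pedestrians, T future
    time steps, 2D (x, y) positions.
    """
    # Per-step Euclidean (L2) distances, shape (P, T).
    dist = np.linalg.norm(pred - gt, axis=-1)
    ade = dist.mean()          # averaged over all pedestrians and steps
    fde = dist[:, -1].mean()   # final time step only
    return ade, fde
```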

6.3. Minimum ADE ($\mathrm{minADE}_K$)

$\mathrm{minADE}_K$ evaluates model performance under multimodal prediction, in which $K$ candidate trajectories are generated for each pedestrian. It is defined as the minimum average Euclidean distance between the $K$ samples, denoted $\hat{z}_{t,(k),i}$ for sample $k$ of pedestrian $i$, and the ground truth:

$$\mathrm{minADE}_K = \frac{1}{P} \sum_{i=1}^{P} \min_{k=1,\dots,K} \frac{1}{T-m} \sum_{t=m+1}^{T} \left\| \hat{z}_{t,(k),i} - z_{t,i} \right\|_2$$

6.4. Minimum FDE ($\mathrm{minFDE}_K$)

Similar to $\mathrm{minADE}_K$, $\mathrm{minFDE}_K$ calculates the minimum Euclidean distance between the $K$ trajectory samples and the ground truth, but focuses only on the final time step. It is given by

$$\mathrm{minFDE}_K = \frac{1}{P} \sum_{i=1}^{P} \min_{k=1,\dots,K} \left\| \hat{z}_{T,(k),i} - z_{T,i} \right\|_2$$
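The multimodal variants follow the same pattern. A minimal NumPy sketch, assuming the K samples per pedestrian are stacked into a (P, K, T, 2) array; the function name is ours:

```python
import numpy as np

def min_ade_fde(preds: np.ndarray, gt: np.ndarray):
    """minADE_K and minFDE_K over K sampled trajectories.

    preds: (P, K, T, 2) -- K candidate trajectories per pedestrian.
    gt: (P, T, 2) ground-truth trajectories.
    """
    # Per-sample distances, shape (P, K, T).
    dist = np.linalg.norm(preds - gt[:, None, :, :], axis=-1)
    min_ade = dist.mean(axis=2).min(axis=1).mean()  # best sample's ADE per pedestrian
    min_fde = dist[:, :, -1].min(axis=1).mean()     # best final-step error per pedestrian
    return min_ade, min_fde
```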

6.5. Center Mean Square Error (CMSE)

CMSE measures the mean square deviation of the predicted center location of bounding boxes representing pedestrian intentions from the ground truth. It is averaged over all future time steps and is calculated as

$$\mathrm{CMSE} = \frac{1}{N \,(T - m)} \sum_{j=1}^{N} \sum_{t=m+1}^{T} \left[ \left( \hat{x}_j^t - x_j^t \right)^2 + \left( \hat{y}_j^t - y_j^t \right)^2 \right]^{1/2}$$

where $N$ is the number of bounding boxes, and $(\hat{x}_j^t, \hat{y}_j^t)$ and $(x_j^t, y_j^t)$ represent the center coordinates of the predicted and ground truth bounding boxes, respectively.

6.6. Center Final Mean Square Error (CFMSE)

CFMSE evaluates the mean square deviation of the predicted center location of bounding boxes from the ground truth only at the last predicted time step. It is calculated by

$$\mathrm{CFMSE} = \frac{1}{N} \sum_{j=1}^{N} \left[ \left( \hat{x}_j^T - x_j^T \right)^2 + \left( \hat{y}_j^T - y_j^T \right)^2 \right]^{1/2}$$

where $\hat{x}_j^T$, $\hat{y}_j^T$, $x_j^T$, and $y_j^T$ denote the center coordinates of the predicted and ground truth bounding boxes at the last time step, respectively.
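The center-based variants operate on bounding-box centers rather than trajectory points. A minimal NumPy sketch, assuming the centers have already been extracted into (N, T, 2) arrays; the function name is ours:

```python
import numpy as np

def cmse_cfmse(pred_centers: np.ndarray, gt_centers: np.ndarray):
    """CMSE and CFMSE over predicted vs. ground-truth box centers.

    pred_centers, gt_centers: (N, T, 2) bounding-box center coordinates.
    """
    # [(dx)^2 + (dy)^2]^(1/2) per box and time step, shape (N, T).
    d = np.sqrt(((pred_centers - gt_centers) ** 2).sum(axis=-1))
    return d.mean(), d[:, -1].mean()
```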

6.7. Multiple Object Tracking Accuracy (MOTA)

MOTA is a comprehensive metric used to evaluate the overall tracking performance by combining multiple aspects of tracking quality, including missed targets, false positives, and identity switches. It is calculated as
$$\mathrm{MOTA} = 1 - \frac{\text{False Positives} + \text{False Negatives} + \text{ID Swaps}}{\text{Ground Truth Count}}$$
where False Positives are the number of incorrectly detected targets, False Negatives are the number of missed targets, and ID Swaps refer to the number of times the identity of a tracked target is mistakenly assigned to a different target. The Ground Truth Count is the total number of true objects.
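Given accumulated error counts from a tracking run, MOTA reduces to a single expression; a minimal sketch:

```python
def mota(false_positives: int, false_negatives: int, id_swaps: int,
         ground_truth_count: int) -> float:
    """MOTA from accumulated tracking error counts; 1.0 is perfect,
    and the value can go negative when errors exceed ground truth."""
    errors = false_positives + false_negatives + id_swaps
    return 1.0 - errors / ground_truth_count
```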

6.8. Multiple Object Tracking Precision (MOTP)

MOTP measures the average accuracy of the predicted locations relative to the ground truth. It provides insight into how close the predicted positions are to the actual positions of the tracked objects. MOTP is calculated as
$$\mathrm{MOTP} = \frac{\sum_{i} \left\| \hat{z}_i - z_i \right\|_2}{\text{Number of Matches}}$$

where $\hat{z}_i$ and $z_i$ are the predicted and ground truth coordinates of the $i$-th matched object, respectively, and the Number of Matches is the total number of true positive matches between predicted and ground truth objects.
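MOTP likewise reduces to a mean distance over matched pairs; a minimal NumPy sketch, assuming the matching between predictions and ground truth has already been performed:

```python
import numpy as np

def motp(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean L2 distance over matched objects.

    pred, gt: (M, 2) coordinates of the M matched predicted and
    ground-truth objects (row i of each array is one match).
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```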
The metrics outlined above evaluate pedestrian intention prediction models from complementary perspectives. ADE and FDE measure positional accuracy, capturing the average and final discrepancies between predicted and ground truth coordinates. $\mathrm{minADE}_K$ and $\mathrm{minFDE}_K$ extend these to multimodal predictions by taking the minimum error across multiple prediction samples. CMSE and CFMSE evaluate the accuracy of bounding box center locations over all time steps and at the final time step, respectively. Finally, MOTA and MOTP quantify tracking performance: MOTA assesses overall tracking accuracy by accounting for false positives, false negatives, and identity swaps, while MOTP measures the precision of predicted object locations. Together, these metrics provide a robust framework for evaluating both the accuracy of pedestrian intention predictions and the quality of tracking.

7. Challenges and Future Directions

Despite significant advancements in pedestrian intention prediction, several challenges remain. For example, stationary road users pose a particular challenge due to their limited motion history [286,287]. High crowd density exacerbates occlusion issues, complicating feature extraction and detection [288]. Additionally, adverse lighting conditions hinder accurate predictions, suggesting the need for supplementary sensors such as LiDAR to complement camera-based data [289].
In this section, we outline these and other remaining challenges and propose future research directions to address them.

7.1. Complexity in Modeling Human Behavior

Current pedestrian intention prediction models face difficulties in accurately capturing complex human behaviors and interactions. Many existing models struggle with understanding subtle interactions between pedestrians or between pedestrians and AVs due to a lack of contextual information. For instance, models often assume pedestrians will always attempt to avoid obstacles, overlooking scenarios where they might stop, approach, or interact with others instead [75,76,217,290]. Additionally, sudden changes in pedestrian movement, such as abrupt stops or sharp turns, add to the challenge of maintaining prediction accuracy [4,39,286,287].
Contextual Information Integration
Future research should enhance models by incorporating detailed contextual information, such as the intent behind interactions and environmental factors that influence pedestrian behavior. This could involve developing more sophisticated sensors or integrating multiple data sources to provide richer contextual insights.
Dynamic Behavior Modeling
Improved modeling of dynamic behaviors, including abrupt changes in movement or sudden stops, could help in predicting more accurate pedestrian trajectories. Techniques such as behavioral clustering or advanced machine learning methods could be explored to better capture these dynamics.

7.2. Handling Stochastic Human Trajectories

Human trajectories exhibit considerable randomness due to epistemic uncertainties (e.g., unpredictability of goals) and aleatoric uncertainties (e.g., environmental randomness). While some studies attempt to model these uncertainties, real-time applications still struggle due to insufficient contextual information and computational limitations [4,216,291].
Uncertainty Modeling
Develop models that better account for both epistemic and aleatoric uncertainties by incorporating probabilistic approaches or ensemble methods. This would enhance the ability to predict a range of possible trajectories and improve robustness in varying conditions.
Real-Time Contextual Analysis
Explore methods to dynamically integrate contextual information in real-time, enabling more accurate predictions despite the inherent uncertainties in human behavior.
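As a toy illustration of the ensemble-based uncertainty modeling suggested above, total predictive uncertainty can be split into aleatoric and epistemic parts. The decomposition below follows the standard deep-ensemble recipe; the function name is ours, and the member outputs passed in would come from trained networks that each predict a position and a variance, not from the placeholders used here.

```python
import numpy as np

def decompose_uncertainty(member_means: np.ndarray,
                          member_vars: np.ndarray):
    """Split ensemble predictions into aleatoric and epistemic terms.

    member_means, member_vars: shape (M, ...) -- the predicted position
    and predicted (aleatoric) variance from each of M ensemble members.
    Returns (mean prediction, aleatoric term, epistemic term).
    """
    mean = member_means.mean(axis=0)        # ensemble average prediction
    aleatoric = member_vars.mean(axis=0)    # average predicted noise level
    epistemic = member_means.var(axis=0)    # disagreement across members
    return mean, aleatoric, epistemic
```

High epistemic values flag inputs the models disagree on (e.g., rarely seen pedestrian behavior), while high aleatoric values flag inherently noisy situations; an AV planner can treat the two differently.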

7.3. Improving Spatial and Temporal Consistency

Accurate prediction requires simultaneous modeling of spatial and temporal interactions between road agents. Recent advancements include dilated convolutions [292] and combined temporal-spatial coherence techniques [41,214], which enhance model performance by retaining longer trajectory traces and improving inference speed [4].
Integrated Coherence Models
Further develop models that simultaneously address spatial and temporal relationships, potentially leveraging advances in deep learning and graph-based approaches to enhance both coherence and accuracy.
Real-World Application
Focus on applying these models to real-world scenarios to test their effectiveness in diverse traffic conditions and improve their practical utility.

7.4. Addressing Data Limitations

The reliance on large-scale labeled datasets for supervised learning presents challenges, including high costs and labor intensity. Recent efforts, such as the annotation-free method “AutoTrajectory” [45], show promise but also have limitations, such as focusing more on vehicles than pedestrians [4].
Unsupervised Learning
Invest in developing unsupervised or semi-supervised learning techniques to reduce reliance on extensive labeled datasets and improve model training efficiency.
Data Augmentation
Enhance existing methods for generating synthetic data and augmenting real-world datasets to address the scarcity of annotated ground-truth data.

7.5. Bias Due to Occlusion and Tracking Issues

Partial occlusion and inadequate tracking often result in biased estimates and incomplete interaction modeling [8,9,32,75]. Recent approaches, such as the Gumbel Social Transformer [293], attempt to mitigate these issues, but more research is needed [4].
Enhanced Occlusion Handling
Develop methods to better handle occluded data and improve tracking accuracy, potentially using advanced sensors or multi-modal fusion techniques.
Contextual Enhancement
Incorporate contextual understanding to improve the accuracy of models dealing with partial occlusion and incomplete data.

7.6. Integrating Various Road Users

Research has predominantly focused on pedestrians, with less attention to other VRUs like cyclists and motorcyclists [294,295,296]. Cyclist intention estimation, in particular, remains underdeveloped.
Unified VRU Prediction Models
Explore methods for joint prediction of all VRUs, considering their unique dynamics and interactions. This would provide a more comprehensive understanding of road user behavior.
Cross-VRU Dynamics
Investigate the interactions between different types of VRUs to enhance prediction accuracy and safety.

7.7. Latent Behavioral Traits and Personalization

Current models often overlook individual behavioral traits, focusing primarily on observable features [73,88,99,221]. Incorporating latent behavioral traits could improve prediction accuracy.
Behavioral Profiling
Develop models that account for individual differences in behavior, possibly using multi-style networks or personalized learning approaches [297].
Contextual Adaptation
Enhance models to adapt to individual behavior patterns and preferences, improving the relevance and accuracy of predictions.

7.8. Adapting to Variances in Camera Views

Models trained on limited camera views may struggle with generalizability to new perspectives [7,72,91,298]. Techniques such as synthetic view generation [59] and cross-domain trajectory prediction [273] aim to address these issues.
Robust View Adaptation
Develop algorithms that can adapt to multiple camera views and perspectives without requiring extensive re-training or manual adjustments.
Comprehensive Data Collection
Expand data collection efforts to include a wider range of camera views and conditions, enhancing model robustness and applicability.

7.9. Implications for Urban Design and Policy

The findings and methodologies discussed in this paper can also offer meaningful insights for transportation authorities, traffic engineers, and urban planners. As cities begin to prepare for widespread AV deployment, predictive pedestrian behavior models can guide infrastructure decisions to enhance safety and efficiency. For instance, heatmaps generated from uncertainty-aware pedestrian prediction models can identify high-risk areas—such as locations with erratic crossing patterns or inconsistent pedestrian behavior—which may benefit from better lighting, clearer signage, or modified crosswalk placement. Moreover, transportation agencies could use these models to simulate pedestrian-AV interactions in different environments (e.g., school zones, tourist hubs, or poorly lit intersections) to inform AV policies or operational constraints. Urban planning tools can also integrate these models to evaluate how proposed street layouts or zoning changes affect pedestrian safety in AV scenarios. By incorporating behavioral predictions, urban simulations could more accurately reflect how real people interact with autonomous systems, ultimately helping cities design more adaptive and responsive public spaces.
Policy-Aware Prediction Frameworks
Incorporating real-world policies and traffic laws into pedestrian prediction models can help urban stakeholders understand how regulations influence behavior and safety outcomes.
Cross-Domain Collaboration
Greater integration between AV researchers and city planners could lead to co-developed models that reflect both technical feasibility and policy constraints.

7.10. Validation and Calibration in Pedestrian Intention Prediction

Robust validation and calibration are essential for ensuring that pedestrian intention prediction models generalize effectively across diverse real-world environments. While many studies evaluate models using benchmark datasets, these datasets often vary significantly in context, sensor setup, and cultural pedestrian behavior, leading to inconsistencies in comparative analysis. A key challenge is that current evaluation protocols may not account for uncertainty quantification or calibration error, especially in safety-critical AV settings. Many models, particularly deep learning-based ones, may be overconfident in their predictions, which can lead to unsafe decisions in edge-case scenarios.
Standardized Validation Protocols
There is a need for more comprehensive validation frameworks that go beyond displacement errors (e.g., ADE, FDE) and incorporate probabilistic calibration metrics such as Expected Calibration Error (ECE), Brier Score, or Negative Log Likelihood (NLL).
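For binary crossing/not-crossing outputs, the calibration metrics named above can be sketched in a few lines. This is an illustrative implementation: the binned form of ECE shown here is one common variant for binary probabilities, and the bin count is a free choice, not a fixed standard.

```python
import numpy as np

def brier_score(probs, labels) -> float:
    """Mean squared error between predicted crossing probabilities
    and binary crossing labels (0 = not crossing, 1 = crossing)."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    return float(np.mean((probs - labels) ** 2))

def expected_calibration_error(probs, labels, n_bins: int = 10) -> float:
    """Binned ECE: frequency-weighted gap between mean predicted
    probability and empirical crossing frequency in each bin."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return float(ece)
```

A well-calibrated but overconfident model is exactly the failure mode flagged above: predictions of 0.9 for events that occur far less often produce a large ECE even when displacement errors look acceptable.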
Cross-Dataset Validation
Future studies should assess model robustness by testing across multiple datasets that differ in geography, weather, lighting, and traffic dynamics, which can reveal sensitivity to unseen conditions.
Human-in-the-Loop Testing
Incorporating human feedback into the validation process (e.g., through driving simulators or expert annotations) could provide richer insights into model trustworthiness and interpretability.
Real-World Deployment Trials
Calibration in live environments remains underexplored. Pilot deployments with continuous feedback loops could help refine prediction models in ways static datasets cannot.

8. Conclusions

In this comprehensive survey, we explored advanced techniques in pedestrian intention prediction for AVs, highlighting key methodologies, challenges, and future directions. Unlike previous reviews, this work stands out as the most extensive, covering a broad spectrum of approaches while identifying critical gaps in the existing literature. Notably, while some studies focus on model architectures without addressing sensor modalities, others overlook the importance of datasets, limiting their applicability. More importantly, we highlight a significant missing aspect in prior research—the role of uncertainty in pedestrian intention prediction. Given the inherent unpredictability of human movement, accounting for uncertainty is crucial for improving the reliability and safety of autonomous driving systems. This survey also underscores the practical applications of pedestrian intention prediction models for urban design and policy-making, offering valuable insights for transportation agencies and urban planners to create safer, AV-compatible environments. We highlight the role of traffic regulations—such as crosswalk rules, traffic lights, and right-of-way laws—as structural constraints that can improve the accuracy and contextual awareness of prediction models, a dimension largely overlooked in prior reviews. We further address the trade-off between computational efficiency and safety by reviewing recent models, including transformer-based and multi-task learning approaches, which strive to optimize both. Lastly, we emphasize the need for standardized validation protocols and cross-dataset testing to ensure the robustness and generalizability of PIP models across diverse real-world environments.
By addressing these gaps, our survey provides researchers with a comprehensive foundation to develop more robust and safety-enhancing PIP models in the future while also supporting real-world applications in urban planning and AV deployment.

Author Contributions

A.M.: Writing—original draft, Methodology, Conceptualization, Formal analysis, Data curation, Writing—review and editing, Investigation, Visualization, Validation. M.A.: Writing—review and editing. N.Z.: Writing—review and editing. R.A.: Writing—review and editing. S.M.: Writing—review and editing, Conceptualization. S.A.: Project administration, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Data Availability Statement

No data were used in this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. World Health Organization. Global Status Report on Road Safety 2018: Summary; World Health Organization: Geneva, Switzerland, 2018. [Google Scholar]
  2. Rumar, K. Transport safety visions, targets and strategies: Beyond 2000. In Proceedings of the Transport Safety Visions, Targets and Strategies: Beyond 2000, Brussels, Belgium, 26 January 1999; pp. 6–8. [Google Scholar]
  3. Rasouli, A. The Role of Context in Understanding and Predicting Pedestrian Behavior in Urban Traffic Scenes. Ph.D. Thesis, York University Toronto, North York, ON, Canada, 2020. [Google Scholar]
  4. Sharma, N.; Dhiman, C.; Indu, S. Pedestrian Intention Prediction for Autonomous Vehicles: A Comprehensive Survey. Neurocomputing 2022, 508, 120–152. [Google Scholar] [CrossRef]
  5. Fang, Z.; Vázquez, D.; López, A.M. On-Board Detection of Pedestrian Intentions. Sensors 2017, 17, 2193. [Google Scholar] [CrossRef] [PubMed]
  6. Zhang, C.; Berger, C.; Dozza, M. Social-IWSTCNN: A Social Interaction-Weighted Spatio-Temporal Convolutional Neural Network for Pedestrian Trajectory Prediction in Urban Traffic Scenarios. In Proceedings of the 2021 IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 11–17 July 2021; pp. 1515–1522. [Google Scholar] [CrossRef]
  7. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social LSTM: Human Trajectory Prediction in Crowded Spaces. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 961–971. [Google Scholar] [CrossRef]
  8. Zhu, Y.; Qian, D.; Ren, D.; Xia, H. StarNet: Pedestrian Trajectory Prediction using Deep Neural Network in Star Topology. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Macau, China, 3–8 November 2019; pp. 8075–8080. [Google Scholar] [CrossRef]
  9. Zhang, P.; Ouyang, W.; Zhang, P.; Xue, J.; Zheng, N. SR-LSTM: State refinement for LSTM towards pedestrian trajectory prediction. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12077–12086. [Google Scholar] [CrossRef]
  10. Ridel, D.; Rehder, E.; Lauer, M.; Stiller, C.; Wolf, D. A Literature Review on the Prediction of Pedestrian Behavior in Urban Scenarios. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, Maui, HI, USA, 4–7 November 2018; pp. 3105–3112. [Google Scholar] [CrossRef]
  11. Korbmacher, R.; Tordeux, A. Review of Pedestrian Trajectory Prediction Methods: Comparing Deep Learning and Knowledge-Based Approaches. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24126–24144. [Google Scholar] [CrossRef]
  12. Sighencea, B.I.; Stanciu, R.I.; Căleanu, C.D. A Review of Deep Learning-Based Methods for Pedestrian Trajectory Prediction. Sensors 2021, 21, 7543. [Google Scholar] [CrossRef]
  13. Galvão, L.G.; Huda, M.N. Pedestrian and vehicle behaviour prediction in autonomous vehicle system—A review. Expert Syst. Appl. 2024, 238, 121983. [Google Scholar] [CrossRef]
  14. Kim, K.; Lee, Y.K.; Ahn, H.; Hahn, S.; Oh, S. Pedestrian Intention Prediction for Autonomous Driving Using a Multiple Stakeholder Perspective Model. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 7957–7962. [Google Scholar] [CrossRef]
  15. Bighashdel, A.; Dubbelman, G. A Survey on Path Prediction Techniques for Vulnerable Road Users: From Traditional to Deep-Learning Approaches. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Rhodes, Greece, 20–23 September 2020; pp. 1039–1046. [Google Scholar]
  16. Ahmed, S.; Huda, M.N.; Rajbhandari, S.; Saha, C.; Elshaw, M.; Kanarachos, S. Pedestrian and Cyclist Detection and Intent Estimation for Autonomous Vehicles: A Survey. Appl. Sci. 2019, 9, 2335. [Google Scholar] [CrossRef]
  17. Sharma, N.; Dhiman, C.; Indu, S. Visual–Motion–Interaction-Guided Pedestrian Intention Prediction Framework. IEEE Sens. J. 2023, 23, 27540–27548. [Google Scholar] [CrossRef]
  18. Rasouli, A.; Tsotsos, J.K. Autonomous Vehicles That Interact With Pedestrians: A Survey of Theory and Practice. IEEE Trans. Intell. Transp. Syst. 2020, 21, 900–918. [Google Scholar] [CrossRef]
  19. Pandey, P.; Aghav, J.V. Pedestrian–Autonomous Vehicles Interaction Challenges: A Survey and a Solution to Pedestrian Intent Identification. In Advances in Data and Information Sciences, Proceedings of the ICDIS 2019, Agra, India, 29–30 March 2019; Kolhe, M.L., Tiwari, S., Trivedi, M.C., Mishra, K.K., Eds.; Springer: Singapore, 2020; pp. 283–292. [Google Scholar] [CrossRef]
  20. Zou, F.; Ogle, J.; Jin, W.; Gerard, P.; Petty, D.; Robb, A. Pedestrian behavior interacting with autonomous vehicles during unmarked midblock multilane crossings: Role of infrastructure design, AV operations and signaling. Transp. Res. Part F Traffic Psychol. Behav. 2024, 100, 84–100. [Google Scholar] [CrossRef]
  21. Xue, J.; Fang, J.W. A Survey of Scene Understanding by Event Reasoning in Autonomous Driving. Int. J. Autom. Comput. 2018, 15, 249–266. [Google Scholar] [CrossRef]
  22. Zhou, Y.; Zeng, X. Towards comprehensive understanding of pedestrians for autonomous driving: Efficient multi-task-learning-based pedestrian detection, tracking and attribute recognition. Robot. Auton. Syst. 2024, 171, 104580. [Google Scholar] [CrossRef]
  23. Haque, F.; Kidwai, F.A. Modeling pedestrian behavior at urban signalised intersections using statistical-ANN hybrid approach–Case study of New Delhi. Case Stud. Transp. Policy 2023, 13, 101038. [Google Scholar] [CrossRef]
  24. Razali, H.; Mordan, T.; Alahi, A. Pedestrian intention prediction: A convolutional bottom-up multi-task approach. Transp. Res. Part C Emerg. Technol. 2021, 130, 103259. [Google Scholar] [CrossRef]
  25. Chen, E.; Zhuang, X.; Cui, Z.; Ma, G. Drivers’ recognition of pedestrian road-crossing intentions: Performance and process. Transp. Res. Part F Traffic Psychol. Behav. 2019, 64, 552–564. [Google Scholar] [CrossRef]
  26. Gkyrtis, K.; Pomoni, M. Use of Historical Road Incident Data for the Assessment of Road Redesign Potential. Designs 2024, 8, 88. [Google Scholar] [CrossRef]
  27. Piccoli, F.; Balakrishnan, R.; Perez, M.J.; Sachdeo, M.; Nunez, C.; Tang, M.; Andreasson, K.; Bjurek, K.; Raj, R.D.; Davidsson, E.; et al. FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network. In Proceedings of the Conference Record-Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 1–5 November 2020; Volume 2020, pp. 68–72. [Google Scholar] [CrossRef]
  28. Rasouli, A.; Kotseruba, I.; Tsotsos, J.K. Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs. In Proceedings of the 30th British Machine Vision Conference 2019, BMVC 2019, Cardiff, UK, 9–12 September 2019; pp. 1–13. [Google Scholar]
  29. Lorenzo, J.; Parra, I.; Wirth, F.; Stiller, C.; Llorca, D.F.; Sotelo, M.A. RNN-based Pedestrian Crossing Prediction using Activity and Pose-related Features. In Proceedings of the IEEE Intelligent Vehicles Symposium, Proceedings, Las Vegas, NV, USA, 19 October–13 November 2020; Volume 2020, pp. 1801–1806. [Google Scholar] [CrossRef]
  30. Hamed, O.; Steinhauer, H. Pedestrian’s Intention Recognition, Fusion of Handcrafted Features in a Deep Learning Approach. Proc. AAAI Conf. Artif. Intell. 2021, 35, 15795–15796. [Google Scholar] [CrossRef]
  31. Bartoli, F.; Lisanti, G.; Ballan, L.; Bimbo, A.D. Context-Aware Trajectory Prediction. In Proceedings of the International Conference on Pattern Recognition, Beijing, China, 20–24 August 2018; Volume 2018, pp. 1941–1946. [Google Scholar] [CrossRef]
  32. Mangalam, K.; Girase, H.; Agarwal, S.; Lee, K.H.; Adeli, E.; Malik, J.; Gaidon, A. It Is Not the Journey But the Destination: Endpoint Conditioned Trajectory Prediction. In Computer Vision–ECCV 2020, Proceedings of the 16th European conference, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer: Cham, Switzerland, 2020; pp. 759–776. [Google Scholar] [CrossRef]
  33. Xu, Y.; Piao, Z.; Gao, S. Encoding Crowd Interaction with Deep Neural Network for Pedestrian Trajectory Prediction. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5275–5284. [Google Scholar] [CrossRef]
  34. Xue, H.; Huynh, D.Q.; Reynolds, M. SS-LSTM: A Hierarchical LSTM Model for Pedestrian Trajectory Prediction. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA, 12–15 March 2018; Volume 2018, pp. 1186–1194. [Google Scholar] [CrossRef]
  35. Minguez, R.; Alonso, I.; Fernandez-Llorca, D.; Sotelo, M. Pedestrian path, pose, and intention prediction through Gaussian process dynamical models and pedestrian activity recognition. IEEE Trans. Intell. Transp. Syst. 2019, 20, 1803–1814. [Google Scholar] [CrossRef]
  36. Gesnouin, J.; Pechberti, S.; Stanciulescu, B.; Moutarde, F. TrouSPI-Net: Spatio-temporal attention on parallel atrous convolutions and U-GRUs for skeletal pedestrian crossing prediction. In Proceedings of the 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur, India, 15–18 December 2021; pp. 1–7. [Google Scholar] [CrossRef]
  37. Fang, Z.; López, A.M. Is the Pedestrian Going to Cross? Answering by 2D Pose Estimation. In Proceedings of the IEEE Intelligent Vehicles Symposium, Proceedings, Suzhou, China, 26–30 June 2018; pp. 1271–1276. [Google Scholar] [CrossRef]
  38. Gesnouin, J.; Pechberti, S.; Bresson, G.; Stanciulescu, B.; Moutarde, F. Predicting intentions of pedestrians from 2D skeletal pose sequences with a representation-focused multi-branch deep learning network. Algorithms 2020, 13, 331. [Google Scholar] [CrossRef]
  39. Fernando, T.; Denman, S.; Sridharan, S.; Fookes, C. Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection. Neural Netw. 2018, 108, 466–478. [Google Scholar] [CrossRef]
  40. Völz, B.; Behrendt, K.; Mielenz, H.; Gilitschenski, I.; Siegwart, R.; Nieto, J. A Data-Driven Approach for Pedestrian Intention Estimation. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, Rio de Janeiro, Brazil, 1–4 November 2016; pp. 2607–2612. [Google Scholar] [CrossRef]
  41. Li, S.; Zhou, Y.; Yi, J.; Gall, J. Spatial-Temporal Consistency Network for Low-Latency Trajectory Forecasting. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 1940–1949. [Google Scholar]
  42. Shafiee, N. Introvert: Human Trajectory Prediction via Conditional 3D Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 1–11. [Google Scholar]
  43. Rossi, L.; Paolanti, M.; Pierdicca, R.; Frontoni, E. Human Trajectory Prediction and Generation Using LSTM Models and GANs. Pattern Recognit. 2021, 120, 108136. [Google Scholar] [CrossRef]
  44. Yu, C.; Ma, X.; Ren, J.; Zhao, H.; Yi, S. Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction. In Computer Vision–ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; Volume 12357, pp. 507–523. [Google Scholar]
  45. Ma, Y.; Zhu, X.; Cheng, X.; Yang, R.; Liu, J.; Manocha, D. AutoTrajectory: Label-Free Trajectory Extraction and Prediction from Videos Using Dynamic Points. In Computer Vision–ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; Volume 12358, pp. 646–662. [Google Scholar] [CrossRef]
  46. Yu, R.; Zhou, Z. Towards Robust Human Trajectory Prediction in Raw Videos. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 8059–8066. [Google Scholar] [CrossRef]
  47. Giuliari, F.; Hasan, I.; Cristani, M.; Galasso, F. Transformer Networks for Trajectory Forecasting. In Proceedings of the International Conference on Pattern Recognition, Milan, Italy, 10–15 January 2020; pp. 10335–10342. [Google Scholar] [CrossRef]
  48. Huang, Y.; Bi, H.; Li, Z.; Mao, T.; Wang, Z. STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; Volume 2019, pp. 6271–6280. [Google Scholar] [CrossRef]
  49. Varytimidis, D.; Alonso-Fernandez, F.; Duran, B.; Englund, C. Action and Intention Recognition of Pedestrians in Urban Traffic. In Proceedings of the 14th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2018, Las Palmas de Gran Canaria, Spain, 26–29 November 2018; pp. 676–682. [Google Scholar] [CrossRef]
  50. Schulz, A.T.; Stiefelhagen, R. Pedestrian Intention Recognition Using Latent-Dynamic Conditional Random Fields. In Proceedings of the IEEE Intelligent Vehicles Symposium, Proceedings, Seoul, Republic of Korea, 28 June–1 July 2015; Volume 2015, pp. 622–627. [Google Scholar] [CrossRef]
  51. Hasan, I.; Setti, F.; Tsesmelis, T.; Bue, A.D.; Galasso, F.; Cristani, M. MX-LSTM: Mixing Tracklets and Vislets to Jointly Forecast Trajectories and Head Poses. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6067–6076. [Google Scholar] [CrossRef]
  52. Schneemann, F.; Heinemann, P. Context-Based Detection of Pedestrian Crossing Intention for Autonomous Driving in Urban Environments. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Daejeon, Republic of Korea, 9–14 October 2016; Volume 2016, pp. 2243–2248. [Google Scholar] [CrossRef]
  53. Hoy, M.; Tu, Z.; Dang, K.; Dauwels, J. Learning to Predict Pedestrian Intention via Variational Tracking Networks. In Proceedings of the 2018 IEEE Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 3132–3137. [Google Scholar] [CrossRef]
  54. Saleh, K.; Hossny, M.; Nahavandi, S. Intent Prediction of Pedestrians via Motion Trajectories Using Stacked Recurrent Neural Networks. IEEE Trans. Intell. Veh. 2018, 3, 414–424. [Google Scholar] [CrossRef]
  55. Liang, R.; Li, Y.; Li, X.; Tang, Y.; Zhou, J.; Zou, W. Temporal Pyramid Network for Pedestrian Trajectory Prediction with Multi-Supervision. arXiv 2020, arXiv:2012.01884. [Google Scholar] [CrossRef]
  56. Varshneya, D.; Srinivasaraghavan, G. Human Trajectory Prediction using Spatially Aware Deep Attention Models. arXiv 2017, arXiv:1705.09436. [Google Scholar]
  57. Zou, H.; Su, H.; Song, S.; Zhu, J. Understanding Human Behaviors in Crowds by Imitating the Decision-Making Process. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, New Orleans, LA, USA, 2–7 February 2018; Volume 1, pp. 7648–7655. [Google Scholar]
  58. Amirian, J.; Hayet, J.B.; Pettre, J. Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019; Volume 2019, pp. 2964–2972. [Google Scholar] [CrossRef]
  59. Liang, J.; Jiang, L.; Hauptmann, A. SimAug: Learning Robust Representations from Simulation for Trajectory Prediction. In Computer Vision–ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer: Cham, Switzerland, 2020; pp. 275–292. [Google Scholar]
  60. Lv, P.; Wei, H.; Gu, T.; Zhang, Y.; Jiang, X.; Zhou, B.; Xu, M. Trajectory distributions: A new description of movement for trajectory prediction. Comput. Vis. Media 2022, 8, 213–224. [Google Scholar] [CrossRef]
  61. Su, T.; Meng, Y.; Xu, Y. Pedestrian Trajectory Prediction via Spatial Interaction Transformer Network. arXiv 2021, arXiv:2112.06624. [Google Scholar]
  62. Feng, Y.; Zhang, T.; Sah, A.P.; Han, L.; Zhang, Z. Using Appearance to Predict Pedestrian Trajectories through Disparity-Guided Attention and Convolutional LSTM. IEEE Trans. Veh. Technol. 2021, 70, 7480–7494. [Google Scholar] [CrossRef]
  63. Zhang, G.; Tian, L.; Liu, Y.; Liu, J.; Liu, X.A.; Liu, Y.; Chen, Y.Q. Robust real-time human perception with depth camera. Front. Artif. Intell. Appl. 2016, 285, 304–310. [Google Scholar] [CrossRef]
  64. Zhang, G.; Liu, J.; Li, H.; Chen, Y.Q.; Davis, L.S. Joint Human Detection and Head Pose Estimation via Multistream Networks for RGB-D Videos. IEEE Signal Process. Lett. 2017, 24, 1666–1670. [Google Scholar] [CrossRef]
  65. Köhler, S.; Goldhammer, M.; Zindler, K.; Doll, K.; Dietmayer, K. Stereo-Vision-Based Pedestrian’s Intention Detection in a Moving Vehicle. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, Gran Canaria, Spain, 15–18 September 2015; Volume 2015, pp. 2317–2322. [Google Scholar] [CrossRef]
  66. Kalatian, A.; Farooq, B. A context-aware pedestrian trajectory prediction framework for automated vehicles. Transp. Res. Part C Emerg. Technol. 2021, 134, 103453. [Google Scholar] [CrossRef]
  67. Vemula, A.; Muelling, K.; Oh, J. Social Attention: Modeling Attention in Human Crowds. In Proceedings of the IEEE International Conference on Robotics and Automation, Brisbane, Australia, 21–25 May 2018; pp. 4601–4607. [Google Scholar] [CrossRef]
  68. Gupta, A.; Johnson, J.; Fei-Fei, L.; Savarese, S.; Alahi, A. Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2255–2264. [Google Scholar] [CrossRef]
  69. Ma, W.C.; Huang, D.A.; Lee, N.; Kitani, K.M. Forecasting Interactive Dynamics of Pedestrians with Fictitious Play. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; Volume 2017, pp. 4636–4644. [Google Scholar] [CrossRef]
  70. Robicquet, A.; Alahi, A.; Sadeghian, A.; Anenberg, B.; Doherty, J.; Wu, E.; Savarese, S. Forecasting Social Navigation in Crowded Complex Scenes. arXiv 2016, arXiv:1601.00998. [Google Scholar]
  71. Zhou, Y.; Wu, H.; Cheng, H.; Qi, K.; Hu, K.; Kang, C.; Zheng, J. Social Graph Convolutional LSTM for Pedestrian Trajectory Prediction. IET Intell. Transp. Syst. 2021, 15, 396–405. [Google Scholar] [CrossRef]
  72. Huang, L.; Zhuang, J.; Cheng, X.; Xu, R.; Ma, H. STI-GAN: Multimodal Pedestrian Trajectory Prediction Using Spatiotemporal Interactions and a Generative Adversarial Network. IEEE Access 2021, 9, 50846–50856. [Google Scholar] [CrossRef]
  73. Li, J.; Ma, H.; Zhang, Z.; Tomizuka, M. Social-WaGDAT: Interaction-Aware Trajectory Prediction via Wasserstein Graph Double-Attention Network. arXiv 2020, arXiv:2002.06241. [Google Scholar]
  74. Shi, L.; Wang, L.; Long, C.; Zhou, S.; Zhou, M.; Niu, Z.; Hua, G. SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8990–8999. [Google Scholar] [CrossRef]
  75. Peng, Y.; Zhang, G.; Shi, J.; Xu, B.; Zheng, L. SRA-LSTM: Social Relationship Attention LSTM for Human Trajectory Prediction. arXiv 2021, arXiv:2103.17045. [Google Scholar]
  76. Lv, P.; Wang, W.; Wang, Y.; Zhang, Y.; Xu, M.; Xu, C. SSAGCN: Social Soft Attention Graph Convolution Network for Pedestrian Trajectory Prediction. J. Comput. Sci. Technol. 2021, 14, 1–14. [Google Scholar] [CrossRef] [PubMed]
  77. Fang, Y.; Jin, Z.; Cui, Z.; Yang, Q.; Xie, T.; Hu, B. Modeling Human–Human Interaction with Attention-Based High-Order GCN for Trajectory Prediction. Vis. Comput. 2021, 37, 2257–2269. [Google Scholar] [CrossRef]
  78. Zhou, R.; Zhou, H.; Tomizuka, M.; Li, J.; Xu, Z. Grouptron: Dynamic Multi-Scale Graph Convolutional Networks for Group-Aware Dense Crowd Trajectory Forecasting. arXiv 2021, arXiv:2109.14128. [Google Scholar]
  79. Zhou, B.; Tang, X.; Wang, X. Learning Collective Crowd Behaviors with Dynamic Pedestrian-Agents. Int. J. Comput. Vis. 2015, 111, 50–68. [Google Scholar] [CrossRef]
  80. Su, H.; Zhu, J.; Dong, Y.; Zhang, B. Forecast the Plausible Paths in Crowd Scenes. In Proceedings of the IJCAI International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 2772–2778. [Google Scholar] [CrossRef]
  81. Sun, J.; Jiang, Q.; Lu, C. Recursive Social Behavior Graph for Trajectory Prediction. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 657–666. [Google Scholar] [CrossRef]
  82. Ivanovic, B.; Pavone, M. The Trajectron: Probabilistic Multi-Agent Trajectory Modeling with Dynamic Spatiotemporal Graphs. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; Volume 2019, pp. 2375–2384. [Google Scholar] [CrossRef]
  83. Sadeghian, A.; Alahi, A.; Savarese, S. Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; Volume 2017, pp. 300–311. [Google Scholar] [CrossRef]
  84. Rasouli, A.; Kotseruba, I.; Tsotsos, J. Understanding Pedestrian Behavior in Complex Traffic Scenes. IEEE Trans. Intell. Veh. 2018, 3, 61–70. [Google Scholar] [CrossRef]
  85. Manh, H.; Alaghband, G. Scene-LSTM: A Model for Human Trajectory Prediction. arXiv 2018, arXiv:1808.04018. [Google Scholar]
  86. Saleh, K.; Hossny, M.; Nahavandi, S. Contextual Recurrent Predictive Model for Long-Term Intent Prediction of Vulnerable Road Users. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3398–3408. [Google Scholar] [CrossRef]
  87. Rehder, E.; Kloeden, H. Goal-Directed Pedestrian Prediction. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; Volume 2015, pp. 139–147. [Google Scholar] [CrossRef]
  88. Sadeghian, A.; Kosaraju, V.; Sadeghian, A.; Hirose, N.; Rezatofighi, H.; Savarese, S. SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; Volume 2019, pp. 1349–1358. [Google Scholar] [CrossRef]
  89. Sadeghian, A.; Legros, F.; Voisin, M.; Vesel, R.; Alahi, A.; Savarese, S. CAR-Net: Clairvoyant Attentive Recurrent Network. In Computer Vision–ECCV 2018, Proceedings of the 15th European Conference, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Cham, Switzerland, 2018; pp. 162–180. [Google Scholar]
  90. Yao, Y.; Atkins, E.; Johnson-Roberson, M.; Vasudevan, R.; Du, X. Coupling Intent and Action for Pedestrian Crossing Behavior Prediction. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 19–27 August 2021; pp. 1238–1244. [Google Scholar] [CrossRef]
  91. Rainbow, B.A.; Men, Q.; Shum, H.P.H. Semantics-STGCNN: A Semantics-Guided Spatial-Temporal Graph Convolutional Network for Multi-Class Trajectory Prediction. arXiv 2021, arXiv:2108.04740. [Google Scholar]
  92. Cao, D.; Fu, Y. Using Graph Convolutional Networks Skeleton-Based Pedestrian Intention Estimation Models for Trajectory Prediction. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2020; Volume 1621. [Google Scholar] [CrossRef]
  93. Rasouli, A.; Yau, T.; Rohani, M.; Luo, J. Multi-Modal Hybrid Architecture for Pedestrian Action Prediction. arXiv 2020, arXiv:2012.00514. [Google Scholar]
  94. Bhattacharyya, A.; Hanselmann, M.; Fritz, M.; Schiele, B.; Straehle, C. Conditional Flow Variational Autoencoders for Structured Sequence Prediction. arXiv 2019, arXiv:1908.09008. [Google Scholar]
  95. Rasouli, A.; Kotseruba, I.; Tsotsos, J.K. Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, ICCVW, Venice, Italy, 22–29 October 2017; Volume 2018, pp. 206–213. [Google Scholar] [CrossRef]
  96. Huang, X.; Rosman, G.; Gilitschenski, I.; Jasour, A.; McGill, S.G.; Leonard, J.J.; Williams, B.C. HYPER: Learned Hybrid Trajectory Prediction via Factored Inference and Adaptive Sampling. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022. [Google Scholar]
  97. Zhao, H.; Gao, J.; Lan, T.; Sun, C.; Sapp, B.; Varadarajan, B.; Shen, Y.; Shen, Y.; Chai, Y.; Schmid, C.; et al. TNT: Target-Driven Trajectory Prediction. arXiv 2020, arXiv:2008.08294. [Google Scholar]
  98. Lee, N.; Choi, W.; Vernaza, P.; Choy, C.B.; Torr, P.H.S.; Chandraker, M. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; Volume 2017, pp. 2165–2174. [Google Scholar] [CrossRef]
  99. Li, J.; Yang, F.; Tomizuka, M.; Choi, C. EvolveGraph: Multi-Agent Trajectory Prediction with Dynamic Relational Reasoning. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–12 December 2020; Volume 2020, pp. 1–18. [Google Scholar]
  100. Naik, A.Y. On the Utility of Scene Objects to Forecast Pedestrians Intentions. Master’s Thesis, Eindhoven University of Technology, Eindhoven, The Netherlands, 2021. [Google Scholar]
  101. Rasouli, A.; Rohani, M.; Luo, J. Bifold and Semantic Reasoning for Pedestrian Behavior Prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 15600–15610. [Google Scholar] [CrossRef]
  102. Azarmi, M.; Rezaei, M.; Wang, H.; Glaser, S. PIP-Net: Pedestrian Intention Prediction in the Wild. arXiv 2024, arXiv:2402.12810. [Google Scholar] [CrossRef]
  103. Oeding, D. Verkehrsbelastung und Dimensionierung von Gehwegen und anderen Anlagen des Fußgängerverkehrs. Technical Report 22; Technische Hochschule Braunschweig: Braunschweig, Germany, 1963. [Google Scholar]
  104. Helbing, D.; Buzna, L.; Johansson, A.; Werner, T. Self-organized pedestrian crowd dynamics: Experiments, simulations, and design solutions. Transp. Sci. 2005, 39, 1–24. [Google Scholar] [CrossRef]
  105. Fruin, J.J. Designing for Pedestrians: A Level Service Concept; Polytechnic University: New York, NY, USA, 1970. [Google Scholar]
  106. Polus, A.; Schofer, J.; Ushpiz, A. Pedestrian flow and level of service. J. Transp. Eng. 1983, 109, 46–56. [Google Scholar] [CrossRef]
  107. Holl, S.; Boltes, M.; Seyfried, A. Level-of-safety-konzept für den Fußverkehr bei Großveranstaltungen. In Veranstaltungskommunikation; Springer: Wiesbaden, Germany, 2019; pp. 253–277. [Google Scholar]
  108. Hirai, K.; Tarui, K. A simulation of the behavior of a crowd in panic. In Proceedings of the International Conference on Cybernetics and Society, San Francisco, CA, USA, 23–25 September 1975; pp. 409–411. [Google Scholar]
  109. Løvås, G.G. Modeling and simulation of pedestrian traffic flow. Transp. Res. Part B Methodol. 1994, 28, 429–443. [Google Scholar] [CrossRef]
  110. Garbrecht, D. Describing pedestrian and car trips by transition matrices. Traffic Q. 1973, 27, 89–110. [Google Scholar]
  111. Henderson, L.F. The statistics of crowd fluids. Nature 1971, 229, 381–383. [Google Scholar] [CrossRef] [PubMed]
  112. Henderson, L.F. On the fluid mechanics of human crowd motion. Transp. Res. 1974, 8, 509–515. [Google Scholar] [CrossRef]
  113. Chowdhury, D.; Santen, L.; Schadschneider, A. Statistical physics of vehicular traffic and some related systems. Phys. Rep. 2000, 329, 199–329. [Google Scholar] [CrossRef]
  114. Castellano, C.; Fortunato, S.; Loreto, V. Statistical physics of social dynamics. Rev. Mod. Phys. 2009, 81, 591–646. [Google Scholar] [CrossRef]
  115. Bellomo, N.; Dogbe, C. On the modeling of traffic and crowds: A survey of models, speculations, and perspectives. SIAM Rev. 2011, 53, 409–463. [Google Scholar] [CrossRef]
  116. Martinez-Gil, F.; Lozano, M.; García-Fernández, I.; Fernández, F. Modeling, evaluation, and scale on artificial pedestrians: A literature review. ACM Comput. Surv. 2017, 50, 1–35. [Google Scholar] [CrossRef]
  117. Chraibi, M.; Tordeux, A.; Schadschneider, A.; Seyfried, A. Modelling of Pedestrian and Evacuation Dynamics; Springer: Berlin, Germany, 2018; pp. 1–22. [Google Scholar]
  118. Boltes, M.; Zhang, J.; Tordeux, A.; Schadschneider, A.; Seyfried, A. Empirical Results of Pedestrian and Evacuation Dynamics; Springer: Berlin, Germany, 2018; pp. 1–29. [Google Scholar]
  119. Schadschneider, A.; Klingsch, W.; Klüpfel, H.; Kretz, T.; Rogsch, C.; Seyfried, A. Evacuation dynamics: Empirical results, modeling and applications. In Encyclopedia of Complexity and Systems Science; Springer: New York, NY, USA, 2009; pp. 3142–3176. [Google Scholar]
  120. Duives, D.C.; Daamen, W.; Hoogendoorn, S.P. State-of-the-art crowd motion simulation models. Transp. Res. Part C Emerg. Technol. 2013, 37, 193–209. [Google Scholar] [CrossRef]
  121. Schadschneider, A.; Chraibi, M.; Seyfried, A.; Tordeux, A.; Zhan, J. Pedestrian Dynamics—From Empirical Results to Modeling; Birkhäuser: Cham, Switzerland, 2018. [Google Scholar]
  122. Dong, H.; Zhou, M.; Wang, Q.; Yang, X.; Wang, F.Y. State-of-the-art pedestrian and evacuation dynamics. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1849–1866. [Google Scholar] [CrossRef]
  123. Cristiani, E.; Piccoli, B.; Tosin, A. Multiscale Modeling of Pedestrian Dynamics; Springer: Cham, Switzerland, 2014; Volume 12. [Google Scholar]
  124. Hoogendoorn, S.P.; van Wageningen-Kessels, F.L.M.; Daamen, W.; Duives, D.C. Continuum modelling of pedestrian flows: From microscopic principles to self-organised macroscopic phenomena. Phys. A Stat. Mech. Appl. 2014, 416, 684–694. [Google Scholar] [CrossRef]
  125. Chraibi, M.; Kemloh, U.; Schadschneider, A.; Seyfried, A. Force-based models of pedestrian dynamics. Netw. Heterog. Media 2011, 6, 425. [Google Scholar] [CrossRef]
  126. Totzeck, C. An anisotropic interaction model with collision avoidance. Kinet. Relat. Model. 2020, 13, 1219–1242. [Google Scholar] [CrossRef]
  127. Helbing, D.; Molnar, P. Social force model for pedestrian dynamics. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 1995, 51, 4282–4286. [Google Scholar] [CrossRef]
  128. Yu, W.; Chen, R.; Dong, L.; Dai, S. Centrifugal force model for pedestrian dynamics. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2005, 72, 026112. [Google Scholar] [CrossRef]
  129. Chraibi, M.; Seyfried, A.; Schadschneider, A. Generalized centrifugal-force model for pedestrian dynamics. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2010, 82, 046111. [Google Scholar] [CrossRef] [PubMed]
  130. Nakayama, A.; Hasebe, K.; Sugiyama, Y. Instability of pedestrian flow and phase structure in a two-dimensional optimal velocity model. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2005, 71, 036121. [Google Scholar] [CrossRef]
  131. Köster, G.; Treml, F.; Gödel, M. Avoiding numerical pitfalls in social force models. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2013, 87, 063305. [Google Scholar] [CrossRef] [PubMed]
  132. Chen, X.; Treiber, M.; Kanagaraj, V.; Li, H. Social force models for pedestrian traffic–state of the art. Transp. Rev. 2018, 38, 625–653. [Google Scholar] [CrossRef]
  133. van den Berg, J.; Guy, S.; Lin, M.; Manocha, D. Reciprocal n-body collision avoidance. In Robotics Research; Springer: Berlin, Germany, 2011; pp. 3–19. [Google Scholar]
  134. Paris, S.; Pettré, J.; Donikian, S. Pedestrian reactive navigation for crowd simulation: A predictive approach. Comput. Graph. Forum 2007, 26, 665–674. [Google Scholar] [CrossRef]
  135. den Berg, J.V.; Lin, M.; Manocha, D. Reciprocal velocity obstacles for real-time multi-agent navigation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Pasadena, CA, USA, 19–23 May 2008; pp. 1928–1935. [Google Scholar]
  136. Pellegrini, S.; Ess, A.; Schindler, K.; Gool, L.V. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV), Kyoto, Japan, 29 September–2 October 2009; pp. 261–268. [Google Scholar]
  137. Guy, S.J.; Chhugani, J.; Kim, C.; Satish, N.; Lin, M.; Manocha, D.; Dubey, P. ClearPath: Highly parallel collision avoidance for multi-agent simulation. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, New Orleans, LA, USA, 1–2 August 2009; pp. 177–187. [Google Scholar]
  138. Guy, S.; Lin, M.; Manocha, D. Modeling collision avoidance behavior for virtual humans. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, ON, Canada, 10–14 May 2010; Volume 2, pp. 575–582. [Google Scholar]
  139. Kim, S.; Guy, S.J.; Liu, W.; Wilkie, D.; Lau, R.W.; Lin, M.C.; Manocha, D. BRVO: Predicting pedestrian trajectories using velocity-space reasoning. Int. J. Robot. Res. 2015, 34, 201–217. [Google Scholar] [CrossRef]
  140. Guo, K.; Wang, D.; Fan, T.; Pan, J. VR-ORCA: Variable responsibility optimal reciprocal collision avoidance. IEEE Robot. Autom. Lett. 2021, 6, 4520–4527. [Google Scholar] [CrossRef]
  141. Ondřej, J.; Pettré, J. A synthetic-vision based steering approach for crowd simulation. ACM Trans. Graph. 2010, 29, 123. [Google Scholar] [CrossRef]
  142. Dietrich, F.; Köster, G. Gradient navigation model for pedestrian dynamics. Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top. 2014, 89, 062801. [Google Scholar] [CrossRef]
  143. Tordeux, A.; Chraibi, M.; Seyfried, A. Collision-free speed model for pedestrian dynamics. In Traffic and Granular Flow; Springer: Cham, Switzerland, 2016; pp. 225–232. [Google Scholar]
  144. Xu, Q.; Chraibi, M.; Tordeux, A.; Zhang, J. Generalized collision-free velocity model for pedestrian dynamics. Phys. A Stat. Mech. Appl. 2019, 535, 122521. [Google Scholar] [CrossRef]
  145. Maury, B.; Venel, J.; Olivier, A.H.; Donikian, S. A mathematical framework for a crowd motion model. Comptes Rendus Math. 2008, 346, 1245–1250. [Google Scholar] [CrossRef]
  146. Maury, B.; Venel, J. A discrete contact model for crowd motion. ESAIM Math. Model. Numer. Anal. 2011, 45, 145–168. [Google Scholar] [CrossRef]
  147. Weidmann, U. Transporttechnik der Fußgänger—Transporttechnische Eigenschaften des Fußgängerverkehrs (Literaturauswertung). In Technical Report 90; ETH Zürich: Zürich, Switzerland, 1993. [Google Scholar]
  148. Fukui, M.; Ishibashi, Y. Self-organized phase transitions in cellular automaton models for pedestrians. J. Phys. Soc. Jpn. 1999, 68, 2861. [Google Scholar] [CrossRef]
  149. Muramatsu, M.; Irie, T.; Nagatani, T. Jamming transition in pedestrian counter flow. Phys. A Stat. Mech. Appl. 1999, 267, 487–498. [Google Scholar] [CrossRef]
  150. Blue, V.J.; Adler, J.L. Cellular automata microsimulation of bidirectional pedestrian flows. Transp. Res. Rec. J. Transp. Res. Board 1999, 1678, 135–141. [Google Scholar] [CrossRef]
  151. Klüpfel, H.; Meyer-König, T.; Wahle, J.; Schreckenberg, M. Microscopic simulation of evacuation processes on passenger ships. In Theory and Practical Issues on Cellular Automata; Bandini, S., Worsch, T., Eds.; Springer: Berlin, Germany, 2000. [Google Scholar]
  152. Ben-Jacob, E. From snowflake formation to growth of bacterial colonies. Part II. Cooperative formation of complex colonial patterns. Contemp. Phys. 1997, 38, 205–241. [Google Scholar] [CrossRef]
  153. Kirchner, A.; Schadschneider, A. Simulation of evacuation processes using a bionics-inspired cellular automaton model for pedestrian dynamics. Phys. A Stat. Mech. Appl. 2002, 312, 260–276. [Google Scholar] [CrossRef]
  154. Kirchner, A.; Nishinari, K.; Schadschneider, A. Friction effects and clogging in a cellular automaton model for pedestrian dynamics. Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top. 2003, 67, 056122. [Google Scholar] [CrossRef] [PubMed]
  155. von Krüchten, C.; Schadschneider, A. A cognitive, decision-based model for pedestrian dynamics. In Traffic and Granular Flow; Springer: Cham, Switzerland, 2020; pp. 141–147. [Google Scholar]
156. von Krüchten, C.; Schadschneider, A. Concept of a decision-based pedestrian model. Collective Dyn. 2020, 5, 316–323. [Google Scholar] [CrossRef]
  157. Zhang, Z.; Jia, L. Direction-decision learning based pedestrian flow behavior investigation. IEEE Access 2020, 8, 15027–15038. [Google Scholar] [CrossRef]
  158. Schneider, N.; Gavrila, D.M. Pedestrian Path Prediction with Recursive Bayesian Filters: A Comparative Study. In Proceedings of the 35th German Conference on Pattern Recognition, Saarbrücken, Germany, 3–6 September 2013; pp. 174–183. [Google Scholar] [CrossRef]
  159. Helbing, D. Traffic and related self-driven many-particle systems. Rev. Mod. Phys. 2001, 73, 1067–1141. [Google Scholar] [CrossRef]
  160. Hankin, B.D.; Wright, R.A. Passenger flow in subways. J. Oper. Res. Soc. 1958, 9, 81–88. [Google Scholar] [CrossRef]
  161. Older, S. Movement of pedestrians on footways in shopping streets. Traffic Eng. Control 1968, 10, 160–163. [Google Scholar]
  162. Navin, F.P.; Wheeler, R.J. Pedestrian flow characteristics. Traffic Eng. 1969, 39, 31–36. [Google Scholar]
163. Seyfried, A.; Steffen, B.; Klingsch, W.; Boltes, M. The fundamental diagram of pedestrian movement revisited. J. Stat. Mech. Theory Exp. 2005, 2005, P10002. [Google Scholar] [CrossRef]
  164. Chattaraj, U.; Seyfried, A.; Chakroborty, P. Comparison of pedestrian fundamental diagram across cultures. Adv. Complex Syst. 2009, 12, 393–405. [Google Scholar] [CrossRef]
  165. Zhang, J.; Klingsch, W.; Schadschneider, A.; Seyfried, A. Ordering in bidirectional pedestrian flows and its influence on the fundamental diagram. J. Stat. Mech. Theory Exp. 2012, 2012, P02002. [Google Scholar] [CrossRef]
  166. Subaih, R.; Maree, M.; Chraibi, M.; Awad, S.; Zanoon, T. Gender-based insights into the fundamental diagram of pedestrian dynamics. In Proceedings of the International Conference on Computational Collective Intelligence, Cham, Switzerland, 4–6 September 2019; pp. 613–624. [Google Scholar]
  167. Moussaïd, M.; Helbing, D.; Theraulaz, G. How simple rules determine pedestrian behavior and crowd disasters. Proc. Natl. Acad. Sci. USA 2011, 108, 6884–6888. [Google Scholar] [CrossRef] [PubMed]
  168. Schadschneider, A.; Seyfried, A. Validation of CA models of pedestrian dynamics with fundamental diagrams. Cybern. Syst. 2009, 40, 367–389. [Google Scholar] [CrossRef]
  169. Gomes, S.N.; Stuart, A.M.; Wolfram, M.T. Parameter estimation for macroscopic pedestrian dynamics models from microscopic data. SIAM J. Appl. Math. 2019, 79, 1475–1500. [Google Scholar] [CrossRef]
  170. Cristín, J.; Méndez, V.; Campos, D. General scaling in bidirectional flows of self-avoiding agents. Sci. Rep. 2019, 9, 18488. [Google Scholar] [CrossRef]
  171. Goldsztein, G.H. Self-organization when pedestrians move in opposite directions. Multi-lane circular track model. Appl. Sci. 2020, 10, 563. [Google Scholar] [CrossRef]
  172. Bain, N.; Bartolo, D. Dynamic response and hydrodynamics of polarized crowds. Science 2019, 363, 46–49. [Google Scholar] [CrossRef]
  173. Friesen, M.; Gottschalk, H.; Rüdiger, B.; Tordeux, A. Spontaneous wave formation in stochastic self-driven particle systems. SIAM J. Appl. Math. 2021, 81, 853–870. [Google Scholar] [CrossRef]
  174. Helbing, D.; Farkas, I.J.; Vicsek, T. Freezing by heating in a driven mesoscopic system. Phys. Rev. Lett. 2000, 84, 1240–1243. [Google Scholar] [CrossRef]
  175. Stanley, H. Non-equilibrium physics: Freezing by heating. Nature 2000, 404, 718. [Google Scholar] [CrossRef]
  176. Helbing, D.; Johansson, A. Pedestrian, Crowd and Evacuation Dynamics; Springer: New York, NY, USA, 2009; pp. 6476–6495. [Google Scholar]
  177. Cividini, J.; Appert-Rolland, C.; Hilhorst, H.J. Diagonal patterns and Chevron effect in intersecting traffic flows. Europhys. Lett. 2013, 102, 20002. [Google Scholar] [CrossRef]
  178. Nicolas, A.; Ibáñez, S.; Kuperman, M.N.; Bouzat, S. A counterintuitive way to speed up pedestrian and granular bottleneck flows prone to clogging: Can ‘more’ escape faster? J. Stat. Mech. Theory Exp. 2018, 2018, 083403. [Google Scholar] [CrossRef]
  179. Hermann, G.; Touboul, J. Heterogeneous connections induce oscillations in large-scale networks. Phys. Rev. Lett. 2012, 109, 018702. [Google Scholar] [CrossRef]
  180. Moussaïd, M.; Kämmer, J.E.; Analytis, P.P.; Neth, H. Social influence and the collective dynamics of opinion formation. PLoS ONE 2013, 8, e78433. [Google Scholar] [CrossRef] [PubMed]
  181. Touboul, J.D. The hipster effect: When anti-conformists all look the same. Discret. Contin. Dyn. Syst. B 2019, 24, 4379. [Google Scholar] [CrossRef]
182. Bechinger, C.; Di Leonardo, R.; Löwen, H.; Reichhardt, C.; Volpe, G.; Volpe, G. Active particles in complex and crowded environments. Rev. Mod. Phys. 2016, 88, 045006. [Google Scholar] [CrossRef]
  183. Schweitzer, F. Brownian Agents and Active Particles: Collective Dynamics in the Natural and Social Sciences; Springer: Berlin, Germany, 2003. [Google Scholar]
  184. Ramaswamy, S. The mechanics and statistics of active matter. Annu. Rev. Condens. Matter Phys. 2010, 1, 323–345. [Google Scholar] [CrossRef]
  185. Vicsek, T.; Zafeiris, A. Collective motion. Phys. Rep. 2012, 517, 71–140. [Google Scholar] [CrossRef]
186. Marchetti, M.C.; Joanny, J.F.; Ramaswamy, S.; Liverpool, T.B.; Prost, J.; Rao, M.; Simha, R.A. Hydrodynamics of soft active matter. Rev. Mod. Phys. 2013, 85, 1143. [Google Scholar] [CrossRef]
  187. Elgeti, J.; Winkler, R.G.; Gompper, G. Physics of microswimmers—Single particle motion and collective behavior: A review. Rep. Prog. Phys. 2015, 78, 056601. [Google Scholar] [CrossRef]
  188. Ourmazd, A. Science in the age of machine learning. Nat. Rev. Phys. 2020, 2, 342–343. [Google Scholar] [CrossRef]
189. Cichos, F.; Gustavsson, K.; Mehlig, B.; Volpe, G. Machine learning for active matter. Nat. Mach. Intell. 2020, 2, 94–103. [Google Scholar] [CrossRef]
  190. Shahhoseini, Z.; Sarvi, M. Collective movements of pedestrians: How we can learn from simple experiments with non-human (ant) crowds. PLoS ONE 2017, 12, e0182913. [Google Scholar] [CrossRef]
191. Needleman, D.; Dogic, Z. Active matter at the interface between materials science and cell biology. Nat. Rev. Mater. 2017, 2, 1–14. [Google Scholar] [CrossRef]
  192. Dulaney, A.R.; Brady, J.F. Machine learning for phase behavior in active matter systems. Soft Matter 2021, 17, 6808–6816. [Google Scholar] [CrossRef] [PubMed]
193. Jin, C.J.; Fang, S.; Jiang, R.; Xue, K.; Li, D. Cellular automaton simulations of hybrid pedestrian movement in two-route situation. Phys. A Stat. Mech. Appl. 2024, 651, 130029. [Google Scholar] [CrossRef]
  194. Wang, J.; Lv, W.; Jiang, Y.; Huang, G. A cellular automata approach for modelling pedestrian-vehicle mixed traffic flow in urban city. arXiv 2024, arXiv:2405.06282v1. [Google Scholar] [CrossRef]
  195. Huang, K.; Zheng, X.; Cheng, Y.; Yang, Y. Behavior-based cellular automaton model for pedestrian dynamics. Appl. Math. Comput. 2017, 292, 417–424. [Google Scholar] [CrossRef]
196. Burstedde, C.; Klauck, K.; Schadschneider, A.; Zittartz, J. Simulation of pedestrian dynamics using a two-dimensional cellular automaton. Phys. A Stat. Mech. Appl. 2001, 295, 507–525. [Google Scholar] [CrossRef]
  197. Helbing, D.; Farkas, I.; Vicsek, T. Simulating dynamical features of escape panic. Nature 2000, 407, 487–490. [Google Scholar] [CrossRef]
  198. Karamouzas, I.; Skinner, B.; Guy, S.J. A universal power law governing pedestrian interactions. Phys. Rev. Lett. 2014, 113, 238701. [Google Scholar] [CrossRef]
  199. Treuille, A.; Cooper, S.; Popović, Z. Continuum crowds. ACM Trans. Graph. 2006, 25, 1160–1168. [Google Scholar] [CrossRef]
  200. Hughes, R.L. A continuum theory for the flow of pedestrians. Transp. Res. B Methodol. 2002, 36, 507–535. [Google Scholar] [CrossRef]
  201. Mirzabagheri, A. Exploring the Effectiveness of Various Deep Learning Models in Skin Cancer Detection. Master’s Thesis, University of Windsor, Windsor, ON, Canada, 2024. Available online: https://scholar.uwindsor.ca/etd/9436 (accessed on 26 February 2025).
  202. Xin, L.; Wang, P.; Chan, C.; Chen, J.; Li, S.; Cheng, B. Intention Aware Long Horizon Trajectory Prediction of Surrounding Vehicles Using Dual LSTM Networks. In Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 1441–1446. [Google Scholar]
  203. Zheng, S.; Yue, Y.; Hobbs, J. Generating Long-Term Trajectories Using Deep Hierarchical Networks. In Proceedings of the Thirtieth Conference on Neural Information Processing Systems (NIPS), Barcelona, Spain, 5–10 December 2016. [Google Scholar]
  204. Zhan, E.; Zheng, S.; Yue, Y.; Lucey, P. Generative Multi-Agent Behavioral Cloning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018. [Google Scholar]
  205. Martinez, J.; Black, M.; Romero, J. On Human Motion Prediction Using Recurrent Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4674–4683. [Google Scholar]
  206. Salzmann, T.; Ivanovic, B.; Chakravarty, P.; Pavone, M. Trajectron++: Dynamically Feasible Trajectory Forecasting with Heterogeneous Data. In Proceedings of the Computer Vision–ECCV, 16th European Conference, Glasgow, UK, 23–28 August 2020; Volume 12363, pp. 683–700. [Google Scholar]
  207. Sharma, N.; Dhiman, C.; Indu, S. Predicting pedestrian intentions with multimodal IntentFormer: A Co-learning approach. Pattern Recognit. 2025, 161, 111205. [Google Scholar] [CrossRef]
  208. Khindkar, V.; Balasubramanian, V.; Arora, C.; Subramanian, A.; Jawahar, C.V. Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach. arXiv 2024, arXiv:2411.13302. [Google Scholar]
  209. Ahmed, S.; Bazi, A.A.; Saha, C.; Rajbhandari, S.; Huda, M.N. Multi-scale pedestrian intent prediction using 3D joint information as spatio-temporal representation. Expert Syst. Appl. 2023, 225, 120077. [Google Scholar] [CrossRef]
  210. Song, X.; Kang, M.; Zhou, S.; Wang, J.; Mao, Y.; Zheng, N. Pedestrian Intention Prediction Based on Traffic-Aware Scene Graph Model. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 9851–9858. [Google Scholar] [CrossRef]
  211. Rasouli, A.; Kotseruba, I.; Kunic, T.; Tsotsos, J. PIE: A large-scale dataset and models for pedestrian intention estimation and trajectory prediction. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; Volume 2019, pp. 6261–6270. [Google Scholar] [CrossRef]
  212. Liang, J.; Jiang, L.; Niebles, J.C.; Hauptmann, A.; Fei-Fei, L. Peeking into the Future: Predicting Future Person Activities and Locations in Videos. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019; Volume 2019, pp. 2960–2963. [Google Scholar] [CrossRef]
  213. Zhao, T.; Xu, Y.; Monfort, M.; Choi, W.; Baker, C.; Zhao, Y.; Wang, Y.; Wu, Y. Multi-Agent Tensor Fusion for Contextual Trajectory Prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 12118–12126. [Google Scholar]
  214. Lv, Z.; Li, J.; Dong, C.; Wang, Y.; Li, H.; Xu, Z. DeepPTP: A deep pedestrian trajectory prediction model for traffic intersection. KSII Trans. Internet Inf. Syst. 2021, 15, 2321–2338. [Google Scholar] [CrossRef]
  215. Marchetti, F.; Becattini, F.; Seidenari, L.; Del Bimbo, A. MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 7141–7150. [Google Scholar]
  216. Mangalam, K.; An, Y.; Girase, H.; Malik, J. From Goals, Waypoints & Paths to Long-Term Human Trajectory Forecasting. arXiv 2020, arXiv:2012.01526. [Google Scholar]
217. Mohamed, A.; Qian, K.; Elhoseiny, M.; Claudel, C. Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14412–14420. [Google Scholar] [CrossRef]
  218. Wang, R.; Cui, Y.; Song, X.; Chen, K.; Fang, H. Multi-information-based Convolutional Neural Network with Attention Mechanism for Pedestrian Trajectory Prediction. Image Vis. Comput. 2021, 107, 104110. [Google Scholar] [CrossRef]
  219. Zhang, B.; Zhang, R.; Bisagno, N.; Conci, N.; Natale, F.D.; Liu, H. Where Are They Going? Predicting Human Behaviors in Crowded Scenes. ACM Trans. Multimed. Comput. Commun. Appl. 2021, 17, 1–19. [Google Scholar] [CrossRef]
  220. Fernando, T.; Denman, S.; Sridharan, S.; Fookes, C. Tracking by Prediction: A Deep Generative Model for Multi-person Localisation and Tracking. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1122–1132. [Google Scholar]
  221. Kosaraju, V.; Sadeghian, A.; Martín-Martín, R.; Reid, I.; Rezatofighi, S.H.; Savarese, S. Social-BiGAT: Multimodal Trajectory Forecasting Using Bicycle-GAN and Graph Attention Networks. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32, pp. 1–10. [Google Scholar]
  222. Kothari, P.; Alahi, A. Human Trajectory Prediction Using Adversarial Loss. In Proceedings of the 19th Swiss Transport Research Conference, Ascona, Switzerland, 15–17 May 2019. [Google Scholar]
223. Chen, T.; Tian, R.; Ding, Z. Visual Reasoning Using Graph Convolutional Networks for Predicting Pedestrian Crossing Intention. In Proceedings of the ICCV Workshop, Montreal, QC, Canada, 11–17 October 2021; pp. 3103–3109. [Google Scholar]
  224. Zhou, H.; Ren, D.; Yang, X.; Fan, M.; Huang, H. Sliding Sequential CVAE with Time Variant Socially-Aware Rethinking for Trajectory Prediction. arXiv 2021, arXiv:2110.15016. [Google Scholar]
225. Kawale, J.; Liess, S.; Kumar, A.; Steinbach, M.; Snyder, P.; Kumar, V.; Ganguly, A.R.; Samatova, N.F.; Semazzi, F. A Graph-Based Approach to Find Teleconnections in Climate Data. Stat. Anal. Data Min. ASA Data Sci. J. 2013, 6, 158–179. [Google Scholar] [CrossRef]
  226. Faghmous, J.H.; Frenger, I.; Yao, Y.; Warmka, R.; Lindell, A.; Kumar, V. A Daily Global Mesoscale Ocean Eddy Dataset from Satellite Altimetry. Sci. Data 2015, 2, 1–16. [Google Scholar] [CrossRef] [PubMed]
  227. Hautier, G.; Fischer, C.C.; Jain, A.; Mueller, T.; Ceder, G. Finding Nature’s Missing Ternary Oxide Compounds Using Machine Learning and Density Functional Theory. Chem. Mater. 2010, 22, 3762–3767. [Google Scholar] [CrossRef]
  228. Fischer, C.C.; Tibbetts, K.J.; Morgan, D.; Ceder, G. Predicting Crystal Structure by Merging Data Mining with Quantum Mechanics. Nat. Mater. 2006, 5, 641–646. [Google Scholar] [CrossRef]
  229. Li, L.; Snyder, J.C.; Pelaschier, I.M.; Huang, J.; Niranjan, U.N.; Duncan, P.; Rupp, M.; Müller, K.R.; Burke, K. Understanding Machine-Learned Density Functionals. Int. J. Quantum Chem. 2016, 116, 819–833. [Google Scholar] [CrossRef]
  230. Wong, K.C.; Wang, L.; Shi, P. Active Model with Orthotropic Hyperelastic Material for Cardiac Image Analysis. In Proceedings of the International Conference on Functional Imaging and Modeling of the Heart, Nice, France, 3–5 June 2009; pp. 229–238. [Google Scholar]
  231. Xu, J.; Sapp, J.L.; Dehaghani, A.R.; Gao, F.; Horacek, M.; Wang, L. Robust Transmural Electrophysiological Imaging: Integrating Sparse and Dynamic Physiological Models into ECG-Based Inference. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Cham, Switzerland, 5–9 October 2015; pp. 519–527. [Google Scholar]
  232. Lee, K.H.; Ros, G.; Li, J.; Gaidon, A. SPIGAN: Privileged Adversarial Learning from Simulation. arXiv 2018, arXiv:1810.03756. [Google Scholar]
  233. von Rueden, L.; Mayer, S.; Sifa, R.; Bauckhage, C.; Garcke, J. Combining Machine Learning and Simulation to a Hybrid Modelling Approach: Current and Future Directions. In Proceedings of the International Symposium on Intelligent Data Analysis, Cham, Switzerland, 27–29 April 2020; pp. 548–560. [Google Scholar]
  234. Antonucci, A.; Papini, G.P.R.; Palopoli, L.; Fontanelli, D. Generating reliable and efficient predictions of human motion: A promising encounter between physics and neural networks. arXiv 2020, arXiv:2006.08429. [Google Scholar]
  235. Willard, J.; Jia, X.; Xu, S.; Steinbach, M.; Kumar, V. Integrating Scientific Knowledge with Machine Learning for Engineering and Environmental Systems. arXiv 2020, arXiv:2003.04919. [Google Scholar] [CrossRef]
  236. Silvestri, M.; Lombardi, M.; Milano, M. Injecting Domain Knowledge in Neural Networks: A Controlled Experiment on a Constrained Problem. In Proceedings of the International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research, Uppsala, Sweden, 28–31 May 2021; pp. 266–282. [Google Scholar]
237. Bahari, M.; Nejjar, I.; Alahi, A. Injecting Knowledge in Data-Driven Vehicle Trajectory Predictors. Transp. Res. Part C Emerg. Technol. 2021, 128, 103010. [Google Scholar] [CrossRef]
  238. Hossain, S.; Johora, F.T.; Müller, J.P.; Hartmann, S.; Reinhardt, A. SFMGNet: A Physics-Based Neural Network to Predict Pedestrian Trajectories. arXiv 2022, arXiv:2202.02791. [Google Scholar]
239. Göttlich, S.; Knapp, S. Artificial Neural Networks for the Estimation of Pedestrian Interaction Forces. In Crowd Dynamics, Volume 2; Springer: Berlin, Germany, 2020; pp. 11–32. [Google Scholar]
  240. Kreiss, S. Deep Social Force. arXiv 2021, arXiv:2109.12081. [Google Scholar]
  241. Jiang, H.; Kim, B.; Guan, M.; Gupta, M. To trust or not to trust a classifier. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 5541–5552. [Google Scholar]
  242. Mitchell, T.M. The Need for Biases in Learning Generalizations; Technical Report; Rutgers University: New Brunswick, NJ, USA, 1980. [Google Scholar]
  243. Nguyen, V.L.; Destercke, S.; Hüllermeier, E. Epistemic Uncertainty Sampling. In Discovery Science, Proceedings of the 22nd International Conference, DS 2019, Split, Croatia, 28–30 October 2019; Kralj Novak, P., Šmuc, T., Džeroski, S., Eds.; Springer: Cham, Switzerland, 2019; pp. 72–86. [Google Scholar]
  244. Aggarwal, C.; Kong, X.; Gu, Q.; Han, J.; Philip, S. Active learning: A survey. In Data Classification; Chapman and Hall/CRC: New York, NY, USA, 2014; pp. 599–634. [Google Scholar]
  245. Abdar, M.; Pourpanah, F.; Hussain, S.; Rezazadegan, D.; Liu, L.; Ghavamzadeh, M.; Fieguth, P.; Cao, X.; Khosravi, A.; Acharya, U.R.; et al. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Inf. Fusion 2021, 76, 243–297. [Google Scholar] [CrossRef]
246. Mulder, W.; Bethard, S.; Moens, M. A survey on the application of recurrent neural networks to statistical language modeling. Comput. Speech Lang. 2015, 30, 61–98. [Google Scholar] [CrossRef]
  247. Malik, M.; Malik, M.; Mehmood, K.; Makhdoom, I. Automatic speech recognition: A survey. Multimed. Tools Appl. 2021, 80, 9411–9457. [Google Scholar] [CrossRef]
  248. Hashemi, A.; Mozaffari, S. Secure deep neural networks using adversarial image generation and training with Noise-GAN. Comput. Secur. 2019, 86, 372–387. [Google Scholar] [CrossRef]
  249. Ke, H.; Mozaffari, S.; Alirezaee, S.; Saif, M. Cooperative adaptive cruise control using vehicle-to-vehicle communication and deep learning. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Aachen, Germany, 5–9 June 2022; pp. 435–440. [Google Scholar] [CrossRef]
  250. Khosravanian, A.; Rahmanimanesh, M.; Keshavarzi, P.; Mozaffari, S. Enhancing level set brain tumor segmentation using fuzzy shape prior information and deep learning. Int. J. Imaging Syst. Technol. 2023, 33, 323–339. [Google Scholar] [CrossRef]
  251. Hemmatian, M.; Shahzadi, A.; Mozaffari, S. Uncertainty-based knowledge distillation for Bayesian deep neural network compression. Int. J. Approx. Reason. 2024, 175, 109301. [Google Scholar] [CrossRef]
  252. Hubschneider, C.; Hutmacher, R.; Zöllner, J.M. Calibrating Uncertainty Models for Steering Angle Estimation. In Proceedings of the IEEE Intelligent Transportation Systems Conference, Auckland, New Zealand, 27–30 October 2019; pp. 1511–1518. [Google Scholar]
  253. Hafner, D.; Tran, D.; Lillicrap, T.; Irpan, A.; Davidson, J. Noise contrastive priors for functional uncertainty. arXiv 2018, arXiv:1807.09289. [Google Scholar]
  254. Particke, F. Predictive Pedestrian Awareness with Intention Uncertainties for Autonomous Driving. Ph.D. Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, 2020. [Google Scholar]
  255. Dai, S.; Liu, J.; Cheung, N.M. Uncertainty-Aware Pedestrian Crossing Prediction via Reinforcement Learning. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 9540–9549. [Google Scholar] [CrossRef]
  256. Upreti, M.; Ramesh, J.; Kumar, C.; Chakraborty, B.; Balisavira, V.; Roth, M.; Kaiser, V.; Czech, P. Traffic Light and Uncertainty Aware Pedestrian Crossing Intention Prediction for Automated Vehicles. In Proceedings of the 2023 IEEE Intelligent Vehicles Symposium (IV), Anchorage, AK, USA, 4–7 June 2023; pp. 1–8. [Google Scholar] [CrossRef]
  257. Zhang, Z.; Tian, R.; Ding, Z. TrEP: Transformer-Based Evidential Prediction for Pedestrian Intention with Uncertainty. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023. [Google Scholar]
  258. Liu, Y.; Ye, Z.; Wang, R.; Li, B.; Sheng, Q.Z.; Yao, L. Uncertainty-aware pedestrian trajectory prediction via distributional diffusion. Knowl.-Based Syst. 2024, 296, 111862. [Google Scholar] [CrossRef]
  259. Chen, X.; Zhang, S.; Li, J.; Yang, J. Pedestrian Crossing Intention Prediction Based on Cross-Modal Transformer and Uncertainty-Aware Multi-Task Learning for Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2024, 25, 12538–12549. [Google Scholar] [CrossRef]
  260. Nayak, A.; Sharma, S.; Koch, C. Bayesian Approximation for Pedestrian Trajectory Forecasting Under Uncertainty. arXiv 2022, arXiv:2205.01887. [Google Scholar]
  261. Xie, M.; Liu, Z.; Zhang, Y.; Zheng, N. GTransPDM: Decoupling Graph Transformer for Pedestrian Crossing Intention Prediction. arXiv 2024, arXiv:2409.20223. [Google Scholar]
  262. Guo, X.; Liu, L. A pedestrian tracking algorithm combining Camshift and Kalman filter. J. Adv. Transp. 2016, 50, 1796–1810. [Google Scholar]
  263. Munir, U.; Kucner, T. PTINet: Context-Aware Joint Prediction of Pedestrian Trajectory and Intention. arXiv 2024, arXiv:2407.17162. [Google Scholar]
  264. Saleh, K.; Hossny, M.; Nahavandi, S. Real-time Intent Prediction of Pedestrians for Autonomous Ground Vehicles via Spatio-Temporal DenseNet. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; Volume 2019, pp. 9704–9710. [Google Scholar] [CrossRef]
265. Singh, A.; Suddamalla, U. Multi-Input Fusion for Practical Pedestrian Intention Prediction. In Proceedings of the ICCV Workshop, Montreal, QC, Canada, 11–17 October 2021; pp. 2304–2311. [Google Scholar]
266. Pop, D. Multi-Task Cross-Modality Deep Learning for Pedestrian Risk Estimation. Ph.D. Thesis, Babeș-Bolyai University, Faculty of Mathematics and Computer Science, Department of Computer Science, Cluj-Napoca, Romania, 2019. [Google Scholar]
  267. Vemula, A.; Muelling, K.; Oh, J. Modeling cooperative navigation in dense human crowds. In Proceedings of the IEEE International Conference on Robotics and Automation, Singapore, 29 May–3 June 2017; pp. 1685–1692. [Google Scholar] [CrossRef]
268. Zhao, H.; Wildes, R.P. Where are you heading? Dynamic Trajectory Prediction with Expert Goal Examples. In Proceedings of the ICCV, Montreal, QC, Canada, 11–17 October 2021; pp. 7629–7638. [Google Scholar]
  269. Yi, S.; Li, H.; Wang, X. Pedestrian Behavior Understanding and Prediction with Deep Neural Networks. In Computer Vision–ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; pp. 263–279. [Google Scholar]
  270. Pfeiffer, M.; Paolo, G.; Sommer, H.; Nieto, J.; Siegwart, R.; Cadena, C. A Data-Driven Model for Interaction-Aware Pedestrian Motion Prediction in Object-Cluttered Environments. In Proceedings of the IEEE International Conference on Robotics and Automation, Singapore, 29 May–3 June 2017; pp. 5921–5928. [Google Scholar] [CrossRef]
  271. Ma, Q.; Zou, Q.; Huang, Y.; Wang, N. Dynamic pedestrian trajectory forecasting with LSTM-based Delaunay triangulation. Appl. Intell. 2021, 52, 3018–3028. [Google Scholar] [CrossRef]
  272. Liu, Y.; Yan, Q.; Alahi, A. Social NCE: Contrastive Learning of Socially-Aware Motion Representations. arXiv 2020, arXiv:2012.11717. [Google Scholar]
  273. Huang, P.; Fang, Y.; Hu, B.; Gao, S.; Li, J. CTP-Net For Cross-Domain Trajectory Prediction. arXiv 2021, arXiv:2110.11645. [Google Scholar]
274. Li, J.; Ma, H.; Tomizuka, M. Conditional Generative Neural System for Probabilistic Trajectory Prediction. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Macau, China, 4–8 November 2019; pp. 6150–6156. [Google Scholar] [CrossRef]
  275. Bock, J.; Krajewski, R.; Moers, T.; Runde, S.; Vater, L.; Eckstein, L. The inD Dataset: A Drone Dataset of Naturalistic Road User Trajectories at German Intersections. In Proceedings of the IEEE Intelligent Vehicles Symposium, Proceedings, Paris, France, 19 October–13 November 2020; pp. 1929–1934. [Google Scholar] [CrossRef]
  276. Kotseruba, I.; Rasouli, A.; Tsotsos, J. Joint Attention in Autonomous Driving (JAAD). arXiv 2016, arXiv:1609.04741. [Google Scholar]
  277. Liu, B.; Adeli, E.; Cao, Z.; Lee, K.H.; Shenoi, A.; Gaidon, A.; Niebles, J.C. Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction. IEEE Robot. Autom. Lett. 2020, 5, 3485–3492. [Google Scholar] [CrossRef]
  278. Malla, S.; Dariush, B.; Choi, C. Titan: Future forecast using action priors. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11183–11193. [Google Scholar] [CrossRef]
279. Flohr, F.; Gavrila, D. PedCut: An iterative framework for pedestrian segmentation combining shape models and multiple data cues. In Proceedings of the British Machine Vision Conference (BMVC), Bristol, UK, 9–13 September 2013; pp. 1–11. [Google Scholar] [CrossRef]
  280. Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2633–2642. [Google Scholar] [CrossRef]
281. Lerner, A.; Chrysanthou, Y.; Lischinski, D. Crowds by example. Comput. Graph. Forum 2007, 26, 655–664. [Google Scholar] [CrossRef]
  282. Girase, H.; Gang, H.; Malla, S.; Li, J.; Kanehara, A.; Mangalam, K.; Choi, C. LOKI: Long Term and Key Intentions for Trajectory Prediction. arXiv 2021, arXiv:2108.08236. [Google Scholar]
  283. Chen, T.; Jing, T.; Tian, R.; Chen, Y.; Domeyer, J.; Toyoda, H.; Sherony, R.; Ding, Z. PSI: A Pedestrian Behavior Dataset for Socially Intelligent Autonomous Car. arXiv 2021, arXiv:2112.02604. [Google Scholar]
  284. Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2443–2451. [Google Scholar] [CrossRef]
  285. Gu, Y.; Wang, X.; Zhang, C.; Li, B. Advanced Driving Assistance Based on the Fusion of Infrared and Visible Images. Entropy 2021, 23, 239. [Google Scholar] [CrossRef] [PubMed]
  286. Wong, C.; Xia, B.; Hong, Z.; Peng, Q.; You, X. View Vertically: A Hierarchical Network for Trajectory Prediction via Fourier Spectrums. arXiv 2021, arXiv:2110.07288. [Google Scholar]
  287. Yang, B.; Yan, G.; Wang, P.; Chan, C. A Novel Graph-based Trajectory Predictor with Pseudo Oracle. arXiv 2024, arXiv:2401.00001. [Google Scholar] [CrossRef] [PubMed]
  288. Kotseruba, I.; Rasouli, A.; Tsotsos, J.K. Benchmark for Evaluating Pedestrian Action Prediction. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 1257–1267. [Google Scholar] [CrossRef]
  289. Yang, D.; Zhang, H.; Yurtsever, E.; Redmill, K.; Ozguner, U. Predicting Pedestrian Crossing Intention with Feature Fusion and Spatio-Temporal Attention. IEEE Trans. Intell. Veh. 2022, 14, 221–230. [Google Scholar] [CrossRef]
  290. Liang, J. From Recognition to Prediction: Analysis of Human Action and Trajectory Prediction in Video. arXiv 2020, arXiv:2011.10670. [Google Scholar]
  291. Bhattacharyya, A.; Fritz, M.; Schiele, B. Long-Term On-board Prediction of People in Traffic Scenes Under Uncertainty. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4194–4202. [Google Scholar] [CrossRef]
  292. Chaudhary, N.; Misra, S.; Kalamkar, D.; Heinecke, A.; Georganas, E.; Ziv, B.; Adelman, M.; Kaul, B. Efficient and Generic 1D Dilated Convolution Layer for Deep Learning. arXiv 2021, arXiv:2104.08002. [Google Scholar]
  293. Huang, Z.; Li, R.; Shin, K.; Driggs-Campbell, K. Learning Sparse Interaction Graphs of Partially Detected Pedestrians for Trajectory Prediction. IEEE Robot. Autom. Lett. 2022, 7, 1198–1205. [Google Scholar] [CrossRef]
  294. Fang, Z.; Lopez, A. Intention Recognition of Pedestrians and Cyclists by 2D Pose Estimation. IEEE Trans. Intell. Transp. Syst. 2020, 21, 4773–4783. [Google Scholar] [CrossRef]
  295. Pool, E.A.I.; Kooij, J.F.P.; Gavrila, D.M. Using Road Topology to Improve Cyclist Path Prediction. In Proceedings of the IEEE Intelligent Vehicles Symposium, Los Angeles, CA, USA, 11–14 June 2017; pp. 289–296. [Google Scholar] [CrossRef]
  296. Zernetsch, S.; Kohnen, S.; Goldhammer, M.; Doll, K.; Sick, B. Trajectory Prediction of Cyclists Using a Physical Model and an Artificial Neural Network. In Proceedings of the IEEE Intelligent Vehicles Symposium, Gotenburg, Sweden, 19–22 June 2016; Volume 2016, pp. 833–838. [Google Scholar] [CrossRef]
  297. Wong, C.; Xia, B.; Peng, Q.; Yuan, W.; You, X. MSN: Multi-Style Network for Trajectory Prediction. IEEE Robot. Autom. Lett. 2021, 14, 9751–9766. [Google Scholar] [CrossRef]
  298. Peng, Y.; Zhang, G.; Li, X.; Zheng, L. STIRNet: A Spatial-Temporal Interaction-Aware Recursive Network for Human Trajectory Prediction. In Proceedings of the ICCV Workshop, Montreal, BC, Canada, 11–17 October 2021; pp. 2285–2293. [Google Scholar]
Figure 1. A visual representation of the paper’s structure, which outlines the key sections and their organization.
Figure 2. Generalized framework for pedestrian intention prediction [4].
Figure 3. Classification of pedestrian intention prediction.
Figure 4. An example from the JAAD dataset shows annotations including pedestrian bounding boxes with labels indicating whether they are crossing or not, along with contextual information about road infrastructure, weather conditions, and behavior labels [95].
Figure 5. Analyzing pedestrian features and traffic scene dynamics involves the following: (a) localizing pedestrians using bounding boxes; (b) understanding gestures through pose estimation; (c) categorizing objects using segmentation techniques; (d) comprehending overall motion patterns with optical flow; (e) calculating distances with a global depth heat-map [102].
Figure 6. (A) LSTM cell. (B) Deep RNN.
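For reference alongside the LSTM cell in Figure 6A, the standard cell update (input, forget, and output gates plus a candidate cell state) can be sketched in a few lines of NumPy. Dimensions and random weights below are toy values for illustration only, not taken from any model reviewed here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step. W: (4H, D), U: (4H, H), b: (4H,).
    Gate order in the stacked weights: input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b       # pre-activations for all four gates
    i = sigmoid(z[0 * H:1 * H])      # input gate
    f = sigmoid(z[1 * H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])      # output gate
    g = np.tanh(z[3 * H:4 * H])      # candidate cell state
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c

# Toy dimensions: 4-dim input feature, 8-dim hidden state
rng = np.random.default_rng(0)
D, H = 4, 8
W = rng.standard_normal((4 * H, D))
U = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):                   # unroll over a short input sequence
    h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)
print(h.shape)  # (8,)
```

Stacking such cells vertically yields the deep RNN of Figure 6B, with each layer's hidden sequence feeding the layer above.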
Figure 7. Typical CNN architecture.
Figure 8. GAN architecture.
Figure 9. Autoencoder architecture.
Figure 10. General behavior prediction framework. The behavior prediction module consists of an automated feature extractor (CNN, 3D-CNN, GCN, FCN, CVAE, GAN, etc.), an embedding layer (FCN and ANN), and a time series algorithm (RNN, GRU, and LSTM). This module relies on the perception module (detection, tracking, image processing, interaction representation, and feature engineering), which in turn depends on the ego vehicle sensors (camera, GPS, and wheel encoder). The outputs of the behavior prediction modules are subsequently sent to the planning module [13].
Figure 11. A visual representation highlighting the fundamental differences between aleatoric and epistemic uncertainties [245].
Figure 12. Diagram illustrating three distinct uncertainty models along with their corresponding network architectures [252].
Figure 13. A visual depiction of two distinct uncertainty-aware (UA) models. Source: Adapted from [253].
Figure 14. Block diagram of pedestrian intention prediction framework with feature categories and processing steps.
Figure 15. Pedestrian detection with SVC cameras. A Valeo 360 surround view camera provides a three-dimensional perspective of the environment [12].
Figure 16. A 3D-LiDAR sensor can be utilized for various ranges, including short, medium, telescopic, or combined ranges (such as dual short or dual medium). Depicted here is a Velodyne HDL-64E sensor and the corresponding point cloud data it produces [12].
Figure 17. LiDAR label example: yellow = vehicle, red = pedestrian, blue = sign, pink = cyclist [284].
Figure 18. The AGD326 radar, a 24 GHz pedestrian detector, is designed for optimizing crossing phases [12].
Figure 19. Example of an infrared sensor used for pedestrian detection. The sensor detects heat signatures emitted by pedestrians and other objects, providing reliable detection capabilities even in low-light or poor weather conditions. Infrared sensors are often integrated with other sensing technologies to enhance overall detection accuracy [285].
Figure 20. Overview of a GPS sensor and its operational mechanism for tracking the movement of pedestrians and other agents over time.
Table 1. Selection of important articles in the literature on knowledge-based pedestrian models (Part 1).
First Author, Year, Paper | Model | Summary of Prediction Methods | Citations
Jin et al., 2024 [193] | OPE | Simulates hybrid pedestrian movement at both operational and tactical levels, using video data from pedestrian route-choice experiments for validation. | 1
Wang et al., 2024 [194] | IKKW | Proposes a multi-grid cellular automata model for mixed pedestrian–vehicle traffic, introducing an Improved Kerner–Klenov–Wolf (IKKW) model and a pedestrian motion model based on time-to-collision (TTC). | 15
Keke et al., 2017 [195] | Cellular automata | Presents a behavior-based cellular automaton model for pedestrian evacuation, considering environmental factors and neighbors’ behaviors. | 49
Burstedde et al., 2011 [196] | Cellular automata | Models collective pedestrian dynamics using a cellular automaton with a dynamic floor field to simulate behaviors such as lane formation and evacuation. | 2340
VD Berg et al., 2008 [135] | Collision avoidance | Introduces the Reciprocal Velocity Obstacle concept for real-time, multi-agent navigation without explicit communication, ensuring safe and oscillation-free motion. | 2187
Pellegrini et al., 2009 [136] | Collision avoidance | Introduces a dynamic social behavior model for object tracking that improves performance by incorporating future destinations, environmental context, and anticipatory collision avoidance. | 2048
VD Berg et al., 2011 [133] | Collision avoidance | Proposes a formal approach to reciprocal n-body collision avoidance for multiple mobile robots, ensuring collision-free motion through a low-dimensional linear program without inter-robot communication. | 2388
Helbing et al., 1998 [127] | Force-based | Describes pedestrian motion using a social force model in which internal motivations drive movement, incorporating acceleration toward a desired velocity, distance maintenance, and attractive effects to realistically simulate crowd behavior and self-organization. | 8786
Helbing et al., 2000 [197] | Force-based | Investigates panic-induced crowd stampedes using a pedestrian behavior model to explore mechanisms of panic, jamming, and optimal escape strategies in life-threatening situations. | 5695
Chraibi et al., 2010 [129] | Force-based | Introduces a force-based model with elliptical volume exclusion to quantitatively describe pedestrian movement in various geometries, showing good agreement with empirical data. | 427
Moussaïd et al., 2011 [167] | Force-based | Introduces a cognitive-science approach based on behavioral heuristics for simulating pedestrian dynamics, improving the prediction of individual and collective behaviors, including self-organization and crowd turbulence at extreme densities. | 1342
Karamouzas et al., 2014 [198] | Force-based | Introduces a statistical-mechanical approach to measure interaction energy between pedestrians, revealing an anticipatory power-law interaction based on projected time to collision that can describe various crowd behaviors. | 421
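The force-based models in Table 1 (e.g., Helbing et al. [127]) share a common structure: each pedestrian accelerates toward a desired velocity while repulsive terms maintain distance from neighbors. A minimal NumPy sketch of that acceleration is given below; the parameter values (desired speed, relaxation time, repulsion strength and range) are illustrative defaults, not the calibrated constants from the cited papers:

```python
import numpy as np

def social_force(pos, vel, goal, others, v0=1.3, tau=0.5, A=2.0, B=0.3):
    """Net acceleration on one pedestrian (Helbing-style social force).
    pos, vel: (2,) position and velocity; goal: (2,) target point;
    others: (N, 2) positions of nearby pedestrians.
    v0 = desired speed, tau = relaxation time,
    A/B = repulsion strength/range (illustrative values)."""
    # Driving term: relax toward the desired velocity v0 * e_goal
    e_goal = (goal - pos) / np.linalg.norm(goal - pos)
    drive = (v0 * e_goal - vel) / tau
    # Repulsive terms: exponential decay with distance to each neighbor
    repulse = np.zeros(2)
    for q in others:
        d_vec = pos - q
        d = np.linalg.norm(d_vec)
        repulse += A * np.exp(-d / B) * (d_vec / d)
    return drive + repulse

acc = social_force(pos=np.array([0.0, 0.0]), vel=np.array([0.0, 0.0]),
                   goal=np.array([10.0, 0.0]), others=np.array([[1.0, 0.5]]))
print(acc)  # pushed toward the goal (+x), nudged away from the neighbor (-y)
```

Integrating this acceleration over time for every agent reproduces the self-organization effects (lane formation, jamming) discussed in the table.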
Table 2. Selection of important articles in the literature on knowledge-based pedestrian models (Part 2).
First Author, Year, Paper | Model/Type | Summary of Prediction Methods | Citations
Treuille et al., 2006 [199] | Queuing | Presents a real-time crowd model using continuum dynamics, integrating global navigation with moving obstacles through a dynamic potential field to achieve smooth crowd motion without explicit collision avoidance. | 1312
Henderson et al., 1971 [111] | Gas-kinetic | Measures speed/velocity distribution functions in crowd fluids, revealing alignment with Maxwell–Boltzmann theory but with deviations near the frequency mode due to gender inhomogeneity. | 901
Hughes et al., 2002 [200] | Fluid-dynamic | Derives and analyzes the equations governing two-dimensional pedestrian flow, exploring high- and low-density regimes and their application to understanding and improving pedestrian flow on the Jamarat Bridge. | 1441
Chowdhury et al., 2000 [113] | Review | Critically reviews microscopic vehicular traffic models using statistical physics approaches, with a focus on particle-hopping models and their application to phenomena such as phase transitions, criticality, and self-organized criticality. | 3094
Bellomo et al., 2011 [115] | Review | Reviews and critically analyzes mathematical models for vehicular traffic and crowd dynamics, emphasizing challenges in modeling complex systems and proposing a unified modeling strategy. | 578
Helbing et al., 2000 [159] | Review | Reviews traffic dynamics using methods from statistical physics and non-linear dynamics to explain phenomena such as phantom traffic jams, stop-and-go traffic, and the self-organization of pedestrian and vehicle systems. | 4364
Castellano et al., 2009 [114] | Review | Reviews the application of statistical physics to social phenomena, including opinion dynamics, crowd behavior, and social spreading, with emphasis on comparing models with empirical data. | 4995
Bechinger et al., 2016 [182] | Review | Comprehensively reviews self-propelled Brownian particles, exploring their interactions in complex environments and their potential applications in health care, sustainability, and security. | 2987
Table 3. Selection of deep learning-based pedestrian models.
Author, Year, Paper | Model Name | Summary of Prediction Methods | Datasets/Results
Sharma et al., 2025 [207] | MHSWA | Introduces the Multimodal IntentFormer architecture for pedestrian crossing intention prediction in autonomous driving. Three transformer encoders learn from RGB images, segmentation maps, and trajectory paths, integrated through a co-learning module. The architecture incorporates a Multi-Head Shared Weight Attention mechanism and is regulated by a novel Co-learning Adaptive Composite (CAC) loss function to improve generalization and reduce overfitting. | JAAD ACC: 92%, PIE ACC: 93%
Khindkar et al., 2024 [208] | MINDREAD | Tackles pedestrian intent prediction by introducing the novel PIE++ dataset, which includes multi-label textual explanations for pedestrian intent, aiming to understand not just “what” pedestrians will do but “why”. Proposes the MINDREAD framework, a multi-task learning model that uses cross-modal representation learning to predict both pedestrian intent and the reasons behind it. | JAAD ACC: 92%, PIE ACC: 93%
Ahmed et al., 2023 [209] | LSTM | Presents a pedestrian intent prediction model that addresses varying pedestrian scales using 2D pose estimation and an LSTM architecture. Keypoints extracted across video frames yield spatio-temporal data, which the LSTM uses to classify pedestrian crossing behavior. | JAAD, PIE, ACC: 94%
Song et al., 2022 [210] | II-GRU | Proposes a graph-structured model for pedestrian behavior prediction that constructs a traffic-aware scene graph to capture interactions between pedestrians and traffic elements. Uses inter-frame and intra-frame GRUs (II-GRU) for temporal feature representation and a novel attention mechanism to adaptively focus on relevant features. | JAAD, PIE
Rasouli et al., 2021 [211] | PIEtraj | Suitable for real-time onboard camera-based intention prediction, but considers only the local immediate scene context and does not predict actions along with intent. | PIE, JAAD
Liang et al., 2019 [212] | Next | Supports joint prediction of trajectory and activity but fails to capture the multimodal nature of human trajectories and does not account for group dynamics. | ETH, UCY, ActEV/VIRAT
Xue et al., 2018 [34] | SS-LSTM | Employs an LSTM network with 128 dimensions in an encoder–decoder structure; non-linear ReLU activations are applied within the hidden states to enhance predictive capability. | ETH: [ADE: 0.095, FDE: 0.235]; UCY: [ADE: 0.081, FDE: 0.131]; Town Center: [ADE: 29.01 (0.8 s), FDE: 36.88 (0.8 s)]
Rasouli et al., 2018 [95] | AlexNet + FCN | Integrates intention and trajectory prediction into a single framework. However, it lacks temporal context, struggles to differentiate standing from walking, and cannot accurately classify the intentions of pedestrians with obscured faces. It also does not estimate the future positions of objects and fails to account for scene dynamics. | PIE
Table 4. Selection of DNN architectures for prediction methods (CNN).
Author, Year, Paper | Model | Summary of Prediction Methods (CNN) | Datasets/Results
Zhao et al., 2019 [213] | MATF | Employs an encoder–decoder setup and addresses multimodal uncertainty with a generator–discriminator pair. The encoder handles dynamic scenes via an LSTM layer and static scenes with a CNN layer, while the decoder processes through an LSTM layer. | ETH: [ADE (deterministic): 0.64, ADE (stochastic): 0.48, FDE (deterministic): 1.26, FDE (stochastic): 0.90]; Stanford Drone: [ADE (deterministic): 30.75, ADE (stochastic): 22.59, FDE (deterministic): 65.90, FDE (stochastic): 33.53]
Lv et al., 2021 [214] | DeepPTP | Lightweight structure with high convergence ability and minimal overfitting, giving reduced training time and a low computational burden. However, it faces a trade-off between accuracy and training speed, and accuracy declines as classification categories are added. | VRU trajectory
Marchetti et al., 2020 [215] | MANTRA | Features an encoder–decoder architecture with an autoencoder system: the encoder converts past and future data points into meaningful representations, while the decoder reconstructs future trajectories. A memory network layer refines predictions by leveraging both past and future information. | KITTI: [ADE: 0.16 (1 s), FDE: 0.25 (1 s)]; Cityscapes: [ADE: 0.49, FDE: 0.79]; Oxford RobotCar: [ADE: 0.31 (1 s), FDE: 0.35 (1 s)]
Liang et al., 2020 [59] | SimAug | Generalizable and robust to variations in camera views, motions, and scene semantics, but not trained on real-world data. | SDD, VIRAT/ActEV, Argoverse
Mangalam et al., 2021 [216] | Y-Net | Addresses the dichotomy between epistemic and aleatoric uncertainty in trajectory prediction, but its suitability for real-time implementation has not been demonstrated. | SDD, ETH, UCY, InD
Mohamed et al., 2020 [217] | Social-STGCNN | Constructs a spatio-temporal graph G = (V, A), processed through a spatio-temporal graph CNN; a TXP-CNN layer then predicts future trajectories. Here, P represents the position dimensions of pedestrians, N the number of pedestrians, and T the number of time steps. | ETH: [ADE: 0.64, FDE: 1.11]; UCY: [ADE: 0.44, FDE: 0.79]
Wang et al., 2021 [218] | MI-CNN | Applies an encoder–decoder module to encode and decode pedestrian information. The encoder has four sections capturing pose, 2D and 3D size, historical trajectories, and depth; the decoder mirrors the encoder’s kernel size and stride for consistent processing. | MOT16: [ADE: 18.25, FDE: 21.70]; MOT20: [ADE: 16.63, FDE: 19.34]
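Many entries in Tables 3 and 4 report ADE and FDE. These metrics reduce to simple point-wise Euclidean distances over the prediction horizon: ADE averages the per-step error, while FDE keeps only the final step. A NumPy sketch with toy trajectories (illustrative only):

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error for one trajectory.
    pred, gt: (T, 2) arrays of predicted / ground-truth positions
    over T future time steps (world or pixel coordinates)."""
    dists = np.linalg.norm(pred - gt, axis=1)  # per-step Euclidean error
    return dists.mean(), dists[-1]             # ADE = mean, FDE = last step

# Toy example: the prediction drifts 0.1 m further per step in x
T = 5
gt = np.stack([np.arange(T, dtype=float), np.zeros(T)], axis=1)
pred = gt + np.stack([0.1 * np.arange(1, T + 1), np.zeros(T)], axis=1)
ade, fde = ade_fde(pred, gt)
print(ade, fde)  # ADE ≈ 0.3, FDE ≈ 0.5
```

For stochastic models, the "best-of-k" convention applies this computation to each of k sampled trajectories and keeps the minimum, which is why the stochastic ADE/FDE columns in Table 4 are lower than the deterministic ones.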
Table 5. Selection of DNN architectures for prediction methods (GAN).
Author, Year, Paper | Model | Summary of Prediction Methods (GAN) | Datasets/Results
Zhang et al., 2021 [219] | Scene feature extraction module + generator + discriminator | Analyzes both individual and group behaviors, but lacks sufficiently diverse training examples to capture complex, non-linear human interactions. | ETH/UCY, CUHK, CrowdFlow
Fernando et al., 2018 [220] | DGMMPT | Introduces an algorithm for multi-person tracking data association. The generator comprises an encoder of Convolution–BatchNorm–ReLU layers, an LSTM, and a decoder of Convolution–BatchNorm–Dropout–ReLU layers; the discriminator mirrors the encoder’s structure. | 3D MOT 2015, AVG-TownCentre: [MOTA: 42.5, MOTP: 69.8]
Huang et al., 2021 [72] | STI-GAN | Captures both spatial and temporal characteristics of complex human behavior. However, as pedestrian density increases, model complexity and computational burden grow significantly, and human–space interactions are not modeled. | ETH, UCY
Gupta et al., 2018 [68] | Social GAN | Learns social norms through a data-driven approach, using a generator with an LSTM-based encoder, a pooling module, and an LSTM-based decoder; the discriminator shares the encoder’s architecture. | ETH: [ADE: 0.39/0.58, FDE: 0.78/1.18]
Liang et al., 2020 [55] | TPNMS (Temporal Pyramid Network with Multi-Supervision) | Utilizes both short-term and long-range behavioral cues, but the lack of scene knowledge limits its ability to generalize across scenarios. | ETH, UCY
Kosaraju et al., 2019 [221] | Social-BiGAT | Employs a graph-based generative adversarial network, using a graph attention network (GAT) to create robust feature representations that encode social interactions among individuals within a scene. | ETH: [ADE: 0.69, FDE: 1.29]
Zou et al., 2018 [57] | SA-GAIL | Uses an unsupervised approach and effectively mimics human collision avoidance, but struggles with long-term trajectory prediction. | Central Station dataset
Amirian et al., 2019 [58] | SocialWays GAN | Combines Info-GAN with hand-crafted interaction features inspired by neuroscience and biomechanics to enhance the understanding of human interactions for more accurate predictions. | ETH: [ADE: 0.39, FDE: 0.64]; UCY: [ADE: 0.55, FDE: 1.31]
Kothari et al., 2018 [222] | FSGAN | Incorporates two attention modules, one for physical attention and one for social attention, enhancing the GAN’s ability to focus on relevant aspects of the scene. | ETH: [ADE: 0.70, FDE: 1.43]; UCY: [ADE: 0.54, FDE: 1.24]
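All the generator–discriminator pairs summarized in Table 5 are trained on some variant of the standard GAN min–max objective, in which the discriminator D learns to distinguish real trajectories x from generated ones G(z):

```latex
\min_{G}\max_{D}\; V(D,G) =
\mathbb{E}_{\mathbf{x}\sim p_{\mathrm{data}}}\left[\log D(\mathbf{x})\right]
+ \mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}}\left[\log\bigl(1 - D(G(\mathbf{z}))\bigr)\right]
```

In trajectory GANs such as Social GAN [68], the generator is additionally conditioned on the encoded observation history of the agents in the scene, and a best-of-k variety loss is commonly added to encourage multimodal future outputs.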
Table 6. Selection of DNN architectures for prediction methods (autoencoders).
Author, Year, Paper | Model | Summary of Prediction Methods (Autoencoders) | Datasets/Results
Bhattacharyya et al., 2020 [94] | CF-VAE (Conditional Flow Variational Autoencoder) | Versatile and applicable to various types of traffic participants, but struggles to accurately model highly complex multimodal distributions. | MNIST, SDD, HighD
Chen et al., 2021 [223] | Graph convolutional autoencoders | Well-suited to ego-vehicle perspectives, where scenes are dynamic and affected by ego-noise. However, using multiple modalities may lead to longer inference times. | PIE
Zhou et al., 2021 [224] | S-CSR (Cascaded CVAE with Socially Aware Rethinking) | Shows a substantial reduction in ADE (Average Displacement Error) and FDE (Final Displacement Error) compared to existing state-of-the-art methods, but its effectiveness may diminish in highly complex, crowded environments due to limited consideration of social context and group behavior. | ETH/UCY, SDD
Table 7. Summary of studies applying uncertainty estimation on their PIP applications (sorted by year).
Author, Year, Paper | Model/Method | Summary of Prediction Methods | Application/Dataset/Results
Particke et al., 2020 [254] | Multi-hypothesis filtering | Enhances situational awareness in autonomous driving by improving pedestrian state estimation with semantic information. Models pedestrian environments, intentions, and interactions using potential fields and a neural network, integrated into a Kalman filter. A multi-hypothesis filter predicts movement, a confidence score detects intention changes, a risk score estimates collision probability, and a joint probabilistic data association filter (JPDAF) improves tracking in small groups. Validated through simulations and real-world stereo-vision camera data from a parking area. | PIP
Dai et al., 2021 [255] | Reinforcement learning with uncertainty-based soft labels | Proposes a reinforcement learning framework to improve pedestrian motion prediction in autonomous driving. By generating soft labels and incorporating predictive uncertainty, the method enhances accuracy, reliability, and efficiency, outperforming traditional models on benchmark datasets. | PIP
Upreti et al., 2023 [256] | OoD | Improves pedestrian crossing intention prediction by incorporating traffic light status as an additional input. Enhances model reliability and interpretability by estimating uncertainty, reducing overconfidence in out-of-distribution cases. Experiments on the PIE dataset show up to a 5% F1-score improvement across multiple models. | PIP/PIE
Zhang et al., 2023 [257] | Transformer-based evidential prediction (TrEP) | Addresses pedestrian intention prediction with a transformer-based evidential prediction (TrEP) algorithm that captures temporal correlations in pedestrian video sequences and quantifies AI uncertainty in complex scenes. Experiments on three benchmark datasets demonstrate superiority over state-of-the-art methods, with performance improving as uncertainty levels are managed; comparing human disagreement with AI uncertainty further evaluates the model in ambiguous scenarios. | PIP/PIE
Liu et al., 2024 [258] | Gaussian Mixture Model (GMM) | Enhances pedestrian trajectory prediction by separately modeling movement complexity and individual uncertainty, using Gaussian mixture densities to represent future locations and improve predictive diversity. Unlike previous approaches, it explicitly models uncertainty, enabling more reliable trajectory generation and outperforming state-of-the-art models on public benchmarks even with lightweight architectures. | PTP/ETH, UCY, SDD
Chen et al., 2024 [259] | Multi-task learning framework | Introduces a cross-modal transformer-based model for pedestrian crossing intention prediction using only bounding boxes and ego-vehicle speed. Leverages self-attention and cross-modal attention to extract meaningful correlations and employs bottleneck feature fusion for efficient representation. A novel uncertainty-aware multi-task learning approach jointly predicts future bounding boxes and crossing actions, achieving state-of-the-art results on benchmark datasets despite fewer input features. | PIP/JAAD, PIE
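Several methods in Table 7 separate epistemic from aleatoric uncertainty. A common lightweight way to expose epistemic uncertainty at inference time is Monte Carlo dropout: keep dropout active, run several stochastic forward passes, and read the spread of the predicted crossing probability as a confidence signal. The NumPy sketch below uses a toy two-layer classifier with random weights purely for illustration; it is not any specific model from the table:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy two-layer "crossing intention" classifier with random weights
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((8, 1))

def forward(x, drop_p=0.5, training=True):
    """One stochastic forward pass; dropout stays active at test time."""
    h = np.maximum(0.0, x @ W1)              # ReLU hidden layer
    if training:                             # MC dropout mask
        mask = rng.random(h.shape) > drop_p
        h = h * mask / (1.0 - drop_p)        # inverted-dropout scaling
    logit = h @ W2
    return 1.0 / (1.0 + np.exp(-logit))      # P(crossing)

x = rng.standard_normal(16)                  # toy feature vector
samples = np.array([forward(x)[0] for _ in range(100)])
mean, std = samples.mean(), samples.std()
print(f"P(crossing) = {mean:.2f} +/- {std:.2f}")  # std ~ epistemic uncertainty
```

A downstream planner can treat a high standard deviation as "the model has not seen enough similar pedestrians" and fall back to a more conservative maneuver, which is exactly the coupling between uncertainty estimation and decision-making that the studies above advocate.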
Table 8. Datasets used in pedestrian intention prediction and related works.
Dataset | Description | Notable Papers
JAAD | A dataset focusing on pedestrian behavior in urban environments, with annotations for actions such as looking, crossing, and standing. | [24,27,29,36,49,59,90,93,101,207,208,209,210,211,261,264,265,266]
PIE | A dataset that includes pedestrian trajectories, head orientation, and gaze data, aimed at predicting pedestrian crossing intentions. | [36,90,93,95,207,208,209,210,211,261]
ETH/UCY | Datasets originally used for trajectory prediction but also applied in PIP research, focusing on pedestrian interactions in various settings. | [7,9,32,33,34,51,56,60,75,85,212,216,224,267,268,269,270,271,272,273,274]
Waymo Open Dataset | A large-scale dataset containing diverse driving scenarios with detailed annotations for pedestrians and other road users. | [6,7,68,217]

Share and Cite

MDPI and ACS Style

Mirzabagheri, A.; Ahmadi, M.; Zhang, N.; Alirezaee, R.; Mozaffari, S.; Alirezaee, S. Navigating Uncertainty: Advanced Techniques in Pedestrian Intention Prediction for Autonomous Vehicles—A Comprehensive Review. Vehicles 2025, 7, 57. https://doi.org/10.3390/vehicles7020057

