1. Introduction
The problem of trajectory prediction involves forecasting the path a vehicle will take given its past trajectory and surroundings. A solution to this problem would have applications in surrogate safety analysis [1], road safety evaluation, and infrastructure-based safety systems that provide early crash warnings [2]. Solving this problem is also of critical importance for advanced driver assistance systems (ADAS) [3,4,5] and autonomous vehicles (AV) [3,6,7]. It would also enable us to generate simulations of intersections that better conform to the reality of human driving. Such simulations make it possible to predict the behavior of human drivers at an intersection before it is built, allowing for better safety assessments [8]. When cast as a control problem, i.e., a problem of finding the correct control behavior, solving the problem of trajectory prediction would be equivalent to training a model to drive like human drivers, which enables applications where human-like driving is desired. Trajectory prediction is related to vehicle tracking, i.e., the problem of identifying and following the motion of vehicles in a video feed. While vehicle tracking deals with the current motion of vehicles, trajectory prediction deals with their future movements, and the data required for trajectory prediction are the output of vehicle tracking. In this work, we focused solely on the prediction problem.
Vehicle trajectory prediction is of particular interest at intersections, where the large number of conflicts between road users can increase the likelihood of crashes [9]. According to the National Highway Traffic Safety Administration, between 2014 and 2018, about 40 percent of all crashes and 24 percent of fatal crashes occurred at intersections. With the advent of smart cities and smart vehicles, infrastructure-to-vehicle (I2V) and vehicle-to-vehicle (V2V) communications will become possible. In conjunction with a trajectory prediction system, these advances in vehicle and infrastructure technology will enable us to enhance intersection safety by predicting collisions [10,11] and risky driving behavior (e.g., red-light running) [12], and by deploying countermeasures that help avoid or mitigate crashes, such as early crash warnings [13,14,15,16,17] or real-time signal timing adjustments [18]. Projecting vehicles’ trajectories into the future is also important for automated driving: as long as automated vehicles share roads with human-driven vehicles, they need to know how human drivers act in different situations, and they must behave in ways that conform to human drivers’ expectations of other vehicles, i.e., like other human drivers. It is, therefore, important that automated vehicles have a model of vehicle motion in different situations, including at intersections.
A wide range of approaches have been used to tackle the trajectory prediction problem. They range in complexity from models that assume the vehicle will maintain its velocity or acceleration and its heading (or rate of change of heading) for the duration of the prediction [19], to models that capture more of the complexity of vehicle motion by modeling different maneuvers but still disregard the influence of other vehicles [20], to models that take the interactions between traffic actors into account when predicting the future motion of vehicles [21]. The tools used in developing these approaches are also varied and include Kalman filters [15], hidden Markov models [22], Gaussian processes [20], Bayesian networks [14], Gaussian mixture models [9], and neural networks [6]. These studies all formulate trajectory prediction as a prediction task, i.e., they directly predict the entire future trajectory of the vehicle. The problem can also be formulated indirectly as a control task, in which control actions (e.g., changes in heading and velocity) are determined at each timestep and the trajectory is obtained by tracing the motion of the vehicle under these actions. In this case, the problem becomes one of learning from demonstration (LfD) [23], in which we are interested in learning, from human driving data, what actions should be taken to properly control a vehicle.
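To make the control formulation concrete, the following Python sketch traces a trajectory from per-timestep control actions. It assumes a simple point-mass kinematic model in which each action is a (heading change, speed change) pair and the vehicle moves in a straight line within each timestep; the function names `rollout` and `constant_turn_rate_actions` are hypothetical. Holding the yaw rate and acceleration fixed gives the kind of physics-based baseline mentioned above, while a learned policy would instead choose a different action at every step.

```python
import math
from typing import List, Sequence, Tuple


def rollout(x: float, y: float, heading: float, speed: float,
            actions: Sequence[Tuple[float, float]],
            dt: float = 0.1) -> List[Tuple[float, float]]:
    """Trace a trajectory from per-timestep control actions.

    Each action is (heading change [rad], speed change [m/s]). The change is
    applied at the start of the timestep, after which the vehicle moves in a
    straight line for dt seconds (a point-mass kinematic approximation).
    """
    points = [(x, y)]
    for d_heading, d_speed in actions:
        heading += d_heading
        speed = max(0.0, speed + d_speed)  # no reversing in this sketch
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        points.append((x, y))
    return points


def constant_turn_rate_actions(yaw_rate: float, accel: float,
                               dt: float, steps: int) -> List[Tuple[float, float]]:
    """Physics-based baseline: the same heading and speed change every step."""
    return [(yaw_rate * dt, accel * dt)] * steps


# Straight ahead at 10 m/s for 1 s, and a gentle left turn held for 2 s.
straight = rollout(0.0, 0.0, 0.0, 10.0, [(0.0, 0.0)] * 10)
left = rollout(0.0, 0.0, 0.0, 10.0, constant_turn_rate_actions(0.5, 0.0, 0.1, 20))
```

Under this formulation, predicting a trajectory reduces to predicting a sequence of actions, which is what makes learning-from-demonstration methods such as IRL applicable.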
In this work, based on a research project that we conducted [24], we developed a new solution using a hybrid approach that combines elements of the prediction formulation and the control formulation. We adopted a two-step approach. In the first step, we represented vehicle trajectories as B-spline curves and trained a neural network model to predict the coefficients of these curves. A conditional variational autoencoder was then used to generate candidate trajectories from the predicted coefficients. Similar trajectory representations have been used before, such as Chebyshev polynomials [9], but, to the best of our knowledge, this is the first work to use B-spline curves for this purpose. We chose B-spline curves because they can approximate complex curves with local control over the curve’s shape, while avoiding problems encountered with high-degree polynomials, such as oscillations at the edges of the interval (known as Runge’s phenomenon). In the second step, the candidate trajectories were ranked using an inverse reinforcement learning (IRL) [25] model, in which a convolutional neural network was used as the approximator for the recovered reward function. IRL is a technique for solving control problems by learning from demonstration and has previously been applied to trajectory prediction on highways [26,27], but, to the best of our knowledge, this is the first work to investigate its application at intersections. It is also the first work to use maximum entropy (MaxEnt) IRL to select from a set of candidate trajectories. The work in [28] also used an IRL-like approach to rank candidate trajectories, but with an ad hoc formulation. Trajectory prediction at intersections involves challenges not encountered on highways, such as various conflict types, multiple types of road users (vehicles, pedestrians, and bicycles), and more complicated traffic control devices. Here, we used IRL to develop methods that can address some of these complexities. The IRL model was trained using the B-spline smoothed trajectories and the context of the vehicle at the intersection, i.e., the other vehicles present at the intersection. The second step allowed us to predict trajectories that are more human-like while taking interactions between the vehicles at the intersection into account. For training and evaluating our method, we used the Lankershim Boulevard dataset from the Next Generation Simulation (NGSIM) dataset collection [29]. In summary, the main contributions of this work are investigating (a) the use of B-spline curves to represent vehicle trajectories, (b) the use of inverse reinforcement learning for trajectory prediction at intersections, and (c) the use of MaxEnt IRL to rank a set of candidate trajectories.
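A minimal sketch of the B-spline representation follows. It assumes SciPy’s `make_lsq_spline`, a clamped cubic knot vector with a fixed number of uniformly spaced interior knots, and a curve parameterized by normalized time; these choices and the helper name `fit_bspline` are illustrative assumptions rather than the configuration used in this work. A fixed knot vector yields a fixed-size coefficient array, which is the kind of target a regression network can be trained to predict.

```python
import numpy as np
from scipy.interpolate import BSpline, make_lsq_spline


def fit_bspline(xy: np.ndarray, n_interior_knots: int = 4, k: int = 3) -> BSpline:
    """Least-squares B-spline fit to a 2-D trajectory of shape (m, 2).

    The curve is parameterized by normalized time u in [0, 1]. The knot
    vector is clamped (boundary knots repeated k + 1 times), so the fitted
    curve starts and ends at its first and last control points.
    """
    xy = np.asarray(xy, dtype=float)
    u = np.linspace(0.0, 1.0, len(xy))
    interior = np.linspace(0.0, 1.0, n_interior_knots + 2)[1:-1]
    t = np.concatenate([np.zeros(k + 1), interior, np.ones(k + 1)])
    return make_lsq_spline(u, xy, t, k=k)


# Synthetic 20 m-radius left turn sampled at 50 points.
theta = np.linspace(0.0, np.pi / 2, 50)
turn = np.column_stack([20 * np.sin(theta), 20 * (1 - np.cos(theta))])
spl = fit_bspline(turn)
coefs = spl.c                                   # shape (8, 2): the regression target
rebuilt = BSpline(spl.t, coefs, spl.k)          # curve from (predicted) coefficients
smoothed = rebuilt(np.linspace(0.0, 1.0, 100))  # shape (100, 2)
```

Because each basis function is nonzero on only a few knot intervals, perturbing one coefficient changes the curve locally, which is the local-control property mentioned above.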
2. Related Work
The approaches to trajectory prediction can be classified into three broad categories [3]: physics-based [10,13,15,19,30,31,32,33], maneuver-based [5,7,9,16,17,21], and interaction-aware [34,35,36,37]. Physics-based models, as the name suggests, deal with the physics of vehicle motion and assume that vehicles’ trajectories are determined solely by physical forces, disregarding the driver decisions that affect steering and acceleration. Consequently, these models fail to accurately predict vehicle motion beyond a short horizon. Maneuver-based models take driver actions into account, but in isolation: they treat these decisions as determined solely by the position and preceding trajectory of the vehicle of interest, ignoring the influence of other road users, which leads to less reliable projections of future motion. Interaction-aware models perform trajectory prediction by taking the presence of other road users into account. Comprehensive reviews of these three categories can be found in [3,38]. The present work falls within the third category (i.e., interaction-aware models). What follows is a summary of interaction-aware models in the literature, of previous studies that have applied IRL to trajectory prediction, and of works that apply trajectory prediction to intersection safety.
2.1. Interaction-Aware Models
In [34], a trajectory prediction framework based on the radial basis function (RBF) network and particle filter proposed in [5] was used to predict the joint trajectory of two vehicles at intersections. This was done by penalizing trajectories that lead to avoidable collisions (i.e., trajectories for which the time to collision is larger than the drivers’ reaction times). In [21], coupled hidden Markov models [22] were used to predict driver behavior under the assumption of asymmetric interactions, i.e., other vehicles influence the vehicle of interest, but not vice versa. In [35], the intelligent driver model was used to infer the intent of drivers at intersections in the presence of a preceding vehicle. A probabilistic graphical model and recursive Bayesian filtering were used in [36,39] to perform interaction-aware driving behavior prediction. In [37], a dynamic Bayesian network (DBN) was used in conjunction with a factored state space that reduces the computational complexity of the model. DBNs were also used in [40] to jointly model what drivers intend to do and what they are expected to do in a traffic context. In [6], traffic contexts were rasterized into two-dimensional images, and a deep convolutional neural network (CNN) was then used to perform trajectory prediction. In [41], a generative adversarial network (GAN) was used to model driver behavior on highways. A solution to a restricted version of the trajectory prediction problem, that of predicting the changes in velocity along a predetermined path at unsignalized intersections, was proposed in [42]. This work modeled the problem as a partially observable Markov decision process in which the intended paths of the other vehicles constitute the hidden variables. Partially observable Markov decision processes were also used in [43] for AV decision making in scenarios including roundabouts and T-junctions. In [44], deep neural networks (DNNs) and long short-term memory (LSTM) networks were used to predict vehicle trajectories at intersections. A technique called social pooling was used with LSTMs and deep CNNs in [45] to address the interactions between vehicles in trajectory prediction in a highway setting. In [46], a specially designed “influence network” was used in conjunction with a DBN to perform vehicle trajectory prediction at intersections. A similar DBN-based solution to the trajectory prediction problem was proposed in [14].
2.2. Trajectory Prediction Using IRL
Several studies have used IRL to model driving, mostly in the context of highways. In [26], IRL was used to learn highway driving from human demonstrations in a simulated environment. The use of IRL was motivated by the desire to achieve more human-like behavior and a better ability to handle new scenarios. Deep Q-networks were used to address the state-space explosion encountered when applying IRL in a setting with a large state space. Besides relying on a simulated environment instead of real-world data, this study had several other limitations, such as using a constant speed and considering at most two cars in front of the vehicle. The authors of [27] had similar motivations in using IRL to learn individual driving styles on highways. The driving behavior of a number of drivers was recorded as they drove a car fitted with a variety of sensors on a highway. Maximum entropy IRL was then used to train a model to make driving decisions in styles similar to those of each individual driver. This work used a reward function that was a linear function of a number of manually defined features, such as acceleration, deviation from the lane center, and distance to other vehicles. These two works addressed the control formulation of the problem described in the Introduction. In both studies, the use of IRL allowed for faithful replication of human driving behavior and an ability to generalize to new situations. In [47], a hierarchical learning framework was proposed in which IRL was used to predict interactive driving behavior on two levels, with ramp merging as a case study. The levels of decision making in this framework consisted of discrete, high-level decisions (e.g., whether to merge before or after a given car) and low-level continuous actions (e.g., the acceleration and heading changes at each timestep). As in the previous study, the reward function was formulated as a linear function of several manually defined features. A notable limitation of this work is that the high-level discrete decisions and their corresponding low-level continuous features must be defined manually for the particular scenario at hand (e.g., ramp merging). In [28], a generative framework based on conditional variational autoencoders using recurrent neural networks (RNNs) was used to generate possible future trajectories, and an IRL approach was used to rank and refine them. Notably, this work did not use any of the commonly employed IRL formulations; instead, it integrated a reward function into a larger framework, where the reward function parameters were optimized in tandem with the rest of the architecture and the optimization method depended on the sample-generating component of the framework. IRL was used in [48] to choose from a set of trajectories generated by a rule-based method in a highway environment. IRL was chosen for this study because it allowed for a hybrid method that did not require mappings from circumstances to vehicle control to be manually engineered while still producing interpretable results. In [49], a trajectory prediction method based on an RNN encoder-decoder was proposed, which used IRL as a regularizer during training to help the model better utilize scene context information. IRL was used to directly predict trajectories in a highway environment in [50]. A summary of the studies enumerated above is presented in Table 1.
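Several of the works above [27,47] use a reward that is linear in hand-crafted features. The sketch below shows how MaxEnt IRL fits such a reward when the partition function is taken over a finite set of candidate trajectories: the model becomes a softmax over candidates, and the log-likelihood gradient is the demonstrated feature vector minus the expected feature vector. It is an illustrative simplification under these assumptions; the function names, features, and learning-rate settings are hypothetical, and it is neither the method of any cited study nor the CNN-based reward used in this work.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fit_linear_maxent(demo: np.ndarray, cands: np.ndarray,
                      lr: float = 0.5, iters: int = 500) -> np.ndarray:
    """MaxEnt IRL with reward r = theta . f over a finite candidate set.

    demo:  (N, d) features of the demonstrated trajectory for each scene.
    cands: (N, K, d) features of K alternative trajectories per scene.
    P(traj | scene) is proportional to exp(theta . f), normalized over the
    demonstration plus its candidates. Gradient ascent on the mean
    log-likelihood uses grad = f_demo - E_P[f].
    """
    feats = np.concatenate([demo[:, None, :], cands], axis=1)  # (N, K+1, d)
    theta = np.zeros(demo.shape[1])
    for _ in range(iters):
        p = softmax(feats @ theta, axis=1)                     # (N, K+1)
        expected = np.einsum("nk,nkd->nd", p, feats)           # E_P[f] per scene
        theta += lr * (demo - expected).mean(axis=0)
    return theta


rng = np.random.default_rng(0)
# Two hypothetical standardized features; demonstrations are consistently
# low on feature 0 (e.g., a jerk measure), while candidates are random.
demo = np.column_stack([np.full(100, -1.0), rng.normal(size=100)])
cands = rng.normal(size=(100, 5, 2))
theta = fit_linear_maxent(demo, cands)
best = int(np.argmax(cands[0] @ theta))  # rank one scene's candidates by reward
```

Replacing the linear reward with a neural network removes the need for hand-crafted features, at the cost of interpretability.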
2.3. Trajectory Prediction for Intersection Safety
In this subsection, we will explore in more detail those studies that have considered the trajectory prediction problem from the viewpoint of the infrastructure and whose proposed solutions cover the problem at intersections.
Trajectory prediction has several applications for intersection safety. One such application is the detection of risky driving behaviors such as dangerous turns [16], red-light running [12,16,18], abrupt stops, aggressive passes, speeding passes, and aggressive following [12]. Trajectory prediction is also instrumental in the early prediction of turning movements, which is helpful in avoiding accidents [43]. Collision prediction, avoidance/mitigation [13,14,15,19], and risk assessment [10,11,17] also make use of trajectory prediction. Each of the studies reviewed in this subsection used its trajectory prediction solution to tackle one or more of these applications.
Table 2 presents, for each study, the features used for trajectory prediction (Predictors), the sensors used to collect these features (Data Collection Sensors), the number of intersections at which training data were gathered (if applicable), the duration for which data must be collected before predictions can be made (monitoring period), how far into the future the predicted trajectories extend (prediction horizon), the metric used to measure the performance of either the trajectory prediction method or the safety system as a whole (evaluation metric), the applications that were tested, if applicable (tested applications), the types of road users whose interactions were considered (interaction type), and the movements leading to possible hazards that were considered.
Most studies have focused on predicting and mitigating crashes. In [10], the authors proposed a method for estimating the collision risk between vehicles based on real-time trajectory prediction. Trajectory prediction in this work used a linear Kalman filter, GPS data were used to determine the position of vehicles, and risk was estimated using the time to collision (TTC) computed from the predicted trajectories. Another work that used TTC from predicted trajectories for collision risk estimation was [13], which also used a Kalman filter for trajectory prediction, with DGPS as the position sensor. A threat assessment and decision-making system was proposed in [15], which used an unscented Kalman filter for trajectory prediction, together with a probabilistic threat assessment method and a decision-making protocol for determining whether an intervention is necessary. In [14], an accident prewarning system was developed with a DBN-based trajectory prediction method and a risk assessment method based on the identification of risky driving behavior. The authors also presented a method for choosing the collision avoidance strategy based on TTC and time to avoidance (TTA) matrices. An intersection safety system was developed in [11], which used video data to predict the trajectories of vehicles at intersections and to detect dangerous situations involving both vehicles and pedestrians using TTC and post-encroachment time (PET). For trajectory prediction, it was assumed that vehicles follow “average drive lines,” i.e., predefined average trajectories. In [17], a trajectory prediction method based on extended Kalman filters was developed and used to identify conflict areas between vehicles and other road users and to calculate the time to enter (TTE) and time to leave (TTL) of these road users for each conflict area. An object-oriented Bayesian network was then used to estimate the collision probability. In [16], a maneuver prediction model was presented for use in an infrastructure-based intersection safety system. The proposed system used location, speed, and acceleration data transmitted by vehicles and roadside sensors for maneuver prediction, with the objective of providing warnings for red-light violations and right- and left-turn hazards.
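Several of these systems estimate TTC from predicted trajectories. The following is a minimal sketch of one common discretized approximation, assuming two trajectories sampled at the same timestamps and a hypothetical fixed collision distance that treats vehicles as points; the systems cited above may define TTC differently (e.g., using vehicle footprints or constant-velocity extrapolation), and `predicted_ttc` is a hypothetical name.

```python
from typing import Optional

import numpy as np


def predicted_ttc(traj_a, traj_b, dt: float,
                  collision_dist: float = 2.0) -> Optional[float]:
    """Time at which two predicted trajectories first come closer than collision_dist.

    traj_a, traj_b: (T, 2) positions at t = 0, dt, 2*dt, ...
    Vehicles are treated as points; collision_dist is a hypothetical
    threshold standing in for vehicle footprints. Returns None when no
    conflict occurs within the prediction horizon.
    """
    gaps = np.linalg.norm(np.asarray(traj_a, float) - np.asarray(traj_b, float), axis=1)
    hits = np.flatnonzero(gaps < collision_dist)
    return float(hits[0] * dt) if hits.size else None


t = np.arange(60) * 0.1                                 # 6 s horizon, 0.1 s steps
a = np.column_stack([10 * t, np.zeros_like(t)])         # eastbound at 10 m/s
b = np.column_stack([100 - 10 * t, np.zeros_like(t)])   # westbound at 10 m/s
ttc = predicted_ttc(a, b, dt=0.1, collision_dist=3.0)
```

A warning system would then compare this value against a threshold, such as a driver reaction time, before issuing an alert.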
Other studies have focused on applications such as the identification of particular behaviors. In [12], the authors developed a trajectory prediction method for identifying risky behavior at high-speed intersections caused by the lengthy warning sequence at the end of the green phase. A notable feature of their method is that it divides the problem into two cases: one in which the vehicle is far enough from its leading vehicle to act independently of it, and one in which the vehicle’s movements are influenced by the behavior of the leading vehicle (i.e., the time headway to the leading vehicle is less than 6 s). A trajectory prediction method was developed in [44] for predicting turning movements at intersections. Video data from three intersections were used to extract vehicle trajectories and to train neural network models for trajectory prediction. To predict a vehicle’s turning movement, its predicted trajectory is compared against “typical paths” to obtain the final turning prediction (left, right, or through). In [46], trajectory data transcribed from video camera footage were used to train neural network models for predicting the trajectories of both vehicles and pedestrians, which can be used to predict high-level behavior. A red-light running prediction method was proposed in [18], which used trajectory prediction to detect red-light running ahead of time and dynamically extend the all-red phase of the intersection signals to mitigate accidents. A method for collision risk prediction and warning was proposed in [19], which estimated the minimal future distance between possibly conflicting vehicles using a physics-based trajectory prediction method.
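The “typical path” matching step described for [44] can be sketched as a nearest-template classifier. The sketch assumes that one typical path per movement is given as a polyline and that similarity is the mean point-wise distance after arc-length resampling; the distance measure actually used in [44] may differ, and the helper names `resample` and `classify_movement` are hypothetical.

```python
import numpy as np


def resample(path, n: int = 20) -> np.ndarray:
    """Resample a polyline to n points evenly spaced by arc length."""
    path = np.asarray(path, dtype=float)
    seg = np.linalg.norm(np.diff(path, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])          # cumulative arc length
    targets = np.linspace(0.0, s[-1], n)
    return np.column_stack([np.interp(targets, s, path[:, i]) for i in range(2)])


def classify_movement(predicted, typical_paths: dict) -> str:
    """Return the label of the typical path closest to the predicted trajectory."""
    p = resample(predicted)
    dists = {label: float(np.mean(np.linalg.norm(p - resample(tp), axis=1)))
             for label, tp in typical_paths.items()}
    return min(dists, key=dists.get)


# Hypothetical typical paths for a northbound approach (metres).
typical = {
    "through": [(0, 0), (0, 40)],
    "left": [(0, 0), (0, 20), (-20, 20)],
    "right": [(0, 0), (0, 20), (20, 20)],
}
label = classify_movement([(0.5, 0), (0.5, 19), (-18, 21)], typical)
```

Resampling by arc length makes paths with different numbers of points, or recorded at different speeds, directly comparable point by point.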
4. Results and Discussion
In our experiments, we used the Lankershim Boulevard data from the NGSIM dataset. We extracted vehicle trajectories from these data and fit B-spline curves to the extracted trajectories. Of the resulting data, 10% was set aside as test data (distributed uniformly over the three movement types). We then trained a neural network to predict the coefficients of the B-spline curves corresponding to the trajectories, using 10-fold cross-validation on the rest of the data. The neural network had the following input features: the x and y distances from the center of the approach through which the vehicle entered the intersection to the centers of the three road segments through which it can exit; the distance of the vehicle from the center of the approach; the vehicle’s velocity, acceleration, and heading before entering the intersection; the average velocity and average acceleration of the vehicle over the monitoring period (2 s in the final model); and the turning movements allowed for the vehicle’s lane. We then generated candidate trajectories by randomly perturbing the predicted coefficients. The IRL model was trained as follows: the B-spline smoothed trajectories of the vehicles were embedded into images containing the geometry of the intersection and the trajectories of the other vehicles present at the intersection (at test time, the trajectories predicted in the first step were used). For the reward function approximator, we used a pretrained convolutional neural network, MobileNetV2, with the final softmax layer removed. As noted in the “Methods” section, training an IRL model with the guided cost learning (GCL) algorithm involves sample generation. This was done by changing the trajectory of the ego vehicle according to the sampled actions while keeping the original trajectories of the other vehicles.
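The embedding of trajectories into images can be sketched as rasterizing polylines into separate channels. The grid size, resolution, and channel layout (ego vehicle, other vehicles, intersection geometry) below are assumptions for illustration, and the helper names are hypothetical; the encoding we used may differ. Three channels at 224 × 224 happen to match the input shape an ImageNet-pretrained MobileNetV2 expects, which is one reason a layout like this is convenient.

```python
import numpy as np


def rasterize(trajs, origin, res: float = 0.5, size=(224, 224)) -> np.ndarray:
    """Draw polylines into a binary occupancy grid.

    trajs: list of (T, 2) arrays in metres; origin: world coordinates of
    pixel (0, 0); res: metres per pixel. Each segment is densely sampled so
    that consecutive samples fall in adjacent pixels.
    """
    img = np.zeros(size, dtype=np.float32)
    for traj in trajs:
        traj = np.asarray(traj, dtype=float)
        for p, q in zip(traj[:-1], traj[1:]):
            n = max(2, int(np.ceil(np.linalg.norm(q - p) / res)) + 1)
            pts = np.linspace(p, q, n)                          # (n, 2)
            cols = np.floor((pts[:, 0] - origin[0]) / res).astype(int)
            rows = np.floor((pts[:, 1] - origin[1]) / res).astype(int)
            ok = (rows >= 0) & (rows < size[0]) & (cols >= 0) & (cols < size[1])
            img[rows[ok], cols[ok]] = 1.0
    return img


def scene_tensor(ego, others, lane_lines, origin, res=0.5, size=(224, 224)):
    """Stack ego, other-vehicle, and geometry channels into a (3, H, W) array."""
    return np.stack([rasterize([ego], origin, res, size),
                     rasterize(others, origin, res, size),
                     rasterize(lane_lines, origin, res, size)])


scene = scene_tensor([(0, 0), (10, 0)], others=[], lane_lines=[], origin=(-1.0, -1.0))
```

Keeping the ego vehicle in its own channel lets the reward network distinguish the trajectory being scored from its context.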
The trained IRL model gave us a recovered reward function that was subsequently used to score the candidate trajectories generated in the first step of the algorithm. The candidate trajectory scoring the highest was the final prediction of the model.
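The candidate generation and ranking steps can be sketched as follows. The sketch assumes a clamped cubic B-spline whose coefficients have shape (number of coefficients, 2), Gaussian perturbations with a hypothetical scale `sigma`, and a generic `reward_fn` standing in for the learned CNN reward; all names are hypothetical. Keeping the first coefficient fixed pins every candidate to the vehicle’s current position, since a clamped B-spline passes through its first control point; this is an illustrative choice rather than a detail of our method.

```python
import numpy as np
from scipy.interpolate import BSpline


def generate_candidates(coefs, sigma: float = 0.5, n: int = 50, rng=None) -> np.ndarray:
    """Randomly perturb predicted B-spline coefficients of shape (n_coef, 2)."""
    rng = np.random.default_rng() if rng is None else rng
    coefs = np.asarray(coefs, dtype=float)
    noise = rng.normal(0.0, sigma, size=(n, *coefs.shape))
    noise[:, 0] = 0.0  # keep the first control point (the start position) fixed
    return coefs[None] + noise


def select_best(knots, k, candidates, reward_fn, n_eval: int = 30):
    """Evaluate each candidate curve, score it, and return the top-scoring one."""
    u = np.linspace(0.0, 1.0, n_eval)
    trajs = [BSpline(knots, c, k)(u) for c in candidates]
    scores = np.array([reward_fn(tr) for tr in trajs])
    best = int(np.argmax(scores))
    return trajs[best], float(scores[best])


def smoothness_reward(traj) -> float:
    """Hypothetical stand-in reward: penalize sharp changes in direction."""
    return -float(np.sum(np.linalg.norm(np.diff(traj, n=2, axis=0), axis=1)))


k = 3
knots = np.concatenate([np.zeros(k + 1), [0.2, 0.4, 0.6, 0.8], np.ones(k + 1)])
predicted = np.column_stack([np.linspace(0, 30, 8), np.linspace(0, 5, 8)])  # (8, 2)
cands = generate_candidates(predicted, rng=np.random.default_rng(0))
best_traj, best_score = select_best(knots, k, cands, smoothness_reward)
```

In the actual pipeline, `reward_fn` would rasterize each candidate together with its scene context and pass the result through the learned reward network.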
The results of our experiments are summarized in Table 4. The first step of our method, without ranking by the IRL module, already outperformed the baseline model, and adding the IRL module further improved performance. Most of the works reviewed in the “Related Work” section either did not provide quantitative results or reported metrics on downstream tasks only, and none of those that reported performance on the trajectory prediction task used the same dataset as ours. However, to give a point of comparison, we have included results from two studies that reported comparable experiments.
For a qualitative assessment of the model’s performance, consider the trajectories in Figure 2, which shows the ground truth trajectory of a left turn in blue, the prediction of the first step in red, and the trajectory assigned the highest score by the IRL module in green. The trajectory selected by the IRL module is not only closer in location to the ground truth trajectory, but also more similar to it in shape and direction.
To better understand the performance of the model and the way in which the IRL module improves predictions, we consider the errors of the models broken down by movement type, i.e., whether the vehicle in question was going through the intersection, turning right, or turning left. The error values for the different movement types are reported in Table 5 and show that the effect of the IRL scoring module is more pronounced for turning movements. This can be explained by the fact that turning trajectories are more difficult to predict: the first-step predictions leave more room for improvement, so the IRL scoring module is more likely to find a better trajectory among the generated candidates and return it as the top-scoring trajectory. Figure 3 shows a boxplot of the RMSE values by movement type.
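For reference, a per-movement error breakdown of the kind reported in Table 5 can be computed as shown below. The sketch assumes that the RMSE of a trajectory is the root mean square of the Euclidean position error over its timesteps and that per-movement figures are averages of per-trajectory RMSE values. These definitions are assumptions, not necessarily those used to produce the tables. The per-trajectory lists returned alongside the means are what a boxplot such as Figure 3 would be drawn from.

```python
from collections import defaultdict

import numpy as np


def trajectory_rmse(pred, truth) -> float:
    """Root mean square of the Euclidean position error over all timesteps."""
    d = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(np.sum(d ** 2, axis=1))))


def rmse_by_movement(preds, truths, movements):
    """Group per-trajectory RMSE by movement label ('left', 'right', 'through').

    Returns (mean RMSE per movement, list of per-trajectory RMSEs per movement).
    """
    groups = defaultdict(list)
    for p, t, m in zip(preds, truths, movements):
        groups[m].append(trajectory_rmse(p, t))
    means = {m: float(np.mean(v)) for m, v in groups.items()}
    return means, dict(groups)
```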
We can also examine the error of the models as a function of the prediction horizon; these errors are reported in Table 6. We again notice that the impact of the IRL scoring module increases as the task gets more difficult: the longer the prediction horizon, the more the IRL scoring module improves predictions. This can be explained in the same way as the previous observation on through and turning movements: as trajectories become more difficult to predict, the IRL scoring module is more likely to select a considerably more accurate trajectory from the set of candidates.
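Similarly, error as a function of the prediction horizon can be computed by truncating each trajectory at the horizon before computing the RMSE. The sketch assumes that index i of each (N, T, 2) array holds the position at time (i + 1)·dt after the prediction starts; the function name, horizons, and time step are placeholders rather than the settings behind Table 6.

```python
import numpy as np


def rmse_by_horizon(preds, truths, dt: float, horizons):
    """Mean per-trajectory RMSE over the first h seconds, for each horizon h.

    preds, truths: arrays of shape (N, T, 2), where index i holds the
    position at time (i + 1) * dt after the prediction starts.
    """
    preds = np.asarray(preds, dtype=float)
    truths = np.asarray(truths, dtype=float)
    sq = np.sum((preds - truths) ** 2, axis=-1)  # squared position error, (N, T)
    out = {}
    for h in horizons:
        n = int(round(h / dt))                   # number of steps within h seconds
        if not 1 <= n <= sq.shape[1]:
            raise ValueError(f"horizon {h} s is outside the predicted range")
        out[h] = float(np.mean(np.sqrt(sq[:, :n].mean(axis=1))))
    return out
```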