Probabilistic Traffic Motion Labeling for Multi-Modal Vehicle Route Prediction

The prediction of the motion of traffic participants is a crucial aspect for the research and development of Automated Driving Systems (ADSs). Recent approaches are based on multi-modal motion prediction, which requires the assignment of a probability score to each of the multiple predicted motion hypotheses. However, there is a lack of ground truth for this probability score in the existing datasets. This implies that current Machine Learning (ML) models evaluate the multiple predictions by comparing them with the single real trajectory labeled in the dataset. In this work, a novel data-based method named Probabilistic Traffic Motion Labeling (PROMOTING) is introduced in order to (a) generate probable future routes and (b) estimate their probabilities. PROMOTING is presented with the focus on urban intersections. The generation of probable future routes is (a) based on a real traffic dataset and consists of two steps: first, a clustering of intersections with similar road topology, and second, a clustering of similar routes that are driven in each cluster from the first step. The estimation of the route probabilities is (b) based on a frequentist approach that considers how traffic participants will move in the future given their motion history. PROMOTING is evaluated with the publicly available Lyft database. The results show that PROMOTING is an appropriate approach to estimate the probabilities of the future motion of traffic participants in urban intersections. In this regard, PROMOTING can be used as a labeling approach for the generation of a labeled dataset that provides a probability score for probable future routes. Such a labeled dataset currently does not exist and would be highly valuable for ML approaches with the task of multi-modal motion prediction. The code is made open source.


Introduction
Urban mobility and transportation are cornerstones of society. Due to the high socioeconomic impact of road accidents, there is a motivation to continuously make improvements with regard to automotive safety. This motivation has derived from the development of the modern road infrastructure, which has brought major advances in terms of road safety and traffic-flow efficiency. Recent European Union (EU) road safety statistics [1] show, however, that these improvements stagnated in 2019. Specifically, they quantify a decrease in fatal accidents of 23% when compared to 2010 and of 2% when compared to 2018. For this reason, the EU has launched an ambitious initiative called "Vision Zero" [2], in which it establishes the goal of reducing fatalities caused by traffic accidents to near zero by 2050 and sets the target of halving the number of severe accidents by 2030. To this end, the EU initiative highlights the role that vehicle automation and connectivity play in increasing safety. Given that the majority of accidents (94%) are caused by human error [3], the ADSs under development are mainly focused on improving safety by assisting drivers with the early recognition and avoidance of dangerous situations, while also considering other aspects such as emissions reduction, driving efficiency, and improved passenger comfort. The deployment of automated driving functions in traffic scenarios in open environments is being carried out progressively. The Society of Automotive Engineers defines six levels of automation from levels 0 to 5 [4], where level 5 corresponds to full and unsupervised autonomy. A level 5 automated vehicle demands a very high technological complexity, and, to date, the driving functions required for this level of automation do not have the necessary robustness for deployment in traffic scenarios in open environments.
According to [5], the main aspects and systems related to ADSs can be summarized using ten categories: (1) connected systems, (2) end-to-end driving, (3) localisation, (4) perception, (5) assessment and motion prediction, (6) planning, (7) control and dynamic, (8) human machine interface, (9) dataset and software, and (10) implementation. In this work, multidisciplinary research is performed, covering mainly aspects from categories 5 and 9.
A recent line of research [6][7][8][9][10][11][12] focuses on multi-modal motion prediction. This is based on the consideration that traffic motion is multi-modal in nature, meaning that each traffic participant is not bound to follow a single trajectory in the future, but can instead choose from a wide variety of possible trajectories. In this way, not just one, but multiple probable motion hypotheses are predicted for each traffic participant, allowing researchers to capture the different options a driver may take, such as turning left, making a U-turn or continuing straight ahead, among others. In the following, the term mode refers to a specific estimation of future motion within a finite set of possibilities, and the likelihood that a given mode will be selected is denoted as mode score or mode probability. One prominent approach to address multi-modal motion predictions makes use of ML methods based on the supervised learning paradigm. For this, a labeled dataset is necessary, i.e., the label associated with each sample is known. If the dataset is generated from real traffic data, only a single real trajectory per traffic participant can be labeled, namely the one that has been driven. This shows the challenge of (a) predicting multiple motion hypotheses for each traffic participant out of a single labeled one. In addition, the prediction of multiple motion hypotheses implies the assignment of a probability score to each one with respect to the total number of hypotheses. However, labeled datasets with probabilities for routes are not available, (b) resulting in a lack of ground truth for these probability scores.
These aspects ((a) and (b)) motivate the investigation of a method that addresses the following research questions: (1) how to extract the route (certain sections of the road) that represents each possible mode from real traffic datasets, (2) how to estimate the probability that a vehicle will drive a certain mode, and (3) how to generate an adequate multi-modal labeled dataset so that an ML model can learn from it the intrinsic multi-modal motion of traffic scenarios.
In this regard, this work introduces a novel data-based method named PROMOTING that allows the estimation of multiple routes for each traffic participant and provides a probability score for each of the possible future routes. In this way, PROMOTING can be used as a labeling approach for the generation of a labeled dataset that contains not only single trajectories as its ground truth, but also the multiple estimated routes. Given that the early introduction of smart intersections will involve mixed traffic, i.e., automated and non-automated driving together, the modeling of the traffic flow in such scenarios will be significant. The smart intersection is a concept aimed at improving the safety and traffic flow of intersections. It is based on the use of sensors and communication systems that allow researchers to capture and analyze traffic to support ADS functions. Thus, PROMOTING focuses on urban traffic scenarios, paying special attention to urban intersections. Therefore, this work makes a contribution to the improvement of multi-modal motion predictions by introducing the PROMOTING method, highlighting the following. First, the method is able to extract multiple motion hypotheses for each traffic participant. Second, the method is able to estimate the probability that a vehicle will drive following a specific motion hypothesis. Third, the method may be used for the generation of a labeled dataset that provides extra information that is useful for a multi-modal prediction task. Fourth, the method is evaluated using real-world traffic scenarios from a database, which allows us to obtain a realistic representation of the traffic's behavior in urban traffic scenarios.
Sensors 2022, 22, 4498
The rest of the paper is structured as follows: in Section 2, related works are presented. In Section 3, the methodology of PROMOTING is detailed. In Section 4, the evaluation of PROMOTING is presented, and the associated results are shown and described. In Section 5, the main findings of the work are discussed. The paper is summarized in Section 6.

Related Works
According to [6], the motion prediction of traffic participants can be grouped into the following categories: (1) an engineering approach or physics-based methods, (2) planning-based methods, and (3) pattern-based methods.
Over the last few years, the research into motion prediction has shifted its focus from the physics-based generation of trajectories to the use of ML methods for the same purpose. The authors of [13] proposed the Attention mechanism that marked a shift in the way typical natural language processing, time-series forecasting, and sequence-to-sequence problems are approached. Along with this Attention mechanism, Transformer Networks are also finding their way into motion prediction tasks. In [6], Multiple Attention Heads (MAH) are implemented together with a Long Short-Term Memory Encoder-Decoder architecture to predict multiple trajectories, thus addressing the multi-modality of the motion of traffic participants and considering cross-agent interaction modeling. A similar approach is taken in [14]. The difference between [6] and [14] is that the latter adds map-related information that is learned by the Attention mechanism, which assists in modeling the agent-map interaction and improves the system performance. In [15], an architecture based on an Encoder-Decoder structure is proposed, where both are based exclusively on MAH. This model achieves a better performance than the one proposed in [14]. Other recent approaches [9,16,17] build on the work of [13] and use Transformer Networks based on MAH. In [16], pedestrian trajectory prediction is investigated, where the behavior of the pedestrians is modeled without taking into account any kind of interaction with either traffic participants or the map information. This approach is able to closely predict the motion of pedestrians, highlighting the suitability of using Transformer Networks for motion planning tasks. A similar method is presented in [17], where the orientation of the traffic participants is considered to be an additional feature to the input vector when compared to [16].
Furthermore, whereas in [16] only pedestrians are considered, in [17] the performance of the ML model is evaluated for different types of traffic scenarios and different types of road users. A more complex ML architecture than in [16,17] is used in [9], consisting of three stacked Transformer Networks: vehicle motion, vehicle-map interaction, and vehicle-vehicle interaction. The networks are trained sequentially for each epoch, where the vehicle-vehicle interaction network receives the output of the vehicle-map interaction network, and the vehicle-map interaction network receives the output of the vehicle motion network. In addition to receiving the output of the previous one, each network receives additional inputs, which allows each network to specialize in a particular task.
In order for an ML model to learn something as complex as urban traffic, a large amount of data captured from real-world driving scenes is necessary. To prevent over-fitting, the data should have a large variability; in this way, the ML model is able to capture as many variations of the relevant features as possible.
In the case of urban intersections, for example, the behavior of the traffic participants varies depending on the time of day, working/non-working days, and construction sites, among others. All these situations influence the behavior of the traffic participants, and their consideration provides extra knowledge that must be taken into account by ADSs. On the other hand, capturing real traffic data with these characteristics is a major challenge because of the required financial, computational, and time resources. One strategy to overcome this is to constrain the research and development of ADSs to bounded driving environments, such as smart urban corridors [18]. To this end, it is relevant to use appropriate databases for the training of the ML models.
Current research works [6][7][8][9][10][11][12] that focus on multi-modal motion prediction evaluate their performance in terms of the Average Displacement Error (ADE), the Final Displacement Error (FDE), or the Root Mean Square Error (RMSE). That is, they consider a single labeled real trajectory and measure the Euclidean distances between the reference trajectory and each of the predicted ones. The best trajectory is then chosen based on one of the minimum ADE, the minimum FDE, or the minimum RMSE. The main problem with using these metrics both to reduce training losses and to evaluate the model during the inference phase is that it forces ML models to generate trajectories close to the reference trajectory. This may result in a subset of the predicted trajectories not being drivable, not following the road infrastructure, or colliding with other traffic participants. Furthermore, the prediction of multiple motion scenarios for each traffic participant entails assigning a probability score that indicates the likelihood of selecting a hypothesis within the set of multiple hypotheses; however, the existing datasets containing real traffic data, such as [19][20][21][22][23][24][25], do not provide this score, as there is only a single real trajectory labeled for each traffic participant.
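To make these metric definitions concrete, the following minimal Python sketch computes the ADE and FDE of each predicted mode against the single labeled trajectory and selects the mode with the minimum ADE. The function names and the toy trajectories are illustrative assumptions and are not taken from any of the cited works.

```python
import math


def displacement_errors(pred, ref):
    """Return (ADE, FDE) between one predicted and one reference trajectory.

    Both trajectories are equal-length lists of (x, y) points.
    """
    dists = [math.dist(p, r) for p, r in zip(pred, ref)]
    return sum(dists) / len(dists), dists[-1]


def best_mode(predictions, ref):
    """Pick the predicted mode with the smallest ADE (minimum-ADE criterion)."""
    errors = [displacement_errors(p, ref) for p in predictions]
    best = min(range(len(errors)), key=lambda m: errors[m][0])
    return best, errors[best]


# Toy data (assumed): the labeled vehicle drove straight ahead.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
modes = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],  # near-straight hypothesis
    [(0.0, 0.0), (1.0, 1.0), (1.0, 2.0)],  # left-turn hypothesis
]
idx, (ade, fde) = best_mode(modes, reference)
```

In this toy case the left-turn hypothesis receives a much larger error only because the labeled vehicle happened to drive straight, which illustrates why a single reference trajectory gives no information about how probable the other modes are.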
In [26], the graphs of road topologies are used to identify similar examples through their isomorphism. This is required to shape the latent space for proper novelty detection. Moreover, in [27], the isomorphisms are used to identify similar traffic scenarios, also including the trajectories as paths inside the graphs. As before, this is used for shaping a latent space. However, in the present work, isomorphisms are used to identify similar intersections and routes in the intersections in order to identify similar modes.
Relevant work on the representation of motion hypotheses in traffic scenarios is presented in [28], with the introduction of the Predicted-Occupancy Grids (POGs). These represent the future traffic scenarios in the form of grid cells, where the confidence about the motion of dynamic agents is represented. This approach considers a spectrum of expected occupancy values beyond the simplistic binary approach, i.e., occupied or not occupied. This type of representation is used for the prediction of complex traffic scenarios in [29,30], where different types of machine learning based architectures for POG estimation are presented. However, there are three notable differences between the work of [28] and the present work. In [28], the approach is based on expert knowledge (it assumes physical models of vehicles and motion hypotheses), makes use of simulation data, and outputs POGs. In contrast, in the present work, a methodology based on a frequentist approach is proposed (recorded traffic data is analyzed without making a motion hypothesis), real-world traffic data is used, and the presented method (PROMOTING) outputs the modes, in the form of routes, and the mode probabilities.
With regard to all the above, the present research work addresses the shortcomings of multi-modal motion prediction research by proposing the novel PROMOTING method. This serves as the methodology for the generation of a labeled dataset that extracts information about the modes of traffic participants based on conditional prior information. The method is able to extract the number and route of the modes, as well as to estimate the probability that a traffic participant will drive a specific mode. To the best of the authors' knowledge, this is the first work seeking to estimate the modes with their probabilities in a probabilistic way from real-world data for the purpose of labeling multi-modal motion hypotheses.

Materials and Methods
In order to estimate the modes and the probabilities of each mode, PROMOTING requires (1) historical traffic data and (2) topological information of the road map. To cover these requirements, the publicly available Lyft database [25] is selected, so PROMOTING is evaluated in this work by making use of this database. This database contains traffic motion information that is captured by a vehicle equipped with exteroceptive sensors. It contains a large amount of real-world trajectory data of dynamic participants, including urban intersections, and detailed map information covering the urban area where the traffic scenes were recorded. The methodology of PROMOTING is composed of five steps (see Figure 1), and each step is explained in a subsection of this section.
Version June 11, 2022 submitted to Sensors

Road Infrastructure Description
The first step of the PROMOTING method, see Figure 2, aims to describe static traffic information: the road infrastructure. This is described by the road map information contained in the Lyft database on the basis of the map description (see Section 3.1.1) and the intersection description (see Section 3.1.2). A visual representation of the road infrastructure generated from information from the Lyft database is shown in Figure 3.

Map Description
The road map information contained in the Lyft database divides the road space into so-called ways, which are road sections of finite length representing an individual lane in a given direction. In this work, each way is referred to as a vertex. Thus, the set of vertices ν i of the map V is defined as

$$V = \{\nu_1, \nu_2, \dots, \nu_{n_\nu}\},$$

where n ν indicates the order of G, i.e., the number of vertices contained in the map. Each vertex ν i ∈ V is characterized by a number of features that allow its geometric and connectivity definition, for example:
• centreLine: (x, y) coordinates in global coordinate frame of each vertex.
• turnDirection: indicates the type of change of direction of the vertex: "1" for straight, "2" for left turns, and "3" for right turns.
Thus, the set A i ⊆ V that contains the adjacent vertices of the ith vertex is defined as a union of sets.

The connection between the different vertices ν i ∈ V provides valuable information for the vehicle motion prediction. In this paper, the connectivity information of the vertices is used to derive a graph-based model that represents the topology of the urban road network. The map topology G is then defined as a directed graph, so that

$$G = (V, E),$$

where E denotes the set of edges ε k of the map, with

$$E = \{\varepsilon_1, \varepsilon_2, \dots, \varepsilon_{n_\varepsilon}\},$$

where n ε indicates the size of G, i.e., the number of edges contained in the graph. Each edge ε k represents the connection between two adjacent vertices, so that

$$\varepsilon_k = (\nu_i, \nu_j), \quad \nu_i, \nu_j \in V.$$

The order of the vertex pair indicates the driving direction on the edge, where the first element is the "source vertex", and the second one is the "target vertex". For example, ε k = (ν i , ν j ) ∈ E indicates that the driving direction on the kth edge is from the ith vertex to the jth vertex.
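As a rough illustration of the directed map graph described above, the following Python sketch stores the edges as (source, target) pairs and derives the vertex set, the order, and the size of the graph. The helper name `build_map_graph` and the toy way identifiers are assumptions made for this example; they do not come from the Lyft database or from the PROMOTING code.

```python
from collections import defaultdict


def build_map_graph(edges):
    """Build the vertex set and a successor map from directed edges.

    Each edge is a (source, target) pair, so the pair order encodes
    the driving direction.
    """
    successors = defaultdict(set)
    vertices = set()
    for source, target in edges:
        successors[source].add(target)
        vertices.update((source, target))
    return vertices, successors


# Hypothetical toy map: way 2 leads into ways 8 and 9.
E = [(2, 8), (2, 9), (9, 15), (8, 14)]
V, succ = build_map_graph(E)
order, size = len(V), len(E)  # number of vertices, number of edges
```

A successor map is only one possible representation; it is chosen here because following the driving direction from a vertex is then a single dictionary lookup.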

Intersection Description
Similarly, the road topology of an intersection contained in the map, denoted as the ιth intersection, is modeled as the graph

$$G_\iota = (V_\iota, E_\iota).$$

To model the graph of each intersection, it is necessary to identify which vertices belong to the same intersection and how they are connected to each other. In this sense, three types of vertex are differentiated for each intersection:
1. Incoming vertex: The vertex at the entrance of an intersection. These vertices are grouped in sets with the sub-index "in".
2. Crossing vertex: The vertex on an intersection. These vertices are grouped in sets with the sub-index "x".
3. Outgoing vertex: The vertex at the exit of an intersection. These vertices are grouped in sets with the sub-index "out".
Thus, incoming vertices precede crossing vertices, and crossing vertices precede outgoing vertices. With this, the graph G ι of the ιth intersection is generated as described in Algorithm 1, and a graphic depiction is shown in Figure 4.

Algorithm 1: Intersection graph generation
Input: directed graph G of the map and the unique intersection identifier ι.
Output: directed graph G ι of the ιth intersection formed by the edge set E ι and the vertex set V ι with the incoming, crossing, and outgoing vertices of the intersection.
Algorithm 1 can be used for as many intersections as required to generate the set of intersection graphs S int , so that

$$S_{\mathrm{int}} = \{G_1, G_2, \dots, G_{n_{\mathrm{int}}}\},$$

where n int indicates the number of intersection graphs generated from the map.
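Since only the input and output of Algorithm 1 are stated here, the following Python sketch shows one plausible way to obtain an intersection graph from the map graph. It is an assumption-laden illustration and not the authors' Algorithm 1: it assumes that every crossing vertex carries the identifier of its intersection (the hypothetical mapping `crossing_of`), and it takes the direct predecessors and successors of the crossing vertices as incoming and outgoing vertices.

```python
def intersection_graph(edges, crossing_of, iota):
    """Sketch: extract the typed vertices and edges of intersection `iota`.

    edges       -- directed map edges as (source, target) pairs
    crossing_of -- assumed mapping {crossing vertex: intersection id}
    """
    crossing = {v for v, k in crossing_of.items() if k == iota}
    # Incoming vertices precede crossing vertices.
    incoming = {s for s, t in edges if t in crossing and s not in crossing}
    # Outgoing vertices follow crossing vertices.
    outgoing = {t for s, t in edges if s in crossing and t not in crossing}
    # Keep every edge that touches a crossing vertex of this intersection.
    sub_edges = [(s, t) for s, t in edges if s in crossing or t in crossing]
    return {"in": incoming, "x": crossing, "out": outgoing}, sub_edges


# Hypothetical toy map with one intersection (id 0).
edges = [(1, 2), (2, 8), (2, 9), (9, 15), (8, 14), (15, 20)]
types, sub_edges = intersection_graph(edges, {8: 0, 9: 0}, iota=0)
```

Calling the function once per intersection identifier would yield the set of intersection graphs S int under the same assumptions.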

Vehicle Intersection Data Extraction
Once the intersection graphs S int and the map vertex set V are generated, the next step is the extraction of the list of the Vehicle Intersection Data (VID) X VID . The VID comprises the route information (sequence of vertices) of each vehicle that crosses an intersection and the graph of the crossed intersection. To accomplish this, the motion history of the vehicle is required, in addition to the intersection graphs and the vertex set obtained in the previous step, see Figure 5. A detailed description of the VID extraction process is depicted in Figure 6.

Figure 6. Shown is the process to extract the VID.
As depicted in Figure 6, the VID extraction starts by iterating over all n scenes traffic scenes contained in the Lyft database. Each ith scene contains a record of the motion of all n obj registered objects. For each ith scene, the motion information of each jth object with the "car" label is extracted. Next, for each jth vehicle, its (x,y) coordinates are read, and, together with V, the coordinates are associated with vertices so as to generate the vertex sequence Q i,j , as detailed in Section 3.2.1. Later, Q i,j is used to extract n routes routes that cross intersections, as detailed in Section 3.2.2. Then, for each route R k that crosses an intersection with graph G ι , a new VID, denoted by X i,j,k , is generated. Hence, the VID, X i,j,k , for the kth route of the jth vehicle in the ith scene is determined as

$$X_{i,j,k} = (R_k, G_\iota),$$

where the route R k is represented by a sequence of vertices, and the intersection graph G ι is generated as indicated by Algorithm 1. Thus, the VID list X VID is defined as the list whose elements are the extracted X i,j,k .

Coordinate-Vertex Association
The first step to extract the R k route is to obtain the vertex sequence Q i,j . For this, the (x,y) coordinates of the jth vehicle in the ith scene at each time instance are associated with vertices contained in V. This results in the vertex sequence Q i,j , which represents the vertices that the vehicle has driven on. One should note that the association is not unique, meaning that a set of (x,y) coordinates may be associated with multiple vertices, and a vertex may be associated with multiple sets of (x,y) coordinates, which results in multiple vertex associations. This occurs frequently when the (x,y) coordinates are located at intersections where different crossing vertices overlap. For this reason, Q i,j must be processed.
First, the invalid (empty) vertex associations are removed from the sequence. An invalid association can happen, for example, when the vehicle moves on "non-drivable" sections of the map. Second, duplicated vertex associations are unified. A duplicated association occurs when a vertex appears in Q i,j in two or more consecutive time instances. By unifying the duplicated vertex associations, only unique ones remain. Finally, Q i,j is filtered according to the intersection topology. This handles multiple vertex associations that can occur when various vertices overlap, see vertices 7, 8, and 9 in Figure 7. Filtering according to the intersection means that only the vertices included and connected in the intersection are kept.
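The three processing steps above can be sketched in Python as follows. The representation of the raw sequence (one list of candidate vertices per time instance) and the connectivity test used for the topology filter are simplifying assumptions made for this illustration; the actual filter in PROMOTING relies on the intersection graph.

```python
from itertools import groupby


def clean_vertex_sequence(raw, edges):
    """Sketch of the vertex-sequence processing.

    raw   -- list of candidate-vertex lists, one per time instance
    edges -- directed edges (source, target) of the intersection
    """
    # 1) Remove invalid (empty) associations.
    steps = [tuple(candidates) for candidates in raw if candidates]
    # 2) Unify associations repeated over consecutive time instances.
    steps = [key for key, _ in groupby(steps)]
    # 3) Keep only vertices that are connected to another vertex of the
    #    sequence (assumed stand-in for the intersection-topology filter).
    present = {v for step in steps for v in step}
    connected = {
        v
        for s, t in edges
        if s in present and t in present
        for v in (s, t)
    }
    kept = [v for step in steps for v in step if v in connected]
    # Collapse repeats that the filter may leave behind.
    return [key for key, _ in groupby(kept)]


# Toy data (assumed): vertices 7, 8 and 9 overlap inside the intersection.
raw = [[2], [2], [], [7, 8, 9], [7, 8, 9], [15]]
edges = [(2, 9), (9, 15), (1, 7), (7, 14), (3, 8), (8, 16)]
sequence = clean_vertex_sequence(raw, edges)
```

In the toy data, vertices 7 and 8 are discarded because their neighbors never appear in the sequence, which leaves the route through vertex 9.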

Extraction of Intersection Routes
Once the vertex sequence Q i,j of the jth vehicle in the ith scene is extracted and processed, the next step is to extract the routes that cross intersections. It is possible for a single vehicle to have more than one intersection route. An intersection route should fulfill the following two characteristics:
1. The route must contain at least one crossing vertex.
2. The route must contain either at least one incoming vertex or at least one outgoing vertex.
This approach allows us to differentiate four categories of intersection routes:
1. Complete: The route contains a full description of how the vehicle approaches, crosses, and leaves the intersection. The route starts with incoming vertices, follows crossing vertices, and ends with outgoing vertices. An example of a complete route is the vertex sequence [2,9,15], see Figure 4.
2. Entering: The route contains a description of how the vehicle approaches and crosses the intersection. The route starts with incoming vertices and ends with crossing vertices. An example of an entering route is the vertex sequence [2,9], see Figure 4.
3. Leaving: The route contains a description of how the vehicle crosses and leaves the intersection. The route starts with crossing vertices and ends with outgoing vertices. An example of a leaving route is the vertex sequence [9,15], see Figure 4.
4. Other: Routes that do not belong to any of these three categories. One such route would be that of a vehicle that is standing still during the complete scene, thus remaining at a single vertex. These are omitted, as they do not provide information on how the vehicle approaches or leaves the intersection.
An example of this process is shown in Figure 7. There, the vertex sequence Q i,j of the jth vehicle in the ith scene is given by 11,17,22,25,30].
In Q i,j , vertices 11 and 22 are crossing vertices. The intersection IDs κ are taken from these vertices. So, let κ 11 = 8 indicate that vertex 11 belongs to the 8th intersection and κ 22 = 9 indicate that vertex 22 belongs to the 9th intersection. Then, the rest of the vertices of the Q i,j vertex sequence that belong to these given intersections are extracted. In this example, the 30th vertex is neglected, as it does not belong to any intersection of this vertex sequence. The remaining vertices are split into as many routes as there are unique intersection IDs. In this example, two routes are created: one for the 8th intersection and one for the 9th intersection. The elements of the vertex sequence are assigned to a route according to the intersection they belong to. In this example, as the 17th vertex belongs to both the 8th and the 9th intersection, it is assigned to both routes. Next, the type of each vertex of each route is assigned according to the intersection topology. This is the reason why the 17th vertex is assigned as being "outgoing" for the route that corresponds to the 8th intersection and "incoming" for the route that corresponds to the 9th intersection.
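The splitting and categorization just described can be sketched in Python as follows. The membership table, the helper names `split_routes` and `categorize`, and the "outgoing" type assumed for vertex 25 are hypothetical choices made for this illustration; only the visible part of the example sequence is used.

```python
def split_routes(sequence, membership):
    """Split a vertex sequence into one route per crossed intersection.

    membership -- assumed table {intersection id: {vertex: "in" | "x" | "out"}}
    """
    # Intersections are identified through the crossing vertices in the sequence.
    ids = [
        k for k, types in membership.items()
        if any(types.get(v) == "x" for v in sequence)
    ]
    return {k: [v for v in sequence if v in membership[k]] for k in ids}


def categorize(route, types):
    """Return "complete", "entering", "leaving" or "other" for a route."""
    kinds = [types[v] for v in route]
    if "x" not in kinds or not ({"in", "out"} & set(kinds)):
        return "other"
    if kinds[0] == "in" and kinds[-1] == "out":
        return "complete"
    if kinds[0] == "in" and kinds[-1] == "x":
        return "entering"
    if kinds[0] == "x" and kinds[-1] == "out":
        return "leaving"
    return "other"


# Hypothetical vertex types; vertex 17 belongs to both intersections.
membership = {
    8: {11: "x", 17: "out"},
    9: {17: "in", 22: "x", 25: "out"},  # type of vertex 25 is assumed
}
Q = [11, 17, 22, 25, 30]  # visible part of the example sequence
routes = split_routes(Q, membership)
categories = {k: categorize(r, membership[k]) for k, r in routes.items()}
```

With these assumed types, vertex 30 is dropped, vertex 17 appears in both routes, and the two routes fall into different categories.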


Vehicle Intersection Data Clustering
Once the set of intersection graphs $S_{int}$ and the map vertex set $V$ are generated (as per Section 3.1) and the VID list $X_{VID}$ is obtained (as per Section 3.2), the next step is the VID clustering. This step aims to cluster the elements of the list of VIDs $X_{VID}$ with respect to their graphs, with graph isomorphism as the similarity criterion. The output of this step is a set of $n_{clusters}$ clusters, where each cluster $c$ is denoted by $X_{c,VID}$. A graphical depiction of this process is shown in Figure 8. The process consists of two steps: a pre-clustering of graphs (see Section 3.3.1) and an isomorphic clustering (see Section 3.3.2).
The list $X_{VID}$ contains $n_{VID}$ routes of vehicles crossing the intersections, which are classified as defined in Section 3.2.2. It should be noted that only complete routes have been selected, because they are the only type of route that contains a full description of the intersection crossing from the entrance to the exit.

Pre-Clustering
The process of clustering based on isomorphism is computationally expensive. This is especially relevant for large databases, where graph-wise and vertex-wise associations have to be verified. A brute-force search over the $n_\nu!$ possible bijective functions that satisfy the definition of isomorphism between all extracted graphs is not practical.
For this reason, pre-clustering the graphs prior to the isomorphic clustering (Section 3.3.2) is proposed. This is performed by examining a series of preconditions that two graphs must possess in order to be isomorphic. The preconditions are evaluated in a hierarchical manner, allowing us to structure the database in the form of a tree. This database tree allows further analysis of the distribution of the data in terms of graph properties. Then, the first four hierarchical levels of the database tree are detailed in what follows. Alongside this, an example slice of such a database tree is shown in Figure 9.
• Level 0: The root node of the database tree is located at this level and is the highest hierarchical level, from which all branches emerge. All VIDs are inside the root node.
• Level 1: The graphs are grouped by their order, i.e., the number of vertices contained in the graph. Hence, only VIDs with the same graph order are part of the same node. In Figure 9, A and B are two example nodes at this level, with graph orders 20 and 21, respectively.
• Level 2: The graphs are grouped by their size, i.e., the number of edges contained in the graph. VIDs with the same graph order and size are part of the same node. In Figure 9, node C groups VIDs with graph order equal to 20 and graph size equal to 24.
• Level 3: The graphs are grouped by their matrix degree:
$$\Theta_{seq} = \begin{bmatrix} n_{0,in} & n_{1,in} & n_{2,in} & \dots & n_{n_\varepsilon,in} \\ n_{0,out} & n_{1,out} & n_{2,out} & \dots & n_{n_\varepsilon,out} \end{bmatrix},$$
where the first row refers to the in-degree of the graph and the second row refers to the out-degree of the graph, i.e., the number of edges entering and leaving the vertices, respectively. With this, $n_{2,in}$ indicates the number of vertices in the graph whose in-degree is equal to 2, and $n_{2,out}$ indicates the number of vertices in the graph whose out-degree is equal to 2. Therefore, at this level, only VIDs with the same graph order, the same graph size, and the same matrix degree are grouped. In Figure 9, node E groups VIDs with graph order equal to 20, graph size equal to 24, and matrix degree $\Theta_1$.
Levels 0-3 describe the pre-clustering, which creates smaller groups according to graph properties, such that the computationally expensive isomorphism check only needs to be performed within the nodes of level 3.
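The pre-clustering levels can be sketched in Python as the computation of one key per graph, followed by grouping on that key. This is only an illustrative sketch: the names `graph_key` and `pre_cluster` are hypothetical, graphs are assumed to be given as a vertex list plus a list of directed edges, and the degree rows are trimmed at the largest degree that occurs rather than at $n_\varepsilon$, which does not change which graphs share a key.

```python
from collections import Counter, defaultdict


def graph_key(vertices, edges):
    """Return (order, size, degree matrix) of a directed graph."""
    in_deg = Counter(v for _, v in edges)
    out_deg = Counter(u for u, _ in edges)
    # Histogram: how many vertices have in-/out-degree d.
    in_hist = Counter(in_deg[v] for v in vertices)
    out_hist = Counter(out_deg[v] for v in vertices)
    max_deg = max(list(in_hist) + list(out_hist), default=0)
    theta = (
        tuple(in_hist[d] for d in range(max_deg + 1)),
        tuple(out_hist[d] for d in range(max_deg + 1)),
    )
    return len(vertices), len(edges), theta


def pre_cluster(graphs):
    """Group graph IDs by their level-3 key (order, size, degree matrix)."""
    nodes = defaultdict(list)
    for graph_id, (vertices, edges) in graphs.items():
        nodes[graph_key(vertices, edges)].append(graph_id)
    return dict(nodes)


graphs = {
    "g1": (["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]),
    "g2": (["x", "y", "z"], [("x", "y"), ("y", "z"), ("z", "x")]),
    "g3": (["p", "q", "r"], [("p", "q"), ("q", "r"), ("p", "r")]),
}
level3 = pre_cluster(graphs)
```

Here `g1` and `g2` share a level-3 node, whereas `g3` has the same order and size but a different degree matrix and is therefore never compared with them for isomorphism.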

Figure 9. Slice of a database tree for the clustering of the VIDs based on isomorphic graphs. Each leaf node at level 4 groups the routes of vehicles crossing intersections whose graphs are isomorphic.

Isomorphic Clustering
Given the database tree from the pre-clustering, the aim is to identify VIDs with similar graphs. Only level 3 needs to be taken into consideration, since isomorphism between graphs is only possible within nodes of level 3.
Two graphs $G_1$ and $G_2$ are said to be isomorphic if
$$G_1 \simeq G_2, \tag{12}$$
where Equation (12) holds true if a bijective function $f: V_{G_1} \to V_{G_2}$ exists, such that
$$(u, v) \in E_{G_1} \iff (f(u), f(v)) \in E_{G_2} \quad \forall\, u, v \in V_{G_1}, \tag{13}$$
where $V_G$ and $E_G$ denote the vertex set and the edge set of a graph $G$. This means that every vertex and edge of $G_1$ has a unique mapping to a vertex and edge of $G_2$. All isomorphic graphs are then clustered in level 4 nodes. Nodes H, I, J, and K of Figure 9 are level 4 nodes.
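To make the definition concrete, the following Python sketch searches for a bijection $f$ by brute force over all vertex permutations. It is intended only for very small graphs and is exactly the kind of factorial search that the pre-clustering is meant to limit; it is not the isomorphism test used in PROMOTING. The function name `find_isomorphism` and the edge-list representation are assumptions made for illustration.

```python
from itertools import permutations


def find_isomorphism(v1, e1, v2, e2):
    """Return a dict f: V_G1 -> V_G2 that preserves edges, or None."""
    e1, e2 = set(e1), set(e2)
    if len(v1) != len(v2) or len(e1) != len(e2):
        return None  # different order or size: cannot be isomorphic
    v1 = list(v1)
    for perm in permutations(v2):
        f = dict(zip(v1, perm))
        # With equal edge counts and a bijective f, mapping every edge of
        # G1 onto an edge of G2 implies the converse direction as well.
        if all((f[u], f[v]) in e2 for u, v in e1):
            return f
    return None


cycle_1 = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
cycle_2 = (["x", "y", "z"], [("y", "z"), ("z", "x"), ("x", "y")])
chain = (["p", "q", "r"], [("p", "q"), ("q", "r"), ("p", "r")])

f = find_isomorphism(*cycle_1, *cycle_2)
no_match = find_isomorphism(*cycle_1, *chain)
```

The two directed cycles are isomorphic, so a mapping is returned; the third graph has the same order and size but no edge-preserving bijection exists, so `None` is returned.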

Route-Type Counting
Once the set of intersection graphs $S_{int}$ and the map vertex set $V$ are generated (as per Section 3.1), the VID list $X_{VID}$ is obtained (see Section 3.2), and the VIDs are clustered (see Section 3.3), the next step is the counting of route types. For this, each $c$th cluster of $X_{VID}$ is analyzed in order to extract (1) the set of route types $\hat{R}_c$ and (2) the counting list $\rho_{\hat{R}_c}$, whose elements indicate how often each route type appears in the cluster. A graphical depiction of this process is shown in Figure 10.
Since the names of the vertices are unique, two intersections cannot be compared by vertex name alone. Therefore, a common vertex representation per cluster is needed. This common representation is achieved in the form of a template graph that is created for each cluster. The graph of the first VID of each cluster is taken as the template of that node. Then, the bijective function (Equations (12) and (13)) is used to map the vertices of the remaining routes within the cluster. A graphical depiction of this process is shown in Figure 11. There, the graph $G^*$ is the template graph. The route $R_{G_1}$ is mapped to $R_{G^*}$ using the bijective function $f: V_{G_1} \to V_{G^*}$.
Once the vertices of the routes within the cluster are mapped to those of the template graph, the route types are extracted. Each route type is a specific vertex sequence in the cluster. Then, the set of route types $\hat{R}_c$ is generated for each $c$th cluster as follows:
$$\hat{R}_c = \{\hat{R}_{c,1}, \hat{R}_{c,2}, \dots, \hat{R}_{c,n}, \dots\}, \tag{14}$$
where the first sub-index of the elements of $\hat{R}_c$ indicates the cluster to which the route type belongs, and the second sub-index is an identifier for the type of route within the cluster. For each route type $\hat{R}_{c,n}$ identified, the frequency $\rho_{\hat{R}_{c,n}}$ is computed. This frequency represents how often the route type $\hat{R}_{c,n}$ appears in the cluster based on the dataset. This information is relevant for the estimation of the probability that a traffic participant will drive a given route. Then, the counting list of the route types $\rho_{\hat{R}_c}$ is generated for each $c$th cluster as follows:
$$\rho_{\hat{R}_c} = (\rho_{\hat{R}_{c,1}}, \rho_{\hat{R}_{c,2}}, \dots, \rho_{\hat{R}_{c,n}}, \dots).$$
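A minimal Python sketch of the route-type counting follows. It assumes that the bijections onto the template graph are already available as dictionaries; the names `count_route_types`, `routes`, and `mappings`, as well as the toy vertex names, are hypothetical.

```python
from collections import Counter


def count_route_types(routes, mappings):
    """Map each route onto the template graph and count the route types.

    routes:   list of (graph_id, vertex sequence) pairs of one cluster
    mappings: dict graph_id -> dict f mapping graph vertices to template vertices
    """
    counts = Counter()
    for graph_id, route in routes:
        f = mappings[graph_id]
        route_type = tuple(f[v] for v in route)  # route in template vertices
        counts[route_type] += 1
    return counts


# Toy cluster: the template graph uses vertices a-d, graph "G1" uses 1-4.
mappings = {
    "G*": {"a": "a", "b": "b", "c": "c", "d": "d"},
    "G1": {1: "a", 2: "b", 3: "c", 4: "d"},
}
routes = [("G*", ["a", "b", "c"]), ("G1", [1, 2, 3]), ("G1", [1, 2, 4])]
rho = count_route_types(routes, mappings)
```

The keys of `rho` play the role of the route types of the cluster, and the values play the role of the counting list.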

Mode Estimation
Once the set of intersection graphs $S_{int}$ and the map vertex set $V$ are generated (as per Section 3.1), the VID list $X_{VID}$ is obtained (see Section 3.2), the VIDs are clustered (see Section 3.3), and the set of route types $\hat{R}_c$ and the counting list $\rho_{\hat{R}_c}$ are extracted (as per Section 3.4), the next step is to generate the modes and to estimate the mode probabilities. That is, a set of routes that a traffic participant can drive for a given intersection type (cluster) and motion history is created, and the probability that a given mode will be driven is estimated. Thus, for each $c$th cluster, this process extracts the mode data $M_c$. A graphical depiction of this process is shown in Figure 12.
First, a set of sub-routes is generated for each route type in $\hat{R}_c$ of the $c$th cluster. For example, for the first route type $\hat{R}_{c,1}$ of the $c$th cluster, the set of sub-routes $S_{c,1}$ is generated, such that $S_{c,1} \subseteq \hat{R}_{c,1}$.
Since a route is a vertex sequence, each sub-route $s \in S_{c,1}$ is defined as a coherent (contiguous) sub-sequence of vertices of the corresponding route.
Second, the set $S_c^*$ that contains all unique sub-routes of the $c$th cluster is defined as
$$S_c^* = \bigcup_{n} S_{c,n}.$$
The mode data $M_c$ of the $c$th cluster has as many elements as there are sub-routes $s$ driven in the cluster. This means that, for each sub-route $s \in S_c^*$, an element of $M_c$ is computed. Each element of $M_c$ contains (1) the set of modes $\mu_{c,s}$ and (2) the estimated probability $P(\mu_{m|c,s})$ of each mode $\mu_{m|c,s}$, and is computed as follows:

1. The set of modes $\mu_{c,s}$ is used to forecast the possible modes that a vehicle can drive (1) given the observation of the sub-route $s$, (2) where each mode ends with an outgoing vertex, and (3) where each mode is part of $S_c^*$. For this, a set $\hat{S}_{c,s}$ is created, so that
$$\hat{S}_{c,s} = \{\hat{s}_{1|c,s}, \hat{s}_{2|c,s}, \dots, \hat{s}_{m|c,s}, \dots\}, \quad \text{with } \hat{S}_{c,s} \subseteq S_c^*: \; s \subseteq \hat{s}_{m|c,s}, \; \hat{s}_{m|c,s} \cap V_{c,out} \neq \emptyset, \; \forall m,$$
where the set $V_{c,out}$ contains the outgoing vertices of the template graph of the $c$th cluster. Since the observed sub-route is not part of the modes, i.e., of the future motion, the observed sub-route $s$ is removed from each $m$th sub-sequence $\hat{s}_{m|c,s}$, generating the corresponding $m$th mode $\mu_{m|c,s}$. This allows the definition of the set of modes $\mu_{c,s}$ as follows:
$$\mu_{c,s} = \{\mu_{1|c,s}, \mu_{2|c,s}, \dots, \mu_{m|c,s}, \dots\},$$
where each element of $\mu_{c,s}$ represents a unique mode of completing the crossing of an intersection with the template graph of the $c$th cluster, according to the recorded data and the observation $s$.

2. The conditional probability $P(\mu_{m|c,s})$ of the $m$th mode $\mu_{m|c,s} \in \mu_{c,s}$ is estimated. This represents the probability that a traffic participant will drive the $m$th mode given the $c$th cluster and the $s$th observed sub-route in this cluster. The conditional probability is given by
$$P(\mu_{m|c,s}) = \frac{\rho_{m|c,s}}{\rho_{c,s}}.$$
On the one hand, $\rho_{m|c,s}$ indicates how often a vehicle travels a route type in the $c$th cluster whose initial sequence part is defined by the $s$th observed sub-route and whose final sequence part is defined by the $m$th mode $\mu_{m|c,s}$. On the other hand, $\rho_{c,s}$ indicates how often a vehicle travels the $s$th observed sub-route in the $c$th cluster of the dataset. Then, $\rho_{c,s}$ is defined by
$$\rho_{c,s} = \sum_{q} z_q \, \rho_{\hat{R}_{c,q}}, \quad \text{where } z_q = \begin{cases} 1, & s \in S_{c,q} \\ 0, & \text{otherwise.} \end{cases}$$
The frequency $\rho_{\hat{R}_{c,q}}$ was introduced in Section 3.4 and indicates how often a vehicle travels the route type $\hat{R}_{c,q}$ in the $c$th cluster. The Boolean $z_q$ selects only those route types $\hat{R}_{c,q}$ whose sequence contains the sub-route $s$. Given the above, the sum of the probabilities of all modes is then given by
$$\sum_{m=1}^{|\mu_{c,s}|} P(\mu_{m|c,s}) = 1.$$
These two steps (Equations (25)-(30)) are applied for each observed sub-route $s$ in the $c$th cluster in order to generate each element of the mode data $M_c$.
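The frequentist estimate can be sketched in Python for a single cluster. The sketch rests on simplifying assumptions that are not stated in the text: every route type is a complete route that ends with an outgoing vertex, no vertex appears twice in a route, and observations that already reach the last vertex of a route are skipped, so that each counted observation leaves a non-empty mode. The name `estimate_modes` and the toy route types are hypothetical.

```python
from collections import defaultdict


def estimate_modes(route_counts):
    """Return the mode data of one cluster as {s: {mode: probability}}.

    route_counts: dict route type (tuple of template vertices) -> frequency
    """
    rho_s = defaultdict(int)                        # frequency of each observation s
    rho_ms = defaultdict(lambda: defaultdict(int))  # frequency of (s, mode) pairs
    for route, rho in route_counts.items():
        n = len(route)
        for i in range(n):
            for j in range(i + 1, n):  # j < n keeps at least one future vertex
                s, mode = route[i:j], route[j:]
                rho_s[s] += rho
                rho_ms[s][mode] += rho
    return {
        s: {mode: count / rho_s[s] for mode, count in modes.items()}
        for s, modes in rho_ms.items()
    }


# Toy cluster: three vehicles drove a-b-c and one vehicle drove a-b-d.
route_counts = {("a", "b", "c"): 3, ("a", "b", "d"): 1}
mode_data = estimate_modes(route_counts)
print(mode_data[("a", "b")])
```

For the observation `("a", "b")`, the sketch yields two modes, `("c",)` and `("d",)`, with estimated probabilities 0.75 and 0.25, which sum to one as required.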

Evaluation and Results
In this section, the evaluation procedure and the evaluation results are detailed. The proposed methodology is evaluated with respect to its ability to generate similar modes, mode probabilities, route types, graphs, and database trees, given similar datasets as inputs. For this, the Lyft database is used as the data source, because it contains map information as well as data about the motion of traffic participants. The data from the traffic participants are randomly divided into two independent datasets ($D_1$ and $D_2$), where $D_1$ is the small training dataset provided by Lyft for the Kaggle Challenge https://www.kaggle.com/c/lyftmotion-prediction-autonomous-vehicles (accessed on 11 October 2021), and $D_2$ is the validation dataset provided by Lyft, while the map information remains the same for both datasets. An overview of the evaluation process is shown in Figure 13, and each step of the PROMOTING method is detailed in what follows.
The first step of the PROMOTING method (as per Section 3.1) describes the static traffic information (map vertex set $V$ and intersection graphs $S_{int}$). Given that this information does not vary over time and is shared among datasets, the outputs of the first step for each dataset are not compared. A summary of the road infrastructure description of the Lyft database is shown in Table 1.
Table 1. Summary of the road infrastructure description of the Lyft database.

| Feature | Name | Value |
|---|---|---|
| Graph order (number of map vertices) | $n_\nu$ | 8506 |
| Graph size (number of map edges) | $n_\varepsilon$ | 12,185 |
| Number of intersections contained in the map | $n_{int}$ | 909 |

The second step of the PROMOTING method (as per Section 3.2) extracts the VID list $X_{VID}$. Given that each $X_{VID}$ is generated from a unique set of traffic scenes, the VIDs from different datasets are inherently different. In this step, the routes contained in the VIDs from $D_1$ and $D_2$ cannot be compared, because the vertices that compose each route have different names and are not yet standardized to a template graph. However, the details of each VID (number of scenes, objects, vehicles, etc.) can be compared, which allows us to corroborate that $D_1$ and $D_2$ are similar in size. This is important, because datasets of different sizes would imply different numbers of clusters, types of clusters, modes, and so on. Specifically, a total of approximately 1.6 million routes of vehicles crossing intersections are extracted from the Lyft database [25]. Approximately 50.5% of the routes belong to $D_1$, while the remaining 49.5% belong to $D_2$. The route distribution according to Section 3.2 is shown in Figure 14, and a summary of the details of the VID of each dataset is shown in Table 2.
Figure 14. Route distribution according to Section 3.2: complete, outgoing, entering, and other.
As can be inferred from Figure 14 and Table 2, both datasets, $D_1$ and $D_2$, are similar in size, thus aiding in a fair evaluation of the method. Further, as mentioned in Section 3.2, only "complete" routes have been selected in the output of the second step of PROMOTING. The reason for this is that these routes are the only type that contains a full description of the intersection crossing from the entrance to the exit.
The third step of the PROMOTING method (as per Section 3.3) focuses on the clustering of the VIDs according to their graph isomorphism. The comparison metric is the structure of the database trees $T_{D_1}$ and $T_{D_2}$ that are generated when the datasets $D_1$ and $D_2$ are used as inputs. The node generation of both trees is analyzed, that is, how the database tree was generated for each input dataset. If the trees are similar, it is an indication that the method is able to cluster similar routes, even when they come from different datasets. The common tree $T_{com}$ is defined as the tree whose lineage is present in both $T_{D_1}$ and $T_{D_2}$, i.e., each node of $T_{com}$ within each level of the tree has a counterpart in both $T_{D_1}$ and $T_{D_2}$. $T_{com}$ can be expressed as follows:
$$T_{com} = T_{D_1} \cap T_{D_2}.$$
The comparison of the structure of the database trees $T_{D_1}$ and $T_{D_2}$ with $T_{com}$ is shown in Table 3. Given that both datasets $D_1$ and $D_2$ are similar in size, it can be inferred from the results shown in Table 3 that the method is able to comparably cluster the dynamic data from different datasets.
The fourth step of the PROMOTING method (as per Section 3.4) consists of the counting of route types within each cluster. Given a cluster $a$ from $T_{D_1}$, its equivalent cluster $b$ from $T_{D_2}$ is the one with the similar template graph. The comparison metric is the number of routes in cluster $a$ that have an equivalent (same route type) in cluster $b$, normalized by the overall number of routes in cluster $a$. For this, let $n_{c,e}$ be the number of such routes, given the $c$th cluster of $T_{D_1}$ and its equivalent cluster in $T_{D_2}$, and let $n_c$ be the overall number of routes in the $c$th cluster of $T_{D_1}$. Then, the comparison metric is given by
$$\eta_{c,e} = \frac{n_{c,e}}{n_c}.$$
The metric $\bar{\eta}_{c,e}$, which represents the average of the ratio of equivalent routes over all common $c$th clusters from $T_{D_1}$ and $T_{D_2}$, is estimated as follows:
$$\bar{\eta}_{c,e} = \frac{1}{n_{clusters}} \sum_{c=1}^{n_{clusters}} \eta_{c,e}.$$
For this comparison, $\bar{\eta}_{c,e} = 95.82\%$ was achieved. This indicates that common $c$th clusters from $T_{D_1}$ and $T_{D_2}$ contain mostly the same route types, and hence that the method is able to cluster the routes of traffic participants from different datasets in a similar manner.
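The ratio of equivalent routes described above can be sketched in Python as follows. The sketch assumes that the route types of two equivalent clusters are already expressed in a common template representation and are given as frequency dictionaries; the function names and the toy numbers are hypothetical and are not results from the paper.

```python
def equivalent_route_ratio(counts_a, counts_b):
    """Share of routes in cluster a whose route type also occurs in cluster b."""
    total = sum(counts_a.values())
    shared = sum(n for route_type, n in counts_a.items() if route_type in counts_b)
    return shared / total


def mean_equivalent_route_ratio(cluster_pairs):
    """Average of the per-cluster ratios over all common clusters."""
    ratios = [equivalent_route_ratio(a, b) for a, b in cluster_pairs]
    return sum(ratios) / len(ratios)


# Two hypothetical pairs of equivalent clusters (first dataset, second dataset).
pairs = [
    ({("a", "b", "c"): 8, ("a", "b", "d"): 2}, {("a", "b", "c"): 5}),
    ({("x", "y"): 4}, {("x", "y"): 1, ("x", "z"): 2}),
]
eta_mean = mean_equivalent_route_ratio(pairs)
```

In the first pair, 8 of the 10 routes have an equivalent route type in the other dataset; in the second pair, all routes do. The average over both pairs is therefore 0.9.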
The fifth step of the PROMOTING method (as per Section 3.5) performs the mode estimation. Therefore, the comparison metric is based on the generated modes and their estimated probabilities. For this, let $P(\mu^{(D_1)}_{m|c,s})$ be the probability that a vehicle will drive the $m$th mode given the $c$th cluster and the $s$th observed sub-route, considering the dataset $D_1$.
Similarly, let $P(\mu^{(D_2)}_{m_e|c_e,s_e})$ be the probability that a vehicle will drive the $m_e$th mode given the $c_e$th cluster and the $s_e$th observed sub-route, considering the dataset $D_2$. Here, the subscript $e$ indicates that the corresponding equivalent is used, i.e., the $m_e$th mode is the equivalent of the $m$th mode. Therefore, only equivalent modes in equivalent clusters are considered.
Then, the relative difference $\eta_{m|c,s}$ between the probabilities of equivalent modes of both trees with respect to the probability $P(\mu^{(D_1)}_{m|c,s})$ is computed as
$$\eta_{m|c,s} = \frac{\left| P(\mu^{(D_1)}_{m|c,s}) - P(\mu^{(D_2)}_{m_e|c_e,s_e}) \right|}{P(\mu^{(D_1)}_{m|c,s})},$$
and its average over all equivalent modes is
$$\bar{\eta}_{m|c,s} = \frac{1}{n_{m,e}} \sum \eta_{m|c,s},$$
where $n_{m,e}$ indicates the total number of equivalent modes between $T_{D_1}$ and $T_{D_2}$. For the datasets used, $\bar{\eta}_{m|c,s} = 0.39\%$. This shows that the mode probabilities estimated from two different datasets are similar to each other, which indicates that mode probabilities calculated using a large dataset can serve as estimates for similar datasets from the same distribution. Even when PROMOTING uses different datasets, it is able to estimate the modes and the probability of each mode in a similar fashion for equivalent sub-route observations in equivalent intersections.
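The comparison of mode probabilities can be illustrated with a short Python sketch. Two assumptions are made here that the text does not confirm: the relative difference is normalized by the probability estimated from the first dataset, and equivalent modes are identified by a shared key of cluster, observed sub-route, and mode. The function name and the toy probabilities are hypothetical.

```python
def mean_relative_difference(p_first, p_second):
    """Average relative difference over modes present in both datasets.

    p_first, p_second: dict (cluster, sub_route, mode) -> estimated probability
    Assumption: normalization by the probability from the first dataset.
    """
    common = [key for key in p_first if key in p_second]
    diffs = [abs(p_first[key] - p_second[key]) / p_first[key] for key in common]
    return sum(diffs) / len(diffs)


# Hypothetical probabilities of equivalent modes in one equivalent cluster.
p_first = {(0, ("a", "b"), ("c",)): 0.5, (0, ("a", "b"), ("d",)): 0.5}
p_second = {
    (0, ("a", "b"), ("c",)): 0.4,
    (0, ("a", "b"), ("d",)): 0.6,
    (0, ("a",), ("b", "e")): 1.0,  # no equivalent in the first dataset: ignored
}
eta_modes = mean_relative_difference(p_first, p_second)
```

Only the two modes that exist in both dictionaries enter the average, mirroring the restriction to equivalent modes in equivalent clusters.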
The main results of the evaluation of steps 4 and 5 of PROMOTING are summarized in Table 4.

| Feature | Name | Value |
|---|---|---|
| Average ratio of equivalent routes | $\bar{\eta}_{c,e}$ | 95.82% |
| Average relative difference between equivalent modes | $\bar{\eta}_{m|c,s}$ | 0.39% |

A representative graphical example of the extraction of modes and the estimation of the mode probabilities is shown in Figure 15.

Figure 15. Example of the extraction of modes and estimation of the probability of each mode for four different types of intersections of the Lyft database. In the first column, the intersection is represented by the vertices that compose its graph. The second, third, and fourth columns represent the most probable modes (from highest to lowest probability), given the observed sub-route colored in yellow and the history of the motion.

Discussion
A common challenge of multi-modal motion prediction is to determine the "optimal" number of modes to predict, that is, how many trajectories per traffic participant should be predicted in order to comprehensively model a given traffic scene. This question has to take into consideration the amount of computational resources available, the time constraints, and the number of traffic participants, among others. Not only the number of trajectories is important, but also what they should look like. The PROMOTING method serves as a reference that shows both what the modes in a given intersection look like and what the probability is that a traffic participant will drive a specific mode. That is, the proposed method aids in the trajectory-prediction task. The method has the potential to be highly valuable for both the training and inference phases of ML methods for multi-modal motion prediction.
Along with the trajectory prediction that each traffic participant performs, the PROMOTING method could also prove to be useful at smart intersections with Vehicle-to-everything (V2X) capabilities. In that scenario, an automated vehicle could receive the information of the crossing (graphs, modes, etc.) from the infrastructure, so that the traffic participant could perform a better prediction of their own motion according to different parameters, such as efficiency or traffic load. This can be extended to all traffic participants, where each one knows where all the other traffic participants are and can predict the motion of the others with the help of the crossing information. This is relevant in the case of mixed traffic, where automated and human-driven vehicles coexist at the same intersection. Even when no V2X is present, the PROMOTING method could still be on board the EGO-vehicle, and, together with the information from exteroceptive sensors, the relationship between the surrounding traffic participants and their possible routes can be generated.
The PROMOTING method was evaluated in this work using the Lyft database. However, the method is not dependent on this database; it can be used together with other map representations, as long as the required map properties are present. That is, the method is not limited to certain types of intersections but can instead generate the information from many different sources.
The method can be extended using real-time traffic information, as already provided by many navigation tools. The constant update of the traffic conditions (flow, weather, construction works, etc.) can provide an extra benefit for traffic analysis, as well as for the trajectory planning of traffic participants. This real-time traffic information does not necessarily have to come from navigation tools or infrastructure but could also be transmitted by other vehicles in the vicinity that have already crossed the intersection.
It should be noted that the mode probability estimation presented in this work does not take into account the interaction between traffic participants. In this paper, only the past sub-route, and not the state of the other objects, is considered in the condition. This is a point for future research, with a special focus on the exchange of intentions between traffic participants via V2X. In addition, the investigation of abnormal behavior of traffic participants is also envisaged.
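To make this conditioning concrete, the following is a minimal sketch of a frequency-based estimate in which the only condition is the observed sub-route. It is not the published implementation: the function name `mode_probabilities`, the vertex labels, and the assumption that an observed sub-route matches a route when it is a prefix of that route are all hypothetical choices made for illustration.

```python
from collections import Counter


def mode_probabilities(routes, observed):
    """Relative frequency of each complete route among the routes
    that begin with the observed sub-route (assumed prefix match)."""
    observed = tuple(observed)
    # Keep only the routes whose first vertices equal the observation.
    matching = [tuple(r) for r in routes if tuple(r[:len(observed)]) == observed]
    counts = Counter(matching)
    total = sum(counts.values())
    if total == 0:
        return {}  # no route in the cluster is compatible with the observation
    return {route: n / total for route, n in counts.items()}


# Hypothetical routes (sequences of vertex labels) driven in one cluster.
routes = [
    ("in_a", "x1", "out_b"),
    ("in_a", "x1", "out_b"),
    ("in_a", "x1", "out_b"),
    ("in_a", "x2", "out_c"),
    ("in_d", "x3", "out_b"),
]

probs = mode_probabilities(routes, ("in_a",))
```

Nothing in this estimate refers to other traffic participants. Taking interaction into account would mean extending the condition beyond the observed sub-route.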

Conclusions
In this research work, a novel method named PROMOTING is proposed that is able to generate the modes (probable routes) of traffic participants, as well as estimate the probability that a traffic participant will drive a specific mode. This is done with the aim of supporting ADSs in their task of multi-modal motion prediction.
Mode generation is performed by clustering intersections based on the isomorphisms of their road topology. This allows us to cluster together equivalent intersections and, as a consequence, the equivalent routes of vehicles that crossed the isomorphic intersections. The probability of each mode is estimated based on the frequency with which each route is driven and a given observation (sub-route within the intersection).
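The grouping of intersections with equivalent topology can be sketched as follows. This is an illustrative example only: it compares small directed graphs by brute force over vertex permutations, which is a stand-in for whatever isomorphism test the published code uses, and it ignores any vertex attributes (such as incoming, outgoing, or crossing roles). The function names and the toy graphs are hypothetical.

```python
from itertools import permutations


def is_isomorphic(g1, g2):
    """Brute-force isomorphism test for small directed graphs.
    Each graph is a pair (nodes, edges), with edges given as (u, v) pairs."""
    nodes1, edges1 = list(g1[0]), set(g1[1])
    nodes2, edges2 = list(g2[0]), set(g2[1])
    if len(nodes1) != len(nodes2) or len(edges1) != len(edges2):
        return False
    for perm in permutations(nodes2):
        mapping = dict(zip(nodes1, perm))
        # The mapping is a bijection and the edge counts are equal, so
        # mapping every edge of g1 onto an edge of g2 is sufficient.
        if all((mapping[u], mapping[v]) in edges2 for u, v in edges1):
            return True
    return False


def cluster_intersections(graphs):
    """Group intersection names so that each cluster is one isomorphism class."""
    representatives, clusters = [], []
    for name, graph in graphs.items():
        for idx, rep in enumerate(representatives):
            if is_isomorphic(graph, rep):
                clusters[idx].append(name)
                break
        else:
            # No existing cluster matches: this graph starts a new one.
            representatives.append(graph)
            clusters.append([name])
    return clusters


# Hypothetical toy intersections: A and B share a topology, C does not.
graphs = {
    "A": (["a1", "a2", "a3"], {("a1", "a2"), ("a1", "a3")}),
    "B": (["b1", "b2", "b3"], {("b2", "b1"), ("b2", "b3")}),
    "C": (["c1", "c2", "c3"], {("c1", "c2"), ("c2", "c3")}),
}

clusters = cluster_intersections(graphs)
```

Because the permutation search grows factorially with the number of vertices, this sketch is only practical for very small graphs. A dedicated graph library would be the natural choice for real intersection graphs.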
The method is evaluated using the Lyft database. The results confirm that the method is able to cluster equivalent intersections and modes. The estimated probabilities of equivalent modes are almost identical, which also corroborates that the method estimates similar probabilities for similar crossings given similar observations. Therefore, PROMOTING provides a methodology that makes it possible to generate a labeled dataset that allows researchers to estimate multiple routes for each traffic participant and provides a probability score for each of the estimated routes. This labeled dataset has the potential to be highly valuable for ML models aimed at the task of motion prediction.
The method could be improved with the inclusion of real-time traffic information that can be sent via V2X communication, including information about the road infrastructure, cellular networks, or other traffic participants. The method is not limited to the dataset used in this work but could also be implemented for other map sources.
Interested readers are referred to the repository [31], where the code that implements the methodology proposed in PROMOTING is made publicly available.

Data Availability Statement:
The code is publicly available at [31].

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations and notation are used in this manuscript:
Mode data of the cth cluster
$P(\mu^{(D_1)}_{m|c,s})$: Probability that a vehicle will drive the mth mode given the cth cluster and the sth observed sub-route for the dataset $D_1$
$P(\mu^{(D_2)}_{m_e|c_e,s_e})$: Probability that a vehicle will drive the equivalent $m_e$th mode given the $c_e$th cluster and the $s_e$th observed sub-route for the dataset $D_2$
$Q_{i,j}$: Sequence of vertices that follows the jth vehicle in the ith traffic scene
$\hat{R}_c$: Set of route types in the cth cluster
$R_{c,n}$: Sequence of vertices that represents the nth route type in the c-cluster
$R_k$: Sequence of vertices that represents the k-route of a vehicle that cross an intersection
$S_{c,s}$: Set of sequence of vertices that allows to extract the modes given the cth cluster and the sth observed sub-route
$S_{c,n}$: Set of sub-routes of the c-cluster generated from $R_{c,n}$
$S^{*}_{c}$: Set that contains unique sub-routes given all the nth sub-routes sets $S_{c,n}$
$S_{int}$: Set of intersection graphs generated from the map
$T_{com}$: Database tree whose lineage is present in both $T_{D_1}$ and $T_{D_2}$
$T_{D_1}$: Database tree that distributes the VIDs extracted from the database $D_1$
$T_{D_2}$: Database tree that distributes the VIDs extracted from the database $D_2$
$V$: Set of vertices $\nu_i$ of the map
$V_{\iota}$: Set of vertices of the $\iota$th intersection
$V_{\iota,in}$: Set of incoming vertices of the $\iota$th intersection
$V_{\iota,out}$: Set of outgoing vertices of the $\iota$th intersection
$V_{\iota,x}$: Set of crossing vertices of the $\iota$th intersection
$X_{i,j,k}$: VID extracted at the ith scene for the jth vehicle that drives the kth route
$X_{VID}$: List of VIDs extracted from a given dataset
$X_{c,VID}$: List of VIDs grouped in the cth cluster given the list $X_{VID}$
$n_{1,in}$: Number of vertices of a graph whose in-degree is equal to 1
$n_{1,out}$: Number of vertices of a graph whose out-degree is equal to 1
$n^{(D_1)}_{c}$: Number of routes in the cth cluster of the tree $T_{D_1}$
$n^{(D_1)}_{c,e}$