Short-Range Prediction of the Zone of Moving Vehicles in Arterial Networks

Abstract: In many moving object databases, the future locations of vehicles in arterial networks are predicted. While most studies base their predictions on the frequent behavior of historical trajectories or on vehicles' recent kinematics, the dynamics of intersections are mostly neglected. Signalized intersections cause vehicles to experience different delays, which vary from zero to several minutes depending on the traffic state at the intersection. In the absence of traffic signal information (red and green times of the signal phases, queue lengths, approaching traffic volume, turning volumes to each intersection leg, etc.), the delays experienced at traffic signals are random variables. In this paper, we model the probability distribution function (PDF) and cumulative distribution function (CDF) of the delay for any point in an arterial network based on a spatiotemporal model of the queue at the intersection. The probability of the presence of a vehicle in a zone is then determined from the modeled probability function of the delay. A comparison between the results of the proposed method and a well-known kinematic-based method indicates a significant improvement in the precision of the predictions.


Introduction
Predicting the location of vehicles has been the focus of many researchers in recent years. In many cases, vehicles send their locations to a database so that a variety of systems can utilize the information. Location-aware advertising [1], driving safety support systems [2], and vehicular ad hoc networks (VANET) [3] are examples of systems that consider the location of vehicles in their services. In some systems (e.g., driving safety systems), predicting the future location of vehicles is essential, because the systems should send some information (e.g., warnings) to vehicles before they reach specific locations. In some other systems (e.g., VANET), only the real-time location of a vehicle is required. Although the future location of vehicles is not required in the latter category, limitations in position updating force such systems to predict the location of the vehicles.
Different statistical and nonparametric methods have been applied to the prediction of moving objects, especially vehicles [4]. Ying et al. [5] introduced a prediction model based on a cluster-based prediction approach. They predicted the future location of users based on the similarities between the attributes of users' semantic trajectories. Jeung et al. [6] proposed a hybrid method that combines an object's pattern information with motion functions to predict the future location of objects. In their proposed approach, a motion function is extracted from an object's recent movement. They reported that their method surpassed the approaches in which an object's trajectory pattern is the only determinant factor in predictions.
If an object travels within a network, its movement is constrained by the geometry and dynamics of the network. In most metropolitan cities, highways are supervised by dedicated sensing systems, such as traffic cameras and loop detector networks. However, in many cases, especially in developing countries, arterial networks (unlike highways) are not thoroughly covered by traffic control infrastructure [7]. Moreover, private sectors (e.g., Internet taxi providers and delivery services) sometimes do not have access to real-time traffic information due to financial or technological limitations. Using information collected by mobile sensors could be an inexpensive alternative for estimating traffic states and predicting the future locations of vehicles. Civilis et al. [8] proposed a tracking and update method for vehicles moving within road networks. They used the kinematics of vehicles as the basis for location prediction. In their work, routes were divided into several segments with constant accelerations. The movement of moving objects was then reconstructed by applying a pre-computed acceleration profile and the speed of objects. Mo et al. [3] introduced an updating strategy based on Kalman filter prediction. By improving the prediction of the location of vehicles, they enhanced the selection of the data communication mode. Their results also indicated a reduction in the data updating frequency, which was reported as a consequence of successful predictions. Reza et al. [9] applied the Dirichlet multinomial model to capture movement patterns in a metropolitan network, which allowed them to reduce the number of roadside units in the tracking operation.
A review of the literature shows that most research tries to predict a vehicle's location by presenting methods that estimate a single point as the future location of the vehicle. Although determining the exact location of a moving vehicle might be the ultimate goal of any prediction model, in most cases, especially in arterial networks, it is unrealistic to predict a single point where a vehicle will be, even in the near future. In arterial networks, the different behaviors of drivers, varying lengths of vehicle queues at intersections, the stochastic arrival time of vehicles, and random turning at intersections are some of the variables that make location prediction stochastic.
Instead of predicting a single point as the location of a vehicle, we estimate the probability that a vehicle will be in a specific zone after a time interval (which is called a prediction horizon). In this paper, a zone is defined as a continuous area that divides arterial links into smaller parts. Predicting a group of points (the segments located in a zone) as the future location of a vehicle, instead of a single point, might, at first glance, seem to be a step backward in the prediction of moving vehicles; however, in many applications, the main challenge is in predicting the probable areas where a vehicle might be present. For example, in vehicular ad hoc networks (VANETs), messages are communicated between roadside units (RSUs) and onboard units (OBUs). The number of messages triggered by different OBUs is limited by identifying the probable areas in which a target vehicle may exist [9]. In a more general case, in wireless sensor networks (WSNs), positioning the moving targets is vital to energy-efficient communication [10]. A sensor node in a WSN covers some adjacent areas (faces of the network) efficiently, so it is of great importance to determine the face that covers a moving target efficiently. As a further example, warning systems operating based on location-based services notify vehicles based on their probable presence in danger zones [11]. Re-identifying objects using cameras with non-overlapping coverage is also an area that would benefit from zone predictions. In such problems, an object is tracked by more than one camera. When an object disappears from a camera, other cameras must re-identify the object. Some researchers have tried to reduce the number of candidates in the re-identifying process by taking the kinematics of objects into account [12]. Narrowing the image search area by considering the predicted zones, although not examined so far, seems to be a promising approach.
In this paper, the probability of the presence of a vehicle in different zones in the future is estimated by taking the dynamics of intersections into account. Vehicles experience some delays at intersections, which are intrinsically uncertain [13]. For example, a vehicle that arrives at an intersection during the green phase is likely to experience a shorter delay than a vehicle that arrives during the red phase. While some research has attempted to handle this uncertainty by introducing the log-normal or other families of distributions as the delay distribution function [14], the physical dynamics of intersections are almost always neglected.
The sparseness of historical data is also addressed in this paper. The limited number of observations is a challenge in modeling vehicles' movements. The proposed method copes with the sparseness of data by aggregating observations gathered over different days, and also by adopting some simplifying assumptions. The data used in this study were gathered using 320 GPS devices from taxi cabs in Rome, Italy [15]. The dataset contains the trajectories established by the GPS points, which were recorded every 15 s.
Section 2 presents more details on the dimensions of the problem. In Section 3, some introductory concepts are discussed. The proposed method is introduced in Section 4, and the results are discussed in Section 5. Section 6 contains a summarized conclusion of the research.

Problem Definition
A city can be divided into zones, either virtually or physically (Figure 1). A vehicle traveling in a road network in a city moves from one zone to another. In many systems, the future zone in which a vehicle will be located has to be predicted.
While a vehicle travels in a road network, its future location is affected by the dynamics of the network. In arterial networks, vehicles experience delays due to the presence of signalized intersections. In other words, the future locations of vehicles are affected by signalized intersections. The delay experienced by a vehicle at an intersection can be estimated if the dynamics of the intersection (i.e., the length of the green and red phases, the length of the queue at the intersection, and the number of lanes) and the start time of the signal cycle in a particular time system are known. However, these parameters are not available in most cases because of technological or administrative constraints. In such a situation, the delay experienced by different vehicles remains stochastic, because the arrival times of vehicles at intersections and the queue lengths are unknown. The stochastic delay should be modeled so that the future location of vehicles is predicted realistically.
In the literature, the forecasting horizon varies from a few seconds (short-range) to tens of minutes (long-range). In long-range predictions, the kinematics of the vehicles and dynamics of the network are not the only determinative factors in the location of the vehicles. In such cases, the context of passengers is also required in the predictions [16]. This study primarily aims to predict the location of vehicles in the near future, which is defined to be in the order of tens of seconds. In addition, as we are not going to predict the final destination of a vehicle, we do not consider the turning probabilities in our model. In other words, we assume that the paths of vehicles are known.
This paper presents a method to estimate the probability of the presence of a vehicle in different zones in the future. This is done by modeling the probability distribution function (PDF) and cumulative distribution function (CDF) of the future location of vehicles at any point on the links in arterial networks. These functions are modeled in terms of the dynamics of the intersections, and their parameters are estimated using historical GPS data. Since we are seeking a short-range prediction, the probability of the presence of a vehicle is found in zones that intersect the current and adjacent links of a vehicle.
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW

Queue Model at Signalized Intersections
A classic queue model at signalized intersections was used in this study [13]. According to this model, the queue grows at speed $q$ during the whole signal cycle (red and green phases), and vehicles forming the queue move at speed $s$ during the green phase. This means that the queue length increases at speed $q$ during the red phase and decreases at speed $(s - q)$ during the green phase. If the queue is fully removed by the end of the green phase, the intersection is deemed undersaturated. Otherwise, the intersection is deemed oversaturated. In an oversaturated intersection, vehicles have to wait for more than one signal cycle before they can pass the intersection. The queue length is:
$$l(t) = \begin{cases} n_0 + q\,t, & 0 \le t \le t_r \\ \max\{0,\; n_0 + q\,t_r - (s - q)(t - t_r)\}, & t > t_r \end{cases} \qquad (1)$$
where $t = 0$ is the start time of the red phase, $t_r$ is the signal red time, and $n_0$ is the initial queue length at the beginning of the red phase (which is referred to as the initial queue in this paper). In an undersaturated intersection, the initial queue is equal to zero.
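As a minimal sketch of the queue model described above, the following Python functions evaluate the queue length over one signal cycle and check whether the intersection is undersaturated. The function names, the clipping of the queue at zero, and the numeric values in the usage note are illustrative assumptions; only the growth speed q, the discharge speed s, the red time t_r, and the initial queue n_0 come from the text.

```python
def queue_length(t, q, s, t_r, cycle, n0=0.0):
    """Queue length at time t, where t = 0 is the start of the red phase.

    q: queue growth speed, s: discharge speed of queued vehicles,
    t_r: red time, cycle: cycle length, n0: initial queue length.
    Clipping at zero is an assumption made for this sketch.
    """
    t = t % cycle  # periodic behavior within the estimation interval
    if t <= t_r:
        # Red phase: the queue only grows.
        return n0 + q * t
    # Green phase: the queue shrinks at speed (s - q).
    return max(0.0, n0 + q * t_r - (s - q) * (t - t_r))


def is_undersaturated(q, s, t_r, cycle, n0=0.0):
    """True if the queue is fully removed by the end of the green phase."""
    return n0 + q * t_r - (s - q) * (cycle - t_r) <= 0.0
```

For example, with q = 2, s = 6, t_r = 30, and a cycle of 60 (arbitrary illustrative values), the queue reaches its maximum of 60 at the end of red and is fully removed 15 time units into green.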

Delay Distribution at Signalized Intersections
Van Zuylen et al. [13] modeled the analytical delay distribution at signalized intersections. According to their study, the delay distribution function for an undersaturated intersection is given by Equation (2), in which $C$ is the cycle time (the sum of the red and green times) and $\delta(w)$ is the Dirac delta function.
The delay distribution function for an oversaturated intersection is given by Equation (3), in which $t_g$ is the green time and $B$ is the block function. Equations (2) and (3) are the probability distribution functions of the total delay experienced by vehicles that enter an intersection. A vehicle experiences the total delay only once it has entered the intersection; for example, a vehicle that is still in the queue has not yet experienced the total delay. In this paper, the probability distribution of the total delay is used to determine the parameters of the proposed model.

Movement Modeling Assumptions
I. Time discretization: Arterials are assumed to have discrete dynamics in time. This means that we divided the daytime into intervals (e.g., 10:00 to 10:30, 10:30 to 11:00, and so on) and assumed that the dynamics of our model were stationary during these intervals on different days (work days and holidays are modeled separately). Based on this assumption, the parameters of the queue model are constant during the estimation intervals. Furthermore, a queue exhibits a periodic behavior with period $C$ (the length of the light cycle) in each time interval. Although the assumption is not fully compatible with the reality of arterial traffic, it is still acceptable since our method is proposed to deal with the sparseness of data. Moreover, the time discretization assumption is widely accepted by researchers in the traffic modeling area [17].
II. Neglected overtaking: Under this assumption, all vehicles travel at the same speed. The vehicles travel at the free flow speed when they are out of the queues, and follow the queue dynamics when they are part of a queue. In addition, acceleration and deceleration are neglected. These assumptions are considered good approximations in arterials.
III. Continuous changes in the queue length: Although the queue length increases in a discrete manner (each time a vehicle joins it), we assumed that the queue length increases continuously. Since we are looking for movement patterns in the order of tens of seconds, the subtle differences between these two models can be ignored.

Probability Distribution Function (PDF) of the Future Location of a Vehicle
The PDF of the future location of a vehicle, $p_{x_0,\tau}(x)$, determines the probability that a vehicle located at $x_0$ (at time $t_0$) will be at $x$ (Figure 2) after prediction horizon $\tau$. In this paper, the PDF of the future location of a vehicle was determined based on the PDF of the delay of the vehicle, which indicates the probability that a vehicle located at $x_0$ will have a delay $w$ in the time window $t_0$ to $t_0 + \tau$. The delay of a vehicle is defined as the difference between the travel time of a real vehicle and the travel time of a hypothetical vehicle that travels at free flow speed $v_{ff}$. According to this definition, with positions measured as distances upstream of the signal ($x = 0$), delay $w$ is:
$$w = \tau - \frac{x_0 - x}{v_{ff}} \qquad (4)$$
Since there is a one-to-one relationship between the future location of the vehicle $x$ and the experienced delay $w$, the PDF of the future location of the vehicle $p_{x_0,\tau}(x)$ is determined based on the PDF of the (future) delay of the vehicle $d_{x_0,\tau}(w)$:
$$p_{x_0,\tau}(x) = \frac{1}{v_{ff}}\, d_{x_0,\tau}\!\left(\tau - \frac{x_0 - x}{v_{ff}}\right) \qquad (5)$$
For notational simplicity, the indices $x_0$ and $\tau$ are omitted in the next sections.
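The one-to-one relationship between delay and future location can be illustrated with a short Python sketch. It assumes that positions are measured as distances upstream of the signal (so x decreases as the vehicle advances toward x = 0), which matches the later use of x = 0 for the signal location; the function names and the example delay density are hypothetical.

```python
def delay_from_location(x, x0, tau, v_ff):
    """Delay implied by finding the vehicle at x after horizon tau.

    A free-flow vehicle would need (x0 - x) / v_ff to cover the same
    distance, so the remainder of the horizon is delay.
    """
    return tau - (x0 - x) / v_ff


def location_from_delay(w, x0, tau, v_ff):
    """Inverse mapping: the vehicle moves at free flow for (tau - w)."""
    return x0 - (tau - w) * v_ff


def location_pdf(x, x0, tau, v_ff, delay_pdf):
    """Change of variables: p(x) = d(w(x)) * |dw/dx| with |dw/dx| = 1 / v_ff."""
    return delay_pdf(delay_from_location(x, x0, tau, v_ff)) / v_ff
```

For instance, a vehicle that starts 300 m upstream of the signal and is found 150 m upstream after 20 s, with a free flow speed of 10 m/s, has accumulated a delay of 5 s under these assumptions.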

PDF of the Delay
Although Equations (2) and (3) can be used as the probability distribution of the total delay that a vehicle experiences at an intersection, a vehicle does not necessarily experience the total delay at an intersection during the time window $t_0$ to $t_0 + \tau$. Instead, in this time interval, a vehicle may still be waiting to enter the intersection or may experience a delay at more than one intersection. In the next section, the distribution of the delay between $t_0$ and $t_0 + \tau$ for one intersection is modeled. Based on the modeled distribution, the probability of a certain delay for one or two intersections is estimated.
A vehicle experiences a delay if it joins a queue before entering the intersection. The vehicle may join a growing queue or a removing (dissipating) queue, which lead to different delays. The delay also depends on the location at which a vehicle joins a queue. The PDF and CDF of the delay are the sum of the following terms:
1. The PDF and CDF of the delay of a vehicle that is not in a queue at $x_0$ and joins a removing queue at $x_j$ (Case 1).
2. The PDF and CDF of the delay of a vehicle that is not in a queue at $x_0$ and joins a growing queue at $x_j$ (Case 2).
3. The PDF and CDF of the delay of a vehicle that is not in a queue at $x_0$ and does not join a queue (Case 3).
4. The PDF and CDF of the delay of a vehicle that is already in a removing queue at $x_0$ (Case 4).
5. The PDF and CDF of the delay of a vehicle that is already in a growing queue at $x_0$ (Case 5).
We model these cases for both undersaturated and oversaturated intersections. The PDF and CDF of the delay between $t_0$ and $t_0 + \tau$ for one intersection are the sums of the PDFs and CDFs (respectively) of Cases 1 to 5. The derivation for Case 1 is presented in the next section so that readers can understand the logic behind the derivations; the details of the derivations of the other cases (for undersaturated intersections) can be found in Appendix A. The PDF of joining the queue at $x_j$ is presented in the next section as a prerequisite for our model.

The PDF of Joining the Queue at $x_j$
Since the signal timing (start of the cycles) is not known with respect to the time system of vehicles ($t_0$), the queue length at $t_0$ remains a random variable. In other words, the probability that a vehicle joins a queue at $x_j$ is independent of the location of the vehicle ($x_0$).
The probability that a vehicle joins a queue at $x_j$ is proportional to the probability that queue length $l$ is equal to $x_j$. We model the probability distribution of the queue length based on the queue model at intersections (Equation (1)).
In Figure 3, $dl$ is a differential part of the queue. Since a growing queue takes time $dt$ to grow from $l$ to $l + dl$ (and a removing queue takes time $dt$ to shrink from $l + dl$ to $l$), the probability that the queue length is between $l$ and $l + dl$ is $\frac{dt}{C}$, in which $C$ is the total cycle duration. The differentiation of Equation (1) is:
$$dl = \begin{cases} q\,dt, & t \le t_r \\ -(s - q)\,dt, & t > t_r \end{cases} \qquad (6)$$
Then $dt$ is:
$$dt = \begin{cases} \dfrac{dl}{q}, & t \le t_r \\ -\dfrac{dl}{s - q}, & t > t_r \end{cases} \qquad (7)$$
Based on Equation (7), the probability of queue length $l$ for a growing queue ($t \le t_r$) is $\frac{dl}{qC}$, which is a constant value. To model the PDF of this case, we used a uniform distribution function with support $[n_0, l_{max}]$. The weight of this uniform distribution function is equal to the probability of the presence of a growing queue in a cycle, which is the ratio of the growing time ($\frac{l_{max} - n_0}{q}$) to the cycle time ($C$). The PDF of the length of a growing queue, denoted by $\eta_g$, reads:
$$\eta_g(l) = \frac{l_{max} - n_0}{qC}\, U_{[n_0, l_{max}]}(l) \qquad (8)$$
in which $U_{[n_0, l_{max}]}$ is the uniform distribution function with support $[n_0, l_{max}]$.
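The argument that the queue length of a growing queue is uniformly distributed can be checked numerically. The sketch below is only an illustration under the stated queue model: it compares the closed-form probability (l_2 - l_1) / (qC) with the fraction of a finely sampled cycle during which a growing queue has a length in [l_1, l_2]. The function names and the sampling scheme are assumptions introduced here.

```python
def growing_queue_probability(l1, l2, q, cycle, n0, l_max):
    """Probability that a growing queue has a length in [l1, l2].

    Uses the constant density 1 / (q * cycle) on the support [n0, l_max].
    """
    lo, hi = max(l1, n0), min(l2, l_max)
    return max(0.0, hi - lo) / (q * cycle)


def empirical_growing_probability(l1, l2, q, t_r, cycle, n0, steps=100_000):
    """Fraction of the cycle in which the queue is growing and within [l1, l2].

    The cycle is sampled at evenly spaced midpoints, mimicking a vehicle
    whose clock is not synchronized with the signal.
    """
    hits = 0
    for k in range(steps):
        t = (k + 0.5) * cycle / steps
        if t <= t_r and l1 <= n0 + q * t <= l2:
            hits += 1
    return hits / steps
```

With q = 2, t_r = 30, C = 60, and n_0 = 0 (illustrative values), both functions give approximately 1/6 for the interval [10, 30].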
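Similarly, the PDF of the length of a removing queue ($t > t_r$), denoted by $\eta_r$, reads:
$$\eta_r(l) = \frac{l_{max} - n_0}{(s - q)C}\, U_{[n_0, l_{max}]}(l) \qquad (9)$$
The probability that the length of a growing queue is between $l_1$ and $l_2$ reads:
$$H^{g}_{l_1,l_2} = \int_{l_1}^{l_2} \eta_g(l)\,dl = \frac{l_2 - l_1}{qC}, \quad n_0 \le l_1 \le l_2 \le l_{max} \qquad (10)$$
Similarly, the probability that the length of a removing queue is between $l_1$ and $l_2$ reads:
$$H^{r}_{l_1,l_2} = \int_{l_1}^{l_2} \eta_r(l)\,dl = \frac{l_2 - l_1}{(s - q)C}, \quad n_0 \le l_1 \le l_2 \le l_{max} \qquad (11)$$
In Case 1, the vehicle located at $x_0$ joins a removing queue at $x_j$ (Figure 4). Since the vehicle travels at speed $v_{ff}$, it reaches the queue after $\frac{x_0 - x_j}{v_{ff}}$. The remaining time of the prediction horizon ($\tau$) is denoted by $\tau_{rem}$:
$$\tau_{rem} = \tau - \frac{x_0 - x_j}{v_{ff}} \qquad (12)$$
Figure 4. Illustration of Case 1. This case describes a vehicle that is located at $x_0$ at time $t_0$ and then joins the queue at $x_j$.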
If $\tau_{rem}$ is smaller than zero, the vehicle does not reach the queue before the end of prediction horizon $\tau$, which means that it does not experience any delay. After joining the removing queue, the vehicle continues its movement inside the queue at speed $s$. The time during which the vehicle travels inside the queue is denoted by $\tau_Q$:
$$\tau_Q = \min\left(\tau_{rem},\; \frac{x_j}{s}\right) \qquad (13)$$
in which $\frac{x_j}{s}$ is the time the vehicle needs to reach the signal ($x = 0$).
When a vehicle travels at speed $s$ for time $\tau_Q$, it experiences a delay compared with a vehicle traveling at the free flow speed. In this situation, the delay caused by traveling inside the queue, written here as $w_Q$, is determined as:
$$w_Q = \tau_Q - \frac{s\,\tau_Q}{v_{ff}} = \tau_Q\left(1 - \frac{s}{v_{ff}}\right) \qquad (14)$$
which is the difference between $\tau_Q$ and the time a hypothetical vehicle spends at the free flow speed $v_{ff}$ to traverse the same distance ($s\,\tau_Q$).
According to these explanations, the delay $w$ in Case 1 is determined as:
$$w = \begin{cases} 0, & \tau_{rem} \le 0 \\ \tau_Q\left(1 - \dfrac{s}{v_{ff}}\right), & \tau_{rem} > 0 \end{cases} \qquad (15)$$
In Equation (15), $w$ is a function of $x_j$. The location at which a vehicle joins a queue ($x_j$) is a random variable with a determined probability distribution (Equations (8) and (9)). We determined $x_j$ as a function of $w$ so that we could model the PDF of $w$ based on the PDF of $x_j$. To have $x_j$ as a function of $w$, Equation (15) is expanded using Equations (13) and (14):
$$w(x_j) = \begin{cases} 0, & x_j \le x_0 - \tau v_{ff} \\ \left(\tau - \dfrac{x_0 - x_j}{v_{ff}}\right)\left(1 - \dfrac{s}{v_{ff}}\right), & x_j > x_0 - \tau v_{ff} \ \text{and} \ \tau - \dfrac{x_0 - x_j}{v_{ff}} \le \dfrac{x_j}{s} \\ \dfrac{x_j}{s}\left(1 - \dfrac{s}{v_{ff}}\right), & \text{otherwise} \end{cases} \qquad (16)$$
Figure 5 shows a diagram of Equation (16).
In Equation (16) (Figure 5), delay $w$ is a hybrid (piecewise) function of $x_j$. Because $w$ is monotonic in $x_j$, $x_j$ can also be written as a function of $w$. $x_j$ is derived by simply inverting the sub-functions in Equation (16):
$$X_j(w) = \max\left\{ x_0 - \tau v_{ff} + \frac{w\,v_{ff}^2}{v_{ff} - s},\; \frac{w\,s\,v_{ff}}{v_{ff} - s} \right\}, \quad w > 0 \qquad (17)$$
Equation (17) determines $x_j$ for any positive delay $w$. The probability that a vehicle does not experience any delay ($w = 0$) during the prediction horizon is equal to the probability that the vehicle joins the queue at a point $x_j \in [0, x_0 - \tau v_{ff}]$, which is a probability mass with a value of $H^{r}_{0,\,x_0 - \tau v_{ff}}$. We modeled this probability using a Dirac delta function (centered at 0) with the corresponding weight $H^{r}_{0,\,x_0 - \tau v_{ff}}$. The probability distribution of the other defined values of $w$ ($w > 0$) is determined by the PDF of joining the queue at $x_j$ (Equation (9)). The probability of undefined values of $w$ is 0. The PDF of the delay in Case 1 reads:
$$d(w) = \begin{cases} H^{r}_{0,\,x_0 - \tau v_{ff}}\,\delta(w) + \eta_r\big(X_j(w)\big)\,\dfrac{dX_j(w)}{dw}, & \text{for defined values of } w \\ 0, & \text{otherwise} \end{cases} \qquad (18)$$
Since $w$ is monotonic (Figure 5), the cumulative distribution function (CDF) of $w$ is equal to the cumulative probability of $x_j = X_j(w)$. The CDF of the delay in Case 1 reads:
$$D(w) = H^{r}_{0,\,X_j(w)} \qquad (19)$$
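The Case 1 reasoning (reach the queue at free flow speed, then move inside the queue at speed s until the signal or the end of the horizon) can be sketched in Python as follows. This is an illustrative reading of the prose, not the authors' implementation: the function names are hypothetical, and the closed-form inverse is obtained here by inverting the two increasing branches of the delay and taking the larger result.

```python
def case1_delay(x_j, x0, tau, v_ff, s):
    """Delay within horizon tau for a vehicle at x0 that joins a removing
    queue whose tail is at x_j (distances measured upstream of the signal)."""
    tau_rem = tau - (x0 - x_j) / v_ff  # horizon left after reaching the queue
    if tau_rem <= 0:
        return 0.0  # the queue is not reached within the horizon
    tau_q = min(tau_rem, x_j / s)  # time spent moving inside the queue
    # Time in queue minus the free-flow time over the same distance.
    return tau_q * (1.0 - s / v_ff)


def case1_join_location(w, x0, tau, v_ff, s):
    """Join location x_j that produces a positive delay w (inverse mapping).

    Both branches of the delay increase with x_j and the delay is their
    minimum, so the inverse is the maximum of the branch inverses.
    """
    via_horizon = x0 - tau * v_ff + w * v_ff ** 2 / (v_ff - s)
    via_signal = w * s * v_ff / (v_ff - s)
    return max(via_horizon, via_signal)
```

With x_0 = 300, τ = 20, v_ff = 10, and s = 5 (illustrative values), a queue tail at x_j = 200 yields a delay of 5, and inverting a delay of 5 returns x_j = 200.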

CDF of the Delay for Two Consecutive Intersections
We do not present a solution for deriving the PDF of the delay for two consecutive intersections, since it is not required in zone prediction. However, we introduce a method for modeling the CDF of the delay for two consecutive intersections.
Our future work will address modeling the delay for multiple intersections. In this paper, modeling the delay for two intersections is sufficient because we are interested in short-range prediction. To model the CDF of the delay for two intersections, we assume that the delays experienced at two consecutive intersections are independent. This assumption is widely adopted in modeling the travel time of arterial links [17]. Under the independence assumption, the CDF of the delay at two consecutive intersections is a convolution between the CDFs of the delays at the individual intersections. The CDFs of the delay at intersections $i$ and $i + 1$ are denoted as $D_i(w)$ and $D_{i+1}(w)$, respectively. The CDF of the delay at two consecutive intersections, denoted as $D_{i,i+1}(w)$, reads:

The CDF of the delay at an intersection is the sum of some linear terms (e.g., Equations (10) and (11)). We did not derive an analytical expression for Equation (20); instead, we applied a numerical integration method. We used rectangular integration, which is an efficient method for the integration of polynomial functions [18].
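As an illustration of the numerical step, the following sketch computes the CDF of the sum of two independent, non-negative delays with a rectangle (midpoint) rule. It uses the standard Stieltjes form $P(W_i + W_{i+1} \le w) = \int D_i(w - u)\,dD_{i+1}(u)$, which is an assumption about how Equation (20) can be evaluated, not a reproduction of it. The function names and the example CDF (a mass of 0.25 at zero plus a uniform part on (0, 15] s) are hypothetical.

```python
def sum_delay_cdf(cdf_a, cdf_b, w, steps=2000):
    """P(W_a + W_b <= w) for independent non-negative delays (rectangle rule)."""
    if w < 0:
        return 0.0
    # Contribution of the probability mass of delay b at exactly zero.
    total = cdf_a(w) * cdf_b(0.0)
    h = w / steps
    for k in range(steps):
        u_lo, u_hi = k * h, (k + 1) * h
        mass = cdf_b(u_hi) - cdf_b(u_lo)          # probability of b in (u_lo, u_hi]
        total += cdf_a(w - 0.5 * (u_lo + u_hi)) * mass  # midpoint rectangle
    return total


def example_cdf(w):
    """Hypothetical single-intersection delay CDF: mass 0.25 at 0, uniform on (0, 15]."""
    if w < 0:
        return 0.0
    return min(0.25 + 0.05 * w, 1.0)


print(sum_delay_cdf(example_cdf, example_cdf, 15.0))
```

For this piecewise-linear example the result at $w = 15$ s can be checked by hand: $0.25 + 0.05\int_0^{15}(0.25 + 0.05v)\,dv = 0.71875$.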

Probability of Zones
The arterial links are segmented by zones (Figure 6 illustrates an example). The probability of a zone is defined as the probability that a vehicle located at $x_0$ falls into a segment within the zone after prediction horizon $\tau$. The probability of a zone is therefore not a constant quantity; its value depends on the location ($x_0$) of the vehicle. According to the definition, the probability of a zone (in this case, zone c in Figure 6) is:

$$P(\text{zone } c) = \int_{x_{c1}}^{x_{c2}} p(x)\,dx \qquad (21)$$

in which $p(x)$ is the PDF of the future location of the vehicle (Equation (5)) and $x_{c1}$ and $x_{c2}$ are the boundaries of zone c. Based on Equations (4) and (5):

For a vehicle, the probability of all the zones ahead is determined. Based on the estimated probabilities, a set of priorities for the presence of the vehicle in the zones (after prediction horizon $\tau$) is defined. For example, in Figure 6, if P(zone b) > P(zone c) > P(zone a) > P(zone d), then the priorities of zones b, c, a, and d are 1, 2, 3, and 4, respectively. These priorities can be used in some applications; for example, in a wireless network, after an unsuccessful search for a vehicle in the zone with the highest priority (the greatest probability), the system looks for the vehicle in the zone with the second priority.
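The ranking step described above can be sketched in a few lines. The function names and the probability values are hypothetical; `zone_probability` assumes that a CDF of the future location is available, so that the integral over a zone reduces to a difference of two CDF values.

```python
def zone_probability(location_cdf, x_lower, x_upper):
    """Probability mass of the future location between two zone boundaries."""
    return location_cdf(x_upper) - location_cdf(x_lower)


def zone_priorities(zone_probs):
    """Map each zone to its priority: 1 for the most probable zone, 2 for the next, ..."""
    ordered = sorted(zone_probs, key=zone_probs.get, reverse=True)
    return {zone: rank for rank, zone in enumerate(ordered, start=1)}


# Made-up probabilities satisfying P(b) > P(c) > P(a) > P(d).
probs = {"a": 0.15, "b": 0.45, "c": 0.30, "d": 0.10}
print(zone_priorities(probs))  # {'b': 1, 'c': 2, 'a': 3, 'd': 4}
```

A search procedure such as the wireless network example would then visit the zones in increasing priority number.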

Parameter Estimation
Zheng and Van Zuylen [19] indicated that the parameters of a delay distribution can be properly estimated for both undersaturated and oversaturated intersections using the trajectories of vehicles. They used delay distribution functions at intersections (Equations (2) and (3)) to determine the parameters of the dynamic model of the traffic at intersections. Zheng and Van Zuylen showed that, compared with the least-squares estimator, the maximum likelihood estimator yields better results when estimating the parameters. Based on their findings, we used the maximum likelihood estimator to determine the parameters required in modeling the future location of vehicles. Based on the time discretization assumption, we estimated the parameters $t_g$, $t_r$, $n_0$, $q$, $s$, $v_{ff}$, and the state of an intersection (undersaturated or oversaturated) for time intervals. We aggregated the trajectories that were collected on different days but in the same time interval (e.g., all the trajectory observations collected between 13:00 and 13:30 on work days). Similar to Zheng and Van Zuylen, in the absence of an analytical solution, we applied a genetic algorithm (GA) to find the parameter set that maximizes the likelihood. Details of the parameter estimation method can be found in Zheng and Van Zuylen's paper.
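The aggregation of observations by time interval can be sketched as follows. This is a minimal illustration, assuming 30-min intervals and treating Monday to Friday as work days (public holidays are ignored); the function names and the sample values are hypothetical, and the sketch does not cover the likelihood maximization itself.

```python
from collections import defaultdict
from datetime import datetime


def interval_key(ts, minutes=30):
    """Map a timestamp to (day type, start of its time-of-day interval)."""
    day_type = "workday" if ts.weekday() < 5 else "weekend"  # assumption: Mon-Fri
    slot = (ts.hour * 60 + ts.minute) // minutes * minutes
    return day_type, f"{slot // 60:02d}:{slot % 60:02d}"


def aggregate(observations, minutes=30):
    """Group (timestamp, value) pairs that share a day type and time interval."""
    groups = defaultdict(list)
    for ts, value in observations:
        groups[interval_key(ts, minutes)].append(value)
    return dict(groups)


# Made-up delay observations (seconds) on two work days and one Saturday.
obs = [
    (datetime(2014, 2, 3, 13, 10), 12.0),
    (datetime(2014, 2, 4, 13, 29), 0.0),
    (datetime(2014, 2, 8, 13, 45), 30.0),
]
print(aggregate(obs))
```

Each group would then feed one parameter estimation run for its time interval.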

Results and Discussion
The proposed method for predicting the location of vehicles can be utilized in a variety of location-based systems (e.g., advertising services and driver safety systems). Since we do not want to limit the results to a specific application, we adopted several different zonings to explore the precision of the proposed method in different scenarios. In many service networks (e.g., wireless networks), space is divided into Voronoi polygons around the service points [10]. Voronoi diagrams and grids are also common approaches for indexing moving objects in spatial databases [20,21]. Although the proposed method is not built on a presumed zoning approach, constant-interval grids with cell sizes of 25 m and 50 m (Figure 7a) and Voronoi polygons (Figure 7b) are the basis of our analysis, so that some common zonings are taken into consideration. In this paper, the centers of the Voronoi polygons are random points with an average distance of 100 m. In selecting the size of the Voronoi polygons and grids, we chose relatively small sizes (between 25 m and 100 m) since we examine the precision of a short-range prediction method.
To explore the efficiency of the proposed method, the results should be compared with existing methods of predicting the future location of vehicles. While some relevant methodologies are found in the literature [22], most of them depend on additional data sources (e.g., flow information derived from inductive detectors). Additionally, the existing methodologies that only use positional information are not designed for short-range prediction of the location of vehicles in arterial networks. In this paper, the results are compared with a kinematics-based method in which the future location of a vehicle is estimated from the kinematics of the vehicle derived from its recent trajectory [8]. Applying the kinematics of vehicles in location prediction is a conventional approach in moving object databases [23,24], but the accuracy of its results is limited in arterial networks, as it neglects the dynamics of the intersections.
The data used in this study were collected by 320 GPS devices installed in taxi cabs in Rome, Italy. The dataset contains one month of trajectories composed of GPS points recorded at 15-s intervals. We chose an area containing seven arterial links and eight traffic signals (Figure 8) on Angelico and Delle Milizie streets. The lengths of the links are between 180 m and 440 m. We used 18,000 GPS records in the study area. The data collected in the first three weeks were used for parameter estimation (the historical data), and the prediction precision was estimated using the rest of the data, which served as the real-time data in our experiments. The future zone of vehicles was predicted for 15-, 30-, 45-, and 60-s prediction horizons at 300 randomly chosen sample points collected from the real-time trajectories (the selected prediction horizons were multiples of 15 s so that we could extract the correct future location of the vehicles from the trajectories and estimate the precision of the predictions).

A proper method to assess the correctness of a modeled probability distribution function is to compare it with the empirical distribution of the random variable. However, this is not feasible in our case, because it is not possible to collect the future location of vehicles at any given point ($x_0$) and prediction horizon (at least by using the sparse trajectories). We therefore define some metrics to examine the efficiency of the proposed method. First, we compared the results of our method with a common prediction approach (kinematics-based). Second, we looked for a relationship between the estimated probability of the zones and the true zones extracted from the trajectories.
For all 300 sample points, we determined the probability of the zones ahead of the vehicles. We considered the zone with the greatest probability as the predicted future zone of a vehicle. At the same time, we determined the future zone of a vehicle using the kinematics-based method. Since the prediction horizons are on the order of tens of seconds, we considered the mean speed of vehicles as the kinematics of their movement. Tables 1 and 2 present the results for both zoning methods described in Figure 7. They indicate that, across the zoning scenarios and prediction horizons, the proposed method had significantly better results than the kinematics-based method. The proposed method takes the physics of arterials into account, which can be proposed as the main reason behind the improvement in the results.
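The kinematics-based baseline and the precision measure can be sketched as below. The sketch assumes a constant-interval grid and a hypothetical one-dimensional coordinate `s` measured along the vehicle's path (increasing in the direction of travel), which is a simplification of the link-based geometry used in the paper; all names and numbers are illustrative.

```python
def kinematic_zone(s0, mean_speed, tau, cell_size):
    """Grid cell index reached after tau seconds at the vehicle's mean speed."""
    s_future = s0 + mean_speed * tau
    return int(s_future // cell_size)


def precision(predicted_zones, true_zones):
    """Fraction of sample points whose predicted zone equals the observed zone."""
    hits = sum(p == t for p, t in zip(predicted_zones, true_zones))
    return hits / len(true_zones)


# Made-up sample: a vehicle 100 m along its path, mean speed 8 m/s, 15-s horizon, 25-m grid.
print(kinematic_zone(100.0, 8.0, 15.0, 25.0))  # 8
print(precision([8, 3, 5], [8, 4, 5]))
```

The same `precision` function applies to the proposed method when the predicted zone is taken as the zone with the greatest probability.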
The precision of the prediction also has a meaningful relationship with the initial location of the vehicle ($x_0$). Figure 9 illustrates that when the prediction horizon is 15 s or 30 s, the precision of the prediction is lower for vehicles that are closer to the intersection. It seems that vehicles have a more predictable movement when they are far from the intersection. In other words, because the movement of the vehicles is affected by the queues at intersections, the precision of the predictions decreases near the intersections. However, when the prediction horizon is greater (in this case 45 s and 60 s), the vehicles reach the intersections within the horizon and their movements are affected by the intersections regardless of where they started. This can be proposed as the reason why the precision of the predictions is not related to the initial location of a vehicle for greater prediction horizons.
Figure 10 indicates the average of the estimated probabilities and the percentage of realizations in zones for different priorities (priorities 1 to 5 are presented in the graphs). These graphs reveal a relationship between the percentage of the realizations and the average of the probabilities of zones with specific priorities. Although we were not able to validate the modeled probability functions directly (by comparing the modeled functions with empirical distributions), this relationship indicates that our model follows the real-world movement of vehicles in arterials.
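The comparison behind Figure 10 can be sketched as follows: for each priority, the average estimated probability of the zone holding that priority is set against the share of sample points whose true zone held that priority. The data layout (a dictionary of zone probabilities plus the observed zone per sample point) and the function name are assumptions made for illustration.

```python
def calibration_by_priority(samples, max_priority=5):
    """Return [(mean estimated probability, share of realizations)] per priority.

    samples: list of (zone_probs, true_zone), where zone_probs maps zone -> probability.
    """
    prob_sum = [0.0] * max_priority
    hits = [0] * max_priority
    for zone_probs, true_zone in samples:
        ordered = sorted(zone_probs, key=zone_probs.get, reverse=True)[:max_priority]
        for rank, zone in enumerate(ordered):
            prob_sum[rank] += zone_probs[zone]
            if zone == true_zone:
                hits[rank] += 1
    n = len(samples)
    return [(prob_sum[k] / n, hits[k] / n) for k in range(max_priority)]


# Two made-up sample points with three zones each.
samples = [
    ({"a": 0.6, "b": 0.3, "c": 0.1}, "a"),
    ({"a": 0.2, "b": 0.5, "c": 0.3}, "c"),
]
for priority, (mean_prob, share) in enumerate(calibration_by_priority(samples, 3), start=1):
    print(priority, round(mean_prob, 3), share)
```

If the model is well calibrated, the two values in each pair should be close for every priority.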

Conclusions
In this paper, a new method for predicting the location of vehicles in arterials is proposed and validated. The method estimates the future zone of a vehicle based on the CDF of the delay of the vehicles. The results indicate a promising improvement compared with a kinematics-based method that is regularly used in moving object databases. In particular, for longer prediction horizons, the results indicate a more significant improvement in the precision of the predictions in comparison with the kinematics-based method. The observed improvement may be attributed to considering the dynamics of the intersections in the prediction model.
The proposed method not only predicts the most probable zone of vehicles but also determines the priorities for the presence of vehicles in different zones. Because the geometry of the zones is not a determinant in the proposed model, our model can also be used in dynamic scenarios in which the zones change over time.

Figure 1. The links of the road network are divided by zones.

Figure 2. The distance between the vehicle and the downstream intersection is denoted by $x$.

Figure 3. The differential part of a queue.

Figure 4. Illustration of Case 1. This case describes a vehicle that is located at $x_0$ at time $t_0$ and then joins the queue at $x_j$.

Figure 5. Delay $w$ as a function of $x_j$ in Case 1.

Figure 6. Links of the arterials are segmented by zones. The probability of zones is determined for all the zones a vehicle has ahead.

Figure 7. (a) A 25-m and a 50-m grid; (b) a Voronoi diagram that divides the links into 30- to 100-m segments.

Figure 8. Study area on Google Maps.

Figure 9. Changes in the percentage of the correct predictions with respect to the initial location of the vehicle.

Figure 10. The average of estimated probabilities and the percentage of realizations in zones for different priorities.

Table 1. Results of the proposed method and the kinematics-based method for the grid zoning described in Figure 7a.

Table 2. Results of the proposed method and the kinematics-based method for the Voronoi zoning described in Figure 7b.