A Mesoscopic Traffic Data Assimilation Framework for Vehicle Density Estimation on Urban Traffic Networks Based on Particle Filters

Traffic conditions can be estimated more accurately with data assimilation techniques, since these methods combine an imperfect traffic simulation model with (partial) noisy measurement data. In this paper, we propose a data assimilation framework for vehicle density estimation on urban traffic networks. To balance computational efficiency against estimation accuracy, a mesoscopic traffic simulation model (the platoon-based model) is employed in this framework. Vehicle passages from loop detectors are taken as the measurement data, which contain errors such as missed and false detections. Due to the nonlinear and non-Gaussian nature of the problem, particle filters are adopted to carry out the state estimation, since this method places no restrictions on the model dynamics or the error assumptions. Simulation experiments are carried out to test the proposed data assimilation framework, and the results show that it can provide good vehicle density estimation on relatively large urban traffic networks under moderate sensor quality. The sensitivity analysis indicates that the proposed framework is robust to errors both in the model and in the measurements.


Introduction
Traffic state information, such as the density and speed on road segments and the queue size in front of an intersection, is the basis of various road traffic management and control strategies. These strategies range from traffic light control [1] and ramp metering [2] to link control [3] and route guidance [4]. Estimation of the traffic state is necessary due to the limited coverage of sensors and to the noisy measurements that the sensors produce [5]. Traffic models and traffic simulations play an important role in traffic engineering and traffic control and are widely used in traffic state estimation [5,6].
However, many factors influence the accuracy of traffic simulation results. Firstly, since every traffic flow model is a simplification of a real traffic system, which is complex and uncertain in nature, errors from the process of modeling are inevitable. They include inaccurate modeling, errors in parametric data, and the uncertainty in traffic systems [7][8][9][10]. Moreover, unpredictable traffic events, such as automobile accidents, can drive the estimates of traffic simulations far from the real traffic condition. In order to reduce these errors and improve the accuracy of traffic simulation results, data assimilation techniques are employed.
Data assimilation aims to incorporate the observed information into the dynamic system model to produce improved state estimates [11,12] where the three elements of system model, measurement model and data assimilation algorithm are involved. It has been widely applied in areas such as

Mesoscopic Urban Traffic Model in the DEVS Formalism
Previous research has defined and validated the approach of aggregating vehicles into platoons in urban traffic through the analysis of real measurements [26]. Since the platoon-based model (PBM) offers a good compromise between computational efficiency and simulation accuracy, we employ it as our traffic flow model, with the expectation that our proposed data assimilation framework can be applied to relatively large urban traffic networks.
The PBM is a typical discrete event system model, so we formally describe it using the DEVS formalism [27] which is widely adopted in discrete event modeling and simulation. Firstly, we identify the atomic components of an urban traffic system and present their coupling relations to construct a network. Then, we depict the dynamic behaviors of some key atomic models with the DEVS formalism.

The Coupled DEVS Model of the Urban Traffic System
Conceptually, an urban traffic network is composed of links and intersections with specific origins and destinations of traffic demands. Following the DEVS framework, an urban traffic network is represented as a coupled model which consists of atomic components. We identify six types of atomic components in an urban traffic system:
• source model A, which randomly generates platoons of vehicles according to the traffic arrival flow and sends them into the urban traffic network;
• segment model M, which represents either a section of a road link S or a preselection lane P at the entrance of an intersection and describes the movement of vehicle platoons on it;
• assignment model D, which randomly assigns platoons that will enter an intersection to the preselection lanes according to the given turning probabilities;
• intersection model I, which imitates the behavior of a physical intersection in urban traffic networks and transfers platoons from the preselection lanes at entrance points to the exit links;
• traffic light model L, which sends index signals to an intersection model to switch the phase of the traffic light periodically. In our study, fixed-time traffic light control is employed;
• sink model B, which serves as the destination of vehicles and records information on platoons leaving the network under study.
For an urban traffic network under consideration, we define a set {So, Sg, Ag, Int, Tl, Sk} to categorize all related atomic components, where So is the set of all related source models (i.e., So = {A_i, i = 1, . . . , N_A}), Sg is the set of all related segment models (i.e., Sg = {M_i, i = 1, . . . , N_M}), Ag is the set of all related assignment models (i.e., Ag = {D_i, i = 1, . . . , N_D}), Int is the set of all related intersection models (i.e., Int = {I_i, i = 1, . . . , N_I}), Tl is the set of all related traffic light models (i.e., Tl = {L_i, i = 1, . . . , N_L}), and Sk is the set of all related sink models (i.e., Sk = {B_i, i = 1, . . . , N_B}).
In addition, four types of messages which are transmitted between atomic models are defined:
• platoon message, representing a group of vehicles traveling together at the same speed (i.e., a platoon of vehicles). The platoon message is characterized by the variables (T_head, P_size), indicating, respectively, the time instant when the head of the platoon arrives at the entrance boundary of the current segment/intersection and the number of vehicles within the platoon;
• exit message, used to block (exit = 0) or free (exit = 1) the exit boundaries of segment models (possibly via an intersection model);
• revise message, used to revise the number of vehicles on the downstream segment when a platoon is split by the red traffic light. The platoon messages and revise messages are transmitted to a segment model via the same port. A revise message consists of the variables (flag_r, N_r), where flag_r is used to distinguish the revise message from the platoon message (for example, flag_r = −1, provided that T_head ≥ 0 is guaranteed for all platoon messages in a simulation) and N_r indicates the number of vehicles failing to cross the stop line;
• phase_index message, which indexes the phase of the traffic light and is sent to an intersection model by a traffic light model.
Figure 1 illustrates how the atomic models form a coupled urban traffic network model using ports. In Figure 1, the rectangles represent atomic models with input and output ports, and the arrows show the connections along which messages are sent from an output port of one model to an input port of another. A road link Link_i can be represented by a sequence of segment models (denoted as S_1, . . . , S_s), where platoon messages are transmitted from the upstream to the downstream segment and exit messages are transmitted from the downstream to the upstream segment. The first segment S_1 receives platoon messages from an upstream source A_m or intersection I_j. The last segment S_s sends platoon messages to the downstream component. If the downstream component is a sink model, platoons can enter it directly. Otherwise, the downstream end of this link is connected to an intersection. In this case, S_s first sends platoon messages to an assignment model D_j in order to assign the vehicles within a platoon to different preselection lanes. Then, the platoons are sent to an intersection model Int_j by the preselection lanes. The exit messages are transmitted from the intersection model Int_j to S_s via the preselection lanes. Intersection models transmit platoon and revise messages to the downstream links and receive exit messages from them. For each intersection, there is a corresponding traffic light model which sends phase_index messages to it. Notice that the coupled urban traffic network model has no external input or output.

Key Atomic Components of the Urban Traffic System
In this subsection, we describe the atomic models of source, segment, and intersection in detail. Each atomic component is modeled with different phases. The phase variable qualitatively partitions the infinite state space into a finite number of mutually exclusive and collectively exhaustive subsets (i.e., phases) in which the dynamics of the atomic models are recognizable. Thus, we can specify the behavior of atomic models (e.g., the time advance, transition, and output functions) in each phase. Phases make models more understandable, validatable, and communicable [28]. The phases and state variables of these atomic components are listed in Table 1. Since the other models (i.e., sink model, assignment model and fixed-time traffic light model) are quite simple, we omit them in this paper due to the limited space.

[Table 1: phases and state variables of the source, segment, and intersection models.]

Source Model
Let the n-th platoon message be sent out at time p_time_n with p_size_n vehicles in it. The time p_time_{n+1} at which the (n + 1)-th platoon is sent is determined by Equation (1), where hw is the average time interval between two successive vehicles within a platoon entering the network, ∆ is a pre-determined value which represents the minimum time gap between successive platoons, and r_gap is an exponentially distributed random variable. As a result, the vehicle arrival rate q is determined by Equation (2), where E(r_gap) is the mean value of r_gap and E(p_size) is the mean size of the generated platoons, with the platoon size drawn from a binomial distribution with a size limit of p_size_max. Given the vehicle arrival rate, E(p_size) can then be calculated from Equation (2).
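As an illustration of the source model, the following Python sketch assumes that successive departure times obey p_time_{n+1} = p_time_n + p_size_n · hw + ∆ + r_gap, which implies q = E(p_size) / (E(p_size) · hw + ∆ + E(r_gap)). This form is an assumption that is consistent with the definitions of hw, ∆ and r_gap given here; it is not necessarily the exact expression used by the authors. The function names, the Binomial(size_max, prob) parameterization, and the choice to skip platoons of size zero are likewise hypothetical.

```python
import random


def mean_platoon_size(q, hw, delta, mean_gap):
    """Solve q = E / (E*hw + delta + mean_gap) for E = E(p_size) (assumed form)."""
    if q * hw >= 1.0:
        raise ValueError("arrival rate is too high for the given headway")
    return q * (delta + mean_gap) / (1.0 - q * hw)


def arrival_rate(mean_size, hw, delta, mean_gap):
    """Vehicle arrival rate implied by the assumed departure-time recurrence."""
    return mean_size / (mean_size * hw + delta + mean_gap)


def generate_platoons(q, hw, delta, mean_gap, size_max, horizon, rng):
    """Return a list of (p_time, p_size) platoon messages sent up to `horizon` seconds."""
    mean_size = mean_platoon_size(q, hw, delta, mean_gap)
    if mean_size > size_max:
        raise ValueError("size_max is too small for the requested arrival rate")
    prob = mean_size / size_max  # binomial success probability giving the required mean
    t, platoons = 0.0, []
    while t <= horizon:
        # Draw p_size from Binomial(size_max, prob) as a sum of Bernoulli trials.
        size = sum(rng.random() < prob for _ in range(size_max))
        if size > 0:  # assumption: an empty draw sends no message
            platoons.append((t, size))
        # Assumed recurrence: platoon passage time + minimum gap + random gap.
        t += size * hw + delta + rng.expovariate(1.0 / mean_gap)
    return platoons
```

With the values used later in the experiments (hw = 1.2 s, ∆ = 5 s, E(r_gap) = 3 s) and q = 1000 vehicles per hour, this assumed relation gives a mean platoon size of 10/3 vehicles.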

Segment Model
The segment model has two pairs of input and output ports: InPorts = {"p_in", "e_in"}, OutPorts = {"p_out", "e_out"}, where "p_in" is used to receive platoon/revise messages, "p_out" is used to send platoon messages, and "e_in" and "e_out" are used to receive and send exit messages. Three attributes are defined for the segment model: V_max is the speed limit of the segment, segLength is the length of the segment, and C represents the maximum number of vehicles on the segment. There are three state variables in the segment model: platoonList records the information of all platoons on the segment, including the platoon which is entering or leaving the segment; vn is the number of all vehicles in platoonList; out indicates whether the platoons can leave the segment when arriving at the exit boundary.
When a platoon characterized by (T_head, P_size) enters a segment, it travels on the segment with an independently drawn random speed P_v = p · V_max, where p is a random variable indicating the speed profiles of platoons on urban roads. As in [26], we assume p = 1.0, 0.9, 0.8 with probabilities of 0.8, 0.15 and 0.05, respectively. Then, the element (T_head, P_size, P_v) is added to platoonList. Notice that, unlike [26], the queue size is not represented separately in our study, since we focus on the vehicle density on the segment. However, if the queue size is needed (e.g., when the vehicles in the queue exit the segment as a single platoon), it can be calculated as in [29]. In the platoon-based model, the movements of platoons on the segment are not traced; only the entries and exits of platoons are dealt with, and overtaking of platoons within a segment is currently not considered. If a faster platoon catches up with a slower platoon, they merge into a single platoon.
As shown in Table 1, eight phases are defined to model the dynamical evolution of an urban road in the segment model:
• empty, which indicates there is no vehicle on the segment (i.e., vn = 0);
• approach, which indicates the first platoon in platoonList is approaching the exit boundary of the segment;
• cross, which indicates the first platoon in platoonList is crossing the exit boundary of the segment;
• blocked, which indicates the head of the first platoon in platoonList has arrived at the blocked exit boundary and the segment can contain all the vehicles in platoonList;
• blocked_in, which indicates the head of the first platoon in platoonList has arrived at the blocked exit boundary and the last platoon in platoonList is entering and will totally occupy the segment;
• blocked_full, which indicates the exit boundary of the segment is blocked and the segment is totally occupied by vehicles;
• transient_p, which is a transient phase with zero time duration. The segment model moves to transient_p in order to output a platoon message;
• transient_e, which is also a transient phase. The segment model moves to transient_e in order to output an exit message.
Figure 2 shows the phase transitions of the segment model. In the diagram, external transitions and message outputs are represented by solid arrow lines, while internal transitions are represented by dashed arrow lines. The conditions of the transitions are indicated alongside the corresponding arrows. When a segment is in empty, a phase transition to approach takes place immediately upon receiving a platoon message through "p_in". The phase remains approach until the first platoon reaches the exit boundary. If the boundary is free (i.e., out = free), the phase moves to transient_p to send the platoon message to the downstream model through "p_out", and a transition to cross follows instantaneously.
As soon as the platoon leaves the segment completely, the segment removes it from platoonList. In this case, if there still are platoons on the segment (i.e., vn > 0), the phase moves back to approach. Otherwise, the phase moves to empty.
The exit boundary becomes blocked if a segment receives a blocked exit message (i.e., exit = 0) from "e_in". If the phase of a segment is approach, a queue forms when the first platoon arrives at the blocked boundary. In this case, the phase transition depends on the number of vehicles in platoonList: if vn ≥ C, the phase jumps to blocked_in; otherwise, the phase moves to blocked. If the phase is cross when a segment receives a blocked exit message, the crossing platoon is split, and the phase transition depends on vn in the same way as in the approach case.
In phase blocked, if receiving a platoon message results in excessive vehicles (i.e., vn ≥ C), the segment also transits to blocked_in. If the segment is full, the phase enters blocked_full via the transient phase transient_e, in which a blocked exit message is sent to the upstream model through "e_out".
In phase blocked_full, as soon as a free exit message (i.e., exit = 1) is received from "e_in", the segment jumps to transient_e and transient_p successively in order to send a free exit message to the upstream model and a platoon message to the downstream model; then the phase enters cross. If the phase is blocked or blocked_in when a free exit message is received, a phase transition to cross via transient_p occurs. In addition, a segment which is connected to the exit point of an intersection can receive revise messages from "p_in". In this case, if the segment is in blocked_in and the revised platoon can no longer totally occupy the segment (i.e., vn < C), the phase moves to blocked.
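A few of the transition rules described above can be written down as a small lookup, which may help when checking them against Figure 2. The following Python sketch is illustrative only: the function names are hypothetical, timing and message contents are left out, and the fallback for phases that the text does not mention is an assumption.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    EMPTY = "empty"
    APPROACH = "approach"
    CROSS = "cross"
    BLOCKED = "blocked"
    BLOCKED_IN = "blocked_in"
    BLOCKED_FULL = "blocked_full"
    TRANSIENT_P = "transient_p"
    TRANSIENT_E = "transient_e"


def on_head_at_exit(out_free: bool, vn: int, capacity: int) -> List[Phase]:
    """Phases visited when the first platoon's head reaches the exit boundary."""
    if out_free:
        # Send the platoon message downstream, then start crossing.
        return [Phase.TRANSIENT_P, Phase.CROSS]
    # Blocked boundary: a queue forms; the branch depends on vn versus C.
    return [Phase.BLOCKED_IN] if vn >= capacity else [Phase.BLOCKED]


def on_platoon_left(vn_remaining: int) -> Phase:
    """Phase after the crossing platoon has left the segment completely."""
    return Phase.APPROACH if vn_remaining > 0 else Phase.EMPTY


def on_free_exit_message(phase: Phase) -> List[Phase]:
    """Phases visited when a free exit message (exit = 1) arrives on "e_in"."""
    if phase is Phase.BLOCKED_FULL:
        # Free the upstream model first, then release the platoon downstream.
        return [Phase.TRANSIENT_E, Phase.TRANSIENT_P, Phase.CROSS]
    if phase in (Phase.BLOCKED, Phase.BLOCKED_IN):
        return [Phase.TRANSIENT_P, Phase.CROSS]
    # Assumption: in other phases only the `out` flag changes, not the phase.
    return [phase]
```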

Intersection Model
An intersection connects the upstream preselection lanes and the downstream exit segments. Three types of input ports and two types of output ports are defined in the intersection model:
InPorts = {"p_in_m", "e_in_n", "tlc_in"}, OutPorts = {"p_out_n", "e_out_m"}, m = 1, . . . , msize, n = 1, . . . , nsize,
where "p_in_m" is used to receive platoon messages from an upstream preselection lane, "e_in_n" is used to receive exit messages from a downstream exit segment, "tlc_in" is used to receive phase_index messages from a traffic light model, "p_out_n" is used to send platoon/revise messages to a downstream segment, "e_out_m" is used to send exit messages to an upstream preselection lane, and msize and nsize are the numbers of upstream lanes and downstream segments, respectively. In an intersection model, each upstream preselection lane i_m corresponds to a pair ("p_in_m", "e_out_m") and Ent represents the set of all preselection lanes (i.e., i_m ∈ Ent), while each downstream segment o_n corresponds to a pair ("p_out_n", "e_in_n") and Ext represents the set of all downstream segments (i.e., o_n ∈ Ext).
In order to associate the preselection lanes with the exit segments and enumerate phases of the traffic light in an intersection, the following variables are defined:

• ODMap, which maps a preselection lane in Ent to an exit segment in Ext.
• DOMap, which maps an exit segment in Ext to several preselection lanes in Ent.
• TLPhases, which contains all phases of the traffic light in an intersection. The phase of the traffic light is represented by a subset of Ent (i.e., TLPhases(i) ⊂ Ent, where i is the index of the phase), which lists the preselection lanes for which the traffic light is green.
As shown in Table 1, there are two state variables in the intersection model: crossPlatoons contains the related information of the platoons which are crossing the entrance boundary of the intersection; currentPhase records the current phase of the traffic light in the intersection. Four phases are defined in the intersection model: Phase_I = {empty, cross, transient_p, transient_e}, where transient_p and transient_e are transient phases used by the intersection to output platoon/revise messages and exit messages, respectively, empty indicates that no platoon is entering the intersection (i.e., crossPlatoons = NULL), and cross indicates that some platoons are entering the intersection (i.e., crossPlatoons ≠ NULL). The transitions between them are shown in Figure 3. When an intersection receives a platoon message (T_head, P_size) from i_m, the platoon information along with i_m is added to crossPlatoons, and the time T_head,e at which the platoon reaches the corresponding exit segment is determined by adding a random delay δ_I (i.e., T_head,e = T_head + δ_I). Then, the phase transits to transient_p in order to send out the platoon message (T_head,e, P_size) to the exit segment (i.e., ODMap(i_m)). Subsequently, the phase transits to cross immediately. In phase cross, once a platoon has entered the intersection completely, the intersection removes the platoon from crossPlatoons. Then, if there are still platoons in crossPlatoons, the intersection remains in cross. Otherwise, the phase jumps to empty.
When receiving a blocked exit message (i.e., exit = 0) from o n , the intersection moves to phase transient_e to send out blocked exit messages to the preselection lanes in DOMap(o n ). When receiving a free exit message (i.e., exit = 1) from o n , the intersection transits to phase transient_e to send free exit messages to the preselection lanes in (DOMap(o n ) ∩ currentPhase). If an external event occurs on port "tlc_in" and a phase index pi is obtained, the currentPhase is updated. Then, the intersection moves to transient_e to send out a free exit message to each preselection lane in TLPhases(pi) that is not blocked by the downstream segment and sends out a blocked exit message to each preselection lane in (Ent \ TLPhases(pi)). In addition, if a platoon in crossPlatoons comes from the preselection lane i m in (Ent \ TLPhases(pi)), it means the platoon is split by the red traffic light. As a result, a revise message is sent out to the exit segment ODMap(i m ) and the platoon information is removed from crossPlatoons.
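The handling of a phase_index message reduces to a few set operations on Ent, TLPhases and ODMap. The Python sketch below shows one way to compute which exit messages and revise messages result from a phase switch. The function name, the `blocked_exits` bookkeeping set (exit segments that have reported exit = 0), and the representation of crossPlatoons as (lane, platoon) pairs are assumptions made for illustration.

```python
def on_phase_index(pi, tl_phases, ent, od_map, blocked_exits, cross_platoons):
    """Return (free_lanes, blocked_lanes, revise_targets, remaining_platoons).

    pi             -- index of the new traffic light phase
    tl_phases      -- dict: phase index -> set of lanes with a green light
    ent            -- set of all preselection lanes
    od_map         -- dict: preselection lane -> exit segment
    blocked_exits  -- set of exit segments currently blocked (hypothetical bookkeeping)
    cross_platoons -- list of (lane, platoon) pairs currently crossing
    """
    green = tl_phases[pi]
    # Free exit messages go to green lanes whose exit segment is not blocked.
    free_lanes = {lane for lane in green if od_map[lane] not in blocked_exits}
    # Blocked exit messages go to every lane in Ent \ TLPhases(pi).
    blocked_lanes = ent - green
    # Platoons still crossing from a lane that turned red are split:
    # a revise message is sent to the exit segment of that lane.
    revise_targets = [od_map[lane] for lane, _ in cross_platoons if lane in blocked_lanes]
    remaining = [(lane, p) for lane, p in cross_platoons if lane not in blocked_lanes]
    return free_lanes, blocked_lanes, revise_targets, remaining
```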

Data Assimilation Framework for Vehicle Density Estimation Based on Particle Filters
In this section, we present the mesoscopic traffic data assimilation framework. Firstly, we formalize the state evolution based on the mesoscopic traffic model expressed in Section 2. Then, we describe the available traffic data and the measurement model which relates the measurement data to the system state. Subsequently, the particle filter for vehicle density estimation is presented. Finally, the weight computation method is illustrated based on the assumed error model of the noisy measurements.

The Evolution of Traffic State
According to the description in Section 2 and the formalization of discrete event state evolution in [30], the state of an urban traffic network can be defined as

$$X_{\tilde{k}} = \{\{\theta_{i,k_i}, e_{i,k_i}\}_{i \in \{So, Sg, Ag, Int, Tl, Sk\}},\ t_{\tilde{k}}\}, \quad \tilde{k} = 0, 1, \ldots;\ k_i = 0, 1, \ldots,$$

where $t_{\tilde{k}}$ is the time instant when the coupled network model transfers to the current state, $\theta_{i,k_i}$ represents the state of the atomic component $i$, $e_{i,k_i}$ is the elapsed time since the component $i$ transferred to state $\theta_{i,k_i}$, and $\tilde{k}$ and $k_i$ are, respectively, the state indices of the coupled model and of the atomic component $i$. As a result, we formalize the discrete event state evolution of an urban traffic network as

$$X_{\tilde{k}} = TrafficSim(X_{\tilde{k}-1}, \nu_{\tilde{k}-1}),$$

where $TrafficSim$ represents the platoon-based traffic model and $\nu_{\tilde{k}-1}$ represents the system noise resulting from the randomness of the atomic components.

Measurement Model
In this framework, the configurations of traffic signals in urban networks are assumed to be known, and sensors are deployed at the inflow boundaries of some segments (an urban road is usually subdivided into short segments in order to obtain an accurate traffic model, but it is difficult to deploy sensors that densely in a real traffic system). We assume that the sensors can detect and report vehicle passage times. The measurement data are available per time interval of length $\Delta T$, and the measurements at the k-th interval are denoted as

$$z_k = \{Y^1_k, Y^2_k, \ldots, Y^{N_s}_k\},$$

where $N_s$ represents the number of sensors in an urban traffic network, and $Y^i_k$ is the set of vehicle passage times detected by the i-th sensor in the interval $((k-1)\Delta T, k\Delta T]$. The detections of different sensors are considered independent, and the measurement data are assumed to be noisy, with both missed detections (i.e., the sensor fails to detect a vehicle's passage) and false detections (i.e., the sensor reports a passage when no vehicle passes by). We define two parameters to model the two types of errors:
• detection accuracy p, representing the probability that a vehicle passage is detected by a sensor successfully. Consequently, the probability of a missed detection is 1 − p.
• occurrence rate of false detection λ, indicating the number of false detections occurring in a unit time interval, which is assumed to be Poisson distributed.
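The two error parameters can be applied to a list of true passage times as sketched below in Python. Each true passage is kept with probability p, and false detections are added with rate λ. Placing the false detections in time by simulating a homogeneous Poisson process with exponential inter-arrival times is an assumption of this sketch, since the error model only specifies that their number per unit time is Poisson distributed; the function name is hypothetical.

```python
import random


def corrupt_passages(true_times, t_start, t_end, p, lam, rng):
    """Apply missed and false detection errors to passage times in (t_start, t_end]."""
    # Missed detections: each true passage is reported with probability p.
    detected = [t for t in true_times if rng.random() < p]
    # False detections: Poisson process with rate lam (assumed placement in time).
    false_alarms = []
    if lam > 0.0:
        t = t_start + rng.expovariate(lam)
        while t <= t_end:
            false_alarms.append(t)
            t += rng.expovariate(lam)
    return sorted(detected + false_alarms)
```

For example, with p = 0.9 and λ = 1/300 per second, a 60 s interval contains on average 0.2 false detections, and one true passage in ten is dropped.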
In this framework, since passage times are related to the state transitions over the measurement interval, we formalize the measurement model as Equation (7), where $X_{R_{k-1}}$ is the state point retrieved at time $(k-1)\Delta T$, $X_{R_{k-1}+1:R_k}$ represents the sequence of states indexed from $R_{k-1}+1$ to $R_k$ (i.e., the state trajectory), which completely records the state transitions during $((k-1)\Delta T, k\Delta T]$, and $e_k$ is the measurement noise mentioned above.

Principles of Particle Filters
Consider a general discrete state dynamic evolution as follows:

$$s_k = f_k(s_{k-1}, \nu_{k-1}), \quad s_0 \sim p(s_0), \quad k = 1, 2, \ldots \quad (8)$$

where $p(s_0)$ is the prior distribution, $s_{k-1}$ and $s_k$ are the states at time $k-1$ and $k$, respectively, $f_k$ is a possibly nonlinear function, and $\nu_{k-1}$ is a stochastic process noise. The measurement at time $k$ is given by

$$m_k = h_k(s_k) + e_k, \quad k = 1, 2, \ldots \quad (9)$$

in which $h_k$ is a possibly nonlinear function mapping the state $s_k$ to the measurement $m_k$, and $e_k$ is a measurement noise. The particle filter aims to estimate the conditional probability density of all states up to time $k$ based on all measurements until time $k$, that is, $p(s_{0:k} | m_{1:k})$, where $s_{0:k} = \{s_0, s_1, \ldots, s_k\}$ and $m_{1:k} = \{m_1, m_2, \ldots, m_k\}$.
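Since it is usually difficult to solve $p(s_{0:k} | m_{1:k})$ analytically, the particle filter approximates $p(s_{0:k} | m_{1:k})$ with a set of Monte Carlo samples (particles) and their corresponding weights [32].
Let $\{s^i_{0:k}, w^i_k\}_{i=1}^{N_p}$ represent $p(s_{0:k} | m_{1:k})$, where $N_p$ is the number of particles, $s^i_{0:k}$ is the i-th particle and $w^i_k$ is its weight. When the weights are normalized (i.e., $\sum_{i=1}^{N_p} w^i_k = 1$), the posterior density is approximated as

$$p(s_{0:k} | m_{1:k}) \approx \sum_{i=1}^{N_p} w^i_k \, \delta(s_{0:k} - s^i_{0:k}),$$

where $\delta(\cdot)$ is the Dirac delta distribution in vector form. Since it is usually intractable to draw from $p(s_{0:k} | m_{1:k})$ directly, the importance sampling method is employed in particle filters. In this method, $\{s^i_{0:k}\}_{i=1}^{N_p}$ can be drawn from a probability density $q(s_{0:k} | m_{1:k})$, which is called the importance density [32]; then the weights $\{w^i_k\}_{i=1}^{N_p}$ are computed according to Equation (13):

$$w^i_k \propto \frac{p(s^i_{0:k} | m_{1:k})}{q(s^i_{0:k} | m_{1:k})}. \quad (13)$$

In the recursive case, at step $k$, assuming that $\{s^i_{0:k-1}, w^i_{k-1}\}_{i=1}^{N_p}$ approximates $p(s_{0:k-1} | m_{1:k-1})$, each particle is extended with a sample $s^i_k \sim q(s_k | s^i_{0:k-1}, m_{1:k})$ and its weight is updated as

$$w^i_k \propto w^i_{k-1} \frac{p(m_k | s^i_k) \, p(s^i_k | s^i_{k-1})}{q(s^i_k | s^i_{0:k-1}, m_{1:k})}. \quad (14)$$

The system transition density is a common choice of the importance density, namely, $q(s_k | s_{0:k-1}, m_{1:k}) = p(s_k | s_{k-1})$. Consequently, Equation (14) is simplified to

$$w^i_k \propto w^i_{k-1} \, p(m_k | s^i_k).$$

In the particle filter, the degeneracy phenomenon is a common problem: after a few iterations, most particles have negligible weights and the effective particle set is reduced to very few particles. In order to reduce the influence of the degeneracy, a resampling step is performed after the particles are updated.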

Particle Filtering for Vehicle Density Estimation
Previous studies [24,30,33] have shown that the variable dimensions of both the system state and the discrete event state trajectory have no tangible effect on the updating of particles and their weights in particle filters. Therefore, we can apply the particle filter to estimate vehicle densities in our study. Since the measurement model of Equation (7) maps the traffic state trajectory during the measurement interval to the vehicle passage times, the particle weight should be updated as

$$w_k = p(z_k | X_{R_{k-1}+1:R_k}) \, w_{k-1}, \quad k = 1, 2, \ldots$$

Algorithm 1 describes the main steps to estimate traffic densities using particle filters.

Algorithm 1: The particle filter for vehicle density estimation
// Initialize N_p particles at k = 0
1 k = 0
2 for i = 1 : N_p do
3     generate the i-th particle X^i ...
...
      Meanwhile, record the state trajectory during this interval, X^i_{R_{k−1}+1:R_k}, in order to compute the weight.
10    update the weight: ...
...
end
At step k = 0, we randomly generate N_p particles by guessing the size, position, and speed of platoons over the network, and all weights are initialized to 1/N_p (lines 2-5). Then, the following steps are iterated until the end of the algorithm:
• Sampling step: for each particle, we run the mesoscopic traffic simulation for ∆T; the particle is updated and the state trajectory over this interval is recorded. Then, the particle's weight is calculated based on the (noisy) newly available passage times and the recorded state trajectory (the method of weight computation is described in Section 3.3.3). After all particles are updated, the weights are normalized to prepare for resampling (lines 8-15).
• Output step: we obtain the estimated vehicle densities (i.e., the number of vehicles on segments) from the state of the particle with the highest weight (lines 16-17). The number of vehicles on a segment is calculated by excluding from vn the vehicles which have not yet entered or have already left the segment; the detailed process is illustrated in Algorithm 2.
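The steps above can be sketched as a generic bootstrap particle filter loop in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `init_particle`, `simulate` and `likelihood` are hypothetical callables standing in for the random initialization, the platoon-based simulation over one interval ∆T (returning the new state and the recorded trajectory), and the weight computation. Multinomial resampling and reporting the output before resampling are also assumptions of this sketch.

```python
import copy
import random


def run_particle_filter(init_particle, simulate, likelihood, measurements, n_particles, rng):
    """Return one state estimate per measurement interval."""
    # Initialization at k = 0: random particles with uniform weights.
    particles = [init_particle(rng) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    estimates = []
    for z_k in measurements:
        # Sampling step: advance every particle and record its trajectory.
        for i in range(n_particles):
            particles[i], trajectory = simulate(particles[i], rng)
            weights[i] *= likelihood(z_k, trajectory)  # w_k = p(z_k | trajectory) * w_{k-1}
        total = sum(weights)
        if total > 0.0:
            weights = [w / total for w in weights]
        else:
            # All likelihoods vanished: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        # Output step: take the state of the particle with the highest weight.
        best = max(range(n_particles), key=lambda i: weights[i])
        estimates.append(copy.deepcopy(particles[best]))
        # Resampling step (multinomial, assumed): duplicate likely particles.
        chosen = rng.choices(range(n_particles), weights=weights, k=n_particles)
        particles = [copy.deepcopy(particles[i]) for i in chosen]
        weights = [1.0 / n_particles] * n_particles
    return estimates
```

The deep copies keep resampled duplicates independent, which matters when the simulator mutates a particle's state in place.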

Weight Computation
When a sample is generated and its state trajectory is recorded (i.e., $X^i_{R_{k-1}+1:R_k}$), the newly available measurement and the error model are used to compute $p(z_k | X^i_{R_{k-1}+1:R_k})$. Since all sensors detect vehicle passages independently, we have

$$p(z_k | X^i_{R_{k-1}+1:R_k}) = \prod_{j=1}^{N_s} p(Y^j_k | X^i_{R_{k-1}+1:R_k}).$$

In order to compute $p(Y^j_k | X^i_{R_{k-1}+1:R_k})$, we obtain the estimated passage times at the j-th sensor (denoted as $Y^{i,j}_k$) from $X^i_{R_{k-1}+1:R_k}$, and, as a result,

$$p(Y^j_k | X^i_{R_{k-1}+1:R_k}) = p(Y^j_k | Y^{i,j}_k).$$

Then, a match procedure [24] is employed to define missed detections and false detections based on the measurement, which gives

$$p(Y^j_k | Y^{i,j}_k) = p^{\,n_{i,j} - n_m} (1 - p)^{n_m} \cdot \frac{(\lambda \Delta T)^{n_o} e^{-\lambda \Delta T}}{n_o!} \cdot e^{-d_m},$$

where $n_{i,j}$ is the number of passage times in $Y^{i,j}_k$, $n_m$ is the number of missed detections, and $n_o$ is the number of false detections. The term $p^{\,n_{i,j} - n_m} (1 - p)^{n_m}$ represents the probability of the missed detection errors, the term $(\lambda \Delta T)^{n_o} e^{-\lambda \Delta T} / n_o!$ represents the probability of the false detection errors, and $e^{-d_m}$ is a penalty term, where $d_m$ is the maximum distance over all matched pairs. More details about the match procedure can be found in [24].
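Once the match procedure has produced the counts for a sensor, the weight terms named in this section can be evaluated directly. The Python sketch below assumes that the three terms (missed detection probability, false detection probability, and penalty) are multiplied together, and it takes the counts n_{i,j}, n_m, n_o and the distance d_m as given inputs; the matching itself, described in [24], is not reproduced. Function names are hypothetical.

```python
import math


def sensor_likelihood(n_est, n_missed, n_false, d_max, p, lam, dT):
    """Likelihood of one sensor's measurements given a particle's estimated passages.

    n_est    -- number of estimated passage times (n_{i,j})
    n_missed -- number of missed detections (n_m)
    n_false  -- number of false detections (n_o)
    d_max    -- maximum distance over all matched pairs (d_m)
    """
    detect_term = p ** (n_est - n_missed) * (1.0 - p) ** n_missed
    mu = lam * dT  # expected number of false detections in one interval
    false_term = mu ** n_false * math.exp(-mu) / math.factorial(n_false)
    penalty = math.exp(-d_max)
    return detect_term * false_term * penalty  # product form is an assumption


def particle_likelihood(per_sensor_counts, p, lam, dT):
    """Product over independent sensors of the per-sensor likelihoods."""
    result = 1.0
    for n_est, n_missed, n_false, d_max in per_sensor_counts:
        result *= sensor_likelihood(n_est, n_missed, n_false, d_max, p, lam, dT)
    return result
```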

Experimental Design
The urban traffic network used in the experiments is shown in Figure 4, where 11 links are connected by seven intersections. In this network, two source nodes generate platoons traveling to the sink, where the mean time gap between successive platoons is 8 s (minimum time gap is 5 s, the mean of random time gap is 3 s) and the average time interval of crossing a boundary is 1.2 s. Platoons are always able to exit the network from the sink. Each link is subdivided into road segments with lengths of 100 m, and 16 sensors are regularly deployed in the network (both the red solid line and the red dotted line represent the inflow boundary of a road segment. The red solid line also indicates the place where a sensor is deployed). All road segments have a speed limit of 15 m/s and a capacity of 16 vehicles. Three fixed time traffic lights with a cycle length of 60 s are used to control the conflicting movements at intersections. The offset and duration of the green lights for each movement at traffic lights are shown at the right top of Figure 4. At the end of link 1 and link 9, platoons are split and assigned to different exit links according to the turning probabilities of r 1 , r 2 and r 3 , r 4 , respectively. Firstly, a simulation of the urban traffic network is performed, and all data is recorded. The simulation is considered as the real system, and the recorded data is regarded as the ground truth data. Then, the ground truth data are processed based on the assumed error model to produce the noisy measurement data that will be used in the data assimilation to estimate the vehicle density.
Then, we build an imperfect traffic model by adding errors in the model parameters (see Table 2). The traffic network is simulated again using the imperfect traffic model to get the estimation without data assimilation (we refer these results as the simulated results). Next, the real measurements from the real system are assimilated into the imperfect traffic model to generate the estimation with data assimilation (we refer these results as the filtered results). By assimilating the noisy measurement data, the filtered results are expected to be more accurate than the simulated results.
Specifically, two cases are tested in our experiments where the vehicle arrival rates of the network (represented by f low1 and f low2 in Figure 4) and the turning probabilities at intersections (represented by r 1 , r 2 and r 3 , r 4 in Figure 4) are perturbed to get the imperfect traffic models, respectively. The configuration of these parameters is illustrated in Table 2. In the real system, an average of 1000 vehicles per hour enter the network from source 1 and 1200 vehicles per hour enter from source 2. A vehicle reaching the exit point of link 1 moves to link 2 and link 3 with probability of 0.4 and 0.6, respectively, at the end of link 9, these probabilities are 0.6 to link 7, and 0.4 to link 10. In case 1, the imperfect traffic model has inaccurate vehicle arrival rates. Specifically, the flow from source 1 is 200 vehicles per hour more than the real flow, while the flow from source 2 is 200 vehicles per hour less than the real flow. In case 2, the turning probabilities are erroneous in the imperfect traffic model. The probability of traveling to link 2 and link 3 from link 1 is set to 0.6 and 0.4, respectively, while the probability of traveling to link 7 and link 10 from link 9 is set to 0.4 and 0.6, respectively.
We implemented an event-scheduling-based discrete event simulator in C++, on which we run our simulation model. A simulation of 1200 s is considered in all experiments, and the number of vehicles on each segment of the urban network is recorded every 60 s. We run the real system for 120 s as a warm-up period. The initial network states of the particles are randomly sampled based on the real network state at 120 s, and the results of 18 cycles (from 180 s to 1200 s) are used to evaluate the effectiveness of the data assimilation framework. In the data assimilation system, the noisy measurement data (i.e., vehicle passage times at each sensor in this network) are available every 60 s.
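As an illustration of how noisy passage times could be derived from ground-truth passages for one 60 s window, the sketch below applies missed detections with a detection accuracy `p` and adds false detections at a rate `lam`. The defaults are the baseline values used in the experiments. Drawing false detections from a homogeneous Poisson process is an assumption made here for illustration, and the function name `corrupt_passages` is hypothetical.

```python
import random


def corrupt_passages(true_times, t_start, t_end, p=0.9, lam=1 / 300, rng=None):
    """Return noisy passage times (in seconds) for the window [t_start, t_end).

    p   : probability that a true passage is detected
    lam : false detection rate in 1/s (Poisson process, an assumption)
    """
    rng = rng or random.Random()

    # Missed detections: keep each true passage with probability p.
    kept = [t for t in true_times
            if t_start <= t < t_end and rng.random() < p]

    # False detections: exponential inter-arrival times with rate lam.
    false_times = []
    t = t_start + rng.expovariate(lam)
    while t < t_end:
        false_times.append(t)
        t += rng.expovariate(lam)

    return sorted(kept + false_times)


# Example: one sensor, one 60 s measurement window.
noisy = corrupt_passages([3.2, 7.9, 15.0, 41.3], 0.0, 60.0,
                         rng=random.Random(42))
```

With the baseline rate of 1/300 s⁻¹, a 60 s window contains on average 0.2 false detections per sensor under this assumption.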

Evaluation Criteria
In this section, the measurement error model is fixed with detection accuracy $p = 0.9$ and occurrence rate of false detection $\lambda = 1/300\ \mathrm{s}^{-1}$, and 1000 particles are employed in the data assimilation system. The goal of our experiments is twofold: we intend to show that the filtered results are more accurate than the simulated results when compared with the ground truth, and we want to explore whether the filtered results can estimate the ground truth accurately.
To quantify the proximity between two traffic states, we use the Root Mean Square Error (RMSE) of the number of vehicles on the segments as the evaluation criterion, that is,
$$\mathrm{RMSE}_{e,k} = \sqrt{\frac{1}{N_s}\sum_{i=1}^{N_s}\left(s^{e}_{i,k} - s^{r}_{i,k}\right)^{2}},$$
where $\mathrm{RMSE}_{e,k}$ is the RMSE of the estimated results (either the simulated results or the filtered results) compared with the ground truth at time step $k$, $N_s$ is the total number of segments in the traffic network, $s^{r}_{i,k}$ is the number of vehicles on the $i$-th segment in the ground-truth traffic state at time instant $k\Delta T$, and $s^{e}_{i,k}$ is the corresponding number in the estimated state at the same time instant.
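The RMSE over all segments at a single time step can be computed as in the following minimal sketch. It uses the standard definition of the RMSE; the function name `rmse` and the example vehicle counts are hypothetical.

```python
import math


def rmse(estimated, ground_truth):
    """RMSE of per-segment vehicle counts at one time step."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((e - r) ** 2 for e, r in zip(estimated, ground_truth))
    return math.sqrt(squared_error / len(estimated))


# Hypothetical counts on four segments at one time step.
truth_counts = [4, 10, 0, 7]
estimated_counts = [5, 8, 0, 7]
error = rmse(estimated_counts, truth_counts)  # sqrt((1 + 4) / 4)
```

The same function applies to both the simulated and the filtered results, since each is compared with the ground truth segment by segment.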
To illustrate how accurately the estimation follows the ground truth, we base our analysis on the estimated vehicle density on an arbitrarily selected road segment. In our case, the 17th road segment (the dark one in Figure 4) is chosen.

Test Case 1
Figure 5 displays the experimental results of test case 1, where the vehicle arrival rates are inaccurate. Figure 5a shows the RMSE of the estimation results with and without data assimilation. As shown in the figure, the RMSE values with data assimilation are smaller than those without data assimilation at all time steps, which indicates that the data assimilation framework improves the estimation results for the whole traffic network with the help of the sensor data. On average, the RMSE of the estimation results without data assimilation is reduced by 19.4% when the noisy measurement data are assimilated using the proposed framework. Figure 5b compares the estimated number of vehicles using data assimilation (blue line) with the ground-truth value (red line) on the 17th road segment. The estimated number follows the real number at most time steps. The Mean Absolute Percentage Error (MAPE) of the estimated number over 18 cycles is 10.4%, which indicates a promising performance. At some points (for example, t = 540 s, 660 s, and 780 s), the estimated numbers contain relatively large errors. One possible reason is that the most likely particle alone is insufficient to represent the whole probability distribution. Future research is needed to find a point estimate that better reflects the "belief histogram" estimated by particle filters.
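The remark about the most likely particle can be illustrated with a small sketch that compares two ways of reducing a weighted particle set to a single number: the value of the highest-weight particle and the weighted mean. The particle values and weights below are hypothetical, and the weighted mean is shown only as one common alternative, not as the estimator used in this study.

```python
def most_likely_estimate(values, weights):
    """Value carried by the particle with the largest weight."""
    best = max(range(len(weights)), key=weights.__getitem__)
    return values[best]


def weighted_mean_estimate(values, weights):
    """Weighted mean of the particle values (weights need not be normalized)."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical particle set: vehicle counts on one segment and their weights.
counts = [4, 9, 10, 11]
weights = [0.30, 0.25, 0.25, 0.20]

single = most_likely_estimate(counts, weights)    # picks the isolated mode at 4
average = weighted_mean_estimate(counts, weights)  # pulled toward the cluster near 10
```

In this constructed example, 70% of the weight lies on counts between 9 and 11, yet the single most likely particle reports 4, which shows how a highest-weight estimate can miss where most of the probability mass sits.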

Test Case 2
This case examines the effectiveness of the proposed data assimilation framework in dealing with erroneous turning probabilities. The experimental results are displayed in Figure 6. From Figure 6a, we can see that the RMSE values of the estimation results without data assimilation are, on the whole, larger than those in case 1, which indicates that estimating the ground truth is more challenging in this case. As in case 1, the data assimilation framework reduces the RMSE of the estimation results at all time steps by assimilating the sensor data. After the noisy measurement data are assimilated, the RMSE of the estimation results without data assimilation is reduced by 21.1% on average. Figure 6b shows the estimated number of vehicles with data assimilation and the ground-truth value on the 17th segment. The MAPE over 18 cycles is 10.7% in this case, which is comparable to the effectiveness observed in test case 1.
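For reference, a MAPE over a sequence of cycles can be computed as in the sketch below. It uses the standard definition of the MAPE; skipping cycles whose ground-truth count is zero is an assumption made here to avoid division by zero, and the text does not state how such cycles were handled. The function name `mape` is hypothetical.

```python
def mape(estimated, ground_truth):
    """Mean absolute percentage error, in percent.

    Cycles with a ground-truth value of zero are skipped (an assumption).
    """
    terms = [abs(e - r) / abs(r)
             for e, r in zip(estimated, ground_truth) if r != 0]
    if not terms:
        raise ValueError("no cycle with a non-zero ground-truth value")
    return 100.0 * sum(terms) / len(terms)


# Hypothetical counts on one segment over four cycles.
truth_series = [10, 8, 12, 10]
estimated_series = [9, 8, 15, 11]
segment_mape = mape(estimated_series, truth_series)
```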

Sensitivity Analysis
In this section, a series of additional experiments is carried out to analyze the sensitivity of the estimation results to several key factors of the proposed data assimilation framework, namely the measurement data quality and the number of particles. The average RMSE over the 18 cycles is used to quantify the experimental results:
$$\overline{\mathrm{RMSE}}_{e} = \frac{1}{18}\sum_{k}\mathrm{RMSE}_{e,k},$$
where the sum runs over the 18 evaluated cycles. For each combination of parameters, we present the average result of 10 independent experiments.

Effect of Measurement Data Quality
In the measurement model of this study, the detection accuracy $p$ and the occurrence rate of false detection $\lambda$ characterize the quality of the noisy data. Therefore, we explore the effect of sensor quality by varying $p$ and $\lambda$. The set of parameters used in the two test cases (i.e., $N_p = 1000$, $p = 0.9$, $\lambda = 1/300\ \mathrm{s}^{-1}$) is selected as the baseline. When varying $p$, we keep $\lambda = 1/300\ \mathrm{s}^{-1}$; when varying $\lambda$, we keep $p = 0.9$. The results are shown in Figure 7a,b, respectively. As expected, in both cases, the data assimilation performance deteriorates as the data quality becomes worse. However, even when the detection accuracy falls to 0.6 or the false detection rate increases to $1/60\ \mathrm{s}^{-1}$, the performance is still better than that of the estimation without data assimilation (in test case 1 and case 2, the RMSE values of the estimation results without data assimilation are 1.86 and 2.15, respectively), which indicates the robustness of this framework to measurement data errors.

Effect of the Number of Particles
We fix $p = 0.9$ and $\lambda = 1/300\ \mathrm{s}^{-1}$ in both cases and vary the number of particles used in the algorithm from 100 to 2000. The results are displayed in Figure 8a. As the number of particles increases from 100 to 2000, the RMSE decreases in both cases: the more particles are used, the better the performance. However, the decrease in RMSE is not proportional to the increase in the number of particles. Figure 8b shows the percentage increase in RMSE relative to that at 1000 particles (i.e., $\mathrm{RMSE}/\mathrm{RMSE}(N_p = 1000) - 1$). The plot shows that a reduction from 1000 to 100 particles leads to an increase of about 5.6% in the error measure (6.54% in case 1, 4.83% in case 2), while doubling the number of particles from 1000 to 2000 improves the performance by only about 1.6% (1.44% in case 1, 1.84% in case 2).
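The relative measure plotted in Figure 8b can be written as a one-line function. The sketch below uses hypothetical average RMSE values purely to show the calculation; they are not the values obtained in the experiments.

```python
def relative_increase(rmse_value, rmse_reference):
    """Relative change of an RMSE with respect to a reference RMSE."""
    return rmse_value / rmse_reference - 1.0


# Hypothetical average RMSE values for three particle counts.
avg_rmse = {100: 1.60, 1000: 1.50, 2000: 1.48}

reference = avg_rmse[1000]
change = {n: relative_increase(value, reference)
          for n, value in avg_rmse.items()}
# change[100] is positive (worse than the baseline),
# change[2000] is negative (better than the baseline).
```

A positive value means a larger error than with 1000 particles, and a negative value means a smaller one.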

Conclusions
In this study, we presented a data assimilation framework for vehicle density estimation on urban traffic networks. In this framework, a mesoscopic traffic model (i.e., the platoon-based model) was employed, since it captures more detail than macroscopic traffic models while computing faster than microscopic traffic models. The passage times of individual vehicles were considered as the measurement data, which contain errors due to missed and false detections. Since the mesoscopic traffic model is nonlinear and the vehicle passage times contain strongly non-Gaussian noise, particle filters, which impose no restrictions on the model dynamics or error assumptions, were applied to conduct the data assimilation.
To test this data assimilation framework, we conducted experiments on a simulated urban traffic network. The experimental results show that the proposed framework provides more accurate estimation results than those produced without data assimilation. More specifically, average error reductions of 19.4% and 22.1% are achieved in the two test cases (one with errors in the vehicle arrival rates, and the other with errors in the turning probabilities), respectively. With regard to the estimation accuracy, the estimated results follow the real situation at most time steps. The mean absolute percentage errors of the estimated vehicle density are 10.4% and 10.7% in the two cases, respectively, which indicates a promising performance.
The sensitivity analysis indicates that this data assimilation framework is robust to both measurement errors and model errors. In both cases, even with 40% of the passage times missed or one false detection per minute, the performance degrades only moderately and remains superior to that without data assimilation. We also note that the improvement in performance is not proportional to the increase in the number of particles. Specifically, increasing the number of particles from 1000 to 2000 leads to an improvement of about 1.6%, while reducing the number of particles from 1000 to 100 results in a deterioration of about 5.6%.
Future research directions include finding an appropriate real-life scenario in which to further evaluate and apply the data assimilation framework, and integrating traffic data from different sources (for example, route choice fractions from automated vehicle identification (AVI) systems and travel times from floating-car data) into the framework to further improve the accuracy and robustness of the estimation results. Another direction is to combine the state estimation framework with urban traffic control strategies to improve the performance of urban traffic networks.
Author Contributions: S.W. conceived the presented idea, performed the experiments, and wrote the manuscript; X.X. guided the entire study and contributed to the final version of the manuscript; R.J. helped shape the research.
Funding: This research is supported by the National Natural Science Foundation of China (No. 61673388).