Estimation of Traffic Stream Density Using Connected Vehicle Data: Linear and Nonlinear Filtering Approaches

The paper presents a nonlinear filtering approach to estimate the traffic stream density on signalized approaches based solely on connected vehicle (CV) data. Specifically, a particle filter (PF) is developed to produce reliable traffic density estimates using CV travel-time measurements. Traffic flow continuity is used to derive the state equation, whereas the measurement equation is derived from the hydrodynamic traffic flow relationship. Subsequently, the PF filtering approach is compared to linear estimation approaches; namely, a Kalman filter (KF) and an adaptive KF (AKF). Simulated data are used to evaluate the performance of the three estimation techniques on a signalized approach experiencing oversaturated conditions. Results demonstrate that the three techniques produce accurate estimates—with the KF, surprisingly, being the most accurate of the three techniques. A sensitivity of the estimation techniques to various factors including the CV level of market penetration, the initial conditions, and the number of particles in the PF is also presented. As expected, the study demonstrates that the accuracy of the PF estimation increases as the number of particles increases. Furthermore, the accuracy of the density estimate increases as the level of CV market penetration increases. The results indicate that the KF is least sensitive to the initial vehicle count estimate, while the PF is most sensitive to the initial condition. In conclusion, the study demonstrates that a simple linear estimation approach is best suited for the proposed application.


Introduction
Real-time traffic estimation has received increased attention with the introduction of advanced applications and technologies such as intelligent transportation systems (ITSs). Adaptive traffic signal controllers require real-time traffic state estimation to improve intersection performance, as real-time estimation plays a major role in capturing variations in traffic behavior (e.g., nonrecurrent changes). As inputs to traffic signal controllers, traffic state variables (e.g., travel time and traffic density) assist with green time allocation and help to enhance intersection performance by reducing traffic delays, vehicle emissions, and fuel consumption [1,2]. Several estimation techniques have been developed to estimate traffic state variables [3][4][5][6][7]. In some previous studies, the traffic state system has been treated as a linear system [4,[7][8][9][10]. Other studies have considered the system as nonlinear [3,[11][12][13]. For linear system models, a Kalman filter (KF) has been widely deployed to produce accurate estimates [4,7,10,14] due to its simplicity and applicability in the field. The KF assumes linear system transitions with a Gaussian distribution for the probability density function (PDF) of the system and measurement noise. For nonlinear system models, an extended KF (EKF) has been utilized in estimation [11,15,16]. The EKF also assumes that the PDF distribution is Gaussian. The EKF is derived by linearizing the system using a Taylor series expansion by calculating the Jacobian expression. However, it was found that use of the EKF approach is only valid if the system is near linearity during the updating time [17], and thus, large errors may result from linearization. In addition, the task of deriving the Jacobian matrices may cause implementation difficulties [18]. A more robust nonlinear approach is a particle filter (PF), which has been frequently employed in the literature to handle nonlinear dynamic problems [12,19,20]. The PF approach is a Monte Carlo sequential solution that deals with nonlinear system transitions without the assumption of the PDF noise distribution [21,22]. In this paper, a PF approach is developed to estimate the traffic stream density along signalized intersection approaches using only connected vehicle (CV) data. Moreover, the paper compares the performance of the PF to the KF and adaptive KF (AKF) approaches.
Traffic density is defined as the number of vehicles per unit length on a specific roadway segment [23]. Estimating the traffic stream density is critical in the development of effective traffic controllers [24]. For instance in the case of freeways, identifying bottleneck locations in the early stages is critical in developing congestion mitigation strategies that include ramp metering, variable speed limits, and traffic routing. For signalized segments, the traffic density measures are crucial for either traffic signal performance [25][26][27] or traffic signal optimization [28][29][30]. Hence, traffic density measures must be precisely estimated to represent traffic demands at each signalized intersection approach. Once accurate measurements are obtained, efficient adaptive traffic signal controllers can be developed. However, determining traffic density is not a trivial task and cannot be directly measured in the field since it is a spatial measurement. Consequently, traffic stream density is typically based on estimations.
Previous research has utilized different data sources, such as stationary sensors (e.g., loop detectors), fused data (combining two distinct data sources), and CV data to estimate traffic stream density. Traffic density estimates can be measured using video detection systems, but this is difficult due to the high cost of the infrastructure and the limited visibility of roadway segments [4]. Time-occupancy measurements from loop detectors are used as an alternative data source to estimate the traffic density [31]. However, time-occupancy measurements only represent the temporal density estimates around the location of the detector. A recent study introduced a relationship between time-occupancy and space-occupancy to estimate traffic density by dividing the link into small segments and installing detectors on all of the small segments [32], but the installation cost is high. A more common way of estimating the traffic density is the use of the traffic flow continuity equation (input-output approach), which considers two traffic counting stations, one at the entrance and the other at the end of the link [33]. Vigos et al. [4] proposed a robust linear KF approach with at least three loop detectors to estimate the traffic density along signalized approaches. However, the implementation cost is high. Another study [34] employed two conventional loop detectors to estimate the traffic density using the flow continuity equation. The two loop detectors provide the estimation model with traffic flow and occupancy data. Bhouri et al. [35] proposed a KF approach to estimate the traffic stream density along a freeway segment using both loop detectors and a recorded film [35]. One commonality about the use of fixed sensors is that they are subject to detection failures and thus always produce errors in their data [36,37].
Recent research has fused different data sources to estimate the traffic stream density along certain roadway sections, increasing the accuracy of the estimate over using just one data source [7,14]. Many works have employed the KF approach [14,38,39]. For instance, traffic flow data at the entrance and the exit of the roadway section observed from stationary sensors together with CV data were used to estimate the traffic density [38]. The CV data provided travel-time measurements to correct the prior estimate from the state equation. Another study has utilized fused loop and CV measurements to estimate the traffic density in a freeway section [19]. In that study, the authors derived the estimation model using the PF estimation approach, considering two sources of measurements: (1) loop detectors, and (2) fusing loop detectors and CVs. They obtained a 30% reduction in the mean absolute percentage error from the fused measurements compared to the measurements from loop detectors, demonstrating that more data sources produce more-accurate outcomes. However, the use of different data sources requires more computational cost in both time and memory as the data include both trivial (data that are not needed) and nontrivial (data that are needed) information.
Limited studies have used CV data as the only source of inputs to estimate the traffic stream density [7,9,10]. These studies developed the linear KF estimator approach. The CV data used were the number of CVs at the entry and at the exit of the tested roadway section, in addition to the travel time experienced for the CVs to traverse the tested section. Moreover, Aljamal et al. [7] demonstrated that treating the estimation interval time as a variable instead of a fixed value is mandatory when dealing with only CV data, as the variable approach always ensures that sufficient information is gathered from the CVs in every estimation interval. This approach enhances the accuracy of the estimation, especially for the scenarios with low CV level of market penetration (LMP) rate. The estimation time interval for this study is therefore defined as the time when an exact number of CVs (i.e., 5 vehicles) reach the end of the tested link.
Several researchers have employed the PF approach to improve traffic stream estimates for different transportation applications, including traffic flow [12,13], travel time [3,20], and traffic speed [40]. In one study, magnetic loop detectors were placed at the boundaries of the tested freeway section to estimate the traffic flow, and a PF estimation approach was developed using traffic flow and speed measurements [12]. In another study, Mihaylova et al. [13] developed two nonlinear approaches, an unscented KF and a PF, to produce real-time traffic flow estimates in a freeway network using data from stationary sensors. They found that the PF approach outperformed the unscented KF. Chen et al. [20] proposed a time series speed equation to estimate traffic speed. They claimed that the traffic system is nonlinear and thus presented two nonlinear approaches, a PF and an ensemble KF, using available speed measurements from loop detectors. They found that the PF approach is more accurate than the ensemble KF. Another study developed a PF estimation approach for travel-time predictions using real-time and historical data [3]. They used the historical data to generate particles as opposed to using a state-transition model. In addition, a comparison between the PF, KF, and k-nearest neighbor estimators found that the PF is the most accurate approach. CV data were employed to estimate the traffic speed and flow using the PF approach [40]. In that study, each link in the network was assumed to have base stations to retrieve and transfer the data. Results found that other data sources (e.g., loop detectors) should be incorporated with CVs to enhance the estimation performance. However, our recent study developed a KF approach, showing that the use of CV data alone is sufficient to obtain accurate results [7].
In summary, the existing literature shows that the PF has been widely used to address nonlinear systems and has been proven to outperform other nonlinear estimation techniques; however, to our best of knowledge in the application of traffic stream density estimation, only a few studies have applied the PF approach using data from stationary sensors and fusing data from different sources. In addition, no comparison between the PF and the linear KF has been reported. Therefore, the PF was adopted in this study. The primary objective of this study is to develop a nonlinear PF estimation approach to estimate the traffic stream density based solely on CV data on signalized approaches. Subsequently, we compare the PF approach to linear estimation approaches-namely, KF and AKF-to identify the best approach for the application of the traffic density estimation, given that no comparison has been reported in the literature between these filtering techniques. Consequently, this research will recommend a specific approach to estimate the traffic stream density. The proposed three approaches are employed to estimate the vehicle counts based solely on CV data. In addition, this study also investigates the sensitivity of the proposed estimation approaches to several factors, such as the LMP rate of the CVs, the initial conditions, and the number of PF particles.
The paper is organized as follows: Section 2 describes the problem formulation and the estimation approaches. Section 3 discusses the findings from applying the estimation approaches. Section 4 includes the conclusions of the study and the proposed future work.

Problem Formulation and Estimation Approaches
First, Section 2.1 formulates the research problem using a state-space model. Then, three different estimation approaches are described: the PF (Section 2.2.1), the KF (Section 2.2.2) and the AKF (Section 2.2.3).

State-Space Model
The state-space model is represented by a state equation and a measurement equation. The state equation describes how the system behaves and provides a prior knowledge of the estimation. The measurement equation is used to help correct and improve the prior estimation. In this study, the goal is to estimate the number of vehicles on signalized links using only CV data, as depicted in Figure 1, where CVs are the vehicles that have the connection icon (e.g., the first vehicle on the left). The only information that is needed in practice is as follows: (1)   The model is formulated using the derived state-space equations in [7]. The state equation, Equation (1), is based on the continuity equation of traffic flow; whereas the measurement equation, Equation (2), is based on the traffic flow hydrodynamic relationship, based on measurements of the average travel time of CVs. In Equation (1), the number of vehicles is computed by continuously adding the difference of the number of vehicles that enter and exit the tested section to the cumulative number of vehicles traveling along the section previously computed.
where N(t) is the number of vehicles traversing the link at time t, N(t − ∆t) is the number of vehicles traversing the link in the preceding time interval, and u(t) is the system inputs, as described in Equation (3).
where q in and q out represent the flow of CVs entering and exiting the link, respectively, during ∆t. ρ is the CVs' LMP, defined as the ratio of CV count to total vehicle count. In the state equation, the ρ variable is set to be the maximum number of the actual ρ (ρ actual ) and a predefined minimum value of ρ (ρ min ). ρ actual can be obtained from historical data. ρ min is introduced to avoid producing large errors in the state equation since a single ρ value is used to approximate the two ρ values (upstream and downstream of the tested link) [7]. In this study, ρ min is set to be equal to 0.5; more details about the system state representation can be found in [7]. It should be noted that the ρ variable is the main noise source in the system, and thus, there is an urgent need to develop the measurement equation to fix these errors. In Equation (2), TT is the average vehicle travel time, H(t) is a vector that transforms the vehicle counts to travel times. H(t) is derived from the hydrodynamic relationship between the macroscopic traffic parameters (flow, density, and space-mean speed), as presented in Equation (4).

Estimation Approaches
As mentioned earlier, the KF and the AKF are considered linear estimators that can efficiently handle linear state-space systems. However, in the proposed state-space equations, we suspect some nonlinearity coming from the ρ variable, which raises the question, would a nonlinear filter improve the estimation performance? For this purpose, this study develops a nonlinear PF approach to estimate the vehicle counts along the signalized link. This section presents the formulation of the three approaches used to estimate the vehicle counts using only CV data along signalized approaches. The three techniques are the proposed PF, the KF [7], and the AKF [10].

The PF Approach
The PF approach is used to solve nonlinear state-space systems with no form restrictions on the initial state and noise distributions. For instance, the PF can deal with any arbitrary PDF distribution [21]. The PF approach is used to estimate the posterior PDF of the state vehicle count variable (N) given some measurements of CV travel times (TT) by assigning k number of particles (samples). Each particle has a certain relative weight (w). When a new measurement is received, the particles' locations and weights are updated. It should be noted that the particles with low relative weight values are replaced with new particles (resampling) so that the system keeps only the important particles. The estimates are then calculated using the average value of the remaining particles. The following steps are used to implement the proposed PF approach:

1.
Initialization: t = 0; where t is the time interval.
(a)N + (0), R, V, and k, whereN + (0) is the initial vehicle count estimate; R is the measurement's covariance error; and V is the variance of the initial vehicle count estimate, which is used to randomly generate the initial particles' locations aroundN + (0).

(b)
Generate k particles' locations randomly, from 1 to K, from the initial prior Gaussian distribution P(N 0 ).
(a) Update the locations (N k (t)), measurements (TT k (t)), and weights (w k (t)) of the particles.
where TT is the observed measurement from the CVs. The weights are then normalized using the following equation,ŵ k (t) = w k (t)/ K ∑ k=1 w k (t).

(b)
Replace the low-weighted particles with new particles (resampling [21]). After a few iterations in the PF process, the weight will focus on a few particles only and most particles will have insignificant weights, resulting in sample degeneracy [41]. The resampling process is therefore used to tackle the degeneracy problem. It should be noted that the highly weighted particles are used to compute the PF posterior estimate.
(c) Compute the PF posterior estimate: The PF posterior estimate is computed as the average value of the remaining particles (particles with high weights), as shown in Equation (9).
(d) Next time step (t + ∆t): When 5 new CVs traverse the link, return to step 2a.

The KF Approach
The KF approach is a linear quadratic estimator. It has been proven to be the best for estimating linear systems with Gaussian noise [42]. The KF estimation approach can be solved using the following steps:

1.
Initialization: t = 0; where t is the time interval.
(a) Prior estimates:N − (t) =N + (t − ∆t) + u(t) (10) whereN − is an estimate of a priori vehicle count,TT is the estimated average travel time, andP − is the a priori covariance estimate for the state system.
(b) Correction: The correction uses the prior estimate and the new measurement (i.e., the CV average travel time) to compute the Kalman gain (G).
(c) Posterior state estimates: whereN + is the posterior vehicle count estimate, andP + is the posterior error covariance estimate.
(d) Next time step (t + ∆t): When 5 new CVs traverse the link, return to step 2a.

The AKF Approach
The AKF approach is presented to estimate the total number of vehicles, using real-time noise error estimates in the state and measurement systems (i.e., mean and variance values). It should be noted that the KF and the AKF approaches use the same equations, but the AKF approach dynamically estimates the noise statistical parameters every estimation step. The vehicle count estimates can be obtained using the following steps:

1.
Initialization: t = 0; where t is the time interval.

2.
For t = 1 : T (a) Prior estimates: (b) Estimation of noise statistics for the measurement system: where r and R are the mean and covariance of the measurement noise, respectively, and n is the number of state noise samples.
(c) Correction: (d) Posterior state estimates: (e) Estimation of noise statistics for the state system: where m and M are the mean and covariance of the state noise, respectively.
(f) Next time step (t + ∆t): When 5 new CVs traverse the link, return to step 2a.

Results and Discussion
This section evaluates and compares the three estimation approaches. The simulated data were generated for a signalized link under an oversaturation condition in which the traffic demand exceeds the link capacity. The free-flow speed is 40 km/h; the saturation flow rate is 1800 veh/h/lane, resulting in a traffic capacity of 855 veh/h given the cycle length and traffic signal's green times; the speed-at-capacity is 32 km/h; and the jam density is 160 veh/km/lane. The traffic signal is operated at a cycle length of 120 s and a phase split of 50:50. The amber and all-red intervals are 3 s. To test the accuracy of the estimation approaches, the INTEGRATION microscopic traffic assignment and simulation software was used [43,44]. The relative root mean square error (RRMSE), presented in Equation (26), was used to evaluate the proposed estimation approaches.
whereN + (s) represents the estimated count of vehicles, N(s) represents the actual count of vehicles, and S is the overall number of estimations.

Performance of Estimation Approaches
The simulations were conducted with the same predefined initial conditions to obtain a fair comparison. The initial conditions are described in Table 1. It should be noted that each estimator requires specific initial variables. For instance,N + (0), R, andP + (0) are required for the KF approach. For all estimation approaches, the first estimate begins with an erroneous initial estimate of vehicle count (N + (0) = 5 veh), whereas the actual vehicle count is zero [4,7]. Table 1. Initial conditions for the Kalman filter (KF), adaptive KF (AKF), and particle filter (PF) approaches.

Initial Conditions KF AKF PF
The three estimation approaches were evaluated using different CV LMPs, including 1%, 3%, 5%, 8%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90%. For each LMP scenario, 100 random samples from the full data set were created using a Monte Carlo simulation. Table 2 presents the RRMSE values of the KF, AKF, and the PF approaches. The table indicates that estimation errors decrease with increasing LMP for all estimation approaches. The table also demonstrates that the KF outperforms the AKF and the PF approaches. For instance, for the scenario of 1% LMP, the vehicle count estimates were off by 30%, 48% and 64% using KF, AKF and PF, respectively. The PF approach produces high RRMSE values at low LMPs (LMP < 40%), while for the high-LMP scenarios, the PF produces RRMSE values close to the values obtained from the KF. Moreover, the AKF approach produces high errors, especially at very low LMPs (LMP< 10%) and high LMPs (LMP >= 70%). This demonstrates that the real-time estimates of the statistical noise values obtained from the AKF are not needed for the high-LMP scenarios, and the user may proceed with predefined statistical values due to low errors in the vehicle count estimates (low error in the ρ value). It was found that the high RRMSE error values produced from the AKF and PF approaches are mainly caused from assigning an inappropriate initial vehicle count estimate, as discussed in the next section. Figure 2 presents the KF, AKF, and PF estimation outcomes with regard to the actual values at different LMPs (i.e., 10% to 90% with an increase of 10%). In each subfigure, three plots are generated to display the estimation approaches' outcomes with regard to the actual values; the top one displays the PF outcomes, the middle one presents the KF outcomes, and the bottom one displays the AKF outcomes. The actual curve is represented by the dotted curve. In conclusion, the KF approach is recommended, as it produces the most accurate estimates in addition to its simplicity and applicability in the field. The next section will discuss the impact of the initial conditions on the performance of the various estimation approaches.

Impact of Initial Conditions
This section examines the effect of the choice of the initial conditions on the performance of the estimators, such as the initial vehicle count estimateN + (0) and the k number of particles in the PF approach. First, differentN + (0) values were tested, from 0 to 25 at increments of 5, at different LMP scenarios, as presented in Tables 3 and 4. Table 3 presents the RRMSE values when theN + (0) is set to equal 0, 5, and 10 vehicles. Table 4 displays the RRMSE for theN + (0) values of 15, 20, and 25 vehicles. The tables demonstrate that the RRMSE values are sensitive to the changes of theN + (0) values. The tables also show that the PF is the most sensitive estimator toN + (0) for all LMP scenarios. For instance, for the scenario of 1% LMP, the RRMSE is 81% when the simulation starts with 0 veh, while the RRMSE is 17% whenN + (0) is equal to 25. Therefore, starting the simulations with an appropriate initial estimate close to the truth value significantly improves the estimation accuracy since this helps the PF to quickly converge. In addition, the AKF seems to be sensitive to theN + (0) with low LMP scenarios (LMP <= 10%), while the choice ofN + (0) has a slight effect on the estimation accuracy for the scenarios with medium and high LMPs. For instance, for the scenario of 1% LMP, the RRMSE is 71% when the simulation starts with 0 veh, while the RRMSE is 21% whenN + (0) is equal to 25. Lastly, the tables show that the KF is the least-sensitive estimator to theN + (0) value. Figure 3 summarizes the RRMSE values for nine LMP scenarios presented in Tables 3 and 4. Table 3. RRMSE values for the KF, AKF, and PF approaches using different initial vehicle count estimates (i.e., 0, 5 and 10) for different LMPs.  1  23  32  36  20  23  24  19  21  17   3  22  26  33  20  25  24  20  24  19   5  21  25  31  20  23  26  19  24  20   8  21  27  33  22  26  26  21  26  23   10  20  24  30  20  24  27  19  26  22   15  19  23  30  19  24  26  19  23  18   20  19  23  30  19  23  30  19  23  16   30  19  19  21  19  19  21  19  19  26   40  19  18  20  19  18  20  19  18  44   50  18  17  33  18  17  47  18  17  33   60  15  16  32  15  16  32  15  16  32   70  12  17  30  12  17  30  12  17  30   80  9  17  30  9  17  30  9  17  30   90  7  17  29  7  17  44  7 17 29 This study also examined the choice of the number of particles, k, on the PF performance (i.e., k = 10, 100, 200, 1000, and 2000), as presented in Table 5. The findings show that the estimation accuracy increases as the number of particles increases, especially at low LMPs. However, increasing the number of particles is associated with additional computational time. The PF is implemented in MATLAB R2019a on a Dell PC with 8.0 GB RAM. The computation time ranges between 0.  Table 5 show that the use of 1000 and 2000 particles slightly reduces the RRMSE values compared to the use of 200 particles; however, this comes at a very high computational cost. Therefore, the use of 200 particles is recommended in the PF approach.

Summary and Conclusions
The paper developed a nonlinear PF estimation approach to estimate the number of vehicles approaching a traffic signal based solely on CV data, with the aim of improving the estimation accuracy of linear state-of-the-art estimation approaches. This study introduced two linear approaches, KF and AKF, as benchmarks, to be compared with the proposed nonlinear PF approach. The results show that the KF produces the least error and accurately estimates the vehicle counts compared with the AKF and PF approaches. Consequently, to address the research problem appropriately, it is recommended to deploy the linear KF approach rather than the more complex AKF and PF approaches because of its simplicity and high-performance accuracy. In addition, the study investigated the sensitivity of the developed approaches to different factors, including the LMP of CVs, the initial vehicle count estimates, and the number of particles used in the PF approach. The results indicate that the estimation errors decrease as the LMP increases. Furthermore, the paper investigated the effect of the choice of the number of particles on the performance of the PF and showed that the PF estimation accuracy increases as the number of particles increases. However, this comes at the expense of significantly longer computational times. This can significantly impact the performance of the PF, requiring longer time to converge. The results demonstrate that the KF approach is the least sensitive to the initial vehicle count estimate, while the PF approach is the most sensitive to the initial vehicle count estimate and thus is the most suitable for the proposed application. Proposed future work entails integrating the KF approach with an adaptive traffic signal controller to quantify the impact of inaccuracies of the traffic stream density on the traffic signal controller performance.