Abstract
The average speed (AS) of a road segment is an important factor for predicting traffic congestion, because the accuracy of AS directly affects the implementation of traffic management. In the existing literature, the traffic environment, spatiotemporal information, and the dynamic interaction between these two factors all affect the predictive accuracy of AS, and floating car data comprehensively reflect the operation of urban road vehicles. In this paper, we propose a novel road segment AS predictive model based on floating car data. First, the impact of historical AS, weather, and date attributes on AS prediction is analyzed. Then, through a spatiotemporal correlation calculation based on Global Positioning System (GPS) data, the predictive method uses the recursive least squares method to fuse the historical AS with other factors (such as weather and date attributes) and adopts an extended Kalman filter algorithm to predict the AS of the target segment. Finally, we apply our approach to traffic congestion prediction on four road segments in Chengdu, China. The results show that the proposed predictive model is feasible and highly accurate.
1. Introduction
The prediction of the average speed (AS) of road segments plays an important role in an intelligent transportation system (ITS). Its accuracy and timeliness have a great impact on the implementation of dynamic traffic management, such as traffic congestion estimation [1] and signal control [2]. The data collection of floating cars has the advantages of high flexibility, strong real-time performance, wide coverage, and high data precision, when compared to that of fixed detectors [3].
Existing research has usually relied on traffic parameters from fixed detectors to predict the AS of a road segment, and low accuracy is the main barrier to its wide application. Cetin and Comert [4] utilized the coil dataset published by California Path and then proposed the expectation maximization and Cumulative Sum (CUSUM) algorithms to predict the average traffic speed. Chandra and Al-Deek [5] mined the interaction between the upstream and downstream segments using dual-loop detector data and predicted the AS of road segments with a vector autoregressive time series model. Jing et al. [6] assessed the multistep speed predictive performance of eight different models using 2-min road segment speed data collected from remote traffic microwave sensors. All of the above approaches failed to consider the traffic state of adjacent intersections, so it is difficult to accurately capture the traffic state of urban roads with data acquired from fixed detectors.
Recently, the boom in mobile Internet has inspired new ideas for traffic congestion detection. Mobile detection data from vehicles have wide coverage and are spatially continuous, and the huge volume of daily traffic data makes the prediction of the traffic state more accurate and reliable. Emerging technologies based on the Global Positioning System (GPS) enable us to track vehicle trajectories and collect real-time traffic data across entire road networks [7], and have been introduced to predict the AS of road segments. Queen and Albers [8] proposed a dynamic Bayesian model to identify lagged causal relationships between time series and predict traffic speed at multiple road link locations. Pei et al. [9] collected GPS probe data of road segments and developed a predictive model of AS using a full Bayesian method. Combining the acceleration of the target segment and the speed of the adjacent segment, Ye et al. [10] used an improved Neural Network (NN) to improve the prediction performance of AS. Based on a study of prediction bias correlation among adjacent road segments and weather factors, Yang et al. [11] employed an artificial NN and an adjustment approach to predict the AS of a road segment. Yao et al. [12] developed a Support Vector Machine (SVM) model consisting of spatiotemporal parameters, which is commonly used for short-term prediction under the experimental condition that the running speed is below 35 km/h. Satrinia and Saptawati [13] combined map-matching with topological information to predict traffic speed via Support Vector Regression (SVR). Zhao et al. [14] adopted a deep learning model to predict the traffic speed during non-recurrent congestion periods. These approaches perform well only if the GPS data sampling is sufficient. Moreover, the predictive accuracy of approaches based on NNs, SVM, and SVR usually depends on the training quality of the traffic dataset.
Unlike the aforementioned approaches, the Kalman filter (KF) does not depend on the training quality of the traffic dataset and is one of the most widely used traffic prediction methods; it was first introduced in traffic forecasting by Okutani and Stephanedes [15]. The KF addresses the recursive filtering of discrete linear data and has been applied to traffic variable prediction and travel time estimation [16,17]. However, because its model is linear, it is not appropriate for nonlinear and random traffic variables. To overcome this issue, the extended Kalman filter (EKF), which linearizes the nonlinear state-space model and then applies the KF algorithm, is suitable for nonlinear traffic prediction. Liu et al. [18] proposed a state-space model and a progressive EKF method that fuses heterogeneous data and tracks the variation in traffic dynamics. Yuan et al. [19,20] later used the EKF to predict the traffic states, with the discretized Lagrangian model as the process equation. Based on the EKF, Dong et al. [21] developed a spatiotemporal model to predict traffic flow. Huang et al. [22] designed an advanced EKF algorithm that improves the accuracy of vehicle speed prediction by combining an adaptive forgetting factor with the EKF algorithm. Although the EKF has been widely adopted for speed prediction, its accuracy remains limited by the quality of parameter estimation and by random factors.
Recursive least squares (RLS) corrects previous estimates by using new observational data and is usually used for real-time estimation of system parameters [23]. Comert et al. [24] adopted RLS filtering and proposed a model for predicting traffic speed that considers impact factors such as weather, accidents, and driving characteristics. A weighted RLS estimator was used to optimize the parameters of the linear functions. Tang et al. [25] established Takagi–Sugeno-type fuzzy rules to forecast travel speed. Aiming to optimize wireless network performance, Kulkarni et al. [26] proposed a simple mechanism to predict traffic load by using RLS. However, the identification accuracy of RLS is poor when noise is present.
Hybrid models combine the advantages of single approaches to improve traffic prediction accuracy [27,28]. However, the road segment data used in [27] are not sufficient, and random events should be taken into account to improve accuracy further.
In summary, existing research based on mobile detectors has two problems. On the one hand, the accuracy of AS prediction is affected when the road segment data are not sufficient or a random event occurs. On the other hand, the predictive accuracy of Machine Learning methods such as NNs, SVM, and SVR usually depends on the training quality of the dataset.
In this study, we propose a novel road segment AS prediction model based on floating car GPS data (FCG-ASpredictor), which adopts a spatiotemporal correlation calculation method and a recursive least squares–extended Kalman filter (RLS-EKF) to address these issues. Finally, we validate our approach on AS prediction for four road segments in Chengdu and find that FCG-ASpredictor is feasible and highly accurate.
2. Data Association Analyses
Based on the GPS data of floating cars in Chengdu in November 2016, we adopted a K-means Clustering algorithm to calculate the frequency distribution. Based on the frequency-intensive areas of GPS data and historical data, we also used the Pearson correlation coefficient [29] to analyze the correlation of the AS. In addition, we analyzed the impact of other sudden factors (such as weather, date attributes, etc.) on the AS of road segments.
2.1. Historical Data Correlation Analyses
A traffic dataset that contains time-series data is chronologically consecutive. As a typical time series, the AS of a road segment is analyzed on an hourly basis, which allows us to study its internal relationships comprehensively. The AS in the current time interval is closely related to the AS in adjacent time intervals. As shown in Figure 1, the AS of the north third section of the First Ring Road from 1 November to 7 November 2016 is selected. Apart from an obvious sudden change in the AS during the traffic rush hours, the correlation between adjacent timeslots is large, and the trend of change is coherent.
Figure 1.
Average speed (AS) time-series data for the first seven days of November (in h).
A correlation coefficient analysis is a statistical method that reflects the close relationship between variables [30] and can be used to reveal the degree of influence on traffic conditions during adjacent hours. The Pearson correlation coefficient is a measure of the strength of a linear relationship between two variables. In this study, the Pearson correlation coefficient was used to analyze the correlation between the AS of the road segment during these adjacent timeslots.
The AS of 500 road segments in the main area is divided into 24 timeslots during November. The average Pearson correlation coefficient is formulated as follows:
$$\rho_{t,t+1} = \frac{\operatorname{cov}(V_t, V_{t+1})}{\sqrt{D(V_t)}\,\sqrt{D(V_{t+1})}} \tag{1}$$
where $\rho_{t,t+1}$ is the Pearson correlation coefficient for the adjacent timeslots $t$ and $t+1$, $V_t$ is the AS dataset of all of the same timeslots $t$ over 30 days, $\operatorname{cov}(V_t, V_{t+1})$ is the AS covariance of road segments in adjacent timeslots within 30 days, and $D(\cdot)$ is the variance.
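The adjacent-timeslot correlation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the `daily_speeds[d][h]` layout (AS on day `d` in hourly timeslot `h`) are assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n factors of covariance and variance cancel in the ratio.
    # A constant sequence has zero variance and raises ZeroDivisionError.
    return cov / math.sqrt(var_x * var_y)


def adjacent_timeslot_correlations(daily_speeds):
    """daily_speeds[d][h] is the AS on day d in timeslot h (assumed layout).

    Returns one coefficient per pair of adjacent timeslots (h, h + 1).
    """
    slots = len(daily_speeds[0])
    result = []
    for h in range(slots - 1):
        current = [day[h] for day in daily_speeds]
        following = [day[h + 1] for day in daily_speeds]
        result.append(pearson(current, following))
    return result
```

Averaging the returned coefficients over many road segments would give one value per timeslot pair, which is the kind of curve Figure 2 reports.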
As shown in Figure 2, the Pearson correlation coefficient between adjacent hours is positive across the 24 timeslots; that is, the current AS of the road segment is correlated with that of the forward timeslot under normal conditions. However, the correlation is not large during traffic rush hours. Therefore, an AS prediction that considers only the value of the forward timeslot has low accuracy. The historical data of different timeslots are an important component of the AS prediction of the road segment. By comprehensively analyzing the influence weights of different timeslots in the historical AS data, the accuracy of the AS prediction can be improved.
Figure 2.
Average Pearson correlation coefficient of AS between adjacent hours.
The historical influencing factors include forward timeslot data and historical simultaneous timeslot data, that is, the AS of the previous six timeslots and the AS of the same timeslot on the previous seven days, respectively. The Pearson correlation coefficient of the 13 influencing factors in each road segment is calculated. Take the four road segments in Chengdu, China as an example. Their Pearson correlation coefficients are listed in Table 1, indicating that the AS of the road segment is closely related to the AS of the forward timeslots and the AS of the historical simultaneous timeslots. The AS of the four road segments for the previous 1–6 h and the first, sixth, and seventh historical simultaneous timeslots are positively correlated. Meanwhile, the correlation with the forward timeslots decreases as the time lag grows. The above nine influencing factors should therefore be considered in the AS prediction of the four target segments. The degree of correlation varies with the road segment and timeslot.
Table 1.
Pearson correlation coefficient of historical influence factors.
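As an illustration of how the 13 candidate factors could be gathered from an hourly series, the sketch below collects the previous six timeslots and the same timeslot on the previous seven days. The function name, the flat hourly `series` layout, and the default arguments are assumptions for this example only.

```python
def historical_factors(series, t, slots_per_day=24, n_forward=6, n_days=7):
    """Collect candidate historical factors for index t of a flat hourly series.

    series: list of AS values, one per hourly timeslot (assumed layout).
    Returns (forward, same_slot): the AS of the previous n_forward timeslots
    and the AS of the same timeslot on the previous n_days days.
    """
    if t < n_days * slots_per_day or t >= len(series):
        raise ValueError("index t needs n_days of history and must be in range")
    forward = [series[t - j] for j in range(1, n_forward + 1)]
    same_slot = [series[t - d * slots_per_day] for d in range(1, n_days + 1)]
    return forward, same_slot
```

Each of the 13 returned series positions can then be correlated with the target AS to decide which factors to keep.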
2.2. Correlation Analyses of Other Factors
The foregoing correlation analysis shows that the traffic dataset changes coherently in chronological order, and that the historical simultaneous timeslot data and the forward timeslot data influence the current timeslot data to different degrees. However, the daily traffic status does not completely follow the regular pattern of the historical data. Under external dynamic factors such as weather, date attributes, and emergencies, special circumstances may arise that make the traffic situation deviate from the long-term trend [31,32]. Emergencies are more random and unpredictable, and the corresponding datasets are limited. Therefore, this work mainly analyzes the influence of other external factors, such as weather and date attributes, on road segment speed.
Rain and snow worsen the road conditions and gradually result in traffic congestion, as shown in Figure 3. Since 6–9 November were rainy, the hourly AS of the road segment on 16 days in November was selected (the four days from 6 to 9 November had light rain, and the remaining 12 days were cloudy or sunny with the same day-of-week attributes). The abscissa of Figure 3 covers the 24 h of the day, the ordinate indicates the date under different types of weather, and the color encodes the AS of the road segment (km/h). The deeper the red color, the slower the AS; the darker the blue color, the faster the AS. The AS on the light-rain days is generally slower than on the other days, so the influence of external factors such as the weather on traffic congestion cannot be ignored.
Figure 3.
AS condition under different weather conditions.
Figure 4 demonstrates that the AS of a road segment differs between weekdays and weekends. The abscissa covers the 24 h of each day, the ordinate represents the dates of two consecutive weeks (the 7th to 11th and the 14th to 18th are weekdays; the 12th, 13th, 19th, and 20th are weekends), and the color encodes the AS of the road segment (km/h). The morning and evening rush hours are more pronounced on weekdays and weaker on weekends. This is clearly related to people's travel behavior: people need to go to work on weekdays, and they travel less on weekends.
Figure 4.
AS condition over two consecutive weeks.
3. Materials and Methods
The traffic flow system is a highly correlated system in which changes at any given moment are random, which makes traffic status prediction difficult. RLS can estimate system parameters in real time, but its identification accuracy is strongly affected by noise. An EKF can be applied to nonlinear system prediction, but it is sensitive to the accuracy of the state estimation.
In order to compensate for the defects of the respective methods and solve the issue of insufficient road segment data, the main idea of FCG-ASpredictor is shown in Figure 5. By establishing multiple regression equations, the historical AS obtained by the spatiotemporal correlation calculation method and the external factors (i.e., weather and date attribute) of the current timeslot are identified by the RLS. The measured values and observed values are adopted by the EKF to improve the predictive accuracy of the AS of the target road segment.
Figure 5.
Our novel road segment AS prediction model based on floating car Global Positioning System data (FCG-ASpredictor) based on recursive least squares–extended Kalman filter (RLS-EKF).
3.1. Study Area and Data Sources
Chengdu, the capital of Sichuan Province in China, is an important central city in the western region. Its geographical coordinate range is 30°05′–31°26′ N latitude and 102°54′–104°53′ E longitude. It consists of 20 districts, covering a total area of 14,335 km², with a resident population of 16.33 million. This paper selected the central urban areas, namely the Wuhou, Jinjiang, Qingyang, Jinniu, and Chenghua districts, as the study areas.
Owing to the high sampling frequency of floating car data, we employ the dataset (i.e., order details) from the Chengdu branch of Didi Chuxing. The sampling interval is 3 s. The data size is 462 GB, and each record includes: (1) driver ID; (2) order ID; (3) timestamp; (4) latitude; (5) longitude; and (6) vehicle status. The raw data format is shown in Table 2.
Table 2.
The raw data format.
3.2. The Computational Procedures of AS
The AS of a road segment usually refers to the average travel speed through the road segment. We compute the travel speed of each vehicle from the accumulated integral of its instantaneous speed, and from these travel speeds obtain the AS of the road segment.
According to the positions and timestamps of adjacent fixes belonging to the same order ID, the distance between adjacent positions can be calculated by using the spherical distance formula. The time interval can be calculated from the timestamps of the adjacent positions. The instantaneous speed at each position is calculated as follows:
$$v = \frac{r \arccos\left(\sin x_1 \sin x_2 + \cos x_1 \cos x_2 \cos(y_2 - y_1)\right)}{T_2 - T_1} \tag{2}$$
where $v$ is the instantaneous speed, $r$ is the earth radius, $x_1$ and $x_2$ are the latitudes of the adjacent positions, $y_1$ and $y_2$ are the longitudes of the adjacent positions, and $T_1$ and $T_2$ are the timestamps of the adjacent positions.
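A minimal sketch of this step is given below. It uses the haversine form of the great-circle distance, which is numerically more stable for the very short distances between fixes a few seconds apart; the earth radius value, the function names, and the `(latitude, longitude, timestamp)` tuple layout are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean earth radius in metres (assumed value)


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Haversine distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


def instantaneous_speed(p1, p2):
    """Speed in m/s between two adjacent fixes of the same order.

    p1, p2: (latitude_deg, longitude_deg, timestamp_s) tuples (assumed layout).
    """
    lat1, lon1, t1 = p1
    lat2, lon2, t2 = p2
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return great_circle_distance(lat1, lon1, lat2, lon2) / dt
```

Multiplying the result by 3.6 converts it to km/h, the unit used for AS in the figures.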
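The travel distance of a positioned car based on the accumulated integral is calculated as follows:
$$S = \int_{t_1}^{t_2} v(t)\,dt \tag{3}$$
where $S$ is the travel distance, $t$ is the GPS positioning time (from the first fix $t_1$ to the last fix $t_2$), and $v(t)$ is the instantaneous speed.
Since the sampling frequency is fixed, Formula (3) is modified as follows:
$$S = \sum_{i=1}^{k} v_i\,\Delta t \tag{4}$$
where $\Delta t$ is the fixed time interval, $v_i$ is the $i$th instantaneous speed sample, and $k$ is the number of samples.
According to the travel distance and time interval, the travel speed is calculated as follows:
$$u = \frac{S}{k\,\Delta t} \tag{5}$$
where $u$ is the travel speed.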
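The discretized distance and travel speed can be sketched as follows. The function names are illustrative, and the code assumes one instantaneous speed per fixed sampling interval (3 s in this dataset).

```python
def travel_distance(speeds, interval_s):
    """Approximate the distance integral as sum(speed_i) * fixed interval."""
    return sum(speeds) * interval_s


def travel_speed(speeds, interval_s):
    """Travel distance divided by the elapsed time covered by the samples."""
    if not speeds:
        raise ValueError("no speed samples")
    elapsed = len(speeds) * interval_s
    return travel_distance(speeds, interval_s) / elapsed
```

With a fixed interval, the travel speed reduces to the arithmetic mean of the instantaneous speeds, since the interval cancels between numerator and denominator.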
Owing to the uneven distribution of the floating car in the urban road network, the speed measurement accuracy is degraded, and the AS prediction of the road segment is considered from the distribution of the floating car. In order to ensure accurate calculation of the AS of the road segment, the number of travel speed samples n at a certain time should not be less than the minimum number of samples nmin. If the number of travel speed samples n is insufficient, then the historical AS and AS of the upstream and downstream segments during the simultaneous timeslot need to be integrated.
In addition, if the cumulative number m of consecutive timeslots with too few travel speed samples is greater than the maximum value mmax, that is, the number of travel speed samples has been less than the minimum number nmin in each of the previous mmax timeslots, then the AS of the upstream and downstream segments in the simultaneous timeslot is insufficient to reflect the current traffic status, and it is necessary to integrate the historical AS of the road segment. The spatiotemporal correlation calculation process of AS is shown in Figure 6.
Figure 6.
The process of AS calculation.
The formula for calculating the AS of the road segment is as follows:
$$\bar{v}_t = \frac{1}{n}\sum_{i=1}^{n} u_t^{(i)} \tag{6}$$
where $\bar{v}_t$ is the AS of the road segment during timeslot $t$, $n$ is the number of travel speed samples, and $u_t^{(i)}$ is the $i$th travel speed at timeslot $t$.
If the travel speed sample number n of the road segment at timeslot t is smaller than the minimum sample number nmin, then the historical AS and the simultaneous AS of the upstream and downstream segments are integrated as follows:
where is the estimated historical AS of the road segment, and is the estimated AS of the upstream and downstream segment during the current timeslot. The control parameters nmin and mmax are derived from the example calibration.
and are calculated by weighting the corresponding correlation speeds. The weighting formula is as follows:
where and are the AS of the historical simultaneous timeslot and the AS of the forward timeslot, respectively; and are the AS of the upstream and downstream segments during the current timeslot, respectively; and and are weight coefficients that are adjusted according to the measurement of actual data.
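The fallback logic of this subsection can be sketched as below. The exact combination rule is not recoverable from the text here, so the branch structure is one possible reading and should be treated as an assumption: enough samples give the sample mean, a short run of sparse timeslots falls back to the neighbouring segments, and a long run falls back to the segment's own history. All names and the default weights of 0.5 are hypothetical.

```python
def weighted_pair(a, b, weight_a):
    """Convex combination of two speeds; weight_a is expected in [0, 1]."""
    return weight_a * a + (1.0 - weight_a) * b


def segment_average_speed(samples, n_min, m, m_max,
                          hist_same_slot, forward_slot,
                          upstream, downstream,
                          w_hist=0.5, w_up=0.5):
    """Estimate the AS of a segment for one timeslot (illustrative reading).

    samples: travel speeds observed in the timeslot.
    m: number of consecutive timeslots that had fewer than n_min samples.
    """
    if len(samples) >= n_min:
        # Enough floating-car samples: plain average of the travel speeds.
        return sum(samples) / len(samples)
    if m <= m_max:
        # Assumed rule: short gap, rely on upstream and downstream segments.
        return weighted_pair(upstream, downstream, w_up)
    # Assumed rule: long gap, rely on the segment's own historical AS.
    return weighted_pair(hist_same_slot, forward_slot, w_hist)
```

In practice the weights, `n_min`, and `m_max` would be calibrated from measured data, as the text notes.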
3.3. Establishment of Multiple Regression Equations
According to the impact of historical AS, weather, and date attributes on AS prediction, the degree of influence between the AS of the target road segment and the historical AS is calculated by the Pearson correlation coefficient.
The AS in the historical simultaneous timeslots of the previous days, the AS of the previous timeslots during a day, the weather value of the current timeslot, and the date attribute value of the current timeslot are selected. The following multiple regression equation for predicting the AS value is established:
where is the predicted AS during timeslot t of the kth day, are the AS in the historical simultaneous timeslot t of the previous days, and are the AS in the previous timeslots of the kth day. and are the weather-quantized value and the date-attribute-quantized value, respectively, in timeslot t of the kth day; these need to be quantified according to the standard. ,, and are the influence weights of each system variable on the predicted value.
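To make the structure of the regression concrete, the sketch below assembles a feature vector from the four groups of inputs and evaluates a linear combination. It is an assumption that the model is a plain weighted sum; the function names and the example weights are hypothetical and are not values from the paper.

```python
def build_features(hist_same_slot, prev_slots, weather, date_attr):
    """Concatenate the regressors into one flat feature vector.

    hist_same_slot: AS of the same timeslot on previous days.
    prev_slots: AS of the previous timeslots of the current day.
    weather, date_attr: quantized external factors of the current timeslot.
    """
    return list(hist_same_slot) + list(prev_slots) + [weather, date_attr]


def predict_speed(features, weights):
    """Weighted sum of the features (assumed linear form of the regression)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features))
```

For example, `predict_speed(build_features([30.0, 32.0], [28.0], 1.0, 0.0), [0.25, 0.25, 0.5, -2.0, 1.0])` combines two historical values, one forward value, and the two external factors into a single predicted speed.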
3.4. System Identification of RLS Method
The system parameters are identified and updated according to Formula (10). The transformed recursive equation is as follows:
where is the AS value of the road segment, is the identified parameter vector, and is the error caused by observation noise. and are recorded as vectors as follows:
Combining Formulas (11), (12), and (13), the system parameter identification gain and the error covariance matrix are updated. The least-squares equation is expressed as follows:
where is the parameter identification gain for timeslot t, is the error covariance matrix of different timeslots, and is the identity matrix of the identification parameter.
According to Formulas (6), (7), and (11) to (15), the recursive formula for system parameter identification during timeslot t is expressed as follows:
where is the least-squares estimate of the system parameters for different timeslots, and is the correction term of the identified parameter estimation for timeslot t−1.
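The gain, covariance, and parameter updates described in this subsection follow the pattern of textbook recursive least squares. The sketch below shows that standard form without a forgetting factor, in pure Python; it is not necessarily identical to the authors' Formulas (14) to (16), and the function names and the large initial covariance `p0` are assumptions.

```python
def rls_update(theta, P, phi, y):
    """One standard RLS step.

    theta: current parameter estimates (list of n floats).
    P: n x n error covariance matrix (list of lists, symmetric).
    phi: regressor vector for this timeslot; y: observed AS value.
    """
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [value / denom for value in P_phi]
    # Correction term: gain times the prediction error of the old estimate.
    error = y - sum(phi[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    # P <- (I - K phi^T) P; since P is symmetric, phi^T P equals P_phi.
    new_P = [[P[i][j] - gain[i] * P_phi[j] for j in range(n)] for i in range(n)]
    return new_theta, new_P


def rls_fit(samples, n_params, p0=1e6):
    """Run RLS over (phi, y) pairs, starting from theta = 0 and P = p0 * I."""
    theta = [0.0] * n_params
    P = [[p0 if i == j else 0.0 for j in range(n_params)]
         for i in range(n_params)]
    for phi, y in samples:
        theta, P = rls_update(theta, P, phi, y)
    return theta
```

Starting from a large `p0` makes the first few observations dominate the initial guess, so the estimates approach the ordinary least-squares solution as data accumulate.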
3.5. Implementation of EKF
It can be seen from Formula (10) that the AS prediction model includes nonlinear external factors such as the weather and date attributes. This study uses an EKF algorithm to improve the AS prediction accuracy of the target segment. For the sake of simplicity, Formula (10) is modified as follows:
where is the number of timeslots in a day (assuming the length of the timeslot and the number of timeslots remain constant), is the AS prediction of the road segment, are the AS in the historical simultaneous timeslots of the previous days, and are the AS of the previous timeslots during a day.
According to Formula (17), the standard form of the state equation and the observation equation are expressed as follows:
where and are state and observation vector values, respectively; is the system process noise; is the observation noise; and and are nonlinear mapping functions of the state equations and observation equations, respectively. , , and are expressed as follows:
where is the estimated value of , and and are the system state matrix and the observation matrix, respectively.
According to Formulas (17) to (21), and are derived as follows:
The three components of state vector in Formula (19) are , , and . They are partial derivatives. and are converted to a Jacobian matrix:
The corresponding parameters , , and in Formula (24) are calculated as follows:
Since the specific values of parameters , , and corresponding to Formula (25) are calculated by Formula (16), then and are known values. Combining with the KF, the time update of Formula (17) is expressed as follows:
where is the prior estimate of the state vector at timeslot t, is the covariance of the state vector estimation error, and is the covariance matrix of the process noise.
According to Formulas (18), (23), (26), and (27), the observation update of Formula (17) is expressed as follows:
where is the Kalman gain at timeslot t, is the posterior estimate of the state vector at timeslot t, and is the covariance matrix of the observation noise.
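The time update and observation update can be illustrated with a scalar extended Kalman filter. This is a deliberately simplified sketch: the paper's state vector has three components, whereas the code below tracks a single value, and the callables `f`, `f_jac`, `h`, and `h_jac` are hypothetical placeholders for the state and observation functions and their derivatives.

```python
def ekf_step(x_post, p_post, z, f, f_jac, h, h_jac, q, r):
    """One predict/update cycle of a scalar extended Kalman filter.

    x_post, p_post: posterior state and error variance from the last step.
    z: new observation; q, r: process and observation noise variances.
    f, h: state transition and observation functions; f_jac, h_jac: their
    derivatives (the 1 x 1 Jacobians).
    """
    # Time update: propagate the state and linearize f at the old posterior.
    x_prior = f(x_post)
    F = f_jac(x_post)
    p_prior = F * p_post * F + q
    # Observation update: linearize h at the prior estimate.
    H = h_jac(x_prior)
    gain = p_prior * H / (H * p_prior * H + r)
    x_new = x_prior + gain * (z - h(x_prior))
    p_new = (1.0 - gain * H) * p_prior
    return x_new, p_new
```

With identity functions for `f` and `h`, the step reduces to the ordinary Kalman filter, which is a convenient sanity check before plugging in a nonlinear model.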
4. Results
Since traffic control and guidance require real-time prediction, the length of the traffic prediction horizon is short, usually no more than 1 h. In this study, the prediction horizons are set to 15 min, 30 min, and 1 h, respectively. All experiments are compiled and tested based on Python 3.7 and TensorFlow 1.13.1.
4.1. Data Preprocessing
In order to make the selected segments reflect the advantages of our approach more objectively, four road segment speed datasets covering different road types were adopted. The road segment information is shown in Table 3. According to the characteristics of the Chengdu urban network, the segments 01_521, 03_6479, 04_6276, and 06_28250 belong to the main urban road, general road, ring road, and outer ring road, respectively. The time span of these datasets is from June to November 2016. In order to be consistent with the comparison algorithms, long short-term memory–recurrent neural network (LSTM-RNN) [27] and autoregressive integrated moving average model–Kalman filter (ARIMA-KF) [28], we select the AS data from 1 June 2016 to 31 October 2016 as the training sample and the AS data from 1 November to 30 November 2016 as the forecast sample.
Table 3.
Road segment information.
To solve the issue of data drift, we use map-matching technology to obtain a standard dataset. In terms of the driving characteristics of the floating car, DiDi cars do not represent the normal traffic state of the road segment when their vehicle status is empty or parking. Thus, we remove the records with a vehicle status of empty or parking.
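The status filter can be sketched as follows. The record fields mirror the six attributes listed for the raw data, but the field names and the status labels `"empty"`, `"parking"`, and `"occupied"` are hypothetical, since the actual encoding in Table 2 is not reproduced here.

```python
from collections import namedtuple

# Field names are illustrative stand-ins for the six raw attributes.
Record = namedtuple(
    "Record", "driver_id order_id timestamp latitude longitude status"
)

# Hypothetical status labels for records that should be discarded.
EXCLUDED_STATUSES = {"empty", "parking"}


def keep_in_service(records):
    """Drop records whose vehicle status is empty or parking."""
    return [r for r in records if r.status not in EXCLUDED_STATUSES]
```

Map matching would be applied separately; this sketch covers only the status-based filtering.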
The datasets include the AS of the road segments [derived from Formulas (2) to (9)], and the quantified values of external factors such as the weather and date attributes. According to the degree of external factors affecting traffic flow [33], the weather and date attributes are quantified as shown in Table 4.
Table 4.
Quantized values of weather and date attributes.
4.2. Result Analysis
Corresponding to the data in Table 4, multivariate regression equations are established from the correlation factor values and the AS of each of the four target road segments. With the parameters identified by the RLS method and nonlinear external factors such as weather and date attributes taken into account, the EKF algorithm is used to predict the AS for the current timeslot. Root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are used as the evaluation metrics. Figures 7–15 show that the AS predictions based on the RLS-EKF are superior to those of the other two algorithms. All evaluation values of the AS predicted by the RLS-EKF at the three prediction horizons are lower than those of the other two algorithms, which means that the RLS-EKF is more accurate and stable than the other two algorithms on all four road segments.
Figure 7.
Root mean square error (RMSE) of 01_521 (a), 03_6479 (b), 04_6276 (c), and 06_28250 (d) at 15-min intervals.
Figure 8.
Mean absolute error (MAE) of 01_521 (a), 03_6479 (b), 04_6276 (c), and 06_28250 (d) at 15-min intervals.
Figure 9.
Mean absolute percentage error (MAPE) of 01_521 (a), 03_6479 (b), 04_6276 (c), and 06_28250 (d) at 15-min intervals.
Figure 10.
RMSE of 01_521 (a), 03_6479 (b), 04_6276 (c), and 06_28250 (d) at 30-min intervals.
Figure 11.
MAE of 01_521 (a), 03_6479 (b), 04_6276 (c), and 06_28250 (d) at 30-min intervals.

Figure 12.
MAPE of 01_521 (a), 03_6479 (b), 04_6276 (c), and 06_28250 (d) at 30-min intervals.
Figure 13.
RMSE of 01_521 (a), 03_6479 (b), 04_6276 (c), and 06_28250 (d) at one-hour intervals.
Figure 14.
MAE of 01_521 (a), 03_6479 (b), 04_6276 (c), and 06_28250 (d) at one-hour intervals.
Figure 15.
MAPE of 01_521 (a), 03_6479 (b), 04_6276 (c), and 06_28250 (d) at one-hour intervals.
5. Discussion
5.1. Evaluation
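The comparative analysis of the proposed algorithm and existing algorithms is performed with three metrics commonly used in traffic prediction: (1) RMSE, (2) MAE, and (3) MAPE. The three evaluation metrics are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{y}_t - y_t\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|\hat{y}_t - y_t\right|$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{\hat{y}_t - y_t}{y_t}\right|$$
where $\hat{y}_t$ and $y_t$ are the predicted value and estimated value at timeslot $t$, respectively, and $n$ is the number of timeslots.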
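For reference, the three metrics can be computed with the short sketch below, using their standard definitions. The function names are illustrative, and MAPE is returned as a percentage; it is undefined when an estimated value is zero.

```python
import math


def rmse(predicted, estimated):
    """Root mean square error between two equally long sequences."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, estimated)) / n)


def mae(predicted, estimated):
    """Mean absolute error."""
    n = len(predicted)
    return sum(abs(p - e) for p, e in zip(predicted, estimated)) / n


def mape(predicted, estimated):
    """Mean absolute percentage error, in percent, relative to the estimate."""
    n = len(predicted)
    return 100.0 * sum(abs((p - e) / e) for p, e in zip(predicted, estimated)) / n
```

Lower values of all three metrics indicate predictions closer to the estimated AS.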
The estimated value is a relative value that is obtained from the historical AS. To predict the AS of the current timeslot t, the AS in the historical simultaneous timeslots of the previous nt days, the AS of the previous np timeslots, the weather value of the current timeslot, and the date attribute value of the current timeslot are selected. When entering the next timeslot, the AS of the timeslot t is calculated as the estimated value according to Formulas (6) to (9).
As shown in Table 5, RLS-EKF achieves the best performance on all three metrics for all prediction horizons, and the advantage is evident in all four road segments.
Table 5.
Performance comparison of different approaches for AS prediction.
All metrics of RLS-EKF are lower than those of the other two algorithms (LSTM-RNN and ARIMA-KF). LSTM-RNN and ARIMA-KF are recent traffic prediction approaches, and the differences between the three evaluation metrics of RLS-EKF and those of the other two algorithms are large, which means that the experiments based on RLS-EKF have achieved good results.
From the perspective of the three prediction horizons, all three metrics increase as the prediction horizon increases. The difference between the evaluation metrics at one-hour intervals and those at 30-min intervals is larger than the difference between those at 30-min intervals and 15-min intervals, which means that long-term traffic forecasting needs to consider more influencing factors for optimization.
From the perspective of the four road segments, the errors of road segments 01_521 and 04_6276 are larger than those of the other two road segments.
5.2. Feasibility
The multiple regression equations in Section 3.3 contain four factors: (1) the AS of the historical simultaneous timeslot (AS-hst); (2) the AS of the forward timeslot (AS-ft); (3) the weather condition of the current timeslot (WC-ct); and (4) the date attribute of the current timeslot (DA-ct). In order to demonstrate the influence of four factors on the AS prediction, according to the equations in Section 3.3, we select five influencing cases, which respectively leave out the AS of the historical simultaneous timeslot (Miss-AS-hst), the AS of the forward timeslot (Miss-AS-ft), the weather condition of the current timeslot (Miss-WC-ct), the date attribute of the current timeslot (Miss-DA-ct), and lastly, do not have any missing factor.
The RMSE of the five influencing cases based on RLS-EKF is listed in Table 6. We further analyzed the feature contributions of the five cases for the four road segments at the three prediction horizons. For the same road segment and prediction horizon, every RMSE of the no-missing-factor case is lower than those of the four missing-factor cases (Miss-AS-hst, Miss-AS-ft, Miss-WC-ct, and Miss-DA-ct), which means that all four factors of the equations in Section 3.3 contribute to improving the prediction accuracy. In addition, among the four missing-factor cases, the RMSE of Miss-AS-hst is the highest, Miss-AS-ft is the second highest, Miss-WC-ct is the third highest, and Miss-DA-ct is the lowest. That means that, in terms of contribution to prediction accuracy, AS-hst is the largest, AS-ft is the second largest, WC-ct is the third largest, and DA-ct is the smallest.
Table 6.
RMSE of different influencing cases based on RLS-EKF.
To improve the prediction accuracy of the FCG-ASpredictor, it is reasonable and feasible to select AS-hst, AS-ft, WC-ct, and DA-ct as important impact factors of the equations in Section 3.3.
6. Conclusions
In this paper, we propose FCG-ASpredictor, an integrated analysis model for predicting the AS of road segments. It incorporates a spatiotemporal correlation calculation and RLS-EKF to address two issues: (1) low accuracy due to insufficient data and (2) poor training quality. We verified the proposed model using traffic data from Chengdu, China, and the results indicate that it is feasible. The main contributions of this paper are as follows: (1) a new design for obtaining an accurate AS of a road segment: we use the number of travel speed samples and the cumulative number of segments with less continuous travel speed samples as the benchmark metrics, and build a spatiotemporal correlation calculation method based on GPS data; (2) a new approach based on RLS-EKF, which uses RLS to fuse the historical AS with other factors (such as weather and date attributes) and applies EKF to predict the AS of the target segment. The experimental results show that RLS-EKF performs well and achieves high accuracy.
The FCG-ASpredictor combines several impact factors, such as AS-hst, AS-ft, WC-ct, and DA-ct, and achieves good results for the AS prediction of road segments. However, limitations remain when the model is applied to long-term traffic speed prediction; in future work, we will therefore improve the model's adaptation to spatiotemporal correlations.
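As an illustration of the RLS fusion step summarized in contribution (2), the following minimal Python sketch fits a linear combination of the four factors recursively, one sample at a time. It is not the implementation used in this paper. The linear form, the numeric encoding of the weather and date attributes, the class name `RecursiveLeastSquares`, the parameters `forgetting` and `delta`, and the synthetic weights are all assumptions made for illustration, and the EKF prediction stage is omitted.

```python
from typing import List, Sequence


class RecursiveLeastSquares:
    """Standard RLS estimator for y ~ theta . x (illustrative sketch)."""

    def __init__(self, n_features: int, forgetting: float = 1.0,
                 delta: float = 1e4) -> None:
        self.lam = forgetting  # forgetting factor, 1.0 = no forgetting
        self.theta: List[float] = [0.0] * n_features
        # Initial inverse-covariance estimate P = delta * I.
        self.P = [[delta if i == j else 0.0 for j in range(n_features)]
                  for i in range(n_features)]

    def predict(self, x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(self.theta, x))

    def update(self, x: Sequence[float], y: float) -> float:
        """Fold in one sample and return the a priori prediction error."""
        n = len(self.theta)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        error = y - self.predict(x)
        self.theta = [w + g * error for w, g in zip(self.theta, gain)]
        # P is symmetric, so x^T P equals (P x)^T.
        self.P = [[(self.P[i][j] - gain[i] * Px[j]) / self.lam
                   for j in range(n)] for i in range(n)]
        return error


# Hypothetical weights for [AS-hst, AS-ft, WC-ct, DA-ct]; not from the paper.
true_weights = [0.6, 0.3, -2.0, 1.5]

rls = RecursiveLeastSquares(n_features=4)
for t in range(60):
    # Synthetic, noise-free samples: speeds in km/h, weather coded 0-2,
    # date attribute coded 0/1 (both encodings are assumptions).
    x = [30.0 + (7 * t) % 11, 28.0 + (5 * t) % 13, float(t % 3), float(t % 2)]
    y = sum(w * xi for w, xi in zip(true_weights, x))
    rls.update(x, y)

print("estimated weights:", [round(w, 3) for w in rls.theta])
```

Because each update only needs the current sample, an estimator of this kind can track the fused relationship as new timeslots arrive without refitting on the full history.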
Author Contributions
G.S. principally conceived of the idea for the study and provided the financial support. D.Z. was responsible for the design of the study, completing the experiments and writing the manuscript. D.L. was responsible for the analysis and discussion of the experimental results. J.C. was responsible for review and editing. Y.Z. was responsible for the experimental validation.
Funding
This work was supported by the Zhejiang Public Welfare Technology Research Program under Grant LGG19F030012 and the National Natural Science Foundation of China under Grant No. 61603339.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Kong, X.; Xu, Z.; Shen, G.; Wang, J.; Yang, Q.; Zhang, B. Urban traffic congestion estimation and prediction based on floating car trajectory data. Future Gener. Comput. Syst. 2015, 61, 97–107.
- Mannion, P.; Duggan, J.; Howley, E. Parallel reinforcement learning for traffic signal control. Procedia Comput. Sci. 2015, 52, 956–961.
- Kong, X.; Xia, F.; Ning, Z.; Rahim, A.; Cai, Y.; Gao, Z.; Ma, J. Mobility dataset generation for vehicular social networks based on floating car data. IEEE Trans. Veh. Technol. 2018, 67, 3874–3886.
- Cetin, M.; Comert, G. Short-Term Traffic Flow Prediction with Regime-Switching Models; Transportation Research Board: Washington, DC, USA, 2006; pp. 23–31.
- Chandra, S.R.; Al-Deek, H. Predictions of freeway traffic speeds and volumes using vector autoregressive models. J. Intell. Transp. Syst. 2009, 13, 53–72.
- Jing, H.; Zou, Y.; Zhang, S.; Tang, J.; Wang, Y. Short-term speed prediction using remote microwave sensor data: Machine learning versus statistical model. Math. Probl. Eng. 2016, 2016, 9236156.
- Chen, D.; Yan, X.; Liu, F.; Liu, X.; Wang, L.; Zhang, J. Evaluating and diagnosing road intersection operation performance using floating car data. Sensors 2019, 19, 2256.
- Queen, C.M.; Albers, C.J. Intervention and causality: Forecasting traffic flows using a dynamic Bayesian network. J. Am. Stat. Assoc. 2009, 104, 669–681.
- Pei, X.; Wong, S.C.; Li, Y.C.; Sze, N.N. Full Bayesian method for the development of speed models: Applications of GPS probe data. J. Transp. Eng. 2012, 138, 1188–1195.
- Ye, Q.; Szeto, W.Y.; Wong, S.C. Short-term traffic speed forecasting based on data recorded at irregular intervals. IEEE Trans. Intell. Transp. Syst. 2012, 13, 1727–1737.
- Yang, J.; Chou, L.; Tung, C.; Huang, S.; Wang, T. Average-speed forecast and adjustment via VANETs. IEEE Trans. Veh. Technol. 2013, 62, 4318–4327.
- Yao, B.; Chen, C.; Cao, Q.; Jin, L.; Zhang, M.; Zhu, H.; Yu, B. Short-term traffic speed prediction for an urban corridor. Comput.-Aided Civ. Infrastruct. Eng. 2016, 32, 154–169.
- Satrinia, D.; Saptawati, G.A.P. Traffic speed prediction from GPS data of taxi trip using support vector regression. In Proceedings of the IEEE 2017 International Conference on Data and Software Engineering (ICoDSE), Palembang, Indonesia, 1–2 November 2017.
- Zhao, J.; Gao, Y.; Bai, Z.; Lu, S.; Wang, H. Traffic speed prediction under non-recurrent congestion: Based on LSTM method and BeiDou navigation satellite system data. IEEE Intell. Transp. Syst. Mag. 2019, 11, 70–81.
- Okutani, I.; Stephanedes, Y.J. Dynamic prediction of traffic volume through Kalman filtering theory. Transp. Res. Part B Methodol. 1984, 18, 1–11.
- Barimani, N.; Moshiri, B.; Teshnehlab, M. State space modeling and short-term traffic speed prediction using Kalman filter based on ANFIS. IACSIT Int. J. Eng. Technol. 2012, 4, 116–120.
- Mir, Z.H.; Filali, F. An adaptive Kalman filter based traffic prediction algorithm for urban road network. In Proceedings of the IEEE 12th International Conference Innovation Information Technology (IIT), Al-Ain, UAE, 28–30 November 2016.
- Liu, Y.; He, S.; Ran, B.; Cheng, Y. A progressive extended Kalman filter method for freeway traffic state estimation integrating multisource data. Wirel. Commun. Mob. Comput. 2018, 2018, 6745726.
- Yuan, Y.; Van Lint, H.; Van Wageningen-Kessels, F.; Hoogendoorn, S. Network-wide traffic state estimation using loop detector and floating car data. J. Intell. Transp. Syst. 2014, 18, 41–50.
- Yuan, Y.; Van Lint, J.W.C.; Wilson, R.E.; Van Wageningen-Kessels, F.; Hoogendoorn, S.P. Real-time Lagrangian traffic state estimator for freeways. IEEE Trans. Intell. Transp. Syst. 2012, 13, 59–70.
- Dong, C.; Xiong, Z.; Shao, C.; Zhang, H. A spatial–temporal-based state space approach for freeway network traffic flow modelling and prediction. Transp. A Transp. Sci. 2015, 11, 547–560.
- Huang, Y.; Qian, L.; Feng, A.; Wu, Y.; Zhu, W. RFID data-driven vehicle speed prediction via adaptive extended Kalman filter. Sensors 2018, 18, 2787.
- Kolansky, J.; Sandu, C. Enhanced polynomial chaos-based extended Kalman filter technique for parameter estimation. J. Comput. Nonlinear Dyn. 2018, 13, 021012.
- Comert, G.; Bezuglov, A.; Cetin, M. Adaptive traffic parameter prediction: Effect of number of states and transferability of models. Transp. Res. Part C Emerg. Technol. 2016, 72, 202–224.
- Tang, J.; Liu, F.; Zou, Y.; Zhang, W.; Wang, Y. An improved fuzzy neural network for traffic speed prediction considering periodic characteristic. IEEE Trans. Intell. Transp. Syst. 2017, 18, 2340–2350.
- Kulkarni, P.; Lewis, T.; Fan, Z. Simple traffic prediction mechanism and its applications in wireless networks. Wirel. Pers. Commun. 2011, 59, 261–274.
- Wang, X.; Xu, L.; Chen, K. Data-driven short-term forecasting for urban road network traffic based on data processing and LSTM-RNN. Arab. J. Sci. Eng. 2019, 44, 3043–3060.
- Xu, D.; Wang, Y.; Jia, L.; Qin, Y.; Dong, H. Real-time road traffic state prediction based on ARIMA and Kalman filter. Front. Inf. Technol. Electron. Eng. 2017, 18, 287–302.
- Szczepanska, A.; Senetra, A.; Wasilewicz-Pszczolkowska, M. The effect of road traffic noise on the prices of residential property-A case study of the Polish city of Olsztyn. Transp. Res. Part D Transp. Environ. 2015, 36, 167–177.
- Wei, G.; Wang, H.; Lin, R. Application of correlation coefficient to interval-valued intuitionistic fuzzy multiple attribute decision-making with incomplete weight information. Knowl. Inf. Syst. 2011, 26, 337–349.
- Zhang, J.; Zheng, Y.; Qi, D. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, CA, USA, 4–9 February 2017; pp. 1655–1661.
- Xia, F.; Wang, J.; Kong, X.; Wang, Z.; Li, J.; Liu, C. Exploring human mobility patterns in urban scenarios: A trajectory data perspective. IEEE Commun. Mag. 2018, 56, 142–149.
- Nahar, L.; Sultana, Z. A new travel time prediction method for intelligent transportation system. IOSR J. Comput. Eng. 2014, 16, 24–30.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).