FCG-ASpredictor: An Approach for the Prediction of Average Speed of Road Segments with Floating Car GPS Data

The average speed (AS) of a road segment is an important factor for predicting traffic congestion, because the accuracy of AS can directly affect the implementation of traffic management. The traffic environment, spatiotemporal information, and the dynamic interaction between these two factors impact the predictive accuracy of AS in the existing literature, and floating car data comprehensively reflect the operation of urban road vehicles. In this paper, we proposed a novel road segment AS predictive model, which is based on floating car data. First, the impact of historical AS, weather, and date attributes on AS prediction has been analyzed. Then, through spatiotemporal correlations calculation based on the data from Global Positioning System (GPS), the predictive method utilizes the recursive least squares method to fuse the historical AS with other factors (such as weather, date attributes, etc.) and adopts an extended Kalman filter algorithm to accurately predict the AS of the target segment. Finally, we applied our approach on the traffic congestion prediction on four road segments in Chengdu, China. The results showed that the proposed predictive model is highly feasible and accurate.


Introduction
The prediction of the average speed (AS) of road segments plays an important role in an intelligent transportation system (ITS). Its accuracy and timeliness have a great impact on the implementation of dynamic traffic management, such as traffic congestion estimation [1] and signal control [2]. The data collection of floating cars has the advantages of high flexibility, strong real-time performance, wide coverage, and high data precision, when compared to that of fixed detectors [3].
Existing researches usually relied on traffic parameters of fixed detectors to predict the AS in a road segment. The low accuracy is the main barrier for its wide application. Cetin and Comert [4] utilized the coil dataset published by California Path and then proposed the expectation maximization and Cumulative Sum (CUSUM) algorithms to predict the average traffic speed. Chandra and Al-Deek [5] mined the interaction between the upstream and downstream segments using dual-loop detector data and predicted the AS of road segments designed by a vector self-decreasing time series model. Jing et al. [6] assessed the multistep speed predictive performance of eight different models using 2-min road segment speed data collected from remote traffic microwave sensors. All above approaches failed to consider the traffic state of adjacent intersections, so it is difficult to accurately demonstrate the traffic state of urban roads via data acquisition from fixed detectors.
In this study, we proposed a novel road segment AS prediction model based on floating car GPS data (FCG-ASpredictor), which adopted a spatiotemporal correlation calculation method and a recursive least squares-extended Kalman filter (RLS-EKF) to solve current issues. Finally, we identified our approach on the AS prediction on four road segments in Chengdu and found that FCG-ASpredictor is feasible and highly accurate.
The rest of the paper is organized into five sections. Section 2 analyzes the data association. Section 3 describes the materials and methods. Section 4 illustrates the experimental results. Section 5 discusses the evaluation and feasibility. Finally, conclusions are drawn in Section 6.

Data Association Analyses
Based on the GPS data of floating cars in Chengdu in November 2016, we adopted a K-means Clustering algorithm to calculate the frequency distribution. Based on the frequency-intensive areas of GPS data and historical data, we also used the Pearson correlation coefficient [29] to analyze the correlation of the AS. In addition, we analyzed the impact of other sudden factors (such as weather, date attributes, etc.) on the AS of road segments.

Historical Data Correlation Analyses
A traffic dataset that contains time-series data is chronologically consecutive. As a typical time-series set, the AS of a road segment is analyzed on an hourly basis, with which we can comprehensively study its internal relationship. The current time interval is closely related to the AS of the adjacent road segments. As shown in Figure 1, the AS of the north third section of the First Ring Road for the time span (1 November to 7 November 2016) is selected. In addition to an obvious sudden change in the AS during the traffic rush hours, the data correlation between these two adjacent timeslots is large, and the trend of change is coherent.
A correlation coefficient analysis is a statistical method that reflects the close relationship between variables [30] and can be used to reveal the degree of influence on traffic conditions during adjacent hours. The Pearson correlation coefficient is a measure of the strength of a linear relationship between two variables. In this study, the Pearson correlation coefficient was used to analyze the correlation between the AS of the road segment during these adjacent timeslots. Recently, existing researches based on motion detectors have had these problems. On one hand, the accuracy of AS prediction would be affected when the road segment data is not sufficient or a random event occurs. On the other hand, the predictive accuracy of Machine Learning methods such as NNs, SVM, and SVR usually depend on the training quality of the dataset.
In this study, we proposed a novel road segment AS prediction model based on floating car GPS data (FCG-ASpredictor), which adopted a spatiotemporal correlation calculation method and a recursive least squares-extended Kalman filter (RLS-EKF) to solve current issues. Finally, we identified our approach on the AS prediction on four road segments in Chengdu and found that FCG-ASpredictor is feasible and highly accurate.
The rest of the paper is organized into five sections. Section 2 analyzes the data association. Section 3 describes the materials and methods. Section 4 illustrates the experimental results. Section 5 discusses the evaluation and feasibility. Finally, conclusions are drawn in Section 6.

Data Association Analyses
Based on the GPS data of floating cars in Chengdu in November 2016, we adopted a K-means Clustering algorithm to calculate the frequency distribution. Based on the frequency-intensive areas of GPS data and historical data, we also used the Pearson correlation coefficient [29] to analyze the correlation of the AS. In addition, we analyzed the impact of other sudden factors (such as weather, date attributes, etc.) on the AS of road segments.

Historical Data Correlation Analyses
A traffic dataset that contains time-series data is chronologically consecutive. As a typical time-series set, the AS of a road segment is analyzed on an hourly basis, with which we can comprehensively study its internal relationship. The current time interval is closely related to the AS of the adjacent road segments. As shown in Figure 1, the AS of the north third section of the First Ring Road for the time span (1 November to 7 November 2016) is selected. In addition to an obvious sudden change in the AS during the traffic rush hours, the data correlation between these two adjacent timeslots is large, and the trend of change is coherent.
A correlation coefficient analysis is a statistical method that reflects the close relationship between variables [30] and can be used to reveal the degree of influence on traffic conditions during adjacent hours. The Pearson correlation coefficient is a measure of the strength of a linear relationship between two variables. In this study, the Pearson correlation coefficient was used to analyze the correlation between the AS of the road segment during these adjacent timeslots. The AS of 500 road segments in the main area is divided to 24 timeslots during November. The average Pearson correlation coefficient is formulated as follows: The AS of 500 road segments in the main area is divided to 24 timeslots during November. The average Pearson correlation coefficient is formulated as follows: where Pe(t, t + 1) is the Pearson correlation coefficient for adjacent timeslots, X is the AS dataset of all of the same timeslots over 30 days, Cov(X t , X t+1 ) is the AS covariance of road segments in adjacent timeslots within 30 days, and Var(·) is the variance. As shown in Figure 2, the Pearson correlation coefficient between adjacent hours in the range of 24 timeslots is positive-that is, the current AS of the road segment has a correlation with the forward timeslot under normal conditions. According to the aforementioned analysis, the AS of the current timeslot is correlated with the forward timeslot, but the correlation is not large when traffic rush hours are encountered. Therefore, the AS prediction only considers the data value of the forward timeslot, leading to a low accuracy. The historical data of different timeslots are an important component of the AS prediction of the road segment. By comprehensively analyzing the influencing weights of different timeslots in the historical AS data, the accuracy of the AS prediction can be improved.
where ( , 1) Pe t t + is the Pearson correlation coefficient for adjacent timeslots, Χ is the AS dataset of all of the same timeslots over 30  As shown in Figure 2, the Pearson correlation coefficient between adjacent hours in the range of 24 timeslots is positive-that is, the current AS of the road segment has a correlation with the forward timeslot under normal conditions. According to the aforementioned analysis, the AS of the current timeslot is correlated with the forward timeslot, but the correlation is not large when traffic rush hours are encountered. Therefore, the AS prediction only considers the data value of the forward timeslot, leading to a low accuracy. The historical data of different timeslots are an important component of the AS prediction of the road segment. By comprehensively analyzing the influencing weights of different timeslots in the historical AS data, the accuracy of the AS prediction can be improved. The historical influencing factor includes forward timeslot data and historical simultaneous timeslot data-that is, the AS of the previous six timeslots and the AS of the previous seven days of historical timeslots, respectively. The Pearson correlation coefficient of the 13 influencing factors in each road segment is calculated. Take the four road segments in Chengdu, China as an example. Their Pearson correlation coefficients are listed in Table 1, indicating that the AS of the road segment is closely related to the AS of the forward timeslots and the AS of the historical simultaneous timeslots. The AS of the four road segments for the previous 1-6 h and the first, sixth, and seventh historical simultaneous timeslots are positively correlated. Meanwhile, the AS correlation degree of the forward timeslot decreases with time, which indicates that the above nine influencing factors are considerations of the AS prediction of the four target segments. The degree of correlation varies with the road segment and timeslot.

Correlation Analyses of Other Factors
It can be seen from the foregoing correlation analysis that the traffic dataset changes in chronological order and has coherence, and the historical simultaneous timeslot data and the forward timeslot data have different degrees of influence on the current timeslot data. However, the daily traffic status does not completely obey the normal law of historical data. When affected by external dynamic factors such as weather, date attributes, and emergencies, the traffic status may cause special circumstances, making the traffic situation deviate from the long-term trend [31,32]. Emergencies have greater randomness and unpredictability, and the corresponding datasets are The historical influencing factor includes forward timeslot data and historical simultaneous timeslot data-that is, the AS of the previous six timeslots and the AS of the previous seven days of historical timeslots, respectively. The Pearson correlation coefficient of the 13 influencing factors in each road segment is calculated. Take the four road segments in Chengdu, China as an example. Their Pearson correlation coefficients are listed in Table 1, indicating that the AS of the road segment is closely related to the AS of the forward timeslots and the AS of the historical simultaneous timeslots. The AS of the four road segments for the previous 1-6 h and the first, sixth, and seventh historical simultaneous timeslots are positively correlated. Meanwhile, the AS correlation degree of the forward timeslot decreases with time, which indicates that the above nine influencing factors are considerations of the AS prediction of the four target segments. The degree of correlation varies with the road segment and timeslot.

Correlation Analyses of Other Factors
It can be seen from the foregoing correlation analysis that the traffic dataset changes in chronological order and has coherence, and the historical simultaneous timeslot data and the forward timeslot data have different degrees of influence on the current timeslot data. However, the daily traffic status does not completely obey the normal law of historical data. When affected by external dynamic factors such as weather, date attributes, and emergencies, the traffic status may cause special circumstances, making the traffic situation deviate from the long-term trend [31,32]. Emergencies have greater randomness and unpredictability, and the corresponding datasets are limited. Therefore, this work mainly analyzes the influence of other external factors such as weather and date attributes on the traffic road segment speed. Rain and snow worsen the road conditions and gradually result in traffic congestion, as shown in Figure 3. Since the period of the sixth day to the ninth day was rainy in November, the 16 days of each hour of the road segment AS in November were selected (the four days of the sixth day to the ninth day had light rain, and the remaining 12 days were cloudy or sunny under the same week attribute). The abscissa of Figure 3 represents the AS of the road segment over 24 h (km/h), and the ordinate indicates the date in different types of weather. The deeper the red color, the slower the AS. Meanwhile, the darker the blue color, the faster the AS. It is clear that the AS during the light rainy days is basically slower than that of other days, so the influence of external factors such as the weather on traffic congestion cannot be ignored. Figure 4 demonstrates that the AS of a road segment is different in the state presented on weekdays and weekends. The abscissa represents the AS of a road segment over 24 h each day (km/h), and the ordinate represents the date of two consecutive weeks (seventh day to 11th day and 14th day to 18th day are the weekdays; and the 12th day, 13th day, 19th day, and 20th day are the weekends) in the figure. This indicates that the phenomenon of morning and evening rush hours is brighter on weekdays, while it is weakened during the weekend. This is clearly related to people's travel behavior: people need to go to work on weekdays, and they travel less on weekends.

Materials and Methods
The traffic flow system is a highly correlated system, and a change is random at a certain moment, which makes traffic status prediction difficult. RLS can realize the real-time estimation of system parameters and has a great influence on model identification accuracy under noisy conditions. An EKF can be applied to nonlinear system prediction, but it is susceptible to the accuracy of the state estimation.
In order to compensate for the defects of the respective methods and solve the issue of insufficient road segment data, the main idea of FCG-ASpredictor is shown in Figure 5. By establishing multiple regression equations, the historical AS obtained by the spatiotemporal correlation calculation method and the external factors (i.e., weather and date attribute) of the current timeslot are identified by the RLS. The measured values and observed values are adopted by the EKF to improve the predictive accuracy of the AS of the target road segment.

Materials and Methods
The traffic flow system is a highly correlated system, and a change is random at a certain moment, which makes traffic status prediction difficult. RLS can realize the real-time estimation of system parameters and has a great influence on model identification accuracy under noisy conditions. An EKF can be applied to nonlinear system prediction, but it is susceptible to the accuracy of the state estimation.

Study Area and Data Sources
Chengdu, as the capital of Sichuan Province in China, is an important central city in the western region. Its geographical coordinate range is 30 • 05′-31 • 26′ latitude and 102 • 54′-104 • 53′ longitude. It is consist of 20 districts, covering the total area of 14,335 km 2 , with a resident population of 16.33 In order to compensate for the defects of the respective methods and solve the issue of insufficient road segment data, the main idea of FCG-ASpredictor is shown in Figure 5. By establishing multiple regression equations, the historical AS obtained by the spatiotemporal correlation calculation method and the external factors (i.e., weather and date attribute) of the current timeslot are identified by the RLS. The measured values and observed values are adopted by the EKF to improve the predictive accuracy of the AS of the target road segment.

Study Area and Data Sources
Chengdu, as the capital of Sichuan Province in China, is an important central city in the western region. Its geographical coordinate range is 30 • 05 -31 • 26 latitude and 102 • 54 -104 • 53 longitude. It is consist of 20 districts, covering the total area of 14,335 km 2 , with a resident population of 16.33 million. This paper selected the central urban areas, the Wuhou, Jingjiang, Qingyang, Jinniu, and Chenghua districts, as the study areas.
Due to the high sampling frequency of floating cars data, we employ the dataset (i.e., order details) from the Chengdu branch of Didi Chuxing, The sampling frequency is 3 s. The data size is 462 GB, and each record includes: (1) driver ID; (2) order ID; (3) timestamp; (4) latitude; (5) longitude; and (6) vehicle status. The raw data format is shown in Table 2.

The Computational Procedures of AS
The AS of the road segment usually refers to the AS of travel through the road segment. We employ the travel speed of the road segment by using the accumulated integral of the instantaneous speed, and obtain the AS of the road segment.
According to the position and timestamp of the adjacent position belonging to the same order ID, the distance between adjacent positions can be calculated by using the spherical distance formula. The time interval can be calculated by the timestamp of the adjacent positions. The instantaneous speed of each position is calculated as follows: where v is the instantaneous speed, r is the earth radius, x 1 and x 2 are the latitudes of the adjacent positions, y 1 and y 2 are the longitudes of the adjacent positions, and T 1 and T 2 are the time stamps of the adjacent positions.
The travel distance of a positioning car based on the accumulated integral is calculated as follows: where d tra is the travel distance, t(·) is the GPS positioning time, and v(·) is the instantaneous speed.
Since the sampling frequency is fixed, Formula (3) is modified as follows: where t int is the fixed time interval. According to the travel distance and time interval, the travel speed is calculated as follows: where v tra is the travel speed.
Owing to the uneven distribution of the floating car in the urban road network, the speed measurement accuracy is degraded, and the AS prediction of the road segment is considered from the distribution of the floating car. In order to ensure accurate calculation of the AS of the road segment, the number of travel speed samples n at a certain time should not be less than the minimum number of samples n min . If the number of travel speed samples n is insufficient, then the historical AS and AS of the upstream and downstream segments during the simultaneous timeslot need to be integrated.
In addition, if the cumulative number m of continuous travel speed samples is greater than the maximum value m max , this indicates that the number of travel speed samples in the previous m max timeslots is continuously less than the minimum number of samples n min , and the AS of the upstream and downstream segments in the simultaneous timeslot is insufficient to reflect the current traffic status. Then, it is necessary to integrate the historical AS of road segments. The spatiotemporal correlation calculation process of AS is shown in Figure 6.  If the travel speed sample number n of the road segment at timeslot t is smaller than the minimum sample number nmin, then the historical AS and the simultaneous AS of the upstream and downstream segments are integrated as follows: Figure 6. The process of AS calculation.
The formula for calculating the AS of the road segment is as follows: where v(t) is the AS of the road segment during timeslot t, n is the number of travel speed samples, and v tra i (t) is the ith travel speed at timeslot t. If the travel speed sample number n of the road segment at timeslot t is smaller than the minimum sample number n min , then the historical AS and the simultaneous AS of the upstream and downstream segments are integrated as follows: where v h (t) is the estimated historical AS of the road segment, and v a (t) is the estimated AS of the upstream and downstream segment during the current timeslot. The control parameters n min and m max are derived from the example calibration. v h (t) and v a (t) are calculated by weighting the corresponding correlation speeds. The weighting formula is as follows: where v h (t) and v h (t − 1) are the AS of the historical simultaneous timeslot and the AS of the forward timeslot, respectively; v u (t) and v d (t) are the AS of the upstream and downstream segments during the current timeslot, respectively; and α and β are weight coefficients that are adjusted according to the measurement of actual data.

Establishment of Multiple Regression Equations
According to the impact of historical AS, weather, and date attributes on AS prediction, the degree of influence between the AS of the target road segment and the historical AS is calculated by the Pearson correlation coefficient.
The AS in the historical simultaneous timeslots of the previous n t days, the AS of the previous n p timeslots during a day, the weather value of the current timeslot, and the date attribute value of the current timeslot are selected. The following multiple regression equation for predicting the AS value is established: v k (t) = a 1 a 2 · · · a n t where v k (t) is the predicted AS during timeslot t of the kth day, v k−1 (t), . . . , v k−n t (t) are the AS in the historical simultaneous timeslot t of the previous n t days, and v k (t − 1), . . . , v k (t − n p ) are the AS in the previous n p timeslots of the kth day. x k,1 (t) and x k,2 (t) are the weather-quantized value and the date-attribute-quantized value, respectively, in timeslot t of the kth day; these need to be quantified according to the standard. a 1 . . . a n t ,b 1 . . . b n p ,c 1 and c 2 are the influence weights of each system variable on the predicted value.

System Identification of RLS Method
The system parameters are identified and updated according to Formula (10). The transformed recursive equation is as follows: v k (t) = ϕ k T (t)θ + e k (t) (11) where ϕ k T is the AS value of the road segment, θ is the identified parameter vector, and e k (t) is the error caused by observation noise. ϕ k T and θ are recorded as vectors as follows: Combining Formulas (11), (12), and (13), the system parameter identification gain and the error covariance matrix are updated. The least-squares equation is expressed as follows: where K k (t) is the parameter identification gain for timeslot t, P k (·) is the error covariance matrix of different timeslots, and I is the identity matrix of the identification parameter. According to Formulas (6), (7), and (11) to (15), the recursive formula for system parameter identification during timeslot t is expressed as follows: whereθ k (·) is the least-squares estimate of the system parameters for different timeslots, and v(t is the correction term of the identified parameter estimation for timeslot t−1.

Implementation of EKF
It can be seen from Formula (10) that the AS prediction model includes nonlinear external factors such as the weather and date attributes. This study uses an EKF algorithm to improve the AS prediction accuracy of the target segment. For the sake of simplicity, Formula (10) is modified as follows: x(t) = a 1 a 2 · · · a n t . . .
where ∆ is the number of timeslots in a day (assuming the length of the timeslot and the number of timeslots remain constant), x(t) is the AS prediction of the road segment, x(t − ∆), · · · , x(t − n t ∆) are the AS in the historical simultaneous timeslots of the previous n t days, and x(t − 1), · · · , x(t − n p ) are the AS of the previous n p timeslots during a day. According to Formula (17), the standard form of the state equation and the observation equation are expressed as follows: where x(t) and y(t) are state and observation vector values, respectively; w(t − 1) is the system process noise; m(t) is the observation noise; and f (X(t − 1)) and g(X(t)) are nonlinear mapping functions of the state equations and observation equations, respectively. X(t − 1), f (X(t − 1)), and g(X(t)) are expressed as follows: g(X(t)) ≈ g X(t) + B(t) X(t) − X(t) (21) where X is the estimated value of X, and A and B are the system state matrix and the observation matrix, respectively. According to Formulas (17) to (21), A and B are derived as follows: The three components of state vector X(t − 1) in Formula (19) are x(t − n), x k,1 (t), and x k,2 (t). They are partial derivatives. A and B are converted to a Jacobian matrix: The corresponding parameters A 1 , A 2 , and A 2 in Formula (24) are calculated as follows: Since the specific values of parameters A 1 , A 2 , and A 3 corresponding to Formula (25) are calculated by Formula (16), then A(t − 1) and B(t) are known values. Combining with the KF, the time update of Formula (17) is expressed as follows: where X(t) − is the prior estimate of the state vector at timeslot t, P is the covariance of the state vector estimation error, and Q is the covariance matrix of the process noise. According to Formulas (18), (23), (26), and (27), the observation update of Formula (17) is expressed as follows: where G(t) is the Kalman gain at timeslot t, X(t) + is the posterior estimate of the state vector at timeslot t, and R is the covariance matrix of the observation noise.

Results
Since traffic control and guidance require real-time prediction, the length of the traffic prediction horizon is short, usually no more than 1 h. In this study, the prediction horizons are set to 15 min, 30 min, and 1 h, respectively. All experiments are compiled and tested based on Python 3.7 and TensorFlow 1.13.1.

Data Preprocessing
In order to make the selected segments more objectively reflect the advantages of our approach, four road segment speed datasets were adopted under the different road types. The road segment information is shown in Table 3. According to the characteristics of the Chengdu urban network, the segments 01_521, 03_6479, 04_6276, and 06_28250 belong to the main urban road, general road, ring road, and outer ring road, respectively. The time span of these datasets is from June to November 2016. In order to be consistent with the comparison algorithm long short term memory-recurrent neural network (LSTM-RNN) [27] and autoregressive integrated moving average model-Kalman filter (ARIMA-KF) [28], we select the AS data from 1 June 2016 to 31 October 2016 as the training sample and the AS data from 1 November to 30 November 2016 as the forecast sample.
To solve the issue of data drift, we use map-matching technology to obtain a standard dataset. In terms of the driving characteristics of the floating car, the DiDi cars do not represent the normal traffic state of the road segment under the vehicle status of empty and parking. Thus, we remove the records of the vehicle status of empty and parking.
The datasets include the AS of the road segments [derived from Formulas (2) to (9)], and the quantified values of external factors such as the weather and date attributes. According to the degree of external factors affecting traffic flow [33], the weather and date attributes are quantified as shown in Table 4. Table 3. Road segment information.

01_521
Second Section of North Third Ring Road 03_6479 North Third Section of First Ring Road 04_6276 First Section of Hongxing Road 06_28250 Third Section of Jinxianqiao Road

Result Analysis
Corresponding to the data in Table 4, multivariate regression equations are established by taking the correlation factor values and the AS of four target road segments, respectively. Identified by the RLS method and considering nonlinear external factors such as weather and date attributes, the EKF algorithm is used to predict the AS for the current timeslot. Root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are used as the evaluation metrics. Figures 7-15 show that the AS predictions based on the RLS-EKF are superior to those according to other two algorithms. All evaluation values of AS predicted on three different prediction horizons based on the RLS-EKF are lower than those based on the other two algorithms, which means the accuracy and stability of AS predicted on three different prediction horizons based on the RLS-EKF are superior to those based on the other two algorithms for the four road segments.

Evaluation
The comparative analysis of proposed algorithms and existing algorithms are performed by three commonly used metrics in traffic prediction, including (1) RMSE, (2) MAE, and (3) MAPE. The three evaluation metrics are defined as follows: wherev(t i ), v(t i ) are the predicted value and estimated value at timeslot t i , respectively, and n is the timeslot number. The estimated value is a relative value that is obtained from the historical AS. To predict the AS of the current timeslot t, the AS in the historical simultaneous timeslots of the previous n t days, the AS of the previous n p timeslots, the weather value of the current timeslot, and the date attribute value of the current timeslot are selected. When entering the next timeslot, the AS of the timeslot t is calculated as the estimated value according to Formulas (6) to (9).
From Table 5, RLS-EKF achieves the better performance with all three metrics for all prediction horizons, and the advantage becomes more evident in the four road segments. All metrics of RLS-EKF are lower than those based on the other two algorithms (LSTM-RNN and ARIMA-KF). LSTM-RNN and ARIMA-KF are the latest traffic prediction approaches, and the difference between all three evaluation metrics of RLS-EKF and those based on other two algorithms are larger, which means that the experiments based on RLS-EKF have achieved good results.
From the perspective of three prediction horizons, all three metrics increase as the prediction horizon increases. The difference between all three evaluation metrics of one-h intervals and those of 30-min intervals are larger than the difference between those of 30-min intervals and 15-min intervals, which means that long-term traffic forecasting needs to consider more alternative influencing factors for optimization.
From the perspective of four road segments, the errors of road segment 01_521 and 04_6276 are larger than those of other two road segments.

Feasibility
The multiple regression equations in Section 3.3 contain four factors: (1) the AS of the historical simultaneous timeslot (AS-hst); (2) the AS of the forward timeslot (AS-ft); (3) the weather condition of the current timeslot (WC-ct); and (4) the date attribute of the current timeslot (DA-ct). In order to demonstrate the influence of four factors on the AS prediction, according to the equations in Section 3.3, we select five influencing cases, which respectively leave out the AS of the historical simultaneous timeslot (Miss-AS-hst), the AS of the forward timeslot (Miss-AS-ft), the weather condition of the current timeslot (Miss-WC-ct), the date attribute of the current timeslot (Miss-DA-ct), and lastly, do not have any missing factor.
The RMSE of five different influencing cases based on RLS-EKF is illustrated in Table 6. We further analyzed the feature contributions of five different influencing cases toward four road segments for three predicted dimensions. All the RMSEs of the no missing factor case are lower than those of four missing factor cases such as Miss-AS-hst, Miss-AS-ft, Miss-WC-ct, and Miss-DA-ct in the same road segment and predicted horizon, which means that the four factors of the equations in Section 3.3 are contributed to improve the predicted accuracy. In addition, Miss-AS-hst is the highest, Miss-AS-ft is the second highest, Miss-WC-ct is the third highest, and Miss-DA-ct is the lowest in the RMSE comparison of four missing factor cases. That means that AS-hst is the largest, AS-ft is the second largest, WC-ct is the third largest, and DA-ct is the smallest regarding the feature contributions of the predicted accuracy.
To improve the predicted accuracy of the FCG-ASpredictor, it is reasonable and feasible to select AS-hst, AS-ft, WC-ct and DA-ct as import impact factors of the equations in Section 3.3.