Multi-Step Traffic Speed Prediction Based on Ensemble Learning on an Urban Road Network

Abstract: Short-term traffic speed prediction plays an important role in the field of Intelligent Transportation Systems (ITS). Traffic speed forecasting can usually be divided into single-step-ahead and multi-step-ahead prediction. Compared with the single-step method, multi-step prediction gives road traffic participants more information on future traffic conditions to guide their decisions. This paper proposes a multi-step traffic speed forecasting method that uses an ensemble learning model together with a traffic speed detrending algorithm. Firstly, a correlation analysis is conducted to determine the representative features by considering the spatial and temporal characteristics of traffic speed. Then, the traffic speed time series is split into a trend set and a residual set via a detrending algorithm. Thirdly, a multi-step residual prediction with the direct strategy is formulated using a stacking ensemble learning model that integrates support vector machine (SVM), CATBOOST, and K-nearest neighbor (KNN). Finally, the forecast traffic speed is obtained by adding the predicted residual part to the trend part. In tests that used field data from Zhongshan, China, the experimental results indicate that the proposed model outperforms benchmark models such as SVM, CATBOOST, KNN, and BAGGING.


Introduction
Various types of vehicles have pushed human society forward by making the mobility of people and goods possible, providing a faster and more comfortable travel experience, facilitating social interactions, and so on. Nevertheless, the rapidly increasing number of vehicles has also brought severe problems to cities worldwide. Apart from consequences like global warming and fossil fuel depletion, traffic congestion is one of the most negative effects that can be perceived by each traffic participant, and it inevitably results in a series of problems, such as traffic accidents, energy overconsumption, and significant travel delay [1]. In 2017, INRIX released the Urban Mobility Scorecard Annual Report, which showed that traffic congestion was a significant challenge in a large number of major cities around the world. According to this report, urban Americans spent a total of an extra 8.8 billion hours and purchased an extra 3.3 billion gallons of fuel because of congestion in 2017, giving a direct congestion cost of $166 billion [2]. Transportation and traffic researchers believe that Intelligent Transportation Systems (ITS) are a promising solution to improve transportation management and can provide much better services that eventually lead to less congestion than traditional methods [3,4]. Among such services, traffic prediction plays an important role in ITS because forecasting information can be utilized to support traffic guidance, signal optimization, and so on. For example, by using forecasting information such as traffic speed, travel time, and traffic condition, travelers can re-plan their paths to avoid congestion and incidents, which could save them time and cost [5]. Moreover, accurate and timely speed prediction has become a key issue in traffic prediction, and in ITS more broadly. Correspondingly, it has led to an intensive body of work on traffic speed prediction in recent years.
However, some major challenges in short-term traffic forecasting have been pointed out as follows [6,7]:
(1) Traffic prediction based on spatial-temporal characteristics.
(2) Further exploration of Artificial Intelligence (AI) in traffic flow prediction.
(3) Multi-step prediction for real-life ITS applications to provide a relatively long-term view of the future traffic situation for road users and government.
Based on the aforementioned issues, a novel multi-step speed prediction model is proposed by considering spatial-temporal dependencies and using ensemble learning. The developed method separates the original data into mean and residual time series, and then employs the direct strategy and the stacking ensemble learning framework to forecast the residual time series multiple steps ahead. The main contributions of this paper are listed as follows:
(1) A novel multi-step prediction with detrending and direct strategy is achieved by the ensemble learning model of stacking (DDSELM) to forecast travel speed using spatial-temporal characteristics.
(2) The proposed multi-step model is validated by using a very large field dataset of hourly average link traffic speed, which reveals that it has good performance.
The remaining part of this paper is organized as follows: In Section 2, a summary of the state-of-the-art research on traffic speed prediction is presented. Then, Section 3 formulates a new multi-step prediction model with ensemble learning. Subsequently, a field dataset from Zhongshan, China, is employed to validate the effectiveness of our model in Sections 4 and 5. Finally, the conclusion and future work are presented in Section 6.

Related Work
This section reviews the relevant background on traffic speed prediction, including parametric and non-parametric methods [8,9], machine learning, and multi-step prediction.
Parametric methods, a well-structured family of models, estimate the model parameters based on the training data and have been widely used for traffic forecasting. For example, the auto-regressive integrated moving average (ARIMA) model was proposed in the 1970s to predict short-term freeway traffic data [10]. Additionally, Voort et al. [11] proposed a KARIMA prediction model to forecast traffic flow, which combined Kohonen maps with ARIMA time series. Then, Williams et al. [12] provided a theoretical and empirical analysis of a seasonal ARIMA method, and Kumar et al. [13] extended it to scenarios with limited input data. In Kumar's scheme, the prediction of the next-day situation (24 h, starting from the prediction moment) was based only on the historical data of the last three days. In 1984, Okutani et al. [14] applied Kalman filtering (KF) theory to traffic prediction and showed that KF could perform well in traffic prediction, and Guo et al. [15] introduced the adaptive KF approach to forecast stochastic short-term traffic flow. Along this line, Mir et al. [16] presented a KF model for travel speed prediction by minimizing the variance between the real-time speed measurement and its prediction. Zambrano-Martinez et al. [17] presented an intuitive formula to predict link travel time based on the degree of traffic congestion for route choice optimization.
Unlike the parametric methods, non-parametric ones can flexibly capture the stochastic and nonlinear features of the traffic state (i.e., speed, flow, occupancy, travel time). Vlahogianni et al. [8] pointed out that traffic forecasting methods based on computational intelligence (CI) have gradually replaced the traditional statistical ones, because they need few or no prior assumptions about the input variables. As typical representatives, artificial neural networks (ANNs) have been successfully applied in many transportation domains [18,19]. ANNs are mathematical models that formulate information processing systems by imitating the structure and function of the neural network of the brain. For example, Vlahogianni et al. [20] proposed an advanced, genetic-algorithm-based multi-layered structural optimization strategy to predict traffic flow. Different from ANNs, Zhang et al. [21] proposed a wavelet-based higher-order spatial-temporal (Wavelet-HST) method to accurately predict network-scale traffic speed, with an improvement of 7.8% to 10.5% in the root mean square error relative to six benchmark methods. Moreover, Cai et al. [22] improved the original KNN model based on spatiotemporal correlation for traffic prediction.
In recent years, with the rapid development of machine learning and deep learning techniques, more and more ITS researchers have begun to adopt these techniques for high-accuracy traffic prediction. As pointed out by Ma et al., the LSTM-NN can overcome the problem of back-propagated error decay by using memory blocks and has a better capability for time series prediction with long temporal dependency [23]. Additionally, a single-step support vector machine (SVM) with spatiotemporal parameters was proposed in 2017, which provided short-term (5-min) traffic speed prediction results with errors ranging from 3.31% to 15.35% [24]. Moreover, Dong et al. [25] developed an extreme gradient boosting (XGBOOST) model with wavelet decomposition and reconstruction to predict short-term traffic flow, which outperformed SVM.
Although single models have been studied by many researchers and proved suitable for many cases, they still have some shortcomings [25]. An alternative is to fuse the results of different prediction methods into a combined one, which can achieve better prediction accuracy than a single predictor. For example, ensemble learning models have been shown to achieve much better prediction accuracy than individual ones [26]. Nowadays, ensemble learning is used in many areas of traffic prediction, such as traffic sign detection and recognition [27], traffic speed [28], short-term traffic volume [29], and traffic incident detection [30].
However, current ensemble methods are not explicitly designed to deal with spatiotemporal data, and how to effectively ensemble multiple models while utilizing the spatiotemporal information remains a challenging, but practical, problem.
More and more scholars are turning their attention to multi-step prediction. Usually, multi-step traffic prediction gives drivers and traffic agencies more opportunity and time to make better decisions in advance than one-step prediction. Zhang et al. [31] reported a hybrid deep ensemble approach that integrates a 3D convolutional neural network (CNN) with ensemble empirical mode decomposition (EEMD), and it maintained high performance as the prediction time step increased from 1 to 6. Notably, as the prediction time step increases, the evolving fuzzy neural network (EFNN) model that considers the periodic pattern can also outperform other models (ANN, SVM, ARIMA, vector autoregressive model), with smaller prediction errors and a slower growth rate of errors [5]. Zhang et al. [7] proposed a novel deep learning framework named attention graph convolutional sequence-to-sequence model (AGC-Seq2Seq) to accurately capture the temporal heterogeneity in multi-step traffic speed prediction. Papathanasopoulou et al. [32] embedded a microscopic car-following traffic simulation model into dynamic multi-step traffic prediction, which led to less than 10% error in speed prediction even for ten steps into the future.

Prediction Methodology
Previous studies show that ensemble learning can be used for traffic prediction with good performance. Thus, this paper develops an ensemble learning model for multi-step traffic speed forecasting with the direct strategy, namely DDSELM, to process the given time series data. Firstly, a correlation analysis is conducted to identify the key factors affecting speed forecasting. Then, a detrending algorithm is developed to divide the speed dataset into a trend part (i.e., mean set) and a residual part, and the direct strategy and stacking ensemble learning are employed to predict the multi-step residuals. Finally, the multi-step residuals are combined with the trend sets to form the final prediction results.

Direct Strategy
The direct strategy was first proposed by Cox in 1961 for multi-step prediction. This strategy establishes a separate model for each prediction step. The input variables of the direct strategy are observed values rather than predicted ones [33,34]. The framework of the direct strategy is shown in Figure 1.
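The following minimal sketch illustrates how training sets for the direct strategy could be assembled: one (inputs, target) set per prediction step, each built only from observed values. The function name `make_direct_datasets`, the lag window `n_lags`, and the sample speeds are illustrative assumptions and do not come from the paper.

```python
from typing import Dict, List, Tuple

Dataset = Tuple[List[List[float]], List[float]]


def make_direct_datasets(
    series: List[float], n_lags: int, horizons: List[int]
) -> Dict[int, Dataset]:
    """Build one (X, y) training set per horizon for the direct strategy.

    Every input window holds observed values only, so no predicted value
    is ever fed back as an input (hypothetical helper, not from the paper).
    """
    datasets: Dict[int, Dataset] = {}
    for h in horizons:
        X: List[List[float]] = []
        y: List[float] = []
        # t is the index of the last observed value in the input window.
        for t in range(n_lags - 1, len(series) - h):
            X.append(series[t - n_lags + 1 : t + 1])
            y.append(series[t + h])  # target lies h steps after the window
        datasets[h] = (X, y)
    return datasets


# Illustrative hourly speeds (made-up numbers).
speeds = [30.0, 32.0, 35.0, 33.0, 28.0, 25.0, 27.0, 31.0]
data = make_direct_datasets(speeds, n_lags=3, horizons=[1, 2, 3])
```

A separate regressor would then be fitted on `data[1]`, `data[2]`, and `data[3]`, so the three-step-ahead model never depends on the output of the one-step-ahead model.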

Feature Construction
Notably, representative features determine the performance of the forecasting model. In order to determine the appropriate model inputs, this study initially chooses ten spatiotemporal candidate variables of travel speed and flow for correlation analysis, as shown in Table 1, which involve the time of day, day of week, and upstream and downstream connected links. Correlation analysis is a statistical method that studies the correlation between two or more random variables. The Pearson correlation coefficient, proposed by Pearson in 1895, is one of the most influential coefficients in correlation analysis and is used here to select the final representative features for the prediction model [35].
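As a small illustration of the correlation analysis described above, the Pearson coefficient between a candidate feature and the target speed can be computed as follows. The function name `pearson_r` and the sample values are illustrative assumptions; the paper does not give its own implementation.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (norm_x * norm_y)


# Made-up example: target speed versus the same hour on the previous day.
speed_today = [30.0, 32.0, 35.0, 33.0, 28.0]
speed_yesterday = [29.0, 33.0, 34.0, 32.0, 27.0]
r = pearson_r(speed_today, speed_yesterday)
```

Candidates whose coefficient with the target is close to zero would be the natural ones to drop; any cut-off value is a modelling choice that the text does not specify.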

Table 1. Constructed feature candidates for the prediction model.

Representative Feature      Description
v(d,t)                      Speed at time t, day d
v(d−1,t)                    Speed at time t, day d−1
...                         ...
v(d,t), upstream link       Upstream speed at time t, day d
v(d,t), downstream link     Downstream speed at time t, day d
flow(d,t)                   Flow at time t, day d

Detrending
Because the traffic speed time series used in this paper exhibit different spatio-temporal characteristics (i.e., workdays or weekends, peak or off-peak hours), it is reasonable to split the speed time series into mean trends and residuals via a detrending algorithm, and to develop the model to predict the residual time series. Following the previous literature [36], a simple average method is used to find the trend, which takes the average of the daily traffic speed series into account in Equation (1). The speed observation at the tth hour of the dth day can be formulated as follows:

$$v_{(d,t)} = \bar{v}_{d} + r^{v}_{(d,t)} \qquad (1)$$

where the first term on the right-hand side, $\bar{v}_{d}$, represents the average speed of the dth day (the flow series is decomposed in the same way), and $r^{v}_{(d,t)}$ ($r^{flow}_{(d,t)}$) represents the speed (flow) residual at the tth hour on the dth day, which constitutes the speed (flow) residual time series. Next, a prediction model is introduced to forecast the residuals.
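The decomposition into a daily mean and a residual can be sketched as below. This is only an illustration under stated assumptions: the function names and the sample speeds are hypothetical, and how the trend of a future day is obtained at prediction time is not specified in the text, so `restore_speed` simply takes the trend value as an argument.

```python
from typing import List, Tuple


def detrend_by_daily_mean(
    speeds_by_day: List[List[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Split each day's hourly speeds into a daily mean and residuals."""
    trends: List[float] = []
    residuals: List[List[float]] = []
    for day in speeds_by_day:
        daily_mean = sum(day) / len(day)  # trend term for this day
        trends.append(daily_mean)
        residuals.append([v - daily_mean for v in day])  # residual series
    return trends, residuals


def restore_speed(trend: float, predicted_residuals: List[float]) -> List[float]:
    """Add predicted residuals back to the trend to obtain speed forecasts."""
    return [trend + r for r in predicted_residuals]


# Two short made-up "days" of speeds.
days = [[30.0, 40.0, 50.0], [20.0, 20.0, 26.0]]
trends, residuals = detrend_by_daily_mean(days)
```

Here the residual series, not the raw speeds, would be passed to the multi-step predictor, and `restore_speed` performs the final addition of the predicted residual to the trend.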

Ensemble Learning
As mentioned above, ensemble learning can perform well in regression and classification tasks. Bagging, boosting, and stacking are the three conventional ensemble learning algorithms for integrating weak models into a strong one for applications in different fields [37]. The final prediction of the bagging algorithm is the average of all base learners (underlying models). Boosting is an ensemble meta-algorithm that builds a model by iteratively training a new model to emphasize the training samples misclassified by the previous model; a common boosting model is AdaBoost.
In terms of combination structure, stacking regression is an ensemble learning technique that combines multiple regression models via a meta-regressor; it was first introduced by Wolpert in 1992 [38]. Firstly, each individual prediction model is trained on the complete training set. Then, the meta-regressor is fitted on the outputs (meta-features) of the individual predictors. Thus, the stacking algorithm relies on the learning mechanism of the meta-regressor to combine all underlying predictors, and goes beyond the simple weighting mechanisms of boosting and bagging.
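The difference between averaging (bagging) and a learned combination (stacking) can be sketched with NumPy as follows. This is a simplified illustration, not the authors' implementation: the three columns of `base_preds` are synthetic stand-ins for the outputs of three base learners, the closed-form ridge fit and the value `alpha=0.1` are assumptions, and in practice the meta-features are often built from held-out or cross-validated predictions to limit overfitting.

```python
import numpy as np


def bagging_average(base_preds: np.ndarray) -> np.ndarray:
    """Combine base predictions by a plain average (one column per learner)."""
    return base_preds.mean(axis=1)


def fit_ridge_meta(base_preds: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Fit a ridge meta-regressor on base-learner outputs (meta-features).

    Closed-form solution with an unpenalised intercept, obtained by
    centring the meta-features and the target before solving.
    """
    x_mean = base_preds.mean(axis=0)
    y_mean = y.mean()
    xc = base_preds - x_mean
    yc = y - y_mean
    k = base_preds.shape[1]
    weights = np.linalg.solve(xc.T @ xc + alpha * np.eye(k), xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def stacking_predict(base_preds: np.ndarray, weights: np.ndarray, intercept: float) -> np.ndarray:
    """Apply the fitted meta-regressor to base-learner outputs."""
    return base_preds @ weights + intercept


# Synthetic base-learner outputs; the target is an uneven mix of them.
t = np.arange(48, dtype=float)
base_preds = np.column_stack([np.sin(t / 4.0), np.cos(t / 3.0), t / 480.0])
y = base_preds @ np.array([0.7, 0.2, 0.1])

weights, intercept = fit_ridge_meta(base_preds, y, alpha=0.1)
stack_mse = float(np.mean((stacking_predict(base_preds, weights, intercept) - y) ** 2))
bag_mse = float(np.mean((bagging_average(base_preds) - y) ** 2))
```

Because the meta-regressor can weight the base learners unequally, it fits this synthetic target more closely than the equal-weight average; the errors here are measured on the fitting data only, so the comparison says nothing about out-of-sample accuracy.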

Performance Indices
In this study, four traditional measures of effectiveness (MOEs) are used to evaluate the developed prediction method: mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), and coefficient of variation (CV). CV is widely used in engineering and applied statistics for quality assurance studies. These indices are calculated as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum \left| v_{(d,t)} - \hat{v}_{(d,t)} \right| \qquad (2)$$

$$\mathrm{MAPE} = \frac{100\%}{N}\sum \left| \frac{v_{(d,t)} - \hat{v}_{(d,t)}}{v_{(d,t)}} \right| \qquad (3)$$

$$\mathrm{MSE} = \frac{1}{N}\sum \left( v_{(d,t)} - \hat{v}_{(d,t)} \right)^{2} \qquad (4)$$

$$\mathrm{CV} = \frac{\hat{\sigma}}{\hat{\mu}} \qquad (5)$$

where $v_{(d,t)}$ and $\hat{v}_{(d,t)}$ represent the actual and the predicted traffic speeds, respectively, the sums run over the test samples, N is the number of test samples, $\hat{\mu}$ denotes the average value of the predicted speed, and $\hat{\sigma}$ is the standard deviation of the predicted speed.
In this study, the stacking regression model contains three underlying base learners (SVM, CATBOOST, and KNN), and the meta-regressor is the ridge regression method. The framework of the proposed method is shown in Figure 2.
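The four performance indices can be computed with a few lines of standard-library Python, as in the sketch below. The function names and sample values are illustrative, and the CV uses the population standard deviation (division by N), which is an assumption because the text does not state which estimator is used.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual speeds must be non-zero)."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def cv(predicted: Sequence[float]) -> float:
    """Coefficient of variation of the predicted speeds (std / mean).

    Assumption: population standard deviation (divide by N).
    """
    mean = sum(predicted) / len(predicted)
    std = math.sqrt(sum((p - mean) ** 2 for p in predicted) / len(predicted))
    return std / mean


# Made-up observed and predicted speeds.
actual = [40.0, 50.0, 60.0]
predicted = [44.0, 45.0, 60.0]
scores = {
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
    "MSE": mse(actual, predicted),
    "CV": cv(predicted),
}
```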

Case Study
The proposed model is evaluated with traffic data collected by ITS with Internet Plus from the Zhongshan Traffic Police Detachment. Zhongshan is one of the pilot cities implementing ITS applications in China and is able to automatically collect city-level traffic flow data at signalized intersections. The testbed is Xingzhong Rd, a two-way road with six motorized lanes, which is the busiest and most congested south-north corridor in the Zhongshan downtown area [39]. Northbound and southbound traffic flows were collected by loop detectors located several meters before the stop line at the signalized intersection between Zhongshan Rd. and Tiyu Rd., and the two-way link travel speed was collected by floating car detection (Figure 3). In this study, the pilot dataset on Xingzhong Rd, with a time interval of 1 h, was recorded over five weeks from 21 October to 24 November 2018. Following the correlation analysis described above, Pearson correlation coefficients were calculated for the 10 representative features, as shown in Tables 2 and 3.

Representative Feature Descriptions
Speed difference between measured speed and daily average value at time t belonging to the (d−3)th day
Flow difference between measured traffic flow and daily average value at time t belonging to the dth day
Upstream speed difference between measured speed and daily average value at time t belonging to the dth day
Downstream speed difference between measured speed and daily average value at time t belonging to the dth day

Discussion
The experiments in this study were run on a Windows 10 64-bit PC with a 4.00 GHz Intel(R) Core(TM) i7-4790K CPU and 16 GB of memory. The software used in our experiments is Jupyter 6.1.1 and Python 3.6. The key parameters of the four benchmark models are shown in Table 5, where n_neighbors is the number of nearest neighbors; depth denotes the depth of the trees; learning_rate is used for reducing the gradient step, which affects the overall training time (the smaller the value, the more iterations are required for training); loss_function represents the metric used during model training; C limits the importance of each point; gamma controls the width of the Gaussian kernel; kernel is the kernel function; alpha is the regularization strength; and random_state is the seed of the pseudo-random number generator used when shuffling the data, while random_seed has the same meaning as random_state.
The proposed forecasting model is evaluated by comparing it with four other predictors: SVM, CATBOOST, KNN, and BAGGING (an ensemble that averages the results of SVM, CATBOOST, and KNN). SVM can mitigate overfitting and generalizes well because it maps the input vectors into a high-dimensional space by means of reproducing kernels. However, SVM is slow in the test phase due to its high algorithmic complexity and needs a large memory capacity. CATBOOST uses an efficient modification of gradient boosting, ordered boosting, to overcome the problem of target leakage, and it performs well on small datasets, but training a CATBOOST model requires a great deal of time and memory. KNN is suitable for small datasets, but its predictions usually lag behind the observed time series. BAGGING is a combination of KNN, SVM, and CATBOOST, and outperforms each individual method.

Prediction Accuracy
The MOE results of the proposed DDSELM and the four benchmark models on the southbound and northbound road links are shown in Figure 4. Each subfigure shows one performance index of the five prediction models for three prediction steps [1 h (60 min), 2 h (120 min), 3 h (180 min)] into the future. Across the different steps in Figure 4, the prediction accuracy of each model decreases as the prediction step increases, for both the southbound and the northbound links. This result is consistent with the results of existing studies [30,33], which found that it is particularly difficult to conduct multi-step-ahead prediction due to the randomness and uncertainty of the travel speed.

Appl. Sci. 2021, 11, 4423

As shown in Figure 4, the proposed DDSELM yields lower errors than the individual models (SVM, CATBOOST, and KNN) and the ensemble benchmark (BAGGING) regardless of the prediction step; that is, it outperforms all four kinds of benchmark models. Among the individual methods, KNN performs better than SVM and CATBOOST for one-step-ahead prediction, but it suffers larger errors than the other two for three-step-ahead prediction in Figure 4a. Compared with KNN, DDSELM performs well regardless of the road direction and prediction step: the MAPE of northbound DDSELM is 1.16 percentage points lower than that of KNN (7.08% versus 8.24%) in one-step-ahead prediction, 1.58 points lower (8.77% versus 10.35%) in two-step-ahead prediction, and 1.56 points lower (10.34% versus 11.90%) in three-step-ahead prediction. In Figure 4b, the MAPE of southbound DDSELM is 2.10 percentage points lower than that of KNN (14.90% versus 17.00%) in one-step-ahead prediction, 1.05 points lower (16.99% versus 18.04%) in two-step-ahead prediction, and 4.30 points lower (17.82% versus 22.12%) in three-step-ahead prediction. Notably, the prediction accuracy of the northbound DDSELM is better than that of the southbound one, possibly because the correlation between the travel speeds of the northbound links is higher than that of the southbound links.
Furthermore, a more detailed analysis of the two ensemble models, DDSELM and BAGGING, was conducted for multi-step-ahead prediction over a weekday (Wednesday) and a weekend day (Saturday) in Figure 5. Regardless of the prediction step, both ensemble models perform much better during the off-peak hours (9:00~16:00) than during the peak ones (7:00~8:00 and 17:00~18:00), and DDSELM is consistently better than BAGGING. The reason might be that BAGGING only uses the average of all underlying prediction outputs to make up for the shortcomings of each individual prediction model, whereas DDSELM uses the ridge regression algorithm in the meta-learner. Ridge regression uses L2 regularization to reduce the prediction error. During the peak hours, both southbound and northbound predictions have much higher accuracy in the morning peak period (7:00~9:00) than in the evening one (17:00~19:00). In the evening peak period, single-step prediction is better than multi-step-ahead prediction for the northbound segments. This is because the traffic flow during the evening peak hours was much larger than in the morning and there was a sharp drop in travel speed around 17:00. Therefore, there is a certain difference in accuracy between the two directions. Compared with weekdays, the accuracy of one-step-ahead northbound prediction on weekends slightly increases (the error falls from 8.36% to 7.54%), while the accuracy of one-step-ahead southbound prediction decreases (the error rises from 9.60% to 13.79%). The multi-step predictions (two-step-ahead and three-step-ahead) show a similar trend: prediction accuracy increases in the northbound direction and decreases in the southbound direction. This may be because the southbound data are less strongly correlated than the northbound data in Tables 2 and 3.

Prediction Stability
Figure 6 shows the boxplots of the one-week prediction error. For the northbound prediction, the number of positive errors is larger than the number of negative errors for all prediction steps; that is, most northbound prediction outputs are larger than the observed values. For the southbound traffic, the number of positive errors is roughly equal to the number of negative errors. For the same prediction step, the fluctuation of the northbound prediction errors is smaller than that of the southbound ones. For example, the northbound one-step prediction error range is [−7.57, 17.12], while the southbound range is [−16.06, 17.50].
Figure 5. Panels: (e) two-step-ahead prediction result on Saturday in northbound; (f) three-step-ahead prediction result on Saturday in northbound; (g) one-step-ahead prediction result on Wednesday in southbound; (h) two-step-ahead prediction result on Wednesday in southbound; (i) three-step-ahead prediction result on Wednesday in southbound; (j) one-step-ahead prediction result on Saturday in southbound; (k) two-step-ahead prediction result on Saturday in southbound; (l) three-step-ahead prediction result on Saturday in southbound.

Prediction Stability
Figure 6 shows boxplots of the one-week prediction errors. For the northbound direction, positive errors outnumber negative errors at every prediction step; that is, most northbound predictions are larger than the observed values. For the southbound direction, the numbers of positive and negative errors are roughly equal. For the same prediction step, the northbound prediction errors fluctuate less than the southbound ones. For example, the northbound one-step prediction error range is [−7.57, 17.12], while the southbound range is [−16.06, 17.50].

The cumulative distribution function (CDF) is the integral of the probability density function and provides a complete description of the probability distribution of a real random variable. The CDFs of the five prediction models are plotted in Figure 7, where the x axis is the deviation of prediction error and the y axis is the cumulative probability. Ranked from best to worst, the models are DDSELM, BAGGING, KNN, CATBOOST, and SVM. DDSELM is also more stable than the other models. For northbound prediction, 83.33% of its one-step-ahead predictions have an error of less than 10%, compared with 71.42% for two-step-ahead and 70.83% for three-step-ahead. Correspondingly, for southbound prediction, errors of less than 10% account for 71.42% of one-step-ahead, 60.11% of two-step-ahead, and 58.76% of three-step-ahead predictions.

The CV is an important indicator of the dispersion of data. As shown in Figure 8, DDSELM has the lowest CV of the five models regardless of prediction step. In the northbound direction (Figure 8a), its CV reaches 0.09, 0.10, and 0.11 for one-step-ahead, two-step-ahead, and three-step-ahead prediction, respectively. The southbound CVs follow a similar pattern but are higher because of the lower correlation mentioned in Section 4.
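
The stability indicators discussed above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names and the sample speeds are hypothetical, and it assumes that CV denotes the coefficient of variation (standard deviation divided by mean); this section does not state which series the CV is computed on, so the function accepts any list of values.

```python
from statistics import mean, pstdev


def sign_counts(observed, predicted):
    """Count over-predictions (positive errors) and under-predictions."""
    errors = [p - o for o, p in zip(observed, predicted)]
    return sum(e > 0 for e in errors), sum(e < 0 for e in errors)


def relative_errors(observed, predicted):
    """Absolute prediction error as a fraction of the observed speed."""
    return [abs(p - o) / o for o, p in zip(observed, predicted)]


def share_within(errors, threshold):
    """Empirical CDF at `threshold`: fraction of errors strictly below it."""
    return sum(e < threshold for e in errors) / len(errors)


def coefficient_of_variation(values):
    """Assumed CV definition: population standard deviation over the mean."""
    return pstdev(values) / mean(values)


# Made-up speeds in km/h, for illustration only (not the Zhongshan data).
observed = [40.0, 50.0, 30.0, 20.0]
predicted = [42.0, 45.0, 36.0, 21.0]

errors = relative_errors(observed, predicted)
print(sign_counts(observed, predicted))   # (3, 1): mostly over-predictions
print(share_within(errors, 0.10))         # 0.5: half the errors are below 10%
print(coefficient_of_variation(predicted))
```

Evaluating `share_within` over a grid of thresholds gives one point of the empirical CDF per threshold, which is how a curve like those in Figure 7 could be traced for each model.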

Figure 1. The framework of the direct strategy.

Figure 2. The framework of the proposed predictor (DDSELM) in this study.

Figure 3. The layout of pilot intersections in the city of Zhongshan, China.

$v_{\mathrm{dif}}(d,t)$: Speed difference between measured speed and daily average value at time t on the dth day
$v_{\mathrm{dif}}(d-1,t)$: Speed difference between measured speed and daily average value at time t on the (d−1)th day
$v_{\mathrm{dif}}(d-2,t)$: Speed difference between measured speed and daily average value at time t on the (d−2)th day
$v_{\mathrm{dif}}(d-3)$

Figure 4. MOEs of five prediction models: (a) MAPE of five models in northbound; (b) MAPE of five models in southbound; (c) MAE of five models in northbound; (d) MAE of five models in southbound; (e) MSE of five models in northbound; (f) MSE of five models in southbound.

Figure 5. The prediction comparisons by two ensemble methods on Wednesday and Saturday: (a) one-step-ahead prediction result on Wednesday in northbound; (b) two-step-ahead prediction result on Wednesday in northbound; (c) three-step-ahead prediction result on Wednesday in northbound; (d) one-step-ahead prediction result on Saturday in northbound; (e) two-step-ahead prediction result on Saturday in northbound; (f) three-step-ahead prediction result on Saturday in northbound; (g) one-step-ahead prediction result on Wednesday in southbound; (h) two-step-ahead prediction result on Wednesday in southbound; (i) three-step-ahead prediction result on Wednesday in southbound; (j) one-step-ahead prediction result on Saturday in southbound; (k) two-step-ahead prediction result on Saturday in southbound; (l) three-step-ahead prediction result on Saturday in southbound.

Figure 6. Comparison between the real speed and the predicted speed in boxplots: (a) one-step-ahead prediction error in northbound; (b) two-step-ahead prediction error in northbound; (c) three-step-ahead prediction error in northbound; (d) one-step-ahead prediction error in southbound; (e) two-step-ahead prediction error in southbound; (f) three-step-ahead prediction error in southbound.

Figure 7. Cumulative distribution function of five prediction models: (a) one-step-ahead prediction CDF in northbound; (b) two-step-ahead prediction CDF in northbound; (c) three-step-ahead prediction CDF in northbound; (d) one-step-ahead prediction CDF in southbound; (e) two-step-ahead prediction CDF in southbound; (f) three-step-ahead prediction CDF in southbound.

Figure 8. Performance of five predictive models' results in CVs: (a) CVs of predictive results in northbound; (b) CVs of predictive results in southbound.

Conclusions
In order to tackle the challenge of multi-step traffic speed prediction, we proposed an ensemble model, i.e., the Detrending and Direct Strategy Ensemble Learning Model (DDSELM). The detrending technique could separate original dataset into mean trends

Table 1. Constructed feature candidates for the prediction model.

Table 4. Input candidates of the proposed prediction model in this study.
Predicted speed difference between predicted speed and daily average value at time t + h on the dth day
h: Prediction time step into the future, h ≥ 1
v

Table 5. The key parameters of four benchmark models in Python.