A Novel Hybrid Model for Short-Term Traffic Flow Prediction Based on Extreme Learning Machine and Improved Kernel Density Estimation

Abstract: Short-term traffic flow prediction underpins intelligent traffic control. However, conventional models cannot make accurate predictions because short-term traffic flow data are strongly nonlinear and random. To this end, the authors of this paper developed a novel hybrid model based on the extreme learning machine (ELM), adaptive kernel density estimation (AKDE), and conditional kernel density estimation (CKDE). Specifically, the ELM model was employed for nonlinear prediction. Then, AKDE was used to estimate the bandwidth of CKDE (i.e., AKDE-CKDE), which predicted the training residuals obtained by the ELM. Finally, the predictions of the two models were superimposed to derive the final prediction of the hybrid model. Two case studies based on measured data were conducted to evaluate the performance of the proposed method. The experimental results indicate that the proposed method significantly improves forecasting accuracy in comparison with the other models considered. For instance, it outperformed the single ELM model, improving the mean relative percentage error by 7.46%.


Introduction
With the acceleration of urbanization and the rapid increase in car ownership, traffic congestion in urban areas is becoming more and more serious, leading to a series of social problems, such as traffic accidents, air pollution, energy waste, and so on. These problems have greatly decreased the living standard of human beings. The emergence of the intelligent transportation system (ITS) has effectively alleviated traffic congestion and traffic accidents, thereby improving the efficiency of urban traffic operations and reducing environmental pollution [1].
Short-term traffic flow prediction is one of the crucial tasks of ITS. It aims to forecast the variation in traffic flow in the near future, from a few seconds to a few hours ahead, based on historical traffic data. The accuracy and efficiency of prediction play a decisive role in the performance of path guidance and transportation management [2]. In recent decades, scholars have put forward a wide variety of approaches to enhance prediction accuracy. Generally, research on traffic flow prediction falls into the following three categories: statistical models, intelligent models, and hybrid models.
The commonly used statistical models include time-series models (e.g., autoregressive integrated moving average (ARIMA), seasonal ARIMA (SARIMA), etc.) [3–6], the Kalman filtering model [7,8], the hidden Markov model [9], etc. All of them can capture the linear characteristics hidden in traffic flow data when appropriate parameters are selected. In general, statistical models may be more suitable for short-term forecasting and are widely used in practice owing to their simpler model structure and the lower requirement for [...] enhance its local adaptability [29]. Furthermore, the high complexity of traffic systems leads to significant randomness in traffic flow, and accounting for stochastic factors helps ensure accurate traffic flow prediction. Probabilistic density estimation can effectively quantify the uncertainty of traffic flow and provide more comprehensive information for traffic flow prediction.
In light of the above, it is essential to develop a novel and effective prediction method to further improve the accuracy of short-term traffic flow prediction. The authors of this paper developed an innovative hybrid short-term traffic flow forecasting model based on the ELM, AKDE, and CKDE. First, the ELM model was adopted to predict the original traffic flow sequence, and the training residuals were obtained. Second, AKDE was utilized to estimate the variance in each dimension of the reconstructed samples; the variance was then used to replace the relevant parameters of CKDE, and the AKDE-CKDE model was established to forecast the residuals. Third, the final prediction results were obtained by summing the prediction values of the ELM and AKDE-CKDE models. Finally, the proposed model was analyzed based on two groups of traffic flow data. To better exhibit the performance of the proposed model, the authors selected the ARIMA, LSSVM, ELM-CKDE, ELM, and CKDE methods for comparison. Some conclusions are drawn at the end.
The main contributions of the proposed model are:
• A novel hybrid predictor based on the ELM, AKDE, and CKDE is proposed for short-term traffic flow prediction. The main characteristic of the predictor is that it considers the nonlinearity and randomness characteristics of traffic flow data, making it more suitable for the actual situation;
• The corresponding parameters of CKDE are replaced by the variance in the reconstructed residual samples estimated by AKDE, which improves the model's adaptability. In addition, AKDE-CKDE can directly use the sample data for distribution estimation without any parameter assumptions;
• Through extensive experiments on two real-world datasets at the intersection of the main road in the main urban area of Chongqing, the results show that the proposed hybrid model can increase the precision of urban road traffic flow prediction.
The remainder of this article is organized as follows. The basic principles of the methods and the hybrid forecasting model are briefly introduced in Section 2. Two case studies were conducted based on actual traffic flow data, and the corresponding results and analysis are given in Section 3. Finally, Section 4 summarizes some of the main conclusions.

Materials and Methods
The ELM, as a single hidden layer feedforward neural network, can randomly initialize the weights and thresholds of the input layer and hidden layer and get the corresponding output weights. It has the advantages of fewer training parameters, faster learning speed, as well as better generalization performance [30]. On the other hand, the combination of AKDE and CKDE can effectively and quickly obtain the probability density function (PDF) of the target variable. The authors of this paper combined the merits of these two models and constructed a new hybrid prediction model, i.e., the ELM-AKDE-CKDE.

Extreme Learning Machine (ELM)
As an improved single hidden layer feedforward neural network, the ELM has the capacity of training samples without resetting the cost and threshold value, and the optimal connection and bias parameters can be obtained by solving the matrix equation [20]. A typical single hidden layer feedforward neural network is shown in Figure 1.
Suppose there are n arbitrary training samples $\{x_i, y_i\}$, where $x_i = [x_{i1}, x_{i2}, \cdots, x_{in}]^T \in R^n$ and $y_i = [y_{i1}, y_{i2}, \cdots, y_{im}]^T \in R^m$. $w_{ij}$ $(i = 1, \cdots, n,\ j = 1, \cdots, l)$ is the connection weight between the input layer and the hidden layer; $\beta_{jk}$ $(j = 1, \cdots, l,\ k = 1, \cdots, m)$ denotes the connection weight between the hidden layer and the output layer; $b_k$ $(k = 1, \cdots, l)$ is the hidden layer bias value. Then, the ELM model can be formulated as:

$$\sum_{j=1}^{l} \beta_j \, g\left(w_j^T x_i + b_j\right) = y_i, \quad i = 1, \cdots, n$$

where g(x) is the activation function, $w_j = [w_{1j}, \cdots, w_{nj}]^T$ collects the input weights of the j-th hidden node, and $\beta_j = [\beta_{j1}, \cdots, \beta_{jm}]^T$ collects its output weights. Its matrix form is:

$$H\beta = T$$

$$H = \begin{bmatrix} g(w_1^T x_1 + b_1) & \cdots & g(w_l^T x_1 + b_l) \\ \vdots & \ddots & \vdots \\ g(w_1^T x_n + b_1) & \cdots & g(w_l^T x_n + b_l) \end{bmatrix}_{n \times l} \qquad (3)$$

where H is the output matrix of the hidden layer; $\beta = [\beta_1, \cdots, \beta_l]^T$ is the matrix of the output weights; $T = [y_1, \cdots, y_n]^T$ is the output vector.
The goal of network learning is to minimize the output error of the neural network, i.e.,

$$\min_{\beta} \left\| H\beta - T \right\|$$

Training the single hidden layer neural network yields the optimal output weights $\hat{\beta}$, which satisfy:

$$\left\| H\hat{\beta} - T \right\| = \min_{\beta} \left\| H\beta - T \right\|$$

Solving for $\hat{\beta}$ gives $\hat{\beta} = H^{+} T$, where $H^{+}$ is the generalized inverse of matrix H. The ELM algorithm can be summarized as follows. Given a training set $\{(x_i, y_i) \mid x_i \in R^n, y_i \in R^m, i = 1, 2, \cdots, n\}$:

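1. Determine the specific structure of the ELM network, such as the hidden neuron node number l and the hidden layer activation function g(x);
2. Randomly generate the input weights $w_{ij}$ and the hidden layer biases $b_k$;
3. Calculate the hidden layer output matrix H in Equation (3);
4. Calculate the output weights $\hat{\beta} = H^{+} T$.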
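The ELM training procedure can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names `elm_fit` and `elm_predict`, the uniform range [-1, 1] for the random weights, and the toy regression target are all hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(a):
    """Sigmoid activation g(x) used for the hidden layer."""
    return 1.0 / (1.0 + np.exp(-a))

def elm_fit(X, T, n_hidden, rng):
    """Fit an ELM. X: (n_samples, n_features), T: (n_samples, n_outputs)."""
    # Randomly generate input weights and hidden biases (range is an assumption).
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Hidden-layer output matrix H, one row per training sample.
    H = sigmoid(X @ W + b)
    # Output weights from the generalized inverse: beta = H^+ T.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Evaluate the trained network on new inputs."""
    return sigmoid(X @ W + b) @ beta

# Toy data (hypothetical): a smooth nonlinear target of two inputs.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
T = np.sin(X[:, :1]) + 0.5 * X[:, 1:]
W, b, beta = elm_fit(X, T, n_hidden=30, rng=rng)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
print(f"training RMSE: {train_rmse:.2e}")
```

Only the output weights are solved for; the input weights and biases stay at their random values, which is why training reduces to a single least-squares problem.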

Adaptive Kernel Density Estimation and Conditional KDE (AKDE-CKDE)
As a matter of fact, the choice of bandwidth matrix has a great effect on the estimation results, while the selection of the kernel function has only a minor effect [10]. Therefore, the authors adopted the AKDE method with plug-in bandwidth selection, which can effectively and quickly obtain the probability density estimate [29]. The coefficients of CKDE were then estimated from the variance obtained from the probability density of the sample data in the AKDE method. The AKDE-CKDE procedure for traffic flow prediction is as follows.
Assume that the discrete time series of traffic flow after data processing is $\{x_1, x_2, \cdots, x_n\}$. For one-step-ahead prediction, N d-dimensional explanatory variables $x_t$ and N target variables $y_t$ are constructed as:

$$x_t = [x(t), x(t+1), \cdots, x(t+d-1)], \quad y_t = x(t+d), \quad t = 1, 2, \cdots, N$$

where N = n − d. The sets of $x_t$ and $y_t$ $(t = 1, 2, \cdots, N)$ can then be regarded as independent samples of a random vector $x \in R^d$ and a random variable $y \in R$, respectively. Combining x and y gives a random vector $z = (x, y) \in R^{d+1}$ with the sample $\{z_t = (x_t, y_t)\}$. The multi-dimensional kernel density estimate of the random vector z is

$$\hat{f}(z) = \hat{f}(x, y) = \frac{1}{N |B_z|} \sum_{t=1}^{N} K_{d+1}\left(B_z^{-1}(z - z_t)\right)$$

Similarly, the multi-dimensional kernel density estimate for x is given by

$$\hat{f}(x) = \frac{1}{N |B_x|} \sum_{t=1}^{N} K_d\left(B_x^{-1}(x - x_t)\right)$$

where $K_d(\cdot)$ denotes the d-dimensional Gaussian kernel density function. The Gaussian kernel is often used because of its simplicity. Its expression is

$$K_d(u) = (2\pi)^{-d/2} \exp\left(-\tfrac{1}{2} u^T u\right)$$

$B_z$ represents a symmetric and positive definite kernel bandwidth matrix. For simplicity, a diagonal matrix is used as the kernel bandwidth matrix in this paper:

$$B_z = \mathrm{diag}(b_1, b_2, \cdots, b_d, b_{d+1})$$

where $b_1, b_2, \cdots, b_d$ are the bandwidth parameters corresponding to each dimension of the independent variable x, $b_{d+1}$ is the bandwidth parameter of y, and $B_x = \mathrm{diag}(b_1, \cdots, b_d)$ is the block of $B_z$ belonging to x. $b_1, b_2, \cdots, b_d$ and $b_{d+1}$ determine the smoothness in the x-direction and y-direction, respectively.
In this study, adaptive kernel density estimation via diffusion [29] was utilized to obtain the mean and variance of the grid-point probability density in each dimension, where $j = 1, 2, \cdots, \lambda$ indexes the discrete grid points and the number of grid points $\lambda$ is taken large enough.
The normal reference criterion (NRC) was then employed to determine the bandwidth parameters $b_i$ $(i = 1, 2, \cdots, d+1)$ [31] from $\sigma_i$ $(i = 1, 2, \cdots, d+1)$, the standard deviation of the grid-point probability density in dimension i. Based on the above results, the distribution of the target variable y conditional on the explanatory variable x can be expressed as

$$\hat{f}(y \mid x) = \frac{\hat{f}(x, y)}{\hat{f}(x)} \qquad (17)$$

The conditional expectation and variance of y can be calculated from Equation (17):

$$\hat{E}(y \mid x) = \int y \, \hat{f}(y \mid x) \, dy$$

$$\hat{\sigma}^2(y \mid x) = \int \left(y - \hat{E}(y \mid x)\right)^2 \hat{f}(y \mid x) \, dy$$

In this way, the one-step-ahead forecasting results are produced by evaluating these quantities at the most recent explanatory vector $x_{N+1} = [x(n-d+1), \cdots, x(n)]$:

$$\hat{x}(n+1) = \hat{E}(y \mid x_{N+1}), \quad \hat{\sigma}^2(n+1) = \hat{\sigma}^2(y \mid x_{N+1}), \quad \hat{f}(n+1) = \hat{f}(y \mid x_{N+1})$$

where $\hat{x}(n+1)$, $\hat{\sigma}^2(n+1)$ and $\hat{f}(n+1)$ denote the one-step-ahead predicted value, variance and PDF at time n + 1, respectively.
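For a Gaussian kernel with a diagonal bandwidth matrix, the conditional density reduces to a weighted mixture of one-dimensional Gaussians centered at the training targets, so the conditional mean and variance have closed forms. The sketch below illustrates this under explicit assumptions: the helper names (`embed`, `normal_reference_bandwidths`, `ckde_predict`) are hypothetical, and the bandwidths use a simple normal-reference-type rule with the sample standard deviation, standing in for the AKDE-derived σ_i described in the text (the diffusion estimator of [29] is not implemented here).

```python
import numpy as np

def embed(series, d):
    """Build x_t = [x(t), ..., x(t+d-1)] and y_t = x(t+d) for t = 1..N, N = n - d."""
    series = np.asarray(series, dtype=float)
    N = len(series) - d
    X = np.stack([series[t:t + d] for t in range(N)])
    y = series[d:]
    return X, y

def normal_reference_bandwidths(X, y):
    """Assumed rule: b_i = sigma_i * N^(-1/(D+4)) with D = d + 1 (joint dimension)."""
    N, d = X.shape
    scale = N ** (-1.0 / (d + 5))
    return X.std(axis=0) * scale, float(y.std() * scale)

def ckde_predict(X, y, x_new, b_x, b_y):
    """Conditional mean and variance of y given x_new (Gaussian product kernel)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    u = (X - np.asarray(x_new, dtype=float)) / np.asarray(b_x, dtype=float)
    log_k = -0.5 * np.sum(u ** 2, axis=1)   # kernel constants cancel in the ratio
    w = np.exp(log_k - log_k.max())          # shift by the max for numerical stability
    w /= w.sum()                             # normalized weights w_t(x)
    mean = float(w @ y)                      # E(y | x) = sum_t w_t y_t
    var = float(b_y ** 2 + w @ (y - mean) ** 2)
    return mean, var

# Toy series (hypothetical): a noisy periodic signal, d = 3 lags.
rng = np.random.default_rng(1)
t = np.arange(300)
series = np.sin(2 * np.pi * t / 50) + 0.05 * rng.standard_normal(t.size)
d = 3
X, y = embed(series, d)
b_x, b_y = normal_reference_bandwidths(X, y)
x_hat, var_hat = ckde_predict(X, y, series[-d:], b_x, b_y)
print(f"one-step-ahead mean: {x_hat:.3f}, variance: {var_hat:.4f}")
```

The variance has two parts: the kernel's own spread in the y-direction (`b_y ** 2`) and the weighted scatter of the neighboring targets around the conditional mean.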

Hybrid Forecasting Model
Building on the methods reviewed above, a novel hybrid model, ELM-AKDE-CKDE, was employed to enhance prediction accuracy. The ELM model was applied to predict the short-term traffic flow, and the training residuals were obtained. Then, the variance of each dimension of the reconstructed sample estimated by AKDE was substituted for the corresponding parameters of the CKDE model to obtain a one-step-ahead estimate of the residual. Finally, the prediction result was obtained by superimposing the two models' predicted values. The flowchart of the ELM-AKDE-CKDE model is shown in Figure 2, and the complete steps are as follows:
1. Divide the traffic flow series into a training part $\{x(1), \cdots, x(n)\}$ and a forecasting part $\{x(n+1), \cdots, x(n+N)\}$;
2. Establish the ELM network, and set the hidden node number l and hidden node output function g(x), by which the prediction results $\{\hat{x}(n+1), \cdots, \hat{x}(n+N)\}$ and training residuals $\{r(1), \cdots, r(n)\}$ can be obtained;
3. Replace the corresponding parameters of CKDE with the variance in the reconstructed residual samples estimated by AKDE, then implement one-step-ahead estimation for the residual sequence $\{r(1), \cdots, r(n)\}$, by which the predictive value of the (n + 1)th residual $\hat{r}(n+1)$ can be estimated by AKDE-CKDE;
4. Update the training part to $\{x(2), \cdots, x(n+1)\}$ and repeat steps 2-3, so that the corresponding residual forecasting result $\hat{r}(n+2)$ can be obtained. Continue one-step-ahead prediction until the whole forecasting part is predicted, and the predicted residuals $\{\hat{r}(n+1), \cdots, \hat{r}(n+N)\}$ are obtained;
5. Analyze the forecasting results and evaluate the performance of the proposed model by comparing it with the other models involved.
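The rolling procedure above can be sketched end to end as follows. This is a simplified illustration, not the authors' implementation: all function names are hypothetical, the series is assumed to be scaled to roughly [0, 1] before entering the sigmoid network, and the residual bandwidths come from a normal-reference-type rule on the sample standard deviation rather than from AKDE via diffusion.

```python
import numpy as np

def embed(s, d):
    """Lagged inputs [s(t), ..., s(t+d-1)] and targets s(t+d)."""
    s = np.asarray(s, dtype=float)
    X = np.stack([s[t:t + d] for t in range(len(s) - d)])
    return X, s[d:]

def hidden(X, W, b):
    """Sigmoid hidden-layer output matrix."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, y, n_hidden, rng):
    """Random input weights and biases, output weights beta = H^+ y."""
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, n_hidden)
    return W, b, np.linalg.pinv(hidden(X, W, b)) @ y

def ckde_mean(X, y, x_new):
    """Conditional mean of y given x_new; bandwidth rule is an assumption."""
    N, d = X.shape
    bw = X.std(axis=0) * N ** (-1.0 / (d + 5)) + 1e-12
    log_k = -0.5 * (((X - x_new) / bw) ** 2).sum(axis=1)
    w = np.exp(log_k - log_k.max())
    return float(w @ y / w.sum())

def hybrid_forecast(series, n_train, d, n_hidden=20, seed=0):
    """Rolling one-step-ahead forecasts: ELM prediction plus predicted residual."""
    series = np.asarray(series, dtype=float)
    rng = np.random.default_rng(seed)
    preds = []
    for k in range(len(series) - n_train):
        window = series[k:k + n_train]                    # sliding training part
        X, y = embed(window, d)
        W, b, beta = elm_fit(X, y, n_hidden, rng)
        resid = y - hidden(X, W, b) @ beta                # training residuals
        elm_next = float(hidden(window[-d:][None, :], W, b) @ beta)
        Xr, yr = embed(resid, d)                          # reconstructed residual samples
        resid_next = ckde_mean(Xr, yr, resid[-d:])
        preds.append(elm_next + resid_next)               # superposition of both parts
    return np.array(preds)

# Toy series (hypothetical), already scaled to about [0.1, 0.9].
noise = np.random.default_rng(2).standard_normal(260)
series = 0.5 + 0.4 * np.sin(2 * np.pi * np.arange(260) / 48) + 0.01 * noise
n_train = 200
preds = hybrid_forecast(series, n_train=n_train, d=4)
rmse = float(np.sqrt(np.mean((preds - series[n_train:]) ** 2)))
print(f"hybrid one-step-ahead RMSE on the toy series: {rmse:.4f}")
```

Refitting the ELM on every window mirrors step 4 but is costly for long series; whether the authors retrained at each step or only updated the residual model is not recoverable from the text.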

Data Description
To better present the performance of the proposed model, two groups of collected data were utilized for prediction. Dataset 1 and dataset 2 came from the A and B intersections of the main road in the main urban area of Chongqing, respectively, as shown in Figure 3. The collection lasted for a week with a statistical interval of 5 min. A total of 2016 sample data were contained in each dataset.

In this section, the predictive performance of the proposed model based on dataset 1 is presented first. For the sake of making the prediction results more convincing, two-thirds of the data were used to construct and train the model, and the rest were utilized to evaluate the performance [32]. The statistical results of dataset 1 are shown in Figure 4 and Table 1. It should be noted that skewness = 0 and kurtosis = 3 mean that these data overall obey Gaussian distribution, and a value farther away from the target value indicates a stronger non-Gaussianity characteristic. In Table 1, it can be seen that dataset 1 fluctuates severely and has strong nonstationarity and non-Gaussianity.
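As a rough illustration of the two checks described here, the sketch below splits a 2016-sample series two-thirds to one-third and computes skewness and kurtosis. The moment definitions are the standard population (biased) forms, which is an assumption because the text does not state which estimator was used; the synthetic samples are purely illustrative and are not the Chongqing data.

```python
import numpy as np

def skewness(x):
    """Third standardized moment; 0 for a symmetric (e.g., Gaussian) distribution."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def kurtosis(x):
    """Fourth standardized moment (not excess); 3 for a Gaussian distribution."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4))

# One week at a 5 min interval: 7 * 24 * 12 = 2016 samples.
n_samples = 7 * 24 * 12
n_train = n_samples * 2 // 3          # first two-thirds for training
n_test = n_samples - n_train          # remaining third for evaluation

rng = np.random.default_rng(0)
gaussian = rng.standard_normal(100_000)   # reference: skewness near 0, kurtosis near 3
skewed = rng.exponential(size=100_000)    # clearly non-Gaussian sample
print(n_train, n_test)
print(f"{skewness(gaussian):.2f} {kurtosis(gaussian):.2f}")
print(f"{skewness(skewed):.2f} {kurtosis(skewed):.2f}")
```

Values far from 0 and 3, as for the exponential sample, are what the text refers to as strong non-Gaussianity.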


Evaluation Criteria
To quantitatively evaluate the performance of the proposed model for short-term traffic flow prediction, the following four frequently used indicators were selected in this study: mean absolute error (MAE), mean relative percentage error (MRPE), root mean square error (RMSE), and root mean square relative error (RMSRE). Their mathematical expressions are as follows.
The MAE represents the mean of the absolute error between the predicted and measured values:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \qquad (23)$$

The MRPE measures the average relative error between the predicted and real values on the test set:

$$\mathrm{MRPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% \qquad (24)$$

The RMSE measures the average difference between the measurements and the predicted values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \qquad (25)$$

The RMSRE represents the standard deviation of the relative error of the prediction:

$$\mathrm{RMSRE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i - \hat{y}_i}{y_i} \right)^2} \qquad (26)$$

where $y_i$ and $\hat{y}_i$ represent the measured value and predicted value, respectively, and N is the number of evaluated samples. As Equations (23)-(26) show, the smaller the values of MAE, MRPE, RMSE, and RMSRE, the higher the prediction accuracy.
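The four indicators can be computed directly with NumPy. The sketch below uses the usual definitions of these error measures (mean absolute error, mean relative error in percent, root mean square error, and root mean square of the relative error); the three-point example values are hypothetical and chosen only so the arithmetic is easy to follow. Measured values are assumed to be nonzero, since two of the indicators divide by them.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mrpe(y, y_hat):
    """Mean relative percentage error, in percent."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def rmsre(y, y_hat):
    """Root mean square relative error (as a fraction, not percent)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean(((y - y_hat) / y) ** 2)))

# Hypothetical measured and predicted flows (vehicles per 5 min).
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mae(measured, predicted), mrpe(measured, predicted))
print(rmse(measured, predicted), rmsre(measured, predicted))
```

In this example the absolute errors are 10, 20, and 0, while the relative errors are 10%, 10%, and 0%, which shows how the relative indicators weight errors at low flow more heavily than the absolute ones.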

Performance Evaluation
To demonstrate the superiority of the proposed model, five other models (ARIMA, LSSVM, ELM-CKDE, ELM, and CKDE) were employed for comparison. Different parameter settings may greatly affect the performance of a prediction method. The ARIMA model, as the statistical model most commonly used for time-series forecasting, can capture the linear relationships in short-duration traffic volume data well [3]. Here, we used the ARIMA(1,1,1) model to predict the traffic flow. For the ELM method, we set 30 neuron nodes in the hidden layer and randomly generated the input weights and biases. For the CKDE nonparametric method, the sample data can be used directly to estimate the distribution without any parameter assumptions. All experiments were run in MATLAB 2019a on a PC with a 2.40 GHz Intel Core i5-1135G7 CPU and 16 GB RAM. In addition, each method was run 10 times independently to mitigate the influence of randomness. On this basis, short-term traffic flow prediction was implemented, and the corresponding forecasting results are given below.

Traffic Flow Prediction
Taking the case study based on dataset 1 as an example, the prediction process of the proposed method is briefly explained below. In this study, the sigmoid function was chosen as the hidden layer activation function of the ELM network, and its mathematical expression is

$$g(x) = \frac{1}{1 + e^{-x}}$$

According to the above parameter settings, the ELM-AKDE-CKDE model was constructed for experiments. We manually adjusted the input vector dimension of the ELM model and compared the MRPEs of the prediction results under different dimensions, as shown in Figure 5. When the dimension of the input vector was 9, the prediction result achieved the best MRPE. This means that each input vector of the prediction model was composed of nine consecutive data values in the original traffic flow data, and the corresponding output value is the predicted value of the tenth data point after the initial nine. Accordingly, we selected the 9-dimensional input vector for the experiments to obtain the training residuals of the ELM network.
One-step-ahead traffic flow prediction was adopted to illustrate the performance of the proposed method. After subseries reconstruction was achieved, the matrix $B_z$ in CKDE was determined by employing the AKDE and NRC methods. Finally, the estimated probability distribution of traffic flow for the ELM-AKDE-CKDE model prediction results was obtained. Analogously, the probabilistic prediction results could be obtained by applying other probabilistic estimation models, and the final one-step-ahead results, including the predictive PDF and single-value prediction, were generated and are shown in Figure 6. The above procedure was executed for the other training data, and the corresponding prediction results are provided.


Prediction Results and Analysis
After obtaining the measured data and predicted data, the evaluation index values of the models involved were calculated, as shown in Table 2. The percentage reductions in each error index achieved by the proposed model relative to the other models are shown in Table 3. For simplicity, the proposed model is abbreviated as "proposed" in the tables.

Table 3. Improved percentages of the other models by the proposed method (dataset 1). Columns: MAE (%), MRPE (%), RMSE (%), RMSRE (%).

In terms of the single models, the two nonlinear models, ELM and LSSVM, achieved better prediction than the other models. The reason could be that the nonlinear information in dataset 1 is more significant than the linear information. Because ELM and LSSVM are designed for nonlinear classification and prediction, they outperformed the traditional statistical method ARIMA, although their accuracy was still low. Although CKDE takes the stochastic characteristics of the data into account, it performed the worst overall; the reason could be that the linear and nonlinear components of the data were ignored when the individual CKDE model predicted traffic flow.
Compared to the single models, the results show that the proposed model produced overall improvements in the experiment, and the reason could be attributed to the fact that the combination of the ELM and AKDE-CKDE could not only extract multiple characteristics embedded in the data but also utilize the strengths of the individual models.
Based on the comparisons between the ELM-AKDE-CKDE and ELM-CKDE models, the improvements by the proposed method in terms of MAE, RMSE, MRPE, and RMSRE were 7.24%, 9.46%, 8.39%, and 3.04%, respectively. In addition, a similar comparison was conducted between the ELM and the proposed model. The results indicate that the AKDE-CKDE method surpassed CKDE in boosting forecasting accuracy; the reason may be that AKDE-CKDE is more effective in dealing with the randomness of the residuals obtained by ELM, because AKDE is more adaptive: its overall optimal bandwidth can be adjusted according to the sample density of the characteristic variable data.
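An improvement percentage of this kind is conventionally the relative reduction of an error index with respect to the reference model. The sketch below assumes that convention, since the text does not state the formula; the function name `improvement_pct` and the error values are hypothetical and are not taken from Tables 2 or 3.

```python
def improvement_pct(error_reference, error_proposed):
    """Relative reduction of an error index, in percent (assumed definition)."""
    return (error_reference - error_proposed) / error_reference * 100.0

# Hypothetical example: a reference model with MAE 20.0 and a proposed model with MAE 18.0.
example = improvement_pct(20.0, 18.0)
print(f"improvement: {example:.2f}%")
```

A positive value means the proposed model has the lower error; a negative value would mean the reference model performed better on that index.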
To compare the prediction results of the proposed model with those of the other models more intuitively, Figure 7 shows the prediction performance of all models on the test set. As shown in Figure 7, the predicted values of the proposed model are closer to the true values than those of the other models in the local interval, which indicates the superiority of the proposed model.


Additional Case
To further verify the applicability of the proposed model, another set of data with different periods and fluctuations (dataset 2) was used to examine whether the ELM-AKDE-CKDE model can provide superior short-term forecasts of traffic flows. The statistical results of dataset 2 are depicted in Figure 8 and Table 4.
Comparing dataset 1 and dataset 2, we can observe from Figures 4 and 8 that they have similar trends and show strong cyclical characteristics. Still, the average traffic flow of dataset 2 is slightly lower, implying slightly less volatility. The same experiment was conducted on dataset 2, and the performance is shown in Table 5. To visualize the difference between the test performances of the different methods, we constructed a bar graph according to Table 5, as shown in Figure 9. The intuitive comparative results are exhibited in Figure 10.
In Tables 2 and 5, it can be seen that the MAE, RMSE, and RMSRE of the proposed method on dataset 2 are smaller than those on dataset 1, but the MRPE is greater. This may be due to the different data characteristics of the two datasets, such as the smaller average traffic flow in dataset 2. It is worth noting that the MRPE of the ARIMA model is higher than that of the ELM-AKDE-CKDE model, but its RMSRE is lower than that of the proposed model. The reason could be that the RMSRE indicator is more sensitive to outliers.
From the results presented in Table 5 and Figures 9 and 10, the conclusions are similar to the results of dataset 1. Namely, the proposed method outperformed the other five methods in the overall performance of the prediction task. Firstly, the AKDE-CKDE was better than CKDE in facilitating the prediction of stochastic traffic flow data. In addition, the hybrid model was superior to individual models because it could integrate the advantages of each model component. At the same time, the proposed model can well explain the nonlinear features and random features hidden in traffic flow data and has excellent adaptability.
To sum up, our ELM-AKDE-CKDE method performed the best overall on both datasets, the one exception being the RMSRE on dataset 2, where ARIMA scored lower. This indicates that the proposed method is well suited to modeling data with nonlinear and complex characteristics. The proposed model considered nonlinearity, nonstationarity, and randomness simultaneously, and thus achieved better prediction results than the single models that considered only linear or only nonlinear characteristics. Our model thus further reduced the prediction errors and can be applied to predict short-term traffic flow accurately.

Conclusions
Since actual traffic flow sequences are affected by random factors, obtaining accurate traffic flow prediction results is often a significant challenge. To cope with this challenge, a novel hybrid prediction method based on ELM, AKDE, and CKDE was proposed and investigated in this study. It offers a way to improve the CKDE method by using the adaptive bandwidth method. To the best of our knowledge, this is the first time the method has been applied to short-term traffic flow prediction. The results indicate that the AKDE-CKDE model improves prediction accuracy more than CKDE does. Moreover, case studies based on the measured data illustrate that its performance was better than that of the other models, including ARIMA, LSSVM, ELM-CKDE, ELM, and CKDE.
The novelty of this article is that the hybrid method can take into account nonlinear and stochastic characteristics embedded in traffic flow data and exhibit a satisfactory performance. Similar to other prediction methods, the proposed method also needs further improvement. It is worth noting that the method established in this paper does not decompose the traffic flow, and the short-term traffic flow prediction after decomposition is worth studying further. In addition, the characteristics of traffic flow data should be analyzed to provide a basis for selecting prediction models.