Short-Term Traffic Flow Forecasting Based on Data-Driven Model

Abstract: Short-term traffic flow forecasting is the technical basis of the intelligent transportation system (ITS). Higher-precision short-term traffic flow forecasting plays an important role in alleviating road congestion and improving traffic management efficiency. In order to improve the accuracy of short-term traffic flow forecasting, an improved bird swarm optimizer (IBSA) is used to optimize the random parameters of the extreme learning machine (ELM), and the improved bird swarm optimization extreme learning machine (IBSAELM) model is established to predict short-term traffic flow. The main research in this paper is as follows: (1) The bird swarm optimizer (BSA) is prone to falling into local optima, so the distribution mechanism of the BSA optimizer is improved: the first five percent of the particles with better fitness values are selected as producers, and the last ten percent of the particles with worse fitness values are selected as beggars. (2) The one-day and two-day traffic flows are predicted by the support vector machine (SVM), particle swarm optimization support vector machine (PSOSVM), bird swarm optimization extreme learning machine (BSAELM) and IBSAELM models, respectively. (3) The prediction results of the models are evaluated. For the one-day traffic flow sequence, the mean absolute percentage error (MAPE) values of the IBSAELM model are smaller than those of the SVM, PSOSVM and BSAELM models. The experimental results show that the IBSAELM model proposed in this study can meet actual engineering requirements.


Introduction
As a new type of technology, the intelligent transportation system combines theories from sensing, data communication, data processing, artificial intelligence and automatic control [1,2]. The intelligent transportation system (ITS) has become a research hotspot all over the world. Japan is vigorously developing smart cars and autonomous driving technologies, and aims to build a world-class transportation system by 2020. In 2012, the number of intelligent transportation projects in China exceeded 230, with a total investment of more than 10 million yuan, and both the number of projects and the total investment continue to grow [3]. Short-term traffic flow forecasting is a basic component of the intelligent transportation system. By predicting traffic flow, it provides real-time data support for the system, helping it plan routes more efficiently and issue traffic congestion warnings. It can therefore alleviate traffic congestion, and also reduce air pollution and improve driving efficiency [4,5].
The intelligent transportation system is a huge and complex system that changes with time. Yang et al. [6] proposed a neural network method to predict traffic flow. This method adopts an optimization framework for a traffic flow prediction model based on deep learning, and improves the efficiency of intelligent traffic management by improving the prediction accuracy of the model. Driving behavior analysis is an important part of the intelligent transportation system. He et al. [7] proposed a method to identify driving style and detect abnormal driving. The method uses phase space reconstruction to reconstruct the original vehicle data trajectory, and uses a convolutional neural network to extract feature vectors. Traffic congestion prediction is a basic technology for implementing intelligent transportation systems. Zhang et al. [8] proposed a symmetric-layer neural network model based on deep autoencoders to predict traffic congestion; by predicting congestion, the capacity and efficiency of the transportation network are improved. With the gradual popularization of intelligent transportation projects, intelligent transportation systems are suffering from increasingly complex structured query language (SQL) injection attacks, and traditional detection methods cannot cope with this challenge. Li et al. [9] proposed a method for detecting SQL injection attacks that can cope with high-dimensional massive data and solve the over-fitting problem caused by insufficient samples.
The main research contents of this paper are as follows: (1) the performance of the BSA optimizer is improved, and the optimization performance of the IBSA optimizer is tested on standard optimization functions; (2) the support vector machine (SVM), particle swarm optimization support vector machine (PSOSVM), bird swarm optimization extreme learning machine (BSAELM) and improved bird swarm optimization extreme learning machine (IBSAELM) models are all used to predict the traffic flow for one day and for two days, respectively; (3) the prediction results of the models are evaluated using the absolute error (AE), root mean square error (RMSE), mean absolute percentage error (MAPE) and decision coefficient (r2) indicators. This research makes the following three contributions: (1) it proposes the IBSA optimizer with better search ability, which can not only optimize the extreme learning machine (ELM) parameters but can also be applied in various fields; (2) it proposes the IBSAELM model to forecast short-term traffic flow, and test experiments show that the IBSAELM model has higher predictive performance; (3) by improving the short-term traffic flow prediction accuracy of the model, it provides reliable data support for the intelligent transportation system.
The remaining chapters of this paper are organized as follows. Section 2 reviews the literature on short-term traffic flow methods and introduces the method proposed in this study. Section 3 introduces the short-term traffic flow prediction model. Section 4 analyzes the prediction results of the IBSAELM model for short-term traffic flow. Section 5 gives the findings and conclusions of this study.

Literature Review
This section reviews the literature on short-term traffic flow methods and describes the method presented in this paper.

Short-term Traffic Flow Forecasting Methods
The rapid development of the economy has led to an increasing demand for transportation, which in turn has caused certain negative impacts, such as widespread traffic congestion and environmental pollution [10]. Short-term traffic flow prediction mainly refers to forecasting traffic flow within 15 minutes [11]. Traffic flow forecasting is an important part of the intelligent transportation system: by providing real-time traffic information to drivers, it helps them select optimal routes and reduces the time and money lost to traffic congestion.
At present, traffic flow prediction methods mainly include statistical theory models, machine learning models, and combined models [12][13][14]. The autoregressive (AR) model describes the relationship between the current value and historical values; the moving average (MA) model describes the accumulated error of the AR model, eliminating random fluctuations in the prediction process; the autoregressive moving average (ARMA) model combines the AR and MA models; and the autoregressive integrated moving average (ARIMA) model extends the ARMA model by introducing differencing to convert non-stationary data into stationary data, and is currently the most widely used time series prediction model. Some time series exhibit periodicity and trends. Periodicity may be caused by seasonal changes or intrinsic factors; when a time series is periodic, it is a seasonal time series, and seasonal differencing can be used to smooth it. Such a series can be predicted by the seasonal autoregressive integrated moving average (SARIMA) model, a short-term forecasting model formed by adding the seasonal variation of the time series to the original ARIMA model. It has strong linear modeling capability. When a time series shows obvious trends and seasonal changes, the SARIMA model predicts better than the ARIMA model. Moreover, the SARIMA model does not need to make prior assumptions about the development pattern of the time series, so compared with many other prediction methods it achieves higher prediction accuracy. Williams and Hoel [15] proposed a seasonal ARIMA process to forecast traffic flow.
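As a small illustration of the seasonal differencing idea mentioned above, the sketch below (using a hypothetical periodic series, not the paper's data) removes a fixed seasonal component from a time series:

```python
# Seasonal differencing, the preprocessing step behind the SARIMA model.
# The series below is hypothetical, with a period of 4 observations.

def seasonal_difference(series, period):
    """Return y[t] - y[t - period], removing a fixed seasonal pattern."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

flow = [10, 40, 30, 20] * 3        # a purely periodic "traffic" series
print(seasonal_difference(flow, period=4))  # all zeros: seasonality removed
```

For a purely periodic series, the differenced series is identically zero; for real traffic data, differencing leaves a stationary residual that an ARMA model can fit.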
The authors compare the seasonal ARIMA model with three heuristic methods, and the test results show that the prediction error of the seasonal ARIMA model is the smallest. Smith et al. [16] used a nonparametric regression method to forecast traffic flow and compared its prediction effect with that of the seasonal ARIMA method. The results show that a method combining heuristics and nonparametric regression has greater application prospects. Tselentis et al. [17] compared the prediction effects of single time series models and combined models based on Bayesian and statistical methods for short-term traffic flow; the cost of a single model is lower, but the prediction risk of a combined model is lower. Karlaftis and Vlahogianni [18] compared neural network models with statistical methods in the field of traffic flow prediction; neural networks have advantages for missing data and nonlinear problems. Vlahogianni and Karlaftis [19] analyzed the effects of stationary and non-stationary weather on traffic flow and proposed a nonlinear dynamic method; the differences caused by weather factors need to be considered in short-term traffic flow forecasting. The statistical regression model is based on a time series, and when the nonlinearity of the time series increases, the prediction error of the model increases. Compared with statistical regression methods, machine learning models have stronger nonlinear mapping ability. Li et al. [20] proposed a method based on deep feature learning to predict short-term traffic flow. In this method, the particle swarm optimizer is improved into a multi-objective particle swarm optimizer, which is used to optimize the hyperparameters of a deep belief network.
Compared with the ELM, the learning speed of the deep belief network is slower and it is more sensitive to parameter selection. Zhu et al. [21] found, by calculating correlation coefficients, that the traffic flows of two adjacent intersections affect each other, and used an RBF neural network to predict the traffic flow. Through comparison, it is found that the RBF neural network that considers the influence of adjacent intersections has higher prediction accuracy. The gradient descent method used in artificial neural networks makes the network prone to falling into local minima. Vlahogianni et al. [22] proposed a neural network optimization strategy based on a genetic algorithm. This strategy can improve the performance of the neural network; the structurally optimized network is used to predict traffic and achieves a better prediction effect. Wang et al. [23] proposed a long short-term memory recurrent neural network to predict short-term traffic flow. In this method, the traffic flow sequence is first decomposed into a residual sequence and a trend sequence, the two decomposed sequences are predicted by the model separately, and the prediction results are combined to form the final result. This method can better realize traffic flow prediction on multiple time scales. Decomposing the time series, predicting each component separately and then combining the prediction results can improve the prediction accuracy, but it also increases the computational cost. Diao et al. [24] proposed a hybrid approach to predict traffic flow. This method takes many factors into account, such as time, frequency, origin and destination. A wavelet transform is used to decompose the traffic time series, and a Gaussian model is used to predict the specific components. This Gaussian model can fit adaptively, but its prediction efficiency is poor when dealing with high-dimensional problems.

The Method Proposed in This Study
The extreme learning machine is used in various fields due to its strong generalization and learning ability. Mariani et al. [25] applied the ELM to single-cylinder engine pressure prediction. In order to improve the prediction accuracy of the ELM, the biogeography-based optimizer (BBO) is improved, and the improved BBO optimizer is used to optimize the ELM prediction model. The experimental results show that the optimized ELM model can achieve effective prediction of the average pressure. Li et al. [26] established the whale optimization extreme learning machine (WOA-ELM) model to evaluate the aging state of insulated gate transistors. In this method, the ELM model is optimized by the whale optimizer. The experimental results show that the WOA-ELM model is superior to the genetic algorithm optimized extreme learning machine (GA-ELM), the crow search algorithm optimized extreme learning machine (CS-ELM) and the dandelion algorithm optimized extreme learning machine (DA-ELM) models. Bui et al. [27] proposed the PSO-ELM model to predict flash floods: an initial flood model is first established by the ELM, and the PSO optimizer is then used to optimize it. The analysis results show that the method is suitable for areas with frequent tropical typhoons, and early warnings of mountain torrents can be provided by forecasting. Kourehli et al. [28] used the ELM to predict the location and depth of cracks in Timoshenko beams under different moving masses. The method reduces sensitivity to noise, and can still identify the severity of a crack in the presence of noise.
In order to improve the prediction accuracy of the model, this paper improves the optimization performance of the BSA optimizer and proposes the IBSAELM prediction model to forecast short-term traffic flow. In the BSA optimizer, the positions of producers and beggars are randomly assigned, which may cause birds with large food reserves to be assigned as beggars and birds with small food reserves to be assigned as producers, thus reducing the optimization speed of the flock. Aiming at this defect of the BSA optimizer, this study improves its particle allocation mechanism. Firstly, the particles are sorted according to their fitness values. Then the 5% of particles with the best fitness values are selected as producers, the 10% of particles with the worst fitness values are selected as beggars, and the remaining particles are assigned randomly. In order to solve the problem that beggars easily fall into local optima in the BSA optimizer, this study improves the position update rule of beggars to enhance the convergence ability of the particles. Finally, the global and local convergence abilities of the optimizer are enhanced by a nonlinear weight coefficient.

Bird Swarm Optimizer (BSA) and Improved Bird Swarm Optimizer (IBSA)
In recent years, bionic optimizers have been proposed, such as the bird swarm optimizer and the chicken swarm optimizer [29]. Meng et al. [30] proposed the bird swarm optimizer (BSA) in 2016, a new type of optimizer that imitates the foraging, migrating and alert behavior of birds.
Compared with particle swarm optimization (PSO) and differential evolution (DE), BSA has faster convergence speed and higher convergence accuracy. The BSA optimizer follows the following rules [31][32][33]. Rule 1: Each particle in a flock can randomly select alert behavior and foraging behavior. Rule 2: Information in the flock can be shared, that is, the best location information of each bird can be shared in the flock. Birds update the optimal position of the population according to the information of individual particles.
Rule 3: Birds with alert information are more likely to fly to the center of the group, and individuals with better fitness are more likely to fly to the center of the group. In the process of flying to the center of the group, every particle has competitive behavior. Rule 4: Birds migrate regularly. When birds migrate to their destination, each particle in the group chooses between the roles of producer and beggar.
Rule 5: The particles with the best fitness value are selected as producers, the particles with the lowest fitness value are selected as beggars, and the remaining particles are randomly selected between producers and beggars.
Rule 6: Producers dominate the flock, and beggars follow the producers to find food. Rule 1 is a random decision process: set a random number y; when y is less than rand(0, 1), a bird chooses foraging behavior, and when y is greater than rand(0, 1), it chooses alert behavior. Rule 2 can be described by the following formula:

pos_i^j(t + 1) = pos_i^j(t) + (k_i^j − pos_i^j(t)) × W × rand(0, 1) + (bestpos^j − pos_i^j(t)) × C × rand(0, 1)    (1)

where pos_i^j(t) is the position of the i-th particle in the j-th dimension at the t-th iteration; W is the local learning coefficient; C is the global cognitive coefficient; rand is a random number between 0 and 1; k_i^j is the best individual position in the j-th dimension; and bestpos represents the best position in the population.
The competitive behavior of the birds flying to the center of the population in Rule 3 is simulated by the following formulas:

pos_i^j(t + 1) = pos_i^j(t) + S1 × (pos_mean^j − pos_i^j(t)) × rand(0, 1) + S2 × (pos_u^j − pos_i^j(t)) × rand(−1, 1)    (2)

S1 = a1 × exp(−posfit_i × L / (sumposfit + ε))

S2 = a2 × exp(((posfit_i − posfit_u) / (|posfit_u − posfit_i| + ε)) × (L × posfit_u / (sumposfit + ε)))

where S1 represents the indirect effect of an individual particle flying towards the center of the flock; pos_mean^j represents the average position in the j-th dimension; S2 indicates the direct impact of specific competition in the process of an individual particle flying towards the center of the flock; a1 and a2 are constants in [0, 2]; L represents the number of particles in the flock; posfit_i represents the fitness value of the i-th particle; posfit_u (u ∈ [1, L], u ≠ i) represents the fitness value of the u-th particle; sumposfit represents the sum of the fitness values of the flock; and ε represents an infinitesimal quantity.
The positions of producers and beggars in Rules 4, 5 and 6 are simulated using Equations (3) and (4):

pos_i^j(t + 1) = pos_i^j(t) + randn(0, 1) × pos_i^j(t)    (3)

pos_i^j(t + 1) = pos_i^j(t) + (pos_k^j(t) − pos_i^j(t)) × Fol × rand(0, 1)    (4)

where randn(0, 1) is a Gaussian random number with mean 0 and standard deviation 1; pos_k^j(t) is the position of a randomly selected producer; and Fol ∈ [0, 1] represents the probability that the beggar follows the producer for foraging. In the BSA optimizer, the producer is the main force for finding food in the flock, which determines the global optimization ability of the BSA optimizer. The beggar maintains the diversity of the flock, which determines the local optimization ability of the BSA optimizer.
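The position updates above can be sketched in Python; this is an illustration only, and the default coefficient values and function names here are assumptions rather than the paper's settings:

```python
import random

# A sketch of the BSA position updates: foraging (Rule 2), the producer
# update (Equation (3)) and the beggar update (Equation (4)).

def forage(pos, personal_best, global_best, W=1.5, C=1.5):
    """Rule 2: move toward the personal best and the population best."""
    return [p + (kb - p) * W * random.random() + (gb - p) * C * random.random()
            for p, kb, gb in zip(pos, personal_best, global_best)]

def produce(pos):
    """Equation (3): a producer searches around its own position."""
    return [p + random.gauss(0, 1) * p for p in pos]

def beg(pos, producer_pos, Fol=0.5):
    """Equation (4): a beggar follows a randomly chosen producer."""
    return [p + (pp - p) * Fol * random.random()
            for p, pp in zip(pos, producer_pos)]
```

Note that a particle already at the best known position is left unchanged by foraging, since both difference terms vanish.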
Beggars and producers are basically randomly selected, and there are some problems in such a distribution mechanism. Because of random distribution, particles with higher food reserves may be selected as beggars, and particles with lower food reserves may be selected as producers, which will increase the number of iterations of the BSA optimizer and increase the computational cost of the BSA optimizer.
In this study, the BSA optimizer is improved to solve the above problems. When the birds migrate to their destination, the bird population particles are sorted by fitness. The first 5% of the particles, with the highest food reserves, are selected as producers; the last 10% of the particles, with the lowest food reserves, are selected as beggars; and the remaining particles randomly become producers or beggars. In this way, the population diversity of the BSA optimizer is preserved and its optimization speed is accelerated. Beggars have poor productive capacity in the BSA optimizer and are prone to falling into local minima during optimization. In order to improve the productive capacity of beggars, a term that learns from the best particle in the flock is added to the beggar position update. In addition, in order to accelerate the convergence speed of the optimizer, the nonlinear weight coefficient d is introduced into the producer update formula.
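The improved allocation mechanism can be sketched as follows; assuming lower fitness values are better (the sorting direction is an assumption, and the function name is illustrative):

```python
import random

# A sketch of the improved allocation mechanism: sort the flock by fitness,
# take the best 5% as producers and the worst 10% as beggars; the remaining
# particles are assigned randomly, as in the original BSA.

def assign_roles(fitness, producer_frac=0.05, beggar_frac=0.10):
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    n = len(fitness)
    n_prod = max(1, int(n * producer_frac))   # best 5% -> producers
    n_beg = max(1, int(n * beggar_frac))      # worst 10% -> beggars
    roles = {}
    for i in order[:n_prod]:
        roles[i] = "producer"
    for i in order[n - n_beg:]:
        roles[i] = "beggar"
    for i in order[n_prod:n - n_beg]:
        roles[i] = random.choice(["producer", "beggar"])
    return roles
```

With this scheme the best particles always act as producers and the worst always act as beggars, while the middle of the flock keeps the random assignment that preserves diversity.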
The above behavior is described by Equation (5), where d (d_max = 0.9, d_min = 0.2) is the nonlinear weight coefficient; t is the current iteration number; T is the maximum number of iterations; Ln is the adaptive learning coefficient; and a is a random number. By enhancing the convergence ability of the IBSA optimizer, the optimal parameters of the ELM model can be better mined, thereby improving the model's prediction accuracy for traffic flow.

Basic Principles of Extreme Learning Machine
The single hidden layer feedforward neural network (SLFN) has a simple structure and strong learning ability, so it has been widely used in many fields. The learning method adopted by most SLFNs is the gradient descent method. However, gradient descent requires many iterations, resulting in high computational cost and low learning efficiency, which greatly affects the convergence speed of the SLFN. A new predictive model, the extreme learning machine (ELM), was proposed to address these shortcomings. Compared with the traditional SLFN, the ELM learns and converges faster [25,26].
The network topology of the ELM is similar to that of the SLFN. It adopts a unidirectional three-layer structure; each layer consists of multiple neurons and there are no connections between neurons within the same layer. The first layer of the network topology is the input layer, the last layer is the output layer, and the middle layer is the hidden layer. Let the numbers of neurons in the three layers of the ELM be o, p and q [28]. The connection weight matrix between the first layer and the middle layer is e = [e_{i,j}]_{p×o}, where e_{i,j} (i = 1, 2, · · · , p; j = 1, 2, · · · , o) is the connection weight between the j-th neuron of the first layer and the i-th neuron of the middle layer.
The connection weight matrix between the second and third layers is r = [r_{i,j}]_{p×q} [34,35], where r_{i,j} (i = 1, 2, · · · , p; j = 1, 2, · · · , q) is the connection weight between the i-th neuron of the middle layer and the j-th neuron of the last layer. Let the threshold vector of the middle layer be v = [v_1, v_2, · · · , v_p]^T_{p×1}. The input and output matrices of the b sample sets are M and N, respectively.
where m is an input vector and n is an output vector. The activation function enables the ELM to solve nonlinear problems. The activation function used in this paper is the Sigmoid function:

Z(x) = 1 / (1 + e^(−x))

Thus the output of the ELM network is obtained.
The transposed ELM network output H^T is the product of the hidden layer output matrix G and the connection weight r:

G r = H^T

where H^T is the transposed matrix of H. When the activation function Z of the ELM is infinitely differentiable, it is not necessary to adjust all of the parameters of the ELM. The weight e and the threshold v of the ELM model can be randomly initialized and remain unchanged during training. The weight r between the middle layer and the output layer can be obtained by the following least squares solution.
r̂ = G^+ H^T

where r̂ is the least squares solution and G^+ is the generalized inverse (Moore-Penrose pseudo-inverse) matrix of G.
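The training procedure above, random hidden parameters plus a pseudo-inverse solution for the output weights, can be sketched as follows. This is a simplified NumPy illustration; the function names, weight ranges and seed are assumptions, not the paper's implementation:

```python
import numpy as np

# A minimal ELM sketch: the input weights e and hidden thresholds v are
# random and stay fixed; only the output weights r are computed, via the
# Moore-Penrose pseudo-inverse of the hidden-layer output matrix G.

def elm_train(M, N, p, seed=0):
    """M: (b, o) input samples, N: (b, q) targets, p: hidden neurons."""
    rng = np.random.default_rng(seed)
    e = rng.uniform(-5, 5, (p, M.shape[1]))   # input-to-hidden weights
    v = rng.uniform(-1, 1, p)                 # hidden-layer thresholds
    G = 1.0 / (1.0 + np.exp(-(M @ e.T + v)))  # Sigmoid hidden outputs
    r = np.linalg.pinv(G) @ N                 # least-squares output weights
    return e, v, r

def elm_predict(M, e, v, r):
    G = 1.0 / (1.0 + np.exp(-(M @ e.T + v)))
    return G @ r
```

Because the pseudo-inverse gives the minimum-norm least-squares solution in one step, no iterative gradient descent is needed, which is the source of the ELM's speed advantage over the SLFN.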
In this study, the ELM model is used as the basic model for traffic flow prediction. The above analysis of the ELM principle shows that the hyperparameters of the ELM model are randomly selected, and the choice of hyperparameters has a great influence on the traffic flow prediction effect. Therefore, this paper uses the IBSA optimizer to optimize the hyperparameters of the ELM model to achieve a good traffic flow prediction effect.

IBSA Optimization Effect Analysis
In order to analyze the optimization performance of the IBSA optimizer, the BSA and IBSA optimizers are used to optimize the standard functions, and the optimization performance of the two optimizers is compared. In order to make the optimization results more convincing, each optimizer searches each standard function 50 times, and the optimization dimension is set to 30. The performance tests of the two optimizers use a unified simulation platform.
The BSA and IBSA optimizer parameters are set as follows: the population size L is 20, the number of cycles T is 500, the migration frequency Q is 10, and the local learning coefficient W and the global cognitive coefficient C are both 2. The standard functions are shown in Table 1.

Table 1 gives the equation, minimum and optimizing boundary of each standard function.

Table 2 shows the optimization results of the BSA and IBSA optimizers from four aspects: the optimal value, the number of times the optimal value occurred in the 50 runs, the total running time of the 50 runs and the average running time per run. For F1 and F2, the BSA optimizer cannot find the optimal value of 0, and the best local optimal values are 1.2394e-284 and 3.9898e-143, respectively; the BSA optimizer finds these local optimal values 9 times and 10 times in the 50 runs, respectively. For F1, the IBSA optimizer converges to the optimal value 0, reaching the global optimum 28 times in the 50 runs. The average convergence accuracy of the IBSA optimizer is 7.6166e-298, and its search accuracy is higher than that of the BSA optimizer. For F2, the best convergence value of the IBSA optimizer is 1.9202e-180, and the average convergence value is 5.8248e-155; the search accuracy of the IBSA optimizer is again higher than that of the BSA optimizer.
For F3 and F4, both the BSA and IBSA optimizers converge to the global optimal value 0 in every run. For the four optimization functions, the IBSA optimizer consumes 1.5756 s, 3.9936 s, 4.1652 s and 1.2607 s less time than the BSA optimizer over the 50 runs, respectively. Similarly, the running time of each run of the IBSA optimizer is 0.0315 s, 0.0799 s, 0.0832 s and 0.0399 s less than that of the BSA optimizer, respectively.
Through the analysis of the optimization data of the BSA and IBSA optimizers, it is found that the IBSA optimizer has better optimization accuracy and speed than the BSA optimizer. By improving the distribution mechanism of the birds, the searching ability of the whole flock is improved, and the nonlinear weight coefficient makes the flock approach the optimal value faster. For the four optimization functions, the optimization curves of the BSA and IBSA optimizers are shown in Figure 1. From Figure 1c,d, we can see that both the BSA and IBSA optimizers converge to the global optimal value 0, but the IBSA optimizer finds the optimal value at the 21st and 34th iterations, respectively, while the BSA optimizer finds it at the 25th and 44th iterations, respectively. Comparing the convergence curves in the figure, it is found that the search speed of the IBSA optimizer is faster than that of the BSA optimizer.
Through the above analysis, it is found that compared with the BSA optimizer, the convergence accuracy and convergence time of the IBSA optimizer are significantly improved. Therefore, the IBSA optimizer can achieve better optimization of the model hyperparameters, thereby improving the model's prediction effect on traffic flow.

IBSAELM Short-term Traffic Flow Forecasting Model
In the training process of the ELM model, the connection weight between the input layer and the hidden layer and the threshold between the hidden layer and the output layer are randomly determined. During the training process, the number of neurons in the hidden layer of the ELM is increased, resulting in an increase in the computational cost of the ELM model. In order to reduce the influence of random parameters on the prediction effect of ELM model, this study uses IBSA to optimize the random parameters of the ELM model to improve the prediction effect of this ELM model.
The specific process of the IBSAELM model for predicting short-term traffic flow is as follows: (1) Divide the short-term traffic flow samples into model test samples and model training samples.
(3) Set the parameters of the IBSA optimizer, such as the number of cycles T, the flock size L, and so on.
(4) The bird group position is updated according to each rule, and the fitness value is calculated.

Short-term Traffic Flow Forecasting Based on IBSAELM Model
The traffic flow data in this study were collected from an intersection in Tianjin, China. The traffic flow of the intersection was recorded every 15 minutes. For five days, 480 traffic flow data of the intersection were collected. The 5-day traffic flow sequence collected is shown in Figure 3.

As can be seen from Figure 3, the traffic flow sequence exhibits a certain periodicity. Traffic flow is small before 6 o'clock in the morning and increases significantly after 6 o'clock. Since the traffic flow at a given time at the intersection is related to the traffic flow in the preceding hours, the traffic flow data of the five moments before each moment are used as the input of the prediction model.
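The input construction described above can be sketched as a sliding window over the flow sequence (hypothetical data; `lag=5` matches the five preceding 15-minute moments):

```python
# A sliding-window sketch of the input construction: the flows at the five
# preceding 15-minute moments predict the flow at the current moment.

def make_windows(series, lag=5):
    X = [series[t - lag:t] for t in range(lag, len(series))]
    y = [series[t] for t in range(lag, len(series))]
    return X, y

flow = list(range(10))             # hypothetical flow counts
X, y = make_windows(flow)
print(X[0], y[0])                  # [0, 1, 2, 3, 4] 5
```

Each training sample therefore pairs a five-value history window with the flow observed at the next moment.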
In this study, two sets of prediction simulation experiments were carried out. Firstly, the traffic flow data of the first 4 days was used as training samples for the IBSAELM model, and the last day traffic flow data was used as test samples for the IBSAELM prediction model for testing; secondly, the traffic flow data of the first 3 days was used as training samples for the IBSAELM model, while the last two days of the traffic flow data were used as test samples for the IBSAELM prediction model.
In order to evaluate the prediction results of the model, the absolute error (AE), mean absolute percentage error (MAPE), root mean square error (RMSE) and decision coefficient (r2) are used as evaluation indexes of the IBSAELM model.
where x * is the fitting value of the prediction model; m is the number of traffic flow data; x is the actual value of the sample. AE, MAPE and RMSE are mainly used to evaluate the gap between the predicted value and the actual value of the prediction model. r2(r2 ∈ [0, 1]) is mainly used to evaluate the fitting degree between the predicted value and the actual value. When the r2 value tends to 1, it shows that the predicted value of the model fits well with the actual value; when the r2 value approaches 0, it shows that the fitting effect of the predicted value and the real value is poor.
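The four evaluation indexes can be sketched with their standard definitions; since the formulas themselves are not reproduced here, treating r2 as the coefficient of determination (and AE as the per-point absolute error, summarized by its maximum) is an assumption:

```python
import math

# A sketch of the four evaluation indexes described above:
# AE, MAPE, RMSE and the decision coefficient r2.

def metrics(actual, predicted):
    m = len(actual)
    ae = [abs(a - p) for a, p in zip(actual, predicted)]           # AE per point
    mape = sum(abs((a - p) / a)
               for a, p in zip(actual, predicted)) / m * 100       # MAPE (%)
    rmse = math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / m)  # RMSE
    mean_a = sum(actual) / m
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot                                       # r2
    return max(ae), mape, rmse, r2
```

A perfect prediction gives AE = 0, MAPE = 0, RMSE = 0 and r2 = 1, matching the interpretation of the indexes given above.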
Firstly, the IBSAELM, BSAELM, SVM and PSOSVM models are used to predict the traffic flow on the fifth day, respectively. The results of each model are shown in Figure 4. The prediction curves of the four models can basically reflect the trend of the one-day traffic flow series, but the prediction results of the BSAELM and IBSAELM models are closer to the real trend of the traffic flow. In order to analyze the errors of the four models for one-day traffic flow prediction, the absolute error curves of each model are given in Figure 5, and the evaluation results of each model are given in Table 3. Comparing the error curves in Figure 5, it is found that the prediction errors of the SVM and PSOSVM models are higher than those of the BSAELM and IBSAELM models.
Secondly, the traffic flow data of the first three days are used as training samples of the IBSAELM, BSAELM, SVM and PSOSVM prediction models, and the traffic flow data of the 192 moments in the last two days are used as test samples. The predicted results of the IBSAELM, BSAELM, SVM and PSOSVM models are shown in Figure 6. The four prediction models can still reflect the changing trend of the actual value curve, and as the number of traffic flow samples increases, the predicted values of the four models do not fluctuate significantly, which shows that the four prediction models have high prediction stability. The prediction error curves of the IBSAELM, BSAELM, SVM and PSOSVM models are shown in Figure 7, and Table 4 gives the results of the four evaluation indexes for the prediction effect of each model. Comparing the error curves in Figures 5 and 7, it is found that the fluctuation range of the error curves in Figure 7 is larger; compared with the data in Table 3, the AE interval, MAPE and RMSE of the four models in Table 4 have increased significantly.
The MAPE values of the IBSAELM, BSAELM, SVM and PSOSVM models increase by 12.2743%, 16.6811%, 8.1935% and 17.9991%, respectively; the RMSE values of the IBSAELM, BSAELM, SVM and PSOSVM models increase by 1.3085, 0.5018, 7.3959 and 7.2101, respectively. As the number of traffic flow samples increases, the prediction errors of the four models increase, and the fitting effect of the prediction models also deteriorates. Compared with the r2 values in Table 3, the r2 values of the BSAELM and IBSAELM models are reduced by 0.06 and 0.0498, respectively.
Comprehensive analysis of the data in Table 4 shows that the prediction results of the IBSAELM model are better than those of the other three models. The AE interval of the IBSAELM model is [−44.7104, 40.1863], which is contained within the interval [−50, 50], and the RMSE of the IBSAELM model is 8.806, 7.9287 and 2.1937 smaller than those of the SVM, PSOSVM and BSAELM models, respectively. For the two-day traffic flow, the r2 value of the IBSAELM model is 0.9295, higher than those of the other three prediction models, which shows that the IBSAELM model is better at explaining the traffic flow sequence.

Conclusions
With the rapid development of the economy, people's demand for automobiles is increasing, and the excessive number of automobiles is the root cause of urban traffic congestion. By forecasting short-term traffic flow, traffic control and efficient vehicle route planning can be achieved, thus alleviating traffic congestion and reducing the waste of resources it causes. In this study, the IBSA optimizer is proposed to optimize the stochastic parameters of the ELM model, and the IBSAELM model is used to predict one-day and two-day traffic flow, respectively. The prediction results of the model are evaluated by AE, MAPE, RMSE and r2.
In order to improve the optimization performance of the BSA optimizer, its distribution mechanism was improved in this study. This study makes three contributions: (1) the IBSA optimizer proposed in this study has a better optimization effect; (2) the IBSAELM model is proposed to predict short-term traffic flow, and the feasibility of the model is verified by simulation experiments; (3) the accurate prediction of short-term traffic flow lays a solid foundation for the realization of the intelligent transportation system, which has positive significance for alleviating urban traffic congestion and improving traffic management efficiency.
Although this study proposes a traffic flow prediction method based on machine learning theory, time series statistical models (such as the ARIMA and SARIMA models) were not studied in depth. In future research, the authors will use an intelligent optimizer to tune the hyperparameters of the SARIMA model in order to achieve its best parameter settings, and the improved SARIMA model will be applied to the field of traffic flow prediction.
Author Contributions: K.-P.L. designed the experiment and collected the experimental data; S.-q.Z. analyzed the experimental data and wrote the paper. All authors contributed to discussing and revising the manuscript. All authors have read and agreed to the published version of the manuscript.

Acknowledgments:
The authors would like to thank the editors and reviewers for their constructive comments on this study.

Conflicts of Interest:
The authors declare no conflict of interest.