SSA-ELM: A Hybrid Learning Model for Short-Term Traffic Flow Forecasting

Abstract: Accurate and efficient short-term traffic flow forecasting plays a critical role in intelligent transportation systems (ITS). However, because traffic flow is susceptible to factors such as weather and road conditions, traffic flow data tend to exhibit dynamic uncertainty and nonlinearity, so constructing a robust and reliable forecasting model remains a challenging task. To address this nonlinear and complex forecasting problem, this paper constructs a hybrid optimization model for short-term traffic flow forecasting, SSA-ELM, by embedding the sparrow search algorithm (SSA) in the extreme learning machine (ELM). The ELM has been widely used in short-term traffic flow forecasting because of its low computational complexity and fast learning speed. The SSA optimizes the input weights and hidden layer biases of the ELM, searching for the global optimal solution while preserving the original characteristics of the ELM, so that the model improves stability while increasing prediction accuracy. Experimental results on the Amsterdam A10 road traffic flow dataset show that the proposed model has higher forecasting accuracy and stability, revealing the potential of hybrid optimization models in the field of short-term traffic flow forecasting.


Introduction
Short-term traffic flow forecasting is based on real-time and historical data collected by various sensors, including cameras and radars. With the widespread use of traffic sensors and the development of emerging traffic sensor technologies, the amount of traffic flow data has grown significantly. Data-driven transportation planning and regulation is becoming mainstream in the era of big data transportation. Timely, accurate, and credible traffic flow information plays a significant role in reducing traffic jams, improving the efficiency of the transportation system, and helping people to make travel decisions [1]. It has attracted close attention from public institutions, commercial organizations, and individual travelers. Short-term traffic flow forecasting is performed to provide such traffic flow information. The need for short-term traffic flow prediction is becoming greater and more urgent with the rapid technological advancement and popular application of intelligent transportation systems. Short-term traffic flow forecasting is an important and irreplaceable part of intelligent transportation systems, complex mass transit systems, and normalized passenger data systems [2]. As a result, traffic flow forecasting has attracted a large number of researchers.
A number of methods can be used for traffic flow forecasting, and they are generally classified into parametric and nonparametric models [3]. The first type is the parametric model, which usually requires some parameters for learning and estimation. Examples include moving average [4], the autoregressive integrated moving average (ARIMA) model [5,6], Kalman filter [7], the multivariate time series model [8], and spectral analysis [9,10]. The second type is the non-parametric model, which explores the implicit relationship between the input value and the true value in a purely data-driven way. Common non-parametric models include support vector regression (SVR) [11], the k-nearest neighbor model [12], extreme learning machine (ELM) [13], multi-layer perceptron (MLP) [14], and artificial neural network (ANN) [15].
Deep learning techniques are developing rapidly and becoming increasingly sophisticated, and deep neural networks are gradually being applied to model complex nonlinear traffic flows [16]. A number of researchers apply deep learning to short-term traffic flow forecasting. For instance, Guo et al. [17] introduce an attention mechanism into graph convolution networks for traffic flow forecasting. Du et al. [18] present a deep irregular convolutional residual LSTM network for urban traffic passenger flow prediction. Zhang et al. [19] employ a spatial-temporal graph diffusion network for traffic flow forecasting. Li et al. propose a deep belief network optimized by the multiobjective particle swarm algorithm for day-ahead traffic flow forecasting. Wu et al. [20] propose a DNN-based traffic flow prediction model (DNN-BTF), which makes full use of the weekly/daily periodicity and spatial-temporal characteristics of traffic flow to improve the prediction accuracy. These models have been found to offer flexibility and parallelism in traffic flow prediction [21]. Nevertheless, the large number of parameters in a deep neural network requires large amounts of high-quality traffic flow data for training, which are often not readily available. All parameters of the network need to be optimized iteratively during training by a slow gradient descent algorithm based on empirical and risk minimization principles [22,23]. Consequently, if fast learning is desired, deep neural networks need well-designed optimization algorithms, yet these still converge easily to local minima [24]. As a result, many researchers devote themselves to the study of heuristic algorithms, which enable more rapid and accurate tuning of the parameters of data-driven learning models.
Huang et al. [25] proved that, if the activation functions of a hidden layer are infinitely differentiable, it is feasible to randomly initialize the weights of the hidden layer. Based on this finding, they propose an efficient prediction algorithm built on feed-forward neural networks, known as the extreme learning machine (ELM). In the ELM algorithm, the input weights and hidden layer biases of the network are generated by random initialization, while the output weight matrix is obtained by the Moore-Penrose (MP) generalized inverse. This makes the learning speed of ELM markedly faster than that of the gradient descent algorithm [26]. In traditional gradient-based learning, remedies such as stopping criteria [27], early stopping, and weight decay are often introduced to mitigate problems such as overfitting. In addition, if the learning rate is set too small, the learning algorithm converges slowly, whereas if it is set too high, the algorithm can become divergent and unstable. These problems of gradient-based learning methods, namely stopping criteria, local minima, and learning rate selection, are avoided or solved in the ELM algorithm [28,29].
ELM has a wide range of applications in traffic flow forecasting [30]. To improve the accuracy of model predictions, increasing the number of hidden layer nodes is one of the most common methods. However, this method also tends to make the overfitting of ELM more serious [31]. Therefore, the classic ELM model can be optimized and then applied to complex nonlinear traffic flow forecasting [32].
In this study, we use an extreme learning machine optimized by the sparrow search algorithm to solve the problem of traffic flow forecasting. The basic idea is to first search for more discriminative input weights and hidden layer biases for the ELM through the sparrow search algorithm (SSA) [33]. The SSA has also been applied to optimize other learning models. For example, Yan et al. [34] propose optimizing the BP neural network algorithm with the sparrow search algorithm to process coal mine water source data. Yang et al. introduce the sparrow search algorithm to optimize particle swarm optimization for software defect prediction. In the same way, our algorithm can achieve more accurate prediction performance while alleviating the over-fitting caused by increasing the number of parameters. The main contributions of the SSA-ELM model proposed in this paper for short-term traffic flow prediction are as follows:
• First, we take a metamodel perspective and optimize the performance of the extreme learning machine by using the sparrow search algorithm without increasing the complexity of the model.
• Second, through the end-to-end mechanism of ELM, we propose an extreme learning machine optimized by the sparrow search algorithm to effectively predict short-term traffic flow.
• Third, we show through extensive experiments on four benchmark datasets that the proposed model outperforms advanced forecasting models. The model preserves the ELM characteristics and unlocks its potential in short-term traffic flow forecasting.
The rest of this paper is organized as follows. The second part is the methodology, and the third part is the empirical study on four benchmark datasets collected from four freeways in Amsterdam, the Netherlands. The last part is the conclusion.

Methodology
In this section, we first present a short-term traffic flow forecasting model based on an extreme learning machine. Then, we optimize the performance of the forecasting model using the sparrow search algorithm.

Extreme Learning Machine
In a traditional feedforward neural network, the error is typically backpropagated through the gradient descent algorithm to update the model parameters, including weights and thresholds, during training. After continued training, the sum of squared errors gradually decreases to a certain level, at which point the output of the neural network closely approximates the expected output. However, this approach suffers from slow learning speed and high computational complexity. Unlike traditional feedforward neural networks, the extreme learning machine (ELM) proposed by Huang et al. is a learning algorithm for single-hidden-layer feedforward neural networks that randomly selects the input layer weights and hidden layer biases and then calculates the output layer weights analytically. This algorithm has faster learning speed and higher generalization ability, as well as good fitting ability and low computational complexity. The structure of the three-layer ELM network is illustrated in Figure 1.
The principle of the extreme learning machine is as follows. First, suppose that there are $N$ training samples $\{(x_i, t_i)\}_{i=1}^{N}$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^\top \in \mathbb{R}^n$ and $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^\top \in \mathbb{R}^m$. This feedforward neural network [35,36] with $k$ hidden layer nodes and activation function $g(x)$ is mathematically presented as follows:

$$H\beta = T, \tag{1}$$

where $H = [h_{ij}]_{i=1,\ldots,N,\; j=1,\ldots,k}$ represents the output matrix of the hidden layer, and $h_{ij} = g(\alpha_j^\top x_i + b_j)$ denotes the output of the $j$th hidden node with regard to $x_i$. $\alpha_j = [\alpha_{j1}, \alpha_{j2}, \ldots, \alpha_{jn}]^\top$ represents the weight vector that links the input nodes to the $j$th hidden neuron, and $b_j$ denotes the bias of the $j$th hidden neuron. $\beta = [\beta_1, \beta_2, \ldots, \beta_k]^\top$ represents the matrix of output weights, where $\beta_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm}]^\top$, $j = 1, \ldots, k$, is the weight vector that connects the $j$th hidden layer neuron to the output nodes. $T = [t_1, t_2, \ldots, t_N]^\top$ on the right-hand side of Equation (1) denotes the target matrix.

The steps of the ELM algorithm can be summarized as follows. First, the number of hidden layer neurons and the activation function $g(x)$ are determined. Then, the input weights and the hidden layer biases are randomly generated. Afterwards, the matrix $H$ is calculated from the activation function $g(x)$. Finally, Equation (1) is solved as a linear system for its least squares solution $\hat{\beta}$ [37,38]:

$$\hat{\beta} = H^{\dagger} T, \tag{2}$$

where $H^{\dagger}$ in Equation (2) denotes the Moore-Penrose (MP) generalized inverse of the matrix $H$.
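The four ELM steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function names `elm_fit` and `elm_predict`, the uniform range [-1, 1] for the random weights and biases, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, T, k, rng):
    """Train an ELM with k hidden nodes; returns (alpha, b, beta)."""
    n = X.shape[1]
    # Random input weights and hidden biases (the range is an assumption).
    alpha = rng.uniform(-1.0, 1.0, size=(n, k))
    b = rng.uniform(-1.0, 1.0, size=k)
    # Hidden-layer output matrix H (N x k), with h_ij = g(alpha_j^T x_i + b_j).
    H = sigmoid(X @ alpha + b)
    # Equation (2): least squares output weights via the Moore-Penrose inverse.
    beta = np.linalg.pinv(H) @ T
    return alpha, b, beta

def elm_predict(X, alpha, b, beta):
    return sigmoid(X @ alpha + b) @ beta

# Synthetic regression data, used only to exercise the functions.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 8))
T = np.sin(X.sum(axis=1, keepdims=True))
alpha, b, beta = elm_fit(X, T, k=30, rng=rng)
pred = elm_predict(X, alpha, b, beta)
train_rmse = float(np.sqrt(np.mean((pred - T) ** 2)))
```

Note that no iterative training loop appears: once `alpha` and `b` are drawn, the only fitted quantity is `beta`, which is why the quality of the random draw matters for the final model.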

Sparrow Search Algorithm
In nature, sparrows are characterized by a brown-black upper body and a short, thick, conical beak. There are many varieties of them, and they mostly live in groups. In the fall, it is easy to see a large colony of sparrows flying through the air. In winter, they mostly live in small groups of a dozen to dozens of individuals. As gregarious birds, sparrows are very intelligent and have good memory skills. There is also a clear division of labour within the sparrow population, which is generally divided into producers and scroungers. The former are responsible for finding food and guiding the population towards the area and direction of foraging. The latter use the information provided by the former to obtain food. When a sparrow in the population detects danger, that is, a predator, it sounds an alarm in time to alert the other sparrows. After hearing the alarm, the entire population immediately flees the dangerous area and flies to a safe area for food.
The sparrow search algorithm (SSA) is an optimization algorithm obtained by modelling the foraging and anti-predatory behaviour of sparrows [33]. The mathematical model of the SSA follows several rules. The energy reserve in the model corresponds to the fitness value of the individual sparrow. The energy reserve of a producer is usually high, whereas a scrounger with low energy tends to fly elsewhere in search of food. When a sparrow detects a predator, it signals. When the alarm value is greater than the safety value, the producer leads the scroungers to a safe area to feed, while the sparrows in the middle of the population move randomly to stay close to the others. The identities of producer and scrounger are dynamically interchangeable, with the overall proportion of each remaining constant. A scrounger is always able to find a producer that can find good food.
In the SSA, the producers with high fitness values have priority in obtaining food during the search process. In addition, producers have a wider foraging range than scroungers. In the iterative process, the location update of the producer is described as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(\dfrac{-i}{\alpha \cdot iter_{max}}\right), & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \geq ST \end{cases} \tag{3}$$

where $X_{i,j}$ denotes the position of the $i$th sparrow in the $j$th dimension, and $t$ represents the current iteration. $iter_{max}$ is the maximum number of iterations of the algorithm and is a constant. $\alpha \in (0, 1]$ denotes a random number. $Q$ is a random number that obeys a normal distribution. $L$ is a $1 \times d$ matrix in which each element is 1. $R_2$ and $ST$ represent the alarm value and the safety threshold, respectively, where $R_2 \in [0, 1]$ and $ST \in [0.5, 1.0]$. When $R_2 < ST$, no predators have been found for the moment, and the producer can search extensively. If $R_2 \geq ST$, some sparrows in the population have found a predator and alerted the others. The producer then leads the other sparrows out of the dangerous area to a safe area for food.
While foraging, some of the scroungers keep an eye on the producers. When they notice that a producer has found good food, they leave their position and compete for it. If they win, they get the producer's food. The update formula of the scrounger's position is as follows:

$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^{2}}\right), & i > n/2 \\ X_{p}^{t+1} + \left|X_{i,j}^{t} - X_{p}^{t+1}\right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases} \tag{4}$$

where $X_p$ denotes the current optimum position occupied by the producer, and $X_{worst}$ indicates the current global worst position. $A$ represents a $1 \times d$ matrix in which each element is randomly set to 1 or −1, and $A^{+} = A^\top (A A^\top)^{-1}$. When $i > n/2$, the $i$th scrounger, which has a lower fitness value, is not getting food, is hungry, and needs to fly elsewhere to forage and regain energy.
The number of sparrows aware of the danger is usually set at 10% to 20% of the total population. The initial positions of the danger-aware sparrows in the population are randomly generated, and their positions are updated as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left|X_{i,j}^{t} - X_{best}^{t}\right|, & f_i > f_g \\ X_{i,j}^{t} + K \cdot \left(\dfrac{\left|X_{i,j}^{t} - X_{worst}^{t}\right|}{(f_i - f_w) + \varepsilon}\right), & f_i = f_g \end{cases} \tag{5}$$

where $X_{best}$ refers to the current global optimal position. Here, $\beta$ is a step size control parameter, a random number that obeys a normal distribution with a mean of 0 and a variance of 1. $K$ is a random number in the range from −1 to 1 that indicates the direction of the sparrow's movement and also controls the step size. $f_i$, $f_g$, and $f_w$ denote the fitness value of the current sparrow, the current global best fitness value, and the current worst fitness value, respectively. $\varepsilon$ is a small constant that prevents the denominator from being zero. Put simply, when $f_i > f_g$, the sparrow's current position is vulnerable to attack by predators [33]. When $f_i = f_g$, a sparrow that has become aware of the danger needs to move closer to other sparrows to escape it. $X_{best}$ refers to the best position in the sparrow population, which is also quite safe.
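The three position updates described above (producers, scroungers, and danger-aware sparrows) can be combined into one optimization loop. The sketch below follows the standard SSA update rules as we understand them from [33], for a minimization problem. The function name `ssa_minimize`, the default population size, the fractions of producers and danger-aware sparrows, the threshold `ST=0.8`, and the clipping of positions to a box are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np

def ssa_minimize(fitness, dim, n_sparrows=20, max_iter=50, lower=-1.0, upper=1.0,
                 producer_frac=0.2, danger_frac=0.2, ST=0.8, seed=0):
    """Minimize `fitness` over [lower, upper]^dim with a basic SSA loop."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lower, upper, size=(n_sparrows, dim))
    f = np.array([fitness(x) for x in X])
    n_prod = max(1, int(producer_frac * n_sparrows))
    n_danger = max(1, int(danger_frac * n_sparrows))
    best_x, best_f = X[np.argmin(f)].copy(), float(f.min())
    eps = 1e-50            # small constant guarding against division by zero
    ones = np.ones(dim)    # the all-ones row vector L
    for _ in range(max_iter):
        order = np.argsort(f)              # best (lowest fitness) first
        X, f = X[order], f[order]
        x_best, x_worst = X[0].copy(), X[-1].copy()
        f_g, f_w = f[0], f[-1]
        R2 = rng.random()                  # alarm value in [0, 1)
        # Producers: search widely if safe, otherwise take a random normal step.
        for i in range(n_prod):
            if R2 < ST:
                alpha = 1.0 - rng.random() # random number in (0, 1]
                X[i] = X[i] * np.exp(-(i + 1) / (alpha * max_iter))
            else:
                X[i] = X[i] + rng.normal() * ones
        X[:n_prod] = np.clip(X[:n_prod], lower, upper)
        f_prod = np.array([fitness(x) for x in X[:n_prod]])
        x_p = X[int(np.argmin(f_prod))].copy()   # best producer position
        # Scroungers: the worse half flies elsewhere, the rest follow x_p.
        for i in range(n_prod, n_sparrows):
            if (i + 1) > n_sparrows / 2:
                X[i] = rng.normal() * np.exp((x_worst - X[i]) / (i + 1) ** 2)
            else:
                A = rng.choice([-1.0, 1.0], size=dim)
                A_plus = A / dim           # A^T (A A^T)^{-1}, since A A^T = dim
                X[i] = x_p + (np.abs(X[i] - x_p) @ A_plus) * ones
        # Danger-aware sparrows, chosen at random from the population.
        for i in rng.choice(n_sparrows, size=n_danger, replace=False):
            if f[i] > f_g:
                X[i] = x_best + rng.normal() * np.abs(X[i] - x_best)
            else:
                K = rng.uniform(-1.0, 1.0)
                X[i] = X[i] + K * np.abs(X[i] - x_worst) / ((f[i] - f_w) + eps)
        X = np.clip(X, lower, upper)
        f = np.array([fitness(x) for x in X])
        if f.min() < best_f:
            best_f, best_x = float(f.min()), X[int(np.argmin(f))].copy()
    return best_x, best_f

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return float(np.sum(x ** 2))

best_x, best_f = ssa_minimize(sphere, dim=5)
```

The routine tracks the best position seen so far separately from the population, so the returned fitness can never be worse than the best initial sparrow.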

SSA-ELM for Traffic Flow Forecasting
The input weight matrix and the hidden layer bias of the ELM are stochastically generated. If both are zero, the hidden layer nodes are invalid. For traffic flow prediction tasks, a large number of hidden nodes is essential to achieve the expected performance of the ELM. However, the forecasting capability of the model is sensitive to the random determination of the parameters associated with the hidden layer. As a result, the generalization capability of the ELM is not sufficient to handle nonlinear and complex traffic flow data that did not occur during the training process.
Given this limitation, we propose to introduce the sparrow search algorithm to find more appropriate network parameters for the ELM traffic flow prediction model. We develop a comprehensive SSA framework that iteratively generates the input weight matrix and hidden layer bias for the ELM, instead of assigning them randomly in the initial phase and leaving them unadjusted in the later learning phase. We refer to this model as SSA-ELM in the following. We iteratively update the parameters and verify the performance of the model to optimize the parameters of the ELM; see Algorithm 1.
The sparrow search algorithm is an effective optimization technique that simulates the foraging and anti-predatory behaviour of sparrows. Compared to other state-of-the-art algorithms, SSA can provide highly competitive results in terms of search accuracy, convergence speed, and stability. In addition, SSA performs well in different search spaces. SSA therefore has a good ability to explore the region of the potential global optimum, effectively avoiding the local optimum problem.
[Algorithm 1: SSA-ELM. Closing steps: obtain the optimal input weight and hidden layer bias of ELM; obtain the final neural network for forecasting.]
Figure 2 shows the workflow of our proposed SSA-ELM model. First, we establish the ELM network and set the number of hidden neurons and the activation function. Then, we initialize the sparrows by setting parameters such as the number of sparrows, the maximum number of iterations, and the alarm value. After that, we calculate the initial fitness value of each sparrow and obtain the global optimal fitness value and location. Next, we update the locations of the producers, the scroungers, and the sparrows that are aware of the danger. At the same time, we update the fitness of all sparrows, the global optimal fitness, and the corresponding location. We repeat the above operations until the maximum number of iterations is reached. Finally, the location of the global optimal sparrow gives the input weights and hidden layer biases of the ELM. The resulting optimized ELM model is then used for forecasting traffic flow.
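To make the link between a sparrow's location and the ELM parameters concrete, the following sketch shows one possible encoding. It is a hypothetical illustration: the flat layout of the position vector, the helper names `decode` and `make_fitness`, the use of validation RMSE as the fitness value, and the synthetic series are assumptions, since the paper does not specify these details here. The random candidates at the end merely stand in for a sparrow population and do not perform any optimization.

```python
import numpy as np

def decode(position, n_inputs, k):
    """Split a flat position vector into input weights (n_inputs x k) and biases (k,)."""
    alpha = position[: n_inputs * k].reshape(n_inputs, k)
    b = position[n_inputs * k:]
    return alpha, b

def hidden(X, alpha, b):
    """Sigmoid hidden-layer output matrix H."""
    return 1.0 / (1.0 + np.exp(-(X @ alpha + b)))

def make_fitness(X_tr, T_tr, X_val, T_val, k):
    """Build a fitness function: lower validation RMSE means a fitter sparrow."""
    n_inputs = X_tr.shape[1]
    def fitness(position):
        alpha, b = decode(position, n_inputs, k)
        # Output weights are still solved analytically, as in a plain ELM.
        beta = np.linalg.pinv(hidden(X_tr, alpha, b)) @ T_tr
        pred = hidden(X_val, alpha, b) @ beta
        return float(np.sqrt(np.mean((pred - T_val) ** 2)))
    return fitness

# Synthetic periodic series standing in for traffic flow (assumption).
rng = np.random.default_rng(1)
series = np.sin(np.arange(400) / 10.0) + 0.05 * rng.normal(size=400)
lag, k = 8, 30
X = np.stack([series[i:i + lag] for i in range(len(series) - lag)])
T = series[lag:].reshape(-1, 1)
fitness = make_fitness(X[:300], T[:300], X[300:], T[300:], k)

dim = lag * k + k                       # one entry per input weight and bias
candidates = rng.uniform(-1.0, 1.0, size=(10, dim))
scores = [fitness(c) for c in candidates]
best = candidates[int(np.argmin(scores))]
```

Under this encoding, the search space has `lag * k + k` dimensions, and any optimizer that accepts a function of a flat vector could be used in place of the random candidates.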

Experiment
In this section, short-term traffic flow forecasting experiments are conducted using traffic volumes collected for four intersections on the A10 ring road in Amsterdam. Then, the performance of the proposed SSA-ELM prediction model is evaluated.

Data Description
The datasets used in the paper were collected by Wang et al. [39] on four motorways in Amsterdam, the Netherlands. The motorways are the A1, A2, A4, and A8, as shown in Figure 3. The measurement locations were set near the junctions with the A10 ring road. The data are 1-min average flow data collected by MONICA sensors between 20 May 2010 and 24 June 2010, so the raw data consist of a total of 5 weeks of measurements. A brief description of these four motorways is as follows.
The A1 motorway is part of the European route E30 and connects Amsterdam to the German border. It also has the first 3+ high-occupancy vehicle (HOV) lane in Europe. This makes forecasting quite challenging, as traffic volumes on the HOV lane tend to change dramatically over time.
The A2 motorway is statistically among the busiest motorways in the Netherlands. In this study, data collected in 2010 are used to test whether the proposed model can still perform well under traffic congestion.
The A4 motorway is part of the Rijksweg. It also connects Amsterdam with the Belgian border.
The shortest of the four motorways is the A8. Its total length is less than 10 km.
Missing values in the original data are recorded as −1. We estimate and fill them in using the statistical learning method proposed by Li et al. [40], averaging the measurements taken at the same time in other weeks.
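The week-averaging idea for missing values can be sketched as follows. This is only an illustration of averaging across weeks at the same time slot, not the full method of Li et al. [40]; the function name `fill_missing` and the list-of-weeks data layout are assumptions.

```python
def fill_missing(weeks, missing=-1):
    """Fill missing entries with the mean of the same time slot in other weeks.

    weeks: list of equal-length lists, one list of measurements per week.
    Returns a filled copy; a slot that is missing in every week is left as is.
    """
    filled = [list(w) for w in weeks]
    for t in range(len(weeks[0])):
        valid = [w[t] for w in weeks if w[t] != missing]
        for w_idx, w in enumerate(weeks):
            if w[t] == missing and valid:
                filled[w_idx][t] = sum(valid) / len(valid)
    return filled

# Three toy "weeks" with three time slots each; -1 marks a missing reading.
weeks = [[10, 20, -1], [12, -1, 30], [14, 22, 34]]
filled = fill_missing(weeks)
```

Averages are always computed from the original readings, so a filled value never feeds into another filled value.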

Evaluation Criterion
In the experiments, two commonly used criteria are used to assess the forecasting performance of the proposed model. The root mean squared error (RMSE) measures the average error between the measured values and the predicted values of the model. The mean absolute percentage error (MAPE) describes these differences as a percentage. These two criteria are calculated using Equations (6) and (7), respectively:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(y(n) - \hat{y}(n)\right)^{2}}, \tag{6}$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{n=1}^{N}\left|\frac{y(n) - \hat{y}(n)}{y(n)}\right| \times 100\%, \tag{7}$$

where $y(n)$ and $\hat{y}(n)$ are the true value and the forecasting value of the $n$th group of data, and $N$ is the number of test samples.
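As a small worked example of the two criteria, the sketch below computes RMSE and MAPE for three made-up flow values. The function names and the numbers are illustrative assumptions, and the MAPE function assumes that all true values are nonzero.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, as in Equation (6)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, as in Equation (7)."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(y_true, y_pred)) / len(y_true)

y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
print(rmse(y_true, y_pred))   # square root of 200/3, about 8.165
print(mape(y_true, y_pred))   # 5 percent, up to floating-point rounding

# The same absolute error of 5 gives a much larger MAPE when the flow is low.
print(mape([10.0], [15.0]))   # 50.0
```

The last line shows why MAPE should be read with care for periods of low traffic: the error is divided by a small true value.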

Baseline Models
The short-term traffic flow forecasting performance of the SSA-ELM model will be evaluated and compared with several traffic flow forecasting models commonly applied in ITS.

Artificial Neural Network (ANN)
An artificial neural network (ANN) is a complex network structure formed by interconnecting a large number of processing units, also called neurons, and is a non-parametric learning model. We set the network parameters of the ANN model in our experiments according to the criteria proposed by Zhu et al. [15] for radial basis function (RBF) neural networks. The number of hidden layers of the ANN is set to 1, and the mean squared error goal is set to 0.001. We set the maximum number of neurons in the hidden layer to 40 and the expansion of the radial basis function (RBF) to 2000. The number of neurons added between displays (DF) is set to 25, according to the default value.

Grey Model (GM)
The grey prediction model (GM) builds mathematical models to make predictions from a small amount of incomplete information. Here, we use the accumulated generating operation and the GM(1,1) model [41] to predict short-term traffic flow.

Support Vector Machine Regression (SVR)
A specific description of the SVR is given by Zhou et al. [3]. In the experiments, the radial basis function (RBF) is chosen as the kernel. The cost parameter C is set according to the maximum difference between traffic flow values, and the regression horizon is set to 8.

Kalman Filter (KF)
The KF can dynamically update its state variables, thus adapting well to fluctuations in traffic flow, and it is suitable for timely traffic forecasting. The prediction accuracy of the KF model is susceptible to the effects of noise in the data [42]. We use the discrete wavelet decomposition method to remove the influence of noise. The specific relevant parameter settings are referred to.

Autoregression (AR)
The AR model is a statistical approach to time series. As a linear regression model, it describes subsequent random variables by a linear combination of random variables from previous times. Because traffic flow is highly random, the AR model is widely used in traffic flow forecasting. In this case, we set the parameter p to 8.
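The idea of predicting the next value from a linear combination of the previous p values can be sketched with ordinary least squares. This is a generic AR(p) fit and not necessarily the estimation procedure used for the baseline in the experiments: the intercept term, the function names `fit_ar` and `predict_next`, and the synthetic series are assumptions.

```python
import numpy as np

def fit_ar(series, p):
    """Fit y_t = c + phi_1 * y_{t-1} + ... + phi_p * y_{t-p} by least squares."""
    y = np.asarray(series, dtype=float)
    # Column j holds lag j+1, aligned with the targets y[p:].
    lags = np.column_stack([y[p - j - 1: len(y) - j - 1] for j in range(p)])
    design = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(design, y[p:], rcond=None)
    return coef                          # [c, phi_1, ..., phi_p]

def predict_next(series, coef):
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    recent = np.asarray(series[-p:], dtype=float)[::-1]   # y_{t-1}, ..., y_{t-p}
    return float(coef[0] + recent @ coef[1:])

# Noisy periodic series standing in for 10-min traffic flow (assumption).
rng = np.random.default_rng(2)
series = 100.0 + 20.0 * np.sin(np.arange(300) / 8.0) + rng.normal(size=300)
coef = fit_ar(series, p=8)
next_value = predict_next(series, coef)
```

With p set to 8, the model has the same input window as the time lag of 8 used for the ELM-based models, which keeps the comparison on equal inputs.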

Decision Tree (DT)
In the experiments, we employ a DT model based on a classification and regression tree (CART) for traffic flow forecasting, which does not require any prior hypothesis and is robust to noisy and missing data [43].
We compare the predictive performance of the proposed SSA-ELM with the standard extreme learning machine and the extreme learning machine optimized by the genetic algorithm (GA-ELM). In extreme learning machines optimized by SSA and GA, the random initialization of the population makes the results differ from run to run. Therefore, we take the average of the results of 100 runs as the result of the GA-ELM model and the SSA-ELM model on each dataset.
Tables 1 and 2 present the results of the different forecasting models on the four benchmark datasets. The SSA-ELM model achieves the best forecasting results on each dataset. For instance, compared with DT, which has relatively good prediction performance among the other models, the MAPEs of SSA-ELM decrease by 3.97%, 5.99%, 5.83%, and 11.16% on the A1, A2, A4, and A8 datasets, respectively. Compared with the ANN model, the MAPEs of SSA-ELM decrease by 8.01%, 6.24%, 6.97%, and 3.43% on the A1, A2, A4, and A8 datasets, respectively. Compared with AR, the MAPEs of SSA-ELM decrease by 14.52%, 11.91%, 8.50%, and 4.80% on the A1, A2, A4, and A8 datasets, respectively.
ELM and ANN have a similar single hidden layer network structure, and their prediction performance is also comparable. Nonetheless, ELM has the following advantages over ANN: (1) ELM does not need to continuously adjust the weights and thresholds through backpropagation and has a noticeably faster learning speed than ANN. (2) ELM has been proved to have better generalization ability. (3) ANN, as a traditional learning algorithm based on gradient descent, is prone to problems such as falling into a local minimum, sensitivity to the choice of learning rate, and overfitting. Methods such as weight decay and early stopping are often applied to ANN to address these problems. ELM avoids these problems directly and usually has a much simpler network structure than ANN. (4) The ANN is only applicable to differentiable activation functions, whereas the ELM is applicable to almost all nonlinear activation functions, including training feedforward neural networks with numerous nondifferentiable activation functions.
We also compare the forecasting performances of standard ELM, GA-ELM, and SSA-ELM (Figures 4 and 5). As shown by the experimental results, both SSA and GA can optimize ELM. As can be seen from Table 1, compared with GA-ELM, SSA-ELM reduces MAPEs by 2.19%, 0.87%, 2.11%, and 1.31% on the A1, A2, A4, and A8 datasets, respectively. Furthermore, as seen from Table 2, the RMSEs of SSA-ELM are lower than those of GA-ELM by 1.35%, 5.04%, 3.23%, and 3.18% on the A1, A2, A4, and A8 datasets, respectively. The extreme learning machine has numerous advantages, such as strong learning ability, high generalization ability, and low computational complexity. ELM can also learn the internal relationships in the data quickly from a limited amount of training data. In the SSA-ELM model, we optimize the ELM with the SSA algorithm, and the resulting model outperforms the other selected models in terms of prediction accuracy.
Meanwhile, we compare the Akaike Information Criterion (AIC) [44] of ELM, GA-ELM, and SSA-ELM in Table 3. AIC is based on the concept of entropy and weighs the complexity of a prediction model against how well the model fits the data. As indicated in Table 3, SSA-ELM has a relatively smaller AIC for each benchmark dataset. Therefore, our proposed model performs better in terms of the trade-off between model complexity and goodness of fit.
Next, we visualize the deviation between the short-term traffic flow prediction results of the proposed model and the ground truth for one week in Figures 6-9. In each graph, the pink line denotes the real value of traffic flow, while the green line represents the predicted result of the SSA-ELM model. The relative error of the model forecast is shown by the black line and is calculated by dividing the error between the predicted and true values by the true value. As seen in Figures 6-9, the prediction performance of our proposed model is good for A1, A2, A4, and A8, as the relative errors of the predictions are close to 0 most of the time. The GEH statistic is introduced to analyze the predicted results of the experiment. The GEH statistic compares two sets of traffic volumes in traffic forecasting and traffic modeling, and it has proven useful for many kinds of traffic analysis. Table 4 lists the GEH statistics of the predicted results for the four benchmark datasets. As indicated in Table 4, the GEH of the SSA-ELM model is less than 5 for most of the datasets. The GEH for the A1 dataset is 6.11, which probably means that the A1 dataset is more volatile than the other three datasets. Since a GEH below 5 is conventionally regarded as a good match between predicted and observed volumes, the proposed model can be considered to fit the data well.
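For reference, the GEH statistic for a single pair of volumes is commonly defined as the square root of 2(M − C)² / (M + C), where M is the predicted volume and C is the observed volume. The sketch below computes it for one pair of made-up values. The function name `geh` and the sample numbers are assumptions, and how the per-interval values are aggregated into the dataset-level figures in Table 4 is not specified here.

```python
import math

def geh(predicted_volume, observed_volume):
    """GEH statistic for one pair of traffic volumes."""
    m, c = predicted_volume, observed_volume
    if m + c == 0:
        return 0.0   # both volumes are zero, so there is no discrepancy
    return math.sqrt(2.0 * (m - c) ** 2 / (m + c))

# A prediction of 110 against an observation of 100 (made-up values).
value = geh(110.0, 100.0)
is_good_match = value < 5.0   # conventional threshold for a good match
```

Unlike a plain percentage error, the statistic tolerates larger relative deviations at low volumes and smaller ones at high volumes.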

Ablation Study
Traffic flow forecasting does not aim to predict minute-by-minute fluctuations. Therefore, Wang et al. propose time-average aggregation for traffic flow forecasting. Here, the aggregated value is the average hourly traffic flow rate over each 10-min period, so we aggregate the 1-min average traffic data into 10-min averages. In the prediction experiments, each dataset is divided into two parts. The first four weeks of data are used for model training, and the remaining week of data is used for testing model performance. We set the time lag to 8. For the ELM, the number of hidden layer nodes is set to 30, and the sigmoid function is chosen as the activation function.
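The preprocessing described above can be sketched as two small helpers: one that averages consecutive 1-min readings into 10-min values, and one that builds input windows with a time lag of 8. The function names and the toy inputs are assumptions made for illustration.

```python
def aggregate(values, window=10):
    """Average consecutive non-overlapping windows (e.g., 1-min to 10-min data).

    A trailing partial window is dropped.
    """
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]

def make_lagged(series, lag=8):
    """Build (inputs, targets): each target is predicted from the `lag` previous values."""
    inputs = [series[i:i + lag] for i in range(len(series) - lag)]
    targets = [series[i + lag] for i in range(len(series) - lag)]
    return inputs, targets

# Twenty 1-min readings become two 10-min averages.
ten_min = aggregate(list(range(20)), window=10)

# With a lag of 2 on a short toy series, each input holds the two previous values.
inputs, targets = make_lagged([1, 2, 3, 4, 5], lag=2)

# One week of 10-min data has 7 * 24 * 6 intervals; four weeks are used for training.
points_per_week = 7 * 24 * 6
train_points = 4 * points_per_week
```

At a 10-min resolution, one week contains 1008 intervals, so the four training weeks provide 4032 points per motorway before the lag windows are formed.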
In the experiments, MAPE and RMSE are used to assess the optimization effectiveness of the SSA. The hyperparameter settings of the SSA are listed in Table 5. The effect of the number of SSA iterations on the forecasting performance of the model is shown in Figure 10. We introduce another commonly used optimization algorithm, the genetic algorithm (GA), for comparison with the SSA to evaluate the ability of the SSA to optimize the ELM parameters. To ensure a fair comparison, we set the shared parameters to the same values and vary only the algorithm-specific ones. Thus, the maximum number of iterations of the genetic algorithm is set to 100, the crossover probability to 0.85, the generation gap to 0.95, and the mutation probability to 0.03. Additional details on the extreme learning machine optimized by the genetic algorithm (GA-ELM) [45] can be found in the study by Krishnan et al. The traffic flow data of two typical time periods are selected to evaluate the effectiveness of the SSA and the GA in optimizing the parameters of the ELM model: the interval from 7:30 to 9:30, which is usually considered the morning peak (Table 6), and the afternoon off-peak period from 13:30 to 14:30 (Table 7). Tables 6 and 7 report the traffic flow forecasting errors of the different models for the morning peak and the afternoon off-peak periods, respectively. The forecasting error here refers to the absolute error between the ground truth and the predicted value. According to Tables 6 and 7, the SSA-ELM model shows better predictive performance than GA-ELM under both the RMSE and MAPE evaluation criteria, and its forecasting error is lower than that of the GA-ELM model in both scenarios. These results show that the SSA outperformed the GA in optimizing the parameters of the ELM model in both the morning peak period and the afternoon off-peak period.
In addition, we consider a period of low traffic flow. The period between midnight and early morning usually has low traffic, and because MAPE divides each error by the true flow, a small forecasting error can easily produce a large MAPE value during this period. We take the period from 23:30 to 00:30 as the midnight period. Table 8 compares the traffic flow forecasting results of the ELM models optimized by the different algorithms during the midnight period, when traffic flow is low.
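The sensitivity of MAPE to low traffic volumes can be illustrated with a short sketch. It assumes the standard definitions of RMSE and MAPE, which may differ in detail from those used in the paper, and the flow values are made up for illustration only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the data (vehicles/h)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error (%); assumes no zero entries in y_true."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# Illustrative flows (vehicles/h), not taken from the paper's dataset.
peak_true = np.array([4000.0, 4200.0, 4100.0])
night_true = np.array([200.0, 150.0, 250.0])

# Apply the same absolute error of 50 vehicles/h to both periods.
peak_pred = peak_true + 50.0
night_pred = night_true + 50.0

print("RMSE peak/night:", rmse(peak_true, peak_pred), rmse(night_true, night_pred))
print("MAPE peak/night:", mape(peak_true, peak_pred), mape(night_true, night_pred))
```

Both periods have the same RMSE, but the midnight MAPE is far larger because the true flow in the denominator is small. This is why the low-traffic period is examined separately.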
The performance of a learning-based forecasting model is often determined by the learning ability of the model and the quality of the training data. In this work, our main focus is the temporal learning capability of the proposed model. The proposed model is trained on a benchmark dataset of motorway inflow, which is relatively less affected by traffic flow elsewhere in the road network. The model would require some modification before being applied to traffic flow forecasting on urban arterial roads, because of the strong correlation between the intersections of urban arterial roads.

Conclusions
In this paper, motivated by the characteristics of short-term traffic flow that make it difficult to forecast, such as nonlinearity and dynamics, the extreme learning machine and the sparrow search algorithm are selected as the methodological and theoretical basis. We proposed a hybrid optimization forecasting model, SSA-ELM, for short-term traffic flow forecasting, using a data-driven approach that combines the characteristics of both methods. Comparative experiments against common short-term traffic flow forecasting models on four benchmark datasets show that the SSA-ELM hybrid optimization model achieves the best evaluation metrics and that the sparrow search algorithm effectively improves the generalization performance of the model, revealing the potential of hybrid optimization models in the field of short-term traffic flow forecasting.

Figure 1. The extreme learning machine has a three-layer structure with only one hidden layer. Its parameters are the input weights α, the output weights β, and the hidden layer biases b.

Algorithm 1: SSA-ELM Algorithm.
Input: G: the maximum number of iterations; P: the population; PD: the number of the producers; SD: the number of the sparrows who perceive the danger; R_2: the alarm value; K: the number of hidden neurons; g(x): the activation function
Output: x_best: the optimal parameters
Prepare for the model of ELM network;
Set the activation function and the number of hidden nodes;
Initialize a population of sparrows and define its relevant parameters;
Input the training samples;
Calculate the initial fitness value of each sparrow and rank the fitness values;
while ending conditions false do
    Update the producer's location;
    Update the scrounger's location;
    Update the location of sparrow who perceives the danger;
    Calculate the new fitness value of each sparrow;
    Rank the fitness values;
    Find the current best individual and the current worst individual;
    Update the global optimal fitness value and corresponding location if necessary;
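The loop in Algorithm 1 can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation. The position-update rules follow the commonly published form of the sparrow search algorithm and are an assumption here, since this section does not state them. The names `elm_fitness`, `ssa_minimize`, `pd_frac`, `sd_frac`, and the safety threshold `st` are hypothetical, and the data are synthetic.

```python
import numpy as np

def elm_fitness(params, X, y, n_hidden):
    """Training RMSE of an ELM whose input weights and biases come from params."""
    n_features = X.shape[1]
    alpha = params[: n_features * n_hidden].reshape(n_features, n_hidden)
    b = params[n_features * n_hidden :]
    H = 1.0 / (1.0 + np.exp(-(X @ alpha + b)))  # sigmoid hidden layer output
    beta = np.linalg.pinv(H) @ y                # least-squares output weights
    return float(np.sqrt(np.mean((H @ beta - y) ** 2)))

def ssa_minimize(fitness, dim, pop=20, iters=50, pd_frac=0.2, sd_frac=0.1,
                 st=0.8, lb=-1.0, ub=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(pop, dim))    # initial sparrow population
    fit = np.array([fitness(x) for x in X])
    n_prod = max(1, int(pd_frac * pop))         # number of producers (PD)
    n_danger = max(1, int(sd_frac * pop))       # danger-aware sparrows (SD)
    best_x, best_f = X[fit.argmin()].copy(), float(fit.min())
    init_best = best_f
    for _ in range(iters):
        order = np.argsort(fit)                 # rank the fitness, best first
        X, fit = X[order], fit[order]
        worst_x = X[-1].copy()
        r2 = rng.random()                       # alarm value R_2
        for i in range(n_prod):                 # update the producers
            if r2 < st:
                X[i] = X[i] * np.exp(-(i + 1) / (rng.random() * iters + 1e-12))
            else:
                X[i] = X[i] + rng.normal() * np.ones(dim)
        X[:n_prod] = np.clip(X[:n_prod], lb, ub)
        prod_fit = np.array([fitness(x) for x in X[:n_prod]])
        prod_best = X[int(prod_fit.argmin())].copy()
        for i in range(n_prod, pop):            # update the scroungers
            if i + 1 > pop / 2:
                X[i] = rng.normal() * np.exp((worst_x - X[i]) / (i + 1) ** 2)
            else:
                A = rng.choice([-1.0, 1.0], size=dim)
                # For entries of +1 or -1, the pseudo-inverse of row A is A / dim.
                X[i] = prod_best + (np.abs(X[i] - prod_best) @ (A / dim)) * np.ones(dim)
        X = np.clip(X, lb, ub)
        fit = np.array([fitness(x) for x in X])
        g_best, g_worst = X[fit.argmin()].copy(), X[fit.argmax()].copy()
        f_g, f_w = fit.min(), fit.max()
        for i in rng.choice(pop, size=n_danger, replace=False):  # danger-aware
            if fit[i] > f_g:
                X[i] = g_best + rng.normal() * np.abs(X[i] - g_best)
            else:
                k = rng.uniform(-1.0, 1.0)
                # abs() keeps the denominator positive (a simplification).
                X[i] = X[i] + k * np.abs(X[i] - g_worst) / (abs(fit[i] - f_w) + 1e-12)
        X = np.clip(X, lb, ub)
        fit = np.array([fitness(x) for x in X])
        if fit.min() < best_f:                  # update the global optimum
            best_f, best_x = float(fit.min()), X[fit.argmin()].copy()
    return best_x, best_f, init_best

# Toy demonstration on synthetic data (assumption; not the Amsterdam dataset).
rng = np.random.default_rng(1)
X_demo = rng.uniform(0.0, 1.0, size=(120, 8))
y_demo = np.sin(X_demo.sum(axis=1))
n_hidden = 10
dim = 8 * n_hidden + n_hidden                   # input weights plus biases
best_x, best_f, init_f = ssa_minimize(
    lambda p: elm_fitness(p, X_demo, y_demo, n_hidden), dim, pop=20, iters=20)
```

Each sparrow encodes the ELM input weights and hidden layer biases as one flat vector, while the output weights are still obtained in closed form from the pseudo-inverse. The search therefore only has to explore the parameters that a plain ELM would draw at random.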

Figure 2. The workflow of the SSA-ELM hybrid model.

Figure 3. Brief description of the four highways in Amsterdam.

Figure 6. The predictions of the SSA-ELM model and the measurement over a week, and the prediction's relative error on the A1.

Figure 7. The predictions of the SSA-ELM model and the measurement over a week, and the prediction's relative error on the A2.

Figure 10. The forecasting performance by the number of SSA iterations.

Table 1. The MAPE (%) of different forecasting models on four datasets.

Table 2. The RMSE (vehicles/h) of different forecasting models on four datasets.

Table 3. The comparison of AIC for ELM, GA-ELM, and SSA-ELM.

Table 4. The GEH statistics of the SSA-ELM model.

Table 5. The hyperparameters of the SSA.

Table 6. The forecasting performance of different optimization algorithms on the ELM during the morning peak period.

Table 7. The forecasting performance of different optimization algorithms on the ELM during the afternoon off-peak period.

Table 8. The forecasting performance of different optimization algorithms on the ELM during the low traffic period at midnight.