GSA-KELM-KF: A Hybrid Model for Short-Term Traffic Flow Forecasting

Chai, Wenguang; Zhang, Liangguang; Lin, Zhizhe; Zhou, Jinglin; Zhou, Teng

doi:10.3390/math12010103

by Wenguang Chai ¹, Liangguang Zhang ¹, Zhizhe Lin ², Jinglin Zhou ³,* and Teng Zhou ⁴

¹ School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China

² School of Information and Communication Engineering, Hainan University, Haikou 570228, China

³ School of Computer Science, Fudan University, Shanghai 200433, China

⁴ State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550000, China

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(1), 103; https://doi.org/10.3390/math12010103

Submission received: 1 December 2023 / Revised: 21 December 2023 / Accepted: 25 December 2023 / Published: 27 December 2023

Abstract

Short-term traffic flow forecasting, an essential enabler of intelligent transportation systems, is a fundamental and challenging task because traffic flow changes dramatically over time. In this paper, we present a kernel extreme learning machine optimized by the gravitational search algorithm, named GSA-KELM, which avoids manually traversing all possible parameters and improves the potential performance. Furthermore, under the interference of heavy-tailed impulse noise, the performance of KELM may deteriorate seriously. Based on the Kalman filter, which combines observed data and estimated data to perform closed-loop management of errors and limit them within a certain range, we propose a combined model, termed GSA-KELM-KF. The experimental results on two real-world datasets demonstrate that GSA-KELM-KF outperforms the state-of-the-art parametric and non-parametric models.

Keywords:

traffic flow theory; extreme learning machine; Kalman filter

MSC:

68Q07; 68Q32; 68W50

1. Introduction

Accurate traffic flow forecasting is essential for many intelligent transportation systems. It not only participates in dynamic traffic management (DTM) but also provides routing information for the public traffic plans [1]. Therefore, it has attracted close attention from public institutions, commercial organizations, and individual travellers. Despite all that, short-term traffic flow forecasting is challenging because of the uncertainty of the seasonality that emerged from noise and the randomness of exogenous elements [2].

Short-term traffic flow is traffic flow with a sampling interval not exceeding 15 min. Parametric Regression (PR) and Non-Parametric Regression (NPR) are two kinds of short-term traffic flow forecasting methods [3]. PR includes moving average [4], spectral analysis [5,6], time series method [7,8], combined forecasting method [9], and Kalman filtering [10]. Parametric models with finite parameters presume that traffic flow is subordinated to a certain distribution, e.g., Gaussian distribution. It is difficult to capture the complex nonlinear relationship of traffic flow for this kind of model with the functions of finite parameters, which are prone to insufficient fitting. Non-parametric models for short-term traffic flow forecasting contain fuzzy logic systems [11], support vector regression [12], artificial neural networks [13], extreme learning machines [14,15], long short-term memory [16,17], etc.

In recent years, more and more researchers have shown interest in short-term traffic flow forecasting. Simple yet efficient learning methods for complex nonlinear structures have been established based on neural networks, which offer a strong learning capacity. The most popular neural networks, single-hidden layer feed-forward neural networks, have been widely utilized for traffic flow forecasting. Nevertheless, they still have the drawbacks of a slower learning speed than required, easy convergence to a local minimum, and over-fitting.

Extreme Learning Machine (ELM) is an improved learning algorithm for single-hidden layer feedforward neural networks. In ELM, the input weights and the biases are randomly generated, which helps to overcome the shortcomings of the above forecasting models [18]. ELM has the advantages of strong adaptability to nonlinear systems and fast speed. Wang et al. [19] proposed an ELM based on fuzzy C-means (FCM). Chen et al. [20] applied the maximum mixture correntropy criterion (MMCC) to ELM for function approximation and data regression. Even so, ELM still requires the number of hidden layer neurons to be set. To further enhance the performance of ELM and satisfy some particular application requirements, the implicit mappings defined by the kernel can be employed to replace the explicit feature mappings. Along this line of thought, KELM provides a junction point for ELM with several classical learning methods, e.g., the radial basis function network [21] and the least squares support vector machine [22].

It is noteworthy that KELM finds the optimal parameters suitable for the model via the grid search method, which causes a small learning rate, an overly long learning time, over-fitting, degradation of generalization performance, etc. Furthermore, the minimum mean square error is dependent on the assumption that the data are noise-free or the noise is of Gaussian distribution. The performance may deteriorate seriously when some heavy-tailed impulsive noises disturb the distribution of data in KELM.

Many algorithms have been reformulated to address these issues, such as the genetic algorithm and gravitational search algorithm [23,24,25,26]. In this paper, a kernel extreme learning machine optimized by GSA is proposed, which does not need to manually traverse all possible parameter combinations of γ and σ to achieve more accurate forecasting performance. In addition, combining GSA-KELM with the Kalman filter effectively reduces the interference of heavy-tailed impulse noise.

The major contributions of this work are shown as follows.

First, we retain the advantages of KELM’s implicit mapping and optimize the KELM through the GSA to avoid manually traversing all possible parameters and improve potential accuracy.
Second, the GSA-KELM combined Kalman filter model effectively reduces the interference of heavy-tailed impulse noise.
Third, the outperformance of our model is demonstrated on two benchmark datasets via comparison with several benchmark methods for traffic flow forecasting.

The rest of this paper is organized as follows. The second part is the methodology, the third part is the experiment, and the fourth part is the conclusion.

2. Methodology

This section briefly introduces the gravitational search algorithm, kernel extreme learning machine, and Kalman filter as shown in Figure 1. Then, we employ GSA-KELM and KF to forecast traffic flow. Lastly, we detail the combination of GSA-KELM and KF.

2.1. Gravitational Search Algorithm

The gravitational search algorithm (GSA) [27] is a heuristic search algorithm inspired by Newtonian laws of gravity. Its main idea is that any two particles attract each other with a force directed along the line connecting their centres.

Assuming k particles in a d-dimensional search space, the position of the i-th particle can be expressed as X_{i} = (x_{i}^{1}, x_{i}^{2}, \dots, x_{i}^{d}) for i = 1, 2, \dots, k, and its velocity can be expressed as V_{i} = (v_{i}^{1}, v_{i}^{2}, \dots, v_{i}^{d}). The inertial mass of the particles can be updated by

\begin{aligned} \mathit{fit}_{i} (t) &= \mathit{fv}_{i} (t) - \mathit{fv}_{i} (t - 1), \\ m_{i} (t) &= \frac{| \mathit{fit}_{i} (t) | - \max | \mathit{fit}_{i} (t) |}{\min | \mathit{fit}_{i} (t) | - \max | \mathit{fit}_{i} (t) |}, \\ M_{i} (t) &= \frac{m_{i} (t)}{\sum_{j = 1}^{k} m_{j} (t)}, \end{aligned}

(1)

where \mathit{fv}_{i} (t) is the fitness value of the i-th particle at iteration t, with its initial value set to 0. The gravitational force exerted by the j-th particle on the i-th particle at the t-th iteration is defined by

F_{i j}^{d} (t) = G (t) \frac{M_{j} (t) \times M_{i} (t)}{D_{i j} (t) + r} (x_{j}^{d} (t) - x_{i}^{d} (t)),

(2)

where M_{j} (t) and M_{i} (t) are the inertial masses of particle j, which exerts the gravity, and of the acted-upon particle i, respectively; D_{i j} (t) denotes the Euclidean distance between the i-th particle and the j-th particle; and r is a small constant. G (t) is a gravity constant that controls the search accuracy, given by

G (t) = G_{0} e^{- v \frac{t}{T}},

(3)

where G_{0} denotes the initial value of the gravity constant, v is a manually adjusted constant, and T is the maximum number of iterations. Similarly, the acceleration is expressed by

a_{i}^{d} (t) = \frac{F_{i}^{d} (t) \times \mathit{fit}_{i} (t)}{M_{i} (t) \times | \mathit{fit}_{i} (t) |}.

(4)

In each iteration, the particle updates its position and velocity, which can be represented as

\begin{aligned} v_{i}^{d} (t + 1) &= \mathit{random}_{i} \times v_{i}^{d} (t) + a_{i}^{d} (t), \\ x_{i}^{d} (t + 1) &= x_{i}^{d} (t) + v_{i}^{d} (t + 1) . \end{aligned}

(5)

Obviously, given the initial value of particle position and velocity, the optimal position of the particle is obtained after a series of update calculations and after the optimal fitness value is figured out.
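The update loop built from Equations (2), (3) and (5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the masses use the classic best/worst normalization of the fitness values instead of Equation (1), the total force on a particle is taken as the plain sum of the pairwise forces, the acceleration is the force divided by M_i, and the defaults `G0`, `v`, `k` and `T` are arbitrary illustrative values.

```python
import numpy as np

def gsa_minimize(objective, lower, upper, k=20, T=50, G0=100.0, v=20.0, r=1e-12, seed=0):
    """Minimal gravitational-search sketch (assumed simplifications, see lead-in)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    d = lower.size
    X = rng.uniform(lower, upper, size=(k, d))  # particle positions
    V = np.zeros((k, d))                        # particle velocities
    best_x, best_f = None, np.inf
    for t in range(1, T + 1):
        f = np.array([objective(x) for x in X])
        i_best = int(np.argmin(f))
        if f[i_best] < best_f:
            best_f, best_x = float(f[i_best]), X[i_best].copy()
        # Assumed mass rule: smaller fitness gives a heavier particle.
        worst, best = f.max(), f.min()
        m = (worst - f) / (worst - best) if worst > best else np.ones(k)
        M = m / m.sum()
        G = G0 * np.exp(-v * t / T)             # Eq. (3)
        A = np.zeros((k, d))
        for i in range(k):
            for j in range(k):
                if i == j:
                    continue
                diff = X[j] - X[i]
                dist = np.linalg.norm(diff)
                # Eq. (2) divided by M_i gives the acceleration contribution.
                A[i] += G * M[j] * diff / (dist + r)
        V = rng.random((k, 1)) * V + A          # Eq. (5), velocity
        X = np.clip(X + V, lower, upper)        # Eq. (5), position, kept in range
    return best_x, best_f
```

Calling `gsa_minimize(lambda z: float(np.sum(z ** 2)), [-5, -5], [5, 5])` returns the best position evaluated during the search together with its objective value.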

2.2. Kernel Extreme Learning Machine

The Extreme Learning Machine (ELM) is a single hidden layer feed-forward neural network, whose general form with L hidden layer neurons is described in Equation (6).

f_{L} (X) = \sum_{i = 1}^{L} β_{i} g_{i} (X) = \sum_{i = 1}^{L} β_{i} g (a_{i}, b_{i}, X),

(6)

where β_{1}, β_{2}, \dots, β_{L} are the output weights between the hidden layer with L neurons and the output layer, and g_{i} is the activation function, whose parameters a_{i} and b_{i} are randomly assigned [28]. The output weight of the ELM is directly computed by minimizing the approximation error described in Equation (7).

\min_{β \in R^{L \times N}} {∥ H β - Y ∥}^{2},

(7)

where H is the output of the hidden layer with L neurons, N is the number of neurons in the output layer, and Y is the target matrix of training data. The solution to Equation (7), regularized by the coefficient γ, is

β = {(γ I + H^{T} H)}^{- 1} H^{T} Y,

(8)

where I is an identity matrix. For a new input x, the output can be expressed as

f (x) = h (x) β .

(9)

After introducing the kernel function [29,30], the explicit feature mappings can be replaced by implicit mappings, which means that h (x) does not need to be known. In this paper, the Gaussian kernel [31], the most popular kernel, is chosen. The expression is

κ (x_{1}, x_{2}) = \exp (- \frac{{∥ x_{1} - x_{2} ∥}^{2}}{2 σ^{2}}),

(10)

where σ is the kernel size. Therefore, the trouble of manually adjusting the number of hidden nodes and randomly generating the input weights and biases is avoided [32]. Then, the expression is converted to

\begin{aligned} f (x) &= φ (x) β \\ &= φ (x) {(γ I + ϕ^{T} ϕ)}^{- 1} ϕ^{T} Y \\ &= φ (x) ϕ^{T} {(γ I + ϕ ϕ^{T})}^{- 1} Y \\ &= k (x) α, \end{aligned}

(11)

where α = {(γ I + ϕ ϕ^{T})}^{- 1} Y is the coefficient matrix and k (x) = φ (x) ϕ^{T}. The third line uses the identity {(γ I + ϕ^{T} ϕ)}^{- 1} ϕ^{T} = ϕ^{T} {(γ I + ϕ ϕ^{T})}^{- 1}. The kernel trick [33] can be described by φ (x_{1}) φ {(x_{2})}^{T} = κ (x_{1}, x_{2}). Therefore, we have

\begin{aligned} φ (x) ϕ^{T} &= φ (x) [φ {(x_{1})}^{T}, \dots, φ {(x_{N})}^{T}] \\ &= [κ (x, x_{1}), \dots, κ (x, x_{N})], \end{aligned}

(12)

and

ϕ ϕ^{T} = [\begin{matrix} κ (x_{1}, x_{1}) & \dots & κ (x_{1}, x_{N}) \\ ⋮ & ⋱ & ⋮ \\ κ (x_{N}, x_{1}) & \dots & κ (x_{N}, x_{N}) \end{matrix}]

(13)
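The kernel form above reduces training to one linear solve and prediction to one matrix product. The following is a minimal NumPy sketch under stated assumptions: the function names are hypothetical, the Gaussian kernel uses the squared Euclidean distance, and α is obtained from the kernel matrix of Equation (13) as α = (γI + K)^{-1} Y.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Kernel matrix between the rows of A and the rows of B, as in Eq. (10)."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def kelm_fit(X, Y, gamma, sigma):
    """Return alpha = (gamma * I + K)^{-1} Y with K from Eq. (13)."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(gamma * np.eye(len(X)) + K, Y)

def kelm_predict(X_train, alpha, X_new, sigma):
    """Return k(x) @ alpha for every row x of X_new, using Eq. (12)."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical design choice of this sketch; the result is the same coefficient matrix.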

2.3. Kalman Filter

The Kalman filter (KF) [34], a kind of prediction method based on the filtering principle proposed by Kalman in 1960, is suitable for estimating the optimal state of a dynamic system. Before being applied to short-term traffic flow prediction, it has been successfully applied in traffic demand prediction with high prediction accuracy.

The basic principle of the Kalman filter is to establish the state space model of the sampled signal, and its mathematical model is expressed as

x_{t + 1} = F x_{t} + ω_{t},

(14)

y_{(t + 1)} = H x_{t} + ν_{t} .

(15)

Equation (14) is the state equation of the system, and Equation (15) is the observation equation. F is the state transition matrix, and H is the observation matrix. ω_{t} is the process noise, and ν_{t} is the observation noise, both of which obey independent Gaussian distributions:

\begin{aligned} p (ω) &\sim N (0, Q), \\ p (ν) &\sim N (0, R), \end{aligned}

(16)

where Q and R are the process noise covariance and the observation noise covariance, respectively.

The Kalman filter first estimates the prior state {\hat{x}}_{t}^{-} and its covariance P_{t}^{-}, and then recursively updates the a posteriori estimate {\hat{x}}_{t}^{+} of the state and the covariance of the estimation error P_{t}^{+}, by the following equations:

{\hat{x}}_{t}^{-} = F {\hat{x}}_{t - 1}^{+},

(17)

P_{t}^{-} = F P_{t - 1}^{+} F^{T} + Q,

(18)

K_{t} = P_{t}^{-} H_{t}^{T} {(H_{t} P_{t}^{-} H_{t}^{T} + R)}^{- 1},

(19)

{\hat{x}}_{t}^{+} = {\hat{x}}_{t}^{-} + K_{t} (x_{t} - H_{t} {\hat{x}}_{t}^{-}),

(20)

P_{t}^{+} = (I - K_{t} H_{t}) P_{t}^{-},

(21)

where K_{t} is the Kalman filter gain, I is the identity matrix, and x_{t} is the observation.
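One recursion of Equations (17) to (21) can be written compactly in NumPy. This is a minimal sketch: the function name `kf_step` is hypothetical, and the observation is called `z` here (the text calls it x_t) to keep it apart from the state estimate.

```python
import numpy as np

def kf_step(x_post, P_post, z, F, H, Q, R):
    """One predict/update cycle of the Kalman filter."""
    x_prior = F @ x_post                               # Eq. (17)
    P_prior = F @ P_post @ F.T + Q                     # Eq. (18)
    S = H @ P_prior @ H.T + R                          # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)               # Eq. (19)
    x_new = x_prior + K @ (z - H @ x_prior)            # Eq. (20)
    P_new = (np.eye(len(x_post)) - K @ H) @ P_prior    # Eq. (21)
    return x_prior, x_new, P_new
```

The function returns the prior estimate as well, because the prior is the quantity available as a forecast before the new observation arrives.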

2.4. GSA-KELM for Traffic Flow Forecasting

In KELM, manually trying candidate values of γ and σ with the grid search method is a laborious task, yet a large number of candidate values must be examined to achieve the expected performance.

To settle the aforementioned problems, the gravitational search algorithm takes the place of the grid search method to search for more suitable network parameters for the KELM, termed GSA-KELM. First, the particles are initialized by setting the spatial dimensions, number of particles, gravity constant, gravitational force constant, value of fit, and maximum iteration number. Then, the positions and velocities of particles are generated randomly, and the initial fit value is zero. After that, the fitness value of particles is calculated in KELM. Later, the positions and velocities of particles are updated for optimum fitness until the ultimate criteria are met. The algorithm of GSA-KELM is shown in Algorithm 1.

Algorithm 1: Framework of GSA-KELM

Input: Training set S = {\{x_{j}, y_{j}\}}_{j = 1}^{N}

Output: The value of fitness

Parameters setting: spatial dimensions d, the number of particles k in a d-dimensional search space, gravity constant G, constant r for the gravitational force, and maximum iteration number T.

Initialization: Positions x and velocities v of randomly generated Q groups of particles.

for t = 1 to T do
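Inside the loop, each particle position is a candidate pair (γ, σ) whose fitness is computed with KELM. The text does not state which fitness measure is used, so the sketch below assumes the RMSE on a held-out split; the function name `kelm_rmse` and the split itself are hypothetical.

```python
import numpy as np

def kelm_rmse(params, X_tr, y_tr, X_val, y_val):
    """Assumed fitness of a particle (gamma, sigma): validation RMSE of a KELM."""
    gamma, sigma = params

    def kern(A, B):
        # Gaussian kernel on squared Euclidean distances between rows.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq / (2.0 * sigma ** 2))

    # Train: alpha = (gamma * I + K)^{-1} y on the training split.
    alpha = np.linalg.solve(gamma * np.eye(len(X_tr)) + kern(X_tr, X_tr), y_tr)
    # Evaluate: kernel row of each validation point times alpha.
    pred = kern(X_val, X_tr) @ alpha
    return float(np.sqrt(np.mean((pred - y_val) ** 2)))
```

A search procedure then only needs to call this function for every particle and keep the pair with the smallest value.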

2.5. Kalman Filter for Traffic Flow Forecasting

The Kalman filter is a good predictor for the near future, with a basic implementation process as follows. First, the data of the forecast period are input, and the model is initialized. Then, the prior estimate {\hat{x}}_{t}^{-} and the prior covariance of the estimation error P_{t}^{-} are calculated by Equation (17) and Equation (18), respectively. The Kalman filter gain K_{t} is calculated by Equation (19). Next, the posterior estimate {\hat{x}}_{t}^{+} is acquired by Equation (20), and the posterior covariance P_{t}^{+} is obtained by Equation (21) for use in the prediction at the next time step. The algorithm is shown in Algorithm 2.

Algorithm 2: Framework of KF.

Input: F, H, Q, R, P_{T_{0}}^{+}, {\hat{x}}_{T_{0}}^{+}

Output: {\hat{x}}_{t + 1}, t = T_{0}, \dots, T_{c}

for t = T_{0} to T_{c} do

2.6. GSA-KELM-KF Model for Traffic Flow Forecasting

In reality, most data are collected by instruments, so heavy-tailed impulsive noises are unavoidable and disturb the data distribution in GSA-KELM. The Kalman filter is widely used in traffic flow prediction not because its estimation deviation is small, but because it combines the observation data and the estimation data, which provides closed-loop management of the error and limits the error to a certain range. Thus, combining the Kalman filter with GSA-KELM can effectively reduce the influence of heavy-tailed impulse noise. In this paper, the linear combination method is adopted, and it can be expressed as

\hat{y} = η {\hat{y}}_{g k} + (1 - η) {\hat{y}}_{k f},

(22)

where {\hat{y}}_{g k} and {\hat{y}}_{k f} are the prediction results of GSA-KELM and KF, respectively, and η is the weight, whose value range is (0, 1).

The prediction process can be described as follows. Firstly, calculate the forecast results of GSA-KELM and KF. Then, select the weight value. Finally, obtain the forecast results of the GSA-KELM-KF through Equation (22). The algorithm is shown in Algorithm 3. The flow chart is shown in Figure 1.

Algorithm 3: Framework of GSA-KELM-KF

Input: Training set S = {\{x_{j}, y_{j}\}}_{j = 1}^{N}

Output: The forecast results of the ensemble

Initialize parameters.

Use GSA-KELM and KF to predict the traffic flow of the selected period, and store the results as {\hat{y}}_{g k} and {\hat{y}}_{k f}, respectively.

Obtain the results \hat{y} by Equation (22).
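The final step of Algorithm 3 is the weighted sum of Equation (22). A minimal sketch follows; the function name is hypothetical and the numbers in the usage note are made up for illustration.

```python
import numpy as np

def combine_forecasts(y_gk, y_kf, eta):
    """Linear combination of Eq. (22): eta * y_gk + (1 - eta) * y_kf."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in the open interval (0, 1)")
    y_gk = np.asarray(y_gk, dtype=float)
    y_kf = np.asarray(y_kf, dtype=float)
    return eta * y_gk + (1.0 - eta) * y_kf
```

For example, with η = 0.9, hypothetical GSA-KELM forecasts of 100 and 200 and KF forecasts of 110 and 180 combine to 101 and 198.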

3. Experiments

In this section, the performance of the GSA-KELM-KF model is evaluated on two public benchmark datasets for short-term traffic flow forecasting, namely, the Amsterdam highway dataset and the England M25 highway dataset. More specific details are given below.

3.1. Datasets Description

The Amsterdam highway dataset was collected on four motorways, namely, A1, A2, A4, and A8, in Amsterdam, the Netherlands. The basic conditions of the four expressways are shown in Figure 2. The four benchmark datasets, each of which consists of five weeks of data collected by the MONICA annular detector from 20 May to 24 June 2010, are aggregated by vehicles per minute and hour. The 1-min traffic flow data are aggregated into 10-min intervals in the same way. The raw data contain erroneous records, e.g., values that stay at zero or negative for a long time. These records are replaced by the average of the measurements at the same time in other weeks.

The England M25 highway dataset was collected from seven stations, namely, D1, D2, D3, D4, D5, D6, and P, in England. Each station contains four weeks of data, collected every 15 min from 1 August 2019 to 31 August 2019. The measurement positions are near Heathrow Airport on the M25 expressway, as shown in Figure 3.

3.2. Evaluation Criteria

In this paper, two frequently used criteria are employed to evaluate the performance of the GSA-KELM-KF model. The root mean square error (RMSE) measures the average difference between the predictions of a model and the system being modelled, while the mean absolute percentage error (MAPE) expresses the difference as a percentage. Their expressions are as follows:

\mathit{RMSE} = \sqrt{\frac{1}{N} \sum_{k = 1}^{N} {(\hat{y} (k) - y (k))}^{2}},

(23)

\mathit{MAPE} = \frac{1}{N} \sum_{k = 1}^{N} | \frac{\hat{y} (k) - y (k)}{y (k)} | \times 100 \%,

(24)

where y (k) is the ground truth, and \hat{y} (k) is the prediction at time k.
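Both criteria are straightforward to compute. The sketch below uses hypothetical function names and makes one explicit design choice: `mape` refuses series that contain zeros in the ground truth, because Equation (24) is undefined there.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (23)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, Eq. (24)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0.0):
        raise ValueError("MAPE is undefined when the ground truth contains zeros")
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)
```

With made-up ground-truth values of 100 and 200 and predictions of 110 and 180, `rmse` gives the square root of 250 and `mape` gives 10.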

3.3. Experimental Setup

All experiments are compiled and tested on a Linux cluster (CPU: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz, GPU: NVIDIA GeForce RTX 3070).

For KF, this paper adopts the strategy of predicting the current time step from the previous one, so H is set to [1], F is the identity matrix, and the initial covariance matrix is set equal to Q.
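With H = [1] and F equal to the identity, the filter reduces to scalar arithmetic. The following sketch produces one-step-ahead forecasts under these settings; the function name is hypothetical, and starting the state at the first observation is an assumption, since the text specifies only the initial covariance.

```python
def kf_one_step_forecasts(series, q, r):
    """Scalar Kalman filter with F = 1, H = 1 and initial covariance equal to q.

    Returns the forecast made for each element of series[1:] before it is observed.
    """
    x = float(series[0])   # assumed initial state: the first observation
    p = q                  # initial covariance consistent with Q
    forecasts = []
    for z in series[1:]:
        x_prior = x                      # Eq. (17) with F = 1
        p_prior = p + q                  # Eq. (18)
        forecasts.append(x_prior)        # forecast issued before seeing z
        k = p_prior / (p_prior + r)      # Eq. (19) with H = 1
        x = x_prior + k * (z - x_prior)  # Eq. (20)
        p = (1.0 - k) * p_prior          # Eq. (21)
    return forecasts
```

The ratio of q to r controls how quickly the forecast follows new observations: a larger q relative to r gives a larger gain.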

The Amsterdam highway dataset is divided into two parts: the first four weeks for training and the last week for testing. Each input vector contains 12 consecutive data values, which are used to forecast the 13th value. The ranges of the positions and velocities of γ are set to [10^{- 3}, 10^{- 1}] and [- 10^{- 4}, 10^{- 1}], while the ranges of the positions and velocities of θ are [0.1, 15] and [- 2, 2]. In KF, R is set to 0.2. Q is set to 4 \times 10^{- 2} I for dataset A1 and to 10^{- 2} I for the other datasets. The value of η is 0.9 for datasets A1 and A2 and 0.7 for the others.
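The input construction described above, 12 consecutive values used to forecast the 13th, is a sliding window over the series. A minimal sketch with a hypothetical helper name:

```python
import numpy as np

def make_windows(series, lag=12):
    """Build (X, y) where each row of X holds `lag` consecutive values and
    the matching entry of y is the value that immediately follows them."""
    s = np.asarray(series, dtype=float)
    if len(s) <= lag:
        raise ValueError("series must be longer than the window length")
    X = np.array([s[i:i + lag] for i in range(len(s) - lag)])
    y = s[lag:]
    return X, y
```

A series of n values yields n - 12 input vectors, so a toy series of 15 values gives 3 training pairs.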

The England M25 highway dataset is also split into two parts: the first week for training and the second week for testing. The dimensions of the network are set to 12. In the gravity search algorithm, the maximum number of iterations is set to 50. The ranges of the positions and velocities of γ are set to [10^{- 3}, 10^{- 1}] and [- 10^{- 4}, 10^{- 1}], while [0.5, 30] and [- 5, 5] are the ranges of the positions and velocities of θ. In KF, R and η are set to 0.1 and 0.9, respectively. Q is set to 10^{- 3} I for dataset D4, 10^{- 4} I for dataset P, and 10^{- 2} I for the other datasets.

To evaluate the performance of the GSA-KELM-KF framework, we compared it with the support vector machine regression, decision tree, artificial neural network, k-nearest neighbour, long short-term memory, noise-immune long short-term memory, extreme learning machine, and kernel extreme learning machine frameworks.

Support vector machine regression (SVR): SVR allows a tolerance deviation of ε between the estimated value and the real value while minimizing the total deviation of all sample points from the hyperplane. The radial basis function is used as the kernel type, whose regression horizon and width parameter are set to 8 and 3 \times 10^{- 6}, respectively. The cost parameter is set to the maximum difference for short-term traffic flow forecasting.

Decision tree (DT): Based on classification and regression tree, DT has strong robustness for missing data and noise without any prior hypothesis [35].

Artificial neural network (ANN): ANN, a learning model, is generated by the interconnection of a large number of neurons. The number of hidden layers, the mean squared error goal, the spread of radial basis functions, the maximum number of neurons in the hidden layer, and the number of neurons to add between displays are set to 1, 0.001, 2000, 40, and 25, respectively, based on the default values [36].

k-nearest neighbor (kNN) [37]: kNN is a machine learning algorithm with a high tolerance for outliers and noise. It predicts the properties of a query vector from the properties of the few vectors closest to it.

Long short-term memory (LSTM) [38]: LSTM is a special kind of recurrent neural network. It is developed to capture time dependence over a long period for traffic flow forecasting. Optimized by grid search, the validation split, the epochs, the batch size, and the hyperparameter units are 0.05, 50, 32, and 256, respectively.

Noise-immune long short-term memory (NiLSTM) [39]: A noise immunity loss function is derived through the maximum correlation entropy of long short-term memory networks. In contrast to the loss of conventional LSTM, the loss of maximum correlation entropy is a local similarity measure and is insensitive to non-Gaussian noise. The length of the input sequence is set to 12, and the candidate kernel sizes are [0.1, 0.2, 0.5, 1.0, 2.0, 3.0].

Extreme learning machine (ELM) [40]: ELM is a single-hidden layer feed-forward neural network, which can be directly applied to regression, classification, clustering, and other learning tasks. Both the input weights and the biases of the hidden layer are generated randomly.

Kernel extreme learning machine (KELM): KELM, as a variety of extended ELMs, improves the generalization of ELM, in which the explicit feature mappings are replaced by kernels according to the kernel method. It is unnecessary to allocate the input weights and biases [41,42]. In KELM, the parameters γ and σ are selected from {10^{- 5}, 10^{- 4}, 10^{- 3}, 10^{- 2}, 10^{- 1}} and {0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10, 15, 30, 60, 120}, respectively.

3.4. Experimental Result

To verify the effectiveness of the proposed models for the randomness and nonlinearity of traffic flow, the experimental scenario in Amsterdam’s ring road dataset is introduced. Table 1 and Table 2 report the RMSEs and MAPEs, respectively.

Table 1 shows the comparison of different methods on the A1, A2, A4, and A8 datasets. Our GSA-KELM-KF outperforms typical parametric and non-parametric methods on the RMSEs of all datasets, with results of 283.67, 192.25, 216.78, and 162.21, respectively. For example, compared with LSTM, the RMSEs of the GSA-KELM-KF on the A1, A2, A4, and A8 datasets are decreased by 3.68%, 9.01%, 3.51%, and 3.96%, respectively. Compared with ELM, the RMSEs of our GSA-KELM-KF are reduced by 3.54%, 4.67%, 2.38%, and 4.10%, respectively. Similarly, the RMSEs of our GSA-KELM-KF are lower than those of KELM by 0.41%, 0.68%, 2.07%, and 0.38%, respectively.

Although RMSE is more suitable for showing larger deviations, MAPE has the simplest interpretation and can be used to compare accuracy between different volumes studied [43]. Furthermore, MAPE is more suitable for the comparison of long time series. From Table 2, we know that our GSA-KELM-KF achieves the best performance on MAPE except for the A8 dataset, with results of 11.57, 9.58, 11.20, and 12.15 on the A1, A2, A4, and A8 datasets, respectively. Compared with LSTM, the MAPE of the GSA-KELM-KF decreases by 10.06%, 13.38%, 18.30%, and 3.26% on the A1, A2, A4, and A8 datasets, respectively. Compared with ELM, the MAPE of our GSA-KELM-KF on the A1, A2, A4, and A8 datasets decreases by 2.03%, 7.35%, 7.05%, and 2.17%, respectively. Similarly, the MAPE of our GSA-KELM-KF is lower than that of KELM by 1.53%, 1.74%, 1.49%, and 5.00%, respectively.

The above shows that ELM has the advantage of overcoming local minima and overfitting, while KELM uses the kernel learning function to effectively solve the problem of reduced generalization and stability caused by hidden layer neurons.

Figure 4 shows the visualization result of GSA-KELM-KF. The red line represents the ground truth, and the blue line represents the predicted value of our GSA-KELM-KF. Our GSA-KELM-KF fits the measured flow closely. The relative error can be denoted as:

\mathit{re\_error} = \frac{\hat{Y} - Y}{Y} .

(25)

In the early morning and late night, the traffic flow is low, so the absolute error between the predicted value and the actual measured value is small, but the relative error becomes larger.
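To see why a small absolute error can coincide with a large relative error under Equation (25), consider a purely hypothetical overestimate of 30 vehicles per hour, once at a night-time flow of 300 and once at a peak flow of 3000 (these numbers are illustrative and are not taken from the datasets):

```latex
\mathit{re\_error}_{\text{night}} = \frac{330 - 300}{300} = 0.10,
\qquad
\mathit{re\_error}_{\text{peak}} = \frac{3030 - 3000}{3000} = 0.01 .
```

The same absolute deviation is ten times larger in relative terms when the flow is ten times smaller.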

To further illustrate the performance of our model, we also conduct an intuitive comparison with KELM in two representative stages of morning peak passenger flow and nighttime low peak passenger flow in dataset A4.

In Figure 5a, the morning peak traffic flow ranges from 2900 to 3500 vehicles per hour. Traffic flow fluctuates greatly during this stage, and it is difficult for KELM to capture these rapid fluctuations. In this case, our GSA-KELM-KF prevents KELM from falling behind, effectively mitigating the shortcomings of sudden performance drops due to drastic traffic changes. This is extremely important in intelligent transportation systems and is a key issue in traffic management and traffic information release.

In Figure 5b, traffic flow gradually decreases at night, dropping relatively smoothly from 1100 vehicles per hour to 300 vehicles per hour. In this case, our GSA-KELM-KF can predict this change more accurately, prevent KELM from overshooting, and achieve more reasonable predictions.

To sum up, our method has better performances than other state-of-the-art models.

To further confirm the effectiveness of the proposal in different short-term traffic flow prediction tasks, we also compare it with several extreme learning methods on the M25 highway dataset, as summarized in Table 3.

Due to the zero values of traffic flow at some times in the dataset, MAPE is unsuitable as an evaluation criterion. Under the criterion of RMSE, the GSA-KELM-KF achieves the best results on all datasets: 94.58, 106.08, 109.81, 47.75, 17.48, 129.98, and 25.39 on the D1, D2, D3, D4, D5, D6, and P datasets, respectively. Compared with KELM, the RMSE of our GSA-KELM-KF decreases by 1.52%, 1.10%, 2.78%, 1.48%, 4.21%, 6.00%, and 2.08%. Similarly, the RMSE of GSA-KELM decreases by 0.32%, 0.86%, 1.82%, 1.34%, 3.72%, 4.53%, and 1.92%. Our GSA-KELM-KF model outperforms the ELM, KELM, and GSA-KELM on the England M25 highway dataset, showing the potential for accurate and reliable traffic flow prediction. It can be seen from Figure 6 that GSA can automatically break the convergence state after reaching a stable state to reach a new stable state, which makes it possible to escape from a local optimal solution. When particles tend to converge, the state is broken by Equation (1).

From Figure 7, our model also has a good fit on the England M25 highway dataset. Particularly, there are three outliers in Figure 7f, marked by red boxes. However, there is almost no fluctuation in the predicted values for outlier locations, indicating the strong robustness of our model.

3.5. Ablation Study

To verify the effectiveness of different components in our GSA-KELM-KF, we performed ablation experiments on the A1 and A4 datasets, as summarized in Table 4.

We compare the GSA-KELM-KF with the following several variants: (1) GSA-KELM, in which the variant removes the Kalman filter, which means we optimize KELM using the gravity search algorithm alone; (2) KELM-KF, in which the variant removes the gravity search algorithm, which means we use a grid traversal search algorithm.

From the experimental results, we can see that GSA-KELM-KF outperforms all ablation variants. Compared with the GSA-KELM, the MAPE and RMSE of GSA-KELM-KF on A4 decrease by 2.24% and 1.66%, respectively. Correspondingly, the RMSE and MAPE of GSA-KELM-KF are 1.14% and 0.71% lower than those of KELM-KF, respectively. The above illustrates the effectiveness of the gravity search algorithm and Kalman filter components. The gravity search algorithm not only avoids manual traversal operations but also automatically searches for better solutions. The Kalman filter module can further reduce the impact of heavy-tailed impulse noise interference.

In reality, traffic data always arrive chunk by chunk. We therefore divide the A1 and A4 datasets into mini-batches of size 200. We use the first two batches of data for training and test the results on the third batch.
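The mini-batch protocol described above can be sketched as follows. The helper names are hypothetical, and dropping an incomplete trailing batch is an assumption, since the text does not say how leftover samples are handled.

```python
def split_into_batches(series, batch_size=200):
    """Cut a series into consecutive full batches; an incomplete tail is dropped."""
    n_full = len(series) // batch_size
    return [series[i * batch_size:(i + 1) * batch_size] for i in range(n_full)]

def first_two_train_third_test(series, batch_size=200):
    """Use the first two batches for training and the third batch for testing."""
    batches = split_into_batches(series, batch_size)
    if len(batches) < 3:
        raise ValueError("at least three full batches are required")
    train = list(batches[0]) + list(batches[1])
    test = list(batches[2])
    return train, test
```

With a batch size of 200, the training part holds the first 400 samples and the test part holds the next 200.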

Table 5 shows that GSA-KELM-KF achieves real-time prediction, and Figure 8 shows that it fits the data well. Overall, these findings suggest that our GSA-KELM-KF has promising potential in real-time traffic flow prediction applications.

4. Conclusions

In this paper, we propose a new KELM-based model (GSA-KELM-KF) for short-term traffic flow prediction. Our GSA-KELM-KF achieves the best performance on two real-world datasets, is insensitive to outliers, and exhibits strong robustness. Our model also has good fitting results on small-batch datasets and has good application prospects in future work. At the same time, the proposed method has some limitations. The basic datasets used in the article to train the traffic flow prediction model are relatively simple and come from highways. The application to more diverse traffic scenarios, such as urban main roads, will be included in our future research scope. We will also explore using additional data sources such as weather.

Author Contributions

Conceptualization, T.Z.; methodology, L.Z.; software, W.C.; validation, Z.L.; writing—original draft preparation, J.Z.; writing—review and editing, T.Z.; project administration, J.Z.; and funding acquisition, T.Z.; contribute equally, W.C., L.Z. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Guangdong Basic and Applied Basic Research Foundation (Nos. 2022A1515011590, 2021A1515012302, and 2022A1515011978), Guangdong Provincial Key Areas R&D Program Project (No. 2021B0101220006), the National Postdoctoral Fellowship Program (No. GZC20230549), National Natural Science Foundation of China (Nos. 61772143, 61902232) and the Open Fund of State Key Laboratory of Public Big Data, Guizhou University.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. The workflow of the GSA-KELM-KF model.

Figure 2. Four highways, named A1, A2, A4, and A8, end on the ring road of Amsterdam.

Figure 3. Locations of D1, D2, D3, D4, D5, D6, and P near Heathrow Airport on the M25 expressway in England.

Figure 4. Forecasts of the KELM and GSA-KELM-KF models and the measurements over one week on the Amsterdam ring road dataset.

Figure 5. Visualization of the GSA-KELM-KF and KELM models when the true values fluctuate significantly: (a) peak flow in the morning; (b) gradually decreasing flow at night.

Figure 6. RMSE of GSA-KELM forecasts for different numbers of iterations on the D5 and P datasets.

Figure 7. The visualization of our model on the England M25 expressway dataset.

Figure 8. Visualization of the predictive performance of GSA-KELM-KF and KELM on the A1 and A4 datasets.

Table 1. The RMSEs (vehs/h) of different forecasting models on the Amsterdam highway dataset.

Models	A1	A2	A4	A8
SVR	329.09	259.74	253.66	190.30
DT	316.57	224.79	243.19	238.35
ANN	299.64	212.95	225.86	166.50
kNN	289.33	211.79	231.79	166.39
LSTM	294.52	211.31	224.68	168.91
NiLSTM	285.54	203.69	223.72	163.25
ELM	294.10	201.67	222.07	169.15
KELM	284.83	193.58	221.36	162.84
GSA-KELM-KF	283.67	192.25	216.78	162.21

Bold indicates optimal value.

Table 2. The MAPEs (%) of different forecasting models on the Amsterdam highway dataset.

Models	A1	A2	A4	A8
SVR	14.34	12.22	12.23	12.48
DT	12.08	10.86	12.34	13.62
ANN	12.61	10.89	12.49	12.53
kNN	11.57	10.42	12.19	11.70
LSTM	12.82	11.06	13.71	12.56
NiLSTM	12.00	10.14	11.57	11.76
ELM	11.82	10.34	12.05	12.42
KELM	11.76	9.75	11.37	12.79
GSA-KELM-KF	11.57	9.58	11.20	12.15

Bold indicates optimal value.

Table 3. The RMSEs of the traffic flow datasets D1 to D6 and P on the England M25 highway dataset.

Models	D1	D2	D3	D4	D5	D6	P
ELM	161.49	116.81	124.22	51.52	19.44	149.33	29.17
GA-ELM	101.31	110.74	118.90	48.99	18.55	145.18	28.18
KELM	96.04	107.26	112.96	48.47	18.25	138.29	25.93
GSA-KELM(ours)	95.73	106.33	110.90	47.82	17.57	132.02	25.43
GSA-KELM-KF(ours)	94.58	106.08	109.81	47.75	17.48	129.98	25.39

Bold indicates optimal value.

Table 4. Component analysis of GSA-KELM-KF.

Dataset	Model and Variants	MAPE (%)	RMSE (vehs/h)
A1	GSA-KELM-KF	11.57	283.67
A1	GSA-KELM	11.63	284.93
A1	KELM-KF	11.61	284.40
A4	GSA-KELM-KF	11.20	216.78
A4	GSA-KELM	11.39	221.76
A4	KELM-KF	11.28	219.30

Bold indicates optimal value.
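The component analysis in Table 4 can also be read as relative RMSE reductions of the full model over each variant. The sketch below is an illustrative calculation on the RMSE values copied from Table 4; the helper name `relative_reduction` is hypothetical and the percentage measure itself is not reported in the paper.

```python
# RMSE values (vehs/h) copied from Table 4.
rmse = {
    "A1": {"GSA-KELM-KF": 283.67, "GSA-KELM": 284.93, "KELM-KF": 284.40},
    "A4": {"GSA-KELM-KF": 216.78, "GSA-KELM": 221.76, "KELM-KF": 219.30},
}


def relative_reduction(variant_error: float, full_error: float) -> float:
    """Percentage by which the full model lowers the error of a variant."""
    return 100.0 * (variant_error - full_error) / variant_error


# Compare the full model against each ablated variant on each dataset.
reductions = {
    (dataset, variant): relative_reduction(value, scores["GSA-KELM-KF"])
    for dataset, scores in rmse.items()
    for variant, value in scores.items()
    if variant != "GSA-KELM-KF"
}

for (dataset, variant), pct in reductions.items():
    print(f"{dataset}: GSA-KELM-KF vs {variant}: {pct:.2f}% lower RMSE")
```

On these numbers, adding the Kalman filter to GSA-KELM lowers the RMSE by about 0.44% on A1 and 2.25% on A4, while adding GSA to KELM-KF lowers it by about 0.26% and 1.15%, respectively.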

Table 5. The RMSEs and MAPEs of GSA-KELM-KF and KELM on the A1 and A4 datasets.

Model	A1 RMSE (vehs/h)	A1 MAPE (%)	A4 RMSE (vehs/h)	A4 MAPE (%)
KELM	288.54	11.37	223.78	9.65
GSA-KELM-KF	278.24	10.76	220.57	9.40

Bold indicates optimal value.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
