2. Literature Review
At present, short-term passenger-flow prediction methods can be classified into two categories: parametric models and non-parametric models. The former assume that passenger flow follows a certain probability distribution and use historical data to calibrate the model; representative methods include the autoregressive integrated moving average (ARIMA) model [2,3], the Kalman filter [4], and the grey model [5]. Since the establishment of a parametric model relies heavily on a-priori knowledge of the research problem, the approach involves a degree of subjectivity. The latter make no assumption about the probability distribution of passenger flow and predict directly from historical data, which is more objective than the former, yet less readily generalized. For example, Sun et al. [6] designed a wavelet-support vector machine (SVM) hybrid model, using wavelet decomposition to split the passenger-flow data into high- and low-frequency components and the SVM model for prediction. Li et al. [7] proposed a multi-scale radial basis function network prediction method targeting sudden passenger flow. Zhang et al. [8] proposed the Conv-GCN deep-learning model for short-term passenger-flow prediction of urban rail transit by combining a graph convolutional network with a three-dimensional convolutional neural network. In addition, the probability tree [9] and the decision tree [10] have also been used in short-term metro passenger-flow forecasting.
With the development of big data, some scholars have begun to apply neural networks [11,12] to short-term passenger-flow forecasting, including long short-term memory (LSTM) neural networks [13,14], fully connected deep neural networks [15], stacked autoencoders [16], and deep belief networks [17]. Among them, LSTM prediction has attracted much attention due to its high accuracy. Li et al. [18] proposed an algorithm based on back propagation and long short-term memory, which can predict the speed of each vehicle on the road. Zhang et al. [19] used clustering to capture the characteristics and trends of passenger flow, applied LSTM to predict short-term passenger flow, and evaluated reasonable time-granularity intervals. Li et al. [20] proposed a prediction model that combines inbound passenger-flow characteristics with LSTM by extracting the features that affect prediction accuracy. Some scholars have improved the LSTM neural network. For example, Yang et al. [21] proposed an improved model based on the LSTM neural network, which improved the modeling of long-term correlation characteristics in passenger-flow data; Jia et al. [22] combined long short-term memory neural networks with stacked autoencoders to predict the short-term passenger flow of metro stations. In order to deal effectively with complex signals and reduce noise interference, scholars have combined empirical mode decomposition (EMD) with other models. Wei et al. [23] used a combined EMD and BP neural network model to predict short-term metro passenger flow, and its prediction accuracy was better than that of a BP neural network alone; Wang et al. [24] used EMD and ARIMA to predict vehicle speed, and the prediction accuracy was better than that of ARIMA and ANN methods in different scenarios.
Although domestic and foreign scholars have carried out extensive research on short-term metro passenger-flow prediction, the spatial and temporal distribution characteristics of metro passenger flow are becoming increasingly complex. When the passenger flow fluctuates randomly, the prediction accuracy of traditional models is often greatly affected.
According to the existing research, since the OD time series between metro stations is affected by the time and location of entry and exit, it often exhibits nonlinearity and non-stationarity. If the basic data are fed directly into the LSTM neural network, the noise they contain will interfere with the model's ability to identify the spatial–temporal relationship between input and output passenger flow, reducing prediction accuracy. Therefore, the time series needs to be preprocessed before neural-network forecasting. Mode mixing refers to the inclusion of very different characteristic time scales in one IMF, or the distribution of similar characteristic time scales across different IMFs. This phenomenon is caused by multiple jumps of the local extrema within a very short time interval during empirical mode decomposition; essentially, the data are mixed with unknown noise or discontinuous signals. To address this problem, this paper adopts the ensemble empirical mode decomposition (EEMD) method, which avoids mode mixing by superimposing different Gaussian white noises of equal amplitude, and combines it with the memory and forgetting advantages of the long short-term memory (LSTM) neural network. The EEMD-LSTM combined model is applied to short-term metro passenger-flow prediction to ensure the accuracy of short-term forecasts. Taking into account the shortcomings of existing research, this paper makes the following two contributions:
(1) In view of the fact that the EMD method is prone to modal aliasing, EEMD is used to avoid modal aliasing by superimposing different Gaussian white noise of equal amplitude. Combined with the memory and forgetting advantages of LSTM, the resulting model is applied to metro short-term passenger-flow prediction to ensure the accuracy of short-term forecasts.
(2) The accuracy of EEMD-LSTM prediction in practical applications is further explored. Dynamic prediction is achieved by progressively enlarging the training set, and the practical feasibility of the model is verified by comparing the dynamic prediction results with the static prediction results obtained without changing the training set.
3. Methodology
3.1. Data Sources
Dalian Metro Line 1 and Line 2 essentially cover the administrative center, transportation hub, commercial areas, universities and tourist attractions of Dalian. The AFC data of the 42 stations of Dalian Metro Line 1 and Line 2 were selected for the 91 days from 1 April to 30 June 2020. Missing and abnormal values were processed to obtain the OD distribution among stations. The OD values among stations on 29 May are shown in Figure 1.
The POI data of the metro stations and surrounding facilities were obtained from the OpenStreetMap website. The K-means clustering algorithm is easy to understand and operate and can be used in many fields; as an unsupervised classification method, it can effectively mine the internal features of a dataset [25]. According to the proportions of land-use types within 500 m of each station, the stations were divided into residential, commercial, hub, scenic and university stations by K-means cluster analysis in SPSS. The classification results are shown in Figure 2.
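To make the clustering step concrete, the following is a minimal sketch of an equivalent K-means classification in MATLAB (the classification in this study was carried out in SPSS); the feature matrix of land-use proportions is a synthetic placeholder rather than the actual POI data, and the column layout is an assumption for illustration.

```matlab
% Minimal sketch of the station classification step (illustrative only).
% X is a placeholder 42-by-5 matrix of land-use proportions within 500 m
% of each station (assumed columns: residential, commercial, hub, scenic,
% university facilities).
rng(1);                                    % fix the random seed for repeatability
X = rand(42, 5);
X = X ./ sum(X, 2);                        % each row sums to 1 (proportions)
k = 5;                                     % five station categories
[idx, C] = kmeans(X, k, 'Replicates', 10); % idx: cluster label per station, C: centroids
```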
During the research period, the passenger flow between the commercial and residential stations was relatively high, while that between the scenic and residential stations was relatively low, which makes them suitable for testing the prediction accuracy of the model. According to the classification results in Figure 2, the historical OD values between the typical commercial station Xi’an Road Station, the scenic station Xinghai Square Station and the residential station Malan Square Station were selected as the research data, as shown in Figure 3.
3.2. Ensemble Empirical Mode Decomposition (EEMD)
EMD is a signal-processing method for non-stationary time series, which assumes that any signal can be decomposed into several simple signals with different periods plus a residual signal. However, the EMD method easily gives rise to modal aliasing, that is, signals of the same scale or frequency are divided into multiple intrinsic mode functions. Once an IMF experiences aliasing, it is no longer a single frequency component and an accurate instantaneous frequency cannot be obtained. EEMD is a noise-aided data-analysis method that compensates for this shortcoming of EMD. In EEMD, a random Gaussian white noise sequence is added to the input signal, and the noised signal is decomposed by EMD; the decomposition is repeated until the overall average number of trials is reached. When calculating the final average of the IMFs, the minimum number of mode components obtained across the M decomposition trials should be used. The larger the overall average number of trials M, the closer the overall average of the added random Gaussian white noise is to zero [26]. Over the M trials, the average value of each IMF is given by Formula (1):
$d_i = \frac{1}{M}\sum_{m=1}^{M} d_{i,m}$    (1)

where $d_i$ is the $i$th IMF obtained by EEMD decomposition, $d_{i,m}$ is the $i$th IMF obtained in the $m$th trial, $m$ is the index of the trial, and $M$ is the overall average number of trials.
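To illustrate the procedure, the following is a minimal MATLAB sketch of EEMD based on the description above and Formula (1). It relies on the Signal Processing Toolbox emd() function, uses a synthetic placeholder series, and fixes the number of IMFs per trial as a simplification of the minimum-component rule; it is an illustrative sketch, not the implementation used in this study.

```matlab
% Minimal EEMD sketch following Section 3.2 and Formula (1). Assumes the
% Signal Processing Toolbox emd() and a fixed number of IMFs per trial so
% that the trial-wise IMFs can be averaged directly.
s      = cumsum(randn(91, 1));       % placeholder OD series (91 days)
M      = 100;                        % overall average (ensemble) number of trials
nIMF   = 5;                          % fixed number of IMFs per trial (simplification)
noise  = 0.1 * std(s);               % white-noise standard deviation
imfSum = zeros(numel(s), nIMF);
resSum = zeros(numel(s), 1);
for m = 1:M
    sm = s + noise * randn(size(s));          % add Gaussian white noise
    [imf, r] = emd(sm, 'MaxNumIMF', nIMF);    % EMD of the noised signal
    if size(imf, 2) < nIMF                    % pad with zeros if fewer IMFs returned
        imf(:, end+1:nIMF) = 0;
    end
    imfSum = imfSum + imf;
    resSum = resSum + r;
end
d   = imfSum / M;                    % Formula (1): ensemble-averaged IMFs
res = resSum / M;                    % ensemble-averaged residual
```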
3.3. Long Short-Term Memory Neural Network (LSTM)
LSTM introduces memory cells on the basis of the RNN and can selectively memorize important information while filtering out noise, thereby reducing the memory burden and maintaining the long-term memory of the neural network, so that the model can also handle long sequences well. In an RNN, since the network layers update information without restriction, the information easily becomes chaotic, disappears or changes, so the gradient-vanishing problem may occur. The LSTM network adds a forgetting unit and a memory unit in the hidden layer. When new information is input, the LSTM network decides which information to keep or discard and stores the important information in long-term memory. The LSTM network has a relatively complex internal structure: it transmits information selectively through its unique gating units and has a recurrent network structure with more complex neurons. The internal structure of LSTM is shown in Figure 4 [27].
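For reference, the standard LSTM gating equations corresponding to this description are given below (a standard formulation with assumed notation, not reproduced from Figure 4): x_t is the input, h_t the hidden state, c_t the cell state, and f_t, i_t and o_t the forget, input and output gates.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```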
3.4. EEMD-LSTM Model
The OD time series s(t) between metro stations is affected by the time and locations of entering and leaving the stations, and often exhibits nonlinearity and non-stationarity. If s(t) is input directly into the LSTM neural network as the basic data, the noise it contains will interfere with the model's identification of the spatial–temporal relationship between input and output passenger flow, resulting in a decline in prediction accuracy. Therefore, s(t) needs to be preprocessed before neural-network prediction. The OD time series s(t) between stations is obtained by processing the original AFC data. In view of the nonlinear and non-stationary characteristics of s(t), and to avoid mode mixing, this paper adopts ensemble empirical mode decomposition (EEMD) to decompose s(t), changing the extrema of the signal by adding different white noises of the same amplitude each time. After many rounds of decomposition, a series of intrinsic mode function (IMF) sub-items are obtained and their overall average is calculated to offset the added white noise, thereby suppressing modal aliasing. A corresponding LSTM model is trained for each IMF sub-item and for the residual. The Adam algorithm was used for model training, and trial-and-error experiments determined that the number of iterations with the lowest mean absolute percentage error was 200, with an initial learning rate of 0.01 and a learning-rate decay factor of 0.2. After the LSTM network of each component is trained, the prediction results of the components are integrated and superimposed. Since the result at this stage is obtained from normalized data distributed in [0, 1], inverse normalization is applied to the integrated result to yield the final passenger-flow prediction.
The specific steps of prediction are as follows:
(1) The AFC data are pre-processed to obtain the OD time series s(t) between stations.
(2) The EEMD method is used to decompose s(t) into n intrinsic mode function (IMF) components and a residual.
(3) The partial autocorrelation function (PACF) is used as an index of the autocorrelation of each IMF component and the residual, and the corresponding LSTM time step is determined.
(4) The IMF components and the residual are divided into training and test sets, and the IMFs and the residual are predicted, respectively.
(5) The predicted daily OD passenger-flow data are obtained by integrating the predicted IMF components and the residual.
The flow chart of the prediction model is shown in Figure 5.
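As a complement to the flow chart, the sketch below illustrates the integration step (5) under the simplifying assumption that the components were min-max normalized to [0, 1] as described above; the prediction matrix and normalization statistics are placeholders, not the paper's data.

```matlab
% Sketch of the integration step: component-wise LSTM predictions are
% summed and then inverse-normalized. predComp is an assumed nDays-by-6
% matrix of normalized predictions (5 IMFs + residual); odMin and odMax
% are placeholder min-max statistics used during normalization.
predComp = rand(21, 6);                               % placeholder predictions, 21 test days
odMin = 120; odMax = 860;                             % placeholder normalization bounds
odPred = sum(predComp, 2) * (odMax - odMin) + odMin;  % superimpose, then denormalize
```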
3.5. Model Building
The OD values from 1 April to 9 June 2020 were selected as the training sample, accounting for about 3/4 of the total, and the OD values from 10 June to 30 June 2020 were selected as the verification sample. MATLAB was employed to construct the historical OD time series from 1 April to 9 June 2020, which was then decomposed by EEMD. The standard deviation of the white noise was set to 0.1, the number of noise additions was set to 10, and the number of ensemble integrations was set to 100. For example, for the commercial–residential OD pair, five IMF subsequences and one residual item were obtained after decomposition. The decomposition results are shown in Figure 6.
The lags of the IMFs and the residual correspond to the LSTM time steps and are usually determined by the PACF, whose confidence interval was set to 95%. The results are shown in Figure 7. It can be seen from Figure 7 that the time steps of components 1–5 and the residual are 3, 5, 7, 7, 7 and 6, in turn, and that the low-frequency components 3–5 are related to the OD values of the previous 7 days. This is because passenger-flow regularity mostly follows a weekly cycle.
When constructing the LSTM network, the training data were normalized to zero mean and unit variance, and the IMF components obtained by EEMD decomposition were predicted separately. In this paper, 20 neurons were set in the hidden layer, and the number of neurons in the output layer is 1. Since only the passenger-flow characteristics in the time dimension are considered, the input length at each time step is 1. To improve learning efficiency, the Adam optimization algorithm was introduced and 200 training epochs were set. The initial learning rate was 0.01; after 100 epochs, the learning rate was multiplied by a factor of 0.2, so as to approach the optimal solution quickly without significant fluctuation.
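The following minimal MATLAB sketch mirrors these settings (20 hidden units, Adam, 200 epochs, initial learning rate 0.01, learning rate multiplied by 0.2 after 100 epochs); the sliding-window data preparation and the placeholder series are assumptions for illustration, not the authors' code.

```matlab
% Per-component LSTM sketch following the settings stated in Section 3.5.
step = 7;                                            % look-back determined by PACF
x    = cumsum(randn(70, 1));                         % placeholder component series
x    = (x - mean(x)) / std(x);                       % zero mean, unit variance
XTrain = cell(numel(x) - step, 1);
YTrain = zeros(numel(x) - step, 1);
for i = 1:numel(x) - step
    XTrain{i} = x(i:i+step-1)';                      % 1-by-step input sequence
    YTrain(i) = x(i + step);                         % next-day value as target
end
layers = [ ...
    sequenceInputLayer(1)                            % one feature per time step
    lstmLayer(20, 'OutputMode', 'last')              % 20 hidden units, sequence-to-one
    fullyConnectedLayer(1)                           % single output neuron
    regressionLayer];
options = trainingOptions('adam', ...
    'MaxEpochs', 200, ...
    'InitialLearnRate', 0.01, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropPeriod', 100, ...
    'LearnRateDropFactor', 0.2, ...
    'Verbose', 0);
net   = trainNetwork(XTrain, YTrain, layers, options);
yPred = predict(net, XTrain(end));                   % one-step-ahead prediction
```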
5. Discussion
Traditional passenger-flow prediction models approximate the passenger-flow characteristics by adjusting the neural-network parameters [20,21]. EEMD is an improved algorithm for EMD, which is prone to modal aliasing, and avoids it by adding Gaussian white noise [26]; owing to its unique memory-forgetting mechanism, LSTM has an advantage over RNN and ARIMA models in dealing with long sequence data [20]. Although EMD is a flexible and adaptive time-frequency data-analysis method and performs well in analyzing and interpreting nonlinear or non-stationary signals, it also has some defects: EMD does not account for the noise present in the original signal under real conditions, so using EMD to decompose noisy signals gives rise to modal aliasing, that is, signals of the same scale or frequency are divided into multiple intrinsic mode functions. Once the IMFs are aliased, they are no longer single frequency components and the instantaneous frequency cannot be obtained accurately, which is mainly related to the frequency characteristics of the original signal and the EMD algorithm itself [28]. When the signal is stochastic, modal aliasing leads to large fluctuations in the time–frequency distribution of the EMD decomposition results, so the decomposition results are poor for random signals. EEMD is an analysis method designed to address modal aliasing. Its main improvement is to add a random Gaussian white noise sequence to the input signal so that the original signal becomes continuous on different characteristic time scales, thereby eliminating the sawtooth lines appearing in the time–frequency distribution.
RNN can integrate historical and current information well, but on long sequences it suffers from gradient vanishing and gradient explosion because the transmitted historical information becomes too small or too large. When gradient vanishing occurs, the weights in the RNN are not updated, which eventually leads to training failure; when gradient explosion occurs, the parameters in the RNN change greatly and the optimal parameters cannot be obtained. In addition, on long sequences the RNN also exhibits the long-distance dependence problem, that is, the input at the beginning has less and less influence on subsequent moments. Compared with the RNN, a forgetting unit and a memory unit are added to the hidden layer of the LSTM network, which enables the model to memorize or forget information. During operation, the network determines which information needs to be retained or discarded and passes the retained information on to the next neuron.
Combining the advantages of EEMD and LSTM, a combined model for short-term metro passenger-flow prediction is proposed. Its prediction accuracy is compared with the existing EMD-LSTM approach [29] using AFC data from the Dalian metro. Moreover, most existing studies employ static prediction, that is, prediction with a training set of fixed size [18,19,20]. By enlarging the training set to achieve dynamic prediction, EEMD-LSTM is used to compare dynamic and static prediction accuracy. The prediction error of the EMD-LSTM model is 14.855%, while that of EEMD-LSTM is 11.230%; that is, the error of the EEMD-LSTM model is reduced by 3.625 percentage points. Using the EEMD-LSTM model with the dynamic prediction method, as new data are continuously added to the training set, the error is further reduced from 12.57% to 10.06%, a considerable improvement over the original. The results show that EEMD-LSTM has higher prediction accuracy than EMD-LSTM in commercial–residential and scenic–residential OD prediction. Furthermore, comparing static prediction with a fixed training-set size against dynamic prediction with a gradually increasing training set shows that dynamic prediction achieves higher accuracy.
This study has theoretical value and practical significance for improving the accuracy of short-term passenger-flow prediction, alleviating traffic congestion, improving passenger comfort, and supporting the planning and operation of urban rail transit. Accurate passenger-flow prediction can better coordinate the metro with other transportation modes, provide theoretical support for the development of the overall urban public transportation system, and contribute to the sustainable development of cities.
6. Conclusions
On the basis of the existing LSTM neural-network prediction of short-term passenger flow, EEMD is used to decompose the local characteristic signals of the passenger-flow series at the entry and exit stations at different times, so as to weaken the interference of sample noise with the accuracy of the prediction model. Tested with the AFC data of Dalian Metro Line 1 and Line 2, the prediction error of the EEMD-LSTM model was reduced by 3.625 percentage points on average compared with that of the EMD-LSTM model, indicating that EEMD-LSTM has higher prediction accuracy.
Starting from 35 days of historical data, the OD values of the next 7 days were predicted; the actual values of the next day were then added to the historical data, and the OD values of the following 7 days were predicted again, and so on, until 42 days were used as historical data. The prediction accuracy of training samples with different amounts of historical data was compared. The results show that the average prediction error decreases from 12.57% with 35 days of history to 10.06% with 42 days, with a trend of further decrease, indicating that dynamic prediction, which continuously enlarges the training set, achieves higher accuracy than the static prediction method.
Due to limited research resources and conditions, long-term OD volumes were not used to verify the accuracy of the model in the case analysis, and the influence of weather, season, the epidemic situation and other factors on passenger flow was not considered. Comparing only with the EMD-LSTM model is admittedly one-sided; nevertheless, in terms of error improvement and data preprocessing, the prediction model and general rules constructed in this study can still serve as a reference for similar studies. The limitation of the small case study can be gradually overcome in subsequent research.