Passenger Flow Forecasting in Metro Transfer Station Based on the Combination of Singular Spectrum Analysis and AdaBoost-Weighted Extreme Learning Machine

The metro system plays an important role in urban public transit, and passenger flow forecasting is fundamental to helping operators establish an intelligent transport system (ITS). The forecasting results provide necessary information for the travel decisions of passengers and the operational decisions of managers. In order to investigate the inner characteristics of passenger flow and make more accurate predictions with less training time, a novel model (i.e., SSA-AWELM), a combination of singular spectrum analysis (SSA) and an AdaBoost-weighted extreme learning machine (AWELM), is proposed in this paper. SSA is used to decompose the original data into three components: trend, periodicity, and residue. AWELM is used to forecast each component separately. The three predicted results are summed to give the final outcome. In the experiments, the dataset is collected from the automatic fare collection (AFC) system of the Hangzhou metro in China. We extracted three weeks of passenger flow to carry out multistep prediction tests and a comparison analysis. The results indicate that the proposed SSA-AWELM model can reduce both prediction errors and training time. In particular, compared with the prevalent deep-learning model, the long short-term memory (LSTM) neural network, SSA-AWELM reduced the testing errors by 22% and saved 84% of the time, on average. This demonstrates that SSA-AWELM is a promising approach for passenger flow forecasting.


Introduction
As an important part of urban public transit, metro transit has developed rapidly and attracted a large number of passengers in recent years. It is a great challenge for operators and decision-makers to optimize the metro schedules and organize the passengers in the stations effectively. Accurate and timely short-term passenger flow forecasting is the foundation of intelligent transport systems (ITS) [1]. The prediction results not only offer evidence for passenger guidance to prevent congestion and trampling [2] but also provide necessary information for the metro schedule coordination scheme to match the metro capacity with the passenger flow demand.
As the connections between different metro lines, transfer stations are crucial in metro networks. Some researchers have used complex network theory to investigate the characteristics of metro networks in cities such as Beijing [3], Shanghai [4], Guangzhou [5], and others [6]. Their findings indicated that transfer stations played the most significant role in the networks, and some of them [3,4] suggested that more attention should be paid to transfer stations. In addition, the passenger flow in a transfer station is usually much larger than that in a regular station, and it increases more rapidly during the morning and evening rush hours. This is because transfer stations are usually located in areas with large travel demands, for instance, a transportation hub or a business district. Therefore, in order to avoid pedestrian congestion and give operators early warnings of passenger flow bursts, it is vital to forecast the passenger flow in a transfer station accurately and in a timely manner.
The passenger flow is defined as the number of boarding or alighting pedestrians at the target station during a constant interval in the prediction tasks [7,8]. In previous studies, passenger flow data have mainly been collected in two ways, as follows:

1.
Videos. The passenger flow videos are generally used to extract the passenger trajectories through image-processing techniques. The extracted data can help researchers to investigate and analyze passenger behaviors [9].

2.
Automatic Fare Collection (AFC) systems. In AFC systems, the passenger boarding and alighting information is recorded automatically by the sensors in the turnstiles, and the recorded data are easy to access. The AFC systems were initially designed and employed to charge the passengers automatically. Since the AFC systems also record some extra information about the passengers (e.g., personal identification, boarding/alighting time, and boarding/alighting station), AFC data have been used in transportation engineering research. These studies mainly focus on four fields: prediction of passenger flow [2,7,10-12], analysis of passenger flow patterns [13], investigation of passenger behaviors [14,15], and evaluation of metro networks [3,6].
The task of passenger flow prediction is quite similar to traffic flow prediction [7,8,12,16]; the two differ only in the input data of the models. Therefore, many practical models of traffic flow prediction can be referred to as well. In the studies to date, the passenger/traffic flow prediction approaches are roughly classified into four categories, as listed below:

1.
Parametric models. Due to their low computational complexity, parametric models were widely used in early studies, for instance, the autoregressive integrated moving average (ARIMA) [17,18], Kalman filter (KF) [11], exponential smoothing (ES) [19], and so on. However, these models are sensitive to passenger flow patterns, since they are established based on the assumption of linearity.

2.
Nonparametric models. In order to capture the nonlinearity of passenger flow, nonparametric models were introduced in subsequent research, such as K-nearest neighbor (KNN) [20,21], support vector regression (SVR) [7,10], artificial neural network (ANN) [1,22], etc. The empirical results from these studies suggest that nonparametric models usually perform better than parametric models when the data size is large, owing to their ability to model nonlinearity.

3.
Hybrid models. Hybrid models combine two or more individual methods. Because passenger flow exhibits both linearity and nonlinearity, hybrid models [2,23-26] have been proposed to capture both properties and increase the prediction accuracy. Both theoretical and empirical findings have demonstrated that the integration of different models can take full advantage of each of them. Thus, this is an effective way to improve the predictive performance.

4.
Deep-learning models. Besides the aforementioned three kinds of models, deep-learning methods have recently been introduced and developed for the passenger flow forecasting problem, including long short-term memory (LSTM) [12,16,27], deep belief network (DBN) [28], stacked autoencoders (SAE) [29], convolutional neural network (CNN) [12,30], etc. Due to the universal approximation capability of complex neural networks, deep-learning models can approximate any nonlinear function in theory [24,31]. According to these studies, deep-learning models usually achieve higher forecasting accuracy than parametric and nonparametric models. However, because of their high computational complexity, deep-learning models require significant resources and training time [32]. In addition, these models are usually regarded as a "black box" [23], and their results lack interpretability [32].
In recent studies, combining forecasting models with time series decomposition approaches has become a new research interest among hybrid models, with the aim of better predictive performance. The principle of this kind of model is that a complicated time series can be simplified by disaggregating the sequence into multiple frequency components. The decomposed components are forecasted separately, and then, the predicted results are summed as the final outcome. The widely used time series decomposition methods include wavelet decomposition (WD) [25,33], empirical mode decomposition (EMD) [2,26,34], Seasonal and Trend Decomposition Using Loess (STL) [35,36], singular spectrum analysis (SSA) [37-39], and so on. Sun et al. [25] and Liu et al. [33] employed the WD approach to decompose the original passenger flow into several high-frequency and low-frequency sequences, and then, these sequences were forecasted by least squares SVR in Sun et al. [25] and by extreme learning machine (ELM) in Liu et al. [33]. Chen et al. [2], Wei and Chen [26], and Chen and Wei [34] all proposed that the passenger flow could be regarded as a nonlinear and nonstationary signal, and they utilized EMD to decompose the original passenger flow into nine intrinsic mode function (IMF) components and one residue. Wei and Chen [26] predicted the disaggregated components through ANN, while Chen et al. [2] predicted them through LSTM. Qin et al. [35] utilized STL to disaggregate the monthly air passenger flow into three subseries: seasonal, trend, and residual series. Then, they developed the Echo State Network (ESN) to forecast each decomposed series. Chen et al. [36] also employed STL to decompose the daily metro ridership, and LSTM was used in the prediction stage. As for the SSA method, to the best of our knowledge, it has never been used to analyze passenger flow to date, although it has been developed for traffic flow prediction. Mao et al. [37], Shang et al. [38], and Guo et al. [39] all used this method to analyze traffic flow time series and obtained several components with different amplitudes and frequencies. Then, they reconstructed these components into a smoothed part and a residue. In this way, SSA could be regarded as a filter that removes noise from the original sequence. During the forecasting stage, the denoised data were predicted by ELM [38] and a grey system model [39], respectively. Overall, these studies have clearly indicated that hybridization with time series decomposition approaches can noticeably improve the predictive accuracy. However, none of the aforementioned studies investigated the potential characteristics of passenger flow from the decomposed results.
In this study, a novel hybrid model (i.e., SSA-AWELM), SSA combined with an AdaBoost-weighted extreme learning machine (AWELM), is proposed to achieve more accurate predictions of the metro passenger flow. The experimental data, recorded by the sensors in turnstiles, are collected from an AFC system. The main contributions of this paper are briefly described as follows:

1.
The SSA approach is developed to decompose the original passenger flow into three components: trend, periodicity, and residue. Investigating the three components reveals the inner characteristics of the original data.

2.
The ELM improved by AdaBoost (i.e., AWELM) is developed to forecast the three components. ELM, a neural network known for its fast computation speed, is implemented, and its prediction performance is enhanced through AdaBoost ensemble learning. Thus, the hybrid model SSA-AWELM has the advantages of both accuracy and speed for passenger flow forecasting.

3.
Multistep-ahead prediction of the passenger flow is established, which offers more information about the future. A dataset collected from a metro AFC system is used to carry out the prediction tests and comparative analysis.
The rest of this paper is organized as follows: In Section 2, the problem is defined, and the proposed method is formulated. In Section 3, the procedures of data collection, data preprocessing, and design of the experiment are elaborated. The results and findings are analyzed and discussed in Section 4. Finally, the conclusions are drawn in Section 5.

Materials and Methods
In this section, the AFC system is briefly introduced, and the passenger flow forecasting problem is explained in detail. In particular, the model SSA-AWELM is formulated to improve upon the performance of predictions.

Automatic Fare Collection Systems
The automatic fare collection (AFC) systems are built on the Internet of Things (IoT) and wireless sensor networks (WSN). As displayed in Figure 1, a typical AFC system consists of five hierarchical levels, from top to bottom: clearing center (CC), line centers (LC), station computers (SC), station equipment, and smart tickets and cards [40]. A passenger touches a smart ticket or card, which has an integrated circuit (IC) chip (a type of microsensor) inside, to a turnstile when boarding or alighting; the sensor in the turnstile then responds and records the necessary information. The information is then transmitted to the SC, the LC, and, finally, the CC. There are a few differences between boarding and alighting: when the passenger alights and passes a turnstile, the sensor computes the traveling mileage and charges the fare automatically, and this transaction can be completed in milliseconds. The AFC system is not only employed by operators to collect the fares from passengers conveniently. For researchers, what matters most is that mining the recorded information can assist with analyzing the operational quality, since the records include the personal identification, boarding/alighting station, boarding/alighting time, and some other useful information. Because the boarding and alighting information is recorded automatically by the sensors in the turnstiles and the recorded data can be accessed easily, real-time prediction of the metro passenger flow becomes possible.

Passenger Flow Forecasting Problem
As mentioned in Section 1, passenger flow is the number of boarding or alighting pedestrians during a constant interval (e.g., 5 min or 10 min) at the target station. Suppose $x_t$ denotes the entrance or exit passenger flow at time $t$; $x_t$ clearly varies with time. The passenger flow forecasting problem can be treated as a time series forecasting task, and the passenger flow time series exhibits temporal dependence. In other words, the passenger flow is highly related to the historical data. Therefore, the research problem addressed in this paper is to forecast $x_t$ from the historical passenger flow data $\{x_{t-1}, x_{t-2}, x_{t-3}, \ldots, x_{t-n}\}$, which is formulated as follows:
$$\hat{x}_t = E(x_{t-1}, x_{t-2}, \ldots, x_{t-n}) \quad (1)$$
where $\hat{x}_t$ represents the predicted value at time $t$, $E(\cdot)$ represents an established prediction model, and $n$ represents the order of the time lag. Although single-step passenger flow forecasting has been widely studied, multistep forecasting is necessary to provide travelers and managers with further information about passenger flow. In our study, the iterated strategy, which is widely used in time series predictions [41,42], is adopted for multistep passenger flow forecasting. As Equation (2) expresses, based on the established single-step model, the iterated strategy feeds each predicted value back into the same model to forecast the value at the next time step:
$$\hat{x}_{t+h} = E(\hat{x}_{t+h-1}, \ldots, \hat{x}_{t}, x_{t-1}, \ldots, x_{t-n+h}), \quad 1 \le h < n \quad (2)$$
It continues in this manner until reaching the maximum prediction horizon. The iterated strategy has two outstanding advantages. One is that the model needs to be trained only once, and the other is that the number of prediction steps is unlimited.
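The iterated strategy can be illustrated with a minimal sketch. The function name `iterated_forecast` and the toy `mean_model` are hypothetical and only stand in for a trained single-step model $E(\cdot)$; this is not the implementation used in the experiments.

```python
from typing import Callable, List, Sequence

def iterated_forecast(model: Callable[[Sequence[float]], float],
                      history: Sequence[float],
                      n_lags: int,
                      horizon: int) -> List[float]:
    """Apply a single-step model repeatedly, feeding predictions back as inputs."""
    if len(history) < n_lags:
        raise ValueError("history must contain at least n_lags values")
    window = list(history[-n_lags:])   # latest n_lags observations, oldest first
    predictions = []
    for _ in range(horizon):
        x_hat = model(window)          # single-step prediction
        predictions.append(x_hat)
        window = window[1:] + [x_hat]  # drop the oldest value, append the prediction
    return predictions

# Toy single-step model (assumption): the mean of the lag window.
mean_model = lambda w: sum(w) / len(w)
forecast = iterated_forecast(mean_model, [10.0, 20.0, 30.0], n_lags=2, horizon=3)
print(forecast)
```

Because the model is only ever called with a window of `n_lags` values, the same trained model serves every horizon; the cost is that prediction errors can accumulate as more predicted values enter the window.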

Singular Spectrum Analysis
Singular spectrum analysis (SSA) is a time series analysis approach without any statistical assumptions [43]. It can decompose the original data into several components. This method has been widely used to decompose time series, including traffic flow [37-39]. In this study, this approach is implemented to analyze the passenger flow. Suppose $Y(t)$ ($t = 1, 2, \ldots, N$) denotes the original passenger flow sequence with length $N$. The SSA procedure consists of four steps, as follows:
Step 1: Embedding
The original sequence $Y(t)$ is transformed into the trajectory matrix $F \in R^{L \times K}$, which is calculated as the following equation:
$$F = \begin{bmatrix} f_1 & f_2 & \cdots & f_K \\ f_2 & f_3 & \cdots & f_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ f_L & f_{L+1} & \cdots & f_N \end{bmatrix} \quad (3)$$
where $L$ is the window length, $K = N - L + 1$, and $f_i$ is the $i$th ($1 \le i \le N$) value of the original sequence.

Step 2: Singular Value Decomposition (SVD)
The SVD algorithm is applied to decompose the trajectory matrix $F$ as follows:
$$F = U \Sigma V^{T} = \sum_{i=1}^{d} \sigma_i U_i V_i^{T} \quad (4)$$
where $\Sigma$ is a diagonal matrix whose diagonal elements ($\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_d > 0$) are the singular values of $F$. The vectors $U_i$ and $V_i$, the $i$th columns of the matrices $U$ and $V$, are the left and right singular vectors, respectively; $d$ is the number of nonzero singular values, which is also the rank of the trajectory matrix $F$. The collection ($\sigma_i$, $U_i$, $V_i$) is called the $i$th eigentriple. Every eigentriple reconstructs an elementary matrix $F_i$ of the trajectory matrix $F$:
$$F_i = \sigma_i U_i V_i^{T} \quad (5)$$
Thus, the sum of all elementary matrices $F_i$ is identical to the trajectory matrix $F$. The contribution of the elementary matrix $F_i$ is measured by the corresponding eigenvalue $\lambda_i$ (equal to the square of the singular value $\sigma_i$), as in the following equation:
$$\eta_i = \frac{\lambda_i}{\sum_{j=1}^{d} \lambda_j} = \frac{\sigma_i^2}{\sum_{j=1}^{d} \sigma_j^2} \quad (6)$$
Step 3: Grouping
The index set $\{1, 2, \ldots, d\}$ is split into several disjoint subsets $I_1, I_2, \ldots, I_M$. Each subset $I_m$ is regarded as one group, and the elementary matrices $F_i$ ($i \in I_m$) in each group are summed. In previous papers, the w-correlation method [43] is the prevalent way to split the index set. However, this method takes a signal-analysis perspective and lacks interpretability for passenger flow. In this study, the elementary matrices $F_i$ are grouped into three parts, trend $F_T$, periodicity $F_P$, and residue $F_R$, expressed as Equation (7), and this process is detailed in Section 4.1.
$$F = F_T + F_P + F_R \quad (7)$$
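Step 4: Diagonal averaging
The grouped matrices $F_T$, $F_P$, and $F_R$ are each transformed back into a time series of length $N$ by averaging over their anti-diagonals. For a grouped matrix with entries $f_{l,m}$ ($1 \le l \le L$, $1 \le m \le K$), every element $y_k$ ($k = 1, 2, \ldots, N$) of the reconstructed time series is computed as the following equation:
$$y_k = \frac{1}{|A_k|} \sum_{(l,m) \in A_k} f_{l,m}, \quad A_k = \{(l,m) : l + m = k + 1\} \quad (8)$$
As such, the original passenger flow $Y(t)$ is disaggregated into three components: trend $T(t)$, periodicity $P(t)$, and residue $R(t)$.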
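The four steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `ssa_decompose`, the 0-based index groups, and the synthetic test series are all hypothetical.

```python
import numpy as np

def ssa_decompose(y, L, groups):
    """Return one reconstructed series per index group, plus eigenvalue shares.

    groups: list of lists of 0-based elementary-component indices (assumption).
    """
    y = np.asarray(y, dtype=float)
    N = y.size
    K = N - L + 1
    # Step 1: embedding into the L x K trajectory matrix, F[i, j] = y[i + j]
    F = np.column_stack([y[j:j + L] for j in range(K)])
    # Step 2: singular value decomposition, F = U diag(s) Vt
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    contributions = s**2 / np.sum(s**2)
    series = []
    for idx in groups:
        # Step 3: grouping, i.e. the sum of the elementary matrices in the group
        G = (U[:, idx] * s[idx]) @ Vt[idx, :]
        # Step 4: diagonal averaging, i.e. the mean of every anti-diagonal of G
        rec = np.array([G[::-1, :].diagonal(k).mean() for k in range(-L + 1, K)])
        series.append(rec)
    return series, contributions

# Synthetic series (assumption): linear trend plus a 12-sample cycle.
t = np.arange(60)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
parts, contrib = ssa_decompose(y, L=12, groups=[[0], [1, 2], list(range(3, 12))])
print(np.round(contrib[:3], 3))
```

Since diagonal averaging is linear and the groups cover every component, the reconstructed series add up to the original sequence, which is a convenient sanity check for any grouping.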

AdaBoost Ensemble Learning
As a strategy of ensemble learning, AdaBoost was originally proposed by Freund and Schapire [44] for classification problems. Drucker [45] developed this algorithm in the application of a regression problem, and it was improved upon by Solomatine and Shrestha [46,47]. With the integration of a few homogenous models (called base learners), this method can improve the performance of base learners. In this study, the AdaBoost algorithm is utilized to assist the ELM to predict the passenger flow more accurately.
Suppose a dataset $\{(x_i, y_i)\}_{i=1}^{N}$ with $N$ samples, and let $T$ be the maximum iteration number. The specific steps of AdaBoost are as follows:
Step 1: Initialize the distribution of sample weights:
$$\Gamma_1 = \{\gamma_{1,n} = 1/N, \; n = 1, 2, \ldots, N\} \quad (9)$$
Step 2: For the training process of each iteration, $t = 1, 2, \ldots, T$:
Step 2.1: Use the dataset with the distribution $\Gamma_t$ to train the WELM and obtain the base learner $E_t(x)$.
Step 2.2: Calculate the absolute relative error of each sample and the error rate of $E_t(x)$:
$$\varepsilon_t = \sum_{n:\, |(E_t(x_n) - y_n)/y_n| > \phi} \gamma_{t,n} \quad (10)$$
where $|(E_t(x_n) - y_n)/y_n|$ represents the absolute relative error of each sample; $\varepsilon_t$ is the error rate of $E_t(x)$; and $n = 1, 2, \ldots, N$ is the index of the sample. The condition $n : |(E_t(x_n) - y_n)/y_n| > \phi$ means that only the samples whose error is greater than the preset error, the so-called threshold $\phi$, are counted. $\phi$ is a preset parameter and will be discussed at the end of the present subsection. More details are described in [47].
Step 2.3: Calculate the coefficient for updating the sample weights:
$$\alpha_t = \varepsilon_t^{k} \quad (11)$$
where $\alpha_t$ denotes the updating coefficient and $k$ is the power coefficient of the error rate $\varepsilon_t$, which needs to be preset. According to the study of Solomatine and Shrestha [47], $k$ is selected from 1 (linear law), 2 (square law), and 3 (cubic law). A high value of $k$ may cause the algorithm to become unstable. Thus, $k$ is set as 1 in our study.
Step 2.4: Update the distribution of sample weights:
$$\gamma_{t+1,n} = \frac{\gamma_{t,n}}{Z_t} \times \begin{cases} \alpha_t, & |(E_t(x_n) - y_n)/y_n| \le \phi \\ 1, & \text{otherwise} \end{cases} \quad (12)$$
where $Z_t$ is a normalization factor, such that $\sum_{n=1}^{N} \gamma_{t+1,n} = 1$.
Step 3: Update $t = t + 1$ and repeat Steps 2.1 to 2.4 until reaching the maximum iteration number $T$. Finally, the output is computed as:
$$E(x) = \frac{\sum_{t=1}^{T} \log(1/\alpha_t) \, E_t(x)}{\sum_{t=1}^{T} \log(1/\alpha_t)} \quad (13)$$
The AdaBoost algorithm is sensitive to the threshold $\phi$. If $\phi$ is too low, the model will underfit. On the other hand, too high a value of $\phi$ will cause overfitting. In our study, the threshold $\phi$ is set adaptively as the median of the absolute relative errors during each iteration, expressed as the following equation:
$$\phi_t = \underset{n}{\mathrm{median}} \, |(E_t(x_n) - y_n)/y_n| \quad (14)$$
As presented in the above steps, AdaBoost is an iterative process. The base learner is trained, and the distribution of the sample weights is updated, during each iteration. Thus, if the base learner is complex and takes a lot of computing time, the time consumed by AdaBoost will increase linearly. In this study, ELM, which is known for its fast training speed, is adopted as the base learner. This model is elaborated in the next subsection.
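The boosting loop described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the base learner here is a weighted linear regression (`fit_weighted_linear`, a stand-in chosen only to keep the sketch short, not the WELM), the clipping of the error rate is an added numerical safeguard, and all function names are hypothetical. The targets must be nonzero because relative errors are used.

```python
import numpy as np

def fit_weighted_linear(X, y, w):
    """Stand-in base learner (assumption, not the WELM): weighted linear regression."""
    A = np.column_stack([X, np.ones(len(X))])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return lambda X_new: np.column_stack([X_new, np.ones(len(X_new))]) @ coef

def adaboost_rt(X, y, fit_weighted, n_iter=10, k=1):
    """Boost a weighted regressor with an adaptive (median) error threshold."""
    n = len(y)
    gamma = np.full(n, 1.0 / n)                     # Step 1: uniform sample weights
    learners, log_inv_alpha = [], []
    for _ in range(n_iter):
        learner = fit_weighted(X, y, gamma)         # Step 2.1: train base learner
        are = np.abs((learner(X) - y) / y)          # Step 2.2: absolute relative error
        phi = np.median(are)                        # adaptive threshold
        hard = are > phi
        # error rate; clipped (added safeguard) so that log(1/alpha) stays finite
        eps = np.clip(gamma[hard].sum(), 1e-10, 1 - 1e-10)
        alpha = eps ** k                            # Step 2.3: updating coefficient
        gamma = gamma * np.where(hard, 1.0, alpha)  # Step 2.4: shrink easy samples
        gamma /= gamma.sum()                        # normalization factor Z_t
        learners.append(learner)
        log_inv_alpha.append(np.log(1.0 / alpha))
    weights = np.array(log_inv_alpha) / np.sum(log_inv_alpha)
    # Step 3: weighted average of the base learners
    return lambda X_new: sum(w * f(X_new) for w, f in zip(weights, learners))

# Synthetic data (assumption): a noisy positive linear target.
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(100, 1))
y = 3 * X[:, 0] + 5 + rng.normal(0, 0.1, size=100)
predict = adaboost_rt(X, y, fit_weighted_linear, n_iter=5, k=1)
print(float(np.mean(np.abs(predict(X) - y))))
```

Any learner that accepts sample weights can be passed as `fit_weighted`, which is why a fast base learner matters: it is retrained once per iteration.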

Weighted Extreme Learning Machine
Extreme learning machine (ELM) is a kind of single hidden layer feed-forward network (SLFN) proposed by Huang et al. [48]. Compared with traditional ANN models, ELM does not need to tune the input weights and hidden layer biases during training. After the initialization of the ELM, the input weights and hidden biases are fixed, and only the output weights are optimized. Therefore, the training process of ELM is faster than that of the traditional ANN model. Since weighted samples are used to train the base learners of AdaBoost, the weighted extreme learning machine (WELM) is developed in this study.
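Suppose a dataset $\{(x_i, y_i, \gamma_i)\}_{i=1}^{N}$ with $N$ samples, where $x_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,P}]^{T} \in R^{P \times 1}$, $y_i = [y_{i,1}, y_{i,2}, \ldots, y_{i,Q}]^{T} \in R^{Q \times 1}$, and $\gamma_i$ represent the input vector, output vector, and sample weight, respectively. The output of an ELM with $H$ hidden neurons is expressed as:
$$\sum_{h=1}^{H} \beta_h \, g(w_h \cdot x_i + b_h) = y_i, \quad i = 1, 2, \ldots, N \quad (15)$$
where $w_h = [w_{h,1}, w_{h,2}, \ldots, w_{h,P}]^{T}$ represents the connection weights from the input layer to the $h$th hidden neuron; $b_h$ represents the bias of the $h$th hidden neuron; $\beta_h = [\beta_{h,1}, \beta_{h,2}, \ldots, \beta_{h,Q}]^{T}$ represents the connection weights from the $h$th hidden neuron to the output layer; and $g(\cdot)$ is the activation function. The sigmoid function is adopted in this study, which is formulated as $g(x) = 1/(1 + e^{-x})$. Since $w_h$ and $b_h$ are assigned at initialization and then fixed, Equation (15) can be simplified as:
$$\mathbf{H} \beta = Y \quad (16)$$
where $\beta = [\beta_1, \beta_2, \ldots, \beta_H]^{T}$; $Y = [y_1, y_2, \ldots, y_N]^{T}$; and $\mathbf{H}$ is the output matrix of the hidden layer, expressed as:
$$\mathbf{H} = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_H \cdot x_1 + b_H) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_H \cdot x_N + b_H) \end{bmatrix}_{N \times H} \quad (17)$$
The purpose of the WELM is to optimize $\beta$ by minimizing the weighted mean square error cost function, which is expressed as
$$\min_{\beta} \; \left\| \mathrm{diag}(\Gamma)^{1/2} (\mathbf{H} \beta - Y) \right\|^{2} \quad (18)$$
where $\mathrm{diag}(\Gamma)$ is the diagonal matrix whose diagonal is $\Gamma = [\gamma_1, \gamma_2, \ldots, \gamma_N]$, and the solution of Equation (18) is:
$$\beta = \left( \mathbf{H}^{T} \mathrm{diag}(\Gamma) \mathbf{H} \right)^{-1} \mathbf{H}^{T} \mathrm{diag}(\Gamma) Y \quad (19)$$
Overall, the output weights $\beta$ of the WELM can be computed directly according to Equation (19). This differs from the training process of the traditional ANN, which iteratively updates the connection weights and neuron biases. This is the reason why ELM costs much less computing time than the traditional ANN.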
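A compact NumPy sketch of a weighted ELM is given below to make the closed-form training concrete. It is an illustration under assumptions: the class name `WELM`, the uniform initialization range, and the use of a least-squares solver on square-root-weighted rows (which solves the same weighted least-squares problem as the normal equations, but is numerically safer) are choices made here, not details confirmed by the text.

```python
import numpy as np

class WELM:
    """Single-hidden-layer network with fixed random hidden weights (sketch)."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the hidden layer, shape (N, n_hidden)
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, Y, gamma=None):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        n_samples, n_inputs = X.shape
        if gamma is None:
            gamma = np.full(n_samples, 1.0 / n_samples)  # uniform sample weights
        # Input weights and biases are drawn once and never updated.
        # The range [-1, 1] is an assumption of this sketch.
        self.W = self.rng.uniform(-1, 1, size=(n_inputs, self.n_hidden))
        self.b = self.rng.uniform(-1, 1, size=self.n_hidden)
        H = self._hidden(X)
        sw = np.sqrt(np.asarray(gamma, dtype=float))
        Y_w = Y * sw if Y.ndim == 1 else Y * sw[:, None]
        # Weighted least squares for the output weights, solved in one shot
        self.beta, *_ = np.linalg.lstsq(H * sw[:, None], Y_w, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta

# Toy regression problem (assumption): one period of a sine wave.
X = np.linspace(0, 1, 200)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
model = WELM(n_hidden=20, seed=0).fit(X, y)
print(float(np.mean((model.predict(X) - y) ** 2)))
```

Training involves a single linear solve rather than gradient iterations, which is the source of the speed advantage discussed above.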

The Hybrid Model
The combination of singular spectrum analysis and the AdaBoost-weighted extreme learning machine, denoted SSA-AWELM, is proposed in this paper to forecast the passenger flow. The flow chart of this hybrid model is displayed in Figure 2, and its specific procedure is as follows:
Step 1: SSA for decomposition. The original passenger flow is decomposed into several components by the SSA approach, and these components are grouped into three parts: trend, periodicity, and residue.
Step 2: AWELM for components forecasting. The WELM improved by AdaBoost (AWELM) is implemented to model and predict the three components, separately.
Step 3: Integration for final forecasting results. The final outcomes of forecasting the passenger flow are calculated by summing the predicted results of the three components.

Data Collection
In this paper, the passengers' alighting and boarding dataset is collected from the AFC system of the Hangzhou metro in China. The dataset is available online and provided by Ali Tianchi [49]. It records detailed information captured when the passengers passed the turnstiles. The data cover the 1st to the 26th of January 2019. The dataset includes seven fields, which are listed in Table 1. In addition, some samples of the dataset are provided in Table 2.

Data Preprocessing
Preprocessing obtains the passenger flow time series from the raw AFC dataset. In this study, the passenger flow data of the Qianjiang Road Station (Q.R. Sta.) and Jinjiang Station (J. Sta.) are selected to conduct experiments. As displayed in Figure 3, the Q.R. Sta. is a transfer station between Line 2 and Line 4, and it is located in the Qianjiang New Town Central Business District (CBD). The J. Sta. is a transfer station between Line 1 and Line 4, which is located in Wangjiang New Town. Following previous studies [11,50], the raw records are aggregated into 5-min intervals to obtain the passenger flow sequence. In order to keep complete cycle periods in the sequence data, three continuous weeks, from the 6th to the 26th of January, were selected from the AFC dataset. The time range selected was from 6:00 to 23:00 according to the operation time of the Hangzhou metro system, though a few records in the AFC dataset were out of this range. As a result, there were 204 samples on average in one day (17 h of operation at 12 intervals per hour) and 4284 samples in total (204 × 21 days). Furthermore, the exit and entrance passenger flow sequences were computed separately. Hence, four experimental datasets (two stations, each with exit and entrance flows) were established, and each of them was used to test the proposed model.
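The aggregation step can be sketched with the Python standard library. This is a minimal illustration: the function name `aggregate_flow` is hypothetical, and it assumes the raw records of one station and one direction have already been reduced to a list of `datetime` timestamps, since the actual field names of the AFC dataset are given in Table 1 rather than here.

```python
from collections import Counter
from datetime import datetime

def aggregate_flow(timestamps, day, start_hour=6, end_hour=23, minutes=5):
    """Count records per fixed interval within the operating hours of one day."""
    start = datetime(day.year, day.month, day.day, start_hour)
    n_bins = (end_hour - start_hour) * 60 // minutes
    counts = Counter()
    for ts in timestamps:
        idx = int((ts - start).total_seconds() // (minutes * 60))
        if 0 <= idx < n_bins:          # records outside 6:00-23:00 are dropped
            counts[idx] += 1
    return [counts[i] for i in range(n_bins)]

# Hypothetical records for 6 January 2019.
records = [
    datetime(2019, 1, 6, 5, 59, 59),   # before opening, dropped
    datetime(2019, 1, 6, 6, 0, 10),
    datetime(2019, 1, 6, 6, 4, 59),
    datetime(2019, 1, 6, 6, 5, 0),
    datetime(2019, 1, 6, 22, 59, 59),
    datetime(2019, 1, 6, 23, 0, 0),    # after closing, dropped
]
flow = aggregate_flow(records, datetime(2019, 1, 6))
print(len(flow), flow[:2], flow[-1])
```

With a 6:00 to 23:00 window and 5-min bins, each day yields 204 values, matching the daily sample count stated above.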
The extracted passenger flow sequences are presented in Figure 4. Both the exit and entrance passenger flows on weekdays have distinct peaks in the morning (from about 8:00 to 9:00) and evening (from about 18:00 to 19:00) rush hours, while these patterns disappear on the weekends. Moreover, the peak patterns of exit and entrance passenger flows on weekdays are different. Taking the Q.R. Sta. as an example, the exit passenger flow in the morning rush hour (about 500 pedestrians per 5 min) is approximately 2.5 times that in the evening rush hour (about 200 pedestrians per 5 min). On the contrary, the entrance passenger flow in the evening rush hour (about 300 pedestrians per 5 min) is approximately 1.5 times that in the morning rush hour (about 200 pedestrians per 5 min). These results indicate that most passengers in this station are commuters. This finding agrees with the location of this station, i.e., it is in the Qianjiang New Town CBD and surrounded by numerous office buildings. The four datasets are all split into training datasets (i.e., the 6th to 19th of January) and testing datasets (i.e., the 20th to 26th of January). The grid search and 5-fold cross-validation methods are used to evaluate the training performance and determine the hyper-parameters of the models. Then, the models with the determined hyper-parameters are evaluated on the testing datasets.

Comparison Models and Evaluation Measures
In order to demonstrate the contributions of the proposed SSA-AWELM model, the classical time series model ARIMA and four extra models based on neural networks, namely ANN, LSTM, ELM, and AWELM, are tested as benchmarks. They are listed as follows:
• ARIMA: ARIMA is a classical statistical model for time series forecasting. It was widely used to predict traffic flow and passenger flow in early studies [17]. The performance of ARIMA is affected by three parameters: the autoregressive order p, the difference order d, and the moving average order q. Generally, d is set based on the stationarity test, and p and q are selected from the range of [0,12] based on the Bayesian information criterion (BIC) [51].
• ANN: Owing to its ability to model nonlinearity, the ANN model is widely used in time series modeling, including passenger flow forecasting. A typical ANN model consists of three parts, one input layer, one hidden layer, and one output layer, and is optimized through a back-propagation algorithm (thus, it is also known as BPNN). In this study, the ANN model is optimized by the stochastic Adam algorithm with a mean square error (MSE) loss function. The learning rate is set as 0.001, the batch size is 256, and the number of epochs is 1000.
To make sure that every model achieves its best performance, the well-established grid search and 5-fold cross-validation methods are adopted to determine the hyper-parameters. The number of hidden-layer neurons in the four neural network models is selected from 2 to 50 with step 2, and the number of base learners of AWELM is selected from 1 to 20 with step 1. The determined hyper-parameters of each model are displayed in Appendix A (see Table A1). In addition, the input and output lengths of the models are set as 12 and 1, respectively, during training, and the horizon of the multistep-ahead prediction is set as 6. In other words, the passenger flow data of the last hour are used to forecast the next half-hour.
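The input/output setting described above (12 lagged values in, 1 value out) corresponds to a sliding-window construction of training samples. A minimal sketch follows; the function name `make_samples` is hypothetical.

```python
def make_samples(series, n_lags=12):
    """Build (input window, next value) pairs from a one-dimensional series."""
    n_samples = len(series) - n_lags
    X = [list(series[i:i + n_lags]) for i in range(n_samples)]
    y = [series[i + n_lags] for i in range(n_samples)]
    return X, y

# With 5-min intervals, 12 lags cover the last hour of passenger flow.
X, y = make_samples(list(range(15)), n_lags=12)
print(len(X), X[0], y[0])
```

Each row of `X` holds one hour of 5-min flows and the matching entry of `y` is the flow of the following interval; the six-step horizon is then reached by iterating the trained single-step model.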
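In order to accelerate learning and convergence during model training, the min-max normalization approach (expressed as Equation (20)) is employed to scale the input data into the range of [0,1] before feeding them into the models:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \quad (20)$$
In addition, to obtain the final prediction results, the outputs of the models are rescaled by the reversed min-max normalization approach (expressed as Equation (21)):
$$x = x' (x_{\max} - x_{\min}) + x_{\min} \quad (21)$$
where $x$ and $x'$ are the original and normalized values, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the data, respectively.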
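The scaling and its inverse can be sketched in a few lines. The function names are hypothetical, and fitting the minimum and maximum on the training data only is an assumption of this sketch rather than a detail stated in the text.

```python
def fit_minmax(train_values):
    """Return the (min, max) pair used for scaling; taken from training data here."""
    lo, hi = min(train_values), max(train_values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    return lo, hi

def scale(values, lo, hi):
    """Map values into [0, 1] relative to the fitted range."""
    return [(v - lo) / (hi - lo) for v in values]

def unscale(scaled, lo, hi):
    """Map model outputs back to the original passenger flow units."""
    return [s * (hi - lo) + lo for s in scaled]

lo, hi = fit_minmax([10.0, 20.0, 30.0])
scaled = scale([10.0, 20.0, 30.0], lo, hi)
print(scaled, unscale(scaled, lo, hi))
```

Values outside the fitted range map outside [0,1]; for example, 40.0 maps to 1.5 with the range above.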
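In order to evaluate the performances of the models, two common measures are introduced in this study. They are the mean absolute error (MAE) and root mean square error (RMSE), computed as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} |y_n - \hat{y}_n| \quad (22)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2} \quad (23)$$
where $y_n$ and $\hat{y}_n$ are the true value and predicted value, respectively, and $N$ is the number of samples.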
Besides the aforementioned two measures, the Diebold-Mariano (DM) test [52] is implemented to test the statistical significance of the difference between the proposed model and the benchmark models. The null hypothesis is that the prediction accuracy of the tested model $E^{T}(x)$ is equal to that of the reference model $E^{R}(x)$. In this study, the square error is adopted to measure the model loss, expressed as $e_i = (\hat{y}_i - y_i)^2$. Then, the DM statistic is defined as follows:
$$\mathrm{DM} = \frac{\bar{g}}{\sqrt{\hat{V}_g / N}} \quad (24)$$
where $\bar{g} = \sum_{n=1}^{N} g_n / N$, $g_n = (\hat{y}^{T}_n - y_n)^2 - (\hat{y}^{R}_n - y_n)^2$, $\hat{V}_g = \gamma_0 + 2 \sum_{k=1}^{P-1} \gamma_k$, and $\gamma_k$ is the autocovariance at lag $k$, expressed as $\gamma_k = (1/N) \sum_{i=k+1}^{N} (g_i - \bar{g})(g_{i-k} - \bar{g})$. $\hat{y}^{T}_n$ and $\hat{y}^{R}_n$ respectively represent the predicted values of the models $E^{T}(x)$ and $E^{R}(x)$, $P$ is the prediction horizon, and $N$ is the size of the testing data.
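The DM statistic defined above can be computed directly from the two prediction series. The sketch below follows those definitions with squared-error loss; the function name `dm_statistic` and the toy numbers are hypothetical. Note that the variance estimate can become non-positive for some inputs, in which case the square root fails; this sketch does not handle that case.

```python
import math

def dm_statistic(y_true, pred_test, pred_ref, horizon):
    """Diebold-Mariano statistic with squared-error loss differentials."""
    g = [(pt - y) ** 2 - (pr - y) ** 2
         for y, pt, pr in zip(y_true, pred_test, pred_ref)]
    n = len(g)
    g_bar = sum(g) / n

    def autocov(k):
        # Autocovariance of the loss differential at lag k (0-based indexing)
        return sum((g[i] - g_bar) * (g[i - k] - g_bar) for i in range(k, n)) / n

    variance = autocov(0) + 2 * sum(autocov(k) for k in range(1, horizon))
    return g_bar / math.sqrt(variance / n)

# Toy example (assumption): the tested model is wrong once, the reference twice.
y_true = [1.0, 2.0, 3.0, 4.0]
pred_test = [1.0, 2.0, 3.0, 5.0]
pred_ref = [2.0, 3.0, 3.0, 4.0]
dm = dm_statistic(y_true, pred_test, pred_ref, horizon=1)
print(round(dm, 4))
```

A negative value means the tested model has the lower average squared error; the statistic is usually compared against the standard normal distribution to judge significance.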

Analysis of SSA Decomposition
As mentioned in Section 2.3.1, the window length L is the only parameter that needs to be determined before decomposition. According to previous studies [37-39], if the time series shows obvious periodicity, the window length L can be set as one period length. Thus, L = 204, because the passenger flow cycles daily (see Figure 4) and 204 samples on average are collected in one day (as explained in Section 3.2). Then, the original passenger flow can be disaggregated into 204 components. These components are grouped into three parts, trend, periodicity, and residue, inspired by a study [53] that used SSA to analyze the variations of electricity prices. To facilitate the analysis, taking the dataset of the Q.R. Sta. as an example, the eigenvalues of each component are plotted in Figure 5. In Figure 5a, the first eigenvalue is clearly much larger than the others, so the corresponding component is extracted separately as the trend part. Moreover, the eigenvalue curve declines slowly after the 23rd component, so the 23rd is regarded as the "break point". The components from the 2nd to the 23rd are then reconstructed into the periodic part, and the remaining components from the 24th to the 204th are reconstructed into the residual part. The entrance passenger flow in Figure 5b is handled in the same way, with the "break point" at 13: the components from the 2nd to the 13th are reconstructed into the periodic part, and the remaining components from the 14th to the 204th are reconstructed into the residual part. Finally, the obtained trend, periodicity, and residue of the original passenger flow are displayed in Figure 6.
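The grouping rule described above (first component as trend, components up to the break point as periodicity, the rest as residue) can be written as a small helper. The function name `group_components` is hypothetical, and the break point itself is still chosen by inspecting the eigenvalue curve, not computed here.

```python
def group_components(n_components, break_point):
    """Split component numbers 1..n_components into trend, periodic, and residual groups."""
    if not 1 <= break_point <= n_components:
        raise ValueError("break_point must lie between 1 and n_components")
    trend = [1]
    periodic = list(range(2, break_point + 1))
    residue = list(range(break_point + 1, n_components + 1))
    return trend, periodic, residue

# The two break points reported for the Q.R. Sta. (23 and 13).
for bp in (23, 13):
    trend, periodic, residue = group_components(204, bp)
    print(bp, len(trend), len(periodic), len(residue))
```

Component numbers are 1-based here to match the text; they need to be shifted by one before indexing arrays in a 0-based language.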
As shown in Figure 6, each component reveals a different pattern of the original passenger flow. The trend represents the overall tendency, and the periodicity represents the variations within a day. Furthermore, the trend shows that the passenger flow on weekdays is larger than that on weekends. In the periodic component, the passenger flow shows distinct peaks in the morning and evening rush hours on weekdays, but this is not obvious on the weekends. The peak patterns differ between the exit and entrance passenger flows: the exit passenger flow in the morning rush hour is much larger than that in the evening rush hour, and the opposite holds for the entrance passenger flow. As for the residue, it fluctuates irregularly and can be treated as noise.

Analysis of Hyper-Parameters
The performance of SSA-AWELM depends heavily on the AWELM forecasting model of each component. AWELM has two hyper-parameters: the number of base learners (i.e., WELMs) T and the number of hidden neurons in each WELM H. The well-established grid search and five-fold cross-validation methods are adopted to determine T and H. H is selected from 2 to 50 with a step of 2, and T is selected from 1 to 20 with a step of 1. Taking the dataset of the Q.R. Sta. as an example, the process of hyper-parameter selection is displayed in Figure 7, where a log transformation is applied to the MSE to distinguish different values clearly. It can be seen that AWELM is sensitive to the number of hidden neurons H but insensitive to the number of base learners T. Finally, the determined hyper-parameters H and T of AWELM are provided in Table A1 (see Appendix A).
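The selection procedure above can be sketched as a plain grid search with five-fold cross-validation. The sketch below is illustrative only and rests on several assumptions: the base model is a simplified stand-in (an unweighted average of T plain ELMs, not the AdaBoost-weighted scheme of AWELM), the folds are shuffled, the data are synthetic, and all function names are hypothetical. The demo uses a coarser grid than the one in the text to keep the run short; the full grid would be `range(2, 51, 2)` for H and `range(1, 21)` for T.

```python
import itertools
import numpy as np

def fit_elm(x, y, hidden, rng):
    """Fit one ELM: random sigmoid hidden layer, least-squares output weights."""
    w = rng.normal(size=(x.shape[1], hidden))
    b = rng.normal(size=hidden)
    h = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    beta, *_ = np.linalg.lstsq(h, y, rcond=None)
    return w, b, beta

def predict_elm(model, x):
    w, b, beta = model
    return (1.0 / (1.0 + np.exp(-(x @ w + b)))) @ beta

def cv_mse(x, y, hidden, n_learners, n_folds=5, seed=0):
    """Mean validation MSE over shuffled folds for an average of n_learners ELMs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        models = [fit_elm(x[train_idx], y[train_idx], hidden, rng)
                  for _ in range(n_learners)]
        pred = np.mean([predict_elm(m, x[val_idx]) for m in models], axis=0)
        errors.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(errors))

def grid_search(x, y, hidden_grid, learner_grid):
    """Evaluate every (H, T) pair and return the pair with the lowest CV MSE."""
    scores = {(h, t): cv_mse(x, y, h, t)
              for h, t in itertools.product(hidden_grid, learner_grid)}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic series turned into lagged inputs (6 past values predict the next one).
rng = np.random.default_rng(1)
t = np.arange(400)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)
lags = 6
x = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
y = series[lags:]

best, scores = grid_search(x, y, hidden_grid=range(2, 51, 8), learner_grid=(1, 5, 10))
log_mse = {pair: np.log10(mse) for pair, mse in scores.items()}  # as plotted in Figure 7
```

Here `best` holds the selected `(H, T)` pair and `log_mse` the log-transformed errors for every grid point.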

Analysis of Forecasting Results
For the sake of a comparison analysis, the average evaluation measures of the forecasting results across all six prediction horizons are presented in Table 3, and the scatter points of the true and predicted values are displayed in Figure 8. Table 3 shows that the proposed SSA-AWELM performs best among all the models, followed by LSTM, ANN, AWELM, ELM, and ARIMA. Compared to LSTM, the RMSE and MAE of SSA-AWELM are reduced by 22.5% and 21.3%, respectively, on average in the case of the Q.R. Sta. and by 23.6% and 20.0%, respectively, on average in the case of the J. Sta. AWELM performs slightly better than ELM, which indicates that the AdaBoost algorithm can reduce the prediction errors, but only to a limited extent. As expected, ARIMA is always inferior to the other models, because it is a linear model. In addition, it can be seen in Figure 8 that the scatter points of SSA-AWELM are closest to the expectation line, and the corresponding coefficient of determination R² is the largest. All the above findings indicate that the proposed SSA-AWELM is an effective approach to improve the accuracy of passenger flow forecasting. Furthermore, to compare the time consumption of different models, the training time is provided in Table A2 (see Appendix A).
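For reference, the evaluation measures and the percentage reductions quoted above can be computed as follows. This is a small sketch that assumes the usual definitions of RMSE, MAE, and R² (one minus the ratio of residual to total sum of squares); the function names and the numbers in the usage lines are hypothetical and are not values from Table 3.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def relative_reduction(baseline_error, proposed_error):
    """Percentage by which the proposed model's error is below the baseline's."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error

# Hypothetical values, for illustration only.
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 39]
scores = rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred)
reduction = relative_reduction(40.0, 31.0)  # a baseline error of 40 versus 31
```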

Analysis of Multistep-Ahead Forecasting
In order to analyze the multistep forecasting errors, the evaluation measures of each prediction horizon are displayed in Figure 9. The DM test results of the comparison between the proposed SSA-AWELM and the benchmarks are presented in Table 4. Figure 9 shows that the prediction errors of every model increase with the prediction horizon. This is caused by cumulative errors, which stem from feeding predicted values back into the models for multistep-ahead forecasting, and such cumulative errors are inevitable. What stands out in Figure 9 is that the proposed SSA-AWELM performs best at every prediction horizon, and its errors increase most slowly in comparison with the other models. This indicates that SSA-AWELM can improve the robustness and restrict the propagation of the cumulative error during multistep-ahead forecasting. A reasonable explanation for this finding is that SSA decomposes the original passenger flow into the three components of trend, periodicity, and residue, and each component has individual characteristics that can be modeled more easily than the original complex data. Furthermore, AWELM performs slightly better than ELM, which suggests that AdaBoost can improve the accuracy of ELM, but combining ELM with AdaBoost alone does not improve the forecasting accuracy significantly. According to Table 4, the proposed SSA-AWELM almost always outperforms the other models at a highly significant level. There are some exceptions when it is compared with LSTM for the exit passenger flow. In these situations, SSA-AWELM still performs better than LSTM, but not always at a highly significant level. This might be because LSTM has the advantage of capturing more temporal characteristics of the exit passenger flow. Overall, these findings suggest that the proposed SSA-AWELM performs outstandingly in multistep-ahead predictions.
These results indicate that SSA-AWELM is a robust approach for passenger flow forecasting.
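The mechanism behind the cumulative error, feeding each prediction back in as an input for the next step, can be sketched as a recursive forecasting loop. This is an illustrative sketch under stated assumptions: `recursive_forecast` and `predict_one` are hypothetical names, and the one-step model in the demo is a simple linear autoregression fitted by least squares on synthetic data, used only as a stand-in for the forecasting models compared in the paper.

```python
import numpy as np

def recursive_forecast(predict_one, history, n_lags, horizon):
    """Forecast `horizon` steps ahead by feeding each prediction back as input."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        y_hat = float(predict_one(np.asarray(window)))
        forecasts.append(y_hat)
        # Slide the window: drop the oldest value and append the new prediction.
        window = window[1:] + [y_hat]
    return np.array(forecasts)

# Stand-in one-step model: linear autoregression on a noisy periodic series.
rng = np.random.default_rng(0)
t = np.arange(300)
series = np.sin(2 * np.pi * t / 24) + 0.05 * rng.normal(size=t.size)
n_lags, horizon = 6, 6
train, future = series[:-horizon], series[-horizon:]

x = np.column_stack([train[i:len(train) - n_lags + i] for i in range(n_lags)])
y = train[n_lags:]
coef, *_ = np.linalg.lstsq(x, y, rcond=None)

forecasts = recursive_forecast(lambda w: w @ coef, train, n_lags, horizon)
abs_errors = np.abs(forecasts - future)  # one absolute error per prediction horizon
```

From the second step onward, the inputs contain predicted rather than observed values, so any error made at an early horizon is carried into later ones.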

Conclusions
This paper studied passenger flow forecasting and proposed a novel model, SSA-AWELM. In the model, SSA was developed to decompose the original data into the three components of trend, periodicity, and residue; then, AWELM was developed to forecast each component separately. The three predicted results were summed as the final outcome. In order to demonstrate the effectiveness of the proposed model, the passenger flow in two transfer stations, which was extracted from an AFC system, was utilized to carry out prediction tests and a comparison analysis. The main conclusions are listed as follows:
1. The SSA approach provides insight into the inner characteristics of the passenger flow. The trend represents the overall tendency, the periodicity represents the variations within a day, and the residue represents noise.
2. The AWELM model, which combines AdaBoost and WELM, was developed to make a more accurate and faster prediction for each component. Compared to the state-of-the-art LSTM model, the proposed model reduced the testing errors by 22% and the training time by 84%, on average.
3. According to the results of the evaluation measures and the DM statistical test, the proposed SSA-AWELM model can reduce the cumulative errors during multistep-ahead prediction. These findings demonstrate that SSA-AWELM is a robust model for passenger flow forecasting.
The method proposed in this paper still has two limitations that will be addressed in the future. One is that the test cases cover only two transfer stations with large travel demands, and the other is that the passenger flows were collected under regular conditions. Thus, in further studies, more cases, including regular stations, will be tested and discussed. In addition, the passenger flows during special incidents, such as extreme weather and passenger control, will be examined to extend the proposed model.