Article

A Hybrid Deep Learning Model for Multi-Station Classification and Passenger Flow Prediction

1 College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China
2 Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen 361024, China
3 Department of Information Management, Chaoyang University of Technology, Taichung 413, Taiwan
* Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(5), 2899; https://doi.org/10.3390/app13052899
Submission received: 3 January 2023 / Revised: 14 February 2023 / Accepted: 17 February 2023 / Published: 23 February 2023
(This article belongs to the Special Issue Machine Learning on Various Data Sources in Smart Applications)

Abstract

Multiple station passenger flow prediction is crucial but challenging for intelligent transportation systems. Recently, deep learning models have been widely applied in multi-station passenger flow prediction. However, flows at the same station in different periods, or at different stations in the same period, often present different characteristics, which indicates that globally extracting spatio-temporal features for multi-station passenger flow prediction may achieve the expected performance for only some stations. Therefore, a novel two-step multi-station passenger flow prediction model is proposed. First, an unsupervised clustering method for station classification using pure passenger flow is proposed based on the Transformer encoder and K-Means, and two novel evaluation metrics are introduced to verify the effectiveness of the classification results. Then, based on the classification results, a passenger flow prediction model is built for every type of station: a residual network (ResNet) and a graph convolution network (GCN) are applied for spatial feature extraction, and an attention long short-term memory network (AttLSTM) is used for temporal feature extraction. Integrating the results of every type of station yields a prediction model for all stations in the network. Experiments are conducted on two real-world ridership datasets, and the proposed model outperforms the unclassified results in multi-station passenger flow prediction.

1. Introduction

With the rapid development of urban public transportation (UPT), passenger flow prediction has become significant in meeting passengers' travel needs and is one of the key issues in improving UPT services.
Recently, research on passenger flow prediction has shifted from single stations to multiple stations because multi-station prediction is more applicable in UPT. Due to the complex spatial features and time-varying traffic patterns of networks [1], the passenger flow at a single station is simultaneously affected by the spatio-temporal features of the historical passenger flow at the directly or indirectly connected stations in the whole network [2]. Thus, a single-station passenger flow prediction model cannot dynamically and effectively predict the spatio-temporal distribution and congestion in the entire network, which limits real-time passenger flow organization and the formulation and adjustment of operation management strategies [3]. To this end, more and more researchers have devoted themselves to passenger flow prediction for multiple stations, and how to deeply capture the complex spatio-temporal features to build a more accurate passenger flow prediction model for all stations in the network has become a hotspot in recent studies.
In addition, many well-performing deep learning models continue to emerge, focusing on capturing spatio-temporal correlations between stations by constructing spatio-temporal feature learners (STFL) for traffic inflow and outflow prediction [4]. Bogaerts et al. [5] proposed a deep neural network that simultaneously extracted spatial features using a graph convolution neural network (GCN) and temporal features using long short-term memory (LSTM) to make both short-term and long-term traffic flow predictions. Li et al. [6] proposed a deep learning model combining convolutional LSTM (ConvLSTM) and a stacked autoencoder (SAE) to predict the short-term passenger flow of urban rail transit (URT) for multiple stations, where ConvLSTM extracted spatio-temporal features of passenger flow based on thirteen external factors related to passenger flow. Zhang et al. [7] proposed a deep learning model named GCN-Transformer, which combined GCN for spatial feature extraction with a modified Transformer for temporal feature extraction for short-term multi-station passenger flow prediction.
The above models are all hybrid models which capture the spatio-temporal features simultaneously to predict passenger flow at stations in the whole network. However, the passenger flow at the same station in different periods presents different characteristics, and various stations in the same period also have different passenger flow changes. We illustrate the differences by using Figure 1 as an example.
Figure 1a,b show the one-week inbound and outbound passenger flows from 4 March 2019 to 8 March 2019 at Xianhou Station and Dongzhai Station in the Xiamen bus rapid transit (BRT), respectively. Both stations show clear tidal characteristics of inbound and outbound flows in their respective time intervals, but the largest volumes of inflow and outflow at the two stations differed by a factor of nearly 16 in the same period, and their variability was also quite different.
These indicate that globally extracting the spatio-temporal features for multi-station passenger flow prediction in the whole network may not be powerful enough to achieve the expected performance for every station [8]. Making a more accurate multiple-station passenger flow prediction model for every station is necessary and significant.
Furthermore, most existing studies on urban metro station classification were mainly based on features of land location [9], points of interest (POI) [10,11,12], population distribution [11], station location [11,13], length of the road network [11], passenger flow [9,11,12,14,15], and their combinations. These studies mainly followed two directions [16]: the first was "place oriented," focusing on land use function, and the other was "station oriented," focusing on station function. To the best of our knowledge, existing studies use passenger flow as one of several factors for station classification, not as the only factor; a study that classifies multiple stations using passenger flow as the only feature, based on the similarities of passenger flow among stations, is still missing. Thus, existing research on station classification is much more applicable to urban planning and station layout planning than to passenger flow prediction. Furthermore, previous studies preferred to visualize the passenger flow of the same type of station [9,11,13,14] to verify the effectiveness of station classification, which is not objective.
Historical passenger flow is the most important influencing factor in passenger flow prediction, so some scholars have classified passenger flow over different time intervals for prediction. For example, Wang et al. [17] designed an adaptive K-Means to cluster the time intervals with similar passenger flow at Shenzhen North Railway Station; passenger flows belonging to the same category shared the same time interval tag, which was then combined with the historical passenger flow in the prediction task. Tan et al. [18] used K-Means to divide 10 months of passenger flow into 16 categories and then built 16 sub-prediction models at Chengdu East Railway Station. Although the above studies applied passenger flow classification to further improve prediction accuracy, they only considered a single station.
To sum up, few studies apply station classification to multi-station passenger flow prediction. Inspired by the fact that stations of the same type have more similar passenger flows, classifying the stations based on pure passenger flow and then predicting the passenger flow of every type of station with similar flows may be a more effective strategy to improve prediction accuracy. We therefore propose a novel multi-station passenger flow prediction model consisting of a Transformer encoder, K-Means, a residual network (ResNet), a graph convolution network (GCN), and an attention long short-term memory network ((Transformer-K-Means)-(ResNet-GCN-AttLSTM)), which can better extract the spatio-temporal features of stations of the same type with similar flows in the whole network. The model uses a two-step strategy, classification and prediction, to achieve better performance for multi-station passenger flow prediction. To the best of our knowledge, this is the first work to apply station classification before the downstream task of passenger flow prediction. The main contributions of this paper are summarized as follows:
(1) We propose a novel unsupervised clustering method for station classification using pure passenger flow data. First, this method applies a Transformer encoder to extract the spatio-temporal features from the inflow and outflow data, and then it applies the extracted spatio-temporal features to K-Means for station classification.
(2) Quantitatively, two novel evaluation metrics have been introduced to verify the effectiveness of the results of station classification.
(3) Based on the results of station classification, a deep spatio-temporal network framework, ResNet-GCN-AttLSTM, for passenger flow prediction at each type of station has been proposed. By integrating the passenger flow prediction results of every type of station, a novel passenger flow prediction model for all stations in the whole network is constructed. We implement the proposed model on two real-world ridership datasets to demonstrate its performance.
The remainder of the paper is organized as follows. Section 2 provides the proposed methodology in detail. In Section 3, two real-world ridership datasets in the Beijing metro and the Xiamen BRT are presented. The performances in station classification and passenger flow prediction for multiple stations are provided extensively. Finally, conclusions are drawn and future research directions are indicated in Section 4.

2. Methodology

In this section, we introduce the detailed steps for the construction and combination of the proposed model ((Transformer-K-Means)-(ResNet-GCN-AttLSTM)). As shown in Figure 2, it consists of two blocks: the classification block and the prediction block. The classification block extracts the deep features from the inflow and outflow data based on the Transformer encoder, then classifies the stations based on K-Means. A prediction block is used to predict the inflow for each type of station, and then integrate them as the final result for all stations in the whole network.
As shown in Figure 1, the inflow and outflow at different stations, or even at the same station, are very different, and inflow prediction is more significant for avoiding congestion among stations. Thus, we use both the inflow and outflow data in the classification block, while the prediction block predicts the inflow.

2.1. Classification Block

A classification block is used to classify all the stations of the whole network into several categories. The Transformer encoder is used for feature extraction from the inflow and outflow data, and the extracted features are then sent to K-Means for station classification.

2.1.1. Transformer Encoder

The Transformer is based entirely on the attention mechanism, which can simultaneously obtain global and weighted information. Moreover, its multi-head mechanism maps input features from different perspectives, which strengthens its expressive ability and allows it to better extract deep features for time-series problems [16]. Inspired by this powerful ability, we apply the Transformer for feature extraction to achieve better clustering results for station classification with pure passenger flow data. Furthermore, the Transformer is composed of an encoder and a decoder; since our task is passenger flow feature extraction rather than natural language processing (NLP), we only use the encoder.
To ensure the effectiveness of station classification, we use more than one week of inflow and outflow data as the inputs. As shown in Figure 2, the inflow and outflow data are sent into the Transformer encoder and can be expressed as a matrix $X_{S\_W,T}^{P}$, shown in Equation (1).

$$X_{S\_W,T}^{P} = \begin{bmatrix} x_{1\_1,t_1}^{P} & x_{1\_1,t_2}^{P} & \cdots & x_{1\_1,t_M}^{P} \\ x_{1\_2,t_1}^{P} & x_{1\_2,t_2}^{P} & \cdots & x_{1\_2,t_M}^{P} \\ \vdots & \vdots & \ddots & \vdots \\ x_{S\_W,t_1}^{P} & x_{S\_W,t_2}^{P} & \cdots & x_{S\_W,t_M}^{P} \end{bmatrix} \quad (1)$$

$$I_C = X_{S\_W,T}^{Inflow} \,\|\, X_{S\_W,T}^{Outflow} \quad (2)$$

where $S$ is the number of stations in the whole network, $W$ is the number of weeks, and $M = DS \times days$ is the number of time intervals during a week; the daily sample size ($DS$) is the number of time intervals per day, and $days$ is the number of days during a week. $T = \{t_1, t_2, \ldots, t_M\}$ is the series of time intervals in a week, and $P \in \{Inflow, Outflow\}$ refers to the inflow or outflow pattern. $X_{S\_W,T}^{Inflow} \in \mathbb{R}^{(S \times W) \times (DS \times days)}$ and $X_{S\_W,T}^{Outflow} \in \mathbb{R}^{(S \times W) \times (DS \times days)}$ represent the inflow and outflow data, respectively. The inflow and outflow data are concatenated by column to form the Transformer encoder input $I_C \in \mathbb{R}^{(S \times W) \times (2 \times DS \times days)}$, shown in Equation (2).
(1) Normalization
First, $I_C$ is standardized as $(I_C)_{SD}$ based on Equation (3).

$$(I_C)_{SD} = \frac{I_C - (I_C)_{mean}}{(I_C)_{std}} \quad (3)$$

where $(I_C)_{mean}$ is the mean value of $I_C$ and $(I_C)_{std}$ is its standard deviation. The two-dimensional $(I_C)_{SD} \in \mathbb{R}^{(S \times W) \times (2 \times DS \times days)}$ is transformed into the three-dimensional $(I_C)_{SD\_Day} \in \mathbb{R}^{(S \times W) \times days \times (2 \times DS)}$ and $(I_C)_{SD\_Interval} \in \mathbb{R}^{(S \times W) \times DS \times (2 \times days)}$, respectively.
(2) Positional Encoding
After that, we follow the positional encoding of the original Transformer [16] to encode the positions of $(I_C)_{SD\_Day}$ and $(I_C)_{SD\_Interval}$ as $PE_{(I_C)_{SD\_Day}}$ and $PE_{(I_C)_{SD\_Interval}}$ based on Equations (4) and (5), respectively.

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \quad i \in [0, d_{model}/2),\; pos \in [1, S \times W] \quad (4)$$

$$PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \quad i \in [0, d_{model}/2),\; pos \in [1, S \times W] \quad (5)$$

where $d_{model}$ is set to $2 \times DS$ for $(I_C)_{SD\_Day}$ and to $2 \times days$ for $(I_C)_{SD\_Interval}$. $PE_{(I_C)_{SD\_Day}} \in \mathbb{R}^{(S \times W) \times d_{model}}$ ($d_{model} = 2 \times DS$) and $PE_{(I_C)_{SD\_Interval}} \in \mathbb{R}^{(S \times W) \times d_{model}}$ ($d_{model} = 2 \times days$) are both two-dimensional. A sketch of this encoding follows.
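A minimal NumPy sketch of Equations (4) and (5) may make the encoding concrete; the example shapes (S = 44 stations, W = 5 weeks, DS = 72 for the 15 min interval) follow Table 1, and the function name is ours:

```python
import numpy as np

def positional_encoding(n_pos, d_model):
    """Sinusoidal positional encoding of Equations (4) and (5).

    d_model is assumed even, which holds here since it is 2*DS or 2*days.
    """
    pos = np.arange(1, n_pos + 1)[:, None]        # positions 1..S*W
    two_i = np.arange(0, d_model, 2)[None, :]     # even dimension indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(angle)                   # Equation (4)
    pe[:, 1::2] = np.cos(angle)                   # Equation (5)
    return pe

# Day-oriented view for the Xiamen BRT at the 15 min interval
pe_day = positional_encoding(n_pos=44 * 5, d_model=2 * 72)
```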
(3) Multi-head Attention
We take $PE_{(I_C)_{SD\_Day}}$ as an example; $PE_{(I_C)_{SD\_Interval}}$ undergoes the same processing.
$PE_{(I_C)_{SD\_Day}}$ is transformed into $Q_i$, $K_i$, and $V_i$ based on Equations (6)–(8). We use a particular attention called "Scaled Dot-Product Attention" [16]: the input consists of $Q_i$ and $K_i$, both of dimension $d_k$, and $V_i$ of dimension $d_v$, where $d_v$ is the output dimension. $Q_i$, $K_i$, and $V_i$ are fed into the multi-head attention layer to calculate the attention score $head_i$ of every head based on Equation (9). By concatenating the $head_i$ by column, $I_{att\_Day}$ and $I_{att\_Interval}$ are obtained as the outputs based on Equation (10).

$$Q_i = PE_{(I_C)_{SD\_Day}} \times W_{q_i} \quad (6)$$

$$K_i = PE_{(I_C)_{SD\_Day}} \times W_{k_i} \quad (7)$$

$$V_i = PE_{(I_C)_{SD\_Day}} \times W_{v_i} \quad (8)$$

$$head_i = \mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d_{K_i}}}\right) V_i \quad (9)$$

$$I_{att\_Day} = \big[\, head_1 \,\|\, head_2 \,\|\, \cdots \,\|\, head_N \,\big]\, W_o \quad (10)$$

where $W_{q_i} \in \mathbb{R}^{d_{model} \times d_k}$, $W_{k_i} \in \mathbb{R}^{d_{model} \times d_k}$, $W_{v_i} \in \mathbb{R}^{d_{model} \times d_v}$, and $W_o \in \mathbb{R}^{(N \times d_v) \times d_{model}}$ are the trainable weights; $d_k$ and $d_v$ are the dimensions of $K_i$ and $V_i$, respectively; and $N$ represents the number of heads, $i \in [1, N]$. $I_{att\_Day} \in \mathbb{R}^{(S \times W) \times (2 \times DS)}$ and $I_{att\_Interval} \in \mathbb{R}^{(S \times W) \times (2 \times days)}$ are the outputs of the multi-head attention. A sketch follows.
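The following NumPy sketch illustrates Equations (6)–(10) under the shapes above; the random weights are placeholders standing in for the trainable $W_{q_i}$, $W_{k_i}$, $W_{v_i}$, and $W_o$, and the function names are ours:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, n_heads, d_k, d_v, seed=0):
    """Scaled dot-product attention per head, then concatenation (Eqs. (6)-(10))."""
    rng = np.random.default_rng(seed)
    n, d_model = x.shape
    heads = []
    for _ in range(n_heads):
        W_q = rng.normal(size=(d_model, d_k))      # trainable in the real model
        W_k = rng.normal(size=(d_model, d_k))
        W_v = rng.normal(size=(d_model, d_v))
        Q, K, V = x @ W_q, x @ W_k, x @ W_v        # Equations (6)-(8)
        scores = softmax(Q @ K.T / np.sqrt(d_k))   # Equation (9)
        heads.append(scores @ V)
    W_o = rng.normal(size=(n_heads * d_v, d_model))
    return np.concatenate(heads, axis=1) @ W_o     # Equation (10)

# Two heads, as in Table 2; 220 = S*W rows, 144 = 2*DS columns
out = multi_head_attention(np.random.rand(220, 144), n_heads=2, d_k=72, d_v=72)
```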
(4) Residual Connection and Layer Normalization
$I_{att\_Day}$ and $I_{att\_Interval}$ are sent through a residual connection [19] to give $O_{att\_Day}$ and $O_{att\_Interval}$ based on Equations (11) and (12), respectively, followed by layer normalization to give $(O_{att\_Day})_{SD}$ and $(O_{att\_Interval})_{SD}$ based on Equation (3).

$$O_{att\_Day} = I_{att\_Day} + PE_{(I_C)_{SD\_Day}} \quad (11)$$

$$O_{att\_Interval} = I_{att\_Interval} + PE_{(I_C)_{SD\_Interval}} \quad (12)$$
(5) Feed-Forward Network
$(O_{att\_Day})_{SD}$ and $(O_{att\_Interval})_{SD}$ are sent to two fully connected feed-forward networks for further feature extraction based on Equations (13) and (14), respectively.

$$O_{FFN\_Day} = f_2\big(f_1\big((O_{att\_Day})_{SD} \times W_1 + b_1\big) \times W_2 + b_2\big) \quad (13)$$

$$O_{FFN\_Interval} = f_2\big(f_1\big((O_{att\_Interval})_{SD} \times W_1' + b_1'\big) \times W_2' + b_2'\big) \quad (14)$$

where $W_1$, $W_1'$, $W_2$, and $W_2'$ are the trainable weights, and $b_1$, $b_1'$, $b_2$, and $b_2'$ are the trainable biases.
(6) The Output of the Transformer Encoder
$O_{FFN\_Day}$ and $O_{FFN\_Interval}$ go through residual connection and layer normalization once more, based on Equations (3), (11), and (12), to give $(O_{FFN\_Day})_{SD} \in \mathbb{R}^{(S \times W) \times (2 \times DS)}$ and $(O_{FFN\_Interval})_{SD} \in \mathbb{R}^{(S \times W) \times (2 \times days)}$. They are then concatenated by column as $O_{TE}$, shown in Equation (15). $O_{TE} \in \mathbb{R}^{(S \times W) \times (2 \times days + 2 \times DS)}$ is the final output of the Transformer encoder and the input sent to K-Means.

$$O_{TE} = (O_{FFN\_Day})_{SD} \,\|\, (O_{FFN\_Interval})_{SD} \quad (15)$$

2.1.2. K-Means

K-Means is one of the most famous clustering algorithms and is extensively used in unsupervised clustering tasks [11]. The key problem in K-Means is how to determine the value of K, the number of clusters, which in this paper is also the number of station categories.
Previous studies set K artificially [11] or used the elbow method [18]. To determine K more effectively, two novel evaluation metrics named same category rate (SCR) and average same category rate (ASCR) are proposed. The ridership data used in our model cover more than one week, and the weekly passenger flows of the same station tend to be similar across weeks; the clustering result is better if the same station's passenger flows in different weeks are clustered into the same category. Based on this hypothesis, SCR and ASCR are defined in Equations (16) and (17), respectively.
$$SCR_i = 1 - \frac{N_i}{W} \quad (16)$$

$$ASCR = \frac{\sum_{i=1}^{N} SCR_i}{N} \quad (17)$$

where $N_i$ represents the number of categories into which the ith station's $W$ weekly passenger flow samples are clustered, $W$ is the number of weeks, and $N$ is the total number of stations in the network. The same station's different weekly passenger flows may be classified into different categories; thus, the larger the values of SCR and ASCR, the better the classification result.
For example, if W = 5, five weekly passenger flow samples are used per station. If the ith station's five weekly flows are clustered into two categories, $N_i = 2$ and $SCR_i = 1 - 2/5 = 0.6$. If they are clustered into one category, $N_i = 1$ and $SCR_i = 1 - 1/5 = 0.8$. The optimal value is 0.8, which indicates that the five weekly passenger flows of the station fall into the same category and the clustering result is quite good. All $SCR_i$ are then averaged into ASCR based on Equation (17). ASCR is the final evaluation result over all stations and is used to determine the number of categories, as described in detail in Section 3.4.1. Algorithm 1 below details K-Means for station classification [11,12].
$C' = \{C_1', C_2', \ldots, C_{K'}'\}$ is the final result of station classification, and $K'$ is the number of categories. Notably, if the number of elements in some $C_j'$ is 1 in Step 6, that category contains only one station; this station is deleted because it is unsuitable for the subsequent multi-station passenger flow prediction block.
Algorithm 1: K-Means for station classification.
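Since the algorithm is standard K-Means driven by the ASCR criterion, a small sketch may help; the station-major ordering of the encoder output, the use of scikit-learn's KMeans, and the random placeholder for $O_{TE}$ are our assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def ascr(labels, n_stations, n_weeks):
    """Equations (16)-(17): SCR_i = 1 - N_i / W, averaged over stations.

    Assumes labels are station-major: labels[i*n_weeks:(i+1)*n_weeks]
    are the cluster labels of station i's W weekly samples.
    """
    scr = [1.0 - len(set(labels[i * n_weeks:(i + 1) * n_weeks])) / n_weeks
           for i in range(n_stations)]
    return float(np.mean(scr))

# O_TE: Transformer-encoder output, one row per (station, week) sample.
# Shapes follow the 15 min Xiamen BRT setting (S=44, W=5, DS=72, days=5);
# random data here is only a stand-in for the real encoder output.
O_TE = np.random.rand(44 * 5, 2 * 5 + 2 * 72)
best_k = max(range(2, 11), key=lambda k: ascr(
    KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(O_TE),
    n_stations=44, n_weeks=5))
```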

2.2. Prediction Block

A prediction block is used to predict the inflow of every type of station. $X_{S_i,T} \in \mathbb{R}^{S_i \times (M \times W)}$, shown in Equation (18), is the input of the prediction block. By integrating the predicted passenger flows of every type of station, the final result $O_P$ for all stations in the entire network is obtained. The prediction block consists of three parts: spatial feature extraction, temporal feature extraction, and prediction.

$$X_{S_i,T} = \begin{bmatrix} x_{1,t_1} & x_{1,t_2} & \cdots & x_{1,t_{M \times W}} \\ x_{2,t_1} & x_{2,t_2} & \cdots & x_{2,t_{M \times W}} \\ \vdots & \vdots & \ddots & \vdots \\ x_{S_i,t_1} & x_{S_i,t_2} & \cdots & x_{S_i,t_{M \times W}} \end{bmatrix} \quad (18)$$

where $S_i$ is the number of stations of the ith type, $W$ is the number of weeks, and $M = DS \times days$ is the number of time intervals during a week. $T = \{t_1, t_2, \ldots, t_{M \times W}\}$ is the series of time intervals over the $W$ weeks.

2.2.1. Spatial Feature Extraction

(1) Inflow
As shown in Figure 2, to better capture the spatial features, we extract three data modes from the inflow data, namely real-time, daily, and weekly, which reflect different periodicities over different time periods. The three types of data are shown in Equations (19)–(21).
$$X_{S_i,T}^{R} = \begin{bmatrix} x_{1,t-n}^{R} & x_{1,t-n+1}^{R} & \cdots & x_{1,t-1}^{R} \\ x_{2,t-n}^{R} & x_{2,t-n+1}^{R} & \cdots & x_{2,t-1}^{R} \\ \vdots & \vdots & \ddots & \vdots \\ x_{S_i,t-n}^{R} & x_{S_i,t-n+1}^{R} & \cdots & x_{S_i,t-1}^{R} \end{bmatrix} \quad (19)$$

$$X_{S_i,T}^{D} = \begin{bmatrix} x_{1,t-n-DS}^{D} & x_{1,t-n+1-DS}^{D} & \cdots & x_{1,t-1-DS}^{D} \\ x_{2,t-n-DS}^{D} & x_{2,t-n+1-DS}^{D} & \cdots & x_{2,t-1-DS}^{D} \\ \vdots & \vdots & \ddots & \vdots \\ x_{S_i,t-n-DS}^{D} & x_{S_i,t-n+1-DS}^{D} & \cdots & x_{S_i,t-1-DS}^{D} \end{bmatrix} \quad (20)$$

$$X_{S_i,T}^{W} = \begin{bmatrix} x_{1,t-n-M}^{W} & x_{1,t-n+1-M}^{W} & \cdots & x_{1,t-1-M}^{W} \\ x_{2,t-n-M}^{W} & x_{2,t-n+1-M}^{W} & \cdots & x_{2,t-1-M}^{W} \\ \vdots & \vdots & \ddots & \vdots \\ x_{S_i,t-n-M}^{W} & x_{S_i,t-n+1-M}^{W} & \cdots & x_{S_i,t-1-M}^{W} \end{bmatrix} \quad (21)$$

$$I_{Inflow} = X_{S_i,T}^{R} \,\|\, X_{S_i,T}^{D} \,\|\, X_{S_i,T}^{W} \quad (22)$$

where $X_{S_i,T}^{R}$, $X_{S_i,T}^{D}$, and $X_{S_i,T}^{W}$ represent the real-time, daily, and weekly inflow data, respectively, and we use the data of $n$ historical time steps $\{t-n, t-n+1, \ldots, t-1\}$ to predict the inflow at time step $t$.
For example, suppose t represents 9:00 a.m. on Tuesday of the second week. $X_{S_i,T}^{R}$ is the series of inflow data of the n time intervals immediately before 9:00 a.m. on Tuesday of the second week; $X_{S_i,T}^{D}$ is the series of inflow data of the n time intervals before 9:00 a.m. on Monday of the second week; and $X_{S_i,T}^{W}$ is the series of inflow data of the n time intervals before 9:00 a.m. on Tuesday of the first week, all used to predict the inflow at 9:00 a.m. on Tuesday of the second week. Thus, the T used in $X_{S_i,T}^{R}$, $X_{S_i,T}^{D}$, and $X_{S_i,T}^{W}$ starts from the second week of the training data. $X_{S_i,T}^{R}$, $X_{S_i,T}^{D}$, and $X_{S_i,T}^{W}$ are concatenated by column as the input $I_{Inflow}$ based on Equation (22), as sketched below.
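A minimal sketch of this slicing, assuming `flow` is a (stations × time steps) array and the offsets DS and M are as defined above; the function name and shapes are illustrative:

```python
import numpy as np

def build_inflow_input(flow, t, n, DS, M):
    """Equations (19)-(22): concatenate the n steps before t at the
    real-time (t), daily (t - DS), and weekly (t - M) offsets."""
    real_time = flow[:, t - n:t]             # X^R: intervals just before t
    daily     = flow[:, t - DS - n:t - DS]   # X^D: same window, one day earlier
    weekly    = flow[:, t - M - n:t - M]     # X^W: same window, one week earlier
    return np.concatenate([real_time, daily, weekly], axis=1)   # Equation (22)

# 15 min data over workdays: DS = 72, M = 72 * 5 = 360, n = 5 history steps
flow = np.random.rand(40, 720)               # 40 stations, two weeks of data
I_inflow = build_inflow_input(flow, t=460, n=5, DS=72, M=360)
```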
As is well known, deeper models can extract richer features [20], but greater depth brings risks of vanishing and exploding gradients [7]. Therefore, residual networks with skip connections were proposed to solve this problem [21]. The residual connection, shown in Equation (23), reduces the complexity of the model and helps avoid overfitting.

$$(O_{Inflow})_{RB} = F\big((I_{Inflow})_{RB}\big) + (I_{Inflow})_{RB} \quad (23)$$

where $(I_{Inflow})_{RB}$ and $(O_{Inflow})_{RB}$ refer to the input and output of the residual block, respectively, and $F(\cdot)$ refers to the processing inside the residual block, shown in Figure 3.
As shown in Figure 3, the input $(I_{Inflow})_{RB}$ goes through a series of operations: BN (batch normalization) for data normalization, ReLU as the activation function, and Conv as a convolutional layer. Figure 2 shows the two same-shape residual blocks used for inflow feature extraction. The extracted features are then flattened and sent to a feed-forward network for full connection to further extract features based on Equations (13) and (14). $O_{P\_I}$ is the final output of the inflow processing; a sketch of the residual block follows.
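A PyTorch sketch of one such block, assuming the BN, ReLU, and Conv ordering of Figure 3 is repeated twice and the second convolution maps back to the input channel count so the identity shortcut of Equation (23) type-checks; the class name is ours:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """BN -> ReLU -> Conv applied twice, plus the identity shortcut (Eq. (23))."""
    def __init__(self, channels, filters):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, filters, kernel_size=3, padding=1),  # 3x3, Table 3
            nn.BatchNorm2d(filters),
            nn.ReLU(),
            nn.Conv2d(filters, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.body(x) + x      # Equation (23): F(x) + x

block = ResidualBlock(channels=1, filters=32)   # 32 filters in block 1 (Table 3)
y = block(torch.randn(64, 1, 40, 15))           # batch of 64, per Table 3
```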
(2) Outflow
The outflow processing is identical to the inflow processing, and its final output is denoted $O_{P\_O}$; here $X_{S_i,T}^{R}$, $X_{S_i,T}^{D}$, and $X_{S_i,T}^{W}$ represent the real-time, daily, and weekly outflow data, respectively.
(3) Physical Topology
The physical topology part captures the topological information among the stations of each type based on GCN. Since the physical locations of the stations are fixed, it is easy to construct an adjacency matrix $A \in \mathbb{R}^{S_i \times S_i}$, as shown in Equations (24) and (25). We only consider the passenger flow in the real-time pattern ($X_{S_i,T}^{R}$) because the network topology does not change. The input of the physical topology part is defined as $I_{Topology}$ in Equation (26).

$$A = [A_{i,j}] \quad (24)$$

$$A_{i,j} = \begin{cases} 1, & \text{stations } i \text{ and } j \text{ are adjacent} \\ 0, & \text{otherwise} \end{cases} \quad (25)$$

$$I_{Topology} = \big(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}\big)\, X_{S_i,T}^{R} \quad (26)$$

where $\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}$ is the symmetrically normalized adjacency matrix with $\hat{A} = A + I$, $I$ is the identity matrix, $\hat{D}$ is the diagonal node-degree matrix of $\hat{A}$, and $S_i$ refers to the number of stations of the ith type.
Then, $I_{Topology}$ goes through the same series of operations: two same-shape residual blocks, flattening, and full connection for further feature extraction. $O_{P\_Topo}$ is the final output of the physical topology processing; a sketch of the propagation step follows.
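A NumPy sketch of Equation (26) for a toy three-station line network; the function name is ours:

```python
import numpy as np

def gcn_propagate(A, X):
    """Equations (24)-(26): add self-loops, normalize symmetrically, propagate X."""
    A_hat = A + np.eye(A.shape[0])               # A^ = A + I
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    D_inv_sqrt = np.diag(d_inv_sqrt)             # D^^(-1/2)
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ X   # Equation (26)

# Three stations on a line: 1-2 and 2-3 adjacent; X is real-time flow (n = 5)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
I_topology = gcn_propagate(A, np.random.rand(3, 5))
```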
(4) Spatial Feature Fusion
The extracted spatial features from inflow, outflow, and physical topology are weighted and fused into $O_{SF}$ based on Equation (27).

$$O_{SF} = W_1 \circ O_{P\_I} + W_2 \circ O_{P\_O} + W_3 \circ O_{P\_Topo} \quad (27)$$

where $O_{P\_I}$, $O_{P\_O}$, and $O_{P\_Topo}$ are the outputs of the inflow, outflow, and physical topology processing, respectively; $W_1$, $W_2$, and $W_3$ are trainable weights; and "$\circ$" denotes the Hadamard product.

2.2.2. Temporal Feature Extraction

$O_{SF}$ is then fed into the attention LSTM and a fully connected network to obtain the temporal features. Attention LSTM is effective in predicting traffic flow [22,23,24]. Conventional attention LSTM captures weight scores for different time intervals, usually assigning heavier weights to adjacent time intervals and lower ones to those further apart [21]. However, passenger flow prediction is affected by many factors, such as weather conditions [25], emergencies, passenger flow itself, network topology, and so on, so conventional attention weighting of the LSTM outputs is insufficient. Therefore, following the previous work of Wu et al. [26], we use a fully connected network to score the output of the LSTM based on Equations (28) and (29).

$$\alpha = f(W \cdot Out + b) \quad (28)$$

$$Atten_{LSTM} = \alpha \circ Out \quad (29)$$

where $Out \in \mathbb{R}^{S_i \times Neu}$, $S_i$ refers to the number of stations in the ith category, and $Neu$ represents the number of neurons used in the LSTM. $W$ is a trainable weight, $b$ is a trainable bias, and $f$ represents the activation function of the fully connected layer. The term $\alpha$ is a trainable weight matrix whose shape is identical to that of $Out$, and "$\circ$" denotes the Hadamard product. $Atten_{LSTM}$ is the output of the attention LSTM.
$Atten_{LSTM}$ is flattened and sent to a feed-forward network, producing $O_{FFN\_AttenLSTM}$, the output of the temporal feature extraction, which is also the prediction result for that type of station. Every type of station goes through the same processing to construct its own prediction model and obtain its prediction result. By integrating the predicted passenger flows of every type of station, the final result for all stations in the whole network is obtained. A sketch of the attention LSTM follows.
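A PyTorch sketch of Equations (28) and (29), using one LSTM layer with Neu = 128 neurons as in Table 3; the sigmoid activation for f and the input size are our assumptions, since the paper leaves them unspecified:

```python
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    """LSTM whose outputs are re-weighted elementwise by a learned score
    from a fully connected layer (Eqs. (28)-(29)), following Wu et al. [26]."""
    def __init__(self, input_size, neu=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size, neu, num_layers=1, batch_first=True)
        self.score = nn.Linear(neu, neu)          # produces alpha, Equation (28)

    def forward(self, x):
        out, _ = self.lstm(x)                     # (batch, seq, neu)
        alpha = torch.sigmoid(self.score(out))    # f(W * Out + b), assumed sigmoid
        return alpha * out                        # Equation (29): alpha o Out

att = AttentionLSTM(input_size=40)                # 40 fused spatial features
y = att(torch.randn(64, 15, 40))                  # batch 64, 15 time steps
```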

3. Experiments

In this section, we introduce in detail the two datasets, the model configuration, the evaluation metrics, and the results of station classification and passenger flow prediction of the proposed (Transformer-K-Means)-(ResNet-GCN-AttLSTM) on the two datasets.

3.1. Data Description

Two real-world ridership datasets are used to validate the effectiveness of the proposed (Transformer-K-Means)-(ResNet-GCN-AttLSTM): (1) the Beijing metro dataset, which is shared in [27]; (2) the Xiamen BRT dataset, which is collected from the BRT system in Xiamen, China. Because the Xiamen BRT adopts a closed viaduct mode, its stations are similar to the stations in metro systems. Thus, we use both datasets in our proposed model [28].
The details of the two datasets are summarized in Table 1. The Beijing metro dataset covers 29 February to 1 April 2016 and contains five continuous weeks of inbound and outbound passenger flows; as of April 2016, there were 17 lines covering 276 stations. The Xiamen BRT dataset covers 4 March to 5 April 2019 and also contains five continuous weeks of inbound and outbound passenger flows; Tomb Sweeping Day, one of China's traditional holidays, falls on 5 April 2019. As of April 2019, eight lines covered 44 stations.
Consequently, station-level passenger flow in the Beijing metro is much more complex than that in the Xiamen BRT. To avoid the influence of weekend passenger flow, we only use the passenger flow on workdays. All five weeks of inflow and outflow data are used in the classification block; in the prediction block, the first four weeks are used for training and the last week for testing. The time intervals used in both datasets are 10 min, 15 min, and 30 min. Both datasets' daily service hours run from 5:00 a.m. to 11:00 p.m., i.e., 18 h, so DS differs across time intervals, as shown in Table 1.
As shown in Figure 4, the real-time, daily, and weekly inflow and outflow at 1st Wharf Station in the Xiamen BRT and at No. 1 Station in the Beijing metro during the five workdays are quite different, while the weekly ridership of the same station is periodic and stable. This verifies that using the three data modes (real-time, daily, weekly) for temporal feature extraction is useful in our proposed model.
Moreover, as shown in Figures 1 and 4, the inflow and outflow at the same station also differ. For example, No. 1 Station in the Beijing metro has a larger inflow but a smaller outflow in Figure 4b, whereas Xianhou Station in the Xiamen BRT has a smaller inbound flow and a larger outbound flow in Figure 1a. The inflow and outflow at Dongzhai Station in Figure 1b are quite similar to those at 1st Wharf Station in Figure 4a in the Xiamen BRT. Consequently, it is important to classify the stations based on passenger flow similarity before passenger flow prediction.

3.2. Model Configuration

Compared with outflow, inflow is more likely to cause congestion in UPT, and inflow is also more regular than outflow [4]. Thus, we use the inflow and outflow data of the five preceding time intervals as the inputs to predict the inflow of the next time step in our experiments. The parameters used in the classification and prediction blocks are specified in Tables 2 and 3, respectively; $S_i$ in Table 3 is the number of stations used in each prediction model.

3.3. Evaluation Metrics

To evaluate the classification performance of the proposed model, we use ASCR, defined in Equation (17), as the evaluation metric. The passenger flow of the stations in the same category is also visualized to assess the performance.
Three common evaluation metrics, root mean square error (RMSE), mean absolute error (MAE), and weighted mean absolute percentage error (WMAPE), defined in Equations (30)–(32), are used to evaluate the prediction performance; the smaller the metrics, the better the results. Mean squared error (MSE), defined in Equation (33), is used as the loss function.
$$RMSE = \sqrt{\frac{1}{S} \sum_{i=1}^{S} (y_i - \hat{y}_i)^2} \quad (30)$$

$$MAE = \frac{1}{S} \sum_{i=1}^{S} |y_i - \hat{y}_i| \quad (31)$$

$$WMAPE = \sum_{i=1}^{S} \left( \frac{y_i}{\sum_{i=1}^{S} y_i} \times \frac{|y_i - \hat{y}_i|}{y_i} \right) \times 100\% \quad (32)$$

$$MSE = \frac{1}{S} \sum_{i=1}^{S} (y_i - \hat{y}_i)^2 \quad (33)$$
where y i is the real passenger flow, and y ^ i is the predicted passenger flow. For unclassified prediction, S is the total number of stations in the whole network. For classified prediction, S is the number of stations in the same category.
To integrate the passenger flow prediction results of every type of station, RMSE, MAE, and WMAPE are redefined in Equations (34)–(36).
$$RMSE = \sqrt{\frac{1}{S} \sum_{i=1}^{K} RMSE_i^2 \times S_i} \quad (34)$$

$$MAE = \frac{1}{S} \sum_{i=1}^{K} MAE_i \times S_i \quad (35)$$

$$WMAPE = \frac{1}{\sum_{i=1}^{S} y_i} \sum_{i=1}^{K} \Big( WMAPE_i \times \sum_{j=1}^{S_i} y_j \Big) \quad (36)$$

where $K$ is the number of station categories in the whole network, $S_i$ is the number of stations of the ith type, $S$ is the total number of stations in the whole network, and $y_i$ is the passenger volume of the ith station.
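A small sketch of Equations (34)–(36), with per-category results stored in plain dicts; the field names are illustrative:

```python
import numpy as np

def integrate_metrics(per_category, S, total_flow):
    """Combine per-category RMSE/MAE/WMAPE into network-wide scores.

    per_category: list of dicts with keys 'rmse', 'mae', 'wmape',
    'S_i' (station count) and 'flow_i' (total observed flow of the category).
    """
    rmse = np.sqrt(sum(c['rmse'] ** 2 * c['S_i'] for c in per_category) / S)   # Eq. (34)
    mae = sum(c['mae'] * c['S_i'] for c in per_category) / S                   # Eq. (35)
    wmape = sum(c['wmape'] * c['flow_i'] for c in per_category) / total_flow   # Eq. (36)
    return rmse, mae, wmape
```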

3.4. Experiment Results

In this section, the results of station classification and passenger flow prediction on the two datasets are presented and discussed in detail.

3.4.1. Classification Results

The ASCR values for different K and different time intervals in the Xiamen BRT and the Beijing metro are shown in Table 4. The numbers in bold refer to the best results over the different values of K. Since the Xiamen BRT network is much simpler than the Beijing metro network, it reaches the highest ASCR of 0.8 in more settings than the Beijing metro does.
For the 10 min, 15 min, and 30 min time intervals in the Xiamen BRT, we choose K = 4 as the final number of station categories, because when K is 4 the ASCR of all three time intervals is 0.8, the most stable result. For the 10 min, 15 min, and 30 min time intervals in the Beijing metro, we choose K = 3, 4, and 4, respectively. The classification results show that, even with more data, the number of station categories does not grow too large.
Notably, only station numbers are provided for the Beijing metro, so we cannot list its station names; more information about the Beijing metro can be found in [27]. The names of the stations in the Xiamen BRT are listed in Table 5 [29]. As an example, we only list the station classification results of the Xiamen BRT for the 15 min time interval. The results are shown in Table 6, and the visualization of the inflow of the four categories of stations from 3 April 2019 to 8 April 2019 is shown in Figure 5.
As shown in Figure 5, Railway Station (No. 7), Municipal Administrative Service Center Station (No. 14), Xiamen North Railway Station (No. 25), and Qianpu Junction Station (No. 44) were each placed in a separate category. Thus, these four stations were deleted in our model following Algorithm 1. All four are important stations because of their urban functions and geographical positions. Therefore, 40 stations in the Xiamen BRT are used for classification and prediction, divided into four categories at the 15 min time interval. The maximum inflow volume of the first category of stations is less than 300, and these are defined as small-ridership stations. The maximum inflow volume of the second and third categories is less than 520, and these stations are defined as medium-ridership stations. The maximum inflow volume of the fourth category is less than 800, and these are defined as large-ridership stations. The weekly inflow of the four categories of stations shows regular periodicity, such as similar morning and evening peaks and the same change curves.
Table 7 shows the stations deleted at the different time intervals in the Xiamen BRT. The larger the time interval, the fewer stations are deleted, which verifies that passenger flow becomes more regular as the time interval increases. Station classification may therefore be a more necessary strategy for short-term passenger flow prediction.
Moreover, none of the 276 stations in the Beijing metro were deleted, which suggests that station classification may be even more significant for a larger dataset.
Station No. 34 is Pantu Station, shown in Figure 5a. Since there are two schools near the station, the inbound passenger flow increases sharply at noon and in the afternoon, during the peak periods before and after school; during class hours, the passenger flow at this station is the same as usual. The station classification in the proposed model is based on the inflow and outflow over all time intervals of a week, so different flow changes within a few time intervals do not affect the classification results. This verifies the effectiveness of the classification block based on the Transformer encoder and K-Means.

3.4.2. Prediction Results in the Beijing Metro

The prediction performance for the different time intervals in the Beijing metro is summarized in Table 8. The numbers in bold refer to the better result between prediction with and without station classification. The MAE, RMSE, and WMAPE with classification in the 15 min and 30 min time intervals are better than those without classification; only the RMSE in the 10 min time interval is slightly worse. This may be caused by an emergency that made the inflow volume suddenly decrease at some stations, an influence that is more obvious at smaller time intervals. Take the inflow at No. 1 Station in the Beijing metro as an example: Figure 6a,b illustrate the inflow in the 10 min and 30 min time intervals, respectively. The 10 min inflow clearly shows the flow suddenly dropping to zero, as marked by the dashed circle in Figure 6a. When the time interval increases to 30 min, the passenger flow is still substantially reduced by that sharp drop to zero at the 10 min granularity, as marked by the dashed circle in Figure 6b. Because RMSE is more affected by such outliers, poorer RMSE results occur, especially in the 10 min scenario.
In summary, classification can still improve the prediction results even with suddenly decreasing flows, and in most scenarios, the larger the time interval, the better the improvement.

3.4.3. Prediction Results in the Xiamen BRT

The testing dataset of the Xiamen BRT includes Tomb Sweeping Day on 5 April 2019, the Friday of the fifth week. As shown in Figure 7, the inflow and outflow on the holiday, enclosed in the three dotted red boxes, are quite different from those on normal days. The inflow during the morning and evening peaks is slightly less than on normal days, while the inflow in other hours is slightly more than usual. The outflow on that day, and even the day before, is much higher than usual. This is mainly because there is a ferry terminal near 1st Wharf Station, with the famous scenic spot Gulangyu Island on the opposite side; people traveling there choose the more convenient public transport, i.e., the BRT, which causes a sharp increase in outbound flow.
The prediction performance for the different time intervals in the Xiamen BRT is summarized in Table 9. The numbers in bold refer to the better result between prediction with and without station classification. The MAE, RMSE, and WMAPE with classification in the 10 min and 15 min time intervals are better than those without classification; only the MAE and WMAPE in the 30 min time interval are slightly poorer. This is mainly because less abnormal holiday flow is included at the 10 min and 15 min time intervals, so classified prediction shows better performance there. At the 30 min granularity, more irregular flow data are included, and the irregular flow cannot be well extracted through classification, which leads to unsatisfactory prediction performance. Nevertheless, in most scenarios, classification can still improve the prediction results for suddenly increasing flows on holidays, and the smaller the time interval, the better the improvement.

4. Conclusions

As far as we know, most existing studies have focused mainly on spatio-temporal feature extraction, constructing multi-station passenger flow prediction models with different deep learning architectures. Different from previous studies, we have proposed a novel two-step strategy, classification followed by prediction, to build a better-performing model for multi-station passenger flow prediction. Two complex real-world ridership datasets were used to demonstrate the effectiveness of the proposed model. Compared with the unclassified results, the proposed (Transformer-K-Means)-(ResNet-GCN-AttLSTM) with station classification presents better performance in multi-station passenger flow prediction. To the best of our knowledge, this is the first time station classification has been introduced into multi-station passenger flow prediction, and it presents good performance.
Improvements can be made in future work. One direction is to develop a more advanced multi-station passenger flow prediction model by better extracting the complex spatio-temporal features, for example using fractal-wavelet modeling [30,31,32,33,34,35]. Another is to combine the classification block with more state-of-the-art models to further verify the effectiveness of station classification in multi-station passenger flow prediction.

Author Contributions

The authors' contributions are summarized below. L.L. was involved in drafting the manuscript; M.W. performed the experiments and was involved in drafting the experimental part of the manuscript; R.-C.C. and S.Z. gave many effective suggestions for improving the method and revised the manuscript; Y.W. provided the idea of the proposed method. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partly supported by the National Natural Science Foundation of China, grant number 62103345; the Natural Science Foundation of Fujian Province, grant numbers 2020H0023 and 2020J02160; the Youth Innovation Fund Project of Xiamen, grant number 3502Z20206076; and the High-Level Talents Research Launched Project of Xiamen University of Technology, grant number YKJ19012R. All the projects are related to multi-station passenger flow prediction.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Many thanks to Key Laboratory of Fujian Universities for Virtual Reality and 3D Visualization, which supports the data preprocessing. The authors would like to thank the anonymous reviewers and the editors for all the helpful suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, Z.; Han, Y.; Peng, T.; Li, Z.; Chen, G. A comprehensive spatio-temporal model for subway passenger flow prediction. ISPRS Int. J. Geo-Inf. 2022, 11, 341. [Google Scholar] [CrossRef]
  2. Li, X.C.; Peng, Y.Z.; Wu, Z.X.; Chen, Z.W. Short-term forecast of metro station passenger flow based on deep spatial-temporal network. Traffic Transp. 2020, 33, 55–61. [Google Scholar]
  3. Zhang, J.L. Study of the Short-Term Passenger Flow Prediction in Urban Rail Transit Networks. Ph.D. Thesis, Beijing Jiaotong University, Beijing, China, 29 May 2021. [Google Scholar]
  4. Zhao, Y.J.; Lin, Y.F.; Zhang, Y.K.; Wen, H.M.; Liu, Y.X.; Wu, H.; Wu, Z.H.; Zhang, S.C.; Wan, H.Y. Traffic inflow and outflow forecasting by modeling intra- and inter-relationship between flows. IEEE Transp. Intell. Transp. Syst. 2022, 23, 20202–20216. [Google Scholar] [CrossRef]
  5. Bogaerts, T.; Masegosa, A.D.; Angarita-Zapata, J.S.; Onieva, E.; Hellinckx, P. A graph CNN-LSTM neural network for short and long-term traffic forecasting based on trajectory data. Transp. Res. Part C Emerg. Technol. 2020, 112, 62–77. [Google Scholar] [CrossRef]
  6. Li, S.; Wang, Q.W.; Chen, Y.R.; Qin, J. Prediction of short-time passenger flow on multi-station urban rail based on SAE-ConvLSTM deep learning model. Appl. Res. Comput. 2022, 39, 2025–2031. [Google Scholar]
  7. Zhang, S.X.; Zhang, J.L.; Yang, L.X.; Yin, J.T.; Gao, Z.Y. GCN-Transformer for short-term passenger flow prediction on holidays in urban rail transit systems. arXiv 2022, arXiv:2203.00007v3. [Google Scholar]
  8. Ma, C.Q.; Li, P.K.; Zhu, C.H.; Lu, W.B.; Tian, T. Short-term passenger flow forecast of urban rail transit based on different time granularities. J. Chang’an Univ. Nat. Sci. Ed. 2020, 40, 75–83. [Google Scholar]
  9. Du, C.L.; Li, X.L.; Sun, R.R.; Zhang, P.; Zhu, G.Y. Classification of urban rail station based on passenger flow congestion propagation. J. Beijing Jiaotong Univ. 2021, 45, 39–46. [Google Scholar]
  10. Wang, H.D.; Ma, H.W. Classification method of urban rail transit stations based on POI. Traffic Transp. 2020, 36, 33–37. [Google Scholar]
  11. Xia, X.; Gai, J.Y. Classification of urban rail transit stations and points and analysis of passenger flow characteristics based on K-Means clustering algorithm. Modern Urban Transit. 2021, 4, 112–118. [Google Scholar]
  12. Zhao, Y.; Wang, Y.; Hu, H. Research on clustering method of metro stations based on POI-K_Means. Intell. Comput. Appl. 2022, 12, 114–118. [Google Scholar]
  13. Yuan, F.T.; Chen, T.J.; Wei, J.B. Research on classification of rail stations based on AFC data. J. Transp. Eng. 2021, 21, 48–52, 57. [Google Scholar]
  14. Jiang, Y.S.; Yu, G.S.; Hu, L.; Li, Y. Refined classification of urban rail transit stations based on clustered station’s passenger traffic flow features. J. Transp. Syst. Eng. Inf. Technol. 2022, 22, 106–112. [Google Scholar]
  15. Wang, W.Y.; Liu, J.C.; Yu, Y.D. Research on classification of Xi’an rail transit stations based on land use and population. People’s Public Transp. 2020, 8, 29–33. [Google Scholar]
  16. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4 December 2017. [Google Scholar]
  17. Wang, Q.W.; Chen, Y.R.; Liu, Y.C. Metro short-term traffic flow prediction with ConvLSTM. Control Decis. 2021, 36, 2760–2770. [Google Scholar]
  18. Tan, Y.F.; Liu, H.X.; Pu, Y.; Wu, X.M.; Jiao, Y.B. Passenger flow prediction of integrated passenger terminal based on K-Means-GRNN. J. Adv. Transp. 2021, 10, 1055910. [Google Scholar] [CrossRef]
  19. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef]
  20. Szegedy, C.; Liu, W.; Jia, Y.Q.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, MA, USA, 8–10 June 2015. [Google Scholar]
  21. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
  22. Wang, J.X.; Wang, R.; Zeng, X. Short-term passenger flow forecasting using CEEMDAN meshed CNN-LSTM-attention model under wireless sensor network. IET Commun. 2022, 16, 1253–1263. [Google Scholar] [CrossRef]
  23. Yang, J.; Dong, X.; Jin, S. Metro passenger flow prediction model using attention-based neural network. IEEE Access 2020, 8, 30953–30959. [Google Scholar] [CrossRef]
  24. Xue, G.; Liu, S.F.; Ren, L.; Ma, Y.C.; Gong, D.Q. Forecasting the subway passenger flow under event occurrences with multivariate disturbances. Expert Syst. Appl. 2022, 188, 116057–116070. [Google Scholar] [CrossRef]
  25. Liu, L.J.; Chen, R.C.; Zhu, S.Z. Impacts of weather on short-term metro passenger flow forecasting using a deep LSTM neural network. Appl. Sci. 2020, 10, 2926. [Google Scholar] [CrossRef]
  26. Wu, Y.K.; Tan, H.C.; Qin, L.Q.; Ran, B.; Jiang, Z.X. A hybrid deep learning based traffic flow prediction method and its understanding. Transp. Res. Part C Emerg. Technol. 2018, 90, 166–180. [Google Scholar] [CrossRef]
  27. Zhang, J.; Chen, F.; Cui, Z.; Guo, Y.; Zhu, Y. Deep learning architecture for short-term passenger flow forecasting in urban rail transit. IEEE Transp. Intell. Transp. Syst. 2021, 22, 7004–7014. [Google Scholar] [CrossRef]
  28. Chen, L.; Liu, L.J.; Yuan, L. Multi-head attention mechanism for multi-station passenger flow prediction. In Proceedings of the 2022 International Symposium on Design Studies and Intelligence Engineering (DSIE2022), Hangzhou, China, 29–30 October 2022. [Google Scholar]
  29. The Most Authoritative Route Map of Xiamen BRT. Available online: http://xm.bendibao.com/traffic/20161011/51844.shtm (accessed on 29 July 2022).
  30. Siwar, Y.; Salwa, S.; Mourad, Z. Wavelet extreme learning machine and deep learning for data classification. Neurocomputing 2022, 470, 280–289. [Google Scholar]
  31. Emanuel, G.; Rodrigo, C. Chebyshev wavelet analysis. J. Funct. Spaces 2022, 2022, 5542054. [Google Scholar]
  32. Gökalp, Ç.; Bülent, G.E.; Ahmet, H.Y. Prediction of glioma grades using deep learning with wavelet radiomic features. Appl. Sci. 2020, 10, 6296. [Google Scholar]
  33. Yu, X.J.; Liu, Y.R.; Sun, Z.M.; Qin, P. Wavelet-based ResNet: A deep-learning model for prediction of significant wave height. IEEE Access 2022, 10, 110026–110033. [Google Scholar] [CrossRef]
  34. Emanuel, G.; Sergei, S. Fractional-wavelet analysis of positive definite distributions and wavelets on D′(C). In Engineering Mathematics II: Algebraic, Stochastic and Analysis Structures for Networks, Data Classification and Optimization; Springer Proceedings in Mathematics & Statistics; Springer International Publishing: New York, NY, USA, 2016; Volume 179, pp. 337–353. [Google Scholar]
  35. Indumathi, J.; Kaliraj, V. Petite term traffic flow prediction using deep learning for augmented flow of vehicles. Concurr. Eng. 2022, 30, 214–224. [Google Scholar] [CrossRef]
Figure 1. Inflow and outflow at the two stations in the Xiamen BRT. (a) Xianhou Station; (b) Dongzhai Station.
Figure 2. The framework of the proposed (Transformer-K-Means)-(ResNet-GCN-AttLSTM).
Figure 3. Processing of residual block.
Figure 4. Stations’ passenger flow in the Xiamen BRT and the Beijing metro. (a) Five weekly inflow and outflow at 1st Wharf Station in the Xiamen BRT. (b) Five weekly inflow and outflow at No.1 Station in the Beijing metro.
Figure 5. Visualization of inflow in the four categories of stations in 15 min time interval in the Xiamen BRT. (a) Inflow visualization in the first category; (b) inflow visualization in the second category; (c) inflow visualization in the third category; (d) inflow visualization in the fourth category.
Figure 6. Inflow at No. 1 Station in the Beijing metro. (a) 10 min time interval; (b) 30 min time interval.
Figure 7. Inflow and outflow at 1st Wharf Station in the Xiamen BRT on Tomb Sweeping Day.
Table 1. Dataset description.

| Description | Beijing Metro | Xiamen BRT |
| --- | --- | --- |
| Train timespan | 29/02/2016–04/03/2016, 07/03/2016–11/03/2016, 14/03/2016–18/03/2016, 21/03/2016–25/03/2016 | 04/03/2019–08/03/2019, 11/03/2019–15/03/2019, 18/03/2019–22/03/2019, 25/03/2019–29/03/2019 |
| Test timespan | 28/03/2016–01/04/2016 | 01/04/2019–05/04/2019 |
| Week number | 5 | 5 |
| Day number | 25 | 25 |
| Daily service hours | 5:00 a.m.–11:00 p.m. | 5:00 a.m.–11:00 p.m. |
| Time interval | 10 min / 15 min / 30 min | 10 min / 15 min / 30 min |
| DS | 108 / 72 / 36 | 108 / 72 / 36 |
| Total DS | 2700 / 1800 / 900 | 2700 / 1800 / 900 |
| Line number | 17 | 8 |
| Station number | 276 | 44 |
Table 2. Parameters used in the classification block.

| Parameters | Transformer Encoder | K-Means |
| --- | --- | --- |
| d_model | 2 × DS, 2 × days | |
| Number of heads | 2 | |
| K | | 2–10 |
| Optimizer | Adam | |
| Learning rate | 0.00004 | |
Table 3. Parameters used in the prediction block.

| Parameters | Inflow | Outflow | Physical Topology | Attention LSTM |
| --- | --- | --- | --- | --- |
| Kernel size | 3 × 3 | 3 × 3 | 3 × 3 | |
| Residual block 1 | 32 filters | 32 filters | 32 filters | |
| Residual block 2 | 64 filters | 64 filters | 64 filters | |
| Batch size | 64 | 64 | 64 | 64 |
| Activation function | ReLU | ReLU | ReLU | Linear |
| Number of neurons in FC 1 | S_i | S_i | S_i | |
| Number of neurons in FC 2 | | | | S_i |
| Number of layers | | | | 1 |
| Number of neurons (Neu) | | | | 128 |
Table 4. ASCR with different K in the Xiamen BRT and the Beijing metro.

| K | Xiamen BRT, 10 min | Xiamen BRT, 15 min | Xiamen BRT, 30 min | Beijing Metro, 10 min | Beijing Metro, 15 min | Beijing Metro, 30 min |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 0.8 | 0.8 | 0.795 | 0.794 | 0.797 | 0.797 |
| 3 | 0.8 | 0.795 | 0.795 | 0.8 | 0.797 | 0.797 |
| 4 | 0.8 | 0.8 | 0.8 | 0.787 | 0.798 | 0.798 |
| 5 | 0.8 | 0.795 | 0.795 | 0.791 | 0.796 | 0.795 |
| 6 | 0.8 | 0.795 | 0.795 | 0.782 | 0.792 | 0.793 |
| 7 | 0.789 | 0.790 | 0.8 | 0.786 | 0.791 | 0.790 |
| 8 | 0.794 | 0.785 | 0.8 | 0.785 | 0.789 | 0.792 |
| 9 | 0.8 | 0.795 | 0.795 | 0.781 | 0.788 | 0.790 |
| 10 | 0.784 | 0.790 | 0.795 | 0.781 | 0.788 | 0.790 |
Table 5. Names of stations in the Xiamen BRT.

| No. | Station Name | No. | Station Name | No. | Station Name | No. | Station Name |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1st Wharf | 12 | Caitang | 23 | Dongzhai | 34 | Pantu |
| 2 | Kaihe Intersection | 13 | Jinshan | 24 | Tiancuo | 35 | Binhai Xincheng (Xike) Junction |
| 3 | Sibei | 14 | Municipal Administrative Service Center | 25 | Xiamen North Railway Station | 36 | Guanxun |
| 4 | Douxi Road | 15 | Shuangshi Middle School | 26 | Institute of Technovation | 37 | Light Industry and Food Park |
| 5 | Ershi | 16 | Xianhou | 27 | Gaoqi Airport | 38 | Sikouzhen |
| 6 | Jinbang Park | 17 | Airport Terminal 4 | 28 | Fenglin | 39 | Industrial Zone |
| 7 | Railway Station | 18 | Tan Kah Kee Stadium | 29 | Dong’an | 40 | 3rd Hospital |
| 8 | Lianban | 19 | Chengyi University College | 30 | Houtian | 41 | Chengnan |
| 9 | Longshanqiao | 20 | Huaqiao University | 31 | Dongting | 42 | Tong’an Junction |
| 10 | Wolong Xiaocheng | 21 | University Town | 32 | Meifeng | 43 | Hongwen |
| 11 | Dongfang Shanzhuang | 22 | Chinese Academy of Sciences | 33 | Caidian | 44 | Qianpu Junction |
Table 6. Station classification results in the Xiamen BRT at the 15 min time interval.

| Categories | Station No. | Number | Passenger Flow Volume |
| --- | --- | --- | --- |
| 1 | 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 40 | 17 | Small |
| 2 | 1, 9, 11, 17, 19, 20, 36, 39, 41, 42 | 10 | Medium |
| 3 | 2, 3, 4, 5, 6, 8, 43 | 7 | Medium |
| 4 | 10, 12, 13, 15, 16, 18 | 6 | Large |
Table 7. The deleted stations at the different time intervals in the Xiamen BRT.

| No. | Deleted Station | 10 min | 15 min | 30 min |
| --- | --- | --- | --- | --- |
| 3 | Sibei | ✓ | | |
| 7 | Railway Station | ✓ | ✓ | |
| 14 | Municipal Administrative Service Center | ✓ | ✓ | |
| 25 | Xiamen North Railway Station | ✓ | ✓ | |
| 44 | Qianpu Junction | ✓ | ✓ | |

“✓” refers to the stations deleted at that time interval.
Table 8. Prediction results in the Beijing metro.

| Time Interval | Prediction Mode | MAE | RMSE | WMAPE |
| --- | --- | --- | --- | --- |
| 10 min | Prediction (with classification) | 17.2781 | 29.6929 | 9.668% |
| 10 min | Prediction (w/o classification) | 17.6344 | 29.6219 | 9.911% |
| 15 min | Prediction (with classification) | 22.1919 | 37.9693 | 8.269% |
| 15 min | Prediction (w/o classification) | 23.0008 | 38.7386 | 8.589% |
| 30 min | Prediction (with classification) | 32.5245 | 58.1694 | 6.120% |
| 30 min | Prediction (w/o classification) | 34.1360 | 60.1340 | 6.428% |
Table 9. Prediction results in the Xiamen BRT.

| Time Interval | Prediction Mode | MAE | RMSE | WMAPE |
| --- | --- | --- | --- | --- |
| 10 min | Prediction (with classification) | 9.2998 | 15.6633 | 18.58% |
| 10 min | Prediction (w/o classification) | 9.3990 | 16.0151 | 18.80% |
| 15 min | Prediction (with classification) | 12.7193 | 22.2835 | 16.29% |
| 15 min | Prediction (w/o classification) | 12.7559 | 22.3270 | 16.33% |
| 30 min | Prediction (with classification) | 23.0463 | 46.2256 | 14.26% |
| 30 min | Prediction (w/o classification) | 22.6983 | 47.2300 | 14.03% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

