1. Introduction
Urban rail transit is known for its high speed, large capacity, comfort, efficiency, and reliability. Effective operation of urban rail transit can reduce travel costs, transport more passengers, and relieve road traffic congestion. Urban rail transit systems in many Chinese cities are developing rapidly and serving more passengers. For example, the Beijing metro served 1846.3 million passengers in 2010, and ridership rose by 114.6% to 3962.3 million passengers in 2019 [1]. Making management policies to improve system efficiency and reliability [2] requires a detailed understanding of passenger characteristics at each station and in each time period. Better management measures can improve the level of service and help rail transit better serve as the backbone of the urban public transportation system in cities like Beijing [3].
Research on passenger characteristics can be divided into individual models and collective ones. Individual research takes the perspective of individual passengers, whereas collective research analyzes stations, routes, and networks using aggregated data [4]. With the increasing availability of smart card data, research on transit passenger characteristics based on large-scale trajectory data continues to emerge [5, 6]. Efforts have been devoted to individual passengers’ commuting patterns [7, 8], travel purpose inference [9], spatial patterns [10], temporal patterns [11, 12], and spatial-temporal joint patterns [13, 14]. One of the most commonly used methods is to cluster the travel patterns of passengers to better interpret their characteristics [15, 16]. Collective passenger pattern analysis based on aggregated data focuses on understanding the usage frequency of stations [12], recognizing passenger gathering places [17], and identifying the spatial distribution of urban functional zones [18].
This article focuses on the passenger distribution aggregated at the station level. In a rail transit system, the number of stations is limited. In China, as of 31 December 2020 [19], the Beijing Metro has the most operating stations at 428, followed by the Shanghai Metro with 354. The limited number of stations keeps the problem small, which makes it feasible to study the characteristics of the stations in more detail. Therefore, rail transit station managers and operators do not need to handle large-scale datasets as other machine learning practitioners do. At the same time, a more accurate understanding could lead to better travel flow management and control measures. Instead of focusing on computational efficiency, more effort should go into making full use of the behavior of all passengers related to a station when investigating that station. Passengers can be represented as a random variable, which is often described by representative statistical values such as the total inflow/outflow [20] or the mean for quantitative variables [21]. These methods are straightforward and have advantages in applicability and scalability. However, passenger distributions contain much more information than a single representative value. This article fills the gap by measuring the similarity of distributions rather than using a simple representative value such as the median or average. We aim to take full advantage of distributions for stations at a small scale, not to replace other methods in every case.
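To illustrate why a single representative value can hide information, the following minimal sketch compares two made-up visit count distributions that share the same mean. The station names, bin values, and probabilities are hypothetical and are not taken from the AFC data; SciPy's one-dimensional `wasserstein_distance` is used here only for illustration.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Hypothetical visit count bins (1 to 5 visits) shared by both stations.
visit_counts = np.arange(1, 6)

# Station A: half of the passengers visit once, half visit five times.
station_a = np.array([0.5, 0.0, 0.0, 0.0, 0.5])
# Station B: every passenger visits exactly three times.
station_b = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

# The means coincide, so a mean-based comparison sees no difference.
mean_a = float(visit_counts @ station_a)
mean_b = float(visit_counts @ station_b)

# The Wasserstein distance compares the whole distributions instead.
dist = wasserstein_distance(
    visit_counts, visit_counts, u_weights=station_a, v_weights=station_b
)

print(mean_a, mean_b, dist)
```

Both stations have a mean of three visits, yet the distance between their distributions is clearly nonzero, which is the kind of difference a distribution-level similarity measure is meant to capture.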
The idea of this paper is similar to decomposing high-dimensional tensors or tensors with time series by adding a temporal smoothing term [22] or a spatial regularization term based on topology and spatial proximity [23], so as to consider the spatio-temporal correlation between data points. Temporal smoothing terms constrain values at adjacent or periodic time points to be close [24], and spatial regularization terms minimize the gap between points with similar geographic attributes. Whereas the above research considers correlation among different data points of the same attribute, this paper assumes correlation among different values of the same attribute at the same data point.
We use the proposed method to cluster stations by their passenger visit distribution. Understanding the passenger visit distribution is useful for station facility planning, such as the placement of maps [25] or route guide signs [26], for flow management under both normal [27] and emergency conditions [28], and for commercial advertisement design [29].
The first use of the passenger visit distribution lies in the setting of route guide signs [26, 30]. For example, at stations such as Airport Terminal T2, almost all passengers visit only once or twice. These passengers are not familiar with the internal structure of the station or the proper routes through it, so the station should densely display guide signs leading to the platforms and the different exits, without other interference, for both entering and leaving passengers. Conversely, if all passengers visit a certain station frequently, they are familiar with the station. Real-time information on trains and passenger flow can then be displayed for inbound passengers, and richer information about the exits, and even more commercial advertisements, can be displayed for outbound passengers. In summary, the sign display strategy should be adjusted according to the passengers’ familiarity with the station.
The second potential use lies in passenger flow control within the station. Route choice behavior also depends on passengers’ familiarity with the station and on the station’s streamline design for passenger flow management [31]. Because familiar passengers show different preferences when choosing routes [32], stations with different passenger familiarity distributions need different control methods to ensure that passengers follow the rules. Passengers unfamiliar with a station could fail to find the shortest path during an emergency evacuation. Thus, the passenger visit distribution should be considered when evaluating emergency evacuation efficiency.
The third opportunity the passenger visit distribution provides is to improve the commercial advertisements [29] displayed in metro stations. Stations dominated by familiar passengers with repeated visit patterns are preferable for advertisements intended to be shown repeatedly to the same passengers. Railway stations and airport terminals, which attract many different unfamiliar passengers, are suitable for advertisements that aim to reach as many passengers as possible.
The goal of this research is to first construct the visit count distribution, then measure the similarity of all stations using the Wasserstein distance, and finally cluster the stations according to the visit distribution similarity matrix to derive policy implications. The main contributions of this paper are threefold. First, we use the Wasserstein distance to measure similarity, which takes into account that close visit counts are more similar than distant ones. By customizing the cost function, the correlation between the station-specific visit count and the total visit count can be considered. The approach is also applicable to multi-dimensional joint distributions with multiple attributes, or to time series of multi-dimensional attributes. Second, the obtained distance matrix is used to cluster stations according to their passenger visit distributions, demonstrating the practicability and effectiveness of the method. Lastly, the case study of passenger familiarity clustering for Beijing metro stations quantitatively characterizes these stations, providing insights for the refined flow management of urban rail transit passengers.
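The three steps named above (build the visit count distribution, compute pairwise Wasserstein distances, cluster on the distance matrix) can be sketched as follows. This is a simplified one-dimensional illustration under stated assumptions, not the implementation used in the case study: the records, station names, `MAX_COUNT`, and the helper `visit_count_distribution` are hypothetical, and average linkage is an arbitrary choice of hierarchical clustering.

```python
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import wasserstein_distance

MAX_COUNT = 10  # assumed cap on the visit count bins

# Hypothetical AFC-like records: one (card_id, station) pair per visit.
records = []
visits_per_card = {
    "airport": [1, 1, 1, 2],
    "rail_hub": [1, 1, 2, 1],
    "residential_a": [8, 9, 10, 10],
    "residential_b": [9, 10, 10, 7],
}
for station, counts in visits_per_card.items():
    for card_index, n_visits in enumerate(counts):
        records += [(f"{station}-card{card_index}", station)] * n_visits


def visit_count_distribution(records, station, max_count):
    """Share of cards with 1, 2, ..., max_count visits at one station."""
    per_card = Counter(card for card, st in records if st == station)
    hist = np.zeros(max_count)
    for n in per_card.values():
        hist[min(n, max_count) - 1] += 1
    return hist / hist.sum()


# Step 1: one visit count distribution per station.
stations = sorted({st for _, st in records})
support = np.arange(1, MAX_COUNT + 1)
pmfs = [visit_count_distribution(records, s, MAX_COUNT) for s in stations]

# Step 2: pairwise Wasserstein distance matrix.
n = len(stations)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = wasserstein_distance(
            support, support, pmfs[i], pmfs[j]
        )

# Step 3: hierarchical clustering on the precomputed distances.
Z = linkage(squareform(D), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
cluster_of = dict(zip(stations, labels))
print(cluster_of)
```

In this toy data the two stations with one-off visitors fall into one cluster and the two stations with frequent visitors into the other. The joint distribution of station-specific and total visit counts described above would need a two-dimensional cost function in place of the one-dimensional distance used here.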
The remainder of the paper is organized as follows. The study area, the Automatic Fare Collection (AFC) data, similarity measures, and the clustering method are presented in Section 2. Section 3 shows how we construct the visit distribution, measure the similarity, and cluster the stations, together with the analysis of the clustering results. Section 4 concludes the paper and discusses the implications of the results.
4. Discussion
We first use the AFC data to build the visit count distribution of metro stations, which shows the passenger familiarity characteristics. Then, using a distribution similarity index, we measure the similarity of stations. This paper also proposes a general method for clustering distributions. Compared with directly representing the distribution of a feature as a single value, the method can reflect the fact that distributions with close values are more similar. The case study of the Beijing Metro network illustrates the effectiveness of the proposed method. We compare the clustering results with the functional zone pattern surrounding the stations and conclude that stations with unfamiliar passengers are related to inter-city transportation hubs and leisure stations, while stations with familiar passengers are related to residential places.
There are three drawbacks in this paper. First, the efficient solution of transportation problems is not discussed. For each point, when the scale of the transportation problem is large, the number of variables in the linear program is the square of the total number of intervals of the two-dimensional distribution. Even if polynomial algorithms such as the interior point method are selected, the computational complexity is still high. We simply use the OpenCV implementation without discussing the underlying algorithms. Heuristic methods do not pursue exact solutions, but they can greatly reduce the computational complexity. However, since this is beyond the scope of this article and we only use the metric in our clustering method, heuristic algorithms for transportation problems are not discussed further. Second, no sensitivity analysis is performed on the cost function. Passenger familiarity is characterized using simple L1 and L2 norms, and we only give a general criterion for choosing the number of clusters for each norm; the sensitivity of the results to the cost function is not examined in the case study. Third, no new clustering algorithm is designed, nor are the pros and cons of each clustering algorithm analyzed theoretically. Instead, this article simply compares the evaluation index values of the hierarchical clustering algorithms.
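To make the size of the linear program concrete, the sketch below writes the transportation problem for a small two-dimensional distribution and solves it with SciPy's `linprog`. This is an illustrative substitute for the OpenCV routine mentioned above, not the code used in the paper; the 3 x 3 grid, the helper name `emd_linprog`, and the two point-mass distributions are assumptions chosen so that the result is easy to check by hand.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def emd_linprog(p, q, cost):
    """Solve the transportation problem min sum_ij cost_ij * f_ij."""
    n, m = cost.shape
    # Flow f is flattened row by row, so f_ij sits at index i * m + j.
    a_rows = np.kron(np.eye(n), np.ones((1, m)))  # sum_j f_ij = p_i
    a_cols = np.kron(np.ones((1, n)), np.eye(m))  # sum_i f_ij = q_j
    result = linprog(
        cost.ravel(),
        A_eq=np.vstack([a_rows, a_cols]),
        b_eq=np.concatenate([p, q]),
        bounds=(0, None),
        method="highs",
    )
    return result.fun, result.x.reshape(n, m)


# Hypothetical 3 x 3 grid of (station visit count, total visit count) bins.
bins = np.array([(a, b) for a in range(1, 4) for b in range(1, 4)], dtype=float)

# Two point-mass distributions: all mass at (1, 1) versus all mass at (3, 3).
p = np.zeros(len(bins))
p[0] = 1.0
q = np.zeros(len(bins))
q[-1] = 1.0

# Ground costs under the L1 and L2 norms between bin coordinates.
emd_l1, flow = emd_linprog(p, q, cdist(bins, bins, "cityblock"))
emd_l2, _ = emd_linprog(p, q, cdist(bins, bins, "euclidean"))

# 9 bins already give 9 * 9 = 81 decision variables.
print(flow.size, emd_l1, emd_l2)
```

With only nine bins the program has 81 flow variables, which shows how quickly the problem grows as the number of intervals increases. The two norms also give different distances for the same pair of distributions, which is why the choice of cost function matters.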
To reduce the complexity of the problem, several measures can be adopted: using truncated distributions, merging adjacent groups in discrete distributions, enlarging the group interval in continuous distributions, and applying matrix decomposition or other dimension reduction methods to the data. Applying a more efficient transportation problem algorithm would speed up the process and make the method more scalable. Moreover, the applicability and performance of different clustering algorithms can be analyzed theoretically.
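Two of the measures listed above, truncation and merging of adjacent groups, can be sketched for a discrete distribution as follows. The helper names and the uniform example distribution are hypothetical, and truncation is read here in the usual statistical sense (drop the tail and renormalize), which is an assumption about the intended meaning.

```python
import numpy as np


def truncate_distribution(pmf, max_bins):
    """Keep the first max_bins groups and renormalize the remaining mass."""
    kept = np.asarray(pmf, dtype=float)[:max_bins]
    return kept / kept.sum()


def merge_adjacent_bins(pmf, group_size):
    """Sum every group_size adjacent groups into one coarser group."""
    pmf = np.asarray(pmf, dtype=float)
    pad = (-len(pmf)) % group_size  # zero-pad so the length divides evenly
    padded = np.concatenate([pmf, np.zeros(pad)])
    return padded.reshape(-1, group_size).sum(axis=1)


# Hypothetical distribution over 8 visit count groups.
pmf = np.full(8, 0.125)

truncated = truncate_distribution(pmf, max_bins=4)
merged = merge_adjacent_bins(pmf, group_size=2)

# Comparing two distributions of k groups needs k * k flow variables.
print(len(pmf) ** 2, len(truncated) ** 2, len(merged) ** 2)
```

Halving the number of groups cuts the number of flow variables in the transportation problem to a quarter, at the cost of a coarser description of the visit counts.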