Identifying the Passenger Transport Corridors in an Urban Rail Transit Network Based on OD Clustering

Zhou, Fangyi; Yao, Jing; Yin, Haodong

doi:10.3390/su17209127


by Fangyi Zhou ¹, Jing Yao ²,* and Haodong Yin ²

¹ School of Traffic and Transportation, Beijing Jiaotong University, No. 3 Shang Yuan Cun, Hai Dian District, Beijing 100044, China

² School of Systems Science, Beijing Jiaotong University, No. 3 Shang Yuan Cun, Hai Dian District, Beijing 100044, China

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(20), 9127; https://doi.org/10.3390/su17209127

Submission received: 16 September 2025 / Revised: 8 October 2025 / Accepted: 14 October 2025 / Published: 15 October 2025

(This article belongs to the Special Issue Innovative Strategies for Sustainable Urban Rail Transit)


Abstract

Traditional passenger transport corridor identification methods fail to effectively capture the spatiotemporal dynamic characteristics of passenger flows in complex urban rail transit networks. This study proposes a novel passenger transport corridor identification method based on Origin–Destination (OD) clustering. The method enables more accurate identification of passenger groups with similar travel patterns and distributions through a customized clustering similarity function; simultaneously, it can obtain OD pairs with actual physical significance through OD clustering as the source of basic units for identifying passenger transport corridors. By analyzing the spatial distribution of passenger transport corridor constituent units (clustered ODs), the method determines whether the passenger transport corridor is a cross-line corridor. The method is validated using Beijing’s urban rail transit system as a case study, employing the density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm with optimal parameters (eps = 0.46, minpts = 980), identifying 21 clusters and ultimately determining six passenger transport corridors, including four cross-line and two non-cross-line types. Furthermore, this study conducted sensitivity analysis on the eps parameter using 80 test configurations to examine its impact on clustering effectiveness metrics, validating the method’s stability. The results demonstrate that the identified corridors exhibit high passenger flow concentration characteristics and accurately reflect passengers’ transfer demands between different lines. This research provides a theoretical foundation for integrated public transportation connectivity and supports sustainable urban development through improved operational efficiency and reduced operational costs.

Keywords: OD clustering; sustainability of urban rail transit; passenger transport corridor; corridor identification

1. Introduction

With the rapid advancement of China’s urbanization process, the urbanization rate had exceeded 65% by 2023. Large- and medium-sized cities now face increasingly severe traffic pressures and challenges in meeting travel demands. Against this backdrop, urban rail transit, as a high-capacity, high-efficiency public transportation mode, has become vital infrastructure for modern cities and metropolitan regions. Beyond alleviating urban traffic congestion, urban rail transit delivers substantial benefits in both economic development and urban mobility. Well-planned multi-layer regional rail transit networks promote regional integration, improve employment accessibility, and reduce spatial inequality [1], while providing energy-efficient solutions that enhance social equity through affordable and accessible transportation for diverse populations. These benefits stem from its large passenger capacity, high speed, high punctuality, and environmental friendliness.

However, as rail transit networks continue to improve and passenger volumes steadily increase, numerous operational challenges have emerged. Particularly in passenger flow organization, traditional point-to-point transportation modes can no longer meet the complex and diverse travel demands. Problems such as uneven passenger flow distribution at stations, low transfer efficiency, and severe congestion during peak hours have become increasingly prominent. These problems not only impact the passenger experience but also constrain the sustainable development of the rail transit system.

Passenger transport corridors refer to passenger flow channels with clear directionality and continuity formed within rail transit networks by optimizing route layouts and other measures based on passenger flow characteristics and travel patterns. The introduction of the passenger transport corridor concept provides new approaches and methods for addressing the aforementioned problems. However, traditional passenger transport corridor identification methods struggle when processing high-dimensional data generated by large-scale rail transit networks, making it difficult to effectively extract deep-level passenger flow patterns. Additionally, they lack adaptability to complex network structures. When dealing with networked operations involving multiple lines and transfer nodes, traditional approaches struggle to accurately capture passenger flow variations across different time periods and spatial regions.

To address traditional approach limitations, this study proposes a passenger transport corridor identification method based on Origin–Destination (OD) clustering in an urban rail transit network to reveal concentrated travel paths and analyze passenger demand patterns. The OD clustering approach identifies core OD pairs with the strongest agglomeration effects, filtering out sporadic trips to enable accurate corridor delineation and differentiation between primary and secondary corridors based on passenger flow. Importantly, the identified OD pairs reveal passengers’ cross-line transfer behaviors within the rail transit system, providing valuable insights for implementing through-service operations and inter-line connectivity improvements. This provides quantitative evidence for corridor identification and achieves strategic goals of enhancing transportation equity, optimizing resource allocation, and improving operational efficiency, ultimately contributing to a more efficient, economically viable public transportation system and sustainable urban development. The methodology involves three phases: clustering analysis of OD data to group pairs with similar travel characteristics, passenger flow assignment to obtain primary flow paths for each cluster, and corridor identification through spatial relationship analysis.

The primary innovations of this research are twofold:

(1): Enhanced Similarity Function: An improved similarity function is developed for OD clustering that incorporates both distance similarity and path overlap similarity between different OD pairs. Unlike conventional approaches that rely solely on spatial distance metrics, this enhanced function enables more accurate identification of passenger groups with similar travel patterns and behavioral characteristics, thereby improving the practical applicability of clustering results.
(2): Optimized Clustering Center Determination: A novel method for determining clustering centers is proposed that simultaneously considers passenger flow volume and the distance from cluster members to the center point. Its advantage is that the resulting centers coincide with existing urban rail transit stations and are therefore physically meaningful, better reflecting passengers’ genuine transfer needs and travel behavior patterns.

The remainder of this paper is organized as follows: Section 2 provides a literature review. Section 3 introduces a passenger transport corridor identification method based on OD clustering. Section 4 discusses the research results and sensitivity analysis. Section 5 presents the conclusions.

2. Literature Review

Recent research on urban passenger transport has evolved significantly, reflecting the increasing complexity and interdisciplinary nature of this field. Verano-Tacoronte et al. conducted a comprehensive bibliometric analysis of urban passenger transport research from 2001 to 2021, revealing key research themes including travel behavior, sustainability, transportation efficiency, and network performance [2]. This evolving research landscape underscores the necessity for data-driven, systematic approaches to identify and optimize passenger transport corridors [3]. To enhance the efficiency of urban transport networks and optimize public transport services, many scholars have utilized big data mining tools to intelligently identify urban collector/distributor points [4]. However, relying solely on information from transportation hubs has limitations in understanding the travel needs of urban residents. While transportation hubs reflect local traffic density, they cannot capture information such as the travel directions and actual travel network trajectories of residents’ starting and ending points.

The concept of priority movement corridors in urban passenger transport has gained increasing attention as cities strive to improve public transport efficiency. Gorev et al. investigated the formation of priority movement corridors for urban passenger transport, analyzing the conditions and requirements for establishing such corridors within road networks [5]. This aligns with our research objective of developing a data-driven approach to identify passenger transport corridors based on actual travel patterns.

In geography, the term “corridor” was first introduced by geographer Griffith Taylor in his 1949 book Urban Geography [6]. Later, C. F. J. Whebell, in a journal article, expanded this concept by describing corridors as linear pathways that connect different areas within a city [7]. Since then, “transport corridors” have emerged as an important planning concept—multi-modal arterials that carry a large proportion of passenger and freight traffic between major transport nodes.

From a theoretical perspective, passenger transport corridor research contributes to enriching and improving rail transit planning theory systems, providing scientific guidance for network operations [8]. From a practical standpoint, the construction and optimization of passenger transport corridors can enhance system operational efficiency, improve passenger travel experience, and strengthen network resilience and safety [9]. The core advantages of passenger transport corridors lie in: first, their ability to achieve orderly guidance and efficient evacuation of passenger flows, improving overall network capacity; second, optimizing transfer organization to reduce passenger travel time costs; third, enhancing the system’s emergency response capabilities in sudden situations; and fourth, providing support for integrated connections between rail transit and other transportation modes [10]. Therefore, constructing an efficient passenger transport corridor system is of great significance for improving urban rail transit service levels and promoting sustainable urban transportation development [11].

Numerous scholars have explored the identification of urban public transport corridors. In terms of graph theory methods, spatial network analysis has been employed using the topological structure and passenger flow distribution of spatial networks [12]. On the data-driven front, probabilistic tensor decomposition models have been used to identify regular passenger transport corridors and patterns within urban areas [13]. To reduce reliance on expert knowledge, some researchers have identified urban public transportation corridors by aggregating neighboring trajectories or Origin–Destination (OD) flows. Tong et al. developed a shared flow clustering algorithm based on single-vehicle bus trip data to identify corridors [14]. Kinan et al. proposed TraClus-DL for clustering desired line trajectories to identify demand corridors [15], while Zhang et al. introduced a network flow model integrating statistical traffic assignment and probabilistic OD estimation [16]. Some scholars have also established the k-Primary Corridor algorithm based on GPS data, setting the objective function as the minimum generalized cost and maximum aggregate utility along the route [17,18]. Yin and Zhang proposed an identification method for optimal urban bus corridor locations, employing a K-shortest path algorithm to generate candidate corridors and considering aggregation effects along potential routes [19]. While their approach focused on bus corridors and route optimization, our research extends this concept to urban rail transit networks and emphasizes the clustering of Origin–Destination pairs to reveal inherent travel patterns and cross-line transfer behaviors.

Existing corridor identification methods primarily rely on idealized prior assumptions or empirical judgments [4,20], with algorithms that sometimes deviate from the constraints of real-world road network information. Traditional full traffic assignment methods for transit corridor identification are characterized by high computational complexity and are susceptible to “noise” interference generated by Origin–Destination (OD) pairs with low passenger flows [21]. Although some researchers have proposed traffic corridor identification approaches based on clustering theory and graph theory [22,23], the clustering centers obtained by existing methods are newly generated, and the paths between OD pairs are directly connected spatially, resulting in spatial discrepancies with existing route stations. These methods ignore actual passenger trajectory information, significantly undermining the reliability and accuracy of corridor identification results.

In summary, traditional passenger transport corridor identification methods primarily include empirical judgment, two-step clustering, and travel expectation graph approaches. These approaches fall short when handling high-dimensional data generated by large-scale rail transit networks, failing to effectively extract underlying passenger flow patterns. Furthermore, they neglect the spatiotemporal dynamics of passenger flows and lack the capacity to adapt to complex network structures. When confronting networked operations involving multiple lines and transfer nodes, traditional methods struggle to accurately capture the patterns of passenger flow variation across different time periods and spatial regions.

3. Methodology

3.1. Overall Framework for Passenger Transport Corridor Identification

To identify passenger corridors in urban areas, we propose a novel corridor identification method based on OD (Origin–Destination) clustering. Figure 1 shows the flowchart of the method. Specifically, OD clustering can identify OD pairs with high similarity, and these OD pairs constitute the components of passenger transport corridors. Each OD pair can correspond to multiple possible paths within the metro network. To accurately allocate passenger flow between these OD pairs, we calculate the path and interval flows for each pair under multiple route selections, thereby identifying the primary passenger flow paths for clustered OD pairs. A multi-path Logit model is employed for the passenger flow assignment, which has been widely validated in metro networks [24]. This model probabilistically assigns flows across multiple feasible paths based on travel impedance, effectively capturing passenger route choice behavior under diverse travel conditions. Based on this, we analyze the spatial mapping relationship between passenger transport corridors and clustered OD pairs to determine the basic compositional units of corridors, thereby completing corridor identification.
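The assignment step can be illustrated with a minimal sketch of a multinomial Logit split. The function name `logit_assign`, the sensitivity parameter `theta`, and the example impedances are assumptions made for illustration; the paper does not report its impedance definition or calibrated parameters.

```python
import math

def logit_assign(od_flow, path_costs, theta=1.0):
    """Split od_flow across candidate paths by a Logit model on impedance.

    theta is a hypothetical sensitivity parameter; larger values push
    more flow onto the lowest-impedance path.
    """
    # Subtract the minimum cost for numerical stability; shares are unchanged.
    c_min = min(path_costs)
    weights = [math.exp(-theta * (c - c_min)) for c in path_costs]
    total = sum(weights)
    return [od_flow * w / total for w in weights]

# Illustrative values only: 1000 passengers, three paths with impedances in minutes.
flows = logit_assign(1000.0, [30.0, 30.0, 35.0], theta=0.2)
```

Under this sketch, the path carrying the largest assigned share would be taken as the primary passenger flow path of the clustered OD pair.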

3.2. OD Clustering for Critical OD Pair Identification

OD passenger flow clustering differs from traditional clustering methods in Euclidean space, which group elements solely according to Euclidean distance similarity. The distance function in OD clustering must take into account the specific characteristics of the data, which involves designing appropriate clustering variables, constructing custom distance functions, and ultimately generating a distance matrix.

Moreover, the output of OD clustering includes several key components: the number of clusters, the number of sample points within each cluster, the number of outliers in each cluster, and the cluster assignment for each sample point. However, the process does not provide information about the cluster centroids. Therefore, it is necessary to determine the cluster centroids that hold actual physical significance.

Lastly, an essential step is to evaluate the effectiveness of the OD clustering results in order to identify the optimal clustering scheme.

3.2.1. Custom Distance Metric for OD Clustering

An OD pair $OD_{i}$ is defined as a spatial vector ($P_{i}^{O}, P_{i}^{D}$), where $P_{i}^{O} = (x_{i}^{O}, y_{i}^{O})$ and $P_{i}^{D} = (x_{i}^{D}, y_{i}^{D})$ are the origin point and destination point with coordinates.

Cluster analysis can be used to identify the destinations of a large number of passengers traveling by urban rail transit. The similarity measurement method of the clustering objects is the basis of clustering. Because clustering data differ in type and characteristics, different similarity measures are needed to characterize the clustering objects. When clustering OD data, the similarity measurement mainly considers the distance between the corresponding origins and destinations of two OD pairs and the overlap between their travel paths.

The distance between two OD pairs $OD_{i}$ and $OD_{j}$ is defined as the sum of the path distance between their origin points ($L_{ij}^{O}$) and the path distance between their destination points ($L_{ij}^{D}$). The path distances $L_{ij}^{O}$ and $L_{ij}^{D}$ are measured as the actual travel distance passengers would experience when riding the urban rail transit system, fully accounting for the real route topology and network structure. This measurement reflects both the physical distance and the operational characteristics of the transit network, ensuring that the similarity function captures realistic travel patterns.

Path overlap ($γ_{ij}$) is an important function for measuring the similarity of travel paths between OD pairs, calculated based on the number of stations along the path.

$$γ_{ij} = \frac{S_{ij}}{\max \{S_{i}, S_{j}\}} \tag{1}$$

where $S_{ij}$ represents the number of stations shared by both OD paths, and $S_{i}$ and $S_{j}$ denote the total number of stations on the OD paths of $OD_{i}$ and $OD_{j}$, respectively.
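As a worked illustration of Equation (1), the following sketch computes the overlap for two paths given as ordered station lists. The function name `path_overlap` and the station labels are hypothetical, and treating "shared stations" as a set intersection is an assumption.

```python
def path_overlap(path_i, path_j):
    """Equation (1): shared stations divided by the longer path's station count."""
    stations_i, stations_j = set(path_i), set(path_j)
    shared = len(stations_i & stations_j)  # S_ij
    return shared / max(len(stations_i), len(stations_j))

# Hypothetical station labels: three shared stations, longer path has five.
gamma = path_overlap(["A", "B", "C", "D"], ["B", "C", "D", "E", "F"])
```

Here the two paths share three stations and the longer path has five, so the overlap is 3/5.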

Therefore, the custom distance measurement formula for OD clustering is as follows:

$$D_{ij} = α \times Y_{ij}^{O} + β \times Y_{ij}^{D} + δ \times R_{ij} \tag{2}$$

where $Y_{ij}^{O}$ and $Y_{ij}^{D}$ are the distance variables ($L_{ij}^{O}$, $L_{ij}^{D}$) after dimensionless processing, and $R_{ij}$ is the overlap variable ($γ_{ij}$) after inverse processing. In this study, $α$ and $β$ represent the weights for the distance variables $L_{ij}^{O}$ and $L_{ij}^{D}$, respectively. Since both the origin and destination are equally fundamental components of an OD pair, they are assigned equal importance in measuring spatial similarity. Neither component should be prioritized over the other in passenger corridor identification. The parameter $δ$ weights the path overlap component. Setting $α = β = 0.5$ and $δ = 1$ ensures that the path overlap similarity receives comparable consideration to the combined spatial distance. This reflects the principle that passengers with similar spatial distributions and shared travel routes should be clustered together.
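A minimal sketch of Equation (2) follows. The paper does not spell out the "dimensionless processing" or the "inverse processing", so this sketch assumes min-max scaling for the distances and $R_{ij} = 1 - γ_{ij}$ for the overlap; the function names and input values are hypothetical.

```python
def min_max(value, lo, hi):
    """Scale value into [0, 1]; assumed form of the dimensionless processing."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)

def od_distance(l_o, l_d, gamma, l_o_range, l_d_range,
                alpha=0.5, beta=0.5, delta=1.0):
    """Equation (2) with the weights stated in the text."""
    y_o = min_max(l_o, *l_o_range)   # Y_ij^O
    y_d = min_max(l_d, *l_d_range)   # Y_ij^D
    r = 1.0 - gamma                  # R_ij: assumed inverse of the overlap
    return alpha * y_o + beta * y_d + delta * r

# Illustrative values: origins 2 km apart, destinations 4 km apart,
# overlap 0.6, both distance variables ranging over 0 to 10 km.
d_ij = od_distance(2.0, 4.0, 0.6, (0.0, 10.0), (0.0, 10.0))
```

With these assumptions, two identical OD pairs (zero endpoint distances, overlap of 1) have a distance of 0, and the value grows as the endpoints move apart or the paths share fewer stations.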

3.2.2. Physically Meaningful Centroid Determination

In traditional clustering analysis, cluster centers are typically generated randomly, usually based on experimental testing or subjective judgment. The cluster centers in the OD clustering proposed in this study differ from those in traditional methods. To accurately identify passenger transport corridors for cross-line renovation, the cluster centers in OD clustering should be existing stations that appear in pairs, with each cluster containing two centers representing the origin point and the destination.

The optimal cluster centroids ($μ^{O}, μ^{D}$) should satisfy two requirements: the sum of the distances from all points within the same cluster to the centroids should be minimal, and the OD flow of the centroids should be maximal.

$$μ^{O} = \arg\min_{OD_{i} \in Ω_{k}} \left\{ \left[ \sum_{j = 1, j \neq i}^{m} d^{O} (P_{i}, P_{j}) \right] \times \left( 1 - \frac{V_{i}}{\sum_{j = 1}^{m} V_{j}} \right) \right\} \tag{3}$$

$$μ^{D} = \arg\min_{OD_{i} \in Ω_{k}} \left\{ \left[ \sum_{j = 1, j \neq i}^{m} d^{D} (P_{i}, P_{j}) \right] \times \left( 1 - \frac{V_{i}}{\sum_{j = 1}^{m} V_{j}} \right) \right\} \tag{4}$$

$$μ^{O}, μ^{D} \in Ω_{k} \tag{5}$$

where $d^{O} (P_{i}, P_{j})$ and $d^{D} (P_{i}, P_{j})$ denote the distances between the origin points and between the destination points of $OD_{i}$ and $OD_{j}$, respectively. $V_{i}$ and $V_{j}$ represent the OD flows of $OD_{i}$ and $OD_{j}$, and $(1 - \frac{V_{i}}{\sum_{j = 1}^{m} V_{j}})$ is a weight that decreases as the flow share of $OD_{i}$ within the cluster increases, so that minimizing the product favors high-flow OD pairs. $m$ is the number of sample points within cluster $k$, and $Ω_{k}$ denotes the set of OD pairs in cluster $k$.

The cluster centroids (

μ^{O}, μ^{D}

) are determined through a selection process that balances spatial centrality with passenger flow importance. The objective is to identify an OD pair within the cluster that minimizes the total distance to all other cluster members while prioritizing high-flow OD pairs. The algorithm pseudocode is shown in Algorithm 1.

Algorithm 1: Cluster centroids selection
Input: Cluster: A set of OD pairs { $OD_{1}$, $OD_{2}$, …, $OD_{m}$ }; $V$: Passenger flow vector { $V_{1}$, $V_{2}$, …, $V_{m}$ }
Output: ( $μ^{O}, μ^{D}$ ): Origin and destination centroids
//Step 1: Calculate total passenger flow
TotalFlow ← $\sum_{j = 1}^{m} V_{j}$

//Step 2: Find origin centroid
MinScore_O ← ∞
for each $OD_{i}$ in Cluster do
    DistSum ← 0
    for each $OD_{j}$ in Cluster ( $j \neq i$ ) do
        DistSum ← DistSum + $d^{O} (P_{i}, P_{j})$
    end for
    Score_ $i$ ← DistSum × $(1 - V_{i} / TotalFlow)$
    if Score_ $i$ < MinScore_O then
        MinScore_O ← Score_ $i$
        $μ^{O}$ ← $OD_{i}$
    end if
end for

//Step 3: Find destination centroid (same process)
MinScore_D ← ∞
for each $OD_{i}$ in Cluster do
    DistSum ← 0
    for each $OD_{j}$ in Cluster ( $j \neq i$ ) do
        DistSum ← DistSum + $d^{D} (P_{i}, P_{j})$
    end for
    Score_ $i$ ← DistSum × $(1 - V_{i} / TotalFlow)$
    if Score_ $i$ < MinScore_D then
        MinScore_D ← Score_ $i$
        $μ^{D}$ ← $OD_{i}$
    end if
end for

return ( $μ^{O}, μ^{D}$ )

For each OD pair $i$ in the cluster, we calculate a weighted score that combines two components: (1) the sum of distances from this pair to all other pairs in the cluster, and (2) an inverse passenger flow weight $(1 - \frac{V_{i}}{\sum_{j = 1}^{m} V_{j}})$. The OD pair with the minimum weighted score is selected as the centroid. This calculation is performed separately for origin points (Equation (3)) and destination points (Equation (4)).
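The selection procedure of Algorithm 1 can be sketched in runnable form as follows. The data layout (each OD pair as an `(origin_xy, destination_xy)` tuple), the use of Euclidean distance for $d^{O}$ and $d^{D}$, and the sample coordinates and flows are assumptions for illustration; the cluster is assumed to have a positive total flow.

```python
import math

def select_centroid(cluster, flows, dist):
    """Return the index of the OD pair minimizing DistSum * (1 - V_i / TotalFlow)."""
    total_flow = sum(flows)
    best_idx, best_score = None, float("inf")
    for i, od_i in enumerate(cluster):
        # Sum of distances from OD_i to every other member of the cluster.
        dist_sum = sum(dist(od_i, od_j) for j, od_j in enumerate(cluster) if j != i)
        score = dist_sum * (1.0 - flows[i] / total_flow)
        if score < best_score:
            best_idx, best_score = i, score
    return best_idx

# Hypothetical cluster of three OD pairs and their flows.
cluster = [((0, 0), (10, 0)), ((1, 0), (11, 0)), ((5, 0), (12, 0))]
flows = [100, 300, 50]

# Assumed Euclidean distances between origins and between destinations.
mu_o = select_centroid(cluster, flows, lambda a, b: math.dist(a[0], b[0]))
mu_d = select_centroid(cluster, flows, lambda a, b: math.dist(a[1], b[1]))
```

In this toy cluster the second OD pair is both the most central and the highest-flow member, so it is selected for the origin and the destination centroid alike. Because the two searches are independent, the origin and destination centroids may in general come from different OD pairs.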

3.2.3. Optimal Clustering Scheme Selection

Clustering evaluation metrics are measurement methods used to assess the effectiveness of clustering. The higher the similarity of objects within a cluster and the lower the similarity between clusters, the better the clustering effect. Commonly used clustering effectiveness metrics are internal metrics, which evaluate the quality of clustering based on the distribution characteristics of the data itself. The effectiveness of clustering is primarily evaluated based on two types of metrics: intra-cluster compactness and inter-cluster separation, which correspond to Compactness (CP) and Separation (SP), respectively. Furthermore, the Davies–Bouldin Index (DBI) is employed to comprehensively reflect both intra-cluster compactness and inter-cluster separation.

For the $m^{th}$ clustering scheme with $k$ clusters,

$$cp_{l} = \frac{1}{2 n} \sum_{OD_{i} \in Ω_{l}} [d^{O} (P_{i}^{O}, μ_{l}^{O}) + d^{D} (P_{i}^{D}, μ_{l}^{D})] \tag{6}$$

$$CP^{m} = \frac{1}{k} \sum_{l = 1}^{k} cp_{l} \tag{7}$$

where $CP^{m}$ represents the intra-cluster compactness of the $m^{th}$ scheme, and $cp_{l}$ represents the compactness of the $l^{th}$ cluster of the $m^{th}$ scheme. $d^{O} (P_{i}^{O}, μ_{l}^{O})$ and $d^{D} (P_{i}^{D}, μ_{l}^{D})$ denote the distances from the origin point and the destination point of $OD_{i}$ to the cluster centroids ($μ_{l}^{O}, μ_{l}^{D}$), respectively. $n$ denotes the number of OD pairs within the $l^{th}$ cluster, and $Ω_{l}$ denotes the set of OD pairs within the $l^{th}$ cluster.

$$SP^{m} = \frac{1}{k^{2} - k} \sum_{i = 1}^{k} \sum_{j = 1}^{k} [d (μ_{i}^{O}, μ_{j}^{O}) + d (μ_{i}^{D}, μ_{j}^{D})] \tag{8}$$

where $SP^{m}$ represents the inter-cluster separation of the $m^{th}$ scheme, and $d (μ_{i}^{O}, μ_{j}^{O})$ and $d (μ_{i}^{D}, μ_{j}^{D})$ denote the distances between the origin centroids and between the destination centroids of the $i^{th}$ and $j^{th}$ clusters, respectively.

$$DBI^{m} = \frac{1}{k} \sum_{i = 1}^{k} \max_{j \neq i} \left\{ \frac{cp_{i} + cp_{j}}{d (μ_{i}^{O}, μ_{j}^{O}) + d (μ_{i}^{D}, μ_{j}^{D})} \right\} \tag{9}$$

where $DBI^{m}$ reflects the clustering quality by comparing intra-cluster distance with inter-cluster distance, the maximum is taken over all clusters $j \neq i$, and $cp_{i}$ and $cp_{j}$ represent the compactness of the $i^{th}$ and $j^{th}$ clusters of the $m^{th}$ scheme, respectively.

The smaller the $CP$ value, the better the intra-cluster compactness; the larger the $SP$ value, the better the inter-cluster separation; the smaller the $DBI$ value, the higher the intra-cluster similarity and the lower the inter-cluster similarity, resulting in better clustering performance.
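The three internal metrics can be sketched as follows for a scheme with at least two clusters and distinct centroids. Euclidean distance stands in for $d^{O}$, $d^{D}$, and $d$ (an assumption), the DBI is computed in the standard form with the maximum taken over the other clusters, and all names and sample coordinates are hypothetical.

```python
import math

def compactness(members, centroid):
    """cp_l: mean origin/destination distance of members to the centroid pair."""
    mu_o, mu_d = centroid
    total = sum(math.dist(o, mu_o) + math.dist(d, mu_d) for o, d in members)
    return total / (2 * len(members))

def centroid_gap(c_i, c_j):
    """Origin-centroid distance plus destination-centroid distance."""
    return math.dist(c_i[0], c_j[0]) + math.dist(c_i[1], c_j[1])

def evaluate(clusters, centroids):
    """Return (CP, SP, DBI) for one clustering scheme with k >= 2 clusters."""
    k = len(clusters)
    cp_l = [compactness(m, c) for m, c in zip(clusters, centroids)]
    cp = sum(cp_l) / k
    sp = sum(centroid_gap(centroids[i], centroids[j])
             for i in range(k) for j in range(k)) / (k * k - k)
    dbi = sum(max((cp_l[i] + cp_l[j]) / centroid_gap(centroids[i], centroids[j])
                  for j in range(k) if j != i)
              for i in range(k)) / k
    return cp, sp, dbi

# Two hypothetical clusters, each a list of (origin_xy, destination_xy) pairs.
clusters = [[((0, 0), (10, 0)), ((2, 0), (10, 0))],
            [((0, 10), (10, 10)), ((0, 10), (14, 10))]]
centroids = [((0, 0), (10, 0)), ((0, 10), (10, 10))]
cp, sp, dbi = evaluate(clusters, centroids)
```

For this toy scheme the cluster compactness values are 0.5 and 1.0, the centroid pairs are 20 apart, and the sketch yields CP = 0.75, SP = 20, and DBI = 0.075.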

The entropy weight method is used to determine the weights ($θ_{1}$, $θ_{2}$, $θ_{3}$) of the three internal indicators, thereby obtaining the comprehensive clustering effect score $ρ$ of each scheme.

$$ρ = θ_{1} \cdot CP^{'} + θ_{2} \cdot SP^{'} + θ_{3} \cdot DBI^{'} \tag{10}$$

where $CP^{'}$, $SP^{'}$, and $DBI^{'}$ are the standardized values of $CP$, $SP$, and $DBI$, respectively.
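The entropy weight computation behind Equation (10) can be sketched as below. The paper does not state its standardization, so this sketch assumes direction-aware min-max scaling (smaller is better for CP and DBI, larger is better for SP), under which a higher $ρ$ indicates a better scheme. The metric values are made up for illustration and are not results from the paper.

```python
import math

def standardize(column, larger_is_better):
    """Min-max scale so that 1 is always the best value (assumed convention)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [1.0] * len(column)
    if larger_is_better:
        return [(x - lo) / (hi - lo) for x in column]
    return [(hi - x) / (hi - lo) for x in column]

def entropy_weights(std_columns):
    """Weight each indicator by 1 - entropy; needs >= 2 schemes and some variation."""
    n = len(std_columns[0])
    divergences = []
    for col in std_columns:
        total = sum(col)
        probs = [x / total for x in col]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]

# Made-up metric values for three candidate schemes.
cp = [1000.0, 1500.0, 2000.0]
sp = [20000.0, 15000.0, 10000.0]
dbi = [0.2, 0.5, 0.8]
columns = [standardize(cp, False), standardize(sp, True), standardize(dbi, False)]
theta = entropy_weights(columns)
rho = [sum(t * col[s] for t, col in zip(theta, columns)) for s in range(len(cp))]
```

In this example the first scheme is best on every indicator, so it receives the highest score regardless of the weights.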

3.3. Corridor Identification and Spatial Analysis

The OD pairs obtained through OD clustering are isolated and dispersed. The passenger flow allocation method is employed to process clustered OD pairs, thereby identifying the primary passenger flow paths of these clustered pairs. These paths serve as the constituent units for identifying passenger transport corridors.

The constituent units of passenger transport corridors include location information and passenger flow information. It is necessary to analyze the spatial relationship between the OD flows that constitute passenger transport corridors and urban rail transit lines to more clearly distinguish whether passenger transport corridors are cross-line.

The cluster centers obtained from OD cluster analysis are the basis for identifying passenger transport corridors and are also the objects for passenger flow distribution. After completing the passenger flow distribution, the cluster centers and their main passenger flow paths are regarded as the basic units that constitute passenger transport corridors.

Let $C$ denote the set of OD clustering results and $L$ denote the set of passenger transport corridors. A function $f$ is defined as a mapping from $C$ to $L$, written as $f : C \to L$, if and only if for each element $x \in C$, there exists a unique element $f (x) \in L$ such that $f (x)$ corresponds to $x$.

The mapping $f$ performs passenger flow allocation processing on multiple OD clustering results, which facilitates the calculation of critical quantitative metrics for passenger transport corridors, specifically: (1) the cross-sectional passenger flow volumes within the corridors, and (2) the cumulative mileage of route segments that meet specified passenger flow capacity requirements.

When a single passenger transport corridor component unit already satisfies the corridor identification criteria, this component unit can independently serve as a passenger transport corridor.

When a single passenger transport corridor component unit fails to meet the corridor identification criteria, it is necessary to select multiple connectable passenger transport corridor component units that collectively satisfy the identification standards. Multiple component units can be considered connectable if they meet the conditions of end-to-end connection in the same direction or endpoint connection along the same orientation.

When multiple component units cannot be connected end-to-end or at endpoints, assessment of the disconnected segments is required. If the shortest path length of a disconnected segment is greater than or equal to the length of the shortest passenger transport corridor component unit, these units cannot be considered connectable components. Conversely, if the shortest path length of a disconnected segment is less than the length of the shortest passenger transport corridor component unit, then the multiple component units and the shortest path of their disconnected segments can be treated as connectable passenger transport corridor component units.
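The rule for disconnected segments reduces to a single comparison, sketched below. The function name and the lengths in kilometers are hypothetical.

```python
def connectable(unit_lengths, gap_length):
    """Units separated by a gap are connectable only if the gap's shortest
    path is strictly shorter than the shortest component unit."""
    return gap_length < min(unit_lengths)

# Hypothetical component units of 8 km and 5 km.
bridged = connectable([8.0, 5.0], 3.0)    # gap shorter than the 5 km unit
rejected = connectable([8.0, 5.0], 5.0)   # gap equal to the shortest unit
```

A gap equal in length to the shortest unit is rejected, matching the "greater than or equal to" condition stated above.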

Passenger transport corridors based on OD clustering can be classified as either cross-line or non-cross-line types. The basic units constituting a corridor may consist of either a single unit or multiple units. Specific examples are provided in Table 1.

To ensure reproducibility, the following quantitative thresholds are established:

(1): Minimum cross-sectional flow: ≥10,000 passengers/day.
(2): Corridor length criterion: ≥50% of the corridor length maintains a flow of ≥5000 passengers/day.

These thresholds were determined based on Beijing’s operational standards for high-capacity corridors and validated through consultation with transit planning experts. Different cities may need to adjust these values based on their network scale and operational characteristics.
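A minimal sketch of how the two thresholds might be checked for a candidate corridor follows. It assumes that criterion (1) applies to the peak cross-sectional flow along the corridor and that criterion (2) is measured by section length; the function name and section data are hypothetical.

```python
def is_corridor(sections, peak_min=10_000, flow_min=5_000, length_share=0.5):
    """sections: list of (length_km, daily_flow) along a candidate corridor."""
    total_len = sum(length for length, _ in sections)
    qualifying_len = sum(length for length, flow in sections if flow >= flow_min)
    peak_flow = max(flow for _, flow in sections)
    # Assumed reading: peak section flow meets (1), length share meets (2).
    return peak_flow >= peak_min and qualifying_len >= length_share * total_len

# Hypothetical candidates: 5 of 9 km qualify in the first, 2 of 9 km in the second.
accepted = is_corridor([(2.0, 12_000), (3.0, 6_000), (4.0, 3_000)])
declined = is_corridor([(2.0, 12_000), (7.0, 3_000)])
```

Other cities could pass their own values for `peak_min`, `flow_min`, and `length_share`, in line with the adjustment noted above.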

4. Data and Results

4.1. Case Study Description

Beijing, the capital of China with a population of approximately 22 million, serves as the case study for this research [25]. As one of the world’s largest metropolitan areas, Beijing has developed an extensive multi-modal transportation network to meet its substantial travel demand. The city’s public transportation system handles over 20 million trips daily, with urban rail transit accounting for approximately 60% of public transport ridership, underscoring its critical role in the city’s mobility infrastructure [26]. Table 2 summarizes the key characteristics of Beijing and its urban rail transit system. The network’s extensive coverage, high ridership, and complex transfer patterns make Beijing an ideal case for demonstrating the proposed passenger transport corridor identification method.

4.2. Data Description

We used the all-day Origin–Destination (OD) passenger flow data from Beijing’s urban rail transit system on 21 February 2022 as the dataset for this passenger transport corridor identification study. In the dataset, each record represents OD passenger flow data aggregated at five-minute intervals. After cleaning and preprocessing the data, we selected 21,361 OD passenger flow records and performed cluster analysis on them. Sample records are shown in Table 3. OriginID and DestinationID represent the codes for the origin and destination stations, with each station having a unique identifier; L_od represents the distance between the origin and destination stations, measured in kilometers; T_s represents the start of the record’s statistical interval, such as ‘21 February 2022 04:35:00’, indicating that the data were collected from 4:35 a.m. to 4:40 a.m. on 21 February 2022; and F_od represents the passenger flow.
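The record layout described above can be handled with a short sketch like the one below. The tuple order follows the listed fields, but the timestamp string format, the station codes, and the flow values are assumptions, and summing records to daily totals is shown only as one possible preprocessing step, not as the authors' procedure.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def interval_of(t_s, minutes=5):
    """Return the (start, end) of the statistical interval beginning at T_s."""
    start = datetime.strptime(t_s, "%Y-%m-%d %H:%M:%S")  # assumed format
    return start, start + timedelta(minutes=minutes)

def daily_od_flow(records):
    """Sum F_od over all five-minute records sharing an (OriginID, DestinationID)."""
    totals = defaultdict(int)
    for origin_id, destination_id, _l_od, _t_s, f_od in records:
        totals[(origin_id, destination_id)] += f_od
    return dict(totals)

# Hypothetical records: (OriginID, DestinationID, L_od, T_s, F_od).
records = [
    (101, 205, 12.4, "2022-02-21 04:35:00", 3),
    (101, 205, 12.4, "2022-02-21 04:40:00", 5),
    (205, 101, 12.4, "2022-02-21 04:35:00", 2),
]
daily = daily_od_flow(records)
start, end = interval_of("2022-02-21 04:35:00")
```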

4.3. Identifying Passenger Corridors by OD Clustering

4.3.1. Comparison of Clustering Methods

To select the most appropriate clustering algorithm for OD pair identification, we conducted a comparative analysis of four widely used clustering methods: K-means, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise), and Spectral Clustering. Each method was applied to the same Beijing urban rail transit dataset under comparable parameter settings to evaluate its performance in identifying passenger transport corridors.

K-means is a partition-based algorithm that requires pre-specification of cluster numbers and assumes spherical cluster shapes, which may not suit the irregular spatial distributions of transit OD pairs. DBSCAN can automatically identify clusters of arbitrary shapes based on the density distribution of data points without requiring the number of clusters to be specified in advance. HDBSCAN constructs a cluster hierarchy to extract stable clusters across varying density thresholds. Spectral Clustering uses graph theory to identify non-convex cluster structures but is computationally intensive and parameter-sensitive for large datasets [27].

To ensure fair comparison, we applied identical preprocessing procedures to all methods and tuned their respective parameters to achieve optimal performance. The clustering results were evaluated using the clustering effectiveness metrics defined in Section 3.2.3: Compactness (CP), Separation (SP), Davies–Bouldin Index (DBI), and comprehensive score (ρ).

Table 4 presents the comparative performance of the four clustering methods. The results demonstrate that DBSCAN achieves superior performance across all evaluation metrics. Specifically, DBSCAN attains the lowest CP value (1071.01), indicating the tightest intra-cluster cohesion, and the highest SP value (19,998.01), reflecting the strongest inter-cluster separation. Its DBI value (0.21) is substantially lower than the other methods, suggesting an optimal balance between cluster compactness and separation.
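To make the three base metrics concrete, the sketch below computes CP, SP, and DBI for a toy partition. It uses common textbook definitions (CP as the mean point-to-centroid distance averaged over clusters, SP as the mean pairwise distance between centroids, DBI as the standard Davies–Bouldin ratio) and plain Euclidean distance. Both are assumptions: the exact formulas of Section 3.2.3, the custom OD distance, and the comprehensive score ρ are not reproduced here.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def scatter(points):
    """Average distance from the points of one cluster to its centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def compactness(clusters):
    """CP: cluster scatter averaged over all clusters (lower is tighter)."""
    return sum(scatter(pts) for pts in clusters) / len(clusters)


def separation(clusters):
    """SP: mean pairwise distance between centroids (higher is better)."""
    cs = [centroid(pts) for pts in clusters]
    pairs = [(i, j) for i in range(len(cs)) for j in range(i + 1, len(cs))]
    return sum(math.dist(cs[i], cs[j]) for i, j in pairs) / len(pairs)


def davies_bouldin(clusters):
    """DBI: mean over clusters of the worst (s_i + s_j) / d(c_i, c_j)."""
    cs = [centroid(pts) for pts in clusters]
    s = [scatter(pts) for pts in clusters]
    k = len(clusters)
    worst = [
        max((s[i] + s[j]) / math.dist(cs[i], cs[j]) for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k


# Toy partition: two tight clusters that lie far apart.
toy = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(compactness(toy), separation(toy), davies_bouldin(toy))
```

A partition with low CP, high SP, and low DBI, as reported for DBSCAN in Table 4, corresponds to tight clusters that lie far apart.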

The superior performance of DBSCAN can be attributed to several factors. First, unlike K-means, DBSCAN does not require pre-specification of cluster numbers, allowing it to adaptively identify clusters based on data density characteristics. Second, DBSCAN effectively handles clusters of arbitrary shapes and varying densities, which is crucial for capturing the complex spatial patterns of OD pairs in urban rail networks. Third, DBSCAN’s ability to identify and exclude noise points prevents the inclusion of sporadic, low-frequency trips that could distort corridor identification. While HDBSCAN offers hierarchical clustering capabilities, its performance in this application context is constrained by the relatively uniform density distribution of high-volume OD pairs. Spectral Clustering, despite its theoretical advantages in detecting non-convex structures, exhibits computational inefficiency and parameter sensitivity that limit its practical applicability to large-scale transit data.

Based on these findings, DBSCAN was selected as the optimal clustering algorithm for passenger transport corridor identification in this study.

4.3.2. OD Clustering Based on DBSCAN

Building upon the comparative analysis presented in Section 4.3.1, this section details the implementation of DBSCAN for OD clustering. DBSCAN is a density-based clustering method [28]. It includes two clustering parameters, namely the neighborhood radius eps and the minimum number of points minpts, which are used to reflect different degrees of density.
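The role of the two parameters can be illustrated with a compact, generic DBSCAN that works on a precomputed distance matrix, which is the form a custom OD distance would naturally take. This is a textbook sketch on a hypothetical one-dimensional toy dataset, not the implementation used in the study.

```python
def dbscan(dist, eps, minpts):
    """Cluster points from a symmetric distance matrix; -1 marks noise."""
    n = len(dist)
    NOISE = -1
    labels = [None] * n  # None means "not yet visited"

    def neighbours(i):
        # eps-neighbourhood of point i, including i itself
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < minpts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= minpts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical 1-D toy data: two dense groups and one isolated point.
xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 9.0]
dist = [[abs(a - b) for b in xs] for a in xs]
labels = dbscan(dist, eps=0.15, minpts=3)
print(labels)
```

Raising eps or lowering minpts makes it easier for a point to qualify as a core point, which tends to merge clusters and shrink the noise set; the opposite change fragments the data.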

Using DBSCAN for OD clustering, we evaluated candidate clustering schemes for different values of eps and minpts, as shown in Table 5. According to the scores ρ in Table 5, scheme 5 is the optimal OD clustering result. Figure 2 shows this optimal result, which contains 21 clusters with eps = 0.46 and minpts = 980. The clustered OD volume ranges from 2000 to 300,000 passenger trips, with trip distances spanning 2 to 40 km.
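The selection rule can be checked directly against the values in Table 5. The short Python sketch below copies the six candidate schemes from that table and picks the one with the highest score ρ; the tuple layout and variable names are illustrative.

```python
# (scheme, eps, minpts, clusters, CP, SP, DBI, score rho), copied from Table 5.
SCHEMES = [
    (1, 0.47, 850, 24, 386.94, 2973.11, 6.43, 0.747),
    (2, 0.46, 990, 24, 467.95, 3021.50, 16.30, 0.218),
    (3, 0.46, 920, 26, 415.77, 3160.80, 22.05, 0.189),
    (4, 0.47, 960, 21, 430.82, 3601.50, 10.12, 0.710),
    (5, 0.46, 980, 21, 425.96, 3213.96, 4.93, 0.780),
    (6, 0.48, 1130, 22, 456.70, 3531.20, 10.09, 0.625),
]

# Select by the comprehensive score rho (last column); higher is better.
best = max(SCHEMES, key=lambda row: row[7])
# For comparison, the scheme with the lowest Davies-Bouldin index.
lowest_dbi = min(SCHEMES, key=lambda row: row[6])

print("best by rho:", best[0], "eps =", best[1], "minpts =", best[2])
print("lowest DBI:", lowest_dbi[0])
```

In this table the scheme with the highest ρ is also the one with the lowest DBI, although it has neither the lowest CP (scheme 1) nor the highest SP (scheme 4).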

Figure 3 displays the clustering results for the top 800 OD pairs with the highest volume, including the number of OD flows in each cluster. Combining Figure 2 and Figure 3 reveals that this scheme identifies eight distinct travel patterns with clear operational significance:

1. Short-distance commuting within the urban core: average trip distance of 4.2 km, connecting major residential areas with the CBD;
2. Long-distance inter-district commuting corridors: average trip distance of 11.8 km, reflecting urban spatial structure characteristics;
3. Peripheral new town to urban core commuting: average trip distance of 15.3 km, indicating urban expansion characteristics;
4. Transportation hub distribution trips: connecting major transportation nodes, including airports and high-speed rail stations;
5. Industrial park commuting corridors: serving commuting demands of major industrial agglomeration areas;
6. Educational campus commuting patterns: connecting trips to university towns and other educational facilities;
7. Commercial center shopping travel patterns: non-commuting trips connecting major commercial centers;
8. Cross-regional long-distance travel patterns: inter-city trips with neighboring municipalities.

It is worth noting that commuting-related patterns (Patterns 1, 2, 3, 5, and 6) exhibit distinct temporal characteristics with tidal flow patterns, featuring concentrated passenger flows during morning and evening peak hours. Additionally, this scheme identified 206 noise point pairs, primarily consisting of sporadic trips and temporary irregular travel patterns.

4.3.3. Identification Results and Spatial Analysis of Passenger Transport Corridors

A multi-path Logit model was used to allocate passenger flows based on the OD clustering results. From the OD clustering, 21 pairs of clustered OD centers were identified, and 105 valid paths were detected. Take the Fengbo–Dongzhimen OD pair as an example. For this OD pair, 5 valid paths were identified, with the following selection probabilities: Path 1 at 24.8%, Path 2 at 22.2%, Path 3 at 29%, Path 4 at 17.3%, and Path 5 at 16.7%, as shown in Figure 4. This demonstrates the diversity of potential routes available to passengers travelling between these two key stations. Based on the passenger flow distribution results and the spatial distribution of actual passenger trips reflected in the clustered ODs, six passenger transport corridors, comprising both cross-line and non-cross-line corridors, were finally determined, as shown in Figure 5. The shades of color within the corridors indicate the intensity of passenger flow along each route, with darker hues representing higher volumes of passengers choosing that corridor for their trips. The specific corridor results are shown in Table 6.
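The allocation step can be sketched with a generic multinomial logit split, in which each valid path receives a share proportional to exp(−θ × cost) and the shares sum to one by construction. The path costs, the dispersion parameter theta, and the OD flow below are hypothetical; the cost terms and calibration actually used in the study are not reproduced here.

```python
import math


def logit_shares(costs, theta=1.0):
    """Multinomial logit: share_k = exp(-theta*c_k) / sum_j exp(-theta*c_j)."""
    base = min(costs)  # shift by the minimum cost for numerical stability
    weights = [math.exp(-theta * (c - base)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def assign_flow(od_flow, costs, theta=1.0):
    """Split one OD flow over its valid paths according to the logit shares."""
    return [od_flow * share for share in logit_shares(costs, theta)]


# Hypothetical generalized costs (minutes) for five valid paths of one OD pair.
path_costs = [52.0, 54.0, 50.0, 57.0, 58.0]
shares = logit_shares(path_costs, theta=0.1)
flows = assign_flow(10_000, path_costs, theta=0.1)

for k, (share, flow) in enumerate(zip(shares, flows), start=1):
    print(f"Path {k}: {share:.1%} of trips, about {flow:.0f} passengers")
```

A larger theta concentrates the flow on the cheapest path, while theta close to zero spreads it almost evenly across all valid paths.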

Take the Tuqiao–Tiantongyuan North corridor as an example (Figure 6). This corridor primarily handles commuter flows between the city’s sub-center (Tongzhou District), the CBD (Chaoyang District), and large residential areas (Changping District). The maximum cross-sectional passenger flow reaches approximately 80,000 passengers, with over 85% of the corridor’s length having cross-sectional flows exceeding 10,000 passengers and 40% of the corridor having flows of more than 50,000 passengers. The corridor is formed by three clustered ODs: Tuqiao–Dawanglu, Dawanglu–Datunlu Dong, and Datunlu Dong–Tiantongyuan North. Among these, Dawanglu–Datunlu Dong is a cross-line clustered OD pair with significant passenger intensity. This OD involves Lines 1 and 5, with Dongdan Station serving as the interchange station for both lines and bearing substantial transfer pressure.
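Indicators of this kind (maximum cross-sectional flow and the share of corridor length above a flow threshold) can be derived from per-section data as in the following sketch. The section lengths and flows are hypothetical values chosen only to illustrate the arithmetic; they are not the data behind Table 6.

```python
def corridor_profile(sections, threshold):
    """Summarize a corridor given (length_km, cross_sectional_flow) sections."""
    total_length = sum(length for length, _ in sections)
    length_above = sum(length for length, flow in sections if flow > threshold)
    return {
        "length_km": total_length,
        "max_flow": max(flow for _, flow in sections),
        "share_above": length_above / total_length,
    }


# Hypothetical corridor made of four consecutive sections.
sections = [(2.0, 8_000), (3.0, 52_000), (1.0, 80_000), (4.0, 30_000)]

over_10k = corridor_profile(sections, threshold=10_000)
over_50k = corridor_profile(sections, threshold=50_000)
print(over_10k["max_flow"], over_10k["share_above"], over_50k["share_above"])
```

Weighting by section length rather than counting sections keeps the indicator comparable between corridors with uneven station spacing.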

Given the corridor’s route and surrounding land use characteristics—including the Central Business District (CBD), Wangfujing Pedestrian Street, and the large residential area of Tiantongyuan—it exhibits strong attractiveness and tidal passenger flow patterns, with significant cross-line travel demand. Therefore, implementing cross-line modifications at transfer stations with high pressure would not only enhance the direct connectivity of the rail network and alleviate operational burdens at these stations but also help relieve passenger pressure on adjacent lines (such as Lines 1 and 5), thereby promoting the interconnectivity of the urban rail transit network.

4.4. Sensitivity Analysis of eps on Clustering Results

4.4.1. Test Scope and Parameter Settings

The core parameter eps (neighborhood radius) of the DBSCAN algorithm significantly influences clustering results. Minor variations in the eps value may lead to differences in cluster numbers and cluster structures, thereby affecting the accuracy and reliability of key travel OD pair identification. To evaluate the impact of the eps parameter on clustering performance, a sensitivity analysis was conducted on the eps parameter [29].

OD pairs with daily travel frequency ≥ 68 were selected as the analysis subjects to ensure data representativeness and stability. To balance computational efficiency with analytical precision, a random sampling strategy was employed to control the data scale within 1000 records. To ensure representativeness, the sample was stratified by time period, spatial distribution, and flow magnitude. The minimum points parameter (minpts) was fixed at 50, while the eps parameter range was set to [0.20, 0.99] with a step size of 0.01, resulting in 80 test points.
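The test design can be sketched as follows: build the eps grid, filter OD pairs by daily frequency, and draw a proportionally stratified sample capped at 1000 records. The synthetic OD pairs and the single flow-band stratum are hypothetical simplifications; the study stratifies by time period, spatial distribution, and flow magnitude, and its sampling code is not given in the text.

```python
import random

MIN_DAILY_TRIPS = 68  # OD pairs below this daily frequency are excluded
MAX_SAMPLE = 1000     # upper bound on the number of sampled records


def eps_grid(start=0.20, stop=0.99, step=0.01):
    """Evenly spaced eps values from start to stop, both ends included."""
    count = round((stop - start) / step) + 1
    return [round(start + k * step, 2) for k in range(count)]


def stratified_sample(records, stratum_of, max_size, seed=0):
    """Proportional stratified sample of at most max_size records."""
    if len(records) <= max_size:
        return list(records)
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    strata = {}
    for record in records:
        strata.setdefault(stratum_of(record), []).append(record)
    sample = []
    for members in strata.values():
        quota = max(1, int(max_size * len(members) / len(records)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample[:max_size]


# Synthetic OD pairs with daily trip counts between 50 and 149 (hypothetical).
od_pairs = [{"od": i, "trips": 50 + i % 100} for i in range(3000)]
eligible = [r for r in od_pairs if r["trips"] >= MIN_DAILY_TRIPS]
sample = stratified_sample(eligible, lambda r: r["trips"] // 50, MAX_SAMPLE)

grid = eps_grid()
print(len(grid), grid[0], grid[-1])
print(len(eligible), len(sample))
```

Each eps value in the grid would then be paired with the fixed minpts = 50 and passed to the clustering step on the same sample, so that differences between runs are attributable to eps alone.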

4.4.2. Sensitivity Analysis Results

The DBSCAN algorithm was executed for each eps value, and five indicators were calculated: cluster number, compactness indicator (CP), separation indicator (SP), Davies–Bouldin index (DBI), and comprehensive score ρ. The influence of the parameter on these indicators is shown in Figure 7.

Experimental results demonstrate that cluster numbers exhibit an exponential decline trend as eps values increase. When eps < 0.50, the cluster number exceeds 800, indicating evident over-segmentation phenomena. When eps > 0.60, cluster numbers decrease to fewer than 3, potentially causing erroneous merging of different travel OD pairs. The compactness indicator decreases as eps values increase, which aligns with theoretical expectations. When eps < 0.50, CP > 1.0, indicating excessively loose intra-cluster structures. The rapid growth point of the compactness indicator occurs around eps = 0.50, suggesting this region represents a parameter-sensitive interval. The separation indicator increases in stages as eps values increase, exhibiting a mirror relationship with the compactness indicator. When eps > 0.50, the indicator begins to change significantly. When SP approaches 1, inter-cluster separation is good but may result in over-segmentation. The DBI variation curve exhibits a fluctuating distribution pattern. The minimum value is reached around eps = 0.52, indicating that clustering results achieve an optimal balance between compactness and separation at this point. When eps deviates from this optimal value, DBI increases significantly, and clustering quality deteriorates markedly. The comprehensive score exhibits a gradual upward trend as eps values change, with the peak occurring around eps = 0.6. This indicates the existence of an optimal eps value that effectively balances intra-cluster compactness and inter-cluster separation, achieving optimal OD clustering performance.

Furthermore, as shown in Figure 8, the indicators influence one another, and the cluster number correlates strongly with intra-cluster compactness and inter-cluster separation. No single indicator can therefore be optimized in isolation. Introducing the comprehensive score into the clustering performance evaluation avoids this limitation and helps control overall clustering quality.
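A correlation analysis between indicators can be sketched with a plain Pearson coefficient computed over the eps sweep. The text does not state which correlation measure underlies Figure 8, so Pearson is an assumption here, and the indicator values below are hypothetical placeholders rather than results from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(series):
    """Pairwise Pearson coefficients for a dict of named indicator series."""
    names = list(series)
    return {(a, b): pearson(series[a], series[b]) for a in names for b in names}


# Hypothetical indicator values, one entry per tested eps value.
indicators = {
    "clusters": [820, 640, 310, 45, 6, 2],
    "CP": [1.40, 1.25, 1.05, 0.70, 0.45, 0.40],
    "SP": [0.10, 0.18, 0.35, 0.62, 0.88, 0.95],
}

matrix = correlation_matrix(indicators)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {r:+.2f}")
```

Strong pairwise correlations of this kind are the reason a single combined score is more informative than tuning each indicator separately.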

5. Conclusions and Discussion

5.1. Conclusions

This study presents a comprehensive analysis of travel demand intensity between origins and destinations in urban rail transit networks, addressing a critical challenge in urban transportation planning and management. The proposed Origin–Destination (OD) clustering method introduces a novel approach that advances beyond traditional clustering techniques by incorporating domain-specific characteristics of transit networks.

The customized similarity calculation method developed in this research represents a significant methodological advancement. Unlike conventional clustering approaches, our method employs a tailored distance function that generates clustering center pairs with concrete physical meaning, directly corresponding to actual station locations within the network. This innovation enables more accurate and interpretable results that align with the operational realities of urban rail systems. The establishment of a mapping relationship between passenger transport corridors and clustered OD pairs provides a systematic framework for corridor identification, bridging the gap between data-driven analysis and practical transportation planning.

The application to Beijing’s urban rail transit system demonstrates the method’s practical viability and reveals several important insights. The identified passenger transport corridors not only exhibit high-concentration passenger flows but also provide valuable information about transfer patterns within the network. By incorporating spatial location information of clustering OD pairs, the method successfully captures the complexity of passenger movement patterns, including multi-line journeys and transfer behaviors. These findings have direct implications for network planning, capacity allocation, and service optimization strategies.

This research contributes to the theoretical understanding of urban mobility patterns by providing a quantitative framework for analyzing the hierarchical structure of travel demand in complex transit networks. The method offers a new perspective on understanding the relationship between network topology and passenger flow distribution, which is essential for developing more efficient and responsive urban transportation systems.

5.2. Discussion

While this study provides valuable contributions to passenger transport corridor identification, several limitations should be acknowledged, which also point to promising directions for future research.

This study utilizes OD flow data from Beijing’s urban rail transit system during a specific time period, which may not fully capture seasonal variations or long-term evolutionary trends in travel patterns. The analysis focuses exclusively on urban rail transit data without incorporating information from other transportation modes. Future research should integrate multi-modal transportation data, including buses, shared bicycles, and ride-hailing services, to achieve a more comprehensive understanding of urban mobility corridors and seamless connections between different transport modes.

Although the DBSCAN clustering method offers flexibility through adjustable parameters, the eps parameter significantly affects metrics such as the number of clusters and clustering quality, and different eps values can yield entirely distinct clustering results. Due to computational resource constraints, the sensitivity analysis experiments used a subset of 1000 samples. Although this sample represents only approximately 5% of the total dataset, the sensitivity analysis results indicate that the observed trends in clustering performance metrics align with those of the full dataset. However, we acknowledge that a larger sample size would provide more robust validation of the sensitivity results. Future research will focus on validating the stability of the sensitivity analysis results on full datasets while optimizing the joint parameter space of eps and minpts. The method’s universality will also be verified across diverse urban datasets to enhance the robustness of passenger transport corridor identification based on OD clustering.

Although this method successfully identifies passenger flow corridors based on core OD data, the actual implementation of through-service operations or connectivity improvements must consider additional operational constraints beyond the scope of this study. These include vehicle compatibility, signal system integration, depot locations, crew scheduling, and operational safety requirements. Future research should establish an integrated framework to bridge the gap between analytical corridor identification and operational implementation.

Looking ahead, this study will expand into multi-modal transportation integration, as urban mobility increasingly relies on seamless connections between different transport modes. Subsequent research will delve into other factors influencing passenger corridor formation, including temporal variability (such as peak-hour vs. off-peak patterns), socioeconomic characteristics, and land use patterns. These extensions will not only improve the quantitative precision of corridor identification but also provide comprehensive theoretical guidance for enhancing urban public transit connectivity. Furthermore, the insights gained will support the development of innovative operational strategies, such as cross-line train services and dynamic capacity allocation, ultimately contributing to the construction of more sustainable and efficient urban transportation systems.

Author Contributions

Conceptualization, F.Z. and H.Y.; methodology, F.Z. and H.Y.; validation, F.Z. and J.Y.; formal analysis, F.Z.; investigation, F.Z. and J.Y.; data curation, F.Z. and J.Y.; writing—original draft, F.Z.; writing—review and editing, J.Y.; visualization, F.Z.; supervision, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Beijing Natural Science Foundation, grant number L221006.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available from the first author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

OD	Origin–Destination
CP	Compactness
SP	Separation
DBI	Davies–Bouldin Index
DBSCAN	Density-based spatial clustering of applications with noise
CBD	Central Business District

Nomenclature

Symbol	Definition
$OD_{i}$	The $i$-th Origin–Destination (OD) pair
$P_{i}^{O}$, $P_{i}^{D}$	Origin and destination points of $OD_{i}$
$L_{ij}^{O}$	Actual travel distance between the origin stations of OD pairs $i$ and $j$ along the urban rail transit network
$L_{ij}^{D}$	Actual travel distance between the destination stations of OD pairs $i$ and $j$ along the urban rail transit network
$\gamma_{ij}$	Path overlap ratio between OD pairs $i$ and $j$
$S_{i}$	Total number of stations on the path of OD pair $i$
$S_{ij}$	Number of stations shared by OD pairs $i$ and $j$
$Y_{ij}^{O}$, $Y_{ij}^{D}$	Normalized distance variables
$R_{ij}$	Inverse-processed overlap variable ($1 - \gamma_{ij}$)
$\alpha$, $\beta$, $\delta$	Weighting parameters
$D_{ij}$	Custom composite distance between OD pairs $i$ and $j$
$\mu^{O}$, $\mu^{D}$	Cluster centroids (origin and destination)
$V_{i}$	Passenger flow of OD pair $i$
$m$	Number of OD pairs within a cluster

References

1. Yu, C.; Dong, W.; Lin, H.; Lu, Y.; Wan, C.; Yin, Y.; Qin, Z.; Yang, C.; Yuan, Q. Multi-layer regional railway network and equitable economic development of megaregions. npj Sustain. Mobil. Transp. 2025, 2, 3.
2. Verano-Tacoronte, D.; Flores-Ureba, S.; Mesa-Mendoza, M.; Llorente-Muñoz, V. Evolution of scientific production on urban passenger transport: A bibliometric analysis. Eur. Res. Manag. Bus. Econ. 2024, 30, 100239.
3. Ji, Y.; Huang, Y.; Yang, M.; Leng, H.; Ren, L.; Liu, H.; Chen, Y. Physics-informed deep learning for virtual rail train trajectory following control. Reliab. Eng. Syst. Saf. 2025, 261, 111092.
4. Qi, G.; Li, X.; Li, S.; Pan, G.; Wang, Z.; Zhang, D. Measuring social functions of city regions from large-scale taxi behaviors. In Proceedings of the 2011 IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops), Seattle, WA, USA, 21–25 March 2011; IEEE: New York, NY, USA, 2011; pp. 384–388.
5. Gorev, A.E.; Solodkii, A.I.; Popova, O.V.; Ospanov, D.T. Formation of priority movement corridors of urban passenger transport. Proc. IOP Conf. Ser. Mater. Sci. Eng. 2019, 632, 012013.
6. Gibert, A.; Griffith, T. Urban geography. A study of site, evolution, pattern, and classifications in villages, towns and cities. Géocarrefour 1950, 25, 247–248.
7. Whebell, C.F.J. Corridors: A theory of urban systems. Ann. Assoc. Am. Geogr. 1969, 59, 1–26.
8. Bertolini, L. Spatial development patterns and public transport: The application of an analytical model in the Netherlands. Plan. Pract. Res. 1999, 14, 199–210.
9. Chorus, P.; Bertolini, L. Developing transit-oriented corridors: Insights from Tokyo. Int. J. Sustain. Transp. 2016, 10, 86–95.
10. Guzman, L.A.; Cardona, S.G. Density-oriented public transport corridors: Decoding their influence on BRT ridership at station-level and time-slot in Bogotá. Cities 2021, 110, 103071.
11. Liu, L.; Zhang, M.; Xu, T. A conceptual framework and implementation tool for land use planning for corridor transit oriented development. Cities 2020, 107, 102939.
12. Zhong, C.; Arisona, S.M.; Huang, X.; Batty, M.; Schmitt, G. Detecting the dynamics of urban structure through spatial network analysis. Int. J. Geogr. Inf. Sci. 2014, 28, 2178–2199.
13. Sun, L.; Axhausen, K.W. Understanding urban mobility patterns with a probabilistic tensor factorization framework. Transp. Res. Part B Methodol. 2016, 91, 511–524.
14. Zhang, T.; Li, Y.; Yang, H.; Cui, C.; Li, J.; Qiao, Q. Identifying primary public transit corridors using multi-source big transit data. Int. J. Geogr. Inf. Sci. 2020, 34, 1137–1161.
15. Bahbouh, K.; Wagner, J.R.; Morency, C.; Berdier, C. Travel demand corridors: Modelling approach and relevance in the planning process. J. Transp. Geogr. 2017, 58, 196–208.
16. Zhang, P.; Ma, W.; Qian, S. Cluster analysis of day-to-day traffic data in networks. Transp. Res. Part C Emerg. Technol. 2022, 144, 103882.
17. Jiang, Z.; Evans, M.; Oliver, D.; Shekhar, S. Identifying K Primary Corridors from urban bicycle GPS trajectories on a road network. Inf. Syst. 2016, 57, 142–159.
18. Evans, M.; Oliver, D.; Shekhar, S.; Harvey, F. Summarizing trajectories into k-primary corridors: A summary of results. In Proceedings of the International Conference on Advances in Geographic Information Systems, Redondo Beach, CA, USA, 6–9 November 2012; ACM: New York, NY, USA, 2012.
19. Yin, W.; Zhang, Y. Identification method for optimal urban bus corridor location. Sustainability 2020, 12, 7167.
20. Pechkurov, I.; Plotnikov, D.; Gorev, A.; Kudryavtseva, T.; Banite, A.; Skhvediani, A. Development of a Method for Selecting Bus Rapid Transit Corridors Based on the Economically Viable Passenger Flow Criterion. Sustainability 2023, 15, 2391.
21. Rocha, M.; Silva, C.A.M.; Junior, R.G.S.; Anzanello, M.; Yamashit, G.; Lindau, L.A. Selecting the most relevant variables towards clustering bus priority corridors. Public Transp. 2020, 12, 587–609.
22. Zheng, H.; Gao, S.; Cai, C.; Zheng, H.; Pan, Z.; Li, W. A rapid density method for taxi passengers hot spot recognition and visualization based on DBSCAN+. Sci. Rep. 2021, 11, 9420.
23. Zheng, L.; Feng, Q.; Liu, W.; Zhao, X. Discovering Trip Hot Routes Using Large Scale Taxi Trajectory Data. In Advanced Data Mining and Applications, Proceedings of the International Conference on Advanced Data Mining and Applications, Sydney, Australia, 3–5 December 2024; Springer: Cham, Switzerland, 2016.
24. Raveau, S.; Guo, Z.; Muñoz, J.C.; Wilson, N.H. A behavioural comparison of route choice on metro networks: Time, transfers, crowding, topology and socio-demographics. Transp. Res. Part A Policy Pract. 2014, 66, 185–195.
25. Beijing Municipal Bureau of Statistics. Beijing Statistical Yearbook 2021; China Statistics Press: Beijing, China, 2021.
26. Beijing Transport Institute. Beijing Transportation Development Annual Report 2021; Beijing Transport Institute: Beijing, China, 2021.
27. Jain, A.K.; Dubes, R.C. Algorithms for Clustering Data; Prentice-Hall Inc.: Englewood Cliffs, NJ, USA, 1988.
28. Rodriguez, A.; Laio, A. Clustering by fast search and find of density peaks. Science 2014, 344, 1492–1496.
29. Li, J.; Zhang, K.; Li, J.; Ji, Y. Critical threshold of the differential pressure valve in the air spring system of straddle monorail vehicles. J. Vib. Control 2025, 10775463251341368.

Figure 1. Flowchart of passenger transport corridor identification based on OD clustering.

Figure 2. OD clustering results based on DBSCAN in urban rail transit dataset.

Figure 3. OD clustering results (top 800 OD flow data).

Figure 4. Effective path matching (Fengbo–Dongzhimen).

Figure 5. Beijing urban rail transit all-day passenger corridor identification results.

Figure 6. Composition of clustered OD pairs for cross-line passenger transport corridors.

Figure 7. Changes in Cluster Numbers.

Figure 8. Sensitivity analysis results: (a) Changes in clustering performance metrics; (b) correlation between metrics.

Table 1. Types of passenger corridors based on OD clustering results.

Type Diagram	Number of Passenger Transport Corridor Units	Whether It Is a Cross-Line
	Single	Non-crossing
	Single	Crossing
	Multiple (Non-Overlapping)	Non-crossing
	Multiple (Overlapping)	Non-crossing
	Multiple (Non-Overlapping)	Crossing
	Multiple (Overlapping)	Crossing

Table 2. Characteristics of Beijing and its urban rail transit network.

Category	Item	Value
City Characteristics	Population	22 million
	Urban area	16,410 km²
	Daily public transport trips	20 million
Urban Rail Transit Network	Number of lines	25
	Total length	783 km
	Number of stations	459
	Transfer stations	72
	Average daily ridership	6.2 million
	Annual ridership	2.3 billion

Table 3. Sample data from Beijing’s urban rail transit.

OriginID	DestinationID	L_od	T_s	F_od
S01002	S02002	10	21 February 2022 04:35:00	1
S01003	S03001	15	21 February 2022 05:30:00	10
S02003	S02004	20	21 February 2022 07:30:00	100

Table 4. Comparative performance of clustering methods for OD pair analysis.

Clustering Method	CP	SP	DBI	Score ρ
K-means	1629.39	2170.33	3.29	0.001
DBSCAN	1071.01	19,998.01	0.21	0.998
HDBSCAN	1424.67	3746.66	3.18	0.118
Spectral Clustering	1479.27	3000.80	3.05	0.102

Table 5. Pre-selected set of scenarios for OD clustering results.

Scheme	eps	minpts	Clusters	CP	SP	DBI	Score ρ
1	0.47	850	24	386.94	2973.11	6.43	0.747
2	0.46	990	24	467.95	3021.50	16.30	0.218
3	0.46	920	26	415.77	3160.80	22.05	0.189
4	0.47	960	21	430.82	3601.50	10.12	0.710
5	0.46	980	21	425.96	3213.96	4.93	0.780
6	0.48	1130	22	456.70	3531.20	10.09	0.625

Table 6. Results of passenger transport corridor identification for Beijing urban rail transit.

Corridor	Type	Length (km)	Max Flow (pax/day)	% Length > 5 k pax/day	Key Stations
Tuqiao—Tiantongyuan North	Cross-line	44.3	80,000	88%	Dawang Lu, Guomao, Dongdan, Datunlu Dong, Tiantongyuan North
Lucheng—Haidian Huangzhuang	Cross-line	48.8	62,000	72%	Jintai Lu, Chaoyang Men, Gulou Dajie, Beitucheng, Zhichun Lu, Haidian Huangzhuang
Fengbo—Dongzhi Men	Cross-line	42.5	45,000	67%	Fengbo, Wangjing Xi (W), Wangjing Dong (E), Dongzhi Men
Dawang Lu—Jinan Qiao	Non-cross-line	2.3	29,000	69%	Dawang Lu, Guomao, Wangfujing, Military Museum, Jinan Qiao
Huangcun Railway Station—Qinghe Station	Cross-line	36.1	38,000	57%	Jiaomen Xi (W), Beijing South Railway Station, Xidan, Xizhimen, Qinghe Station
Liangxiang Univ. Town—Weigong Cun	Cross-line	40.1	13,000	53%	Guogongzhuang, Beijing West Railway Station, National Library, Weigong Cun

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhou, F.; Yao, J.; Yin, H. Identifying the Passenger Transport Corridors in an Urban Rail Transit Network Based on OD Clustering. Sustainability 2025, 17, 9127. https://doi.org/10.3390/su17209127

