1. Introduction
Urban rail transit (URT) has become a prevailing travel mode in major cities worldwide, contributing to compact urban forms, relieving traffic congestion, and reducing emissions [1,2]. The global URT infrastructure has undergone rapid development in recent decades, a trend markedly exemplified by the expansion in Beijing, China, where 27 URT lines and over 400 URT stations are now in operation [3,4]. Meanwhile, with the construction of URT lines and stations, station areas gradually develop distinct functions such as residential, employment, or commercial clusters. Station areas are designed to deliver specific urban services, and their functionality is shaped by the interplay between daily human activities and the built environment [5,6]. Because these functions evolve with planning adjustments and behavioral shifts, accurately identifying their current functional characteristics is crucial for effective urban planning and management.
In the context of global efforts toward sustainable urban development, transit-oriented development (TOD) has emerged as a dominant paradigm, which emphasizes high-density, mixed-use development to curb urban sprawl and reduce carbon emissions. Identifying station-area functions is not merely an academic exercise; it is fundamental to realizing the goals of TOD [7,8]. It enables the precise optimization of the last mile, a critical yet often inefficient segment of urban travel, ensuring that URT systems truly function as the backbone of a compact urban form. However, traditional methods for identifying functional areas are mostly based on static land-use classifications, which fail to capture functional dynamics as they ignore facility density and human activity patterns [9]. A commonly employed framework is the Node-Place (N-P) model developed by Bertolini [10] in 1996, which has since been enriched through the incorporation of new dimensions and indicators in various contexts [11,12,13]. Nevertheless, the N-P model primarily focuses on built environment metrics reflected by point of interest (POI) data, often neglecting dynamic human-environment interactions. While traditional data sources have limitations, emerging mobility data offer potential solutions. Recent studies have used cellular data and smart card records to capture these interactions [14,15,16,17]. However, the accuracy of cellular signaling data is constrained by the coarse spatial resolution of cell towers, which fails to capture fine-grained activities within station areas. Similarly, smart card data are typically confined to transactions within the station; their integration with multi-source heterogeneous datasets remains challenging, particularly for tracing multi-modal journeys beyond the station gates.
In recent years, the rise of dockless bike sharing (DBS) has opened a new path for discovering station-area functions through travel behavior. Due to their flexibility and low cost, DBS systems have expanded globally, serving millions of daily trips and providing station-free, accessible first/last-mile connections to public transport [18,19,20]. Users generate detailed spatiotemporal data by renting and returning bikes near their trip origins and destinations. These data can help infer the functional characteristics of an area from the spatiotemporal patterns of trip origins and destinations. For example, a station area with high DBS density in the evening and low density during the day is likely residential, whereas an area with the opposite pattern is likely employment-oriented [9,21].
To bridge the abovementioned gaps, this study develops a framework that integrates the dynamic spatiotemporal patterns of DBS (specifically day–night variations) with static POI configurations to identify and characterize the functional types of URT station areas in Beijing. In contrast to prior studies that primarily used DBS for city-scale land-use classification or generic travel pattern recognition, we focus on station-level functional heterogeneity by quantifying the day–night differentials in DBS usage. Our key contribution is the use of day–night DBS density and distribution ratios to infer functional shifts at the station level. This dynamic dimension is often overlooked in traditional models that rely on static built-environment indicators or coarse-grained mobility data, which lack the spatiotemporal resolution to capture station-area-specific dynamics. Accordingly, we formulate the following research questions (RQs) and hypotheses (H) to guide our inquiry.
RQ1: Can the spatiotemporal patterns of dockless bike sharing (DBS), specifically day–night variations in density and parking distribution, be used to classify urban rail transit (URT) stations into functionally distinct types?
H1: URT stations can be classified into distinct functional types based on their spatiotemporal DBS usage patterns (e.g., nighttime density, daytime parking distance, and their day–night ratios). This hypothesis is testable via cluster analysis and is policy-relevant for identifying stations with unique mobility signatures that require tailored infrastructure planning.
RQ2: What are the characteristic differences in DBS usage patterns (e.g., intensity, concentration, and temporal variation) across the different station clusters identified?
H2: There will be statistically significant differences in DBS usage metrics among the different station clusters. We hypothesize directional patterns, such as urban core clusters exhibiting higher DBS density and more concentrated parking than suburban clusters. Confirming this helps calibrate last-mile services according to local demand patterns.
RQ3: Do the station types identified by DBS data exhibit significant differences in their static built environment, as represented by POI configurations?
H3: The clusters derived from DBS data will show statistically significant differences in their POI profiles. Specifically, we hypothesize that clusters with high daytime DBS density will correlate with higher office POI density, while those with high nighttime density will correlate with higher residential POI density. Validating this hypothesis bridges the gap between dynamic mobility behavior and static land use, enabling a more holistic understanding of station-area functions.
The answers to these questions are not merely academic; they provide actionable insights for optimizing bicycle infrastructure planning, calibrating last-mile connections, and fostering TOD principles, ultimately enhancing URT-integrated sustainable urban development.
The structure of this paper is outlined below. Section 2 reviews relevant literature and provides a formal statement of the research problem. Section 3 describes the data and method used in this paper. The results and discussion are presented in Section 4 and Section 5, followed by the conclusion in Section 6.
4. Results
4.1. Variable Analysis
To identify key variables for cluster analysis and avoid multicollinearity, we first examined the Pearson correlations among the six DBS-derived metrics (N_den, N_dis, D_den, D_dis, R_D2N1, R_D2N2). The Pearson correlation coefficients (Corr.) are presented in Figure 6. Most variable pairs show weak or no correlation, with two exceptions: N_den is correlated with D_den (Corr. = 0.98), and N_dis is correlated with D_dis (Corr. = 0.84); both correlations are significant. To further validate these strong linear relationships, we produced scatter plots with linear regression fits for N_den vs. D_den and N_dis vs. D_dis (Figure 7a,b). The results confirm an almost perfect positive linear relationship between N_den and D_den (slope = 0.95, R² = 0.95) and a strong positive linear relationship between N_dis and D_dis (slope = 0.81, R² = 0.70). These robust linear dependencies statistically confirm the high redundancy within each variable pair.
Consequently, for the subsequent cluster analysis, we selected only one representative variable from each highly correlated pair as input features to avoid multicollinearity issues. The selection was guided by the principle of minimizing the overall correlation with variables outside the pair. For each candidate variable, we calculated the sum of its absolute Pearson correlation coefficients with all variables not in its own highly correlated pair. The variable with the smaller sum was retained. Between N_den and D_den, N_den was chosen; between N_dis and D_dis, D_dis was chosen. These two variables, together with the ratio variables, constituted the final set of four input features (N_den, D_dis, R_D2N1, R_D2N2) for cluster analysis.
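The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the helper names `pearson` and `pick_representative` are hypothetical, the toy values are invented for demonstration, and "outside the pair" is taken to mean every variable other than the two pair members.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pick_representative(data, pair):
    """Return the pair member with the smaller summed |r| against outside variables."""
    outside = [name for name in data if name not in pair]

    def outside_load(name):
        # Sum of absolute correlations with every variable outside the pair.
        return sum(abs(pearson(data[name], data[other])) for other in outside)

    return min(pair, key=outside_load)

# Hypothetical toy values (not the study's data): A is uncorrelated with Z,
# B is strongly correlated with Z, so A should be retained.
toy = {
    "A": [1, -1, -1, 1],
    "B": [1, 2, 3, 5],
    "Z": [1, 2, 3, 4],
}
chosen = pick_representative(toy, ("A", "B"))
print(chosen)
```

Applied to the six DBS metrics, the same function would be called once per highly correlated pair, for example with `("N_den", "D_den")` and `("N_dis", "D_dis")`.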
In particular, to quantitatively assess the day–night variations in DBS usage patterns, paired-samples t-tests were conducted to compare the nighttime and daytime metrics. The tests revealed a statistically significant difference in the spatial distribution of bikes, with the average parking distance at night (N_dis) being significantly greater than during the daytime (D_dis) (t = 3.149, p = 0.002, Mean Difference = 32.59 m). This indicates that bicycles were parked in a more dispersed pattern around stations at night. Conversely, no significant difference was found between nighttime and daytime density across all stations (t = 0.721, p = 0.471). This lack of a global density shift suggests a complex interplay of urban functions, where the evening influx of bikes into residential areas is counterbalanced by the outflow from employment-centric areas, resulting in no net change at the aggregate level.
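For readers unfamiliar with the paired-samples t-test, the statistic can be computed directly from the per-station differences. The sketch below uses only the Python standard library; the function name `paired_t` and the toy distances are illustrative assumptions, not the study's data. It returns the t statistic, the degrees of freedom, and the mean difference; the p-value would then be read from a t distribution with n − 1 degrees of freedom (for instance via `scipy.stats.ttest_rel`, if SciPy is available).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(night, day):
    """Paired-samples t statistic for night vs. day values at the same stations."""
    diffs = [n - d for n, d in zip(night, day)]
    n = len(diffs)
    mean_diff = mean(diffs)
    # Standard error of the mean difference (sample standard deviation, n - 1).
    se = stdev(diffs) / sqrt(n)
    return mean_diff / se, n - 1, mean_diff

# Hypothetical parking distances (m) for four stations, night vs. day.
n_dis = [10, 12, 14, 16]
d_dis = [9, 10, 13, 13]
t_stat, df, mean_diff = paired_t(n_dis, d_dis)
print(round(t_stat, 3), df, mean_diff)
```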
4.2. Cluster Result
The optimal number of clusters was determined using the Elbow method. Figure 8 illustrates the model performance for cluster numbers (k) ranging from 2 to 14. The rate of decrease in the SSE slowed considerably after k = 5 (SSE = 162.10). Concurrently, the ASCS for this solution was 0.43, a stable and relatively high value; at higher k values, the ASCS fluctuated and generally declined. Therefore, k = 5 was selected as the optimal number, providing a good balance between model fit and cluster interpretability.
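The elbow criterion can be illustrated with a small, self-contained sketch. It assumes that the SSE refers to the within-cluster sum of squared distances from a k-means solution; the function `kmeans_sse` below is a plain Lloyd's-algorithm implementation written for illustration (single random initialization, no restarts) and is not the study's code. The toy points are invented, and the silhouette-based score is omitted for brevity.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_sse(points, k, iters=50, seed=0):
    """Run Lloyd's k-means once and return the within-cluster SSE."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centres drawn from the data
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute centroids; keep the old centre if a cluster went empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Hypothetical 2-D points forming three well-separated groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
sse_by_k = {k: kmeans_sse(pts, k) for k in range(1, 5)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

Plotting `sse_by_k` against k and looking for the point where the decline flattens reproduces the elbow reading; in practice, several random restarts per k would make the curve more stable.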
The classification outputs are presented in Table 2, which details the cluster profiles using the four input variables. For the density and distance variables (N_den and D_dis), the table reports the mean z-scores to facilitate comparison across variables with different units; positive values indicate above-average levels and negative values indicate below-average levels. The ratio variables (R_D2N1 and R_D2N2) are presented in their original, non-standardized form. This is because these ratios are inherently dimensionless and centered around 1.0, which naturally signifies no change between day and night. Standardizing these ratios would obscure this intuitive interpretation without providing additional analytical benefit. The nighttime DBS density (N_den) exhibits a clear decreasing gradient from Cluster 1 (C1) to Cluster 5 (C5). The average daytime parking distance increases from C1 to Cluster 3 (C3), peaks in Cluster 4 (C4), and drops to its lowest value in C5. The two daytime-to-nighttime ratios, for DBS density and parking distance, are close to 1 in every cluster except C5. This indicates that day–night DBS usage patterns are relatively stable across all clusters except C5. To test the robustness of the classification, we performed alternative clustering using the variables that were excluded due to high correlation (D_den and N_dis) in place of the selected ones (N_den and D_dis). The results remained largely consistent, which is expected given the high linear correlation within each variable pair, ensuring that both sets convey nearly identical information.
The spatial distribution of each cluster is shown in Figure 9. The five clusters are distinct, each marked by a unique combination of index values and spatial distribution. C1 is located around Beijing’s Central Business District (CBD). It is characterized by the highest N_den (2.93) and an R_D2N1 close to 1 (0.96), consistent with sustained, high-intensity DBS usage throughout the day. The low average D_dis (−0.80) and the R_D2N2 close to 1 (1.01) indicate that individuals in C1 park dockless bikes at several subcenters rather than evenly across the whole station area. Cluster 2 (C2) (e.g., Jingsong Station, Jiulongshan Station) and Cluster 3 (C3) (e.g., Fangzhuang Station, Jishuitan Station) are mostly located inside the 5th Ring Road. Most station areas in C2 are located in the northeast of the urban areas, whereas most station areas in C3 are scattered across the southwest of the urban and suburban areas. The parking distributions in both C2 and C3 are more concentrated than average; however, obvious differences in DBS usage are observed between them. C2 has a higher N_den (0.83) than average, whereas N_den in C3 (−0.39) is below the average. In addition, the R_D2N1 is slightly larger in C3 (1.17) than in C2 (0.99). In other words, dockless bikes are more available in C2 than in C3 both during the daytime and at night.
Cluster 4 (C4) and Cluster 5 (C5) show almost opposite trends in DBS usage, apart from the low N_den that they share (−0.77 for C4, −0.88 for C5). The highest D_dis value (1.53) and the near-unity R_D2N1 (0.99) and R_D2N2 (1.04) values in C4 indicate that dockless bikes in C4 station areas are widely distributed both during the daytime and at night. The lowest R_D2N1 value (0.55), found in C5, implies that users ride dockless bikes from stations to locations outside the buffer area during the daytime and ride from the periphery back into the buffer area at night. In addition, the lowest D_dis (−1.89), also in C5, shows that dockless bikes are parked close to the station during the daytime, and the lowest R_D2N2 (0.14) indicates that dockless bikes are much more dispersed at night than during the daytime.
Figure 10 presents the relationship between the parking distance and density of dockless bikes. As described in Section 3.2, N_dis and D_dis values represent the degree of concentration or dispersion of DBS locations around the station. We found a negative correlation between parking distance and density both during the daytime and at night. Notably, parking distances also became more tightly clustered as density increased. For example, the D_dis values in C1, which has the highest parking density, are concentrated at approximately 300 m, whereas they range widely from 600 m to 1400 m in C4.
From a temporal perspective, the majority of R_D2N1 and R_D2N2 values lie between 0.8 and 1.2, as shown in Figure 11. In particular, all R_D2N1 and R_D2N2 values of C1 fall in the range of 0.8 to 1.2, indicating that the station areas in C1 have compound functions. Note that, although the daytime-to-night ratio of parking distance in C1 was in the range of 0.8 to 1.2, the parking distance itself was between 200 m and 400 m, which is longer than in C5.
4.3. POI Configuration and Cluster Function Interpretation
4.3.1. POI Configuration
As presented in Figure 12, C1 and C2 are high-density development areas whose POI density peaks at 200 m. Their low diversity within 300 m indicates homogeneous land use near the station, whereas the increasing diversity farther out indicates more varied land use in the periphery. The density curve of C3 is slightly below the average; its diversity peaks in the first 100 m, drops to a medium level in the second 100 m, and remains at that level thereafter. Hence, the variety of land-use types falls sharply in the second 100 m and stays relatively homogeneous from the 200 m ring to the 800 m ring. The densities of C4 and C5 are the lowest, but their diversity within 200 m is relatively high, after which it fluctuates. We therefore conclude that, for well-developed station areas, low diversity in the core reflects a simple land-use mix, while higher diversity in the periphery reflects more varied land use; for low-density station areas, diversity decreases from its peak in the first 100 m to a lower value and then remains relatively stable.
4.3.2. Cluster Function Interpretation
The POI configuration reflects area functions (e.g., a station area that contains a variety of POIs has mixed functions) [5], and we use it to annotate the function of each cluster. Notably, as presented in Figure 12, POI density typically increases within 0–200 m from the station (peaking at 200 m), then declines to levels approaching station-proximate densities by 300 m, beyond which it stabilizes or decreases gradually. This inflection point at 300 m captures the transition from station-influenced functional intensity to the ambient urban area. Hence, we use the distributions of residential, office, and entertainment facilities within the 300 m buffer to interpret the function of each cluster.
Prior to comparing POI densities across clusters, we tested the assumption of homogeneity of variances. Levene’s test indicated a violation of this assumption for all three POI types (Res_300, Off_300, Ent_300; all with p < 0.01). Consequently, the robust Welch’s ANOVA was employed. As presented in Table 3, the results confirmed statistically significant differences in residential, office, and entertainment POI densities among the five clusters.
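To make the Welch procedure concrete, the following sketch computes the Welch ANOVA F statistic and its degrees of freedom from raw group values using the standard textbook formula, with each group weighted by n divided by its sample variance. The function name `welch_anova` and the toy groups are illustrative assumptions, not the study's data or code. The p-value is not computed here; it would come from an F distribution with (df1, df2) degrees of freedom (for example `scipy.stats.f.sf`, if SciPy is available).

```python
from statistics import mean, variance

def welch_anova(groups):
    """Welch's one-way ANOVA: return (F, df1, df2) for a list of samples."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    # Precision weights: group size divided by sample variance.
    w = [ni / variance(g) for ni, g in zip(n, groups)]
    w_total = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / w_total  # weighted grand mean
    between = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / w_total) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    return f_stat, k - 1, (k ** 2 - 1) / (3 * lam)

# Hypothetical POI densities for two clusters with unequal variances.
group_a = [1, 2, 3, 4]
group_b = [2, 4, 6, 8, 10]
f_stat, df1, df2 = welch_anova([group_a, group_b])
print(round(f_stat, 3), df1, round(df2, 2))
```

With two groups, the statistic reduces to the square of Welch's t, which is a convenient sanity check for the implementation.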
To delineate the specific pairwise differences between clusters, we conducted Games-Howell post hoc tests, which are the recommended procedure following a significant Welch’s ANOVA. The results are summarized in Figure 13. Values in parentheses are means; red connectors indicate statistically significant differences, with the corresponding p-values labeled above, and gray connectors denote non-significant results. For residential density (Res_300), C1 demonstrated significantly higher density than C3, C4, and C5. C2 was also significantly higher than C3, C4, and C5. No significant difference was found between C1 and C2 or between C4 and C5. Regarding office density (Off_300), C1 was significantly higher than C4 and C5. C2 was significantly higher than C3, C4, and C5. The difference between C1 and C3 was non-significant. For entertainment density (Ent_300), C1 was significantly higher than C4 and C5. C2 was significantly higher than C4 and C5, and C3 was significantly higher than C5.
This granular statistical evidence from the post hoc analysis supports the conclusion that the clusters derived from DBS data possess distinct built-environment characteristics. For example, the residential density is 353.58 per sq. km in C1, approximately three times that of C3 (116.20 per sq. km). The distribution of the average parking distance can be well explained by the POI configuration. The well-developed C1 has the highest POI density in terms of residential, office, and entertainment facilities, so DBS users might have to park dockless bikes at particular places (e.g., the entrance of a neighborhood or a rail station), forming one or several parking centers and resulting in low D_dis or N_dis values.
To further decipher the spatial organization of functions within each cluster, we employed two key ratios: the job-to-residence ratio (R_J2R) and the core-to-periphery density ratio (R_3/8). The R_J2R illuminates the local functional balance at a specific spatial scale. More critically, the R_3/8 ratio operationalizes the concept of ‘inner-outer’ functional layering around a station. A value greater than 1 indicates that the specific POI type is more concentrated in the immediate 300 m core area compared to the 300–800 m periphery, suggesting a station-centered agglomeration. Conversely, a value less than 1 implies that the function is more dispersed in the broader station area. This allows us to move beyond aggregate density and understand how different urban functions are spatially structured in relation to the transit station.
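As a worked illustration of the core-to-periphery ratio, the sketch below assumes circular buffers of 300 m and 800 m around the station and defines density as POI count divided by area; both are assumptions made here for illustration, since the exact construction is given in the methods section rather than in this passage. The function name `core_periphery_ratio` and the counts are hypothetical.

```python
from math import pi

CORE_KM, OUTER_KM = 0.3, 0.8  # buffer radii in km (300 m core, 800 m outer edge)

def core_periphery_ratio(count_core, count_ring):
    """POI density in the 300 m core divided by density in the 300-800 m ring."""
    core_area = pi * CORE_KM ** 2                    # area of the inner circle
    ring_area = pi * (OUTER_KM ** 2 - CORE_KM ** 2)  # area of the annulus
    return (count_core / core_area) / (count_ring / ring_area)

# Hypothetical counts: 9 POIs in the core and 55 in the ring give equal
# densities, so the ratio is 1 (neither core- nor periphery-concentrated).
balanced = core_periphery_ratio(9, 55)
core_heavy = core_periphery_ratio(18, 55)
print(round(balanced, 3), round(core_heavy, 3))
```

Because the ring is roughly six times larger than the core, equal POI counts in the two zones would already imply a ratio well above 1, which is why the comparison is made on densities rather than counts.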
As shown in Table 4, most clusters have similar R_J2R values in the two layers; the exception is C4. That is, the two layers generally have a similar job-residence balance, whereas C4 has a larger R_J2R in the inner layer than in the outer layer, indicating that jobs are relatively more concentrated in the inner layer. The residential R_3/8 values are greater than 1 in every cluster and increase from C1 to C5, indicating that residential POIs are concentrated in the inner layer in all clusters. The office R_3/8 values show a distinct pattern: offices are mainly located in the periphery in the compactly developed zones (C1 and C2) and in the inner layer in the low-density zones (C3, C4, and C5).
Note that C4 and C5 have similar, high residential R_3/8 values, whereas the office R_3/8 value is considerably larger in C4 (1.73) than in C5 (1.17); hence, the inner layer of C4 provides more job opportunities than the inner layer of C5. Given the POI conditions shown in Table 4, the difference in R_D2N1 between C4 and C5 may be well explained by the POI configuration. Compared with C4, more users in C5 leave the inner layer for work in the outer layer during the daytime, which reduces the number of dockless bikes in the inner layer; they return to the inner layer at night, which increases the DBS density there. This commuting behavior explains the change in DBS density from daytime to night.
In summary, according to the differences in built environments (POIs) and activities (DBS), we can interpret these clusters as presented in Table 5. C1 represents a Central Business District with High-Intensity Mixed Functions. This compact, well-developed urban core exhibits extremely high and concentrated DBS usage, aligned with its agglomeration of office, residential, and entertainment POIs, sustaining vibrant activity both day and night. C2 is characterized as an Urban Residential District with Comprehensive Services. While similar in urban form to C1, its most distinct feature is the dominant residential function and comprehensive local services, resulting in strong but slightly less intense DBS usage. C3 is identified as a Balanced Employment-Residential District. As the most common station type, it features moderate DBS density and a notable balance between employment and residential POIs within the station area, indicating a well-integrated live-work environment. C4 is classified as a Suburban Employment Center with Commuter Hub Functions. Its low-density development and dispersed daytime DBS parking reflect its role as a job concentration point and a key destination for suburban commuters. C5 is defined as a Low-Density Commuter Origin with Limited Local Services. This cluster has the lowest POI density and exhibits the most pronounced day–night DBS variations, underscoring its primary function as a residential origin for commuters with minimal local employment opportunities.
5. Discussion
5.1. Key Findings and Theoretical Implications
This study developed a typology of Beijing’s URT station areas by integrating the spatiotemporal dynamics of DBS with the static built environment represented by POIs. Our findings support the proposed hypotheses and offer a dynamic, station-level perspective that is absent from traditional static models such as the N-P model and from prior DBS studies focused on aggregate, city-scale land-use patterns.
First, the cluster analysis revealed five functionally distinct station groups based on DBS usage patterns, confirming our hypothesis (H1) that URT stations can be classified into distinct functional types using DBS data. This demonstrates that DBS data alone can effectively capture functional differences between station areas.
Second, significant and directional differences in DBS usage characteristics were identified across the five clusters, fully supporting hypothesis (H2). Specifically, stations in urban core areas (C1) exhibited high-density, around-the-clock usage with concentrated parking distributions, while suburban stations (C4, C5) showed clear commuter-oriented patterns with distinctive evening surges and more dispersed parking. These systematic variations in DBS metrics provide empirical evidence that different functional areas generate distinctive mobility signatures detectable through bike-sharing patterns.
Third, the station types identified by DBS data exhibited statistically significant differences in their POI configurations, thereby validating hypothesis (H3). Critically, the alignment between DBS-derived clusters and POI-based land-use features provides empirical evidence that DBS patterns effectively reflect underlying urban functions. This validation confirms that the mobility-based classification corresponds to meaningful functional differences in the built environment, bridging the gap between dynamic mobility data and static land-use models.
Our work moves beyond prior research that used DBS data for coarse land-use classification by establishing a station-level classification framework based on fine-grained DBS usage patterns. While extended N-P models incorporate new dimensions, they remain largely reliant on static indicators. Our approach complements these models by introducing a dynamic, behavior-based component (day–night DBS ratios) that directly captures the temporal functional rhythms of station areas, providing a more nuanced understanding that aligns with the theoretical interplay between ‘node’ and ‘place’ but is grounded in observed human activity.
5.2. Practical Implications for Planning and Policy
The functional typology derived from our integrated framework offers actionable insights for urban planners and policymakers. The identified clusters can guide targeted infrastructure investments and policy interventions based on station-specific characteristics. For bicycle parking management, stations in the Central Business District with High-Intensity Mixed Functions (C1) require high-capacity, organized bicycle parking facilities near station entrances to manage the extreme density. In contrast, the dispersed usage pattern in the Suburban Employment Center with Commuter Hub Functions (C4) necessitates a network of smaller, flexible parking solutions distributed across the broader catchment area to effectively serve commuters arriving from all directions. Furthermore, the pronounced evening surge in the Low-Density Commuter Origin (C5) highlights the critical need for dynamic parking redistribution strategies, where shared bikes are systematically repositioned to these stations in the afternoon to meet returning commuter demand.
The DBS-based classification system enables a precise understanding of commuter patterns, which directly informs last-mile service calibration. The prominent evening surge in the suburban clusters (C4, C5) underscores the need for reliable last-mile services from stations to homes, potentially informing the scheduling of shuttle services. Specifically, for the Low-Density Commuter Origin (C5), fixed-route or on-demand micro-transit services in the evening could be prioritized to bridge the connectivity gap. Conversely, stations identified as employment centers (C1, C2) may require enhanced morning arrival facilities and evening departure services. This could translate into designated drop-off zones and optimized traffic flow at C1 and C2 stations during peak hours.
Our findings reinforce TOD principles through data-driven station classification and provide a basis for targeted land-use and zoning policies [45]. The well-developed, mixed-use nature of C1 and C2 stations exemplifies successful TOD. To sustain this, zoning in C1 should continue to encourage high-density, vertical mixed-use development. For stations in C3, C4, and C5, planners can use these insights to encourage more balanced development. In the Balanced Employment-Residential District (C3), policies should protect the existing job-housing balance and incentivize infill development of both types. For the Suburban Employment Center (C4), strategic densification and the introduction of supportive residential and retail land uses around the station core could help create a more vibrant, round-the-clock environment and reduce reliance on long-distance commuting. Conversely, in the Low-Density Commuter Origin (C5), the primary goal should be to introduce local employment opportunities and basic services through relaxed zoning and incentives, thereby reducing its mono-functional dependence and associated cross-city travel demands. The methodology developed in this study provides a transferable framework for other cities to develop similarly nuanced station-area typologies for targeted planning interventions.
5.3. Limitations and Future Research
This study has several limitations that offer directions for future work. First, the temporal scope of DBS data is a limitation. Our analysis relied on data from two consecutive days in May 2020. While the selected days were representative of typical spring weekdays in terms of weather and overall trip volume, the two-day snapshot cannot account for seasonal variations or the potential long-term impacts of exceptional events like the COVID-19 pandemic on commuting habits. Future studies should incorporate multi-season and multi-year data to validate the temporal stability of the identified clusters and explore the dynamics of functional patterns across different timescales.
Methodologically, while informative, our metrics may be influenced by operator interventions like bike repositioning [46], and the clustering results could be further validated through sensitivity analyses using different algorithms or variable selection schemes. Future research could integrate repositioning data or apply filtering algorithms to isolate user behavior more precisely, employ non-parametric correlation methods to strengthen variable selection, and conduct a quantitative comparison with traditional models like the N-P framework. Beyond these methodological refinements, the transferability of the proposed framework warrants consideration. The core analytical approach, classifying stations by integrating dynamic DBS patterns with static POI data, is highly transferable, as the required data are increasingly accessible globally. The principle of leveraging day–night DBS ratios to infer functional shifts is universally applicable. However, the specific typology (e.g., the clear dichotomy between a dominant urban core and sprawling suburbs) is contextual, shaped by Beijing’s monocentric urban structure and mature DBS market. In cities with different urban forms (e.g., polycentric) or DBS market structures, the manifestation of clusters may differ. Therefore, future research should test this framework in diverse urban contexts to refine the typology and distinguish universal from context-specific station-area patterns. The proposed typology, though statistically sound, would also benefit from ground-truthing through field surveys or interviews to verify cluster-assigned functions.