1. Introduction
The public transport system plays an important role in addressing urban traffic problems, such as traffic congestion, air pollution, and traffic accidents [1,2]. Many city administrators around the world have prioritized the development of public transport infrastructure and encourage residents to use it by optimizing supply and policy. Taking Beijing as an example, at the end of 2019 the city had 23 subway lines with an operating mileage of about 700 km and approximately 890 bus lines serving more than 40 thousand bus stops. Public transport usage in Beijing is high: the system carries 20 million passenger trips on average every day, accounting for 45.6 percent of all daily travel demand [3]. Public transport commuters are an important part of travel demand, especially during weekday peak hours. Understanding the travel demand of different groups of passengers, and especially that of commuters, will help urban planners and managers improve the service level of public transport and build a more sustainable public transport system.
Smart card data are collected during fare collection and record the time and location of boarding and alighting for each passenger. Because they offer continuous observation over a long period of time, they have been widely used to unveil travel behavior [4,5,6,7], forecast travel demand [8,9,10], and investigate job–housing relationships [11,12,13]. Nevertheless, smart card data have several inherent limitations. Since smart cards are anonymous due to privacy concerns, the socioeconomic attributes of the cardholders, such as gender, age, and education, as well as their trip purposes, are not available. Furthermore, the public transport system is an open system, and cardholders may switch to other travel modes, such as private cars or taxis, at any given time, so the travel demand observed for each cardholder is incomplete. Therefore, the identification of commuters using smart card data is a challenging task. Several studies have addressed this problem by utilizing the spatiotemporal regularities of repetitive trips on public transit networks [4,5,6,7,14,15,16,17,18,19]. However, these studies typically relied on threshold or rule-based methods to classify regular passengers. For example, Briand et al. [4] defined regular passengers as public transit riders who traveled on at least ten days within a month. Choosing a reasonable threshold value from descriptive statistics is challenging, because the accuracy of the resulting identification is difficult to verify.
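To make the threshold approach concrete, the following minimal sketch flags a cardholder as a regular passenger when the card is used on at least ten distinct days within a month, in the spirit of the rule attributed to Briand et al. [4]. The function name, the record layout, and the sample dates are assumptions for illustration only and are not taken from any of the cited studies.

```python
from collections import defaultdict
from datetime import date


def regular_passengers(trips, min_days=10):
    """Return the card IDs seen on at least `min_days` distinct days.

    trips: iterable of (card_id, travel_date) pairs covering one month.
    The default of ten days mirrors the rule quoted in the text; any other
    value is an arbitrary analyst choice.
    """
    days_by_card = defaultdict(set)
    for card_id, travel_date in trips:
        # A set keeps one entry per day, so repeated trips on a day count once.
        days_by_card[card_id].add(travel_date)
    return {card for card, days in days_by_card.items() if len(days) >= min_days}


# Hypothetical records: card "A" travels on 12 distinct days, card "B" on 5.
trips = [("A", date(2019, 6, d)) for d in range(3, 15)]
trips.append(("A", date(2019, 6, 3)))  # a second trip on an already counted day
trips += [("B", date(2019, 6, d)) for d in range(3, 8)]

regular = regular_passengers(trips)
```

Lowering `min_days` to 5 would also classify card "B" as regular, which illustrates how sensitive the outcome is to a threshold that cannot be checked against ground truth.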
Although many studies measure the spatiotemporal regularity of individual passengers based on smart card data, only a few further interpret the resulting patterns by combining them with other data sources, such as traditional household travel survey data. Travel survey data are time-consuming and expensive to gather and suffer from small sample sizes and low response rates, but they record individual socioeconomic attributes, travel purpose, travel mode, place of residence, place of employment, and other information that is not available in smart card data. Combining the advantages of smart card data with those of individual travel behavior survey data would improve both the accuracy and the interpretability of commuter identification and segmentation models.
To fill these gaps, we developed a two-stage model to assess the rhythm of daily commuting by integrating smart card data and travel behavior survey data from a one-month period in Beijing. First, we employed the light gradient boosting machine (LightGBM), a supervised learning algorithm, to identify commuters based on the spatial and temporal regularity of their travel behavior, and analyzed the residence and workplace distributions of commuters in Beijing. Second, we divided the commuters into fine-grained groups according to their departure times using a latent Dirichlet allocation (LDA) model, and analyzed the travel behavior patterns of each group. Finally, to enhance the interpretation of the patterns, we associated the socioeconomic characteristics of the identified residence locations with each commuter cluster to inspect the relationship between them. The contribution of our study is to provide insights into the spatial and temporal characteristics of commuting patterns in Beijing based on multi-source datasets, and to interpret behavioral features that are difficult to obtain from any single dataset. These insights can support the improvement of existing transportation policies and service optimization while adapting to variations in different kinds of travel demand.
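One way to prepare departure times for a topic model such as latent Dirichlet allocation is to treat each commuter as a document and each time-binned departure as a word. The sketch below builds such a commuter-by-time-slot count matrix using only the standard library. The one-hour bin width, the function names, and the sample records are assumptions for illustration and are not the settings reported in this paper.

```python
from collections import Counter

SLOTS_PER_DAY = 24  # assumed one-hour bins; the bin width is a free choice


def departure_slot(hhmm):
    """Map a zero-padded 'HH:MM' string to an hourly slot index (0-23)."""
    hours, _minutes = hhmm.split(":")
    return int(hours)


def build_count_matrix(departures_by_card):
    """Return (card_ids, matrix), where matrix[i][s] counts the trips of
    card_ids[i] that depart in slot s."""
    card_ids = sorted(departures_by_card)
    matrix = []
    for card in card_ids:
        counts = Counter(departure_slot(t) for t in departures_by_card[card])
        matrix.append([counts.get(s, 0) for s in range(SLOTS_PER_DAY)])
    return card_ids, matrix


# Hypothetical departure times pooled over several weekdays.
departures = {
    "card_1": ["07:40", "07:55", "18:05", "18:10"],  # two-peak style
    "card_2": ["09:30", "20:15"],                    # later, shifted pattern
}
card_ids, matrix = build_count_matrix(departures)
```

A count matrix of this shape (rows as documents, columns as vocabulary terms) is the usual input to off-the-shelf LDA implementations, for example scikit-learn's `LatentDirichletAllocation`; the per-commuter topic weights it returns could then be read as soft memberships in temporal groups.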
The remainder of this paper is organized as follows. In Section 2, we synthesize the related studies regarding public transport passengers’ travel patterns. In Section 3, the multi-source data for the study are presented. In Section 4, the method framework of the two-stage model is introduced. In Section 5, the results of the model are delineated and discussed. In Section 6, we summarize the key findings and propose suggestions for future work.
6. Conclusions
This study proposed a two-stage model to identify and classify commuters by integrating smart card data and travel behavior survey data. We employed LightGBM to distinguish commuters from non-commuters using six travel behavior attributes obtained by mining spatiotemporal travel regularities over one month of weekdays. After extracting the home and workplace stations of each commuter, we identified the residence and workplace areas and analyzed the job–housing balance. We used the LDA model to classify the commuters into fine-grained groups based on their departure times. We also introduced the socioeconomic attributes of each traffic analysis zone (TAZ) to enhance the interpretation of the patterns of different commuter groups.
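As an illustration of what extracting home and workplace stations can involve, the sketch below applies a simple heuristic: the home station is taken as the most frequent boarding station of the first trip of each day, and the workplace station as the most frequent alighting station of that same trip. This heuristic, the function name, the record layout, and the station labels are all assumptions for illustration; they are not necessarily the extraction rule used in this study.

```python
from collections import Counter


def home_and_work(trips):
    """Infer (home_station, work_station) for one cardholder.

    trips: list of (day, boarding_time, boarding_station, alighting_station),
    with boarding_time as a zero-padded 'HH:MM' string so that string
    comparison matches chronological order.
    """
    if not trips:
        return None, None
    first_trip = {}
    for day, time, board, alight in trips:
        # Keep only the earliest trip observed on each day.
        if day not in first_trip or time < first_trip[day][0]:
            first_trip[day] = (time, board, alight)
    home_votes = Counter(board for _, board, _ in first_trip.values())
    work_votes = Counter(alight for _, _, alight in first_trip.values())
    return home_votes.most_common(1)[0][0], work_votes.most_common(1)[0][0]


# Hypothetical records for one card over three weekdays.
trips = [
    (1, "07:40", "H1", "W1"),
    (1, "18:10", "W1", "H1"),
    (2, "07:45", "H1", "W1"),
    (2, "18:30", "W1", "H1"),
    (3, "08:10", "H1", "W2"),  # an occasional trip to a different destination
]
home, work = home_and_work(trips)
```

Aggregating the inferred stations by zone would then give the counts of residents and jobs per zone that a job–housing comparison needs.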
Approximately 3.1 million cardholders were identified as public transit commuters in Beijing, accounting for 67.39% of the daily passenger volume; this finding is similar to that of Ma et al. [16]. The identification accuracy of the LightGBM model reached 93.43%, outperforming other classification methods such as the KNN, SVM, DT, ANN, and NB algorithms. The home and workplace location distributions indicated a job–housing imbalance and excess commuting in Beijing, which are related to the city's single-centered urban structure [12,16,43]. The commuters were further divided into six groups according to their weekday temporal patterns, which included two-peak, staggered-shift, flexible-departure-time, and single-peak patterns. The spatiotemporal heterogeneity of travel behavior across the six groups was further examined by comparing their travel intensity, travel distances, activity entropy, and commuting route distributions. The commuting demand in Beijing is mainly concentrated in areas with low housing prices and medium or high population density, and subway facilities also attract more people to travel by public transport.
Our findings could be useful for several relevant applications and scenarios. The identified excess commuting phenomenon can help planners shape a more balanced job–housing relationship in Beijing, further reducing reliance on private cars and relieving traffic congestion. The findings may also help public transport operators better understand the travel demand of passengers and propose targeted policies accordingly; for example, a more elaborate discount fare strategy could be implemented to attract more commuters with flexible departure times to travel during off-peak hours, thereby reducing crowding on some commuter routes during peak hours. Moreover, the findings may help redesign and optimize the existing public transit networks. For example, planning a demand-responsive customized shuttle bus route between Tongzhou District and the CBD could significantly reduce travel time for public transit commuters.
However, several aspects need to be further improved. Our study focused on commuting behavior (for work or study) during weekdays; we plan to explore the characteristics of other travel activities, such as leisure, and the inherent differences between weekday and weekend travel activities. Furthermore, we studied only the travel behavior of passengers with student-type and regular-type cards. The routine travel behavior of other passenger groups, such as older people, will also be investigated in the future. Finally, we found that commuter travel behavior varies with place of residence and place of work. In the future, we plan to use long-term smart card data and to introduce other socioeconomic attributes for each TAZ to analyze in depth the longitudinal changes in public transport commuting behavior.