1. Introduction
Travel mode selection is an essential research topic in urban transportation planning and management, as it partially determines the structure of urban transportation modes. Predicting travel demand and flow distribution is dependent on the mechanism and influencing factors of travel mode selection. Research on travel behavior is crucial for enhancing the structure of urban transportation modes, optimizing the network layout of an integrated transportation system, and enhancing the operating efficiency of a comprehensive transportation system.
Existing research on travel mode selection behavior focuses primarily on the following categories. On the one hand, the probability prediction model is constructed based on the internal rules of travelers’ mode selection behavior, and the sharing rate of each mode of transportation can be estimated using a disaggregated model or game theory [1,2]. On the other hand, some scholars analyze influencing factors on travel choice, focusing on travel characteristics such as travel time, distance, and cost [3,4,5], as well as personal attributes such as age, gender, and income [6,7], so as to make better use of the advantages of different travel modes and enhance the operational efficiency of the integrated transportation system. On this basis, researchers also examine the impact on residents’ travel choices under special circumstances, such as parking fee policies [8,9], congestion fee policies [10,11], the outbreak of COVID-19 [12,13], and climate change [14,15]. However, these works focus on the identification of factors and how they influence travel behavior, ignoring the construction of models and the extraction of influence factors, which directly impacts the precision of model parameter calibration and the rationality of subsequent flow distribution. Therefore, this paper will improve upon the two aspects of model construction and data extraction.
In terms of models, logit models based on stochastic utility maximization, such as multinomial logit (MNL), nested logit (NL), and cross-nested logit (CNL), are most frequently applied to the selection of travel modes [16,17,18]. However, these models assume that everyone’s travel preferences are identical and the parameters of the independent variables are fixed, which is inconsistent with the actual situation. The random parameter logit (RPL) model permits parameters to vary across individuals, taking into account the diversity of residents’ travel preferences, and avoids the limitations of the independence of irrelevant alternatives (IIA) assumption [19]. For parameter calibration, maximum likelihood estimation (MLE) is superior to the generalized method of moments (GMM) [20] and Bayesian estimation [21] in terms of convergence and deviation, and it can more accurately reflect the real situation of the model when dealing with a large amount of data [22,23]. Therefore, this paper adopts the RPL model as the travel choice model and estimates its parameters using MLE.
In terms of data source, data used to describe travel choices are usually derived from questionnaires or survey reports [24,25], and residents’ personal and socioeconomic characteristics can be obtained in greater detail. Nevertheless, the survey data suffer from high subjectivity, low precision, and a small sample size [26]. Several scholars have confirmed that travelers’ multi-day travel behaviors are inconsistent [27,28,29], but the survey data are limited to specific scenarios and cannot describe the variety of multi-day travel options. With the advancement of data collection and storage technologies, transport data now contain a greater quantity of travel information; however, its application in travel behavior has primarily focused on travel purpose inference [30,31], travel pattern mining [32,33], traffic OD matrix estimation [34,35], etc. It has been demonstrated that this travel information, in particular the travel chain, accurately reflects actual travel choices and provides accurate information such as travel mode, time, and location [36]. However, it has been applied less frequently to the analysis and modeling of travel mode selection in previous studies. Therefore, this paper takes this aspect into consideration and extracts more accurate travel characteristics from the travel chain, which will serve as the input of the selection model.
The development and construction of a subway increases the frequency with which residents utilize various modes of public transportation to fulfill their travel needs. Most studies assume that residents have at most one transfer when traveling by public transport [37], but there may be multiple travel stages in practice, resulting in diverse travel patterns that are difficult to categorize. In the meantime, not all travel patterns are suitable for individuals due to the uneven growth of public transport. This paper focuses on the analysis of commuters’ choice between public and private transport modes and does not consider the selection of multiple travel patterns within the public transportation system. The acquisition of private transport data is both expensive and challenging. Observations of vehicles on the road indicate, however, that for the same segment, the passing times of cars and buses with comparable entry times are essentially identical, indicating that their average speeds are similar. Consequently, in the absence of private transport data, one or more buses can be used to represent the driving condition of automobiles, which is another innovation of this paper.
Urban planning and development, as well as road network construction and layout, will alter the travel status of residents, including the distribution of travel flow and the market share of travel mode, etc., necessitating a re-prediction of the travel OD and an analysis of residents’ travel preferences. Traditional survey data cannot overcome the limitations of poor timeliness and small sample size, so this paper utilizes travel information from public transport data to identify boarding stations, alighting stations, and transfer behaviors. On the basis of the public transport travel chain, the round-trip and closed travel characteristics of residents can be used to supplement the travel situation of other modes of transportation. Using the RPL model, a commuter’s multi-mode choice model based on travel chains can be constructed, and its feasibility and dependability are illustrated with an example. By analyzing parameter calibration results, traffic managers are able to take targeted actions to improve the level of public transport operation, as well as plan transportation development and infrastructure construction to enhance residents’ travel satisfaction.
The remainder of the paper is structured as follows: Section 2 introduces the background and methodology, including a data description, method for identifying commuters, and methodology framework. Section 3 describes the extraction process of the public transportation travel chain from the boarding station, alighting station, and transfer behavior. Section 4 clarifies and explains the definition and calculation of the influencing factors. In Section 5, the data are applied to the model of travel mode selection, and the calibration outcomes are analyzed. Section 6 summarizes the conclusions and suggestions for relevant departments.
3. Travel Chain Extraction
The travel chain describes a traveler’s entire journey from the origin to multiple destinations and back to the origin. Generally speaking, a typical passenger travel chain consists of origin stations, origin lines, transfer stations, transfer lines, and destination stations. Therefore, a complete passenger’s travel chain will be discussed in terms of the following three components: boarding station, alighting station, and transfer behavior.
3.1. Boarding Station Identification
Identification of the boarding station is the first step in travel chain extraction. The subway IC card data include the boarding time and station, so no additional analysis is necessary. The bus IC card data, however, do not record the boarding station.
Figure 6 depicts the boarding station identification process. First, the latitude and longitude of the boarding location are deduced by matching the shared attributes (license plate number, line ID, and time) in the IC card and GPS data. Then, the closest station on the bus line is taken as the boarding station. Notably, the time difference between card swiping and bus arrival is set to no more than three minutes in order to increase the matching success rate [35].
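The matching step can be sketched as follows. This is a minimal illustration rather than the procedure of Figure 6 itself: the record layouts, the function name `match_boarding_station`, and the use of planar coordinates in metres are assumptions made for the example. Only the matching keys (license plate, line ID, time) and the three-minute tolerance come from the text.

```python
from math import hypot

MAX_GAP_S = 180  # three-minute tolerance stated in the text


def match_boarding_station(swipe, gps_records, stations):
    """Return the name of the inferred boarding station, or None.

    swipe:       dict with 'plate', 'line', 'time' (seconds since midnight)
    gps_records: list of dicts with 'plate', 'line', 'time', 'x', 'y'
    stations:    dict mapping line ID -> list of (name, x, y)
    Coordinates are ASSUMED to be planar (projected) metres.
    """
    # Keep GPS fixes from the same vehicle and line, within the tolerance.
    candidates = [
        g for g in gps_records
        if g["plate"] == swipe["plate"]
        and g["line"] == swipe["line"]
        and abs(g["time"] - swipe["time"]) <= MAX_GAP_S
    ]
    if not candidates:
        return None
    # The fix closest in time approximates where the card was swiped.
    fix = min(candidates, key=lambda g: abs(g["time"] - swipe["time"]))
    # The nearest station on that line is taken as the boarding station.
    name, _, _ = min(
        stations[swipe["line"]],
        key=lambda s: hypot(s[1] - fix["x"], s[2] - fix["y"]),
    )
    return name
```

Swipes with no GPS fix inside the tolerance return `None`, so unmatched records can be counted separately instead of being forced onto a station.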
3.2. Alighting Station Identification
Because passengers must swipe a card when entering and exiting a subway station, the subway alighting station can be determined accurately from travel records and station location data. However, bus fares in China are primarily based on a single-ticket system, and the IC card records do not contain information about the alighting station; therefore, the bus alighting station must be inferred. Several fundamental rules regarding a passenger’s travel records for the same day are defined:
The passenger’s final destination is close to the origin of his or her first trip.
For consecutive trips of a passenger, the destination of the last trip is often close to the origin of the next trip.
For two consecutive travel records of a passenger, if both records are on the same line Y and the running directions are opposite, their origins are each other’s destinations.
The alighting station of a closed travel chain can be identified easily using the above rules. For unclosed travel chains, however, the alighting station is difficult to confirm, so a method is proposed for calculating the probability that a passenger who boards at station i alights at station j. The algorithm is detailed as follows:
(1) The card swipes are counted for each station, both in total and for each passenger k at each station j. The set of the downstream stations is I, and the set of stations where passenger k has boarded is also recorded.
(2) If passenger k has previously boarded at one or more of the downstream stations in I, the passenger is assumed to get off at the station where he or she has boarded the most frequently, and the alighting probability is computed from these boarding counts (Equation (1)).
(3) If passenger k has never boarded at any downstream station, the alighting station is analyzed based on passengers’ bus travel rules. When traveling by bus, the number of stations that passengers pass through is concentrated in a certain range, and the probability of alighting is greatest when a certain threshold is reached. Above or below this threshold, the probability decreases. Typically, the probability of a passenger alighting follows a Poisson distribution in the number of stations passed [42], which can be expressed as Equation (2). Since passengers in reality pass between 1 and n − 1 stations (n is the number of stations on the line), the probability is normalized as in Equation (3), where λ is the average number of stations passed by passengers on a bus line, which varies by line; λ is adjusted if the number of downstream stations is less than λ.
(4) The station j with the highest probability is taken as the alighting station. If two or more stations have identical maximum probabilities, the station closest to station i is selected.
The complete description of this algorithm is shown in Figure 7.
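The four steps can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not a reproduction of Equations (1)–(3): the function names are hypothetical, the history-based probability is assumed to be proportional to the passenger's boarding counts at downstream stations, and the Poisson probabilities are renormalized over 1 to n − 1 stations passed.

```python
from math import exp, factorial


def poisson_pmf(m, lam):
    """Poisson probability of passing exactly m stations."""
    return exp(-lam) * lam ** m / factorial(m)


def alighting_probabilities(downstream, boardings, lam, n_stations):
    """Probability of alighting at each downstream station.

    downstream: non-empty list of station IDs after the boarding station,
                nearest first
    boardings:  dict station ID -> this passenger's historical boarding count
    lam:        average number of stations passed on the line
    n_stations: number of stations on the line
    """
    history = {s: boardings.get(s, 0) for s in downstream}
    total = sum(history.values())
    if total > 0:
        # Step (2), ASSUMED form: proportional to historical boardings.
        return {s: c / total for s, c in history.items()}
    # Step (3): Poisson in the number of stations passed, renormalized
    # over the feasible range 1 .. n-1.
    norm = sum(poisson_pmf(m, lam) for m in range(1, n_stations))
    return {s: poisson_pmf(m, lam) / norm
            for m, s in enumerate(downstream, start=1)}


def infer_alighting_station(downstream, boardings, lam, n_stations):
    """Step (4): highest probability wins; ties go to the nearest station."""
    probs = alighting_probabilities(downstream, boardings, lam, n_stations)
    best = max(probs.values())
    for station in downstream:  # nearest first, so the first hit is closest
        if probs[station] == best:
            return station
```

For an integer λ the Poisson distribution has two equal modes (λ − 1 and λ), so the tie-breaking rule in step (4) is exercised in practice, not only in edge cases.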
3.3. Transfer Behavior Identification
The purpose of a passenger’s transfer behavior is to switch to a different line en route and ultimately reach the destination. Figure 8 illustrates a simple example of a single transfer. Tv1 and Tv2 are the in-vehicle times; Tw and WDw are the walking time and distance during the transfer; Td is the transfer waiting time. A transfer involving two card swipes generates two travel records, but two separate trips also generate two records; therefore, it is necessary to identify characteristics that distinguish the two behaviors.
As depicted in Figure 8, if the dwell time at a transfer station or the distance between two stations exceeds predetermined criteria, the change to a different line is considered a separate trip. A time threshold and a distance threshold are, therefore, defined as indicators for each transfer mode m. The time threshold is determined primarily by walking time and waiting time and is expressed in terms of Tw and Td. Because transportation modes have different characteristics, the two thresholds may take different values for different transfer modes.
For bus-to-bus transfers, the distance threshold is strongly correlated with the maximum distance between adjacent bus stations and is typically set at 500 m in the central urban area and 700 m in the suburbs. Pedestrian walking speed is approximately 1.2 m/s, and the maximum walking distance is equal to the distance threshold; therefore, Tw can be calculated as the distance threshold divided by the walking speed, which gives 6.9 and 9.7 min in the central urban and suburban areas, respectively. Td is the maximum time interval between bus departures.
For transfers between bus and subway, studies indicate that the maximum distance passengers are willing to walk to reach subway stations is 770 m, which is taken as the distance threshold and corresponds to a walking time of about 10.7 min. Meanwhile, Td is the maximum departure interval of the subway when transferring to the subway, and that of the bus when transferring to a bus.
For subway-to-subway transfers, the transfer time and distance can be disregarded because only two records, at the origin and destination, are stored, and the journey is regarded as a single trip.
In the research region, the maximum bus departure intervals in urban and suburban areas are 10 and 15 min, respectively, during peak hours and 15 and 30 min, respectively, during off-peak hours. The value for the subway is between 7.4 and 10.5 min.
Table 2 displays the values of the time and distance thresholds for various transfer modes.
Figure 9 provides a comprehensive description of transfer behavior identification based on the preceding definitions. First, the passenger’s daily travel records are sorted by time, and then the time difference between the alighting time at stage k and the boarding time at stage k + 1 is calculated. The distance L is approximated by the Euclidean distance between the corresponding alighting and boarding stations. If the time difference and L are both within their respective thresholds, the two records are defined as a transfer within a single trip.
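The transfer test can be sketched as below. It is a minimal sketch, not the procedure of Figure 9: the time threshold is assumed to be the maximum walking time plus the maximum waiting time, station coordinates are assumed to be planar metres, and the function names are hypothetical. The 1.2 m/s walking speed and the example distances and headways are taken from the text.

```python
from math import hypot

WALK_SPEED_M_S = 1.2  # pedestrian walking speed given in the text


def time_threshold_min(distance_threshold_m, max_headway_min):
    """ASSUMED form: maximum walking time plus maximum waiting time."""
    walk_min = distance_threshold_m / WALK_SPEED_M_S / 60.0
    return walk_min + max_headway_min


def is_transfer(alight, board, distance_threshold_m, max_headway_min):
    """Decide whether two consecutive records belong to one trip.

    alight: dict with 'time' (minutes) and planar 'x', 'y' (metres)
            for the alighting at stage k
    board:  same layout for the boarding at stage k + 1
    """
    gap_min = board["time"] - alight["time"]
    dist_m = hypot(board["x"] - alight["x"], board["y"] - alight["y"])
    limit_min = time_threshold_min(distance_threshold_m, max_headway_min)
    return 0 <= gap_min <= limit_min and dist_m <= distance_threshold_m
```

With the urban bus-to-bus values from the text (500 m, 10 min peak headway), `time_threshold_min(500, 10)` is roughly 16.9 min, so a boarding 10 min after alighting at a stop 300 m away counts as a transfer, while one 40 min later counts as a new trip.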
3.4. Accuracy Validation
To examine the accuracy of the data and verify the efficacy of the data processing, a test based on station passenger flow is conducted.
By identifying the boarding, alighting, and transfer stations, 253,034 travel chains are recognized. Table 3 displays the travel chain information for a commuter with ID 2660000000556150, which includes card ID; boarding and alighting time, line, and station; and travel stage and mode. In addition, passengers boarding and alighting at each station can be tallied, and the feasibility of the extraction method is evaluated based on the criterion that the passenger flow at each station remains balanced over time.
Because passengers make round trips, the departures and arrivals at station i are generally equal [42]. Nonetheless, the variety of travel plans leads to deviations from exact equality, so the relationship is rewritten as a linear equation with slope a and intercept b (Equation (4)). To determine the degree of correlation between departures and arrivals, a correlation coefficient is introduced; its calculation, which uses the averages of the departures and arrivals over all stations, is shown in Equation (5). The closer the correlation coefficient is to 1, the more the departures and arrivals of stations tend to balance, and the stronger the correlation.
The departures and arrivals are counted and substituted into Equation (4), and a and b are then estimated using the least squares method. The calculated results are a = 0.9793, b = 0.3509, and a correlation coefficient of 0.92, which indicates that departures and arrivals are roughly balanced. This result meets the accuracy requirements of station identification and reflects the effectiveness and viability of the travel chain extraction method used in this paper.
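The balance check can be illustrated with an ordinary least-squares fit and a Pearson correlation coefficient. This is a sketch under assumptions: departures are regressed on arrivals (the direction of the regression is not recoverable from the text), the correlation coefficient is assumed to be Pearson's, and the function names and sample numbers in the usage note are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares slope a and intercept b for y = a * x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)
```

Given per-station lists `arrivals` and `departures`, `fit_line(arrivals, departures)` returns the pair (a, b); a slope near 1 together with a correlation near 1 corresponds to the balanced situation described above.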
It is difficult to obtain private traffic data, and license plate recognition (LPR) and taxi GPS data cannot be linked to specific individuals. However, PT and PC are interchangeable, so if commuters do not take PT, they will be assumed to have chosen PC. It is assumed that commuters do not request time off or travel for business during the study period, and that their residence is the origin and destination of their daily commute. Linking the travel chain of public transportation by time may result in the following situations:
(1) All public transportation travel chains within a single day form a closed chain; the commuter does not travel by PC.
(2) The public transportation travel chains within a single day cannot form a closed chain; a break exists. For example, if a commuter takes a taxi to work and a bus home, the break is from home to the workplace. Therefore, the number of breaks represents the number of travel chains by PC.
(3) The commuter has not swiped an IC card for one or more days and is assumed to travel by PC on those days. In this case, the number of travel chains by PC is taken as the number of daily travel chains that the commuter makes most frequently.
Following the preceding discussion, it is possible to calculate the number of travel chains by PT and PC, which are, respectively, 253,034 and 52,286, resulting in respective share rates of 82.87 and 17.13%.
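The three situations can be sketched as a break count over one commuter's day. This is a simplified illustration with hypothetical function names: locations are compared by exact label, whereas a real implementation would compare station proximity, and the commuter's home is assumed to be known.

```python
from collections import Counter


def count_pc_chains(pt_trips, home):
    """Count breaks in one day's public transport chain.

    pt_trips: time-ordered list of (origin, destination) labels
    home:     label of the commuter's residence (start and end of the day)
    Each break is interpreted as one travel chain made by PC.
    """
    breaks = 0
    position = home
    for origin, destination in pt_trips:
        if origin != position:  # the commuter moved without a PT record
            breaks += 1
        position = destination
    if position != home:  # the day did not end back at home by PT
        breaks += 1
    return breaks


def pc_chains_on_day_without_swipes(daily_chain_counts):
    """Situation (3): use the commuter's most frequent daily chain count."""
    return Counter(daily_chain_counts).most_common(1)[0][0]
```

For the example in the text, a taxi to work followed by a bus home, the only recorded trip is `("work", "home")`, and `count_pc_chains` returns 1.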
4. Characteristic Variables for Choice Model
In existing literature, travel time, distance, and cost are frequently used to construct choice models [43,44]. In the meantime, Ashalatha notes that walking distance and wait time have a substantial impact on commuters’ mode choice [5]. Other than the attributes of the mode, travelers’ perceptions of the travel environment, such as comfort, cannot be ignored [45,46]. However, it is difficult to directly calculate travel comfort, so the passenger loading factor will be implemented. In this paper, therefore, travel time, distance, cost, comfort, walking distance, and waiting time are chosen as influential factors; the definitions and calculations are provided below.
4.1. Travel Time (TT) and Travel Distance (TD)
Travel time can be calculated directly as the difference between the last alighting time and the first boarding time in a single travel chain.
A coefficient is introduced to correct errors caused by the road’s curvature; it represents the ratio of the actual distance of bus line i to its Euclidean distance. In general, the coefficient takes one value if the Euclidean distance is less than 1 km and another value otherwise. Consequently, the travel distance is calculated by summing, over the stages k = 1, 2, …, s of the travel chain, the Euclidean distance between the boarding station and the alighting station at stage k, corrected by this coefficient.
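The two quantities can be sketched as follows. The sketch rests on assumptions: stage records are dictionaries with hypothetical field names, coordinates are planar metres, and the detour coefficient is passed in as a function because its numerical values are not given here.

```python
from math import hypot


def travel_time(stages):
    """Last alighting time minus first boarding time (same time unit)."""
    return stages[-1]["alight_time"] - stages[0]["board_time"]


def travel_distance(stages, detour_coefficient):
    """Sum of per-stage Euclidean distances, each scaled for road curvature.

    stages:             time-ordered list of dicts with 'board_xy' and
                        'alight_xy' as (x, y) tuples in metres
    detour_coefficient: function mapping a Euclidean distance (m) to the
                        ratio actual / Euclidean distance (values ASSUMED
                        by the caller)
    """
    total = 0.0
    for stage in stages:
        (bx, by), (ax, ay) = stage["board_xy"], stage["alight_xy"]
        straight = hypot(ax - bx, ay - by)
        total += detour_coefficient(straight) * straight
    return total
```

A caller might pass something like `lambda d: 1.2 if d < 1000 else 1.4`, where both numbers are placeholders to be replaced by calibrated values.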
An investigation of road vehicles finds that buses and automobiles with comparable entry times on the same road face similar traffic conditions, so their travel times and average speeds are essentially the same. The buses considered here do not stop on this road. Although PC’s travel time cannot be calculated directly, the route can be divided into several segments, and the driving of a PC on this route is viewed as the driving of multiple buses on these segments. Thus, the travel time and distance of PC can be obtained; the specific steps are as follows.
The route between origin and destination is divided into N sections. For a given section, the average speed of private cars is approximately equal to that of a bus A traveling on it; therefore, the driving time of private cars on the section can be expressed as the section length divided by the average speed of bus A. The same calculation is applied to the other sections. The total travel time and distance for private cars are then the sums of the section travel times and section lengths, respectively.
In addition, this paper uses the k-shortest path algorithm to determine the optimal driving route, with k set to 3 in accordance with navigation search rules. The route with the shortest travel time will be chosen as the driving path for private cars, which is also the PC travel chain for this commuter.
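The section-wise estimate and the final route choice can be sketched as below. The sketch assumes that the candidate routes (for example, the three returned by a k-shortest path search) are already available as lists of sections with a matched bus speed; the path search itself is not shown, and the function names are hypothetical.

```python
def route_time_and_distance(sections):
    """Total PC travel time (h) and distance (km) for one route.

    sections: list of (length_km, bus_speed_kmh) pairs, where the bus speed
              stands in for the private car speed on that section
    """
    time_h = sum(length / speed for length, speed in sections)
    distance_km = sum(length for length, _ in sections)
    return time_h, distance_km


def choose_pc_route(candidate_routes):
    """Pick the candidate route with the shortest estimated travel time."""
    return min(candidate_routes, key=lambda r: route_time_and_distance(r)[0])
```

Because the choice is made on time rather than length, a longer route on faster sections can be selected over a shorter congested one.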
4.2. Travel Cost (TC)
Using the transaction amount recorded by bus/subway IC card data and the information extracted from the passenger travel chain, it is simple to calculate the cost of public transportation.
Private transportation consists primarily of taxis and private cars. Commuters who own cars are more willing to travel by private car than by taxi [47,48]; therefore, car ownership is an important factor in judging whether commuters choose taxis or private cars for private travel. This information is absent from the available data, but the number of owners can be estimated from the probability of car ownership, which is then used to mark commuters’ car ownership attribute. There are approximately 1.24 million residents and 0.36 million cars in the study area, so the probability of ownership is approximately 29%. Therefore, 1513 commuters are selected at random, identified as car owners, and assigned to drive for private travel.
The private car travel cost is determined by travel distance, fuel consumption per kilometer (FCK), and fuel price (FP), i.e., it is the product of the three. FCK = 0.0685 L/km [49], and FP = 7.025 CNY/L according to the investigation. The taxi travel cost follows special charging rules and is primarily associated with travel distance x: a flag-fall price of CNY 9 applies within the initial distance, and mileage fees that differ by distance range apply beyond it, including a rate of 1.4 CNY/km. In particular, for night travel (from 22:00 to 5:00 the next day), the mileage fee increases by 0.4 CNY/km.
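The two cost rules can be sketched as follows. The fuel parameters, the CNY 9 flag-fall price, the 1.4 CNY/km rate, and the 0.4 CNY/km night surcharge come from the text. The flag-fall distance, the tier breakpoint, and the second-tier rate are hypothetical placeholders, because those values are not recoverable here, so the taxi function only illustrates the shape of a tiered tariff.

```python
FCK_L_PER_KM = 0.0685  # fuel consumption per kilometre, from the text
FP_CNY_PER_L = 7.025   # fuel price, from the text

FLAG_FALL_CNY = 9.0    # from the text
NIGHT_SURCHARGE = 0.4  # CNY/km added at night, from the text
FLAG_FALL_KM = 3.0     # HYPOTHETICAL distance covered by the flag-fall price
# (upper bound in km, rate in CNY/km); the 10 km breakpoint and the 2.0 rate
# are HYPOTHETICAL, only the 1.4 CNY/km rate appears in the text.
TIERS = ((10.0, 1.4), (float("inf"), 2.0))


def car_cost(distance_km):
    """Fuel cost of a private car trip in CNY."""
    return distance_km * FCK_L_PER_KM * FP_CNY_PER_L


def taxi_cost(distance_km, night=False):
    """Tiered taxi fare in CNY under the placeholder tariff above."""
    extra = NIGHT_SURCHARGE if night else 0.0
    cost, lower = FLAG_FALL_CNY, FLAG_FALL_KM
    for upper, rate in TIERS:
        if distance_km <= lower:
            break
        # Charge only the part of the trip that falls inside this tier.
        cost += (min(distance_km, upper) - lower) * (rate + extra)
        lower = upper
    return cost
```

Under these figures a 10 km private car trip costs about CNY 4.8 in fuel, which shows how far below the CNY 9 flag-fall price the per-trip fuel cost sits.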
4.3. Travel Comfort (CF)
The passenger load factor LR is defined as the ratio of the number of actual passengers on a bus to its maximum capacity RPC, and it is used to calculate the level of public travel comfort [50]. The actual number of passengers on the bus at station k is obtained from the numbers of passengers boarding and alighting at each station j up to k, and the comfort of a trip is evaluated over the set of stations that the passenger passes through.
This paper assigns PC a travel comfort value of 1 because private cars and taxis ensure that every passenger has a seat and their environment is significantly more pleasant than that of public transportation.
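The load factor can be sketched as a running balance of boardings and alightings. This is an assumption-laden sketch: the on-board count is taken as cumulative boardings minus cumulative alightings, and the per-passenger value is taken as the mean load factor over the stations passed. How that value is converted into the comfort score is not specified here, so the sketch stops at the load factor.

```python
def onboard_loads(boardings, alightings):
    """Passengers on board after each station along one bus run."""
    loads, current = [], 0
    for boarded, alighted in zip(boardings, alightings):
        current += boarded - alighted
        loads.append(current)
    return loads


def load_factors(boardings, alightings, capacity):
    """Load factor LR at each station: on-board passengers / capacity."""
    return [q / capacity for q in onboard_loads(boardings, alightings)]


def mean_load_factor(factors, board_index, alight_index):
    """ASSUMED aggregation: mean LR from the boarding station up to, but not
    including, the alighting station (requires alight_index > board_index)."""
    segment = factors[board_index:alight_index]
    return sum(segment) / len(segment)
```

A mean close to 1 indicates that the bus was near capacity for most of the passenger's ride.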
4.4. Walking Distance (WD)
Generally speaking, this characteristic refers to the transfer walking distance and the walking distance to the origin station. Because the extracted travel chain uses stations as nodes, a station may serve multiple traffic zones (see Figure 10), and the distance from different zones to the same station varies, it is necessary to analyze the origin of passengers.
In this paper, the spatial relationship between a station and the surrounding traffic zones is used as a proxy for the distribution of its passengers: the proportion of the station’s trips that originate from each nearby zone is assumed to equal that zone’s proportion of the total overlapping area between the station’s buffer zone and the traffic zones. Therefore, the number of passengers of traffic zone j departing from station i equals the number of passengers departing from station i multiplied by the overlapping area between the buffer zone of station i and traffic zone j, divided by the sum of the overlapping areas over the K surrounding traffic zones.
Passengers at station i are randomly selected based on the number of passengers from each traffic zone, and the corresponding traffic zone is labeled as their origin. The walking distance to the origin station is thus defined as the distance from the centroid of the origin zone to the first boarding station. The transfer walking distance is the sum of the walking distances of all transfer stages, where the walking distance of each transfer is the distance between the alighting station at stage k and the boarding station at stage k + 1, and c is the number of transfers.
Taxi passengers must walk to the nearby roadside in order to hail a cab. Drivers of private vehicles must walk to their parking spot, which may be in a parking lot or along the road outside the zone. This paper describes the walking distance of these two modes uniformly as the straight-line distance from the origin’s centroid to its nearest road.
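The area-proportional allocation can be sketched as follows. The first function implements the proportional split described above. The second is an assumption: it labels each passenger with a zone through a weighted random draw, which matches the expected counts only on average. Function names and the fixed seed are hypothetical.

```python
import random


def allocate_by_overlap(station_departures, overlap_areas):
    """Split a station's departing passengers across traffic zones.

    overlap_areas: dict zone ID -> overlap between the station buffer and
                   that zone (any consistent area unit)
    """
    total_area = sum(overlap_areas.values())
    return {zone: station_departures * area / total_area
            for zone, area in overlap_areas.items()}


def assign_origins(passenger_ids, overlap_areas, seed=0):
    """ASSUMED labelling step: weighted random zone for each passenger."""
    rng = random.Random(seed)  # fixed seed keeps the labelling repeatable
    zones = list(overlap_areas)
    weights = [overlap_areas[z] for z in zones]
    return {pid: rng.choices(zones, weights=weights, k=1)[0]
            for pid in passenger_ids}
```

Once each passenger has an origin zone, the walking distance to the first boarding station follows from the zone centroid and the station coordinates.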
4.5. Waiting Time (WT)
The waiting time for PT can be broken down into the waiting time at the starting station and the waiting time at transfer stations. Several studies indicate that station waiting times are related to the distributions of passenger and vehicle arrival times. If passengers’ arrival times follow a uniform distribution and the departure interval of vehicles is constant, the waiting time at the starting station is half of the departure interval of line y [51]. Since the transfer time comprises transfer waiting time and transfer walking time, the transfer waiting time is obtained by subtracting the transfer walking time from the difference between the boarding time at stage k + 1 and the alighting time at stage k, where the transfer walking time is derived from the transfer walking distance.
When passengers travel by private car or taxi, they can leave at any time with little need to wait; therefore, the waiting time for PC is set to zero.
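Both waiting-time components can be sketched in a few lines. The sketch assumes that the transfer walking time is the transfer walking distance divided by the 1.2 m/s walking speed used elsewhere in the paper, and it clamps negative results to zero, which is an added safeguard rather than part of the described method. Function names are hypothetical.

```python
WALK_SPEED_M_S = 1.2  # ASSUMED to be the same walking speed used for transfers


def initial_wait_min(headway_min):
    """Expected wait at the first station: half the departure interval,
    valid for uniform passenger arrivals and a constant headway."""
    return headway_min / 2.0


def transfer_wait_min(alight_time_min, next_board_time_min, transfer_walk_m):
    """Transfer gap minus walking time, in minutes (never below zero)."""
    walk_min = transfer_walk_m / WALK_SPEED_M_S / 60.0
    gap_min = next_board_time_min - alight_time_min
    return max(0.0, gap_min - walk_min)


def total_pt_wait_min(headway_min, transfers):
    """transfers: list of (alight_time_min, next_board_time_min, walk_m)."""
    return initial_wait_min(headway_min) + sum(
        transfer_wait_min(*t) for t in transfers)
```

For a line with a 10 min headway and one transfer with a 10 min gap and a 360 m walk, the total is 5 min at the first station plus 5 min at the transfer.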
6. Conclusions and Discussion
This paper identifies the boarding stations, alighting stations, and transfer behaviors of commuters based on smart card data, GPS data, and station location data of urban public transportation, and extracts a complete public transportation travel chain. Then, considering the travel characteristics of commuters (round-trip and closed chains), private travel chains are obtained based on the origin and destination of each travel stage. The extraction is validated by checking that each station’s passenger flow remains balanced over time: the analysis of station departures and arrivals demonstrates the feasibility and efficacy of the travel chain extraction method. The results of parameter calibration indicate that travel time, distance, comfort, cost, walking distance, and waiting time have a significant impact on mode selection. Walking distance is the most influential variable, and its coefficient follows the normal distribution N(−8.127, 4.730²), which reflects the preference heterogeneity of travelers. In addition, this paper analyzes the marginal effect of each variable on travel mode selection, i.e., the degree to which the probability of selecting each transport mode is influenced by changes in various factors. These findings can be used as a basis for adjusting the structure of the urban transportation system.
This paper is also a novel attempt to replace traditional survey data with travel characteristics extracted from the travel chain, and the attempt has proved successful. In light of the findings of this study, government agencies can take the following steps to alleviate urban traffic congestion and encourage more urban residents to use public transportation. First, increase the costs of private transportation in an appropriate manner, such as by adjusting fuel prices and taxi fares. Second, optimize the layout of public transport platforms and lines, adding new platforms or relocating existing ones in densely populated areas so that residents can use public transportation more easily. Third, increase the frequency of public transportation during peak travel times to prevent overcrowding caused by an excessive number of passengers on each bus. In addition, attention must be paid to the cleanliness of vehicles in order to maintain a clean travel environment and provide passengers with a better travel experience.
This paper’s limitations primarily stem from two aspects. On the one hand, the influence factors focus primarily on the characteristics of the mode of transport, ignoring the influence of individual characteristics such as socioeconomic attributes and psychological factors. In the future, residents’ surveys and other methods can be used to enrich the detailed information of travelers and analyze the influence of individual characteristics on travel selection. On the other hand, this paper discusses commuters’ choice between public and private transportation, but does not further categorize each mode. Take public transportation as an example; there are various travel combinations, such as bus-only, bus-and-subway, and subway-only. Despite the fact that they are all forms of public transportation, travelers will make different decisions depending on the circumstances. Subsequent research may subdivide each mode and investigate the multi-modal selection behavior of travelers.