1. Introduction
Rapidly developing transportation systems have changed people’s travel behaviors, especially in metropolitan areas. Many transport agencies attempt to develop sustainable public transport systems, which offer a promising solution to the problems of traffic congestion and air pollution. To make public transport more attractive, the main measure is to improve public transport service quality, e.g., transit reliability.
In transit systems, transportation characteristics such as mode speeds and service frequency vary across space and time. Transit operators put considerable effort into improving service reliability via timetable adjustment, network optimization, and infrastructure investments. Passengers are considerably attracted to multimodal (or intermodal) public transport due to its advantages in safety, affordability, environmental impact, etc. [1]. A major concern is the travel time variability (TTV), which has been defined as the variance in travel time for vehicles making similar trips, of either the inter-vehicle, inter-period, or inter-day type [2]. Previous studies suggested splitting transit journey time into separate components, i.e., access time, wait time, in-vehicle time, transfer time, and egress time, and assumed that these components are independent [3,4]. The reliability of public transport is sensitive to the variability in the time components [4]. This variability is mainly affected by service frequency (or headway) [5] and a range of other variables, such as temporal factors, infrastructure, and passenger demographics [6,7]. Among the journey components, the impact of transfers on the reliability of multimodal transit systems has been highlighted in the literature [6,8,9,10,11]. In general, a transfer is defined as the act of changing between modes or between services of the same mode. The concept may include a pure transfer (e.g., walking from a bus station to a train platform) and an incidental activity transfer (e.g., buying a newspaper) [1].
In the literature, there are abundant studies on the estimation of transit travel time distributions (TTDs), aiming to represent network conditions and gain insight into the TTV. The TTDs mainly take two forms: (1) normal, and (2) skewed, e.g., lognormal or gamma distribution [12,13]. A decrease in temporal aggregation has been reported to increase the normality of travel time distributions [12]. This finding suggests that the relationship between travel time and explanatory variables can be modeled by linear regression. In recent years, many studies have conducted TTV analyses and measured the reliability of transit systems [4,7,14]. For example, the authors of [7] proposed a method to estimate passenger waiting time at transit stations and analyzed the effects of influential variables with a multivariate regression model.
There is very limited literature on the estimation of transfer time distribution between two transit modes. Existing studies took advantage of transit smart card data to identify the transfers and estimate the time spent, according to the tap-in and tap-out times of transit modes [1,11,15,16]. Seaborn et al. [1] established three levels of maximum-elapsed-time thresholds to identify the transfers between the bus and metro systems in London, using smart card data. The threshold estimation did not distinguish the impacts of the time and space dimensions. Normally, transfer time includes walking time for a transfer and waiting time at the platform. However, many studies estimated only one of these two components and only a few considered them together. Eltved et al. [6] estimated the walking time distributions from bus stops to train platforms based on a matching of smart card data and automatic vehicle location data. They found that passengers’ walking speeds and passengers engaging in activities during the transfer both affect the walking time estimation. Sun and Xu [3] distinguished the O-D metro trips with or without transfers for the wait time estimation at platforms. The platform elapsed time (PET, a generalized platform wait time) was inferred from the trips without transfers, while the platform elapsed transfer time (i.e., interchange wait time) was inferred based on the trips with transfers, as well as on the previously deduced PET. Our study is inspired by this stage-based procedure. Wahaballa et al. [16] estimated the platform waiting time distribution in London’s underground network, using passive smart card data. Afterward, the same authors [11] estimated the distribution of transfer time between bus stops and rail stations, using the stochastic frontier model. Both the walking time and waiting time distributions were presented. The literature review shows that, on the one hand, a large number of studies have used smart card data for the estimation of travel time or time components, but socio-economic relationships are rarely studied because this kind of information is lacking in such data. On the other hand, to the best of our knowledge, there are no studies using the household travel survey (HTS) dataset, which includes both the users’ mobility and their socio-economic information, for the estimation of different time components.
Given an HTS without any information on time components, how can the average wait time and in-vehicle time be inferred for a transit mode? Does the transfer time between two transit modes differ across space and time? To answer these questions, this paper proposes an integrated model framework to estimate the passengers’ average waiting time, transit mode speeds, and transfer time in the transit system of the Paris region, based on the 2018 HTS. The basic trip-level information (such as departure/arrival times, trip O/D locations, and purposes) and stage-level information (such as stage start/end locations and travel modes within a trip) are available in the survey. However, as in many other large-scale HTSs, there is no further information on the time components at the stage level. Our study therefore addresses this challenging issue, especially the estimation of transfer time between two transit modes or lines.
The remainder of the paper is organized as follows. Section 2 introduces the study area of the Paris region and the transit data preparation. Section 3 introduces the method, including the linear regression model and the transfer time estimation based on the multi-stage transit trips. Section 4 presents the results of average wait time and mode speeds, and provides evidence of transit network performance in terms of transfer time in different time periods and territorial spaces. The topics of data accuracy, model extension, and applications are discussed in Section 5. Finally, Section 6 provides the main conclusions and highlights our future work.
3. Method
Here, we distinguish two types of transit trips to infer the journey time components. The first type, 3-stage trips with one transit leg, is used to estimate the average platform wait time and transit speeds (and hence the in-vehicle time), using an integrated linear regression model. The second type, 4-stage trips with two transit legs, is used to infer the transfer time between two transit modes or lines, using the wait time and speeds estimated in the first step. Before introducing this inference procedure, we establish the following assumptions.
3.1. Assumptions
The declaration errors of the journey time in the survey are unbiased;
The average wait time (only after the walk access stage) and transit mode speeds estimated from the 3-stage transit trips are also representative for all transit trips;
The average wait time depends strongly on the time period and the transit mode, and the mode speeds are distinguished between urban and suburban areas only for road transit (i.e., the bus), not for railway transit;
The transfer time is defined as the time spent from alighting one transit mode (or line) to boarding another transit mode (or line) in the same trip. The transfer time estimated from the 4-stage transit trips is also applicable to other multi-stage transit trips.
3.2. Linear Regression Model Based on 3-Stage Transit Trips
Figure 4 illustrates the modal-related 3-stage transit trip with walk access, one transit leg, and walk egress. Wait time at the platform is included in the trip. The walk access stage refers to the passengers’ walking distance from the trip origin to the transit platform. After passengers arrive at the platform, they wait for boarding before the in-vehicle stage. The walk egress stage includes the passengers’ walking distance from the transit platform to the trip destination.
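As shown in Figure 4, transit trip $i$ satisfies the following expressions on trip distance and duration:

$$D_i = d_i^{a} + d_i^{pt} + d_i^{e} \tag{1}$$

$$T_i = t_i^{a} + t_i^{w} + t_i^{pt} + t_i^{e} \tag{2}$$

In Equation (1), the trip distance $D_i$ and the access, transit, and egress stage distances $d_i^{a}$, $d_i^{pt}$, and $d_i^{e}$ are the surveyed distances. Among them, the stage distances $d_i^{a}$, $d_i^{pt}$, and $d_i^{e}$ have been derived from the declared information, such as the O/D locations and transit stops. It is worth noting that the related distances are Euclidean distances. In Equation (2), the trip time $T_i$ is surveyed and the other travel time components (access time $t_i^{a}$, wait time $t_i^{w}$, in-vehicle time $t_i^{pt}$, and egress time $t_i^{e}$) are unknown, which means that they need to be inferred. According to physical kinematics, the walking time $t_i^{a} + t_i^{e}$ and the in-vehicle time $t_i^{pt}$ can be calculated by:

$$t_i^{a} + t_i^{e} = \frac{d_i^{a} + d_i^{e}}{v_w} + \varepsilon_i^{w}, \qquad t_i^{pt} = \frac{d_i^{pt}}{v} + \varepsilon_i^{pt} \tag{3}$$

where $v_w$ adopts the average walking speed by age groups from reference [21], i.e., 4.3 km/h~4.8 km/h; $v$ is the average transit speed that needs to be inferred; and $\varepsilon_i^{w}$ and $\varepsilon_i^{pt}$ are error terms. In this study, for railway transit modes, such as train and metro (except tramway), $d_i^{a}$ and $d_i^{e}$ are both updated to the sum of two parts: (1) the surveyed values (i.e., distance from origin to station entrance or from station exit to destination), and (2) the estimated mean distance $c$ inside the station for the access or egress stage. Here, $c$ is set to 250 m, according to the study in [22]. Thus, Equation (2) is updated by:

$$T_i = \frac{d_i^{a} + d_i^{e}}{v_w} + t^{w} + \frac{d_i^{pt}}{v} + \varepsilon_i \tag{4}$$

where $\varepsilon_i$ collects the error terms. We assume that there exists a linear relationship between the trip time net of walking time, $T_i - (d_i^{a} + d_i^{e})/v_w$, and the transit distance $d_i^{pt}$ when the wait time and transit speed become constants. Thus, $t^{w}$ and $v$ can be estimated from the coefficients of the simple linear regression model $y = \beta_0 + \beta_1 x$, with $t^{w} = \beta_0$ and $v = 1/\beta_1$.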
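As an illustration of the simple linear regression described above, the following minimal sketch fits the trip time net of walking time against the transit distance for a single mode and time period. The function name, the synthetic data, and the "true" values of 6 min and 30 km/h are assumptions made for the example only; they are not taken from the survey.

```python
import numpy as np

def fit_wait_and_speed(trip_time_min, walk_time_min, transit_dist_km):
    """Fit y = b0 + b1 * x, where y is the trip time net of walking time (min)
    and x is the transit-stage distance (km).
    b0 is read as the average wait time (min), 60 / b1 as the speed (km/h)."""
    y = np.asarray(trip_time_min, dtype=float) - np.asarray(walk_time_min, dtype=float)
    x = np.asarray(transit_dist_km, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # highest-degree coefficient first
    return intercept, 60.0 / slope

# Synthetic trips (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
dist_km = rng.uniform(1.0, 15.0, size=500)
walk_min = rng.uniform(3.0, 12.0, size=500)
true_wait_min, true_speed_kmh = 6.0, 30.0
trip_min = (walk_min + true_wait_min + 60.0 * dist_km / true_speed_kmh
            + rng.normal(0.0, 1.0, size=500))

wait_hat, speed_hat = fit_wait_and_speed(trip_min, walk_min, dist_km)
print(f"estimated wait: {wait_hat:.2f} min, estimated speed: {speed_hat:.2f} km/h")
```

The intercept absorbs every time component that does not scale with the transit distance, so any systematic declaration bias in the trip time would also end up in the estimated wait time.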
The passengers’ average wait time for public transport varies by mode and by time period of the day. Given the different land-use patterns and levels of urbanization in the region, the average transit speeds should differ between the urban and suburban areas, particularly for the bus. Therefore, the trips are segmented by time periods and modes. In other words, the wait time $t^{w}$ and the transit speed $v$ in Equation (4) are associated with these two attributes. For wait time inferences, we set the indices of time periods $p \in \{1, \dots, P\}$ and the indices of transit modes $q \in \{1, \dots, Q\}$. For transit speed inferences, we set the indices of transit sub-modes $r \in \{1, \dots, R\}$. In our study, there is a total of four time periods (i.e., AM peak, inter peak, PM peak, and off peak) and three transit modes (i.e., train, metro, and bus), or four sub-modes (i.e., train, metro, bus_urb, and bus_sub) with the consideration of space.
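Given that $y_i = T_i - (d_i^{a} + d_i^{e})/v_w$ is the trip time net of walking time and $d_i^{pt}$ is the transit distance of trip $i$, we build the following linear regression model:

$$y_i = \sum_{p=1}^{P} \sum_{q=1}^{Q} \alpha_{pq} X_{pq,i} + \sum_{r=1}^{R} \gamma_r Z_{r,i}\, d_i^{pt} + \varepsilon_i \tag{5}$$

where $X_{pq,i}$ and $Z_{r,i}$ are dummy values of 0 or 1. As a whole, there are $PQ + R$ explanatory variables in Equation (5). Assuming that trip $i$ has time and mode attributes corresponding to the indices $p$, $q$ and $r$, we then have $X_{pq,i} = 1$ and $Z_{r,i} = 1$, with all other dummies equal to 0. According to Equations (4) and (5), we estimate the wait time and mode speed by:

$$t^{w}_{pq} = \alpha_{pq} \tag{6}$$

$$v_r = \frac{1}{\gamma_r} \tag{7}$$

The matrix notation for Equation (5) with $k$ observations (i.e., $i$ = 1, 2, …, $k$) can be written as:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \tag{8}$$

where

$$\mathbf{y} = (y_1, \dots, y_k)^{\top}, \quad \boldsymbol{\beta} = (\alpha_{11}, \dots, \alpha_{PQ}, \gamma_1, \dots, \gamma_R)^{\top}, \quad \boldsymbol{\varepsilon} = (\varepsilon_1, \dots, \varepsilon_k)^{\top} \tag{9}$$

and row $i$ of $\mathbf{X}$ is $(X_{11,i}, \dots, X_{PQ,i}, Z_{1,i}\, d_i^{pt}, \dots, Z_{R,i}\, d_i^{pt})$.

Generally, the above parameters in the vector $\boldsymbol{\beta}$ can be estimated by the ordinary least squares (OLS) method or the maximum likelihood estimation (MLE) method. The average wait time and transit speeds are finally obtained by:

$$\hat{t}^{w}_{pq} = \hat{\alpha}_{pq}, \qquad \hat{v}_r = \frac{1}{\hat{\gamma}_r} \tag{10}$$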
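The integrated regression can be sketched with an ordinary least squares fit on a design matrix built from dummy variables. The sketch below assumes one wait-time dummy per (time period, mode) pair and one distance term per sub-mode, which is one plausible reading of the model; the names `design_matrix` and `fit_integrated`, the index encoding, and all synthetic values are hypothetical.

```python
import numpy as np

PERIODS = ("am_peak", "inter_peak", "pm_peak", "off_peak")
MODES = ("train", "metro", "bus")
SUBMODES = ("train", "metro", "bus_urb", "bus_sub")
N_WAIT = len(PERIODS) * len(MODES)  # one wait-time dummy per (period, mode) pair

def design_matrix(period_idx, mode_idx, submode_idx, transit_dist_km):
    """One row per trip: a 0/1 dummy selecting the (period, mode) wait time,
    followed by the transit distance placed in the column of the trip's sub-mode."""
    period_idx = np.asarray(period_idx)
    mode_idx = np.asarray(mode_idx)
    submode_idx = np.asarray(submode_idx)
    dist = np.asarray(transit_dist_km, dtype=float)
    rows = np.arange(dist.size)
    X = np.zeros((dist.size, N_WAIT + len(SUBMODES)))
    X[rows, period_idx * len(MODES) + mode_idx] = 1.0
    X[rows, N_WAIT + submode_idx] = dist
    return X

def fit_integrated(period_idx, mode_idx, submode_idx, transit_dist_km, net_time_min):
    """OLS fit; returns wait times (min) as a period-by-mode table and speeds (km/h)."""
    X = design_matrix(period_idx, mode_idx, submode_idx, transit_dist_km)
    beta, *_ = np.linalg.lstsq(X, np.asarray(net_time_min, dtype=float), rcond=None)
    wait_min = beta[:N_WAIT].reshape(len(PERIODS), len(MODES))
    speed_kmh = 60.0 / beta[N_WAIT:]  # slope is in min/km
    return wait_min, speed_kmh

# Synthetic trips (hypothetical values, for illustration only).
rng = np.random.default_rng(1)
k = 4000
period = rng.integers(0, len(PERIODS), size=k)
mode = rng.integers(0, len(MODES), size=k)
# Train and metro map to their own sub-mode; bus trips are urban (2) or suburban (3).
submode = np.where(mode == 2, rng.integers(2, 4, size=k), mode)
dist_km = rng.uniform(1.0, 15.0, size=k)
true_wait = np.array([[6.0, 3.0, 8.0], [8.0, 4.0, 10.0],
                      [6.0, 3.0, 8.0], [10.0, 6.0, 14.0]])
true_speed = np.array([45.0, 25.0, 14.0, 18.0])
net_time = (true_wait[period, mode] + 60.0 * dist_km / true_speed[submode]
            + rng.normal(0.0, 1.0, size=k))

wait_hat, speed_hat = fit_integrated(period, mode, submode, dist_km, net_time)
print(np.round(wait_hat, 1))
print(np.round(speed_hat, 1))
```

Because the speed coefficients are shared across time periods, every trip of a sub-mode contributes to its speed estimate, including trips from sparsely sampled periods.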
3.3. Estimation of Transfer Time Based on 4-Stage Transit Trips
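Due to a lack of information on time components in the survey, the transfer time between two transit legs is defined as the total time spent covering the transfer distance, engaging in activities if applicable, and waiting for the transit mode. Figure 5 illustrates the modal-related 4-stage transit trips, including walk access and egress stages, and two public transport (PT) stages. In addition, the wait time $t_i^{w}$ for the first PT stage and the transfer time $t_i^{tr}$ for the second PT stage are included. The wait time is estimated from the previous 3-stage transit trips, and the transfer time needs to be inferred in this section.

Similar to the 3-stage transit trip in Figure 4, the 4-stage transit trip $i$ satisfies Equations (11) and (12) in terms of travel distance and duration, respectively, and the values of the trip distance $D_i$, the trip time $T_i$, and the stage distances $d_i^{a}$, $d_i^{pt_1}$, $d_i^{pt_2}$, and $d_i^{e}$ are known from the survey.

$$D_i = d_i^{a} + d_i^{pt_1} + d_i^{pt_2} + d_i^{e} \tag{11}$$

$$T_i = t_i^{a} + t_i^{w} + t_i^{pt_1} + t_i^{tr} + t_i^{pt_2} + t_i^{e} \tag{12}$$

From Equation (12), we have $t_i^{tr} = T_i - t_i^{a} - t_i^{w} - t_i^{pt_1} - t_i^{pt_2} - t_i^{e}$. According to the inferences of average wait time and transit mode speeds (see Equations (6) and (7) in Section 3.2), the transfer time for each trip $i$ is estimated by:

$$\hat{t}_i^{tr} = T_i - \frac{d_i^{a} + d_i^{e}}{v_w} - \hat{t}^{w}_{pq} - \frac{d_i^{pt_1}}{\hat{v}_{r_1}} - \frac{d_i^{pt_2}}{\hat{v}_{r_2}} \tag{13}$$

where trip $i$ has the attributes of time period $p$, the first waiting transit mode $q$, and the transit sub-modes $r_1$ and $r_2$ for the two PT stages, and $\hat{t}_i^{tr}$ is the estimated transfer time, being subject to $\hat{t}_i^{tr} \geq 0$. For the trips with attributes corresponding to $p$, $q$, $r_1$, and $r_2$, denoted as the set $I_{p,q,r_1,r_2}$, the average transfer time is calculated by:

$$\bar{t}^{tr}_{p,q,r_1,r_2} = \frac{1}{\left| I_{p,q,r_1,r_2} \right|} \sum_{i \in I_{p,q,r_1,r_2}} \hat{t}_i^{tr} \tag{14}$$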
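The residual calculation for 4-stage trips can be sketched as follows: the walking time, the estimated wait time, and the two in-vehicle times are subtracted from the declared trip time, and the remainders are averaged per group. The dictionary layout, the single walking speed of 4.5 km/h (a value inside the 4.3 km/h~4.8 km/h range quoted in Section 3.2), the wait time, the speeds, and the three trips are all hypothetical. Discarding negative residuals is an assumption about how the non-negativity condition is handled.

```python
from collections import defaultdict

def transfer_time_min(trip, wait_min, speed_kmh, walk_speed_kmh=4.5):
    """Residual transfer time (min) of one 4-stage trip."""
    walk = 60.0 * (trip["access_km"] + trip["egress_km"]) / walk_speed_kmh
    wait = wait_min[(trip["period"], trip["first_mode"])]
    ride_1 = 60.0 * trip["pt1_km"] / speed_kmh[trip["submode_1"]]
    ride_2 = 60.0 * trip["pt2_km"] / speed_kmh[trip["submode_2"]]
    return trip["trip_min"] - walk - wait - ride_1 - ride_2

def average_transfer_times(trips, wait_min, speed_kmh, walk_speed_kmh=4.5):
    """Average the non-negative residuals per (period, first mode, sub-mode 1, sub-mode 2)."""
    groups = defaultdict(list)
    for trip in trips:
        estimate = transfer_time_min(trip, wait_min, speed_kmh, walk_speed_kmh)
        if estimate >= 0.0:  # assumption: negative residuals are dropped
            key = (trip["period"], trip["first_mode"], trip["submode_1"], trip["submode_2"])
            groups[key].append(estimate)
    return {key: sum(values) / len(values) for key, values in groups.items()}

# Hypothetical inputs, standing in for the outputs of the first step.
wait_min = {("am_peak", "train"): 5.0}
speed_kmh = {"train": 45.0, "bus_urb": 12.0}

base = {"period": "am_peak", "first_mode": "train",
        "submode_1": "train", "submode_2": "bus_urb",
        "access_km": 0.30, "egress_km": 0.45, "pt1_km": 15.0, "pt2_km": 3.0}
trips = [dict(base, trip_min=60.0), dict(base, trip_min=64.0), dict(base, trip_min=45.0)]

averages = average_transfer_times(trips, wait_min, speed_kmh)
for key, value in averages.items():
    print(key, round(value, 2))
```

In this toy example the third trip yields a negative residual and is excluded, so the group average is taken over the first two trips only.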
5. Discussion
First, we discuss the data accuracy and the limitations of using the HTS for this study. As in other traditional, large-scale HTSs, there is no declared information about transfer time and in-vehicle time, only about the entire journey time from trip origin to destination. Our proposed method can estimate the transfer time after the decomposition of the journey time. Thus, precisely estimating the time components becomes the key issue. As sojourn locations (such as trip O/D, stage-based start/end locations) are easily declared in the survey, the travel distance per trip or stage is represented by the Euclidean distance between two recorded locations, instead of the route distance. These Euclidean distances are used for the estimation of time components at the stage level. Because they underestimate the distance actually traveled, they may bias the time estimation. To approximate the distance actually experienced, the Euclidean distance can be weighted by adjustment factors that depend on the travel mode and GIS information. In the era of big data, it is possible to obtain the route distance, for example, using GPS tracking data from mobile phones [24].
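The distance adjustment discussed above can be illustrated with a short sketch that scales a straight-line distance by a mode-specific factor and shows the effect on the implied speed. The factor values are placeholders rather than calibrated figures, and the coordinates are assumed to be in a projected system expressed in metres.

```python
import math

# Placeholder detour factors (route distance / straight-line distance); not calibrated.
DETOUR_FACTOR = {"train": 1.2, "metro": 1.2, "bus": 1.4}

def euclidean_km(x1_m, y1_m, x2_m, y2_m):
    """Straight-line distance in km between two points given in metres."""
    return math.hypot(x2_m - x1_m, y2_m - y1_m) / 1000.0

def adjusted_distance_km(x1_m, y1_m, x2_m, y2_m, mode):
    """Straight-line distance scaled by the detour factor of the mode."""
    return DETOUR_FACTOR[mode] * euclidean_km(x1_m, y1_m, x2_m, y2_m)

def speed_kmh(distance_km, time_min):
    return 60.0 * distance_km / time_min

# A hypothetical bus stage of 20 min between two points 5 km apart in a straight line.
straight = euclidean_km(0.0, 0.0, 3000.0, 4000.0)
adjusted = adjusted_distance_km(0.0, 0.0, 3000.0, 4000.0, "bus")
print(f"straight-line speed: {speed_kmh(straight, 20.0):.1f} km/h")
print(f"adjusted speed:      {speed_kmh(adjusted, 20.0):.1f} km/h")
```

With a factor above one, the same declared travel time implies a higher speed, which is the direction of adjustment noted in the discussion of the estimated speeds.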
In the interest of the method’s robustness, we used the integrated linear regression model with all considered explanatory variables, instead of a disaggregate linear regression model for each mode and time period. There are two reasons. First, the parameters associated with transit speeds are estimated by the integrated model under the assumption that mode speeds are independent of the time period. This reduces the estimation errors caused by insufficient samples during the off-peak hours (see Figure 6). Second, the integrated model is more flexible in terms of aggregating the variables that are assumed to have no dependence on time and space, so as to reduce the number of variables and ease our analysis. Although the obtained results are statistically significant, they seem overestimated. For example, in the Paris region, it is reported that the average commercial speed of RER A (one train line in the region) is about 49 km/h, the metro speed is between 21 km/h and 27 km/h, and the speed of buses on priority lanes is about 12 km/h [25]. Our estimated railway transit speeds, which are based on the Euclidean distance, are close to the aforementioned commercial speeds, but will become greater once they are adjusted by a factor greater than one to account for the route distance. This overestimation is more evident in the bus speed comparison. It may be caused by the sample representativeness (e.g., many short bus trips in the sample) and the declaration bias of travel time in the HTS. One possible solution may be to use a weighted regression model to estimate appropriate parameters [26]. As the model fitting performance is still satisfactory in our study, the model has the potential to be extended to a more general case study. For any mode, as long as the modal distance traveled is known, the average mode speed can be estimated through the proposed linear regression model, and the time cost can consequently be calculated. This is also applicable to access and egress modes that are more efficient than walking, such as bicycles, scooters, and shared vehicles.
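One way to read the weighted regression mentioned above is weighted least squares, in which each trip carries a weight such as a survey expansion factor. The sketch below is an assumption about what such a model could look like, not the method of reference [26]; the function name, the weights, and the data are hypothetical.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Minimise sum_i w_i * (y_i - x_i . beta)^2 by rescaling rows with sqrt(w_i)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sqrt_w = np.sqrt(np.asarray(w, dtype=float))
    beta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta

# Hypothetical trips: net time (min) against transit distance (km), with an intercept column.
dist_km = np.array([2.0, 4.0, 6.0, 8.0])
net_time_min = 5.0 + 3.0 * dist_km
X = np.column_stack([np.ones_like(dist_km), dist_km])
weights = np.array([1.0, 2.0, 1.0, 4.0])  # e.g., survey expansion factors (assumed)

beta = weighted_least_squares(X, net_time_min, weights)
print(np.round(beta, 3))  # intercept (min) and slope (min/km)
```

With equal weights the result coincides with ordinary least squares, so the weighting only matters when some trip types are over- or under-represented in the sample.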
In practice, the obtained results have the potential to guide transit operations in the study area. For example, bus frequency needs to be coordinated with the service frequency of railway systems, especially for passengers transferring from trains to buses in the urban area (see Figure 7). In some areas, the transfer time for buses is more than 20 min during the peak hours (see Figure 8), which indicates imperfect reliability of bus travel time. There are two ways to improve it. First, bus stops and passageways can be designed in a coordinated way to avoid conflicts with high-density traffic flows. Second, we can establish bus-dedicated or priority lanes to ensure that buses arrive on time, or locate transport hubs where they allow transfers to become seamless. Moreover, our study suggests that reducing the transfer time in rural areas deserves special attention, and a more accurate time-dependent OD demand might be required for transit operations. Passenger security at peak hours also deserves attention. This is notably important for the large and complex transit system in the Paris region. A trade-off may exist between reducing transfer time and ensuring passengers’ security.
6. Conclusions
This paper aims to estimate the transfer time in multimodal transit networks from the most recent HTS in the Paris region. The average wait time and transit mode speeds are first estimated by the linear regression model. The transfer time is then inferred and investigated across time periods and space. Several findings from the study are worth mentioning. In the Paris region, the transfer to the train or metro costs less time than the transfer to the bus. The transfers between suburban buses cost a little more time than the transfers between urban buses. Regarding the different time periods, the inter peak period seems to be the best time for transfers from the railway system (both train and metro) to the bus. Our preliminary results are more reliable as qualitative findings than as estimated values, since the values are subject to the sample size for the regression model, declaration bias in the HTS, and some ignored influential variables.
The current work could be extended in three respects in the future. First, the dataset of transit trips used in the proposed model is expected to be enriched. Once the HTS is completed over its survey planning horizon, the study can be replicated and more representative results may be generated. Second, other kinds of datasets, such as GPS traces and automated fare collection data, will be considered to further validate and complement our estimated results. Finally, socio-economic relationships can be established in the model to identify the preferences of targeted passenger groups in the transit system.