A Bicycle Origin–Destination Matrix Estimation Based on a Two-Stage Procedure

: As more people choose to travel by bicycle, transportation planners are beginning to recognize the need to rethink the way they evaluate and plan transportation facilities to meet local mobility needs. A modal shift towards bicycles motivates a change in transportation planning to accommodate more bicycles. However, the current methods to estimate bicycle volumes on a transportation network are limited. The purpose of this research is to address those limitations through the development of a two-stage bicycle origin–destination (O–D) matrix estimation process that would provide a di ﬀ erent perspective on bicycle modeling. From the ﬁrst stage, a primary O–D matrix is produced by a gravity model, and the second stage reﬁnes that primary matrix generated in the ﬁrst stage using a Path Flow Estimator (PFE) to build the ﬁnalized O–D demand. After a detailed description of the methodology, the paper demonstrates the capability of the proposed model for a bicycle demand matrix estimation tool with a real network case study.


Introduction
Bicycling is increasing in many urban communities around the world due to the push for sustainable living and other factors [1][2][3]. The volume of bicycle users has increased enough to significantly affect the modal split in such communities. Bicycle mode share is traditionally modest in the United States, so the recent increases in bicycle mode share [4,5] signal a need for better bicycle accommodations in transportation planning.
Bicycle planning models are essential in helping decision-makers allocate resources more effectively in terms of building infrastructure. However, traditional transportation planning focuses on private motorized vehicles. As a result, there is a limited number of tools available to model bicycle planning models in a network, and there are correspondingly few research efforts dedicated to network analysis for bicycles. Turner et al. [6] and Cambridge Systematics [7] provided a review of bicycle planning models in practice. Their review included the following: a comparison study of observed bicycle counts [8]; aggregate behavior studies [9,10]; bicycle sketch plan methods [11]; discrete choice models [12,13]; the regional travel demand model [14]; and latent demand score [9]. Ortuzar et al. [15] showed how to estimate the bicycle demand from stated preference (SP) survey data. In terms of the bicycle route choice model, Hood et al. [16] introduced a path-size logit (PSL) model [17] to perform bicycle traffic assignment. Menghini et al. [18] also adopted a PSL-based model on a pregenerated route, and Fagnant and Kockelman [19] developed a cyclist route choice model based on travel time and travel suitableness. Most methodologies can only estimate or extrapolate the bicycle volumes on a network based on some bicycle counts in selected locations [19,20]. While these methodologies are certainly useful in bicycle modeling, they do not provide a complete picture of the bicycle transportation network.
The two-stage model introduced in the paper distinguishes itself from other methods, because it makes use of both field and planning data to estimate the bicycle O-D matrix, and assigned bicycle trips also match the bicycle counts on the network. The first stage generates a primary bicycle O-D matrix, and the second stage refines the bicycle O-D matrix generated in the first stage. In the first stage, a gravity model (i.e., a doubly constrained) is used with obtained planning data, which are zonal productions and zonal attractions, as its main input. Then, the second stage refines the primary bicycle O-D matrix generated in the first stage. After the first stage, Path Flow Estimator (PFE) is used with two inputs, which are observed data (e.g., link counts) and the primary O-D matrix in the second stage. This two-stage process ensures that the estimated O-D matrix would be more consistent with the observed bicycle counts. These unique features may provide insight into improving current practices in bicycle transportation planning.
This paper is organized as follows: after the introduction, the two-stage bicycle O-D estimation model is presented, followed by a case study in the City of Winnipeg, Canada, to demonstrate the applicability of the proposed model, and then the paper ends with concluding remarks.

A Two-Stage Bicycle O-D Estimation
Estimating bicycle demand consists of two main stages ( Figure 1). In the first stage, a doubly constrained gravity model is adopted to generate a primary bicycle O-D matrix, and then PFE is used to refine the primary bicycle O-D matrix.

Stage 1: Generation of a Primary Bicycle O-D Matrix
As shown in Figure 1 (left panel), a doubly constrained gravity model is adopted to generate the primary bicycle O-D matrix with planning data (i.e., zonal productions and zonal attractions) and calibrated impedance function or friction factors as inputs. The simplified procedure involves the following steps: Step 1. Estimation of zonal productions and zonal attractions using existing planning data; Step 2. Estimation of a primary O-D matrix in the gravity model.
Step 1 utilizes a trip generation model focusing on travel choice to perform the first step in the conventional transportation planning model (also known as the four-step model). From the trip generation model, the number of trips in a traffic analysis zone (TAZ) are predicted. If there is no direct demand available between zones, typical methods can be used to model trip generation. These methods include: (1) regression models, and (2) category (or cross-classification) models. Once the trip productions and attractions are estimated, Step 2 distributes the trips through a trip distribution model. For the trip distribution step, the gravity model (i.e., doubly constrained gravity model) is widely used as follows: where T ij = Number of trips produced between zone i and zone j P i = Number of trips produced in zone i A j = Number of trips attracted to zone j F ij = An empirically derived "friction factor," which expresses the average area-wide effect of spatial separation on the trip interchanges between the two zones, i and j K i , K j = Empirically derived origin and destination adjustment factor in zone i and zone j Generally, the trip distribution model is a required calibration process with F ij , K i , and K j , but it is usually time-consuming [35]. In addition, the calibration process requires many factors, which is not established from travel surveys. In Stage 2, we introduce the alternative method, which is the refinement of the primary matrix.

Stage 2: Bicycle O-D Matrix Refinemnet
To refine the primary bicycle O-D matrix with observed bicycle counts, we developed a sequential model consisting of route generation and O-D demand adjustment. The route generation process considers two main factors, which are distance-related factors and safety-related factors). With these two factors, efficient routes for an O-D pair are generated. Once the route generation process is completed, PFE is adopted to adjust the primary bicycle O-D demand generated in Stage 1, and the adjusted bicycle O-D demand matrix in the second stage can assign more consistent link flows with observed link count data.
Stage 2 consists of two steps. In Step 1, the bicycle routes are generated based on two key factors, which are distance-related factors (i.e., route distance) and safety-related factors (i.e., bicycle level of service (BLOS)), and then O-D demand is adjusted with the primary O-D matrix generated in Stage 1, observed link counts, and zonal productions and attractions data. Figure 2 shows the overall flowchart of the bicycle O-D matrix adjustment procedure in Stage 2.

Route Generation in Stage 2
Because of the many factors [36][37][38][39][40][41][42] affecting a cyclist's route choice, using a single objective may have limitations to construct cyclist route choice behavior [18]. Ehrgott et al. [43] introduced the bi-objective bicycle route generation model based on the route distance and route suitableness criteria. In this paper, distance-related factors (i.e., route distance) and safety-related factors (i.e., BLOS) are adopted to construct cyclist route choice behavior.

Route Distance
Route distance consists of the sum of link distances and the delays at intersections the route passes through. The delays at intersections have been shown to be an unfavorable factor in cycling route decision-making. Because the unit of link length (e.g., kilometer) and intersection delay (e.g., second) are different, it needs to match the same unit between two measurements. In the paper, the delay (i.e., minute) is converted to distance unit (i.e., kilometer) with a conversion factor. The details, which are equation and description of route distance, can be found in [44].

Bicycle Level of Service (BLOS)
There are many measurements to assess the safety or the suitability effect of bicycle facilities. Lowry et al. [45] provided thirteen methods used in the literature. Among the numerous measurements, the BLOS measure developed by the Highway Capacity Manual (HCM) [46] is used as a surrogate measure to account for safety. The BLOS measurement is introduced in the HCM as a guide for bicycle facility design. The details in terms of the BLOS development can be shown in the HCM [46].

Bi-Objective Shortest Path
The bi-objective shortest path problem may not have a single solution which dominates all other solutions. Therefore, the solution requires a set of nondominated (or Pareto) solutions. Several solution algorithms [47][48][49][50] have been developed, but it is not easy to solve with nonadditive route cost, which is route BLOS in the bi-objective shortest path problem. Readers may refer to Raith and Ehrgott [51] for a comparison of different solution strategies for solving the bi-objective shortest path problem. In the paper, we modified the ranking method proposed by Climaco [47]. First, routes are generated based on the route distance attributes (i.e., link distance and intersection turning movement penalty) without exceeding the maximum upper bound (i.e., 10 kilometers in this study) by using the k-shortest path algorithm, then BLOS scores are calculated for each corresponding route in the set to determine the nondominated routes.

Bicycle O-D Demand Adjustment Using Path Flow Estimator (PFE)
The PFE originally developed by Bell and Shield [52] is a one-stage network observer using traffic counts. The main component of the PFE is a logit-based path choice model in which the perception errors of path travel times are assumed to be independent Gumbel variates. Note that one of the major issues in bicycle route choice decisions is the route overlapping problem (see [53]). Hence, the aim of this section is to extend the logit-based PFE model to implement the route overlapping issue by using the path-size logit (PSL) model, and to refine the primary bicycle O-D matrix. The PSL-based PFE formulation is shown with several side constraints as follows.
Minimize : where The objective function in Equation (2) consists of three terms. The first term, which is the entropy term, distributes the O-D demand onto multiple routes with the provided dispersion parameter. The second term, which is the system optimal term, minimizes users' utilities, and the third term, which is the Path-Size (PS) term, considers the route overlapping.
The PS factor is determined by the length of links within a path, as follows: where PS rs k = PS factor of path k connecting O-D pair rs l a = Length of link a L rs k = Length of path k connecting O-D pair rs Path-size factor is determined by the length of links the route passes through [17]. Hence, a path heavily overlapped with other paths has a smaller PS value, while paths that are more distinct have a larger PS value.
Compared to the traditional traffic assignment model (i.e., logit-based stochastic user equilibrium [54]), the adopted PSL-based PFE model finds path flows while reproducing observed bicycle counts in Equation (3), zonal productions and attractions in Equations (4) and (5), and target demands in Equation (6) within predefined error bounds, simultaneously. These error bounds indicate confidence levels of the observed data. If the provided data are more reliable, a smaller error bound is used to constrain the estimated flows within a narrower range, while a less reliable dataset will use a larger error bound to allow for a larger range of the estimated flows. Equation (7) constrains the path flows to be of non-negativity. Equations (8)- (11) are definitional constraints that sum up the estimated path flows to obtain the O-D demand, link flows, zonal production flows, zonal attraction flows, and O-D demands, respectively.
By constructing the Lagrangian function of the above formulation in Equations (2)-(11), path flows can be derived analytically as a function of route costs and dual, as follows. (13) where u − a ,ρ − r , τ − s , o − rs and u + a ,ρ + r , τ + s , o + rs are the dual variables representing the lower and upper limits of constraints (Equations (5)-(8)), respectively.

Numerical Results
To demonstrate the proposed bicycle O-D estimation model, a real network in the City of Winnipeg, Canada, was adopted in the numerical experiments. The network consists of 1067 nodes, 2555 links, and 154 zones. Among the 2555 links, 541 links are set up as bicycle links. The O-D trip table for motorized vehicles and link performance parameters were obtained from Emme/4 [55]. The bicycle network information was obtained from the City of Winnipeg [56], in Canada.

Stage 1: Generate a Primary O-D Demand
We generated a primary bicycle O-D matrix in the Winnipeg network as follows, in Stage 1. First, the zonal productions and zonal attractions were generated. The data for the total number of commuters and the percentages of commuters for each mode (e.g., car, transit, and bicycle) and in each neighborhood (i.e., zonal production flows) were obtained from the City of Winnipeg [57]. In Winnipeg, there are about 5575 commuters (or 1.8%) who use bicycles. From the census data, bicycle zonal production flows can be obtained. However, there is no available data for zonal attraction flows. As an alternative, we used the proportion of motorized vehicle demand obtained from Emme/4 [55] to estimate the bicycle zonal attraction flows as follows: where T B is the total bicycle demand; A B s is the bicycle attraction flows of zone s; T C is the total motorized vehicle demand; and A C s is the motorized vehicle attraction flows of zone s. Figure 3 illustrates the obtained bicycle production (blue color) and attraction demands (green color) for each zone from the 2006 census data. From the figure, the zonal demand pattern reveals that a majority of the bicycle attraction flows are from the central business district (CBD) area, while the bicycle production flows are generated from the non-CBD areas. Once the bicycle productions and attractions are estimated from each zone in Figure 3, the primary bicycle O-D matrix is estimated using the doubly constrained gravity model given in Equation (1). Because of the absence of bicycle friction factors from the City of Winnipeg, bicycle trip length introduced by [58] was substituted for the friction factors. Note that origin and destination adjustment factors were not used in this study (i.e., K i = K j = 1 for all zones).

Stage 2: Refine the Primary Bicycle O-D Matrix with PFE
After generation of the primary bicycle O-D matrix in Stage 1, we transitioned into Stage 2 by applying the bi-objective route generation process to generate a set of bicycle routes. Once the efficient routes were generated, the PFE was performed with the observed counts data to refine the primary bicycle O-D matrix.

Efficient Route Generation
Figures 4 and 5 show the transportation characteristics in the Winnipeg network. These characteristics were used to determine the route choice criteria (i.e., route distance and route BLOS). Figure 4a,b shows the vehicle volume and speed; Figure 4c,d shows the bicycle segment Level of Service (LOS) and intersection LOS from the HCM [46]. These characteristics serve for computing the two route choice criteria: route distance and route BLOS.
A bi-objective route generation was performed based on the characteristics of the Winnipeg network shown in Figure 5. For the upper bound of the two key criteria, we assumed 10 kilometers for route distance and 7 for route BLOS. In other words, routes with higher than 10 kilometers for route distance and routes with higher than 7 for route BLOS were excluded from the generation of the efficient route set. For the network, the bi-objective shortest path problem was performed for each O-D pair (there are 7368 O-D pairs in the Winnipeg network, hence, it was performed a total of 7368 times). From 7368 O-D pairs, the total number of generated efficient routes are 37,263 routes, and this result implies that each O-D pair has an average of about 5 efficient routes. Figure 5 shows the route distribution of the 37,263 routes. Most routes are between 5.0 and 8.0 kilometers, the average route distance is 6.6 km in terms of length, the routes have values ranging from 3.0 to 4.0, and average BLOS is 3.4.

Link Counts Data
Generally, most traffic count collections are for motorized vehicles. Hence, bicycle count data may not be collected regularly, but we expect the collecting bicycle counts to increase in the future.
Hull [59] analyzed cycling commuter patterns from 2007 to 2013 with 65 link count locations. From their work, we obtained observed counts data (highest counts recorded in 2007-2013) corresponding to the 65 link count locations. Figure 6 shows the obtained bicycle counts and corresponding locations.

Utility Parameters and Error Bounds of Observed Data
With generated efficient routes, the bicycle PFE was applied to refine the primary bicycle O-D matrix to meet the error bound of bicycle counts (Table 1), target or primary O-D demand, and zonal productions and attractions. A composite utility function was used, as suggested by Kang and Fricker [60], U rs k = − d rs k α · BLOS rs k β , where α and β are parameters. We also used the parameter values (i.e., α= 0.862 and β = 0.117), suggested by Kang and Fricker [60]. Because of flexibility of error bound in PFE, we can provide different error bounds for different data. A more reliable dataset will constrain the estimation to be within a smaller error bound, while a less reliable dataset will have a larger error bound (i.e., estimated values are allowed to have a large deviation). For bicycle counts, zonal productions, and attractions, error bound of 30% was used. For target O-D demands (i.e., primary bicycle O-D matrix), a flow-dependent error bound was adopted from 30% (flows are higher than 30) to 60% (flows are lower than 30).

Comparisons of Analysis Results
After performing the two-stage bicycle O-D estimation, we analyzed the differences between: (1) the observed flows and the estimated flows; (2) the primary matrix and the refined matrix; and (3) the link flows by Path Size Logit-based Assignment (PSLA) and the link flows by PFE.
Our first comparison delves into the differences between the observed flows to the estimated flows. Figure 7 shows the scatter plots in link flow, zonal productions, zonal attractions, and O-D demands. The root-mean-square error (RMSE) value is provided for accuracy.   For the third comparison, the assigned link flows by PSL assignment and the estimated link flows by PFE were compared. Figure 9a shows the link flow comparison between these two methods. On the map, the red and orange colors indicate that the assigned link flows by PFE model have greater flows than the assigned link flows by PSL model. The purple and blue colors indicate the reverse. As can be seen, there are more flow adjustments in the CBD area. Because most traffic count locations are located near CBD areas, and because areas near CBD areas have relatively higher demand than other areas, these areas are more affected in O-D demand adjustment. Figure 9b provides a more detailed analysis through a bar graph, displaying the link flow difference distribution. Only 28% of links have 0-1 flow difference, while about 32% of links have significant difference.

Conclusions
The paper presents a methodology for bicycle demand estimation from observed data. The estimation process involves two key stages: a primary bicycle O-D estimation and a refinement of the estimated primary bicycle O-D demand. The first stage generates the primary O-D demand in a gravity model with available planning data as input. The second stage then generates efficient routes (considering route distance and route BLOS) and uses PFE to refine the primary O-D trips generated in the first stage with observed counts data. This two-stage process enhances the accuracy of the results by making the route flows estimated by the O-D demand matrix more consistent with the observed flows.
A case study was conducted on a real network in the City of Winnipeg, Canada, to show proof of concept. In the City of Winnipeg network, incorporating bicycle counts in bicycle O-D matrix estimation results in significantly different flow patterns. Because of the absence of a calibration process in the trip distribution step and traffic assignment step, the reason for the significantly different flow patterns between estimated and observed link flows is probably supported. The findings in this study have important implications in bicycle planning. The proposed two-stage process is demonstrated to be a useful tool in modeling cyclist planning such as bicycle facility construction, navigating bicycle routes, and can be used to guide the design of networks and to guide bicycle travel more efficiently. For future research on bicycle O-D estimation, we recommend the following five considerations. Firstly, we recommend conducting parameter calibration, because we believe that the absence of calibration in this research had significant effects on the results. Secondly, it would be more realistic to incorporate a variety of network topologies and bicycle facilities into the analysis. Thirdly, we recommend integrating the congestion effect of motorized vehicles as well as a more comprehensive look at travel patterns. On a similar note of increased inclusivity, we also recommend incorporating travelers' driving skills. Lastly, we recommend implementing more data collection to avoid data inconsistency problems.