A Maximum Entropy Fixed-Point Route Choice Model for Route Correlation

Abstract: In this paper we present a stochastic route choice model for transit networks that explicitly addresses route correlation due to overlapping alternatives. The model is based on a multi-objective mathematical programming problem, the optimality conditions of which generate an extension to the Multinomial Logit models. The proposed model considers a fixed point problem for treating correlations between routes, which can be solved iteratively. We estimated the new model on the Santiago (Chile) Metro network and compared the results with other route choice models that can be found in the literature. The new model has better explanatory and predictive power than many other alternative models, correctly capturing the correlation factor. Our methodology can be extended to private transport networks.


Introduction
The present study formulates a new route choice model for public transport networks that features significant innovations compared to existing models. The main enhancement in the proposed model is the ability to simultaneously and explicitly integrate the traveler's lack of information (randomness or uncertainty) and the correlation between route alternatives (due to overlapping). The proposed model is based on a multi-objective mathematical programming problem and its respective scalarized single-objective problem. The multi-objective problem considers the exogenous cost functions of the network, entropy of the route choice, and the covariance matrix for the route flows. An extension to networks with congestion (e.g., private transport with endogenous costs) is also proposed.
A traditional approach to defining the route choice process, and the subsequent traffic equilibrium, is to assume a deterministic behavior. This deterministic equilibrium usually states optimality conditions, such as minimizing transport costs or satisfying Wardrop's first principle of traffic equilibrium [1]. These models assume that travelers have perfect information and seek to unilaterally minimize their travel costs [2][3][4][5]. Typically, a mathematical programming model is formulated and solved by an iterative algorithm. If applied with care and understanding, a deterministic user-equilibrium model provides a simple but effective method of traffic assignment [6][7][8][9][10][11][12].
Alternatively, stochastic/probabilistic route choice models differ from deterministic formulations in that they incorporate the uncertainty, randomness and/or the heterogeneity of travelers and alternative routes, and passengers' imperfect knowledge. This is the approach followed in this study. Reviews of this class of models are found in Daganzo and Sheffi [13], Hazelton [14], Ramming [15], and Prashker and Bekhor [16]. Among these models, there are some that explicitly consider correlations between alternative routes, such as Cascetta et al. [17], Ben-Akiva and Bierlaire [18], Bekhor and Prashker [19,20], and Bovy et al. [21]. A complete review of these studies can be found in Prashker and Bekhor [16] and Prato [22]. Several of the models presented by these authors were considered for performing a comparison with our new model.
In the remainder of this paper, Section 2 contains a brief review of the literature that provides context for understanding the proposed model; Section 3 provides an analytic derivation of the new formulation; Section 4 applies the model to a medium-sized network (the Santiago Metro), comparing the results with existing models and proposing a version for private networks with congestion; and lastly, Section 5 summarizes the results and gives the main conclusions.

Literature Review
The formulation of the route choice or traffic assignment stage in transportation modeling has long followed an approach in which users minimize their generalized trip cost on the assumption of perfect knowledge of the transport network. Under this approach, travelers are considered to be homogeneous and each one is fully informed of the cost of each link on the network at any level of flow [1,2]. These assumptions are both rather strong even for a small network, and the results obtained are often not satisfactory. However, due to their simplicity and availability, many transport planners continue to apply such models, especially in large networks.
Because the perfect information assumption is usually not correct, there is a clear need for models that represent users who have incomplete or imperfect information on the transport system in regard to existing routes and their levels of congestion. Various route choice models in the specialized literature are based on system attributes perceived by travelers and their socioeconomic and demographic characteristics [13,15,16,23,24]. In these models, users behave in accordance with the costs they perceive. The socioeconomic and demographic variable data are usually obtained through user surveys or from network data records and are easily justified as an integral part of individuals' rational decision-making processes. However, because the modeler lacks information on these processes, the modeled choices necessarily embody a degree of variability.
Regarding knowledge of routes, it is widely accepted that individual users do not know (or do not perceive/consider) all of the route possibilities between a given origin-destination pair [17,25-28]. This is a reality that should be incorporated into route choice models. Cascetta et al. [27] propose a Logit-type model of route perception and choice similar to those based on random utility theory. Ben-Akiva et al. [25] model interurban route choice as a two-stage process in which a set of routes is defined in the first stage, and the route choice is performed in the second stage.
Correlations between alternative routes can (to a certain extent) be indirectly addressed, according to Bovy and Hoogendoorn-Lanser [29], by hierarchical nested Logit and multi-nested GEV models in cases where they arise from overlapping segments and/or nodes. Prato and Bekhor [30] and Bekhor et al. [31] study the way in which individuals construct their set of route alternatives and the implications of route similarity for user behavior. Bliemer and Bovy [32] study the interdependence of these two aspects.

Multinomial Logit for Route Choice
The probability of choosing route $p$ to travel between O-D pair $w$ ($P_w^p$) can be estimated by discrete choice models in which the cost of a given route as perceived by a user ($\tilde{c}_w^p$) is assumed to be given by Equation (1):

$$\tilde{c}_w^p = c_w^p + \varepsilon_w^p \tag{1}$$

The cost is thus perceived as the sum of a deterministic component ($c_w^p$) and a random error component ($\varepsilon_w^p$). The latter can be interpreted in various ways, one of which is that the error reflects the user's inaccurate perception of route cost due to, among other things, the scarcity of information. In this case, the first term ($c_w^p$) represents the real mean cost of the route. The expression for the choice probability ($P_w^p$) is a function of the assumed distribution of the error terms and whether or not they are independent. If we assume the errors are i.i.d. Gumbel-distributed with a scale parameter $\theta > 0$, we have a multinomial logit (MNL) choice model in which $P_w^p$ is given by Equation (2) (see [5,23,33]):

$$P_w^p = \frac{\exp(-\theta c_w^p)}{\sum_{r \in P_w} \exp(-\theta c_w^r)} \tag{2}$$

where $P_w$ is the set of routes uniting the O-D pair $w$.
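As an illustration of the MNL model in Equation (2), the following minimal sketch computes route choice probabilities for a single O-D pair. The function name, the example costs and the value of θ are hypothetical and chosen only for illustration.

```python
import numpy as np

def mnl_route_probabilities(costs, theta):
    """Logit choice probabilities for the routes of one O-D pair.

    costs: deterministic route costs c_w^p (illustrative values below).
    theta: positive scale parameter of the Gumbel errors.
    """
    costs = np.asarray(costs, dtype=float)
    # Utilities are -theta * cost; subtracting the maximum avoids overflow
    # in exp without changing the resulting probabilities.
    utilities = -theta * costs
    utilities -= utilities.max()
    weights = np.exp(utilities)
    return weights / weights.sum()

# Hypothetical example: three routes with costs in minutes.
probs = mnl_route_probabilities([20.0, 22.0, 25.0], theta=0.5)
```

A larger θ concentrates the probability on the cheapest route, whereas θ close to 0 spreads the choices almost uniformly over the routes.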

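The same model can be obtained from the following equivalent optimization problem:

$$\min_{h} \; Z = \sum_w \sum_{p \in P_w} c_w^p h_w^p + \frac{1}{\theta} \sum_w \sum_{p \in P_w} h_w^p \left(\ln h_w^p - 1\right) \quad \text{s.t.} \quad \sum_{p \in P_w} h_w^p = T_w \;\; \forall w \tag{3}$$

where $T_w$ is the total number of trips (exogenous demand) between the O-D pair $w$ and $h_w^p$ is the flow along route $p$ ($P_w^p = h_w^p / T_w$). In problem (3), the set of routes $P_w$ between each pair $w$ must be predefined and maintained invariant during the assignment process. The optimality conditions lead to the assignment criterion given by Equation (2), in which users divide up among the alternative routes based on a Logit model. The second term of the objective function represents the negative value of the entropy. The latter is weighted by $1/\theta$ to scale both terms in the objective function. In this way, problem (3) can be interpreted as a bi-objective problem that simultaneously minimizes the total cost and maximizes the total entropy [34].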
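The equivalence between the entropy problem and the Logit assignment can be checked numerically. The sketch below assumes that the objective for a single O-D pair is total cost plus (1/θ) times the sum of h(ln h − 1); the costs, θ and demand are hypothetical values. It compares the objective at the Logit flows with its value after shifting a small amount of flow between two routes while keeping total demand fixed.

```python
import numpy as np

def entropy_objective(h, costs, theta):
    """Z = sum(c * h) + (1/theta) * sum(h * (ln h - 1)) for one O-D pair."""
    h = np.asarray(h, dtype=float)
    return float(np.dot(costs, h) + np.sum(h * (np.log(h) - 1.0)) / theta)

# Hypothetical data for one O-D pair with three routes.
costs = np.array([20.0, 22.0, 25.0])
theta, demand = 0.5, 1000.0

# Logit flows: demand split according to exp(-theta * cost).
weights = np.exp(-theta * (costs - costs.min()))
logit_flows = demand * weights / weights.sum()

# Move a little flow between routes 0 and 1; demand is still conserved.
shift = np.array([5.0, -5.0, 0.0])
z_logit = entropy_objective(logit_flows, costs, theta)
z_plus = entropy_objective(logit_flows + shift, costs, theta)
z_minus = entropy_objective(logit_flows - shift, costs, theta)
```

Because the objective is strictly convex in the flows, both perturbed values exceed the value at the Logit flows.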
The principal limitation of model (3) is that it does not incorporate a structure for capturing correlations between routes. Possible direct extensions are either extremely simplistic or difficult to apply correctly to real-world-scale networks (for example, using a hierarchical Logit model) given that they cannot properly capture the various types of correlations between routes that share links with each other in different ways. In urban networks, the routes linking a given O-D pair will typically have many overlaps due to common links, with the result that the independent error assumption implicit in Logit-type models such as Equation (2) is unrealistic.
Different extensions of the multinomial logit model have been proposed to explicitly capture correlations between alternative routes, some of which are presented below. These models are used as a comparative basis for our proposed model.

C-Logit Model
In Cascetta et al. [17] and Cascetta et al. [27], the authors propose a joint implicit availability/perception (IAP) and route choice (C-Logit) model that also explicitly addresses the correlation issue (i.e., the lack of route independence due to common links) and is analytically tractable even for large-scale networks.
The basic idea behind the model is to address route interdependence via a cost attribute called the "similarity factor", which is added to the cost of the route in a conventional Logit model, instead of addressing it in terms of error non-independence as in the Probit model. The probability of choosing route $p$ is then given by Equation (4), where $CF_w^p$ is the "similarity factor" for route $p$ joining pair $w$. The similarity factor is constructed as in Equation (5), where $\beta$ is a parameter to be calibrated that must be negative, $l_a$ is the length of link $a$, $L_p$ is the length of route $p$, and $\delta_{ar}$ is equal to 1 if link $a$ belongs to some route $r$ joining $w$ but is 0 otherwise. Other specifications for $CF_w^p$ may be found in Prato [22].

Path-Size Logit Model
Ben-Akiva and Ramming [36] develop the path-size logit (PSL) model, which also aims to correct for routes that have overlaps and are therefore correlated. It attempts to incorporate behavioral theory in Cascetta's C-logit model. In this case, $P_w^p$ is given by Equation (6), where $PS_w^p$ is the correction for the route size. The correction principle applied in this model is as follows: a route with no links overlapping another route needs no correction and is therefore assigned a size of 1. At the other extreme, if there are $J$ duplicate routes (i.e., total overlap), each one has a size of $1/J$. Lastly, the size of routes with partial overlap is based on the sizes of the links, which are appropriately weighted on some criterion, such as the link's contribution to the total length of the route. Thus, $PS_w^p$ can take the form given in Equation (7), where the variables are the same as those employed in Equation (5) to define the similarity factor. Further specifications for $PS_w^p$ may be found in Bovy et al. [21].
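The correction principle can be illustrated with a short sketch. It assumes one common length-based specification of the path size, in which each link contributes its share of the route length divided by the number of routes in the choice set that use it; this is an assumption and not necessarily the exact specification estimated in this paper. The function name and the small network are hypothetical.

```python
import numpy as np

def path_size_factors(incidence, link_lengths):
    """Path-size factor of each route for one O-D pair.

    incidence[a, p] = 1 if link a is on route p, 0 otherwise.
    link_lengths[a] = length of link a.
    """
    incidence = np.asarray(incidence, dtype=float)
    link_lengths = np.asarray(link_lengths, dtype=float)
    route_lengths = link_lengths @ incidence            # length of each route
    # Number of routes using each link (at least 1 to avoid dividing by zero
    # for links that no route in the choice set uses).
    routes_per_link = np.maximum(incidence.sum(axis=1), 1.0)
    # Share of the route length contributed by each link.
    shares = incidence * link_lengths[:, None] / route_lengths[None, :]
    return (shares / routes_per_link[:, None]).sum(axis=0)

# Hypothetical network: link 0 (length 4) is shared by routes 0 and 1,
# links 1 and 2 (length 6) are exclusive, and route 2 uses only link 3.
incidence = [[1, 1, 0],
             [1, 0, 0],
             [0, 1, 0],
             [0, 0, 1]]
sizes = path_size_factors(incidence, [4.0, 6.0, 6.0, 10.0])
```

In this example the two partially overlapping routes receive a size of 0.8 and the independent route a size of 1, in line with the principle described above.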

Paired Combinatorial Logit Model
The paired combinatorial logit model (PCL) belongs to a family of models that derives from the generalized extreme value (GEV) model [33]. The general PCL model can be derived from the generating function given in Equation (8), where $y_k$ characterizes each alternative (in our case, a route between an O-D pair), $\sigma_{kj}$ is a similarity index between alternatives $k$ and $j$, and $n$ is the number of alternatives. The model was adapted for route choice by Bekhor and Prashker [19], and, in our case, we use the expression used by Chen et al. [37], given in Equation (9). In this case, for a choice set $P_w$ with $|P_w|$ alternative routes, there are $|P_w|(|P_w| - 1)/2$ pairs of alternatives. If $\sigma_w^{pq}$ equals 0 for every pair of routes $(p, q)$ joining $w$, then the PCL model reduces to an MNL model such as Equation (2). We use the similarity index proposed by Chen et al. [37], given in Equation (10), where $L_w^{pq}$ is the length of overlap between routes $p$ and $q$, and $L_w^p$ and $L_w^q$ are the respective lengths of routes $p$ and $q$. It is possible (and convenient) to include an additional degree of freedom in this model by including a parameter $\beta$ in Equation (10), as shown in Equation (11). This new parameter must be estimated together with the rest of the model's parameters.

Cross-Nested Logit Model
The cross-nested model (CNL) also belongs to the GEV family of models and was adapted for the route choice context by Prashker and Bekhor [38] and Vovsha and Bekhor [39]. In that adaptation, the model considers a two-level hierarchical structure. The upper level includes all of the links in the network, and the lower level includes all of the routes that belong to $P_w$. Next, every route is assigned to the nests that represent the links that belong to it. In this case, the probability of choosing route $p$ is given by Equation (12), where $m$ characterizes the links and therefore the nests, $M_w^p$ is the set of links that belong to route $p$ in the O-D pair $w$, $\alpha_w^{mp}$ are parameters that represent the degree of inclusion of alternative $p$ in nest $m$, and $\mu$ is the nesting coefficient. If $\mu = 1$, the model reduces to an MNL model. The CNL model is adapted to a route choice context by defining the parameters $\alpha_w^{mp}$ depending on the topology of the network. Prashker and Bekhor [38] proposed the expression given in Equation (13), where $L_w^m$ is the length of link $m$, $L_w^p$ is the length of route $p$, $\delta_w^{mp}$ equals 1 if link $m$ belongs to route $p$, and $\gamma$ is a parameter that must be calibrated, which reflects the perception of the travelers regarding the similarity of the alternative routes. For the estimation of the CNL model, we use $\gamma = 1$ following Bekhor et al. [31].

Route Choice Model with Correlated Routes
In this section, we develop a new route choice model that simultaneously incorporates: (i) users with imperfect knowledge of the network; (ii) correlations between alternatives (in this case, routes); and (iii) the effect of demand or flow levels on network costs (congestion). As noted earlier, the proposed formulation is entropy-based with quadratic constraints.

Mathematical Formulation of Model
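The proposed stochastic equilibrium assignment model that captures route correlation is based on the following multi-objective optimization problem (other applications of multi-objective models in transportation can be found in [34,35,40]):

$$\begin{aligned}
\min \; F_1 &= \sum_w \sum_{p \in P_w} c_w^p h_w^p \\
\min \; F_2 &= \sum_w \sum_{p \in P_w} h_w^p \left(\ln h_w^p - 1\right) \\
\min \; F_3 &= \sum_w \sum_{p \in P_w} \sum_{q \neq p} \eta_w^{pq} \left(h_w^p - t_w\right)\left(h_w^q - t_w\right) \\
\text{s.t.} \quad & \sum_{p \in P_w} h_w^p = T_w \quad \forall w
\end{aligned} \tag{14}$$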
Objective $F_1$ relates to the total system cost. Objective $F_2$ attempts to maximize the entropy to determine the most likely routes; in probabilistic terms, it finds the most feasible route combinations for travelers in equilibrium. Combining $F_1$ and $F_2$ with the flow conservation constraints gives the stochastic assignment model expressed in Equation (3).
Objective $F_3$, the novel element in the proposed model, explicitly incorporates correlations of flows between different routes, whether or not they join the same O-D pair $w$. The objective is constructed as a weighted sum of the divergences of the flows on the individual defined routes joining $w$ from the average flow on those routes, where $t_w = T_w / N_w$ is the average flow, $N_w$ is the number of defined routes (i.e., the cardinality of $P_w$) and $T_w$ is the (fixed) total number of trips between $w$. The parameters $\eta_w^{pq}$ are exogenous and determine the degree of correlation (0 to 1) between the routes $p$ and $q$ of $w$. The values of the parameters can be defined in a number of ways (see [15,17,41]).
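The behavior of the correlation objective can be illustrated with a minimal sketch for a single O-D pair. It assumes that the objective is the sum, over all ordered pairs of distinct routes, of the correlation weight times the product of the two deviations from the average flow; the function name, the flows and the correlation matrix are hypothetical.

```python
import numpy as np

def correlation_objective(flows, eta):
    """F3 for one O-D pair: sum over p != q of eta[p, q] * (h_p - t) * (h_q - t)."""
    flows = np.asarray(flows, dtype=float)
    eta = np.array(eta, dtype=float)       # copy, so the caller's matrix is untouched
    np.fill_diagonal(eta, 0.0)             # exclude the q == p terms
    deviations = flows - flows.mean()      # the mean route flow equals T_w / N_w
    return float(deviations @ eta @ deviations)

# Hypothetical example: routes 0 and 1 overlap (eta = 0.6), route 2 is independent.
eta = [[0.0, 0.6, 0.0],
       [0.6, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
f3_uniform = correlation_objective([100.0, 100.0, 100.0], eta)
f3_skewed = correlation_objective([150.0, 120.0, 30.0], eta)
```

With uniform flows every deviation is zero and so is the objective; when two correlated routes both carry more than the average flow, the objective is positive.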
As with $F_2$, objective $F_3$ is an information criterion [42]. If, for example, the flows on all traveled routes were uniform (that is, if $h_w^p = t_w \;\forall p$), the value of $F_3$ would be 0 and thus contain no information. $F_2$ also takes its lowest possible value if $h_w^p = t_w \;\forall p$. An alternative optimization problem [34,43] for Equation (14) that generates the proposed stochastic equilibrium model is the following:

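$$\begin{aligned}
\min_{h} \; F = {} & \sum_w \sum_{p \in P_w} c_w^p h_w^p + \frac{1}{\theta} \sum_w \sum_{p \in P_w} h_w^p \left(\ln h_w^p - 1\right) + \frac{1}{\rho} \sum_w \sum_{p \in P_w} \sum_{q \neq p} \eta_w^{pq} \left(h_w^p - t_w\right)\left(h_w^q - t_w\right) \\
\text{s.t.} \quad & \sum_{p \in P_w} h_w^p = T_w \quad \forall w
\end{aligned} \tag{15}$$

The terms $1/\theta$ and $1/\rho$ are the respective relative weights of the two information criteria $F_2$ and $F_3$ with respect to the reference objective $F_1$. $\theta$ and $\rho$ are parameters to be estimated.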
The first-order conditions for Equation (15) are given by Equation (16), where $c_w^p$ could be, for a public transportation application, replaced by a generalized cost function. From Equation (16), we obtain Equations (17) and (18). Dividing Equations (17) and (18), we have Equation (19). Because $\eta_w^{pq}$ is an exogenous model parameter (defined by the modeler) and $t_w$ is assumed to be constant (for calibration and modeling purposes), the term that depends only on $\eta_w^{pq}$ and $t_w$ is a constant, denoted $\alpha_w^p$, which is an intercept or specific constant. Equation (19) can then be rewritten as:

$$h_w^p = T_w \, \frac{\exp\left(\alpha_w^p - \theta c_w^p - \rho^* \sum_{q \neq p} \eta_w^{pq} h_w^q\right)}{\sum_{r \in P_w} \exp\left(\alpha_w^r - \theta c_w^r - \rho^* \sum_{q \neq r} \eta_w^{rq} h_w^q\right)} \tag{20}$$

This non-linear expression is similar in structure to the model specified by Ben-Akiva and Ramming [36], given here as Equation (6). The main difference is that the right-hand side of Equation (20) includes the endogenous variable $h_w^q$ and is therefore a fixed-point function [44], whereas in Equation (6), the right-hand side contains only the model's exogenous variables.
An alternative can be specified for $F_1$ that can model service levels in public transportation networks. An example of such a replacement is a generalized cost given by the weighted sum of attributes or explanatory variables $X_{w,k}^p$ that could represent, for example, the trip time, wait time, or the cost of route $p$ between pair $w$.
We can define $V_w^p = \alpha_w^p - \theta c_w^p - \rho^* \sum_{q \neq p} \eta_w^{pq} h_w^q$ as the "utility level" of the proposed Logit-based model. By the route choice probability function in Equation (20), the marginal utility of the attribute $X_{w,k}^p$ can be written in implicit form; clearing, we obtain Equation (21). Because the denominator of Equation (21) is independent of attribute $k$, the marginal rates of substitution are generic and have the same functional form as traditional discrete choice models. Thus, in the proposed fixed-point model with spatial correlation, the marginal rates of substitution between attributes can be obtained without any additional complexity.
To implement model (20), the parameters ($\alpha_w^p$, $\theta$, $\rho^*$), where $\rho^* = \theta/\rho$, must first be estimated and the fixed-point Equation (20) then solved, given that the variable $h_w^p$ appears on both sides of the equation. However, because the endogenous variable ($h_w^p$) appears on the right-hand side of the model, the parameters ($\alpha_w^p$, $\theta$, $\rho^*$) cannot be estimated using maximum likelihood because the presence of the endogenous variable violates the assumption of independent marginal probability functions necessary for defining the likelihood function. To bypass this problem, we resort to the use of an instrumental variable to replace $h_w^p$, as explained below.
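For given parameter values, the fixed point of the model can be approximated by successive substitution. The following minimal sketch assumes a single O-D pair, route utilities of the form α − θc − ρ* Σ η h over the other routes, and hypothetical parameter values; it starts from the MNL flows and iterates until the flows stop changing. Plain successive substitution is an illustrative choice and may require damping when ρ* or the demand is large.

```python
import numpy as np

def softmax(utilities):
    """Numerically stable Logit shares."""
    weights = np.exp(utilities - np.max(utilities))
    return weights / weights.sum()

def solve_fixed_point(costs, eta, demand, alpha, theta, rho_star,
                      tol=1e-10, max_iter=1000):
    """Route flows satisfying h = T * softmax(alpha - theta*c - rho* * eta @ h)."""
    costs = np.asarray(costs, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    eta = np.array(eta, dtype=float)
    np.fill_diagonal(eta, 0.0)                       # a route is not penalized by itself
    flows = demand * softmax(alpha - theta * costs)  # MNL flows as starting point
    for _ in range(max_iter):
        utilities = alpha - theta * costs - rho_star * (eta @ flows)
        new_flows = demand * softmax(utilities)
        if np.max(np.abs(new_flows - flows)) < tol * demand:
            return new_flows
        flows = new_flows
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical data: routes 0 and 1 overlap, route 2 is independent.
costs = np.array([20.0, 22.0, 25.0])
eta = [[0.0, 0.6, 0.0], [0.6, 0.0, 0.0], [0.0, 0.0, 0.0]]
alpha = np.zeros(3)
mnl_flows = 1000.0 * softmax(alpha - 0.5 * costs)
fp_flows = solve_fixed_point(costs, eta, demand=1000.0, alpha=alpha,
                             theta=0.5, rho_star=0.001)
```

In this example the independent route attracts more flow than under the MNL model, because the two overlapping routes penalize each other through the correlation term.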

Estimation of Model Parameters
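An instrument or instrumental variable [45] is an exogenous variable that is highly correlated with an explanatory variable exhibiting endogeneity and can therefore be used as a replacement for the latter without a loss of asymptotic properties in the estimated parameters. In our case, a suitable instrument $h_w^{p0}$ to replace $h_w^p$ is:

$$h_w^{p0} = T_w \, \frac{\exp\left(-\theta c_w^p\right)}{\sum_{r \in P_w} \exp\left(-\theta c_w^r\right)} \tag{24}$$

This formula is a classic MNL model and can be easily estimated. Once this is done, the value of $h_w^{p0}$ is substituted into the right-hand side of Equation (20), as follows:

$$h_w^p = T_w \, \frac{\exp\left(\alpha_w^p - \theta c_w^p - \rho^* \sum_{q \neq p} \eta_w^{pq} h_w^{q0}\right)}{\sum_{r \in P_w} \exp\left(\alpha_w^r - \theta c_w^r - \rho^* \sum_{q \neq r} \eta_w^{rq} h_w^{q0}\right)} \tag{25}$$

Unlike Equation (20), the parameters ($\alpha_w^p$, $\theta$, $\rho^*$) in Equation (25) can be directly estimated by maximum likelihood [46][47][48] because $h_w^{q0}$ does not exhibit endogeneity. Using these estimates, which we now denote ($\alpha_w^{p0}$, $\theta^0$, $\rho^{0*}$), we can estimate $h_w^{p1}$ by Equation (26):

$$h_w^{p1} = T_w \, \frac{\exp\left(\alpha_w^{p0} - \theta^0 c_w^p - \rho^{0*} \sum_{q \neq p} \eta_w^{pq} h_w^{q0}\right)}{\sum_{r \in P_w} \exp\left(\alpha_w^{r0} - \theta^0 c_w^r - \rho^{0*} \sum_{q \neq r} \eta_w^{rq} h_w^{q0}\right)} \tag{26}$$

The original model (20) is thus estimated iteratively from the following recursive relation (which represents the equilibrium of the fixed-point function):

$$h_w^{p,n+1} = T_w \, \frac{\exp\left(\alpha_w^{pn} - \theta^n c_w^p - \rho^{n*} \sum_{q \neq p} \eta_w^{pq} h_w^{qn}\right)}{\sum_{r \in P_w} \exp\left(\alpha_w^{rn} - \theta^n c_w^r - \rho^{n*} \sum_{q \neq r} \eta_w^{rq} h_w^{qn}\right)} \tag{27}$$

The iterative estimation process concludes when convergence is obtained; that is, when the flows and parameter estimates obtained in successive iterations no longer differ appreciably. If the estimate of $\rho^*$ obtained from Equation (27) is significantly different from 0, the null hypothesis of no correlation between route $p$ and any of the other routes joining O-D pair $w$ is rejected.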
In an analogous fashion to the derivation of Equation (21), upon iteratively solving model (20), the marginal utility of attribute $X_{w,k}^p$ is given by an expression that generates the marginal utilities recursively in successive iterations. As the model converges to the equilibrium of the fixed-point function, the marginal rates of substitution continue to be generic and equal in their functional form to those of the traditional discrete choice models.

Existence and Uniqueness of the Fixed-Point Problem Solution
To demonstrate the existence of a solution to the fixed-point problem expressed by Equation (27), we apply the Brouwer fixed-point theorem [49]. This theorem may be stated as follows: Let $H$ be a non-empty compact convex subset of a finite-dimensional Euclidean space $\mathbb{R}^k$, and let $f: H \to H$ be a continuous function. Then, $f$ has a fixed point, i.e., there exists $h^* \in H$ such that $f(h^*) = h^*$.
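Below, it is shown that the system of equations describing the equilibrium in model (27) satisfies the hypotheses of this theorem, with $H$ the set of non-negative route flows that satisfy the demand constraints and $f$ the function whose components are:

$$f_w^p(h) = T_w \, \frac{\exp\left(\alpha_w^p - \theta c_w^p - \rho^* \sum_{q \neq p} \eta_w^{pq} h_w^q\right)}{\sum_{r \in P_w} \exp\left(\alpha_w^r - \theta c_w^r - \rho^* \sum_{q \neq r} \eta_w^{rq} h_w^q\right)}$$

This expression is a composition of two continuous functions and is therefore itself continuous. The parameters ($\alpha_w^p$, $\theta$, $\rho^*$) are estimated using maximum likelihood given an $h_w$ and thus are continuous functions with respect to the latter, thereby satisfying the hypotheses of Brouwer's theorem and proving the existence of at least one fixed point.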
The function $f$ is continuous because it is a composition of continuous functions. In this case, the parameters ($\alpha_w^p$, $\theta$, $\rho^*$) are continuous functions of $h$, which follows from the continuity and strict concavity in ($\alpha_w^p$, $\theta$, $\rho^*$) of the log-likelihood function to be used in the multinomial logit model. The estimation of our model, which corresponds to the point where the sequence described in the previous section converges, belongs to the set $H$ and will be a fixed point of the function $f$. Using Brouwer's Theorem, we can establish the existence of at least one fixed point, i.e., a flow vector $h^* \in H$ such that:

$$h_w^{p*} = T_w \, \frac{\exp\left(\alpha_w^p - \theta c_w^p - \rho^* \sum_{q \neq p} \eta_w^{pq} h_w^{q*}\right)}{\sum_{r \in P_w} \exp\left(\alpha_w^r - \theta c_w^r - \rho^* \sum_{q \neq r} \eta_w^{rq} h_w^{q*}\right)}$$
We now prove uniqueness for the case of $\rho^* \geq 0$. Assume that there are two different equilibrium points $h^*, g^* \in H$, so that both satisfy the fixed-point condition of model (27). Without loss of generality, an ordering of the two solutions can be assumed (assumption (35)). For the case of route $p^+$, the resulting inequality is true given assumption (35) and provided that $\rho^* \geq 0$. From this, we deduce a relation that is incompatible with the initial assumption. There is thus a contradiction, and we conclude that a fixed point not only exists but is unique.

Numerical Results and Extensions
In Section 4.1, we present a route choice case study comparing the results of a real application of the proposed model to those generated by the classic MNL model in Equation (2), the C-Logit model in Equation (4), the path-size logit model in Equation (6), the PCL model in Equation (9) and the CNL model in Equation (12). Next, in Section 4.2, we develop an extension of the proposed formulation with route correlations to traffic assignments on congested networks (private transportation).

Application to a Medium-Sized Network: The Santiago Metro
This case study applies the various models to route choice on the Metro rail transport system in the Chilean city of Santiago. Successive transfer points are chosen between origin and destination pairs that in many instances are joined by more than one feasible route (see Figure 1). The analysis focuses on the morning and evening peak periods (7 am to 9 am and 6 pm to 8 pm) when approximately 790,000 trips are taken across the system, 44% of which include transfers. Trip data were obtained through an O-D survey conducted on the Metro in which 92,800 system users, or approximately 12% of the total, participated. Because only those trips for which there existed more than one alternative route were retained in the data set, the number of individuals or observations eventually employed was 16,029, or approximately 40% of users who transferred.
When the survey was performed in October 2008, the Santiago Metro consisted of five lines and 85 stations, seven of which were transfer points. Of the 7140 O-D pairs in the network, 4985 (70%) required transfers. The reasonability criterion applied in including a route as a possible alternative for a given pair was that at least one surveyed traveler was observed to have used it. Data on the alternative routes for O-D pairs across the system that had more than one route on this criterion are given in Table 1. Although in the majority of cases there were only two alternatives, some pairs had as many as four. Denser networks than this one would undoubtedly have a greater proportion of pairs with alternative routes.
The cost function $c_w^p$ was specified in terms of the following explanatory variables: $X_{time,w}^p$ is the total trip time; $X_{access,w}^p$ is the access time (walking and waiting); $X_{trans,w}^p$ is the number of transfers along the route; $D_{rr,w}^p$ is a binary variable that takes the value 1 if route $p$ is reasonable for travelling in $w$ (based on Dial [23]) and the value 0 if not; and $D_{old,w}^p$ is a binary variable that takes the value 1 if the route is less than 10 years old (a proxy of how well known the route is) and the value 0 if it is older. The number of transfers on a given route alternative is considered as an indicator of the disutility of transferring. Because the Metro uses a flat fare system, the fare is the same regardless of the route and can therefore be omitted from the cost function.
The $\eta_w^{pq}$ term, the spatial correlation factor due to route overlap between routes $p$ and $q$ joining pair $w$, is defined as suggested by Yai et al. [41] in terms of $D_w^{pq}$, the length of overlap between routes $p$ and $q$, and $D_w^p$ and $D_w^q$, the respective lengths of routes $p$ and $q$. This expression stems from the traditional definition of the similarity factor in the C-logit model. As with the parameter $\beta$ in that model (see Equation (5) above), the parameter $-\rho^*$ must be negative, ensuring that $\rho^*$ is positive.
The proposed model was compared to a conventional MNL model that included only network service level variables. That is, $\rho^*$ was set to 0. The comparison was performed using statistical tests. Thus, the six route choice models estimated for the Santiago Metro were the MNL, C-logit, PSL, PCL and CNL models and the proposed fixed-point model (FPM). The estimation results are set out in Table 2. The proposed FPM converged in only four iterations at the 0.01% tolerance level. As can be seen, all explanatory variables had the correct sign and were statistically significant. In the C-logit, PCL, CNL and FPM models, the correlation parameter was significant; in the PSL model, the statistical significance was lower. The models with the best goodness-of-fit (log-likelihood value) were the CNL and our proposed FPM, and both were very similar. In addition to the log-likelihood function, the following indicators were used to compare the MNL and fixed-point models based on their goodness of fit [50,51]: i. percent of correct predictions (PCP); ii. residual sum of squares (RSS), $\mathrm{RSS} = \sum_i (Y_i - P_i)^2$, where $Y_i$ is 1 if alternative $i$ is chosen and 0 otherwise and $P_i$ is the probability predicted by the model of choosing alternative $i$; and iii. weighted residual sum of squares (WRSS). The results of the three indicators for the estimated models are given in Table 3. The PCP indicator is practically the same for all models; therefore, it does not allow any analysis regarding forecasting capability. In regard to RSS and WRSS, once again the best models are the CNL and FPM.
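Two of these indicators can be computed directly from the predicted probabilities, as in the following minimal sketch. It assumes that a prediction counts as correct when the chosen alternative has the highest predicted probability, and that the RSS is summed over all observations and alternatives; the function name and the small data set are hypothetical.

```python
import numpy as np

def goodness_of_fit(chosen, probabilities):
    """Return (PCP in percent, RSS) for a set of observed choices.

    chosen[i]: index of the alternative chosen in observation i.
    probabilities[i, j]: predicted probability of alternative j in observation i.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    chosen = np.asarray(chosen, dtype=int)
    # Choice indicators: 1 for the chosen alternative, 0 otherwise.
    indicators = np.zeros_like(probabilities)
    indicators[np.arange(len(chosen)), chosen] = 1.0
    # Assumed PCP rule: the most probable alternative is the predicted one.
    pcp = 100.0 * np.mean(probabilities.argmax(axis=1) == chosen)
    rss = float(np.sum((indicators - probabilities) ** 2))
    return pcp, rss

# Hypothetical example: three observations, two alternative routes each.
predicted = [[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]]
pcp, rss = goodness_of_fit([0, 1, 1], predicted)
```

A lower RSS indicates predicted probabilities closer to the observed choices, whereas the PCP only records whether the most probable alternative was the one chosen, which helps explain why it discriminates poorly between models.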

Extension of Proposed Fixed-Point Model to Traffic Assignment with Route Correlation
The proposed fixed-point model can be extended to the traffic assignment problem in any congested transportation network. The multi-objective optimization problem would then take a form analogous to Equation (14), in which objective $F_1$ is the classic Beckmann transformation of the traffic assignment problem [2]. Objectives $F_2$ and $F_3$ were described above in Section 3.1 as information criteria. Combining $F_1$ and $F_2$ with flow conservation constraints gives the Fisk stochastic assignment model [52].
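A substitute optimization problem for Equation (14) that defines a new stochastic traffic assignment model with route correlation is the following:

$$\begin{aligned}
\min \; Z = {} & \sum_a \int_0^{f_a} c_a(x)\,dx + \frac{1}{\theta} \sum_w \sum_{p \in P_w} h_w^p \left(\ln h_w^p - 1\right) + \frac{1}{\rho} \sum_w \sum_{p \in P_w} \sum_{q \neq p} \eta_w^{pq} \left(h_w^p - t_w\right)\left(h_w^q - t_w\right) \\
\text{s.t.} \quad & \sum_{p \in P_w} h_w^p = T_w \quad \forall w \\
& \sum_w \sum_{p \in P_w} \delta_{ap}^w h_w^p = f_a \quad \forall a
\end{aligned} \tag{40}$$

where $c_a(\cdot)$ is the cost function of link $a$, $f_a$ is the flow on link $a$, and $\delta_{ap}^w$ equals 1 if link $a$ belongs to route $p$ of pair $w$ and 0 otherwise.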
The first-order conditions for Equation (40)

Conclusions
A new transportation network route choice model was developed that explicitly incorporates the phenomenon of correlations between routes. The model is applicable to public transportation networks and extensible to the traffic assignment problem. The presence of correlations between route alternatives was tested empirically by means of classical econometric techniques.
A multi-objective problem was stated and a substitute problem then formulated whose optimality conditions yielded a logit specification with an endogenous variable, constituting a fixed-point model that is estimated and solved iteratively. The estimation was performed by maximum likelihood and by the use of instrumental variables due to the presence of endogeneity in the model's explanatory variables. The functional form of the proposed fixed-point model combined with the utilization of instrumental variables guaranteed both the existence and uniqueness of the solution.
The proposed model was compared with other route choice models reported in the literature in a case study of the Santiago Metro. The results obtained were both satisfactory and superior to many other existing formulations, and statistically equivalent to those of the CNL model. Unlike the latter, the proposed fixed-point model was able to capture the correlation between routes and provided a better goodness of fit. Future research should compare the FPM with the CNL model on large networks. However, the FPM is able to include congestion in traffic equilibrium conditions; this is an advantage of our new model.
Although the proposed formulation could be more complex to estimate due to the iterative process that must be used to obtain the parameters, in practice the iterations converge rapidly.Furthermore, the additional estimation complexity does not complicate the derivation of project evaluation indicators such as the marginal rates of substitution, which retain the simple functional form of the MNL models.
Lastly, future research should address the apparent advantages of the proposed model on larger networks and traffic assignment problems.
(a) Multinomial logit (MNL). This is the base model because it does not account for correlations and is also used to construct the route choice proxy variables.
(b) C-logit.
(c) Path-size logit (PSL).
(d) PCL.
(e) CNL.
(f) Fixed-point model (FPM) with spatial correlations, which is our proposed model.
The estimation results are set out in Table 2.

Table 2. Route Choice Model Estimation Results.

Table 3. Goodness-of-fit indicators for route choice models.