A Microeconomic Interpretation of the Maximum Entropy Estimator of Multinomial Logit Models and Its Equivalence to the Maximum Likelihood Estimator

Maximum entropy models are often used to describe supply and demand behavior in urban transportation and land use systems. However, they have been criticized for not representing behavioral rules of system agents and because their parameters seem to adjust only to modeler-imposed constraints. In response, it has been demonstrated that the solution to the entropy maximization problem with linear constraints is a multinomial logit model whose parameters solve the likelihood maximization problem of this probabilistic model. But this result neither provides a microeconomic interpretation of the entropy maximization problem nor explains the equivalence of these two optimization problems. This work demonstrates that an analysis of the dual of the entropy maximization problem yields two useful alternative explanations of its solution. The first shows that the maximum entropy estimators of the multinomial logit model parameters reproduce rational user behavior, while the second shows that the likelihood maximization problem for multinomial logit models is the dual of the entropy maximization problem.


Introduction
The maximum entropy approach is widely used for formulating demand models, primarily aggregate, for urban transportation and land use systems. The first application to transportation demand was Wilson's aggregate doubly constrained gravity model of spatial trip distribution [1]. Since then, various authors have published similar versions with various improvements (see [2][3][4]).
A relevant class of travel demand models is that of combined maximum entropy models. These are aggregate models that integrate multiple transportation user decisions such as trip generation, destination choice, and mode and/or route choice. Examples may be found in [5][6][7][8][9][10][11][12][13][14][15][16][17]. In every case the modeling method involves formulating an optimization problem with entropy components whose optimality conditions define the required demand models.
One of the main reasons for employing maximum entropy models is that their probabilistic formulation exhibits a multinomial logit structure which has an obvious microeconomic interpretation. The discrete choice multinomial logit model has been widely utilized for modeling urban transportation and land use systems due to its ability to represent, by means of a closed formula, the paradigm of the rational consumer (or producer) who maximizes utility (or benefit) within his or her possibility space. Both the model's simplicity and its limitations stem from the fact that the utility functions' stochastic errors are Gumbel-distributed i.i.d. random variables (see [18]). The most commonly used method for estimating the parameters of the model is maximum likelihood due to its good asymptotic statistical properties.
It was demonstrated in [5] that for the multinomial logit model, the entropy maximization problem with linear constraints has the same solution as the likelihood maximization problem because the Kuhn-Tucker conditions of the two are identical. As a consequence of this result, the Lagrange multipliers of the entropy maximization problem are the maximum likelihood estimators of the model. It did not, however, provide either an interpretation of entropy maximization in a microeconomic modeling context or any explanation of the origin of the above-mentioned equivalence. The intended contribution of the present paper is to fill these two gaps.
In what follows, Part 2 formulates the entropy maximization problem of a general discrete choice probability distribution with linear constraints, identifies its solution and specifies the dual problem. Part 3 gives a microeconomic interpretation of the dual problem based on the rational consumer behavior paradigm, while Part 4 provides a statistical interpretation of the dual in terms of the estimation criteria of the multinomial logit model parameters. Finally, Part 5 sets out our conclusions.

Formulation of Entropy Maximization Problem and its Dual
Consider the following entropy maximization problem (ME) of a discrete choice probability distribution with linear constraints, stated in (1)-(4), where (4) holds for all i.
Now consider the Lagrange multipliers of constraint (2), indexed by k. Applying the Kuhn-Tucker conditions to ME, we obtain as the solution to the problem the multinomial logit model (5). This same model is obtained if we apply the random utility theory approach (see [18,19]) with the specification (6), where $V_{ai}$ is the deterministic component of the conditional indirect utility perceived by individual i from alternative a, and the scale factor appearing in (6) is that of the Gumbel probability distribution of the random utility error. As demonstrated by [5], the Lagrange multipliers of constraint (2) are the maximum likelihood estimators of the parameters of model (5).
The Lagrangian function of the ME problem is given by (7), which also involves the Lagrange multipliers of constraint (3), one for each individual i. The equivalent problem of ME is therefore (8). If the value of $\ln p_{a/i}$ obtained from (5) is substituted into the first summation term of the Lagrangian (7) and the value of $p_{a/i}$ is substituted into the third summation term, the resulting expression reduces to (9). From this expression we can formulate the known dual of the ME problem (see [20]) as (10), referred to below as the DME problem.
In the following two sections we present two interpretations of the DME problem, one microeconomic and the other statistical. By extension, they apply to the ME problem.
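To make the structure of this section concrete, one standard way of writing an entropy maximization problem of this type, its logit solution and its dual is sketched below. The notation is an assumption made for illustration only: $p_{a/i}$ is the probability that individual i chooses alternative a, $x^k_{ai}$ is the value of attribute k, $\delta_{ai}$ equals 1 if individual i is observed to choose a and 0 otherwise, and $\beta_k$ denotes the multiplier of the k-th linear constraint. The exact objective, symbols and sign conventions used in (1)-(10) may differ from this sketch.

```latex
\begin{align*}
\text{(ME)}\quad \max_{p}\;\; & -\sum_i \sum_a p_{a/i} \ln p_{a/i} \\
\text{s.t.}\;\; & \sum_i \sum_a p_{a/i}\, x^k_{ai} = \sum_i \sum_a \delta_{ai}\, x^k_{ai}
   \quad \forall k \quad (\beta_k) \\
 & \sum_a p_{a/i} = 1 \quad \forall i \\[6pt]
\text{(logit)}\quad & p_{a/i} = \frac{\exp\!\big(\sum_k \beta_k x^k_{ai}\big)}
                                     {\sum_b \exp\!\big(\sum_k \beta_k x^k_{bi}\big)} \\[6pt]
\text{(DME)}\quad \min_{\beta}\;\; & D(\beta) = \sum_i \Big[ \ln \sum_b \exp\!\Big(\sum_k \beta_k x^k_{bi}\Big)
   - \sum_a \delta_{ai} \sum_k \beta_k x^k_{ai} \Big]
\end{align*}
```

Under these assumptions, substituting the logit probabilities back into the Lagrangian eliminates the primal variables and leaves a function of the multipliers alone, which is the role played by the dual described above.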

Microeconomic Interpretation of the Entropy Maximization Dual Problem
If (6) is substituted into (10) then, given (4), the dual problem can be written as (11), where (12) is the maximum expected utility of individual i [21].
Given that the maximum expected utility of individual i is never smaller than the deterministic utility of any alternative a, for all a and i, the DME problem consists in finding the parameters (the Lagrange multipliers of constraint (2)) such that, for each individual, the expected utility of the chosen alternative comes as close as possible to the maximum expected utility over the available alternatives. This allows us to interpret the Lagrange multipliers of constraint (2) microeconomically: they adjust to the fullest extent possible so that the model reflects rational behavior by the individuals.
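The bound behind this interpretation can be checked numerically. The following is a minimal sketch, assuming the maximum expected utility takes the usual logsum form (up to an additive constant that does not depend on the utilities); the function names, the `scale` argument and the utility values are hypothetical and chosen only for illustration.

```python
import math

def logsum(utilities, scale=1.0):
    """Logsum (1/scale) * ln(sum_a exp(scale * V_a)), computed stably."""
    m = max(utilities)  # shift by the maximum to avoid overflow
    return m + math.log(sum(math.exp(scale * (v - m)) for v in utilities)) / scale

def dual_term(utilities, chosen, scale=1.0):
    """Gap between the logsum and the utility of the chosen alternative."""
    return logsum(utilities, scale) - utilities[chosen]

# Hypothetical deterministic utilities of three alternatives for one individual.
V = [1.0, 2.0, 0.5]
gaps = [dual_term(V, a) for a in range(len(V))]
```

Every entry of `gaps` is non-negative, and the smallest gap corresponds to the alternative with the highest deterministic utility, so a sum of such terms is smallest when the observed choices are the ones the utilities rank highest.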
An interesting variant of the ME problem occurs when the Lagrange multipliers of constraint (2) are known. If this condition is applied to (8) then, given (6), the variant, referred to as the VME problem, can be written in two equivalent forms. The entropy term of this problem can be interpreted simply as the penalty imposed on the deterministic problem of finding a discrete choice probability law that maximizes the sum, over all individuals, of the expected value of the deterministic utilities of the chosen alternatives.
There is, however, another interesting microeconomic interpretation. Since the solution of the VME problem is (5), we deduce that the maximum expected utility of individual i, defined by (12), can be written as (16). In (16) the function is linear in the logarithm of the probability. It can be proved that functions that are linear in the logarithm are the only ones that give what are called proper local score functions.
If we multiply (16) by $p_{a/i}$ and sum over a, then, given (15), we find that the value of the objective function of the VME problem at the optimum is the sum, over all individuals, of their maximum expected utilities among the available alternatives. Observe also that the objective function of the VME problem cannot exceed this value for any p. The discrete choice probability law that solves the VME problem therefore conforms with the rational consumer paradigm for each individual as regards expected value.
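This property can be illustrated for a single individual. The sketch below assumes that the objective for one individual is the expected deterministic utility plus the entropy $-\sum_a p_a \ln p_a$ divided by the scale factor; the entropy term in the paper's own formulation may differ from this by a constant. All names and numbers are hypothetical.

```python
import math

def logit(utilities, scale=1.0):
    """Multinomial logit probabilities for one individual."""
    m = max(utilities)
    weights = [math.exp(scale * (v - m)) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def logsum(utilities, scale=1.0):
    """Logsum (1/scale) * ln(sum_a exp(scale * V_a))."""
    m = max(utilities)
    return m + math.log(sum(math.exp(scale * (v - m)) for v in utilities)) / scale

def vme_objective(p, utilities, scale=1.0):
    """Expected utility plus (1/scale) times the entropy of p (assumed form)."""
    expected = sum(pa * v for pa, v in zip(p, utilities))
    entropy = -sum(pa * math.log(pa) for pa in p if pa > 0)
    return expected + entropy / scale

V = [1.0, 2.0, 0.5]                      # hypothetical utilities
at_logit = vme_objective(logit(V), V)    # objective at the logit probabilities
at_uniform = vme_objective([1 / 3, 1 / 3, 1 / 3], V)
at_best_only = vme_objective([0.0, 1.0, 0.0], V)
```

Here `at_logit` coincides with `logsum(V)`, while the uniform law and the degenerate law that always picks the best alternative both give smaller values, in line with the statement that no probability law exceeds the logsum.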

Statistical Interpretation of the Entropy Maximization Dual Problem
The statistical interpretation of the DME problem (10) is obtained by reformulating its objective function in terms of the probabilities of the logit model (5). This reformulation shows that the negative of the DME problem objective function is just the log likelihood function of the multinomial logit model. It follows, then, that the log likelihood maximization problem is equivalent to the entropy maximization problem because the dual of the latter is just the DME problem.
The DME problem can therefore be reformulated as the log likelihood maximization problem (19). Note also that this equivalence is independent of the number of individuals or alternatives and of whether the probabilistic model is aggregate in individuals or not.
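The equivalence can be illustrated numerically. The sketch below assumes a logit model that is linear in its parameters and a dual of the form "logsum minus utility of the chosen alternative", summed over individuals; the data set, the function names and the plain gradient descent with a fixed step are all hypothetical choices made for illustration, not the estimation procedure of any cited work.

```python
import math

def logit_probs(beta, X):
    """Logit probabilities for one individual; X holds one attribute row per alternative."""
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def dual(beta, data):
    """Assumed dual objective: sum over individuals of logsum minus chosen utility."""
    value = 0.0
    for X, chosen in data:
        scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
        m = max(scores)
        value += m + math.log(sum(math.exp(s - m) for s in scores)) - scores[chosen]
    return value

def log_likelihood(beta, data):
    """Log likelihood of the observed choices under the logit model."""
    return sum(math.log(logit_probs(beta, X)[chosen]) for X, chosen in data)

def dual_gradient(beta, data):
    """Expected attributes under the model minus observed attributes."""
    grad = [0.0] * len(beta)
    for X, chosen in data:
        p = logit_probs(beta, X)
        for k in range(len(beta)):
            grad[k] += sum(pa * row[k] for pa, row in zip(p, X)) - X[chosen][k]
    return grad

def minimize_dual(data, n_params, step=0.2, iters=2000):
    """Fixed-step gradient descent on the dual (illustrative, not tuned)."""
    beta = [0.0] * n_params
    for _ in range(iters):
        grad = dual_gradient(beta, data)
        beta = [b - step * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical sample: six individuals face the same three alternatives,
# described by two indicator attributes, and choose them 3, 2 and 1 times.
ALTS = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
DATA = [(ALTS, chosen) for chosen in (0, 0, 0, 1, 1, 2)]
beta_hat = minimize_dual(DATA, n_params=2)
```

For any parameter vector, `dual` and `log_likelihood` sum to zero, so minimizing one is the same as maximizing the other. In this small sample the minimizer is approximately (ln 3, ln 2), which reproduces the observed choice shares, and the gradient vanishing at `beta_hat` means that the expected attribute totals under the model match the observed totals.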
The variance-covariance matrix of the parameters that solve the ME problem is derived as the inverse of the expected value of the matrix of second derivatives of the dual function D of the DME problem, known as the information matrix (see [22][23][24][25][26][27][28]).
The second derivatives of the dual problem measure the variations in the optimum of the ME or DME objective function caused by changes in the parameters of the resulting model. Thus, if a parameter undergoes a large change but the dual problem optimum varies very little (that is, the second derivative of the dual problem is small), that parameter provides little information and intuitively its variance should be very high. If, on the other hand, the second derivative of the dual with respect to a given parameter is high, the parameter is significant and its variance should be small. Notice that this point is closely related to the fact that, for an exponential family, the inverse of the Fisher information matrix gives the asymptotic variance-covariance matrix of the maximum likelihood estimator.
By a similar process we can estimate the variance-covariance matrix of the parameters of the multinomial logit model using the log likelihood function. This implies that by demonstrating that this function is the negative of the dual function we will have reconciled the two approaches to estimating the variance-covariance matrix of the model parameters.
It also follows from the equivalence of the DME problem (10) and problem (19) that the latter is equivalent to problem (11). From this we deduce that the maximum likelihood estimators of the multinomial logit model represent rational behavior by the users in terms of expected value. This method of estimating the multinomial logit model parameters is therefore congruent with the rational consumer paradigm defined by the model itself.
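The computation of the variance-covariance matrix from second derivatives can be sketched as follows. The example assumes a logit model that is linear in two parameters, for which the matrix of second derivatives of the dual does not depend on the observed choices and therefore equals its expected value. The sample (six hypothetical individuals facing the same three alternatives, chosen 3, 2 and 1 times) and the parameter values, taken to be the minimizer (ln 3, ln 2) of the dual for that sample, are assumptions for illustration.

```python
import math

def logit_probs(beta, X):
    """Logit probabilities for one individual; X holds one attribute row per alternative."""
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def information_matrix(beta, choice_sets):
    """Second derivatives of the dual: sum over individuals of the
    covariance matrix of the attributes under the logit probabilities."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for X in choice_sets:
        p = logit_probs(beta, X)
        mean = [sum(pa * row[j] for pa, row in zip(p, X)) for j in range(k)]
        for r in range(k):
            for c in range(k):
                second = sum(pa * row[r] * row[c] for pa, row in zip(p, X))
                info[r][c] += second - mean[r] * mean[c]
    return info

def invert_2x2(m):
    """Inverse of a 2x2 matrix (sufficient for this two-parameter sketch)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

ALTS = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # hypothetical attributes
CHOICE_SETS = [ALTS] * 6                       # six hypothetical individuals
beta_hat = [math.log(3.0), math.log(2.0)]      # assumed dual minimizer

info = information_matrix(beta_hat, CHOICE_SETS)
cov = invert_2x2(info)
std_errors = [math.sqrt(cov[j][j]) for j in range(2)]
```

The diagonal of `cov` gives the estimated variances of the two parameters. A flatter dual in some direction means a smaller entry in `info` and hence a larger variance, which is the intuition described above.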

Conclusions
Maximum entropy models have been widely applied to various economic systems, and are particularly common in representations of supply and demand behavior in urban transportation and land use modeling. A major criticism of these models, however, is that they do not embody actual behavioral rules of the agents in the system and their parameters adjust only to the constraints imposed by the modeler. This paper demonstrated that two useful alternative interpretations of the models' optimality conditions can be derived from an analysis of the dual of the classical entropy maximization problem.
The first of these interpretations was derived from the fact that the dual problem consists in finding parameters of the utility (benefit) functions of users (suppliers) such that each user behaves rationally, that is, maximizes utility (benefit) in terms of the expected value of the consumption (production) alternatives available. The maximum entropy model parameters thus take on a clear microeconomic interpretation.
The second interpretation is statistical: the dual maximum entropy problem is equivalent to the likelihood maximization problem for multinomial logit models. This in turn explains the equivalence between the problems of likelihood maximization and entropy maximization. Furthermore, with this result the methods for estimating the variance-covariance matrices of maximum entropy and maximum likelihood estimators are completely reconciled. Finally, we note that both interpretations are valid for aggregate or disaggregate models independently of the number of individuals or alternatives considered.
A possible extension of this study would be to analyze logit models with non-linear constraints and hierarchical structures. Given the characteristics that distinguish them from multinomial logit models, the results may be different from those presented here.