Biases in the Maximum Simulated Likelihood Estimation of the Mixed Logit Model

Abstract: In a recent study, it was demonstrated that the maximum simulated likelihood (MSL) estimator produces significant biases when applied to the bivariate normal and bivariate Poisson-lognormal models. The study's conclusion suggests that similar biases could be present in other models generated by correlated bivariate normal structures, which include several commonly used specifications of the mixed logit (MIXL) models. This paper conducts a simulation study analyzing the MSL estimation of the error components (EC) MIXL. We find that the MSL estimator produces significant biases in the estimated parameters. The problem becomes worse when the true value of the variance parameter is small and the correlation parameter is large in magnitude. In some cases, the biases in the estimated marginal effects are as large as 12% of the true values. These biases are largely invariant to increases in the number of Halton draws.


Introduction
This paper examines the maximum simulated likelihood (MSL) estimator of the error components (EC) mixed logit (MIXL) model. The MIXL has been preferred by applied economists due to its flexible latent structure, which allows for various specifications of behavioral patterns. Since the model does not have a closed form, its estimation relies on simulation-based methods, specifically the MSL estimator, which has been the dominant estimation strategy for more than 20 years. However, Jumamyradov and Munkin (2021) showed that the MSL estimator produces significant biases when applied to the bivariate normal and bivariate Poisson-lognormal models. Their conclusion is that similar biases could be present in other models generated by correlated bivariate normal structures, which include the most commonly used specifications of the MIXL models. Therefore, further analysis of the MSL estimator in the context of the MIXL model is necessary.
The multinomial logit (MNL) model was introduced by McFadden (1974). It has a closed-form solution due to two convenient but restrictive assumptions. First, the MNL model assumes that the error terms are independently and identically distributed (i.i.d.) with a type 1 extreme value (EV1) distribution across the individuals and alternatives. As a result, the MNL model suffers from the independence from irrelevant alternatives (IIA) property (Debreu 1960), which in the literature has been illustrated by the "red-bus, blue-bus" example (Quandt 1970). Second, the MNL model does not allow for unobserved variation in individual tastes (i.e., taste heterogeneity) in the population, meaning that the coefficients associated with alternative-specific variables and observable alternative attributes that vary among individuals are fixed. Although the MNL model has become the "workhorse" in discrete choice analysis (Hensher and Greene 2003), its inconsistencies with realistic behavioral patterns have led researchers to look for more flexible alternative models. The MIXL model was derived by relaxing these restrictive assumptions (see McFadden 2001).
The first contribution to the development of the MIXL model came with relaxing the assumption of homogeneous parameters. Specifically, Boyd and Mellman (1980) and Cardell and Dunbar (1980) analyzed market demand for automobiles by allowing the consumer taste coefficients associated with the attributes of the alternatives to vary among individuals, in the form of random variables representing random taste heterogeneity (i.e., taste patterns). This specification of the MIXL model is also known as random coefficients, with application examples including Revelt and Train (1998) and Bhat (2000). Revelt and Train (1998) analyzed households' choices of efficiency levels for refrigerators based on rebates and loans using the panel data MIXL model. Bhat (2000) studied urban work travel mode choices by incorporating observed and unobserved individual characteristics into the panel data MIXL model. Similar to random coefficients, alternative-specific constants (ASCs) may not be homogeneous within a sample, potentially leading to substitution patterns (e.g., red bus-blue bus; Quandt 1970). The challenges of accommodating such heterogeneity are well-known in choice modeling (see Hensher et al. 2005).
Next, the i.i.d. assumption of the MNL model was relaxed, allowing for non-independent and non-identical errors, leading to the EC MIXL and generalized mixed logit (GMIXL). The EC specification of the MIXL model assumes that the stochastic portion of the utility consists of two parts, the i.i.d. errors with an EV1 distribution and additional components varying among the alternatives and individuals. This specification induces various correlation structures (i.e., taste and substitution patterns) as well as heteroskedasticity through the nests or cross nests created among the alternatives as a result of shared error components. Brownstone and Train (1998) used this approach to forecast new product penetration rates by allowing for flexible substitution patterns among the alternative sources of fuel for vehicles.
Recent studies related to discrete choice modeling have recognized the need to allow for heterogeneity in the scale parameter (see Louviere et al. 1999, 2002, 2008), which led to another specification of the MIXL model that relaxed the i.i.d. assumption. The scale parameter is directly related to the variance of the EV1 error terms, and is usually restricted to one because it cannot be identified separately from the slope coefficients. However, Fiebig et al. (2010) as well as Greene and Hensher (2010) proposed the GMIXL model, which allows for individual variation in the variance of the EV1 error terms (i.e., scale heterogeneity) along with the unobserved individual heterogeneity of the slope coefficients. Although it has been shown that the GMIXL model performed better than the standard MIXL model (Fiebig et al. 2010; Keane and Wasi 2013), Hess and Train (2017), as well as Hess and Rose (2012), raised concerns about the identifiability of the GMIXL model. It is an open research question as to what additional assumptions need to be imposed to make the GMIXL model estimable.
The flexibility of the MIXL model is achieved by introducing latent variables into the model. However, this leads to the intractability of the choice probabilities, which cannot be evaluated analytically since they do not have a closed form. Therefore, the estimation of the MIXL model relies on a numerical approximation of the choice probabilities through simulation. The MSL estimator was introduced by Lerman and Manski (1980) to replace the intractable choice probabilities of the multinomial probit (MNP) model with simulated probabilities.
A well-known limitation of the MSL estimator is that it is biased when the number of simulations is limited, as is always the case in applications (see Gourieroux and Monfort 1996; Lee 1995; Hajivassiliou et al. 1996; Train 2009). Nevertheless, the estimation of the MIXL model in the literature is based on the MSL estimator, including in studies by Ben-Akiva et al. (1993), Revelt and Train (1998), Bhat (1998), Brownstone and Train (1998), McFadden and Train (2000), and Hess et al. (2005). The usual practice is to use the MSL in combination with Halton draws to reduce the simulation bias. Bhat (2001) showed that 100 Halton draws provide better approximation results than 1000 pseudo-random draws for the mixed logit model. According to Palma et al. (2020), around 93% of over 150 papers indexed in the Research Papers in Economics (RePEc) produced during 2008-2018 used fewer than 1000 Halton draws in their estimations of the mixed logit model. Furthermore, 72% and 40% of these papers used fewer than 500 and 250 Halton draws, respectively. Czajkowski and Budziński (2019) found that more than 3000 Halton draws are necessary to achieve a Minimum Tolerance Level of 5%. However, in the RePEc database, only 5.6% of papers used more than 2000 Halton draws (Palma et al. 2020).
Jumamyradov and Munkin (2021) primarily focused their analysis on the estimation of the correlation parameter in the bivariate normal and bivariate Poisson-lognormal models. In this paper, we closely follow their strategy and allow for correlation across the utilities of different alternatives. We also utilize Halton draws and analyze two error components specifications of the MIXL model. The first specification is the MIXL model with correlated slope coefficients and fixed alternative-specific constants (ASCs). The second specification is the MIXL model with correlated ASCs and fixed slope coefficients. Moreover, for simplicity, we assume that there is only one attribute that varies among the alternatives and individuals. It should be noted that in most specifications of the MIXL model used by practitioners, the correlation parameter is assumed to be zero for simplicity, compromising robustness to the IIA property. However, practitioners are mostly interested in the estimated mean and variance of the random parameters. In this paper, we simulate the data according to the MIXL model and assess the MSL performance based on the difference between the true and estimated parameters. Our findings confirm simulation biases even in cases with zero correlation.
Several studies have compared the MIXL results produced by different estimators and software packages. Huber and Train (2001), Regier et al. (2009), Haan et al. (2015), Bastin and Cirillo (2010), and Elshiewy et al. (2017) compared the MSL and Bayesian estimation of the MIXL model. The first three of these studies were based on a single panel dataset. Bastin and Cirillo (2010) estimated the simulation biases in the MSL with respect to the number of draws and sample sizes, without comparing the MSL with the Bayesian estimation. Elshiewy et al. (2017) used cross-sectional and panel data with three empirical and four simulated datasets. Although Elshiewy et al. (2017) found MSL biases in the correlation parameter of the cross-sectional MIXL model, they only tested two values (0.75 and 0.25). We analyze the MSL estimator with respect to an extensive range of values of the correlation parameter and standard deviation, as well as different numbers of Halton draws. To the best of our knowledge, an extensive Monte Carlo simulation study like this has not been conducted before.
The rest of the paper is organized as follows. Section 2 introduces the MSL estimator. Section 3 presents different logit model specifications. Section 4 presents numerical examples using MIXL data simulation and produces MSL estimation results. Section 5 discusses the results.

Maximum Simulated Likelihood Estimator
The maximum likelihood (ML) estimator of the parameter vector $\theta$ can be utilized when $f(y_i \mid x_i, \theta)$, the density of the dependent variable $y_i$ conditional on the vector of independent variables $x_i$, has a closed form, such that

$$\hat{\theta}_{ML} = \arg\max_{\theta} \sum_{i=1}^{N} \log f(y_i \mid x_i, \theta),$$

where $(y_i, x_i)$ is a set of independent observations for $i = 1, \dots, N$. However, the ML is not feasible when $f(y_i \mid x_i, \theta)$ does not have a tractable closed form. This could be because the density is specified only conditionally on latent variables, which cannot be integrated out. Then, the MSL estimator is a possible alternative, which we define following Gourieroux and Monfort (1990, 1996). Suppose $\tilde{f}(y_i, x_i, u, \theta)$ is an unbiased simulator of the conditional density $f(y_i \mid x_i, \theta)$, such that

$$E_u\left[\tilde{f}(y_i, x_i, u, \theta) \mid y_i, x_i\right] = f(y_i \mid x_i, \theta),$$

where the distribution of $u$ is known and independent of $y_i$ and $x_i$. Then, the MSL estimator of $\theta$ is defined as

$$\hat{\theta}_{MSL} = \arg\max_{\theta} \sum_{i=1}^{N} \log\left[\frac{1}{S} \sum_{s=1}^{S} \tilde{f}(y_i, x_i, u_i^s, \theta)\right],$$

where $u_i^s$ ($s = 1, \dots, S$) are drawn independently for each individual $i$ from the distribution of $u$. The MSL estimator is obtained by replacing the intractable conditional p.d.f. $f(y_i \mid x_i, \theta)$ with its unbiased approximation based on the simulator. However, the logarithm of this approximation is not an unbiased simulator of $\log f(y_i \mid x_i, \theta)$, which results in simulation biases in the MSL estimator.
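The source of this bias can be illustrated numerically. The following is a minimal sketch and not the simulator used in this paper: it assumes a toy simulator $\exp(u - 0.5)$ with $u \sim N(0, 1)$, whose mean is exactly 1, so the target log-density is $\log 1 = 0$. The function name `mean_log_simulator` and all settings are hypothetical.

```python
import numpy as np

def mean_log_simulator(S, reps=4000, seed=0):
    """Monte Carlo mean of log((1/S) * sum_s f_tilde(u_s)) for a toy simulator.

    Assumption (not from the paper): f_tilde(u) = exp(u - 0.5) with u ~ N(0, 1),
    so E[f_tilde] = 1 and the quantity being approximated is log(1) = 0.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((reps, S))
    f_bar = np.exp(u - 0.5).mean(axis=1)  # average of S draws: unbiased for 1
    return np.log(f_bar).mean()           # log of the average: biased for 0

for S in (10, 100, 1000):
    print(S, mean_log_simulator(S))
```

In this toy setting the average of the simulator is unbiased for any $S$, while the mean of its logarithm lies below zero and moves toward zero as $S$ grows, which is the mechanism described above.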
The asymptotic properties of the MSL estimator are determined by the relationship between $S$ and $N$. For instance, the MSL estimator is biased when $S$ is fixed and $N$ tends to infinity (Property 1 in Gourieroux and Monfort 1990). If $S$ increases with $N$, then the MSL estimator is consistent (Property 2 in Gourieroux and Monfort 1990). If $S$ increases faster than $\sqrt{N}$ (i.e., $\sqrt{N}/S \to 0$), then the MSL estimator is also efficient and, therefore, asymptotically equivalent to the ML estimator (Property 7 in Gourieroux and Monfort 1990). In practice, neither $N$ nor $S$ may be large enough for these asymptotic results to apply. However, the expectation is that there are achievable levels large enough for the biases to become acceptably small.
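The role of the ratio $\sqrt{N}/S$ can be motivated by a standard second-order Taylor expansion. This is a textbook-style argument (along the lines of the discussion in Train 2009), not a derivation taken from this paper. Here $\bar{f}_{i,S}$ denotes the average of $S$ independent simulator draws for individual $i$, and $f_i$ the true conditional density; both symbols are introduced only for this illustration.

```latex
\begin{align*}
\log \bar{f}_{i,S}
  &\approx \log f_i + \frac{\bar{f}_{i,S} - f_i}{f_i}
     - \frac{\left(\bar{f}_{i,S} - f_i\right)^2}{2 f_i^2}, \\
E\left[\log \bar{f}_{i,S}\right] - \log f_i
  &\approx -\frac{\operatorname{Var}\left(\tilde{f}_i\right)}{2\, S\, f_i^2}.
\end{align*}
```

The first-order term has zero expectation because the simulator is unbiased, so the per-observation bias is of order $1/S$. Summed over $N$ observations and scaled by $1/\sqrt{N}$, as in the usual asymptotic normal approximation, the bias term is of order $\sqrt{N}/S$, which is why it vanishes only when $S$ grows faster than $\sqrt{N}$.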

Materials and Methods
In this section, we define the MNL model and two specifications of the EC MIXL model. We also provide detailed information on how to simulate the corresponding likelihood functions.

Random Utility Maximization
Discrete choice models are usually introduced based on the random utility maximization (RUM) theory (see McFadden 1974), which states that the utility of individual $i = 1, \dots, N$ from the chosen alternative $j = 1, \dots, J$ can be presented as

$$U_{ij} = V_{ij} + \varepsilon_{ij},$$

where $V_{ij}$ is the observed part of the utility and $\varepsilon_{ij}$ is the stochastic portion, unobserved by the researcher. Individual $i$ will choose alternative $j$ if and only if the level of utility associated with alternative $j$ is higher than the levels associated with the other alternatives:

$$U_{ij} > U_{ik} \ \text{ for all } k \neq j \quad \Longleftrightarrow \quad \varepsilon_{ik} - \varepsilon_{ij} < V_{ij} - V_{ik} \ \text{ for all } k \neq j.$$

Since the utilities are latent, the choice probabilities are evaluated at relative measures, where the utility of one of the alternatives is taken as the reference. In order to calculate the choice probabilities, distributional assumptions about the stochastic utility must be made. In the logit family of models, $\varepsilon_{ij}$ is assumed to be independently and identically distributed (i.i.d.) across individuals and alternatives with an extreme value type 1 (EV1) distribution. As a result, the difference between two i.i.d. EV1 error terms $(\varepsilon_{ik} - \varepsilon_{ij})$ has a logistic distribution with the cumulative distribution function

$$F(\varepsilon_{ik} - \varepsilon_{ij}) = \frac{\exp(\varepsilon_{ik} - \varepsilon_{ij})}{1 + \exp(\varepsilon_{ik} - \varepsilon_{ij})}.$$

The observed utility $V_{ij}$ is a function of individual characteristics and alternative attributes, and is usually assumed to be linear in the parameters.
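The logistic result for the difference of two i.i.d. EV1 errors can be checked by simulation. The sketch below is only an illustration: the sample size, the seed, and the evaluation points are arbitrary choices, and NumPy's standard Gumbel generator is used as the EV1 distribution.

```python
import numpy as np

def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n = 200_000

# Difference of two independent standard EV1 (Gumbel) draws
diff = rng.gumbel(size=n) - rng.gumbel(size=n)

points = np.array([-1.0, 0.0, 1.5])
empirical = np.array([(diff <= z).mean() for z in points])

# Columns: evaluation point, empirical CDF, logistic CDF
print(np.column_stack([points, empirical, logistic_cdf(points)]))
```

The empirical and logistic columns should agree up to simulation noise.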

Multinomial Logit (MNL) Model
The MNL model is derived under the assumption that all the coefficients are fixed, implying that all the individuals in the population have homogeneous tastes. In this paper, we consider the case of three alternatives, in which the third alternative serves as the reference category. Therefore, we work with two utility differences,

$$U_{i1} = \alpha_1 + \beta_1 x_{i1} + \varepsilon_{i1}, \qquad U_{i2} = \alpha_2 + \beta_2 x_{i2} + \varepsilon_{i2},$$

where $\varepsilon_{i1} \sim \text{Logistic}(0, 1)$ and $\varepsilon_{i2} \sim \text{Logistic}(0, 1)$ are i.i.d. logistically distributed, $x_{i1}$ and $x_{i2}$ are alternative attributes, $\alpha_1$ and $\alpha_2$ are alternative-specific constants (ASCs), and $\beta_1$ and $\beta_2$ are coefficients of the alternative attributes. In some specifications, these coefficients are restricted to be equal, $\beta_1 = \beta_2 = \beta$. In the numerical examples, we choose the distribution of the covariates to be standard normal, such that $x_{i1} \sim N(0, 1)$ and $x_{i2} \sim N(0, 1)$. The observability conditions for the outcome variables $y_{i1}$, $y_{i2}$ and $y_{i3}$ are defined as

$$y_{i1} = 1\left[U_{i1} > 0,\ U_{i1} > U_{i2}\right], \qquad y_{i2} = 1\left[U_{i2} > 0,\ U_{i2} > U_{i1}\right], \qquad y_{i3} = 1\left[U_{i1} \le 0,\ U_{i2} \le 0\right].$$

In other words, individual $i$ chooses the alternative with the highest utility.
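As a hedged illustration of the closed form, the sketch below simulates a three-alternative logit with EV1 errors on all three utilities and the third alternative normalized to zero, then compares the simulated choice shares with the closed-form MNL probabilities. This is a standard textbook construction, not necessarily the exact data-generating code used in this paper. The function `mnl_probs` and the slope value of 1.0 are assumptions made for illustration.

```python
import numpy as np

def mnl_probs(x1, x2, a1, a2, b1, b2):
    """Closed-form MNL probabilities; alternative 3 is the reference (V_i3 = 0)."""
    v1 = a1 + b1 * x1
    v2 = a2 + b2 * x2
    denom = 1.0 + np.exp(v1) + np.exp(v2)
    return np.column_stack([np.exp(v1) / denom, np.exp(v2) / denom, 1.0 / denom])

rng = np.random.default_rng(2)
N = 200_000
a1, a2, b1, b2 = -0.25, -0.25, 1.0, 1.0   # illustrative values only
x1, x2 = rng.standard_normal(N), rng.standard_normal(N)

P = mnl_probs(x1, x2, a1, a2, b1, b2)

# Simulate choices: observed utility plus i.i.d. EV1 errors, pick the maximum
util = np.column_stack([a1 + b1 * x1, a2 + b2 * x2, np.zeros(N)])
util += rng.gumbel(size=(N, 3))
choice = util.argmax(axis=1)
shares = np.bincount(choice, minlength=3) / N

print("simulated shares:    ", shares)
print("mean closed-form prob:", P.mean(axis=0))
```

The two printed vectors should agree up to simulation noise, which is what makes the MNL likelihood computable without simulation.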

Mixed Logit (MIXL) Model
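The assumption of homogeneous preferences leads to computationally convenient functional forms for the choice probabilities. However, preference homogeneity is not consistent with realistic behavioral patterns. Next, we present two specifications of the EC MIXL model that allow for various taste and substitution patterns through a correlation among the utilities of the different alternatives. The first specification is the MIXL model with correlated slope coefficients and fixed ASCs. The second specification is the MIXL model with correlated ASCs and fixed slope coefficients. We refer to these two specifications as EC1 and EC2, respectively. Under the EC1 specification (taste patterns), we assume that

$$U_{i1} = \alpha_1 + (\beta + u_{i1})\, x_{i1} + \varepsilon_{i1}, \qquad U_{i2} = \alpha_2 + (\beta + u_{i2})\, x_{i2} + \varepsilon_{i2},$$

where $u_{i1}$ and $u_{i2}$ are jointly normally distributed, $(u_{i1}, u_{i2}) \sim N((0, 0), \Sigma)$, with covariance matrix $\Sigma$. Similarly, under the EC2 specification (substitution patterns), we assume

$$U_{i1} = (\alpha_1 + u_{i1}) + \beta x_{i1} + \varepsilon_{i1}, \qquad U_{i2} = (\alpha_2 + u_{i2}) + \beta x_{i2} + \varepsilon_{i2},$$

where, once again, $(u_{i1}, u_{i2}) \sim N((0, 0), \Sigma)$. The covariance matrix in both cases is parametrized as

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix},$$

where the restriction $\sigma_1 = 1$ is imposed for identification, such that

$$\Sigma = \begin{pmatrix} 1 & \rho \sigma_2 \\ \rho \sigma_2 & \sigma_2^2 \end{pmatrix}.$$

We define the lower triangular matrix

$$A = \begin{pmatrix} 1 & 0 \\ \rho \sigma_2 & \sigma_2 \sqrt{1 - \rho^2} \end{pmatrix}$$

to be the Cholesky decomposition of the covariance matrix, such that $\Sigma = AA'$. Then, the bivariate normal $u_{i1}$ and $u_{i2}$ can be written as

$$u_{i1} = v_{i1}, \qquad u_{i2} = \rho \sigma_2 v_{i1} + \sigma_2 \sqrt{1 - \rho^2}\, v_{i2},$$

where $v_{i1} \sim N(0, 1)$ and $v_{i2} \sim N(0, 1)$ are independent, which allows us to simulate the likelihood function using draws from a known density.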
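The Cholesky construction can be verified numerically. The sketch below assumes the parametrization with $\sigma_1 = 1$, a standard deviation $\sigma_2$, and correlation $\rho$; the function name `chol_factor` and the chosen values ($\sigma_2 = 0.5$, $\rho = -0.95$) are illustrative only.

```python
import numpy as np

def chol_factor(sigma2, rho):
    """Lower-triangular A with A @ A.T = [[1, rho*s2], [rho*s2, s2**2]]."""
    return np.array([
        [1.0, 0.0],
        [rho * sigma2, sigma2 * np.sqrt(1.0 - rho**2)],
    ])

sigma2, rho = 0.5, -0.95
A = chol_factor(sigma2, rho)
Sigma = A @ A.T
print(Sigma)

# Transform independent standard normals v into correlated components u = A v
rng = np.random.default_rng(0)
v = rng.standard_normal((2, 200_000))
u = A @ v
print(np.cov(u))                 # sample covariance, close to Sigma
print(np.corrcoef(u)[0, 1])      # sample correlation, close to rho
```

Because only independent standard normal draws are needed, the same draws can be reused for every trial value of $(\sigma_2, \rho)$ during optimization.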
Both the EC1 and EC2 specifications induce correlation in the utilities of the different alternatives. The EC1 specification allows for correlation through the coefficients associated with the alternative attributes $x_{i1}$ and $x_{i2}$. This correlation is known as taste patterns, because the weights for an attribute are associated with the weights of another attribute. The EC2 specification allows for correlations through the ASCs, similar to the classic red bus-blue bus example. This is also known as substitution patterns, because the weights of an alternative are associated with those of another (e.g., red and blue bus). Each MIXL specification relaxes the preference homogeneity assumption in a slightly different way, and may be warranted depending on the decision context.

Simulated Likelihood Function of MIXL
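The MIXL choice probabilities, unconditional on the unobserved latent variables $v_{i1}$ and $v_{i2}$, can be written as integrals over the density $f(v_{i1}, v_{i2})$, such that

$$P(y_{ij} = 1) = \int\!\!\int \frac{\exp(V_{ij})}{1 + \exp(V_{i1}) + \exp(V_{i2})}\, f(v_{i1}, v_{i2})\, dv_{i1}\, dv_{i2}, \qquad j = 1, 2, 3, \tag{10}$$

with $V_{i3} = 0$ for the reference alternative, and where the form of $V_{i1}$ and $V_{i2}$ depends on the EC model. In the EC1 specification,

$$V_{i1} = \alpha_1 + (\beta + v_{i1})\, x_{i1}, \qquad V_{i2} = \alpha_2 + \left(\beta + \rho \sigma_2 v_{i1} + \sigma_2 \sqrt{1 - \rho^2}\, v_{i2}\right) x_{i2}, \tag{11}$$

and in the EC2 specification,

$$V_{i1} = \alpha_1 + v_{i1} + \beta x_{i1}, \qquad V_{i2} = \alpha_2 + \rho \sigma_2 v_{i1} + \sigma_2 \sqrt{1 - \rho^2}\, v_{i2} + \beta x_{i2}. \tag{12}$$

The log-likelihood function to be maximized can be written as

$$\ln L = \sum_{i=1}^{N} \sum_{j=1}^{3} y_{ij} \ln P(y_{ij} = 1).$$

However, the choice probabilities in Equation (10) do not have a closed form, and the log-likelihood function cannot be calculated analytically. Therefore, we approximate the choice probabilities through simulation, and maximize the simulated log-likelihood function

$$\ln \tilde{L} = \sum_{i=1}^{N} \sum_{j=1}^{3} y_{ij} \ln \tilde{P}(y_{ij} = 1),$$

where the simulated choice probabilities are

$$\tilde{P}(y_{ij} = 1) = \frac{1}{S} \sum_{s=1}^{S} \frac{\exp(V_{ij}^{s})}{1 + \exp(V_{i1}^{s}) + \exp(V_{i2}^{s})},$$

and $s = 1, \dots, S$ indexes the draws $v_{i1}^{s}$ and $v_{i2}^{s}$ used to evaluate $V_{i1}^{s}$ and $V_{i2}^{s}$.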
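A simulated log-likelihood of this kind can be sketched as follows for the EC2 case. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses SciPy's `qmc.Halton` generator (unscrambled, skipping the initial point at the origin), assigns consecutive blocks of $S$ Halton points to each individual, and takes hypothetical true values $\beta = 1$, $\sigma_2 = 0.5$, and $\rho = 0.5$. The helper names are invented for this example.

```python
import numpy as np
from scipy.stats import norm, qmc

def halton_normals(N, S):
    """Standard normal draws of shape (N, S, 2) from a 2-D Halton sequence."""
    engine = qmc.Halton(d=2, scramble=False)
    engine.fast_forward(1)  # skip the first point (the origin), where norm.ppf is -inf
    return norm.ppf(engine.random(N * S)).reshape(N, S, 2)

def ec2_utilities(theta, x1, x2, v):
    """Observed utilities V_i1, V_i2 under EC2; v has a trailing axis of length 2."""
    a1, a2, b, s2, rho = theta
    u1 = v[..., 0]
    u2 = rho * s2 * v[..., 0] + s2 * np.sqrt(1.0 - rho**2) * v[..., 1]
    return a1 + u1 + b * x1, a2 + u2 + b * x2

def simulated_loglik(theta, y, x1, x2, v):
    """Sum over individuals of the log of the draw-averaged choice probability."""
    V1, V2 = ec2_utilities(theta, x1[:, None], x2[:, None], v)  # shape (N, S)
    denom = 1.0 + np.exp(V1) + np.exp(V2)
    probs = np.stack([np.exp(V1), np.exp(V2), np.ones_like(V1)], axis=2)
    probs /= denom[..., None]
    p_sim = probs.mean(axis=1)                                   # shape (N, 3)
    return np.log(p_sim[np.arange(y.size), y]).sum()

# Generate one EC2 data set (illustrative parameter values)
rng = np.random.default_rng(3)
N, S = 1000, 50
theta_true = (-0.25, -0.25, 1.0, 0.5, 0.5)   # (alpha1, alpha2, beta, sigma2, rho)
x1, x2 = rng.standard_normal(N), rng.standard_normal(N)
V1, V2 = ec2_utilities(theta_true, x1, x2, rng.standard_normal((N, 2)))
util = np.column_stack([V1, V2, np.zeros(N)]) + rng.gumbel(size=(N, 3))
y = util.argmax(axis=1)                       # 0, 1, 2 index alternatives 1, 2, 3

v = halton_normals(N, S)
print(simulated_loglik(theta_true, y, x1, x2, v))
```

In practice this function would be passed (with a sign change) to a numerical optimizer, keeping the draws `v` fixed across iterations so that the objective is a smooth function of the parameters.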
To examine the performance of the MSL estimator, we estimate the MIXL model under three different sets of restrictions imposed on the covariance matrix (M0, M1, and M2). Under M0, we do not impose any restrictions on the covariance matrix and estimate all the parameters. Under M1, we restrict the correlation parameter to zero, $\rho = 0$, and estimate the remaining parameters. Finally, under M2, we restrict the correlation parameter to its true value ($\rho = \text{TV}$) and estimate the remaining parameters. In each example, the number of Halton draws is chosen to be $H \in \{250, 500, 1000\}$, which is consistent with the levels used in leading MSL applications.
In summary, new MIXL data sets are generated for all 117 values of the covariance matrices for both the EC1 and EC2 specifications. For each data set, three specifications (M0, M1, and M2) are estimated, each with three different numbers of Halton draws (250, 500, and 1000). We repeat each simulation 100 times, $R = 100$, generating a new data set and collecting the MSL estimates. The reported results are based on the means and standard errors calculated for these 100 simulations (i.e., Wald test).
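The size of this design can be tallied with a short sketch. It assumes the 117 covariance settings arise from 39 correlation values (−0.95 to 0.95 in steps of 0.05) crossed with the three values of $\sigma_2$ (0.25, 0.5, and 1) discussed in the results; the variable names are illustrative.

```python
import numpy as np
from itertools import product

rhos = np.round(np.linspace(-0.95, 0.95, 39), 2)   # 39 correlation values
sigmas = (0.25, 0.5, 1.0)                          # values of sigma_2
restrictions = ("M0", "M1", "M2")
halton_draws = (250, 500, 1000)
R = 100                                            # replications per setting

cov_grid = list(product(rhos, sigmas))
runs_per_spec = len(cov_grid) * len(restrictions) * len(halton_draws) * R

print(len(cov_grid))        # 117 covariance settings
print(runs_per_spec)        # 105300 estimations per EC specification
print(2 * runs_per_spec)    # 210600 across EC1 and EC2
```

Under these assumptions, each EC specification involves 11,700 simulated data sets and 105,300 separate MSL estimations.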

Taste Patterns: EC1 Simulation Evidence
Second, the MSL produces biased results for $\sigma_2$ with small true values, regardless of the chosen number of Halton draws. For example, when $\sigma_2 = 0.25$ and $H = 250$, the estimated value of $\sigma_2$ is 0.413 (0.026), which is separated from its true value by 6.27 standard errors; therefore, the null hypothesis $H_0: \sigma_2 = 0.25$ is rejected. However, the estimated $\sigma_2$ comes closer to its true value when we increase the variance. For example, when $\sigma_2 = 0.5$, the estimated $\sigma_2$ is 0.569 (0.023). The case when $\sigma_2 = 1$ produces a similar result. However, the biased results for small variances do not change with the number of Halton draws. For instance, when $H = 500$ and $H = 1000$, the estimated values for the true $\sigma_2 = 0.25$ are 0.439 (0.030) and 0.405 (0.025), respectively. In both of these cases, the null hypothesis $H_0: \sigma_2 = 0.25$ is rejected. It is also interesting to notice that when the true standard deviation is $\sigma_2 = 1$, the estimated $\sigma_2$ is much smaller for M0 than for M1 and M2. Third, for almost all the parameter sets presented in Table 1, the estimated $\rho$ is within three standard errors of its true value. The only case where the null hypothesis $H_0: \rho = -0.95$ may be rejected is when $H = 250$ and $\sigma_2 = 1$. The EC1 simulation results for all the other 38 correlation values are provided in the Supplementary Materials.
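The "standard errors from the true value" comparisons used here are simple ratios, which can be reproduced from the reported numbers. The helper name `se_distance` is hypothetical; the inputs are the estimates and standard errors quoted above for the true value $\sigma_2 = 0.25$.

```python
def se_distance(estimate, true_value, std_error):
    """Absolute distance between estimate and true value, in standard errors."""
    return abs(estimate - true_value) / std_error

# (H, estimated sigma_2, standard error), true sigma_2 = 0.25
reported = [(250, 0.413, 0.026), (500, 0.439, 0.030), (1000, 0.405, 0.025)]

for H, est, se in reported:
    print(H, round(se_distance(est, 0.25, se), 2))
```

All three ratios exceed six, so the distance from the true value does not shrink as the number of Halton draws increases from 250 to 1000.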
Figure 1 plots the estimated $\hat{\rho}$ against its true values, which range from −0.95 to 0.95 in increments of 0.05, where $\hat{\rho}$ is calculated as the average of the MSL estimates of $\rho$ under the M0 specification, obtained from 100 samples ($R = 100$) generated for the same set of true values and estimated with 1000 Halton draws ($H = 1000$). The diagonal black line represents the true value of $\rho$. The blue, red and green lines correspond to $\sigma_2 = 0.25$, $\sigma_2 = 0.5$ and $\sigma_2 = 1$, respectively. Figure 1 shows that $\hat{\rho}$ is mostly biased downward for $H = 1000$.
Finally, when the researcher erroneously assumes that the true correlation is zero (M1), there is no substantial worsening in the performance of the MSL estimates. Similarly, when $\rho$ is restricted at the true values (M2), there is no substantial improvement in the estimation of the parameters. A potential explanation for this is that the biases in the MSL estimation of taste patterns are mostly caused by difficulties in estimating the correlation parameter, with the efficiency of the $\rho$ estimates declining for smaller values of $\sigma_2$.

Substitution Patterns: EC2 Simulation Evidence
Table 2 presents the M0, M1, and M2 results for the EC2 simulations when the true correlation is $\rho = -0.95$. First, notice that increasing the true value of the variance parameter $\sigma_2$ in M0 reduces the bias in $\alpha_1$, $\alpha_2$, and $\beta$. For example, given the estimated $\alpha_1$, $\alpha_2$, and $\beta$ as …
Second, the standard errors for the estimated $\sigma_2$ are substantially larger than those for $\alpha_1$, $\alpha_2$, and $\beta$. As a result, the estimated $\sigma_2$ is within 3 standard errors of its true value across almost all the parameter sets presented in Table 2. Therefore, the null hypothesis that $\sigma_2$ is equal to the true value cannot be rejected in almost all the cases. The only exception is the case when $\sigma_2 = 1$ and $H = 1000$, where the estimated $\sigma_2$ is 0.853 (0.043), separated from the true value by 3.4 standard errors. The standard errors decrease slightly when the correlation is restricted in the specifications M1 and M2.
Third, the correlation parameter $\rho$ is estimated with substantial biases in all the M0 specifications. The estimated $\rho$ is separated from the true value by 4 ($H = 1000$, $\sigma_2 = 1$) to 6 standard errors ($H = 500$, $\sigma_2 = 0.25$), and the null hypothesis $H_0: \rho = -0.95$ is rejected in all the cases. The EC2 results for the other 38 correlation values are provided in the Supplementary Materials.
Figure 2 plots the estimated $\hat{\rho}$ against the true values, once again ranging from −0.95 to 0.95 in increments of 0.05. Although $\hat{\rho}$ is close to $\rho$ for some values, the estimated correlation parameter mostly displays biases. The biases are smaller for $\sigma_2 = 1$ relative to when $\sigma_2 = 0.25$ or $\sigma_2 = 0.5$. This finding is consistent with that of Jumamyradov and Munkin (2021) for the bivariate normal and bivariate Poisson-lognormal models. They report larger biases for smaller standard deviations. Overall, the M0 results show biases for all five parameters $\alpha_1$, $\alpha_2$, $\beta$, $\sigma_2$, and $\rho$.
It is also interesting to notice that the MSL estimates of $\alpha_1$ and $\alpha_2$ in M1 have larger biases than in M0 for larger variances, regardless of the number of Halton draws. For example, when $H = 250$ and $\sigma_2 = 1$, the estimated $\alpha_1$ and $\alpha_2$ are −0.095 (0.009) and −0.161 (0.013), and are separated from their true value $\alpha_1 = \alpha_2 = -0.25$ by 17 and 7 standard errors, respectively. This does not change much for larger numbers of Halton draws. Thus, misspecifying the model by setting the correlation to $\rho = 0$ results in very large biases in $\alpha_1$ and $\alpha_2$. Moreover, M1 produces larger positive biases for $\sigma_2$ compared to M0. For example, when $H = 1000$, the estimated $\sigma_2$ are 0.365 (0.041), 0.558 (0.050), and 1.211 (0.055), while the true values are $\sigma_2 = 0.25$, $\sigma_2 = 0.5$ and $\sigma_2 = 1$, respectively. Finally, the estimates of $\alpha_1$, $\alpha_2$, and $\beta$ improve with larger variances in M0; however, we do not observe similar patterns in the M2 estimation, although there is an improvement in the estimation of $\sigma_2$.

Choice Probabilities and Marginal Effects
Next, we examine how these reported biases affect the estimated choice probabilities and marginal effects (see Appendix A for the formulas of the marginal effects). Figure 3 plots the true and estimated $P(y = 1)$ calculated based on the M0 estimates of the EC1 specification with 500 Halton draws. The true probability means are calculated for the true values of all the parameters. The solid lines represent the true choice probabilities and the dashed lines represent the estimated choice probabilities. Figure 4 plots the true and estimated $P(y = 1)$ based on the M0 estimates of the EC2 specification with 500 Halton draws. Even though there are significant biases in the estimated parameters, the choice probabilities are, as expected, close to their true values for both the EC1 and EC2 specifications (i.e., taste and substitution patterns). However, when comparing the true and estimated marginal effects, the differences are considerable. Figure 5 plots the true and estimated $\partial P(y = 1)/\partial x_1$ for EC1 (M0, 500 Halton draws). For example, when $\sigma_2 = 1$ and $\rho = 0.95$, the true $\partial P(y = 1)/\partial x_1$ is 0.1509 and the estimated $\partial P(y = 1)/\partial x_1$ is 0.1679. Thus, the marginal effect in this case is overestimated by 11%. Figure 6 plots the true and estimated $\partial P(y = 1)/\partial x_1$ for EC2 (M0, 500 Halton draws). For example, when $\sigma_2 = 1$ and $\rho = -0.95$, the true $\partial P(y = 1)/\partial x_1$ is 0.164 and the estimated $\partial P(y = 1)/\partial x_1$ is 0.1839, which is overestimated by 12%.
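A marginal effect of this type can be approximated by simulation. The sketch below is an assumption-laden illustration rather than the Appendix A formula itself: it uses the EC1 form with a common slope mean, evaluates the effect at the hypothetical point $x_1 = x_2 = 0$ with hypothetical parameter values, and relies on the standard logit derivative $\partial P_1 / \partial x_1 = \beta_{i1} P_1 (1 - P_1)$ averaged over draws. A finite-difference check on the same draws is included, along with the percentage calculation for the two reported examples.

```python
import numpy as np

def ec1_prob1_and_slope(x1, x2, theta, v):
    """P(y = 1) per draw under EC1, together with the drawn slope beta_i1."""
    a1, a2, b, s2, rho = theta
    b1 = b + v[:, 0]
    b2 = b + rho * s2 * v[:, 0] + s2 * np.sqrt(1.0 - rho**2) * v[:, 1]
    e1, e2 = np.exp(a1 + b1 * x1), np.exp(a2 + b2 * x2)
    return e1 / (1.0 + e1 + e2), b1

def ec1_marginal_effect(x1, x2, theta, v):
    """Simulated dP(y = 1)/dx1: average of beta_i1 * P1 * (1 - P1) over draws."""
    p1, b1 = ec1_prob1_and_slope(x1, x2, theta, v)
    return np.mean(b1 * p1 * (1.0 - p1))

def pct_bias(estimated, true):
    """Percentage difference of an estimate relative to the true value."""
    return 100.0 * (estimated - true) / true

rng = np.random.default_rng(4)
v = rng.standard_normal((100_000, 2))
theta = (-0.25, -0.25, 1.0, 1.0, 0.95)   # hypothetical (alpha1, alpha2, beta, sigma2, rho)

me = ec1_marginal_effect(0.0, 0.0, theta, v)

# Central finite difference on the same draws as a consistency check
h = 1e-5
p_up = ec1_prob1_and_slope(h, 0.0, theta, v)[0].mean()
p_dn = ec1_prob1_and_slope(-h, 0.0, theta, v)[0].mean()
fd = (p_up - p_dn) / (2.0 * h)
print(me, fd)

# Reported examples: EC1 (0.1509 vs 0.1679) and EC2 (0.164 vs 0.1839)
print(round(pct_bias(0.1679, 0.1509)), round(pct_bias(0.1839, 0.164)))
```

The last line reproduces the 11% and 12% overestimates quoted in the text from the reported true and estimated marginal effects.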

Discussion
In this paper, we examine the properties of the MSL estimator in the context of two MIXL model specifications, EC1 and EC2 (i.e., taste and substitution patterns), where random parameters are generated by a correlated bivariate normal structure. We find that the MSL estimator produces significant biases in the estimated parameters. The problem becomes worse when the true value of the variance parameter is small and the correlation parameter is large in magnitude. Furthermore, we find that the biases in the estimated marginal effects can be as large as 12% of the true values. These biases are largely invariant to increases in the number of Halton draws. Since the existing literature has relied heavily on the MSL estimator in the analysis of the MIXL model, our findings should be an important additional warning to researchers about potential sizable biases in the results.
We also discover that the performance of the MSL depends on other factors, such as the model specification (i.e., EC1 or EC2), distributional assumptions, exogenous variation, as well as the true values of the variance and correlation parameters. Therefore, we believe that biases in empirical applications (e.g., discrete choice experiments in health preference research) are likely to be worse due to real-world complexity; however, more research is needed to address such questions. Future simulation studies may examine biases …

… where the conditional mean $\beta_{i1}$ is estimated by simulation methods, and where $v_{i1}^{q}$ and $v_{i2}^{q}$ are independent standard normal random variables for individual $i = 1, \dots, N$ and random draws $q = 1, \dots, Q$.

Figure 2. Plots of $\hat{\rho}$ for EC2 (substitution patterns) using M0 (H = 1000).

Figure 6. Plots of true and estimated $\partial P(y = 1)/\partial x_1$ for EC2 (substitution patterns) using M0 (H = 500).