Article

Is It Sufficient to Select the Optimal Class Number Based Only on Information Criteria in Fixed- and Random-Parameter Latent Class Discrete Choice Modeling Approaches?

by Péter Czine 1, Péter Balogh 2,3,*, Zsanett Blága 4,5, Zoltán Szabó 6, Réka Szekeres 5, Stephane Hess 7 and Béla Juhász 5

1 Coordination Center for Research in Social Sciences, Faculty of Economics and Business, University of Debrecen, 4032 Debrecen, Hungary
2 Institute of Methodology and Business Digitalization, Faculty of Economics and Business, University of Debrecen, 4032 Debrecen, Hungary
3 HUN-REN-DE High-Tech Technologies for Sustainable Management Research Group, University of Debrecen, Boszormenyi Street 138, 4032 Debrecen, Hungary
4 University Pharmacy, Clinical Centre, University of Debrecen, 4032 Debrecen, Hungary
5 Department of Pharmacology and Pharmacotherapy, Faculty of Medicine, University of Debrecen, 4032 Debrecen, Hungary
6 Department of Emergency Medicine, Faculty of Medicine, University of Debrecen, 4032 Debrecen, Hungary
7 Institute for Transport Studies, University of Leeds, Leeds LS2 9JT, UK
* Author to whom correspondence should be addressed.
Econometrics 2024, 12(3), 22; https://doi.org/10.3390/econometrics12030022
Submission received: 21 May 2024 / Revised: 30 June 2024 / Accepted: 30 July 2024 / Published: 8 August 2024

Abstract

Heterogeneity in preferences can be addressed through various discrete choice modeling approaches. The random-parameter latent class (RLC) approach is an attractive alternative for analysts because it separates classes with different preferences and captures the remaining heterogeneity within classes through random parameters. For latent class specifications, however, more empirical evidence on the optimal number of classes is needed in order to develop a more objective set of criteria. To investigate this question, we tested specifications with different class numbers (for both fixed- and random-parameter latent class modeling) using data from a discrete choice experiment conducted in 2021 that examined preferences regarding COVID-19 vaccines. We compared models using common indicators such as the Bayesian information criterion, and we also took into account, among others, a seemingly simple but often overlooked indicator: the ratio of significant parameter estimates. Based on our results, information criteria alone are not sufficient for deciding on the optimal number of classes in latent class modeling. We considered aspects such as the ratio of significant parameter estimates (it may be interesting to examine this both between and within specifications to find out which model type and class number has the most balanced ratio); the validity of the coefficients obtained (focusing on whether the conclusions are consistent with our theoretical model); whether including random parameters is justified (finding a balance between the complexity of the model and its information content, i.e., examining when, and to what extent, the introduction of within-class heterogeneity is relevant); and the distributions of marginal rate of substitution (MRS) calculations (since they often function as a direct measure of preferences, it is necessary to test how consistent the distributions are across specifications with different class numbers; if they are relatively stable in explaining consumer preferences, it is probably worth putting more emphasis on the aspects mentioned above when choosing a model). The results of this research raise further questions that should be addressed by further model testing in the future.

1. Introduction

The modeling of stated choice experiments is rarely limited to a simple multinomial logit (MNL) specification estimation. This is because it is necessary to address the restrictive assumptions of the model in order to draw truly accurate conclusions and develop welfare measures based on them. A vital limitation of the MNL is that it assumes homogeneous tastes (homogeneous preferences) for respondents in the sample for the product/service attributes under analysis. Extending the model with sociodemographic interactions helps address the problem, but in most cases, further analysis is needed, as a substantial part of the preference heterogeneity has yet to be explained (Hess 2014; Train 2009; Walker and Ben-Akiva 2002).
As grouped by Mariel et al., the so-called mixed logit (MXL) models offer an efficient and relatively simple option to capture preference heterogeneity (Mariel et al. 2021). These models assume that preference heterogeneity can be effectively addressed by introducing continuous or discrete distributions. The former group includes the random-parameter logit (RPL) specification, which allows the taste parameters to vary among respondents along a predefined distribution. Another widely used alternative is latent class (LC) modeling, which separates a discrete number of classes with different tastes and assumes that members’ preferences are already homogeneous within heterogeneous classes. Comparisons between the two types of models have been investigated by several authors, who all concluded that further comparisons between the models should be made in the future (Greene and Hensher 2003; Scarpa et al. 2005; Shen 2009).
A hybrid solution has been proposed by Bujosa et al. and Greene and Hensher because for latent classes separated by LC, a significant part of the preference heterogeneity remains unexplained within classes, i.e., the inclusion of random parameters may be justified (Bujosa et al. 2010; Greene and Hensher 2013). The former authors introduced the random-parameter latent class (RLC) approach by examining preferences for recreational trips to forest sites. At the same time, the latter used a stated choice data set on alternative freight distribution attribute packages. Both studies conclude that RLC outperforms the RPL and fixed-parameter LC specifications. However, the authors highlight the need for further comparisons.
Nevertheless, the authors point out that, as in the case of fixed-parameter LC, determining the optimal number of classes is a crucial issue in the specification. Bujosa et al., for example, chose to estimate and analyze the two-class specification because their models with more than two classes often did not converge (Bujosa et al. 2010). Greene and Hensher also estimated two-class versions and concluded that when two classes are defined, all parameter estimates are significant in at least one class, whereas in three- and four-class versions, there is already at least one variable that is not significant in either class (Greene and Hensher 2013).
So why is it critical to get the class number right? Too few classes can result in a high level of preference heterogeneity within classes, which increases the imprecision of the model. Too many classes, on the other hand, can lead to an overspecified model, which may, for example, result in less generalizability of the conclusions drawn from our model estimation. There may also be the problem that the class allocation is very unbalanced for certain class numbers, i.e., there may be too many/few respondents in certain class(es) or the proportion of significant parameter estimates in certain classes may be very low.
The aim of this research is to investigate which aspects should be considered in order to choose the optimal number of classes for LC modeling.
Its objectives are as follows:
(1) To investigate the consistency of the “best model” proposed by the different information criteria suggested in the literature with the specification showing the best significant parameter estimation ratio;
(2) To examine the similarities and differences between the results of the LC model specifications estimated with different class numbers;
(3) To examine the similarities and differences between the distributions of the MRS (marginal rate of substitution) calculations for the LC model specifications estimated with different class numbers.
To investigate these objectives, we will model data from a discrete choice experiment conducted in 2021 that examined preferences regarding COVID-19 vaccines. In the following sections of the paper, we describe the modeling approach to be tested (latent class modeling with fixed and random parameters), present the research details (a stated choice experiment in Debrecen with 1011 participants), and compare the results of latent class models with fixed and random parameters (focusing on the structure of the objectives: (1) to examine information criteria, (2) to compare model estimates, (3) to compare the distributions of the MRS calculations), emphasizing the choice of the optimal number of classes.

2. Materials and Methods

This section describes the methodological and modeling approach and the details of our research.

2.1. Methodology

Stated preference (SP) valuation methods include commonly used approaches such as conjoint analysis (CA), best–worst scaling (BWS), and discrete choice experiments (DCEs). These methods differ mainly in their theoretical background and in the complexity and form of the decision task (Louviere et al. 2010; Hensher et al. 2015). For DCEs, the theoretical background is provided by random utility theory (RUT), which assumes that the decision-maker maximizes utility (random utility maximization, RUM) and decomposes the utility perceived by the decision-maker into a systematic and a random component according to Equation (1) (Ben-Akiva and Lerman 1985):
$$U_{n,i,t} = V_{n,i,t} + \varepsilon_{n,i,t}, \tag{1}$$
where U denotes the utility, V is the systematic part of the utility, ε the random part of the utility, n is the decision-maker, i is the alternative, and t is the decision situation.
A widely known and long-used specification of RUT-based discrete choice models is the conditional logit (CL) type associated with Daniel McFadden (McFadden 1974). The model has the advantage of being relatively easy to estimate, and its results are simple to interpret. However, one of its drawbacks is the assumption of homogeneous preferences, suggesting that respondents in the study sample have the same level of sensitivity to the observed attributes (e.g., they are equally cost-, risk-, and possibly brand-sensitive). The researchers can address this through the introduction of interactions as well as by estimating more complex models (Louviere et al. 2000).
The modeler can address heterogeneity in preferences through the use of both discrete and continuous distributions. The former approach is known as latent class (LC) modeling, while the latter is known as random-parameter logit (RPL) modeling. The LC specification aims to create a discrete number of classes with different preferences in which the members’ tastes are already homogeneous (Boxall and Adamowicz 2002). In contrast, in RPL, the taste parameters (β coefficients) are allowed to vary along a distribution predetermined by the researcher, and then specific distribution parameters are estimated (McFadden and Train 2000).
It is essential to mention that it is also possible to combine these two approaches to obtain the random-parameter latent class (RLC) specification proposed by Bujosa et al. and Greene and Hensher (Bujosa et al. 2010; Greene and Hensher 2013). The RLC is based on the assumption that even when classes with different preferences are identified (fixed-parameter LC modeling), there remains a significant degree of heterogeneity within classes, so introducing random parameters (similar to RPL) may be justified.
In summary, latent class modeling is basically composed of two main parts: (1) a class-specific choice model and (2) a class allocation model.
The former models the probabilities with which each alternative is chosen (with fixed or random parameters in the utility functions depending on whether we are using fixed-parameter or random-parameter latent class modeling), while the latter models the probability with which an individual belongs to each latent class.
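The two-part structure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Apollo code used for the estimations: it assumes a multinomial logit kernel within each class and a constants-only class allocation model, and all function names, dimensions, and numerical values are hypothetical.

```python
import numpy as np

def mnl_probabilities(X, beta):
    """Class-specific choice model: multinomial logit probabilities for one task.

    X    : (n_alternatives, n_attributes) coded attribute levels
    beta : (n_attributes,) taste parameters of one class
    """
    v = X @ beta
    v = v - v.max()          # subtract the maximum for numerical stability
    expv = np.exp(v)
    return expv / expv.sum()

def class_shares(delta):
    """Class allocation model with constants only (softmax over class constants)."""
    e = np.exp(delta - np.max(delta))
    return e / e.sum()

def lc_likelihood(X_tasks, chosen, betas, delta):
    """Likelihood of one respondent's sequence of choices in a fixed-parameter LC model.

    X_tasks : (n_tasks, n_alternatives, n_attributes)
    chosen  : (n_tasks,) index of the chosen alternative in each task
    betas   : (n_classes, n_attributes)
    delta   : (n_classes,) class constants (one is usually fixed to 0)
    """
    shares = class_shares(delta)
    # Probability of the whole choice sequence, conditional on each class
    within = np.array([
        np.prod([mnl_probabilities(X, b)[j] for X, j in zip(X_tasks, chosen)])
        for b in betas
    ])
    likelihood = float(shares @ within)
    posterior = shares * within / likelihood   # posterior class membership
    return likelihood, shares, posterior

# Hypothetical data: 8 tasks, 4 alternatives, 9 coded attribute levels, 2 classes.
rng = np.random.default_rng(0)
X_tasks = rng.integers(0, 2, size=(8, 4, 9)).astype(float)
chosen = rng.integers(0, 4, size=8)
betas = rng.normal(size=(2, 9))
delta = np.array([0.0, 0.5])

likelihood, shares, posterior = lc_likelihood(X_tasks, chosen, betas, delta)
print(likelihood, shares, posterior)
```

The respondent's likelihood is the share-weighted sum of the class-conditional sequence probabilities, and the posterior class memberships follow from Bayes' rule.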
In modeling our experiment, we defined our utility function for the fixed-parameter LC according to Equation (2).
$$
\begin{aligned}
U_{\text{Vaccine } i} ={}& ASC_{\text{Vaccine}}^{\text{No choice}}
+ \beta_{\text{USA}}^{q}\,\text{Origin}_{\text{USA}}^{\text{Vaccine } i}
+ \beta_{\text{European Union}}^{q}\,\text{Origin}_{\text{European Union}}^{\text{Vaccine } i} \\
&+ \beta_{\text{Hungary}}^{q}\,\text{Origin}_{\text{Hungary}}^{\text{Vaccine } i}
+ \beta_{\text{Russia}}^{q}\,\text{Origin}_{\text{Russia}}^{\text{Vaccine } i} \\
&+ \beta_{60\text{--}70\%}^{q}\,\text{Efficiency}_{60\text{--}70\%}^{\text{Vaccine } i}
+ \beta_{71\text{--}90\%}^{q}\,\text{Efficiency}_{71\text{--}90\%}^{\text{Vaccine } i} \\
&+ \beta_{\text{As described in the package leaflet}}^{q}\,\text{Side effect}_{\text{As described in the package leaflet}}^{\text{Vaccine } i} \\
&+ \beta_{6\text{ months}}^{q}\,\text{Protection}_{6\text{ months}}^{\text{Vaccine } i}
+ \beta_{12\text{ months}}^{q}\,\text{Protection}_{12\text{ months}}^{\text{Vaccine } i}
+ \varepsilon_{i}
\end{aligned}
\tag{2}
$$
where $\beta^{q}$ denotes the parameter vector estimated for the q-th class. The attributes under study are discussed in the following subsection. For the estimates, China (country of origin), more than 90% (level of efficiency), long term (side effect), and lifelong (ensuring protection) were the reference levels.
In the RLC estimations, the utility function parameters in Equation (2) were modified according to Equation (3).
$$
\begin{aligned}
U_{\text{Vaccine } i} ={}& ASC_{\text{Vaccine}}^{\text{No choice}}
+ \left(\beta_{\text{USA}}^{q} + \sigma_{\text{USA}}^{n,q}\right)\text{Origin}_{\text{USA}}^{\text{Vaccine } i}
+ \left(\beta_{\text{European Union}}^{q} + \sigma_{\text{European Union}}^{n,q}\right)\text{Origin}_{\text{European Union}}^{\text{Vaccine } i} \\
&+ \left(\beta_{\text{Hungary}}^{q} + \sigma_{\text{Hungary}}^{n,q}\right)\text{Origin}_{\text{Hungary}}^{\text{Vaccine } i}
+ \left(\beta_{\text{Russia}}^{q} + \sigma_{\text{Russia}}^{n,q}\right)\text{Origin}_{\text{Russia}}^{\text{Vaccine } i} \\
&+ \left(\beta_{60\text{--}70\%}^{q} + \sigma_{60\text{--}70\%}^{n,q}\right)\text{Efficiency}_{60\text{--}70\%}^{\text{Vaccine } i}
+ \left(\beta_{71\text{--}90\%}^{q} + \sigma_{71\text{--}90\%}^{n,q}\right)\text{Efficiency}_{71\text{--}90\%}^{\text{Vaccine } i} \\
&+ \left(\beta_{\text{As described in the package leaflet}}^{q} + \sigma_{\text{As described in the package leaflet}}^{n,q}\right)\text{Side effect}_{\text{As described in the package leaflet}}^{\text{Vaccine } i} \\
&+ \left(\beta_{6\text{ months}}^{q} + \sigma_{6\text{ months}}^{n,q}\right)\text{Protection}_{6\text{ months}}^{\text{Vaccine } i}
+ \left(\beta_{12\text{ months}}^{q} + \sigma_{12\text{ months}}^{n,q}\right)\text{Protection}_{12\text{ months}}^{\text{Vaccine } i}
+ \varepsilon_{i}
\end{aligned}
\tag{3}
$$
where σ denotes the person-dependent deviation for the random parameters.
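To illustrate how random parameters enter within a class, the following Python sketch simulates the class-conditional likelihood of one respondent's choice sequence. It is a sketch under stated assumptions rather than the estimation code behind the paper: the person-specific deviation is written as a class-level standard deviation times a standard normal draw, plain pseudo-random draws are used instead of MLHS draws, and all names and values are hypothetical.

```python
import numpy as np

def mnl_prob_chosen(X, beta, j):
    """Multinomial logit probability of the chosen alternative j in one task."""
    v = X @ beta
    v -= v.max()                      # numerical stability
    p = np.exp(v)
    return p[j] / p.sum()

def rlc_class_likelihood(X_tasks, chosen, mean, sd, n_draws=200, seed=0):
    """Simulated likelihood of a choice sequence, conditional on one latent class.

    Assumption: beta_n = mean + sd * xi with xi ~ N(0, 1), drawn once per
    respondent and held fixed across that respondent's tasks.
    """
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((n_draws, mean.size))
    total = 0.0
    for xi in draws:
        beta_n = mean + sd * xi       # person-specific taste parameters
        seq = 1.0
        for X, j in zip(X_tasks, chosen):
            seq *= mnl_prob_chosen(X, beta_n, j)
        total += seq                  # probability of the whole sequence for this draw
    return total / n_draws            # average over draws

# Hypothetical data: 8 tasks, 4 alternatives, 9 coded attribute levels.
rng = np.random.default_rng(1)
X_tasks = rng.integers(0, 2, size=(8, 4, 9)).astype(float)
chosen = rng.integers(0, 4, size=8)
mean = rng.normal(size=9)
sd = np.full(9, 0.5)

simulated = rlc_class_likelihood(X_tasks, chosen, mean, sd)
print(simulated)
```

Setting every standard deviation to zero collapses the sketch to the fixed-parameter case, which is one way to see that the fixed-parameter LC is nested within the RLC.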
In LC modeling, researchers mostly use information criteria to decide the optimal number of classes. One example is the Bayesian information criterion (BIC) (Equation (4)) [4].
$$\mathrm{BIC} = -2\,LL + K \ln N, \tag{4}$$
where LL is the converged log-likelihood, K is the number of parameters estimated, and N is the number of observations.
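As a small worked example of Equation (4), the Python sketch below computes the BIC for two candidate specifications and picks the one with the lower value. The log-likelihoods, parameter counts, and number of observations are hypothetical and are not taken from the study.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 LL + K ln N (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical candidates: class number -> (converged log-likelihood, parameters)
candidates = {2: (-9000.0, 20), 3: (-8900.0, 30)}
n_obs = 1000  # hypothetical number of observations

bic_values = {q: bic(ll, k, n_obs) for q, (ll, k) in candidates.items()}
best = min(bic_values, key=bic_values.get)
print(bic_values, best)
```

Adding parameters lowers the BIC only if the log-likelihood improves by more than half of the extra parameters times ln N, which is why the criterion can start to increase as classes are added.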
To perform our model estimations and MRS calculations, we used Apollo, a user-written, highly customizable package for R. With this package, users can estimate a wide range of choice models, including RUM-based and non-RUM-based (e.g., random regret minimization) approaches and both less complex (e.g., CL, RPL, LC) and more complex (e.g., ICLV, the integrated choice and latent variable model) specifications, and they can integrate the package into the analysis process of other stated preference methods (e.g., the support.BWS package created by Aizaki and Fogarty for the analysis of object-type BWS) (Hess and Palma 2019; Hess and Palma 2021; R Core Team 2020; Aizaki and Fogarty 2023).

2.2. Case Description

We used data from a discrete choice experiment examining decision-makers’ preferences regarding COVID-19 vaccines to test the modeling questions. The survey was conducted in 2021 in Debrecen, the second-largest city in Hungary (Blaga et al. 2023). The attributes of the final experiment (Table 1) were determined through a qualitative approach (expert interviews, focus group interviews) and a pilot study. Our Bayesian D-efficient experimental design, created with Ngene 1.2 software, included 32 decision situations (an example is shown in Table 2), each with four options (three vaccine alternatives and one “no choice” option) (ChoiceMetrics 2018). Due to the large number of decision situations, we used blocking (4 blocks were created), so our respondents were only faced with a subset of decision situations (8 situations).
Our sample includes 1011 respondents, the details of which are shown in Table 3 (it is important to note that in the study by Blaga et al., two respondents were excluded from the sample (i.e., a sample of 1009 was analyzed) due to incomplete responses, but in this case we were able to work with a full sample (n = 1011), as the questions underlying the exclusion are not relevant for the present study) (Blaga et al. 2023). It is necessary to mention that at the time of the data collection, the vaccination of the middle-aged population was taking place in Hungary, so this bias should be considered when interpreting our results.
For more details on the survey process, see the study by Blaga et al. (2023).

3. Results

3.1. Comparison of Latent Class Models with Different Number of Classes—Information Criteria

The information criteria and additional modeling considerations shown in Table 4 simultaneously illustrate the outputs for each class number of fixed-parameter LC and RLC models. For the fixed-parameter LC modeling, a decrease in the BIC is observed up to the seven-class version, but for the RLC, an increase is detected in the four-class case, so the random-parameter specification testing ended with the four-class case.
Some conclusions can be drawn from the results presented in Table 4. In latent class modeling, the information criteria recommended for choosing the optimal number of classes can point to a different “best” model than the ratio of significant parameters does. Among the LC models with only fixed parameters, the four-class version shows the lowest BIC, but the three-class version has the highest ratio of significant parameter estimates. Similarly, for the RLC, the specification with fewer classes (the two-class case) shows a considerably higher ratio of significant parameter estimates, whereas the specification with more classes (the three-class case) performs better on the information criteria.
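The ratio of significant parameter estimates can be computed per class from the estimates and their standard errors. The Python sketch below assumes a two-sided asymptotic z-test at the 5% level, which may differ from the test and standard errors used for the reported tables, and the estimates and standard errors are hypothetical.

```python
from statistics import NormalDist

def significant_ratio(estimates, std_errors, alpha=0.05):
    """Share of parameters whose |estimate / s.e.| exceeds the two-sided critical value."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    flags = [abs(b / se) > critical for b, se in zip(estimates, std_errors)]
    return sum(flags) / len(flags)

# Hypothetical class with nine attribute-level coefficients
estimates = [0.9, 0.8, 0.5, 0.1, -1.2, -0.6, 0.05, 0.7, 0.02]
std_errors = [0.2] * 9

ratio = significant_ratio(estimates, std_errors)
print(round(100 * ratio, 2))
```

With nine coefficients per class, six significant estimates give 66.67% and eight give 88.89%, the kind of values reported for the fixed-parameter classes in the following subsection.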

3.2. Comparison of Latent Class Models with Different Numbers of Classes—Estimation Results

Because Table 4 does not point unambiguously to a single specification, the estimation results of all three versions (the two-, three-, and four-class cases) are presented below. For the RLC modeling, all attribute coefficients were specified as normally distributed and simulated with 1000 MLHS (modified Latin hypercube sampling) draws. The results of the two-class fixed-parameter LC and RLC model estimations are shown in Table 5.
The estimates in Table 5 show that the no-choice option was less preferred (based on the negative and significant value of ASC no-choice) compared to the vaccine alternative choice for both models. This conclusion is also true for the other models discussed in this paper, and ASCs were not included as class-specific in any of the models, so we will not reinterpret them in the following paragraphs.
Based on the class probability values of the two-class cases, it is clear that the two types of models (fixed-parameter LC and RLC) do not identify the same classes (fixed-parameter LC: Class 1: 0.32, Class 2: 0.68; RLC: Class 1: 0.49, Class 2: 0.51). The direction of the estimated coefficients is mostly in line with expectations (e.g., as the level of efficiency and the duration of protection increase, the respondents’ perceived utility of COVID-19 vaccines increases), and their effect is significant in most cases. Regarding significance, however, it is worth noting that only 66.67% of the estimated parameters for the first class of the fixed-parameter LC are significant at the 5% level, compared to 100% for the second class. The RLC specification shows a much more balanced ratio, with 88.89% of the parameters for the first class and 83.33% for the second class significant at the 5% level. An aggregate comparison of the models shows a considerably lower BIC for the RLC, indicating a better model fit. This is probably because the additional information gained from the RLC specification (mainly through the introduction of random parameters within latent classes) outweighs the “cost” of the added model complexity. Most of the estimated standard deviation parameters are significant, suggesting the existence of preference heterogeneity within classes. The exceptions, both in the second class, are the standard deviation parameters for the 71–90% efficiency level and for the side effects as described in the package leaflet.
The estimates for the three-class versions of fixed-parameter LC and RLC are presented in Table 6.
The three-class estimates (Table 6) show that, as in the two-class versions, the fixed-parameter LC and RLC models yield entirely different classes (fixed-parameter LC: Class 1: 0.11, Class 2: 0.55, Class 3: 0.34; RLC: Class 1: 0.21, Class 2: 0.29, Class 3: 0.50). The direction of the estimated coefficients in the three-class models is also mostly in line with expectations, except for the positive parameter estimate for the first class of LC for the side effect as described in the package leaflet (indicating that members of this group are more positive about the long-term side effects of the vaccine compared to those in the package leaflet). An interesting observation on the ratio of significant parameter estimates can also be made when comparing the fixed-parameter LC and RLC models. While the former has a very high ratio of parameters significant at the 5% level (Class 1: 88.89%, Class 2: 100.00%, Class 3: 88.89%), the latter contains a class in which none of the parameter estimates shows a significant effect (Class 1: 0.00%, Class 2: 77.78%, Class 3: 88.89%). Comparing the models based on the BIC, we can also conclude, as in the two-class case, that the RLC performs considerably better. In addition to the fact that for the first class of the RLC none of the attributes’ standard deviations is significant at the 5% level (as noted above in the discussion of the ratio of significant parameter estimates), three standard deviation parameters in the second class are not significant at the 5% level (those for the USA and Hungary as the country of origin and for the 71–90% efficiency level).
The four-class fixed-parameter LC and RLC model estimations are presented in Table 7.
The four-class estimates in Table 7 show that, as in the previous cases, the fixed-parameter LC and RLC models did not identify the same classes (fixed-parameter LC: Class 1: 0.31, Class 2: 0.33, Class 3: 0.26, Class 4: 0.10; RLC: Class 1: <0.01, Class 2: 0.43, Class 3: 0.36, Class 4: 0.20). The directions of the estimated coefficients were mainly in line with our expectations, as in the previously estimated models. Exceptions occur in the four-class LC: members of the third class have the highest preference for efficiency levels between 71 and 90% (compared to more than 90% efficiency), and for the fourth class, the side effect as described in the package leaflet has a negative sign (it is less preferred than long-term side effects). The ratio of significant parameter estimates is relatively high for all classes in the four-class fixed-parameter LC (Class 1: 77.78%, Class 2: 100.00%, Class 3: 88.89%, Class 4: 88.89%); in the RLC, however, there is one class where no parameter estimate is significant at the 5% level (Class 1: 0.00%, Class 2: 83.33%, Class 3: 83.33%, Class 4: 88.89%). Comparing the fixed-parameter LC and RLC specifications on the basis of the BIC, the latter shows a better fit. However, the difference between the two models is now considerably smaller than it was for the specifications with fewer classes. In addition to the fact that the RLC contains a whole class without significant parameters, the conclusion drawn for the three-class case also holds here: the other classes contain standard deviation parameters with no significant effect at the 5% level (e.g., in the second class for the 71–90% efficiency level and the side effect as described in the package leaflet, and in the third class for Hungary as the country of origin).

3.3. Comparison of Latent Class Models with Different Numbers of Classes—Marginal Rate of Substitution (MRS) Calculations

Marginal rate of substitution (MRS) calculations were also carried out, where the numerator was the attribute of interest and the denominator was the “as described in the package leaflet” side effect in all cases (Table 8). This is not a typical MRS calculation, in which the cost parameter is included in the denominator, and WTP (willingness to pay) calculations are performed. This is because, in our final study, the cost attribute was not included in the characteristics of the alternatives. However, it should be emphasized that the present approach is perfectly applicable for comparing models in the MRS context.
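A distribution of MRS values for a fixed-parameter LC model can be sketched as follows. The paper does not spell out how the distributions in Table 8 were constructed, so this Python sketch makes an assumption: each respondent's MRS is the posterior-probability-weighted average of the class-level coefficient ratios. All coefficients and posterior probabilities are hypothetical.

```python
import numpy as np

def class_mrs(beta_numerator, beta_denominator):
    """Class-level MRS: ratio of the attribute coefficient to the denominator coefficient."""
    return np.asarray(beta_numerator, dtype=float) / np.asarray(beta_denominator, dtype=float)

def individual_mrs(posterior, mrs_by_class):
    """Respondent-level MRS as the posterior-weighted mean of the class-level ratios."""
    return np.asarray(posterior, dtype=float) @ mrs_by_class

# Hypothetical two-class model: the denominator coefficient changes sign across classes
beta_numerator = [1.2, 0.4]
beta_denominator = [0.6, -0.2]
posterior = [[0.9, 0.1],   # rows: respondents, columns: classes
             [0.5, 0.5],
             [0.2, 0.8],
             [1.0, 0.0]]

mrs_q = class_mrs(beta_numerator, beta_denominator)
mrs_n = individual_mrs(posterior, mrs_q)
print(mrs_q, mrs_n, mrs_n.mean(), np.median(mrs_n))
```

When the denominator coefficient differs in sign between classes, the class-level ratios change sign as well, so the mean and median of the resulting distribution can move substantially with the class structure.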
The results in Table 8 show that the central tendencies (mean and median) of the MRS calculations differ considerably across specifications with different class numbers. For the country of origin, the models with more classes show higher mean MRS values. For the efficiency level and protection duration, however, the two-class specification shows the most negative MRS estimates and the three-class specification the most positive, with the four-class estimates falling in between. It can also be seen that while the mean is always lower than the median for the two-class specification, the mean is usually higher for the three- and four-class cases (although there are exceptions).
Since there were significant differences between the MRS calculations for the different class numbers of models (with several changes in the sign), we tested the more standard scenario of including a coefficient estimated with a linear effect (the models were re-estimated by calculating a single coefficient for vaccine efficiency, assuming the same difference in utility between the levels) in the denominator of the MRS calculations. The results are shown in Table 9.
The results in Table 9 show a much more balanced picture across the models than the calculations presented earlier (Table 8). The MRS calculations are closer to each other, and no sign change occurs in any case. Regarding the position of the central tendencies, the mean exceeds the median (the second quartile) in only a few cases, with the median being larger in the majority of cases.
There are therefore considerable differences in the distributions of the MRS calculations for the fixed-parameter LC models estimated with different class numbers. Although these differences are reduced when a linear effect is included in the denominator in the conventional form, it may nevertheless be advisable to examine the distributions of MRS estimates before choosing the optimal class number. It should be noted, however, that for the RLC we did not perform MRS calculations under either a non-linear or a linear effect, because of the difficulty and complexity of handling double heterogeneity.

4. Conclusions

In the paper, we examined which aspects should be considered in order to choose the optimal number of classes for LC modeling. To examine this question, we analyzed data from a discrete choice experiment in 2021, which examined preferences for COVID-19 vaccines in Hungary’s second-largest city, Debrecen.
To compare the fixed-parameter LC and RLC models, we tested a two-, three-, and four-class version. In addition to the Bayesian information criterion, the ratio of estimated significant parameters within the total estimated parameters was also considered when comparing models with different class numbers. Based on the results, the two aspects always led to other “best models”. Based on the information criterion, the decision favored four-class fixed-parameter LC and three-class RLC, while the ratio of significant parameter estimates favored three-class fixed-parameter LC and two-class RLC. In most cases, the direction of the estimated coefficients for the models was as expected. Still, for the three- and four-class RLCs, one class appeared with no significant parameter estimates. In addition, significant differences in class probability values were also detected between the fixed- and random-parameter models, i.e., in none of the class number cases were the same groups of classes identified. It is also necessary to mention that non-significant standard deviations have become more common in cases of RLC with more than two classes. Finally, it is worth mentioning that the differences in the distributions of the MRS estimates were reduced when we assumed a linear effect in the denominator in the traditional form.
Based on our conclusions, we can make the recommendations that for latent class models (including both the fixed- and random-parameter cases), analysts should consider additional aspects in addition to the information criteria, such as (1) the ratio of significant parameter estimates (it may be interesting to examine this both between and within specifications to find out which model type and class number has the most balanced ratio), (2) the validity of the estimated parameters (we recommend focusing on whether the conclusions are consistent with our theoretical model), (3) the justification for including random parameters (it is important to find a balance between the complexity of the model and its information content, i.e., to examine when (and to what extent) the introduction of within-class heterogeneity is relevant), (4) the distributions of MRS calculations (since they often function as a direct measure of preferences, it is necessary to test how consistent the distributions of specifications with different class numbers are (if they are highly, i.e., relatively stable in explaining consumer preferences, it is probably worth putting more emphasis on the aspects mentioned above when choosing a model)).
Our research has several limitations, among which it is necessary to mention that we analyzed data from only one experiment with a given experimental design. In addition, it is necessary to mention that in the comparison of the models, based on information criteria, a strong emphasis was put on the BIC, and the other information criteria were not analyzed (AIC was only mentioned). It is also necessary to mention that for the fixed-parameter LC and RLC specifications estimated under different class numbers, the class allocation model included only one constant, with no explanatory variables. Turning the focus to the RLC, it is necessary to mention that only the normal distribution was tested and applied in the definition of the random parameters, and that due to the complexity of the specification (handling double heterogeneity), no MRS calculations were performed.
These limitations point to future research directions. One is to investigate the question of the optimal number of classes in LC modeling through the analysis of further experiments. Another is to extend the range of information criteria under investigation and to increase the complexity of the class allocation model by including explanatory variables. Testing additional distributions for the random parameters of the RLC specification could also be a promising option. Finally, an optimal method of testing through MRS calculations still needs to be found, particularly for the RLC specification, for which none were performed here.

Author Contributions

Conceptualization: P.C., P.B., Z.B., S.H. and B.J.; methodology: P.C., P.B. and S.H.; software: P.C., P.B. and S.H.; validation: P.C., P.B., Z.B., S.H. and B.J.; formal analysis: P.C., P.B. and S.H.; investigation: P.C., P.B., Z.B. and B.J.; resources: B.J. and Z.S.; data curation: P.C.; writing—original draft preparation: P.C.; writing—review and editing: P.C., P.B., Z.B., S.H. and B.J.; visualization: P.C., P.B. and S.H.; supervision: P.C., P.B., Z.B., S.H. and B.J.; project administration: B.J., R.S. and Z.S.; funding acquisition: B.J., R.S. and Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the University of Debrecen Program for Scientific Publication. Stephane Hess acknowledges financial support from the European Research Council through the advanced grant 101020940-SYNERGY.

Data Availability Statement

The data are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Table 1. Attributes and their levels in the experiment.

| Attribute | Attribute level |
|---|---|
| Country of origin | USA; European Union; Hungary; Russia; China |
| Level of efficiency (%) | 60–70; 71–90; More than 90 |
| Side effect | As described in the package leaflet; Long term |
| Ensuring protection | 6 months; 12 months; Lifelong |
Table 2. Example of a decision situation.

| | Vaccine 1 | Vaccine 2 | Vaccine 3 | No Choice |
|---|---|---|---|---|
| Country of origin | USA | China | Russia | |
| Level of efficiency (%) | More than 90 | More than 90 | 60–70 | |
| Side effect | As described in the package leaflet | As described in the package leaflet | Long term | |
| Ensuring protection | 6 months | 12 months | Lifelong | |
| Your choice (X): | | | | |
Table 3. Sample characteristics.

| Characteristic | Sample (n = 1011) |
|---|---|
| Gender (%) | |
| Male | 44.2 |
| Female | 55.8 |
| Age category (%) | |
| 18–29 | 37.9 |
| 30–45 | 38.9 |
| 46–60 | 20.8 |
| 61–75 | 2.4 |
| Highest level of education (%) | |
| Primary | 6.1 |
| Secondary | 54.7 |
| Higher (minimum BSc) | 39.2 |
| Residence category (%) | |
| Debrecen | 67.8 |
| Other cities in Hajdú–Bihar county | 21.8 |
| Other municipalities in Hajdú–Bihar county | 8.3 |
| Other | 2.1 |
Table 4. Information criteria and additional modeling aspects for LC (fixed- and random-parameter) versions with different class numbers.

| | 2-class fixed-parameter LC | 2-class RLC | 3-class fixed-parameter LC | 3-class RLC | 4-class fixed-parameter LC | 4-class RLC |
|---|---|---|---|---|---|---|
| Estimated parameters | 20 | 38 | 30 | 57 | 40 | 76 |
| Ratio of significant parameters at 5% level (%) | 85.00 | 84.21* | 93.33* | 57.89 | 87.50 | 63.16 |
| Log-likelihood (final) | −9016.99 | −7981.58 | −8653.83 | −7767.81 | −8252.72 | −7799.10 |
| AIC | 18,073.98 | 16,039.15 | 17,367.66 | 15,649.62 | 16,585.44 | 15,750.19 |
| BIC | 18,213.95 | 16,305.08 | 17,577.61 | 16,048.51* | 16,865.37* | 16,282.05 |

Note: AIC: Akaike information criterion; BIC: Bayesian information criterion; Log-likelihood (0): −11,212.35. The most favourable significant parameter ratio and BIC value within the fixed-parameter LC and RLC contexts are marked with an asterisk (shown in bold in the original table).
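The information criteria in Table 4 can be cross-checked from the reported log-likelihoods and parameter counts. The following is a minimal sketch, assuming the standard definitions AIC = 2k − 2LL and BIC = k ln(N) − 2LL. The number of observations N is not stated in this section, so the sketch does not assume a value; instead it backs out the ln(N) penalty implied by each reported BIC. The function and variable names are illustrative and do not come from the article or from Apollo.

```python
import math

# (number of parameters k, final log-likelihood, reported AIC, reported BIC),
# copied from Table 4.
TABLE_4 = {
    "2-class LC":  (20, -9016.99, 18073.98, 18213.95),
    "2-class RLC": (38, -7981.58, 16039.15, 16305.08),
    "3-class LC":  (30, -8653.83, 17367.66, 17577.61),
    "3-class RLC": (57, -7767.81, 15649.62, 16048.51),
    "4-class LC":  (40, -8252.72, 16585.44, 16865.37),
    "4-class RLC": (76, -7799.10, 15750.19, 16282.05),
}


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 LL."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n_obs):
    """Bayesian information criterion: k ln(N) - 2 LL."""
    return k * math.log(n_obs) - 2 * loglik


def implied_log_n(loglik, k, reported_bic):
    """Per-parameter BIC penalty ln(N) backed out of a reported BIC."""
    return (reported_bic + 2 * loglik) / k


def best_by_bic(suffix):
    """Name of the specification with the lowest reported BIC among those ending in `suffix`."""
    candidates = {name: row[3] for name, row in TABLE_4.items() if name.endswith(suffix)}
    return min(candidates, key=candidates.get)


for name, (k, ll, aic_reported, bic_reported) in TABLE_4.items():
    print(f"{name:12s} AIC={aic(ll, k):10.2f} (reported {aic_reported:10.2f})  "
          f"implied ln(N)={implied_log_n(ll, k, bic_reported):.4f}")

print("Lowest BIC, fixed-parameter LC:", best_by_bic(" LC"))
print("Lowest BIC, RLC:", best_by_bic(" RLC"))
```

The recomputed AIC values agree with the table to within rounding, and the implied ln(N) is nearly identical across all six specifications, which suggests that a single sample size was used throughout. On BIC alone, the four-class model is preferred in the fixed-parameter case and the three-class model in the RLC case.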
Table 5. Results of the two-class fixed-parameter LC and RLC model estimations.

| Attributes and descriptive data of the model | Fixed-parameter LC, Class 1 | Fixed-parameter LC, Class 2 | RLC, Class 1 | RLC, Class 2 |
|---|---|---|---|---|
| ASC no choice | −1.67 (−23.36) | | −1.66 (−18.14) | |
| USA | −0.33 (−2.65) | 0.71 (10.59) | −0.67 (−4.22) | 2.28 (8.25) |
| USA (SD) | - | - | 1.52 (7.30) | 1.51 (8.16) |
| European Union | 0.10 (0.86) | 1.12 (17.54) | −0.04 (−0.27) | 3.07 (10.14) |
| European Union (SD) | - | - | 1.45 (8.70) | 1.79 (10.58) |
| Hungary | 0.04 (0.29) | 1.16 (18.15) | 0.03 (0.19) | 2.97 (10.21) |
| Hungary (SD) | - | - | 1.66 (8.03) | 2.26 (9.46) |
| Russia | 0.01 (0.07) | 0.50 (7.64) | −0.50 (−3.26) | 1.63 (7.64) |
| Russia (SD) | - | - | 1.48 (7.37) | 2.09 (8.36) |
| 60–70 | −0.81 (−7.98) | −0.81 (−18.00) | −2.41 (−12.29) | −0.81 (−7.10) |
| 60–70 (SD) | - | - | 1.64 (10.44) | 0.71 (4.49) |
| 71–90 | −0.57 (−6.28) | −0.33 (−8.35) | −1.28 (−9.84) | −0.02 (−0.17) |
| 71–90 (SD) | - | - | 0.98 (7.05) | 0.13 (0.54) |
| As described in the package leaflet | 0.13 (1.77) | 0.86 (21.84) | 2.22 (10.78) | 0.47 (6.28) |
| As described in the package leaflet (SD) | - | - | 2.87 (12.71) | 0.01 (0.05) |
| 6 months | −3.15 (−18.77) | −1.09 (−18.99) | −2.55 (−12.84) | −3.40 (−13.69) |
| 6 months (SD) | - | - | 1.48 (7.62) | 1.60 (8.58) |
| 12 months | −1.99 (−18.63) | −0.44 (−9.79) | −1.28 (−11.08) | −2.37 (−11.00) |
| 12 months (SD) | - | - | 0.97 (6.46) | 1.55 (9.35) |
| Class membership probability | 0.32 | 0.68 | 0.49 | 0.51 |
| Constant in class allocation equation | −0.74 (−7.30) | - | −0.02 (−0.19) | - |
| Ratio of significant parameters at 5% level (%) | 66.67 | 100.00 | 88.89 | 83.33 |
| Log-likelihood (final) | −9016.99 | | −7981.58 | |
| BIC | 18,213.95 | | 16,305.08 | |

Note: ASC: alternative specific constant; t-ratios are shown in parentheses after the parameter estimates; SD: standard deviation; BIC: Bayesian information criterion; Log-likelihood (0): −11,212.35. Values that apply to the whole model (ASC no choice, log-likelihood, BIC) are listed in the first class column of that model.
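The class-level ratio of significant parameters reported in Table 5 can be recomputed directly from the t-ratios. The sketch below does this for the two classes of the two-class RLC model. It assumes that a parameter counts as significant when its absolute t-ratio exceeds a critical value; the default of 1.96 (two-sided test at the 5% level) is an assumption, since the critical value is not spelled out in this section. The function name `share_significant` is illustrative.

```python
def share_significant(t_ratios, critical=1.96):
    """Percentage of parameters whose absolute t-ratio exceeds `critical`.

    ASSUMPTION: 1.96 corresponds to a two-sided test at the 5% level;
    the article's exact decision rule is not stated in this section.
    """
    significant = sum(1 for t in t_ratios if abs(t) > critical)
    return 100.0 * significant / len(t_ratios)


# t-ratios of the two-class RLC model from Table 5:
# nine means followed by nine standard deviations per class.
rlc2_class1 = [
    -4.22, -0.27, 0.19, -3.26, -12.29, -9.84, 10.78, -12.84, -11.08,  # means
    7.30, 8.70, 8.03, 7.37, 10.44, 7.05, 12.71, 7.62, 6.46,           # SDs
]
rlc2_class2 = [
    8.25, 10.14, 10.21, 7.64, -7.10, -0.17, 6.28, -13.69, -11.00,     # means
    8.16, 10.58, 9.46, 8.36, 4.49, 0.54, 0.05, 8.58, 9.35,            # SDs
]

print(round(share_significant(rlc2_class1), 2))  # 88.89
print(round(share_significant(rlc2_class2), 2))  # 83.33
```

Both values match the "Ratio of significant parameters" row of Table 5. Passing a different `critical` value makes it easy to see how sensitive the indicator is to the chosen threshold when some t-ratios lie close to it.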
Table 6. Results of the three-class fixed-parameter LC and RLC model estimations.

| Attributes and descriptive data of the model | Fixed-parameter LC, Class 1 | Fixed-parameter LC, Class 2 | Fixed-parameter LC, Class 3 | RLC, Class 1 | RLC, Class 2 | RLC, Class 3 |
|---|---|---|---|---|---|---|
| ASC no choice | −1.59 (−22.04) | | | −1.87 (−20.19) | | |
| USA | −0.26 (−1.32) | 1.51 (14.34) | −0.46 (−4.49) | −0.60 (−0.54) | 0.98 (3.26) | 0.61 (3.76) |
| USA (SD) | - | - | - | 5.76 (1.16) | 0.42 (1.03) | 2.00 (11.88) |
| European Union | 0.48 (2.31) | 1.94 (18.12) | −0.21 (−2.35) | 3.39 (0.89) | 1.34 (4.46) | 1.50 (8.71) |
| European Union (SD) | - | - | - | 7.30 (1.24) | 0.59 (1.80) | 2.20 (12.50) |
| Hungary | −1.02 (−4.50) | 2.06 (19.37) | 0.07 (0.76) | 1.53 (0.51) | 1.75 (5.72) | 1.31 (7.38) |
| Hungary (SD) | - | - | - | 13.68 (1.23) | 0.05 (0.15) | 2.34 (12.46) |
| Russia | −1.42 (−5.64) | 1.37 (15.27) | −0.19 (−1.67) | −4.65 (−1.15) | 1.49 (5.09) | 0.21 (1.47) |
| Russia (SD) | - | - | - | 8.22 (1.17) | 1.15 (4.57) | 1.65 (8.41) |
| 60–70 | −1.97 (−9.25) | −0.69 (−12.64) | −1.03 (−12.42) | −18.40 (−1.18) | −0.60 (−3.44) | −1.49 (−11.94) |
| 60–70 (SD) | - | - | - | 12.33 (1.20) | 0.67 (3.26) | 1.06 (8.55) |
| 71–90 | −1.43 (−7.70) | −0.18 (−3.68) | −0.63 (−8.55) | −7.63 (−1.27) | 0.20 (1.39) | −0.73 (−8.08) |
| 71–90 (SD) | - | - | - | 2.33 (1.34) | 0.09 (0.23) | 0.70 (5.58) |
| As described in the package leaflet | −0.45 (−3.12) | 0.29 (6.67) | 1.73 (17.06) | 26.48 (1.20) | 0.68 (4.85) | 0.59 (6.04) |
| As described in the package leaflet (SD) | - | - | - | 30.03 (1.21) | 0.86 (4.95) | 0.85 (6.95) |
| 6 months | −3.14 (−13.26) | −1.78 (−25.47) | −1.23 (−11.95) | −20.86 (−1.14) | −5.00 (−10.21) | −1.34 (−11.32) |
| 6 months (SD) | - | - | - | 9.54 (1.17) | 1.65 (4.37) | 0.40 (2.22) |
| 12 months | −2.36 (−12.91) | −1.17 (−19.43) | −0.48 (−6.32) | −12.27 (−1.15) | −2.78 (−9.36) | −0.65 (−7.37) |
| 12 months (SD) | - | - | - | 2.99 (1.11) | 1.55 (7.13) | 0.29 (1.03) |
| Class membership probability | 0.11 | 0.55 | 0.34 | 0.21 | 0.29 | 0.50 |
| Constant in class allocation equation | −1.66 (−13.30) | - | −0.48 (−4.30) | −0.33 (−2.27) | - | 0.56 (3.71) |
| Ratio of significant parameters at 5% level (%) | 88.89 | 100.00 | 88.89 | 0.00 | 77.78 | 88.89 |
| Log-likelihood (final) | −8653.83 | | | −7767.81 | | |
| BIC | 17,577.61 | | | 16,048.51 | | |

Note: ASC: alternative specific constant; t-ratios are shown in parentheses after the parameter estimates; SD: standard deviation; BIC: Bayesian information criterion; Log-likelihood (0): −11,212.35. Values that apply to the whole model (ASC no choice, log-likelihood, BIC) are listed in the first class column of that model.
Table 7. Results of the four-class fixed-parameter LC and RLC model estimations.

| Attributes and descriptive data of the model | Fixed-parameter LC, Class 1 | Fixed-parameter LC, Class 2 | Fixed-parameter LC, Class 3 | Fixed-parameter LC, Class 4 | RLC, Class 1 | RLC, Class 2 | RLC, Class 3 | RLC, Class 4 |
|---|---|---|---|---|---|---|---|---|
| ASC no choice | −1.47 (−19.43) | | | | −1.93 (−21.12) | | | |
| USA | −0.40 (−3.66) | 1.75 (14.43) | 1.03 (5.56) | −0.20 (−1.01) | −33.58 (<0.01) | 2.96 (8.69) | −0.48 (−3.48) | −0.36 (−0.57) |
| USA (SD) | - | - | - | - | 31.07 (<0.01) | 1.95 (7.79) | 0.78 (4.03) | 6.79 (3.89) |
| European Union | −0.10 (−0.98) | 2.34 (19.12) | 1.00 (6.30) | 0.65 (3.21) | −24.42 (<0.01) | 3.90 (10.48) | −0.01 (−0.09) | 5.26 (2.73) |
| European Union (SD) | - | - | - | - | 8.20 (<0.01) | 2.07 (8.39) | 1.05 (6.11) | 7.20 (4.79) |
| Hungary | 0.22 (2.14) | 2.12 (17.40) | 1.67 (9.80) | −0.94 (−4.14) | −15.70 (<0.01) | 3.60 (10.11) | 0.26 (2.05) | 1.09 (1.10) |
| Hungary (SD) | - | - | - | - | 31.02 (<0.01) | 3.12 (9.40) | 0.36 (1.27) | 14.84 (3.77) |
| Russia | 0.04 (0.38) | 0.65 (4.63) | 1.81 (11.19) | −1.41 (−5.36) | 12.81 (<0.01) | 1.75 (6.41) | 0.05 (0.43) | −4.19 (−3.44) |
| Russia (SD) | - | - | - | - | 0.87 (<0.01) | 2.89 (8.71) | 0.87 (4.96) | 7.75 (3.90) |
| 60–70 | −1.13 (−12.35) | −1.00 (−13.09) | −0.12 (−1.17) | −2.14 (−10.18) | −6.06 (<0.01) | −0.95 (−6.06) | −1.51 (−11.19) | −12.06 (−4.21) |
| 60–70 (SD) | - | - | - | - | 4.71 (<0.01) | 0.97 (4.74) | 1.12 (9.10) | 6.54 (3.94) |
| 71–90 | −0.65 (−8.19) | −0.37 (−5.72) | 0.28 (2.83) | −1.56 (−8.40) | −32.03 (<0.01) | 0.02 (0.20) | −0.71 (−7.59) | −7.03 (−4.47) |
| 71–90 (SD) | - | - | - | - | 49.63 (<0.01) | 0.17 (0.34) | 0.69 (5.66) | 1.25 (2.22) |
| As described in the package leaflet | 2.02 (17.99) | 0.14 (2.26) | 0.32 (4.06) | −0.34 (−2.31) | −12.36 (<0.01) | 0.44 (4.45) | 0.93 (8.17) | 22.38 (3.87) |
| As described in the package leaflet (SD) | - | - | - | - | 26.34 (<0.01) | 0.08 (0.27) | 0.92 (7.03) | 27.28 (3.87) |
| 6 months | −1.36 (−11.71) | −0.85 (−10.12) | −3.57 (−17.30) | −3.08 (−13.29) | 23.58 (<0.01) | −4.47 (−13.26) | −1.51 (−9.25) | −14.30 (−4.43) |
| 6 months (SD) | - | - | - | - | 19.65 (<0.01) | 1.79 (6.83) | 0.96 (6.20) | 8.65 (3.68) |
| 12 months | −0.60 (−7.43) | −0.33 (−4.47) | −2.08 (−16.74) | −2.26 (−12.57) | 29.47 (<0.01) | −3.17 (−11.91) | −0.67 (−5.95) | −9.07 (−4.09) |
| 12 months (SD) | - | - | - | - | 13.29 (<0.01) | 1.84 (7.33) | 0.53 (3.71) | 2.41 (3.18) |
| Class membership probability | 0.31 | 0.33 | 0.26 | 0.10 | <0.01 | 0.43 | 0.36 | 0.20 |
| Constant in class allocation equation | −0.06 (−0.51) | - | −0.23 (−2.14) | −1.18 (−9.11) | −11.71 (<0.01) | - | −0.75 (−6.47) | −0.19 (−1.44) |
| Ratio of significant parameters at 5% level (%) | 77.78 | 100.00 | 88.89 | 88.89 | 0.00 | 83.33 | 83.33 | 88.89 |
| Log-likelihood (final) | −8252.72 | | | | −7799.10 | | | |
| BIC | 16,865.37 | | | | 16,282.05 | | | |

Note: ASC: alternative specific constant; t-ratios are shown in parentheses after the parameter estimates; SD: standard deviation; BIC: Bayesian information criterion; Log-likelihood (0): −11,212.35. Values that apply to the whole model (ASC no choice, log-likelihood, BIC) are listed in the first class column of that model.
Table 8. Results of the MRS calculations (with non-linear effect in the denominator). MRS calculations for the fixed-parameter LC specifications (using posterior probabilities). Each cell shows the mean, followed in parentheses by Q1, the median, and Q3.

| Attribute | Two-class | Three-class | Four-class |
|---|---|---|---|
| USA | −0.26 (−2.10; 0.72; 0.83) | 2.28 (−0.00; 1.04; 5.15) | 4.85 (0.59; 3.23; 11.07) |
| European Union | 1.14 (0.85; 1.29; 1.30) | 2.39 (0.39; 1.02; 5.53) | 5.96 (0.07; 3.14; 14.69) |
| Hungary | 1.01 (0.44; 1.32; 1.35) | 2.70 (0.84; 2.82; 4.66) | 6.52 (1.77; 5.25; 13.42) |
| Russia | 0.41 (0.13; 0.56; 0.58) | 2.35 (0.10; 2.34; 4.46) | 3.39 (0.93; 4.38; 4.65) |
| 60–70 | −2.69 (−5.61; −1.13; −0.96) | −0.49 (−1.91; −0.86; −0.75) | −1.93 (−6.23; −0.60; −0.40) |
| 71–90 | −1.71 (−3.94; −0.52; −0.39) | 0.19 (−0.37; −0.35; −0.35) | −0.24 (−2.18; −0.32; 0.86) |
| 6 months | −8.83 (−21.54; −2.05; −1.30) | −2.47 (−7.29; −1.45; −0.76) | −4.16 (−7.19; −5.74; −0.74) |
| 12 months | −5.40 (−13.60; −1.01; −0.53) | −1.38 (−4.59; −0.75; −0.30) | −1.89 (−3.86; −2.28; −0.34) |
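To make the link between the class-specific estimates and these MRS distributions more concrete, the sketch below computes class-level MRS values as coefficient ratios for the two-class fixed-parameter LC model and combines them with class weights. It rests on an assumption that is not stated in this section: the denominator is taken to be the coefficient of "As described in the package leaflet", inferred only from the fact that this attribute has no row of its own in Table 8. The weights passed to `weighted_mrs` are either the sample-level class shares from Table 5 or a purely hypothetical respondent's posterior probabilities. All names are illustrative.

```python
# Two-class fixed-parameter LC estimates from Table 5: (class 1, class 2).
BETAS = {
    "USA": (-0.33, 0.71),
    "European Union": (0.10, 1.12),
    "Hungary": (0.04, 1.16),
    "Russia": (0.01, 0.50),
}

# ASSUMPTION: the denominator is the coefficient of
# "As described in the package leaflet" (class 1, class 2).
DENOMINATOR = (0.13, 0.86)

# Class membership probabilities of the two-class LC model (Table 5).
CLASS_SHARES = (0.32, 0.68)


def class_mrs(attribute):
    """MRS within each class: attribute coefficient over denominator coefficient."""
    return tuple(beta / denom for beta, denom in zip(BETAS[attribute], DENOMINATOR))


def weighted_mrs(attribute, weights):
    """Probability-weighted MRS; `weights` must sum to 1 (class shares or posteriors)."""
    return sum(w * m for w, m in zip(weights, class_mrs(attribute)))


for attribute in BETAS:
    per_class = [round(m, 2) for m in class_mrs(attribute)]
    sample_level = round(weighted_mrs(attribute, CLASS_SHARES), 2)
    print(f"{attribute:15s} per class: {per_class}  share-weighted: {sample_level}")

# Hypothetical respondent who belongs to class 2 with posterior probability 0.95.
print(round(weighted_mrs("USA", (0.05, 0.95)), 2))
```

Under this assumption, the share-weighted values land close to the two-class means in Table 8, and the class 2 ratios line up with the Q3 column, which is what one would expect if a large part of the sample is allocated to class 2 with high posterior probability. Small differences are expected because the inputs are rounded to two decimals and the article averages over respondent-level posterior probabilities rather than sample-level shares.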
Table 9. Results of the MRS calculations (with linear effect in the denominator). MRS calculations for the fixed-parameter LC specifications (using posterior probabilities). Each cell shows the mean, followed in parentheses by Q1, the median, and Q3.

| Attribute | Two-class | Three-class | Four-class |
|---|---|---|---|
| USA | 0.53 (−0.79; 1.24; 1.32) | 1.20 (1.14; 1.25; 1.27) | 1.70 (−0.37; 1.93; 3.74) |
| European Union | 2.13 (2.09; 2.16; 2.16) | 1.68 (1.05; 2.18; 2.27) | 2.08 (−0.41; 1.62; 5.00) |
| Hungary | 1.92 (1.15; 2.33; 2.38) | 2.15 (1.97; 2.29; 2.32) | 2.61 (0.72; 2.70; 4.64) |
| Russia | 1.01 (0.96; 0.97; 1.07) | 1.27 (0.62; 0.69; 1.63) | 1.68 (0.24; 1.81; 2.37) |
| As described in the package leaflet | 1.52 (1.11; 1.75; 1.77) | 1.57 (1.01; 1.90; 1.98) | 1.37 (0.48; 0.83; 2.72) |
| 6 months | −8.88 (−19.45; −3.10; −2.51) | −2.97 (−4.64; −2.35; −2.31) | −2.70 (−4.37; −2.34; −1.96) |
| 12 months | −5.16 (−11.87; −1.50; −1.13) | −1.43 (−2.46; −0.97; −0.94) | −1.34 (−2.43; −1.04; −0.75) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
