# Bayesian Latent Class Analysis: Sample Size, Model Size, and Classification Precision

## Abstract


## 1. Introduction

- (1) Medicine and Healthcare: Bayesian methods are employed in clinical trials, diagnostic tests, epidemiology, and personalized medicine to quantify uncertainty and make informed decisions.
- (2) Finance and Economics: Bayesian analysis is used in risk assessment, portfolio optimization, forecasting, and economic modeling to account for uncertainty and update beliefs.
- (3) Engineering: Bayesian techniques are applied in reliability analysis, optimization, and decision-making under uncertainty in various engineering domains.
- (4) Machine Learning and Artificial Intelligence: Bayesian inference is used in probabilistic modeling, Bayesian networks, and Bayesian optimization to reason under uncertainty and provide robust predictions.
- (5) Environmental Science: Bayesian analysis is utilized in environmental modeling, ecological studies, and climate change research to integrate diverse data sources and quantify uncertainty in predictions [5].

#### 1.1. Bayesian Latent Variable Modeling

#### 1.2. Bayesian Factor Analysis

#### 1.3. Bayesian Latent Class Analysis

## 2. Theoretical Framework

#### 2.1. The LCA Model

#### 2.2. Estimation Procedures

#### 2.3. The Bayesian Approach

#### 2.4. Bayesian LCA

The latent class proportion parameter ($\pi_C$) has a Dirichlet distribution, which can be notated as:

$$\pi_C \sim D[d_1, \ldots, d_C],$$

where the hyperparameters $d_1, \ldots, d_C$ determine the uniformity of the $D$ distribution. When $d_1, \ldots, d_C$ have relatively equal values, the identified latent classes are similar in size and have similar probabilities [43].

The model also includes the response probability parameter ($\rho_{v, r_v|C}$). The Bayesian estimation calculates this parameter in two ways. First, the response probability can be expressed as a probability with a Dirichlet distribution:

$$\rho_{v, r_v|C} \sim D[d_1, \ldots, d_C],$$

with hyperparameters $d_1, \ldots, d_C$. Second, the parameter can be given a normal distribution:

$$\rho_{v, r_v|C} \sim N[\mu_\rho, \sigma^2_\rho],$$

with mean $\mu_\rho$ and variance $\sigma^2_\rho$ parameters. Depending on the software used for estimation, the variance parameter may be referred to as precision [43].
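To illustrate how the Dirichlet hyperparameters shape the class proportions, the following minimal Python sketch draws class-proportion vectors under two hypothetical priors and compares their average values. The hyperparameter values, function names, and number of draws are assumptions chosen for illustration; they are not taken from the study.

```python
import random


def sample_dirichlet(hyperparameters, rng):
    """Draw one vector of class proportions from a Dirichlet distribution.

    Uses the standard construction: independent Gamma(d_c, 1) draws,
    normalized so that the proportions sum to one.
    """
    draws = [rng.gammavariate(d, 1.0) for d in hyperparameters]
    total = sum(draws)
    return [g / total for g in draws]


def mean_proportions(hyperparameters, n_draws=5000, seed=1):
    """Average the sampled class proportions over many prior draws."""
    rng = random.Random(seed)
    sums = [0.0] * len(hyperparameters)
    for _ in range(n_draws):
        for c, p in enumerate(sample_dirichlet(hyperparameters, rng)):
            sums[c] += p
    return [s / n_draws for s in sums]


# Hypothetical hyperparameters (assumptions, not values from the article).
equal_prior = mean_proportions([10.0, 10.0, 10.0])    # d_1 = d_2 = d_3
unequal_prior = mean_proportions([20.0, 5.0, 5.0])    # d_1 much larger

print("equal hyperparameters:  ", [round(p, 2) for p in equal_prior])
print("unequal hyperparameters:", [round(p, 2) for p in unequal_prior])
```

With equal hyperparameters, the average class proportions should be close to one third each, whereas the unequal prior favors the first class. For the normal specification, software that uses precision rather than variance works with the reciprocal of the variance, so a variance of 5 corresponds to a precision of 0.2.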

#### 2.5. Label Switching

#### 2.6. Classification Precision

## 3. Objectives

## 4. Simulation Study

- Specify the predictive model including the independent and dependent variables.
- Specify the distribution of the independent variables (based on historical information and theory).
- Use multiple sets of randomly generated values following the specified distribution to calculate a representative sample of results [50].
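The three steps above can be sketched in Python for a small latent class model. This is a simplified illustration, not the procedure used in the study: the class proportions, item probabilities, number of replications, and all function names are hypothetical, and the posterior probabilities are computed from the known generating parameters rather than from Bayes or MLR estimates.

```python
import random

# Step 1: specify the model. A hypothetical 2-class LCA with four binary items.
# Step 2: specify the distributions. All values below are assumptions.
CLASS_PROPORTIONS = [0.6, 0.4]
ITEM_PROBS = [                      # P(item = 1 | class)
    [0.90, 0.80, 0.85, 0.90],       # class 1
    [0.20, 0.10, 0.15, 0.20],       # class 2
]


def simulate_respondent(rng):
    """Draw a latent class, then binary item responses given that class."""
    classes = range(len(CLASS_PROPORTIONS))
    c = rng.choices(classes, weights=CLASS_PROPORTIONS)[0]
    return [1 if rng.random() < p else 0 for p in ITEM_PROBS[c]]


def posterior_class_probs(responses):
    """Class membership probabilities via Bayes' theorem (known parameters)."""
    joint = []
    for proportion, probs in zip(CLASS_PROPORTIONS, ITEM_PROBS):
        likelihood = 1.0
        for y, p in zip(responses, probs):
            likelihood *= p if y == 1 else 1.0 - p
        joint.append(proportion * likelihood)
    total = sum(joint)
    return [j / total for j in joint]


def run_replication(sample_size, rng):
    """One data set: mean posterior probability of the most likely class."""
    total = 0.0
    for _ in range(sample_size):
        total += max(posterior_class_probs(simulate_respondent(rng)))
    return total / sample_size


def monte_carlo(sample_size, n_replications=100, seed=2023):
    """Step 3: average the result over many randomly generated data sets."""
    rng = random.Random(seed)
    results = [run_replication(sample_size, rng) for _ in range(n_replications)]
    return sum(results) / n_replications


for n in (75, 250, 1000):
    print(n, round(monte_carlo(n), 3))
```

Fixing the seed makes the replications reproducible. In a full study, each generated data set would be passed to the estimator under comparison instead of being scored with the true parameters.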

## 5. Results

## 6. Discussion and Conclusions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2014.
2. Kruschke, J.K. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan; Academic Press: Cambridge, MA, USA, 2014.
3. McElreath, R. Statistical Rethinking: A Bayesian Course with Examples in R and Stan; Chapman and Hall/CRC: Boca Raton, FL, USA, 2016.
4. Carlin, B.P.; Louis, T.A. Bayesian Methods for Data Analysis, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2009.
5. Barber, D. Bayesian Reasoning and Machine Learning; Cambridge University Press: Cambridge, UK, 2012.
6. Kaplan, D. Bayesian Statistics for the Social Sciences; Guilford Publications: New York, NY, USA, 2014.
7. Gill, J. Bayesian Methods: A Social and Behavioral Sciences Approach; Chapman and Hall/CRC: Boca Raton, FL, USA, 2014.
8. Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature **2015**, 521, 452–459.
9. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006.
10. Lee, M.D.; Wagenmakers, E.J. Bayesian Cognitive Modeling: A Practical Course; Cambridge University Press: Cambridge, UK, 2014.
11. van de Schoot, R.; Kaplan, D.; Denissen, J.; Asendorpf, J.B.; Neyer, F.J.; van Aken, M.A. A gentle introduction to Bayesian analysis: Applications to developmental research. Child Dev. **2014**, 85, 842–860.
12. Muthén, B.; Asparouhov, T. Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychol. Methods **2012**, 17, 313–335.
13. Wang, W.; Hancock, G.R. Bayesian factor analysis for structural equation modeling. J. Educ. Behav. Stat. **2010**, 35, 22–50.
14. DeCarlo, L.T. On the analysis of factorial surveys by Bayesian confirmatory factor analysis. Sociol. Methods Res. **2012**, 41, 313–337.
15. Asparouhov, T.; Muthén, B. Bayesian Analysis of Latent Variable Models Using Mplus; Technical Report; Version 4; Muthén & Muthén: Los Angeles, CA, USA, 2010; Available online: http://www.statmodel.com/download/BayesAdvantages18.pdf (accessed on 5 May 2023).
16. Asparouhov, T.; Muthén, B. Bayesian Analysis Using Mplus: Technical Implementation (Technical Appendix); Muthén & Muthén: Los Angeles, CA, USA, 2010; Available online: http://www.statmodel.com/download/BayesAdvantages18.pdf (accessed on 5 May 2023).
17. Lee, S.Y. A Bayesian approach to confirmatory factor analysis. Psychometrika **1981**, 46, 153–160.
18. Martin, J.K.; McDonald, R.P. Bayes estimates in restricted factor analysis: A treatment of Heywood cases. Psychometrika **1975**, 40, 505–517.
19. Mayekawa, S. Bayesian Factor Analysis (ONR Technical Report No. 85-3); CadaResearch Group, University of Iowa: Iowa City, IA, USA, 1985.
20. Albert, J.H.; Chib, S. Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. **1993**, 88, 669–679.
21. Vermunt, J.K.; Magidson, J. Latent class cluster analysis. In The Handbook of Advanced Multilevel Analysis; Hox, J.J., Roberts, J.K., Eds.; Routledge: Oxfordshire, UK, 2016; pp. 141–160.
22. Friel, N.; Wyse, J. Estimating the number of classes in a finite mixture model. J. R. Stat. Soc. Ser. B **2012**, 74, 411–438.
23. Celeux, G.; Soromenho, G. An entropy criterion for assessing the number of clusters in a mixture model. J. Classif. **1996**, 13, 195–212.
24. Hagenaars, J.A.; McCutcheon, A.L. (Eds.) Applied Latent Class Analysis; Cambridge University Press: Cambridge, UK, 2002.
25. Banfield, J.D.; Raftery, A.E. Model-based Gaussian and non-Gaussian clustering. Biometrics **1993**, 49, 803–821.
26. Everitt, B.S. Cluster Analysis; Edward Arnold: London, UK, 1993.
27. McLachlan, G.; Peel, D. Finite Mixture Models; John Wiley & Sons: New York, NY, USA, 2000.
28. Everitt, B.S.; Hand, D.J. Finite mixture models. In Handbook of Markov Chain Monte Carlo; Gelman, A., Rubin, D.B., Eds.; CRC Press: Boca Raton, FL, USA, 2011; pp. 79–110.
29. Vermunt, J.K.; Magidson, J. Latent class cluster analysis. In Applied Latent Class Analysis; Hagenaars, J.A., McCutcheon, A.L., Eds.; Cambridge University Press: Cambridge, UK, 2002; pp. 89–106.
30. Nylund-Gibson, K.; Choi, A.Y. Ten frequently asked questions about latent class analysis. Transl. Issues Psychol. Sci. **2018**, 4, 440–461.
31. Muthén, B. Beyond SEM: General latent variable modeling. Behaviormetrika **2002**, 29, 81–117.
32. Muthén, B. Bayesian analysis in Mplus: A brief introduction. Mathematics, 2010; Unpublished manuscript. Available online: www.statmodel.com/download/IntroBayesVersion,203 (accessed on 5 May 2023).
33. Geiser, C. Data Analysis with Mplus (Methodology in the Social Sciences); Guilford Press: New York, NY, USA, 2013.
34. Collins, L.M.; Lanza, S.T. Latent Class and Latent Transition Analysis for the Social, Behavioral, and Health Sciences; Wiley: New York, NY, USA, 2010.
35. DiStefano, C. Cluster analysis and latent class clustering techniques. In Handbook of Developmental Research Methods; The Guilford Press: New York, NY, USA, 2012; pp. 645–666.
36. Finney, S.J.; DiStefano, C. Non-normal and categorical data in structural equation modeling. Struct. Equ. Model. Second Course **2006**, 10, 269–314.
37. Muthén, L.K.; Muthén, B.O. Mplus User’s Guide; Muthén and Muthén: Los Angeles, CA, USA, 2017.
38. Goodman, L.A. The analysis of systems of qualitative variables when some of the variables are unobservable. Part I: A modified latent structure approach. Am. J. Sociol. **1974**, 79, 1179–1259.
39. Elliott, M.R.; Gallo, J.J.; Ten Have, T.R.; Bogner, H.R.; Katz, I.R. Using a Bayesian latent growth curve model to identify trajectories of positive affect and negative events following myocardial infarction. Biostatistics **2005**, 6, 119–143.
40. Asparouhov, T.; Muthén, B. Using Bayesian priors for more flexible latent class analysis. In Proceedings of the 2011 Joint Statistical Meeting, Miami Beach, FL, USA, 30 July–4 August 2011; American Statistical Association: Alexandria, VA, USA, 2011.
41. Jackman, S. Bayesian Analysis for the Social Sciences; John Wiley & Sons: New York, NY, USA, 2009; Volume 846.
42. Silvey, S.D. Statistical Inference; CRC Press: Boca Raton, FL, USA, 1975; Volume 7.
43. Depaoli, S. The Latent Class Model. In Bayesian Structural Equation Modeling; The Guilford Press: New York, NY, USA, 2021.
44. Redner, R.A.; Walker, H.F. Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. **1984**, 26, 195–239.
45. Stephens, M. Dealing with label switching in mixture models. J. R. Stat. Soc. **2000**, 62, 795–809.
46. Jasra, A.; Holmes, C.C.; Stephens, D.A. Markov Chain Monte Carlo Methods and the Label Switching Problem in Bayesian Mixture Modeling; Mathematical Statistics: Shaker Heights, OH, USA, 2005.
47. Farrar, D. Approaches to the Label-Switching Problem of Classification Based on Partition-Space Label Invariant Visualization (Technical Report); Virginia Polytechnic Institute and State University: Blacksburg, VA, USA, 2006.
48. Akaike, H. On Entropy Maximization Principle; Krishnaiah, P.R., Ed.; Applications of Statistics; North Holland Publishing Company: Amsterdam, The Netherlands, 1977; pp. 27–47.
49. Ramaswamy, V.; Desarbo, W.S.; Reibstein, D.J. An empirical pooling approach for estimating marketing mix elasticities with PIMS data. Mark. Sci. **1993**, 12, 103–124.
50. Kroese, D.P.; Brereton, T.; Taimre, T.; Botev, Z.I. Why the Monte Carlo method is so important today. WIREs Comput. Stat. **2014**, 6, 386–392.
51. Gagniuc, P.A. Markov Chains: From Theory to Implementation and Experimentation; John Wiley & Sons: Hoboken, NJ, USA, 2017; pp. 1–235. ISBN 978-1-119-38755-8.
52. Sawilowsky, S.; Fahoome, G.C. Statistics via Monte Carlo Simulation with Fortran; JMASM: Rochester Hills, MI, USA, 2003; ISBN 978-0-9740236-0-1.

**Figure 5.** Bayes and MLR average latent class probabilities for the most likely latent class membership in relation to sample size and model size.

| Variable Type | Computation Procedure |
|---|---|
| Continuous | Linear regression equations |
| Censored | Censored-inflated normal regression |
| Count | Poisson or zero-inflated Poisson regression equations |
| Ordered categorical | Logistic regression |
| Binary | Logistic regression |
| Nominal | Multinomial logistic regression |

**Table 2.** Average Latent Class Probabilities and Misclassification Probabilities for a Hypothetical 4 × 4 Latent Class Model.

| | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|
| Class 1 | **0.980** | 0.010 | 0.000 | 0.010 |
| Class 2 | 0.030 | **0.961** | 0.000 | 0.009 |
| Class 3 | 0.020 | 0.040 | **0.890** | 0.050 |
| Class 4 | 0.020 | 0.049 | 0.010 | **0.921** |

**Note:** The diagonal elements are the average latent class probabilities and are marked in bold. The off-diagonal elements represent the misclassification probabilities.
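A table of this kind can be assembled from individual posterior class probabilities. The Python sketch below assumes each respondent is assigned to the class with the highest posterior probability and that each row averages the posterior probabilities of the respondents assigned to that class. The function name and the four example respondents are hypothetical and serve only to show the arithmetic.

```python
def classification_table(posteriors):
    """Average posterior probabilities by most likely class.

    posteriors: one list of class membership probabilities per respondent.
    Returns one row per assigned class; a row is None if no respondent
    was assigned to that class.
    """
    n_classes = len(posteriors[0])
    sums = [[0.0] * n_classes for _ in range(n_classes)]
    counts = [0] * n_classes
    for probs in posteriors:
        # Modal assignment: the class with the largest posterior probability.
        assigned = max(range(n_classes), key=lambda c: probs[c])
        counts[assigned] += 1
        for c, p in enumerate(probs):
            sums[assigned][c] += p
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else None
        for k in range(n_classes)
    ]


# Hypothetical posterior probabilities for four respondents, two classes.
posteriors = [
    [0.97, 0.03],
    [0.99, 0.01],
    [0.10, 0.90],
    [0.02, 0.98],
]

table = classification_table(posteriors)
for k, row in enumerate(table, start=1):
    print(f"Class {k}:", [round(p, 3) for p in row])
```

In this small example the diagonal entries are about 0.98 and 0.94, and the off-diagonal entries (about 0.02 and 0.06) play the role of the misclassification probabilities.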

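Average latent class probabilities for most likely latent class membership, by LCA model, estimator, and sample size:

| LCA Model | Estimator | Sample Size | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|---|---|
| 2 Class Model | Bayes | 1000 | 0.999 | 0.999 | | |
| 2 Class Model | Bayes | 750 | 0.999 | 0.999 | | |
| 2 Class Model | Bayes | 500 | 0.999 | 0.999 | | |
| 2 Class Model | Bayes | 250 | 1.000 | 0.999 | | |
| 2 Class Model | Bayes | 100 | 0.999 | 0.999 | | |
| 2 Class Model | Bayes | 75 | 1.000 | 1.000 | | |
| 2 Class Model | MLR | 1000 | 0.974 | 0.982 | | |
| 2 Class Model | MLR | 750 | 0.974 | 0.981 | | |
| 2 Class Model | MLR | 500 | 0.975 | 0.978 | | |
| 2 Class Model | MLR | 250 | 0.993 | 0.987 | | |
| 2 Class Model | MLR | 100 | 0.984 | 0.967 | | |
| 2 Class Model | MLR | 75 | 0.987 | 0.968 | | |
| 3 Class Model | Bayes | 1000 | 0.941 | 0.938 | 0.987 | |
| 3 Class Model | Bayes | 750 | 0.939 | 0.939 | 0.989 | |
| 3 Class Model | Bayes | 500 | 0.940 | 0.939 | 0.993 | |
| 3 Class Model | Bayes | 250 | 0.935 | 0.943 | 0.995 | |
| 3 Class Model | Bayes | 100 | 0.916 | 0.948 | 0.993 | |
| 3 Class Model | Bayes | 75 | 0.910 | 0.948 | 0.993 | |
| 3 Class Model | MLR | 1000 | 0.867 | 0.848 | 0.670 | |
| 3 Class Model | MLR | 750 | 0.874 | 0.855 | 0.695 | |
| 3 Class Model | MLR | 500 | 0.882 | 0.868 | 0.735 | |
| 3 Class Model | MLR | 250 | 0.889 | 0.884 | 0.807 | |
| 3 Class Model | MLR | 100 | 0.915 | 0.914 | 0.872 | |
| 3 Class Model | MLR | 75 | 0.921 | 0.922 | 0.905 | |
| 4 Class Model | Bayes | 1000 | 0.548 | 0.874 | 0.768 | 0.742 |
| 4 Class Model | Bayes | 750 | 0.560 | 0.882 | 0.788 | 0.770 |
| 4 Class Model | Bayes | 500 | 0.535 | 0.889 | 0.801 | 0.741 |
| 4 Class Model | Bayes | 250 | 0.540 | 0.887 | 0.834 | 0.731 |
| 4 Class Model | Bayes | 100 | 0.528 | 0.913 | 0.756 | 0.780 |
| 4 Class Model | Bayes | 75 | 0.574 | 0.925 | 0.808 | 0.815 |
| 4 Class Model | MLR | 1000 | 0.821 | 0.756 | 0.599 | 0.539 |
| 4 Class Model | MLR | 750 | 0.832 | 0.770 | 0.621 | 0.570 |
| 4 Class Model | MLR | 500 | 0.845 | 0.793 | 0.664 | 0.616 |
| 4 Class Model | MLR | 250 | 0.866 | 0.823 | 0.752 | 0.707 |
| 4 Class Model | MLR | 100 | 0.891 | 0.881 | 0.855 | 0.835 |
| 4 Class Model | MLR | 75 | 0.911 | 0.901 | 0.887 | 0.868 |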


© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Mindrila, D.
Bayesian Latent Class Analysis: Sample Size, Model Size, and Classification Precision. *Mathematics* **2023**, *11*, 2753.
https://doi.org/10.3390/math11122753
