Bayesian Model Averaging and Regularized Regression as Methods for Data-Driven Model Exploration, with Practical Considerations
Abstract
1. Introduction
1.1. Background
1.2. Current Study
2. Materials and Methods
2.1. Test Datasets
2.2. Test Procedures
3. Results
3.1. First Dataset: Purpose and Moral Psychological Indicators
3.2. Second Dataset: Character Strengths and Moral Reasoning
3.3. Third Dataset: Trust and COVID-19 Vaccine Intent
3.4. Performance Trends across Different Sample Sizes
4. Discussion
5. Concluding Remarks
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Dataset | BMA vs. LASSO: 2log(BF) | BMA vs. LASSO: Cohen’s d | BMA vs. Stepwise: 2log(BF) | BMA vs. Stepwise: Cohen’s d | LASSO vs. Stepwise: 2log(BF) | LASSO vs. Stepwise: Cohen’s d
---|---|---|---|---|---|---
CPS (full) | 1032.83 | 1.35 | 566.63 | 0.88 | 229.50 | −0.52
GACS | 16.67 | 0.15 | 380.59 | −0.69 | 460.33 | −0.77
Trust | 29.76 | 0.19 | 94.39 | −0.33 | 130.69 | −0.38
CPS (n = 100) | 37.96 | −0.21 | 332.11 | −0.64 | 46.07 | −0.23
CPS (n = 200) | 473.10 | 0.79 | 38.38 | 0.21 | 245.62 | −0.54
CPS (n = 400) | 721.52 | 1.04 | 61.59 | 0.27 | 426.50 | −0.74
CPS (n = 800) | 713.33 | 1.03 | 239.52 | 0.53 | 278.34 | −0.58
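
As a reading aid for the 2log(BF) columns, the following is a minimal sketch that maps each value to the evidence categories commonly attributed to Kass and Raftery (1995): 0 to 2 is not worth more than a bare mention, 2 to 6 is positive, 6 to 10 is strong, and above 10 is very strong. It assumes that "log" in the column headers is the natural logarithm. The function name `evidence_label` and the dictionary layout are illustrative and are not taken from the article. The sketch classifies the strength of evidence only and says nothing about which method each comparison favors.

```python
# Illustrative sketch: classify 2*ln(BF) values by evidence strength.
# Assumption: "log" in the table header means the natural logarithm.
# Thresholds follow the categories commonly attributed to Kass and Raftery (1995).

def evidence_label(two_log_bf: float) -> str:
    """Return an evidence category for the magnitude of a 2*ln(BF) value."""
    magnitude = abs(two_log_bf)  # the sign only tells which hypothesis is favored
    if magnitude <= 2:
        return "not worth more than a bare mention"
    if magnitude <= 6:
        return "positive"
    if magnitude <= 10:
        return "strong"
    return "very strong"

# 2log(BF) values copied from the table, in the column order
# (BMA vs. LASSO, BMA vs. Stepwise, LASSO vs. Stepwise).
two_log_bf_table = {
    "CPS (full)": (1032.83, 566.63, 229.50),
    "GACS": (16.67, 380.59, 460.33),
    "Trust": (29.76, 94.39, 130.69),
    "CPS (n = 100)": (37.96, 332.11, 46.07),
    "CPS (n = 200)": (473.10, 38.38, 245.62),
    "CPS (n = 400)": (721.52, 61.59, 426.50),
    "CPS (n = 800)": (713.33, 239.52, 278.34),
}

labels = {
    dataset: tuple(evidence_label(value) for value in values)
    for dataset, values in two_log_bf_table.items()
}

for dataset, row in labels.items():
    print(dataset, row)
```

Under these assumptions every entry in the table exceeds 10, so each pairwise comparison falls in the highest category of this scale; the Cohen’s d columns are still needed to judge the size of each difference.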
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Han, H. Bayesian Model Averaging and Regularized Regression as Methods for Data-Driven Model Exploration, with Practical Considerations. Stats 2024, 7, 732-744. https://doi.org/10.3390/stats7030044