Modelling Student Retention in Tutorial Classes with Uncertainty—A Bayesian Approach to Predicting Attendance-Based Retention
Abstract
1. Introduction
2. Literature Review
2.1. Types of Data Analytics in Higher Education
2.1.1. Academic Analytics
2.1.2. Educational Data Mining
2.1.3. Learning Analytics
2.2. Bayesian Modelling
2.2.1. Bayesian Models
- Probability distributions: Unknown quantities, known as parameters, are represented by probability distributions.
- Bayes' theorem: Bayes' theorem is the mechanism used to update the distributions over those parameters in light of the available data.
- Creating a model by combining and transforming random variables, based on assumptions about how the available data were generated.
- Using Bayes' theorem to condition the model on the available data. This process is called inference, and its result is the posterior distribution. Inference is expected to reduce uncertainty about the possible parameter values, but this is not guaranteed.
- Critiquing the model by evaluating whether it is consistent with criteria such as the available data and domain knowledge. This step is necessary because practitioners or researchers may be uncertain about the model itself, which sometimes requires comparing it with other models.
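The three modelling steps listed above can be illustrated with a minimal sketch. It assumes a conjugate Beta-Binomial model for a single retention probability, which is a deliberately simple stand-in and not the model used in this study. The prior parameters, the student counts, and the function names are hypothetical.

```python
from math import sqrt


def beta_binomial_update(alpha, beta, successes, failures):
    """Condition a Beta(alpha, beta) prior on binomial data via Bayes' theorem.

    For this conjugate pair the posterior is again a Beta distribution,
    so inference reduces to adding the observed counts to the prior parameters.
    """
    return alpha + successes, beta + failures


def beta_mean_sd(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, sqrt(variance)


# Step 1 (model): an unknown retention probability with a flat Beta(1, 1) prior (assumed).
prior = (1.0, 1.0)

# Step 2 (inference): hypothetical data in which 8 of 20 students still attend.
posterior = beta_binomial_update(*prior, successes=8, failures=12)

# Step 3 (critique): summarise prior and posterior so they can be checked
# against the data and against domain knowledge.
prior_mean, prior_sd = beta_mean_sd(*prior)
post_mean, post_sd = beta_mean_sd(*posterior)
print(f"prior:     mean={prior_mean:.3f}, sd={prior_sd:.3f}")
print(f"posterior: mean={post_mean:.3f}, sd={post_sd:.3f}")
```

In this toy case the posterior standard deviation is smaller than the prior's, but as noted in the list, a reduction in uncertainty is expected rather than guaranteed.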
2.2.2. Bayesian Inference
2.3. The Use of Predictive Analytics in Modelling Student Retention
3. Materials and Methods
3.1. Data Collection and Understanding
3.2. Data Preprocessing and Transformation
3.3. Modelling
3.3.1. Random Forest Regressor
3.3.2. Bayesian Additive Regression Trees
3.3.3. Evaluation
4. Results
4.1. Descriptive Analysis
4.2. Model Evaluation
4.3. BART Model Highest Density Interval (HDI) Estimates
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Variable | Description |
---|---|
Tutorial date | Dates on which students attended tutorial classes. |
Encoded student number | Anonymized student number, which serves as a unique identifier for each student. |
Variable | Description |
---|---|
Cohort | The date students started attending tutorials. |
Period | The date students stopped attending tutorials. |
Cohort Age | The difference between period and cohort, in days. |
Students | The number of students who started attending tutorials in a particular cohort. |
Active Students | The number of students from that cohort attending tutorials in a particular period. |
Retention | The number of active students divided by the number of students. |
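
The derived variables in the table above can be computed from the raw attendance records (encoded student number and tutorial date). The following is a minimal sketch, not the authors' preprocessing code. It assumes that cohorts and periods are calendar months represented by their first day, which is consistent with the dates in the descriptive statistics but is not stated explicitly here. The function names and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def month_start(day):
    """Truncate a date to the first day of its month (assumed cohort granularity)."""
    return day.replace(day=1)


def cohort_retention(attendance):
    """Build one row per (cohort, period) from (student, tutorial_date) records."""
    records = list(attendance)

    # Cohort: the month in which each student first attended a tutorial.
    cohort_of = {}
    for student, day in records:
        month = month_start(day)
        if student not in cohort_of or month < cohort_of[student]:
            cohort_of[student] = month

    # Students: distinct students who started in each cohort.
    cohort_members = defaultdict(set)
    for student, cohort in cohort_of.items():
        cohort_members[cohort].add(student)

    # Active students: distinct cohort members seen in each period.
    active = defaultdict(set)
    for student, day in records:
        active[(cohort_of[student], month_start(day))].add(student)

    rows = []
    for (cohort, period), members in sorted(active.items()):
        n_students = len(cohort_members[cohort])
        rows.append({
            "cohort": cohort,
            "period": period,
            "cohort_age": (period - cohort).days,  # difference in days
            "students": n_students,
            "active_students": len(members),
            "retention": len(members) / n_students,
        })
    return rows


# Hypothetical attendance records: (encoded student number, tutorial date).
records = [
    ("s1", date(2022, 1, 10)), ("s2", date(2022, 1, 12)),
    ("s3", date(2022, 1, 20)), ("s4", date(2022, 1, 25)),
    ("s1", date(2022, 2, 7)), ("s2", date(2022, 2, 9)),
    ("s5", date(2022, 2, 14)),
    ("s1", date(2022, 3, 3)), ("s5", date(2022, 3, 1)),
]

rows = cohort_retention(records)
for row in rows:
    print(row["cohort"], row["period"], row["cohort_age"], f"{row['retention']:.2%}")
```

This sketch also emits the row where period equals cohort (cohort age 0, retention 100%); whether such rows belong in the modelling data is a separate preprocessing decision.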
Variable | Mean | Median | Standard Deviation | Max | Min |
---|---|---|---|---|---|
Cohort | - | - | - | 1 October 2022 | 1 January 2022 |
Period | - | - | - | 1 December 2022 | 1 March 2022 |
Cohort Age | 141.6226 | 122 | 85.6869 | 344 | 28 |
Students | 183.6038 | 80 | 155.0793 | 421 | 10 |
Active Students | 50.8679 | 32 | 65.1652 | 263 | 1 |
Retention | 33.34% | 24.23% | 28.54% | 94.92% | 0.45% |
Model | R² Score | MAE | RMSE | MedAE | Max Error | Min Error |
---|---|---|---|---|---|---|
BART | 0.9414 | 4.75% | 6.85% | 3% | 19% | 0% |
RFR | 0.9150 | 6.66% | 8.25% | 6% | 20% | 0% |
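
For readers who want to reproduce the kind of comparison shown in the table above, the sketch below computes the same metrics from true and predicted retention values. It assumes the standard definitions of each metric and takes Min Error to mean the smallest absolute error, which is an interpretation. The function name and the sample values are hypothetical and unrelated to the study's results.

```python
from math import sqrt
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Compute common regression error metrics from paired observations."""
    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)              # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": mean(abs_errors),
        "rmse": sqrt(ss_res / len(abs_errors)),
        "medae": median(abs_errors),
        "max_error": max(abs_errors),
        "min_error": min(abs_errors),
    }


# Hypothetical retention values expressed as fractions.
y_true = [0.90, 0.60, 0.40, 0.20]
y_pred = [0.85, 0.65, 0.30, 0.20]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because retention is a proportion, the error metrics other than R² can be read directly as percentage points when multiplied by 100.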
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Nimy, E.; Mosia, M. Modelling Student Retention in Tutorial Classes with Uncertainty—A Bayesian Approach to Predicting Attendance-Based Retention. Educ. Sci. 2024, 14, 830. https://doi.org/10.3390/educsci14080830