Predicting Course Engagement with Machine Learning Techniques †
Abstract
1. Introduction
2. Literature Review
Knowledge Gaps and Limitations in Existing Work
- Overfitting Risks: Ensemble methods like Gradient Boosting faced overfitting challenges in some studies [5].
References | Methodologies | Accuracy |
---|---|---|
[1] | Supervised algorithms using Moodle logs | N/A |
[2] | Decision Tree, SVM, ANN | ANN: 85% |
[3] | Random Forest, Clustering | 84.10% |
[5] | Gradient Boosting, Random Forest, Neural Network | Random Forest: 95% |
[8] | Random Forest with feature engineering | 94.40% |
[11] | CATBoost, XGBoost | CATBoost: 94.40% |
[12] | J48, JRIP, Gradient Boosting | J48: 88.52% |
3. Methodology
3.1. Dataset Preparation
3.2. Model Selection
- Decision Tree: Decision trees are chosen as the primary example due to their simplicity and interpretability. They provide a clear, rule-based basis for decision making, making it easy to understand the factors that influence engagement. While they are prone to overfitting, they serve as a useful baseline for comparison with more complex models.
- Random Forest: This ensemble method aggregates multiple decision trees, which improves the overall model by reducing the variance associated with any individual tree. Random Forests are well suited to large, high-dimensional datasets and are less likely to overfit than a single decision tree. They are a strong candidate for this study because of their ability to handle both categorical and numerical data.
- Gradient Boosting: This model was chosen for its strong ability to capture complex, non-linear relationships in the data. Gradient boosting works by sequentially adding decision trees, each one correcting the errors made by its predecessors. This iterative process improves the accuracy of the model, which is particularly useful for noisy datasets such as those used to predict student engagement.
- Naive Bayes: Naive Bayes is a probabilistic model that assumes independence between features, making it simple and efficient for tasks involving categorical data. Despite its simplicity, Naive Bayes performs well when the independence assumption approximately holds. It was chosen as a lightweight probabilistic baseline for engagement prediction.
- KNN (K-Nearest Neighbors): KNN classifies a sample according to the majority class among its nearest neighbors. Although KNN is intuitive and easy to implement, it is sensitive to feature scaling and dataset size, which can degrade its performance if not handled properly. It was included to examine how a simple non-parametric model performs in this context.
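The non-parametric behavior described for KNN can be sketched in a few lines of Python. The following is an illustrative from-scratch sketch on hypothetical toy data; the feature names and values are invented for the example and are not taken from the study's dataset:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among the k nearest training points."""
    # Pair each training point with its Euclidean distance to x, then sort.
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical engagement data: [logins_per_week, hours_online] -> 1 engaged, 0 not.
train_X = [[1, 0.5], [2, 1.0], [1, 1.5], [8, 6.0], [9, 5.5], [7, 7.0]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [8, 6.5]))  # near the engaged cluster -> 1
print(knn_predict(train_X, train_y, [2, 0.8]))  # near the disengaged cluster -> 0
```

Because the vote depends on raw distances, features on larger scales dominate, which is the sensitivity to scaling noted above; in practice features would be standardized first.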
3.3. Evaluation Metrics
- Accuracy: The ratio of correctly predicted instances to all instances. This measure gives a global evaluation of how well the model is performing.
- Precision: The ratio of true positive predictions to all positive predictions. Precision is particularly crucial when the cost of a false positive is high, such as predicting that a student is engaged when they are not.
- Recall: Number of true positive predictions out of all actual positives. Recall is desirable when the aim is to minimize false negatives, i.e., failing to find students at risk of disengagement.
- F1-Score: Harmonic mean of precision and recall, providing an equal weight to both precision and recall. This is helpful for imbalanced datasets, where class distribution may favor one class compared to another.
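The four metrics above can be computed directly from confusion-matrix counts. A minimal sketch using hypothetical labels (1 = engaged, 0 = disengaged; the data is invented for illustration):

```python
# Hypothetical ground truth and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# Confusion-matrix counts.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # 0.8, 0.833..., 0.833..., 0.833...
```

Note that accuracy (0.8) differs from precision and recall here even on mildly imbalanced data, which is why the F1-score is the more informative summary when class distributions are skewed.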
4. Results
4.1. Random Forest
4.2. Decision Tree
4.3. Gradient Boosting
4.4. Naïve Bayes
4.5. KNN (K-Nearest Neighbors)
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Torun, E.D. Online Distance Learning in Higher Education: E-Learning Readiness as a Predictor of Academic Achievement. Open Praxis 2020, 12, 191–208.
- Naik, V.; Kamat, V. Predicting Engagement Using Machine Learning Techniques. In Proceedings of the International Conference on Computers in Education (ICCE 2018), Manila, Philippines, 26–30 November 2018.
- Orji, F.; Vassileva, J. Using Machine Learning to Explore the Relation Between Student Engagement and Student Performance. In Proceedings of the 24th International Conference Information Visualization, Melbourne, Australia, 7–11 September 2020.
- Soffer, T.; Cohen, A. Students’ Engagement Characteristics Predict Success and Completion of Online Courses. J. Comput. Assist. Learn. 2019, 35, 378–389.
- Ayouni, S.; Hajjej, F.; Maddeh, M.; Al-Otaibi, S. A New ML-Based Approach to Enhance Student Engagement in the Online Environment. PLoS ONE 2021, 16, e0258788.
- Alshabandar, R.; Hussain, A.; Keight, R.; Khan, W. Students Performance Prediction in Online Courses Using Machine Learning Algorithms. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020.
- Ben Brahim, G. Predicting Student Performance from Online Engagement Activities Using Novel Statistical Features. Arab. J. Sci. Eng. 2022, 47, 10225–10243.
- Alruwais, N.; Zakariah, M. Student-Engagement Detection in Classroom Using Machine Learning Algorithm. Electronics 2023, 12, 731.
- Vezne, R.; Yildiz Durak, H.; Atman Uslu, N. Online Learning in Higher Education: Examining the Predictors of Students’ Online Engagement. Educ. Inf. Technol. 2023, 28, 1865–1889.
- Toti, D.; Capuano, N.; Campos, F.; Dantas, M.; Neves, F.; Caballé, S. Detection of Student Engagement in E-Learning Systems Based on Semantic Analysis and Machine Learning. In Advances on P2P, Parallel, Grid, Cloud and Internet Computing; Springer: Cham, Switzerland, 2021.
- Diwaker, C.; Tomar, P.; Solanki, A.; Nayyar, A.; Jhanjhi, N.Z.; Abdullah, A.; Supramaniam, M. A New Model for Predicting Component-Based Software Reliability Using Soft Computing. IEEE Access 2019, 7, 147191–147203.
- Kok, S.H.; Abdullah, A.; Jhanjhi, N.Z.; Supramaniam, M. A Review of Intrusion Detection System Using Machine Learning Approach. Int. J. Eng. Res. Technol. 2019, 12, 8–15.
Model | Accuracy | Strengths |
---|---|---|
Decision Tree | 96% | Simple, interpretable, handles categorical/numerical data |
Random Forest | 96% | Reduces overfitting, handles high-dimensional datasets |
Gradient Boosting | 95% | Excels in imbalanced data and nuanced patterns |
Naïve Bayes | 82% | Efficient, works well for small datasets |
KNN | 71% | Simple and intuitive |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ali, F.Z.; Ayazuddin, R.; Sanjaya, I. Predicting Course Engagement with Machine Learning Techniques. Eng. Proc. 2025, 107, 46. https://doi.org/10.3390/engproc2025107046