Article

A Modular and Explainable Machine Learning Pipeline for Student Dropout Prediction in Higher Education

by
Abdelkarim Bettahi 1,*, Fatima-Zahra Belouadha 1 and Hamid Harroud 2

1 AMIPS Research Team, E3S Research Center, Computer Science Department, Mohammadia School of Engineers, Mohammed V University in Rabat, Avenue Ibn Sina B.P. 765, Rabat 10090, Morocco
2 School of Science and Engineering, Al Akhawayn University in Ifrane, Ifrane 53000, Morocco
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(10), 662; https://doi.org/10.3390/a18100662
Submission received: 31 August 2025 / Revised: 4 October 2025 / Accepted: 16 October 2025 / Published: 18 October 2025

Abstract

Student dropout remains a persistent challenge in higher education, with substantial personal, institutional, and societal costs. We developed a modular dropout prediction pipeline that couples data preprocessing with multi-model benchmarking and a governance-ready explainability layer. Using 17,883 undergraduate records from a Moroccan higher education institution, we evaluated nine algorithms: logistic regression (LR), decision tree (DT), random forest (RF), k-nearest neighbors (k-NN), support vector machine (SVM), gradient boosting, Extreme Gradient Boosting (XGBoost), Naïve Bayes (NB), and multilayer perceptron (MLP). On our test set, XGBoost attained an area under the receiver operating characteristic curve (AUC–ROC) of 0.993, an F1-score of 0.911, and a recall of 0.944. Subgroup reporting supported governance and fairness: across credit-load bins, recall remained high and stable (e.g., <9 credits: precision 0.85, recall 0.932; 9–12 credits: 0.886/0.969; >12 credits: 0.915/0.936), with full TP/FP/FN/TN counts provided. A Shapley additive explanations (SHAP)-based layer identified risk and protective factors (e.g., administrative deadlines, cumulative GPA, and passed-course counts), surfaced ambiguous and anomalous cases for human review, and offered case-level diagnostics. To assess generalization, we replicated our findings on a public dataset (UCI–Portugal; tables only): XGBoost remained the top-ranked model (F1-score 0.792, AUC–ROC 0.922). Overall, boosted ensembles combined with SHAP delivered high accuracy, transparent attribution, and governance-ready outputs, enabling responsible early-warning implementation for student retention.
Keywords: student dropout prediction; machine learning; ensemble learning; explainable AI; SHAP; higher education analytics; early-warning systems; educational data mining; student retention
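
The abstract describes a pipeline that benchmarks boosted ensembles, reports subgroup metrics, and attaches a SHAP explainability layer. As a rough illustration of that general pattern (not the authors' pipeline), the following Python sketch trains an XGBoost classifier on synthetic data with hypothetical feature names, computes AUC–ROC, F1, and recall, breaks recall down by credit-load bin, and derives SHAP attributions for global and case-level diagnostics.

```python
# Minimal illustrative sketch (not the authors' released code): train an XGBoost
# classifier on synthetic tabular data, report overall and per-credit-load-bin
# metrics, and attach a SHAP attribution layer. Feature names are hypothetical
# stand-ins for the factors named in the abstract.
import numpy as np
import pandas as pd
import shap
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "cumulative_gpa": rng.uniform(0.0, 4.0, n),
    "passed_courses": rng.integers(0, 40, n),
    "missed_admin_deadlines": rng.integers(0, 5, n),
    "credit_load": rng.integers(3, 18, n),  # credits enrolled this term
})
# Synthetic dropout label: low GPA drives risk, plus 10% label noise.
y = ((X["cumulative_gpa"] < 2.0) ^ (rng.random(n) < 0.10)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = XGBClassifier(
    n_estimators=300, max_depth=4, learning_rate=0.1, eval_metric="logloss"
)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)
print(f"AUC-ROC: {roc_auc_score(y_test, proba):.3f}")
print(f"F1:      {f1_score(y_test, pred):.3f}")
print(f"Recall:  {recall_score(y_test, pred):.3f}")

# Subgroup reporting across credit-load bins (<9, 9-12, >12), mirroring the
# kind of governance/fairness breakdown described in the abstract.
bins = pd.cut(X_test["credit_load"], bins=[0, 9, 13, 18],
              right=False, labels=["<9", "9-12", ">12"])
pred_s = pd.Series(pred, index=X_test.index)
for label, idx in X_test.groupby(bins, observed=True).groups.items():
    r = recall_score(y_test.loc[idx], pred_s.loc[idx], zero_division=0)
    print(f"credits {label}: recall {r:.3f} (n={len(idx)})")

# SHAP layer: global importance and a single case-level explanation.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)  # shape: (n_samples, n_features)
global_importance = np.abs(shap_values).mean(axis=0)
print(dict(zip(X.columns, global_importance.round(3))))
print("Case 0 attributions:", dict(zip(X.columns, shap_values[0].round(3))))
```

In a real deployment, the per-bin counts and case-level SHAP attributions would feed the human-review and governance reporting steps the abstract refers to; here they are simply printed for illustration.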
