This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
A Modular and Explainable Machine Learning Pipeline for Student Dropout Prediction in Higher Education
by Abdelkarim Bettahi 1,*, Fatima-Zahra Belouadha 1 and Hamid Harroud 2
1 AMIPS Research Team, E3S Research Center, Computer Science Department, Mohammadia School of Engineers, Mohammed V University in Rabat, Avenue Ibn Sina B.P. 765, Rabat 10090, Morocco
2 School of Science and Engineering, Al Akhawayn University in Ifrane, Ifrane 53000, Morocco
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(10), 662; https://doi.org/10.3390/a18100662
Submission received: 31 August 2025 / Revised: 4 October 2025 / Accepted: 16 October 2025 / Published: 18 October 2025
Abstract
Student dropout remains a persistent challenge in higher education, with substantial personal, institutional, and societal costs. We developed a modular dropout prediction pipeline that couples data preprocessing with multi-model benchmarking and a governance-ready explainability layer. Using 17,883 undergraduate records from a Moroccan higher education institution, we evaluated nine algorithms (logistic regression (LR), decision tree (DT), random forest (RF), k-nearest neighbors (k-NN), support vector machine (SVM), gradient boosting, Extreme Gradient Boosting (XGBoost), Naïve Bayes (NB), and multilayer perceptron (MLP)). On our test set, XGBoost attained an area under the receiver operating characteristic curve (AUC–ROC) of 0.993, F1-score of 0.911, and recall of 0.944. Subgroup reporting supported governance and fairness: across credit–load bins, recall remained high and stable (e.g., <9 credits: precision 0.85, recall 0.932; 9–12: 0.886/0.969; >12: 0.915/0.936), with full TP/FP/FN/TN provided. A Shapley additive explanations (SHAP)-based layer identified risk and protective factors (e.g., administrative deadlines, cumulative GPA, and passed-course counts), surfaced ambiguous and anomalous cases for human review, and offered case-level diagnostics. To assess generalization, we replicated our findings on a public dataset (UCI–Portugal; tables only): XGBoost remained the top-ranked (F1-score 0.792, AUC–ROC 0.922). Overall, boosted ensembles combined with SHAP delivered high accuracy, transparent attribution, and governance-ready outputs, enabling responsible early-warning implementation for student retention.
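For readers who want a concrete picture of the workflow the abstract describes, the Python sketch below trains one of the benchmarked model families (XGBoost, the top performer), reports the same test-set metrics (AUC–ROC, F1-score, recall), attaches a SHAP attribution layer, and prints recall per credit-load bin. It is an illustrative sketch only, not the authors' pipeline: the file name student_records.csv, the column names "dropout" and "credit_load", and the hyperparameters are hypothetical stand-ins rather than the tuned configuration from the paper.

```python
# Illustrative sketch only (not the authors' code): file and column names and
# hyperparameters below are hypothetical stand-ins.
import numpy as np
import pandas as pd
import shap
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score, recall_score
from xgboost import XGBClassifier

# Assumed input: one row per student record, numeric features, binary "dropout" label.
df = pd.read_csv("student_records.csv")
X = df.drop(columns=["dropout"])
y = df["dropout"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# One of the nine benchmarked model families (XGBoost).
model = XGBClassifier(n_estimators=300, learning_rate=0.1, eval_metric="logloss")
model.fit(X_train, y_train)

# Test-set metrics of the kind reported in the abstract: AUC-ROC, F1-score, recall.
proba = model.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)
print("AUC-ROC:", roc_auc_score(y_test, proba))
print("F1:     ", f1_score(y_test, pred))
print("Recall: ", recall_score(y_test, pred))

# SHAP layer: rank features by mean |SHAP| to surface risk and protective factors.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
global_importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X_test.columns)
print(global_importance.sort_values(ascending=False).head(10))

# Subgroup reporting, e.g., recall per credit-load bin ("credit_load" is hypothetical).
bins = pd.cut(X_test["credit_load"], bins=[0, 9, 12, np.inf], labels=["<9", "9-12", ">12"])
for label in bins.cat.categories:
    mask = (bins == label).to_numpy()
    print(label, "recall:", recall_score(y_test.to_numpy()[mask], pred[mask]))
```

In the same spirit, the row of shap_values for an individual student would serve as the kind of case-level diagnostic the abstract mentions, with large positive or negative attributions flagging candidates for human review.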
Share and Cite
MDPI and ACS Style
Bettahi, A.; Belouadha, F.-Z.; Harroud, H. A Modular and Explainable Machine Learning Pipeline for Student Dropout Prediction in Higher Education. Algorithms 2025, 18, 662. https://doi.org/10.3390/a18100662
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.