Unpacking Prediction: Contextualized and Interpretable Academic Risk Modeling with XAI for Small Cohorts
Abstract
1. Introduction
2. Related Work
2.1. Academic Risk Prediction
2.2. Reflection on the Existing Literature
2.3. Explainable Artificial Intelligence (XAI) and SHAP
3. Materials and Methods
3.1. Research Questions
- RQ1: At the overall level, which factors predict students’ academic risk in a specific context, and how do they influence the prediction?
- RQ2: At the individual level, which factors predict students’ academic risk in a specific context, and how do they influence the prediction?
3.2. Data and Features
3.2.1. Data Collection and Feature Summary
3.2.2. Data Characteristics and Associated Methodological Challenges
3.3. Explainable Prediction Model Based on SHAP
3.3.1. The Optimal ML Model
3.3.2. Configuration and Implementation Details of SHAP
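SHAP attributions build on Shapley values from cooperative game theory: for a feature i among n features, φ_i = Σ_{S ⊆ N∖{i}} |S|!(n − |S| − 1)!/n! · [v(S ∪ {i}) − v(S)], where v(S) is the model output when only the features in S take the student's values. As a minimal sketch of what that means for one student, the code below computes exact Shapley values by brute-force enumeration of coalitions, setting absent features to a single baseline vector. The linear model, its weights, the student profile, and the baseline are hypothetical (only the LpLrng and ExamCmpr baseline entries are the means reported in the feature table). This is not the authors' SHAP configuration; the SHAP library's explainers typically average over a background dataset and use faster, often model-specific, algorithms.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model output f at point x.

    Features outside a coalition are set to their baseline value. Every
    coalition is visited, so cost grows as 2**n: fine for a few features,
    infeasible for the full feature set.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size

    def value(coalition):
        # Coalition members take the student's values; all others stay at baseline.
        z = baseline.copy()
        for j in coalition:
            z[j] = x[j]
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                phi[i] += weight * (value(coalition + (i,)) - value(coalition))
    return phi


# Hypothetical linear risk score in log-odds (weights invented for illustration).
features = ["LpLrng", "ExamCmpr", "SlfStdy", "Truant"]
weights = np.array([-0.8, -0.02, -0.5, 0.9])
bias = 4.0


def risk_logit(z):
    return float(bias + weights @ z)


student = np.array([1.0, 200.0, 1.0, 4.0])       # hypothetical student profile
background = np.array([3.21, 221.16, 2.5, 1.5])  # reference point (partly hypothetical)

for name, contribution in zip(features, exact_shapley(risk_logit, student, background)):
    print(f"{name:>8}: {contribution:+.3f}")
```

For a linear model with a single baseline, each value reduces to w_i (x_i − b_i), and the values always sum to f(x) − f(b) (the efficiency property); both make convenient sanity checks for any Shapley implementation.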
4. Results
4.1. Factors Predicting Academic Risk at the Overall Level
4.1.1. Factors Predicting Academic Risk
4.1.2. How Specific Factors Influence the Prediction of Academic Risk
- Learning support from learning peers (LpLrng)
- Comprehensive subjects score of the college entrance exam (ExamCmpr)
- Self-study (SlfStdy)
- Truancy level (Truant)
- Math score (ExamMth)
4.2. Factors Predicting Academic Risk at the Individual Level
1. Student with high potential academic risk
2. Student with low potential academic risk
5. Discussion
5.1. The Influence of Learning Peers on Academic Risk
5.2. The Role of XAI in EDM
5.3. Exploring High-Dimensional and Small Datasets in Education
5.4. Considerations of Context and Generalizability
6. Conclusions and Future Plan
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest









| Feature Category | Feature | Abbreviation | Value |
|---|---|---|---|
| Demographics | Gender | Gndr | Male = 1, Female = 0 |
| | Age at entrance | Age | 16, 17, 18, 19, 20, 21, 22, … |
| | Guardian type | Gurdn | Parents = 1, Father = 2, Mother = 3, Other = 4 |
| | Urban or rural origin | UrbnRrl | Rural = 0, Urban = 1 |
| | Entrance type | EntrTp | Re-taker = 0, Freshman = 1 |
| College entrance exam scores | Chinese score | ExamCn | Mean (Std): 114.82 (7.51) |
| | Foreign language score | ExamFrn | Mean (Std): 110.76 (10.62) |
| | Math score | ExamMth | Mean (Std): 106.15 (9.67) |
| | Comprehensive subjects score | ExamCmpr | Mean (Std): 221.16 (19.86) |
| Learning activity | Self-study | SlfStdy | Never = 1, Seldom = 2, Sometimes = 3, Always = 4 |
| | Seating choice in a classroom | Seat | Back = 1, Middle = 2, Front = 3 |
| | Truancy level | Truant | Never = 1, Seldom = 2, Sometimes = 3, Always = 4 |
| | Academic awards | Awrds | No = 0, Yes = 1 |
| | Teacher–student relationship | TsRltn | Tense = 1, Neutral = 2, Harmonious = 3 |
| | Part-time job | PtJob | No = 0, Yes = 1 |
| | On-campus residence | Rsdnt | No = 0, Yes = 1 |
| Learning peers | Learning support from learning peers | LpLrng | From “Very low = 1” to “Very high = 5”; Mean (Std): 3.21 (1.03) |
| | Social and emotional support from learning peers | LpScl | From “Very low = 1” to “Very high = 5”; Mean (Std): 3.15 (0.96) |
| | Stability of learning peers | LpStbl | From “Very low = 1” to “Very high = 5”; Mean (Std): 2.79 (0.94) |
| | Number of learning peers | LpNmb | From “One = 1” to “Five or more = 5”; Mean (Std): 3.05 (1.13) |
| | Dorm learning climate | DmClmt | From “Very low = 1” to “Very high = 5”; Mean (Std): 3.17 (1.16) |
| Social life | Romantic relationship | LvRltn | No = 0, Yes = 1 |
| | Campus loan | CmpsLn | No = 0, Yes = 1 |
| | Smoking | Smk | No = 0, Yes = 1 |
| | Playing video games | Game | Never = 1, Seldom = 2, Sometimes = 3, Always = 4 |
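
The code books in the feature table can be expressed as a lookup structure. The sketch below is hypothetical: the field names follow the table's abbreviations, but the raw answer strings and the `encode` helper are illustrative assumptions, not the authors' preprocessing. Numeric items (Age, the four exam scores, and the five learning-peer scales) are passed through unchanged.

```python
# Code books transcribed from the feature table (categorical and ordinal items only).
FREQUENCY = {"Never": 1, "Seldom": 2, "Sometimes": 3, "Always": 4}
NO_YES = {"No": 0, "Yes": 1}

CODEBOOK = {
    "Gndr": {"Female": 0, "Male": 1},
    "Gurdn": {"Parents": 1, "Father": 2, "Mother": 3, "Other": 4},
    "UrbnRrl": {"Rural": 0, "Urban": 1},
    "EntrTp": {"Re-taker": 0, "Freshman": 1},
    "SlfStdy": FREQUENCY,
    "Seat": {"Back": 1, "Middle": 2, "Front": 3},
    "Truant": FREQUENCY,
    "Awrds": NO_YES,
    "TsRltn": {"Tense": 1, "Neutral": 2, "Harmonious": 3},
    "PtJob": NO_YES,
    "Rsdnt": NO_YES,
    "LvRltn": NO_YES,
    "CmpsLn": NO_YES,
    "Smk": NO_YES,
    "Game": FREQUENCY,
}


def encode(record):
    """Replace coded answers with their numeric values; pass numeric fields through.

    An answer missing from its code book raises KeyError instead of being
    silently mis-coded.
    """
    return {
        name: CODEBOOK[name][answer] if name in CODEBOOK else answer
        for name, answer in record.items()
    }


print(encode({"Gndr": "Male", "Truant": "Seldom", "Game": "Always", "ExamMth": 106}))
```

One design point: integer codes impose an order. That is natural for ordinal items such as SlfStdy or Seat, but Gurdn is nominal (Parents, Father, Mother, Other), so models that treat inputs as numeric magnitudes, such as logistic regression or SVM, would usually receive it one-hot encoded instead; tree ensembles are less sensitive to this choice.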

| Model | PR-AUC | Balanced Accuracy | Hyperparameters |
|---|---|---|---|
| Logistic Regression | 0.5398 ± 0.1330 | 0.7714 ± 0.0855 | {'C': 1, 'penalty': 'l2'} |
| SVM | 0.5430 ± 0.1378 | 0.7828 ± 0.0573 | {'C': 0.1} |
| Gradient Boosting Decision Tree (GBDT) | 0.6616 ± 0.1457 | 0.6975 ± 0.0867 | {'n_estimators': 200, 'max_depth': 3} |
| eXtreme Gradient Boosting (XGBoost) | 0.6266 ± 0.1546 | 0.7058 ± 0.0919 | {'n_estimators': 100, 'max_depth': 7} |
| Categorical Boosting (CatBoost) | 0.6931 ± 0.1378 | 0.7358 ± 0.0719 | {'iterations': 200, 'depth': 7} |
| Light Gradient Boosting Machine (LightGBM) | 0.6092 ± 0.1566 | 0.7050 ± 0.0918 | {'n_estimators': 200, 'max_depth': 7} |
| K-Nearest Neighbor (KNN) | 0.6776 ± 0.1178 | 0.6635 ± 0.0720 | {'n_neighbors': 11, 'weights': 'distance'} |
| Multinomial Naive Bayes (MNB) | 0.5023 ± 0.1295 | 0.5275 ± 0.0264 | {'alpha': 0.1} |
| Random Forest (RF) | 0.6729 ± 0.1459 | 0.6437 ± 0.0741 | {'n_estimators': 200, 'max_depth': 7} |
| Decision Tree (DT) | 0.4465 ± 0.1440 | 0.7612 ± 0.1017 | {'max_depth': 5} |
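
The table above compares models on PR-AUC and balanced accuracy after hyperparameter tuning. A minimal scikit-learn sketch of that kind of comparison follows. It uses a synthetic, imbalanced stand-in dataset with 25 features (the number of features in the feature table) and only three of the ten models; the sample size, class ratio, five-fold stratified splitting, search grids, and selection by PR-AUC are all assumptions, not the study's protocol.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a small, imbalanced cohort (NOT the study data).
X, y = make_classification(n_samples=200, n_features=25, n_informative=6,
                           weights=[0.85], random_state=0)

# (estimator, search grid) pairs; scaling matters for LR and SVM, not for RF.
candidates = {
    "Logistic Regression": (make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
                            {"logisticregression__C": [0.1, 1, 10]}),
    "SVM": (make_pipeline(StandardScaler(), SVC()),
            {"svc__C": [0.1, 1, 10]}),
    "Random Forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [100, 200], "max_depth": [3, 7]}),
}

# "average_precision" is scikit-learn's step-wise estimate of the area under the PR curve.
scoring = {"pr_auc": "average_precision", "balanced_accuracy": "balanced_accuracy"}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, (estimator, grid) in candidates.items():
    # Every grid point is scored on both metrics; the best one is chosen by PR-AUC.
    search = GridSearchCV(estimator, grid, scoring=scoring, refit="pr_auc", cv=cv)
    search.fit(X, y)
    i, res = search.best_index_, search.cv_results_
    results[name] = {
        "pr_auc": (res["mean_test_pr_auc"][i], res["std_test_pr_auc"][i]),
        "balanced_accuracy": (res["mean_test_balanced_accuracy"][i],
                              res["std_test_balanced_accuracy"][i]),
        "params": search.best_params_,
    }
    pr_mean, pr_sd = results[name]["pr_auc"]
    ba_mean, ba_sd = results[name]["balanced_accuracy"]
    print(f"{name}: PR-AUC {pr_mean:.4f} ± {pr_sd:.4f}, "
          f"balanced accuracy {ba_mean:.4f} ± {ba_sd:.4f}, {search.best_params_}")
```

Because these scores come from the same folds used to choose the hyperparameters, they tend to be optimistic; nested cross-validation or a separate held-out set gives a less biased estimate.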

| Model | PR-AUC [CI] | Balanced Accuracy [CI] |
|---|---|---|
| Logistic Regression | 0.3406 [0.1143, 0.6085] | 0.7220 [0.5523, 0.8811] |
| SVM | 0.3637 [0.1213, 0.6271] | 0.6258 [0.4653, 0.7857] |
| Gradient Boosting Decision Tree (GBDT) | 0.3329 [0.1274, 0.6334] | 0.6057 [0.4575, 0.7721] |
| eXtreme Gradient Boosting (XGBoost) | 0.4423 [0.1779, 0.7753] | 0.6841 [0.5463, 0.8447] |
| Categorical Boosting (CatBoost) | 0.4661 [0.1918, 0.7991] | 0.6688 [0.5166, 0.8397] |
| Light Gradient Boosting Machine (LightGBM) | 0.4154 [0.1522, 0.7516] | 0.6750 [0.5210, 0.8414] |
| K-Nearest Neighbor (KNN) | 0.3815 [0.1353, 0.7198] | 0.5422 [0.4761, 0.6667] |
| Multinomial Naive Bayes (MNB) | 0.3318 [0.1472, 0.5849] | 0.5000 [0.5000, 0.5000] |
| Random Forest (RF) | 0.4015 [0.1633, 0.7629] | 0.6298 [0.4841, 0.7846] |
| Decision Tree (DT) | 0.3059 [0.1024, 0.5517] | 0.6140 [0.4419, 0.8043] |
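
The table above reports point estimates with bracketed confidence intervals. A common way to obtain such intervals on a small held-out set is the percentile bootstrap; the sketch below assumes that method. The resample count, the 95% level, the skipping of single-class resamples, and the synthetic 60-student test set are assumptions, since this section does not state how the intervals were computed.

```python
import numpy as np
from sklearn.metrics import average_precision_score, balanced_accuracy_score


def bootstrap_ci(y_true, y_score, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for PR-AUC (average precision) and balanced accuracy.

    Test students are resampled with replacement. Resamples containing only
    one class are skipped because neither metric is meaningful for them.
    Returns {metric: (point_estimate, lower, upper)}.
    """
    y_true, y_score, y_pred = map(np.asarray, (y_true, y_score, y_pred))
    rng = np.random.default_rng(seed)
    n = y_true.size
    pr, ba = [], []
    while len(pr) < n_boot:
        idx = rng.integers(0, n, size=n)
        if np.unique(y_true[idx]).size < 2:
            continue
        pr.append(average_precision_score(y_true[idx], y_score[idx]))
        ba.append(balanced_accuracy_score(y_true[idx], y_pred[idx]))
    q = [100 * alpha / 2, 100 * (1 - alpha / 2)]
    return {
        "pr_auc": (average_precision_score(y_true, y_score), *np.percentile(pr, q)),
        "balanced_accuracy": (balanced_accuracy_score(y_true, y_pred), *np.percentile(ba, q)),
    }


# Hypothetical held-out test set: labels, risk scores, and thresholded predictions.
rng = np.random.default_rng(1)
y_test = rng.binomial(1, 0.2, size=60)
scores = np.clip(0.3 * y_test + rng.normal(0.3, 0.15, size=60), 0.0, 1.0)
preds = (scores >= 0.5).astype(int)

ci = bootstrap_ci(y_test, scores, preds)
for metric, (point, low, high) in ci.items():
    print(f"{metric}: {point:.4f} [{low:.4f}, {high:.4f}]")

# A predictor that puts every student in the same class.
const = bootstrap_ci(y_test, np.zeros(60), np.zeros(60, dtype=int))
print("constant predictor balanced accuracy:", const["balanced_accuracy"])
```

The constant-predictor call illustrates one reading of the MNB row: a classifier that assigns every test student to the same class has a balanced accuracy of exactly 0.5 in every resample, so its interval collapses to [0.5, 0.5].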

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Sun, D.; Xu, P.; Cheng, G.; Zhang, P. Unpacking Prediction: Contextualized and Interpretable Academic Risk Modeling with XAI for Small Cohorts. Electronics 2026, 15, 626. https://doi.org/10.3390/electronics15030626

