Mapping Heterogeneity in Psychological Risk Among University Students Using Explainable Machine Learning
Abstract
1. Introduction
2. Theoretical Analysis
2.1. Supervised Learning: Robustness and Classification Logic
2.2. Subtype Discovery: Transitioning to the Attribution Space
2.3. Methodological Validation via Mutual Information
3. Methodology
3.1. Experimental Preparation
3.1.1. Dataset Description
3.1.2. Benchmark Model Establishment
3.2. Constructing a Heterogeneity Interpretation Framework
3.2.1. Theoretical Foundation: TreeSHAP Attribution
3.2.2. Key Insight: Attribution-Based Phenotyping
3.2.3. Implementation Framework
3.2.4. Methodological Advantages
4. Experiments
4.1. Experimental Design and Setup
4.1.1. Data Preprocessing
4.1.2. Experimental Configuration
4.2. Benchmark Model Comparison
4.2.1. Performance Analysis
4.2.2. Methodological Implications
4.3. Ablation Study and Robustness Testing
4.3.1. Feature Contribution Analysis
4.3.2. Robustness Evaluation
4.4. Heterogeneity Discovery via SHAP-Based Clustering
4.4.1. Validation of Core Innovation
4.4.2. Subtype Characterization
- Subtype 1 (Academic-Driven): Strongly associated with features such as “Exams” (feature_260) and “Graduation Pressure”.
- Subtype 2 (Socio-Emotional): Dominated by interpersonal indicators like “Social isolation” (feature_250).
- Subtype 3 (Internal Regulation): Linked to physiological markers and emotional dysregulation indicators.
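One way to arrive at subtype descriptions like those above is to rank features by their mean absolute SHAP attribution within each cluster. The following is a minimal sketch in plain Python, not the authors' pipeline: the attribution rows, the cluster labels, the feature name `feature_12`, and the helper `top_features` are hypothetical. Only `feature_260` and `feature_250` are taken from the text.

```python
from collections import defaultdict

# Hypothetical per-sample SHAP attributions (one row per student, one column per feature).
FEATURES = ["feature_260", "feature_250", "feature_12"]
shap_rows = [
    [0.42, 0.03, -0.05],
    [0.38, -0.02, 0.04],
    [0.05, 0.51, 0.02],
    [-0.01, 0.47, -0.06],
    [0.08, 0.06, 0.33],
    [0.02, -0.04, 0.29],
]
cluster_labels = [0, 0, 1, 1, 2, 2]  # hypothetical cluster assignments


def top_features(rows, labels, names, k=1):
    """Rank features by mean |attribution| within each cluster and keep the top k."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    ranking = {}
    for label, members in grouped.items():
        mean_abs = [
            sum(abs(r[j]) for r in members) / len(members)
            for j in range(len(names))
        ]
        order = sorted(range(len(names)), key=lambda j: mean_abs[j], reverse=True)
        ranking[label] = [names[j] for j in order[:k]]
    return ranking


subtype_drivers = top_features(shap_rows, cluster_labels, FEATURES)
print(subtype_drivers)
```

Using the absolute value keeps features that push risk down as well as up; a signed mean could be used instead if only risk-increasing drivers are of interest.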
4.5. Discussion of Experimental Findings
5. Results
5.1. Identification of Psychological Risk Subtypes
5.2. Feature Composition and Quantification
5.3. Subtype Characterization Analysis
- Subtype 1 (Academic-Driven): Primary drivers included academic achievement stressors, such as “Exams” (feature_260, MI = 0.2399).
- Subtype 2 (Socio-Emotional): Decision logic was dominated by interpersonal indicators, specifically “Social isolation” (feature_250, MI = 0.2209).
- Subtype 3 (Internal Regulation): Characterized by distributed contributions from physiological markers and emotional dysregulation indicators.
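The MI values quoted above quantify statistical dependence between a feature and another variable. As an illustration only, the sketch below computes a plug-in mutual information estimate (in nats) between a binary feature indicator and subtype membership. The estimator actually used in the study is not stated in this section, so the choice of a discrete plug-in estimator, the toy data, and the names `mutual_information`, `mentions_exams`, and `in_subtype_1` are all assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (px[x] * py[y])
        mi += p_xy * math.log(count * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: does the text mention exams, and is the student in Subtype 1?
mentions_exams = [1, 1, 1, 0, 0, 0, 0, 1]
in_subtype_1 = [1, 1, 1, 0, 0, 0, 0, 0]

mi_example = mutual_information(mentions_exams, in_subtype_1)
print(f"MI = {mi_example:.4f} nats")
```

For continuous features a nearest-neighbour estimator would be more appropriate than this counting approach, which only applies to discrete variables.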
5.4. Validation and Clinical Consistency
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Training Set Ratio | Mean F1-Score | Std F1-Score |
|---|---|---|
| 0.1 | 0.5039 | 0.1347 |
| 0.5 | 0.6233 | 0.0332 |
| 0.8 | 0.6964 | 0.0605 |
| 1.0 | 0.6885 | 0.0513 |
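To compare stability across training set ratios, the standard deviation can be read relative to the mean (the coefficient of variation). The sketch below does this arithmetic on the values in the table above. The names `rows` and `cv_by_ratio` are illustrative, and the coefficient of variation is an added reading aid, not a metric reported in the source.

```python
# (training set ratio, mean F1, std F1) copied from the table above
rows = [
    (0.1, 0.5039, 0.1347),
    (0.5, 0.6233, 0.0332),
    (0.8, 0.6964, 0.0605),
    (1.0, 0.6885, 0.0513),
]

# Coefficient of variation: spread of F1 relative to its mean at each ratio
cv_by_ratio = {ratio: std_f1 / mean_f1 for ratio, mean_f1, std_f1 in rows}

# Ratio with the highest mean F1, and the ratio with the most stable F1
best_ratio = max(rows, key=lambda r: r[1])[0]
most_stable_ratio = min(cv_by_ratio, key=cv_by_ratio.get)

for ratio, cv in cv_by_ratio.items():
    print(f"ratio={ratio:.1f}  CV={cv:.4f}")
print("highest mean F1 at ratio", best_ratio)
print("lowest CV at ratio", most_stable_ratio)
```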

| Model | Accuracy | Precision | Recall | F1-Score | AUC-ROC |
|---|---|---|---|---|---|
| Logistic Regression | 0.7385 | 0.6857 | 0.7059 | 0.6615 | 0.7734 |
| SVM Linear | 0.6769 | 0.6712 | 0.7206 | 0.6259 | – 1 |
| Random Forest | 0.7615 | 0.6837 | 0.9853 | 0.6517 | 0.8213 |
| XGBoost | 0.7692 | 0.6905 | 0.8529 | 0.6671 | 0.7462 |
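When reading the table above, note that F1 for a single class is the harmonic mean of that class's precision and recall. The sketch below computes this implied value from the Precision and Recall columns and compares it with the F1-Score column. The implied values come out higher than the reported ones for every model, which would be consistent with the columns using different averaging schemes (for example, a macro-averaged F1 next to positive-class precision and recall). The source does not state the averaging used, so this reading is an assumption; `f1_from`, `models`, and `implied` are illustrative names.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above
models = {
    "Logistic Regression": (0.6857, 0.7059, 0.6615),
    "SVM Linear": (0.6712, 0.7206, 0.6259),
    "Random Forest": (0.6837, 0.9853, 0.6517),
    "XGBoost": (0.6905, 0.8529, 0.6671),
}

implied = {name: f1_from(p, r) for name, (p, r, _) in models.items()}

for name, (_, _, reported) in models.items():
    gap = implied[name] - reported
    print(f"{name:<20} implied={implied[name]:.4f}  reported={reported:.4f}  gap={gap:+.4f}")
```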

| Feature Combination | Feature Count | F1-Score |
|---|---|---|
| Full feature set (proposed) | 2556 | 0.6671 |
| Lexical only (TF–IDF) | 2540 | 0.6657 |
| Linguistic only | 11 | 0.4053 |
| Emotion only | 5 | 0.4069 |
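The ablation table above is easier to interpret as a drop in F1 relative to the full feature set. The sketch below performs that subtraction on the tabulated values and checks that the three feature groups add up to the full feature count. The names `ablation`, `f1_drop`, and `group_total` are illustrative and do not appear in the source.

```python
# (feature count, F1-score) copied from the table above
ablation = {
    "Full feature set (proposed)": (2556, 0.6671),
    "Lexical only (TF-IDF)": (2540, 0.6657),
    "Linguistic only": (11, 0.4053),
    "Emotion only": (5, 0.4069),
}

full_count, full_f1 = ablation["Full feature set (proposed)"]

# F1 lost when only one feature group is kept
f1_drop = {
    name: round(full_f1 - f1, 4)
    for name, (_, f1) in ablation.items()
    if name != "Full feature set (proposed)"
}

# The three groups should partition the full feature set
group_total = sum(
    count for name, (count, _) in ablation.items()
    if name != "Full feature set (proposed)"
)

for name, drop in f1_drop.items():
    print(f"{name:<24} F1 drop = {drop:.4f}")
print("feature counts consistent:", group_total == full_count)
```

On these numbers the lexical features alone recover almost all of the full model's F1, while the linguistic and emotion groups on their own fall well below it.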

| K | BIC | Silhouette Score | CH Score | Combined Score |
|---|---|---|---|---|
| 2 | 38,292,062.49 | 0.0096 | 2.0920 | 0.6648 |
| 3 | 59,643,196.74 | 0.0003 | 1.8859 | 0.9648 |
| 4 | 81,010,820.51 | 0.0062 | 1.9681 | 0.4964 |
| 5 | 102,391,424.51 | 0.0021 | 1.8973 | 0.3716 |
| 10 | 209,312,789.50 | 0.0191 | 1.7126 | 0.3816 |
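For context on the Silhouette Score column above: assuming it is the standard silhouette coefficient, each sample scores (b - a) / max(a, b), where a is its mean distance to its own cluster and b is its mean distance to the nearest other cluster. Values near 1 indicate well-separated clusters and values near 0 indicate overlapping ones. The sketch below is a plain-Python illustration on hypothetical 2-D points, not the clustering code used in the study; `silhouette_score`, `points`, and both label lists are made up for the example.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs at least 2 clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    def mean_dist(i, members):
        # Mean distance from point i to the given members, excluding i itself
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(i, own)
        b = min(
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical points: two tight groups far apart
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
separated = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
mixed = silhouette_score(points, [0, 1, 0, 1])      # labels cut across the groups

print(f"separated labels: {separated:.3f}")
print(f"mixed labels:     {mixed:.3f}")
```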
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Liu, P.; Tang, J.; Wang, H.; Zhang, D. Mapping Heterogeneity in Psychological Risk Among University Students Using Explainable Machine Learning. Entropy 2026, 28, 224. https://doi.org/10.3390/e28020224

