Data Mining to Identify Factors Associated with University Student Retention
Abstract
1. Introduction
2. Materials and Methods
2.1. Methodology
2.1.1. Business Understanding
2.1.2. Data Understanding
2.1.3. Data Preparation
- (a) Scree plot of eigenvalues. The horizontal axis represents the factor number extracted from the correlation matrix, and the vertical axis shows the eigenvalue associated with each factor. The dashed horizontal line marks the Kaiser criterion (eigenvalue = 1) used to determine the number of retained factors. The sharp decline over the first factors, followed by an inflection point, supports retaining five latent factors.
- (b) Heatmap of factor loadings for the retained factors. The horizontal axis represents the extracted latent factors (Factor 1 to Factor 5), and the vertical axis lists the observed questionnaire items grouped by theoretical construct (Motivation, Commitment, Attitude and Commitment, Social and Economic Conditions, and Retention). The color scale represents the magnitude of the factor loadings, with darker tones indicating stronger item-factor associations. Only items with the highest loadings (≥0.40–0.50) are displayed, to highlight those with the greatest explanatory contribution and to improve readability.
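
The retention rule in panel (a) and the display filter in panel (b) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the questionnaire responses are a numeric NumPy array with one column per item, applies the Kaiser criterion (eigenvalue > 1) to the eigenvalues of the item correlation matrix, and filters a loadings matrix with a 0.40 cut-off, the lower end of the range quoted in the caption. The function names `kaiser_n_factors` and `salient_items`, the synthetic data, and the loadings values are hypothetical.

```python
import numpy as np


def kaiser_n_factors(responses: np.ndarray) -> tuple[np.ndarray, int]:
    """Eigenvalues of the item correlation matrix (descending) and the number
    of factors retained under the Kaiser criterion (eigenvalue > 1)."""
    corr = np.corrcoef(responses, rowvar=False)   # items are columns
    eigvals = np.linalg.eigvalsh(corr)[::-1]      # symmetric matrix -> real eigenvalues, sorted descending
    return eigvals, int(np.sum(eigvals > 1.0))


def salient_items(loadings: np.ndarray, items: list[str],
                  threshold: float = 0.40) -> dict[str, list[str]]:
    """For each factor (column), list the items whose absolute loading
    reaches the assumed display threshold."""
    return {
        f"Factor {j + 1}": [items[i] for i in np.flatnonzero(np.abs(loadings[:, j]) >= threshold)]
        for j in range(loadings.shape[1])
    }


# Synthetic data: two independent latent traits, each measured by three noisy items.
rng = np.random.default_rng(0)
traits = rng.normal(size=(2000, 2))
responses = np.repeat(traits, 3, axis=1) + 0.3 * rng.normal(size=(2000, 6))
eigvals, n_factors = kaiser_n_factors(responses)
print(np.round(eigvals, 2), n_factors)

# Hypothetical loadings for three items on two factors.
loadings = np.array([[0.82, 0.05], [0.76, 0.10], [0.35, 0.44]])
print(salient_items(loadings, ["item_a", "item_b", "item_c"]))
```

The eigenvalues of a correlation matrix always sum to the number of items, which is why a value of 1 (one item's worth of variance) is the Kaiser reference line. The loadings themselves would come from whatever extraction and rotation method the study used; this sketch does not reproduce that step.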
2.1.4. Modeling
2.1.5. Evaluation
3. Results
3.1. Descriptive Analysis
3.2. Decision Tree
3.3. Classification of Factors
3.4. SHAP Analysis
3.5. Sensitivity Analysis
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Carballo-Mendívil, B.; Arellano, A.; Ríos, N.; Lizardi, M.d.P. Predicting Student Dropout from Day One: XGBoost-Based Early Warning System Using Pre-Enrollment Data. Appl. Sci. 2025, 15, 9202.
- Dia, N.J.; Sieras, J.C.; Khalid, S.A.; Macatotong, A.H.T.; Mondejar, J.M.; Genotiva, E.R.; Delena, R.D. EduGuard RetainX: An advanced analytical dashboard for predicting and improving student retention in tertiary education. SoftwareX 2025, 29, 102057.
- Bird, K.A.; Castleman, B.L.; Mabel, Z.; Song, Y. Bringing Transparency to Predictive Analytics: A Systematic Comparison of Predictive Modeling Methods in Higher Education. AERA Open 2021, 7, 23328584211037630.
- Mutka, A.; Mutka, F.Ž.; Žagar, M.; Tolić, D. Enhancing Student Retention in Introductory Programming Courses: Leveraging Advanced Learning Validation Tools and Educational Data Mining. IEEE Access 2025, 13, 153614–153626.
- Araka, E.; Wario, R.; Maina, E. Promoting University Students’ Self-Regulated Learning Skills on E-Learning Platforms Using Educational Data Mining. In Proceedings of the 2025 IST-Africa Conference (IST-Africa), Nairobi, Kenya, 28–30 May 2025; pp. 1–12.
- Tang, Z.; von Seekamm, K.; Colina, F.E.; Chen, L. Enhancing Student Retention with Machine Learning: A Data-Driven Approach to Predicting College Student Persistence. J. Coll. Stud. Retent. 2025, 15210251251336372.
- Klašnja-Milićević, A.; Ivanović, M.; Vesin, B.; Satratzemi, M.; Wasson, B. Editorial: Learning Analytics—Trends and Challenges. Front. Artif. Intell. 2022, 5, 856807.
- Pimentel, M.; Villamar, M.; Andrade, D.; Zambrano, B. Estrategias para evitar la deserción universitaria. RECIAMUC 2023, 7, 273–280.
- Garcés, M.; De la Ossa, S.; Arellano, W.; Alvis, J.; Figueroa, L. ¿Volver o no volver a las clases presenciales? Motivaciones y temores que influyen en la deserción universitaria en Colombia en tiempos de postpandemia. Salud Uninorte 2024, 40, 52–68.
- French, A. Toward a New Conceptual Model: Integrating the Social Change Model of Leadership Development and Tinto’s Model of Student Persistence. J. Leadersh. Educ. 2017, 16, 97–117.
- Savage, M.W.; Strom, R.E.; Hubbard, A.S.E.; Aune, K.S. Commitment in College Student Persistence. J. Coll. Stud. Retent. 2019, 21, 242–264.
- Elturki, E.; Liu, Y.; Hjeltness, J.; Hellmann, K. Needs, Expectations, and Experiences of International Students in Pathway Programs in the United States. J. Int. Stud. 2019, 9, 192–210.
- Álvarez-Santana, C.; Caicedo-Montesdeoca, D. La intervención social y la tutoría estudiantil como medida de disminución de los índices de deserción de las universidades de la provincia de Manabí, periodo 2015–2019. Rev. Cient. Multidiscip. Arbitr. Yachasun 2021, 5, 36–50.
- Rubén, E.; García, A. Clima y Compromiso Organizacional. 2007. Available online: https://www.eumed.net/libros-gratis/2007c/340/ (accessed on 3 September 2025).
- Velasquez, M.; Posada, P.M.; Gomez, D.N.C.; Lopez, N.; Vallejo, G.F.; Ramirez, P.A.; Hernandez, E.C.; Vallejo, A. Acciones Para Favorecer La Permanencia. Universidad de Antioquia, 2011. Available online: https://revistas.utp.ac.pa/index.php/clabes/article/view/856 (accessed on 3 September 2025).
- Aisenson, G.; Valenzuela, V.; Celeiro, R.; Bailac, K.; Legaspi, L. El significado del estudio y la motivación escolar de jóvenes que asisten a circuitos educativos diferenciados socioeconómicamente. Anu. Investig. 2010, 12, 109–119.
- Castrillón-Gómez, O.D.; Sarache, W.; Ruiz-Herrera, S. Predicción de las principales variables que conllevan al abandono estudiantil por medio de técnicas de minería de datos. Form. Univ. 2020, 13, 217–228.
- Castro, L.; Esperanza, E.; Romero, R. Análisis de características que influyen en la deserción estudiantil en el contexto de una universidad latinoamericana. Rev. EIA 2023, 20, 4002.
- Deleña, R.D.; Dia, N.J.; Sacayan, R.R.; Sieras, J.C.; Khalid, S.A.; Macatotong, A.H.T.; Gulam, S.B. Predicting student retention: A comparative study of machine learning approach utilizing sociodemographic and academic factors. Syst. Soft Comput. 2025, 7, 200352.
- Pérez, M.; Navarrete, D.; Baldeon-Calisto, M.; Guerrero, Y.; Sarmiento, A. Unlocking Student Success: Applying Machine Learning for Predicting Student Dropout in Higher Education. In Proceedings of the 2025 13th International Symposium on Digital Forensics and Security (ISDFS), Boston, MA, USA, 24–25 April 2025; pp. 1–6.
- Torres, C.Z.; Ramos, C.A.; Moraga, J.L. Estudio de variables que influyen en la deserción de estudiantes universitarios de primer año, mediante minería de datos. Cienc. Amaz. 2016, 6, 73.
- Eckert, K.B.; Suénaga, R. Análisis de Deserción-Permanencia de Estudiantes Universitarios Utilizando Técnica de Clasificación en Minería de Datos. Form. Univ. 2015, 8, 3–12.
- Miranda, M.; Guzmán, J. Análisis de la Deserción de Estudiantes Universitarios usando Técnicas de Minería de Datos. Form. Univ. 2017, 10, 61–68.
- González, A.; Alonso, M.A.; Gómez, M.d.L.Á.; Aliagas, I. Peer mentoring, university dropout and academic performance before, during, and after the pandemic in Spain. Eval. Program Plan. 2025, 113, 102676.
- Grijalva, P.; Freire, V.; Real, K.; Arellano, A.; Cornejo, G. Aplicación de Técnicas de Minería de Datos para el Análisis de la Eficiencia Académica. Rev. Cient. Hallazgos 2018, 3, 1–16.
- Dórame, D.L. Factores asociados a la permanencia estudiantil de la universidad de Sonora. Rev. Psicol. Univ. Auton. Estado México 2022, 11, 70–96.
- Almalawi, A.; Soh, B.; Li, A.; Samra, H. Predictive Models for Educational Purposes: A Systematic Review. Big Data Cogn. Comput. 2024, 8, 187.
- Lampropoulos, G.; Evangelidis, G. Learning Analytics and Educational Data Mining in Augmented Reality, Virtual Reality, and the Metaverse: A Systematic Literature Review, Content Analysis, and Bibliometric Analysis. Appl. Sci. 2025, 15, 971.
- Khalid, F.; Javed, A.; Ain, Q.-U.; Ilyas, H.; Irtaza, A. DFGNN: An interpretable and generalized graph neural network for deepfakes detection. Expert Syst. Appl. 2023, 222, 119843.
- López-Meneses, E.; Mellado-Moreno, P.C.; Herrerías, C.G.; Pelícano-Piris, N. Educational Data Mining and Predictive Modeling in the Age of Artificial Intelligence: An In-Depth Analysis of Research Dynamics. Computers 2025, 14, 68.
- Reina, Y.; Huatangari, L.Q.; Caro, O.C.; Guevara, J.L.M.; Tuesta, J.N.A.; Bardales, E.S.; Santos, R.C. Data Mining to Identify University Student Dropout Factors. Appl. Sci. 2025, 15, 11911.
- Murillo-Zabala, A.M.; Santos, P.J.-D.L. Permanencia estudiantil: Factores que inciden en el Politécnico Internacional de Bogotá, Colombia. Rev. Electron. Educ. 2021, 25, 1–25.
- Velázquez Narváez, Y.; González Medina, M.A. Factores asociados a la permanencia de estudiantes universitarios: Caso UAMM-UAT. Rev. Educ. Super. 2017, 46, 117–138.
- Cusquillo, E.J.L.; Cambell, D.C.V.; Vera, M.A.L.; Morán, N.Y.B.; Santander, K.M.A. Estrategias Activas de Aprendizaje: Incidencia en el Rendimiento Académico de Estudiantes de Básica Superior. Cienc. Lat. Rev. Cient. Multidiscip. 2025, 9, 6469–6480.
- Torres-Garagundo, V.; Quispe-Chero, C. Aprendizaje autorregulado y motivación intrínseca en estudiantes de la UNMSM. PsiqueMag/Rev. Cient. Digit. Psicol. 2021, 11, 18–27.
- Vaarma, M.; Li, H. Predicting student dropouts with machine learning: An empirical study in Finnish higher education. Technol. Soc. 2024, 76, 102474.
- Resendiz, J.E.L.; de Oca, E.R.M.; Jiménez, L.P.L. Importancia de la tutoría en la formación académica de estudiantes de agronomía. RIDE Rev. Iberoam. Para Investig. Desarro. Educ. 2025, 15, e883.
- Constante-Amores, A.; Martínez, E.F.; Asencio, E.N.; Fernández-Mellizo, M. Factores asociados al abandono universitario. Educ. XX1 2020, 24, 17–44.
- Bravo, A.; Gonzáles, D.; Maytorena, M. Motivación De Logro En Situaciones De Éxito Y Fracaso Académico De Estudiantes Universitarios. Available online: https://www.comie.org.mx/congreso/memoriaelectronica/v10/pdf/area_tematica_01/ponencias/0762-F.pdf (accessed on 3 September 2025).
- Casanova, J.; Cervero, A.; Núñez, J.; Almeida, L.; Bernardo, A. Factors that determine the persistence and dropout of university students. Psicothema 2018, 4, 408–414.
- Palacios, C.A.; Reyes-Suárez, J.A.; Bearzotti, L.A.; Leiva, V.; Marchant, C. Knowledge Discovery for Higher Education Student Retention Based on Data Mining: Machine Learning Algorithms and Case Study in Chile. Entropy 2021, 23, 485.
- Gonzales, T. El alumno ante la escuela y su propio aprendizaje: Algunas líneas de investigación en torno al concepto de implicación. REICE Rev. Iberoam. Sobre Calid. Efic. Cambio Educ. 2010, 8, 10–31. Available online: https://www.redalyc.org/articulo.oa?id=55115064002 (accessed on 3 September 2025).
- Contreras-Bravo, L.-E.; Tarazona-Bermúdez, G.-M.; Rodríguez-Molano, J.-I. Tecnología y analítica del aprendizaje: Una revisión a la literatura. Rev. Cient. 2021, 41, 150–168.
- Viberg, O.; Hatakka, M.; Bälter, O.; Mavroudi, A. The current landscape of learning analytics in higher education. Comput. Hum. Behav. 2018, 89, 98–110.
- Khalil, M.; Prinsloo, P.; Slade, S. The use and application of learning theory in learning analytics: A scoping review. J. Comput. High. Educ. 2023, 35, 573–594.
- Choque, V.M.; Jauregui, V.D.S. Análisis del Diseño curricular como factor de deserción académica utilizando Minería de Datos. Yachay—Rev. Cient. Cult. 2022, 11, 551–555.
- Sifuentes, M.S.G.C.; Pérez, L.G.V.; Cantabrana, M.G.N.; Acosta, I.I.F.O.; Santana, F.A.Á.; Fierro, M.d.L.Á.S. Modelo Predictivo de la Deserción Escolar en Educación Superior: Una Aproximación desde la Minería de Datos Utilizando la Metodología CRISP-DM. Cienc. Lat. Rev. Cient. Multidiscip. 2023, 7, 7797–7812.
- Bakariwie, A.; Asamoah, D.; Duwiejuah, A.B. Prevention of student attrition: A data-backed approach to school counselling using Delphi technique and multiple classification algorithms. Discov. Educ. 2025, 4, 259.
- Deng, P.-S. Using Affinity Analysis-Driven Adaptive Data Mining Life Cycle for the Development of a Student Retention DSS. WSEAS Trans. Adv. Eng. Educ. 2021, 18, 135–147.
- Xu, T.; Hsu, H.-Y.; Wang, Y.; Li, X.-B. Applying Text Mining to Identify Critical Factors Contributing to the Retention Rate of First-Year Students at Historically Black Colleges and Universities. J. Coll. Stud. Retent. 2025.
- Attiya, W.M.; Bin Shams, M. Predicting Student Retention in Higher Education Using Data Mining Techniques: A Literature Review. In Proceedings of the 2023 International Conference on Cyber Management and Engineering (CyMaEn), Bangkok, Thailand, 26–27 January 2023; pp. 171–177.
- Cardona, T.; Cudney, E.A.; Hoerl, R.; Snyder, J. Data Mining and Machine Learning Retention Models in Higher Education. J. Coll. Stud. Retent. 2023, 25, 51–75.
- Shafiq, D.A.; Marjani, M.; Habeeb, R.A.A.; Asirvatham, D. Student Retention Using Educational Data Mining and Predictive Analytics: A Systematic Literature Review. IEEE Access 2022, 10, 72480–72503.
- Yu, C.H.; DiGangi, S.; Jannasch-Pennell, A.; Kaprolet, C. A Data Mining Approach for Identifying Predictors of Student Retention from Sophomore to Junior Year. J. Data Sci. 2021, 8, 307–325.
- Kang, K.; Wang, S. Analyze and Predict Student Dropout from Online Programs. In Proceedings of the ICCDA 2018: 2018 The 2nd International Conference on Compute and Data Analysis, DeKalb, IL, USA, 23–25 March 2018; pp. 6–12.
- Navarro, M.M.; Utreras, E.G.; Ugarte, C.G.B.; Vidal, C.L. Factores psicológicos asociados a la permanencia de estudiantes beneficiados por el Programa de Acceso-Acompañamiento Efectivo a la Educación Superior. Rev. Electron. Investig. Educ. 2023, 25, 1–13.
- De la Hoz-Granadillo, E.J.; Reyes-Ruiz, L.; Sanchez-Villegas, M. Cluster analysis and artificial neural networks to assess and diagnosis suicide ideation in school adolescents. Rev. Interam. Psicol./Interam. J. Psychol. 2023, 57, e1360.
- Jaramillo, J.D.F.; Olivera, N.R.N. Aplicación de Inteligencia Artificial en la Educación de América Latina: Tendencias, Beneficios y Desafíos. Rev. Veritas Difus. Cient. 2024, 5, 1–21.
- Martínez, J. Auto-motivación y rendimiento académico en el Espacio Europeo de Educación Superior. Cuad. Educ. Desarro. Laguna EUMED 2011, 3, 1–12. Available online: https://dialnet.unirioja.es/servlet/articulo?codigo=6372719 (accessed on 3 September 2025).
- Jadue, G. Hacia una mayor permanencia en el sistema escolar de los niños en riesgo de bajo rendimiento y de deserción. Estud. Pedagog. 1999, 25, 83–90. Available online: https://www.redalyc.org/pdf/1735/173513845005.pdf (accessed on 3 September 2025).
- Fonseca, G.; García, F. Permanencia y abandono de estudios en estudiantes universitarios: Un análisis desde la teoría organizacional. Rev. Educ. Super. 2016, 45, 25–39.
- Pascarella, E.T.; Terenzini, P.T. Predicting Freshman Persistence and Voluntary Dropout Decisions from a Theoretical Model. J. High. Educ. 1980, 51, 60–75.
- Fishbein, M.; Ajzen, I. Belief, Attitude, Intention, and Behavior: An Introduction to Theory and Research; Addison-Wesley: Reading, MA, USA, 1975; Available online: https://people.umass.edu/aizen/f&a1975.html (accessed on 3 September 2025).
- Bean, J.P. Dropouts and turnover: The synthesis and test of a causal model of student attrition. Res. High. Educ. 1980, 12, 155–187.
- Spady, W.G. Dropouts from higher education: An interdisciplinary review and synthesis. Interchange 1970, 1, 64–85.
- Ahmed, W.; Wani, M.A.; Plawiak, P.; Meshoul, S.; Mahmoud, A.; Hammad, M. Machine learning-based academic performance prediction with explainability for enhanced decision-making in educational institutions. Sci. Rep. 2025, 15, 26879.
- Nadkarni, P. Core Technologies: Data Mining and ‘Big Data’. In Clinical Research Computing; Elsevier: Amsterdam, The Netherlands, 2016; pp. 187–204.
- Quan, Z.; Pu, L. An improved accurate classification method for online education resources based on support vector machine (SVM): Algorithm and experiment. Educ. Inf. Technol. 2023, 28, 8097–8111.
- Kalra, M.; Kumar, V.; Kaur, M.; Idris, S.A.; Öztürk, Ş.; Alshazly, H. Attribute weighted naïve bayes classifier. Comput. Mater. Contin. 2022, 71, 1945–1957.
- Blockeel, H.; Devos, L.; Frénay, B.; Nanfack, G.; Nijssen, S. Decision trees: From efficient prediction to responsible AI. Front. Artif. Intell. 2023, 6, 1124553.
- Schonlau, M.; Zou, R.Y. The random forest algorithm for statistical learning. Stata J. Promot. Commun. Stat. Stata 2020, 20, 3–29.
- Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967.
- Bates, S.; Hastie, T.; Tibshirani, R. Cross-Validation: What Does It Estimate and How Well Does It Do It? J. Am. Stat. Assoc. 2024, 119, 1434–1445.
- Allgaier, J.; Pryss, R. Cross-Validation Visualized: A Narrative Guide to Advanced Methods. Mach. Learn. Knowl. Extr. 2024, 6, 1378–1388.
- Zhan, Z.; Shen, T. Development of a prediction model for student teaching satisfaction based on 10 machine learning algorithms. Sci. Rep. 2025, 15, 36547.
- Goldrick-Rab, S. Paying the Price: College Costs, Financial Aid, and the Betrayal of the American Dream; University of Chicago Press: Chicago, IL, USA, 2016.
- Tinto, V. Completing College: Rethinking Institutional Action; University of Chicago Press: Chicago, IL, USA, 2012.
- Kuh, G.D.; Cruce, T.M.; Shoup, R.; Kinzie, J.; Gonyea, R.M. Unmasking the Effects of Student Engagement on First-Year College Grades and Persistence. J. High. Educ. 2008, 79, 540–563.
- Nabi, G.; Walmsley, A.; Mir, M.; Osman, S. The impact of mentoring in higher education on student career development: A systematic review and research agenda. Stud. High. Educ. 2025, 50, 739–755.
- Merisotis, J.P.; McCarthy, K. Retention and student success at minority-serving institutions. New Dir. Institutional Res. 2005, 2005, 45–58.
- Ortiz, D.G. State Repression and Mobilization in Latin America. In Handbook of Social Movements Across Latin America; Almeida, P., Cordero Ulate, A., Eds.; Springer: Dordrecht, The Netherlands, 2015; pp. 43–59.

| Factors | Description | Authors |
|---|---|---|
| Motivation | Includes internal motivation (personal goals, expectations of success, self-concept) and external motivation (the teacher's influence in the classroom). | Bravo et al. [39]; Martínez [59]; Aisenson et al. [16]; Jadue [60]; Fonseca & García [61] |
| Commitment | Divided into personal commitment to study (self-efficacy, academic performance, perceived difficulty) and perceived commitment to the institution (quality of the degree program, services, tutoring). | Velasquez et al. [15]; Pascarella & Terenzini [62]; Fonseca & García [61] |
| Attitude and commitment | Related to academic integration: sense of belonging, relationship with university authorities, and relationships with peers. | Fishbein & Ajzen [63]; Velasquez et al. [15]; Pascarella & Terenzini [62]; Fonseca & García [61] |
| Social and economic conditions | Includes social and family interaction (moral support, communication, respectful relationships) and economic conditions (financial resources, scholarships, transportation, etc.). | Bean [64]; Gonzales [42]; Velázquez & González [33]; Fonseca & García [61] |
| Retention | Expected outcome of the process: passing courses on time, regular attendance, and uninterrupted continuation of university studies. | Rubén & García [14]; Spady [65]; Fonseca & García [61] |

| Factors | Categories | Indicators |
|---|---|---|
| Motivation | Internal | Personal goals |
| | | Expectations of success |
| | | Self-concept |
| | External | Teacher influence inside the classroom |
| Commitment | Personal commitment to studying | Self-efficacy |
| | | Academic performance |
| | | Perception of difficulty |
| | Commitment to the institution | Quality of the degree program |
| | | Services |
| Attitude and commitment | Academic integration | Sense of belonging |
| | | Relationship with university authorities |
| | | Peer relationships |
| Socioeconomic conditions | Conditions | Social and family interaction |
| | | Economic conditions |

| Model | Accuracy | Precision-Macro | Recall-Macro | F1-Macro |
|---|---|---|---|---|
| Random Forest | 0.729 ± 0.058 | 0.678 ± 0.170 | 0.623 ± 0.134 | 0.636 ± 0.136 |
| Extra Trees | 0.720 ± 0.061 | 0.681 ± 0.180 | 0.601 ± 0.115 | 0.625 ± 0.134 |
| XGBoost | 0.714 ± 0.047 | 0.639 ± 0.161 | 0.593 ± 0.121 | 0.606 ± 0.132 |
| KNN | 0.692 ± 0.070 | 0.666 ± 0.174 | 0.586 ± 0.132 | 0.605 ± 0.140 |
| Logistic Regression | 0.679 ± 0.062 | 0.620 ± 0.139 | 0.617 ± 0.147 | 0.605 ± 0.132 |
| SVM | 0.703 ± 0.063 | 0.657 ± 0.172 | 0.586 ± 0.141 | 0.603 ± 0.147 |
| Gradient Boosting | 0.705 ± 0.064 | 0.610 ± 0.158 | 0.595 ± 0.142 | 0.592 ± 0.139 |
| Decision Tree | 0.692 ± 0.048 | 0.603 ± 0.168 | 0.557 ± 0.115 | 0.568 ± 0.130 |
| Naïve Bayes | 0.635 ± 0.063 | 0.518 ± 0.096 | 0.594 ± 0.139 | 0.516 ± 0.096 |
| AdaBoost | 0.699 ± 0.037 | 0.457 ± 0.029 | 0.463 ± 0.023 | 0.454 ± 0.023 |
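
The gap between accuracy and the macro-averaged columns, widest for AdaBoost (0.699 accuracy but 0.454 macro F1), follows from how macro averaging works: precision, recall and F1 are computed per class and then averaged with equal weight, so a model that seldom predicts the minority classes is penalized even when its accuracy stays high. One plausible reading of the AdaBoost row is exactly that pattern, though the table alone cannot confirm it. The plain-NumPy sketch below illustrates the effect on hypothetical toy labels and shows one way to summarize per-fold scores as mean ± standard deviation; the fold values and the use of the sample standard deviation (`ddof=1`) are assumptions, since the table does not state how its ± values were computed. The function name `macro_scores` is hypothetical.

```python
import numpy as np


def macro_scores(y_true, y_pred, n_classes: int) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1.
    A class never predicted (or never present) contributes 0 to precision (or recall)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precision, recall, f1 = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    # Macro averaging: every class weighs the same, regardless of its size.
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision_macro": float(np.mean(precision)),
        "recall_macro": float(np.mean(recall)),
        "f1_macro": float(np.mean(f1)),
    }


# Hypothetical labels: 8 students in class 0, one each in classes 1 and 2.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
majority_only = [0] * 10  # a classifier that always predicts the majority class
print(macro_scores(y_true, majority_only, n_classes=3))
# accuracy is 0.8, but macro F1 is only 8/27 (about 0.296)

# Summarizing hypothetical per-fold macro F1 scores as mean ± sample SD.
fold_f1 = np.array([0.61, 0.55, 0.72, 0.48, 0.66])
print(f"{fold_f1.mean():.3f} ± {fold_f1.std(ddof=1):.3f}")
```

If scikit-learn is available, `precision_recall_fscore_support(..., average="macro", zero_division=0)` computes the same macro averages; the hand-written version is shown only to make the equal class weighting explicit.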

| Items | Class_0 | Class_1 | Class_2 | Global SHAP |
|---|---|---|---|---|
| HC_TransportOK | 0.367 | 0.202 | 0.179 | 0.249 |
| AM_TaskFinisher | 0.258 | 0.150 | 0.103 | 0.170 |
| HC_NoWorkNeeded | 0.234 | 0.217 | 0.052 | 0.168 |
| AB_AllCoursesPass | 0.147 | 0.146 | 0.199 | 0.164 |
| AB_AllApproved Overall | 0.225 | 0.091 | 0.120 | 0.145 |
| AB_GradeHigher14 | 0.167 | 0.111 | 0.144 | 0.140 |
| FE_NoFamilyIssues | 0.110 | 0.247 | 0.041 | 0.133 |
| Semester | 0.172 | 0.190 | 0.030 | 0.131 |
| AM_OnTimeFinish | 0.194 | 0.150 | 0.007 | 0.117 |
| HC_StudySpace | 0.051 | 0.041 | 0.237 | 0.110 |
| HC_FinancialSupport | 0.157 | 0.084 | 0.069 | 0.103 |
| Field of study | 0.119 | 0.043 | 0.121 | 0.095 |
| Age | 0.037 | 0.129 | 0.116 | 0.094 |
| AB_PassExams | 0.021 | 0.032 | 0.190 | 0.081 |
| SI_TeamIntegration | 0.100 | 0.061 | 0.078 | 0.080 |
| TS_HasTutor | 0.036 | 0.098 | 0.096 | 0.077 |
| AB_PriorityOblig | 0.084 | 0.068 | 0.038 | 0.063 |
| AD_IntAccred | 0.081 | 0.082 | 0.009 | 0.057 |
| PW_NoViolence | 0.044 | 0.100 | 0.027 | 0.057 |
| AM_AcadCompetitive | 0.108 | 0.042 | 0.000 | 0.050 |
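
The Global SHAP column appears to be the unweighted mean of the three class columns: for HC_TransportOK, (0.367 + 0.202 + 0.179) / 3 ≈ 0.249, and the other rows agree to within rounding. Assuming the class columns are mean absolute SHAP values per class (a common convention for multiclass tree models, though the table does not state it), the aggregation can be sketched as below. The array layout `(n_samples, n_features, n_classes)`, the function name `shap_importance`, and the toy values are assumptions for illustration; in practice the SHAP values would come from an explainer applied to the study's fitted model.

```python
import numpy as np


def shap_importance(shap_values: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-class and global importance from multiclass SHAP values.

    shap_values: array of shape (n_samples, n_features, n_classes) (assumed layout).
    Returns (per_class, global_), where per_class[f, k] is the mean |SHAP| of
    feature f for class k, and global_[f] is the unweighted mean over classes.
    """
    per_class = np.abs(shap_values).mean(axis=0)  # (n_features, n_classes)
    global_ = per_class.mean(axis=1)              # (n_features,)
    return per_class, global_


# Toy example: 2 students, 2 features, 3 classes (hypothetical values).
toy = np.zeros((2, 2, 3))
toy[0, 0] = [0.3, -0.1, 0.2]
toy[1, 0] = [-0.1, 0.3, 0.0]
per_class, global_ = shap_importance(toy)
print(per_class[0], global_)  # feature 0: per class 0.2, 0.2, 0.1; global 0.5 / 3

# Re-deriving the Global column from two rows of the table above.
table_rows = np.array([[0.367, 0.202, 0.179],   # HC_TransportOK
                       [0.258, 0.150, 0.103]])  # AM_TaskFinisher
print(np.round(table_rows.mean(axis=1), 3))     # 0.249 and 0.170, matching the table
```

An unweighted mean treats each outcome class as equally important; weighting by class frequency would instead favor the majority class, so the choice matters when classes are imbalanced.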

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Reina Marín, Y.; Quiñones Huatangari, L.; Alva Tuesta, J.N.; Caro, O.C.; Maicelo Guevara, J.L.; Sánchez Bardales, E.; Chávez Santos, R. Data Mining to Identify Factors Associated with University Student Retention. Informatics 2026, 13, 50. https://doi.org/10.3390/informatics13040050