Identifying the Determinants of Academic Success: A Machine Learning Approach in Spanish Higher Education
Abstract
1. Introduction
2. Literature Review
3. Materials and Methods
3.1. Dataset Description, Access, and Extraction
3.2. Preliminary Analysis: Main Statistics
3.3. Feature Selection
3.3.1. Random Forest
3.3.2. Extreme Gradient Boosting
3.3.3. Importance of Variables
3.4. Machine Learning Regression Models
3.4.1. Linear Regression
3.4.2. Support Vector Regression
3.4.3. Random Forest and XGBoost
4. Results
4.1. Feature Selection
4.2. Machine Learning Methods Evaluation
5. Discussion
Research Questions
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Variable | Explanation |
---|---|
Student ID | The ID that identifies the student. |
Academic amount | The cost of the student’s enrollment. |
Administrative fee | The administrative and insurance costs. |
Degree | The degree programme in which the student is enrolled; the variable used to determine the area. |
Area | The area to which the student’s degree belongs (Social Sciences and Law, Sciences, Health Sciences, Engineering, Arts and Humanities). |
Family township | A dichotomous variable that identifies whether the student’s family lives in the region of Madrid or not. |
Admission option | Admission to Spanish public universities is competitive and based on student performance. A student can list up to 12 degree-and-university combinations; this variable records which of those options the student was admitted to. |
Gender | A dichotomous variable identifying the sex of the student. |
Country of birth | A dichotomous variable that identifies whether the student is Spanish or foreign. |
Admission study | A dichotomous variable that identifies whether the student has entered university from high school or from a professional training degree. |
Access specialty | In the last years of school, students choose subjects from one of several areas, which determines the specialty with which they mainly enter university (Social Sciences and Humanities, Arts, Technical Sciences, Health Sciences). The specialty is not binding, however; a science student can enter a social science degree and vice versa. |
Time commitment | A dichotomous variable that identifies whether the student enrolled full-time or part-time in the first year. |
Access grade | The university entrance grade, an average of the marks from the last two years of high school and the entrance exam (out of 14). |
Mother’s or guardian’s level of studies | The mother’s or guardian’s level of studies (illiterate, no education, primary education, secondary education, higher education). |
Father’s or guardian’s level of studies | The father’s or guardian’s level of studies (illiterate, no education, primary education, secondary education, higher education). |
Scholarship holder | A dichotomous variable that identifies whether the student receives any scholarships or not. |
Type of school | A variable that identifies whether the school is a comprehensive school, only an upper secondary school, or only a professional degree school. |
School holder | A variable that identifies whether the school is public, private, or private with public subsidy. |
Location of the school | A dichotomous variable that identifies whether the student has attended school in the region of Madrid or not. |
PAU call | The sitting of the university entrance examination (PAU) taken by the student: ordinary or extraordinary. |
Admission reason | The quota under which the student was accepted (general, disabled, elite athletes, appeal). |
Age | The age of the student in the first year of university. |
First-semester grade | The average first-semester grade at university (out of 10). |
First-year grade | The average first-year grade at university (out of 10). |
No. of ECTS Passed 1st semester | The number of ECTS passed in the first semester at university. |
No. of ECTS enrolled 1st semester | The number of ECTS enrolled in the first semester at university. |
Ratio of subject passes 1st semester | The ratio between ECTS passed and enrolled in the first semester. |
No. of ECTS Passed 1st year | The number of ECTS passed in the first year at university. |
No. of ECTS enrolled 1st year | The number of ECTS enrolled in the first year at university. |
Ratio of subject passes 1st year | The ratio between ECTS passed and enrolled in the first year. |
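
The two ratio variables in the table above are defined as ECTS passed divided by ECTS enrolled. A minimal sketch of that calculation follows; the function name `pass_ratio` and the handling of a zero enrollment are illustrative assumptions, not details taken from the study.

```python
def pass_ratio(ects_passed: float, ects_enrolled: float) -> float:
    """Ratio between ECTS passed and ECTS enrolled for one period."""
    if ects_enrolled <= 0:
        # Assumption: a student with no enrolled credits has no defined ratio.
        raise ValueError("ects_enrolled must be positive")
    return ects_passed / ects_enrolled


# Hypothetical student: 24 ECTS passed out of 30 enrolled in the first semester.
example_ratio = pass_ratio(24, 30)
print(round(example_ratio, 2))
```

A ratio of 1 means every enrolled credit was passed.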
Variables | Variables’ Values |
---|---|
Area | Social Sciences and Law (21.93%), Arts and Humanities (39.37%), Sciences (14.05%), Health Sciences (18.43%) and Engineering (6.23%) |
Gender | Men (35.22%), women (64.78%) |
Family township | Madrid (73.52%), outside Madrid (26.48%) |
Country of birth | Spain (92.79%), foreign country (7.21%) |
Admission option | First (65.46%), second (12.90%), third (6.28%), fourth (4.66%), fifth (3.74%), sixth (2.25%), seventh (1.37%), eighth (1.29%), ninth (0.75%), tenth (0.54%), eleventh (0.44%), twelfth (0.34%) |
Access specialty | Social Sciences and Humanities (47.82%), Arts (5.97%), Health Sciences (2.07%), Technical Sciences (44.15%) |
Admission study | Professional training degree (6.64%), high school (93.36%) |
PAU call | Ordinary (94.62%), extraordinary (5.38%) |
Mother’s or guardian’s level of studies | Illiterate (0.29%), no education (1.17%), primary education (6.82%), secondary education (27.44%), higher education (64.29%) |
Father’s or guardian’s level of studies | Illiterate (0.59%), no education (1.37%), primary education (10.71%), secondary education (31.28%), higher education (56.06%) |
Time commitment | Part-time student (1.85%), full-time student (98.15%) |
Type of school | Only a professional degree school (1.24%), a comprehensive school (4.63%), only an upper secondary school (94.13%) |
School holder | Private with public subsidy (1.24%), public (58.60%), private (36.69%) |
Location of the school | Madrid (73.17%), outside Madrid (26.83%) |
Admission reason | Appeal (0.56%), general (97.91%), disabled (0.79%), elite athletes (0.74%) |
Scholarship holder | No grant holder (70.44%), grant holder (29.56%) |
Variable | Mean | Median | Standard Deviation | Skewness |
---|---|---|---|---|
Age | 18.64 | 18 | 1.95 | 15.11 |
Academic amount | 582.46 | 584 | 541.37 | 1.34 |
Administrative fee | 30.08 | 35 | 9.00 | −1.78 |
Access grade | 10.72 | 11 | 1.93 | −0.55 |
No. of ECTS enrolled 1st year | 60.76 | 60 | 5.49 | −0.21 |
No. of ECTS Passed 1st year | 50.79 | 60 | 16.98 | −1.42 |
Ratio of subject passes 1st year | 0.83 | 1 | 0.26 | −1.75 |
No. of ECTS enrolled 1st semester | 27.81 | 30 | 6.40 | −1.37 |
No. of ECTS Passed 1st semester | 20.74 | 24 | 9.56 | −0.45 |
Ratio of subject passes 1st semester | 0.75 | 0.80 | 0.30 | −1.07 |
First-year grade | 6.56 | 6.79 | 1.63 | −1.32 |
First-semester grade | 6.31 | 6.54 | 1.82 | −0.99 |
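
The skewness column above can be read alongside the mean and median: a negative value goes with a long left tail and a median above the mean. The source does not state which skewness estimator was used, so the sketch below assumes the plain moment coefficient (third central moment divided by the 1.5 power of the second). The sample values are invented for illustration only.

```python
import statistics


def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Invented sample: most students pass 60 ECTS, one passes none.
ects_passed = [0, 60, 60, 60, 60]
print(statistics.mean(ects_passed), statistics.median(ects_passed))
print(moment_skewness(ects_passed))
```

In this toy sample the single low value pulls the mean below the median and makes the coefficient negative, the same pattern as several rows of the table.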
Importance Order | Random Forest: Variable | Random Forest: %IncMSE | XGBoost: Variable | XGBoost: Gain |
---|---|---|---|---|
1 | Access grade | 101.14 | Access grade | 0.4768 |
2 | Academic amount | 51.55 | Academic amount | 0.0683 |
3 | No. of ECTS enrolled 1st semester | 44.49 | No. of ECTS enrolled 1st year | 0.0661 |
4 | No. of ECTS enrolled 1st year | 37.86 | No. of ECTS enrolled 1st semester | 0.0631 |
5 | Gender | 34.19 | Admission option | 0.0560 |
6 | Admission option | 33.44 | Gender | 0.0543 |
7 | Family township | 33.23 | Age | 0.0462 |
8 | Location of the school | 31.19 | Administrative fee | 0.0461 |
9 | Scholarship holder | 28.93 | Father’s or guardian’s level of studies | 0.0273 |
10 | Administrative fee | 25.34 | Family township | 0.0201 |
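
%IncMSE is the label the R `randomForest` package gives to permutation-based importance: how much the mean squared error grows when one predictor's values are shuffled. The sketch below illustrates only that core idea in plain Python. It is a simplification under stated assumptions: it uses a fixed toy model and a single shuffle on the full data, whereas a random forest implementation works per tree on out-of-bag samples and rescales the result. All names and data here are hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, seed=0):
    """Increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        increases.append(mse(model, permuted, targets) - baseline)
    return increases


# Toy data: the target depends on feature 0 only; feature 1 is irrelevant.
rows = [[float(i), float((i * 7) % 20)] for i in range(20)]
targets = [2.0 * r[0] for r in rows]


def toy_model(row):
    return 2.0 * row[0]


importances = permutation_importance(toy_model, rows, targets, n_features=2)
print(importances)
```

Shuffling the feature the model relies on raises the error, while shuffling the ignored feature leaves it unchanged, which is why larger values in the table indicate more influential variables.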
Importance Order | Random Forest: Variable | Random Forest: %IncMSE | XGBoost: Variable | XGBoost: Gain |
---|---|---|---|---|
1 | Access grade | 114.36 | Access grade | 0.4856 |
2 | Academic amount | 50.32 | Academic amount | 0.0816 |
3 | No. of ECTS enrolled 1st semester | 48.89 | No. of ECTS enrolled 1st semester | 0.0747 |
4 | No. of ECTS enrolled 1st year | 39.19 | No. of ECTS enrolled 1st year | 0.0605 |
5 | Scholarship holder | 32.98 | Age | 0.0511 |
6 | Gender | 32.15 | Admission option | 0.0502 |
7 | Admission option | 31.13 | Administrative fee | 0.0410 |
8 | Family township | 28.27 | Gender | 0.0397 |
9 | Location of the school | 27.72 | Father’s or guardian’s level of studies | 0.0220 |
10 | Admission study | 24.75 | Family township | 0.0217 |
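
One way to read the table above is to ask how far the two selection methods agree. The sketch below copies the two top-10 lists from the table and reports the variables chosen by both methods and by only one of them. The variable names come from the table; the comparison itself is an illustrative addition, not an analysis reported in the source.

```python
rf_top10 = [
    "Access grade",
    "Academic amount",
    "No. of ECTS enrolled 1st semester",
    "No. of ECTS enrolled 1st year",
    "Scholarship holder",
    "Gender",
    "Admission option",
    "Family township",
    "Location of the school",
    "Admission study",
]
xgb_top10 = [
    "Access grade",
    "Academic amount",
    "No. of ECTS enrolled 1st semester",
    "No. of ECTS enrolled 1st year",
    "Age",
    "Admission option",
    "Administrative fee",
    "Gender",
    "Father’s or guardian’s level of studies",
    "Family township",
]

xgb_set = set(xgb_top10)
rf_set = set(rf_top10)

# Keep the original ranking order in each derived list.
shared = [v for v in rf_top10 if v in xgb_set]
only_rf = [v for v in rf_top10 if v not in xgb_set]
only_xgb = [v for v in xgb_top10 if v not in rf_set]

print(len(shared), "shared:", shared)
print("Random Forest only:", only_rf)
print("XGBoost only:", only_xgb)
print("Same top four in the same order:", rf_top10[:4] == xgb_top10[:4])
```

Both methods rank the same four variables first and in the same order, and they differ only in the lower half of the list.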
Feature Selection | Machine Learning Method | First-Year Grade: R² | First-Year Grade: RMSE | First-Semester Grade: R² | First-Semester Grade: RMSE |
---|---|---|---|---|---|
Random Forest | LR | 0.17 | 1.48 | 0.20 | 1.59 |
Random Forest | SVR | 0.18 | 1.47 | 0.23 | 1.57 |
Random Forest | RF | 0.22 | 1.42 | 0.23 | 1.56 |
Random Forest | XGBoost | 0.21 | 1.46 | 0.23 | 1.58 |
XGBoost | LR | 0.17 | 1.48 | 0.21 | 1.59 |
XGBoost | SVR | 0.18 | 1.47 | 0.21 | 1.59 |
XGBoost | RF | 0.22 | 1.43 | 0.24 | 1.57 |
XGBoost | XGBoost | 0.22 | 1.47 | 0.22 | 1.59 |
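
The two metrics in the table above have standard definitions: RMSE is the square root of the mean squared prediction error, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes both in plain Python. The grades are invented, and the source does not say here whether the reported values come from a held-out test set or from cross-validation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented grades on the 0-10 scale used in the study.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.5, 6.5, 7.5]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

RMSE is expressed in grade points, so a value near 1.5 means a typical error of about one and a half points on the 10-point scale.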
Variable (RF feature selection) | First-Semester Grade (coefficient) | Variable (XGBoost feature selection) | First-Semester Grade (coefficient) |
---|---|---|---|
Intercept | 6.33 | Intercept | 6.19 |
Access grade | 0.60 | Access grade | 0.61 |
Academic amount | −0.14 | Academic amount | −0.22 |
No. of ECTS enrolled 1st semester | 0.05 | No. of ECTS enrolled 1st semester | 0.06 |
No. of ECTS enrolled 1st year | 0.19 | No. of ECTS enrolled 1st year | 0.20 |
Scholarship holder | 0.02 | Age | 0.12 |
Gender | −0.36 | Admission option | −0.88 |
Admission option | −0.91 | Administrative fee | 0.21 |
Family township | 0.35 | Gender | −0.36 |
Location of the school | 0.07 | Father’s or guardian’s level of studies | 0.01 |
Admission study | −0.14 | Family township | 0.43 |
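
A table with an intercept and one value per variable can be applied as a linear model: the prediction is the intercept plus the sum of each coefficient times its feature value. The sketch below does this with the left-hand column of the table above (the variables selected by Random Forest). The table does not state how the predictors were scaled or encoded, so the feature values passed in are assumed to be already in the model's own scale; the function name and the example inputs are hypothetical.

```python
RF_SELECTION_COEFFICIENTS = {
    "Intercept": 6.33,
    "Access grade": 0.60,
    "Academic amount": -0.14,
    "No. of ECTS enrolled 1st semester": 0.05,
    "No. of ECTS enrolled 1st year": 0.19,
    "Scholarship holder": 0.02,
    "Gender": -0.36,
    "Admission option": -0.91,
    "Family township": 0.35,
    "Location of the school": 0.07,
    "Admission study": -0.14,
}


def predict_first_semester_grade(features, coefficients=RF_SELECTION_COEFFICIENTS):
    """Intercept plus the weighted sum of the supplied feature values.

    Features missing from `features` are treated as 0, which is an
    assumption about the reference level, not something the table states.
    """
    prediction = coefficients["Intercept"]
    for name, coefficient in coefficients.items():
        if name != "Intercept":
            prediction += coefficient * features.get(name, 0.0)
    return prediction


# Hypothetical input: one unit above the reference level on access grade.
print(round(predict_first_semester_grade({"Access grade": 1.0}), 2))
```

With every feature at its reference level the prediction equals the intercept, so each coefficient can be read as the change in predicted grade per unit of that feature, other features held fixed.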
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sánchez-Sánchez, A.M.; Mello-Román, J.D.; Segura, M.; Hernández, A. Identifying the Determinants of Academic Success: A Machine Learning Approach in Spanish Higher Education. Systems 2024, 12, 425. https://doi.org/10.3390/systems12100425