Review

Factors, Prediction, Explainability, and Simulating University Dropout Through Machine Learning: A Systematic Review, 2012–2024

by Mauricio Quimiz-Moreira 1,*, Rosa Delgadillo 1,*, Jorge Parraga-Alava 2, Nelson Maculan 3 and David Mauricio 1

1 Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional Mayor de San Marcos, Lima 15081, Peru
2 Facultad de Ciencias Informáticas, Universidad Técnica de Manabí, Portoviejo 130104, Ecuador
3 Systems Engineering-Computer Science and Applied Mathematics, CT & CCMN, Campus—Ilha do Fundão, Federal University of Rio de Janeiro, Rio de Janeiro 21941-617, Brazil
* Authors to whom correspondence should be addressed.
Computation 2025, 13(8), 198; https://doi.org/10.3390/computation13080198
Submission received: 3 July 2025 / Revised: 4 August 2025 / Accepted: 6 August 2025 / Published: 12 August 2025

Abstract

College dropout represents a significant challenge for universities, and despite advances in machine learning technologies, predicting dropout remains a complex task. This literature review, conducted with the PRISMA methodology, investigates the factors that influence college dropout, examines the models used to predict it, and highlights the most significant advances in explainability and simulation over the period 2012 to 2024. We identified 520 factors in five categories (demographic, socioeconomic, institutional, personal, and academic), with the most studied factors in each category being, respectively, gender, scholarships, infrastructure, student identification, and grades. We also identified 83 machine learning models, the most studied being the decision tree, logistic regression, and random forest. In addition, eight explainability models were identified, with SHAP and LIME being the most widely used. Finally, no simulation models related to university dropout were identified. This study groups the factors related to university dropout into key models for prediction and analyzes the methods used to explain the causal factors that influence university student dropout.

1. Introduction

Globally, one in three students drops out of higher education (HE), a phenomenon largely influenced by personal factors and the associated social costs [1]. Additionally, UNESCO data indicate that approximately 30% of students entering HE fail to complete their studies [2]. In Europe, the Organization for Economic Co-operation and Development (OECD) reports that university dropout rates (UED) varied between 30% and 45% as of 2022 [3]. These figures have generated growing interest in mitigating university dropout (UD) and fostering student retention, issues that have acquired strategic relevance for higher education, as reducing dropout directly contributes to the development of high-level skills and the strengthening of human capital [4].
In South Africa, the higher education system faces one of the highest student dropout rates globally, resulting in an extremely low graduation rate, estimated at only 15% [5]. This phenomenon not only significantly decreases the student population in HEIs [6] but also adversely impacts society by failing to meet the growing demand for professionals needed for economic and social development [7], constituting a critical challenge for both the education system and society as a whole [8].
Moreover, in Latin America, the impact of student dropout has also resulted in considerable economic and social losses. In 2019, approximately 26% of students dropped out of school, evidencing a structural problem that, according to some experts, stands as a key indicator of deficiencies in the quality of the education system [9]. This phenomenon has significant implications for the students themselves, who face multiple barriers to continuing their academic education. Among these barriers are the lack of financial resources to cover their studies, economic dependence on parents or external subsidies, and the limitations imposed by the labor market’s demands, which do not always allow for combining study with part-time employment [10].
A key approach to mitigating UED is to identify which students are more likely to drop out, analyze the underlying causes, and establish effective strategies to encourage their academic continuity [11]. In this context, several factors associated with UED that increase the likelihood of student dropout have been studied [12]. The implementation of machine learning (ML) models has proven to be an effective tool for predicting UED [13], while explainable artificial intelligence (XAI) models have gained relevance for their ability to identify and explain the causes of dropout. In addition, simulation is used to model early-stage UED risk scenarios, allowing for the anticipation of the behavior of students at high risk of dropping out and to simulate the effects of tutoring, remedial courses, or socioeconomic support for students with a high probability of dropping out [14].
The prediction process in ML encompasses several fundamental stages that ensure the efficiency and accuracy of the developed model [15]. It starts with data collection, where large volumes of relevant and representative information are gathered and stored. This is followed by data preprocessing, which includes cleaning, normalization, and transformation to address inconsistencies, outliers, or missing values, ensuring data quality and usability. Subsequently, feature selection is performed to identify and prioritize the most influential variables, thereby optimizing both the performance and simplicity of the model. With the preprocessed data and the selected features, the model is built, adjusting hyperparameters and training with techniques appropriate to the type of problem, whether classification, regression, or clustering [16]. Finally, the model is deployed in a real environment to make predictions and support decision-making. At the same time, its performance is monitored using metrics such as accuracy, sensitivity, and specificity, allowing continuous adjustments to maintain its effectiveness under dynamic conditions [17].
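As a rough sketch of these stages, the following Python fragment chains normalization, feature selection, and a decision-tree classifier with scikit-learn; the synthetic dataset and all hyperparameters are illustrative stand-ins for real student records, not the setup of any reviewed study:

```python
# Minimal sketch of the prediction pipeline described above, using
# scikit-learn and synthetic data in place of a real student dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, recall_score

# 1. Data collection (here: a synthetic stand-in for student records)
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)

# 2-4. Preprocessing, feature selection, and model building in one pipeline
pipe = Pipeline([
    ("scale", StandardScaler()),               # normalization
    ("select", SelectKBest(f_classif, k=4)),   # keep most influential features
    ("model", DecisionTreeClassifier(max_depth=4, random_state=0)),
])

# 5. Train, predict, and monitor performance metrics
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
pipe.fit(X_tr, y_tr)
y_pred = pipe.predict(X_te)
print("accuracy:", accuracy_score(y_te, y_pred))
print("sensitivity (recall):", recall_score(y_te, y_pred))
```

Bundling the steps in a `Pipeline` ensures that scaling and feature selection are fit only on the training split, avoiding leakage into the held-out evaluation.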
Several studies have been conducted on the prediction of UED using machine learning (ML). The decision tree achieved an accuracy of 99.34% in identifying the most relevant factors for predicting attrition [18]. Similarly, logistic regression was applied at the Universidade de Trás-os-Montes e Alto Douro (UTAD), yielding accuracies of 88% and 90% in two separate studies [15].
In addition, explanation methods such as SHAP (Shapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) enable a better understanding of and increased confidence in ML models by describing the factors that influence the prediction results [19]. SHAP calculates the impact of each factor on the prediction and its interaction with others, and has been used at the Budapest University of Technology and Economics [1]. LIME provides interpretations independent of the model's internal architecture: by applying local perturbations to the input data, it constructs an interpretable model, such as a linear regression, that approximates the behavior of the original model and enables the identification of the most influential variables [20].
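To make the Shapley idea behind SHAP concrete, the following self-contained sketch computes exact Shapley values for a toy linear "dropout risk" score; the feature names, weights, and baseline values are invented for illustration and do not come from any reviewed study:

```python
# From-scratch sketch of the Shapley-value computation underlying SHAP:
# each feature's contribution is its marginal effect averaged over all
# coalitions of the other features. Names and weights are hypothetical.
from itertools import combinations
from math import factorial

FEATURES = ["gpa", "attendance", "scholarship"]
WEIGHTS = {"gpa": -2.0, "attendance": -1.0, "scholarship": -0.5}
BASELINE = {"gpa": 3.0, "attendance": 0.8, "scholarship": 0.5}  # "average" student

def model(x):
    """Toy risk score: lower GPA/attendance -> higher dropout risk."""
    return sum(WEIGHTS[f] * x[f] for f in FEATURES)

def value(coalition, x):
    """Model output when only features in `coalition` take their real
    values; the rest stay at the baseline."""
    mixed = {f: (x[f] if f in coalition else BASELINE[f]) for f in FEATURES}
    return model(mixed)

def shapley(feature, x):
    """Exact Shapley value of `feature` for instance `x`."""
    others = [f for f in FEATURES if f != feature]
    n = len(FEATURES)
    total = 0.0
    for k in range(len(others) + 1):
        for coal in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (value(set(coal) | {feature}, x)
                               - value(set(coal), x))
    return total

student = {"gpa": 2.0, "attendance": 0.5, "scholarship": 0.0}
phi = {f: shapley(f, student) for f in FEATURES}
print(phi)
# Additivity: contributions sum to prediction minus baseline prediction
print(sum(phi.values()), model(student) - value(set(), student))
```

The additivity check printed at the end is the "importance of additive measures" property mentioned above: the per-feature contributions exactly decompose the gap between this student's prediction and the baseline prediction.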
Regarding simulation, some studies have employed the Generalized mixed-effects random forest (GMERF) model, as noted by [14,21]. However, this approach is not a simulation model in the strict sense but rather a form of structured or implicit simulation, where GMERF allows the emulation of hierarchical relationships between different levels of the educational system, such as student, academic program, and institution, to anticipate dropout risks based on historical patterns.
Due to the proliferation of publications on UED, several authors have developed systematic reviews. For example, in [22], 67 papers published between 2006 and 2018 were analyzed, including nine conference papers, identifying 112 factors, 10 preprocessing techniques, 10 factor-selection techniques, 14 prediction methods, and two tools. Similarly, ref. [23] reviewed 17 articles published between 2009 and 2021, identifying 14 machine learning (ML) models and 29 factors associated with attrition. Additionally, ref. [24] identified 23 UD-related research papers that described 373 factors and 13 ML algorithms. Ref. [25] analyzed more than 50 papers published between 2013 and 2017, identifying 12 models, five categories, and more than 30 factors. Ref. [26] collected 33 papers between 2010 and 2022, identifying 50 models and 10 categories of factors. Ref. [27] reviewed 134 research papers published between 2016 and 2021, identifying more than five categories, 30 models, and five types of anomalies. Ref. [28] analyzed 67 papers between 2017 and 2021, finding more than 30 prediction models and 40 factors in 5 categories. Ref. [29] analyzed 12 papers on the prediction of UED, finding six sensitive factors, five preprocessing techniques, and 5 ML models. Ref. [30] reviewed 36 studies between 2000 and 2023, identifying 8 ML models, 15 factors, and 3 factor categories. Finally, ref. [24] identified 23 studies from 2000 to 2023, finding 373 factors and 16 models and highlighting the XAI limitation in black-box models.
These literature reviews reveal an extensive body of work focusing on the prediction of UED, ranging from the identification of factors to the use of various models. However, these studies generally focus exclusively on predictive ability without incorporating the explainability of the results or exploring UED risk scenarios. This limitation reduces the possibility of gaining a deeper and more detailed understanding of the factors contributing to the prediction of UED, which is essential for developing effective and personalized interventions.
This research conducts a systematic review of the prediction, explanation, and simulation of UD using ML, covering publications in journals indexed in Scopus and Web of Science (WoS) from 2012 to 2024. The central objective is to answer the following research question: what are the most relevant ML models for predicting, explaining, and simulating university dropout?
The main contributions of this article are:
  • To provide an inventory of factors, predictive, explanatory, and simulation models for UD.
  • To provide the reader with a wide range of bibliographical references to understand and research UED using ML.
The paper is structured in six sections. Section 2 presents the theoretical background, Section 3 details the methodology applied in this systematic review, Section 4 presents the results that answer the research questions posed, Section 5 discusses these findings, and Section 6 draws the conclusions.

2. Theoretical Background

According to Tinto [31,32,33], student retention depends to a large extent on the level of academic and social integration that the student achieves within the institution. The lack of links with teachers, peers, and the university environment increases the risk of dropout. Subsequent studies have supported this view, highlighting that a sense of belonging and institutional support are determinants of persistence and academic success [34]. Moreover, one of the current challenges for HEIs is to leverage the large volume of available data to design strategies that improve student retention in universities [35].
This section examines the principles of ML and their application in addressing university dropout. By analyzing these principles in detail, ML becomes a crucial element for mitigating UED and implementing institutional policies.

2.1. University Dropout

UED is the definitive interruption of the educational process in HE without the award of a qualification. This phenomenon is of significant concern to educational institutions and governments because of its negative implications for the personal development of students and the economic growth of societies [36,37].
For [38], UED must be understood beyond prolonged absence or lack of enrollment because it implies a total disconnection of the student from the institutional environment, motivated by multiple factors that can range from economic hardship to a lack of a sense of belonging. Furthermore, UED is a multifactorial phenomenon influenced by academic, personal, social, institutional, and economic variables [39]. Factors such as poor academic performance, dissatisfaction with the study program, financial difficulties, and lack of institutional support are key determinants in the decision to drop out [34].

2.2. Machine Learning

ML is a sub-discipline of AI that enables computer systems to learn from data and generate predictions without needing to be programmed for each situation, allowing for the analysis of large volumes of HE-related information. According to [40], ML algorithms are particularly effective at identifying hidden patterns in academic, demographic, and behavioral data, making them ideal for addressing complex phenomena such as college dropout. Similarly, refs. [41,42] highlight that the accuracy of these models enables reliable predictions of student performance and dropout risk, thereby fostering informed institutional decisions. According to [43], by integrating multiple data sources, such as grades, attendance, participation in virtual platforms, and socioeconomic variables, ML algorithms can reliably anticipate which students are at risk of dropping out.
In addition, ML has been instrumental in building smart and adaptive learning environments that respond to the individual needs of each learner [44]. These systems can dynamically adjust the content, the pace of instruction, and assessment methods according to the learner’s profile and performance, contributing to a more personalized learning experience. This adaptability has proven crucial in fostering student motivation, engagement, and retention. As stated by [45], the application of ML also enables teachers to monitor in real-time the effectiveness of their pedagogical strategies, facilitating continuous feedback that improves the quality of the teaching-learning process and, consequently, reduces the risk of dropout.

2.3. Explainable Artificial Intelligence (XAI)

XAI is a subfield of machine learning that focuses on providing understandable interpretations of predictive models, particularly in environments where decisions must be auditable and justifiable. The aim is to make the decisions of complex so-called black-box models understandable, allowing non-expert users to interpret and trust their results [46]. In addition, XAI contributes to algorithmic fairness by facilitating the detection of biases, which reinforces transparency and institutional accountability in the use of automated decision support systems [20].

2.4. Simulation

Simulation is a computational tool that focuses on representing, exploring, and anticipating complex scenarios based on initial conditions or hypothetical intervention strategies. Unlike traditional predictive models, which are limited to estimating the probability of event occurrence, such as university dropout, simulation-based approaches enable virtual experimentation with different system configurations, serving as a laboratory where causes, consequences, and cumulative effects can be explored. In this sense, [47] highlights the integration of artificial intelligence with the simulation of complex systems as a way to optimize overall performance, identify critical variables, and design more effective interventions. Thus, simulation in machine learning not only enhances predictive capacity but also significantly contributes to strategic decision-making, supported by evidence generated in a controlled and replicable manner.

3. Materials and Methods

The present study is based on an adaptation of the procedure presented in [48] and used in [49], and is structured in the following phases:
  • Planning: in this phase, the research questions are established, and the review protocol is defined. This protocol outlines the sources of information used, the criteria for including and excluding studies, the data search strategy, and the period considered for the review.
  • Development: primary studies are selected according to the plan, and their quality is then assessed for data extraction and synthesis.
  • Results: the results and statistical analyses, which provide answers to the research questions, are presented in Section 3.3 and Section 4, respectively.

3.1. Planning

To address the key issues related to understanding the factors, prediction, explainability, and simulation of UED, the following research questions have been formulated:
  • Q1. What factors exist for UED, and which are the most studied?
  • Q2. What machine learning models are used for predicting UED?
  • Q3. What are the advances of XAI in UED?
  • Q4. What simulation models exist for UED?
The search string specified in Table 1 was used and applied to the fields ‘title-abstract-keywords’ in Scopus and ‘topic’ in Web of Science (WoS). Articles published in journals in the period 2012–2024 were considered. The criteria for including and excluding studies are detailed in Table 2.

3.2. Development

Potential studies identified during the search were subjected to a rigorous selection process based on the inclusion and exclusion criteria detailed in Table 2. This process involved a thorough review of the content of each study to assess its relevance concerning factors, prediction, explainability, and simulation of UED using ML. Most of the studies were excluded because they focused on areas such as secondary or primary education. Figure 1 illustrates this selection process and describes the specific activities undertaken to determine whether studies are included or excluded.

3.3. Statistics

For the collection of information, a search was conducted for relevant research related to university dropout, attrition, and academic withdrawal to ensure adequate coverage of ML studies in HE. The articles retained after the initial selection were reviewed in their entirety, ensuring that they provided empirical evidence, significant theoretical contributions, and applications of machine learning in the analysis of UED.

3.3.1. Number of Potential and Selected Items

Table 3 presents the number of potential and selected articles by source. Note that the total number of selected articles corresponds to 15.52% of the total number of potential studies.

3.3.2. Trend of Articles per Year

Figure 2 illustrates the trend in the number of selected papers on UED by year of publication, showing an approximately 5-to-1 ratio between the periods 2019–2024 (104 papers) and 2012–2018 (18 papers). This indicates an increasing trend of studies since 2019.

3.3.3. Number of Authors by Country of Affiliation

Figure 3 shows the geographical distribution of author affiliations for the 100 selected papers on UED. Forty-seven countries have been identified, with Peru standing out as the country contributing the most to the topic (19% of contributions), followed by Spain with 8%.

3.3.4. Selected Articles by Quartile

Figure 4 shows the number of studies selected by quartile for this analysis. Notably, 46% of the articles belong to the Q2 quartile, while 34% are classified in the Q1 quartile, indicating that the selected articles are of high quality and scientific relevance.

3.3.5. Selected Articles by Publisher

Figure 5 shows the selected papers by publisher: MDPI predominates with 19%, IEEE Xplore accounts for 14%, and journals published by universities account for 10%. In addition, the ‘Others’ category comprises 22 articles from publishers with only one published article each.

4. Results

This section addresses the research questions posed in Section 3.1, based on the selected articles.

4.1. What UED Factors Exist, and Which Are the Most Studied?

For a better understanding of the factors influencing UED, these must be categorized. Therefore, five categories of factors have been identified from the selected studies, which are described in Table 4.
A total of 520 UED factors were identified in the 122 selected articles and classified according to Table 4. It should be noted that this total counts all individual mentions extracted from the articles prior to the standardization and grouping process. Subsequent tables show consolidated factors and, in some cases, analyze subsets of articles according to criteria specific to each analysis. Therefore, differences in factor and article counts between tables reflect the synthesis and classification methodology employed, rather than the inclusion of sources other than WoS or Scopus.

4.1.1. Demographic Factors

In this category, 75 demographic factors were identified across 83 articles, with gender and age being the most frequently studied, at 50 (60%) and 43 (52%), respectively. Table 5 presents the 10 most relevant demographic factors; the complete list of 75 identified factors is available in Table A1.

4.1.2. Socioeconomic Factors

Within this category, 80 factors were identified in 64 articles, with scholarships and jobs being the most studied, with 16 (22%) and 12 (19%) mentions, respectively. Table 6 presents a summary of the 10 most relevant socioeconomic factors, while the full list of factors identified in this category is available in Table A2.

4.1.3. Institutional Factors

A total of 23 institutional factors were identified in 12 articles, with infrastructure, educational services, adequate equipment, and location being the most studied, each cited in 2 articles (17%). Table 7 summarizes the 10 most relevant institutional factors, while the complete list of identified factors is available in Table A3.

4.1.4. Personal Factors

In this category, 138 personal factors were identified across 50 articles, with the year of entry being the most researched aspect, mentioned in 10 articles (20%). Table 8 presents the 10 most relevant personal factors, and Table A4 provides the complete list of identified factors.

4.1.5. Academic Factors

In total, 206 academic factors were identified across 90 articles, with grades and subjects being the most frequently investigated aspects, mentioned in 17 (19%) and 16 (18%) articles, respectively. Table 9 presents the 10 most relevant academic factors, while Table A5 provides the complete list of identified factors.

4.1.6. Summary of Categories

Figure 6 illustrates the number of UED factors by category (demographic, socioeconomic, institutional, personal, and academic) and by frequency of occurrence in the reviewed studies. Most institutional and personal factors have a low frequency (1–5 studies), while the academic and demographic categories concentrate the factors with the highest recurrence (6 to 20 or more studies).

4.2. Which ML Models Are Used for Predicting UED?

To address UED, 149 ML-based prediction models (individual and hybrid) were identified in 86 selected articles, of which 38% employed decision trees (DT), 26% logistic regression (LR), and 22% random forests (RFs); note that a single study can encompass multiple models. Other models analyzed include support vector machines (SVMs), present in 27 articles (21%), and artificial neural networks (ANNs), present in 21 articles (16%). Table 10 presents a synthesis of the most relevant prediction models (those used in at least two studies), with accuracy as the reported metric; the complete list of identified models is available in Table A6.
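As an illustration of how such model comparisons are typically run, the sketch below cross-validates the three most-studied predictors on synthetic data; it is not a reproduction of any reviewed study, and real work would use institutional records and tuned hyperparameters:

```python
# Hedged sketch comparing the three most-studied UED predictors
# (DT, LR, RF) on a synthetic stand-in for a student dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           random_state=1)

models = {
    "decision tree": DecisionTreeClassifier(random_state=1),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=1),
}
# 5-fold cross-validated accuracy for each model
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Cross-validation rather than a single train/test split gives a more stable estimate, which matters when, as here, the reviewed studies report a single accuracy figure per model.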
As for the preprocessing steps, the most used techniques were data cleaning and data transformation. It should be noted that 26% of the works did not use preprocessing techniques (see Table 11).

4.3. What Progress Has XAI Made in the UED?

Eight XAI models for UED (SHAP, LIME, GPI, AM, SAGE, ANFIS, PEI, and PEM-SNN) have been identified in 11 of the 122 selected studies, with SHAP and LIME standing out; they are described in Table 12. SHAP assigns an importance value to each feature for a specific prediction and highlights the importance of additive measures [149]. LIME provides a local approximation that allows for a precise understanding of the specific factors most critical to the prediction of UED, which serve as inputs not only to validate the model’s behavior but also to design institutional intervention strategies [20].
Table 13 presents the combination of the main factors associated with university dropout and the XAI models used to explain them in the studies reviewed. SHAP is the most widely used model, encompassing a diverse range of factors, including GPA, cumulative credits, age, family income, attendance, scholarships, gender, participation, personality, hours of study, and grades on assignments or exams. LIME is used to explain outcomes related to GPA and test scores. Models such as GPI, AM, SAGE, and PEI have focused on academic variables, including GPA, cumulative credits, and homework grades, while addressing other factors in an ad hoc manner. ANFIS has been applied to explain predictions based on age, income, and hours of study, whereas PEM-SNN stands out for its ability to integrate the explanation of multiple key factors, including GPA, accumulated credits, age, income, scholarships, and gender.

4.4. What Simulation Models Exist for the UED?

To date, no simulation studies have been identified for UED, despite its importance in analyzing scenarios for understanding dropout, such as its causes and behaviors.

5. Discussion

The result of this systematic review is a comprehensive catalog that includes factors, prediction models, explanation methods, and simulation methods focused on UED. This catalog provides a comprehensive overview that contributes to the understanding of higher education attrition through ML and establishes strategies to maximize student retention. The quality of the results is confirmed by the fact that 65% of the selected papers are from journals in the top two quartiles.

5.1. About Factors

Five categories of factors have been identified: demographic, socioeconomic, institutional, personal, and academic. The most studied factors in each category are, respectively, gender, student scholarships, university infrastructure, year of entry, and grades. Gender influences academic behavior, career choice, and experiences of discrimination, exacerbated by social and cultural norms that can demotivate students and increase dropout rates. Student scholarships offer financial assistance to low-income students, enabling them to continue their studies. The quality of the university’s infrastructure significantly contributes to an enriching and fulfilling educational experience, which is essential for fostering a positive academic environment. The year of entry affects UED due to changes in educational policies and resources, which vary over time and can impact student adjustment and success. In addition, depending on the economic and social context of each year, recessions can increase financial hardship and dropout rates. Finally, grades are key indicators of academic difficulties, allowing for timely interventions that can prevent students from dropping out. These elements are a priority for research because they can be assessed objectively, providing a solid basis for formulating effective retention policies and strategies.
The relevance of these factors is reflected in Figure 6, which shows a high concentration of academic and demographic factors (more than 11 studies). In comparison, institutional and personal factors predominate in the low frequency ranges (1–5 studies). This pattern indicates the prioritization of measurable and easily accessible factors, relegating institutional and personal aspects to the background. Thus, the heat map not only summarizes the relative weight of each group of factors but also highlights existing gaps and directs future research towards less-explored dimensions.
Furthermore, understanding the interrelationships between factors related to UED is crucial for designing more effective intervention strategies. In [154], causal inference techniques were used to analyze the impact of academic load in the first year on the risk of dropping out, showing that lower academic load significantly reduces the probability of dropping out, particularly in students with high academic vulnerability, demonstrating the usefulness of causal models to design more targeted interventions. The Qatar University (QU) study employed structural equation modeling (SEM) to identify the factors influencing the perception of the institutional image. The results showed that student services represent the factor with the most significant positive impact, followed by administrative feedback and academic services [155]. Ref. [156] develops a model based on Mixture Structural Equation Models (MSEM) to classify students who continue or drop out of university studies in Latin America. The model incorporates variables such as student health, interpersonal relationships, and class attendance, showing that adaptation to university has a positive impact on academic satisfaction.
It is noted that some factors are more challenging to assess due to their inherent complexity and variability, for example, factors related to psychological aspects. Therefore, studies prioritize more tangible and easily measurable factors, such as academic performance, financial support, and demographic characteristics [157].

5.2. About the Model

A total of 149 ML models were identified for UED. This variety indicates a strong interest in optimizing and improving outcomes; it reflects the continuous effort to better understand the causal factors that lead a student to drop out and underscores the adaptability of ML to the specific needs of higher education.
Among the most widely used models for predicting university dropout are decision trees (DT), logistic regression (LR), and random forests (RFs), with 49, 34, and 28 studies, respectively, as detailed in Table 10. These algorithms were chosen for their ability to handle incomplete information, high inter-record variability, and complex relationships between the different groups of factors influencing UED [158]. On the other hand, although deep learning models have shown superior performance in other areas, their use in UED prediction is not always satisfactory. This is because educational data are usually smaller in size and present high variability in record quality, conditions that negatively affect the training of deep networks [159]. Moreover, due to the high complexity of these models, which include millions of parameters distributed across multiple nonlinear layers, it is difficult to understand how each input variable contributes to the UED prediction [46].
In terms of preprocessing techniques, the most used is data cleaning, as it ensures the accuracy, completeness, and quality of the information by eliminating anomalies and biases in the dataset. In 26% of the studies, no preprocessing techniques were used, which may be because the models can process categorical variables directly and provide improved decisions without numerical coding.
In addition, preliminary UED research, as in [82], reports an accuracy of 97.6%, but this indicator can be misleading due to the unbalanced distribution of the data, where the number of students who do not drop out considerably exceeds the number of dropouts. In such situations, accuracy loses relevance as an evaluation measure, since a model that classifies all students as non-dropouts could achieve a value of 99.5% if dropouts represent only 0.5%, even without correctly identifying a single dropout. Faced with this problem, we recommend the use of more appropriate metrics, such as the F1-score, which integrates precision and sensitivity (recall), offering a balance between the effective identification of the minority class (dropouts) and the reduction of false positives, positioning it as a more reliable metric for evaluating models in contexts with data imbalance.
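The arithmetic behind this caveat can be checked directly; the following sketch scores the trivial "nobody drops out" model on a hypothetical cohort with a 0.5% dropout rate:

```python
# Numeric sketch of the pitfall described above: with a 0.5% dropout
# rate, predicting that everyone stays reaches 99.5% accuracy but an
# F1 of 0 for the dropout class. Pure Python, no libraries needed.
n_students = 10_000
n_dropouts = 50                      # 0.5% minority class (hypothetical)
y_true = [1] * n_dropouts + [0] * (n_students - n_dropouts)
y_pred = [0] * n_students            # trivial model: nobody drops out

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_students
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"accuracy: {accuracy:.3f}")      # 0.995, despite catching no dropouts
print(f"F1 (dropout class): {f1:.3f}")  # 0.000
```

Because the F1-score is zero whenever no true dropout is identified, it exposes exactly the failure mode that raw accuracy hides.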

5.3. About Explication

Only 11 studies were found that applied eight explainability models; SHAP was the most widely used, owing to its ability to decompose the contribution of each feature to an ML model's predictions intuitively and transparently. The scarcity of studies on explainability is due to its limited integration in education, a lack of awareness within the educational community of the importance of explainability in predictive models, and limited access to the computational resources that such analyses require.
Furthermore, despite the increase in XAI models, most focus on a small number of academic and demographic factors, leaving institutional, personal, and behavioral variables largely unexamined (see Table 13). This asymmetry highlights an opportunity to extend XAI models to these less-addressed factors, thereby enriching the comprehensive understanding of university dropout.
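As a rough illustration of the additive idea behind SHAP (not the full SHAP algorithm): for a linear model, the contribution of each feature to one student's prediction reduces to the coefficient times that feature's deviation from its mean. The feature names in the comments are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Three illustrative standardized features, e.g., GPA, attendance, credits earned.
X = rng.normal(size=(500, 3))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# For one student, decompose the log-odds into per-feature contributions:
# contribution_i = coef_i * (x_i - mean_i), which is the SHAP value of a
# linear model with independent features.
student = X[0]
contrib = model.coef_[0] * (student - X.mean(axis=0))
```

Libraries such as the `shap` package generalize this decomposition to nonlinear models (e.g., the tree ensembles that dominate Table 11), which is what the reviewed studies actually apply.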

5.4. About Simulation

In this study, no research on simulating UED was identified, indicating a lack of work on simulating scenarios to address dropout. This may be because UED is a complex phenomenon, affected by socioeconomic, academic, personal, and institutional factors whose interactions are difficult to model for practical simulations. Moreover, running such simulations requires access to detailed and sensitive data on student behavior, which is often restricted for privacy reasons or simply unavailable.

5.5. Factors, Prediction, Explanation, and Simulation

The UED approach articulates four fundamental pillars: factors, prediction, explanation, and simulation, which interact in an integrated manner, as illustrated in Figure 7, to mitigate student dropout. The first component, factor analysis, identifies the most determinant variables in dropout risk across demographic, academic, socioeconomic, personal, and institutional dimensions. Based on the data associated with these factors, ML models then generate individualized predictions of the probability of dropout. When the model predicts a dropout scenario, explainability mechanisms decompose the outcome, identifying the underlying causes that contribute to the risk. This explanatory capability gives institutional managers an in-depth understanding of the critical factors, facilitating the formulation of intervention scenarios through simulation. At this stage, the specialist adjusts the risk factors in the models, exploring alternatives that lead to a retention scenario. This iterative factors-prediction-explanation-simulation cycle can be repeated until a retention scenario is reached. Finally, analysis of the retention scenario also quantifies the degree of influence of each factor, providing precise inputs for the design and implementation of personalized strategies aimed at enhancing student retention.
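A toy sketch of this factors-prediction-explanation-simulation loop; the risk factors, hand-picked weights, and threshold are purely illustrative, not values from any reviewed study:

```python
import math

# Hypothetical binary risk factors and hand-picked weights (illustrative only).
WEIGHTS = {"low_gpa": 1.8, "no_scholarship": 0.9, "low_attendance": 1.2}

def dropout_prob(factors):
    """Prediction step: logistic score over the active risk factors."""
    score = sum(WEIGHTS[f] for f, active in factors.items() if active) - 2.0
    return 1 / (1 + math.exp(-score))

def simulate_interventions(factors, threshold=0.5):
    """Explanation + simulation steps: repeatedly intervene on the
    highest-weight active factor until predicted risk drops below threshold."""
    factors = dict(factors)
    while dropout_prob(factors) >= threshold:
        active = [f for f, a in factors.items() if a]
        if not active:
            break
        target = max(active, key=WEIGHTS.get)  # "explanation": biggest contributor
        factors[target] = False                # simulated intervention on that factor
    return factors, dropout_prob(factors)

plan, risk = simulate_interventions(
    {"low_gpa": True, "no_scholarship": True, "low_attendance": True}
)
```

Here the loop intervenes on GPA and attendance, reaching a retention scenario without touching the scholarship factor; the final `plan` is exactly the kind of per-student intervention input the text describes.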

6. Conclusions

A systematic review of the literature on UED using ML was carried out. Of the 786 articles identified, 122 were selected through detailed analysis, 81% of which were published in high-impact journals (quartiles Q1 and Q2). A total of 520 factors were identified in 98 studies, 149 prediction models in 86 studies, and eight explainability algorithms in 11 studies, with no articles on simulation. Unlike previous reviews, this work jointly addressed four aspects: a broader set of factors, prediction models, explainability models, and simulation.
Regarding the factors, five fundamental categories were identified: demographic (75), socioeconomic (80), institutional (23), personal (137), and academic (205). The quantitative analysis, supported by the frequency heat map, reveals that academic and demographic factors are the most frequently studied and recurring topics in the literature, while institutional and personal factors are less frequently addressed and appear mainly in low-frequency ranges. Within each category, the most investigated factors were gender, student scholarships, university infrastructure, year of entry, and grades, respectively. This pattern confirms that academic and demographic factors continue to be the primary predictors of university dropout, aligning with the literature's preference for objective and readily accessible variables.
Regarding the ML models for predicting UED, DT stands out as the most used, followed by LR, RF, and ANN. There is an increasing trend in the use of hybrid models designed to enhance prediction accuracy. Additionally, data cleaning is one of the most widely used preprocessing techniques.
In terms of explainability, 11 papers were identified that apply XAI techniques in education, with SHAP and LIME being the most used models. SHAP has been used to explain factors such as GPA, cumulative credits, age, family income, attendance, scholarships, gender, participation, personality, hours of study, and grades in homework or exams, showing its versatility and scope in educational analysis. LIME, although less frequent, has primarily focused on interpreting results associated with GPA and assessments. More recent approaches, such as GPI, AM, SAGE, and PEI, demonstrate a more focused application to academic variables. In the field of simulation, no studies were found that simulate UED using ML.
Our research demonstrates the relevance of incorporating ML models into the university environment for UED, as they facilitate the analysis of large volumes of academic information, enabling the identification of patterns and the provision of relevant results for decision-making in universities. However, it is essential to ensure data privacy when implementing this type of solution. In addition, the incorporation of XAI will not only improve predictions but also provide clarity and transparency regarding the predictions of ML models.
This study has some important limitations. First, the search period covered 2012 to 2024, and the sources were limited to the Scopus and Web of Science (WoS) databases. While this range covers more than a decade of scientific production, future research could extend both the sources and the time period to obtain an even more comprehensive picture. Second, the search strategy used terms such as 'dropout', 'university', 'machine learning', and 'explainability', which may have excluded relevant studies published under related conceptual labels. Finally, the possible influence of publication bias is acknowledged: studies with positive results or high-performing models are published more frequently than those with null or negative results, so the patterns identified should be interpreted with this possible overestimation of performance in mind.
This study opens relevant lines for future research. In terms of factors, it will be necessary to deepen holistic approaches that integrate multimodal data, allowing the incorporation of emerging factors associated with digital behavior, emotions, and social dynamics. Likewise, artificial intelligence offers new opportunities to identify variables inherent to personalized, adaptive contexts mediated by intelligent tutors. In prediction, the development of hybrid models and federated learning will facilitate the construction of more robust, scalable, and privacy-preserving systems. Explainability will need to shift towards causal, counterfactual, and interactive approaches, capable of providing interpretations that are understandable, actionable, and tailored to the user. In the realm of simulation, agent-based systems can be implemented to explore scenarios in which adjusting dropout risk factors shifts a student's predicted status from dropout to retention.
Finally, the convergence of these four components (factor analysis, prediction, explainability, and simulation) into unified intelligent systems will allow institutions to anticipate, understand, and mitigate the risk of attrition through personalized and informed decisions. Notably, this type of integration has already shown effective results in sensitive domains such as pediatric congenital cardiac surgery [160], where the combination of prediction, explainability, and simulation significantly improved clinical decision-making.

Author Contributions

Conceptualization, D.M. and M.Q.-M.; methodology, D.M.; formal analysis, D.M. and M.Q.-M.; investigation, M.Q.-M. and J.P.-A.; writing—original draft preparation, M.Q.-M.; writing—review and editing, M.Q.-M.; supervision, D.M., R.D., and N.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Universidad Nacional Mayor de San Marcos—RR N° 004305-R-24 and project number C24200721.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
KPCA: Kernel Principal Component Analysis
PCA: Principal Component Analysis
LPP: Locality Preserving Projection
NPE: Neighborhood Preserving Embedding
IsoP: Isometric Projection
WCT-T: Weighted Connected Triple Transformation
WTQ-T: Weighted Triple Quality Transformation
DT: Decision Tree
LR: Logistic Regression
DT-ID3: Iterative Dichotomiser 3
SVM: Support Vector Machine
RF: Random Forest
HDBSCAN: Hierarchical Density-Based Spatial Clustering of Applications with Noise
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
GMERF: Generalized Mixed-Effects Random Forest
GB: Gradient Boosting
GBM: Gradient Boosting Machine
XGBoost: Extreme Gradient Boosting
CNNs: Convolutional Neural Networks
DP-CNN: Convolutional Neural Network with Dynamic Pooling
NB: Naive Bayes
CLSA: Concept-Level Sentiment Analysis
KNN: K-Nearest Neighbors
ANNs: Artificial Neural Networks
AHP: Analytic Hierarchy Process
LSTM: Long Short-Term Memory
GLM: Generalized Linear Model
SGD: Stochastic Gradient Descent
MLP: Multilayer Perceptron
AdaBoost: Adaptive Boosting
BAG: Bootstrap Aggregated Decision Trees
SVMSMOTE: Support Vector Machine Synthetic Minority Over-Sampling Technique
SMOTE: Synthetic Minority Over-Sampling Technique
MMFA: Modified Mutated Firefly Algorithm
GBNs: Gaussian Bayesian Networks
FFNN: Feed-Forward Neural Network
BNs: Bayesian Networks
RBF: Radial Basis Function
DT-CHAID: Decision Tree with Chi-Square Automatic Interaction Detector
LLM: Logit Leaf Model
LMT: Logistic Model Tree
LightGBM: Light Gradient Boosting Machine
SEDM: Student Educational Data Mining
PESFAM: Probabilistic Ensemble Simplified Fuzzy ARTMAP
FNN: Feed-Forward Neural Network
BRF: Balanced Random Forest
EE: Easy Ensemble
RB: RUSBoost
CART: Classification and Regression Trees
CTM: Classification Tree Model
SMOTE-NC: Synthetic Minority Over-Sampling Technique for Nominal and Categorical Data
ETC: Extra Trees Classifier
CIT: Conditional Inference Tree
Bagged CART: Classification and Regression Tree Bagging
FTT: Feature Tokenizer Transformer
GMM: Gaussian Mixture Model
GBT: Gradient Boosted Trees
NNs: Neural Networks
LDA: Linear Discriminant Analysis
PR: Polynomial Regression
PEM-SNN: Piecewise Exponential Model with Structural Neural Network
ARD: Automatic Relevance Determination
LASSO: Least Absolute Shrinkage and Selection Operator
BR: Bayesian Ridge
LIRE: Linear Regression
RR: Ridge Regression
DR: Dummy Regressor
IF: Isolation Forest
DC: Data Cleaning
DTA: Data Transformation
FS: Feature Selection
SV: Standardization of Variables
VC: Variable Coding
EV: Elimination of Variables
TV: Transformation of Variables
DR: Dimensionality Reduction
CC: Categorical Coding
DS: Data Selection

Appendix A

Table A1. Demographic factors influencing the UED.
Id | Factor | References | Id | Factor | References
F001Gender[7,14,39,50,51,53,54,58,59,60,61,63,64,65,66,67,68,69,71,72,74,77,78,78,82,83,86,88,89,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110]F039Travel time to university[93]
F002Age[7,15,51,54,59,64,65,66,67,68,69,71,72,74,77,78,80,83,84,86,89,91,94,97,98,99,100,101,105,108,109,111,112,113,114,115,116,117,118,119]F040University of origin[116]
F003Marital status[51,59,61,68,69,71,76,80,86,91,93,97,102,106,108,114,116,118]F041Citizenship status[52]
F004Sex[21,55,78,80,84,86,90,91,111,112,114,115,118,120,121,122,123,124]F042Geographical displacement[130]
F005Place of residence[7,14,39,51,64,65,66,68,74,86,93,106,108,114,121,125]F043First-generation student[95]
F006Place of origin[39,51,53,54,71,78,89,90,91,93]F044City[110]
F007Nationality[7,21,53,55,58,61,64,65,76,93,101,104,115,126]F045Multidimensional Poverty Index of the school[103]
F008Parents’ educational level[59,70,72,93,94,95,97,99,112,117,118]F046Zonal Address[91]
F009Type of school[14,64,65,66,75,78,86,89,91,95,103,109]F047Pre-university preparation[91]
F010Ethnicity[72,74,77,83,84,86,97,98,112,113,126]F048Housing tenure[91]
F011Date of birth[53,58,65,87,120,121,127]F049Type of construction[91]
F012Registration age[14,55,61,73,93,106]F050Water[91]
F013School[39,58,72,78,108,129]F051Drain[91]
F014Foreign[55,108,109,117,122]F052Phone[91]
F015Province of origin[91,96,106,118]F053Colour TV[91]
F016Number of siblings[114,116,129]F054Radio[91]
F017Type of housing[72,101,114]F055Sound equipment[91]
F018Zip code[51,88,113]F056Iron[91]
F019Country of origin[99,101,108]F057Cellular[91]
F020Region[39,117]F058Laptop[91]
F021Type of university[39,70]F059Refrigerator[91]
F022Higher education centre[101,122]F060Personal library[91]
F023Change of residence[114,130]F061Wardrobe[91]
F024Foreigner[78,93]F062Wire[91]
F025Displaced student[55,61]F063Home environments[91]
F026Number of children[80,85]F064Number of floors[91]
F027Computer[77,91]F065Number of bedrooms[91]
F028Generation[119]F066Number of kitchens[91]
F029Migration status[72]F067Number of bathrooms[91]
F030Country of origin risk[73]F068Number of rooms[91]
F031Proximity to the university[61]F069Number of dining rooms[91]
F032Cohabitation status with parents[61]F070Orphanhood[91]
F033Generation[119]F071He lives alone[91]
F034Family size[93]F072Breadwinner[91]
F035Family you live with[116]F073Marital Status of Father[74]
F036Community support[61]F074Marital Status of Mother[74]
F037Live in the country or the city[106]F075Area of residence[77]
F038Family municipality[105]
Table A2. Socioeconomic factors influencing UED.
Id | Factor | References | Id | Factor | References
F076Scholarships[55,61,72,75,83,93,94,99,104,106,112,120,127]F116Total amount of scholarships[88]
F077Works[54,60,69,71,80,91,94,108,114,125,128]F117Suspensions[122]
F078Family income[59,80,86,95,97,98,102,125,129]F118Single with dependents[131]
F079Income level[14,58,80,89,96,108,118]F119Family financial support[72]
F080Admission score[14,39,78,95,98,121,123]F120Health insurance[104]
F081Disability[69,72,77,88,100,105,114]F121Type of insurance[116]
F082Educational level[54,101,105,108,117,122]F122Medical insurance company[54]
F083Socioeconomic status[64,77,83,95,124]F123Financial commitment of the firstborn son to his family[85]
F084Father’s profession[15,76,93,129]F124Student perspective on their integration into the labor market[85]
F085Financial aid[68,80,83,100,112]F125Economic problems[130]
F086Mother’s rating[55,61,102]F126Lack of family support[130]
F087Father’s rating[55,61,102]F127Mother’s highest educational level[126]
F088Father’s employment status[15,55,102]F128Father’s highest educational level[126]
F089Mother’s occupation[15,93,102]F129Employment status[95]
F090Internet access[77,91,97,114,129]F130Housing situation[95]
F091Financial situation[80,121,123]F131Monthly tuition payment[103]
F092Type of scholarship[68,79,99]F132Social stratification of the school[103]
F093Total income[66,77,91]F133Household composition[91]
F094Study financing[80,81]F134Family Burden[91]
F095Working conditions[59,72]F135Children in Higher Education[91]
F096Type of transport[102,114]F136Economic dependence[91]
F097Current occupation[113,118]F137Head of household[91]
F098School tuition cost[78,117]F138Economic income modality[91]
F099Debtor[55,61]F139Access to technology[65]
F100Registrations up to date[55,61]F140Latitude[74]
F101Dependence on parents[71,114]F141Length[74]
F102Source of income[77,96]F142Social class[74]
F103Mother’s profession[76,93]F143Brothers at school[74]
F104Economic indicator[112]F144Type of license plate[75]
F105Professional status[101]F145Type of income[75]
F106Political status[106]F146Unemployment rate[76]
F107Works part-time[133]F147Tuition payment up to date[76]
F108Student loan[132]F148Economic situation[77]
F109Money for food[116]F149Number of people in the household[77]
F110Study books[116]F150Eligibility[112]
F111Scholarship percentage[78]F151Academic integration[83]
F112Parents’ main field[59]F152Subsidized loan[83]
F113Student’s profession[101]F153Unsubsidized loan[83]
F114Percentage of loans[78]F154Work-study[83]
F115Total percentage of aid[78]F155Aid for merit[83]
Table A3. Institutional factors influencing UED.
Id | Factor | References | Id | Factor | References
F156Infrastructure[80,81,84]F168Counselor’s perception of the counselor’s own expectations[126]
F157Educational services[66,83]F169Counselor’s perception of the director’s expectations[126]
F158Suitable equipment[80,81]F170Institutional integrity[84]
F159Place[78,89]F171Social infrastructure[84]
F160Institution size[83,84]F172Social aspects induction program[84]
F161Area[60]F173Institutional control[83]
F162Geographical area[78]F174Percentage of minorities[83]
F163Teacher’s commitment to the student[85]F175Part-time teachers[83]
F164Classification of the career or institution[85]F176Full-time teachers[83]
F165Group class[82]F177Instruction expenditure[83]
F166School climate assessment scale[126]F178Academic support expenses[83]
F167Counselor’s perception of teachers’ expectations[126]
Table A4. Personal factors influencing UED.
Id | Factor | References | Id | Factor | References
F179Year of admission[58,64,74,75,87,90,94,95,120,127]F248Mobile phone addiction[121]
F180Motivation[80,81,82,97]F249Gaming Addiction[121]
F181Extracurricular activities[72,84,97]F250Video game addiction[121]
F182Commitment[99,124,130]F251Shopping addiction[121]
F183Class participation[84,131,132]F252Smoking[82]
F184Number of voluntary activities[7,58,88]F253Student’s sense of school belonging[126]
F185Future time perspective[55,130]F254Perception of social support[95]
F186Time to study[60,80]F255Use of learning platform[95]
F187Adaptation and coexistence[55,133]F256Frequency of library use[95]
F188Leader or president[60,93]F257Participation in tutoring[95]
F189Addictions/vices[85,116]F258Participation in mentoring[95]
F190Social media addiction[119,142]F259Access to academic support services[95]
F191Club participation[7,58]F260Short-term objective[110]
F192Stress level[80,97]F261Weekly minutes on the platform[110]
F193Level of motivation[64,116]F262Active days[110]
F194Participation in study groups[84,97]F263Total progress[110]
F195Landline[116]F264Device used[110]
F196Cellular phone[116]F265Login frequency[110]
F197Second language[116]F266Average session duration[110]
F198Masked student[97]F267Number of sessions per week,[110]
F199External social relations[80]F268Number of accesses in the last month[110]
F200Desire for knowledge[127]F269Hours in the last month[110]
F201Frankness[125]F270Hours in the last 3 months[110]
F202Extraversion[125]F271Hours in the last 6 months[110]
F203Neuroticism[125]F272Hours in the last year[110]
F204Conscientiousness[125]F273Interaction with tutors[110]
F205Emotional commitment[125]F274Participation in forums[110]
F206Calculating commitment[125]F275Interaction with multimedia resources[110]
F207Regulatory commitment[125]F276Average time per resource[110]
F208Professional interests[124]F277Completed activities[110]
F209Conference programs[51]F278Number of evaluations submitted[110]
F210Family problems[142]F279Evaluation results[110]
F211Mental health[107]F280Response time in activities[110]
F212Health problem[107]F281Number of clicks per session[110]
F213Study habits[143]F282Participation in chats[110]
F214Depression[143]F283Preferred content type[110]
F215Anxiety[143]F284Navigation route[110]
F216License history[51]F285Number of messages received[110]
F217Communications level[113]F286Number of messages sent[110]
F218First generation to study[119]F287Activity during non-business hours[110]
F219Extracurricular activity scores[7]F288Level of self-efficacy[65]
F220Participation in first-year camp activities[7]F289Learning strategy[65]
F221Frequency of computer use[86]F290Rh factor[74]
F222Learning approach[133]F291Neuroticism[141]
F223I wanted practical work[63]F292Extraversion[141]
F224Disease[63]F293Kindness[141]
F225Pregnancy[63],F294Responsibility[141]
F226Incompatibility between career and childcare[63]F295Openness to experience[141]
F227Self-assessment[61]F296Social integration[83,84]
F228Time spent exercising[118]F297Perception of learning[84]
F229Vocational training[125]F298Experiences of exam disappointment[84]
F230Number of friends[93]F299Support and guidance[84]
F231Kindness[125]F300Quality of teaching[84]
F232Leisure[61]F301Alignment in teaching[84]
F233Study hours[93]F302Clarity in instruction[84]
F234Planned and unplanned pregnancy[85]F303Feedback active learning[84]
F235Bullying[85]F304Higher-order thinking[84]
F236Sexism[85]F305Cooperative learning[84]
F237Student adaptation to university learning[85]F306Introductory courses[84]
F238Poor interpersonal relationships with peers[130]F307Student research programs[84]
F239Lack of study habits and techniques[130]F308Perception of difficulty[84]
F240Demotivation[130]F309Coherence between courses in the curriculum[84]
F241Feeling of not belonging[130]F310Educational aspiration[83]
F242Health problems[130]F311Language[68]
F243Internet addiction[135]F312Video platform[77]
F244Technology addiction[135]F313Physical books[77]
F245Alcohol addiction[121]F314Reading time[77]
F246Addiction to emotional dependence[121]F315Internet browsing time[77]
F247Drug addiction[121]
Table A5. Academic factors influencing UED.
Id | Factor | References | Id | Factor | References
F316Ratings[39,54,65,74,76,78,82,87,90,106,111,116,117,120,126,127]F420Temporary withdrawal[132]
F317General GPA[7,59,64,67,69,75,83,84,89,90,97,113,120,123,126,129]F421Order of option to apply[101]
F318Secondary note[39,72,83,86,97,103,109,111,121,129]F422Access order[101]
F319Subjects taken[15,72,73,78,94,98,108,118]F423Weighted historical average[116]
F320Credits taken[71,89,100,101,112,120,121]F424Lower test results[106]
F321Attendance[55,59,61,64,72,73,76,120]F425Years of study at the University[116]
F322Type of admission[39,58,78,90,95,107]F426Belongs to the institute’s school[119]
F323Type of school[66,86,89,99,102,103,108]F427Specialty[132]
F324School[39,55,70,78,89,91,117]F428Student status[60]
F325Academic year[39,78,93,100,134]F429Course evaluation comments[59]
F326Number of failed courses[67,72,78,97,101,111]F430Average evaluations first semester[117]
F327Subjects[87,107,121,122,143]F431First period average[78]
F328Entrance examination[39,53,70,117,127]F432Attendance status[131]
F329Average grades throughout the career[39,78,86,88,120]F433Dropping out during the semester[114]
F330Academic cycle[78,87,91,111,112]F434Admission date[123]
F331Course number[51,71,108,111]F435Rewards and penalties[88]
F332Active semester[7,69,90,118,121]F436Group study[118]
F333Average subjects[39,111,118,129]F437High school completion status[131]
F334Years of graduation[15,58,71,134]F438Title to obtain[123]
F335Admission score[55,72,78,115]F439Enrolled semester number[54]
F336Academic department[54,85,90,106]F440Number of programs enrolled[54]
F337Course[61,76,88,137]F441Graduate[140]
F338Subject[86,104,112,140]F442Type of graduation[64]
F339Registered[78,112,120]F443Good high school graduation[127]
F340Tasks[59,60,137]F444Type of associate degree[114]
F341Type of institution[39,87]F445First-generation student[114]
F342Student status[90,112,121]F446Number of internships[52]
F343Repeater[100,121,137]F447First registration[100]
F344Access note[93,94,130]F448Persistence[100]
F345Number of courses approved[100,101,111]F449Home Language[100]
F346Total credits[59,88,111]F450Previous year’s activity[100]
F347Absence[60,88,120]F451Final decision[100]
F348Academic field[51,79,123]F452Specialty access[105]
F349Evidence[117,134,137]F453Follow the path[105]
F350Faculty[65,69,120]F454Access description[129]
F351Failed subjects[64,80,118]F455Leveling[113]
F352Approved credits[69,88,114]F456Quality of online teaching activities[127]
F353Number of semesters completed[68,97,106]F457Limited knowledge of using specialized software[85]
F354Average of previous semesters[79,110,117]F458Academic problems[130]
F355Career[65,77,98]F459Level of previous studies[82]
F356Repeating course number[51,80]F460Student grade point average in ninth grade[126]
F357Cluster[78,116]F461Hours dedicated to tasks[126]
F358Failed exam[107,109]F462Number of course withdrawals[95]
F359Subjects passed[73,159]F463Number of disciplinary actions[95]
F360Credit ratio per subject[39,87]F464Last school level achieved[110]
F361Ratio of credits to expected credits[39,78]F465Income cohort[128]
F362Study day[65,118]F466Average grades per subject[128]
F363Admission program[107,117]F467GPA per semester[103]
F364Student code[93,114]F468Other programs taken[103]
F365Credits[14,21]F469Reading comprehension score[103]
F366Average national exam score[55,101]F470Score in logical reasoning[103]
F367Attempts to pass the exam[14,21]F471Academic admission program[103]
F368Entrance qualification grade[53,99]F472Performance test[111]
F369Grade points[88,134]F473Academic Department[111]
F370Degree exam[53,59]F474Plan hours[111]
F371Type of study program[53,64]F475Hours recorded in the last semester[111]
F372Full-time status[109,112]F476First year average[111]
F373Exam[62,134]F477Program duration[111]
F374Level enrolled[94,95]F478Title name[89]
F375Career application option range[75,96]F479Additional learning requirements[89]
F376Average secondary grades[89,116]F480Number of honors obtained[89]
F377Exam grades[92,131]F481Admission method[91]
F378Abandoned materials[87]F482Type of activity[146]
F379Period[81]F483Type of action[146]
F380Project rating[140]F484Access frequency by day of the week[146]
F381Average attempts[117]F485Frequency per week and month of the semester[146]
F382Subject code[87]F486Access time[146]
F383Initial test note[144]F487Amount and type of interaction with materials[146]
F384Entrance exam date[54]F488Participation in evaluations[146]
F385Lower consolidated result[106]F489Number of subjects taken[65]
F386Number of national exams taken[56]F490Enrollment method[65]
F387Delay[122]F491Number of times registered[65]
F388Type of entry qualification[54]F492Entry level[74]
F389Degree of study[90]F493Current grade[74]
F390First level degree[105]F494Cumulative average[75]
F391Anonymity of the university[63]F495Level of previous education[68]
F392Place institution[81]F496Syllabus[68]
F393Type of student[114]F497Beginning of the semester[68]
F394Type of study (full-part time)[104]F498Accumulated credits[68]
F395Reason for admission[105]F499Days in exchange programs[68]
F396Admission category[93]F500Moodle Activity Count[68]
F397Computer knowledge[86]F501Activity trend in Moodle[68]
F398Disciplinary infraction[122]F502Course access[92]
F399Admission form[99]F503Test results[92]
F400Risk via admission[73]F504Tasks submitted[92]
F401Binary license plate[112]F505Final course grade[92]
F402Admission option[105]F506Practice grades[134]
F403Average score on entrance exams[101]F507Project ratings[134]
F404Registration value[120]F508Reading comprehension[134]
F405Course of study[54]F509Cumulative GPA[134]
F406Times failed degree[99]F510Credits earned[134]
F407First-choice studies[133]F511Time enrolled in university[134]
F408Tutorials carried out[113]F512Access outside of class[141]
F409Seniority[100]F495Level of previous education[68]
F410Previous qualification[62]F514Approved curricular units[76]
F411Average last cycle[116]F515Accredited curricular units[76]
F412Mode[113]F516Training chain[77]
F413Military service[122]F517Current semester average[70]
F414Drop subject[132]F518Average of subjects passed[70]
F415Average score[82]F519Absences[70]
F416Failed subjects in secondary school[79]F520Field of study[83]
F417Repeating a secondary school year[79]
F418Repeating the first academic year[94]
F419Temporary withdrawal[132]
Table A6. Advances in ML for UED prediction.
Studies | Dataset | Preprocessing | Model | Result (%)
[64]21,654Random subsampling (RUS)DT96.20 1
LR96.60 1
SVM97.70 1
ANN95.50 1
LR + SMOTE83.20 1
DT + SMOTE92.50 1
ANN + SMOTE88.10 1
SVM + SMOTE95.40 1
LR + over-sampling85.50 1
DT + over-sampling79.30 1
SVM + over-sampling86.90 1
ANN + over-sampling85.50 1
LR + under-sampling86.00 1
DT + under-sampling86.70 1
SVM + under-sampling87.90 1
ANN + under-sampling84.70 1
[118]670Data cleaningJRip96.00 1
NNge95.80 1
OneR93.70 1
Prism94.40 1
Ridor93.40 1
ADTree96.6 1
DT-J4894.30 1
RandomTree94.00 1
REPTree92.70 1
SimpleCart96.60 1
ICRM v192.10 1
ICRM v293.70 1
ICRM v393.40 1
[82]670Data cleansing
Discretization of variables
Creation of attributes
ADTree98.20 1
J4896.70 1
RandomTree96.10 1
REPTree96.50 1
SimpleCart96.40 1
Prism99.80 1
Ridor97.90 1
ICRM v192.10 1
ICRM v294.40 1
ICRM v394.00 1
[39]5951Data cleansing
Dimensionality reduction
Data balancing
Data transformation
Random model51.00 1
KNN62.00 1
SVM65.00 1
DT68.00 1
RF69.00 1
GB69.00 1
NB66.00 1
LR62.00 1
ANN66.00 1
[57]10,554UnrealizedDT72.80 1
LR84.50 1
SVM82.80 1
RF82.60 1
ANN77.80 1
GB83.70 1
LLM83.90 1
LMT80.10 1
BAG78.00 1
[97]13,696Data cleansingDT86.60 1
LR88.90 1
SVM89.40 1
KNN87.60 1
RF90.02 1
MLP89.20 1
CNN94.60 1
GBN85.40 1
[147]79,186Data cleaning
Normalization
Time series
Matrix specifications
LR85.70 1
SVM80.10 1
CNN86.40 1
LSTM80.10 1
CNN-LSTM84.80 1
DP-CNN84.20 1
CLSA87.40 1
[59]7536UnrealizedDT84.70 1
LR76.60 1
SVM76.60 1
RF82.90 1
MLP89.60 1
MSNF87.70 1
STUD90.10 1
[51]425UnrealizedDT97.92 1
LR99.47 1
KNN82.10 1
RF99.47 1
NB96.79 1
GB98.68 1
[120]261Feature selection
Data cleaning
DT94.00 1
LR96.00 1
SVM94.00 1
RF94.00 1
NB94.00 1
ANN97.00 1
[122]60,010Data cleansingLightGBM81.00 1
XGBoost83.00 1
LR50.00 1
SVM51.00 1
RF80.00 1
DT65.00 1
[136]3029Undersampling
SMOTE-Tomek
LR89.00 1
SGD86.00 1
DT98.00 1
MLP79.00 1
RF99.00 1
SVM72.00 1
[132]1650Data transformationDT80.00 1
LR87.59 1
SVM85.55 1
RF88.33 1
NB77.14 1
MLP83.92 1
[139]32,593Data extraction
Data cleaning
Data scaling
DT78.00 1
LR80.00 1
SVM79.00 1
RF79.00 1
SELOR84.00 1
SIHMM83.00 1
[148]104Data cleansingSVM36.73 1
PESFAM43.24 1
FFNN68.97 1
SEDM85.71 1
LR98.95 1
[72]26Data cleaningDT78.00 1
SVM80.00 1
KNN73.00 1
RF92.00 1
ANN90.00 1
[100]4419Data cleaningDT88.46 1
SVM86.92 1
KNN83.85 1
RF92.31 1
NB79.23 1
[140]261Data cleaningRF91.76 1
GB86.76 1
XGBoost91.76 1
FNN + RF + GB + XGBoost93.59 1
FNN96.76 1
[56]4433Extract
Transform
Upload
SMOTE + RF87.00 1
SVMSMOTE + RF87.00 1
BRF82.80 1
EE83.20 1
RB81.30 1
[93]131Data cleaningLR73.20 1
SVM70.99 1
RF92.30 1
NB75.50 1
MLP92.30 1
DT-J4874.00 1
[105]3425Feature selectionDT82.05 1
LR83.37 1
SVM82.90 1
KNN85.59 1
ANN85.11 1
[90]811KPCA, PCA, LPP, NPE, IsoP, WCT-T, and WTQ-TKNN93.30 1
ANN94.00 1
DT-C4.592.60 1
NB93.80 1
[15]331Data cleaningCatBoost84.00 1
RF81.00 1
XGBoost82.00 1
ANN87.00 1
[78]143,326Data cleaningDT99.53 1
LR99.58 1
ANN97.72 1
XGBoost99.28 1
[54]NaUnrealizedLR93.76 1
Random Forest (Bagging)93.58 1
AdaBoost95.51 1
ANN94.76 1
[7]NaData cleaningDT91.00 1
LR87.00 1
NB55.00 1
MLP90.00 1
[106]77,384Data cleaningDT94.63 1
ANN93.97 1
BN93.92 1
[98]11,496UnrealizedRF90.10 1
ANN89.30 1
Logit91.20 1
[55]12,370UnrealizedLR + SMOKE_SVM72.00 1
RF78.00 1
ANN + SMOKE_SVM74.00 1
[143]670Data cleaningK-means80.01 1
HDBSCAN65.63 1
DBSCAN95.71 1
[101]3373UnrealizedSVM76.39 1
RF80.40 1
ANN77.95 1
[61]128Data transformationDT84.00 1
LR82.00 1
NB84.00 1
ANN82.00 1
[142]220UnrealizedDT97.69 1
[113]1861Data cleaning
Data transformation
DT-J4891.80 1
ANN94.60 1
DT + ANN98.70 1
[86]2422UnrealizedLR76.03 1
AHP64.57 1
[129]5426Data cleaningSVM89.041 1
RF88.312 1
GB87.103 1
[14]46,000UnrealizedGMERF93.58 1
CART87.01 1
GLM91.05 1
[161]530Data cleaningRF56.67 1
XGBoost70.00 1
RF + XGBoost91.52 1
[109]970Data cleaningANN62.00 1
DT-C4.565.00 1
DT-D362.00 1
[81]160Variable generation
Data selection
Data cleaning
ANN100.00 1
DT-C4.587.77 1
DT-ID370.79 1
[87]976Data Selection
Data cleaning
Generation data integration
Formatting
ANN85.00 1
DT-C4.568.00 1
DT-ID375.00 1
[94]1022UnrealizedLR80.00 1
Análisis discriminante91.50 1
[88]67,060SMOTE
RandomOverSampler
SMOTETOMEK
SMOTEENN
LR95.30 1
ANN98.20 1
GB98.00 1
GB + RF + SVM97.80 1
XGBoost + Catboost98.90 1
[137]1862MMFANB92.85 1
DT95.82 1
[60]17,432Data cleaning
Data transformation
SMOTE
KNN98.20 1
CART97.91 1
NB98.24 1
[85]2670Data cleaningMLP98.60 1
RBF98.10 1
[138]201Data cleaningDT-ID392.90 1
DT-J4892.90 1
[135]1178Data cleaning
Data transformation
Attribute selection
LR84.90 1
DT91.70 1
[115]83UnrealizedLR89.00 1
[127]176UnrealizedLR95.80 1
[96]189UnrealizedCluster Analysis83.30 1
[116]6300SMOTEDT95.91 1
[123]1851UnrealizedDT87.90 1
[21]41,098UnrealizedGMERF90.80 1
[79]237UnrealizedGBM92.20 1
[108]12,148Data cleaning
Variable coding
DT71.40 1
[131]24,770UnrealizedXGBoost80.32 1
[144]197UnrealizedDT79.90 1
[145]389Data cleaningDT89.39 1
[80]237UnrealizedDT87.76 1
[102]32,593Lasso and ridgeLR86.90 1
[130]3773NormalizationLR95.00 1
[151]3172Data cleaningRF87.67 1
[121]3162Data cleaning
Data transformation
Variable extraction
DT-CHAID98.71 1
[21]24,736Data division
Variable selection
GMERF90.85 1
[126] SMOTE-NCDT84.29 1
[95]1500Data cleaning
Removing features
DeepS3VM (RNN + S3VM)92.54 1
[110]35,000SMOTEXGBoost82.00 1
LightGBM79.80 1
DT79.80 1
RF81.50 1
ETC79.00 1
LR77.00 1
SVM61.00 1
[128]197Data cleaning
Categorical coding
Normalization
Feature selection
RF100.00 1
[111]5883UnrealizedCART79.70 1
CIT81.90 1
SVM83.00 1
GLM82.30 1
ANN81.40 1
NB70.30 1
BAGGD CART81.40 1
Random Forest83.10 1
ADABOOST80.90 1
XGBoost82.30 1
[89]44,875Variable coding
Standardization of variables
Class imbalance
Data separation
RF85.00 1
FTT87.00 1
[91]329Feature selection
Dimensionality reduction
DT80.20 1
LR73.10 1
SVM71.00 1
NB62.40 1
[146] Data transformation
Data cleaning
Feature selection
Dimensionality reduction
BIRCH56.50 4
DBSCAN32.08 4
GMM43.50 4
RF86.00 1
DT84.00 1
SVM83.00 1
LR)82.00 1
KNN81.00 1
[65]985Data cleaning
Standardization of variables
SMOTE
AdaBoost88.00 1
XGBoost88.86 1
[74]1865Data cleaning
Data transformation
Categorical coding
Feature selection
RF88.00 1
SVM79.00 1
GBT92.00 1
[66]4792Data transformationLR85.00 1
DT87.00 1
[75]17,904Data cleaning
Categorical coding
Feature selection
SMOTE
DT91.70 1
NB83.40 1
KNN96.30 1
[67]1957Data cleaning
Imputation of missing values
Transformation of variables
LR98.20 1
PR98.20 1
NB98.60 1
RF98.50 1
DT98.80 1
SVM98.80 1
KNN98.00 1
[68]8813Reindexing of time series
Data deletion
Variable coding
Standardization of variables
CatBoost85.30 3
NN84.40 3
LR84.20 3
LDA84.10 3
RF83.90 3
LightGBM83.20 3
XGBoost82.30 3
SVM82.30 3
NB78.00 3
KNN77.70 3
[92]321Data cleaning
Feature selection
Standardization of variables
LR83.30 1
PR86.30 1
[134]661Data cleaning
Transformation of variables
Standardization of variables
Swinging
LSTM98.30 1
DNN98.10 1
DT93.40 1
RF92.00 1
LR98.00 1
SVM74.70 1
KNN99.00 1
[69]129,846Data cleaning
Transformation of variables
Variable coding
Semantic clustering
Standardization of variables
PEM-SNN81.10 1
[141]322Elimination of variables
Variable coding
Standardization of variables
ARD1.42 5
BR1.45 5
LIRE1.47 5
RR1.48 5
LASSO1.49 5
DT1.65 5
RF1.60 5
AdaBoost1.62 5
XGBoost1.63 5
CatBoost1.64 5
SVM1.66 5
KNN1.68 5
MLP1.65 5
DR2.13 5
[76]4424Elimination of variables
Variable coding
Standardization of variables
DT81.00 2
RF87.00 2
XGBoost88.00 2
CatBoost88.00 2
LightGBM88.00 2
BG85.00 2
SVM76.00 2
[77]288Feature selection
Converting variables
Elimination of variables
DT90.51 1
K-means44.29 1
IF30.34 1
LIRE35.06 1
[70]6312Elimination of variables
Variable coding
Normalization
ANN81.00 1
¹ accuracy; ² F1 score; ³ AUC; ⁴ silhouette; ⁵ RMSE (root mean square error). Abbreviations: alternating decision tree (ADTree); kernel principal component analysis (KPCA); principal component analysis (PCA); locality preserving projection (LPP); neighborhood preserving embedding (NPE); isometric projection (IsoP); weighted connected triple transformation (WCT-T); weighted triple quality transformation (WTQ-T); decision tree (DT); logistic regression (LR); iterative dichotomiser 3 decision tree (DT-ID3); support vector machine (SVM); random forest (RF); hierarchical density-based spatial clustering of applications with noise (HDBSCAN); density-based spatial clustering of applications with noise (DBSCAN); generalized mixed-effects random forest (GMERF); gradient boosting (GB); gradient boosting machine (GBM); extreme gradient boosting (XGBoost); convolutional neural network (CNN); convolutional neural network with dynamic pooling (DP-CNN); naive Bayes (NB); sentiment analysis model at concept level (CLSA); k-nearest neighbors classifier (KNN); artificial neural network (ANN); analytic hierarchy process (AHP); long short-term memory (LSTM); generalized linear model (GLM); stochastic gradient descent (SGD); multilayer perceptron (MLP); adaptive boosting (AdaBoost); bootstrap aggregated decision trees (BAG); support vector machine synthetic minority over-sampling technique (SVMSMOTE); synthetic minority over-sampling technique (SMOTE); modified mutated firefly algorithm (MMFA); Gaussian Bayesian network (GBN); feed-forward neural network (FFNN); Bayesian network (BN); radial basis function (RBF); decision tree with chi-square automatic interaction detector (DT-CHAID); logit leaf model (LLM); logistic model tree (LMT); light gradient boosting machine (LightGBM); student educational data mining (SEDM); probabilistic ensemble simplified fuzzy ARTMAP (PESFAM); feed-forward neural network (FNN); balanced random forest (BRF); easy ensemble (EE); RUSBoost (RB); classification and regression trees (CART); classification tree model (CTM); synthetic minority over-sampling technique for nominal and categorical data (SMOTE-NC); extra trees classifier (ETC); conditional inference tree (CIT); classification and regression tree with bagging (bagged CART); feature tokenizer transformer (FTT); Gaussian mixture model (GMM); gradient boosted trees (GBT); neural network (NN); linear discriminant analysis (LDA); polynomial regression (PR); piecewise exponential model with structural neural network (PEM-SNN); automatic relevance determination (ARD); least absolute shrinkage and selection operator (LASSO); Bayesian ridge (BR); linear regression (LIRE); ridge regression (RR); dummy regressor (DR); isolation forest (IF).
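The headline numbers in the table are reported under five different metrics, so they are not directly comparable across studies. As a minimal pure-Python sketch of how three of the footnoted metrics are computed from a model's predictions (function names are ours; the reviewed studies typically rely on library implementations such as scikit-learn):

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels (metric 1)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (metric 2)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def rmse(y_true, y_pred):
    """Root mean square error for regression-style targets (metric 5)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy example: 1 = dropout, 0 = continue
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(round(accuracy(y_true, y_pred), 4))  # 0.6
print(round(f1_score(y_true, y_pred), 4))  # 0.6667
```

Because accuracy can look deceptively high on imbalanced dropout data, several studies in the table report F1 or AUC instead, which is one reason the ranges of the reported values differ so widely.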
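SMOTE and its variants (SMOTE-Tomek, SMOTE-NC, SMOTEENN, SVMSMOTE) recur throughout the preprocessing column because dropout datasets are usually heavily imbalanced. The core idea is to synthesize new minority-class samples by interpolating between a minority point and one of its nearest minority-class neighbors. Below is a minimal sketch of that interpolation step only, not the full algorithm; `smote_like_oversample` is an illustrative name of ours, and the reviewed studies use library implementations such as imbalanced-learn:

```python
import math
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples (SMOTE-style interpolation).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (excluding x itself)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + lam * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

# Usage: balance a toy 2-D minority class of three points with five synthetic samples
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(len(smote_like_oversample(minority, 5)))  # 5
```

Variants differ mainly in where they interpolate (near the SVM decision boundary for SVMSMOTE, with categorical handling for SMOTE-NC) and in whether they also clean overlapping majority samples afterwards (Tomek links, ENN).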

References

  1. Baranyi, M.; Nagy, M.; Molontay, R. Interpretable Deep Learning for University Dropout Prediction. In Proceedings of the SIGITE 2020—Proceedings of the 21st Annual Conference on Information Technology Education, Virtual Event, 7–9 October 2020. [Google Scholar] [CrossRef]
  2. Bustamante, D.; Garcia-Bedoya, O. Predictive Academic Performance Model to Support, Prevent and Decrease the University Dropout Rate. In Communications in Computer and Information Science; Springer: Cham, Switzerland, 2021. [Google Scholar] [CrossRef]
  3. OECD. How many students complete tertiary education? In Education at a Glance 2022: OECD Indicators; OECD Publishing: Paris, France, 2022. [Google Scholar] [CrossRef]
  4. Agrusti, F.; Bonavolontà, G.; Mezzini, M. University dropout prediction through educational data mining techniques: A systematic review. J. E-Learn. Knowl. Soc. 2019, 15, 161–182. [Google Scholar] [CrossRef]
  5. Netanda, R.S.; Mamabolo, J.; Themane, M. Do or die: Student support interventions for the survival of distance education institutions in a competitive higher education system. Stud. High. Educ. 2019, 44, 397–414. [Google Scholar] [CrossRef]
  6. Felderer, B.; Kueck, J.; Spindler, M. Using Double Machine Learning to Understand Nonresponse in the Recruitment of a Mixed-Mode Online Panel. Soc. Sci. Comput. Rev. 2022, 41, 461–481. [Google Scholar] [CrossRef]
  7. Lee, J.H.; Kim, M.; Kim, D.; Gil, J.M. Evaluation of Predictive Models for Early Identification of Dropout Students. J. Inf. Process. Syst. 2021, 17, 630–644. [Google Scholar] [CrossRef]
  8. Pfau, W.; Rimpp, P. AI-Enhanced Business Models for Digital Entrepreneurship; Springer: Cham, Switzerland, 2021. [Google Scholar] [CrossRef]
  9. Vargas, A.V.; Palacio, G.J.L. Abandono estudiantil en una universidad privada: Un fenómeno no ajeno a los posgrados. Valoración cuantitativa a partir del análisis de supervivencia. Colombia, 2012–2016. Rev. Educ. 2020, 44, 177–191. [Google Scholar]
  10. Buduma, N.; Locascio, N. Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms; O’Reilly Media: Sebastopol, CA, USA, 2017. [Google Scholar]
  11. Berka, P.; Marek, L. Bachelor’s degree student dropouts: Who tend to stay and who tend to leave? Stud. Educ. Eval. 2021, 70, 100999. [Google Scholar] [CrossRef]
  12. Nájera, A.B.U.; Ortega, L.A.M. Predictive Model for Taking Decision to Prevent University Dropout. Int. J. Interact. Multimed. Artif. Intell. 2022, 7, 205–213. [Google Scholar]
  13. Núñez-Naranjo, A.F.; Ayala-Chauvin, M.; Riba-Sanmartí, G. Prediction of university dropout using machine learning. In Proceedings of the International Conference on Information Technology & Systems, La Libertad, Ecuador, 4–6 February 2021; Springer: Cham, Switzerland, 2021; pp. 396–406. [Google Scholar]
  14. Cannistrà, M.; Masci, C.; Ieva, F.; Agasisti, T.; Paganoni, A.M. Early-predicting dropout of university students: An application of innovative multilevel machine learning and statistical techniques. Stud. High. Educ. 2022, 47, 1935–1956. [Google Scholar] [CrossRef]
  15. Moreira da Silva, D.E.; Solteiro Pires, E.J.; Reis, A.; de Moura Oliveira, P.B.; Barroso, J. Forecasting Students Dropout: A UTAD University Study. Future Internet 2022, 14, 76. [Google Scholar] [CrossRef]
  16. Bertolini, R.; Finch, S.; Nehm, R. Enhancing data pipelines for forecasting student performance: Integrating feature selection with cross-validation. Int. J. Educ. Technol. High. Educ. 2021, 18, 1–23. [Google Scholar] [CrossRef]
  17. Lee, S.; Chung, J.Y. The machine learning-based dropout early warning system for improving the performance of dropout prediction. Appl. Sci. 2019, 9, 3093. [Google Scholar] [CrossRef]
  18. Blundo, C.; Fenza, G.; Fuccio, G.; Loia, V.; Orciuoli, F. A time-driven FCA-based approach for identifying students’ dropout in MOOCs. Int. J. Intell. Syst. 2022, 37, 2683–2705. [Google Scholar] [CrossRef]
  19. Heuillet, A.; Couthouis, F.; Díaz-Rodríguez, N. Explainability in deep reinforcement learning. Knowl.-Based Syst. 2021, 214, 106685. [Google Scholar] [CrossRef]
  20. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 1135–1144. [Google Scholar] [CrossRef]
  21. Pellagatti, M.; Masci, C.; Ieva, F.; Paganoni, A.M. Generalized mixed-effects random forest: A flexible approach to predict university student dropout. Stat. Anal. Data Min. 2021, 14, 241–257. [Google Scholar] [CrossRef]
  22. Alban, M.; Mauricio, D. Predicting University Dropout through Data Mining: A Systematic Literature. Indian J. Sci. Technol. 2019, 12, 10. [Google Scholar] [CrossRef]
  23. Albreiki, B.; Zaki, N.; Alashwal, H. A systematic literature review of student’ performance prediction using machine learning techniques. Educ. Sci. 2021, 11, 552. [Google Scholar] [CrossRef]
  24. Andrade-Girón, D.; Sandivar-Rosas, J.; Marín-Rodriguez, W.; Susanibar-Ramirez, E.; Toro-Dextre, E.; Ausejo-Sanchez, J.; Villarreal-Torres, H.; Angeles-Morales, J. Predicting Student Dropout based on Machine Learning and Deep Learning: A Systematic Review. EAI Endorsed Trans. Scalable Inf. Syst. 2023, 10, 1. [Google Scholar] [CrossRef]
  25. Mduma, N.; Kalegele, K.; Machuve, D. A survey of machine learning approaches and techniques for student dropout prediction. Data Sci. J. 2019, 18, 14. [Google Scholar] [CrossRef]
  26. Alalawi, K.; Athauda, R.; Chiong, R. Contextualizing the current state of research on the use of machine learning for student performance prediction: A systematic literature review. Eng. Rep. 2023, 5, e12699. [Google Scholar] [CrossRef]
  27. Guo, T.; Bai, X.; Tian, X.; Firmin, S.; Xia, F. Educational anomaly analytics: Features, methods, and challenges. Front. Big Data 2022, 4, 811840. [Google Scholar] [CrossRef]
  28. Alhothali, A.; Albsisi, M.; Assalahi, H.; Aldosemani, T. Predicting student outcomes in online courses using machine learning techniques: A review. Sustainability 2022, 14, 6199. [Google Scholar] [CrossRef]
  29. Idowu, J.A. Debiasing education algorithms. Int. J. Artif. Intell. Educ. 2024, 34, 1510–1540. [Google Scholar] [CrossRef]
  30. Venkatesan, R.G.; Karmegam, D.; Mappillairaju, B. Exploring statistical approaches for predicting student dropout in education: A systematic review and meta-analysis. J. Comput. Soc. Sci. 2024, 7, 171–196. [Google Scholar] [CrossRef]
  31. Tinto, V. Dropout from Higher Education: A Theoretical Synthesis of Recent Research. Rev. Educ. Res. 1975, 45, 89–125. [Google Scholar] [CrossRef]
  32. Tinto, V. Limits of Theory and Practice in Student Attrition. J. High. Educ. 1982, 53, 687–700. [Google Scholar] [CrossRef]
  33. Tinto, V. Leaving College: Rethinking the Causes and Cures of Student Attrition; University of Chicago Press: Chicago, IL, USA, 1994. [Google Scholar] [CrossRef]
  34. Franz, S.; Paetsch, J. Academic and social integration and their relation to dropping out of teacher education: A comparison to other study programs. Front. Educ. 2023, 8, 1179264. [Google Scholar] [CrossRef]
  35. Villegas-Ch, W.; Govea, J.; Revelo-Tapia, S. Improving Student Retention in Institutions of Higher Education through Machine Learning: A Sustainable Approach. Sustainability 2023, 15, 14512. [Google Scholar] [CrossRef]
  36. Quincho Apumayta, R.; Carrillo Cayllahua, J.; Ccencho Pari, A.; Inga Choque, V.; Cárdenas Valverde, J.; Huamán Ataypoma, D. University Dropout: A Systematic Review of the Main Determinant Factors (2020–2024) [Version 2; Peer Review: 2 Approved]. F1000Research 2024, 13, 942. [Google Scholar] [CrossRef]
  37. Lorenzo-Quiles, O.; Galdón-López, S.; Lendínez-Turón, A. Factors contributing to university dropout: A review. Front. Educ. 2023, 8, 1159864. [Google Scholar] [CrossRef]
  38. Xavier, M.; Meneses, J. A Literature Review on the Definitions of Dropout in Online Higher Education. In Proceedings of the European Distance and E-Learning Network (EDEN) Proceedings, Timisoara, Romania, 22–24 June 2020; Available online: https://femrecerca.cat/meneses/publication/literature-review-definitions-dropout-online-higher-education/literature-review-definitions-dropout-online-higher-education.pdf (accessed on 25 February 2025).
  39. Opazo, D.; Moreno, S.; Álvarez-Miranda, E.; Pereira, J. Analysis of First-Year University Student Dropout through Machine Learning Models: A Comparison between Universities. Mathematics 2021, 9, 2599. [Google Scholar] [CrossRef]
  40. Dervenis, C.; Kyriatzis, V.; Stoufis, S.; Fitsilis, P. Predicting Students’ Performance Using Machine Learning Algorithms. In Proceedings of the 6th International Conference on Algorithms, Computing and Systems, ICACS ’22, Larissa, Greece, 16–18 September 2022; Association for Computing Machinery: New York, NY, USA, 2023. [Google Scholar] [CrossRef]
  41. Yağcı, M. Educational data mining: Prediction of students’ academic performance using machine learning algorithms. Smart Learn. Environ. 2022, 9, 11. [Google Scholar] [CrossRef]
  42. Wang, J.; Yu, Y. Machine Learning Approach to Student Performance Prediction of Online Learning. PLoS ONE 2025, 20, e0299018. [Google Scholar] [CrossRef] [PubMed]
  43. Dabhade, P.; Agarwal, R.; Alameen, K.P.; Fathima, A.T.; Sridharan, R.; Gopakumar, G. Educational Data Mining for Predicting Students’ Academic Performance Using Machine Learning Algorithms. Mater. Today Proc. 2021, 47, 5260–5267. [Google Scholar] [CrossRef]
  44. Hakim, N.; Jastacia, B.; Mansoori, A.A. Personalizing Learning Paths: A Study of Adaptive Learning Algorithms and Their Effects on Student Outcomes. J. Emerg. Technol. Educ. 2024, 2, 318–330. [Google Scholar] [CrossRef]
  45. Alzubaidi, A.; Alzubaidi, A.; Alzubaidi, A. Assessment and Evaluation of Different Machine Learning Models for Predicting Students’ Academic Performance. J. Comput. Sci. 2023, 19, 415–427. [Google Scholar] [CrossRef]
  46. Adadi, A.; Berrada, M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160. [Google Scholar] [CrossRef]
  47. Cotfas, L.-A.; Delcea, C.; Mancini, S.; Ponsiglione, C.; Vitiello, L. An agent-based model for cruise ship evacuation considering the presence of smart technologies on board. Expert Syst. Appl. 2023, 214, 119124. [Google Scholar] [CrossRef]
  48. Kitchenham, B.; Pearl Brereton, O.; Budgen, D.; Turner, M.; Bailey, J.; Linkman, S. Systematic literature reviews in software engineering—A systematic literature review. Inf. Softw. Technol. 2009, 51, 7–15. [Google Scholar] [CrossRef]
  49. Shiguihara, P.; Lopes, A.d.A.; Mauricio, D. Dynamic Bayesian Network Modeling, Learning, and Inference: A Survey. IEEE Access 2021, 9, 117639–117648. [Google Scholar] [CrossRef]
  50. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, 71. [Google Scholar] [CrossRef]
  51. Mutrofin, S.; Ginardi, R.V.H.; Fatichah, C.; Kurniawardhani, A. A critical assessment of balanced class distribution problems: The case of predict student dropout. Test Eng. Manag. 2019, 81, 1764–1770. [Google Scholar]
  52. Phan, M.; De Caigny, A.; Coussement, K. A decision support framework to incorporate textual data for early student dropout prediction in higher education. Decis. Support Syst. 2023, 168, 113940. [Google Scholar] [CrossRef]
  53. Al-Jallad, N.T.; Ning, X.; Khairalla, M.A. An interpretable predictive framework for students’ withdrawal problem using multiple classifiers. Eng. Lett. 2019, 27, 1–8. [Google Scholar]
  54. Berens, J.; Schneider, K.; Görtz, S.; Oster, S.; Burghoff, J. Early Detection of Students at Risk—Predicting Student Dropouts Using Administrative Student Data from German Universities and Machine Learning Methods. J. Educ. Data Min. 2019, 11, 1–41. [Google Scholar]
  55. Velasco, C.L.R.; Villena, E.G.; Ballester, J.B.; Prados, F.Á.D.; Alvarado, E.S.; Álvarez, J.C. Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data. Int. J. Emerg. Technol. Learn. 2023, 18, 120–155. [Google Scholar] [CrossRef]
  56. Martins, M.V.; Baptista, L.; Machado, J.; Realinho, V. Multi-Class Phased Prediction of Academic Performance and Dropout in Higher Education. Appl. Sci. 2023, 13, 4702. [Google Scholar] [CrossRef]
  57. Coussement, K.; Phan, M.; De Caigny, A.; Benoit, D.F.; Raes, A. Predicting student dropout in subscription-based online learning environments: The beneficial impact of the logit leaf model. Decis. Support Syst. 2020, 135, 113325. [Google Scholar] [CrossRef]
  58. Oqaidi, K.; Aouhassi, S.; Mansouri, K. Towards a Students’ Dropout Prediction Model in Higher Education Institutions Using Machine Learning Algorithms. Int. J. Emerg. Technol. Learn. 2022, 17, 103–117. [Google Scholar] [CrossRef]
  59. Won, H.S.; Kim, M.J.; Kim, D.; Kim, H.S.; Kim, K.M. University Student Dropout Prediction Using Pretrained Language Models. Appl. Sci. 2023, 13, 7073. [Google Scholar] [CrossRef]
  60. Hutagaol, N.; Suharjito. Predictive modelling of student dropout using ensemble classifier method in higher education. Adv. Sci. Technol. Eng. Syst. 2019, 4, 206–211. [Google Scholar] [CrossRef]
  61. Sultana, S.; Khan, S.; Abbas, M.A. Predicting performance of electrical engineering students using cognitive and non-cognitive features for identification of potential dropouts. Int. J. Electr. Eng. Educ. 2017, 54, 105–118. [Google Scholar] [CrossRef]
  62. Realinho, V.; Machado, J.; Baptista, L.; Martins, M.V. Predicting Student Dropout and Academic Success. Data 2022, 7, 146. [Google Scholar] [CrossRef]
  63. Behr, A.; Giese, M.; Teguim Kamdjou, H.D.; Theune, K. Motives for dropping out from higher education—An analysis of bachelor’s degree students in Germany. Eur. J. Educ. 2021, 56, 325–343. [Google Scholar] [CrossRef]
  64. Thammasiri, D.; Delen, D.; Meesad, P.; Kasap, N. A critical assessment of imbalanced class distribution problem: The case of predicting freshmen student attrition. Expert Syst. Appl. 2014, 41, 321–330. [Google Scholar] [CrossRef]
  65. Goran, R.; Jovanovic, L.; Bacanin, N.; Stankovic, M.; Simic, V.; Antonijevic, M.; Zivkovic, M. Identifying and understanding student dropouts using metaheuristic optimized classifiers and explainable artificial intelligence techniques. IEEE Access 2024, 12, 122377–122400. [Google Scholar] [CrossRef]
  66. Gutiérrez, B.; Dehnhardt, M.; Cortés, R.; Matheu, A.; Cornejo, C. Modelo logístico de deserción mediante técnicas de regresión y árbol de decisión para la eficiencia en la destinación de recursos: El caso de una universidad privada chilena. Rev. Ibérica Sist. E Tecnol. Informação 2024, E68, 398–412. [Google Scholar]
  67. Hassan, M.A.; Muse, A.H.; Nadarajah, S. Predicting student dropout rates using supervised machine learning: Insights from the 2022 National Education Accessibility Survey in Somaliland. Appl. Sci. 2024, 14, 7593. [Google Scholar] [CrossRef]
  68. Vaarma, M.; Li, H. Predicting student dropouts with machine learning: An empirical study in Finnish higher education. Technol. Soc. 2024, 76, 102474. [Google Scholar] [CrossRef]
  69. Cai, C.; Fleischhacker, A. Structural Neural Networks Meet Piecewise Exponential Models for Interpretable College Dropout Prediction. J. Educ. Data Min. 2024, 16, 279–302. [Google Scholar]
  70. Asto-Lazaro, M.S.; Cieza-Mostacero, S.E. Web Application Based on Neural Networks for the Detection of Students at Risk of Academic Desertion. TEM J. 2024, 13, 2581. [Google Scholar] [CrossRef]
  71. Isleib, S.; Woisch, A.; Heublein, U. Causes of higher education dropout: Theoretical basis and empirical factors. Z. Erzieh. 2019, 22, 1047–1076. [Google Scholar] [CrossRef]
  72. Guerra, L.; Rivero, D.; Ortiz, A.; Diaz, E.; Quishpe, S. Prediction model of university dropout through data analytics: Strategy for sustainability. RISTI—Rev. Iber. Sist. E Tecnol. Inf. 2020, 2020, 38–47. [Google Scholar]
  73. Hinojosa, M.; Derpich, I.; Alfaro, M.; Ruete, D.; Caroca, A.; Gatica, G. Student clustering procedure according to dropout risk to improve student management in higher education. Texto Livre 2022, 15, e37275. [Google Scholar] [CrossRef]
  74. Zapata-Medina, D.; Espinosa-Bedoya, A.; Jiménez-Builes, J.A. Improving the Automatic Detection of Dropout Risk in Middle and High School Students: A Comparative Study of Feature Selection Techniques. Mathematics 2024, 12, 1776. [Google Scholar] [CrossRef]
  75. Arthana, I.K.R.; Maysanjaya, I.M.D.; Pradnyana, G.A.; Dantes, G.R. Optimizing Dropout Prediction in University Using Oversampling Techniques for Imbalanced Datasets. Int. J. Inf. Educ. Technol. 2024, 14, 1052–1060. [Google Scholar] [CrossRef]
  76. Villar, A.; de Andrade, C.R.V. Supervised machine learning algorithms for predicting student dropout and academic success: A comparative study. Discov. Artif. Intell. 2024, 4, 2. [Google Scholar] [CrossRef]
  77. Diaz, J.; Moreira, F. Toward Educational Sustainability: An AI System for Identifying and Preventing Student Dropout. IEEE Rev. Iberoam. Tecnol. Aprendiz. 2024, 19, 100–110. [Google Scholar]
  78. Kuz, A.; Morales, R. Education in the Knowledge Society Educational Data Science and Machine Learning: A Case Study on University Student Dropout in Mexico. Educ. Knowl. Soc. 2023, 24, 14. [Google Scholar]
  79. Villarreal-Torres, H.; Ángeles-Morales, J.; Cano-Mejía, J.; Mejía-Murillo, C.; Flores-Reyes, G.; Palomino-Márquez, M.; Marín-Rodriguez, W.; Andrade-Girón, D. Classification model for student dropouts using machine learning: A case study. EAI Endorsed Trans. Scalable Inf. Syst. 2023, 10, 1–12. [Google Scholar] [CrossRef]
  80. Díaz, B.; Marín, W.; Lioo, F.; Baldeos, L.; Villanueva, D.; Ausejo, J. Student desertion, factors associated with decision trees: The case of a graduate school at a public university in Peru. RISTI—Rev. Iber. Sist. E Tecnol. Inf. 2022, 2022, 197–211. [Google Scholar]
  81. Bedregal-Alpaca, N.; Cornejo-Aparicio, V.; Zarate-Valderrama, J.; Yanque-Churo, P. Classification models for determining types of academic risk and predicting dropout in university students. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 7. [Google Scholar] [CrossRef]
  82. Marquez-Vera, C.; Morales, C.R.; Soto, S.V. Predicting School Failure and Dropout by Using Data Mining Techniques. IEEE Rev. Iberoam. Tecnol. Aprendiz. 2013, 8, 7–14. [Google Scholar] [CrossRef]
  83. Chen, R. Institutional characteristics and college student dropout risks: A multilevel event history analysis. Res. High. Educ. 2012, 53, 487–505. [Google Scholar] [CrossRef]
  84. Qvortrup, A.; Lykkegaard, E. The malleability of higher education study environment factors and their influence on humanities student dropout—Validating an instrument. Educ. Sci. 2024, 14, 904. [Google Scholar] [CrossRef]
  85. Alban, M.; Mauricio, D. Neural networks to predict dropout at the universities. Int. J. Mach. Learn. Comput. 2019, 9, 149–153. [Google Scholar] [CrossRef]
  86. Silva, H.A.; Quezada, L.E.; Oddershede, A.M.; Palominos, P.I.; O’Brien, C. A Method for Estimating Students’ Desertion in Educational Institutions Using the Analytic Hierarchy Process. J. Coll. Stud. Retent. Res. Theory Pract. 2020, 25, 101–125. [Google Scholar] [CrossRef]
  87. Bedregal-Alpaca, N.; Tupacyupanqui-Jaén, D.; Cornejo-Aparicio, V. Analysis of the academic performance of systems engineering students, desertion possibilities and proposals for retention. Ingeniare 2020, 28, 668–683. [Google Scholar] [CrossRef]
  88. Kim, S.; Choi, E.; Jun, Y.K.; Lee, S. Student Dropout Prediction for University with High Precision and Recall. Appl. Sci. 2023, 13, 6275. [Google Scholar] [CrossRef]
  89. Zanellati, A.; Zingaro, S.P.; Gabbrielli, M. Balancing performance and explainability in academic dropout prediction. IEEE Trans. Learn. Technol. 2024, 17, 2086–2099. [Google Scholar] [CrossRef]
  90. Iam-On, N.; Boongoen, T. Improved student dropout prediction in Thai University using ensemble of mixed-type data clusterings. Int. J. Mach. Learn. Cybern. 2017, 8, 497–510. [Google Scholar] [CrossRef]
  91. Quispe, J.O.Q.; Toledo, O.C.; Toledo, M.C.; Llatasi, E.E.C.; Saira, E. Early prediction of university student dropout using machine learning models. Nanotechnol. Percept. 2024, 20, 659–669. [Google Scholar]
  92. Bouihi, B.; Bousselham, A.; Aoula, E.; Ennibras, F.; Deraoui, A. Prediction of Higher Education Student Dropout based on Regularized Regression Models. Eng. Technol. Appl. Sci. Res. 2024, 14, 17811–17815. [Google Scholar] [CrossRef]
  93. Aggarwal, D.; Mittal, S.; Bali, V. Prediction model for classifying students based on performance using machine learning techniques. Int. J. Recent Technol. Eng. 2019, 8, 496–503. [Google Scholar] [CrossRef]
  94. Alvarez, N.L.; Callejas, Z.; Griol, D. Factors that affect student desertion in careers in Computer Engineering profile. Rev. Fuentes 2020, 22, 105–126. [Google Scholar] [CrossRef]
  95. Cam, H.N.T.; Sarlan, A.; Arshad, N.I. A hybrid model integrating recurrent neural networks and the semi-supervised support vector machine for identification of early student dropout risk. PeerJ Comput. Sci. 2024, 10, e2572. [Google Scholar] [CrossRef]
  96. Castelo Branco, U.V.; Jezine, E.; Santos Diniz, A.V.; Silva, G.T. Sistema de Alerta para la Identificación de Posibles Factores de Deserción de Estudiantes de Grado en Período de Pandemia en Paraíba (Brasil). Res. Educ. Learn. Innov. Arch. 2022, 29, 83–101. [Google Scholar] [CrossRef]
  97. Gutierrez-Pachas, D.A.; Garcia-Zanabria, G.; Cuadros-Vargas, E.; Camara-Chavez, G.; Gomez-Nieto, E. Supporting Decision-Making Process on Higher Education Dropout by Analyzing Academic, Socioeconomic, and Equity Factors through Machine Learning and Survival Analysis Methods in the Latin American Context. Educ. Sci. 2023, 13, 154. [Google Scholar] [CrossRef]
  98. Hoffait, A.S.; Schyns, M. Early detection of university students with potential difficulties. Decis. Support Syst. 2017, 101, 1–11. [Google Scholar] [CrossRef]
  99. Lacave, C.; Molina, A.I.; Cruz-Lemus, J.A. Learning Analytics to identify dropout factors of Computer Science studies through Bayesian networks. Behav. Inf. Technol. 2018, 37, 993–1007. [Google Scholar] [CrossRef]
  100. Lottering, R.; Hans, R.; Lall, M. A Machine Learning Approach to Identifying Students at Risk of Dropout: A Case Study. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 417–422. [Google Scholar] [CrossRef]
  101. Martins, M.; Migueis, V.; Fonseca, D.; Gouveia, P. Prediction of academic dropout in a higher education institution using data mining. RISTI—Rev. Iber. Sist. E Tecnol. Inf. 2020, 2020, 188–203. [Google Scholar]
  102. Radovanović, S.; Delibašić, B.; Suknović, M. Predicting dropout in online learning environments. Comput. Sci. Inf. Syst. 2021, 18, 957–978. [Google Scholar] [CrossRef]
  103. Rivera-Baena, O.D.; Patiño-Rodríguez, C.E.; Úsuga-Manco, O.C.; Hernández-Barajas, F. ADHE: A tool to characterize higher education dropout phenomenon. Rev. Fac. Ing. Univ. Antioq. 2024, 64–75. [Google Scholar] [CrossRef]
  104. Schneider, K.; Berens, J.; Burghoff, J. Early detection of student dropout: What is relevant information? Z. Erzieh. 2019, 22, 1121–1146. [Google Scholar] [CrossRef]
  105. Segura, M.; Mello, J.; Hernández, A. Machine Learning Prediction of University Student Dropout: Does Preference Play a Key Role? Mathematics 2022, 10, 3359. [Google Scholar] [CrossRef]
  106. Tan, M.; Shao, P. Prediction of student dropout in E-learning program through the use of machine learning method. Int. J. Emerg. Technol. Learn. 2015, 10, 11. [Google Scholar] [CrossRef]
  107. Wainipitapong, S.; Chiddaycha, M. Assessment of dropout rates in the preclinical years and contributing factors: A study on one Thai medical school. BMC Med. Educ. 2022, 22, 461. [Google Scholar] [CrossRef]
  108. Yasmin. Application of the classification tree model in predicting learner dropout behaviour in open and distance learning. Distance Educ. 2013, 34, 218–231. [Google Scholar] [CrossRef]
  109. Zárate-Valderrama, J.; Bedregal-Alpaca, N.; Cornejo-Aparicio, V. Classification models to recognize patterns of desertion in university students. Ingeniare 2021, 29, 168–177. [Google Scholar] [CrossRef]
  110. Zerkouk, M.; Mihoubi, M.; Chikhaoui, B.; Wang, S. A machine learning based model for student’s dropout prediction in online training. Educ. Inf. Technol. 2024, 29, 15793–15812. [Google Scholar] [CrossRef]
  111. Alfahid, A. Algorithmic Prediction of Students On-Time Graduation from the University. TEM J. 2024, 13, 692–698. [Google Scholar] [CrossRef]
  112. Mealli, F.; Rampichini, C. Evaluating the effects of university grants by using regression discontinuity designs. J. R. Stat. Soc. Ser. A Stat. Soc. 2012, 175, 775–798. [Google Scholar] [CrossRef]
  113. Daza, A. A stacking based hybrid technique to predict student dropout at universities. J. Theor. Appl. Inf. Technol. 2022, 100, 1–12. [Google Scholar]
  114. Lackner, E. Community College Student Persistence During the COVID-19 Crisis of Spring 2020. Community Coll. Rev. 2023, 51, 193–215. [Google Scholar] [CrossRef] [PubMed]
  115. Willging, P.A.; Johnson, S.D. Factors that influence students’ decision to dropout of online courses. Online Learn. J. 2019, 13, 115–127. [Google Scholar] [CrossRef]
  116. Vega, H.; Sanez, E.; De La Cruz, P.; Moquillaza, S.; Pretell, J. Intelligent System to Predict University Students Dropout. Int. J. Online Biomed. Eng. 2022, 18, 27–43. [Google Scholar] [CrossRef]
  117. Fontana, L.; Masci, C.; Ieva, F.; Paganoni, A.M. Performing learning analytics via generalised mixed-effects trees. Data 2021, 6, 74. [Google Scholar] [CrossRef]
  118. Márquez-Vera, C.; Cano, A.; Romero, C.; Ventura, S. Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data. Appl. Intell. 2013, 38, 315–330. [Google Scholar] [CrossRef]
  119. Alvarado-Uribe, J.; Mejía-Almada, P.; Masetto Herrera, A.L.; Molontay, R.; Hilliger, I.; Hegde, V.; Montemayor Gallegos, J.E.; Ramírez Díaz, R.A.; Ceballos, H.G. Student Dataset from Tecnologico de Monterrey in Mexico to Predict Dropout in Higher Education. Data 2022, 7, 119. [Google Scholar] [CrossRef]
  120. Dasi, H.; Kanakala, S. Student Dropout Prediction Using Machine Learning Techniques. Int. J. Intell. Syst. Appl. Eng. 2022, 10, 408–414. [Google Scholar]
  121. Albán, M.; Mauricio, D.; Albán, M. Decision trees for the early identification of university students at risk of desertion. Int. J. Eng. Technol. 2018, 7, 51. [Google Scholar] [CrossRef]
  122. Song, Z.; Sung, S.H.; Park, D.M.; Park, B.K. All-Year Dropout Prediction Modeling and Analysis for University Students. Appl. Sci. 2023, 13, 1143. [Google Scholar] [CrossRef]
  123. Fauszt, T.; Erdélyi, K.; Dobák, D.; Bognár, L.; Kovács, E. Design of a Machine Learning Model to Predict Student Attrition. Int. J. Emerg. Technol. Learn. 2023, 18, 184–195. [Google Scholar] [CrossRef]
  124. Meyer, J.; Leuze, K.; Strauss, S. Individual Achievement, Person-Major Fit, or Social Expectations: Why Do Students Switch Majors in German Higher Education? Res. High. Educ. 2022, 63, 222–247. [Google Scholar] [CrossRef]
  125. Wild, S.; Schulze Heuling, L. Student dropout and retention: An event history analysis among students in cooperative higher education. Int. J. Educ. Res. 2020, 104, 101687. [Google Scholar] [CrossRef]
  126. Wongvorachan, T.; Bulut, O.; Liu, J.X.; Mazzullo, E. A Comparison of Bias Mitigation Techniques for Educational Classification Tasks Using Supervised Machine Learning. Information 2024, 15, 326. [Google Scholar] [CrossRef]
  127. Sacală, M.D.; Pătărlăgeanu, S.R.; Popescu, M.F.; Constantin, M. Econometric research of the mix of factors influencing first-year students’ dropout decision at the faculty of agri-food and environmental economics. Econ. Comput. Econ. Cybern. Stud. Res. 2021, 55, 203–220. [Google Scholar] [CrossRef]
  128. Kok, C.L.; Ho, C.K.; Chen, L.; Koh, Y.Y.; Tian, B. A Novel Predictive Modeling for Student Attrition Utilizing Machine Learning and Sustainable Big Data Analytics. Appl. Sci. 2024, 14, 9633. [Google Scholar] [CrossRef]
  129. Fernandez-Garcia, A.J.; Preciado, J.C.; Melchor, F.; Rodriguez-Echeverria, R.; Conejero, J.M.; Sanchez-Figueroa, F. A real-life machine learning experience for predicting university dropout at different stages using academic data. IEEE Access 2021, 9, 133076–133090. [Google Scholar] [CrossRef]
  130. Alban, M.; Mauricio, D. Factors that influence undergraduate university desertion according to students perspective. Int. J. Eng. Technol. 2019, 10, 1585–1602. [Google Scholar] [CrossRef]
  131. Huo, H.; Cui, J.; Hein, S.; Padgett, Z.; Ossolinski, M.; Raim, R.; Zhang, J. Predicting Dropout for Nontraditional Undergraduate Students: A Machine Learning Approach. J. Coll. Stud. Retent. Res. Theory Pract. 2023, 24, 1054–1077. [Google Scholar] [CrossRef]
  132. Nuanmeesri, S.; Poomhiran, L.; Chopvitayakun, S.; Kadmateekarun, P. Improving Dropout Forecasting during the COVID-19 Pandemic through Feature Selection and Multilayer Perceptron Neural Network. Int. J. Inf. Educ. Technol. 2022, 12, 851–857. [Google Scholar] [CrossRef]
  133. Zamora Menéndez, Á.; Gil Flores, J.; de Besa Gutiérrez, M.R. Learning approaches, time perspective and persistence in university students. Educ. XX1 2020, 23, 17–39. [Google Scholar] [CrossRef]
  134. Vives, L.; Cabezas, I.; Vives, J.C.; Reyes, N.G.; Aquino, J.; Cóndor, J.B.; Altamirano, S.F.S. Prediction of students’ academic performance in the programming fundamentals course using long short-term memory neural networks. IEEE Access 2024, 12, 5882–5898. [Google Scholar] [CrossRef]
  135. Alban, M.; Mauricio, D. Prediction of university dropout through technological factors: A case study in Ecuador. Rev. Espac. 2018, 39, 8. [Google Scholar]
  136. Cedeño-Valarezo, L.; Morales-Carrillo, J.; Quijije-Vera, C.P.; Palau-Delgado, S.A.; López-Mora, C.I. Machine learning to predict school dropout in the context of COVID-19. RISTI—Rev. Iber. Sist. E Tecnol. Inf. 2023, 2023, 370–377. [Google Scholar]
  137. Gamao, A.O.; Gerardo, B.D. Prediction-based model for student dropouts using modified mutated firefly algorithm. Int. J. Adv. Trends Comput. Sci. Eng. 2019, 8, 3461–3469. [Google Scholar] [CrossRef]
  138. Heredia, D.; Amaya, Y.; Barrientos, E. Student Dropout Predictive Model Using Data Mining Techniques. IEEE Lat. Am. Trans. 2015, 13, 3127–3134. [Google Scholar] [CrossRef]
  139. Mubarak, A.A.; Cao, H.; Zhang, W. Prediction of students’ early dropout based on their interaction logs in online learning environment. Interact. Learn. Environ. 2022, 30, 1414–1433. [Google Scholar] [CrossRef]
  140. Niyogisubizo, J.; Liao, L.; Nziyumva, E.; Murwanashyaka, E.; Nshimyumukiza, P.C. Predicting student’s dropout in university classes using two-layer ensemble machine learning approach: A novel stacked generalization. Comput. Educ. Artif. Intell. 2022, 3, 100066. [Google Scholar] [CrossRef]
  141. Rico-Juan, J.R.; Cachero, C.; Macià, H. Study regarding the influence of a student’s personality and an LMS usage profile on learning performance using machine learning techniques. Appl. Intell. 2024, 54, 6175–6197. [Google Scholar] [CrossRef]
  142. Selvan, M.P.; Navadurga, N.; Prasanna, N.L. An efficient model for predicting student dropout using data mining and machine learning techniques. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 750–752. [Google Scholar] [CrossRef]
  143. Valles-Coral, M.A.; Salazar-Ramírez, L.; Injante, R.; Hernandez-Torres, E.A.; Juárez-Díaz, J.; Navarro-Cabrera, J.R.; Pinedo, L.; Vidaurre-Rojas, P. Density-Based Unsupervised Learning Algorithm to Categorize College Students into Dropout Risk Levels. Data 2022, 7, 165. [Google Scholar] [CrossRef]
  144. Figueroa-Canas, J.; Sancho-Vinuesa, T. Early prediction of dropout and final exam performance in an online statistics course. Rev. Iberoam. Tecnol. Aprendiz. 2020, 15, 86–94. [Google Scholar] [CrossRef]
  145. Nuankaew, P. Dropout situation of business computer students, University of Phayao. Int. J. Emerg. Technol. Learn. 2019, 14, 115–131. [Google Scholar] [CrossRef]
  146. Pecuchova, J.; Drlik, M. Enhancing the Early Student Dropout Prediction Model Through Clustering Analysis of Students’ Digital Traces. IEEE Access 2024, 12, 159336–159367. [Google Scholar] [CrossRef]
  147. Fu, Q.; Gao, Z.; Zhou, J.; Zheng, Y. CLSA: A novel deep learning model for MOOC dropout prediction. Comput. Electr. Eng. 2021, 94, 107315. [Google Scholar] [CrossRef]
  148. Burgos, C.; Campanario, M.L.; Peña, D.d.l.; Lara, J.A.; Lizcano, D.; Martínez, M.A. Data mining for modeling students’ performance: A tutoring action plan to prevent academic dropout. Comput. Electr. Eng. 2018, 66, 541–556. [Google Scholar] [CrossRef]
  149. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; Available online: https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf (accessed on 25 February 2025).
  150. Melo, E.; Silva, I.; Costa, D.G.; Viegas, C.M.D.; Barros, T.M. On the use of explainable artificial intelligence to evaluate school dropout. Educ. Sci. 2022, 12, 845. [Google Scholar] [CrossRef]
  151. Dass, S.; Gary, K.; Cunningham, J. Predicting student dropout in self-paced mooc course using random forest model. Information 2021, 12, 476. [Google Scholar] [CrossRef]
  152. Karlos, S.; Kostopoulos, G.; Kotsiantis, S. Predicting and interpreting students’ grades in distance higher education through a semi-regression method. Appl. Sci. 2020, 10, 8413. [Google Scholar] [CrossRef]
  153. Torres, J.A.O.; Santiago, A.M.; Izaguirre, J.M.V.; Garduza, S.H.; García, M.A.; Alejandro, G.F. Multilayer fuzzy inference system for predicting the risk of dropping out of school at the high school level. IEEE Access 2024, 12, 137523–137532. [Google Scholar] [CrossRef]
  154. Karimi-Haghighi, M.; Castillo, C.; Hernández-Leo, D. A Causal Inference Study on the Effects of First Year Workload on the Dropout Rate of Undergraduates. In Artificial Intelligence in Education; Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 15–27. [Google Scholar]
  155. Alhaza, K.; Abdel-Salam, A.-S.G.; Mollazehi, M.D.; Ismail, R.M.; Bensaid, A.; Johnson, C.; Al-Tameemi, R.A.N.; A Hasan, M.; Romanowski, M.H. Factors affecting university image among undergraduate students: The case study of Qatar University. Cogent Educ. 2021, 8, 1977106. [Google Scholar] [CrossRef]
  156. Viloria, A.; Lezama, O.B.P. Mixture structural equation models for classifying university student dropout in Latin America. Procedia Comput. Sci. 2019, 160, 629–634. [Google Scholar] [CrossRef]
  157. Ishii, T.; Tachikawa, H.; Shiratori, Y.; Hori, T.; Aiba, M.; Kuga, K.; Arai, T. What kinds of factors affect the academic outcomes of university students with mental disorders? A retrospective study based on medical records. Asian J. Psychiatry 2018, 32, 67–72. [Google Scholar] [CrossRef]
  158. Pecuchova, J.; Drlik, M. Predicting students at risk of early dropping out from course using ensemble classification methods. Procedia Comput. Sci. 2023, 225, 3223–3232. [Google Scholar] [CrossRef]
  159. Brigato, L.; Iocchi, L. A Close Look at Deep Learning with Small Data. arXiv 2020, arXiv:2003.12843. [Google Scholar]
  160. Mauricio, D.; Cárdenas-Grandez, J.; Uribe Godoy, G.V.; Rodríguez Mallma, M.J.; Maculan, N.; Mascaro, P. Maximizing Survival in Pediatric Congenital Cardiac Surgery Using Machine Learning, Explainability, and Simulation Techniques. J. Clin. Med. 2024, 13, 6872. [Google Scholar] [CrossRef]
  161. Ananthi Claral Mary, T.; Arul Leena Rose, P.J. Ensemble Machine Learning Model for University Students’ Risk Prediction and Assessment of Cognitive Learning Outcomes. Int. J. Inf. Educ. Technol. 2023, 13, 948–958. [Google Scholar] [CrossRef]
Figure 1. Systematic review process according to PRISMA [50].
Figure 2. Number of articles selected per year.
Figure 3. Number of authors by country of affiliation.
Figure 4. Number of articles per quartile.
Figure 5. Number of articles per publisher.
Figure 6. Frequency of UED factors by category.
Figure 7. Process to minimize UED.
Table 1. Database search string.

Database | Search String
Scopus | title-abstract-keywords ((“student desertion” OR “student abandonment” OR “student retreat” OR “student withdrawal” OR “desertion university” OR “dropout university” OR “desertion dropout” OR desertion OR “college dropout” OR “academic desertion” OR “academic dropout” OR “student dropout” OR “university withdrawal” OR “college withdrawal”) AND (explication OR factor OR prediction OR simulation OR methods OR framework OR forecast OR predict OR process OR explanation OR interpretation OR patterns OR analysis OR identify OR estimate OR know OR architecture OR establish OR proposal OR discover OR explainability OR predicting OR performance OR models) AND (“machine learning” OR “deep Learning” OR “decision tree” OR “Bayesian” OR “neural network” OR arn OR regression OR clustering OR “association rules” OR “automatic learning”))
Web of Science (WoS) | (“student desertion” OR “student abandonment” OR “student retreat” OR “student withdrawal” OR “desertion university” OR “dropout university” OR “desertion dropout” OR desertion OR “college dropout” OR “academic desertion” OR “academic dropout” OR “student dropout” OR “university withdrawal” OR “college withdrawal”) AND (explication OR factor OR prediction OR simulation OR methods OR framework OR forecast OR predict OR process OR explanation OR interpretation OR patterns OR analysis OR identify OR estimate OR know OR architecture OR establish OR proposal OR discover OR explainability OR predicting OR performance OR models) AND (“machine learning” OR “deep Learning” OR “decision tree” OR “Bayesian” OR “neural network” OR arn OR regression OR Clustering OR “association rules” OR “automatic learning”) (topic)
Table 2. Inclusion and exclusion criteria.

Inclusion criteria:
- Articles addressing at least one of the key dimensions of this review: factors, prediction, explanation, or simulation of UED using ML.
- Journal articles.
- Area related to “Engineering” or “Computer Science”.
- Articles published in journals indexed in Scopus and Web of Science.
- Period: 2012–2024.

Exclusion criteria:
- Pre-publications.
- Articles in the field of secondary or primary education.
- Articles that are not within the context of ML.
- Articles that identify factors associated with UED without empirical or statistical validation.
Table 3. Potentially eligible and selected articles.

Source | Potentially Eligible Studies | Selected Studies
Scopus | 620 | 102
Web of Science (WoS) | 166 | 20
Total | 786 | 122 *

* 61 studies removed from WoS for being duplicates in Scopus.
Table 4. Category of factors influencing UED.

Category | Description | References
Demographics | Characteristics or attributes that describe the structure and composition of a population. | [51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71]
Socioeconomic | The economic and social situation of each student. | [51,52,55,56,57,61,62,64,66,71,72,73,74,75,76,77,78]
Institutional | Elements related to the physical structure and functioning of the institution. | [59,78,79,80,81,82,83,84,85]
Personal | Elements related to the student’s personal and family environment. | [54,60,69,72,74,79,80,86,87,88,89]
Academic | Elements related to the student’s academic performance. | [51,53,54,55,56,57,58,59,60,62,64,65,66,67,68,69,72,73,74,75,76,77,80,86,88,90,91,92]
Table 5. Most relevant demographic factors of the UED.

Id | Factor | References
F001 | Gender | [7,14,39,50,51,53,54,58,59,60,61,63,64,65,66,67,68,69,71,72,74,77,78,82,83,86,88,89,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110]
F002 | Age | [7,15,51,54,59,64,65,66,67,68,69,71,72,74,77,78,80,83,84,86,89,91,94,97,98,99,100,101,105,108,109,111,112,113,114,115,116,117,118,119]
F003 | Marital status | [51,59,61,68,69,71,76,80,86,91,93,97,102,106,108,114,116,118]
F004 | Sex | [21,55,78,80,84,86,90,91,111,112,114,115,118,120,121,122,123,124]
F005 | Place of residence | [7,14,39,51,64,65,66,68,74,86,93,106,108,114,121,125]
F006 | Place of origin | [39,51,53,54,71,78,89,90,91,93]
F007 | Nationality | [7,21,53,55,58,61,64,65,76,93,101,104,115,126]
F008 | Parents’ educational level | [59,70,72,93,94,95,97,99,112,117,118]
F009 | Type of school | [14,64,65,66,75,78,86,89,91,95,103,109]
F010 | Ethnicity | [72,74,77,83,84,86,97,98,112,113,126]
Table 6. Most relevant socioeconomic factors of the UED.

Id | Factor | References
F076 | Scholarships | [55,61,72,75,83,93,94,99,104,106,112,120,127]
F077 | Employment status | [54,60,69,71,80,91,94,108,114,125,128]
F078 | Family income | [59,80,86,95,97,98,102,125,129]
F079 | Income level | [14,58,80,89,96,108,118]
F080 | Admission score | [14,39,78,95,98,121,123]
F081 | Disability | [69,72,77,88,100,105,114]
F082 | Educational level | [54,101,105,108,117,122]
F083 | Socioeconomic status | [64,77,83,95,124]
F084 | Internet access | [15,76,93,129]
F085 | Financial aid | [68,80,83,100,112]
Table 7. Most relevant institutional factors of the UED.

Id | Factor | References
F156 | Infrastructure | [80,81,84]
F157 | Educational services | [66,83]
F158 | Suitable equipment | [80,81]
F159 | Place | [78,89]
F160 | Institution size | [83,84]
F161 | Area | [60]
F162 | Geographical area | [78]
F163 | Teacher’s commitment to the student | [85]
F164 | Classification of the career or institution | [85]
F165 | Group class | [82]
F166 | School climate assessment scale | [126]
F167 | Counselor’s perception of teachers’ expectations | [126]
Table 8. Most relevant personal factors of the UED.

Id | Factor | References
F179 | Year of admission | [58,64,74,75,87,90,94,95,120,127]
F180 | Motivation | [80,81,82,97]
F181 | Extracurricular activities | [72,84,97]
F182 | Commitment | [99,124,130]
F183 | Class participation | [84,131,132]
F184 | Number of voluntary activities | [7,58,88]
F185 | Future time perspective | [55,130]
F186 | Time to study | [60,80]
F187 | Adaptation and coexistence | [55,133]
F188 | Leader or president | [60,93]
Table 9. Most relevant academic factors of the UED.

Id | Factor | References
F317 | Grades | [39,54,65,74,76,78,82,87,90,106,111,116,117,120,126,127]
F318 | General GPA | [7,59,64,67,69,75,83,84,89,90,97,113,120,123,126,129]
F319 | Secondary school grade | [39,72,83,86,97,103,109,111,121,129]
F320 | Subjects taken | [15,72,73,78,94,98,108,118]
F321 | Credits taken | [71,89,100,101,112,120,121]
F322 | Attendance | [55,59,61,64,72,73,76,120]
F323 | Type of admission | [39,58,78,90,95,107]
F324 | Type of school | [66,86,89,99,102,103,108]
F325 | School | [39,55,70,78,89,91,117]
F326 | Academic year | [39,78,93,100,134]
Table 10. Most relevant developments in ML for the prediction of UED.

Model | Min | Max | References
DT | 62 | 99.53 | [7,39,56,58,60,65,66,71,72,75,76,77,78,79,81,82,87,90,91,99,100,101,102,106,108,110,111,114,116,118,119,120,121,124,129,131,132,135,136,137,138,139,140,141,142,143]
LR | 50 | 99.58 | [39,53,56,58,60,65,66,67,71,79,86,88,91,92,96,99,102,105,106,108,113,118,120,125,129,131,135,141,142,143,144,145]
RF | 56.67 | 100 | [15,39,54,56,58,66,72,74,89,93,100,102,104,106,108,118,120,126,127,129,131,134,135,141,143,146,147]
SVM | 36.73 | 98.8 | [39,56,58,66,67,71,72,74,76,91,93,99,100,102,106,108,110,118,120,127,129,131,135,141,143,144,145]
ANN | 62 | 100 | [15,39,53,56,60,69,71,72,78,79,87,88,90,93,99,101,104,110,111,118]
NB | 55 | 98.6 | [7,39,59,60,66,67,75,90,91,100,102,110,118,129,136]
KNN | 62 | 99 | [39,59,66,67,72,75,90,99,100,106,131,132,135]
XGBoost | 70 | 99.28 | [15,64,67,71,76,79,108,110,120,132,134,146]
MLP | 79 | 98.60 | [7,58,85,102,106,129,132,141]
GB | 69 | 98.68 | [39,56,88,127,134]
AdaBoost | 80.9 | 95.51 | [53,64,110,132]
GMERF | 90.8 | 93.58 | [14,21]
CART | 79.7 | 97.91 | [14,59,110]
Ridor | 93.4 | 97.90 | [82,116]
ICRM v2 | 93.7 | 94.40 | [82,116]
ICRM v1 | 92.1 | 92.10 | [82,116]
GLM | 82.3 | 91.05 | [14,110]
ADTree | 96.6 | 98.20 | [82,116]
CNN | 86.4 | 94.60 | [106,144]
RandomTree | 94 | 96.10 | [82,116]
LightGBM | 79.8 | 81.00 | [108,120]
Prism | 94.4 | 99.80 | [82,116]
REPTree | 92.7 | 96.50 | [82,116]
PR | 86.3 | 98.20 | [66,92]
SimpleCart | 96.4 | 96.60 | [82,116]
K-means | 44.29 | 80.01 | [77,133]
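As a rough illustration of how the accuracy spreads in Table 10 arise, the following sketch benchmarks the three most-studied models (DT, LR, RF) on a synthetic stand-in for a dropout dataset. This is a minimal sketch assuming scikit-learn; the data, class imbalance, and feature count are hypothetical, not taken from any of the reviewed studies.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a dropout dataset: 1,000 students, 10 features,
# ~30% dropout rate (an assumed, illustrative imbalance)
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))
print(scores)
```

In the reviewed studies, the same comparison is run on real institutional records, which is where the wide Min–Max ranges in Table 10 come from: the spread reflects differences in datasets and preprocessing, not only in the algorithms.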
Table 11. Preprocessing techniques.

Preprocessing | References
DC | [7,15,39,59,64,66,67,68,72,74,75,78,79,82,85,87,92,97,100,101,102,106,108,111,116,118,119,120,126,127,131,133,134,135,138,139,144,145,146,147,148]
DTA | [39,59,60,65,74,111,119,129,135,142]
FS | [74,75,77,91,92,99,118,126,135]
SV | [64,67,68,76,89,92,131,132]
VC | [67,69,76,89,108,132]
SMOTE | [59,64,75,88,108,114]
Normalization | [69,126,128,144]
EV | [69,76,77,132]
TV | [66,68,131]
DR | [39,91,135]
CC | [74,75,126]
DS | [78,87]
None reported | [14,21,53,54,56,58,71,80,81,86,93,96,98,104,110,113,121,125,137,140]

Data cleaning (DC); data transformation (DTA); feature selection (FS); standardization of variables (SV); variable coding (VC); synthetic minority oversampling technique (SMOTE); elimination of variables (EV); transformation of variables (TV); dimensionality reduction (DR); categorical coding (CC); data selection (DS).
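Several of the steps in Table 11 (DC, SV, VC, and class balancing) can be sketched on a toy record set. This is an illustrative sketch only: the feature names are invented, and the balancing step is simplified to random duplication of minority samples rather than true SMOTE interpolation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder

rng = np.random.default_rng(0)
# Hypothetical student records: grade average (numeric, with missing
# values) and gender (categorical); y = 1 marks a dropout
grades = np.array([14.5, np.nan, 11.0, 16.2, np.nan, 9.8])
gender = np.array([["F"], ["M"], ["M"], ["F"], ["F"], ["M"]])
y = np.array([0, 1, 0, 0, 0, 1])

# DC: impute missing grades with the column mean
grades[np.isnan(grades)] = np.nanmean(grades)
# SV: standardize the numeric feature to zero mean, unit variance
grades_std = StandardScaler().fit_transform(grades.reshape(-1, 1))
# VC: one-hot encode the categorical feature
gender_ohe = OneHotEncoder().fit_transform(gender).toarray()
X = np.hstack([grades_std, gender_ohe])

# Balancing (SMOTE-style, simplified): duplicate random minority
# samples until both classes have the same size
minority_idx = np.flatnonzero(y == 1)
extra = rng.choice(minority_idx, size=(y == 0).sum() - minority_idx.size)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
```

Real SMOTE interpolates new synthetic points between minority neighbors instead of duplicating rows, which is why several of the reviewed studies prefer it for heavily imbalanced dropout datasets.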
Table 12. Explanation progress for the UED.

Study | Case | Model
[150] | They developed a black box model to predict and explain student dropout at the Federal Institute of Rio Grande do Norte (IFRN) in Brazil, with an explanatory level of 78% for SHAP and 57% for LIME. Academic performance, parents’ educational level, and family income proved determinant for dropout. | SHAP, LIME
[151] | UED was predicted in a college algebra course at Arizona State University using RF in conjunction with SHAP; the number of topics mastered, variability in performance, and activity tendencies carried the most weight in predicting attrition. | SHAP
[152] | They implemented a semi-supervised regression algorithm that uses multi-view learning to improve the prediction of undergraduate students’ final grades, in conjunction with SHAP. The analysis identified participation in optional contact sessions, grades on written assignments, and student interactions as influential on UED. | SHAP
[110] | SHAP was used in an online education environment at a vocational training institute in Algeria to assess the contribution of active minutes per week, days online, and demographic data to the prediction, aiming to build a more accurate and understandable model. | SHAP
[111] | The study applied SHAP and LIME to an RF model at Majmaah University. SHAP identified the number of hours logged in the last semester, first-year GPA, and length of study as the factors with the greatest influence on predicting graduation. LIME analyzed an individual case in which the main variables behind not graduating on time were a low first-year GPA, a long duration of studies, and few hours registered in the last semester. This local interpretation clarified the reasons behind the prediction, providing valuable information for personalized interventions. | SHAP, LIME
[89] | Explainability techniques were applied at the University of Trento (Italy) to analyze UED prediction models. Grouped Permutation Importance (GPI), used with RF, measures the loss in model performance when the values of each group of variables are randomly permuted; it ranked cumulative credits and weighted grade point average among the most relevant variables. An Attention Map (AM) applied to FTT visualized the areas of the input vector the model attended to most, again highlighting academic factors as the most influential. SHAP provided local explanations by computing each variable’s individual contribution to the UED prediction, showing that students with lower academic loads and low achievement were at greater risk. These complementary techniques strengthened model transparency and confirmed the relevance of academic performance in the early identification of university dropouts. | GPI, AM, SHAP
[65] | Researchers from Singidunum University and the Information Technology School in Serbia applied SHAP and Shapley additive global importance (SAGE) to XGBoost and AdaBoost. The most influential factors in predicting UED at both the local and global levels were failed curricular units, tuition payment status, and age at entry. | SHAP, SAGE
[153] | A neuro-fuzzy model (ANFIS) combining neural networks with fuzzy logic was applied in Mexican institutions to predict UED risk. Using linguistic rules and visual response surfaces, the authors identified how age, income, and internet access influenced risk. High-vulnerability profiles were identified in specific cases, supporting its use as a diagnostic tool in marginalized areas. | ANFIS
[68] | Permutation Importance (PEI) was applied at the Finnish University of Applied Sciences to explain UED predictions from three models: CatBoost, NN, and LR. PEI measured the drop in model performance (F1-score) when the values of each feature were randomly altered, concluding that the three most important variables were cumulative credits, number of failed subjects, and Moodle activity count. | PEI
[69] | In a study at the University of Delaware, an explainable model called PEM-SNN was developed to predict UED. The neural network was structured so that academic, economic, and sociodemographic factors were grouped into independent hidden layers, producing three representative neurons that combine linearly to estimate dropout risk. Academic integration was the most influential factor, clearly differentiating dropouts from retained students, while economic and social factors had a minor impact. This structured design made the model understandable, identifying not only who is at risk but also why. | PEM-SNN
[141] | The study, applied at the Polytechnic University of Bucharest (Romania), used SHAP over Automatic Relevance Determination (ARD) to explain the prediction of academic performance. From the fourth week of the semester, the most influential variables were previous academic performance, personality traits such as openness and conscientiousness, and use of the LMS outside class hours, facilitating early detection of students at risk of UED. | SHAP
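The permutation-importance procedure described for [68] — measuring the drop in a score when each feature’s values are shuffled — can be sketched with scikit-learn’s `permutation_importance` and an F1 scorer. The feature names below are illustrative stand-ins, not those of the original study, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical feature names echoing the factors reported in [68]
feature_names = ["cumulative_credits", "failed_subjects",
                 "moodle_activity", "age"]
X, y = make_classification(n_samples=800, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
# For each feature, shuffle its column and record the mean F1 drop
result = permutation_importance(model, X_te, y_te, scoring="f1",
                                n_repeats=20, random_state=0)
for name, imp in zip(feature_names, result.importances_mean):
    print(f"{name}: {imp:.3f}")
```

Because the score drop is computed on held-out data, this kind of explanation is model-agnostic: the same loop works unchanged for the CatBoost, NN, and LR models compared in [68].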
Table 13. Most commonly used factors in XAI models.

Factor | Number of XAI techniques using the factor (of SHAP, LIME, GPI, AM, SAGE, PEI, ANFIS, PEM-SNN)
Average | 5
Accumulated credits | 6
Age | 5
Employment status | 4
Attendance/presence | 1
Scholarships | 2
Gender | 2
Moodle activity | 2
Personality | 1
Study hours | 2
Assignment/exam mark | 4
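SHAP, the most widely used technique across the reviewed explainability studies, is grounded in Shapley values from cooperative game theory. The sketch below computes them exactly by enumerating feature coalitions for a toy additive dropout-risk "model"; the factor names and contribution weights are invented for illustration only.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, features):
    """Exact Shapley value of each feature under value function `value`."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                # Shapley weight |S|!(n-|S|-1)!/n! times marginal contribution
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(S) | {i}) - value(set(S)))
        phi[i] = total
    return phi

# Toy additive risk model: each present factor adds a fixed amount of risk
contrib = {"low_gpa": 0.30, "few_credits": 0.20, "no_scholarship": 0.05}
value = lambda S: sum(contrib[f] for f in S)
phi = shapley_values(value, list(contrib))
print(phi)  # for an additive model, Shapley values equal the raw contributions
```

Real SHAP implementations approximate this enumeration efficiently (e.g. for tree ensembles), since the exact computation above is exponential in the number of features.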

Share and Cite

Quimiz-Moreira, M.; Delgadillo, R.; Parraga-Alava, J.; Maculan, N.; Mauricio, D. Factors, Prediction, Explainability, and Simulating University Dropout Through Machine Learning: A Systematic Review, 2012–2024. Computation 2025, 13, 198. https://doi.org/10.3390/computation13080198