Article

A Multi-Objective and Uncertainty-Aware Holistic Swarm Optimized Random Forest for Robust Student Performance and Dropout Prediction

1 Department of English Language and Literature, College of Science and Humanities in Al-Kharj, Prince Sattam Bin Abdulaziz University, Kharj 16278, Saudi Arabia
2 Department of Computer Engineering and Information, College of Engineering in Wadi Alddawasir, Prince Sattam Bin Abdulaziz University, Kharj 16278, Saudi Arabia
3 Department of Machine Learning and Information Retrieval, Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh 33516, Egypt
* Author to whom correspondence should be addressed.
Inventions 2026, 11(2), 20; https://doi.org/10.3390/inventions11020020
Submission received: 10 January 2026 / Revised: 12 February 2026 / Accepted: 19 February 2026 / Published: 24 February 2026

Abstract

Predicting student academic performance and dropout remains a major challenge for higher education institutions because of substantial class imbalance and the intricate interactions between academic, behavioral, and socioeconomic characteristics. To improve the reliability and credibility of multiclass student outcome prediction, this study proposes a robust, multi-objective, and uncertainty-aware predictive framework that combines the Random Forest (RF) classifier with Holistic Swarm Optimization (HSO). The proposed method formulates a multi-objective optimization problem that simultaneously maximizes the macro F1-score, controls model complexity, and reduces inter-class performance disparity. In contrast to traditional optimization strategies that concentrate solely on predictive accuracy, the model thereby promotes fairness across student outcome categories. Furthermore, by utilizing ensemble-based probability dispersion, the framework integrates uncertainty-aware prediction, making it possible to identify high-risk students at different confidence levels and to support practical academic interventions. Experimental results show that the proposed HSO-RF framework substantially reduces the performance gap between outcome classes while achieving the best overall predictive performance, reaching an accuracy of 77.74%, a macro F1-score of 0.69, and a weighted F1-score of 0.76. Beyond these computational benefits, the analysis shows that academic, socioeconomic, and administrative characteristics serve as significant markers of student motivation, stability, and vulnerability. The proposed framework advances responsible and trustworthy educational data mining and offers a dependable decision-support tool for early warning systems.

1. Introduction

Predicting student performance and dropout is a multi-layered, complex challenge facing educational institutions and academic systems. It is not a simple computational process but a deep one that involves integrating academic, educational, socio-economic, demographic, and linguistic factors that reflect how students behave in their educational environment. Early predictions based on predictive analytics and statistics enable academic guidance departments in universities to help students along their educational pathways by scheduling balanced timetables, selecting suitable courses, reducing confusion, and offering academic support. Furthermore, reducing student dropout benefits not only students but also universities and academic institutions, as it saves their resources. Student dropout extends program duration, leading to financial losses, administrative burdens, and wasted teaching and advising hours. By analyzing large and diverse datasets, modern algorithms enable educational systems to predict dropout before it takes place, thus investigating the problem, diagnosing risks, and providing support solutions. In addition, these large volumes of data encompass behavioral, academic, social, and economic features that cannot be analyzed manually, so Machine Learning (ML) [1] models can detect early warnings and inform planned solutions or changes in administrative procedures, academic and psychological guidance manuals, or even course plans. The current study proposes a hybrid model that couples the Holistic Swarm Optimization (HSO) [2] algorithm with the Random Forest (RF) [3] classifier.
The hybrid model suggests a multi-objective optimization framework that simultaneously maximizes the macro F1-score [4], regulates model complexity [5], and lessens inter-class performance [6] disparity in order to overcome the drawbacks of conventional optimization strategies that prioritize predictive accuracy. The suggested method guarantees more balanced and equitable predictive performance by specifically taking fairness across various student outcome categories into account. Additionally, the framework uses ensemble-based probability dispersion to incorporate uncertainty-aware prediction [7,8], making it possible to identify high-risk students with different degrees of confidence. This capacity facilitates more knowledgeable and successful academic interventions, especially in early warning systems for dropout risk and student performance. The integration of the advanced, more accurate algorithm HSO, along with the reliable RF, enhances the prediction process, tunes hyperparameters, and thus increases accuracy.
Student dropout from university programs may be caused by several factors, and it could be dramatically reduced through monitoring and student guidance. Universities place special importance on the student guidance process, as it is supposed to detect students at risk of dropping out, or of completing their programs poorly and becoming uncompetitive with their colleagues in the workplace. The main problem is the lack of data reported on those students: university guidance offices often learn about these cases very late, after students have already failed and university resources have been wasted. Moreover, available student dropout datasets suffer from an imbalance stemming from the low number of dropouts compared to successful students. The Predict Students’ Dropout and Academic Success dataset, a standard dataset from the UCI repository, is widely utilized in the literature. This dataset is imbalanced, which biases prediction models in favor of successful students and causes them to miss students at risk of dropout (minority instances) [9]. The following literature review surveys dropout prediction research to identify the latest techniques and remaining gaps.
Prediction models based on machine learning algorithms interest researchers because they can be trained efficiently on existing data and then automatically classify new instances. For example, Lykourentzou et al. [10] exploited Neural Network (NN) capabilities in a hybrid with Support Vector Machines (SVM) and ensemble-based fuzzy ARTMAP to detect student dropout from e-learning courses; in comparisons, their model outperformed traditional classification algorithms. Yukselturk et al. [11] used strong predictive machine learning algorithms such as Decision Trees (DT), Random Forests (RF), and SVM to classify dropouts in online program data, demonstrating these algorithms' ability to predict student dropout from the factors represented in the dataset features. In traditional university programs, RF was investigated by Dekker et al. [12] and Zhao et al. [13] for detecting students at risk of dropping out; their models achieved accuracies between 75% and 80% on institutionally prepared datasets. Niyogisubizo et al. [14] presented a stacking ensemble combining RF, XGBoost, Gradient Boosting (GB), and NN to predict student dropout using data from 2016 to 2020 at Constantine the Philosopher University in Nitra; their model achieved higher accuracy and AUC, helping investigators identify students at risk of university dropout. Martins et al. [15] evaluated ML and boosting algorithms; their experimental results showed that boosting algorithms achieved better classification performance on a dataset from a higher education institution. Upon the release of the UCI dropout dataset, Villar et al. [16] evaluated RF, SVM, NN, and ensemble models; their results indicated that tree-based methods performed better on imbalanced data, and they reported that socioeconomic and parental features dramatically affect student success or failure during university progress.
Kok et al. [17] and Rebelo et al. [18] analyzed Learning Management System (LMS) Moodle activity logs using gradient boosting, attention-based Recurrent Neural Networks (RNN), and time-dependent features. They reported that behavioral features such as login frequency, submissions, and forum activity are strongly related to students' success during programs and hence provide early signs of dropout risk. Tamada et al. [19] presented a systematic review of ML algorithms that support advisors of virtual learning students and provide early warnings about students at risk, improving the student retention process. Gardner et al. [20] analyzed cross-institutional prediction models; their experimental results showed that transfer models generalize well, matching the performance of locally trained models. Vaarma et al. [21] demonstrated that prediction models trained on LMS data generalize better than those dependent on demographic and pre-admission features, concluding that behavioral features are more transferable than social and economic ones.
In the context of hybrid models, Xiong et al. [22] developed an educational prediction model using CNNs and RNNs; their hybrid model outperformed traditional ML methods with higher precision in dropout prediction, but at the cost of interpretability. Although RF is a strong and robust machine learning algorithm thanks to its interpretability and overfitting resistance, metaheuristic optimization algorithms can be integrated with it to fine-tune hyperparameters. In the RF optimization field, metaheuristic algorithms such as the Gray Wolf Optimizer (GWO) [23], Particle Swarm Optimization (PSO) [24], and the Artificial Fish Swarm Algorithm (AFSA) [25] have been introduced in several domains. For example, Radhi et al. [26] optimized RF to improve its performance in medical diagnostic systems; during the COVID-19 pandemic, their predictive model generalized the idea of tuning RF hyperparameters by metaheuristic optimization for classifying imbalanced data. In a similar context, Khalidou et al. [27] optimized RF with PSO in a heart-disease prediction model, encouraging researchers to combine swarm intelligence with RF for hyperparameter tuning and to exploit these capabilities in the early prediction of student dropout. Khaseeb et al. [28] improved prediction performance by applying GWO to feature selection on high-dimensional datasets. In the security field, the MOO-PSO algorithm [29] has been used to optimize RF and XGBoost classifiers, improving accuracy, convergence speed, and model complexity. In bioinformatics, AFSA [30] was introduced to optimize RF parameters over a reduced set of relevant features, and comparisons showed that the resulting hybrid outperforms traditional models. PSO and Genetic Algorithms (GA) [31] were also introduced by Shafiey et al. for hyperparameter tuning; they concluded that optimization improved performance over state-of-the-art prediction methods.
Since traditional machine learning models usually provide point estimates without expressing the confidence of their predictions, uncertainty-aware predictions provide a suitable substitute, especially in educational data mining and learning analytics. In high-stakes situations like early identification of at-risk students, where poor choices can have a detrimental impact on academic progress, such deterministic outputs may be deceptive. Recent research has used uncertainty estimation methods to measure prediction confidence to overcome this constraint.
In order to enable more dependable decision-making in ambiguous classification scenarios, Kornaev et al. [7] proposed a multivariate, multi-view classification framework that explicitly models prediction uncertainty by combining information from multiple data perspectives. By examining consistency across perspectives instead of depending solely on single-model confidence scores, their method enhances awareness of uncertainty. For COVID-19 X-ray image classification, Gour and Jain [8] presented an uncertainty-aware convolutional neural network that incorporates uncertainty estimation to differentiate between high-confidence and low-confidence predictions. By lowering the possibility of overconfident misclassifications, their approach improved diagnostic reliability and demonstrated the significance of uncertainty-aware models in crucial decision-support systems.
HSO is a recently introduced, metaphor-less optimization algorithm proposed by Akbari et al. [2]. Unlike traditional swarm methods such as PSO, AFSA, and GWO, which simulate metaphor-driven behaviors of animals such as birds, HSO guides search movement by the entire population's fitness distribution and thereby obtains a global view of the search space. Accordingly, this research adopts HSO to improve RF by tuning its hyperparameters. The resulting prediction model forecasts student dropout from university programs within a multi-objective and uncertainty-aware hybrid optimization framework. As HSO is a novel algorithm not previously applied to educational prediction models, this research exploits its exploration and exploitation capabilities to tune RF hyperparameters for predicting students at risk of dropout, saving their time and university resources alike.
The following is a summary of this study’s primary contributions:
  • By combining HSO with the RF classifier, a strong multi-objective optimization framework is suggested. To tackle the problems of unbalanced multiclass educational datasets, predictive performance, model complexity, and inter-class fairness are jointly optimized.
  • By utilizing ensemble-based probability dispersion, an uncertainty-aware prediction mechanism is included in the HSO-RF framework. This allows for the identification of high-risk students with varied degrees of confidence and supports more dependable and practical academic intervention tactics.
  • The suggested framework consistently outperforms classical machine learning and ensemble baselines, achieving improved macro F1-score, competitive overall accuracy, and more balanced performance across minority and majority outcome classes, according to a thorough evaluation on a real-world educational dataset.
  • The efficacy of HSO in navigating intricate, plateau-rich hyperparameter search spaces is confirmed by convergence analysis and consistent feature-importance behavior, which support the robustness and stability of the optimized model.
Despite the efficacy of current machine learning models in student dropout prediction, most methods optimize predictive accuracy as a single objective and produce deterministic forecasts without taking model confidence into consideration. These restrictions hinder practical decision-making and lower reliability on highly imbalanced educational datasets. This paper proposes a multi-objective, uncertainty-aware hybrid optimization approach that combines Random Forest classification and Holistic Swarm Optimization to fill these gaps. The proposed method quantifies prediction uncertainty to support reliable academic interventions while concurrently balancing predictive performance, model complexity, and inter-class fairness. The remainder of this paper is structured as follows: Section 2 presents materials and methods. Section 3 encompasses experiments and results. Section 4 elaborates on the discussion, and Section 5 concludes the main findings of the paper.

2. Materials and Methods

Because it enables timely assistance and better educational decision-making, predicting student dropout and academic performance is a critical challenge in higher education analytics. To improve prediction accuracy on the Predict Students’ Dropout and Academic Success dataset [9], this research offers a multi-objective and uncertainty-aware hybrid optimization framework that combines the Random Forest (RF) classifier [3,32] with Holistic Swarm Optimization (HSO) [2]. In contrast to traditional methods that optimize predictive accuracy in isolation, the proposed framework develops a multi-objective optimization strategy in which classification performance, model complexity, and inter-class fairness are jointly optimized, improving reliability under extreme class imbalance. Additionally, an uncertainty-aware prediction mechanism based on ensemble probability dispersion is included, allowing high-risk individuals to be identified at different confidence levels and facilitating more practical academic interventions. In terms of macro F1-score, overall accuracy, and balanced class-wise performance, experimental results show that the HSO-optimized RF consistently outperforms conventional baseline configurations. As shown in Figure 1, this study contributes to educational data mining (EDM) by offering a solid, comprehensible, and trustworthy data-driven framework for early identification of at-risk students and informed academic decision-making. The HSO algorithm is described in the following subsection.
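As a concrete illustration of the ensemble-based probability dispersion idea, the sketch below estimates prediction confidence from the spread of per-tree class probabilities. It is a minimal example on synthetic data; the dispersion statistic (class-wise standard deviation across trees) and the 90th-percentile flagging threshold are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic three-class stand-in for the student-outcome data
X, y = make_classification(n_samples=300, n_classes=3, n_informative=6,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Stack each tree's class-probability estimates: (n_trees, n_samples, n_classes)
per_tree = np.stack([tree.predict_proba(X) for tree in rf.estimators_])

mean_proba = per_tree.mean(axis=0)             # ensemble probability estimate
dispersion = per_tree.std(axis=0).max(axis=1)  # max class-wise std per sample

# Flag the least confident predictions (highest dispersion) for review
uncertain = dispersion > np.quantile(dispersion, 0.9)
```

Samples where the trees disagree strongly (high dispersion) are exactly those for which a deterministic label would be misleading, which is the motivation for routing them to human advisors.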

2.1. Holistic Swarm Optimization

The foundation of contemporary metaheuristic research is biologically inspired optimization techniques. Metaheuristic algorithms originate from Holland’s Genetic Algorithm [33], which introduced evolutionary concepts like selection and mutation to solve challenging optimization problems. Since then, a number of swarm-based and biologically inspired algorithms have surfaced, such as Ant Colony Optimization [34] and Particle Swarm Optimization [24], and they have proven to perform well in a variety of application domains. Recent research demonstrates their effective application in difficult fields like high-dimensional feature selection, biomedical data analysis [35], and other engineering problems [36,37,38], frequently in conjunction with machine learning models. These studies highlight how flexible and resilient swarm intelligence is when dealing with large-scale, nonlinear problems. Building upon this framework, the suggested Holistic Swarm Optimization improves search efficiency by utilizing global population dynamics, which diverges from single-metaphor modeling. Notably, this work extends biologically inspired techniques in student performance prediction tasks with intelligent learning systems by introducing a novel application of swarm-based optimization in educational data analysis.
In this study, “holistic” denotes an optimization strategy in which the swarm evolves mainly through population-level learning rather than individual memory. In traditional Particle Swarm Optimization (PSO), each particle’s update is determined by its personal best position and the global best position. In contrast, HSO emphasizes holistic guidance, placing more weight on the best global solution and on stochastic exploration components. This fosters cooperative convergence toward solutions and reduces sensitivity to local personal attractors. This population-level mechanism enables stable convergence and efficient exploration of the hyperparameter search space within the proposed HSO-RF framework. HSO is a new metaphor-less optimization technique that uses population-wide information to improve the search process. Unlike traditional techniques that depend on partial or local information, HSO takes a holistic strategy, ensuring that every decision is influenced by the population’s overall distribution and fitness landscape. Through an adaptive framework that combines adaptive mutation, simulated annealing-based selection, and root-mean-square (RMS) fitness-based displacement coefficients, the technique dynamically balances exploration and exploitation. This structure allows HSO to handle challenging, multimodal optimization problems while avoiding local optima [2]. The HSO algorithm provides the following significant innovations and contributions:
  • Extensive Population Utilization: To guide each search agent’s update mechanism, the HSO algorithm uses data from the whole population. This holistic approach guarantees that the algorithm’s choices are based on a full comprehension of the population’s general distribution and fitness landscape.
  • Effective Displacement Coefficients: Using the RMS fitness values, the algorithm presents a new technique for computing displacement coefficients. By ensuring that agents migrate away from poorer solutions and toward better ones, this technique improves the convergence qualities of the algorithm.
  • Adaptive Mutation: To improve exploration abilities, HSO uses adaptive mutation. This property helps the technique explore new regions of the search space and steer clear of local optima by introducing random perturbations with a given probability.
  • Adaptive Simulated Annealing-dependent Selection: To determine whether to allow new places depending on their fitness, the technique uses simulated annealing-based selection. By admitting inferior solutions with a decreasing likelihood over time, this method aids in striking a balance between exploration and exploitation. The likelihood of selection declines with the number of iterations and is inversely correlated with the degree of the adverse change in the most recent iteration.
  • Robustness Across Domains: The HSO algorithm’s outstanding performance on a range of benchmark problems attests to its effectiveness and dependability. Because of its strong structure, it can be used to solve challenging optimization issues in a variety of domains.

2.1.1. Population Initialization

The first step of the algorithm initializes a population of search agents, represented as $x_i$ for $i = 1, 2, \ldots, n$, where $n$ is the number of agents. Each agent's position is generated at random within the specified search space. The optimization problem is $m$-dimensional, and each search agent $x_i$ represents a candidate solution.

2.1.2. Assessment of Fitness

Each search agent’s fitness is assessed using the objective function f ( x i ) .

2.1.3. Calculating Coefficients

The root-mean-square of all fitness values is computed as

$\mathrm{RMS}_f = \sqrt{\frac{1}{n}\sum_{i=1}^{n} f(x_i)^2}$ (1)

where $f(x_i)$ is the fitness value of agent $i$. The displacement-vector parameters are then derived from the difference between each fitness value and the RMS. For agent $i$, the difference is

$d_i = \mathrm{RMS}_f - f(x_i)$ (2)

Normalized so that their absolute values sum to 1, the coefficients $c_i$ take positive values for fitness values below the RMS and negative values for those above:

$c_i = \mathrm{sign}(d_i)\,\dfrac{|d_i|}{\sum_{j=1}^{n} |d_j|}$ (3)

where $\mathrm{sign}(d_i)$ determines the direction of each agent's movement.

2.1.4. Position Change

The search agents' positions are updated repeatedly. Each agent's new position $x_i'$ is determined by the weighted sum of the differences between its current position and the positions of all other agents $x_j$. The update rule is

$x_i' = x_i + \alpha \sum_{j=1}^{n} r_{ij}\, c_j\, (x_j - x_i)$ (4)

where $\alpha$ is a constant parameter and $r_{ij}$ is a random variable that guarantees diversity and thorough exploration of the search space.
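A compact sketch of the displacement step in Equations (1)–(4), assuming a minimization problem; the guard against identical fitness values is an implementation detail added here, not part of the published formulation:

```python
import numpy as np

def hso_displacement_step(positions, fitness, alpha=0.1, rng=None):
    """One HSO displacement step (Eqs. (1)-(4)): agents move toward
    better-than-RMS solutions and away from worse ones."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(positions)
    rms = np.sqrt(np.mean(fitness ** 2))        # Eq. (1)
    d = rms - fitness                           # Eq. (2)
    denom = np.abs(d).sum() or 1.0              # guard: all fitness equal
    c = np.sign(d) * np.abs(d) / denom          # Eq. (3), |c| sums to 1
    r = rng.random((n, n))                      # random factors r_ij
    w = r * c[None, :]                          # weights r_ij * c_j
    # Eq. (4): x_i' = x_i + alpha * sum_j r_ij c_j (x_j - x_i), vectorized
    new_positions = positions + alpha * (
        w @ positions - w.sum(axis=1, keepdims=True) * positions)
    return new_positions, c
```

Agents with below-RMS (better) fitness receive positive coefficients and attract the swarm, while above-RMS agents receive negative coefficients and repel it.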

2.1.5. Selection Based on Adaptive Simulated Annealing (SA)

Each iteration of the simulated annealing-based selection step incorporates a dynamic temperature update. The temperature $T$ is changed as

$T = T_{\text{initial}} \times CR^{\,\text{iter}}$ (5)

where $\text{iter}$ is the current iteration, $CR$ (Cooling Rate) is a factor that regulates the rate of temperature decline, and $T_{\text{initial}}$ is the starting temperature. If the new fitness value $f_{\text{new}}(x_i)$ is superior to the old fitness value $f(x_i)$, the algorithm acts greedily and replaces the previously selected solution with the new one. If the new fitness value is worse, the following selection equations are used instead:

$\Delta f = f_{\text{new}}(x_i) - f(x_i)$ (6)

The probability that the newly created position is accepted is

$P = \exp\!\left(-\dfrac{\Delta f}{T}\right)$ (7)

Despite the decline in fitness, the new position is accepted if $P$ exceeds a random number drawn from a uniform distribution $\mathrm{Unif}_{\text{rand}}$:

$\exp\!\left(-\dfrac{\Delta f}{T}\right) > \mathrm{Unif}_{\text{rand}}$ (8)
In conclusion, the SA-based selection operator states that the likelihood of choosing a particular solution that ends up in a less favorable position than it was in the previous iteration is inversely correlated to the degree of the unfavorable change in its fitness value in the most recent iteration, and this likelihood decreases as the iteration goes on. Over time, this configuration strikes a balance between exploration and exploitation.
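The SA-based selection rule of Equations (5)–(8) can be sketched as follows (minimization assumed; the default initial temperature and cooling rate are illustrative, not the paper's settings):

```python
import math
import random

def sa_accept(f_old, f_new, iteration, t_initial=1.0, cooling_rate=0.95,
              rand=random.random):
    """SA-based selection (Eqs. (5)-(8)): improvements are always kept;
    deteriorations are accepted with probability exp(-delta_f / T)."""
    if f_new <= f_old:                                   # greedy acceptance of better
        return True
    temperature = t_initial * cooling_rate ** iteration  # Eq. (5)
    delta_f = f_new - f_old                              # Eq. (6)
    p = math.exp(-delta_f / temperature)                 # Eq. (7)
    return p > rand()                                    # Eq. (8)
```

Because the temperature decays geometrically, large deteriorations late in the run are accepted with vanishing probability, which realizes the exploration-to-exploitation shift described above.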

2.1.6. Adaptive Mutation

To further improve exploration, a mutation operator is defined adaptively: its mutation rate and step size are dynamically adjusted according to the current iteration. The mutation rate $\lambda_{\text{mut}}^{\text{iter}}$ and the mutation step size $\delta_{\text{mut}}^{\text{iter}}$ are determined as

$\lambda_{\text{mut}}^{\text{iter}} = \lambda_{\text{init}} - \text{iter} \times \dfrac{\lambda_{\text{init}} - \lambda_{\text{final}}}{\text{iter}_{\max}}$ (9)

$\delta_{\text{mut}}^{\text{iter}} = \delta_{\text{init}} - \text{iter} \times \dfrac{\delta_{\text{init}} - \delta_{\text{final}}}{\text{iter}_{\max}}$ (10)

where $\text{iter}$ is the current iteration, $\lambda_{\text{init}}$ and $\lambda_{\text{final}}$ are the initial and final mutation rates, $\delta_{\text{init}}$ and $\delta_{\text{final}}$ are the initial and final mutation step sizes, and $\text{iter}_{\max}$ is the maximum number of iterations. The mutation rate $\lambda_{\text{mut}}^{\text{iter}}$ gives the probability that a mutation occurs at iteration $\text{iter}$; it falls linearly from $\lambda_{\text{init}}$ to $\lambda_{\text{final}}$ as the iterations progress. Similarly, the mutation step size $\delta_{\text{mut}}^{\text{iter}}$ controls the magnitude of the perturbation applied to the agent's position and likewise decreases linearly from $\delta_{\text{init}}$ to $\delta_{\text{final}}$. With this arrangement, HSO explores the search space more thoroughly in the early iterations and focuses on exploiting current solutions as it approaches the maximum number of iterations. If a mutation occurs (with probability $\lambda_{\text{mut}}^{\text{iter}}$), a random perturbation $\Delta x_i$ is added to the agent's position:

$\Delta x_i = \delta_{\text{mut}}^{\text{iter}} \cdot N(0,1)$ (11)

where $N(0,1)$ is the standard normal distribution. The agent's new position following mutation is then

$x_i' = x_i + \Delta x_i$ (12)

Table 1 presents the setup of the HSO parameters, and Algorithm 1 and Figure 2 present the HSO steps.
Algorithm 1: The HSO algorithm.
   1- Initialize the parameters ($\text{iter}_{\max}$, $n$, $m$): the maximum number of iterations, the number of agents, and the number of variables, respectively.
   2- Create the starting positions $x_i$ at random for each agent.
   3- For $\text{iter} = 1$ to $\text{iter}_{\max}$ do
   4-   Assess each agent’s fitness $f(x_i)$.
   5-   Use Equation (1) to calculate the RMS.
   6-   Apply SA-based selection, Equations (5)–(8), to accept or reject new positions based on fitness.
   7-   Update the best solution discovered so far.
   8-   Record the optimal cost to track performance.
   9-   Use Equations (2) and (3) to determine the displacement coefficients $c_i$.
   10-  Use Equation (4) to update the agents’ positions.
   11-  Apply adaptive mutation to $x_i$ using Equations (9)–(12).
   12- End for
   13- End
   14- Return the optimal solution with its fitness.
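Step 11 of Algorithm 1, the adaptive mutation of Equations (9)–(12), can be sketched as follows; the decay endpoints used as defaults here are illustrative, not the paper's settings:

```python
import numpy as np

def adaptive_mutation(x, iteration, iter_max, rng,
                      lam_init=0.9, lam_final=0.1,
                      delta_init=1.0, delta_final=0.01):
    """Adaptive mutation (Eqs. (9)-(12)): the mutation rate and step size
    both decay linearly from their initial to their final values."""
    lam = lam_init - iteration * (lam_init - lam_final) / iter_max          # Eq. (9)
    delta = delta_init - iteration * (delta_init - delta_final) / iter_max  # Eq. (10)
    if rng.random() < lam:                            # mutate with probability lam
        x = x + delta * rng.standard_normal(x.shape)  # Eqs. (11)-(12)
    return x, lam, delta
```

Early in the run, mutation is frequent and large (exploration); near `iter_max`, both the rate and step size shrink toward their final values so the swarm can exploit the best regions found.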

2.2. Random Forest (RF)

One supervised ML technique that is well-known for its adaptability in handling both regression and classification issues is the RF algorithm [32]. Its fundamental idea as an ensemble learning method is to build many Decision Trees (DTs) during training and output the mean prediction (for regression) or the mode of the classes (for classification) of the individual trees. By correcting for the tendency of individual DTs to overfit their training set, this “wisdom of the crowd” method produces a model that performs better when applied to new data. The technique creates a forest of unrelated trees by combining two potent concepts: feature randomization and bagging (Bootstrap Aggregating). The strength of the RF algorithm originates from its strategies for generating and mixing trees. The technique can be divided into several important stages:
  • Bootstrap Sampling (Bagging): A randomly chosen portion of the initial training data is selected with replacement for every tree in the forest. Each tree has been trained on a slightly distinct dataset because of this method, which is called bootstrap sampling.
  • Attribute Randomness: Throughout DT construction, the technique does not consider every attribute when splitting a node. Rather, a random subset of attributes is chosen as candidates for the optimal split. This feature randomness decorrelates the trees.
  • Tree Building: Each tree is grown on its bootstrap sample, using only the random feature subset at each node, until it reaches its maximum depth or another stopping criterion.
  • Prediction Aggregation: For classification, a majority vote across all trees determines the final prediction; for regression, the final prediction is the average of the individual trees' predictions.
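The stages above map directly onto scikit-learn's RandomForestClassifier; the sketch below uses synthetic data and illustrative settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic three-class stand-in for the student-outcome data
X, y = make_classification(n_samples=500, n_classes=3, n_informative=8,
                           random_state=42)

# bootstrap=True gives each tree its own resampled dataset (bagging);
# max_features="sqrt" applies attribute randomness at every split.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            bootstrap=True, random_state=42)
rf.fit(X, y)

# predict()/score() aggregate a majority vote across the 200 decorrelated trees
accuracy = rf.score(X, y)
```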

2.3. Configuring HSO with RF (HSO-RF)

2.3.1. Metaheuristic Motivation

The RF classifier is known for its strong results and comparatively low sensitivity to slight changes in hyperparameters. However, accurately adjusting crucial parameters such as the tree depth (max_depth), the number of trees (n_estimators), and the minimum number of samples required for a split (min_samples_split) is essential for its efficacy on complicated, imbalanced multi-class problems such as forecasting student dropout and success. For huge search spaces, randomized search and standard grid search are computationally prohibitive. We therefore used the HSO technique to navigate this space efficiently and find an optimal hyperparameter configuration. HSO is an effective metaheuristic that combines concepts from swarm intelligence with SA for local search and a strong adaptive mutation strategy to avoid premature convergence. This hybrid technique guarantees both efficient global exploration and precise local exploitation of the cost landscape.

2.3.2. HSO Configuration

To maximize the predictive accuracy of the model, the HSO technique was set up to minimize a fitness function that represented the classification error. The objective function has been described as the Negative macro F1-score attained by the RF model due to the underlying imbalance in the student outcome dataset, where “Graduate” is the overwhelming majority and “Dropout” is a crucial minority. This method guarantees that the optimizer gives equal weight to predictive capability for each of the three classes (Graduate, Enrolled, and Dropout).
Fitness(x) = − Macro F1-Score(RF_x)
where x denotes the vector of hyperparameters being tuned.
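A minimal sketch of this fitness function, assuming a scikit-learn RF and an internal validation split (the dataset here is synthetic and purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the student dataset.
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)

def fitness(x):
    """Negative macro F1 of an RF built from hyperparameter vector x;
    x = (n_estimators, max_depth, min_samples_split). HSO minimizes this."""
    rf = RandomForestClassifier(n_estimators=int(x[0]), max_depth=int(x[1]),
                                min_samples_split=int(x[2]), random_state=0)
    rf.fit(X_tr, y_tr)
    return -f1_score(y_va, rf.predict(X_va), average="macro")

cost = fitness((100, 10, 2))  # more negative = better macro F1
```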

2.3.3. Robustness-Aware Multi-Objective Optimization

The suggested approach frames RF optimization as a multi-objective problem, in contrast to traditional hyperparameter tuning techniques that maximize a single performance indicator. The optimization simultaneously maximizes predictive performance, controls model complexity, and minimizes inter-class performance disparity. This formulation is especially appropriate for unbalanced multiclass educational datasets, where gains in overall accuracy can mask subpar performance on minority classes. Equation (14) defines the multi-objective fitness function.
min_θ L(θ) = α (1 − F1_macro(θ)) + β Ω(θ) + γ Δ_fair(θ)
where
θ = {n_estimators, max_depth, min_samples_split}
Ω(θ) = n_estimators × max_depth   (model complexity)
Δ_fair = |F1_Graduate − F1_Dropout|
α + β + γ = 1
The weighting coefficients satisfy the constraint in Equation (18), α + β + γ = 1. In all our experiments, we set (α, β, γ) = (0.6, 0.2, 0.2). This choice makes the macro F1-score (predictive effectiveness) the main goal while retaining the complexity penalty and the inter-class disparity penalty as meaningful secondary objectives. The chosen weights yielded a stable balance between class-wise classification performance and fairness across classes.
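Under these definitions, a candidate configuration's cost can be computed directly from its validation scores. The sketch below assumes the complexity proxy is normalized by its maximum over the search bounds; the bound 200 × 30 and the helper name are our illustrative choices, not taken from the paper:

```python
def multi_objective_cost(f1_macro, f1_graduate, f1_dropout,
                         n_estimators, max_depth,
                         alpha=0.6, beta=0.2, gamma=0.2,
                         omega_max=200 * 30):
    """Weighted-sum cost of Equation (14): predictive term, normalized
    complexity proxy Ω(θ), and inter-class fairness gap Δ_fair."""
    omega = (n_estimators * max_depth) / omega_max  # normalized Ω(θ)
    delta_fair = abs(f1_graduate - f1_dropout)      # Δ_fair
    return alpha * (1 - f1_macro) + beta * omega + gamma * delta_fair

# Better macro F1 lowers the cost; a wider Graduate/Dropout gap raises it.
cost = multi_objective_cost(0.69, 0.80, 0.62, 189, 18)
```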

2.3.4. Hyperparameter RF

The HSO technique operates in a three-dimensional continuous space over the hyperparameter vector x = [x1, x2, x3]. The continuous values discovered by HSO are mapped to the appropriate discrete or bounded ranges of the RF, as specified in Table 2. Table 3 and Figure 3 present the proposed HSO-RF model.
This study focuses on optimizing n_estimators, max_depth, and min_samples_split, even though the RF classifier has more hyperparameters (such as max_features, min_samples_leaf, bootstrap, and class_weight). These three parameters are the most influential factors governing the model’s capacity and generalization in RF, and they directly control the complexity term in the multi-objective function. Limiting the search space reduces computational cost and lowers the risk of overfitting to the internal validation subset. During optimization, min_samples_split is represented as a fraction of the training-set size so that HSO searches a continuous space; because the Random Forest implementation requires an integer count, the fraction is multiplied by the training-sample size before final model training.
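The fractional encoding can be decoded as follows. This helper is a sketch (the name decode is ours), but applying the paper's reported fraction of 0.0111 to the 3539-sample training set recovers the reported 39-sample split threshold:

```python
def decode(x, n_train):
    """Map HSO's continuous 3-vector to valid RF hyperparameters.
    x[2] is a fraction of the training-set size; the RF API requires an
    integer count, so it is scaled down to samples (floor, minimum 2)."""
    n_estimators = int(round(x[0]))
    max_depth = int(round(x[1]))
    min_samples_split = max(2, int(x[2] * n_train))
    return n_estimators, max_depth, min_samples_split

# Paper's optimum: 0.0111 × 3539 ≈ 39 samples.
params = decode([189.3, 18.2, 0.0111], 3539)  # (189, 18, 39)
```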

2.3.5. Uncertainty-Aware Assessment of Student Risk

The suggested approach includes an uncertainty-aware mechanism to measure confidence in individual student outcome forecasts in addition to optimum prediction performance. Because Random Forests are an ensemble, it is possible to distinguish between high-confidence and high-uncertainty risk instances by using the class probability distributions across trees to quantify prediction uncertainty. Equation (19) presents the uncertainty equation (entropy-based prediction uncertainty).
U(x) = − Σ_{c=1}^{C} P(c|x) log P(c|x)
where the RF-predicted probability for class c is P ( c x ) and higher entropy leads to higher uncertainty.
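Equation (19) is straightforward to compute from the forest's averaged class probabilities; a minimal sketch with NumPy (natural log, as in the equation):

```python
import numpy as np

def prediction_entropy(proba):
    """Entropy-based uncertainty of Equation (19) for each sample.
    proba: (n_samples, n_classes) array, e.g. from rf.predict_proba(X)."""
    p = np.clip(proba, 1e-12, 1.0)  # guard against log(0)
    return -np.sum(p * np.log(p), axis=1)

# A confident prediction has low entropy; a uniform one is maximal (ln 3).
u = prediction_entropy(np.array([[0.98, 0.01, 0.01],
                                 [1/3, 1/3, 1/3]]))
```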

3. Results

The suggested multi-objective fitness function (Equation (14)) is used in the optimization process to assess potential solutions, and entropy-based ensemble dispersion (Equation (19)) is used to quantify prediction uncertainty.

3.1. Students’ Dropout Dataset

A publicly accessible dataset curated for forecasting student academic performance serves as the empirical basis for this study [9,15]. This multidimensional dataset, intended to model the complex factors influencing academic success and failure, comprises 4424 individual student records with 36 attributes. The attributes fall into four main categories: (1) Administrative and Admission (e.g., course code, application mode, scholarship eligibility, tuition status); (2) Socioeconomic and Demographic (e.g., gender, age, nationality, parents’ qualifications); (3) Macroeconomic Indicators (e.g., GDP at the time of enrollment, inflation rate, unemployment rate); and (4) Academic Performance (e.g., first- and second-semester curricular units credited and approved, admission grade, mean grade), as shown in Table 4. Because the raw data mixes continuous numerical attributes (e.g., grades) with numerous integer-encoded categorical attributes (e.g., course codes, marital status), comprehensive preprocessing, including Label Encoding for the target attribute and Standard Scaling for numerical attributes, is needed to guarantee that all attributes contribute comparably to model training. The multi-class target attribute, representing the final educational state, takes three values: “Graduate,” “Enrolled” (currently active), and “Dropout.” A crucial examination of the distribution reveals a substantial class imbalance, which strongly influences the choice of assessment metrics. Figure 4 displays the observed class frequencies: Graduate (49.7%), Enrolled (18.2%), and Dropout (32.1%). Thus “Graduate” is the majority class and “Enrolled” is the most severely underrepresented minority class.
The hyperparameter optimization process must therefore use a balanced metric such as the macro F1-score, which penalizes the model for inadequate accuracy on the minority and most significant classes (Enrolled and Dropout), rather than optimizing only for overall accuracy skewed by the majority class. The experiments used this genuine educational dataset from the UCI Machine Learning Repository, comprising student academic and demographic characteristics along with a three-class target variable (Dropout, Enrolled, and Graduate). After preprocessing, the dataset contains 4424 samples and 36 features.
The dataset was divided into two parts, with 80% of the data used for training and 20% for testing. As shown in Figure 5, this division ensures that the final model evaluation is conducted on unseen data to assess the model’s generalization capacity. For HSO-based hyperparameter optimization, the training set was further split in a stratified 80/20 manner into an inner training portion and an internal validation subset, on which the multi-objective fitness was calculated. This two-stage split means that about 64% of the full dataset is used for inner training, 16% for validation, and 20% for testing, while preserving the original class distribution despite the imbalance. A fixed random seed was used for reproducibility. The final optimized Random Forest model was then trained on the full training set and tested once on the held-out test set, which was never seen during optimization. In the single-run evaluation (seed = 42), the proposed HSO-RF achieved an accuracy of 0.7774, a macro F1-score of 0.69, and a weighted F1-score of 0.76, indicating a strong classifier with balanced performance across classes.
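The two-stage stratified protocol can be sketched as follows (synthetic features stand in for the real dataset; only the sample counts match the paper's 4424-record dataset):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the UCI dataset (4424 × 36).
X, y = make_classification(n_samples=4424, n_features=36, n_informative=10,
                           n_classes=3, random_state=42)

# Outer stratified split: 80% train / 20% held-out test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Inner stratified split of the training set for HSO fitness evaluation:
# 80/20 again, i.e. ~64% / 16% of the full data.
X_inner, X_val, y_inner, y_val = train_test_split(
    X_train, y_train, test_size=0.20, stratify=y_train, random_state=42)
```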

3.2. Data Preparation

The preprocessing stage of the HSO-RF workflow starts as follows:
  • After initial preparation, the training dataset has 3539 instances and 36 attributes from the input.
  • In accordance with the Predict Students’ Dropout and Academic Success dataset, the desired attribute has three different classes, indicating that the classification problem is multiclass.
  • The data is effectively prepared for model training and optimization by applying common preparation techniques, such as attribute standardization, categorical attribute encoding, and column cleanup.
By ensuring that the model receives normalized and encoded inputs, this preprocessing step lowers bias resulting from scale disparities and facilitates effective convergence during optimization.
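A minimal sketch of these preparation steps with scikit-learn (the toy values are illustrative; LabelEncoder assigns integer codes in alphabetical order):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Illustrative numeric attributes (e.g., admission grade, mean grade).
grades = np.array([[12.5, 140.0],
                   [15.0, 160.0],
                   [10.0, 120.0]])
target = ["Graduate", "Dropout", "Enrolled"]

# Standard Scaling: each numeric column gets zero mean, unit variance.
X = StandardScaler().fit_transform(grades)

# Label Encoding for the target attribute (Dropout=0, Enrolled=1, Graduate=2).
y = LabelEncoder().fit_transform(target)
```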

3.3. Behavior of HSO

The convergence behavior of the suggested HSO method applied to the RF classifier’s multi-objective optimization is shown in Figure 6. The vertical axis shows the aggregated multi-objective cost, which jointly captures predictive performance (macro F1-score), model complexity, and inter-class performance disparity; the horizontal axis shows the number of iterations. Early iterations show a sharp decline in the objective value, suggesting that HSO effectively explores the search space and promptly finds promising regions. This rapid decline reflects the algorithm’s strong global exploration capability, which allows it to escape suboptimal configurations and greatly improve solution quality within the first few iterations. The curve then enters a stage of gradual refinement in which the swarm balances exploitation and exploration, yielding diminishing gains. The objective value stabilizes after ten to twelve iterations, signifying that the algorithm has reached a stable, near-optimal solution. The absence of oscillations or sudden spikes in the cost indicates robust and stable convergence. Notably, the early convergence suggests computational efficiency, since adequate solutions are found without many iterations. This is particularly important for educational data mining applications, where frequent model retraining may be necessary for continuous monitoring and early warning systems. Overall, the convergence pattern demonstrates that the proposed HSO framework successfully optimizes the multi-objective formulation, achieving a reasonable trade-off between model complexity, performance, and fairness. These findings validate HSO’s applicability as a trustworthy optimization technique for predicting student outcomes while accounting for fairness and uncertainty.
HSO later finds a better solution with an elevated macro F1-score of 0.6844 at iteration 139. The newly discovered hyperparameters are a maximum depth of 18, 189 estimators, and a min_samples_split fraction of 0.0111 of the training-set size, corresponding to 39 samples in the training subset. This enhancement shows that the HSO displacement mechanism and adaptive mutation strategy can escape a local optimum even after extended exploration, allowing the technique to locate a more promising region of the search space. The optimizer maintains this improved solution from iteration 139 through iteration 200, demonstrating that HSO converges and stabilizes around a globally competitive optimum.

3.4. HSO-RF Performance

After optimization, the RF algorithm is retrained with the optimal parameters identified by HSO. The final evaluation on the test data yields the best macro F1-score (0.6844) and accuracy (0.7774). These results highlight several points:
  • Macro-F1 Enhancement: Compared to the baseline solutions found in previous iterations, the final macro F1-score demonstrates a discernible improvement.
  • Performance on Unbalanced Data: The dataset has a known class imbalance, particularly among the “Graduate,” “Enrolled,” and “Dropout” categories. Because it balances performance across all classes, the macro F1-score is a suitable evaluation metric.
  • Reasonable Accuracy: The HSO-optimized RF model successfully captures the underlying patterns connected to student dropout and academic advancement, as evidenced by an overall accuracy of 77.74%.
To ensure a fair comparison, all baseline models (Logistic Regression, MLP, k-NN, Decision Tree, Naïve Bayes, Bagging, and the baseline Random Forest) were trained with fixed hyperparameter settings as configured in our experimental scripts; no grid search or random search was performed for them. The proposed HSO-RF framework was the only model subjected to systematic hyperparameter optimization: the HSO algorithm operated on a stratified validation subset drawn exclusively from the training data, minimizing the multi-objective cost that balances macro F1-score, model complexity, and inter-class fairness. After optimization, the final Random Forest was retrained on the full training set and evaluated once on the held-out test set. Avoiding exhaustive searches for the baselines provides a stable, reproducible reference point and isolates the effect of the proposed multi-objective HSO-based optimization framework. All models were trained and evaluated with the same preprocessing pipeline and the same stratified 80/20 train-test split under fixed random seeds.
To strengthen the benchmark further, we also compared the proposed HSO-RF model against strong gradient-boosted decision tree (GBDT) frameworks, namely XGBoost, LightGBM, and CatBoost, which are widely regarded as state-of-the-art for tabular classification. These GBDT baselines started from configurations commonly adopted for structured tabular tasks and were adjusted via random search on the training set over typical parameters (number of estimators, learning rate, maximum depth, subsampling ratio, and feature subsampling); their results appear in Table 6 and confirm that the proposed HSO-RF framework is competitive. Table 5 and Figure 7 evaluate eight distinct ML models, including the proposed HSO-RF and common benchmarks, using four performance metrics: overall accuracy and the F1-score for each of the three target classes (Enrolled, Graduate, and Dropout). The findings show that HSO-RF delivers the best overall performance, with the highest weighted average F1-score (76.00%) and the best overall accuracy (77.74%). HSO markedly enhances the base RF model, raising its weighted average F1-score from 75.0% to 76.0% and its overall accuracy from 77.00% to 77.74%. The confusion matrix is presented in Figure 8.
The single-run experiment produced 0.7774 accuracy, 0.69 macro F1, and 0.76 weighted F1. To further assess robustness, we replicated the entire experimental pipeline (stratified split → optimization on internal validation → final evaluation on held-out test data) across 10 stratified random seeds. The proposed method achieved an average accuracy of 0.7651 ± 0.0105 (95% CI: [0.7586, 0.7716]), a macro F1-score of 0.6711 ± 0.0156 (95% CI: [0.6615, 0.6808]), and a weighted F1-score of 0.7431 ± 0.0116 (95% CI: [0.7359, 0.7503]), confirming stable performance across runs, as shown in Table 7. The reported 95% confidence intervals were computed with a normal approximation (z = 1.96); because the repeated-run analysis is descriptive and the metrics are approximately symmetrically distributed across seeds, the normal-based interval was adopted as a stable measure of variability. Table 8 shows the results of the repeated runs.
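The normal-approximation intervals follow the standard mean ± z·SEM construction; a minimal sketch (the per-seed scores below are illustrative placeholders, not the paper's actual per-run values):

```python
import numpy as np

def normal_ci(scores, z=1.96):
    """Mean and normal-approximation 95% CI across repeated runs."""
    s = np.asarray(scores, dtype=float)
    mean = s.mean()
    sem = s.std(ddof=1) / np.sqrt(len(s))  # standard error of the mean
    return mean, (mean - z * sem, mean + z * sem)

# Ten illustrative per-seed accuracies.
mean, (lo, hi) = normal_ci([0.76, 0.77, 0.75, 0.78, 0.76,
                            0.77, 0.75, 0.76, 0.78, 0.77])
```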
In this study, the complexity regularizer Ω(θ), calculated as shown in Equation (16), is treated as a minimal proxy for model capacity and interpretability cost, since larger forests and deeper trees generally increase the number of decision rules. We recognize that effective complexity also depends on splitting constraints (e.g., min_samples_split), early stopping, and the characteristics of the dataset, so Ω(θ) serves as a coarse regularization term rather than a precise computational measure. To prevent it from dominating the weighted-sum objective, the complexity term is normalized before aggregation, as shown in Table 9. HSO identified the best RF configuration as n_estimators = 189, max_depth = 18, and min_samples_split = 39, with a complexity proxy of Ω = 3402, showing that the proposed method improves predictive accuracy while keeping model complexity under control.
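The reported proxy value is easy to verify from Equation (16); the helper name below is ours:

```python
def complexity_proxy(n_estimators, max_depth):
    """Ω(θ) of Equation (16): a coarse capacity proxy, not a runtime measure."""
    return n_estimators * max_depth

omega = complexity_proxy(189, 18)  # HSO's reported optimum: Ω = 3402
```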
The distribution of prediction uncertainty derived from the suggested multi-objective and uncertainty-aware HSO-optimized RF framework is shown in Figure 9, with uncertainty quantified by entropy-based ensemble dispersion. The distribution of uncertainty values is broad and right-skewed, indicating significant variation in prediction confidence among students. Lower entropy values correspond to highly confident predictions, mostly linked to well-separated examples in the feature space, whereas higher entropy levels indicate ambiguous cases where the ensemble disagrees among class probabilities. The apparent concentration of cases at higher uncertainty levels emphasizes the difficulty of multiclass student outcome prediction, especially under significant class imbalance and overlapping class features. Crucially, this distribution demonstrates that the suggested framework offers confidence-aware risk estimates rather than just point predictions, making it possible to distinguish between reliable and unreliable forecasts. Such behavior is crucial for early warning systems: high-confidence forecasts can be prioritized for rapid academic help, while high-uncertainty cases need more contextual analysis or delayed intervention. Overall, the uncertainty distribution shows how well the suggested entropy-based method captures epistemic ambiguity, supporting the framework’s contribution to reliable and useful educational data mining.
A comparative boxplot analysis of prediction uncertainty for correctly and incorrectly classified cases, generated by the suggested multi-objective and uncertainty-aware HSO-optimized Random Forest framework, is shown in Figure 10. Uncertainty is quantified by the entropy over class probability distributions defined in Equation (19). The two groups are clearly separated in the figure, with incorrect predictions showing significantly greater median and upper-quartile uncertainty values than correct predictions. This pattern implies that misclassified student outcomes are strongly associated with greater predictive ambiguity, reflecting disagreement among ensemble members. Conversely, correctly classified cases exhibit lower and more concentrated entropy values, indicating a higher level of model confidence and more stable decision boundaries. The presence of a small number of low-uncertainty outliers among the incorrect predictions further highlights the intrinsic difficulty of some borderline cases in imbalanced multiclass educational data. Overall, this analysis empirically validates the suggested uncertainty-aware prediction mechanism’s ability to distinguish between accurate and inaccurate predictions. Because it allows academic decision-makers to prioritize high-confidence risk alarms while subjecting high-uncertainty cases to additional monitoring or human evaluation, such differentiation is crucial for reliable early warning systems.
In addition to demonstrating that misclassified samples generally exhibit elevated entropy, we assess the quality of entropy-based uncertainty through calibration metrics and decision–utility curves. The model gets a multiclass Brier score of 0.3307 and an expected calibration error (ECE, max-confidence, 10 bins) of 0.0627 for the single-run experiment (seed = 42), as shown in Table 10. Figure 11 shows the reliability diagram, which shows that predicted confidence is a good indicator of empirical correctness. We evaluate practical utility by discarding predictions characterized by high entropy and subsequently plotting the resultant risk–coverage curve, as shown in Figure 12, which illustrates that lower-entropy subsets enhance accuracy as coverage diminishes. The mean entropy for correct predictions is 0.6308, while the mean entropy for incorrect predictions is 0.9252. The results show that the multi-objective HSO-RF framework reduces inter-class performance difference, especially between the “Graduate” and “Dropout” classes, and increases macro F1-score by about 1–2% over the baseline RF. The usefulness of the uncertainty-aware method in identifying ambiguous and high-risk cases is further confirmed by uncertainty analysis, which shows that misclassified instances are primarily related to high entropy values.
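The max-confidence ECE used above can be sketched as follows; this is a simplified implementation assuming equal-width confidence bins, and the function name is ours:

```python
import numpy as np

def expected_calibration_error(proba, y_true, n_bins=10):
    """Max-confidence ECE: bin samples by top-class probability, then
    average the |accuracy - confidence| gap weighted by bin occupancy."""
    conf = proba.max(axis=1)
    pred = proba.argmax(axis=1)
    correct = (pred == y_true).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece

# Perfectly confident, perfectly correct predictions are perfectly calibrated.
ece = expected_calibration_error(np.array([[1.0, 0.0], [0.0, 1.0]]),
                                 np.array([0, 1]))  # 0.0
```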

4. Discussion

The study’s findings show that incorporating uncertainty-aware prediction and multi-objective optimization into an RF framework significantly improves the reliability and robustness of multiclass student outcome prediction. The suggested HSO-RF framework clearly balances predictive performance, model complexity, and inter-class fairness, in contrast to traditional methods that maximize predictive accuracy alone. This produces more dependable results when there is a significant class imbalance. Despite being numerically modest, the observed increases in macro and weighted F1-scores are especially significant in educational datasets where minority groups, like dropout instances, are frequently underrepresented and more challenging to predict.
Achieving balanced performance across student outcome categories is largely dependent on the multi-objective design. By penalizing excessive model complexity and class-wise performance disparity during optimization, the suggested framework reduces the propensity of ensemble models to overfit majority classes. Improved class-wise F1-scores and smaller performance gaps between the Graduate and Dropout categories are clear indications of this effect. For early warning systems, where overlooking minority high-risk students can compromise the usefulness of predictive analytics, such balanced behavior is crucial.
In addition to improving performance, uncertainty-aware prediction is a major improvement over the deterministic student risk model. Because some student profiles are inherently ambiguous, the entropy-based uncertainty distributions show significant variation in prediction confidence. Higher predictive entropy is closely correlated with misclassification risk, as evidenced by the distinct divergence between uncertainty levels for successfully and wrongly categorized examples. This result provides empirical support for the suggested uncertainty modeling approach’s efficacy and emphasizes its significance for practical implementation, as decision-makers must choose between trustworthy alarms and unclear situations that need more research.
Uncertainty-aware outputs allow for more sophisticated academic interventions from an applied standpoint. While high-uncertainty scenarios might benefit from postponed choices, more data collection, or human review, high-confidence predictions can initiate urgent support actions. This capability, which emphasizes openness, dependability, and risk-aware decision-making in delicate areas like education, is consistent with growing ideals of trustworthy and responsible artificial intelligence.
The robustness of the suggested framework is further supported by the stability of the adjusted hyperparameters and the steady convergence behavior of the HSO algorithm. These findings imply that the holistic swarm-based search technique avoids premature convergence while successfully navigating intricate, non-convex hyperparameter domains. Additionally, the model’s explanatory value is strengthened by the semantic interpretation of influential features, which shows that academic, administrative, and socioeconomic factors collectively contain latent aspects of student motivation, resilience, and institutional restrictions.
Despite these advantages, there are certain drawbacks that should be discussed. Because just one publicly accessible dataset was used for the trials, generalizability across universities with various student demographics or academic frameworks may be limited. Furthermore, entropy-based uncertainty does not completely differentiate between aleatoric and epistemic uncertainty, even though it offers useful confidence estimates. Future research prospects include cross-institutional validation, longitudinal analysis, and the incorporation of more sophisticated uncertainty decomposition approaches to address these limitations.
Overall, by fusing robust optimization, confidence-aware prediction, and useful interpretability, the suggested multi-objective and uncertainty-aware HSO-RF framework pushes the boundaries of educational data mining. The findings show that increasing model credibility is just as important as increasing accuracy, especially when predictive algorithms are meant to guide student support plans and high-stakes academic decisions.

5. Conclusions

To improve multiclass student academic performance and dropout prediction, this study proposed a reliable, multi-objective, and uncertainty-aware predictive framework that combines the Random Forest (RF) classifier with Holistic Swarm Optimization (HSO). In contrast to traditional methods that maximize predictive accuracy in isolation, the suggested methodology simultaneously balances macro F1-score, model complexity, and inter-class performance fairness, making it especially useful for severely imbalanced educational datasets. Extensive trials on the publicly available Predict Students’ Dropout and Academic Success dataset show that the proposed HSO-RF framework outperforms standard machine learning models and improves upon the baseline RF configuration by roughly 1–2% in macro and weighted F1-score, achieving a weighted F1-score of 76.0% and an overall accuracy of 77.74%. More importantly, the improved model reduces bias in favor of the majority “Graduate” class by performing better on minority classes, particularly in identifying dropout-prone students. In the context of imbalanced multiclass prediction, where increases in macro F1-score directly imply improved dependability for underrepresented and high-risk student groups, these improvements, while quantitatively modest, are noteworthy. Beyond predictive performance, the incorporation of uncertainty-aware prediction enables the suggested approach to measure confidence in individual student risk assessments. By differentiating between high-confidence and high-uncertainty forecasts, this capacity gives academic advisers practical insights that promote better-informed and more focused intervention tactics. The robustness and interpretability of the suggested method are further supported by the observed stability of the adjusted hyperparameters and feature importance rankings.
From a wider angle, the findings demonstrate how intricate relationships between administrative, socioeconomic, macroeconomic, and academic issues influence educational attainment. Together, these factors serve as significant markers of student motivation, resilience, and institutional restrictions in addition to being computational predictors. The suggested paradigm advances the creation of reliable and ethical educational analytics by combining robust optimization, uncertainty estimates, and semantic interpretation.
Future research will concentrate on expanding the framework to longitudinal and cross-institutional datasets, adding further explainability and fairness criteria, and investigating multimodal educational data sources like behavioral traces and learning management system logs. All things considered, the suggested multi-objective and uncertainty-aware HSO-RF architecture provides a dependable and useful decision-support tool for early warning systems, with great potential to enhance data-driven academic planning and student retention tactics.

Author Contributions

Conceptualization, M.A.E. and M.G.G.; methodology, M.A.E.; software, M.A.E.; validation, M.A.E., M.G.G. and M.M.S.E.; formal analysis, M.G.G.; investigation, M.G.G.; resources, M.A.E.; data curation, M.A.E.; writing—original draft preparation, M.M.S.E.; writing—review and editing, M.M.S.E.; visualization, M.A.E.; supervision, M.G.G.; project administration, M.G.G.; funding acquisition, M.M.S.E. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to Prince Sattam bin Abdulaziz University for funding this research work through the project number (PSAU/2025/01/38523).

Data Availability Statement

The data presented in this study are openly available in UCI machine learning at https://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success, (accessed on 1 November 2025) reference number [9].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Alzubi, J.; Nayyar, A.; Kumar, A. Machine learning from theory to algorithms: An overview. J. Phys. Conf. Ser. 2018, 1142, 12012. [Google Scholar] [CrossRef]
  2. Akbari, E.; Rahimnejad, A.; Gadsden, S.A. Holistic swarm optimization: A novel metaphor-less algorithm guided by whole population information for addressing exploration-exploitation dilemma. Comput. Methods Appl. Mech. Eng. 2025, 445, 118208. [Google Scholar] [CrossRef]
  3. Nandhini, T.; Kumar, P.S.P.; MoheshKumar, G.; Dhivyadharshini, S.; Dhanush, M.M.; Pradeepkumar, K. Optimizing Pandemic Mitigation and Advisory Solutions using Random Forest Classifier. In Proceedings of the 2025 7th International Conference on Inventive Material Science and Applications (ICIMA), Namakkal, India, 28–30 May 2025; pp. 1156–1162. [Google Scholar]
  4. Lee, M.C.H.; Braet, J.; Springael, J. Performance metrics for multilabel emotion classification: Comparing micro, macro, and weighted f1-scores. Appl. Sci. 2024, 14, 9863. [Google Scholar] [CrossRef]
  5. Wang, B.; Li, C.; Pavlu, V.; Aslam, J. Regularizing model complexity and label structure for multi-label text classification. arXiv 2017, arXiv:1705.00740. [Google Scholar] [CrossRef]
  6. Liu, Z.; Wei, P.; Wei, Z.; Yu, B.; Jiang, J.; Cao, W.; Bian, J.; Chang, Y. Handling inter-class and intra-class imbalance in class-imbalanced learning. arXiv 2021, arXiv:2111.12791. [Google Scholar]
  7. Kornaev, A.; Kornaeva, E.; Ivanov, O.; Pershin, I.; Alukaev, D. Awareness of uncertainty in classification using a multivariate model and multi-views. arXiv 2024, arXiv:2404.10314. [Google Scholar] [CrossRef]
  8. Gour, M.; Jain, S. Uncertainty-aware convolutional neural network for COVID-19 X-ray images classification. Comput. Biol. Med. 2022, 140, 105047. [Google Scholar] [CrossRef]
  9. Realinho, V.; Martins, M.V.; Machado, J.; Baptista, L. Predict Students’ Dropout and Academic Success. UCI Mach. Learn. Repos. 2021, 10, C5MC89. [Google Scholar]
  10. Lykourentzou, I.; Giannoukos, I.; Nikolopoulos, V.; Mpardis, G.; Loumos, V. Dropout prediction in e-learning courses through the combination of machine learning techniques. Comput. Educ. 2009, 53, 950–965. [Google Scholar] [CrossRef]
  11. Yukselturk, E.; Ozekes, S.; Türel, Y.K. Predicting dropout student: An application of data mining methods in an online education program. Eur. J. Open Distance E-Learn. 2014, 17, 118–133. [Google Scholar] [CrossRef]
  12. Dekker, G.W.; Pechenizkiy, M.; Vleeshouwers, J.M. Predicting students drop out: A case study. In Proceedings of the 2nd International Conference on Educational Data Mining, EDM, Cordoba, Spain, 1–3 July 2009; pp. 41–50. [Google Scholar]
  13. Zhao, Z.; Ren, P. Random Forest-Based Early Warning System for Student Dropout Using Behavioral Data. Bull. Educ. Psychol. 2025, 5, 1–22. [Google Scholar]
  14. Niyogisubizo, J.; Liao, L.; Nziyumva, E.; Murwanashyaka, E.; Nshimyumukiza, P.C. Predicting student’s dropout in university classes using two-layer ensemble machine learning approach: A novel stacked generalization. Comput. Educ. Artif. Intell. 2022, 3, 100066. [Google Scholar] [CrossRef]
  15. Martins, M.V.; Tolledo, D.; Machado, J.; Baptista, L.M.T.; Realinho, V. Early prediction of student’s performance in higher education: A case study. In Proceedings of the World Conference on Information Systems and Technologies, Azores, Portugal, 30 March–2 April 2021; pp. 166–175. [Google Scholar]
  16. Villar, A.; de Andrade, C.R.V. Supervised machine learning algorithms for predicting student dropout and academic success: A comparative study. Discov. Artif. Intell. 2024, 4, 2. [Google Scholar] [CrossRef]
  17. Kok, C.L.; Ho, C.K.; Chen, L.; Koh, Y.Y.; Tian, B. A novel predictive modeling for student attrition utilizing machine learning and sustainable big data analytics. Appl. Sci. 2024, 14, 9633. [Google Scholar] [CrossRef]
  18. Marcolino, M.R.; Porto, T.R.; Primo, T.T.; Targino, R.; Ramos, V.; Queiroga, E.M.; Munoz, R.; Cechinel, C. Student dropout prediction through machine learning optimization: Insights from moodle log data. Sci. Rep. 2025, 15, 9840. [Google Scholar] [CrossRef]
  19. Tamada, M.M.; Netto, J.F.d.M.; de Lima, D.P.R. Predicting and reducing dropout in virtual learning using machine learning techniques: A systematic review. In Proceedings of the 2019 IEEE Frontiers in Education Conference (FIE), Cincinnati, OH, USA, 16–19 October 2019; pp. 1–9. [Google Scholar]
  20. Gardner, J.; Yu, R.; Nguyen, Q.; Brooks, C.; Kizilcec, R. Cross-institutional transfer learning for educational models: Implications for model performance, fairness, and equity. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, Chicago, IL, USA, 12–15 June 2023; pp. 1664–1684. [Google Scholar]
  21. Vaarma, M.; Li, H. Predicting student dropouts with machine learning: An empirical study in Finnish higher education. Technol. Soc. 2024, 76, 102474. [Google Scholar] [CrossRef]
  22. Xiong, S.Y.; Gasim, E.F.M.; Xinying, C.; Wah, K.K.; Ha, L.M. A proposed hybrid cnn-rnn architecture for student performance prediction. Int. J. Intell. Syst. Appl. Eng. 2022, 10, 347–355. [Google Scholar]
  23. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [Google Scholar] [CrossRef]
  24. Marini, F.; Walczak, B. Particle swarm optimization (PSO). A tutorial. Chemom. Intell. Lab. Syst. 2015, 149, 153–165. [Google Scholar] [CrossRef]
  25. Pourpanah, F.; Wang, R.; Lim, C.P.; Wang, X.-Z.; Yazdani, D. A review of artificial fish swarm algorithms: Recent advances and applications. Artif. Intell. Rev. 2023, 56, 1867–1903. [Google Scholar] [CrossRef]
  26. Al-Hasnawi, M.; Radhi, A. Early stage prediction of COVID-19 Using machine learning model. Wasit J. Comput. Math. Sci. 2023, 2, 46–61. [Google Scholar] [CrossRef]
  27. Barry, K.A.; Manzali, Y.; Flouchi, R.; Elfar, M. Heart disease approach using modified random forest and particle swarm optimization. Heart Dis. 2025, 14, 1242–1251. [Google Scholar] [CrossRef]
  28. Khaseeb, J.Y.; Keshk, A.; Youssef, A. Improved Binary Grey Wolf Optimization Approaches for Feature Selection Optimization. Appl. Sci. 2025, 15, 489. [Google Scholar] [CrossRef]
  29. Alsalamah, H.A.; Ismail, W.N. A Swarm-Based Multi-Objective Framework for Lightweight and Real-Time IoT Intrusion Detection. Mathematics 2025, 13, 2522. [Google Scholar] [CrossRef]
  30. Ma, L.; Fan, S. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests. BMC Bioinform. 2017, 18, 169. [Google Scholar] [CrossRef]
  31. El-Shafiey, M.G.; Hagag, A.; El-Dahshan, E.-S.A.; Ismail, M.A. A hybrid GA and PSO optimized approach for heart-disease prediction based on random forest. Multimedia Tools Appl. 2022, 81, 18155–18179. [Google Scholar] [CrossRef]
  32. Mallala, B.; Ahmed, A.I.U.; Pamidi, S.V.; Faruque, M.O.; Reddy, R. Forecasting global sustainable energy from renewable sources using random forest algorithm. Results Eng. 2025, 25, 103789. [Google Scholar] [CrossRef]
  33. Forrest, S. Genetic algorithms. ACM Comput. Surv. 1996, 28, 77–80. [Google Scholar] [CrossRef]
  34. Blum, C. Ant colony optimization: Introduction and recent trends. Phys. Life Rev. 2005, 2, 353–373. [Google Scholar] [CrossRef]
  35. Książek, W. Explainable thyroid cancer diagnosis through two-level machine learning optimization with an improved naked mole-rat algorithm. Cancers 2024, 16, 4128. [Google Scholar] [CrossRef] [PubMed]
  36. Zhang, Z.; Wang, X.; Cao, L. FOX Optimization Algorithm Based on Adaptive Spiral Flight and Multi-Strategy Fusion. Biomimetics 2024, 9, 524. [Google Scholar] [CrossRef] [PubMed]
  37. Chen, D.; Wang, H.; Hu, D.; Xian, Q.; Wu, B. Q-learning improved golden jackal optimization algorithm and its application to reliability optimization of hydraulic system. Sci. Rep. 2024, 14, 24587. [Google Scholar] [CrossRef] [PubMed]
  38. Özbay, F.A. A modified seahorse optimization algorithm based on chaotic maps for solving global optimization and engineering problems. Eng. Sci. Technol. Int. J. 2023, 41, 101408. [Google Scholar] [CrossRef]
Figure 1. Educational data mining framework.
Figure 2. HSO flowchart.
Figure 3. Multi-objective and uncertainty-aware HSO-optimized RF flowchart.
Figure 4. Target class distribution.
Figure 5. Splitting data for training and testing.
Figure 6. HSO convergence curve (multi-objective optimization).
Figure 7. Comparative analysis of ML models with HSO-RF.
Figure 8. Confusion matrix of the optimized RF after HSO.
Figure 9. Uncertainty distribution of predictions.
Figure 10. Comparison of entropy-based prediction uncertainty between correct and incorrect classifications produced by the proposed uncertainty-aware HSO-optimized RF model.
Figure 11. Reliability graph of HSO-RF.
Figure 12. Risk–coverage curve.
Table 1. HSO parameters.

| Type | Parameter | Description | Value |
|---|---|---|---|
| Problem | varMax, varMin | Bounds of the search space | 100, −100 |
| Problem | nVar | Number of decision variables | 30 |
| Swarm | alpha | Position-update scaling factor | 3 |
| Swarm | maxIt | Maximum number of iterations | 200 |
| Swarm | nPop | Population size | 30 |
| SA | coolingRate | Rate at which the temperature decreases | 0.995 |
| SA | initialTemp | Starting temperature | 10,000 |
| Adaptive mutation | initialMutationRate | Starting probability of mutation | 0.5 |
| Adaptive mutation | finalMutationRate | Ending probability of mutation | 0.1 |
| Adaptive mutation | initialMutationStep | Starting size of the mutation vector | 0.3 |
| Adaptive mutation | finalMutationStep | Ending size of the mutation vector | 0.1 |
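The adaptive-mutation parameters in Table 1 anneal from a starting value to an ending value over the run. The exact schedule is not reproduced in this excerpt, so the sketch below assumes a simple linear decay from the initial to the final value across maxIt iterations; the function name `adaptive_value` is illustrative, not from the paper.

```python
# Hypothetical sketch of the adaptive-mutation schedule implied by Table 1.
# A linear decay from the initial to the final value over maxIt iterations
# is assumed; the paper may use a different interpolation.

def adaptive_value(initial: float, final: float, it: int, max_it: int) -> float:
    """Linearly interpolate a mutation parameter at iteration `it`."""
    frac = it / max_it
    return initial + (final - initial) * frac

MAX_IT = 200  # maxIt from Table 1

# Mutation rate decays from 0.5 to 0.1; mutation step from 0.3 to 0.1.
mutation_rate_mid = adaptive_value(0.5, 0.1, 100, MAX_IT)   # halfway: 0.3
mutation_step_end = adaptive_value(0.3, 0.1, MAX_IT, MAX_IT)  # final: 0.1
```

Under this assumption, large mutations early in the run favor exploration, while the shrinking rate and step size late in the run favor exploitation around the best-found hyperparameters.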
Table 2. RF hyperparameter search ranges explored by HSO.

| Hyperparameter | Range (min, max) | Decision variable |
|---|---|---|
| Number of Estimators | (50, 300) | x1 |
| Maximum Tree Depth | (5, 30) | x2 |
| Min Samples Split | (0.01, 0.5) | x3 |
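Each HSO particle carries the three decision variables of Table 2, which must be decoded into valid RF hyperparameters before fitness evaluation. The clip-and-round decoding below is an assumption for illustration (the paper's exact rule is not shown in this excerpt); `min_samples_split` is treated as a fraction per Table 2's (0.01, 0.5) range, which scikit-learn's `RandomForestClassifier` also accepts.

```python
# Hedged sketch: mapping an HSO particle's decision variables (x1, x2, x3,
# per Table 2) onto RF hyperparameters. The clipping-and-rounding scheme is
# an assumption; only the ranges come from the paper.

def decode_particle(x1: float, x2: float, x3: float) -> dict:
    """Clip each decision variable to its Table 2 range and cast integer
    hyperparameters to int, as scikit-learn expects."""
    n_estimators = int(round(min(max(x1, 50), 300)))   # x1 in (50, 300)
    max_depth = int(round(min(max(x2, 5), 30)))        # x2 in (5, 30)
    min_samples_split = min(max(x3, 0.01), 0.5)        # x3 in (0.01, 0.5)
    return {"n_estimators": n_estimators,
            "max_depth": max_depth,
            "min_samples_split": min_samples_split}

# An out-of-range particle is pulled back inside the bounds:
params = decode_particle(189.4, 42.0, 0.8)
# → {'n_estimators': 189, 'max_depth': 30, 'min_samples_split': 0.5}
```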
Table 3. Multi-objective and uncertainty-aware HSO-optimized RF algorithm.

Input:
- HSO parameters and the Predict Students' Dropout and Academic Success dataset D = {(x_i, y_i)}_{i=1}^{N}
- Maximum number of iterations iter_max
- Number of particles P
- Hyperparameter bounds Θ for RF
- Weight coefficients α, β, γ

Output:
- Optimized RF model M*
- Prediction uncertainty scores U(x)

Step 1: Data preprocessing
- Load the dataset.
- Clean column names and encode categorical features.
- Standardize the features.
- Split the data into training (D_train) and testing (D_test) sets.
- Initialize a swarm of P particles, each representing a candidate RF hyperparameter vector θ_p ∈ Θ.

Step 2: Multi-objective fitness evaluation
6. For each particle θ_p, train an RF classifier on D_train.
7. Predict class labels on the validation subset.
8. Compute F1_macro(θ), Ω(θ), and Δ_fair(θ).
9. Evaluate the multi-objective fitness function using Equation (14).
10. If a lower fitness value is found, update the global best solution.

Step 3: HSO (for iter = 1 to iter_max)
11. Update particle velocities according to the holistic swarm dynamics.
12. Adjust particle positions within the predefined hyperparameter bounds.
13. Re-evaluate each particle's fitness using Steps 6–10.
14. Record the best fitness value for convergence analysis.

Step 4: Final model construction
15. Select the best hyperparameter vector produced by HSO.
16. Train the final RF classifier on the entire training set with this hyperparameter vector.

Step 5: Uncertainty-aware prediction
17. Use M* to predict the class probability distribution P(c|x) for every test instance.
18. Compute the entropy-based prediction uncertainty using Equation (19).

Step 6: Performance and reliability analysis
19. Assess predictive performance with accuracy, weighted F1-score, and macro F1-score.
20. Examine the uncertainty distributions and contrast the uncertainty of correct and incorrect predictions.
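Equation (14), the scalarized fitness of Step 9, is not reproduced in this excerpt. The sketch below therefore assumes a weighted sum of the three stated objectives — minimize (1 − macro F1), a normalized complexity term Ω, and the inter-class fairness gap Δ_fair — with the weights α, β, γ from the algorithm's input list. The normalization by the maximum Ω implied by Table 2's bounds (300 trees × depth 30), the example weights, and the fairness gap taken as the max−min per-class F1 are all illustrative assumptions.

```python
# Illustrative scalarized multi-objective fitness (lower is better), assuming
# a weighted-sum form for Equation (14). Weights and normalization are
# hypothetical; only the three objectives come from the paper.

def fitness(f1_macro: float, complexity: float, fairness_gap: float,
            alpha: float = 1.0, beta: float = 0.1, gamma: float = 0.5,
            max_complexity: float = 300 * 30) -> float:
    """Trade predictive quality against model size and inter-class disparity."""
    omega_norm = complexity / max_complexity   # scale Ω into [0, 1]
    return alpha * (1.0 - f1_macro) + beta * omega_norm + gamma * fairness_gap

# Example with the final model's values: Ω = 3402 (Table 9), macro F1 = 0.69,
# and a fairness gap of 0.45 (Graduate 0.86 minus Enrolled 0.41, Table 5).
score = fitness(0.69, 3402, 0.45)
```

A larger γ would push the search toward particles that close the gap on the minority Enrolled class, even at a small cost in overall macro F1.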
Table 4. Predict Students' Dropout and Academic Success attribute distribution.

| Category | Attribute count | Example attributes |
|---|---|---|
| Administrative and admission | 12 | Debtor, Course Code, Application Mode, Scholarship Holder, Tuition Status |
| Socioeconomic and demographic | 10 | Age, Parents' Qualifications, Gender, Marital Status, Nationality |
| Macroeconomic indicators | 3 | GDP, Inflation Rate, Unemployment Rate |
| Academic performance | 11 | Units Credited/Enrolled/Approved (1st/2nd sem), Admission Grade, Mean Grade |
| Total | 36 | |
Table 5. Comparative analysis between HSO-RF and other ML models.

| Metric | Logistic Regression | ANN | K-NN | Bagging | Decision Tree | Naïve Bayes | Random Forest | HSO-RF |
|---|---|---|---|---|---|---|---|---|
| F1-score (Enrolled) | 0.377 | 0.404 | 0.325 | 0.418 | 0.416 | 0.418 | 0.440 | 0.410 |
| F1-score (Graduate) | 0.846 | 0.812 | 0.735 | 0.854 | 0.840 | 0.800 | 0.850 | 0.860 |
| F1-score (Dropout) | 0.781 | 0.754 | 0.653 | 0.778 | 0.752 | 0.697 | 0.780 | 0.800 |
| Weighted-avg. F1 (%) | 73.70 | 71.60 | 63.20 | 74.80 | 73.30 | 69.60 | 75.00 | 76.00 |
| Accuracy (%) | 75.25 | 71.75 | 63.39 | 76.38 | 74.23 | 69.71 | 77.00 | 77.74 |
Table 6. Comparative analysis in terms of accuracy and macro F1-score (%).

| Model | Accuracy | Macro F1 |
|---|---|---|
| Logistic Regression | 75.25 | 68.26 |
| ANN | 71.75 | 63.85 |
| K-NN | 63.39 | 57.51 |
| Bagging | 76.38 | 66.78 |
| Decision Tree | 74.23 | 63.73 |
| Naïve Bayes | 69.71 | 55.76 |
| Random Forest | 77.00 | 68.00 |
| XGBoost | 72.31 | 64.69 |
| LightGBM | 70.39 | 62.78 |
| CatBoost | 70.73 | 62.38 |
| HSO-RF | 77.74 | 69.00 |
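The gap between macro and weighted F1 in Tables 5 and 6 follows directly from how the per-class scores are aggregated: macro averaging weights all three outcome classes equally, while weighted averaging uses class support and so favors the majority Graduate class. A minimal illustration, using HSO-RF's per-class F1 values from Table 5 and hypothetical class supports chosen to roughly match the dataset's imbalance:

```python
# Macro vs. weighted F1 aggregation. Per-class F1 values are HSO-RF's from
# Table 5; the supports (Enrolled, Graduate, Dropout) are hypothetical.
import numpy as np

def aggregate_f1(per_class_f1, support):
    per_class_f1 = np.asarray(per_class_f1, dtype=float)
    support = np.asarray(support, dtype=float)
    macro = per_class_f1.mean()                              # equal weights
    weighted = (per_class_f1 * support).sum() / support.sum()  # support weights
    return float(macro), float(weighted)

macro, weighted = aggregate_f1([0.410, 0.860, 0.800], [159, 442, 284])
# macro ≈ 0.69, weighted ≈ 0.76 — matching the reported HSO-RF scores
```

Because the weak Enrolled class is also the smallest, the weighted score masks the disparity that the macro score (and the fairness objective) exposes.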
Table 7. Stability of performance across multiple stratified runs (10 seeds).

| Metric | Mean ± Std | 95% CI |
|---|---|---|
| Accuracy | 0.7651 ± 0.0105 | [0.7586, 0.7716] |
| Macro F1-score | 0.6711 ± 0.0156 | [0.6615, 0.6808] |
| Weighted F1-score | 0.7431 ± 0.0116 | [0.7359, 0.7503] |
Table 8. The results of repeated runs.

| Seed | Accuracy | Macro F1 | Weighted F1 |
|---|---|---|---|
| 1 | 0.7605 | 0.6597 | 0.7349 |
| 2 | 0.7808 | 0.6923 | 0.7590 |
| 3 | 0.7571 | 0.6507 | 0.7310 |
| 4 | 0.7718 | 0.6905 | 0.7553 |
| 5 | 0.7706 | 0.6785 | 0.7519 |
| 6 | 0.7582 | 0.6592 | 0.7357 |
| 7 | 0.7763 | 0.6873 | 0.7539 |
| 8 | 0.7458 | 0.6602 | 0.7284 |
| 9 | 0.7695 | 0.6764 | 0.7484 |
| 10 | 0.7605 | 0.6566 | 0.7327 |
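The Table 7 summary can be checked against the per-seed accuracies of Table 8. The sketch below recomputes the mean (which reproduces the reported 0.7651) and builds a 95% interval; the normal approximation mean ± 1.96·sd/√n is an assumption, since the paper's exact interval construction and spread estimator are not stated in this excerpt.

```python
# Recomputing the Table 7 accuracy summary from the Table 8 per-seed values.
# The normal-approximation CI (mean ± 1.96·sd/√n) is an assumed construction.
import math

acc = [0.7605, 0.7808, 0.7571, 0.7718, 0.7706,
       0.7582, 0.7763, 0.7458, 0.7695, 0.7605]

n = len(acc)
mean = sum(acc) / n                                          # ≈ 0.7651
sd = math.sqrt(sum((a - mean) ** 2 for a in acc) / (n - 1))  # sample std
half = 1.96 * sd / math.sqrt(n)
ci = (mean - half, mean + half)
```

The seed-to-seed spread of roughly a percentage point indicates that the HSO-RF gains over the baselines in Table 6 are not an artifact of a single favorable split.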
Table 9. Complexity-related hyperparameters of the optimized HSO-RF.

| Parameter | Value |
|---|---|
| n_estimators | 189 |
| max_depth | 18 |
| min_samples_split | 39 |
| Ω = n_estimators × max_depth | 3402 |
Table 10. Uncertainty quality metrics.

| Metric | Value |
|---|---|
| Accuracy | 0.7774 |
| Macro F1 | 0.6900 |
| Weighted F1 | 0.7600 |
| Mean entropy (correct) | 0.6308 |
| Mean entropy (incorrect) | 0.9252 |
| Multiclass Brier score | 0.3307 |
| ECE (max-confidence) | 0.0627 |
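The uncertainty metrics of Table 10 are all functions of the ensemble's class-probability vectors. The sketch below shows one plausible computation: natural-log entropy (its maximum, ln 3 ≈ 1.10 for three classes, is consistent with the reported means), the multiclass Brier score as mean squared error against one-hot labels, and a max-confidence ECE with 10 equal-width bins. The binning count and log base are assumptions; the paper's exact choices are not given in this excerpt.

```python
# Sketch of the Table 10 uncertainty metrics from class-probability vectors.
# Natural-log entropy and 10-bin max-confidence ECE are assumed choices.
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of one probability vector."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def brier_multiclass(probs, labels, n_classes):
    """Mean squared error between probability vectors and one-hot labels."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.eye(n_classes)[np.asarray(labels)]
    return float(((probs - onehot) ** 2).sum(axis=1).mean())

def ece_max_confidence(probs, labels, n_bins=10):
    """Expected calibration error over the top-class confidence."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf, pred = probs.max(axis=1), probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return float(ece)

# A confident and a near-uniform prediction over the three outcome classes.
confident = [0.90, 0.05, 0.05]
uncertain = [0.40, 0.35, 0.25]
bs = brier_multiclass([confident, uncertain], [0, 2], n_classes=3)
```

The higher mean entropy on incorrect predictions (0.93 vs. 0.63) is what makes entropy usable as a deferral signal in the risk–coverage analysis of Figure 12: abstaining on the highest-entropy predictions removes errors disproportionately.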
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Elmasry, M.M.S.; Gafar, M.G.; Elsabagh, M.A. A Multi-Objective and Uncertainty-Aware Holistic Swarm Optimized Random Forest for Robust Student Performance and Dropout Prediction. Inventions 2026, 11, 20. https://doi.org/10.3390/inventions11020020