Article

A Novel Optimised Feature Selection Method for In-Session Dropout Prediction Using Hybrid Meta-Heuristics and Multi-Level Stacked Ensemble Learning

1 Department of Computer Science, Al-Qunfudhah College of Computing, Umm Al-Qura University, Makkah 24382, Saudi Arabia
2 Department of Computer Science and Information Technology, School of Computing, Engineering and Mathematical Sciences, La Trobe University, Bundoora, VIC 3086, Australia
3 La Trobe Business School, La Trobe University, Bundoora, VIC 3086, Australia
* Author to whom correspondence should be addressed.
Electronics 2025, 14(18), 3703; https://doi.org/10.3390/electronics14183703
Submission received: 13 July 2025 / Revised: 7 September 2025 / Accepted: 16 September 2025 / Published: 18 September 2025
(This article belongs to the Section Computer Science & Engineering)

Abstract

High dropout rates on in-session learning platforms pose a significant challenge to student retention and the overall success of educational programmes. This study proposes a novel framework that integrates multi-level stacked ensemble learning with optimised feature selection using a hybrid approach combining Genetic Algorithm (GA) with Correlation-Based Feature Selection (CFS). The model employs a Multi-Layer Perceptron (MLP) as a meta-learner, aggregating predictions from multiple ensemble-based base classifiers to enhance predictive accuracy. To improve generalisation and reduce noise, the proposed approach applies GA-CFS-driven feature optimisation in conjunction with data balancing techniques. Experimental results demonstrate that the proposed model outperforms benchmark approaches, achieving improvements of up to 22% in prediction accuracy and 12% in F1-score over standard stacked ensemble methods. These results highlight the effectiveness of combining meta-heuristic optimisation with ensemble learning to advance dropout prediction in online learning environments.

1. Introduction

Modern institutions of learning are expected to create innovative courses that go beyond the conventional limitations of time and place. To meet these demands, they need to use e-learning and in-session education platforms to develop engaging learning methodologies and environments [1]. A major challenge in online education is optimising the learning environment to reduce student failures and dropouts. In in-session learning, predictive models such as stacked generalisation have proven highly effective at predicting student dropout. In stacked generalisation, several base learners are combined and their outputs are fed into a meta-learner that produces the final classification result, such as whether a student will drop out [2].
Recent studies show that stacked models are able to reduce prediction errors and provide high accuracy. However, the quality of the data used in these models can affect their performance. Many dropout prediction models are trained on data that is out-of-date, incorrect, or subjective and does not accurately represent the real-world actions and behaviours of students on e-learning platforms [3]. Using relevant, objective data taken directly from in-session platforms therefore becomes essential for improving the prediction of student failure and dropout [4]. To ensure accurate predictions, these data must be carefully selected, optimised, and balanced.
Feature selection is a major step in improving the performance of machine learning (ML) models, especially for student dropout prediction [4,5]. By identifying and selecting the most relevant features from the raw data, feature selection reduces noise, minimises overfitting, and enhances model interpretability. Moreover, it ensures that the model focuses on the most prominent features, which helps address the imbalance problem in situations where datasets are unbalanced and dropout rates are excessively high or low [5]. Optimising feature selection for stacked ensemble learning models can therefore achieve greater prediction accuracy and robustness.
This study uses an in-session dataset gathered from the online learning platform (orthopächer.net) to present a novel optimised stacked ensemble learning model for predicting student dropout and to overcome the issues mentioned above. The dataset was collected between March and April 2020 by the in-session platform to capture behavioural patterns during a critical phase of the COVID-19 pandemic, when learners were adapting to new learning methods. While the pandemic's conditions may introduce some unpredictability into the data, they also highlight the practical challenges faced by in-session online platforms. To improve retention strategies in online education, our study analysed this dataset to identify the factors contributing to dropout risks in such dynamic and turbulent contexts. The base ML algorithms, standard RF, GBC, XGBoost, and AdaBoost, were fine-tuned using GridSearchCV from the sklearn library to achieve optimal performance on the in-session prediction task, as sketched below. To determine which base learners were the most successful, their performances were compared through extensive experimentation.
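For illustration, a minimal sketch of this tuning step is given below, using scikit-learn's GridSearchCV; the parameter grids and the variable names X and y are assumptions for the example, not the study's exact search spaces, and XGBoost is taken from the xgboost package's scikit-learn-compatible wrapper.

from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Illustrative grids only; the paper's actual search spaces are not reported here.
param_grids = {
    "rf": (RandomForestClassifier(random_state=42),
           {"n_estimators": [100, 200], "max_depth": [None, 10, 20]}),
    "gbc": (GradientBoostingClassifier(random_state=42),
            {"n_estimators": [100, 200], "learning_rate": [0.05, 0.1]}),
    "xgb": (XGBClassifier(eval_metric="logloss", random_state=42),
            {"n_estimators": [100, 200], "max_depth": [3, 6]}),
    "ada": (AdaBoostClassifier(random_state=42),
            {"n_estimators": [50, 100], "learning_rate": [0.5, 1.0]}),
}

tuned = {}
for name, (model, grid) in param_grids.items():
    # 5-fold cross-validated grid search, scored on F1
    search = GridSearchCV(model, grid, cv=5, scoring="f1")
    search.fit(X, y)
    tuned[name] = search.best_estimator_  # keep the fine-tuned base learner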

1.1. The Contribution

The main objective of this study is to use the Genetic Algorithm (GA) and Correlation-Based Feature Selection (CFS) approaches as a feature selection optimisation method. The GA optimises a subset of features by selecting the most correlated features and removing redundant ones. This method improves the predictive ability of the base learners and addresses the problem of data imbalance by ensuring that the selected features adequately represent the underlying patterns of the dataset. The outputs of the base learners are fed into a second-layer classifier, an MLP, which acts as the meta-learner and generates the final prediction. This hierarchical structure integrates multiple predictors, improving classification performance. Accordingly, the contributions of this study are highlighted as follows.
Accuracy improvement: feature selection optimisation ensures that only the most predictive features are used, leading to more accurate dropout predictions.
Data imbalance reduction: GA-CFS feature selection helps to address class imbalance, ensuring that the model performs well even when dropout rates are unevenly distributed.
Improved interpretability: feature optimisation simplifies the model by reducing the number of input variables, making it easier to interpret and to apply to different learning scenarios.
This study aims to develop a stacked ensemble learning model based on optimised feature selection to improve the performance of student dropout prediction. The proposed approach provides an improved dropout learning method to increase student success rates and reduce repetition rates, enabling early detection of high-risk students and supporting timely interventions.

1.2. Paper Organisation

The remainder of the article is organised as follows: Section 2 discusses the most related studies for stacked ensemble dropout prediction models. The problem identification and the proposed methodology are discussed in Section 3 and Section 4, respectively. The experimental results and discussion are reviewed in Section 5. Finally, the conclusion is presented in Section 6. A list of abbreviations is shown in Table 1.

2. Literature Review

One of the more significant issues that educational institutions currently face is early student dropout. A variety of ML algorithms have been employed to identify students who are at risk of dropping out. However, classification errors in these models mean that students who present no real risk consume intervention resources, while misidentified at-risk students are excluded from retention procedures [5]. Stacking ensembles allow for a better-integrated dropout model that is more accurate. Additionally, to enhance the analysis of student performance and behaviour, feature selection must be performed well so that only crucial features are extracted. The most relevant features for predicting students' performance and activity levels can be obtained by combining the ensemble method with the feature selection approach [6]. This also helps to improve the prediction performance of the classification model and reduce the complexity of the feature-learning model. Table 2 lists the most recent research on student dropout prediction that combines feature selection optimisation techniques with stacked ensemble learning models.
For online course dropout prediction, a study in [7] developed a two-layer blending ensemble learning model with an XGBoost meta-model and various base models. As part of the pre-processing step, the course with the most participants was selected, feature selection was performed, and the data were divided into three subsets: ensemble, blender, and test. The proposed model achieved an accuracy of 90% in the evaluation.
In [8], the authors present a novel feature selection model known as dynamic feature ensemble evolution for enhanced feature selection. It combines heat maps with conventional techniques such as information gain, correlation matrix analysis, and Chi-square tests to select the most relevant features for predicting student performance. The core novelty of the proposed approach is its dynamic and adaptive threshold mechanism, which overcomes the drawbacks of static approaches and reduces problems such as over-fitting and under-fitting by modifying thresholds in response to changing data patterns. The results show how the approach can adjust to changing data patterns, allowing for reliable and accurate predictions of student performance.
An ensemble approach was developed in [9] to extract the most relevant data for student performance prediction. The most relevant features in this study were chosen using an improved PCA approach. To increase prediction accuracy, the most relevant feature sets are selected using an ensemble method that builds on the features produced by the proposed approach. Logistic regression, neural networks, and decision trees are used in a prediction ensemble model to categorise students' performance and engagement in online classrooms. The proposed method could predict student performance and activity with up to 89% accuracy for online courses.
In [10], a study evaluates the effectiveness of models developed through blending and stacking methods applied to various base models. Consistent results are obtained when evaluating the models using a confusion matrix and k-fold cross-validation. Based on the online learning student dropout prediction case study, the results show that the stacking model achieves a moderate accuracy of 0.83.
A novel stacking ensemble that combines ML techniques with a complex non-linear transformation feature selection method is proposed in [11] to predict university class dropout rates. The proposed method outperformed the base models in terms of accuracy. The results show that the proposed approach is able to identify students who are at risk of dropping out based on a variety of influential factors. In [12], the authors evaluate several feature selection techniques using three prediction classes based on ensembles of ML algorithms and filter methods to choose the most relevant features. The strategy was verified on four Stanford University MOOC courses. A comparison between the proposed approach and other algorithms was conducted on a number of performance metrics, and the results show that the proposed approach achieves high accuracy with a simple feature selection method.
A dropout prediction method for an in-session platform was proposed in [13]. Using over 164,000 session logs from 52,000 users, the authors developed time-progressive ML models, such as an MLP. By identifying motivational and subject-specific needs, the proposed model, which achieved an accuracy of up to 87%, was able to personalise real-time interventions based on dropout probabilities.
According to the results of related studies, the ensemble learning approach is able to predict student dropout accurately on in-session datasets. Despite these improvements, class imbalance remains an issue in dropout prediction. Imbalanced datasets can produce biased models that favour the majority class and generalise poorly for the minority class [14]. This imbalance occurs when the number of non-dropout records far exceeds the number of dropout records. After applying the RUS algorithm to dynamically balance each subset of data, the class counts can be made equal.
The proposed study uses an undersampling method to balance the classes in the in-session data. A stacked ensemble ML model is used to predict in-session dropouts, advancing the methods utilised in previous work such as [13]. In that original study, an MLP model was able to predict dropout events with up to 87% accuracy using data from a sequential learning process. The dropout prediction accuracy of the proposed method was compared to the methodology in [14] to evaluate its performance.
This study proposes a student dropout prediction model for an in-session dataset by integrating multi-level stacked ensemble learning with feature selection optimisation. These two approaches, used to improve dropout prediction accuracy, are highlighted as follows:
Feature Selection Optimisation: the GA with CFS finds the most relevant and non-redundant subset of features from the dataset. By reducing noise and redundancy, this phase ensures that the predictive models are trained on useful, high-quality data.
Multi-Level Stacked Ensemble Learning: base learners such as AdaBoost, RF, XGBoost, and GBC are used to build an efficient stacked ensemble model. The outputs of the base learners are passed to an MLP acting as the meta-learner. This hierarchical method improves predictive accuracy and generalisation, especially on balanced datasets.

3. Problem Identification

The proposed model enables predictions in real time as students move through their learning activities because it is designed to deal with sequential subsets of in-session data. The approach aims to provide an effective dropout prediction framework by handling issues including high dimensionality, class imbalance, and dynamic data patterns [14]. To formalise the problem, let $X \in \mathbb{R}^{n \times p}$ be the dataset with $n$ samples and $p$ features. The dropout output, represented by $y \in \{0,1\}^n$, indicates whether a student drops out ($y = 1$) or not. The problem is to determine the ideal subset of features $S^*$ that optimises predictive performance while minimising redundancy, where a candidate subset of selected features from the original feature set is $S = \{p_1, p_2, \ldots, p_m\}$ [15]. Using the optimised feature subset $S^*$, a multi-level stacked ensemble model is trained to predict student dropout based on the CFS score, which is calculated as follows:
$$S^* = \arg\max_{S} \text{CFS}(S) \quad (1)$$
Objective 1. Avoiding Class Imbalance and Complexity
The datasets used for dropout prediction are unbalanced, with the majority class ($y = 0$, non-dropouts) far exceeding the minority class ($y = 1$, dropouts) [15]. This imbalance can cause ML models to prioritise the majority class and perform poorly on the minority class, which is frequently the focus of intervention initiatives.
To solve this, the dataset can be balanced using the Random Under-Sampling (RUS) technique described as follows:
$$(X_r, y_r) = \text{RUS}(X, y) \quad (2)$$
In Equation (2), $X_r$ and $y_r$ represent the resampled feature matrix and target vector, respectively. As students progress through assignments, tests, or other activities, in-session learning platforms produce sequential data [16]. To address this for the suggested dataset, let $M = \{m_1, m_2, \ldots, m_T\}$ represent the matrix sequence that corresponds to the current sentence or task number in the session. Every matrix $m_t$ represents a specific instant in the session, and as time $t$ increases, the data changes dynamically. The task is to update the training dataset iteratively to model this dynamic character [17]. The cumulative balanced dataset $D_t$ up to time $t$ can be calculated as follows:
$$D_t = D_{t-1} \cup \{(X_t, y_t)\} \quad (3)$$
where $X_t$ and $y_t$ represent the balanced feature–target pair for the current matrix $m_t$, obtained through the RUS approach.
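As a minimal sketch of Equations (2) and (3), assuming the RandomUnderSampler from the imbalanced-learn package and hypothetical arrays X, y, and matrix_values (each chunk is assumed to contain both classes), the per-chunk balancing and cumulative update could look as follows:

import numpy as np
from imblearn.under_sampling import RandomUnderSampler

def update_cumulative(D_X, D_y, X_t, y_t, seed=42):
    """(X_r, y_r) = RUS(X_t, y_t); then D_t = D_{t-1} U {(X_r, y_r)}."""
    X_r, y_r = RandomUnderSampler(random_state=seed).fit_resample(X_t, y_t)
    if D_X is None:                      # first chunk initialises D_t
        return X_r, y_r
    return np.vstack([D_X, X_r]), np.concatenate([D_y, y_r])

# Usage over the matrix sequence m_1 .. m_T (matrix_values is hypothetical)
D_X, D_y = None, None
for t in sorted(np.unique(matrix_values)):
    mask = matrix_values == t            # rows belonging to matrix m_t
    D_X, D_y = update_cumulative(D_X, D_y, X[mask], y[mask])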
Objective 2. Optimising Feature Selection
Selecting the most relevant features is essential for increasing model accuracy and interpretability when predicting student dropout. Complex interactions between features may be missed by traditional feature selection approaches, requiring advanced optimisation approaches [17].
The CFS score, calculated as follows, is typically used to measure the quality of a feature subset $S$:
$$\text{CFS}(S) = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}} \quad (4)$$
In Equation (4), the term $k = \sum_{i=1}^{d} C_i$ represents the number of selected features, $\bar{r}_{cf}$ is the average correlation between the selected features and the target $y$, and $\bar{r}_{ff}$ is the average correlation among the selected features, given by the following:
$$\bar{r}_{cf} = \frac{1}{k} \sum_{i \in S} r(x_i, y) \quad (5)$$
$$\bar{r}_{ff} = \frac{2}{k(k-1)} \sum_{i \in S} \sum_{\substack{j \in S \\ j > i}} r(x_i, x_j) \quad (6)$$
To assess feature subsets according to relevance and redundancy, CFS can be incorporated into the GA as the fitness function [17]. The justification for adopting CFS is that good feature subsets contain features that are highly correlated with the target variable (high $\bar{r}_{cf}$) yet poorly correlated with one another (low $\bar{r}_{ff}$). This ensures that, while reducing information redundancy, the chosen features all work together to support the predictive model.
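A minimal sketch of Equations (4)–(6) follows, assuming absolute Pearson correlations as the correlation measure $r$; the helper name cfs_merit and the array layout are illustrative, not the paper's implementation:

import numpy as np

def cfs_merit(X, y, S):
    """CFS merit of feature subset S (list of column indices of X)."""
    k = len(S)
    if k == 0:
        return 0.0
    # r_cf: mean absolute feature-target correlation over the subset (Eq. 5)
    r_cf = np.mean([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in S])
    if k == 1:
        return r_cf
    # r_ff: mean absolute pairwise correlation among selected features (Eq. 6)
    pairs = [abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
             for a, i in enumerate(S) for j in S[a + 1:]]
    r_ff = np.mean(pairs)
    # merit (Eq. 4): rewards relevance, penalises redundancy
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)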
The proposed approach can be used with any in-session dataset because it is able to adapt to different datasets and target variables, such as dropout or course completion [18]. Additionally, by decreasing the dimensionality of the dataset, the framework enhances the predictive models' generalisation capabilities and expedites the training process, particularly for stacked ensemble approaches.

4. The Methodology

The proposed approach enables efficient predictions because it operates on successive subsets of in-session data as students move through their learning tasks. The model aims to give useful insights by addressing issues including high dimensionality, class imbalance, and irrelevant features [19]. Figure 1 shows the architecture of the proposed approach. The developed model is trained on 70% of the dataset, with the remaining 30% used for validation and testing.
The proposed hierarchical framework is based on three hypotheses (HPs) that guide the optimisation of the MOOC dropout prediction model as follows.
HP 1: Optimised feature selection increases the dropout prediction accuracy.
The CFS technique used within the GA process helps to find the most relevant features while removing redundancy and noise, leading to improved model accuracy. The evaluation compares the performance of the proposed model against baseline models trained without optimised feature selection.
HP 2: ML Models as base learners
ML models such as AdaBoost, RF, XGBoost, and GBC are better at detecting temporal dependencies and non-linear interactions in student interaction data. The evaluation involves training these models on the optimised feature subset so that their predictions feed the meta-learner model, enabling the successful construction of the dropout prediction framework.
HP 3: Stacked Ensemble models will increase the prediction accuracy compared to individual models.
The integration of GA with CFS and RUS approaches significantly improves feature selection and dynamically reduces class imbalance, thereby enhancing both the predictive performance and interpretability of the proposed stacked ensemble model. The GA-CFS model improves the selection of an optimal subset of features that are highly predictive of dropout risk while minimising redundancy among them. This ensures that the model focuses on the most relevant features by reducing noise and improving interpretability. By dynamically applying the GA-CFS model to each matrix value as a sentence, the model adapts to evolving learner behaviour and captures critical behavioural patterns associated with dropout [20].
Moreover, the use of GA-CFS with RUS helps address class imbalance in the in-session dataset, where the number of dropouts (minority class) is significantly lower than the number of non-dropouts (majority class) [21]. By applying RUS as new sentences are processed, the model maintains balanced training sets throughout the course. This reduces bias toward the majority class and gives reliable predictions even in scenarios with highly imbalanced distributions [22]. The combination of metaheuristics such as GA with the CFS approach, along with balanced data through RUS, ensures that the proposed model predicts dropout based on optimised and balanced features, thereby enhancing its ability to generalise across diverse learner behaviours. The integration of GA-CFS and RUS within the stacked ensemble model further enables the handling of new sentences while preserving both accuracy and effectiveness of predictions. This optimised approach aligns with the practical need for early detection of dropout cases, where timely identification of at-risk learners is essential [23].
Ensemble approaches integrate the predictions from different ML models to maximise their strengths and minimise variance, in addition to improving accuracy and robustness. After the base learner models produce an ensemble of predictions, an MLP is used as the meta-model to make the final dropout prediction.

4.1. Data Collection

The data collected from the online learning platform (https://zenodo.org/records/7755363 (accessed on 4 September 2024)) is used to evaluate the proposed approach. It contains 164,580 sessions, with approximately 3 million response phrases from about 52,032 students between March and April 2020. The data were collected during the initial stages of the COVID-19 pandemic. While the study does not explicitly analyse the direct impact of the pandemic, several factors suggest that it presented both challenges and opportunities for online learning platforms. In particular, the pandemic led to a significant increase in the use of the in-session platform for distance learning.
In the dataset, each exercise set consists of ten sentences. The in-session dataset contains a large number of factors related to student behaviour and performance [13]. Each row is an instance with a matrix value indicating the sentence in the course progression. A binary target variable, dropout, specifies whether a student left the class or remained enrolled.

4.2. Feature Engineering and Preprocessing

Before detailing the methods used in this phase, this first step sets out the selected features that guided the data export. Feature engineering was used as a key step to select initial features and predictors. To predict student dropouts, a subset of sessions is evaluated as a dataset [24]. This dataset has 32 features, divided into 12 numerical features and 18 binary features. In addition, there is a distinctive feature known as matrix, which represents incremental sentence numbers. These features provide an in-depth understanding of student behaviour, academic achievement, and environmental elements that could affect dropout rates. The features that were maintained are shown in Table 3. Two types of features were considered, as follows.

4.2.1. Numerical Features

The 12 numerical features shown in Table 3 capture measured student performance and activity. The learner's response quality and accuracy are reflected in features such as first solution, success, and difficulty, which provide information about academic competency. Variables such as mistakes, multiple false steps, and steps measure errors and interactions, revealing patterns of difficulty or disengagement. Contextual data such as school hours, class level, and years registered provide an understanding of long-term engagement trends and the necessary knowledge of the students' academic background and environment. The pending tasks count feature shows how many student assignments remain to be completed. The numerical properties are useful for modelling because they enable correlation with dropout behaviour. Larger error or difficulty scores, for instance, may indicate a greater chance of dropping out, while success may be inversely correlated with dropout likelihood.

4.2.2. Binary Features

The 18 binary features represent states that affect how students behave. Behavioural indicators such as homework, previous break, and distraction capture the background of student activity during sessions. Writing ability can be inferred from linguistic error-related features such as type capitalisation, type grammar, and type hyphenation, which may be associated with academic readiness. Post-tests, pre-tests, and interim tests show involvement in assessments, which can be used as benchmarks for evaluating engagement. Demographic parameters such as gender (male, female) and positional features such as test-position check and training enable the model to take individual variances into account. Binary features are useful for expressing categorical information that numerical features cannot capture. The existence or non-existence of a previous break, for instance, may reveal whether a student is returning from an extended absence, which may affect their risk of dropping out.
The matrix feature, which divides the data into sequential phases, highlights the current sentence number in the dataset. Through this feature, the model processes data sequentially, approximating real-time learning contexts [21]. The model is able to detect patterns and make dynamic predictions about dropout possibilities by monitoring behaviour and performance across various matrix values. The sorted set of unique matrix values provides a clear phrase progression, which enables visualisation of metric trends over time. The following pre-processing steps are used to prepare the in-session dataset for dropout prediction:
Identify features that strongly correlate with dropout behaviour.
Address class imbalance by using the RUS method.
Perform feature selection to retain only the most informative features.
Figure 2 shows the count plots of the binary features of the dataset against the dropout variable (0 for no dropout, 1 for dropout). The preprocessing decisions, including addressing class imbalance, feature selection, and identifying potential predictors of dropout behaviour, are informed by these count plots, which are essential for understanding the distribution of each feature across dropout and non-dropout scenarios [25].
The performance of the proposed stacked ensemble dropout prediction model is affected by the base learners and by how the input features relate to the dropout labels, which is most clearly illustrated in Figure 2. Important preprocessing decisions such as feature selection, encoding schemes, and balancing procedures highlight class imbalances and reveal feature-specific patterns associated with dropout behaviour [26]. The proposed ensemble base model consists of AdaBoost, RF, XGBoost, and GBC, with an MLP meta-learner trained on a prepared dataset; when these distributions are correctly accounted for, the model's ability to generalise improves, bias is reduced, and better prediction results are produced.
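For illustration, count plots in the style of Figure 2 can be produced with seaborn; the DataFrame df, the column names, and the dropout label column are hypothetical here:

import matplotlib.pyplot as plt
import seaborn as sns

binary_cols = ["homework", "previous_break", "distraction"]  # hypothetical names
fig, axes = plt.subplots(1, len(binary_cols), figsize=(12, 3))
for ax, col in zip(axes, binary_cols):
    # one count plot per binary feature, split by dropout (0 = no dropout, 1 = dropout)
    sns.countplot(data=df, x=col, hue="dropout", ax=ax)
plt.tight_layout()
plt.show()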

4.3. Feature Selection Optimisation

A GA is an adaptive algorithm inspired by natural selection for optimising complex problems. It operates through iterative processes including selection, crossover, mutation, and evaluation. GA and CFS are the two main techniques used in the proposed approach. While CFS evaluates the relationship between features and the target variable associated with dropout prediction, the GA searches for the best combination of features to maximise prediction accuracy. Within the GA, CFS aims to select features that have a high correlation with the target variable but low redundancy [27]. A higher CFS value indicates that the selected variables are effective in predicting dropout behaviour because they provide significant predictive power without excessive redundancy. In the GA, the predictive performance of each subset of features determines its fitness, and the subsets with the highest performance are retained for further exploration. The fitness function of the GA is given by the following.
$$\text{fitness}(S) = \frac{1}{1 + e(S)} \quad (7)$$
where $e(S)$ represents the prediction error, such as the classification error or mean squared error, of the model when trained on the subset. By combining CFS and GA, the framework effectively narrows down the feature space, allowing the model to focus only on the most critical predictors [27].
Figure 3 shows the feature selection optimisation process based on the hybrid GA-CFS framework. Feature subsets are initially scored by CFS during the GA process. The proposed approach evaluates subsets with the GA fitness function, driven by the CFS technique, to avoid overfitting [28]. Finally, the processes of crossover, mutation, and selection iteratively improve the subsets until they converge or reach a predetermined iteration limit.
A GA efficiently explores the feature space by generating candidate subsets as chromosomes, evaluating their fitness, and producing improved solutions through crossover and mutation operations. In this study, the GA's fitness function is defined using CFS, which evaluates feature subsets based on their association with the target variable while minimising redundancy among the selected features [28]. By combining GA with CFS, the algorithm identifies an optimal feature subset that maximises predictive performance while reducing noise. This ensures that the final feature set is both informative and non-redundant. This integration is implemented in the feature selection module, specifically through a GA function [29]. This function takes the independent variables of the dataset and the target variable as inputs, along with the total number of features. It then applies the GA-CFS process to determine the best subset of features and their corresponding score.
The GA in the proposed study iteratively generates a population of binary-encoded individuals, each of which represents a candidate solution as a subset of features, to optimise the feature subset. Algorithm 1 shows the hybrid GA-CFS approach for extracting the best features for the model. The GA process is mathematically defined as follows. A binary vector $p_m = \{b_1, b_2, \ldots, b_p\}$ represents each individual $p_m$ in the population, where $b_i \in \{0,1\}$ denotes the selection or non-selection of the $i$-th feature. The fitness of an individual $p_m$ can be determined using the CFS score in Equation (4). To choose individuals for reproduction, tournament selection is used [29]. An individual's fitness determines how likely it is to be chosen.
Algorithm 1. Feature Selection Optimisation by Hybrid GA-CFS Approach
Input: in-session dataset $D \in \mathbb{R}^{n \times p}$, target variable $y$, GA parameters: population size ($pop_{size}$), individual solution ($p_m$), max generations ($G_{max}$), crossover probability ($C_p$), mutation probability ($M_p$), crossover threshold ($C_{th}$)
Output: optimised feature subset ($S^*$)
  • begin
  • set: pop_size, G_max, C_p, and M_p
  • set: in-session dataset D (n × p)
  • function CFS(S, D)
  •   calculate: correlation between selected features and the target by Equation (5)
  •   calculate: redundancy among selected features by Equation (6)
  •   calculate: CFS score of S by Equation (4)
  •   return CFS score
  • end function
  • function GA-CFS(D)
  •   for D_i = 1 to D_N do
  •     set: D_t = balance(D_i) using RUS
  •     initialise: GA population of pop_size individuals p_m over D_t
  •     for G = 1 to G_max do
  •       evaluate: fitness score of each individual using CFS
  •       selected-population = selection-pop(p_m, S*)
  •       set: offspring = {}
  •       while size(offspring) < pop_size do
  •         select: two GA parents
  •         if C_p ≥ C_th then
  •           generate children by crossover and add them to offspring
  •         else
  •           add parents directly to offspring
  •         end if
  •       end while
  •       mutate: each offspring gene with probability M_p
  •       set: p_m = mutated offspring
  •     end for
  •   end for
  •   find: the best features by Equation (9)
  •   return the best features
  • end function
  • optimised features = GA-CFS(D)
  • build model (ŷ_final, D_t)
  • end begin
The hybrid GA-CFS method integrates the statistical correlation-based feature subset evaluation of CFS with the global search capabilities of the GA to ensure relevance and reduce redundancy. Given a dataset $D_t$, the goal is to choose the optimal feature subset $S$ that maximises model performance [30]. This can be expressed as follows:
$$\max_{S} F(S) = R_v(S) - \lambda\, R_d(S) \quad (8)$$
where $R_v(S)$ is a relevance measure of the correlation of the features in $S$ with the target $y$, $R_d(S)$ represents redundancy, measuring the inter-correlation among the features in $S$, and $\lambda$ is a regularisation parameter controlling the trade-off.
The GA framework starts by representing each feature subset as a chromosome $C = \{C_1, C_2, \ldots, C_d\}$, $C_i \in \{0,1\}$, stored as a binary vector of length $d$; $C_i = 1$ indicates that feature $i$ is selected, otherwise it is excluded [31]. The initial population of $p_m$ chromosomes is randomly produced, which guarantees that the search space is covered in a diversified manner. The fitness of a chromosome $C$, representing a feature subset $S$, is evaluated by Equation (4). Using techniques such as roulette wheel selection, chromosomes are chosen according to their fitness scores. Then, in the crossover phase, two parent chromosomes, $C^1$ and $C^2$, are combined to create children based on a crossover threshold in [0,1]; when the crossover probability is equal to or larger than the threshold, the crossover operation creates the children.
To introduce variety, the mutation process randomly flips a gene in a chromosome with probability $p_{mutation}$; for a single-point crossover at the midpoint, the offspring is given by $O = [C_1^1, C_2^1, \ldots, C_{d/2}^1, C_{d/2+1}^2, \ldots, C_d^2]$ [32]. The GA terminates after the predetermined number of generations. After each GA generation, the best individual $p_m^*$ is calculated by the following:
$$p_m^* = \arg\max_{p_m \in S} \text{fitness}(p_m) \quad (9)$$
According to the discussed model, the GA approach uses fitness evaluation during selection to solve optimisation problems in a manner similar to biological evolution [32]. GA with CFS enables efficient selection of the optimal features from the in-session datasets to be used in the prediction model.
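The following is a compact, illustrative sketch of the GA-CFS loop in Algorithm 1, reusing the cfs_merit helper sketched in Section 3. Tournament selection, single-point crossover, and bit-flip mutation are simplified stand-ins for the operators described above, with the population size, generations, and crossover/mutation rates mirroring the settings reported in Section 5:

import numpy as np

rng = np.random.default_rng(42)

def ga_cfs(X, y, pop_size=50, generations=50, c_p=0.8, m_p=0.05):
    d = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, d))       # binary chromosomes
    for _ in range(generations):
        fit = np.array([cfs_merit(X, y, np.flatnonzero(c)) for c in pop])
        # tournament selection: keep the fitter of two random individuals
        idx = rng.integers(0, pop_size, size=(pop_size, 2))
        winners = np.where(fit[idx[:, 0]] >= fit[idx[:, 1]], idx[:, 0], idx[:, 1])
        parents = pop[winners]
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):            # single-point crossover
            if rng.random() < c_p:
                cut = rng.integers(1, d)
                children[i, cut:], children[i + 1, cut:] = (
                    parents[i + 1, cut:].copy(), parents[i, cut:].copy())
        mutate = rng.random(children.shape) < m_p      # bit-flip mutation
        pop = np.where(mutate, 1 - children, children)
    fit = np.array([cfs_merit(X, y, np.flatnonzero(c)) for c in pop])
    return np.flatnonzero(pop[np.argmax(fit)])         # best feature subset S*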

4.4. Multi-Level Stacked Ensemble Model

The multi-level stacked ensemble model integrates a meta-learner with base learner algorithms. The meta-learner uses the base learners' predictions as extra features after they are trained on the input data. By combining the advantages of many ML models, this hierarchical approach improves generalisation while addressing issues such as class imbalance [33]. Each base learner produces probability estimates for the target variable $y$. The meta-learner receives these probabilities and combines them with the original feature set as input. Through optimisation for accuracy and robustness, the meta-learner learns to integrate these probabilities into a final prediction. Iterative updates to the training and testing sets are required due to the sequential nature of the data, ensuring flexibility in response to shifting patterns [34].
In the proposed model, each base learner is trained using the optimised feature subset $S^*$, and the predicted probabilities for each class are produced by $f_{base,i}(X_{S^*}) \rightarrow \hat{y}_i$, $i \in \{Ada, RF, XGB, GBC\}$, corresponding to the AdaBoost, RF, XGBoost, and GBC base learners [34]. Each base learner produces a probability estimate that can be calculated as follows.
$$\hat{p}_{base,i} = P(y = 1 \mid X_{S^*}; \beta_i) \quad (10)$$
In Equation (10), $\beta_i$ represents the hyperparameters of the $i$-th base learner. The meta-learner uses the MLP model, represented by $f_{meta}(\hat{p}_{Ada}, \hat{p}_{RF}, \hat{p}_{XGB}, \hat{p}_{GBC}) \rightarrow \hat{y}_{final}$, to combine the predictions of the base learners [35]. Equation (11) describes the binary cross-entropy loss function that the meta-learner $f_{meta}$ minimises.
$$L(f_{meta}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(f_{meta}(x_i)) + (1 - y_i) \log(1 - f_{meta}(x_i)) \right] \quad (11)$$
where $N$ is the number of training samples and $x_i$ is the enhanced feature vector used as input to the MLP. The final prediction $\hat{y}_{final}$ is obtained by applying a threshold to the output of the meta-learner, as follows.
$$\hat{y}_{final} = \begin{cases} 1, & \text{if } f_{meta}(x_i) > 0.5 \\ 0, & \text{otherwise} \end{cases} \quad (12)$$
Grid search is used to systematically explore a predetermined hyperparameter space and find the best configuration for each base learner, tuning the hyperparameters $\beta_i$ to enhance the performance of the proposed model. This guarantees that every model performs to the best of its ability and provides the meta-learner with predictions of the highest quality [35]. A grid of hyperparameter values is generated for every learner model, and cross-validation is used for both training and evaluation [36]. The hyperparameter combination with the highest validation score is chosen. To ensure that the ensemble benefits from fine-tuned models, this procedure is conducted prior to stacking the base learners. This maximises the validation score $V_S$ over the hyperparameter space, as shown in the following equation:
$$\beta^* = \arg\max_{\beta \in h} V_S(f(X, \beta)) \quad (13)$$
where $h$ represents the hyperparameter space and $\beta^*$ is the optimal set of hyperparameters [36]. $V_S$ represents any performance metric evaluated using cross-validation. For $k$-fold cross-validation over folds $j = 1, \ldots, k$, the optimal set of hyperparameters can be calculated as follows.
$$\beta^* = \arg\max_{\beta \in h} \frac{1}{k} \sum_{j=1}^{k} V_S(f(X_{p,j}, \beta), y_{p,j}) \quad (14)$$
where $X_{p,j}$ and $y_{p,j}$ represent the validation features and validation labels, respectively, for the $j$-th fold in the cross-validation process.
The proposed multi-level stacked ensemble model with the GA-CFS feature selection approach is able to reduce the individual weaknesses of different classifiers while utilising their strengths to provide dependable and practical results [37]. This methodology improves dropout prediction accuracy and overall prediction performance, as illustrated in the sketch below.
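A hedged sketch of the stacking stage in Equations (10)–(12) follows, assuming scikit-learn's StackingClassifier: stack_method="predict_proba" passes the base learners' probability estimates to the MLP meta-learner, and passthrough=True appends the original optimised features, as described above. The hyperparameters and the S_star index array (from GA-CFS) are placeholders:

from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

base_learners = [
    ("ada", AdaBoostClassifier(random_state=42)),
    ("rf", RandomForestClassifier(random_state=42)),
    ("xgb", XGBClassifier(eval_metric="logloss", random_state=42)),
    ("gbc", GradientBoostingClassifier(random_state=42)),
]

stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=MLPClassifier(hidden_layer_sizes=(64, 32),
                                  early_stopping=True, random_state=42),
    stack_method="predict_proba",   # base learners pass probabilities upward
    passthrough=True,               # also feed the original S* features to the MLP
    cv=5,                           # out-of-fold predictions for meta-training
)
stack.fit(X_train[:, S_star], y_train)      # S_star: optimised feature indices
y_final = stack.predict(X_test[:, S_star])  # equivalent to the 0.5 threshold in Eq. (12)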

5. Experimental Results and Discussion

The architecture of the dropout prediction modelling framework focuses on performance optimisation through careful model configuration and hyperparameter tuning. The base learning techniques and the meta-learner model were selected and configured with the parameters shown in Table 4.
The GA parameters used include a population size ranging from 50 to 100, selected to ensure sufficient diversity in the search space while maintaining computational feasibility. The crossover and mutation probabilities were set to 0.8 and 0.05, respectively, to balance exploration and exploitation during the evolutionary process. The number of generations was fixed at 50, providing sufficient iterations for convergence while keeping computational costs acceptable.
To evaluate the proposed approach, the in-session dataset was passed through pre-processing stages to structure and format the data. The implementation was conducted in Python version 3.8 using the scikit-learn library. The experiments were executed on a Google VM running Windows Server 2019, equipped with a 64-bit CPU (6 vCPUs) and 48 GB of RAM.
The main objective of this study was to evaluate how well various base learner and meta-learner configurations perform using the GA-CFS feature selection method in a stacking ensemble architecture [38]. To achieve this, grid search was used to tune the hyperparameters of the base learners, AdaBoost, RandomForest, XGBoost, and Gradient Boosting, which were then tested methodically. The selected feature subset served as the training set for the base learners. The input of the meta-learner model was created by combining their probabilistic predictions with the original features. This input was used to train and assess the meta-learner model, which was implemented as an MLP classifier [39]. To find the optimal setup for the meta-learner and ensure reliable performance over all matrix values, hyperparameter optimisation was performed. The proposed framework was then evaluated under the following three scenarios:
Evaluating the candidate meta-learner models to identify the most effective configuration for predicting student dropout behaviour; the results show which meta-learner model performs best.
Evaluating the stacked ensemble model before and after the proposed feature selection optimisation to assess how much the optimisation improves accuracy.
A final evaluation comparing the performance of the proposed framework against the benchmarking models used in the baseline study.

5.1. Meta Learner Model Performance

A comparative analysis of several ML models as meta-learner candidates, including logistic regression (LR), decision trees (DT), naive Bayes (NB), and MLP, across several performance criteria is presented in this section. These models were evaluated to identify which algorithm would operate best as the meta-learner in the proposed stacked ensemble dropout prediction framework.
Figure 4 shows the accuracy of the candidate meta-learner models. The MLP model provides high accuracy, approximately 80%, due to its ability to capture complex, non-linear relationships in the data; this is particularly important given the high-dimensional and incremental nature of the dataset. Among the other models, DT provides moderate accuracy of up to 73%, as it may not capture the complexities of learner behaviour across sessions. Both the LR and NB models show lower accuracy, likely due to their simplicity and inability to handle high-dimensional datasets effectively. For example, DTs tend to overfit the training data, while NB's assumption of feature independence often leads to suboptimal performance in real-world scenarios.
The high accuracy of the MLP indicates that it generalises well across the dataset, making it suitable for predicting dropouts in MOOCs. Its ability to leverage probabilities from the base models alongside the original features enhances its predictive power. Figure 5 highlights the precision performance of the evaluated models. The MLP model ensures that students identified as at-risk are genuinely likely to drop out by reducing false positives in dropout prediction. The DT and LR provide moderate results but still produce some false positives due to their limited ability to represent non-linear relations. Because of its poor precision, NB commonly mislabels non-dropout instances as dropouts.
The F1-score performance shown in Figure 6 highlights that the MLP model consistently obtains the highest F1-score, indicating its ability to successfully trade off precision and recall. The F1-score indicates that the MLP is able to detect the majority of real dropouts while reducing false positives. The weaker F1-scores of the other models, DT, LR, and NB, show that they are unable to find an acceptable balance between recall and precision. The high F1-score of the MLP confirms its ability to manage unbalanced datasets, which makes it appropriate for dropout prediction tasks.
Figure 7 shows that the MLP achieves the highest ROC-AUC score, indicating its strong ability to distinguish between dropouts and non-dropouts, even in the presence of class imbalance. In contrast, the other models perform poorly due to their limited capacity to capture complex, non-linear relationships and their weaker decision-making capabilities. These models struggle to model the intricate interactions between features and the target variable, especially under imbalanced data conditions. The high ROC-AUC score of the MLP demonstrates its effectiveness in accurately identifying at-risk students.
Overall, the MLP model consistently performs better than the other algorithms on every metric. It is best suited for the meta-learner in the stacked ensemble structure due to its ability to represent complex, non-linear relationships and generalise effectively. This optimal performance arises because the MLP uses non-linear activation functions, such as ReLU, and multiple hidden layers to identify complex patterns in the input, while early detection of overfitting during training preserves its ability to adapt. Moreover, the dynamic structure of the MLP enables it to adapt to changing learner behaviour as the data is processed incrementally by matrix value, enhancing its suitability for real-time dropout prediction.

5.2. Optimised Stacked Ensemble Model Performance

The performance of the stacked ensemble model was evaluated under two scenarios, before and after feature selection optimisation. The purpose of this analysis is to show how much the optimally balanced feature set improves the model's overall performance and prediction accuracy.
Figure 8 compares the accuracy of the optimised stacked ensemble model, with feature selection optimisation, against the baseline stacked model across different matrix values. The optimised model achieves an accuracy of approximately 86% at a matrix value of 10, while the baseline models show an accuracy of about 75%. The optimised model maintains its high performance as the matrix value rises, achieving accuracies between 86% and 92%, while the baseline models average approximately 76% to 78%. As more data becomes available, feature selection enhances the model's ability to generalise, as evidenced by the stable and increasing accuracy trend of the optimised model. The baseline models show limited improvement, indicating that they struggle to use the additional data efficiently. The optimised model reaches an average accuracy of up to 92%, compared to the baseline model. This high performance indicates the ability of feature selection to improve the prediction of the stacked ensemble model, because feature selection focuses on the most relevant information and reduces noise, which is essential for increasing accuracy. As a result, the improved model outperforms the baseline models in detecting small variations in learner behaviour.
Figure 9 shows that the optimised model consistently outperforms the baseline model in terms of F1-score. The high performance of the optimised model demonstrates its ability to successfully balance recall and precision, whereas the baseline model's performance remains low, showing its limits in dynamically handling imbalanced datasets. The optimised model exceeds the baseline model by an average of up to 11.4% in F1-score, which indicates how well the model finds true positives while reducing false positives.
Figure 10 and Figure 11 show the recall and ROC-AUC performance of the optimised stacked ensemble model. As additional sentences are processed, the recall of the optimised model continuously increases, showing its ability to detect more true dropout instances. The baseline models show slower improvements, which may indicate that some at-risk students are missed. Additionally, the ROC-AUC trend of the optimised model is steady and increasing, showing its effective ability to separate dropouts from non-dropouts. Gains in the baseline models are slower, indicating that they are less able to rank positive instances (dropouts) higher than negative ones.
The performance of the optimised model exceeds that of the baseline model, giving an average of 86% for both recall and ROC-AUC, equivalent to a relative improvement of almost 12.5%. This performance shows how the proposed model supports early identification of dropouts. High recall is important for dropout prediction because it ensures that the model identifies the majority of students who are at risk. The high recall of the optimised model shows that it can identify minor behavioural patterns that indicate dropout risks. Even with unbalanced datasets, a high ROC-AUC score shows that the optimised model can consistently rank dropout instances higher than non-dropout cases.
Table 5 shows the performance comparison of the proposed stacked model without feature optimisation and with feature selection using the GA-CFS approach. It is noteworthy that the accuracy of the optimised model is higher than that of the baseline model, which indicates that by focusing on the best predictive features and minimising noise, feature selection improves the model's ability to generalise. The increase in accuracy indicates that the GA-CFS method is able to remove irrelevant features while identifying the subset of features that most strongly influence predictive performance. In datasets such as those used for in-session dropout prediction, feature optimisation ensures that the model uses only the most relevant features, enabling better generalisation and more accurate predictions across various sentences. For the F1-score, the two approaches maintain a comparable trade-off between decreasing false positives and negatives and recognising true positives. Feature selection indirectly supports robust performance by improving other measures such as accuracy and recall, even when it does not directly raise the F1-score.
For precision, since feature selection helps reduce false positives, the optimised model shows a slight improvement over the baseline. This ensures that the model's positive (dropout) predictions are highly reliable and should improve as the number of sentences increases. The optimised model outperforms the baseline model by 2% in terms of both recall and ROC-AUC. This observation indicates that feature selection improves the model's ability to detect dropouts and find true positive cases. The increase in recall indicates that the GA-CFS approach is able to identify minor behavioural patterns that the baseline model could miss. Based on the ROC-AUC value, feature selection improves the discriminatory ability of the model, enabling it to differentiate more effectively between dropouts and non-dropouts. Additionally, it shows that even on unbalanced datasets, the model is better able to rank positive instances higher than negative ones.
Overall, all evaluation metrics demonstrate improved performance for the proposed stacked ensemble model as feature selection improves. The incremental learning strategy implemented in this study is reflected in the evaluation, which limits each learner to 60 sentences. These results are based on a relatively small dataset, showing that the model performs better than baseline models; however, they may not fully reflect the potential of the improved feature selection and ensemble learning framework. The performance of the proposed model could improve significantly with an increased number of sentences. More sentences are expected to give the model richer temporal dynamics and a wider range of behavioural patterns, helping it better capture complex interactions between features and dropout probabilities. The gradual accumulation of learner behaviour data would enhance the predictive capabilities of the MLP meta-learner as the dataset expands, resulting in significantly greater performance gains compared to baseline models.

5.3. Comparison with Benchmarking Models

The performance of the proposed optimised stacked ensemble model was compared with the benchmarking models in [13] to further validate its efficiency. The comparison focuses on key evaluation metrics such as accuracy and F1-score. These benchmark models represent various machine learning techniques evaluated on the in-session platform, which serves as the target dataset for this study. This comparison shows the relative advantages of the proposed approach in terms of robustness and prediction accuracy.
Figure 12 shows that the proposed optimised stacked ensemble model consistently provides the highest accuracy across all matrix values. The improved model starts with an accuracy of approximately 0.86 with only a few sentences. As more sentences are processed, its accuracy reaches 0.93, showing strong generalisation ability. The DT and MLP models show moderate accuracy, increasing slightly to 0.80 from an initial value of 0.75. KNN and LR provide lower accuracy compared to the optimised model, with accuracy levels of 0.70 to 0.75. At every sentence value, the optimised model performs better than the benchmark models, which indicates that the use of feature selection, ensemble learning, and incremental learning increases its predictive value. As the dataset gradually increases, the optimised model continues to give excellent accuracy, highlighting its flexibility in response to diverse learner behaviour. The lower accuracy of traditional models such as DT, KNN, and LR is due to their inability to capture complex patterns in the data. Although the MLP outperforms these simpler models, its reliance on single-model predictions, without leveraging the advantages of ensemble learning, limits its ability to match the performance of the optimised ensemble.
Figure 13 shows that the optimised model provides the highest F1-score for all sentence values compared to the benchmark models. The F1-score of the optimised model either remains stable or improves as additional sentences are analysed, demonstrating its ability to dynamically adapt to changing learner behaviour. The effectiveness of the stacked ensemble approach is reflected in the consistent superiority of the optimised model, particularly when combined with feature selection and incremental learning. As a result, it represents a reliable choice for real-world applications requiring complex and adaptive prediction. In the proposed framework, the optimised model enhances predictive performance by aggregating outputs from several base models, including AdaBoost, Random Forest, XGBoost, and Gradient Boosting, through an MLP meta-learner. Through ensemble learning, the model effectively captures complex data correlations, surpassing the capabilities of individual models. The use of GA-CFS for feature selection ensures the model focuses on the most informative features, thereby reducing noise and improving generalisation.

6. Conclusions

This study proposed an optimised feature selection approach to enhance the ability of a stacked ensemble model to predict student dropout on an in-session platform. The proposed model combines several base models, including AdaBoost, Random Forest, XGBoost, and Gradient Boosting, with a multi-level meta-learner to improve prediction accuracy and robustness. Evaluation results show that, across several important benchmarks, the improved model performs significantly better than traditional comparison models such as decision trees, k-nearest neighbours, logistic regression, and stand-alone MLP models. By incorporating genetic-algorithm-based correlation feature selection (GA-CFS), noise is reduced and generalisation is enhanced, since only the most informative features are used. Furthermore, as more sentences are analysed, the model can dynamically adapt to changing learner behaviour. Since immediate actions can increase learner retention, this flexibility is particularly important for real-time dropout prediction. In terms of accuracy and F1-score, the proposed approach achieves average performance of approximately 91% and 88%, respectively. The results highlight the importance of ensemble learning for identifying complex relationships in high-dimensional datasets, making the proposed approach a suitable choice for in-session dropout analysis.

Author Contributions

Conceptualisation, S.A. and B.S.; methodology, S.A.; software, S.A.; validation, S.A., B.S. and A.L.; formal analysis, S.A.; investigation, B.S.; resources, A.L.; data curation, S.A. and A.L.; writing—original draft preparation, S.A.; writing—review and editing, A.L.; visualisation, B.S. and A.L.; supervision, B.S. and A.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by Umm Al-Qura University, Saudi Arabia, under grant number: 25UQU4350390GSSR04S.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors extend their appreciation to Umm Al-Qura University, Saudi Arabia for funding this research work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Smirani, L.K.; Yamani, H.A.; Menzli, L.J.; Boulahia, J.A.; Huang, C. Using Ensemble Learning Algorithms to Predict Student Failure and Enabling Customised Educational Paths. Sci. Program. 2022, 2022, 3805235. [Google Scholar] [CrossRef]
  2. Zhou, Y.; Xu, Z. Multi-Model Stacking Ensemble Learning for Dropout Prediction in MOOCs. J. Phys. Conf. Ser. 2020, 1607, 012004. [Google Scholar] [CrossRef]
  3. Cho, C.H.; Yu, Y.W.; Kim, H.G. A Study on Dropout Prediction for University Students Using Machine Learning. Appl. Sci. 2023, 13, 12004. [Google Scholar] [CrossRef]
  4. Nithya, S.; Umarani, S. MOOC Dropout Prediction using FIAR-ANN Model based on Learner Behavioural Features. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 2022, 13, 607–617. [Google Scholar] [CrossRef]
  5. Marcolino, M.R.; Porto, T.R.; Primo, T.T.; Targino, R.; Ramos, V.; Queiroga, E.M.; Munoz, R.; Cechinel, C. Student dropout prediction through machine learning optimization: Insights from moodle log data. Sci. Rep. 2025, 15, 9840. [Google Scholar] [CrossRef]
  6. Talamás-Carvajal, J.A.; Ceballos, H.G. A stacking ensemble machine learning method for early identification of students at risk of dropout. Educ. Inf. Technol. 2023, 28, 12169–12189. [Google Scholar] [CrossRef]
  7. Putra, M.R.P.; Utami, E. A Blending Ensemble Approach to Predicting Student Dropout in Massive Open Online Courses (MOOCs). JUITA J. Inform. 2025, 13, 11–18. [Google Scholar] [CrossRef]
  8. Malik, S.; Patro, S.G.K.; Mahanty, C.; Hegde, R.; Naveed, Q.N.; Lasisi, A.; Buradi, A.; Emma, A.F.; Kraiem, N. Advancing educational data mining for enhanced student performance prediction: A fusion of feature selection algorithms and classification techniques with dynamic feature ensemble evolution. Sci. Rep. 2025, 15, 8738. [Google Scholar] [CrossRef]
  9. Rabelo, A.M.; Zárate, L.E. A model for predicting dropout of higher education students. Data Sci. Manag. 2025, 8, 72–85. [Google Scholar] [CrossRef]
  10. Putra, M.R.P.; Utami, E. Comparative Analysis of Hybrid Model Performance Using Stacking and Blending Techniques for Student Drop Out Prediction In MOOC. J. RESTI (Rekayasa Sist. Dan Teknol. Inf.) 2024, 8, 346–354. [Google Scholar] [CrossRef]
  11. Niyogisubizo, J.; Liao, L.; Nziyumva, E.; Murwanashyaka, E.; Nshimyumukiza, P.C. Predicting student’s dropout in university classes using two-layer ensemble machine learning approach: A novel stacked generalization. Comput. Educ. Artif. Intell. 2022, 3, 100066. [Google Scholar] [CrossRef]
  12. Kumar, G.; Singh, A.; Sharma, A. Ensemble Deep Learning Network Model for Dropout Prediction in MOOCs. Int. J. Electr. Comput. Eng. Syst. 2023, 14, 187–196. [Google Scholar] [CrossRef]
  13. Rzepka, N.; Simbeck, K.; Müller, H.-G.; Pinkwart, N. Keep It Up: In-Session Dropout Prediction to Support Blended Classroom Scenarios. In Proceedings of the 14th International Conference on Computer Supported Education, Online, 22–24 April 2022; Volume 2, pp. 131–138. [Google Scholar] [CrossRef]
  14. Alghamdi, S.; Soh, B.; Li, A. ISELDP: An Enhanced Dropout Prediction Model Using a Stacked Ensemble Approach for In-Session Learning Platforms. Electronics 2025, 14, 2568. [Google Scholar] [CrossRef]
  15. Faucon, L.; Olsen, J.K.; Haklev, S.; Dillenbourg, P. Real-Time Prediction of Students’ Activity Progress and Completion Rates. J. Learn. Anal. 2020, 7, 18–44. [Google Scholar] [CrossRef]
  16. Mbunge, E.; Batani, J.; Mafumbate, R.; Gurajena, C.; Fashoto, S.; Rugube, T.; Akinnuwesi, B.; Metfula, A. Predicting Student Dropout in Massive Open Online Courses Using Deep Learning Models—A Systematic Review. In Cybernetics Perspectives in Systems; Silhavy, R., Ed.; CSOC 2022. Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2022; Volume 503. [Google Scholar] [CrossRef]
  17. Psathas, G.; Chatzidaki, T.K.; Demetriadis, S.N. Predictive Modeling of Student Dropout in MOOCs and Self-Regulated Learning. Computers 2023, 12, 194. [Google Scholar] [CrossRef]
  18. Bujang, S.D.A.; Selamat, A.; Krejcar, O.; Mohamed, F.; Cheng, L.K.; Chiu, P.C.; Fujita, H. Imbalanced Classification Methods for Student Grade Prediction: A Systematic Literature Review. IEEE Access 2023, 11, 1970–1989. [Google Scholar] [CrossRef]
  19. Hassan, M.A.; Muse, A.H.; Nadarajah, S. Predicting Student Dropout Rates Using Supervised Machine Learning: Insights from the 2022 National Education Accessibility Survey in Somaliland. Appl. Sci. 2024, 14, 7593. [Google Scholar] [CrossRef]
  20. Bohrer, J.d.S.; Dorn, M. Enhancing classification with hybrid feature selection: A multi-objective genetic algorithm for high-dimensional data. Expert Syst. Appl. 2024, 255, 124518. [Google Scholar] [CrossRef]
  21. Feng, G. Feature selection algorithm based on optimized genetic algorithm and the application in high-dimensional data processing. PLoS ONE 2024, 19, e0303088. [Google Scholar] [CrossRef]
  22. Al-Alawi, L.; Al Shaqsi, J.; Tarhini, A.; Al-Busaidi, A.S. Using machine learning to predict factors affecting academic performance: The case of college students on academic probation. Educ. Inf. Technol. 2023, 28, 12407–12432. [Google Scholar] [CrossRef]
  23. Jin, C. MOOC student dropout prediction model based on learning behaviour features and parameter optimization. Interact. Learn. Environ. 2020, 31, 714–732. [Google Scholar] [CrossRef]
  24. Hussain, M.M.; Akbar, S.; Hassan, S.A.; Aziz, M.W.; Urooj, F. Prediction of Student’s Academic Performance through Data Mining Approach. J. Inform. Web Eng. 2024, 3, 241–251. [Google Scholar] [CrossRef]
  25. Namoun, A.; Alshanqiti, A. Predicting Student Performance Using Data Mining and Learning Analytics Techniques: A Systematic Literature Review. Appl. Sci. 2021, 11, 237. [Google Scholar] [CrossRef]
  26. Roy, K.; Farid, D.M. An Adaptive Feature Selection Algorithm for Student Performance Prediction. IEEE Access 2024, 12, 75577–75598. [Google Scholar] [CrossRef]
  27. Mumuni, A.; Mumuni, F. Automated data processing and feature engineering for deep learning and big data applications: A survey. J. Inf. Intell. 2024, 3, 113–153. [Google Scholar] [CrossRef]
  28. Setiadi, H.; Larasati, I.P.; Suryani, E.; Wardani, D.W.; Wardani, H.D.C.; Wijayanto, A. Comparing Correlation-Based Feature Selection and Symmetrical Uncertainty for Student Dropout Prediction. J. RESTI (Rekayasa Sist. Dan Teknol. Inf.) 2024, 8, 542–554. [Google Scholar] [CrossRef]
  29. Hao, J.; Gan, J.; Zhu, L. MOOC performance prediction and personal performance improvement via Bayesian network. Educ. Inf. Technol. 2022, 27, 7303–7326. [Google Scholar] [CrossRef]
  30. Li, J.L.; Xie, S.T.; Wang, J.N.; Lin, Y.Q.; Chen, Q. Prediction and Learning Analysis Using Ensemble Classifier Based on GA in SPOC Experiments. In Data Mining and Big Data; Tan, Y., Shi, Y., Tang, Q., Eds.; DMBD 2018. Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 10943. [Google Scholar] [CrossRef]
  31. Albreiki, B.; Zaki, N.; Alashwal, H. A Systematic Literature Review of Student’ Performance Prediction Using Machine Learning Techniques. Educ. Sci. 2021, 11, 552. [Google Scholar] [CrossRef]
  32. Sun, L.; Qin, H.; Przystupa, K.; Cui, Y.; Kochan, O.; Skowron, M.; Su, J. A Hybrid Feature Selection Framework Using Improved Sine Cosine Algorithm with Metaheuristic Techniques. Energies 2022, 15, 3485. [Google Scholar] [CrossRef]
  33. Dey, R.; Mathur, R. Ensemble Learning Method Using Stacking with Base Learner, A Comparison. In Proceedings of International Conference on Data Analytics and Insights; Chaki, N., Roy, N.D., Debnath, P., Saeed, K., Eds.; Lecture Notes in Networks and Systems; Springer: Singapore, 2023; Volume 727. [Google Scholar] [CrossRef]
  34. Tong, T.; Li, Z.; Akbar, S. Predicting learning achievement using ensemble learning with result explanation. PLoS ONE 2025, 20, e0312124. [Google Scholar] [CrossRef]
  35. Sultan, S.Q.; Javaid, N.; Alrajeh, N.; Aslam, M. Machine Learning-Based Stacking Ensemble Model for Prediction of Heart Disease with Explainable AI and K-Fold Cross-Validation: A Symmetric Approach. Symmetry 2025, 17, 185. [Google Scholar] [CrossRef]
  36. Chi, Z.; Zhang, S.; Shi, L. Analysis and Prediction of MOOC Learners’ Dropout Behavior. Appl. Sci. 2023, 13, 1068. [Google Scholar] [CrossRef]
  37. Ismail, W.N.; Alsalamah, H.A.; Mohamed, E. GA-Stacking: A New Stacking-Based Ensemble Learning Method to Forecast the COVID-19 Outbreak. Comput. Mater. Contin. 2023, 74, 3945–3976. [Google Scholar] [CrossRef]
  38. Nafea, A.A.; Mishlish, M.; Shaban, A.M.S.; Al-Ani, M.M.; Alheeti, K.M.A.; Mohammed, H.J. Enhancing Student’s Performance Classification Using Ensemble Modeling. Iraqi J. Comput. Sci. Math. 2023, 4, 204–214. [Google Scholar] [CrossRef]
  39. Cam, H.N.T.; Sarlan, A.; Arshad, N.I. A hybrid model integrating recurrent neural networks and the semi-supervised support vector machine for identification of early student dropout risk. PeerJ Comput. Sci. 2024, 10, e2572. [Google Scholar] [CrossRef]
Figure 1. Proposed Methodology Process Diagram.
Figure 2. Count plots of the binary features of the dataset.
Figure 3. Hybrid GA-CFS framework.
Figure 4. Accuracy of Meta Learner Models.
Figure 5. Precision of Meta Learner Models.
Figure 6. F1-Score of Meta Learner Models.
Figure 7. ROC-AUC of Meta Learner Models.
Figure 8. Accuracy of Optimised vs. Baseline Models.
Figure 9. F1-Score of Optimised vs. Baseline Models.
Figure 10. Recall of Optimised vs. Baseline Models.
Figure 11. ROC-AUC of Optimised vs. Baseline Models.
Figure 12. Accuracy comparison of benchmarking models vs. proposed approach.
Figure 13. F1-Score comparison of benchmarking models vs. proposed approach.
Table 1. List of Abbreviations.

| Abbreviation | Description |
| --- | --- |
| AdaBoost | Adaptive Boosting Learning |
| CFS | Correlation-Based Feature Selection |
| CNN | Convolutional Neural Network |
| CNT | Complex Non-Linear Transformation |
| CMA | Correlation Matrix Analysis |
| DT | Decision Tree |
| FCBF | Fast Correlation-Based Filter |
| GBC | Gradient Boosting Classifier |
| GA | Genetic Algorithm |
| KNN | K-Nearest Neighbours |
| LightGBM | Light Gradient Boosting Decision Tree Implementation |
| LSTM | Long Short-Term Memory Network |
| MLP | Multilayer Perceptron |
| MMSE | Minimum Mean Square Error |
| NB | Naïve Bayes |
| PCA | Principal Component Analysis |
| PCC | Pearson's Correlation Coefficient |
| RF | Random Forest |
| SVM | Support Vector Machine |
| SGD | Stochastic Gradient Descent |
| XGBoost | Extreme Gradient Boosting |
Table 2. Comprehensive comparison of related studies with our proposed approach.

| Citation | Contribution | Model Algorithms | Feature Selection Method | Highlights |
| --- | --- | --- | --- | --- |
| [7] | Developed a blending ensemble learning approach for online dropout prediction | DT, NB, and XGBoost | Feature importance based on RF | High accuracy, approx. 90% |
| [8] | Evaluates enhanced feature selection approaches for predicting student performance | DT, RF, SVM, NN, NB | CMA, information gain, and Chi-square | High accuracy of 94% with correlation matrix analysis |
| [9] | Improves the prediction performance for student activity during online classes | LR, DT, NN | Full, stepwise, and Lasso selection | Moderate accuracy, approx. 89% |
| [10] | Evaluates different ML approaches using stacking and blending models to predict student dropout rates | KNN, DT, and NB | CNT-based NB function | Moderate accuracy, approx. 83% |
| [11] | Evaluates different feature selection approaches with ML-based ensembling methods | SVM, KNN, DT, NB, LR, and stacked voting | Chi-square, FCBF, Relief method, and PCC | Higher accuracy of 93% with the Relief method and stacked voting |
| [12] | Novel stacking ensemble based on a hybrid ML model to predict students' dropout in university classes | NN, RF, GBC, XGBoost | CNT | Higher precision and recall |
| [13] | MLP-based model using stepwise session data for adaptive dropout prediction on in-session platforms | DT, KNN, LR, MLP | Temporal/session-based modelling | Accuracy of 87% |
Table 3. Preserved features and their categories.

| Category | Feature | Description |
| --- | --- | --- |
| Numerical features | First solution | The time or correctness of the first solution |
| | Distracted | Whether the user was distracted during the session |
| | Success | The success rate of answers |
| | Difficulty | The difficulty level of tasks or questions |
| | School hours | The number of school hours associated with user activity |
| | Multiple false | The number of incorrect attempts made by the user |
| | Matrix | Current sentence number or matrix value in the dataset |
| | Mistakes | The total number of mistakes during the session |
| | Class level | The user's academic level or grade |
| | Years registered | The number of years the user has been registered in the system |
| | Pending tasks count | Count of pending or incomplete tasks |
| | Steps | The number of steps or interactions during a task |
| Binary features | School hours | Whether the session occurred during school hours |
| | Previous break | Whether there was a break before the current session |
| | User attribute | A specific attribute of the user |
| | Type capitalisation | Errors related to capitalisation in user input |
| | Type grammar | Grammar-related errors in user input |
| | Type hyphenation | Hyphenation-related errors in user input |
| | Type comma formation | Comma-related errors in user input |
| | Type the sound letters | Errors related to sounds or letters in user input |
| | Homework | Whether the session was related to homework |
| | Voluntary work | Voluntary work performed by the user |
| | Post-test | Participation in a post-test |
| | Pre-test | Participation in a pre-test |
| | Interim test | Participation in an interim test |
| | Gender male | Binary indicator for male gender |
| | Gender female | Binary indicator for female gender |
| | Test-position check | Whether the user is in a test position |
| | Test-position training | Whether the user is in a training position |
| | Test-position version | The version of the test position (categorical/binary) |
Table 4. The ML Model Parameters.

| Model | Parameter | Value |
| --- | --- | --- |
| RF | Number of estimators | 150 |
| | Max learning rate | 0.2 |
| | Max depth | 10 |
| AdaBoost | Number of estimators | 150 |
| | Max learning rate | 1 |
| GBC | Number of estimators | 150 |
| | Max learning rate | 0.2 |
| | Max depth | 7 |
| XGBoost | Number of estimators | 150 |
| | Max learning rate | 0.3 |
| | Max depth | 7 |
| MLP | Hidden layer sizes | 35 |
| | Max iterations | 100 |
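As a hedged companion to Table 4, the settings map onto scikit-learn and XGBoost constructor arguments roughly as follows. Note that RandomForestClassifier exposes no learning-rate parameter, so only the estimator count and depth are applied there, and any unlisted arguments keep library defaults; this mapping is an assumption, not the authors' published code.

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

models = {
    # RandomForestClassifier has no learning-rate argument; the remaining
    # Table 4 values map directly onto constructor parameters.
    "RF": RandomForestClassifier(n_estimators=150, max_depth=10),
    "AdaBoost": AdaBoostClassifier(n_estimators=150, learning_rate=1.0),
    "GBC": GradientBoostingClassifier(n_estimators=150, learning_rate=0.2,
                                      max_depth=7),
    "XGBoost": XGBClassifier(n_estimators=150, learning_rate=0.3, max_depth=7),
    "MLP": MLPClassifier(hidden_layer_sizes=(35,), max_iter=100),
}
```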
Table 5. Average Metric Scores of Stacked Models: Optimised vs. Non-Optimised Feature Selection.

| Scenario | Accuracy | F1-Score | Precision | Recall | ROC-AUC |
| --- | --- | --- | --- | --- | --- |
| Baseline Stacked Model | 0.88 | 0.84 | 0.99 | 0.78 | 0.88 |
| Optimised Model | 0.91 | 0.88 | 1.00 | 0.85 | 0.92 |
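The five metrics reported in Table 5 correspond to the standard scikit-learn implementations; a self-contained sketch with toy stand-in predictions (not the study's data) is given below.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Toy stand-ins for held-out labels and model outputs (1 = dropout).
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_prob = np.array([0.2, 0.9, 0.7, 0.4, 0.8, 0.1, 0.3, 0.95])
y_pred = (y_prob >= 0.5).astype(int)  # hard labels at the 0.5 threshold

print({
    "Accuracy": accuracy_score(y_true, y_pred),
    "F1-Score": f1_score(y_true, y_pred),
    "Precision": precision_score(y_true, y_pred),
    "Recall": recall_score(y_true, y_pred),
    "ROC-AUC": roc_auc_score(y_true, y_prob),  # uses probabilities, not labels
})
```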