1. Introduction
The integration of artificial intelligence in educational ecosystems has revolutionized traditional pedagogical approaches and institutional management strategies. Educational predictive analytics, an interdisciplinary domain leveraging machine learning (ML) and data mining techniques, has emerged as a transformative paradigm for enhancing academic outcomes through data-driven decision support. The global significance of this research area stems from pressing societal challenges in education systems worldwide, including student attrition rates, suboptimal resource allocation, and achievement gaps across diverse demographic populations. With higher education institutions facing unprecedented pressure to demonstrate efficacy and accountability, predictive modeling offers a scientifically rigorous methodology for identifying at-risk students, personalizing learning interventions, and optimizing institutional performance [1,2]. The conceptual evolution of educational data mining (EDM) has paralleled advancements in computational intelligence, transitioning from traditional statistical analysis to sophisticated ML algorithms capable of modeling complex, nonlinear relationships in educational data. Early approaches primarily relied on regression techniques and descriptive analytics to identify correlational patterns between student characteristics and academic outcomes. However, the emergence of robust ML frameworks has enabled the development of predictive models that assimilate heterogeneous data sources, including demographic attributes, behavioral patterns, institutional factors, and socio-economic indicators, to generate accurate forecasts of student performance [1,3]. This predictive capability carries profound implications for educational equity and institutional effectiveness, enabling early intervention strategies that can potentially mitigate dropout rates and enhance learning outcomes across diverse student populations. Contemporary educational infrastructures generate vast datasets through Learning Management Systems (LMSs), student information systems, and digital learning platforms, creating unprecedented opportunities for analytical modeling. Research by Waheed et al. demonstrated that student clickstream activities within LMS environments contain predictive signals correlated with academic performance, though the relationship between engagement metrics and outcomes requires careful model specification [4]. The scientific import of educational predictive analytics extends beyond mere technical achievement, embodying a fundamental shift toward evidence-based educational management that aligns institutional practices with empirical insights rather than relying solely on tradition or intuition.
The methodological progression of predictive modeling in education reflects broader trends in computational science, marked by a gradual transition from traditional statistical models to ensemble ML approaches and, most recently, to optimized hybrid architectures. Seminal work in this domain primarily employed logistic regression, discriminant analysis, and decision trees to identify factors correlated with academic success. While these methods established important foundational relationships, particularly between prior academic performance and future outcomes, they often struggled with the complex, high-dimensional datasets characteristic of educational environments [2]. The advent of ML introduced more sophisticated algorithms to educational prediction tasks, with studies systematically comparing the efficacy of various approaches [2,5,6]. A comprehensive systematic review evaluated ML algorithms, including Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), and Decision Trees, against traditional statistical models, concluding that ML approaches consistently outperformed their conventional counterparts due to their superior capacity to handle large, nonlinear datasets and to continuously enhance predictive accuracy. In particular, ensemble methods such as Random Forests and Gradient Boosting Machines demonstrated remarkable performance in predicting student outcomes by mitigating the limitations of individual base learners [7]. These ensemble techniques operate on the principle that a collective of diverse learners yields greater overall accuracy than any individual learner, effectively addressing the bias-variance tradeoff that plagues single-model approaches [8].
Recent research has focused on enhancing predictive performance through hybrid architectures that integrate multiple modeling paradigms [9,10]. These studies represent the current state of the art in educational prediction, yet they reveal persistent challenges in model optimization and generalization that necessitate further methodological innovation. Ensemble learning has emerged as a particularly promising approach within educational predictive analytics, with demonstrated efficacy across diverse prediction tasks including performance forecasting, dropout identification, and at-risk student classification. The theoretical underpinnings of ensemble methodology rest on the statistical principle that combining multiple diverse models can produce more robust and accurate predictions than a single constituent model [8]. The literature broadly categorizes ensemble methods into parallel approaches (bagging), sequential approaches (boosting), and heterogeneous integration (stacking), each with distinct mechanistic characteristics and performance attributes [7]. Bagging (Bootstrap Aggregating) operates by training multiple base learners on random subsets of the original training data, then aggregating their predictions through averaging or majority voting. Bagging primarily reduces model variance and mitigates overfitting, which is particularly beneficial with high-dimensional educational data containing complex interaction effects [7,8]. In contrast, boosting algorithms such as AdaBoost, Gradient Boosting, and XGBoost employ a sequential strategy where each subsequent model focuses on correcting errors made by previous models. This approach systematically reduces bias while maintaining low variance, often achieving superior predictive accuracy compared to bagging methods. The pursuit of optimal ensemble performance has motivated the integration of metaheuristic optimization algorithms with machine learning methodologies, creating hybrid architectures that systematically navigate the complex parameter spaces inherent in ensemble configuration. Metaheuristic algorithms, inspired by natural phenomena and biological systems, offer powerful global search capabilities that effectively address the combinatorial challenges of feature selection, hyperparameter tuning, and model weighting in ensemble construction [11,12]. This integration represents a significant advancement beyond manual parameter configuration or grid search approaches, which often prove computationally prohibitive and prone to suboptimal local convergence.
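To make the bagging/boosting contrast concrete, the following minimal sketch compares the two strategies in scikit-learn; the synthetic dataset and hyperparameters are illustrative assumptions, not the configuration used in this study.

```python
# Illustrative sketch (assumed synthetic data, not this study's dataset):
# bagging trains parallel learners on bootstrap samples to cut variance,
# while boosting fits learners sequentially on residuals to cut bias.
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

bagger = BaggingRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
booster = GradientBoostingRegressor(n_estimators=100, learning_rate=0.05,
                                    random_state=0).fit(X_tr, y_tr)

for name, model in [("bagging", bagger), ("boosting", booster)]:
    print(f"{name} test MSE: {mean_squared_error(y_te, model.predict(X_te)):.2f}")
```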
Nature-inspired optimization algorithms have demonstrated remarkable efficacy in refining predictive models across diverse domains. The Grey Wolf Optimizer (GWO), in particular, has emerged as a prominent metaheuristic based on the social hierarchy and hunting behavior of grey wolves, characterized by balanced exploration–exploitation dynamics and relatively few control parameters [13,14]. The mechanistic basis for metaheuristic–ensemble integration lies in the multidimensional optimization landscape defined by ensemble parameters. Each base model contributes multiple hyperparameters that collectively define a complex, non-differentiable search space often characterized by numerous local optima. Metaheuristic algorithms excel in such environments due to their population-based stochastic search strategies that simultaneously evaluate multiple candidate solutions, effectively exploring the parameter space while resisting premature convergence. This capability is particularly valuable for weighted ensemble approaches where determining optimal model contributions presents a non-trivial optimization challenge directly impacting predictive accuracy [13,14].
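A minimal sketch of the canonical GWO update is given below; it is an illustrative implementation of the standard alpha/beta/delta position-update equations, not this study's production code. For the weighted-ensemble use case, `objective` would be the validation MSE of the ensemble as a function of a candidate weight vector.

```python
import numpy as np

def gwo_minimize(objective, dim, low, high, n_agents=30, max_iter=1000, seed=0):
    """Minimal Grey Wolf Optimizer for continuous minimization (illustrative)."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(low, high, size=(n_agents, dim))
    fitness = np.array([objective(w) for w in wolves])

    for t in range(max_iter):
        alpha, beta, delta = wolves[np.argsort(fitness)[:3]]
        a = 2.0 * (1.0 - t / max_iter)  # exploration coefficient decays 2 -> 0
        for i in range(n_agents):
            pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2.0 * a * r1 - a, 2.0 * r2
                D = np.abs(C * leader - wolves[i])   # distance to the leader
                pos += (leader - A * D) / 3.0        # average pull toward leaders
            wolves[i] = np.clip(pos, low, high)
            fitness[i] = objective(wolves[i])

    best = np.argmin(fitness)
    return wolves[best], fitness[best]

# Example: recover a weight vector minimizing a toy quadratic loss.
w_best, f_best = gwo_minimize(lambda w: np.sum((w - 0.5) ** 2), dim=3,
                              low=0.0, high=1.0, n_agents=30, max_iter=200)
```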
Despite considerable methodological advances in educational predictive analytics, several significant limitations persist in current approaches, constraining their practical efficacy and generalizability. Through critical synthesis of the literature, we identify three principal research gaps that merit addressing in contemporary research. First, existing ensemble approaches often lack systematic optimization frameworks for determining optimal model weights and configurations. While studies have implemented weighted voting ensembles, their weighting schemes typically derive from static performance metrics rather than dynamic optimization processes. This represents a substantial limitation because suboptimal model weighting can diminish ensemble performance below theoretical capability. The potential of metaheuristic algorithms to address this challenge remains underexplored in educational contexts. Second, there exists a notable interpretability–transparency deficit in complex ensemble architectures. As observed by Guevara-Reyes et al. [10], existing approaches lack interpretability and do not provide actionable insights for decision-making, creating a significant barrier to adoption in educational settings where pedagogical decisions require justification. While techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) have been applied to individual educational prediction models [9], their integration with metaheuristic-optimized ensembles remains limited. Third, there is insufficient exploration of computational efficiency in complex ensemble optimization [13,14], particularly concerning real-world educational applications with resource constraints. Metaheuristic optimization introduces nontrivial computational overhead, yet few studies in educational predictive analytics have systematically addressed trade-offs between predictive accuracy and computational feasibility.
In response to the identified research gaps, this study proposes a novel optimized ensemble framework that integrates the GWO with heterogeneous machine learning models for educational predictive analytics. The primary research objective is to design, implement, and empirically validate a GWO-optimized ensemble architecture that significantly enhances predictive accuracy for student academic performance while maintaining practical interpretability and computational feasibility. This central objective operationalizes through several specific research aims, outlined as follows: To develop a weighted heterogeneous ensemble framework that strategically combines the complementary strengths of CatBoost, Extra Trees, and Random Forest algorithms for student performance prediction, explicitly addressing the limitations of single-model approaches documented in prior research. To implement the GWO for automated determination of optimal model weights and hyperparameters within the ensemble architecture, overcoming the suboptimal configuration limitations observed in current educational prediction models. To quantitatively evaluate the predictive performance of the proposed GWO-optimized ensemble against constituent base models and conventional ensemble approaches, using multiple metrics and establishing statistical significance through appropriate hypothesis-testing procedures. To enhance model interpretability through integrated SHAP analysis, identifying feature importance rankings and partial dependency relationships that provide actionable insights for educational intervention strategies, thereby addressing the black box criticism frequently leveled against complex ensemble models. This research makes significant contributions to the field of educational predictive analytics. Methodologically, it introduces a novel integration of metaheuristic optimization with heterogeneous ensemble learning specifically tailored to the characteristics of educational data. Empirically, it provides rigorous validation across multiple performance metrics and comparative benchmarks against established ML models. Practically, it delivers an interpretable predictive framework that supports data-driven educational decision-making with a transparent rationale for predictions. Theoretically, it advances understanding of optimal ensemble construction for educational prediction tasks and demonstrates the efficacy of nature-inspired optimization in this context.
The manuscript is structured as follows: Section 2 provides a comprehensive analysis of foundational and contemporary research on educational data mining, ensemble learning methods, and metaheuristic optimization. Section 3 details the proposed GWO-optimized ensemble architecture, including data acquisition and preprocessing procedures, GWO implementation details, and evaluation metrics. Section 4 presents empirical findings from comparative performance evaluations. Finally, Section 5 synthesizes key findings, reiterates primary contributions, and considers broader impacts for educational theory and practice.
2. Literature Review
Recent advances in educational predictive analytics have increasingly leveraged machine learning ensembles to forecast student performance, yet the optimal strategy for integrating heterogeneous models remains an open challenge. While individual tree-based algorithms such as Random Forests, Extra Trees, and CatBoost have demonstrated strong predictive capabilities, their standalone use often suffers from suboptimal generalization and sensitivity to hyperparameter configuration. In response, this research explores metaheuristic optimization as a means to enhance ensemble design, though systematic investigations of dynamic weight allocation via algorithms such as the GWO in educational contexts remain scarce. The following review critically examines key contributions over the past decade that intersect ensemble learning, metaheuristic optimization, and student outcome prediction, highlighting both progress and persistent gaps that motivate the present work.
Alsumaidaie et al. [15] evaluated the efficacy of several supervised learning algorithms for forecasting student outcomes, with a specific focus on comparing tree-based models. Their experimental design used Random Forests, Extra Trees, and K-Nearest Neighbors on an educational dataset, rigorously assessing their predictive performance. The Extra Trees classifier emerged as the top performer, thereby underscoring the potential of ensemble tree methods in this domain. Fan et al. [16] introduced a novel multi-layer architecture built upon the CatBoost algorithm to predict student grades in periodic examinations. Their approach incorporated a feature-importance mechanism to iteratively refine the model's focus on the most predictive attributes at successive layers. This tailored CatBoost framework demonstrated superior performance over standard implementations. Chen et al. [17] developed a strategy for student performance prediction that centered on optimizing tree-based ensemble models. Their methodology sought to fine-tune the hyperparameters of these models to maximize predictive accuracy. The optimized framework yielded an average accuracy of 93.11%. Wang [18] introduced a hybrid model combining DistilBERT with LSTM and optimized it using the Spotted Hyena Optimizer (SHO) for predicting student achievement. Their approach focused on dynamic parameter tuning to efficiently handle large-scale educational datasets, and the model reported exceptional performance. Ahmed et al. [9] conducted a comprehensive comparison of ten regression models for academic performance prediction. Their study utilized two datasets with distinct features to evaluate model generalizability. The ensemble approach achieved remarkable performance, demonstrating robustness through weighted averaging of top-performing base models. Khan et al. [19] developed a novel hybrid architecture integrating Convolutional Neural Networks (CNNs) with Random Forests and used XGBoost as a meta-learner for predicting student achievement. Their approach leveraged institutional records from 24,005 students to capture complex feature interactions, and the hybrid model outperformed all comparison baselines. Guevara-Reyes et al. [10] deployed optimized machine learning models, specifically XGBoost and Random Forest, to analyze geographic, institutional, and socioeconomic factors affecting student performance. By incorporating SHAP-based interpretability techniques, the research provided transparent insights into feature importance; of the two models, XGBoost achieved superior performance. Wang and Yu [20] developed a logistic regression model enhanced with Taylor expansion for predicting student performance in online learning environments. Their methodology constructed eleven learning behavioral indicators from online learning processes and selected the most correlated features for model training. The approach demonstrated significant dependencies between learning initiative and duration, offering insights into behavioral factors affecting academic outcomes. Gul et al. [21] studied eight machine learning models including Decision Trees, Random Forest, and CatBoost to predict student performance based on demographic and academic features. Their evaluation used multiple error metrics to assess model robustness across different student populations. CatBoost achieved the highest accuracy at 87.46%, outperforming other algorithms and demonstrating the effectiveness of gradient boosting methods for educational classification tasks. Ma [22] integrated a Random Forest Classifier with meta-heuristic optimization algorithms, specifically Electric Charged Particles Optimization (ECPO) and Artificial Rabbits Optimization (ARO), to predict student performance, demonstrating superior predictive capacity.
The reviewed studies collectively reveal that while metaheuristic optimization has proven effective across various engineering and prediction domains, three persistent gaps remain. First, most ensemble-learning frameworks rely on static or heuristic weighting, leaving dynamic and data-driven optimization strategies underexplored. Second, few metaheuristic studies integrate interpretability mechanisms such as SHAP or LIME, limiting their practical value for decision-makers in educational settings. Third, computational trade-offs between accuracy, interpretability, and scalability are rarely quantified, leading to uncertainty regarding real-world feasibility.
4. Experimental Results and Discussion
In this experimental study, the ML models were trained with the following hyperparameter settings to establish a consistent, unbiased baseline for comparison. Specifically, the default configurations consisted of 200 estimators and a maximum depth of 15 for Random Forest and Extra Trees; 200 estimators, a learning rate of 0.05, and a maximum depth of 8 for Gradient Boosting; SVR configured with an RBF kernel; AdaBoost with 200 estimators; KNN with 10 neighbors; a Decision Tree with a maximum depth of 10; and CatBoost with 500 iterations. These parameters were chosen based on the established literature. The dataset was partitioned such that 80% was allocated to training, and the remaining 20% was split further into validation and test sets, thereby ensuring rigorous model evaluation. In the GWO optimization process, the search space for the ensemble weights was bounded within a fixed interval, with the number of search agents (population size) set to 30 and a maximum of 1000 iterations for convergence. This parameterization was intended to balance exploration and exploitation during the optimization, thereby determining an optimal combination of model weights that minimized the validation MSE.
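The following sketch mirrors the baseline configuration listed above. It carries several assumptions: the feature matrix `X` and target `y` are already loaded, SVR's unstated kernel parameters are left at library defaults, and the unspecified validation/test split of the remaining 20% is taken as 50:50 for illustration.

```python
# Baseline configuration sketch following the settings above (assumptions:
# X and y already loaded; SVR kernel parameters at defaults; 20% holdout
# split 50:50 into validation and test).
from catboost import CatBoostRegressor
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

models = {
    "RandomForest": RandomForestRegressor(n_estimators=200, max_depth=15),
    "ExtraTrees": ExtraTreesRegressor(n_estimators=200, max_depth=15),
    "GradientBoosting": GradientBoostingRegressor(n_estimators=200,
                                                  learning_rate=0.05, max_depth=8),
    "SVR": SVR(kernel="rbf"),
    "AdaBoost": AdaBoostRegressor(n_estimators=200),
    "KNN": KNeighborsRegressor(n_neighbors=10),
    "DecisionTree": DecisionTreeRegressor(max_depth=10),
    "CatBoost": CatBoostRegressor(iterations=500, verbose=0),
}

# 80% train; the remaining 20% divided into validation and test sets.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.8, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)
```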
The empirical evaluation presented in Table 2 and Table 3 provides a comprehensive benchmark of the predictive performance of individual regression models relative to the proposed GWO-based ensemble framework across both the validation and test phases. The results substantiate several key insights regarding model behavior, ensemble efficacy, and the role of metaheuristic optimization in enhancing predictive accuracy and generalization. First, among the individual learners, the tree-based ensemble methods, specifically CatBoost, Extra Trees, and Random Forest, consistently outperform the other algorithms across all evaluation metrics. This observation aligns with the established literature highlighting the robustness of ensemble tree methods in handling nonlinear relationships, feature interactions, and high-dimensional data without extensive preprocessing. CatBoost emerges as the strongest individual model, achieving a validation MSE of 34.34 and an R² of 0.8713, closely trailed by Extra Trees (R² of 0.8709) and Random Forest (R² of 0.8698). On the test set, CatBoost further demonstrates superior generalization with an MSE of 23.56 and an R² of 0.9269, reinforcing its capacity to minimize both bias and variance on unseen data.
In contrast, the non-ensemble and distance-based models, namely SVR, KNN, and the single Decision Tree, exhibit markedly inferior performance. SVR and KNN yield substantially higher validation MSE values (62.77 and 87.13, respectively) and lower R² scores (0.7647 and 0.6735), indicative of limited adaptability to the underlying data structure and sensitivity to feature scaling or neighborhood selection. The Decision Tree's poor performance further underscores the limitations of single-tree models, including high variance and susceptibility to overfitting, despite their interpretability. Notably, the GWO-optimized ensemble model surpasses all individual regressors on both validation and test sets, achieving the lowest MSE (32.64 validation, 22.56 test), RMSE (5.71 validation, 4.75 test), and MAE (4.59 validation, 3.79 test), alongside the highest R² scores (0.8777 validation, 0.9300 test). The consistent improvement across metrics, particularly the 4.3% reduction in test MSE relative to CatBoost, demonstrates that the ensemble does not merely average predictions but strategically combines base learners using GWO-derived optimal weights that minimize global error while maximizing explained variance.
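Continuing the illustrative setup above, the reported metrics (MSE, RMSE, MAE, R²) can be computed per model as sketched below, reusing the hypothetical `models` dictionary and data splits from the earlier configuration sketch.

```python
# Metric computation sketch for validation/test comparisons like Tables 2-3.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)
    return {"MSE": mse, "RMSE": np.sqrt(mse),
            "MAE": mean_absolute_error(y_true, y_pred),
            "R2": r2_score(y_true, y_pred)}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "val:", evaluate(y_val, model.predict(X_val)),
          "test:", evaluate(y_test, model.predict(X_test)))
```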
The significant gains achieved by the GWO ensemble are particularly important in the context of student exam score prediction, where diminishing returns are common among already high-performing models. The success of the GWO ensemble can be attributed to its ability to harness the complementary strengths of heterogeneous base models. For instance, while CatBoost excels in gradient-boosted sequential learning, Random Forest and Extra Trees offer decorrelated, bagging-based variance reduction. By assigning data-driven weights via a metaheuristic search that navigates the complex, non-convex landscape of the ensemble weight space, GWO effectively balances model-specific biases and variances. This optimization process yields a synergistic predictor that mitigates individual model weaknesses, such as SVR's rigidity or KNN's sensitivity to local noise, while amplifying collective strengths. Furthermore, the ensemble's superior test performance relative to validation suggests enhanced generalization rather than overfitting, a critical attribute in real-world applications where model robustness to distributional shifts is paramount. This contrasts with some boosting methods (AdaBoost and Gradient Boosting), which show larger performance gaps between validation and test phases, potentially indicating over-optimization during training. In conclusion, the GWO ensemble not only achieves state-of-the-art performance on the evaluated dataset but also exemplifies a principled approach to model fusion, one that transcends simple averaging or stacking by embedding optimization directly into the ensemble design.
The comparison of predicted versus actual value plots across all models for the test data in Figure 2 demonstrates the superior predictive alignment of the GWO ensemble. Specifically, in the GWO ensemble plot, most predictions closely follow the ideal prediction line, indicating minimal deviation between observed and predicted outcomes. This concentration of data points along the diagonal, compared with the visible scatter and outliers in individual models such as KNN and SVR, indicates both precision and consistency in the ensemble's performance. Models like CatBoost, Extra Trees, and Random Forest individually show robust correspondence to the ideal line, yet still exhibit more instances of under- and over-prediction relative to the GWO ensemble. The GWO ensemble plot, by contrast, reveals a tighter clustering and diminished spread around the diagonal across almost the entire response range. Furthermore, performance indicators such as lower point dispersion and fewer noticeable outliers highlight the ensemble's robustness and reduced susceptibility to systematic bias or large prediction errors. In contrast, models like KNN, SVR, and Decision Tree are characterized by larger variance and several points falling significantly away from the ideal prediction line, signaling reduced accuracy and less reliable generalization. Even among competent regressors such as Gradient Boosting or AdaBoost, a moderate spread is observed, which is effectively minimized in the GWO ensemble. Collectively, these plots corroborate the numerical metrics, demonstrating that the GWO-optimized ensemble leverages the complementary strengths and mitigates the individual weaknesses of its component models, resulting in enhanced overall predictive accuracy and model stability, as evidenced by reduced error variance and closer alignment with the true target values.
The results depicted in the ensemble model weight bar chart in Figure 3 demonstrate the outcome of the GWO-based optimization for the regression task. The CatBoost model had the highest relative importance in the ensemble, as reflected by a weight of approximately 0.4781. Extra Trees followed with a substantial weight of 0.4086, while Random Forest contributed less, with a weight of 0.1133. This distribution of ensemble weights indicates that the CatBoost and Extra Trees models provided more informative and reliable predictions for the underlying data and were therefore prioritized in the ensemble aggregation. The allocation of weights determined by the GWO is guided by the minimization of prediction error on the validation set. As a result, models that demonstrated comparatively higher accuracy and robustness on this dataset received greater weights. The relatively lower allocation for Random Forest suggests that, within the context of the ensemble, its predictions were less aligned with the optimal estimation of the target variable, relative to its counterparts. Such an outcome highlights the adaptive capability of metaheuristic ensemble techniques, ensuring that the contribution of each base model is rigorously tailored to maximize overall predictive performance. By assigning dynamic, data-driven weights rather than relying on uniform or expert-defined combinations, the ensemble achieves a nuanced balance, incorporating models that best characterize the underlying patterns of the data, while diminishing the influence of less effective models. This methodology not only enhances model accuracy but also improves generalizability, underscoring the value of optimizer-based ensemble design in complex prediction tasks.
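Applying the Figure 3 weights amounts to a weighted average of base-model predictions, as in the following sketch; the renormalization step and the reuse of the hypothetical `models` dictionary from earlier sketches are assumptions for illustration.

```python
# Weighted fusion sketch using the GWO-derived weights from Figure 3;
# the reported values are renormalized in case of rounding.
import numpy as np

gwo_weights = {"CatBoost": 0.4781, "ExtraTrees": 0.4086, "RandomForest": 0.1133}

def ensemble_predict(X, base_models, weights):
    w = np.array(list(weights.values()))
    w = w / w.sum()                               # guard against rounding drift
    preds = np.column_stack([base_models[name].predict(X) for name in weights])
    return preds @ w                              # weighted average prediction

y_hat = ensemble_predict(X_test, models, gwo_weights)
```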
The bar chart in Figure 4 presents a comparative analysis of the test R² scores for the evaluated regression models, including Random Forest, Gradient Boosting, SVR, Extra Trees, AdaBoost, KNN, Decision Tree, CatBoost, and the GWO-optimized ensemble. It is evident from the plot that the GWO ensemble model achieves the highest coefficient of determination, with an R² value of 0.9300. CatBoost (0.9269), Random Forest (0.9225), and Extra Trees (0.9199) also demonstrate strong test performance, whereas models such as SVR (0.8059), Decision Tree (0.7656), and KNN (0.7280) display notably lower predictive capacity. The improvement in R² observed for the ensemble model over the best individual algorithms illustrates the value of optimized model combination, achieved through metaheuristic weighting, in enhancing overall predictive accuracy and robustness. This clearly reflects the superiority of the ensemble strategy in extracting richer information and reducing model-specific bias, ultimately leading to better generalization on unseen data.
Figure 5 presents the training and prediction runtime (in seconds) for eight individual regressors and the proposed GWO-optimized ensemble. Lightweight models such as Decision Tree (0.009 s) and SVR (0.099 s) exhibit the lowest computational overhead, while ensemble methods, including Random Forest (1.59 s), Gradient Boosting (1.31 s), and CatBoost (2.50 s) require slightly more time due to their iterative tree-building processes. The GWO ensemble, which optimizes combination weights over 1000 iterations with a population size of 20, incurs the highest runtime (14.59 s). This reflects the inherent cost of meta-heuristic optimization but is justified when improvements in predictive performance are critical. The results highlight a clear trade-off between computational efficiency and model sophistication, informing practical deployment choices under resource constraints.
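Runtimes of the kind shown in Figure 5 can be measured as sketched below, again reusing the hypothetical `models` dictionary; absolute timings depend on hardware and library versions, so the figures above are not reproducible exactly.

```python
# Timing sketch for per-model training-plus-prediction comparisons.
import time

for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    model.predict(X_test)
    print(f"{name}: {time.perf_counter() - start:.3f} s")
```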
The SHAP summary plot in Figure 6 for the GWO-optimized ensemble model provides a comprehensive explanation of the relative importance and directionality of each input feature in shaping the model's predictions. The plot ranks features by their mean absolute SHAP value, with features at the top exerting a stronger impact on the output. In this case, study hours per day and study effectiveness emerged as the most influential predictors, with higher values of these features consistently contributing to an increase in predicted exam scores. Lifestyle score, total screen time, and mental health rating also show considerable importance, highlighting the multifaceted nature of academic performance, which encompasses both study behaviors and well-being metrics. The color gradient, with blue representing low feature values and pink representing high values, clarifies the magnitude and direction of each effect. For example, higher study hours per day are associated with positive SHAP values, indicating that increasing study hours typically raises predicted exam scores. Similarly, higher lifestyle scores and mental health ratings are associated with favorable outcomes, indicating that holistic well-being correlates with academic success. Features such as social media hours, Netflix hours, and total screen time demonstrate a more nuanced effect, with high values yielding both positive and negative SHAP contributions, reflecting the complex relationship between screen-time habits and performance. Several categorical and demographic features, such as gender, extracurricular participation, and the various levels of parental education, appear lower in the ranking, indicating relatively minor direct influence under the ensemble framework. Furthermore, the compact spread of SHAP values for less impactful features, such as part-time job or poor diet quality, suggests limited variability in how these aspects affect the model's output. Meanwhile, features associated with academic habits, attendance, and select well-being attributes display a wider spread, underscoring their greater explanatory relevance. The SHAP analysis affirms that the ensemble model's predictions are predominantly shaped by factors explicitly linked to study behavior, personal effectiveness, and lifestyle, while secondary features contribute modestly. This provides transparency into the inner workings of the optimized ensemble and robust interpretability for stakeholders seeking to understand the drivers of academic performance in complex predictive modeling contexts.
Beyond confirming expected influences such as study hours and effectiveness, the SHAP analysis revealed several non-obvious relationships. Notably, total screen time showed a bidirectional effect: moderate engagement correlated with improved performance, whereas excessive usage produced sharply negative SHAP values, consistent with cognitive overload once distraction thresholds are exceeded. Similarly, the interaction between mental health rating and lifestyle score demonstrated a nonlinear relationship: students maintaining a moderate lifestyle balance achieved higher predicted scores than those at either extreme of the wellness scale, suggesting that overly structured routines may reduce cognitive flexibility. These nuanced dependencies provide educators with actionable insights. For instance, academic advisors can use SHAP-derived thresholds to identify students at risk of digital overuse, while institutional wellness programs can align counseling resources toward maintaining behavioral balance rather than simply increasing study time. Such interpretability bridges predictive analytics with evidence-based educational interventions, reinforcing the model's utility beyond predictive accuracy.
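A summary plot of this kind can be produced for the weighted ensemble as sketched below; the use of the model-agnostic KernelExplainer and the background/explanation sample sizes are illustrative assumptions, not necessarily the paper's exact procedure, and `ensemble_predict` is the hypothetical helper defined earlier.

```python
# SHAP sketch for the weighted ensemble (background/sample sizes assumed).
import shap

background = shap.sample(X_train, 100)     # small background set for tractability
explainer = shap.KernelExplainer(
    lambda X: ensemble_predict(X, models, gwo_weights), background)

X_explain = shap.sample(X_test, 200)
shap_values = explainer.shap_values(X_explain)
shap.summary_plot(shap_values, X_explain)  # beeswarm plot as in Figure 6
```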
5. Conclusions
This study systematically investigated the efficacy of a GWO-based ensemble model for predicting academic performance, leveraging a combination of advanced machine learning regressors, namely CatBoost, Extra Trees, and Random Forest. Empirical results demonstrated that the proposed metaheuristically weighted ensemble consistently outperformed all individual base learners on both validation and test data. The ensemble’s superior predictive ability was evident across various metrics. The findings substantiate the hypothesis that an optimizer-driven ensemble approach can effectively capitalize on the complementary strengths and distinct error profiles of its components, thereby delivering enhanced accuracy, generalizability, and robustness. The optimized weighting ensured that greater influence was accorded to models yielding higher predictive reliability while still retaining diversity in the ensemble to minimize the risk of overfitting. Notwithstanding these promising results, several limitations should be acknowledged. First, the optimization was conducted on a single dataset, which may leave the model susceptible to data-specific bias and limit generalizability under different data distributions. Second, the current framework primarily relies on the aggregation of base model predictions; potentially richer ensemble strategies were not explored. Third, computational cost increases due to population-based optimization, especially with large datasets or more complex models, which could limit scalability in practical applications.
Future research should address these limitations by incorporating multiple optimization methods to further enhance robustness, evaluating hybrid metaheuristic algorithms to improve optimization performance, and investigating nonlinear stacking or blending architectures. Additionally, extending to broader, more heterogeneous educational datasets, as well as exploring model explainability tailored to domain experts, would strengthen both the scientific foundation and the real-world impact of ensemble-based predictive analytics in educational settings. Taken together, the results position GWO-optimized ensembles as a compelling methodological advancement for regression tasks involving complex, multifactorial input datasets. By integrating sophisticated optimization with state-of-the-art machine learning, this study contributes both methodological innovation and empirical evidence to the expanding literature at the intersection of artificial intelligence and educational data sciences.