Article

A Hybrid Framework of Deep Learning Techniques to Predict Online Performance of Learners during COVID-19 Pandemic

1 University Institute of Information Technology, Pir Mehr Ali Shah Arid Agriculture University, Rawalpindi 46300, Pakistan
2 Industrial Engineering Department, College of Engineering, King Saud University, P.O. Box 800, Riyadh 11421, Saudi Arabia
3 Environmental and Public Health Department, College of Health Sciences, Abu Dhabi University, Abu Dhabi P.O. Box 59911, United Arab Emirates
4 School of Science, College of Sciences, Technology, Engineering, Mathematics, RMIT University, P.O. Box 2476, Melbourne, VIC 3001, Australia
5 Department of Statistics and Operations Research, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(15), 11731; https://doi.org/10.3390/su151511731
Submission received: 6 July 2023 / Revised: 26 July 2023 / Accepted: 27 July 2023 / Published: 29 July 2023
(This article belongs to the Special Issue Impact of COVID-19 on Education)

Abstract

COVID-19’s rapid spread has disrupted educational initiatives. Schools worldwide, including in Pakistan, have expanded distance learning in response to the pandemic. However, this shift has created several problems for students, including reduced access to technology, apathy, and unstable internet connections, and the abrupt transition has made it more challenging to evaluate students’ academic development in a remote setting. This paper presents a hybrid deep learning approach to evaluate the effectiveness of online education in Pakistan during the COVID-19 pandemic. Drawing on multiple data sources, including student demographics, online activity, learning patterns, and assessment results, this study pursues the goal of precision education. The proposed research uses a dataset of Pakistani learners compiled during the COVID-19 pandemic. To properly assess the complex and heterogeneous data associated with online learning, the proposed framework employs several deep learning techniques, including 1D Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. The trained model achieved an accuracy of 98.8%, outperforming the models it was compared against. The framework improves student performance assessment, which can inform tailored learning interventions and strengthen online education in Pakistan. Finally, we compare the findings of this study with those of other, more established studies on evaluating student progress toward precision education.

1. Introduction

Educational opportunity is a human right [1] and correlates directly with a country’s progress. In recent years, both the public and private sectors have established new colleges and universities to improve students’ access to higher education and the overall quality of their education [2,3]. Achieving precision education in Pakistan has become increasingly challenging due to several new issues in the system. The education sector, aware of its difficulties in meeting students’ needs, has actively sought ways to improve its practices [4].
To give each student an education designed specifically for them, educators are increasingly turning to data analytics and technological advancements. Catering to each student’s specific needs and learning preferences is expected to improve academic achievement and raise student engagement. By “precision education” we mean collecting and analyzing data on student achievement, behaviors, and preferences to craft individualized learning experiences [5]. Precision education uses modern technological advancements to provide students with immediate access to tutoring and guidance, allowing them to maintain their pace and complete their coursework.
Precision education uses digital learning platforms to collect and analyze student data and give each student an individualized, precise assessment of their progress [6]. These platforms can gather multiple types of student interaction data, including attendance, learning patterns, and behavior. Due to factors such as students’ lack of motivation and difficulty striking a work-study balance, precision education has taken on increased importance at the university level.
The success of the Pakistani educational system depends on regular assessment of its effectiveness [4]. To boost institutional performance and guarantee on-time graduation for all students, it is crucial to pinpoint the barriers that slow them down. The key to success here is the effective use of abundant student data. When working with data from many learners, it can be difficult for teachers to mine it manually to identify problem areas. Data mining methods, in contrast, streamline the process while reducing the need for instructors to intervene. Data Mining (DM) techniques can probe data and unearth hidden patterns [7]. Educational data mining (EDM) refers to the approaches and algorithms used to glean useful information from educational databases. Some procedures for extracting valuable knowledge from data are shown in Figure 1. Learning analytics (LA) and educational data mining (EDM) are two areas Siemens and Long [8] have identified for making use of educational data collected via digital platforms [9]. Researchers in EDM and LA have focused on studying various student demographics, learning preferences, and online instructional strategies.
Educational data mining (EDM) is a relatively new field that employs a wide variety of data mining techniques and strategies to identify specific subsets of data obtained from learning environments [10,11] and then uses this information to better understand how students perform in different classroom settings [12]. With this comprehensive knowledge, educators can better manage classroom instruction, evaluate student performance, and foster more effective learning strategies. In 2008, Montreal, Canada, hosted the first global scientific session on EDM, which paved the way for the formation of the International EDM Society in 2011 and the launch of the Journal of EDM in 2009 [13].
Currently, EDM is being incorporated into course management systems like Moodle and Blackboard, allowing for the systematic collection and analysis of vast educational data [14]. Educational data mining makes it easier to keep track of learner records and datasets, and it can also mine relevant information from online education systems for insights applicable to hypothesis testing and the identification of students’ unique learning styles. EDM-based recommendation systems work toward this goal by utilizing the relations gleaned from course-learning databases. Using a competency-based approach, these systems are meant to assist students throughout a course or program. This method is therefore essential for efficiently utilizing data to enhance pedagogical outcomes [15].
Online education [16] has grown in popularity among schools since its inception in the 1990s [17] with the advent of the World Wide Web (WWW) and the Internet. Over time, teaching and learning strategies have evolved [18] in response to the ever-changing nature of information technology [19]. According to a meta-analysis of studies comparing traditional and online environments published by the United States Department of Education, the efficacy of distance education is similar or superior to that of conventional methods of instruction [20].
Students can now receive a good education regardless of their location thanks to the proliferation of online learning, which has also reduced the obstacles to effective contact between instructors and students and freed them from the restrictions of time [21]. As a result of the COVID-19 pandemic, online information platforms have seen a dramatic increase in use and popularity [22]. Although some research indicates that students are receptive to online learning [23], other research conducted during the pandemic [24] has found that students held negative opinions toward online learning [25]. A survey of Pakistani students found that 77% of those polled held unfavorable opinions, while 84% reported reduced communication between themselves and their instructors under online instruction during the pandemic [26]. Issues like students’ disinterest in coursework can be traced back to these discrepancies. As a result, students and teachers alike are encountering new difficulties due to the rapid expansion of online education [27].
Recent research has revealed the following shortcomings through a review of the relevant literature [18,19,20,21,22,23,24,25,26,27]:
  • Previous studies have built learner-prediction models tailored to a single course, which made the models too specific.
  • Maintaining a separate prediction model for each course is inefficient, because it requires allocating resources to each model individually; a more generic model is therefore needed.
  • Recent studies have been limited by the small number of responses available for training, raising scalability concerns for the developed models.
  • The efficiency of current approaches is hindered by several challenges, such as data imbalance, misclassification, and an insufficient set of features considered when assessing student performance.
The following are the contributions made by the current study to address the limitations mentioned above:
  • The current study proposes a framework that predicts learner performance across multiple courses, avoiding the need to build a separate model for each course. In short, the framework is generic enough to make sound and valid predictions while taking multiple factors into account.
  • To develop a reliable and effective learner outcome prediction model, a hybrid deep learning model is presented. Combining deep learning classifiers (1D-CNN and LSTM) produces a robust model that can more accurately predict learner outcomes based on student performance in online learning during the COVID-19 term.
  • Online responses were collected through a survey of higher education students and their performance was analyzed; this information was used to develop an adaptable model that considers a sufficient number of data points that could affect student performance.
  • This research utilized the SMOTE method for data resampling and the median filtering approach for data imputation. CNN layers automatically performed feature extraction and attribute selection to determine which features most significantly affect the predictions made about learners.
  • The hybrid deep learning model utilized in this research improves prediction accuracy and helps attain precision education by identifying and assisting weak students.
The remainder of this paper is organized as follows. Section 2 presents the literature review, consisting primarily of a summary of previous research on precision education and LA as they pertain to distance education. Section 3 introduces the methodology used to evaluate the effectiveness of the proposed framework and the learners’ performance. Section 4 discusses the experiments and their results, and Section 5 provides conclusions and recommendations for future research.

2. Literature Review

Globally, the COVID-19 pandemic has disrupted educational systems. One recent investigation [28] found that the lack of immediate feedback to students during the COVID-19 outbreak had a negative impact on their academic performance. To address this problem, the research suggests employing a model trained using an enhanced fully connected network (FCN). The methods used in this study are typical of DM research. After the initial phase, data were gathered from the widely used OULAD database. This dataset captures online students’ demographic information, study habits, and other details; it contained 32,593 records covering eight months and 22 different classes. Following collection, the information was entered into a student database. Data cleansing followed, including the removal of duplicates and missing values. The model was trained using data converted from descriptive terms to numerical values, normalized, and separated into testing and training sets, aided by FCN and optimization algorithms. The analysis considered 21 data attributes. Implemented in Python, the FCN model improved accuracy while successfully providing students with feedback, attaining greater accuracy (84%) than a conventional ANN model.
Recently, another work proposed a deep learning model for student performance assessment during online learning [29]. Its chief goal was to reduce the time required to predict the makespan of a given system state. This was achieved by training and executing an artificial neural network (ANN) that follows the framework of the successful AlphaZero method. The study addresses the challenges of online scheduling in dynamically interconnected automation systems. Pretrained data were used to train the model, with twice as many pretraining scenarios as were used for self-play in the proposed approach. The study utilized a greedy algorithm to minimize the period between a job’s operations and reduce the overall makespan; this approach served as a good foundation for pretraining the AlphaZero ANN. The hyperparameters, including the number of layers, bias, number of neurons, and dropout percentage, were optimized using grid search, and the best version of the ANN was then trained. According to the results, the trained network predicted favorable actions with over 95% accuracy, and it estimated the makespan with a loss value below 3%.
Another work used data fusion to assess student academic performance in a blended learning environment [30]. The data were obtained from 57 first-year Electrical Engineering students at the University of Cordoba (UCO-Spain) who took the Introduction to Computer Science course in the first semester of the 2017–2018 academic year. The data include students’ final exam results, practical sessions, theory classes, and online Moodle sessions. The study conducted four experiments to assess academic performance in a university course using discretized and numerically preprocessed data. Four different data fusion methods and numerous classification procedures were utilized for this purpose, and the study identified the best method of information fusion for predicting academic performance in a university course. The results showed that ensemble techniques with optimal feature selection on discretized data provided the best estimates. The study also found that Moodle examination results, classroom attendance, and Moodle discussion involvement were the most consistent predictors of students’ academic achievement in the program.
The issue of class imbalance was considered in a recent study [31] so that the model could perform well for both passing and failing classes. To counter this problem, the study proposed a hybrid model involving two deep learning classifiers, CNN and LSTM, for timely student interventions. The aim was to develop a scheme that implicitly extracts meaningful features from unstructured information about Massive Open Online Courses (MOOCs) and uses them to predict whether a learner will fail or complete the course. Two datasets of students studying at Stanford University, drawn from two different MOOC platforms, were used: the first comprised five courses and the second 39 courses, and both contained a large number of records. Data from both datasets were collected over a 30-day record period for each course. During preprocessing, inappropriate records, such as blank columns and actions occurring before a learner’s official registration, were filtered out. The data were anonymized to protect students’ personal information, with identities tracked through a unique ID. The filtered data were then passed to the next phase for extraction of optimal characteristics using feature engineering. After training the model with CNN and LSTM, a custom loss function was applied to optimize performance, with Adam as the optimizer. The results demonstrate that the proposed framework helped predict students’ performance, which educators and stakeholders can use to create an LA framework and make decisions that support and guide students. The proposed framework also improved performance compared with other deep learning models.
A recent study [32] assessed the prediction of student performance and dropout rates based on students’ learning behaviour during online sessions. The aim was to give learners and instructors valuable insights into learner behavior, allowing for a better understanding of the learning process. The authors used a platform designed for training teachers in Africa, UNESCO-IICBA (International Institute for Capacity Building in Africa). Features were extracted from the available student data to make predictions. Because students’ learning behaviour constitutes time-series big data, the authors applied RNN classifiers of deep learning, which handle time-series data well. To train the model and test its efficiency, three RNN architectures were applied: simple RNN, GRU, and LSTM. After behavioral feature extraction and data pre-processing, the model was trained. The study concluded that the simple RNN performed better than the other two architectures, and that model efficiency could be improved by considering more behavioral features of students’ learning during online sessions.
Recent work assessed student academic performance during interactive online sessions [33]. The experiments employed the DEEDS dataset, which is commonly used for extracting real-time data from students; six lab sessions were considered. In the data pre-processing phase, discrepancies, missing data, and irrelevant values were handled, and feature engineering was then deployed. Thirty features were selected in the feature extraction phase and reduced to three composite feature groups: activities, timing-based statistics, and secondary student activities. Entropy-based procedures ranked the features by the correlation between the data variables, and these rankings guided feature selection for model training. To assess performance accurately, three experiments were performed: the first assessed the model using all data features; the second used the reduced feature set selected through the entropy technique; and the third compared the proposed model with a previously developed model to evaluate its efficiency. Five classifiers were employed for model training, including RF, SVM, LR, and NB. The results revealed that Random Forest outperformed all other algorithms, with a high accuracy of 97.4%.
A different investigation evaluated the psychological well-being of students throughout the COVID-19 lockdown and observed the effects of the global pandemic on their mental health [34]. This work emphasized the significance of operational tools and digital technology during COVID-19, specifically examining the impact of physical isolation, confinement, and remoteness on college learners’ emotional and psychological well-being. To classify the challenges students confronted in online schooling during the COVID-19 crisis, the authors conducted a SWOT analysis. The study employed an online questionnaire to gather data from students in various Arab countries, covering factors such as study routines, sleeping habits, psychological state, and demographic information; the dataset consisted of 1766 responses. Before model training, the collected data underwent a pre-processing phase. Numerous machine learning algorithms were employed to build a predictive system for student assessment, considering the impact of online learning methods both before and after the COVID-19 pandemic. The dataset was split into a training set (70%) and a validation set (30%). The model’s success was estimated using chi-square and ANOVA tests. The research concluded that there is a positive correlation between online learning and learner performance during the pandemic.
Research was also conducted to explore the influence of online teaching on student satisfaction during COVID-19 [35]. The main aim was to predict learners’ academic performance and discover ways to improve the efficacy of online learning platforms. The work utilized a real dataset on learner satisfaction and online education practices during COVID-19. Data were obtained through an online questionnaire administered to students from seven educational institutions in Egypt during the 2021–2022 academic year; it encompassed 18,691 responses and comprised assessments of students’ experiences with online learning. The dataset underwent preprocessing to remove flawed data. To select the most appropriate features, 11 different meta-heuristic algorithms were employed. Data from Kafrelsheikh University and Mansoura University in Mansoura, Egypt, were then used to train two machine learning classifiers: Support Vector Machine (SVM) and k-NN. The research was conducted in Python, and performance metrics were applied to evaluate the model’s effectiveness. The results showed a precision rate of 100%, indicating a robust model.
The research study described in [36] focused on identifying student learning behaviors in traditional in-class courses during the COVID-19 pandemic and their impact on student performance. The research specifically targeted a small population of students enrolled in mechanical engineering undergraduate programs. Data were collected through a survey administered via a mobile app, resulting in 133 responses representing four different sections. The dataset was divided into a training set (70%) and a testing set (30%). It included student information related to class attendance, participation in class activities, and other relevant factors, except for homework, which was not considered in this study. The collected data underwent preprocessing, including converting student grades to letter grades and applying the SMOTE method to balance the dataset. The work utilized various machine learning algorithms, such as decision trees, support vector machines, logistic regression, random forest, ensemble learning, and k-nearest neighbors, to train the model. Given the small dataset, training used 10-fold cross-validation to avoid overfitting, and grid search was employed to tune each machine learning algorithm. The results demonstrated that ensemble learning outperformed the other classifiers, achieving an accuracy of 84%.
Another study [37] proposed an automated method to measure online learners’ performance based on COVID-19-era data. It used data from 2006–2017, covering over a thousand current and former college students enrolled in 15 distinct courses. Following error removal in preprocessing, the IITR-APE dataset served as the basis for precise performance forecasting. The Variational Auto-encoder Method (VAM) was then applied to the data to extract the most essential features, from which student performance was predicted. Classifiers including extra trees, linear regression, random forest, XGBoost, k-nearest neighbors, and multi-layer perceptron were all used. Mean squared error, mean absolute error, root-mean-squared error, and the R2-score were among the metrics used to assess the model’s efficacy. The outcomes highlighted the capability of deep learning models to make precise forecasts: the extra tree classifier had the best R2-score (0.720), MAE (5.943), MSE (77.709), and RMSE (8.581) of all the classifiers tried. The potential contributions of several previous studies are highlighted in Table 1.

3. Proposed Framework for Performance Assessment

This section outlines how online courses were graded and how students’ performance was tracked during the COVID-19 wave in Pakistan. The framework uses a hybrid deep learning approach. As a result of the pandemic, the use of online learning platforms has increased significantly, introducing several variables that can affect a student’s ability to master the material. A questionnaire was used to collect student responses to evaluate these possible influences. The feedback was compiled and examined to determine the factors most important in determining a student’s final grade.
Figure 2 shows the procedures taken during the online learning phase to achieve precision education. In the first stage, pre-processing, the obtained dataset was cleaned by removing anomalies and bad records. The SMOTE technique was used to resample the data, after which median filtering was applied to the dataset as a pre-processing step. Essential characteristics were then automatically identified and prioritized through a 1D convolutional (Conv1D) layer. With these details settled, a deep learning-based hybrid model was fine-tuned; the SGD optimizer and a specialized loss function were used to improve the model’s effectiveness. Finally, formal validation measures were employed to estimate the model’s efficacy and precision.

3.1. Materials & Methods

The following subsections present a full discussion of each phase, stressing its contribution and function in obtaining precision education, as indicated in Figure 2. The technique was carried out in an organized manner by following these steps in order.

3.1.1. Data Acquisition

In the first phase, data were acquired by surveying Pakistani college students. An online questionnaire created in Google Forms was distributed to several educational institutions to collect the required data. The survey was designed to gather information about factors considered significant in determining students’ final grades for the COVID-19 term. In total, there were 30 questions, and 11,000 responses were collected from college and university students.

3.1.2. Resampling Data

Once the data have been collected, the next step is to evaluate whether resampling is required. The Synthetic Minority Over-Sampling Technique (SMOTE) was implemented during the resampling procedure. SMOTE addresses the problem of an unbalanced dataset by creating synthetic examples for the underrepresented class. In contrast to traditional resampling approaches, SMOTE generates additional samples by interpolating between nearby instances of the minority class, which helps represent the minority class more accurately relative to the majority class. As a result, classifiers can improve their predictive ability using the refined training set. The mathematical formula for SMOTE is:
y′_i = y_i + α (y_j − y_i)  (1)
In Equation (1) above, y′_i denotes the new synthetic sample generated from the combination of y_i and its neighbor y_j, and α represents a random number between 0 and 1. The equation shows how new values are interpolated to create additional samples.
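To make this step concrete, the following is a minimal Python sketch of the resampling phase using the SMOTE implementation from the imbalanced-learn library; the file name and label column are hypothetical stand-ins for the survey export described in Section 3.1.1.

```python
# Minimal sketch of the SMOTE resampling step (file/column names hypothetical).
import pandas as pd
from imblearn.over_sampling import SMOTE

df = pd.read_csv("survey_responses.csv")      # hypothetical survey export
X = df.drop(columns=["final_result"])         # questionnaire features
y = df["final_result"]                        # 0 = Safe, 1 = At-Risk

# SMOTE synthesizes minority-class rows by interpolating between a minority
# sample and one of its nearest minority neighbours, as in Equation (1).
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)
print(y_resampled.value_counts())             # class counts after oversampling
```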

3.1.3. Data Pre-Processing Stage

The next critical stage is the primary preparation of the data, during which outliers, duplicates, and other irregularities are removed. This preliminary processing was accomplished with the median filtering method.
When applied to educational data analysis, median filtering can be a helpful strategy. Educational data points like student scores, grades, or performance measures typically contain noise [38] or outliers that can prevent precise analysis. Median filtering can eliminate these outliers and shed light on underlying patterns and tendencies. A window is moved across the dataset, the median value within each window is computed, and the original value at the center of the window is replaced by that median. Several emerging methods for data preprocessing are shown in Figure 3; of these, the method used here is median filtering.
Equation (2) below defines the median filter, whose use can help educators obtain more trustworthy insights from their data and apply those insights to improve teaching and learning:
u(m) = median(v(m − j), v(m − j + 1), …, v(m), …, v(m + j − 1), v(m + j))  (2)
where u(m) in Equation (2) corresponds to the filtered output and j is the half-width of the window, which includes the data points from v(m − j) to v(m + j). The equation shows how a missing or noisy value is imputed from the defined window.
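A short sketch of Equation (2) follows, applying a sliding-window median to a single numeric attribute with SciPy; the scores array is illustrative only.

```python
# Minimal sketch of median filtering (Equation (2)) on illustrative data.
import numpy as np
from scipy.signal import medfilt

scores = np.array([78.0, 81.0, 79.0, 250.0, 80.0, 77.0, 82.0])  # 250 is an outlier
j = 1                                        # half-window size from Equation (2)
filtered = medfilt(scores, kernel_size=2 * j + 1)
print(filtered)                              # the spike is replaced by a local median
```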

3.1.4. Feature Selection and Extraction

Convolutional neural networks (CNNs) are used to select the best input data and extract the most valuable features from it. With a CNN, automatic feature selection and extraction are possible, and the network can also help simplify the data [39]. A CNN’s convolutional layers are trained to selectively emphasize important features while downplaying or disregarding less important or noisy ones. CNNs are optimized for retrieving features from grid-like data structures such as images. They capture local patterns and characteristics through the filters applied by the convolutional layers; once trained, the network can use these filters to uncover informative patterns without explicit supervision. Among CNNs’ greatest assets are their capacity for adaptive learning, feature selection, and feature extraction.

3.1.5. Machine Learning Classifiers

There are numerous general classification techniques that can be used for making predictions, including decision trees, random forests, support vector machines, k-nearest neighbors, and logistic regression [40]. Depending on the nature and complexity of the data, intelligent deep learning classification techniques are also utilized, including feedforward neural networks, recurrent neural networks, long short-term memory, convolutional neural networks, and deep reinforcement learning. The following subsections describe the two DL algorithms deployed in the proposed work. Given the nature of the data, the combination of CNN and LSTM was chosen to handle students’ time-series data and to deal with complex data patterns, improving generalization. The key strengths of the hybrid model considered during data collection include its capacity to process multimodal data, to capture sequential and temporal dependencies, and to reduce overfitting.

Convolutional Neural Networks (CNN)

A 1D Convolutional Neural Network (CNN) is a powerful deep learning model widely used for feature extraction in numerous applications, including time-series analysis and sequential data processing, and it can successfully capture meaningful patterns and features. The CNN architecture comprises four layer types: convolution, ReLU, pooling, and fully connected [41], as shown in Figure 4 below:
Extracting useful characteristics from incoming data is the job of the CNN layers, which do so by performing convolutions. Convolutional filters multiply the input values by the filter weights as they slide across the data; in this way, the CNN creates feature maps that capture significant spatial patterns. Pooling layers down-sample the feature maps, decreasing the spatial dimensions while preserving the necessary information. A flattening layer then transforms the feature maps into a one-dimensional vector. Finally, in the fully connected layers, all neurons in one layer are linked to those in the next, allowing the network to perform classification.
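To illustrate, below is a minimal Keras sketch of 1D convolutional feature extraction, treating one student’s 28 survey answers as a length-28 sequence with a single channel; the layer sizes mirror those reported in Section 4, where the full hybrid model is also sketched.

```python
# Minimal sketch of Conv1D feature extraction over 28 survey answers.
import tensorflow as tf

extractor = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 1)),                          # 28 answers, 1 channel
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),  # local response patterns
    tf.keras.layers.MaxPooling1D(pool_size=2),                     # down-sample feature maps
    tf.keras.layers.Flatten(),                                     # 1D vector for a classifier head
])
extractor.summary()                                                # shows each layer's output shape
```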

Long-Short Term Memory (LSTM)

To address the vanishing gradient issue, researchers developed a special type of Recurrent Neural Network (RNN) called Long Short-Term Memory (LSTM) [42]. LSTM therefore replaces the simple RNN because of its superior performance on data with long-term dependencies. To remember intricate details over extended periods, it employs a sophisticated arrangement of memory cells. An LSTM consists of input, forget, and output gates and memory cells [43]. After passing information through these gates, it retains only the most relevant details and forgets the rest. Using a sigmoid function, the forget gate is the crucial operator for determining which pieces of data can be safely discarded. LSTM is trained with backpropagation, allowing it to learn from past experience and adjust its current state accordingly. When applied to time-series data, LSTM becomes a highly effective tool. The LSTM model’s workings are outlined in Equations (3)–(7):
inp(u) = σ(wg_inp × [hdn(u − 1), y(u)] + bias_inp)  (3)
forget(u) = σ(wg_forget × [hdn(u − 1), y(u)] + bias_forget)  (4)
out(u) = σ(wg_out × [hdn(u − 1), y(u)] + bias_out)  (5)
cs(u) = forget(u) × cs(u − 1) + inp(u) × tanh(wg_cs × [hdn(u − 1), y(u)] + bias_cs)  (6)
hdn(u) = out(u) × tanh(cs(u))  (7)
where u denotes the current time step and y(u) the input at that time; inp(u), out(u), and forget(u) are the input, output, and forget gates, respectively; cs(u) denotes the cell state and hdn(u − 1) the previous hidden state; σ is the sigmoid activation function; wg_inp, wg_forget, wg_out, and wg_cs are weight matrices; and bias_inp, bias_forget, bias_out, and bias_cs are bias vectors. These equations show how weighted inputs and biases are combined and passed through the gates so that the network learns the patterns that yield the expected outcome.
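The gate updates in Equations (3)–(7) can be traced with a minimal NumPy sketch of a single LSTM time step; the weights, biases, and states below are random stand-ins for illustration only.

```python
# Minimal NumPy sketch of one LSTM step following Equations (3)-(7).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_hidden, n_input = 4, 3
z = rng.normal(size=(n_hidden + n_input, 1))            # [hdn(u-1), y(u)] concatenated
wg = {g: rng.normal(size=(n_hidden, n_hidden + n_input))
      for g in ("inp", "forget", "out", "cs")}          # weight matrices
bias = {g: np.zeros((n_hidden, 1)) for g in wg}         # bias vectors
cs_prev = np.zeros((n_hidden, 1))                       # previous cell state

inp    = sigmoid(wg["inp"] @ z + bias["inp"])           # Eq. (3): input gate
forget = sigmoid(wg["forget"] @ z + bias["forget"])     # Eq. (4): forget gate
out    = sigmoid(wg["out"] @ z + bias["out"])           # Eq. (5): output gate
cs     = forget * cs_prev + inp * np.tanh(wg["cs"] @ z + bias["cs"])  # Eq. (6)
hdn    = out * np.tanh(cs)                              # Eq. (7): new hidden state
print(hdn.ravel())
```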

3.1.6. Performance Validation of the Model

A confusion matrix is used to evaluate the model’s efficiency within the scope of this research. It is a 2 × 2 matrix in which one axis shows the values predicted by the model’s classification and the other shows the actual values from the dataset. Several performance metrics are derived from it: accuracy, precision, recall, and the F1 score.
Accuracy = (TP + TN) / (TP + TN + FP + FN)  (8)
Equation (8) above shows that accuracy indicates the model’s general predictive ability, representing the proportion of correctly classified data points out of all instances. Precision quantifies the model’s ability to correctly categorize positive cases without falsely labeling negative cases as positive, as represented by Equation (9):
Precision = TP / (TP + FP)  (9)
Recall, also known as sensitivity or the true positive rate, is a performance measure deployed to assess the efficacy of a classification model. It computes the proportion of true positive instances correctly identified by the model out of all actual positive cases in the dataset, as given by Equation (10):
Recall = TP / (TP + FN)  (10)
The F1-score combines precision and recall into a single value and provides a stable measure of a classification model’s efficiency. It considers both the model’s ability to correctly identify positive cases (precision) and its capability to capture all actual positive cases (recall). Mathematically, it is given by Equation (11):
F1-score = 2 × (Precision × Recall) / (Precision + Recall)  (11)
In Equations (8)–(11), TN indicates a true negative, TP a true positive, FN a false negative, and FP a false positive.
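These metrics can be computed directly from the confusion matrix, as in the short sketch below; the labels are illustrative and not the study’s actual predictions.

```python
# Minimal sketch of Equations (8)-(11) from a 2 x 2 confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 0, 1, 0, 1]            # illustrative ground truth
y_pred = [0, 0, 1, 0, 0, 1, 1, 1]            # illustrative predictions
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # Eq. (8)
precision = tp / (tp + fp)                                  # Eq. (9)
recall    = tp / (tp + fn)                                  # Eq. (10)
f1        = 2 * precision * recall / (precision + recall)   # Eq. (11)
print(accuracy, precision, recall, f1)
```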

4. Experiment and Results

Dataset Description

Data collection for this study centred on assembling the data necessary for analysis. The dataset was carefully collected over a four-month period, between January and April 2023, to ensure it contained all the information and variables needed for the study. After collection, the accuracy of the data was meticulously checked, as it serves as the foundation for the analysis and conclusions. To check the validity of the dataset, a peer review was conducted, and the data were cross-checked by an industry expert in this domain to enhance the reliability of the findings. To assess students’ progress throughout the COVID-19 period, an adaptable Google Forms-based survey was developed. The survey covered the questions needed to measure the impact of the lockdown on students, and it was disseminated to colleges and universities all over Pakistan to solicit input from their student bodies.
A total of 30 questions were included in the questionnaire. The data were numerical, and the responses were exported in .csv format. After deliberation, only 28 questions were chosen for further analysis, and the rest were discarded; the chosen questions were considered important factors in determining how accurately grades were calculated. The decision class chosen for prediction was the ‘final result’, either pass or fail. Table 2 below lists some of the questions from the survey and the number of responses received for each option.
In the table, SA represents “strongly agree”, A “agree”, N “neutral”, D “disagree”, and SD “strongly disagree”. Class imbalance is a key problem when predicting student performance. Because the SMOTE technique, despite its limitations, handles class imbalance well, the SMOTE method was used after data collection to resample the skewed data, increasing the number of minority-class instances. Before applying SMOTE, 11,000 responses had been collected; resampling increased this to 14,000 and helped improve generalization and lower bias. Of the resampled responses, 85.79% carried the Safe label and 14.21% the At-Risk label.
A total of 14,000 responses were available after the resampling phase; the dataset was reduced to 12,000 responses during the pre-processing stage. These responses were collected from students actively enrolled during the COVID-19 pandemic, studying computer science, software engineering, information technology, business administration, and management sciences. About 68% of the students in this dataset were undergraduates, 20% were master’s students, and 12% were doctoral students.
Figure 5 shows the statistical distribution of the valuable characteristics learned from students. The number of students is shown along the vertical axis, and their responses to each question along the horizontal axis; the figure depicts the feedback given by students regarding the listed factors. Pre-processing of the collected data via filtering was used to perform data imputation following data resampling: missing values and outliers were handled, and unnecessary noisy data were imputed or discarded. Following data preparation, a 1D CNN was used, and feature selection and extraction were performed implicitly by the network.
Figure 6 shows how different data attributes affect a student’s final grade, displaying the outcome for some of the factors for which the model predicted an effect, or the absence of one, on student performance. The variables depicted in the figure affect learners’ physical and mental well-being as well as their prospects for further study and employment. Both the safe and at-risk influences of these input characteristics have been identified. An x-axis label of zero indicates students who are secure and have done well academically.
In contrast, students deemed at risk of dropping out, who would benefit from immediate intervention, are indicated by the brown color and labelled “1”. Student feedback on their online learning experiences during COVID-19 is plotted along the y-axis; thicker sections represent larger numbers of students who answered “yes”, “no”, or gave an average response for that criterion. Figure 7 displays a taxonomy of the entire body of work.
The valuable input features selected and extracted by CNN are depicted in Figure 7 above and include the following: quizzes, homework, screen time, number of clicks, social media, group meetings, class participation, age, gender, admission score, attendance, midterms, presentations, part-time work, diet, internet access, income, family, sleep, mental health, stress, books read, notes taken, feedback received, mentor, and free time. These traits were already imparted to the model during its training phase.
Four input features are graphically represented in Figure 8: platforms, lectures, suggestions, and studying. The y-axis represents the total number of students, while the x-axis displays the data attributes. Dark blue indicates whether students were happy with the lecture delivery platforms; green bars indicate whether the teacher’s lectures were held on time; light blue represents students’ ability to comment on class discussions; and yellow represents responses about whether adequate reading and learning materials were provided for their courses.
The dataset was then made available in .csv format following the collection of data attributes, and the information was first transformed into a numerical format. The model was trained on the Google Colab platform. The ‘tensorflow’ and ‘keras’ neural network libraries, the ‘NumPy’ numerical computing library, and the ‘pandas’ data structure library must be imported before the task can be executed. To ensure that results could be replicated, the sklearn library was used to divide the data into a testing set and a training set, with a test size of 0.3 and a random state of 42. The model setup is shown in Figure 9 below.
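A minimal sketch of this preparation step is shown below; the file and column names are hypothetical stand-ins, while the split parameters match those reported above.

```python
# Minimal sketch of the data preparation and 70/30 split described above.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("responses_preprocessed.csv")          # hypothetical file name
X = data.drop(columns=["final_result"]).to_numpy(dtype="float32")
y = data["final_result"].to_numpy(dtype="float32")

# test_size=0.3 and random_state=42 follow the setup reported in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```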
The model’s ‘Sequential’ class structure allows for linear layer stacking, and the structure comprised three distinct stages. Potential features were extracted from the input using two 1D convolutional layers: the first ‘Conv1D’ layer contained 64 filters with a kernel size of 3 and used the ReLU activation function, and the second ‘Conv1D’ layer likewise comprised 64 filters with a kernel size of 3 and ReLU. The next layer was a long short-term memory (LSTM) layer with 64 units, used to capture sequential dependencies when classifying the gathered attributes; dropout regularization was applied to reduce overfitting, with a dropout rate of 0.2 on both the input and recurrent connections. To produce the final binary classification output, a dense layer comprising a single unit with the sigmoid activation function was added at the end. The model was fine-tuned using the SGD optimizer and the ‘binary_crossentropy’ loss, which suits binary classification problems. The combined use of 1D-CNN and LSTM predicted students’ outcomes, in the pursuit of precision education, with 98.8% accuracy.
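The architecture just described can be assembled in Keras as in the following sketch; the layer sizes, dropout rates, optimizer, and loss follow the text, while the epoch count follows Figure 11 and the batch size is left to Keras defaults.

```python
# Minimal Keras sketch of the hybrid 1D-CNN + LSTM model described above.
import tensorflow as tf

n_features = 28                                   # questionnaire attributes per student
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features, 1)),
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
    tf.keras.layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary Safe / At-Risk output
])
model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])

# Training call, assuming the split from the previous sketch (channel dim added):
# history = model.fit(X_train[..., None], y_train, epochs=50,
#                     validation_data=(X_test[..., None], y_test))
```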
Figure 10 below shows the number of correctly classified instances for both classes. The predicted percentages of safe students and of those at risk of academic failure in the near future were 86.79% and 13.21%, respectively.
Figure 11 below shows the accuracy achieved during training and validation. The x-axis indicates the number of epochs and the y-axis the accuracy; the total number of epochs was 50. The blue line shows the training accuracy, which reached 98%, and the orange line the validation accuracy, which reached 98.8%.
Similarly, Figure 12 below portrays the loss values during training and validation. The x-axis shows the number of epochs and the y-axis the loss value; the total number of epochs was 50. The blue line shows the training loss, which fell to 0.7%, and the orange line the validation loss, which fell to 1%.
Recall, precision, and F1-measure scores were generated after the model had made its predictions and were used to assess the model’s validity. For the Safe class, the model achieved 97.1% recall, 99.5% precision, and a 98.1% F1 measure; for the At-Risk class, it achieved 97.5% recall, 97.5% precision, and a 97.5% F1 measure, as shown in Table 3.
Figure 13 shows the 2 × 2 confusion matrix for the deep learning model’s predictions of the factors that may affect students’ grades, plotting the model’s predicted values against the observed values.
The proposed framework achieves state-of-the-art performance by combining two deep learning models. Table 4 compares the findings of this study with those of other, more established studies on evaluating student progress toward precision education; evaluation methods, feature selection classifiers, the number of valuable features chosen, and accuracy are all compared across these works. Table 5 below also compares the proposed work with existing deep learning models.

5. Conclusions and Future Recommendations

It is essential to implement more precise teaching methods to increase the number of students who graduate from higher educational institutions and to provide an education of the highest possible quality. This necessitates preventative measures for monitoring and improving student performance. The information gleaned from data mining techniques is invaluable because it enables the discovery of valuable patterns that can be applied in the classroom. Additionally, both deep learning and machine learning models have contributed significantly to the advancement of precision education. In this study, we examined how a deep learning architecture can improve online education, specifically during the COVID-19 outbreak in Pakistan. While there are established models for estimating student performance, some have low generalizability and use a small number of hand-picked attributes; such restrictions can lead to overfitting due to insufficient data. A web-based survey was developed for this purpose and distributed to Pakistani academic institutions, and bachelor’s, master’s, and doctoral students all completed the proforma. After resampling, the dataset contained 14,000 records, of which 12,000 responses remained once initial processing was complete so the study could proceed. Twenty-eight useful features were selected and passed to the model-training stage. The model was trained using a deep learning framework with a 1D-CNN and LSTM. Predictions made with the trained model showed superior performance to all compared models, with an accuracy of 98.8% and a loss of only 1.2%. According to the results of this study, deep learning frameworks help produce reliable forecasts. This research therefore accurately predicted students’ performance, allowing for timely interventions and precision education. Finally, we compared the findings of this study with those of other studies on evaluating student progress toward precision education, in terms of the features considered and against existing deep learning models. A limitation of the study is that using a large dataset with 28 prediction features may make the approach less practical for researchers with limited computing capabilities.
To predict student performance in both traditional and online educational systems beyond the COVID-19 period, this work should be extended in the following directions:
  • Applying the deep learning architecture to the period after COVID-19.
  • Researchers could explore feature reduction techniques or alternative models that balance predictive performance and computational efficiency to mitigate computational challenges, such as simpler RNN variants or attention-based models.
  • Increasing the size of the dataset and looking into data augmentation techniques can help prevent overfitting and lead to a more generalized model.
  • The proposed framework could be an example for other developing nations facing similar difficulties due to the COVID-19 pandemic.
  • Deploying pre-tuned models through transfer learning to improve a model’s performance even further.
  • Exploring alternative deep learning architectures like recurrent neural nets (RNNs), GRU, or transformers to enhance efficiency and gather multifaceted student performance data.
  • A combination of synchronous and asynchronous learning opportunities across various subject areas should be a focus of future development.

Author Contributions

Conceptualization, R.A., S.A. (Shafiq Ahmad) and S.A. (Saud Altaf); methodology, R.A., S.A. (Saud Altaf) and M.A.; software, R.A.; validation, R.A., S.A. (Shafiq Ahmad) and I.A.; formal analysis, R.A., S.A. (Shafiq Ahmad) and S.A. (Saud Altaf); investigation, R.A.; resources, S.A. (Saud Altaf); data curation, S.A. (Saud Altaf) and S.A. (Shafiq Ahmad); writing—original draft preparation, R.A., I.A. and M.A.; writing—review and editing, S.A. (Saud Altaf) and S.A. (Shafiq Ahmad); visualization, R.A., S.A. (Saud Altaf); supervision, S.A. (Saud Altaf) and S.A. (Shafiq Ahmad); project administration, M.Z.; funding acquisition, S.A. (Shafiq Ahmad) and M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research has received funding from King Saud University through Researchers Supporting Project number (RSP2023R387), King Saud University, Riyadh, Saudi Arabia.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Pir Mehr Ali Shah Arid Agriculture University, Rawalpindi, Pakistan Research Ethics Committee (PMAS-AAUR/R.Eth/63; 13 January 2023).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets generated during and/or analyzed during the current research are available from the corresponding author upon reasonable request.

Acknowledgments

The authors extend their appreciation to King Saud University for funding this work through Researchers Supporting Project number (RSP2023R387), King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gomede, E.; Gaffo, F.H.; Briganó, G.U.; De Barros, R.M.; Mendes, L.D.S. Application of computational intelligence to improve education in smart cities. Sensors 2018, 18, 267. [Google Scholar] [CrossRef] [Green Version]
  2. Al Absi, S.M.; Jabbar, A.H.; Mezan, S.O.; Al-Rawi, B.A.; Al_Attabi, S.T. An experimental test of the performance enhancement of a Savonius turbine by modifying the inner surface of a blade. Mater. Today Proc. 2021, 42, 2233–2240. [Google Scholar] [CrossRef]
  3. Loganathan, M.K.; Tan, C.M.; Mishra, B.; Msagati, T.A.; Snyman, L.W. Review and selection of advanced battery technologies for post-2020 era electric vehicles. In Proceedings of the 2019 IEEE Transportation Electrification Conference, Bengaluru, India, 17–19 December 2019; pp. 1–5. [Google Scholar]
  4. Yang, S.J.H. Precision education: New challenges for AI in education [conference keynote]. In Proceedings of the 27th International Conference on Computers in Education (ICCE), Kenting, Taiwan, 2–6 December 2019; Asia-Pacific Society for Computers in Education APSCE: Kenting, Taiwan, 2019. [Google Scholar]
  5. Cook, C.R.; Kilgus, S.P.; Burns, M.K. Advancing the science and practice of precision education to enhance student outcomes. J. Sch. Psychol. 2018, 66, 4–10. [Google Scholar] [CrossRef] [PubMed]
  6. Maldonado-Mahauad, J.; Pérez-Sanagustín, M.; Kizilcec, R.F.; Morales, N.; Munoz-Gama, J. Mining theory-based patterns from Big data: Identifying self-regulated learning strategies in Massive Open Online Courses. Comput. Hum. Behav. 2018, 80, 179–196. [Google Scholar] [CrossRef]
  7. Baker, E. (Ed.) International Encyclopedia of Education, 3rd ed.; Elsevier: Oxford, UK, 2010. [Google Scholar]
  8. Siemens, G.; Long, P. Penetrating the fog: Analytics in learning and education. Educ. Rev. 2011, 46, 30. [Google Scholar]
  9. Alsuwaiket, M.; Blasi, A.H.; Al-Msie’deen, R.F. Formulating module assessment for Improved academic performance predictability in higher education. Eng. Technol. Appl. Sci. Res. 2019, 9, 4287–4291. [Google Scholar] [CrossRef]
  10. Garg, R.K.; Bhola, J.; Soni, S.K. Healthcare monitoring of mountaineers by low power wireless sensor networks. Inform. Med. Unlocked 2021, 27, 100775. [Google Scholar] [CrossRef]
  11. Yan, Z.; Yu, Y.; Shabaz, M. Optimization research on deep learning and temporal segmentation algorithm of video shot in basketball games. Comput. Intell. Neurosci. 2021, 2021, 4674140. [Google Scholar] [CrossRef] [PubMed]
  12. Alshareef, F.; Alhakami, H.; Alsubait, T.; Baz, A. Educational Data Mining Applications and Techniques. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 729–734. [Google Scholar] [CrossRef]
  13. Du, X.; Yang, J.; Hung, J.L.; Shelton, B. Educational data mining: A systematic review of research and emerging trends. Inf. Discov. Deliv. 2020, 48, 236–255. [Google Scholar] [CrossRef]
  14. Mahajan, G.; Saini, B. Educational Data Mining: A state-of-the-art survey on tools and techniques used in EDM. Int. J. Comput. Appl. Inf. Technol. 2020, 12, 310–316. [Google Scholar]
  15. Salloum, S.A.; Alshurideh, M.; Elnagar, A.; Shaalan, K. Mining in educational data: Review and future directions. In Advances in Intelligent Systems and Computing; Springer: Berlin/Heidelberg, Germany, 2020; pp. 92–102. [Google Scholar]
  16. Paulsen, M.F.; Nipper, S.; Holmberg, C. Online Education: Learning Management Systems: Global E-Learning in a Scandinavian Perspective; NKI Gorlaget: Oslo, Norway, 2003. [Google Scholar]
  17. Palvia, S.; Aeron, P.; Gupta, P.; Mahapatra, D.; Parida, R.; Rosner, R.; Sindhi, S. Online education: Worldwide status, challenges, trends, and implications. J. Glob. Inf. Technol. Manag. 2018, 21, 233–241. [Google Scholar]
  18. Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef]
  19. Bates, R.; Khasawneh, S. Self-efficacy and college students’ perceptions and use of online learning systems. Comput. Hum. Behav. 2007, 23, 175–191. [Google Scholar]
  20. Means, B.; Toyama, Y.; Murphy, R.; Bakia, M.; Jones, K. Evaluation of Evidence-Based Practices in Online Learning: A Meta-Analysis and Review of Online Learning Studies. In Learning Unbound: Select Research and Analyses of Distance Education and Online Learning; Department of Education, US: Washington, DC, USA, 2012. Available online: https://www2.ed.gov/rschstat/eval/tech/evidence-based-practices/finalreport.pdf (accessed on 5 July 2023).
  21. Dias, S.B.; Hadjileontiadou, S.J.; Diniz, J.; Hadjileontiadis, L.J. DeepLMS: A deep learning predictive model for supporting online learning in the COVID-19 era. Sci. Rep. 2020, 10, 19888. [Google Scholar] [CrossRef] [PubMed]
  22. Dascalu, M.D.; Ruseti, S.; Dascalu, M.; McNamara, D.S.; Carabas, M.; Rebedea, T.; Trausan-Matu, S. Before and during COVID-19: A Cohesion Network Analysis of students’ online participation in moodle courses. Comput. Hum. Behav. 2021, 121, 106780. [Google Scholar]
  23. Bello, G.; Pennisi, M.A.; Maviglia, R.; Maggiore, S.M.; Bocci, M.G.; Montini, L.; Antonelli, M. Online vs live methods for teaching difficult airway management to anesthesiology residents. Intensive Care Med. 2005, 31, 547–552. [Google Scholar] [CrossRef]
  24. Al-Azzam, N.; Elsalem, L.; Gombedza, F. A cross-sectional study to determine factors affecting dental and medical students’ preference for virtual learning during the COVID-19 outbreak. Heliyon 2020, 6, e05704. [Google Scholar]
  25. Chen, E.; Kaczmarek, K.; Ohyama, H. Student perceptions of distance learning strategies during COVID-19. J. Dent. Educ. 2020, 85, 1190. [Google Scholar] [CrossRef]
  26. Abbasi, S.; Ayoob, T.; Malik, A.; Memon, S.I. Perceptions of students regarding E-learning during COVID-19 at a private medical college. Pak. J. Med. Sci. 2020, 36, S57. [Google Scholar] [CrossRef]
  27. Means, B.; Bakia, M.; Murphy, R. Learning Online: What Research Tells Us about Whether, When and How; Routledge: Oxfordshire, UK, 2014. [Google Scholar]
  28. Hooda, M.; Rana, C.; Dahiya, O.; Shet, J.P.; Singh, B.K. Integrating la and EDM for Improving Students Success in Higher Education Using FCN Algorithm. Math. Probl. Eng. 2022, 2022, 7690103. [Google Scholar] [CrossRef]
  29. Göppert, A.; Mohring, L.; Schmitt, R.H. Predicting performance indicators with ANNs for AI-based online scheduling in dynamically interconnected assembly systems. Prod. Eng. 2021, 15, 619–633. [Google Scholar]
  30. Chango, W.; Cerezo, R.; Romero, C. Multi-source and multimodal data fusion for predicting academic performance in blended learning university courses. Comput. Electr. Eng. 2021, 89, 106908. [Google Scholar] [CrossRef]
  31. Mubarak, A.A.; Cao, H.; Hezam, I. Deep analytic model for student dropout prediction in massive open online courses. Comput. Electr. Eng. 2021, 93, 107271. [Google Scholar] [CrossRef]
  32. Fotso, J.E.M.; Batchakui, B.; Nkambou, R.; Okereke, G. Algorithms for the development of deep learning models for classification and prediction of behaviour in MOOCS. In Proceedings of the2020 IEEE Learning with MOOCS (LWMOOCS), Antigua Guatemala, Guatemala, 29 September–2 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 180–184. [Google Scholar]
  33. Brahim, G.B. Predicting student performance from online engagement activities using novel statistical features. Arab. J. Sci. Eng. 2022, 47, 10225–10243. [Google Scholar] [CrossRef]
  34. Atlam, E.S.; Ewis, A.; Abd El-Raouf, M.M.; Ghoneim, O.; Gad, I. A new approach in identifying the psychological impact of COVID-19 on university student’s academic performance. Alex. Eng. J. 2022, 61, 5223–5233. [Google Scholar] [CrossRef]
  35. Abdelkader, H.E.; Gad, A.G.; Abohany, A.A.; Sorour, S.E. An efficient data mining technique for assessing satisfaction level with online learning for higher education students during the COVID-19. IEEE Access 2022, 10, 6286–6303. [Google Scholar] [CrossRef]
  36. Stadlman, M.; Salili, S.M.; Borgaonkar, A.D.; Miri, A.K. Artificial Intelligence Based Model for Prediction of Students’ Performance: A Case Study of Synchronous Online Courses During the COVID-19 Pandemic. J. STEM Educ. Innov. Res. 2022, 23, 39–46. [Google Scholar]
  37. Bansal, V.; Buckchash, H.; Raman, B. Computational intelligence enabled student performance estimation in the age of COVID-19. SN Comput. Sci. 2022, 3, 41. [Google Scholar] [CrossRef]
  38. Justusson, B.I. Median filtering: Statistical properties. In Two-Dimensional Digital Signal Prcessing II: Transforms and Median Filters; Springer: Berlin/Heidelberg, Germany, 2006; pp. 161–196. [Google Scholar]
  39. Shi, L.; Jianping, C.; Jie, X. Prospecting information extraction by text mining based on convolutional neural networks—A case study of the Lala copper deposit, China. IEEE Access 2018, 6, 52286–52297. [Google Scholar]
  40. Qiu, F.; Zhang, G.; Sheng, X.; Jiang, L.; Zhu, L.; Xiang, Q.; Jiang, B.; Chen, P.K. Predicting students’ performance in e-learning using learning process and behaviour data. Sci. Rep. 2022, 12, 453. [Google Scholar] [PubMed]
  41. Asad, R.; Rehman, S.U.; Imran, A.; Li, J.; Almuhaimeed, A.; Alzahrani, A. Computer-Aided Early Melanoma Brain-Tumor Detection Using Deep-Learning Approach. Biomedicines 2023, 11, 184. [Google Scholar] [PubMed]
  42. Punlumjeak, W.; Rachburee, N. A comparative study of feature selection techniques for classify student performance. In Proceedings of the 2015 7th International Conference on Information Technology and Electrical Engineering (ICITEE), Chiang Mai, Thailand, 29–30 October 2015; pp. 425–429. [Google Scholar]
  43. Manna, T.; Anitha, A. Precipitation prediction by integrating Rough Set on Fuzzy Approximation Space with Deep Learning techniques. Appl. Soft Comput. 2023, 139, 110253. [Google Scholar]
  44. Ajibade SS, M.; Ahmad, N.B.; Shamsuddin, S.M. An heuristic feature selection algorithm to evaluate academic performance of students. In Proceedings of the 2019 IEEE 10th Control and System Graduate Research Colloquium (ICSGRC), Shah Alam, Malaysia, 2–3 August 2019; pp. 110–114. [Google Scholar]
  45. Jalota, C.; Agrawal, R. Feature selection algorithms and student academic performance: A study. In Proceedings of the International Conference on Innovative Computing and Communications: Proceedings of ICICC, 2021, Delhi, India, 20–21 February 2021; Springer: Delhi, India; Volume 1, pp. 317–328. [Google Scholar]
  46. Asad, R.; Altaf, S.; Ahmad, S.; Shah Noor Mohamed, A.; Huda, S.; Iqbal, S. Achieving Personalized Precision Education Using the Catboost Model during the COVID-19 Lockdown Period in Pakistan. Sustainability 2023, 15, 2714. [Google Scholar]
  47. Asad, R.; Altaf, S.; Ahmad, S.; Mahmoud, H.; Huda, S.; Iqbal, S. Machine Learning-Based Hybrid Ensemble Model Achieving Precision Education for Online Education Amid the Lockdown Period of COVID-19 Pandemic in Pakistan. Sustainability 2023, 15, 5431. [Google Scholar]
  48. Nayani, S.; P, S.R. Combination of Deep Learning Models for Student’s Performance Prediction with a Development of Entropy Weighted Rough Set Feature Mining. Cybern. Syst. 2023, 1–43. [Google Scholar] [CrossRef]
  49. Zuhri, B.; Harani, N.H.; Prianto, C. Probability Prediction for Graduate Admission Using CNN-LSTM Hybrid Algorithm. Indones. J. Comput. Sci. 2023, 12, 1105–1119. [Google Scholar] [CrossRef]
  50. Kukkar, A.; Mohana, R.; Sharma, A.; Nayyar, A. Prediction of student academic performance based on their emotional wellbeing and interaction on various e-learning platforms. Educ. Inf. Technol. 2023, 1–30. [Google Scholar] [CrossRef]
  51. Poudyal, S.; Mohammadi-Aragh, M.J.; Ball, J.E. Prediction of student academic performance using a hybrid 2D CNN model. Electronics 2022, 11, 1005. [Google Scholar] [CrossRef]
Figure 1. Steps involved in the data mining process.
Figure 2. Proposed hybrid framework based on deep learning techniques to predict online higher education students’ academic performance.
Figure 3. Data pre-processing methods.
Figure 4. 1D convnet (CNN) layers.
Figure 5. Data distribution for the valuable characteristics collected from students.
Figure 6. Impact of various factors on students’ results.
Figure 7. Taxonomy of the proposed work.
Figure 8. Graphical representation of input features.
Figure 9. Model configuration.
Figure 10. Classification of data points into safe and at-risk classes.
Figure 11. Training and validation accuracy.
Figure 12. Training and validation loss.
Figure 13. Confusion matrix for the deep learning framework.
Table 1. Comparison of extant techniques employed in several previous studies.

| Paper | Contribution | Technique | Results | Limitations |
| [28] | Proposed model for student assessment and feedback | Improved FCN | 84% accuracy | Lower model accuracy. |
| [29] | Proposed an efficient performance assessment model | ANN | 95% accuracy | A greedy approach needs to be considered for better outcomes. |
| [30] | Multimodal model for student assessment | Data fusion | Successfully predicted the performance of learners | Lack of semantic feature extraction. |
| [31] | Proposed hybrid model for assessment of students | LSTM and CNN | Successfully predicted student drop-out | Misclassification in data. |
| [32] | Predicted learner behavior using deep learning | RNN, GRU, LSTM | Successful in predicting student behavior | Few behavioral features are considered. |
| [33] | ML model for student performance prediction | RF, NB, SVM, MLP, and LR | Achieved 97% accuracy | Feature extraction is done poorly. |
| [34] | Model for predicting psychological health of students | LR, SVC, DT, AdaBoost, and XGB | Models performed efficiently except for AdaBoost | Model overfitting and longer computation time. |
| [35] | Proposed model for assessment of student satisfaction | KNN and SVM | Both classifiers performed well | KNN classifier takes more time to learn. |
| [36] | Identification of student learning behavior | Ensemble Learning, SVM, RF, DT, LR, and KNN | Ensemble Learning achieved 84% accuracy | Small dataset. |
| [37] | Automated system for learner assessment | LR, RF, XGBoost, Extra Tree, KNN, and MLP | Extra Tree showed the highest performance | Model overfitting. |
Table 2. Survey questions involved in the dataset (columns 5–1 give the counts of learners’ feedback per response level).

| Characteristics | 5 | 4 | 3 | 2 | 1 |
| Mentors were guiding properly. | 1540 | 3990 | 2055 | 1800 | 1615 |
| Lectures were taken timely. | 2365 | 6950 | 845 | 380 | 460 |
| Free time was available. | 2820 | 4750 | 1360 | 1095 | 975 |
| Avail of the feedback option after the lecture. | 2190 | 7095 | 975 | 325 | 415 |
| Book reading habit. | 4120 | 5770 | 145 | 555 | 310 |
| Made proper notes during the lecture. | 2845 | 3910 | 1280 | 1500 | 1465 |
| Stress while revising lectures. | 2880 | 4610 | 1350 | 870 | 1290 |
| Knowledge retaining ability. | 2040 | 4190 | 2155 | 1850 | 765 |
| Having healthy relationships with family members. | 2600 | 6220 | 277 | 1018 | 885 |
| Enough income. | 3160 | 5345 | 720 | 660 | 1115 |
| Practicals are conducted weekly. | 1930 | 7434 | 525 | 772 | 339 |
| Strong internet connection. | 2372 | 7550 | 248 | 450 | 380 |
| Healthy diet pattern. | 1445 | 7660 | 540 | 1185 | 170 |
| Exercise daily. | 1372 | 8429 | 135 | 623 | 441 |
| Last semester’s GPA was fine. | 1888 | 6656 | 576 | 1360 | 520 |
| The quiz was taken weekly. | 4189 | 5170 | 622 | 576 | 443 |
| Assignments are uploaded regularly. | 2814 | 5889 | 544 | 1432 | 421 |
| Used social media applications. | 3859 | 5590 | 966 | 240 | 345 |
| Involved in social gatherings. | 1540 | 3990 | 2055 | 1800 | 1615 |
| Attended lecture attentively. | 2365 | 6950 | 845 | 380 | 460 |
| Did a part-time job during studies. | 1445 | 7660 | 540 | 1185 | 170 |
| The nature of the job was online. | 1372 | 8429 | 135 | 623 | 441 |
| The presentation was given online. | 1888 | 6656 | 576 | 1360 | 520 |
| Decision Label | SA | A | N | D | SD |

Note: SA = Strongly Agree; A = Agree; N = Neutral; D = Disagree; SD = Strongly Disagree.
Table 3. Detailed performance analysis for both classes.

| Classification | Accuracy | Precision | F1-Score | Recall |
| Safe | 98.8% | 99.4% | 98.1% | 97.6% |
| At Risk | 98.2% | 97.9% | 97.4% | 97.5% |
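The per-class figures in Table 3 follow the standard confusion-matrix definitions (precision = TP/(TP + FP), recall = TP/(TP + FN), F1 = the harmonic mean of the two). The minimal sketch below shows how such values are computed; the 2 × 2 matrix in it is a hypothetical placeholder for illustration, not the study’s actual counts.

```python
# Minimal sketch: per-class precision, recall, and F1 from a confusion matrix.
# The counts below are hypothetical placeholders, not the study's data.
import numpy as np

# Rows = true class, columns = predicted class; class order: [Safe, At Risk].
cm = np.array([[980, 20],
               [15, 985]])

for i, label in enumerate(["Safe", "At Risk"]):
    tp = cm[i, i]
    fp = cm[:, i].sum() - tp   # other classes predicted as class i
    fn = cm[i, :].sum() - tp   # class i predicted as other classes
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{label}: precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```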
Table 4. Comparison of some recent studies with the deep learning framework.

| Paper | Technique | FS Algorithm | Chosen Attributes | Accuracy |
| [42] | KNN, DT, NB, ANN & SVM | GA | 10 | 91.12% |
| [44] | NB, KNN, DT & DISC | SBS, SFS & DE | 6 | 83.09% |
| [45] | ANN, AdaBoost & SVM | WFS & CFS | 9 | 91% |
| [46] | CatBoost | Pearson Correlation Coefficient | 15 | 96.8% |
| [47] | KNN, DT, SVM, NB & LR | HHO, PSO & HGSO | 25 | 98.6% |
| Our work | LSTM, CNN | 1D-CNN | 28 | 98.8% |
Table 5. Comparison with existing deep learning models.

| Paper | Technique | FS Algorithm | Accuracy |
| [48] | CNN + RNN | Galactic Rider Swarm Optimization | 94% |
| [49] | CNN + LSTM | CNN | 80.52% |
| [50] | LSTM + RF + GB | LSTM | 96% |
| [51] | 2D CNN | CNN | 88% |
| Our work | CNN + LSTM | 1D-CNN | 98.8% |
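For readers who want to reproduce a comparable baseline, the sketch below shows one common way to stack a 1D CNN in front of an LSTM for binary classification of tabular records, as in the “Our work” rows of Tables 4 and 5. It is a minimal illustration under stated assumptions (Keras/TensorFlow; layer widths chosen arbitrarily), not the authors’ exact configuration; only the input width of 28 selected attributes is taken from Table 4.

```python
# Minimal sketch of a hybrid 1D-CNN + LSTM binary classifier (safe vs. at-risk).
# Layer sizes here are illustrative assumptions, not the paper's reported settings.
import tensorflow as tf
from tensorflow.keras import layers, models

n_features = 28  # number of selected attributes reported in Table 4

model = models.Sequential([
    tf.keras.Input(shape=(n_features, 1)),                # each record as a 1D sequence
    layers.Conv1D(64, kernel_size=3, activation="relu"),  # local feature extraction
    layers.MaxPooling1D(pool_size=2),                     # downsample feature maps
    layers.LSTM(32),                                      # model sequential dependencies
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                # P(at-risk)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```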
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
