Increased Digital Resource Consumption in Higher Educational Institutions and the Artificial Intelligence Role in Informing Decisions Related to Student Performance

As education is an essential enabler in achieving the Sustainable Development Goals (SDGs), it should “ensure inclusive, equitable quality education, and promote lifelong learning opportunities for all”. One of the frameworks for SDG 4 is to propose the concepts of “equitable quality education”. In the context of SDG 4, artificial intelligence (AI) is a booming technology that is gaining interest as a means of understanding student behavior and assessing student performance. AI holds great potential for improving education, as it has started to produce innovative teaching and learning approaches that create better learning. To provide better education, data analytics is critical. AI and machine learning approaches provide rapid solutions with high accuracy. This paper presents an AI-based analytics tool created to predict student performance in a first-year Information Technology literacy course at The University of the South Pacific (USP). A Random Forest based classification model was developed which predicted the performance of the student in week 6 with an accuracy value of 97.03%, sensitivity value of 95.26%, specificity value of 98.8%, precision value of 98.86%, Matthews correlation coefficient value of 94% and Area Under the ROC Curve value of 99%. Hence, such a method is very useful in predicting student performance early in their courses, allowing for early intervention. During the COVID-19 outbreak, the experimental findings demonstrate that the suggested prediction model satisfies the required accuracy, precision, and recall factors for forecasting the behavioural elements of teaching and e-learning for students in virtual education systems.


Introduction
Artificial intelligence (AI), connectivity (the Internet of Things), information digitisation, additive manufacturing (such as 3D printing), virtual or augmented reality, machine learning, blockchain, robotics, quantum computing, and synthetic biology are all examples of areas where the digital revolution can help to facilitate the Sustainable Development Goals (SDGs) [1,2]. Similarly, the digital transformation will fundamentally affect many aspects of global communities and economies, resulting in a shift in how the sustainability paradigm is interpreted. Digitalization is a key driver of disruptive, multiscalar change, not just a "tool" for resolving sustainability concerns. The digital revolution is already reshaping leisure, work, education, behaviour, and governance. Generally, these contributions can boost labour, energy, resource, and carbon productivity, as well as cut production costs, improve service access, and dematerialise production [2]. The rapid increase in digital resources has influenced the education sector in its efforts to achieve the SDGs. However, it remains debatable whether these efforts have succeeded in changing curricula and teaching methods to be more sustainable. The concept and understanding of sustainability are crucial to the development of acceptable educational pedagogies, their implementation, and their capacity to provide what they are created for.
Equitable quality education (EQE) is one of the key drivers of a country's economic prosperity and supports sustainability. Exponential growth in enrolment in Higher Education Institutions (HEIs) has been observed in the past twenty years [2], as a result of the perceived importance of further education in career development and opportunities. In contrast to the students historically entering tertiary education, a shift in student demographics has been recorded, with an increasingly heterogeneous student population taking multi-modal course deliveries [3]. With the increase in student numbers, the demand from learners for state-of-the-art services and resources has also escalated [4].
Competition amongst HEIs is stiff, as they strive to attract students to take their programmes. With the growing consumerism in higher education, the criteria students use to choose an HEI are more complex, covering factors such as the delivery of the service, reputation and the likelihood of getting a better career alongside the traditional socio-economic factors [5]. While student enrolment is dependent on the reputation and attractions on offer, student satisfaction and success are the impetus that drive student retention. Thus, factors that lead to student success are given increasing importance. HEIs use various measures of student success and attainment, ranging from cross-sectional and longitudinal data measuring student progress and completion rates for courses and programmes to the success of their alumni [6]. Universities strive to maximize successful completion of courses and programmes with student support services, tools and technologies that have been shown to enhance student learning. This warrants the use of new and innovative pedagogies to captivate interest and maximize the potential of the learners.
Today, education relies heavily on ICT, with new tools in the field of higher education [7,8,9]. For instance, distance learning is not a challenge anymore, with off-campus students accessing learning resources using e-learning and m-learning tools [10,11,12,13,14]. In addition to this, AI is gaining interest as it can be used to execute tasks normally associated with human intelligence [15,16], such as social networking applications for learning, speech recognition, learning management systems (LMS), decision-making, cloud learning services, visual perception, mobile learning applications and translating languages [8,9]. Currently, most universities are livestreaming lectures and offering full courses online. Massive Open Online Courses (MOOCs) have made higher education courses from some of the world's most prominent universities available to anybody with a reasonable Internet connection anywhere in the world [17]. Virtual reality will increasingly allow students to participate in field trips and obtain practical experience without ever leaving the classroom or their homes. Through Internet platforms such as chegg.com, students have access to "personal" instructors 24 hours a day, from anywhere on the globe [18,19]. Textbooks, school libraries, and even consolidated campus attendance are all on the decline.
In conjunction with SDG 4, which aims to "provide inclusive and equitable quality education and encourage life-long learning opportunities for everyone", the digital revolution in education will undoubtedly enhance access to high-quality education throughout the world [1]. However, in order to do this, the essential broadband and energy infrastructure must be supplied simultaneously in poor countries and rural places. The rapid digital revolution of education will have an influence on our cities' structure and social connections. The necessity for centralised campuses and accompanying infrastructure will shrink as education is increasingly given remotely, allowing students to learn from home, either individually or via "virtual classes".
Student performance is one of the most essential elements for any learning institution [20]. Student enrolment and attendance records, as well as their examination results, are the most conventional form of data mining (DM) in higher education institutions [11,20,21,22]. In this age of big data, education data mining (EDM) is an interdisciplinary field where machine learning, statistics, DM, psycho-pedagogy, information retrieval, cognitive psychology and recommender systems methods and techniques are used in various educational data sets to resolve educational issues [23]. The phenomenon surrounding EDM is illustrated in Figure 1. To date, little has been done in EDM using AI in education in the developing world. In the current dynamic status of EDM, numerous studies have been carried out in relation to different typologies of DM in educational environments [23,24,25].
The use of AI in the educational environment is imperative because it can contribute significantly to the improvement of the teaching and learning processes, as well as encourage the process of knowledge construction [15,16,17]. Based on the results of a report on the sustainability of higher education and TEL [26], we need to be very careful when identifying the necessary conditions for technology to assist and not obstruct teaching and learning.
This research is designed to model an AI-based predictor for student performance in a higher education online course. The significant contributions of the paper are as follows:
• A framework for an AI-based student performance predictor is proposed,
• Digital resources are used in informing decisions related to student performance,
• AI prediction for student performance is designed and analyzed for a first-year IT literacy course at The University of the South Pacific (USP).
In this work, the main focus was to achieve better accuracy when compared with previous research [20] and the early prediction of student performance by employing AI in EDM. A Random Forest (RF) classifier model is applied to the data set and an accuracy of 97.03% was achieved at week 6.
The paper is organized as follows. Section 2 summarizes the literature with the current direction, the role of digital learning, and the involvement of AI in HEI. Section 3 provides the design and architecture of the developed model (i.e., intelligent Early Warning System (iEWS)). The methodology used to predict student performance is presented in Section 4. Section 5 provides the results and discussion, and the conclusion and research suggestions are provided in Section 6.

Types of Early Warning Systems
A substantial body of research shows that progressive trends for students are significant contributors to student performance in online learning. There are several methods associated with EWS. One of the most common techniques is the use of statistical analysis to predict performance. Until recently, statistical approaches were largely applied in educational institutes to understand potential student pass/fail and dropout rates. More recently, different approaches have been combined to show better performance in EDM. Different predictive techniques are used in order to have a better prediction rate. Different classification methods are applied for given data sets. Figure 2 depicts the graphical representation for a list of the common methods used for EWS for student performance prediction.

Figure 2. List of the common methods and attributes used in EWS as predictive tools.

The Evolution of EWS in Higher Education
The topic of predictive algorithms is often regarded as the most relevant field of study within the data analytics discipline. EWS is widely used in various fields of study and has more recently impacted the education sector [20,27,28]. One of the prime reasons for applying EWS is that universities use it to track student progress and recognize students at risk of failing a course or dropping out of a course or programme [11,29]. Various techniques are being proposed, applied and tested, and there are many advanced tools available in the literature which have better predictive accuracy in the field of EDM [30,31]. One of the pre-processing algorithms of EDM is known as clustering [32]. Interestingly, DM is one of the most popular techniques and is widely applied in education to analyse student performance [25,32]. EWS has been used widely in secondary schools in the United States for many years. It has been used to track student success in schools and to identify measures that predict the likelihood of dropping out of school [33,34]. The features and variables that were collected for EWS were based on demographic/historical data, ongoing test results and the use of LMS. Once the EWS identifies an at-risk student, the teacher has the option of providing corrective measures, which include displaying different alert signals on a student's Moodle page and sending alert messages via e-mail or text message [29,30]. Additionally, students were allowed to get a referral to an academic advisor to address the problem faced in a particular course.
There has been an increase in different types of prediction models used in learning analytics. According to [35], analytical researchers are trying to predict with better accuracy and employing different classification tools to compare accuracies. The common classification algorithms, such as EM, C4.5, Naive Bayes Classifier, Support Vector Machines, K-nearest neighbor [29], neural network models [36], and decision tree methods [37] are also employed.
In most cases, the analysis is performed to predict whether a student will pass or fail a course based on the binary response variable 'pass/fail'. A key methodological point in predictive models is that the analysis is usually performed on a single course rather than across several courses. As a systemic approach, model features and response variables are used to classify at-risk students, but for a prediction model [29], the beginning of the semester is too early to identify at-risk students. It is often difficult to compare the studies and identify which study has obtained the most accurate results.
Azcona and Casey [37] argue that a single course analysis is more efficient in terms of accuracy. This may be because each course is structured differently and, therefore, the features used for classification will not be the same across courses. In a similar study by Ognjanovic et al. [38], it was evident that predictive models could be applied to multiple courses. However, they noted that the inherent differences in disciplines caused specific variables to be strong for some courses and weak for other courses. Hence, the nature of the course should be considered before selecting variables for an early warning system.

AI in Early Warning Systems
The involvement of AI in previous years has attracted several controversial remarks [15,16]. The use of AI alongside computing power, DM and Big Data technologies appears to be a more advanced tool for predicting with better accuracy [15,32]. As mentioned earlier, AI offers better classification tools for improving prediction accuracy in EDM. Ognjanovic et al. [38] and Andriessen et al. [39] examined AI methods used in learning platforms and the relationship between education and AI, respectively. To add to this, academic performance in game-based learning strategies was studied by Stojanovska et al. [40]. They also studied flip teaching techniques and video conferencing sessions by mining personality traits, learning style and satisfaction. Basavaraju et al. [41] proposed a supervised learning study using an Android app. Table 1 shows the different research carried out in the field of AI and DM methods and their accuracies. An EDM study was carried out in which students' behavioural features were used to model the system. The system yielded 22.1% accuracy, and later, using an ensemble method, they noticed there was an increase in the accuracy of 25.8% [42].
A Deep Neural Network (DNN) was used to analyze student performance using the Keras library. The authors used online data sets and achieved 83.4% accuracy, and the quality of the classifier was measured by cost function and accuracy [43]. In 2017, a Recurrent Neural Network (RNN) was implemented to predict students' performance from logged data for 108 students. The predicting feature used was the log data of an LMS and the results revealed a 90% accuracy [44]. A review of predicting student performance using DM methods showed that Neural Network and Decision Tree had achieved an accuracy of 98% and 91%, respectively [45]. A prediction model was developed using an Artificial Neural Network (ANN). The work was designed to predict the Cumulative Grade Point Average of students. The academic datasets were modelled at one of the universities in Bangladesh. The authors performed a comparison test between the predicted and original grades. The highest accuracy was 99.98% and the Root Mean Square Error of the work was 0.176546 [46].

Design and Architecture of Intelligent Early Warning System (iEWS) Model
This study retrieved complete online interaction data for undergraduate students of a fully online first year course, Communication Information Literacy, at the USP for one semester. The USP uses Moodle LMS where all the online, face-to-face and blended courses are hosted. Moodle requires user authentication to access the registered courses for a particular student and detailed interactions for each student for the course are recorded in the Moodle database including system login, logout, material access, assignment submission, discussion forum activities, score package records, quiz activities and numerous other activities and resource data. All these data are stored on individual activity/resource table and all other interactions in the course are stored in a log table.
An EWS (Student Alert Moodle Plugin) developed by the Faculty of Science, Technology and Environment at the USP was implemented in week 4 of the semester in the course [11]. The data from the EWS plugin were used to extract features to develop the iEWS predictor. The architecture is shown in Figure 3 and the process flow is discussed in Table 2.
Table 2. Process flow of the iEWS.
Step 4: EWS data are extracted, and data pre-processing is done (data cleaning and extraction of EWS features).
Step 5: EWS features are used to develop the iEWS predictor.
Step 6: The iEWS predictor is tested with the test data.
Step 7: If iEWS predicts a student to fail, then the teacher sets strategies for these students.

Methodology
This study discusses the proposed predictor called iEWS, which uses students' EWS data based on online course login, interaction and completion for as early as Week 6 to predict if the student will pass or fail a course. The following sections discuss the dataset, data cleaning and extraction of features, statistical measures and validation scheme used to measure the performance and RF classifier used for prediction.

Dataset
On implementation of EWS in week 4, the completion rates, interaction rates and average logins per week increased. Completion rate is based on the number of the course activities mentioned earlier that were completed by the students in each week. EWS data collection started in week 4, after the EWS was implemented for weekly/fortnightly intervals. In this research, a total of 1523 student data-sets were used, in which 1271 students passed (positive samples) and 252 students failed (negative samples).

Features
The following attributes from EWS plugin were used for this study:

Reducing the Imbalance between Classes
After investigating the dataset, it was clear that the number of positive samples (students passed) was much larger than the number of negative samples (students failed). This resulted in a high class imbalance in the dataset.
The k-nearest neighbour technique was employed to reduce the imbalance of the dataset by removing redundant positive samples. The Euclidean distance between all the samples in the dataset was calculated. Firstly, the cut-off was set by dividing the number of positive instances by the number of negative instances (1271/252), which gives a ratio of 5.04; thus K = 5 was set. This implies that a positive sample was removed if at least one negative sample existed within its five nearest neighbours. After this initial filtering, the classes were still imbalanced; therefore, the K value was increased step by step until both sets were approximately similar in size. This method eventually reduced the initial 1271 positive samples to 256 with a threshold value of 29 (K = 29), which implies that a positive sample was removed if at least one negative sample existed within its 29 nearest neighbours. The negative instances were not changed and remained at 252. The final dataset after filtering (filtered positive samples and the unchanged negative samples) was used to carry out 6-, 8- and 10-fold cross-validation and assess the predictor's performance.
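The filtering procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the function names (`knn_filter`, `balance`), the `tol` stopping tolerance, and the choice to always search neighbours in the full original dataset are hypothetical. The removal rule follows the statement given for K = 29 (a positive sample is dropped when at least one negative sample is among its K nearest neighbours).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_filter(positives, negatives, k):
    """Keep only the positive samples whose k nearest neighbours are all positive."""
    kept = []
    for i, p in enumerate(positives):
        # Distances from p to every other sample, tagged with the class label.
        dists = [(euclidean(p, q), "pos") for j, q in enumerate(positives) if j != i]
        dists += [(euclidean(p, n), "neg") for n in negatives]
        dists.sort(key=lambda t: t[0])
        if all(label == "pos" for _, label in dists[:k]):
            kept.append(p)
    return kept

def balance(positives, negatives, k_start, tol=0.05):
    """Increase k until the kept positives are roughly as few as the negatives.

    `tol` is an assumed tolerance: stop once
    len(kept) <= (1 + tol) * len(negatives).
    """
    k = k_start
    max_k = len(positives) + len(negatives) - 1
    kept = knn_filter(positives, negatives, k)
    while len(kept) > (1 + tol) * len(negatives) and k < max_k:
        k += 1
        kept = knn_filter(positives, negatives, k)
    return kept, k
```

With the counts reported in the text, `k_start` would be 5 (from 1271/252). The negative samples are passed through unchanged, as in the paper.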

Tool
MATLAB ® software was used to carry out data pre-processing, feature extraction, reducing the imbalance between classes, splitting the data set into "N" folds of approximately equal sample size with similar positive and negative counts, and creating a Weka data format (ARFF) file for the Weka classifiers. Weka, developed by the University of Waikato in New Zealand, was used for classification and performance assessment [61,62].
The code was written in Java to train and test a set of classifiers provided by Weka, for which performance assessment was carried out for different "N" folds. NetBeans IDE was used for the Java code, and the Weka.jar library was downloaded from (http://www.cs.waikato.ac.nz/ml/weka/snapshots/weka_snapshots.html (accessed on 15 October 2021)) and referenced in the Java project to access and run the required Weka classifiers [62]. Different classifiers were trained and tested to select the best classifier for the iEWS predictor, based on the performance of each of the classifiers stated below.
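As an illustration of the ARFF step, the sketch below writes a small dataset in Weka's ARFF text format. It is only a sketch: the authors used MATLAB for this step, the helper name `to_arff` and the relation name are hypothetical, the attribute names are taken from the three attributes named in the Results section, and the sample values are made up.

```python
def to_arff(relation, attributes, class_values, rows):
    """Return an ARFF document with numeric attributes and one nominal class."""
    lines = [f"@relation {relation}", ""]
    for name in attributes:
        lines.append(f"@attribute {name} numeric")
    # The nominal class attribute lists its allowed values in braces.
    lines.append("@attribute class {" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    for *features, label in rows:
        lines.append(",".join(str(v) for v in features) + f",{label}")
    return "\n".join(lines) + "\n"

# Illustrative values only; these are not data from the study.
example = to_arff(
    "iews_week6",
    ["avgcomprate", "avglogin", "courseworkscore"],
    ["pass", "fail"],
    [(0.9, 4.5, 35.0, "pass"), (0.2, 1.0, 10.0, "fail")],
)
```

The resulting string can be saved with a `.arff` extension and loaded by Weka.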

Classifier
C4.5 (J48) is an algorithm used to generate a decision tree for classification in different applications [63]. PART is a partial decision tree algorithm, developed from the C4.5 and RIPPER algorithms [64]. A decision table represents conditional logic with a list of tasks depicting business rules that can be used with the same number of conditions, which makes it different from the decision tree [65]. One Rule (OneR) is a simple classification algorithm that creates one rule for each predictor in the data and then selects the rule with the minimum error rate [66]. A decision stump consists of a one-level decision tree and uses only a single attribute for splitting [67]. Logistic regression is a statistical model which uses a logistic function to model and predict the probability of an outcome that can have two values or binary classes [68]. The Sequential Minimal Optimization (SMO) algorithm solves the quadratic programming (QP) problem that arises during the training of a Support Vector Machine (SVM) [69]. The multilayer perceptron (MLP) is a type of neural network which has a similar structure to a single-layer perceptron, with one or many hidden layers and two phases [70].

RF
RF and decision trees are well-known supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. RF has been used in many other similar studies [24,48,49,50,55,57,58,59,60] and it gives high accuracy as shown in Figure 3. RF is an ensemble approach that comprises many decision trees. As each tree grows, the split at each level is determined from a candidate feature set by an optimality rule. The candidate feature set is a random subset of all features, which is distinct at each tree level. RF classification is an ensemble method, that is, an approach consisting of not just one but several classifiers. In practice, hundreds of classifiers are built within an RF, and their decisions are commonly combined by plurality vote. The underlying idea is that a combination of classifiers is often more reliable than any single member of the ensemble [71,72], avoiding conflicts among subsets of features. RF classification is, therefore, commonly used for remotely sensed imagery processing. The common element in all of these procedures is that, for the k-th tree, a random vector $\theta_k$ is generated independently of the prior random vectors $\theta_1, \ldots, \theta_{k-1}$ but with the same distribution; the tree is grown using the training set and $\theta_k$, resulting in a classifier $h(x, \theta_k)$, where $x$ is an input vector [71]. The general expression to predict the class of an observation is:
$$\hat{Y} = \arg\max_{Y} \sum_{i=1}^{k} I\big(h_i(X, \theta_i) = Y\big)$$
where $\arg\max_{Y}$ denotes the value of the output variable $Y$ that maximizes $\sum_{i=1}^{k} I(h_i(X, \theta_i) = Y)$, $I(\cdot)$ is the indicator function, and $h_i(X, \theta_i)$ is a single decision tree.
The classifier comprises various trees which are uniformly assembled by pseudorandomly selecting subsets of feature vector components, that is, trees are assembled in randomly picked subspaces that preserve the maximum precision of training data and increase the accuracy of generalization as it increases in complexity [73].
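The plurality vote that combines the individual trees can be illustrated with a short sketch. The three "trees" below are hand-written threshold rules with made-up cut-offs, used only to show the voting step; in an actual RF each tree is learned from the training set and its own random vector, and the study itself used Weka's implementation.

```python
from collections import Counter

def majority_vote(trees, x):
    """Return the class predicted by the largest number of trees for input x."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for learned trees; x = (avgcomprate, avglogin, courseworkscore).
toy_trees = [
    lambda x: "pass" if x[0] >= 0.5 else "fail",
    lambda x: "pass" if x[1] >= 3.0 else "fail",
    lambda x: "pass" if x[2] >= 20.0 else "fail",
]

prediction_a = majority_vote(toy_trees, (0.8, 2.0, 25.0))  # votes: pass, fail, pass
prediction_b = majority_vote(toy_trees, (0.3, 1.0, 30.0))  # votes: fail, fail, pass
```

Using an odd number of trees for a binary problem avoids tied votes in this sketch.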

Statistical Measures
To evaluate the performance of the proposed predictor and compare it with existing predictors, five measures were employed in this work: sensitivity (Sn), specificity (Sp), accuracy (Acc), precision (Pre) and Matthews correlation coefficient (MCC).
Sensitivity assesses the proportion of passed students that are correctly identified. A sensitivity of 1 demonstrates an accurate predictor which is able to predict the positive instances of the dataset (students passed), whereas a sensitivity equal to 0 shows that the predictor is unable to identify the students who passed. The metric for sensitivity is defined as:
$$Sn = \frac{P^{+}}{P^{+} + P^{-}}$$
where $P^{+}$ is the number of passed students predicted correctly and $P^{-}$ is the number of passed students incorrectly classified by the predictor.
On the other hand, specificity assesses the proportion of failed students that are correctly identified. A specificity of 1 demonstrates an accurate predictor which is able to predict the negative instances of the dataset (students failed), whereas a specificity equal to 0 shows that the predictor is unable to identify the students who failed. The metric for specificity is defined as:
$$Sp = \frac{F^{+}}{F^{+} + F^{-}}$$
where $F^{+}$ is the number of failed students predicted correctly and $F^{-}$ is the number of failed students incorrectly classified by the predictor.
Accuracy evaluates how well the predictor distinguishes between positive and negative samples. An accuracy equal to 1 shows an accurate predictor, whereas an accuracy of zero means the predictor is completely incorrect. Accuracy is calculated as:
$$Acc = \frac{P^{+} + F^{+}}{P + F}$$
where $P$ and $F$ are the total numbers of passed and failed students, respectively.
Precision is another assessment measure of the predictor, defined as the ratio of the number of correctly identified passed students to the number of students predicted as passed, that is, the correctly classified passed students plus the failed students incorrectly classified as passed:
$$Pre = \frac{P^{+}}{P^{+} + F^{-}}$$
The final statistical measure used in this paper is the Matthews correlation coefficient (MCC). It gives the correlation coefficient between the predicted and observed instances. The MCC metric is calculated as:
$$MCC = \frac{P^{+} F^{+} - P^{-} F^{-}}{\sqrt{(P^{+} + F^{-})(P^{+} + P^{-})(F^{+} + F^{-})(F^{+} + P^{-})}}$$
The best predictor is one that achieves high performance in all five statistical measures discussed. At a minimum, it should perform better than the existing predictors in some of the measures. A predictor that is unable to predict passed or failed students correctly cannot be used for prediction.
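For concreteness, the five measures can be computed from the four counts as in the sketch below. It assumes the standard confusion-matrix definitions, with passed students as the positive class: `p_plus` and `p_minus` are passed students classified correctly and incorrectly, and `f_plus` and `f_minus` are failed students classified correctly and incorrectly. The function name and the example counts are hypothetical and are not results from the study.

```python
import math

def evaluate(p_plus, p_minus, f_plus, f_minus):
    """Compute Sn, Sp, Acc, Pre and MCC; assumes both classes are non-empty."""
    sn = p_plus / (p_plus + p_minus)            # passed students found
    sp = f_plus / (f_plus + f_minus)            # failed students found
    acc = (p_plus + f_plus) / (p_plus + p_minus + f_plus + f_minus)
    pre = p_plus / (p_plus + f_minus)           # predicted "pass" that really passed
    denom = math.sqrt(
        (p_plus + f_minus) * (p_plus + p_minus)
        * (f_plus + f_minus) * (f_plus + p_minus)
    )
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (p_plus * f_plus - p_minus * f_minus) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "Pre": pre, "MCC": mcc}

# Illustrative counts only.
scores = evaluate(p_plus=90, p_minus=10, f_plus=80, f_minus=20)
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, which makes it informative when the two classes differ in size.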

Validation Scheme
The effectiveness of a new predictor needs to be assessed with a validation method. Two of the most commonly used ones are the jackknife and n-fold validation schemes [23,73]. In the validation phase, an independent test set has to be used to assess the predictor. The jackknife validation is less arbitrary than the n-fold cross-validation and provides unique results for a dataset. As per the literature [74,75], the same validation scheme (n-fold cross-validation) was used in this study. The n-fold cross-validation technique was carried out in the steps listed in Table 3 and shown in Figure 4.
Table 3. Steps of the n-fold cross-validation.
Step 1: Split the pre-processed data set into n folds of approximately equal sample size with similar numbers of positive and negative samples in each.
Step 2: Separate one of the folds as an independent test set and use the other n-1 folds as training data.
Step 3: Train the model with the training data and adjust the parameters of the predictor.
Step 4: Use the independent test set (from step 2) to validate the predictor by computing all the statistical measures.
Step 5: Repeat steps 2 to 4 for the other folds until all n folds have been used for validation, then calculate the average of each statistical measure over the n folds and record the result.
In this study, 6-, 8- and 10-fold cross-validations were conducted to assess the iEWS predictor and the results were recorded.
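The validation steps can be sketched as follows. This is an illustrative outline, not the authors' MATLAB and Java pipeline: the function names, the round-robin assignment of samples to folds (without shuffling), and the `train` and `score` callables are assumptions made for the example.

```python
def stratified_folds(labels, n):
    """Deal the indices of each class round-robin into n folds."""
    folds = [[] for _ in range(n)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        for position, i in enumerate(indices):
            folds[position % n].append(i)
    return folds

def cross_validate(samples, labels, n, train, score):
    """Average score(model, test_x, test_y) over n train/test splits."""
    folds = stratified_folds(labels, n)
    results = []
    for fold in folds:
        held_out = set(fold)
        # The other n-1 folds form the training data.
        train_x = [s for i, s in enumerate(samples) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        model = train(train_x, train_y)
        # The held-out fold is the independent test set.
        test_x = [samples[i] for i in fold]
        test_y = [labels[i] for i in fold]
        results.append(score(model, test_x, test_y))
    return sum(results) / n
```

Here `score` returns a single number per fold. To report all five statistical measures, the same loop can be run once per measure, or `score` can be adapted to return several values that are averaged separately.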

Results and Discussion
To verify the performance of a proposed predictor, it has to be assessed using different measures. The five commonly used statistical metrics, namely accuracy, sensitivity, specificity, precision and Matthews correlation coefficient, were used in this study [29,36,37,48,70]. This section presents the results of the proposed predictor.

Comparison with Statistical Analysis
In the previous study [20], a statistical model was developed with an accuracy of 60.8%. It is worth noting that the same dataset was used to develop the iEWS predictor, so the accuracies can be compared directly [20]. In comparison to the old EWS model, the new iEWS achieved an accuracy of 97%, which is an improvement of at least 36.2 percentage points. The accuracy of prediction was 97% in week 6, 98% in week 8 and 98.4% in week 10.
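The size of this improvement can be read in two ways, as an absolute difference or as a relative gain. A short worked calculation, using the week 6 accuracy of 97.03% reported in this paper and the 60.8% of the statistical model:

```latex
\[
97.03\% - 60.8\% = 36.23 \text{ percentage points},
\qquad
\frac{97.03 - 60.8}{60.8} \approx 0.596 \quad (\text{about } 59.6\% \text{ relative to the earlier model}).
\]
```

The figure of 36.2 quoted in the text therefore corresponds to the absolute difference between the two accuracies.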
Furthermore, the main advantage of the proposed iEWS is that it can predict whether a student will pass or fail, so that corrective measures can be taken as early as possible. The model was able to predict the student's performance just by analysing three attributes (i.e., avgcomprate, avglogin, and courseworkscore). It is worth noting that, out of nine different classification tools, RF achieved the best performance (accuracy) with the given attributes. The week 6, 8 and 10 datasets were therefore employed to develop the model. Week 6 showed very promising results, for which the sensitivity, specificity, precision, accuracy and MCC of iEWS were calculated in 6-, 8- and 10-fold cross-validation trials.
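A minimal sketch of this kind of experiment, assuming scikit-learn is available, is given below. It is not the authors' code: the data are synthetic placeholders generated only so the example runs, the value ranges assigned to avgcomprate, avglogin and courseworkscore are hypothetical, and the hyperparameters (`n_estimators=100`, fixed random seeds) are assumptions. The scores it prints say nothing about the results reported in this paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_students = 300

# Synthetic placeholder features (NOT the study's data); ranges are assumed.
X = np.column_stack([
    rng.uniform(0, 1, n_students),    # avgcomprate
    rng.uniform(0, 10, n_students),   # avglogin
    rng.uniform(0, 50, n_students),   # courseworkscore
])
# Synthetic pass (1) / fail (0) label loosely tied to two features plus noise.
signal = X[:, 0] + X[:, 2] / 50 + rng.normal(0, 0.2, n_students)
y = (signal > 1.0).astype(int)

mean_accuracy = {}
for n_folds in (6, 8, 10):
    # Stratified folds keep a similar pass/fail ratio in every fold.
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    mean_accuracy[n_folds] = scores.mean()

for n_folds, value in mean_accuracy.items():
    print(f"{n_folds}-fold mean accuracy: {value:.3f}")
```

Other measures can be obtained the same way by changing the `scoring` argument, for example to `"precision"` or `"recall"` (sensitivity).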

iEWS Prediction with RF
The aim of the Moodle-based EWS is to monitor the learning progress of students in a course and to identify at-risk students as early as possible so that teachers can implement strategies to assist those students. The early prediction in week 6 of the semester (with very high accuracy) makes the proposed iEWS a promising tool that HEIs can use to intervene and assist the more vulnerable students. This prediction uses the significant features of average completion rate, average login frequency and coursework from the EWS plugin in this first-year IT course.
The effective use of the RF classifier in EWS also contributes to the outcome. In short, the combination of EWS data and the RF classifier plays a significant role in predicting whether students pass or fail the course. The results for week 6 are given in Figure 5 for three different fold settings. An improvement in accuracy of at least 36.2 percentage points is seen for the proposed iEWS over the statistical model in [20]. The iEWS predictor also recorded high sensitivity, specificity, precision and MCC, indicating strong performance. These promising results show the ability of the proposed iEWS predictor to correctly identify students passing and failing the course as early as week 6 of the semester. Consequently, an RF-based model has the potential to accelerate educational development and to increase the efficiency of education. Using RF methods effectively and efficiently in teaching and learning could transform education, altering teaching, learning, and research. Educators who use digital tools will acquire a better knowledge of how their students are progressing with their studies, allowing them to intervene early and improve student performance and retention.
It is worth noting that the features and classifier used for this study may not work for other courses, as the online presence and activities differ between courses. The more online activities a course has, the better the ability for prediction, as the activities contribute to the completion rate and coursework of EWS. A similar study was carried out to predict at-risk students in a course using standards-based grading, in which a course-specific predictive model was created to identify at-risk students in week 5 [31]. The common tools used in that study were the SVM, K-NN and Naïve Bayes classifiers. The Naïve Bayes classifier had the best results among the seven models tested. The accuracies of the prediction models used are shown in Figure 6.
Figure 6. Different level of accuracy of the prediction model [29].
In most cases, the EWS report relied on midterm grades [35]. At this point, it is often too late in the term, and students either cannot cope or drop out of the course. This has been one of the drawbacks of EWS. For this reason, improving the accuracy of EWS and predicting performance much earlier is of great importance. In iEWS, RF classification is used, which predicted more accurately and earlier in the semester. In this study, since the EWS was introduced in the course in week 4, the earliest prediction could be made in week 6. However, if EWS is engaged in a course much earlier, detection could be even sooner.
As discussed earlier, the proposed model is able to predict the students' performance as early as week 6 of the semester, with an accuracy of 97.03%. Furthermore, most studies in the literature propose self-developed models to predict student performance, but they do not mention how early in the semester the predictions were made. The proposed model, in contrast, supports sustainability in education by providing an iEWS for students as well as for educators. It also saves energy, time, and resources while predicting the students' performance as early as possible.

Conclusions
The tendency of students to procrastinate and fall into at-risk categories is often reported by academics as a significant factor that negatively influences student success in higher education blended courses, making its prediction a very useful task for universities and students alike. In this context, this research takes a different approach, i.e., an AI-based predictor that can predict students' performance as early as possible in the era of the SDGs from a systems perspective. The use of ICT tools contributes to an excellent learning environment for students and to learning pedagogies. Such tools are heavily used in the current education system, which has uplifted and connected the whole of society.
In this work, an AI approach was applied to the same data as the earlier statistical model: an RF classifier model was developed with week 6 EWS data and an accuracy of 97.03% was achieved. An AI platform is designed with LMS and EWS, and the RF classifier is applied with its respective sensitivity, specificity and precision. All methods appeared to be sensitive to an increase in the number of classes. RF, with an accuracy of 97.03%, showed a better performance using categorical features compared to other classification methods (see Figure 5). When the accuracy of predicting student performance using iEWS was compared with that obtained through a statistical analysis, it proved higher by more than 35 percentage points. In future, this work can be expanded by using a different predictive method and feature vectors of different lengths from different courses. Moreover, different hybrid feature vectors can be created using pre-education grade, students' submissions, logins, gender, location of origin, and social interaction behaviour to examine the effect of various time-related indicators on the EWS and on the prediction of at-risk students as early as possible.
The process followed in this research can help all those involved in education and sustainability collaborate more effectively, allowing educational institutions to develop a clear vision of what sustainability means to them, and to work towards transforming individuals, groups, organisations, communities, and systems by developing the skills needed to transition to a more sustainable future. One of the most significant effects of digitisation in the coming decades will undoubtedly be in the field of virtual education. The development and delivery of course content and curricula will be drastically altered as a result of the digitisation of education. Given the increased digital awareness and competency of students, even those as young as pre-school age, curricula will need to reflect this digitally capable culture to ensure pupils remain engaged in learning. Curricula with more flexibility, standardisation, and even globalisation have the potential to promote equitability and give more options.
In broad terms, sustainability in education is an attempt to reconcile growing a quality learning environment with socio-economic objectives. The framework established to address the requirement to contextualise the function of ESD helps both educators and students to see the wider picture and grasp the role of education in sustainable development. During the COVID-19 outbreak, these experimental findings demonstrate that the suggested prediction model satisfies the required accuracy, precision, and recall factors for forecasting the behavioural elements of teaching and e-learning for students in virtual education systems. Its phases should be thought of as conceptual, since greater specificity will be heavily influenced by the setting, institutional capability, challenge, timing, and resources available to the educational redesign process.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: