An Enhanced Evolutionary Student Performance Prediction Model Using Whale Optimization Algorithm Boosted with Sine-Cosine Mechanism

Abstract: The students' performance prediction (SPP) problem is a challenging problem that managers face at any institution. Collecting educational quantitative and qualitative data from many resources such as exam centers, virtual courses, e-learning educational systems, and other resources is not a simple task. Even after collecting data, we might face imbalanced data, missing data, biased data, and different data types such as strings, numbers, and letters. One of the most common challenges in this area is the large number of attributes (features), and determining the most valuable features is needed to improve the prediction of students' performance. This paper proposes an evolutionary-based SPP model utilizing an enhanced form of the Whale Optimization Algorithm (EWOA) as a wrapper feature selection method to keep the most informative features and enhance the prediction quality. The proposed EWOA combines the Whale Optimization Algorithm (WOA) with the Sine Cosine Algorithm (SCA) and the Logistic Chaotic Map (LCM) to improve the overall performance of WOA. The SCA empowers the exploitation process inside WOA and reduces the probability of getting stuck in local optima; the main idea is to enhance the worst half of the population in WOA using SCA. Besides, the LCM strategy is employed to control the population diversity and improve the exploration process. In addition, we handle the imbalanced data using the Adaptive Synthetic (ADASYN) sampling technique and convert WOA to a binary variant employing transfer functions (TFs) that belong to different families (S-shaped and V-shaped). Two real educational datasets are used, and five different classifiers are employed: Decision Trees (DT), k-Nearest Neighbors (k-NN), Naive Bayes (NB), Linear Discriminant Analysis (LDA), and LogitBoost (LB). The obtained results show that the LDA classifier is the most reliable classifier with both datasets.
In addition, the proposed EWOA outperforms other methods in the literature as wrapper feature selection with selected transfer functions.


Introduction
The students' performance prediction (SPP) problem is a common challenge for institutions' lecturers and decision-makers seeking to develop the best educational strategies for students. To perform such a prediction, several educational parameters can be employed to evaluate the performance of students, such as exam grades, Grade Point Average (GPA), lecture absenteeism, and the number of attempts to pass a course or an exam, as well as demographic features such as gender, family relationship, parent profession, marital status, and personal habits [1,2]. Predicting students' performance for educational organizations has been studied by many scientific communities. Examining a vast amount of educational data and extracting its impact on students' performance is closely related to educational data mining (EDM) and machine learning (ML) algorithms. Generally speaking, EDM is a set of data mining methods that tries to extract hidden and valuable information from educational data to expand our understanding of students' performance and enhance the learning process [3,4].
EDM applications require two types of data: (i) educational data collected from educational systems such as exam centers, virtual courses, registration offices, and e-learning systems, and (ii) demographic data that presents information about students. Demographic data is usually collected through surveys or personal meetings. Both types of data can be used to build a robust EDM application, which can transform seemingly meaningless educational data into valuable knowledge that improves the learning process and helps avoid negative performance [5]. In EDM, generally speaking, different kinds of data mining methods are needed, including but not limited to classification [6], clustering [7], association rule mining [8], and web mining [9]. Moreover, due to modern learning technologies such as online classrooms, exams, and seminars, EDM applications can process educational data accurately for a better understanding of the students' performance and learning process [10]. Such EDM applications can assist both tutors and decision-makers in executing suitable learning strategies that fit their students.
In reality, EDM applications have many advantages, such as revealing weaknesses in the learning process between teachers and students, predicting dropout potential, and detecting negative student behaviors [11]. Moreover, they can determine the lapses and weaknesses of teaching strategies. EDM applications assist with reviewing current learning models and evaluating their effectiveness. They can be used to evaluate the feedback obtained from students and determine the limitations of the learning processes. EDM can also cluster students into levels using different criteria such as personal skills, learning behaviors, social attitudes, and interests [12].
EDM and ML allow us to design learning models to predict students' performance as classification or recognition models. However, selecting a robust ML model is a challenging task due to several factors such as the nature of the data, imbalanced data, noisy data, incomplete data, and the number of collected samples. Imbalanced data, in particular, strongly affects the overall performance of ML models. For example, the number of passed students is usually much higher than the number of failed students, so the learning model becomes biased toward passed students and the learning process suffers from an overfitting problem. As a result, it is essential to analyze the educational data before building the EDM application. Moreover, the educational data should not have missing values, to prevent unstable behavior of the ML model. Several research papers addressed imbalanced educational datasets while building ML models [13–15]. In general, imbalanced data is handled at the data level (e.g., resampling methods) or the algorithm level (e.g., cost-sensitive learning). Figure 1 depicts the life cycle of the EDM process.
In data mining techniques (e.g., classification), data preprocessing has a major impact on both the quality of the chosen features and the performance of learning algorithms [16,17]. Feature selection (FS) is a fundamental preprocessing stage that aims to uncover and keep informative patterns (features) and remove noisy, uninformative, and irrelevant ones from the feature space. Detecting a high-quality subset of features will boost the accuracy of learning classifiers and lessen the computational cost [18,19]. According to the assessment criteria of the selected subset of features, FS techniques follow one of two branches: filters or wrappers [19,20]. Filter FS methods utilize scoring metrics for estimating the quality of the selected subset of features. In other words, in the filter type, features are weighted using a filter technique (e.g., information gain or chi-square), and then the features whose weights fall below a pre-set threshold are excluded from the feature set. In the case of wrapper FS, a learning classifier (e.g., Linear Discriminant Analysis or K-Nearest Neighbour) is employed to judge the quality of the subsets of features produced by a search approach [21,22]. In general, compared with filter methods, wrapper FS can deliver better performance because it can implicitly discover and exploit dependencies between the features of a subset, whereas filter FS may miss such an advantage. However, the computational cost of filter FS is cheaper than that of wrapper FS [23]. Feature subset generation is a search operation for finding a high-quality subset from a given set of patterns, where a search mechanism such as complete/exact, random, or heuristic search is employed [24–26]. In a complete search, all potentially obtainable feature subsets in the search space are formed and assessed. In other words, if a dataset includes M features, then 2^M subsets will be obtained and examined to identify the most valuable one.
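To make the size of a complete search concrete, the following minimal Python sketch (our own illustration, not code from the cited works) enumerates every non-empty subset of M features:

```python
from itertools import combinations

def all_feature_subsets(n_features):
    """Complete search: enumerate every non-empty feature subset."""
    features = range(n_features)
    for size in range(1, n_features + 1):
        for subset in combinations(features, size):
            yield subset

# For M features there are 2^M - 1 non-empty subsets to evaluate.
assert sum(1 for _ in all_feature_subsets(10)) == 2**10 - 1  # 1023 subsets
```

Already at M = 40, roughly 1.1 × 10^12 subsets would have to be trained and evaluated, which is why complete search is only feasible for very small feature spaces.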
Complete search is impractical when dealing with massive datasets because of its high computational cost. Random search is another mechanism for generating subsets of features; in this mechanism, the next feature subset in the feature space is chosen randomly [27]. In some cases, random search may end up generating all potential subsets of features, as in the complete search mechanism [18,28]. Compared to complete and random search, heuristic search is a different mechanism for generating subsets of features. Meta-heuristics are defined by Talbi [28] as upper-level general methods that can be employed as guiding strategies in designing underlying heuristics to solve specific optimization problems. In contrast to complete/exact methods [29,30], meta-heuristic algorithms such as Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) have demonstrated outstanding ability in solving many FS problems [19,31–33].
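Whatever mechanism generates the candidate subsets, a wrapper approach scores each one by the performance of a classifier trained on it. The following is a minimal sketch of such a scoring function (our own illustration; a simple 1-nearest-neighbour classifier with leave-one-out accuracy stands in for the learning classifier):

```python
import numpy as np

def one_nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        correct += y[np.argmin(d)] == y[i]
    return correct / len(X)

def wrapper_score(X, y, subset):
    """Wrapper FS: judge a candidate feature subset by classifier
    performance on the reduced feature space."""
    return one_nn_accuracy(X[:, list(subset)], y)
```

A search algorithm (complete, random, or meta-heuristic) would call `wrapper_score` on each candidate subset and keep the best-scoring one.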
WOA is a modern meta-heuristic algorithm introduced by Mirjalili and Lewis [34]. It simulates the humpback whales' intelligent foraging behavior. WOA possesses a simple structure that makes it easy to implement, and it has only two primary parameters that need to be adjusted. In addition, WOA depends on just one parameter for smoothly shifting from exploration to exploitation. WOA has shown high exploration ability: unlike other meta-heuristic algorithms, WOA updates the position vector of a whale (solution) in the exploration stage with respect to the position vector of a randomly chosen search agent rather than the best search agent discovered so far [17,34–36]. Like other meta-heuristic algorithms, WOA has drawbacks such as early convergence and the ease of falling into a local optimum. Hence, scholars have made several improvements to the basic version of WOA to overcome its limitations and employed it to solve various optimization problems. For instance, [35] proposed an improved version of WOA based on natural selection operators and applied it as a wrapper feature selection method for software fault prediction. Mafarja and Mirjalili [17] combined WOA with the simulated annealing (SA) algorithm to enhance its exploitation ability and applied their enhanced WOA-based approach to feature selection. Also, Ning and Cao [36] proposed an improved variant of WOA and applied it to solving complex constrained optimization problems. A mixed-strategy-based WOA was proposed by Ding et al. [37] for optimizing the parameters of a hydraulic turbine governing system (HTGS). Abdel-Basset et al. [38] proposed a Levy flight and logistic chaotic mapping based WOA approach and employed it to tackle the virtual machine (VM) placement problem. As presented in [39], WOA shares a problem with many other optimization algorithms and tends to get stuck in local optima. To overcome this problem, two enhancements to the WOA algorithm were proposed.
The first improvement involves applying Elite Opposition-Based Learning (EOBL) in the initialization stage of WOA, whereas the second one integrates evolutionary operators comprising mutation, crossover, and selection from the Differential Evolution algorithm at the end of every WOA iteration. Since WOA-based algorithms have been widely and effectively used in various applications, they form the foundation and motivation of this research as well.
This paper proposes an evolutionary-based SPP model that integrates an enhanced variant of WOA (EWOA) with an ML algorithm. The new variant EWOA is used to enhance the FS process and the prediction of students' performance. The efficiency of the proposed model developed in this research is evaluated on two real, imbalanced, and public educational datasets adopted from the literature. To sum up, the main contributions of this research are as follows:

1. The ADASYN sampling technique is applied to handle the problem of imbalanced data.
2. Various types of well-known ML algorithms are assessed to select the best-performing one for the SPP problem.
3. Eight fuzzy transfer functions from the S-shaped and V-shaped families are examined to adapt WOA to the binary search space of the FS problem.
4. An improved form of the WOA algorithm is introduced by combining it with the Sine Cosine Algorithm (SCA) and the Logistic Chaotic Map (LCM) mechanism. The main objectives are overcoming the main weak point of WOA (i.e., its weak exploitation process) and keeping an appropriate balance between the exploration and exploitation processes.
5. The performance of the proposed EWOA is evaluated against state-of-the-art metaheuristic algorithms and shows promising results.
The rest of the paper is organized as follows: Section 2 presents the related works of SPP and related EDM applications. Section 3 explores the proposed methods. Section 4 explores the educational datasets used in this work. Section 5 presents the performance evaluation criteria for the proposed method. The results and analysis are presented in Section 6. Finally, the conclusion and future works are presented in Section 7.

Related Work
The principle of EDM has gained the interest of scholars due to its difficulty and significance to the educational field. Data mining algorithms have been employed in different ways to address EDM problems, depending on the nature of the problem, such as classification, clustering, and sequential pattern analysis [40,41]. In addition to the aforementioned classes, some hybrid approaches that benefit from more than one technique (e.g., classification and clustering) were proposed to improve the prediction of students' performance [42,43]. Recently, researchers have also employed wrapper FS approaches that combine ML classifiers and optimization algorithms to improve the overall performance of SPP models [15,44]. The following subsections explore related works for each category.

Classification Methods
Classification techniques such as Decision Trees (DT), Support Vector Machines (SVM), Naive Bayes (NB), and Artificial Neural Networks (ANN) are widely used in the field of education to predict students' performance. For example, as stated in [45], the DT classifier was applied to predict the final grades of students in a university course under study. Ahmad et al. [46] used eight years of data (2006 to 2014) on undergraduate students to predict their academic performance in computer science courses. The applied dataset contains information such as gender, hometown, family income, and GPA. In addition, three classification algorithms comprising DT, Rule-Based (RB), and NB were utilized to build SPP models. The experimental results revealed that the RB classifier outperformed the other classifiers, recording the highest accuracy rate of 71.3%. Hamsa et al. [47] proposed an academic performance prediction model using two approaches, a fuzzy genetic algorithm (FGA) and DT. Internal and sessional marks along with admission scores were selected as features. The resulting prediction model can be used to determine the students' performance for each module. Hence, instructors can identify low-performing students and take early steps to improve their performance.
SVM has also been applied in SPP. For instance, Asogbon et al. [48] tried to accurately predict students' performance with the aim of placing them into suitable faculty courses, where a multi-class SVM (MSVM) classifier was used to build the prediction model. The educational students' dataset from the University of Lagos, Nigeria, was used to examine the proposed model. The experiments revealed that the MSVM-based SPP model with 7-fold cross-validation could correctly predict students' performance and provide university management with the information required to place students in various academic programs. In addition, Pratiyush and Manu [49] utilized an SVM classifier for predicting the placement of students. The proposed model was evaluated on an educational dataset of students containing six features: attendance, GPA, reasoning, quantitative skills, communication skills, and technical skills. The authors stated that the prediction results could give educational institutions a better understanding of how students should be placed. Furthermore, based on the psychological information (features) of students, Burman and Som [50] proposed a classification model using SVM to categorize students into three classes (high, average, and low) depending on their academic performance. Experimental results showed that SVM with a Radial Basis kernel function could provide better accuracy (nearly 90%) than a linear kernel function. Another example of using classification approaches in education for predicting the performance of students can be found in [51]. In this study, two classifiers, NB and SVM, were applied to students' data such as residence, GPA, and profile data to predict whether a college student will finish their studies in four years or less. Experimental results showed that SVM surpassed NB with 69.15% accuracy.
Using the NB classifier in the field of SPP, Shaziya et al. [52] introduced a model for predicting students' performance in semester exams. The model is based on the NB classifier and is used to predict students' end-of-semester results. The outcome of the proposed model can help students improve their academic performance. Makhtar et al. [53] estimated students' performance using the NB classifier. The proposed model is utilized to discover the hidden patterns between subjects that influence the performance of students. In addition, the Best-First approach was applied for feature selection. Results showed the superiority of the NB algorithm in predicting the performance of students compared to several classifiers such as Random Tree, Multi-Class Classifier, Conjunctive Rule, Nearest Neighbour, and Lazy IB1. The authors concluded that the NB classifier could be utilized to classify students' performance in the early phase of the second semester with 74% accuracy.
The neural network (NN) classifier has also been utilized to develop automated SPP models. As presented in [54], for instance, the authors used a Back Propagation Neural Network (BP-NN) classifier to predict future student performance from students' previous knowledge together with data on new students with similar characteristics. Academic data on six subjects for 60 high school students were used for model evaluation. The results show that the model is able to produce accurate predictions. Rana and Garg [55] also applied two machine learning classifiers, NN and NB, using the WEKA machine learning software to predict the performance of students. The authors evaluated the proposed models on a small dataset that includes information on 58 students. The recorded results confirmed that NB performs better than the NN classifier.
As stated earlier, FS is a core pre-processing procedure that aims to find and eliminate noisy, uninformative, and irrelevant features from datasets to reduce data dimensionality and boost the efficiency of machine learning classifiers. Wrapper and filter-based FS approaches have been applied in several works in the area of SPP. For example, in [56], a filter FS approach based on information gain (IG) was employed to select the most informative students' behavioral features for building prediction models. A set of ML classifiers including DT, ANN, and NB, boosted with ensemble methods such as bagging and boosting, were utilized for classification. Results showed that using students' behavioral features can remarkably enhance the performance of the students' prediction model. In [14], a feed-forward Multi-Layer Perceptron (MLP) technique integrated with stochastic training algorithms was applied as an SPP model. In addition, IG was exploited as an FS approach, and the SMOTE oversampling technique was applied to deal with the problem of imbalanced data. Experimental results confirmed that the proposed MLP-based approach efficiently resolves SPP problems compared with several ML classifiers such as DT, KNN, Logistic Regression (LR), Linear Discriminant Analysis (LDA), SVM, and Random Forest (RF), plus a set of state-of-the-art methods.
Wrapper FS approaches that combine optimization algorithms with ML classifiers have also been applied to improve the performance of SPP models. For instance, a wrapper-based FS technique was proposed by Turabieh et al. [44] for resolving the SPP problem. In this technique, an improved form of the recent Harris Hawks Optimization (HHO) algorithm was applied to explore the search space and discover the most informative features. In addition, the KNN classifier was used to evaluate the goodness of the subsets of features produced by the HHO algorithm. Several ML classifiers, including KNN, a layered recurrent neural network (LRNN), NB, and ANN, were applied over a real student performance dataset to assess the overall performance of the SPP system. The most promising accuracy value (92%) was achieved when HHO was applied in conjunction with the LRNN classifier. Another wrapper FS approach, based on binary Teaching-Learning Based Optimization (TLBO), was introduced by Alraddadi et al. [15] for improving student performance prediction. The TLBO algorithm was applied as a search strategy while various ML classifiers (i.e., SVM, LDA, LR, RF, and DT) were used to evaluate the quality of the subsets of features generated by the TLBO algorithm. Moreover, two real student performance prediction datasets were adopted for evaluation purposes. Since the utilized datasets are highly imbalanced, an oversampling technique (i.e., SMOTE) was applied over the datasets to handle the problem of imbalanced data. The experimental results proved the power of the proposed wrapper FS in improving the classification performance of the LR and LDA classifiers, and the TLBO algorithm demonstrated its capability to improve the overall performance of ML classifiers.
The AUC results of TLBO with the LDA classifier increased by up to 3% and 8% on the two examined datasets compared with the results of LDA without applying the feature selection approach (TLBO).

Clustering Methods
Clustering is an unsupervised ML technique in which data are partitioned into clusters whose members share similar characteristics that differ from those of the data in other clusters [57]. Various clustering algorithms have been applied to educational datasets to cluster students based on their performance, giving educational organizations better insights into their students and their different learning styles so the best strategies for students' success can be found [58]. For example, in [59], Harwati et al. employed the k-means clustering method to cluster students by performance. Their study was carried out using data for 306 students from different universities. The collected data consist of demographic features such as gender, origin, GPA, grades of certain courses, and course attendance. They found that these input features formed three different clusters: smart, normal, and low. Park et al. [60] employed the latent class analysis (LCA) method as a clustering method for educational data to extract common features from online behavior data of 612 courses tracked from the learning management system and database of a South Korean university. Their work identified four different clusters of how blended learning is adopted and implemented, which gives educational organizations a better visualization of the data and helps in providing strategic plans. These groups are: immature (50% of the courses), collaboration (24.3%), discussion (18%), and sharing (7.2%). Valsamidis et al. [61] proposed a methodology based on two clustering algorithms, Simple K-means and Markov Clustering (MCL), for improving the content quality of Learning Management Systems (LMS) by analyzing their log data files. The former algorithm is used to cluster the courses and the latter to cluster the students' activity, giving instructors better insights into both students and courses.

Sequential Pattern Analysis Methods
Sequential pattern analysis methods are used to discover hidden knowledge by finding unknown interrelationships and data patterns [62]. Many research papers investigated EDM using sequential pattern analysis methods. Simpson et al. [63] investigated EDM for classrooms using sequential pattern analysis methods to discover severe expressive communication in the environment of general education. Nakamura et al. [64] proposed a sequential pattern analysis method to extract useful knowledge from the learning histories of programming courses. The authors developed a tool for collecting learning histories. The proposed approach offers an excellent analysis of the relationships between learning situations and learning processes in programming courses.

Hybrid Methods
Hybrid methods are a branch of data mining that combines multiple existing data mining techniques to enhance their performance and results. In [42], a hybrid approach was proposed that combines clustering and sequential pattern methods to improve student performance. The authors tested their methods on a real dataset, and the results were promising. Tarus et al. [65] employed a hybrid approach combining ontology and sequential pattern mining to discover hidden knowledge in real data obtained from a public university. The proposed method showed excellent results for decision-makers. In [43], students' information, including demographic, academic, behavioral, and other features, was collected and used to construct a students' performance prediction model in which classification and clustering techniques were applied. Four classifiers, SVM, NB, DT, and NN, were utilized to assess the students' performance dataset. Based on the classification results, the optimal features that provide the best results were identified. Then, K-means clustering in conjunction with the majority-vote method was applied to predict students' academic performance. The accuracy of the hybrid SPP model that combines clustering and classification is 75.47% when used with the academic, behavioral, and other features of the students' performance dataset. The proposed SPP model confirmed its superiority compared to other existing models.
In addition to the categories mentioned above, fuzzy logic has also been applied to predict students' performance. For instance, Rojas et al. [66] proposed a fuzzy logic-based model that enables educational institutions and teachers to continuously monitor the academic performance of students. Lee et al. [67] proposed a fuzzy evaluation model for e-learning using importance and satisfaction measures, where a performance evaluation matrix was used. A fuzzy evaluation model based on fuzzy linguistic hedges for students' academic progress was proposed in [68]. The model modifies the grades of questions by integrating factors such as the complexity, importance, and difficulty of examination questions to reflect the skills and deep learning obtained through the course.
Finally, we can conclude that examining educational data to improve the overall educational process is needed. Since educational data is high dimensional, ML methods are most suitable to analyze and find hidden knowledge. To achieve this, we believe that employing wrapper FS methods will help educational organizations to understand the most valuable factors (i.e., features) that affect the student's performance. Therefore, in the next section, we propose an enhanced wrapper FS method based on WOA.

Proposed Approach
The proposed approach is depicted in Figure 2. The proposed approach has seven steps as follows:

1. Collect the raw educational data.
2. Preprocess the collected data to make it consistent. In this step, we removed all records with missing attributes and normalized the data to the range [0,1].
3. Apply EWOA as a feature selection method to reduce the search space and remove weak attributes that have no impact on the overall performance.
4. Apply ADASYN to overcome the imbalanced data and avoid the overfitting problem during the learning process.
5. Build a machine learning classifier that is able to predict the students' performance.
6. Evaluate the obtained results based on the area under the ROC curve (AUC).
7. Report the obtained results.
The following subsections explore the main methods employed in the proposed methodology. First, an overview of the ADASYN oversampling technique is presented in Section 3.1. Second, an overview of the basic WOA is presented in Section 3.2. Third, the main components of our enhancement of WOA are presented in Sections 3.3 and 3.4. The Logistic Chaotic Map (LCM) is presented in Section 3.3, where LCM is introduced inside WOA to control the population diversity. The updating mechanism of the proposed enhancement is based on SCA, which is presented in Section 3.4. The proposed EWOA, which combines WOA, LCM, and SCA into a new FS algorithm, is presented in Section 3.5. Section 3.6 explains how transfer functions are used to convert the original WOA to match the binary search space of the FS problem. Finally, Section 3.7 presents the formulation of FS as an optimization problem (i.e., the fitness function and solution encoding).

ADASYN for Handling Imbalanced Data
Learning from imbalanced data is a significant challenge that could degrade the prediction quality of ML algorithms. This problem appears in most real classification problems where the target classes are not approximately equally represented [69]. For instance, in binary classification problems, the data samples of one class are normally limited (rare instances) compared to other samples. In such situations, the classification algorithm is trained using highly imbalanced data. Thus, it tends to choose the patterns in the majority classes, which results in imprecise minority class prediction [70].
ADASYN is a promising synthetic sampling approach built on the idea of the SMOTE approach; both have been extensively employed to handle the problem of imbalanced learning [71]. The main concept of ADASYN is to generate minority data samples adaptively, considering their distributions. Specifically, more synthetic data is produced for the minority-class samples that are difficult to learn than for the minority-class samples that are simpler to learn. ADASYN facilitates learning from imbalanced data by achieving two objectives: it reduces the learning bias towards the dominant class, and it adapts the decision boundary to focus on the samples that are harder to learn. The detailed procedure of ADASYN can be found in [71].
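To make the procedure concrete, the following is a simplified NumPy sketch of the ADASYN idea (our own illustration, not the exact procedure from [71]; the function name, parameters, and defaults are assumptions, and a mature implementation is available in the imbalanced-learn library):

```python
import numpy as np

def adasyn(X, y, minority=1, beta=1.0, k=5, rng=None):
    """Simplified ADASYN sketch: generate synthetic minority samples,
    focusing adaptively on minority points surrounded by the majority class."""
    rng = np.random.default_rng(rng)
    X_min = X[y == minority]
    n_maj, n_min = np.sum(y != minority), len(X_min)
    G = int((n_maj - n_min) * beta)          # total synthetic samples to create

    # r_i: fraction of majority points among the k nearest neighbours
    # of each minority sample (computed over the whole dataset).
    r = np.empty(n_min)
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X - x, axis=1)
        nn = np.argsort(d)[1:k + 1]          # skip the point itself
        r[i] = np.mean(y[nn] != minority)
    r = r / r.sum() if r.sum() > 0 else np.full(n_min, 1.0 / n_min)
    g = np.rint(r * G).astype(int)           # per-sample synthetic counts

    synthetic = []
    for i, x in enumerate(X_min):
        # Interpolate toward random minority neighbours of x.
        d = np.linalg.norm(X_min - x, axis=1)
        nn = np.argsort(d)[1:k + 1]
        for _ in range(g[i]):
            z = X_min[rng.choice(nn)]
            synthetic.append(x + rng.random() * (z - x))
    if not synthetic:
        return X, y
    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(len(synthetic), minority)])
    return X_new, y_new
```

The adaptive part is the vector `g`: minority samples whose neighborhoods are dominated by the majority class receive more synthetic points, which shifts the decision boundary toward the harder-to-learn regions.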

Whale Optimization Algorithm
Whales are considered the largest mammals that live in groups. Among the types of whales is the humpback whale [34]. In nature, humpback whales have a remarkable hunting strategy for finding food such as krill and small fish [72]. This strategy is named bubble-net feeding, in which the humpback creates bubbles along an upward spiral swimming path around the target (i.e., fish, seals, squid, etc.). WOA is a swarm optimization method that simulates this process: the whales create bubble-nets to constrict the prey and then move toward it in a spiral shape before the attack. Mirjalili and Lewis [34] proposed WOA in 2016 to mimic this hunting behavior. The exploration process inside WOA simulates the encircling mechanism of whales in nature. The authors represent the prey location as the best solution found so far, while the remaining solutions represent the candidate whales. Figure 3 demonstrates the spiral movement of a whale while searching for food. Since WOA is a population-based algorithm, its first phase is to create the initial population (humpback whales), as shown in Algorithm 1, where LB denotes the lower bound of the decision variables, UB the upper bound, nopop the population size, and n the number of decision variables.
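The random initialization of Algorithm 1 can be sketched as follows (the function name and NumPy-based vectorization are our own; LB, UB, nopop, and n match the notation above):

```python
import numpy as np

def init_population(nopop, n, LB, UB, rng=None):
    """Algorithm 1 sketch: random initial whale positions within [LB, UB]."""
    rng = np.random.default_rng(rng)
    # Each row is one whale (candidate solution) with n decision variables.
    return LB + rng.random((nopop, n)) * (UB - LB)

pop = init_population(nopop=30, n=10, LB=0.0, UB=1.0, rng=42)
```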

Encircling Prey
The second phase of WOA is to determine the best solution (whale) based on the fitness function. Each solution is structured as a vector of the decision variables. The remaining solutions update their positions in the search space with respect to the best solution using Equations (1) and (2).
where k denotes the current iteration, X* denotes the best solution so far, and A and C denote coefficient vectors estimated by Equations (3) and (4), respectively. |·| denotes the absolute value, and · is a component-by-component multiplication. Note that the dimension of the vectors equals the number of variables (features) of the problem being solved.
where a is a variable initialized to 2 that decreases linearly toward 0 over the iterations, as in Equation (5), and r is a random vector between 0 and 1 produced using a uniform distribution. Equations (1) and (2) give WOA the ability to search an n-dimensional solution space (e.g., 2D and 3D) in an efficient manner, as shown in Figure 4.
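The encircling equations translate into a short, element-wise sketch of Equations (1)-(5); this is a minimal illustration (A and C redrawn per dimension), not the authors' implementation:

```python
import random

def encircle(x, x_best, k, K, rng=None):
    """One encircling-prey update: Eqs. (1)-(5), element-wise."""
    rng = rng or random.Random(1)
    a = 2.0 * (1 - k / K)                    # Eq. (5): a decreases 2 -> 0
    new = []
    for xi, bi in zip(x, x_best):
        A = 2 * a * rng.random() - a         # Eq. (3)
        C = 2 * rng.random()                 # Eq. (4)
        D = abs(C * bi - xi)                 # Eq. (1)
        new.append(bi - A * D)               # Eq. (2)
    return new
```

Note that at the final iteration (k = K) we have a = 0 and hence A = 0, so the update collapses onto the best position, which is the shrinking behavior described above.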
where k is the current iteration, while K is the maximum number of iterations.

Bubble-Net Attacking
Two mathematical models have been proposed to mimic the whales' behavior while attacking their prey: the shrinking encircling mechanism and the spiral updating position. To update the whales' positions around the best solution in the search space, the shrinking encircling mechanism reduces the value of the variable a linearly over the course of generations. Figure 5 demonstrates the expected positions of whales around the best solution. In nature, whales swim in an upward spiral path while hunting their food. To mimic this behavior, a logarithmic spiral function is used, as shown in Equation (6).
where D' = |X*(k) − X(k)| denotes the distance between the ith solution and the optimal solution found so far, the parameter b defines the shape of the spiral function, and l is a random number between −1 and 1. Figure 6 depicts the spiral swimming process of whales while hunting. To model the shrinking encircling and spiral swimming behaviors, a probability of 50% is assumed for selecting between these two behaviors throughout the course of optimization. Each whale selects the operation to be performed randomly based on its location with respect to the optimal solution so far. Equation (7) expresses the operation selection based on a random number p.
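The spiral move of Equation (6) can be sketched in one line per dimension; l is passed explicitly here for clarity, whereas WOA draws it uniformly from [−1, 1]:

```python
import math

def spiral_update(x, x_best, l, b=1.0):
    """Spiral update of Eq. (6): X(k+1) = D' * e^(b*l) * cos(2*pi*l) + X*,
    with D' = |X* - X|; WOA samples l uniformly from [-1, 1]."""
    return [abs(bi - xi) * math.exp(b * l) * math.cos(2 * math.pi * l) + bi
            for xi, bi in zip(x, x_best)]
```

With l = 0 the whale lands at a distance D' beyond the best solution; as l varies, the trajectory traces the logarithmic spiral of Figure 6.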
In short, the exploration phase in WOA occurs when each whale in the population updates its position based on an arbitrarily selected whale: the next position of the whale lies in the area between its current position and the position of the randomly selected one. The exploration phase occurs when the variable A has a value greater than 1 or smaller than −1 (|A| ≥ 1), as shown in Figure 7. The exploitation phase occurs when each whale updates its current position based on the position of the best whale so far, driven by the linear decrease of the variable A. Equations (8) and (9) present the exploration phase of WOA. Finally, the pseudo-code of WOA is presented in Algorithm 2.

Logistic Chaotic Map (LCM)
To improve the population diversity and increase the exploratory behaviour of WOA, a logistic chaotic map strategy is employed in this work. The chaotic map strategy is an efficient method to adjust parameter values so as to improve the exploration process and the final solution. Moreover, the chaotic map strategy enhances the convergence speed and the search precision [73,74]. A chaotic sequence number is introduced to replace a random number in the WOA algorithm (the number called p in WOA). Equation (10) generates a logistic chaotic sequence number.
where C_t is the chaotic sequence value at iteration t. The initial value C_1 is usually 0.8, and the values lie within [0,1]. The chaotic sequence number is employed to balance between the two updating mechanisms (i.e., spiral-path and shrinking-circles path) inside WOA. As a result, the logistic chaotic map guarantees that roughly 50% of the iterations go to each updating mechanism. Chaotic maps are frequently used to improve the performance of optimization algorithms: they are essentially utilized to enhance the convergence behavior of meta-heuristic optimization algorithms and to avoid getting stuck in local optima. Chaotic maps are employed in meta-heuristic algorithms to produce chaotic variables instead of random ones. Chaos is a non-linear approach that has deterministic dynamic behavior [74,75]. It is highly sensitive to its initial state, so a large number of sequences can be simply produced by adjusting that initial state [74,75]. In addition, chaos has the characteristics of ergodicity and non-repetition. Hence, it can accomplish straightforward and faster searches, in contrast with stochastic searches that depend on probability distributions [76]. Chaotic maps have been used to promote the performance of many optimization algorithms such as particle swarm optimization (PSO) [74,77], artificial bee colony (ABC) [75], the krill herd optimization algorithm (KH) [76], and the bat algorithm (BA) [78].
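Assuming the common fully chaotic form of the logistic map with control parameter 4 (the text does not restate the constant), Equation (10) can be sketched as:

```python
def logistic_sequence(c0=0.8, n=10, mu=4.0):
    """Generate n values of the logistic map c_{t+1} = mu * c_t * (1 - c_t),
    which stay inside [0, 1] for mu = 4 and c0 in (0, 1)."""
    seq, c = [], c0
    for _ in range(n):
        c = mu * c * (1 - c)
        seq.append(c)
    return seq
```

In EWOA the value C_t then replaces the random number p in Equation (7), so the choice between the spiral path and the shrinking circles is driven by the chaotic sequence rather than a uniform draw.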

Sine-Cosine Algorithm
Sine-Cosine algorithm (SCA) is a population-based optimization algorithm that was introduced by Mirjalili in 2016 [79]. The main idea of SCA is that each solution will update its position with respect to the position of the best solution in the search space using Equations (11) and (12).
where X_i^k represents the position of the current solution in the ith dimension at iteration k, P_i^k represents the ith dimension of the best solution so far, r_1, r_2, and r_3 are three random variables, and |·| indicates the absolute value. To simplify Equations (11) and (12), both equations have been combined into the final position update shown in Equation (13).
where the parameter r_1 determines the updating direction, which defines the space between the solution X_i^k and the destination P_i^k. The parameter r_2 determines the updating distance between the current solution and the best solution so far. The parameter r_3 emphasizes or de-emphasizes the influence of the destination in defining the distance by giving a random weight to the best solution P_i^k. Finally, the parameter r_4 switches between the sine and cosine components in Equation (13). Figure 8 demonstrates the switching mechanism between sine and cosine with the range in [−2, 2]. The exploration process in SCA is guaranteed in this range, since each solution may update its location beyond the space between itself and the destination. Any metaheuristic algorithm should achieve a proper trade-off between exploration and exploitation. In SCA, this balance throughout the optimization is obtained by decreasing the range of the sine and cosine, as shown in Equation (14).
where the variables k and K represent the current and maximum iterations, respectively, and a is a constant. Figure 9 shows how the range of the sine and cosine decreases over the iterations for a = 3. Algorithm 3 presents the pseudo-code of the SCA algorithm.
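The combined update of Equations (13) and (14) can be sketched as a single SCA step; this is an illustrative, per-dimension implementation (r_2, r_3, and r_4 are redrawn for every dimension), not the authors' code:

```python
import math, random

def sca_step(x, p_best, k, K, a=2.0, rng=None):
    """One SCA position update following Eqs. (13)-(14)."""
    rng = rng or random.Random(3)
    r1 = a - k * a / K                        # Eq. (14): shrinks a -> 0
    new = []
    for xi, pi in zip(x, p_best):
        r2 = rng.uniform(0, 2 * math.pi)      # amplitude phase
        r3 = rng.uniform(0, 2)                # random weight on destination
        r4 = rng.random()                     # sine/cosine switch
        trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new.append(xi + r1 * trig * abs(r3 * pi - xi))  # Eq. (13)
    return new
```

Because r1 shrinks to 0, late iterations take ever smaller steps around the destination, which is exactly the exploitation emphasis exploited by EWOA below.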

Enhanced Whale Optimization Algorithm
In this subsection, we use the concepts of the three methods described above (i.e., WOA, SCA, and LCM) to propose a new hybrid algorithm that improves the overall performance of WOA. In the original WOA, the position vector of a whale (solution) is updated in the exploration stage with respect to the position vector of a randomly chosen search agent rather than the optimal search agent discovered so far. As a result, the performance of the exploration process is excellent, while the performance of the exploitation process is weak. This weakness also comes from the selection of the updating mechanism (i.e., spiral-path or shrinking-circles path), which is performed randomly. To overcome this weakness, LCM is employed to ensure that 50% of the iterations go to each updating mechanism.
Since SCA benefits from superior exploitation [79], and its exploration occurs only when the value obtained from the sine or cosine term is larger than 1 or smaller than −1, we adopted SCA to enhance the worst half of the population in WOA after each iteration. The worst half of the population is considered as an initial population for SCA. This improves the exploitation of WOA. Algorithm 4 shows the proposed enhanced WOA (EWOA).
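A hedged sketch of this hybridization idea, assuming a simple sort-by-fitness split and an SCA-style move of the worst half toward the best solution (boundary handling and the full WOA loop are omitted):

```python
import math, random

def refine_worst_half(pop, fitness, k, K, rng=None):
    """EWOA idea: keep the best half of the whales and move each
    solution in the worst half toward the best one with an
    SCA-style sine/cosine update."""
    rng = rng or random.Random(4)
    pop = sorted(pop, key=fitness)            # best solutions first
    best, half = pop[0], len(pop) // 2
    r1 = 2.0 * (1 - k / K)                    # shrinking step size
    for i in range(half, len(pop)):           # only the worst half moves
        new = []
        for xi, bi in zip(pop[i], best):
            r2 = rng.uniform(0, 2 * math.pi)
            trig = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
            new.append(xi + r1 * trig * abs(rng.uniform(0, 2) * bi - xi))
        pop[i] = new
    return pop
```

The best half is left untouched, so the elite solutions found by WOA are preserved while SCA intensifies the search around them.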

Transfer Functions to Develop Binary Variant of WOA
WOA is a continuous search algorithm by nature. Therefore, it is not applicable in its original form to FS, which is a binary optimization problem. Accordingly, it is imperative to convert WOA to a binary structure by utilizing a binarization scheme. The Transfer Function (TF) is deemed one of the most frequently applied binarization schemes [80,81]. For this purpose, we employed eight different TFs from two well-known groups, S-shaped and V-shaped [81] (see Figure 10), to develop a binary variant of WOA for the FS problem. In the TF-based binarization scheme, two steps are performed. In the first step, a TF is employed to convert the real-valued solution R^n into an intermediate normalized solution I = (I_1, I_2, ..., I_n) within [0,1], such that each element in I represents the probability of transforming the corresponding element of R^n into 0 or 1. In the second step, a binarization rule is used to convert the output of the TF into binary. In the literature, the most common binarization rules are the standard method given in Equation (16) and the complement method given in Equation (18). Broadly, the standard rule is used with S-shaped TFs while the complement rule is used with V-shaped TFs [82].
Considering the S2 sigmoid function, the probability of converting the generated real-valued solution of WOA into binary is presented in Equation (15).
where X_i^j is the jth element of the ith real-valued solution X, and k represents the current iteration. The updating process for the S-shaped group in the next iteration is presented in Equation (16).
where X_i^j(k + 1) represents the binary value of the corresponding X_i^j, and S(X_i^j(k)) is the probability value evaluated by Equation (15).
The updating process for the V-shaped group in the forthcoming iteration is presented in Equation (18), which is evaluated based on the probability values given by Equation (17) [83]. Table 1 lists the mathematical models of the S-shaped and V-shaped TFs.
where ¬ denotes the complement. With the complement binarization rule, the new binary value x_i^j(k + 1) is set considering the current binary solution; that is, based on the probability value V(x_i^j(k)), the jth element is either kept or flipped.
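The two binarization schemes can be sketched as follows; the S2 sigmoid follows Equation (15), while |tanh(x)| stands in here as a representative V-shaped TF (Table 1 lists the exact family members used in the paper):

```python
import math, random

def s_shaped_binarize(x_real, rng=None):
    """S2 TF + standard rule: bit = 1 with probability sigmoid(x)."""
    rng = rng or random.Random(5)
    return [1 if rng.random() < 1 / (1 + math.exp(-x)) else 0
            for x in x_real]

def v_shaped_binarize(x_real, x_bin, rng=None):
    """V-shaped TF (|tanh(x)| as an example) + complement rule:
    flip the current bit with probability V(x), otherwise keep it."""
    rng = rng or random.Random(6)
    return [1 - b if rng.random() < abs(math.tanh(x)) else b
            for x, b in zip(x_real, x_bin)]
```

Note the key difference: the standard rule sets each bit from scratch, while the complement rule only perturbs the current binary solution, which tends to preserve more of the search history.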

Table 1. The S-shaped (S1-S4) and V-shaped (V1-V4) transfer function families (names and mathematical models).

Whale Optimization Algorithm as a Feature Selection
Adapting metaheuristic algorithms to handle any optimization problem requires identifying two fundamental parts: the solution encoding and the evaluation (fitness) function. Employing WOA as a binary feature selection algorithm means that a potential solution (i.e., a feature subset) is expressed as a binary vector of length n (see Figure 11), where n denotes the number of features in the original dataset. Each cell of the binary vector holds either 1 (i.e., selected feature) or 0 (i.e., not selected). The main objective of the FS process is to find the smallest feature subset that achieves the maximum classification accuracy. Accordingly, FS can be defined as a complex multi-objective optimization problem. Aggregation is one of the most common a priori procedures, where multiple objectives are combined into a single function and each objective is assigned a weight to decide its significance [84]. A good ratio between the number of selected features and the classification accuracy should be achieved to obtain a robust FS algorithm. Hence, the minimization fitness function used in this work, presented in Equation (19), assesses the appropriateness of the selected subset of features.
Fitness(X) = α · C_ER + β · |S|/|N|, (19)

where Fitness(X) is the fitness value of the subset X, C_ER represents the classification error rate of the employed internal classifier using the subset X, |S| is the number of selected features, and |N| is the total number of features in the original dataset. α ∈ [0,1] and β = (1 − α) are adopted from [82,85,86].
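Equation (19) translates directly into code:

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Aggregated FS objective of Eq. (19), to be minimized:
    Fitness = alpha * C_ER + (1 - alpha) * |S| / |N|."""
    return alpha * error_rate + (1 - alpha) * n_selected / n_total
```

With the settings used later (α = 0.99, β = 0.01), the error rate dominates the objective, and the feature-count term acts as a tie-breaker favoring smaller subsets.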

Student Performance Datasets
In this paper, we adopted two public datasets for student performance prediction. The first dataset (Data1) was proposed by [87] in 2008. The second dataset (Data2) was obtained from Gazi University in Ankara (Turkey) [88]. The following subsections describe both datasets.

Data1
This dataset was obtained from two Portuguese secondary schools. It contains 33 features (inputs) such as demographic data, grades, and social features. The dataset was collected from school mark reports and well-structured questionnaires [87]. It contains information about two subjects: Mathematics (mat) and Portuguese language (por). The main objective of this data is to predict the final grade feature, called G3 in the dataset. In this work, we converted the final grade into a binary label: 1 for (G3 < 10) and 0 for (G3 ≥ 10). For more details about this dataset, interested readers can consult [87]. We normalized all input features into [0,1] as a pre-processing step, and we used the Portuguese language data for training and the Mathematics data for testing our trained models.

Data2
The second dataset contains 32 input features (i.e., 28 features representing course-specific questions and four additional features) and a single output feature (i.e., the number of times the course was repeated). All input features are normalized into [0,1] to ensure that all values are on a common scale, without differences in the ranges of values [89]. Since we are working on a binary classification problem, we converted the output to 0 if the student repeated the course 0 or 1 times and to 1 if the student repeated the course more than once. Interested readers can explore the dataset's official website http://archive.ics.uci.edu/ml/datasets/turkiye+student+evaluation, accessed on 8 January 2021. Table 2 details each dataset. It is clear that both datasets are imbalanced. For example, in Data1, the minority class is 1, while the majority class is 0. The minority class for Data2 is 1 (i.e., repeat > 1), which is 15.6% of the whole dataset. As a result, it is important to handle this problem as a pre-processing step to avoid overfitting during the learning process. Appendix A describes the Data1 and Data2 features. Figure 12 presents the 2D visualization of the two applied datasets based on Principal Component Analysis (PCA). It can be observed that the imbalance level of the data is high. In addition, linear separation of the data is not possible. Therefore, more sophisticated classifiers are needed to obtain better performance.

Performance Evaluation
There are several criteria to evaluate binary classification methods, including accuracy, precision, recall, F-measure, and the area under the ROC curve (AUC). All of these criteria except AUC are affected by a cut-off value on the predicted probability of the student performance. In general, the default cut-off value is 0.5, which may not be a suitable value when examining the performance of a classifier [90]. Because the AUC measure does not depend on the cut-off value, it is a more suitable criterion for evaluating binary classification methods [91,92].
Moreover, ROC curves are not affected by changes in class distributions. The AUC value is determined from the relation between the True Positive (TP) rate and the False Positive (FP) rate. A confusion matrix is used to evaluate the final AUC value, as shown in Table 3.
where P and N denote the actual positive and negative samples, respectively. Finally, the AUC criterion helps researchers to generalize the obtained results [93].
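Because AUC needs no fixed cut-off, it can be computed with the rank-statistic formulation (the probability that a positive instance is scored above a negative one, with ties counted half), which equals the area under the ROC curve; the following minimal sketch illustrates this:

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability a positive sample is ranked above a
    negative one (ties count 0.5), equivalent to the ROC area."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))
```

A perfect ranking yields 1.0, random scoring averages 0.5, and the value is unchanged if all scores are shifted or rescaled monotonically.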

Experimental Results and Simulations
In this section, we have performed extensive experiments to evaluate the performance of the proposed enhanced version of WOA for resolving the problem of students' performance prediction. We examined the effect of re-sampling and feature selection on the performance of several machine learning classifiers. In addition, the performance of WOA and Enhanced WOA with S-Shaped and V-Shaped TFs is also investigated. We also compared the performance of the best variants of EWOA with other well-regarded algorithms in terms of AUC, selected features, and fitness values.

Experimental Setup
For both tested datasets, we used the K-fold cross-validation method with k = 5 for training and evaluating the proposed method. Compared to simple hold-out validation, K-fold cross-validation has the advantage of better approximating the generalization error. It allows all the data to be tested by using different folds as training and testing sets; thus, each sample has the chance of appearing in both the training and the testing sets [21,94].
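A minimal sketch of the k-fold splitting idea (contiguous folds for brevity; in practice the indices are shuffled or stratified first):

```python
def kfold_indices(n, k=5):
    """Split range(n) into k folds; each sample appears in exactly one
    test fold and in the training set of the other k-1 splits."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return [(sum(folds[:i] + folds[i + 1:], []), folds[i])
            for i in range(k)]
```

Each of the k (train, test) pairs is then used once, and the reported score is the average over the folds.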
All the optimizers were investigated using the same common settings (swarm size = 20, maximum iterations = 70, α = 0.99, β = 0.01, number of runs = 10). The internal parameters of the applied algorithms were selected based on trial and error over small simulations and on recommended settings in the literature [82]. For instance, Mirjalili and Lewis [34] recommended that the a parameter decrease from 2 to 0, while Rashedi et al. [83] recommended the value 10 for the parameter G_0 in BGSA. The parameter values for the BBA algorithm were obtained from Mirjalili et al. [95]. Table 4 shows the detailed parameter settings used in this paper for each algorithm.
Due to the stochastic nature of meta-heuristic algorithms, each experiment is repeated 10 times, and the results are recorded in terms of average (Avg) and standard deviation (Std). In addition, the non-parametric Wilcoxon statistical test with a 5% degree of significance is also performed to detect the significant difference between the obtained results of different algorithms. The interest in non-parametric statistical analysis has grown recently in the field of computational intelligence [96].

Preliminary Experiments
The first series of experiments assessed the performance of five different classifiers (i.e., kNN, DT, LDA, LB, and NB) to determine the most applicable approach for both case studies in this work. The preliminary experiments were divided into two categories: the first classified the datasets without any preprocessing, while the second examined the performance of the classifiers with the resampling method using different balancing ratios. Table 5 reports the performance of the classifiers without resampling and without FS using four measures (i.e., TPR, TNR, AUC, and accuracy), while Table 6 reports the performance of each classifier with different balancing ratios without FS.
Inspecting the AUC values in Table 5, it is evident that the LB classifier outperforms all other classifiers, with excellent performance on Data1 (i.e., AUC = 0.8463) and poor performance on Data2 (i.e., AUC = 0.5982). The results reported in Table 6, after employing a re-sampling process with different oversampling ratios, show that the KNN classifier has excellent performance (i.e., AUC = 0.8600) on Data1 with an oversampling ratio of 0.4. In contrast, the LDA classifier shows good performance (i.e., AUC = 0.6352) on Data2 with an oversampling ratio of 1.0. Table 7 compares the performance of all classifiers based on three criteria (i.e., TPR, TNR, and AUC) with and without oversampling. It is evident that the oversampling method improves the performance of all classifiers in both cases. The performance of LDA dominates all other classifiers with re-sampling. As a result, we adopt LDA as the primary classifier for evaluating the performance of the proposed EWOA. The remaining parameter settings of Table 4 are: G_0 (for BGSA) = 10; a (convergence constant for bGWO) from 2 to 0; Q_min, minimum frequency (for BBA) = 0; Q_max, maximum frequency (for BBA) = 2; A, loudness (for BBA) = 0.5; r, pulse rate (for BBA) = 0.5; a (convergence constant for WOA) from 2 to 0; E (for HHO) from 2 to 0; ω (for PSO) from 0.9 to 0.2; c_1 and c_2 (for PSO) = 2; GA selection = roulette wheel selection; probability of mutation in GA = 0.01; probability of crossover in GA = 0.9; elite size (in GA) = 2; c (for BGOA) from 0.01 to 0.00004.

Results with Feature Selection
To examine the performance of WOA, we performed a sensitivity analysis on WOA with S2 (WOA-S2) as the transfer function using different numbers of agents (whales). Table 8 reports the results obtained by the LDA classifier with different numbers of agents (i.e., 5, 10, 20, 30, 40, and 50). It is evident that the performance of LDA is not stable across different numbers of agents; the best performance is obtained when the number of agents equals 30 for both datasets. Choosing a number of agents that fits both the problem itself and the classifier is therefore important. In this subsection, we also examine the performance of WOA with the S-shaped and V-shaped transfer functions. Tables 9 and 10 report the obtained results; the average and standard deviation are reported in each table. The performance of WOA-S4 outperforms all other S-shaped transfer functions with respect to the F-test value. Figure 13 demonstrates the convergence diagrams for Data1 and Data2; the convergence of WOA-S2 is more robust and can discover more areas of the search space. The performance of WOA-V4 outperforms all other V-shaped transfer functions with respect to the F-test value. Figure 14 depicts the convergence diagrams for WOA using the V-shaped transfer functions; the performance of WOA-V4 outperforms the other V-shaped transfer functions for both datasets.

Performance of WOA with V-Shaped TFs
In order to further analyze the obtained results, Table 11 presents a statistical analysis using the Wilcoxon test with a significance level of 0.05. To simplify the comparison, we compared all transfer functions with WOA-V4, since WOA-V4 outperforms all the S-shaped and V-shaped transfer functions. It is clear that the performance of WOA-V4 differs significantly from all S-shaped transfer functions. Table 12 reports a comparison between WOA-S2 and WOA-V4 based on the averages and standard deviations of the AUC, the number of selected features, and the fitness value. For both datasets, WOA-V4 outperforms WOA-S2 in all measurements. Moreover, the Wilcoxon test results show that both transfer functions have a p-value less than 0.05. Thus, from all the previous results, the V-shaped version 4 is the more reliable TF for WOA on both datasets. Table 13 reports the results for EWOA and WOA with the best TFs (i.e., S2 and V4) in terms of the AUC, the number of selected features, and the fitness value. For Data1, EWOA-S2 outperforms the other methods in terms of average AUC (i.e., 0.91683) and fitness value (i.e., 0.8302), while EWOA-V4 outperforms the other methods for Data2. Figure 15 depicts the convergence of all methods. We employed the F-test to determine the best approach; the obtained results show that EWOA-V4 outperforms all other methods with an F-test value of 1.58.

The Most Relevant Features Selected by EWOA-V4
To explore the most relevant features that impact students' performance, we carried out ten independent runs of EWOA-V4 on both datasets. Table 14 shows the selected features for each run over Data1. The second-period grade (G2) appears in all runs, which means that tutors should give the most attention to this feature, while traveltime, absences, and the first-period grade (G1) also affect the student's performance. Moreover, the obtained average number of selected features shows that at least two features have an effect, and one of them should be the second-period grade (G2). Table 15 reports the selected features for Data2. From the reported results, three features (i.e., instr, attendance, and difficulty) are the most relevant features that tutors should pay attention to when predicting students' performance. Finally, Table 16 summarizes the selected features for each dataset based on the number of selections and the selection ratios. We believe that each educational organization should examine its data carefully to find the most relevant features that affect its students' performance.

Comparison of EWOA with Other Well-Known Algorithms
After performing extensive experiments to prove the efficiency of EWOA over the conventional WOA, we validated its performance by comparing it with a set of well-regarded algorithms, namely Binary Harris Hawks Optimization (BHHO) [97], the Binary Gravitational Search Algorithm (BGSA) [98], the Binary Grasshopper Optimisation Algorithm (BGOA) [99], Binary Particle Swarm Optimization (BPSO) [100], the Binary Grey Wolf Optimizer (BGWO) [101], the Binary Bat Algorithm (BBA) [102], the Binary Ant Lion Optimizer (BALO) [103], and the Genetic Algorithm (GA) [104]. We adopted these competitors because they belong to different groups of meta-heuristic techniques. For instance, GA is evolutionary-based and GSA is physics-based, while the others are swarm-based; hence, each algorithm has its own exploratory and exploitative potential. Moreover, these algorithms have been successfully applied as wrapper FS approaches in different domains. To make a fair comparison, ADASYN was used with all competing approaches. Table 17 presents a detailed comparison between all approaches in terms of average AUC, number of features, and fitness values with STD values and the F-test ranking. The reported results clarify that the proposed EWOA-S2 and EWOA-V4 exceed the other algorithms in achieving higher AUC rates with fewer features on the utilized datasets. Accordingly, the proposed EWOA efficiently keeps the most informative features that offer better classification performance in dealing with student performance prediction. Based on the overall ranking, EWOA-V4 outperforms all other methods with a rank of 1.33 and is the best performing method in terms of the considered metrics. Moreover, EWOA-S2 comes in second place with a mean rank of 1.67. In contrast, the performance of BBA is the worst (rank of 8.83). Figure 16 illustrates the convergence curves of the developed EWOA-V4 versus the other methods; EWOA-V4 achieves a better acceleration trend on both datasets.
The diverse exploratory and exploitative behaviors in the developed EWOA-V4 improve its ability to explore the targeted space and converge faster toward better solutions.

Comparison with State-of-the-Art Approaches
To further validate the results of the proposed method, it was compared with nine state-of-the-art methods in [89] using the G-mean measure (the measure reported in that study). Considering the results in Table 18, the superiority and competitiveness of the proposed method are evident again. Furthermore, we compared the proposed method with the best results achieved in the studies of Thaher and Jayousi [14] and Alraddadi et al. [15] in terms of the AUC measure. As shown in Table 19, our proposed approach achieved the best AUC rates compared to the results presented in previous studies on the same datasets. Taken together, the experiments and comparative results demonstrate the merits of the proposed WOA methods. The superiority of the proposed methods is due to several reasons. Firstly, the exploitation of WOA was improved using the SCA algorithm; it has been demonstrated several times in the literature that SCA's exploitation is its main strength, so the accuracy of the results obtained in this work is due to the use of SCA in conjunction with WOA. Despite being highly exploitative, the algorithm performs well on high-dimensional datasets too, which are very challenging due to the large number of locally optimal solutions. This is due to the use of chaotic maps and different transfer functions, which allow the proposed method to exhibit diverse exploratory behaviours.

Conclusions and Future Works
In this work, an enhanced wrapper feature selection approach that combines the Whale Optimization Algorithm (WOA) with the Sine Cosine Algorithm (SCA) is introduced. The main idea is to enhance the WOA exploitation process by improving the worst half of the population with the SCA algorithm at every iteration. In addition, to enhance the population diversity and increase the exploratory behaviour of WOA, a chaotic sequence number generated by a logistic chaotic map is employed to balance between the two updating mechanisms (i.e., spiral-path and shrinking-circles path) inside WOA.
The performance of the proposed algorithm was examined on educational data coming from two different institutions. Five different classifiers were examined (i.e., k-NN, DT, LDA, NB, and LB); the performance of LDA surpassed the other classifiers with respect to the AUC value. EWOA with the V4 TF (EWOA-V4) showed outstanding performance compared to other algorithms in the literature.
One limitation of this work is the availability of students' performance datasets, as few datasets are available for research. Another limitation is that the proposed enhanced WOA has only been tested in the SPP domain. In addition, the parameters of the algorithms were set based on small simulations and common settings in the literature. In future work, we will examine the performance of the proposed approach on multi-objective optimization problems and on more complex data such as medical and biological datasets. We will also conduct extensive experiments to determine the most appropriate values of the common and internal parameters for the enhanced WOA as well as the other utilized algorithms.

Appendix A. Data2 feature descriptions.
Difficulty: Difficulty level of the course as seen by the student.

Q1: The content of the semester course, teaching methodology and assessment methods were clarified at the beginning.
Q2: The course aims and objectives were clearly explained at the beginning of the period.
Q3: The course deserved the credit value assigned to it.
Q4: The course was delivered based on the syllabus provided on the first day of class.
Q5: Activities of the class, including discussions, homework assignments, applications and studies, were appropriate and satisfactory.
Q6: The textbook and other resources of the course were up to date and sufficient.
Q7: The course provided activities such as discussion, laboratory, field work, applications and other studies.
Q8: The exams, quizzes, assignments and projects contributed to the learning.
Q9: I highly enjoyed the class and was eager to actively participate during the lectures.
Q10: My preliminary expectations about the course were realized at the end of the course period or year.
Q11: The course was relevant and useful for my professional development.
Q12: The course helped me see life and the world with a new perspective.
Q13: The Instructor's knowledge was relevant and up to date.
Q14: The Instructor came prepared for classes.
Q15: The Instructor taught based on the announced plan of the lesson.
Q16: The Instructor was committed to the course and understandable.
Q17: The Instructor attended classes on time.
Q18: The Instructor's speech was smooth and easy to follow.
Q19: The Instructor effectively used class hours.
Q20: The Instructor explained the course and was eager to be helpful to his/her students.
Q21: The Instructor displayed a positive approach to his/her students.
Q22: The Instructor was respectful of and open to the views of students about the course.
Q23: The Instructor encouraged students to participate in the course.
Q24: The Instructor supplied course-related homework assignments and projects, and he/she assisted/guided students.
Q25: The Instructor answered questions regarding the course both inside and outside of class.
Q26: The Instructor's assessment system, including midterm and final questions, projects and assignments, effectively measured the course's objectives.
Q27: The Instructor provided and discussed solutions of the exams with his/her students.
Q28: The Instructor treated all students in an objective and proper manner.
Repeat: Number of times the student has taken this course.