Article

Unlocking the Potential of Competency Exam Data with Machine Learning: Improving Higher Education Evaluation

1
Computer Science Department, Faculty of Information Technology, Zarqa University, Zarqa 13110, Jordan
2
College of Computer Science and IT, Shaqra University, Shaqra 11961, Saudi Arabia
3
Department of Information Security, University of Petra, Amman 11196, Jordan
4
College of Computer at Al-Gunfudah, Umm Al-Qura University, Mecca 24382, Saudi Arabia
*
Author to whom correspondence should be addressed.
Sustainability 2023, 15(6), 5267; https://doi.org/10.3390/su15065267
Submission received: 3 February 2023 / Revised: 6 March 2023 / Accepted: 12 March 2023 / Published: 16 March 2023

Abstract

In Jordanian higher education institutions, a competency exam was developed to ensure that students attain particular competence levels. The results of the competency examination are one of the measures used as key performance indicators (KPIs) for evaluating the quality of academic programs and universities. Numerous methods exist for evaluating student performance based on academic achievement, including conventional statistical approaches and machine learning (ML). The objective of this paper is to develop a framework that helps decision-makers and universities evaluate academic programs using ML, by analyzing competency exam data to identify programs and learning outcomes that need to be established. The developed framework can also reduce exam costs by substituting machine learning algorithms for the actual execution of the exam. We have created a dataset that can assist academics in their research; the dataset includes demographic and academic data about students, such as their gender, average university degree, type of university, and outcomes on the competency exam based on their level and competencies. Experiments supported the claim that models trained using samples from the student sub-datasets outperform models constructed using samples from the entire dataset. In addition, the experiments demonstrated that ML algorithms are an effective tool for recognizing patterns in student performance. The experiments also demonstrated that no single ML model consistently outperforms the others; however, the MLP model generally produced the most accurate results, making it the most beneficial for developing robust frameworks.

1. Introduction

A competency exam, abbreviated "CE", was established for use in Jordanian higher education institutions with the intention of ensuring that students are capable of reaching predetermined levels of expertise [1]. The exam aligns with the fourth Sustainable Development Goal (SDG 4), which aims to ensure inclusive and equitable quality education and promote lifelong learning opportunities for all [2], and it serves as a key performance indicator of program quality. SDG 4 strives to address global issues related to education and to facilitate affordable, high-quality education, funding, and equality by 2030 [3]. The goal is to equip learners with the skills necessary to achieve sustainable development throughout their lives.
The results of the competency exams are one of the metrics used as key performance indicators (KPIs) for the quality of programs. As a direct consequence, initiatives to assess the level of academic proficiency possessed by students have become increasingly significant. This is crucial so that management can devise a plan for programs whose performance has not met certain criteria, for example by modifying the learning objectives or improving the learning process. The competency exam is a series of questions that measures the general and specific (accurate) competencies that graduates of Jordanian universities are required to possess at the bachelor's level, in order to evaluate the efficacy of educational outcomes in universities. These graduates must have a level of education equal to or higher than a bachelor's degree.
Competency-based education (CBE) is an approach to education that focuses on the acquisition of specific skills and knowledge by students [2]. The goal of CBE is to equip students with the skills and knowledge necessary for success in the workforce. However, there are several driving behaviors that can affect student performance in CBE [4]. These behaviors include motivation, self-regulation, goal-setting, and self-efficacy. Motivation is a critical factor in CBE as it drives students to engage in learning activities and complete their work. Self-regulation refers to the ability of students to manage their own learning, set goals, and monitor their progress. Goal-setting is important in CBE as it helps students to stay focused and motivated. Finally, self-efficacy, or the belief that one can successfully complete a task, can greatly impact student performance in CBE. All of these behaviors are interrelated and play a crucial role in student success in CBE [5]. Understanding these behaviors can help educators to create effective learning environments and support student success. To assess the quality of graduates from Jordan’s higher education institutions and aid in the formulation of regulations, it is important to evaluate the university’s progress and identify areas for improvement.
The CE is divided into two distinct parts: the macro level and the micro level. The macro level comprises a collection of thirty competencies that evaluates the students' more general skill sets. The 45 competencies that comprise the micro level differ according to the student's major. The performance of students can be evaluated in a variety of ways, including the use of machine learning as well as more traditional statistical methods, and this evaluation is based on the results of the students' exams (pass or fail). To provide an accurate prediction model for student performance, the data from the competency exams are merged with ML models after classifying every individual subset of data according to its features. When evaluating the performance of students, a variety of factors are taken into consideration, including the data source (questionnaire or dataset and size), the type of education (e-learning or traditional learning), and the categories of student features [6,7].
Although the competency exam in Jordan was first implemented in 2008, there is a paucity of research on the ways in which the exam affects the development and improvement of competencies, as well as the quality of the program, through the improvement of student performance; in addition, there are no studies that apply ML to the data provided by the exam. A large number of machine learning (ML) applications have been developed in an effort to predict student achievement, because there are multiple markers that can be used to quantify student performance. It is possible to categorize student performance models based on factors such as academic status (from kindergarten to graduate level). However, it is unlikely that developing broad models based on these factors alone would produce accurate results in terms of classification or regression. Additionally, models can be constructed by considering information related to the degree of performance, such as whether a student is at risk of dropping out or has a specific grade [8]. Previous attempts to develop accurate and exhaustive classification models have included every single data case. In the field of machine learning there are a number of classification techniques, such as Decision Trees, K Nearest Neighbor (KNN), Naive Bayes, Support Vector Machine (SVM), and neural networks, several of which are used in this work. By analyzing the data from the competency exams and determining the academic programs and learning outcomes that need to be established, the purpose of this study is to provide guidance to those responsible for making decisions regarding the development of academic programs using ML. We intend to investigate existing differences in student characteristics by taking a close look at important demographic and academic elements that affect a student's performance in the CE. Demographic factors such as race, ethnicity, socioeconomic status, and gender can have an impact on a student's performance on competency-based exams, as some students may have less access to resources and support systems [9]. Academic characteristics, such as prior knowledge, study habits, and learning styles, can also affect exam performance [10]. However, students, educators, and policymakers can all be involved in creating an equitable and supportive learning environment.
To achieve this goal, the research project develops prediction models on a number of different sub-datasets, taking into consideration the gender, grade level, participation method, and other relevant features of the students. These characteristics were chosen because of their capacity to distinguish successful student performance.
Using the data from the competency exam, the researcher first divides the exam data into subgroups depending on the demographic and academic characteristics of the students. Next, the researcher evaluates the machine learning models and presents a comparative analysis between multiple machine learning models. The case statement for this paper investigates the extent to which student data and machine learning can be used to determine the nature of academic programs that should be developed for institutions. In regard to this topic, we discuss three significant research issues, which are as follows:
I. Can ML models be used to detect student performance, and if so, to what extent is this possible, and is it even a feasible strategy?
II. Which machine learning model provides the most accurate results, and how should ML models be utilized in practice?
III. Does the variety of student data affect the accuracy with which machine learning algorithms forecast the results?
The contributions of this work are as follows.
1. Determining whether it is "appropriate" or valid to use a machine learning model to predict student achievement;
2. Formulating recommendations for how the performance of ML models might be improved when working with this kind of data;
3. Creating student subgroups based on significant demographic and academic characteristics, and examining the efficacy of these subsets in determining the accuracy of predictions made by machine learning models.
The rest of this paper is structured as follows. In Section 2, the related works are presented. Academic-data heterogeneity is discussed in Section 3. The paper's methodology is covered in Section 4. In Section 5, the experiments and results of the different cases are explained. Section 6 concludes the paper and makes recommendations for future research.

2. Related Works

Multiple studies have demonstrated the effectiveness of using machine learning techniques to predict student behavior and performance in educational settings [10,11,12,13,14]. These studies have applied various algorithms including J48, Naive Bayes, Neural Network, Bagging, Boosting, Logistic Regression, and Decision Trees. The results of these studies have shown that machine learning models can achieve high prediction accuracy and have been used to predict student enrollment, admission to colleges, dropout, and the risk of failure and withdrawal in online courses. These findings highlight the potential for machine learning to be used in education to support student success and improve decision-making.
The primary stakeholders in educational institutions are the students. The effectiveness of educational institutions is crucial in generating graduates and post-graduates of the highest caliber. Modern educational institutions work to maintain quality and reputation in the educational community; in actuality, institutions are often more concerned with their reputation than with the caliber of instruction [15]. However, a number of government and accreditation organizations make sure that educational institutions maintain a high standard of learning, and explicit accreditation procedures have forced the institutions to develop and adopt unique procedures to uphold their standards [16].
The goal of artificial intelligence (AI) is to give computers enough intelligence to enable them to think and respond in ways similar to those of a human [17]. Humans, as opposed to computers, are able to gain knowledge from experience, allowing them to rationally choose the best course of action given their unique set of circumstances. For a computer to complete the required task, however, it must adhere to human-made algorithms. Artificial intelligence seeks novel ways to give computers intelligence and make them behave similarly to humans in order to decrease this difference between computers and people. The term frequently refers to projects that create systems exhibiting the intellectual processes characteristic of humans, such as the capacity to reason, find meaning, or learn from experience. According to [18], AI applications are steadily expanding across many commercial, service, manufacturing, and agricultural domains. Future AI artifacts will be able to communicate with people in their own languages and adjust to their moods and movements.
Many models have been proposed in various educational contexts to address the prediction of student performance. Ensemble approaches were used by [19] to investigate the connection between students' semester courses and results. According to the results of the experimental evaluation, Random Forest and Stacking Classifiers have the highest accuracy. In order to identify weak students and help the institution create intervention measures to reduce student attrition, Ref. [20] modified the Genetic Algorithm (GA) to remove extraneous features. In [21], the authors extracted a collection of attributes from the institution's auto-grading system and used them to create decision tree and linear regression models. The study helps the university identify students facing difficulties and intelligently assign teaching hours automatically.
A decision tree approach was proposed by [22] to identify the key elements that affect students' academic achievement. A survey was used to gather information about the demographic, academic, and social characteristics of the students. Ref. [23] proposed a machine learning technique in which supervised machine learning algorithms are utilized to train prediction models for forecasting student performance after the K-Means algorithm generates a set of coherent clusters. Ref. [24] built a model to predict whether a student would graduate on time or after the expected graduation date. Ref. [25] investigates the relationship between students' social interactions and academic outcomes; although the decision tree proved to be a useful tool, there was only a minor link between the two elements. The examination of the literature demonstrates that machine learning algorithms are useful tools for creating models that forecast students' ultimate results.
Similarly, Ref. [26] developed models based on decision tree algorithms to predict students' academic performance. The researchers collected data on students' demographics, academic records, and family background through questionnaires. The models were built using these data and the decision tree algorithm. The results showed that the models produced by [27] were accurate in predicting students' academic performance based on the various factors studied. These studies highlight the potential of machine learning in education and demonstrate how decision tree algorithms can be used to predict student behavior and performance in educational settings. The results of these studies can inform the development of similar systems and can be used to support decision-making in educational settings. Related works on student performance in education can be contrasted according to their data source, data type, ML models, and evaluation metrics.
These works have used various regression models, decision trees, clustering techniques, and machine learning algorithms to predict student performance. The data sources have varied from questionnaires, exams, learning management systems, and socio-demographic information. The data types have included demographic information, academic data, and log files from e-learning systems. The ML and evaluation metrics have been used to compare different prediction models, with accuracy scores as a measure of performance. Works have shown that incorporating demographic and academic information, as well as study habits and social behavior, can result in better predictions of student performance [28]. Neural networks and SVM have performed well in the studies, and the Naive Bayesian model has also shown good results in some cases.
In the field of educational research, various factors can influence student performance [29]. The "Learning type" category refers to the learning method used by the students in the sample. This can be E-learning, Traditional Learning, or Both (E-learning and Traditional Learning). The "Dataset source" category refers to the source of the sample and the method of collecting it. This can be either pre-existing data collected through university systems, e-learning systems, or other sources, or a questionnaire consisting of a set of questions answered by the sampled students [30]. The "Sample size" category refers to the number of rows in the sample. The "Type of data" category classifies the data based on a set of foundations specified in this type of research [31,32]. This can include demographic data such as a student's personal data (gender and age), academic data such as the student's average in their specialization [33], and other data such as internet usage and student interaction on social networking sites [33]. Finally, the "ML Models" category refers to the ML algorithms used to predict student performance. A description of the features used to contrast the different studies is provided in Table 1.
Learning Type: The learning type category refers to the method of learning used by students in the sample. There are three options for learning type: e-learning, traditional learning, and both (e-learning and traditional learning). E-learning refers to online learning, where students access course materials and participate in online discussions through the internet. Traditional learning refers to classroom-based learning, where students attend lectures and complete coursework in a physical setting. The category “both (e-learning and traditional learning)” refers to a combination of both e-learning and traditional learning, where students participate in both online and classroom-based learning.
Dataset source: The dataset source category refers to the source of the sample used for the study and the method used to collect the sample. The two options for dataset source are pre-existing data and questionnaire. Pre-existing data refer to data that has already been collected from university systems, e-learning systems, or other sources. Questionnaire refers to a set of questions that are answered by the sample students, which is used to collect data for the study.
Sample size: The sample size category refers to the number of rows in the sample. This category provides information about the size of the sample used in the study, which is an important factor in determining the reliability and validity of the results.
Type of data: The type of data category refers to the classification of data based on a set of foundations specified in this type of research. There are three options for type of data: demographic data, academic data, and other data. Demographic data refer to student personal data such as gender and age. Academic data refer to the student’s academic information, such as their average university degree and specialization. Other data refer to other student data such as internet usage and student interaction on social networking sites.
ML Models: The ML models category refers to the machine learning algorithms used to predict student performance. This category provides information about the type of machine learning models used in the study and the effectiveness of these models in predicting student performance.
We summarize and contrast prior studies based on the previously explained features in Table 2.

3. Academic—Data Heterogeneity

The academic performance of engineering students was studied using a regression model by [44]. A questionnaire was created based on the intended study in order to collect data from the students, and input information on students' academic achievement was gathered from 150 undergraduate engineering specialties. The forecasted models and the proportion of accurate predictions were calculated and verified using a variety of metrics. The outcomes demonstrated that the regression model provides the greater prediction accuracy. From this information, teachers evaluate the group's performance and adjust their teaching methods according to the outcomes of the engineering students in each area. The study, however, was unable to demonstrate whether a regression model can be utilized to enhance learning results. Furthermore, a methodological gap occurs when a multiple regression model is used with a limited sample size. In [45], the authors conducted a study on the academic performance of students from nine countries in the PISA 2015 test. A questionnaire was prepared to collect information from the students in addition to their exam results. They apply multilevel regression trees in the first stage and, in the second stage, apply regression trees and boosting to identify which school-level characteristics are related to the school value-added estimated in the first stage.
In their research, Ref. [46] obtained data from the BS students' academic performance dataset on Kaggle.com, which was extracted from a learning management system. The K-means clustering data mining technique is then used to obtain clusters, which are further mapped to find the key features of a learning context. To evaluate the students' performance, relationships between these features are found. In [36], the authors investigated students' academic performance by applying a decision tree algorithm with inputs including student academic information and student activity. They constructed a data collection of records for 22 undergraduate students from the Spring 2017 semester at a private higher education institution in Oman [47].
Another study uses demographic data (age, nationality, marital status) and academic data (major and grade) to compare various resampling techniques, such as Borderline SMOTE, Random Over Sampler, SMOTE, SMOTE-ENN, SVM-SMOTE, and SMOTE-Tomek, to handle the imbalanced data problem while predicting students' performance using two different datasets. In order to predict students' grades (pass/fail), the study by [48] gathered e-learning, pre-course, and socioeconomic information. The data include each student's enrollment information as well as activity data produced by the university's learning management system (LMS); the enrollment data contain information about the students, including sociodemographic characteristics. In that study, student subpopulations were created based on important demographic and academic characteristics to build student sub-models, and their value in identifying at-risk students was assessed.
Ref. [38] used machine learning algorithms to predict and categorize student performance, considering two datasets: the Student Performance Dataset (SPD) and the Student Academic Performance Dataset (SAPD). Analysis of educational data, in particular the impact of the social environment and family on students' performance, is crucial to raising these factors and boosting the quality of education for future generations. Analysis of various datasets is crucial in order to anticipate and categorize how students will behave in related courses and to offer early intervention to improve performance. Ref. [49] discovered a relationship between students' academic performance and their socio-demographic (e.g., gender and economic status) and academic (e.g., type of university and their performance in that school) characteristics.
Ref. [50] showed how taking into account academic records and socio-demographic data during the enrollment of a given candidate can result in higher-performing models with better prediction accuracy. Students' study habits and social behavior (partying) data were collected via mobile phones as alternative data sources, and it was discovered that both are strongly connected with their GPA [51]. Ref. [52] investigated an ML-based model for forecasting student performance. Students' transcript data, including GPA, were gathered for the study. After pre-processing the data, they employed support vector machines, neural networks, Naive Bayes, and sequential minimal optimization (SMO) approaches. The Naive Bayesian model performed best for the IT department (95.7%). A study was conducted by [53] to use a neural network (NN) to predict student performance. An LMS log file including log data on 4601 students across 17 undergraduate courses served as the dataset for this investigation. The study assessed the prediction performance of the neural network against six different classifiers on this dataset to determine the NN's applicability. These classifiers, which were trained using the data gathered throughout each course, included NB, kNN, DT, RF, SVM, and LR. The training characteristics came from LMS data collected during each course, and they ranged from information on how much time was spent on each course page to grades received for course tasks. With an accuracy score of 66.1%, the NN surpassed all other classifiers. Ref. [54] used data mining classifiers to predict undergraduate students' performance. Four different classifiers (DT, RF, NB, and rule induction) are used to assess student performance. Depending on the different methods it uses, each classifier displays a varying degree of accuracy. These evaluated findings are specifically utilized to forecast the students' impending grades and the pertinent factors (such as Internet connection, study time, etc.) that influence the students' academic success. The findings showed that decision trees had a 90.00% prediction accuracy, NB had an 84.00% prediction accuracy, random forest had an 85.00% prediction accuracy, and rule induction had an 82.00% prediction accuracy.
Many previous studies have explored the ability of machine learning algorithms to predict students’ performance by using different types of data, such as demographic information and data from e-learning systems. However, these studies have largely relied on using students’ grades as a measure of performance, without clearly distinguishing between grades as a measure of achievement versus grades as a measure of competence. Furthermore, these studies have not extensively investigated the impact of data heterogeneity and data segmentation on the accuracy of the machine learning algorithms. This paper, in contrast, places a special focus on examining the effect of data heterogeneity and data segmentation on the accuracy of these algorithms and measures students’ performance based on their competence rather than solely relying on their grades.
In summary, these works have shown the effectiveness of using machine learning techniques to predict student behavior and performance in educational settings. These studies have used various algorithms, including J48, Naive Bayes, Neural Network, Bagging, Boosting, Logistic Regression, and Decision Trees. However, the current models are effective for a single course and are helpful locally. Therefore, this study aimed to develop a prediction model for student performance that could be applied to any course offered by the host university, using decision tree algorithms. This study also contributes to the identification of the key factors that influence student performance, including demographic, academic, and social characteristics. The primary stakeholders in educational institutions are the students, and the effectiveness of educational institutions is crucial in generating graduates and post-graduates of the highest caliber.

4. Material and Method

4.1. Dataset Description

The study data were collected over two academic semesters and were focused on seven academic families, out of the more than one hundred majors available in the competency exam. The selected families were chosen to represent half of the total number of families present in the exam, and were also categorized based on their type, either scientific or humanitarian. In addition, the two semesters were differentiated based on the method of participation, whether it was on campus or online. This was carried out to ensure that the collected data were diverse and representative of the overall competency exam. The study’s findings and analyses were based on Table 3, which outlines the number of students present in each family. Overall, the study collected data from a total of 45,392 students, indicating that the dataset is sufficiently large to provide robust results and statistical analyses. By focusing on a diverse range of families and majors, the study provides a comprehensive understanding of the competency exam’s effectiveness in evaluating students’ knowledge and skills in Jordan.
Demographic data refer to information about individuals that includes characteristics such as age, gender, ethnicity, and socioeconomic status, whereas academic data refer to the information related to the student educational background, including their university, major, and grade. The assessment data refer to data collected through tests or evaluations of an individual’s knowledge or skills, such as exam scores or performance in specific tasks. In the context of machine learning, assessment data are often used as training data for algorithms that predict future outcomes based on past performance.

4.2. CE-Exploratory Data Analysis

Demographic data refer to the basic information of students, typically collected from universities and used in the exam system. The data include the student’s name, ID, and gender. The name consists of three syllables in Arabic text, the ID is a unique number, and gender is represented as symbols (1 for male and 2 for female). The academic data for students contain information about their university, major, and grades. The data were received from universities and processed to be uploaded to the exam system. The data include the following attributes:
  • University code: A unique number that represents each educational institution;
  • Grade: The last rate of the student at the university, standardized into one of four options (Fair, Good, Very Good, Excellent);
  • Major: The student’s exact major, standardized into one of 189 majors defined by the AQACHEI (Accreditation and Quality Assurance Commission for Higher Education Institutions);
  • College: The college in which the student studies.
The AQACHEI also adds the following data:
  • Standard Major: The exam system contains a group of standard majors, including 189 majors that cover all the specializations taught in Jordanian universities;
  • Family: The commission collects all the majors into groups that have common characteristics, each with a two-digit code;
  • University code: A categorical number that represents each university in which the student studies, converted into a set of symbols in the form of two numbers;
  • University type: A categorical number that represents each university’s type, based on financing method (government or private).
The data have been modified to standardize some of the information, such as the grade, major, and gender, which were standardized into categorical data.
The assessment data are a representation of the results of students in an exam administered by the AQACHEI. The exam has two levels of competencies: a macro level and a micro level. The macro level consists of 30 items measuring general skills and accounts for 40% of the total exam items. The micro level accounts for 60% of the total exam and includes competencies that differ for each major. The results for each student were extracted according to their answers and are reported at the level of competence. The study added several variables to improve the analysis and understanding of the data. The first variable, “Family type”, categorizes families into two groups, scientific (1) and humanitarian (2), based on the nature of the study. This information is presented in Table 3, which shows the classification of families used in the study.
Another variable, “Participation Method”, classifies students based on their examination method (0: in-campus, 1: online), and the number of students based on each method is presented in Table 4. The study also created a new variable called “Exam type” which indicates the type of exam the students took (0: general, 1: accurate). Additionally, a questionnaire was added to evaluate the students’ satisfaction with the university, and it was rated from 0 to 5.
Finally, the students’ scores were collected for each level of competency and were divided by the number of questions to obtain a percentage score. Based on the exam hypotheses, a proficient student in the general exam was defined as one who scored more than 55% (the cut-off score). The exam results were then converted from a numerical score (0–100) to a classification variable (0: not proficient, 1: proficient). A similar process was followed for the accurate exam, with the cut-off score varying according to each family. The final results were then placed in a new variable called “Test result”.
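As a minimal illustration of this labeling step, the sketch below converts percentage scores into binary proficiency labels using the 55% cut-off for the general exam and a per-family cut-off for the accurate exam. The column names and the per-family cut-off values are illustrative assumptions, not the study's actual schema.

```python
import pandas as pd

# Hypothetical records: percentage scores per exam level plus a family code
df = pd.DataFrame({
    "family":      [1, 1, 4, 4],
    "macro_score": [42.0, 61.5, 70.0, 53.0],
    "micro_score": [58.0, 47.0, 66.0, 71.0],
})

GENERAL_CUTOFF = 55.0                       # stated cut-off for the general (macro) exam
family_cutoff = {1: 50.0, 4: 60.0}          # per-family cut-offs for the accurate exam (illustrative values)

# 0: not proficient, 1: proficient
df["macro_result"] = (df["macro_score"] > GENERAL_CUTOFF).astype(int)
df["micro_result"] = (df["micro_score"] > df["family"].map(family_cutoff)).astype(int)
print(df)
```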
The process of studying and analyzing the data involves converting categorical variables, which are represented as ordinal integers, into numerical ones. This is necessary in order to use various machine learning algorithms and models. The correlation matrix was used to determine the relationship between different variables in the dataset as shown in Figure 1.
There are three types of correlations: positive correlation, negative correlation, and no correlation. Variables that have a correlation coefficient of above 0.5 are considered strongly correlated. To handle non-normality in the data, normalization techniques such as Min-Max scaling are used to shift and rescale the values so that they fall between 0 and 1. The heat map after normalization shows a generally positive correlation between the parameters such as gender, grade, family, family type, survey, and participation method as we can see in Figure 2.
Further verification was performed by testing different scalers and correlation scores as shown in Figure 3.
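A sketch of these pre-processing steps is given below: ordinal encoding of a categorical variable, Min-Max scaling of all columns to the [0, 1] range, and computation of the correlation matrix that underlies the heat maps. The column names and grade ordering are assumptions used only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "gender":               [1, 2, 2, 1],
    "grade":                ["Good", "Excellent", "Fair", "Very Good"],
    "participation_method": [0, 1, 1, 0],
    "score":                [48.0, 77.5, 39.0, 63.0],
})

# Ordinal encoding of the categorical grade
grade_order = {"Fair": 0, "Good": 1, "Very Good": 2, "Excellent": 3}
df["grade"] = df["grade"].map(grade_order)

# Min-Max scaling rescales every column to the [0, 1] range
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

# Correlation matrix, the basis of the heat maps discussed above
print(scaled.corr())
```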

4.3. Used Machine Learning

The Decision Trees (DT) algorithm was used to build a model that can predict student performance based on the extracted features from the competency exam dataset. The algorithm builds a tree-like model of decisions and their possible consequences, where each internal node represents a test on an attribute and each branch represents the outcome of the test, leading to a new internal node or a leaf node. The Support Vector Machines (SVM) algorithm was used to classify students’ performance as good or poor based on their exam results. SVM is a supervised machine learning algorithm that uses a hyperplane to separate data points into different classes. The Multi-layer Perceptron (MLP) algorithm was used to build a neural network that can predict student performance based on the extracted features. The MLP consists of multiple layers of nodes, where each node in one layer is connected to all nodes in the previous and next layers. The K-Nearest Neighbors (KNN) algorithm was used to classify students’ performance based on the results of their closest neighbors. KNN is a non-parametric algorithm that looks for the k closest training examples in the feature space and uses the majority class among these neighbors as the prediction. Finally, the Logistic Regression (LR) algorithm was used to build a model that can predict the probability of a student’s performance being good or poor based on the extracted features. LR is a parametric algorithm that models the relationship between the dependent variable and one or more independent variables by estimating probabilities using a logistic function. Hyperparameter values for all the machine learning models used in our study are shown in Table 5.
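The following sketch shows how the five classifiers could be instantiated with scikit-learn. The hyperparameter values shown are library defaults or illustrative choices, not the values reported in Table 5.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Illustrative hyperparameters only; see Table 5 for the values used in the study
models = {
    "DT":  DecisionTreeClassifier(max_depth=None, random_state=42),
    "SVM": SVC(kernel="rbf", C=1.0, probability=True),
    "MLP": MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LR":  LogisticRegression(max_iter=1000),
}
```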

4.4. CE-Model Construction

The methodology used in this research, illustrated in Figure 4, is a three-phase process that aims to predict students’ performance based on their competence. The first phase is data preparation, which involves cleaning and pre-processing the dataset. This step involves removing irrelevant features, handling missing values, and improving the quality of the dataset.
The second phase is data segmentation, which, in the context of modeling student performance using a competency exam dataset, involves dividing the dataset into smaller, more manageable subsets for the purpose of analysis and modeling. The purpose of segmenting the data is to reduce the complexity of the data and make it easier to analyze and interpret. Data segmentation also helps to identify patterns and relationships within the data that are not easily noticeable in the larger dataset. When segmenting the competency exam dataset, relevant factors such as student demographic information, prior academic performance, and specific exam scores are used to create subgroups. These subgroups can then be analyzed to identify any correlations between the factors and student performance. The end goal is to use this information to develop accurate machine learning models for predicting student performance based on their competence, rather than just their grades. The final phase is the experimental evaluation, where a set of machine learning algorithms are run on the prepared dataset. Each algorithm produces a prediction model, which is then evaluated and compared using various metrics to choose the most robust model.
The CE dataset was subjected to a series of experiments based on data segmentation. The data were divided into segments based on the type and value of the features. In the first level, all the data were selected. In the second level, the data were segmented based on the method of participation or exam type. In the third level, the student results were analyzed and predicted using data segmented based on exam type and participation method. In the fourth level, data were segmented based on main family, participation method, and exam type. Figure 5 illustrates the process of data segmentation.
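A minimal sketch of this segmentation, using pandas groupby over hypothetical column names ("participation_method", "exam_type", "family"), is shown below; each resulting sub-dataset is later modeled separately.

```python
# `df` is assumed to be a pandas DataFrame of exam records with the hypothetical
# columns named above; the function returns one sub-dataset per segmentation level.
def build_segments(df):
    segments = {"all": df}                                            # level 1: full dataset
    for method, g in df.groupby("participation_method"):              # level 2: by participation method
        segments[f"method={method}"] = g
    for (method, exam), g in df.groupby(["participation_method", "exam_type"]):              # level 3
        segments[f"method={method},exam={exam}"] = g
    for (fam, method, exam), g in df.groupby(["family", "participation_method", "exam_type"]):  # level 4
        segments[f"family={fam},method={method},exam={exam}"] = g
    return segments
```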
After dividing the dataset into a training set and a test set, the next step is to apply various machine learning algorithms and compare their performance in terms of accuracy. The algorithms used in this study included Decision Trees (DT), Support Vector Machines (SVM), Multi-layer Perceptron (MLP), K-Nearest Neighbors (KNN), and Logistic Regression (LR).
The final step involved evaluating the performance of each algorithm and selecting the best performing one. The accuracy of the selected algorithm was then tested on the test set, as the goal of this study was to determine the most effective machine learning algorithm for predicting student performance based on the CE dataset. Through the process of feature selection, data segmentation, and applying various machine learning algorithms, the best performing algorithm was selected and used to make predictions about student performance.
By segmenting the dataset based on different feature types and values, we aimed to perform a deeper analysis of the data. By creating models in sub-datasets, the authors obtain specific trends or patterns in the data that are better captured by the models built in these sub-datasets compared to the base model built on the full dataset. The sub-models built include the exam type, participation method, and family of program, and their performance will be compared to the base model and evaluated using metrics such as accuracy, precision, recall, and F1-score. Ultimately, the goal was to determine which model performs the best in predicting student performance.
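The evaluation loop sketched below follows this methodology for a single (sub-)dataset: split the data, fit each model from the earlier sketch, and report cross-validation accuracy together with test-set accuracy, precision, recall, F1-score, and AUC. The feature matrix `X` and the "Test result" labels `y` are assumed inputs, and the models are assumed to expose predict_proba (e.g., SVC with probability=True).

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate_segment(models, X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "cv_accuracy": cross_val_score(model, X_train, y_train, cv=5).mean(),
            "accuracy":  accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred),
            "recall":    recall_score(y_test, y_pred),
            "f1":        f1_score(y_test, y_pred),
            "auc":       roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
        }
    return results
```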

5. Experiments and Results

The experiments were based on a CE-dataset that was segmented into different sub-datasets based on feature type and value. Four different classification techniques were used to build prediction models for the student performance. The assessment of prediction models for the student dataset involved segmenting the data based on specific features, applying different classification techniques, and evaluating their performance using metrics such as accuracy and cross-validation. The models were then compared to a base model created from the full dataset and the best-performing sub-model is discussed in each sub-dataset.
Experiment-1 All-Dataset Case Study
In the all-dataset case study, the data were analyzed using four machine learning techniques: Logistic Regression, KNN Classifier, Multi-Layer Perceptron (MLP), and Support Vector Machines (SVM). Feature selection was performed using Decision Trees (DT) and Random Forest (RF) and Extra Trees (ET), and the most important features were found to be the student’s grade and the university, indicating that the student’s grade and the university’s quality of instruction play a role in student performance. The results of the cross-validation accuracy were presented in Table 6, which showed that MLP achieved the highest accuracy (0.651784) among the four techniques. The other techniques also showed decent results, but MLP outperformed them in terms of accuracy, F1 score, AUC, recall, and precision.
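A sketch of this feature-selection step is shown below: the impurity-based importances of Decision Tree, Random Forest, and Extra Trees models are used to rank the candidate features. The feature matrix and its column names are assumptions.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

def rank_features(X, y):
    # X: pandas DataFrame of candidate features, y: binary test-result labels
    estimators = {
        "DT": DecisionTreeClassifier(random_state=42),
        "RF": RandomForestClassifier(n_estimators=200, random_state=42),
        "ET": ExtraTreesClassifier(n_estimators=200, random_state=42),
    }
    ranking = {}
    for name, est in estimators.items():
        est.fit(X, y)
        ranking[name] = pd.Series(est.feature_importances_, index=X.columns).sort_values(ascending=False)
    return pd.DataFrame(ranking)
```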
Experiment-2 Macro Exam Case Study
The study analyzed the performance of ML methods applied to a sub-dataset based on student results in the macro exam. The dataset contains 36,977 records, and the most important features were found to be grade, survey, and university, respectively. The results show that the Multi-Layer Perceptron (MLP) achieved the best accuracy with a score of 0.624482. The logistic regression, KNN classifier, and support vector machines also performed well but with lower accuracy compared to MLP. The emergence of the importance of the survey can be explained by the fact that students who are satisfied with their universities are more serious about taking the exams, while non-proficient students generally tend to be dissatisfied with their universities. The classifier accuracy on the training set for all models, reported in Table 7, shows that the Multi-Layer Perceptron achieved the best performance (accuracy of 0.624482).
Experiment-3 Micro Exam Case Study
In this case study, the results of a machine learning analysis of student results in a micro exam are presented. The most important features found to influence the students’ results are participation method, grade, and university. The participation method is seen as an important feature because it is believed to have an impact on the students’ results in their area of specialization. The results of the machine learning analysis show that the KNN classifier performed the best with an accuracy of 0.753663. This can be attributed to the strong connection between the features used and the student’s specialist. The experiment shows that the most important factors influencing student performance in the micro exam are the participation method, grade, and university. The change in participation method appears to be an important variable, which suggests that the way students participate in the exam (such as online or in-person) has an impact on their results. This could be due to the relationship between the student’s specialization and the participation method, as well as the possibility of variations in the exam questions.
This result suggests that the KNN classifier is the best model for predicting the results of micro exams, with an accuracy of 0.7536. The performance of all machine learning methods used is relatively high, and this can be attributed to the connection of the features used in the dataset, which are more specialized in nature. The fact that the participation method, grade, and university are the most important features can be explained by the impact of the student’s specialization on their performance and the possibility of changing questions at this level. Table 8 shows the accuracy metrics of different machine learning models applied to a micro level dataset of 8415 records. The accuracy, F1, AUC, recall, and precision metrics have been used to evaluate the performance of each model. The KNN classifier achieved the best accuracy of 0.7536, followed by the Multi-layer Perceptron with an accuracy of 0.7314. The Logistic Regression and Support Vector Machines had lower accuracy scores of 0.62297 and 0.621782 respectively. The F1 score is a measure of a model’s precision and recall, and in this case, the KNN classifier also had the highest F1 score of 0.672287. The AUC metric represents the area under the ROC (Receiver Operating Characteristic) curve, and again the KNN classifier had the highest AUC score of 0.736898. The recall and precision metrics are used to measure a model’s ability to identify positive cases, and the KNN classifier performed the best in both recall and precision as well.
In the micro exam case study, the student’s participation method, grade, and university are the most important features in predicting the student’s exam result. The KNN classifier has the best performance among the different machine learning methods applied with an accuracy of 0.753663. The high accuracy of the different models used could be attributed to the connection of the features used at a more specialized level.
Experiment-4 Participation Method, Exam Type and Family Case Study
This case study involves the analysis of the relationship between student performance and the combination of participation method, exam type, and family. The study first categorized students into three levels: participation method, exam type, and family (see Table 9). This table presents the number of students in each experiment based on the combination of the participation method, exam type, and family dataset. The study aimed to analyze the relationship between student performance and the combination of these factors. The table provides a breakdown of the number of students in each experiment, with a total of 45,392 students included in the analysis. The participation method was divided into two categories (online and in-campus), while the exam type was divided into two categories (macro and micro levels). The family dataset was divided into seven categories, representing half of the families present in the exam, with each family categorized as either scientific or humanitarian. The table provides a comprehensive overview of the sample size for each experiment, which is useful for understanding the scope and generalizability of the findings of the study.
The Decision Tree Classifier algorithm was applied to evaluate the accuracy of the prediction of student performance based on these three factors. The results of the study showed that experiment 9, which involves a traditional type of exam with traditional participation method and family number 4, had the highest prediction accuracy of 0.95. Further analysis was performed on the combination that showed the highest accuracy, and various machine learning algorithms were applied to verify the results. Further analysis of experiment 9 using various machine learning algorithms confirmed the results, with the accuracy ranging from 0.942 to 0.951 as we can see in Table 10. These results suggest that the traditional type of exam with traditional participation method and family number 4 could be considered important factors in predicting student performance.
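A sketch of this per-combination analysis is given below: a Decision Tree Classifier is fitted and cross-validated on every (participation method, exam type, family) segment, and the segments are ranked by accuracy. Column names, the minimum segment size, and the cross-validation setup are illustrative assumptions.

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def segment_accuracies(df, feature_cols, target_col="test_result"):
    # df: pandas DataFrame with the hypothetical segmentation columns
    keys = ["participation_method", "exam_type", "family"]
    scores = {}
    for combo, g in df.groupby(keys):
        if len(g) < 50:                        # skip segments too small to validate reliably
            continue
        X, y = g[feature_cols], g[target_col]
        clf = DecisionTreeClassifier(random_state=42)
        scores[combo] = cross_val_score(clf, X, y, cv=5).mean()
    # highest-accuracy segment combinations first
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```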
The results showed that using a student subset-generated model in the fifth level of segmentation led to improved performance compared to other levels. The combination of the factors of Participation Method, Exam Type, and Family Dataset was found to have superior performance compared to other models. The results of the campus Participation Method experiments also showed similar results, indicating that this approach is effective in predicting academic performance. Referring to the research questions related to the use of machine learning in evaluating student performance in Jordanian higher education institutions, the results indicate that ML models can be effectively used to detect student performance, and the experiments showed that ML models have the potential to be an accurate and feasible strategy for evaluating student performance. Additionally, the results of the experiments suggested that the MLP model provides accurate results. However, it is important to note that no single ML model outperforms the other ML models consistently, and selecting the appropriate model depends on the specific context and objectives of the evaluation. Finally, the study found that the variety of student data does impact the accuracy of machine learning algorithms in forecasting results. Thus, researchers and decision-makers should carefully consider the selection and preparation of datasets to achieve the most accurate results.
Discussion and Remarks
We assessed the accuracy of the created models and sub-models in identifying students’ performance. When analyzing the various datasets, no single ML model stands out as having higher performance. For the majority of experiments, MLP performs best when predicting the dataset, correctly classifying over 95% of students for the models created from the (Participation method, Exam type, and Family) dataset. The experiments carried out on the combined datasets showed a prediction accuracy of 70%. The results of the all-dataset experiments revealed that the second-level model in the campus participation method dataset had an accuracy of 83%, while the accuracy rose to 94% for the experiment using the (Participation method, Exam type, and Family) dataset. Furthermore, the findings of the experiments showed that the students’ grade at university, the university of graduation, and the questionnaire measuring students’ satisfaction with the university all have a significant impact on the performance of the models. These findings indicate the importance of considering multiple factors when predicting students’ performance. It was also observed that students from the educational sciences family yielded the best prediction results due to their familiarity with the nature of questions asked in competency exams, which focus more on measuring skills than on direct achievement.
The sub-models combining exam type (macro and micro) with a specific participation method demonstrated better prediction results compared to the model that only considers exam type, although not all sub-models performed better. Further investigation using datasets that focus on specific features, such as exam type and participation method, proved useful. The results showed that subset models generated from the combination of (Participation method, Exam type, and Family dataset) produced high accuracy, with MLP having the best performance. The results also indicated that factors such as students’ grades, university of graduation, and family background play a significant role in the accuracy of the models. The educational sciences family showed the highest prediction results due to their familiarity with the type of questions in the competency exams. The findings suggest that it is useful to consider different datasets for certain features when developing prediction models.
Segmenting the competency dataset into sub-datasets based on majors, type of test (macro or micro), and study type (online or face-to-face) can improve the performance of ML models. By dividing the data into subsets, the ML models can be trained more efficiently, focusing on the specific features of each subset. This approach can also help in identifying any potential bias or variations in the data that affect the model’s performance. Additionally, using domain-specific feature engineering techniques can also help in improving the accuracy of the ML models. The feature engineering process involves identifying the most relevant features that impact the target variable and transforming the data to highlight these features. Overall, the approach of data segmentation and domain-specific feature engineering can enhance the accuracy of the ML models, leading to better insights and decision-making.
In comparison to previous studies in the context of competency exam prediction using machine learning, this study has several pros and cons. One of the strengths of this study is the large dataset that was used, consisting of over 45,000 students from seven different academic families. This dataset was also collected from Jordanian universities over two academic semesters, which increases the generalizability of the study findings. Additionally, the study used a comprehensive set of demographic and academic variables as predictors, which provided a more holistic view of the factors influencing students’ performance on the CE.
Several research studies have utilized machine learning techniques to assess competencies and skills across various domains. These studies cover a range of applications, such as assessing physician competencies [49], analyzing graduate student employment outcomes [50], assisting volunteers with cause-effect reasoning [51], promoting software engineering competencies [52], evaluating the reading level of foreign students, assessing science competency in student-composed text [53], and automatically grading computer programs based on programming skills and program complexity [54]. Table 11 summarizes some key differences between our work and others. These studies highlight the potential of machine learning techniques in assessing competencies and skills, providing insights into developing more effective educational and training programs. However, while the literature review covers a wide range of applications, the focus of the current work is on competency exam data and the effect of data segmentation on accuracy. The current work also develops a framework for forecasting and assessing student performance and answers specific questions related to the suitability of ML algorithms and the appropriate model for accuracy. Overall, both the current work and the literature review demonstrate the potential of machine learning for analyzing and predicting student performance, but the current work provides more specific insights into the use of machine learning in analyzing competency exam data.
On the other hand, one of the limitations of the study is that it only utilized one type of algorithm, Random Forest, to predict CE outcomes. Other studies [55,56] have utilized different machine learning algorithms which may have yielded different results. Additionally, the study did not consider non-cognitive variables such as motivation or study habits, which have been shown to be important predictors of academic performance in other studies. Therefore, it is possible that the addition of these variables could have improved the predictive accuracy of the models.
In summary, the findings of this study contribute to the growing body of research on using machine learning techniques to predict academic performance. The study identified important predictors of CE performance and demonstrated the effectiveness of machine learning in predicting student outcomes. However, further research is needed to explore the generalizability of the findings to other contexts and to compare the effectiveness of different machine learning algorithms in predicting academic performance.
Practical implications of this study include the development of a framework that decision-makers and universities can use to evaluate academic programs using ML. This framework can help identify programs and learning outcomes that need to be established by analyzing competency exam data. It can also reduce exam costs by substituting machine learning algorithms for the actual execution of the exam.

6. Conclusions

This paper compared the performance of machine learning algorithms in terms of accuracy, precision, recall, and F1 score using competency exam data, studied the effect of data segmentation on these results, and developed a framework for forecasting and assessing student performance. The results indicate a favorable direction for the prediction task. Three research questions were derived from the core research topic. The first asked whether using ML algorithms to identify student performance from competency exam data is a suitable approach; the answer was a clear yes, as the algorithms were able to identify patterns in the data, and the problem structure is well suited to a quantitative approach that ML algorithms can readily handle. The second research question concerned the most appropriate model in terms of accuracy when implementing ML models. It was found that no single ML model stood out with consistently higher performance, although MLP performed best in the majority of experiments. The third research question asked whether segmenting student data by selected features affects prediction accuracy, and it was shown that investigating datasets divided by features such as exam type and participation method improves accuracy.

6.1. Practical Implications

These results can help decision-makers in higher education institutions assess students without administering a competency exam and improve the quality of educational outcomes. Institutions can also use this information to strengthen their academic programs and provide students with additional support before graduation, raising their performance and enhancing program quality.

6.2. Future Recommendations

Future work could expand the dataset to include data from a wider range of universities and students, which would increase the generalizability of the framework and make it applicable to a broader population. It could also explore additional ML models and techniques on the same dataset to further improve the accuracy and robustness of the framework and to compare their performance. Feature engineering could be taken further by incorporating new features into the dataset and evaluating the impact of these additions on the machine learning methods, allowing a more comprehensive examination of the relationship between student performance and the algorithms used. Finally, it would be worthwhile to investigate the potential impact of using competency exam results as a KPI on student motivation and performance.

Author Contributions

Conceptualization, A.S. and A.A.-Q.; Methodology, A.S., A.A.-Q. and A.N.; Validation, M.A.; Resources, A.S.; Writing—original draft, I.J.; Writing—review & editing, A.N., A.A., A.M.A., O.R.A.Z. and M.B.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Acknowledgments

The author (Amjad Aldweesh) would like to thank the Deanship of Scientific Research at Shaqra University for supporting this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ferguson, T.; Roofe, C.G. SDG 4 in higher education: Challenges and opportunities. Int. J. Sustain. High. Educ. 2020, 21, 959–975.
2. Wong, S.-C. Competency Definitions, Development and Assessment: A Brief Review. Int. J. Acad. Res. Progress. Educ. Dev. 2020, 9, 95–114.
3. Shilbayeh, S.; Abonamah, A. Predicting Student Enrolments and Attrition Patterns in Higher Educational Institutions using Machine Learning. Int. Arab J. Inf. Technol. 2021, 18, 562–567.
4. Sharabati, A.A.; Al-Wadi, M.; Noor, A.-N. The Jordanian Universities Competencies Tests—Comparative Study. In Proceedings of the Fifth International Arab Conference on Quality Assurance in Higher Education (IACQA2014), Sharjah, United Arab Emirates, 3–5 March 2015.
5. Sillat, L.H.; Tammets, K.; Laanpere, M. Digital competence assessment methods in higher education: A systematic literature review. Educ. Sci. 2021, 11, 402.
6. Henri, M.; Johnson, M.D.; Nepal, B. A Review of Competency-Based Learning: Tools, Assessments, and Recommendations. J. Eng. Educ. 2017, 106, 607–638.
7. Iatrellis, O.; Savvas, I.K.; Kameas, A.; Fitsilis, P. Integrated learning pathways in higher education: A framework enhanced with machine learning and semantics. Educ. Inf. Technol. 2020, 25, 3109–3129.
8. Holmes, N. Engaging with assessment: Increasing student engagement through continuous assessment. Act. Learn. High. Educ. 2017, 19, 23–34.
9. Islam, A.; Tasnim, S. An Analysis of Factors Influencing Academic Performance of Undergraduate Students: A Case Study of Rabindra University, Bangladesh (RUB). Shanlax Int. J. Educ. 2021, 9, 127–135.
10. Adejo, O.W.; Connolly, T. Predicting student academic performance using multi-model heterogeneous ensemble approach. J. Appl. Res. High. Educ. 2018, 10, 61–75.
11. Adekitan, A.I.; Noma-Osaghae, E. Data mining approach to predicting the performance of first year student in a university using the admission requirements. Educ. Inf. Technol. 2018, 24, 1527–1543.
12. Agrawal, S.; Vishwakarma, S.K.; Sharma, A.K. Using data mining classifier for predicting student's performance in UG level. Int. J. Comput. Appl. 2017, 172, 39–44.
13. Alhassan, A.; Zafar, B.; Mueen, A. Predict Students' Academic Performance based on their Assessment Grades and Online Activity Data. Int. J. Adv. Comput. Sci. Appl. 2020, 11.
14. Abu Amrieh, E.; Hamtini, T.; Aljarah, I. Mining Educational Data to Predict Student's Academic Performance using Ensemble Methods. Int. J. Database Theory Appl. 2016, 9, 119–136.
15. Belachew, E.B.; Gobena, F.A. Student Performance Prediction Model using Machine Learning Approach: The Case of Wolkite University. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2017, 7, 46–50.
16. Bharara, S.; Sabitha, A.S.; Bansal, A. Application of learning analytics using clustering data mining for students' disposition analysis. Educ. Inf. Technol. 2017, 23, 957–984.
17. Bhardwaj, B.K.; Pal, S. Data Mining: A prediction for performance improvement using classification. arXiv 2012, arXiv:1201.3418.
18. Fan, C.; Chen, M.; Wang, X.; Wang, J.; Huang, B. A Review on Data Preprocessing Techniques Toward Efficient and Reliable Knowledge Discovery From Building Operational Data. Front. Energy Res. 2021, 9, 652801.
19. Cantabella, M.; Martínez-España, R.; Ayuso, B.; Yáñez, J.A.; Muñoz, A. Analysis of student behavior in learning management systems through a Big Data framework. Future Gener. Comput. Syst. 2019, 90, 262–272.
20. Cerezo, R.; Sánchez-Santillán, M.; Paule-Ruiz, M.P.; Núñez, J.C. Students' LMS interaction patterns and their relationship with achievement: A case study in higher education. Comput. Educ. 2016, 96, 42–54.
21. Chui, K.T.; Fung, D.C.L.; Lytras, M.D.; Lam, T.M. Predicting at-risk university students in a virtual learning environment via a machine learning algorithm. Comput. Hum. Behav. 2018, 107, 105584.
22. Chung, R.G.; Lo, C.L. The development of teamwork competence questionnaire: Using students of business administration department as an example. Int. J. Technol. Eng. Educ. March Spec. Issue 2007, 51–57.
23. Love, B.C. Comparing supervised and unsupervised category learning. Psychon. Bull. Rev. 2002, 9, 829–835.
24. Nayak, P.S.; Hiremath, S.G.; Biradar, A. Neural Networks: All You Need to Know. J. Adv. Commun. Syst. 2018, 1, 12–16.
25. Roger, D.D.; Gupta, A.; Yule, S.J. Using Machine Learning to Assess Physician Competence: A Systematic Review. Acad. Med. J. Assoc. Am. Med. Coll. 2019, 94, 3.
26. Diab, S. Optimizing stochastic gradient descent in text classification based on fine-tuning hyper-parameters approach. A case study on automatic classification of global terrorist attacks. arXiv 2019, arXiv:1902.06542.
27. Erisman, W.; Steele, P. Adult College Completion in the 21st Century: What We Know and What We Don't. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2622629 (accessed on 16 February 2023).
28. Pereira, F.D.; Oliveira, E.H.T.; Fernandes, D.; Cristea, A. Early performance prediction for CS1 course students using a combination of machine learning and an evolutionary algorithm. In Proceedings of the 2019 IEEE 19th International Conference on Advanced Learning Technologies (ICALT), Maceio, Brazil, 15–18 July 2019.
29. Ghorbani, R.; Ghousi, R. Comparing Different Resampling Methods in Predicting Students' Performance Using Machine Learning Techniques. IEEE Access 2020, 8, 67899–67911.
30. Granitto, P.M.; Furlanello, C.; Biasioli, F.; Gasperi, F. Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemom. Intell. Lab. Syst. 2006, 83, 83–90.
31. González-Sopeña, J.; Pakrashi, V.; Ghosh, B. An overview of performance evaluation metrics for short-term statistical wind power forecasting. Renew. Sustain. Energy Rev. 2020, 138, 110515.
32. Turabieh, H. Hybrid Machine Learning Classifiers to Predict Student Performance. In Proceedings of the 2019 2nd International Conference on New Trends in Computing Sciences (ICTCS), Amman, Jordan, 9–11 October 2019.
33. Hasan, R.; Palaniappan, S.; Raziff, A.R.A.; Mahmood, S.; Sarker, K.U. Student academic performance prediction by using decision tree algorithm. In Proceedings of the 2018 4th International Conference on Computer and Information Sciences (ICCOINS), Istanbul, Turkey, 3–5 May 2018; pp. 1–5.
34. Villagrá-Arnedo, C.J.; Gallego-Durán, F.J.; Llorens-Largo, F.; Compañ-Rosique, P.; Satorre-Cuerda, R.; Molina-Carmona, R. Improving the expressiveness of black-box models for predicting student performance. Comput. Hum. Behav. 2017, 72, 621–631.
35. Rolland, S.S.; de Oliveira, C.F. Predicting Students Performance in Introductory Programming Courses: A Literature Review. 2021. Available online: http://repositorio.upt.pt/jspui/handle/11328/3396 (accessed on 12 January 2023).
36. Jakkula, V. Tutorial on Support Vector Machine (SVM); School of EECS, Washington State University: Pullman, WA, USA, 2006; Volume 37, p. 3.
37. Mastoory, Y.; Rajaee Harandi, S.; Abdolvand, N. The effects of communication networks on students' academic performance: The synthetic approach of social network analysis and data mining for education. Int. J. Integr. Technol. Educ. 2016.
38. Kotsiantis, S.B.; Kanellopoulos, D.; Pintelas, P.E. Data preprocessing for supervised leaning. Int. J. Comput. Sci. 2006, 1, 111–117.
39. Molina-Carmona, R.; Compañ-Rosique, P.; Satorre-Cuerda, R.; Villagrá-Arnedo, C.J.; Gallego-Durán, F.J.; Llorens-Largo, F. Technological Ecosystem Maps for IT Governance: Application to a Higher Education Institution. In Open Source Solutions for Knowledge Management and Technological Ecosystems; IGI Global: Hershey, PA, USA, 2017; pp. 50–80.
40. Patil, S.; Saroja, K. Mining social media data for understanding students' learning experiences using memetic algorithm. Mater. Today Proc. 2018, 5, 693–699.
41. Sorour, S.E.; Tsunenori, M. Building an interpretable model of predicting student performance using comment data mining. In Proceedings of the 2016 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), Kumamoto, Japan, 10–14 July 2016; pp. 285–291.
42. Saa, A.A. Educational data mining & students' performance prediction. Int. J. Adv. Comput. Sci. Appl. 2016, 7.
43. Pal, S.; Chaurasia, V. Is alcohol affect higher education students performance: Searching and predicting pattern using data mining algorithms. Int. J. Innov. Adv. Comput. Sci. 2017, 2347–8616.
44. Okagbue, E.F.; Ezeachikulo, U.P.; Nchekwubemchukwu, I.S.; Chidiebere, I.E.; Kosiso, O.; Ouattaraa, C.A.T.; Nwigwe, E.O. The effects of Covid-19 pandemic on the education system in Nigeria: The role of competency-based education. Int. J. Educ. Res. Open 2023, 4, 100219.
45. Helal, S.; Li, J.; Liu, L.; Ebrahimie, E.; Dawson, S.; Murray, D.J.; Long, Q. Predicting academic performance by considering student heterogeneity. Knowl.-Based Syst. 2018, 161, 134–146.
46. Jović, A.; Brkić, K.; Bogunović, N. A review of feature selection methods with applications. In Proceedings of the 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 25–29 May 2015.
47. Khalil, M.; Ebner, M. Clustering patterns of engagement in Massive Open Online Courses (MOOCs): The use of learning analytics to reveal student categories. J. Comput. High. Educ. 2016, 29, 114–132.
48. Khasanah, A.U.; Harwati, A. Comparative Study to Predict Student's Performance Using Educational Data Mining Techniques. IOP Conf. Ser. Mater. Sci. Eng. 2017, 215, 012036.
49. Mahboob, T.; Irfan, S.; Karamat, A. A machine learning approach for student assessment in E-learning using Quinlan's C4.5, Naive Bayes and Random Forest algorithms. In Proceedings of the 2016 19th International Multi-Topic Conference (INMIC), Islamabad, Pakistan, 5–6 December 2016; pp. 1–8.
50. García-Peñalvo, F.J.; Cruz-Benito, J.; Martin-Gonzalez, M.; Vázquez-Ingelmo, A.; Sánchez-Prieto, J.C.; Therón, R. Proposing a Machine Learning Approach to Analyze and Predict Employment and its Factors. Int. J. Interact. Multimed. Artif. Intell. 2018, 5, 39.
51. Pandey, R.; Gaurav, B.; Hemant, P. EMAssistant: A Learning Analytics System for Social and Web Data Filtering to Assist Trainers and Volunteers of Emergency Services. In Proceedings of the 16th International Conference of Information Systems for Crisis Response and Management, Valencia, Spain, 19–22 May 2019.
52. Thanachawengsakul, N.; Wannapiroon, P.; Nilsook, P. The Knowledge Repository Management System Architecture of Digital Knowledge Engineering using Machine Learning to Promote Software Engineering Competencies. Int. J. Emerg. Technol. Learn. (iJET) 2019, 14, 42–56.
53. Alhothali, A.; Albsisi, M.; Assalahi, H.; Aldosemani, T. Predicting Student Outcomes in Online Courses Using Machine Learning Techniques: A Review. Sustainability 2022, 14, 6199.
54. Leeman-Munk, S.P.; Wiebe, E.N.; Lester, J.C. Assessing elementary students' science competency with text analytics. In Proceedings of the Fourth International Conference on Learning Analytics and Knowledge (LAK '14), Indianapolis, IN, USA, 24–28 March 2014.
55. Ziker, C.; Truman, B.; Dodds, H. Cross Reality (XR): Challenges and Opportunities across the Spectrum. In Innovative Learning Environments in STEM Higher Education; Ryoo, J., Winkelmann, K., Eds.; Springer: Cham, Switzerland, 2021; pp. 55–77.
56. Ciolacu, M.; Tehrani, A.F.; Binder, L.; Svasta, P.M. Education 4.0—Artificial Intelligence Assisted Higher Education: Early recognition System with Machine Learning to support Students' Success. In Proceedings of the 2018 IEEE 24th International Symposium for Design and Technology in Electronic Packaging (SIITME), Iasi, Romania, 25–28 October 2018; pp. 23–30.
Figure 1. Heat Map Between Features.
Figure 2. Heat Map after Normalization.
Figure 3. Different Scalers and Correlation.
Figure 4. Research methodology.
Figure 5. Data segmentation.
Table 1. Description of Fields Used in Summarized Articles.
Field | Description | Categories
Learning type | The learning method used for the students in the sample | E-learning; Traditional learning; Both (e-learning and traditional learning)
Dataset source | The source of the sample and the method used to collect it | Pre-existing data (through university systems, e-learning systems, or other sources); Questionnaire (a set of questions answered by the sample students)
Sample size | The number of rows in the sample | —
Type of data | Classification of the data based on the categories common in this type of research | Demographic data (students' personal data such as gender and age); Academic data (the student's university record, such as the major average); Other data (other student data such as internet usage and interaction on social networking sites)
ML models | ML algorithms used to predict student performance | Classification and regression algorithms
Table 2. Literature Summary.
Reference | LT * | DS * | Sample Size | Type of Data | ML Models
Ref. [34] | E | P | 336 | Others | SVM
Ref. [19] | E | P | 76,268 | Demographic data, others | AP
Ref. [21] | M | P | 32,593 | Demographic data, Academic, Others | RTV-SVM
Ref. [35] | M | P | — | Demographic data, Academic, others | SVM, J48-NN
Ref. [36] | T | P | 6845 | Demographic data, Academic | RF(TH), RF(L), LR-NN
Ref. [16] | M | P | 500 | Demographic data, Academic, others | K-means
Ref. [14] | M | Q | 500 | Demographic data, Academic, others | ANN, NB, J48
Ref. [13] | M | P | 60 | Demographic data, Academic, others | NB-MLP, C4.5
Ref. [37] | M | P | 139 | Demographic data, Academic, others | CART
Ref. [33] | M | P | 22 | Academic, others | J48, RT-RF
Ref. [38] | M | P | 60 | Demographic data, Academic, others | J48, NB-RF
Ref. [39] | T | Q | — | Demographic data, Academic, others | RT
Ref. [40] | T | P | 2785 | Others, Academic | MA, ID3-NB
Ref. [41] | T | Q | 89 | Academic | C4.5-RF
Ref. [11] | T | P | 1445 | Demographic data, Academic | Tree, RF, NN, NB, LR
Ref. [42] | T | Q | 270 | Demographic data, others | C4.5, ID3-CART, CHAID
Ref. [43] | T | Q | 450 | Demographic data, Academic, others | BFTree, J48, RepTree, Simple Cart
Ref. [20] | E | P | 140 | Others | EM
* P: Pre-existing data, Q: Questionnaire, T: Traditional Learning, E: E-learning, M: Mix, DS: Dataset source, LT: Learning Type.
Table 3. Classification of families.
Family | Family Code | Family Type
IT | 1 | Scientific (1)
Ethics | 2 | Humanitarian (2)
Business | 3 | Humanitarian (2)
Education | 4 | Humanitarian (2)
Medical | 5 | Scientific (1)
Languages | 6 | Humanitarian (2)
Sciences | 7 | Scientific (1)
Table 4. Participation Method/Student.
Participation Method | Number of Students
On campus (0) | 26,802
Online (1) | 18,590
Table 5. Hyperparameters used in ML algorithms.
Model | Hyperparameter | Hyperparameter Values
Decision Trees (DT) | Max depth | 7, 10, 20
Decision Trees (DT) | Min samples split | 5, 10
Support Vector Machines (SVM) | Kernel | Linear
Support Vector Machines (SVM) | C | 10
Multi-Layer Perceptron (MLP) | Hidden layer sizes | (50, 50)
Multi-Layer Perceptron (MLP) | Activation function | ReLU
K-Nearest Neighbors (KNN) | Number of neighbors | 5
Logistic Regression (LR) | Regularization | L2
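For illustration, the candidate classifiers in Table 5 could be instantiated roughly as follows with scikit-learn. This is a sketch under the assumption that the models map directly onto these estimator classes; the Decision Tree values are expanded into a small grid, and any settings not listed in Table 5 are left at library defaults.

```python
# Sketch: instantiating the classifiers with the Table 5 hyperparameters
# (scikit-learn assumed; unlisted settings remain at library defaults).
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

models = {
    "DT":  [DecisionTreeClassifier(max_depth=d, min_samples_split=s)
            for d in (7, 10, 20) for s in (5, 10)],   # grid over the listed values
    "SVM": [SVC(kernel="linear", C=10)],
    "MLP": [MLPClassifier(hidden_layer_sizes=(50, 50), activation="relu")],
    "KNN": [KNeighborsClassifier(n_neighbors=5)],
    "LR":  [LogisticRegression(penalty="l2")],
}
```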
Table 6. Classification performances (all-dataset case study).
Data Set Size | ML Model | Accuracy | F1 | AUC | Recall | Precision
45,392 | Logistic Regression | 0.609267 | 0.691196 | 0.586553 | 0.784585 | 0.617675
45,392 | KNN Classifier | 0.634087 | 0.683358 | 0.624455 | 0.708432 | 0.659998
45,392 | Multi-Layer Perceptron | 0.651784 | 0.725801 | 0.629099 | 0.826877 | 0.646744
45,392 | Support Vector Machines | 0.624027 | 0.688981 | 0.608073 | 0.747167 | 0.639202
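The five metrics reported in Tables 6–8 can be reproduced for any fitted binary classifier with a short helper such as the one below (a minimal sketch; variable names are illustrative, and y_score denotes the predicted probability of the positive class).

```python
# Sketch: computing the five reported evaluation metrics for a binary classifier.
from sklearn.metrics import (accuracy_score, f1_score, roc_auc_score,
                             recall_score, precision_score)

def report_metrics(y_true, y_pred, y_score):
    """Return the metrics used in Tables 6-8 as a dictionary."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "f1":        f1_score(y_true, y_pred),
        "auc":       roc_auc_score(y_true, y_score),   # needs positive-class scores
        "recall":    recall_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
    }
```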
Table 7. Classification performances (macro exam dataset).
Data Set Size | ML Model | Accuracy | F1 | AUC | Recall | Precision
36,977 | Logistic Regression | 0.606364 | 0.748082 | 0.514323 | 0.973866 | 0.607287
36,977 | KNN Classifier | 0.607626 | 0.686361 | 0.580639 | 0.71538 | 0.659604
36,977 | Multi-Layer Perceptron | 0.624482 | 0.72014 | 0.579259 | 0.805047 | 0.651434
36,977 | Support Vector Machines | 0.600144 | 0.750113 | 0.5 | 1 | 0.600144
Table 8. Classification performance (micro level dataset).
Data Set Size | ML Model | Accuracy | F1 | AUC | Recall | Precision
8415 | Logistic Regression | 0.62297 | 0.006263 | 0.501571 | 0.003141 | 1
8415 | KNN Classifier | 0.753663 | 0.672287 | 0.736898 | 0.668063 | 0.676564
8415 | Multi-Layer Perceptron | 0.731485 | 0.639362 | 0.711475 | 0.629319 | 0.64973
8415 | Support Vector Machines | 0.621782 | 0 | 0.5 | 0 | 0
Table 9. Numbers of students in each experiment (Participation method, Exam type, and Family datasets).
Experiment | Number of Students
Experiment 6 | 1022
Experiment 7 | 2492
Experiment 8 | 2133
Experiment 9 | 2767
Experiment 10 | 4914
Experiment 11 | 6902
Experiment 12 | 2767
Experiment 13 | 1881
Experiment 14 | 1821
Table 10. Classification performances of experiments (Participation method, Exam type, and Family datasets).
Data Set Size | ML Model | Accuracy
2767 | Logistic Regression | 0.942238
2767 | KNN Classifier | 0.951865
2767 | Multi-Layer Perceptron | 0.942238
2767 | Support Vector Machines | 0.942238
Table 11. Key differences between our work and others.
Reference | Key Points
[49] | Assessing physicians' competencies using NLP, SVM, and hidden Markov models; good results were obtained
[50] | Predictive models for analyzing how graduate students obtain employment using clustering and machine learning algorithms; better understanding of the relation between students' competencies and suitable jobs
[51] | EMAssistant system using machine learning techniques to assist volunteers with cause-effect reasoning, built on an open-source system
[52] | Knowledge repository management system architecture for promoting software engineering competencies using machine learning and a six-step knowledge verification process
[53] | Hybrid text analytics method ("WriteEval") for analyzing student-composed text to assess science competency, using soft cardinality and precedent feature collection techniques
[54] | System for automatically grading computer programs based on programming skills and program complexity, using regression models (linear regression, SVM, Random Forests) for model learning; accurate grades obtained
Current work | The effect of data segmentation on accuracy for different ML models using competency exam data