Development of a Model Using Data Mining Technique to Test, Predict and Obtain Knowledge from the Academics Results of Information Technology Students

Because tertiary institutions such as colleges, polytechnics and universities accumulate large volumes of students’ academic results, data mining has become one of the most effective tools for discovering useful knowledge from student datasets. The discovered knowledge can help in understanding numerous challenges in education and in proposing possible solutions to them. The main objective of this research is to use the J48 decision tree algorithm to test, classify and predict on the students’ dataset by identifying important attributes and instances. The analysis was conducted on final-year students’ academic results in C# programming from five universities, imported into the WEKA environment as a CSV (Excel) file dataset. These training datasets contained the scores obtained in the examinations, grade remarks, grades, gender, and department. The knowledge extracted by the prediction model will help both tutors and students to anticipate future grade performance. Flow lines, J48 decision trees, confusion matrices and a program flowchart were generated from the students’ dataset. The Kappa values obtained from the predictions range from 0.9070 to 0.9582, a range commonly interpreted as almost perfect agreement between predicted and actual classes.


Introduction
Students' academic performance is an important aspect of most tertiary educational systems, particularly in higher learning institutions. Excellent examination records among students have become one of the key factors considered when ranking tertiary institutions in the QS World University Rankings [1]. The amount of student data grows daily, which makes it critical to analyze this data in order to discover and retrieve useful information and knowledge from it. Numerous techniques have been proposed for evaluating students' academic performance (which involves testing, prediction and knowledge discovery on the dataset). Data mining is one of the most common techniques used to analyze the academic performance of students, and it has recently been applied widely in the educational sector [2]. Data mining, also known as knowledge discovery from data (KDD), can be defined as a process of discovering interesting patterns and knowledge from stored data. Data mining offers various analysis methods, including classification, clustering, and association rules [3]. Data mining is also referred to as data dredging; it is a multidisciplinary field that obtains relevant information from large amounts of data and sits at the confluence of specializations including artificial intelligence, statistics, databases, and information science [4].
In the educational sector, one of the major objectives is to provide learning processes that allow for understanding students and their learning paths, an area termed Educational Data Mining and Learning Analytics (EDM/LA). Educational Data Mining (EDM) is a discipline that focuses on extracting useful information and knowledge from large educational databases and using this information and knowledge to predict students' academic performance [5]. Apart from extracting and analyzing educational data, Educational Data Mining can enhance and develop students' performance in the teaching and learning domain [6]. Several works in EDM/LA have been devoted to methods for predicting student performance. The authors of [7] compared different decision trees for predicting students' academic performance. The decision trees were able to reveal the total number of students with excellent grades and those with failing grades, and this prediction both improved the teaching and learning process in the institution and reduced the failure rate among the students.
WEKA is a data mining tool used for managing the experimental analysis of data mining processes such as prediction, classification, clustering, association rules and evaluation; it also provides flexible support for machine learning research and serves as a tool for introducing people to machine learning in the educational environment [8]. This research work focuses on using the J48 decision tree classification model in WEKA to analyze the academic performance of Information Technology (I.T) students in five universities across five countries: Iraq, Sudan, Nigeria, South Africa, and India. The data was obtained from the second-semester examination records of final-year undergraduate students in the five countries. The authors in [9] presented a taxonomy of data mining approaches, which is illustrated pictorially in Figure 1.
The research conducted in [10] substantiated and built a methodology for ensemble classification of individual students' performance and for quantifying collective performance. According to [11], educational data mining involves four development phases: filtering of the students' data; selection of attributes or variables relating to their performance; extraction of knowledge from the filtered students' data; and interpretation and evaluation.
The study in [12] successfully predicted binary academic performance for school students whose share of passed tests was 40-60% in both mathematics and computer science, with the aim of obtaining the correlation between the scores in order to investigate the students' cognitive abilities. The J48 algorithm is regarded as one of the best machine learning algorithms for educational data because it can handle both categorical and continuous attributes; it has been used by many researchers for classifying student datasets and usually yields accurate results [13]. In the study conducted in [14], the J48 algorithm was used for classification on students' datasets, and its performance was compared with that of other classifiers on evaluation criteria such as accuracy and execution time. The study revealed that the performance of classification techniques differs between datasets. It also showed that factors such as the dataset itself, the number of instances, the attributes and the attribute types affected the classifier's performance. J48 produced better results on most educational datasets [13,14].
Researchers have applied decision trees using the J48 classification algorithm to predict the academic performance of students in tertiary institutions by testing the algorithm on unseen data to calculate its accuracy. They intend to use this algorithm to build a model that the university can use to predict student performance, evaluate the teaching skills of lecturers and improve the learning potential of students in other academic specializations [15].

Dataset Description
The students' academic records were analyzed in WEKA using the J48 classification algorithm to test and predict students' future learning outcomes, based on final-year student records from five countries. The analysis was conducted on the students' results in C# programming language examinations, marked out of 100. The departments considered are Computer Science at Lagos State University, Nigeria; Computer Science at the University of Kirkuk, Iraq; the School of Computer and Systems Sciences at Jawaharlal Nehru University, New Delhi, India; the College of Computer Science and Information Technology at Sudan University of Science and Technology, Khartoum, Sudan; and Computer Science at the University of Cape Town, South Africa. The students' dataset consists of five attributes: "scores obtained in the C-SHARP (C#) examinations", "grade remarks", "grades", "gender" and "department". For the purpose of the J48 analysis in WEKA, only the "grades" column was used to produce the detailed accuracy-by-class reading. The grades were classified into A (70-100 marks), B (60-69 marks), C (50-59 marks), D (40-49 marks) and F (0-39 marks), which denote excellent, very good, average, poor and failed, respectively. The functional requirements for the analysis of the students' data in WEKA can be illustrated pictorially with the aid of a program flowchart. Program flowcharts (Figure 2) are data-flow diagrams that describe the sequence of data operations and decisions for a particular program or algorithm [16].
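The grade bands described above can be written as a small lookup. The following is a minimal sketch in Python, not part of the original WEKA workflow; the function name `grade_for_score` and the `REMARKS` table are hypothetical names introduced only for illustration, while the band boundaries and remarks are taken from the description above.

```python
# Illustrative sketch: map an examination score (0-100) to the grade bands
# stated in the text. Names here are hypothetical, not from the original study.

REMARKS = {
    "A": "excellent",
    "B": "very good",
    "C": "average",
    "D": "poor",
    "F": "failed",
}


def grade_for_score(score):
    """Return the letter grade for a score between 0 and 100 inclusive."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 70:
        return "A"
    if score >= 60:
        return "B"
    if score >= 50:
        return "C"
    if score >= 40:
        return "D"
    return "F"


if __name__ == "__main__":
    for s in (85, 64, 50, 41, 12):
        g = grade_for_score(s)
        print(s, g, REMARKS[g])
```

A mapping of this kind could be used to derive the "grades" and "grade remarks" columns from the raw scores before exporting the CSV file, assuming the scores are whole marks.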

Methods
The J48 decision tree algorithm is a predictive machine learning model that determines the dependent variable (also known as the target value) of a new sample based on the various attribute values of the available data [17]. The internal nodes of a J48 decision tree denote the different attributes used [18]. With the aid of a tree classification algorithm, the underlying distribution of the data becomes easier to understand and the model is simple to implement. J48 is WEKA's implementation of the C4.5 algorithm, an extension of ID3, and it builds decision nodes using the expected estimates of the class. The J48 algorithm handles decision tree pruning, missing attribute values and varying attribute costs [19]. A J48 tree is generated via the following three stages [20]:
• Stage 1: If all instances belong to the same class, the leaf is labeled with that class;
• Stage 2: For each attribute, the potential information is calculated, and the information gain is obtained from a test on that attribute;
• Stage 3: Finally, the best attribute is selected according to the current selection criterion.
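Stages 2 and 3 can be illustrated with a short Python sketch that computes entropy and the information gain of a threshold test on a numeric attribute such as the examination score. This is a simplified illustration under stated assumptions, not WEKA's implementation: C4.5-style learners normally use the gain ratio and restrict candidate thresholds further, whereas this sketch uses plain information gain and tries every distinct score. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(scores, labels, threshold):
    """Gain from splitting the instances into score <= threshold and score > threshold."""
    left = [lab for s, lab in zip(scores, labels) if s <= threshold]
    right = [lab for s, lab in zip(scores, labels) if s > threshold]
    total = len(labels)
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(labels) - weighted


def best_threshold(scores, labels):
    """Return the candidate threshold with the highest gain, or None if no split exists."""
    candidates = sorted(set(scores))[:-1]  # the largest score cannot split anything
    if not candidates:
        return None
    return max(candidates, key=lambda t: information_gain(scores, labels, t))


if __name__ == "__main__":
    # Hypothetical toy data: four scores with their grades.
    scores = [80, 75, 30, 20]
    grades = ["A", "A", "F", "F"]
    print(best_threshold(scores, grades))
```

Applying `best_threshold` recursively to each resulting subset, and stopping when a subset contains a single class (Stage 1), gives the basic shape of an unpruned decision tree.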

Students' Dataset Analysis in WEKA
The J48 trees generated in WEKA for the students' academic datasets across the five countries used a percentage split of 50% for the training set, 25% for the test data and the remaining 25% for validation to obtain the classifier model. The J48 decision tree classifier output obtained from the students' results for the five universities analyzed is displayed in Appendix A of this work.
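The 50/25/25 partition can be sketched in Python as follows. This is only an illustration of the proportions described above; it assumes a simple seeded shuffle, which is not necessarily how WEKA's percentage split orders or selects the instances, and the function name `split_50_25_25` is hypothetical.

```python
import random


def split_50_25_25(records, seed=0):
    """Shuffle the records and return (train, test, validation) partitions.

    Half of the records go to training, a quarter to testing, and whatever
    remains to validation. The seed makes the shuffle reproducible.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) // 2
    n_test = len(shuffled) // 4
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


if __name__ == "__main__":
    # Hypothetical record identifiers for a cohort of 40 students.
    train, test, validation = split_50_25_25(range(40))
    print(len(train), len(test), len(validation))
```

For a cohort of 40 records this yields partitions of 20, 10 and 10 records, with every record appearing in exactly one partition.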

Calculations of the Evaluation Measures of the Detailed Accuracy Class Table
In the data analysis conducted, the three standard measures used to evaluate classification quality are recall, precision and F-measure. Precision is the ratio of correctly classified cases of a class to all cases assigned to that class, that is, the correctly classified cases plus the cases wrongly assigned to the class [21]. Recall is the ratio of correctly classified cases of a class to all cases that actually belong to that class, that is, the correctly classified cases plus the cases of the class that were missed. The F-measure combines recall and precision into a single value, their harmonic mean [21,22]. Other measures used in obtaining and evaluating the results include the execution time, TP rate, FP rate, ROC area, PRC area and confusion matrix [23].
The precision, F-measure and recall values can be obtained using Equations (1)-(3), respectively:

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{1} \]

\[ \text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{2} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3} \]

Here TP represents the number of true positives, FP the number of false positives, and FN the number of false negatives for a given class. The precision, F-measure and recall values are among the evaluation parameters that WEKA reports in its detailed accuracy by class table.
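As an illustration of how these three measures follow from a confusion matrix, the Python sketch below derives per-class precision, recall and F-measure. It assumes rows hold the actual classes and columns the predicted classes, which is the layout WEKA prints; the function name and the example matrix are hypothetical and do not come from the study's data.

```python
def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f_measure)} from a square confusion matrix.

    matrix[i][j] is the number of instances of actual class i predicted as class j.
    """
    n = len(labels)
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(matrix[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        metrics[label] = (precision, recall, f_measure)
    return metrics


if __name__ == "__main__":
    # Hypothetical two-class example, not taken from the students' dataset.
    example = [[20, 5],
               [10, 15]]
    for label, values in per_class_metrics(example, ["pass", "fail"]).items():
        print(label, [round(v, 3) for v in values])
```

Returning 0.0 when a denominator is zero is a design choice made for this sketch; other tools may report such cases as undefined.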

Outcomes of J48 Decision Tree Generated from Students' Dataset Analysis
This section shows the J48 decision trees generated from the students' academic results imported into the WEKA environment for the analysis (see Figures 3-7). The Grade_Remarks attribute view for the students' dataset is shown in Appendix A of this research.

Results and Discussion
The results of the analysis, namely the Kappa statistic, mean absolute error, recall, precision and F-measure obtained for the five universities, are presented in tabular form. Table 1 shows the values obtained from the analysis of the students' datasets. The Kappa values obtained range from 0.9070 to 0.9582; values above 0.80 are commonly interpreted as almost perfect agreement between the predicted and actual classes.
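For reference, the Kappa statistic compares the observed agreement between predicted and actual classes with the agreement expected by chance. The Python sketch below computes Cohen's kappa from a confusion matrix; it is an illustrative calculation, not WEKA's code, and the function name and example matrix are hypothetical.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement: proportion of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: product of the row and column marginals, summed over classes.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical two-class example, not taken from Table 1.
    print(cohen_kappa([[20, 5], [10, 15]]))
```

A kappa of 1 indicates perfect agreement and a kappa of 0 indicates agreement no better than chance, which is why values above 0.90 are read as very strong.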

Plots of Evaluation Parameters from the Analysis Conducted on the Students' Dataset
Using the parameters (TP rate, FP rate, precision, recall, F-measure, MCC, ROC area and PRC area) reported by WEKA in the detailed accuracy by class analysis, we plotted flow lines that illustrate these parameters so that informative patterns can be viewed from a statistical perspective. The flow lines are based on the values of the evaluation parameters derived from the WEKA analysis of the five universities considered as case studies in this work. Figures 8-12 illustrate the plots of the parameters for the five universities.

Analysis of J48 Decision Trees Generated in WEKA for the Five Universities
Figures 3-7 illustrate the J48 decision trees generated in WEKA for the five universities. In this section, we provide a detailed explanation of the J48 trees generated in Section 3.3 of this work.
The J48 decision tree classifier shown in Figure 3 illustrates that 11 students had grade A and passed with scores greater than 69 marks; 6 students had grade B and passed with scores greater than 59 marks and less than or equal to 69 marks; 7 students had grade C and passed with scores less than or equal to 59 marks; 7 students had grade D and failed with scores greater than 39 marks; and 9 students had grade F with scores less than or equal to 39 marks. In general, a total of 24 students were in the category of those who passed, while a total of 16 students were in the category of those who failed.
The J48 decision tree classifier shown in Figure 4 illustrates that 9 students had grade A and passed with scores greater than 69 marks; 7 students had grade B and passed with scores greater than 59 marks and less than or equal to 69 marks; 10 students had grade C and passed with scores less than or equal to 59 marks; 10 students had grade D and failed with scores greater than 39 marks; and 4 students had grade F with scores less than or equal to 39 marks. In general, a total of 26 students were in the category of those who passed, while a total of 14 students were in the category of those who failed.
The J48 decision tree classifier shown in Figure 5 illustrates that 2 students had grade A and passed with scores greater than 66 marks; 8 students had grade B and passed with scores greater than 59 marks and less than or equal to 66 marks; 9 students had grade C and passed with scores less than or equal to 59 marks; 12 students had grade D and failed with scores greater than 39 marks; and 9 students had grade F with scores less than or equal to 39 marks. In general, a total of 19 students were in the category of those who passed, while a total of 21 students were in the category of those who failed.
The J48 decision tree classifier shown in Figure 6 illustrates that 13 students had grade A and passed with scores greater than 67 marks; 4 students had grade B and passed with scores greater than 59 marks and less than or equal to 67 marks; 4 students had grade C and passed with scores less than or equal to 59 marks; 4 students had grade D and failed with scores greater than 37 marks; and 15 students had grade F with scores less than or equal to 37 marks. In general, a total of 21 students were in the category of those who passed, while a total of 19 students were in the category of those who failed.
The J48 decision tree classifier shown in Figure 7 illustrates that 4 students had grade A and passed with scores greater than 69 marks; 9 students had grade B and passed with scores greater than 57 marks and less than or equal to 69 marks; 10 students had grade C and passed with scores less than or equal to 57 marks; 12 students had grade D and failed with scores greater than 37 marks; and 5 students had grade F with scores less than or equal to 37 marks. In general, a total of 23 students were in the category of those who passed, while a total of 17 students were in the category of those who failed.
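The pass and fail totals quoted for each tree follow directly from the per-grade leaf counts, with grades A, B and C counted as passes and grades D and F as fails. The short Python sketch below reproduces that arithmetic from the counts reported for Figures 3-7; the variable and function names are hypothetical and used only for illustration.

```python
PASSING_GRADES = {"A", "B", "C"}

# Per-grade student counts as reported for the trees in Figures 3-7.
GRADE_COUNTS = {
    "Figure 3": {"A": 11, "B": 6, "C": 7, "D": 7, "F": 9},
    "Figure 4": {"A": 9, "B": 7, "C": 10, "D": 10, "F": 4},
    "Figure 5": {"A": 2, "B": 8, "C": 9, "D": 12, "F": 9},
    "Figure 6": {"A": 13, "B": 4, "C": 4, "D": 4, "F": 15},
    "Figure 7": {"A": 4, "B": 9, "C": 10, "D": 12, "F": 5},
}


def pass_fail_totals(counts):
    """Return (passed, failed) totals for one tree's per-grade counts."""
    passed = sum(n for grade, n in counts.items() if grade in PASSING_GRADES)
    failed = sum(n for grade, n in counts.items() if grade not in PASSING_GRADES)
    return passed, failed


if __name__ == "__main__":
    for figure, counts in GRADE_COUNTS.items():
        passed, failed = pass_fail_totals(counts)
        print(figure, "passed:", passed, "failed:", failed, "total:", passed + failed)
```

Each tree accounts for 40 students in total, which provides a quick consistency check on the counts read from the figures.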

Conclusions and Future Scope
With the rapid increase in the extraction of useful knowledge from data, data mining has contributed significantly to educational institutions in many countries. Testing and prediction on students' academic performance help both learners and educators to improve their learning and teaching skills, respectively. This research work used the WEKA data analytics platform to apply the J48 classification algorithm to students' results across five universities in five countries, evaluated on the basis of the execution time, TP rate, FP rate, precision, recall, ROC area, PRC area, MCC and F-measure. WEKA used the different attributes, with stratified cross-validation via the J48 tree algorithm, to obtain the correctly classified instances, the incorrectly classified instances and the error values (the mean absolute, root mean squared, relative absolute and root relative squared errors). Confusion matrices were generated for the students' datasets, with A, B, C, D and F as the class labels. The Kappa values obtained from the analysis range from 0.9070 to 0.9582, which is commonly interpreted as almost perfect agreement. Flow lines and bar charts were generated for the evaluation parameters and the attributes, respectively. We found that the J48 algorithm provided good results and, in future, we intend to extend our research using different parameters in a different analytic environment.
Acknowledgments: The authors are grateful to the universities used as case studies for providing their students' academic data for this research.