Data Mining Approach to Predict Success of Secondary School Students: A Saudi Arabian Case Study
Abstract
1. Introduction
2. Related Work
- There is a lack of studies in the KSA that predict the academic performance of high school students.
- Most of the studies in the KSA target the undergraduate level. However, the issues must be addressed earlier for better career counseling/adoption.
- Studies mainly focus on academic performance rather than on demographic and academic factors.
- Most of the studies in the literature target students in urban areas. However, students in rural areas and suburbs face more issues, and they are the target of the current study.
- NB, DT, and RF are among the most widely used algorithms in educational data mining (EDM) for success prediction.
- Thus, in the current study, these algorithms were selected for their suitability to EDM and to the nature and size of the dataset.
- Moreover, accuracy is the most widely used metric in the literature for evaluating the efficiency of EDM algorithms.
- The most common demographic factors were gender, age, address, whether the mother and father are relatives, the age and occupation of the father and mother, and the place and type of residence.
- The most used academic factors were the semester grades, the subject grades, and the final grade for the degree, in addition to the mock score, the duration of the study, and the number of subjects in a year.
3. Description of the Proposed Techniques
3.1. Random Forests
3.2. J48
3.3. Naïve Bayes
4. Empirical Studies
4.1. Description of High School Student Dataset
Statistical Analysis of the Dataset
4.2. Experimental Setup
4.3. Dataset Collection
4.4. Dataset Pre-Processing
4.4.1. Digitization
4.4.2. Missing and Conflicting Values
4.4.3. Data Transformation
4.5. Data Augmentation
4.6. Feature Extraction
4.7. Optimization Strategy
- Accuracy is the number of correctly classified instances divided by the total number of classified instances, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives:
  $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
- Recall is the proportion of actual positive instances in the dataset that the model correctly identifies [48]:
  $$\text{Recall} = \frac{TP}{TP + FN}$$
- Precision is the proportion of true positive instances among all instances predicted as positive [48]:
  $$\text{Precision} = \frac{TP}{TP + FP}$$
- F-score is the harmonic mean of precision and recall [48]:
  $$F\text{-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
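The four measures above can be checked against a confusion matrix. The following is a minimal sketch rather than the authors' tooling: it uses the RF 10-fold confusion matrix reported later in this article, and it assumes that rows are actual classes, that columns are predicted classes, and that the per-class values are averaged with weights proportional to class size. The function name `weighted_metrics` is hypothetical.

```python
# Confusion matrix reported in this article for RF under 10-fold
# cross-validation. Assumption: rows are actual classes and columns are
# predicted classes, in the order A, B, C, D.
RF_10_FOLD = [
    [296, 9, 0, 0],
    [9, 296, 1, 0],
    [0, 2, 301, 1],
    [0, 0, 0, 306],
]


def weighted_metrics(matrix):
    """Return accuracy and support-weighted precision, recall, and F-score."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n_classes))

    precision = recall = f_score = 0.0
    for i in range(n_classes):
        tp = matrix[i][i]
        support = sum(matrix[i])                   # actual instances of class i
        predicted = sum(row[i] for row in matrix)  # instances predicted as class i
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                   # assumed weighting scheme
        precision += weight * p
        recall += weight * r
        f_score += weight * f

    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


metrics = weighted_metrics(RF_10_FOLD)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions each value rounds to 0.982, which agrees with the RF 10-fold column of the metrics table.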
4.7.1. Random Forest
4.7.2. J48
4.7.3. Naïve Bayes
5. Results and Discussion
5.1. Results of Investigating the Effect of Balancing the Dataset Using the SMOTE Technique
5.2. Results of Investigating the Effect of Feature Selection on the Dataset
5.3. Comparison of 10-Fold Cross-Validation and Direct Partition Results
5.4. Analysis of Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Grossman, P. Teaching Core Practices in Teacher Education; Harvard Education Press: Cambridge, MA, USA, 2018.
- Quinn, M.A.; Rubb, S.D. The importance of education-occupation matching in migration decisions. Demography 2005, 42, 153–167.
- Education in Saudi Arabia. Available online: https://en.wikipedia.org/wiki/Education_in_Saudi_Arabia (accessed on 30 January 2022).
- Smale-Jacobse, A.E.; Meijer, A.; Helms-Lorenz, M.; Maulana, R. Differentiated Instruction in Secondary Education: A Systematic Review of Research Evidence. Front. Psychol. 2019, 10, 2366.
- Mosa, M.A. Analyze students’ academic performance using machine learning techniques. J. King Abdulaziz Univ. Comput. Inf. Technol. Sci. 2021, 10, 97–121.
- Aggarwal, V.B.; Bhatnagar, V.; Kumar, D.; Editors, M. Advances in Intelligent Systems and Computing, 654 Big Data Analytics; Springer: Cham, Switzerland, 2015.
- Han, J.; Kamber, M.; Pei, J. Data Mining, 3rd ed.; Elsevier Science & Technology: Amsterdam, The Netherlands, 2012.
- Mathew, S.; Abraham, J.T.; Kalayathankal, S.J. Data mining techniques and methodologies. Int. J. Civ. Eng. Technol. 2018, 9, 246–252.
- Jackson, J. Data Mining: A Conceptual Overview. Commun. Assoc. Inf. Syst. 2002, 8, 19.
- Yoon, S.; Taha, B.; Bakken, S. Using a data mining approach to discover behavior correlates of chronic disease: A case study of depression. Stud. Health Technol. Inform. 2014, 201, 71–78.
- Mamatha Bai, B.G.; Nalini, B.M.; Majumdar, J. Analysis and Detection of Diabetes Using Data Mining Techniques—A Big Data Application in Health Care; Springer: Singapore, 2019.
- Othman, M.S.; Kumaran, S.R.; Yusuf, L.M. Data Mining Approaches in Business Intelligence: Postgraduate Data Analytic. J. Teknol. 2016, 78, 75–79.
- Kokotsaki, D.; Menzies, V.; Wiggins, A. Durham Research Online Woodlands. Crit. Stud. Secur. 2014, 2, 210–222.
- Athani, S.S.; Kodli, S.A.; Banavasi, M.N.; Hiremath, P.G.S. Predictor using Data Mining Techniques. Int. Conf. Res. Innov. Inf. Syst. ICRIIS 2017, 1, 170–174.
- Salal, Y.K.; Abdullaev, S.M.; Kumar, M. Educational data mining: Student performance prediction in academic. Int. J. Eng. Adv. Technol. 2019, 8, 54–59.
- Yağci, A.; Çevik, M. Prediction of academic achievements of vocational and technical high school (VTS) students in science courses through artificial neural networks (comparison of Turkey and Malaysia). Educ. Inf. Technol. 2019, 24, 2741–2761.
- Rebai, S.; Ben Yahia, F.; Essid, H. A graphically based machine learning approach to predict secondary schools performance in Tunisia. Socio-Economic Plan. Sci. 2020, 70, 100724.
- Sokkhey, P.; Okazaki, T. Hybrid Machine Learning Algorithms for Predicting Academic Performance. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 32–41.
- Adekitan, A.I.; Noma-Osaghae, E. Data mining approach to predicting the performance of first year student in a university using the admission requirements. Educ. Inf. Technol. 2019, 24, 1527–1543.
- Alhassan, A.M. Using data Mining Techniques to Predict Students’ Academic Performance. Master’s Thesis, King Abdulaziz University, Jeddah, Saudi Arabia, 2020.
- Alyahyan, E.; Dusteaor, D. Decision Trees for Very Early Prediction of Student’s Achievement. In Proceedings of the 2020 2nd International Conference on Computer and Information Sciences (ICCIS), Sakaka, Saudi Arabia, 13–15 October 2020.
- Pal, V.K.; Bhatt, V.K.K. Performance prediction for post graduate students using artificial neural network. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 446–454.
- Lin, A.; Wu, Q.; Heidari, A.A.; Xu, Y.; Chen, H.; Geng, W.; Li, Y.; Li, C. Predicting Intentions of Students for Master Programs Using a Chaos-Induced Sine Cosine-Based Fuzzy K-Nearest Neighbor Classifier. IEEE Access 2019, 7, 67235–67248.
- Sánchez, A.; Vidal-Silva, C.; Mancilla, G.; Tupac-Yupanqui, M.; Rubio, J.M. Sustainable e-Learning by Data Mining—Successful Results in a Chilean University. Sustainability 2023, 15, 895.
- Yağcı, M. Educational data mining: Prediction of students’ academic performance using machine learning algorithms. Smart Learn. Environ. 2022, 9, 11.
- Hu, C.; Chen, Y.; Hu, L.; Peng, X. A novel random forests based class incremental learning method for activity recognition. Pattern Recognit. 2018, 78, 277–290.
- Pavlov, Y.L. Random Forests; De Gruyter: Zeist, The Netherlands, 2019; pp. 1–122.
- Paul, A.; Mukherjee, D.P.; Das, P.; Gangopadhyay, A.; Chintha, A.R.; Kundu, S. Improved Random Forest for Classification. IEEE Trans. Image Process. 2018, 27, 4012–4024.
- Dietterich, T.G. Ensemble Methods in Machine Learning. In Proceedings of the International Workshop on Multiple Classifier Systems, Cagliari, Italy, 9–11 June 2000; pp. 1–15.
- Luo, C.; Wang, Z.; Wang, S.; Zhang, J.; Yu, J. Locating Facial Landmarks Using Probabilistic Random Forest. IEEE Signal Process. Lett. 2015, 22, 2324–2328.
- Gall, J.; Lempitsky, V. Decision Forests for Computer Vision and Medical Image Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
- Paul, A.; Mukherjee, D.P. Reinforced quasi-random forest. Pattern Recognit. 2019, 94, 13–24.
- Gholap, J. Performance Tuning Of J48 Algorithm For Prediction Of Soil Fertility. arXiv 2012.
- Christopher, A.B.A.; Balamurugan, S.A.A. Prediction of warning level in aircraft accidents using data mining techniques. Aeronaut. J. 2014, 118, 935–952.
- Aljawarneh, S.; Yassein, M.B.; Aljundi, M. An enhanced J48 classification algorithm for the anomaly intrusion detection systems. Clust. Comput. 2019, 22, 10549–10565.
- Lewis, D.D. Naive (Bayes) at forty: The independence assumption in information retrieval. In Machine Learning: ECML-98. ECML 1998. Lecture Notes in Computer Science; Nédellec, C., Rouveirol, C., Eds.; Springer: Berlin/Heidelberg, Germany, 1998; Volume 1398.
- John, G.H.; Langley, P. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI’95), Montreal, QC, Canada, 18–20 August 1995; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1995; pp. 338–345.
- Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software: An update. ACM SIGKDD Explor. Newsl. 2009, 11, 10–18.
- Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random Erasing Data Augmentation. AAAI 2020, 34, 13001–13008.
- Al-Azani, S.; El-Alfy, E.S.M. Using Word Embedding and Ensemble Learning for Highly Imbalanced Data Sentiment Analysis in Short Arabic Text. Procedia Comput. Sci. 2017, 109, 359–366.
- Kumar, V. Feature Selection: A literature Review. Smart Comput. Rev. 2014, 4.
- Samuels, P.; Gilchrist, M. Pearson Correlation. Stats Tutor, a Community Project. 2014. Available online: https://www.statstutor.ac.uk/resources/uploaded/pearsoncorrelation3.pdf (accessed on 21 July 2021).
- Doshi, M.; Chaturvedi, S.K. Correlation Based Feature Selection (CFS) Technique to Predict Student Performance. Int. J. Comput. Networks Commun. 2014, 6, 197–206.
- Rahman, A.; Sultan, K.; Aldhafferi, N.; Alqahtani, A. Educational data mining for enhanced teaching and learning. J. Theor. Appl. Inf. Technol. 2018, 96, 4417–4427.
- Rahman, A.; Dash, S. Data Mining for Student’s Trends Analysis Using Apriori Algorithm. Int. J. Control Theory Appl. 2017, 10, 107–115.
- Rahman, A.; Dash, S. Big Data Analysis for Teacher Recommendation using Data Mining Techniques. Int. J. Control Theory Appl. 2017, 10, 95–105.
- Zaman, G.; Mahdin, H.; Hussain, K.; Rahman, A.U.; Abawajy, J.; Mostafa, S.A. An Ontological Framework for Information Extraction from Diverse Scientific Sources. IEEE Access 2021, 9, 42111–42124.
- Alqarni, A.; Rahman, A. Arabic Tweets-Based Sentiment Analysis to Investigate the Impact of COVID-19 in KSA: A Deep Learning Approach. Big Data Cogn. Comput. 2023, 7, 16.
- Basheer Ahmed, M.I.; Zaghdoud, R.; Ahmed, M.S.; Sendi, R.; Alsharif, S.; Alabdulkarim, J.; Albin Saad, B.A.; Alsabt, R.; Rahman, A.; Krishnasamy, G. A Real-Time Computer Vision Based Approach to Detection and Classification of Traffic Incidents. Big Data Cogn. Comput. 2023, 7, 22.
- Nasir, M.U.; Khan, S.; Mehmood, S.; Khan, M.A.; Rahman, A.-U.; Hwang, S.O. IoMT-Based Osteosarcoma Cancer Detection in Histopathology Images Using Transfer Learning Empowered with Blockchain, Fog Computing, and Edge Computing. Sensors 2022, 22, 5444.
- Nasir, M.U.; Zubair, M.; Ghazal, T.M.; Khan, M.F.; Ahmad, M.; Rahman, A.-U.; Al Hamadi, H.; Khan, M.A.; Mansoor, W. Kidney Cancer Prediction Empowered with Blockchain Security Using Transfer Learning. Sensors 2022, 22, 7483.
- Rahman, A.-U.; Alqahtani, A.; Aldhafferi, N.; Nasir, M.U.; Khan, M.F.; Khan, M.A.; Mosavi, A. Histopathologic Oral Cancer Prediction Using Oral Squamous Cell Carcinoma Biopsy Empowered with Transfer Learning. Sensors 2022, 22, 3833.
- Farooq, M.S.; Abbas, S.; Rahman, A.U.; Sultan, K.; Khan, M.A.; Mosavi, A. A Fused Machine Learning Approach for Intrusion Detection System. Comput. Mater. Contin. 2023, 74, 2607–2623.
- Rahman, A.U.; Dash, S.; Luhach, A.K.; Chilamkurti, N.; Baek, S.; Nam, Y. A Neuro-fuzzy approach for user behaviour classification and prediction. J. Cloud Comput. 2019, 8, 17.
Ref. | Year | Algorithm Used | Highest Accuracy Achieved | Country | Dataset Size | Limitations |
---|---|---|---|---|---|---|
[14] | 2017 | NB | 87% | Portugal | 395 | |
[15] | 2019 | NB, J48, RF, RT, REPTree, JRip, OneR, SL, and ZeroR | 76.7% | Portugal | 649 | |
[16] | 2019 | ANN | ~96.9% | Malaysia and Turkey | 922; 1050 | |
[17] | 2020 | Regression Tree and RF | - | Tunisia | 105 | |
[18] | 2020 | RF, C5.0, NB, and SVM | 99.7% | Cambodia | 1204 | |
[19] | 2019 | RF, Tree Ensemble, DT, NB, LR, and resilient backpropagation | 51.9% | Nigeria | 1445 | |
[20] | 2020 | DT, RF, sequential minimal optimization, multilayer perceptron, and LR | 72.4% | Saudi Arabia | 241 | |
[21] | 2020 | J48, RT, and REPTree | 69.3% | Saudi Arabia | 339 | |
[22] | 2019 | ANN, RF, linear regression | 97.749% | Portugal | 395 | |
[23] | 2019 | CESCA-FKNN, RF, SVM, kernel extreme | 82.47% | China | 702 | |
[24] | 2023 | ANN, AdaBoost, NB, RF, J48 | 65.2% | Chile | 18,610 | |
[25] | 2022 | RF, KNN, SVM, LR | 70–75% | Turkey | 1854 | |
No | Attribute | Description | Domain |
---|---|---|---|
1 | Gender | Gender | Female = 1; Male = 2 |
2 | Age | Age in the year of high school graduation | <18 years = 1; 18–20 years = 2; Above 20 years = 3 |
3 | Social_status | Social status | Single = 1; Married = 2 |
4 | Specialization | Specialization | Scientific = 1; Literary = 2; Management = 3 |
5 | BS | The number of brothers and sisters | Less than or equal to 1 = 1; From 2 to 5 = 2; Above 6 = 3 |
6 | Rank | Ranking among siblings | Eldest = 1; Middle child = 2; Youngest = 3 |
7 | FM_Relative | Whether the mother and father are relatives | Yes = 1; No = 2 |
8 | F_Age | Father’s age | Less than 45 years = 1; From 45–55 years = 2; Above 55 years = 3 |
9 | F_Edu | Father’s education | None = 1; Elementary and intermediate = 2; Secondary = 3; Bachelor’s = 4; Postgraduate = 5 |
10 | Father_live | Does the father live with the family? | Yes = 1; No = 2; Dead = 3 |
11 | Father_Job | Father’s job | Works = 1; Does not work = 2; Retired = 3 |
12 | Mother_Age | Mother’s age | Less than 45 years = 1; From 45–55 years = 2; Above 55 years = 3 |
13 | Mother_Edu | Mother’s education | None = 1; Elementary and intermediate = 2; Secondary = 3; Bachelor’s = 4; Postgraduate = 5 |
14 | Mother_Live | Does the mother live with the family? | Yes = 1; No = 2; Dead = 3 |
15 | Mother_Job | Mother’s job | Works = 1; Does not work = 2; Retired = 3 |
16 | Family_income | Family income | Less than 3000 = 1; From 3000 to 6000 = 2; From 7000 to 10,000 = 3; From 10,000–15,000 = 4; Above 15,000 = 5 |
17 | Acc_type | Accommodation type | Apartment = 1; Floor = 2; Villa = 3 |
18 | Rented_Acc | Rented accommodation | Yes = 1; No = 2 |
19 | Acc_place | Accommodation place | Village = 1; Residential scheme = 2 |
20 | GS_1 | Grade in semester 1 | 90–100% = 1; 80–89% = 2; 70–79% = 3; Less than 70% = 4 |
21 | GS_2 | Grade in semester 2 | 90–100% = 1; 80–89% = 2; 70–79% = 3; Less than 70% = 4 |
22 | GS_3 | Grade in semester 3 | 90–100% = 1; 80–89% = 2; 70–79% = 3; Less than 70% = 4 |
23 | GS_4 | Grade in semester 4 | 90–100% = 1; 80–89% = 2; 70–79% = 3; Less than 70% = 4 |
24 | GS_5 | Grade in semester 5 | 90–100% = 1; 80–89% = 2; 70–79% = 3; Less than 70% = 4 |
25 | GS_6 | Grade in semester 6 | 90–100% = 1; 80–89% = 2; 70–79% = 3; Less than 70% = 4 |
26 | Class | Final high school graduation rate | 90–100% = 1; 80–89% = 2; 70–79% = 3; Less than 70% = 4 |
No | Attribute | Mean | Median | Standard Deviation | Maximum | Minimum |
---|---|---|---|---|---|---|
1 | Gender | 1.474286 | 1 | 0.499815 | 2 | 1 |
2 | Age | 1.967619 | 2 | 0.596506 | 4 | 1 |
3 | Ss | 1.135238 | 1 | 0.342304 | 2 | 1 |
4 | Sp | 1.601905 | 2 | 0.641739 | 3 | 1 |
5 | BS | 2.062857 | 2 | 0.616141 | 3 | 1 |
6 | Rank | 2.030476 | 2 | 0.69005 | 3 | 1 |
7 | Relative | 1.632381 | 2 | 0.482617 | 2 | 1 |
8 | Father_Age | 2.731429 | 3 | 1.054915 | 5 | 1 |
9 | Father_Edu | 3.607619 | 4 | 1.247264 | 6 | 1 |
10 | Father_live | 1.32 | 1 | 0.659979 | 3 | 1 |
11 | Father_Job | 1.664762 | 1 | 0.868224 | 3 | 1 |
12 | Mother_Age | 2.308571 | 2 | 0.959098 | 5 | 1 |
13 | Mother_Edu | 3.085714 | 3 | 1.314295 | 6 | 1 |
14 | Mother_Live | 1.245714 | 1 | 0.584962 | 3 | 1 |
15 | Mother_Job | 1.737143 | 2 | 0.541636 | 3 | 1 |
16 | F_income | 3.325714 | 4 | 1.208512 | 5 | 1 |
17 | Acc_type | 1.750476 | 2 | 0.762069 | 3 | 1 |
18 | Rented_A | 1.786667 | 2 | 0.410052 | 2 | 1 |
19 | Acc_place | 1.588571 | 2 | 0.492562 | 2 | 1 |
20 | GS_1 | 1.952381 | 2 | 0.948894 | 4 | 1 |
21 | GS_2 | 1.889524 | 2 | 0.936528 | 4 | 1 |
22 | GS_3 | 1.849524 | 2 | 0.911245 | 4 | 1 |
23 | GS_4 | 1.761905 | 1 | 0.896599 | 4 | 1 |
24 | GS_5 | 1.668571 | 1 | 0.904116 | 4 | 1 |
25 | GS_6 | 1.55619 | 1 | 0.788264 | 4 | 1 |
26 | Class | 1.84381 | 2 | 0.974084 | 4 | 1 |
Class | Before Data Augmentation | After Data Augmentation |
---|---|---|
A | 250 | 305 |
B | 153 | 306 |
C | 76 | 304 |
D | 46 | 306 |
Total | 526 | 1221 |
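The growth of the smaller classes in the table above (for example, class D from 46 to 306 instances) is the kind of result that SMOTE-style oversampling produces. The following is a minimal pure-Python sketch of the core interpolation idea only, not the tool or settings used in the study. The function name `smote_sketch`, the toy data, and the values of `k` and `seed` are all hypothetical, and the sketch treats every attribute as numeric, whereas the attributes in this dataset are integer-coded categories that a real implementation would need to handle.

```python
import random


def smote_sketch(samples, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a sample
    and one of its k nearest neighbours from the same class."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Rank the other samples by squared Euclidean distance to the base.
        others = [s for s in samples if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class with two numeric features.
minority = [[1.0, 4.0], [2.0, 4.0], [1.0, 3.0], [2.0, 3.0]]
new_points = smote_sketch(minority, n_new=6, k=2)
balanced = minority + new_points
print(len(balanced))  # 10
```

Because each synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region already occupied by that class.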
No | Attribute | Correlation Coefficient |
---|---|---|
1 | GS_4 | 0.4245 |
2 | GS_3 | 0.3984 |
3 | GS_1 | 0.3979 |
4 | GS_5 | 0.3913 |
5 | GS_2 | 0.3619 |
6 | GS_6 | 0.3453 |
7 | Acc_place | 0.2983 |
8 | Family_income | 0.2076 |
9 | BS | 0.2044 |
10 | M_live | 0.1849 |
11 | F_job | 0.1721 |
12 | Acc_type | 0.1651 |
13 | M_job | 0.1453 |
14 | M_edu | 0.145 |
15 | Social_status | 0.1415 |
16 | F_edu | 0.1327 |
17 | F_age | 0.1297 |
18 | FM_Relative | 0.1255 |
19 | M_age | 0.1183 |
20 | Gender | 0.1041 |
21 | Rented_Acc | 0.0961 |
22 | F_live | 0.0934 |
23 | Specialization | 0.0931 |
24 | Rank | 0.087 |
25 | Age | 0.068 |
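A ranking like the one in the table above can be produced by computing the Pearson correlation between each attribute and the class and then sorting by magnitude. The following is a minimal sketch under that assumption; the helper names `pearson` and `rank_features` and the toy values are hypothetical and are not taken from the study's dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / sqrt(var_x * var_y)


def rank_features(columns, target):
    """Sort attribute names by absolute correlation with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy values using the coded domains described in this article.
toy_columns = {
    "GS_4": [1, 2, 3, 4, 1, 2],
    "Gender": [1, 2, 1, 2, 1, 2],
}
toy_class = [1, 2, 3, 4, 1, 2]

ranking = rank_features(toy_columns, toy_class)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Attributes whose score falls below a chosen cut-off can then be dropped to form the reduced feature subsets that are compared in the feature selection results.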
Parameters | Optimal Value (10-Fold) | Accuracy (10-Fold) | Optimal Value (75:25 Split) | Accuracy (75:25 Split) |
---|---|---|---|---|
numIterations | 50 | 95.62% | 50 | 95.42% |
seed | 3 | 95.62% | 1 | 95.42% |
Parameters | Optimal Value (10-Fold) | Accuracy (10-Fold) | Optimal Value (75:25 Split) | Accuracy (75:25 Split) |
---|---|---|---|---|
confidenceFactor | 0.55 | 92.00% | 0.1 | 91.60% |
minNumObj | 7 | 92.00% | 10 | 91.60% |
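One common way to arrive at optimal values like those in the two tables above is an exhaustive grid search over candidate settings. The following is a minimal sketch of that idea, not the authors' procedure. The candidate values are hypothetical because the ranges actually searched are not given here, the parameter names follow WEKA's J48 options (`confidenceFactor`, `minNumObj`) as an assumption, and `toy_evaluate` is a stand-in scorer rather than a real classifier.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values for two J48 parameters.
j48_grid = {
    "confidenceFactor": [0.1, 0.25, 0.4, 0.55],
    "minNumObj": [2, 5, 7, 10],
}


def toy_evaluate(params):
    """Stand-in scorer (not a real classifier) that peaks at 0.55 and 7."""
    return -abs(params["confidenceFactor"] - 0.55) - abs(params["minNumObj"] - 7) / 10


best, score = grid_search(j48_grid, toy_evaluate)
print(best)  # {'confidenceFactor': 0.55, 'minNumObj': 7}
```

In practice `evaluate` would train the classifier with the given parameters and return its accuracy under 10-fold cross-validation or the 75:25 split.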
Type of Dataset | RF (10-Fold) | RF (75:25 Split) | J48 (10-Fold) | J48 (75:25 Split) | NB (10-Fold) | NB (75:25 Split) |
---|---|---|---|---|---|---|
Imbalanced dataset | 95.62% | 95.42% | 92.00% | 91.60% | 96.38% | 98.47% |
Balanced dataset | 98.20% | 98.69% | 94.92% | 97.70% | 97.54% | 98.69% |
Features Selected | RF (10-Fold) | RF (75:25 Split) | J48 (10-Fold) | J48 (75:25 Split) | NB (10-Fold) | NB (75:25 Split) |
---|---|---|---|---|---|---|
All | 98.20% | 98.69% | 94.92% | 97.70% | 97.54% | 98.69% |
Class, GS_4, GS_3, GS_1, GS_5, GS_2, GS_6, Acc_place, Family_income, BS, M_live, F_job, Acc_type, M_job | 97.13% | 97.38% | 94.92% | 96.07% | 97.30% | 99.34% |
Class, GS_4, GS_3, GS_1, GS_5, GS_2, GS_6 | 96.48% | 97.38% | 95.33% | 97.70% | 96.31% | 98.03% |
Class, GS_4, GS_3, GS_1 | 93.20% | 94.75% | 92.22% | 94.43% | 94.27% | 96.72% |
Class, GS_4 | 79.20% | 80.33% | 79.20% | 80.33% | 79.20% | 80.33% |
Class | 25.06% | 21.97% | 24.65% | 21.97% | 24.65% | 21.97% |
Evaluation Method | RF | J48 | NB |
---|---|---|---|
10-fold cross-validation | 98.20% | 95.33% | 97.54% |
75:25 direct partition | 98.69% | 97.70% | 99.34% |
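The two evaluation schemes compared above differ only in how the instances are partitioned into training and test sets. The following is a minimal sketch of both partitions for the 1221 instances reported after augmentation. It uses plain shuffling and is not stratified by class, which is an assumption, as are the function names and seeds.

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal disjoint folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def holdout_indices(n, train_fraction=0.75, seed=1):
    """Return one (train, test) split, e.g. the 75:25 direct partition."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


N = 1221  # instances after augmentation, as reported in this article
folds = list(k_fold_indices(N))
train, test = holdout_indices(N)
print(len(folds), len(train), len(test))  # 10 916 305
```

The 305 test instances of the direct partition agree with the totals of the 75% split confusion matrices, while cross-validation tests every instance exactly once across its ten folds.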
Metrics | RF (10-Fold) | RF (75:25 Split) | J48 (10-Fold) | J48 (75:25 Split) | NB (10-Fold) | NB (75:25 Split) |
---|---|---|---|---|---|---|
TP Rate | 0.982 | 0.987 | 0.953 | 0.977 | 0.975 | 0.993 |
FP Rate | 0.006 | 0.004 | 0.016 | 0.007 | 0.008 | 0.002 |
Precision | 0.982 | 0.987 | 0.953 | 0.977 | 0.976 | 0.993 |
Recall | 0.982 | 0.987 | 0.953 | 0.977 | 0.975 | 0.993 |
F-Measure | 0.982 | 0.987 | 0.953 | 0.977 | 0.975 | 0.993 |
ROC Area | 0.999 | 1.000 | 0.989 | 0.998 | 0.997 | 1.000 |
Accuracy | 98.20% | 98.69% | 95.33% | 97.70% | 97.54% | 99.34% |
RF | 10-Fold: A | 10-Fold: B | 10-Fold: C | 10-Fold: D | 75:25 Split: A | 75:25 Split: B | 75:25 Split: C | 75:25 Split: D |
---|---|---|---|---|---|---|---|---|
A = 1 | 296 | 9 | 0 | 0 | 66 | 1 | 0 | 0 |
B = 2 | 9 | 296 | 1 | 0 | 3 | 74 | 0 | 0 |
C = 3 | 0 | 2 | 301 | 1 | 0 | 0 | 80 | 0 |
D = 4 | 0 | 0 | 0 | 306 | 0 | 0 | 0 | 81 |
J48 | 10-Fold: A | 10-Fold: B | 10-Fold: C | 10-Fold: D | 75:25 Split: A | 75:25 Split: B | 75:25 Split: C | 75:25 Split: D |
---|---|---|---|---|---|---|---|---|
A = 1 | 289 | 26 | 0 | 0 | 65 | 2 | 0 | 0 |
B = 2 | 26 | 279 | 1 | 0 | 4 | 73 | 0 | 0 |
C = 3 | 0 | 3 | 300 | 1 | 0 | 1 | 79 | 0 |
D = 4 | 0 | 0 | 0 | 306 | 0 | 0 | 0 | 81 |
NB | 10-Fold: A | 10-Fold: B | 10-Fold: C | 10-Fold: D | 75:25 Split: A | 75:25 Split: B | 75:25 Split: C | 75:25 Split: D |
---|---|---|---|---|---|---|---|---|
A = 1 | 298 | 7 | 0 | 0 | 66 | 1 | 0 | 0 |
B = 2 | 14 | 292 | 0 | 0 | 1 | 76 | 0 | 0 |
C = 3 | 0 | 7 | 295 | 2 | 0 | 0 | 80 | 0 |
D = 4 | 0 | 0 | 0 | 306 | 0 | 0 | 0 | 81 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alghamdi, A.S.; Rahman, A. Data Mining Approach to Predict Success of Secondary School Students: A Saudi Arabian Case Study. Educ. Sci. 2023, 13, 293. https://doi.org/10.3390/educsci13030293