A Practical Model for the Evaluation of High School Student Performance Based on Machine Learning
Abstract
1. Introduction
2. Related Works
2.1. Evaluation of Student Performance
2.2. Evaluation of Student Performance for Reading Ability
2.3. Evaluation of Student Performance for Grading
2.4. Evaluation of Student Performance for Dropout Prediction
2.5. Evaluation of Student Performance for Academic Achievement
3. Materials and Methods
- Which factors and features matter most when evaluating student performance, so that sound decisions and appropriate policies can be made? And which of the three data types (demographics, behavior, and grades) has the most influence?
- Which ML model is most effective at evaluating and classifying student performance, that is, at finding a good decision boundary in the data?
- How does model performance change when the models are run on data that include only the most important features, with the aim of a more efficient and convenient model for predicting new instances?
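
The third question, whether a model holds up when it is given only the most important features, can be illustrated with a small self-contained sketch. It is not the authors' pipeline: the data are synthetic, a hand-rolled logistic regression stands in for the library models listed in Section 3.2, and every name (`make_data`, `fit_logistic`, `compare_feature_sets`) is hypothetical.

```python
import math
import random


def make_data(n, n_features, informative, rng):
    """Synthetic stand-in for a student table: only `informative` columns drive the label."""
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        score = sum(row[j] for j in informative) + rng.gauss(0.0, 0.3)
        X.append(row)
        y.append(1 if score > 0 else 0)
    return X, y


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(model, X, y):
    """Fraction of rows whose thresholded prediction matches the label."""
    w, b = model
    hits = sum(
        (sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) >= 0.5) == bool(target)
        for row, target in zip(X, y)
    )
    return hits / len(y)


def take_columns(X, cols):
    return [[row[j] for j in cols] for row in X]


def compare_feature_sets(seed=0):
    """Train once on all columns and once on the informative ones; score on held-out rows."""
    rng = random.Random(seed)
    informative = [0, 1]  # assumption: pretend feature selection returned these columns
    X, y = make_data(400, 6, informative, rng)
    X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]
    results = {}
    for name, cols in {"all": list(range(6)), "selected": informative}.items():
        model = fit_logistic(take_columns(X_train, cols), y_train)
        results[name] = accuracy(model, take_columns(X_test, cols), y_test)
    return results


results = compare_feature_sets()
```

`results` maps each feature set to its held-out accuracy. On real data the same comparison would be repeated for each candidate model, ideally with cross-validation rather than a single split.
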
3.1. Dataset
3.2. Model Selection
3.2.1. Random Forest (RF)
3.2.2. Support Vector Machines (SVM)
3.2.3. Logistic Regression (LR)
3.2.4. Artificial Neural Network (ANN)
3.3. Feature Selection
4. Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest





| Data type | Variable | Group or range | Count |
|---|---|---|---|
| Demographics | Gender | Male | 221 |
| | | Female | 237 |
| Demographics | Age | 15–18 | - |
| Demographics | Family members | L | 301 |
| | | S | 157 |
| Demographics | Family relation | Excellent | 118 |
| | | Good | 253 |
| | | Normal | 56 |
| | | Bad | 21 |
| | | Worse | 10 |
| Demographics | Guardian | Father | 319 |
| | | Mother | 107 |
| | | Other | 32 |
| Demographics | Mother Educational Background | No formal | 9 |
| | | Primary school | 54 |
| | | High school | 140 |
| | | Bachelor | 201 |
| | | Graduate | 55 |
| Demographics | Father Educational Background | No formal | 15 |
| | | Primary school | 86 |
| | | High school | 99 |
| | | Bachelor | 197 |
| | | Graduate | 62 |
| Demographics | Mother-job | Education and social | 152 |
| | | Health | 146 |
| | | Services | 61 |
| | | Agriculture and environment | 55 |
| | | Engineering | 44 |
| Demographics | Father-job | Education and social | 229 |
| | | Health | 153 |
| | | Services | 34 |
| | | Agriculture and environment | 25 |
| | | Engineering | 17 |
| Demographics | Travel duration | 0 | 300 |
| | | 1 | 120 |
| | | 2 | 33 |
| | | 3 | 5 |
| Behavioral | Study time | Less than 1 | 23 |
| | | 1–2 | 127 |
| | | 2–3 | 177 |
| | | 3–4 | 82 |
| | | More than 4 | 49 |
| Behavioral | Supervision by school | Yes | 78 |
| | | No | 380 |
| Behavioral | Supervision by family | Yes | 256 |
| | | No | 202 |
| Behavioral | Higher edu | Yes | 404 |
| | | No | 54 |
| Behavioral | Internet | Yes | 374 |
| | | No | 84 |
| Behavioral | Extra class | Yes | 137 |
| | | No | 285 |
| Behavioral | Work | Yes | 176 |
| | | No | 282 |
| Behavioral | Usage of each of internet, TV, game | Less than 1 | |
| | | 1–2 | |
| | | 2–3 | |
| | | 3–4 | |
| | | More than 4 | |
| Behavioral | Free time | 1 | 59 |
| | | 2 | 91 |
| | | 3 | 169 |
| | | 4 | 104 |
| | | 5 | 35 |
| Behavioral | Go-out rate | 1 | 27 |
| | | 2 | 119 |
| | | 3 | 146 |
| | | 4 | 90 |
| | | 5 | 76 |
| Behavioral | Friendship | 1 | 40 |
| | | 2 | 75 |
| | | 3 | 82 |
| | | 4 | 102 |
| | | 5 | 160 |
| Behavioral | Absences | 0–10 | |
| Behavioral | Attention | 0–100 | |
| Grades | First math midterm | 0–20 | |
| Grades | Second math midterm | 0–20 | |
| Grades | First science midterm | 0–20 | |
| Grades | Second science midterm | 0–20 | |
| Grades | First language midterm | 0–20 | |
| Grades | Second language midterm | 0–20 | |
| Grades | First sports midterm | 0–20 | |
| Grades | Second sports midterm | 0–20 | |
| Grades | First art midterm | 0–20 | |
| Grades | Second art midterm | 0–20 | |

| Feature name | Ranking | Support | Feature name | Ranking | Support | Feature name | Ranking | Support |
|---|---|---|---|---|---|---|---|---|
| Gender | 14 | FALSE | Sup-family | 18 | FALSE | First math midterm | 1 | TRUE |
| Age | 11 | FALSE | Higher-edu | 17 | FALSE | Second math midterm | 1 | TRUE |
| Family-members | 12 | FALSE | Internet | 19 | FALSE | First science midterm | 1 | TRUE |
| Family-relation | 10 | FALSE | Extra class | 4 | FALSE | Second science midterm | 1 | TRUE |
| Mother-edu | 6 | FALSE | Work | 3 | FALSE | First language midterm | 1 | TRUE |
| Father-edu | 8 | FALSE | Internet use | 1 | TRUE | Second language midterm | 1 | TRUE |
| Mother-job | 6 | FALSE | TV use | 1 | TRUE | First sports midterm | 1 | TRUE |
| Father-job | 9 | FALSE | Do Game | 1 | TRUE | Second sports midterm | 1 | TRUE |
| Guardian | 15 | FALSE | Free time | 5 | FALSE | First art midterm | 1 | TRUE |
| Travel-duration | 12 | FALSE | Go-out rate | 2 | FALSE | Second art midterm | 1 | TRUE |
| Study time | 1 | TRUE | Absences | 1 | TRUE | Friendship | 1 | TRUE |
| Sup-school | 16 | FALSE | Attention | 1 | TRUE | | | |
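
The ranking table above can be tallied by data type to see which group of features survives selection. The following is a minimal sketch rather than the authors' code: the rankings and support flags are transcribed from the table, the assignment of each feature to a data type follows the variable table (placing Absences and Attention in the behavioral group is an assumption), and the ten grade features are generated from the five subjects because every grade row in the table has ranking 1 and support TRUE.

```python
from collections import Counter

# (feature, data type, ranking, support), transcribed from the ranking table.
# Assumption: "Absences" and "Attention" belong to the behavioral group.
RANKED = [
    ("Gender", "demographics", 14, False),
    ("Age", "demographics", 11, False),
    ("Family-members", "demographics", 12, False),
    ("Family-relation", "demographics", 10, False),
    ("Mother-edu", "demographics", 6, False),
    ("Father-edu", "demographics", 8, False),
    ("Mother-job", "demographics", 6, False),
    ("Father-job", "demographics", 9, False),
    ("Guardian", "demographics", 15, False),
    ("Travel-duration", "demographics", 12, False),
    ("Study time", "behavior", 1, True),
    ("Sup-school", "behavior", 16, False),
    ("Sup-family", "behavior", 18, False),
    ("Higher-edu", "behavior", 17, False),
    ("Internet", "behavior", 19, False),
    ("Extra class", "behavior", 4, False),
    ("Work", "behavior", 3, False),
    ("Internet use", "behavior", 1, True),
    ("TV use", "behavior", 1, True),
    ("Do Game", "behavior", 1, True),
    ("Free time", "behavior", 5, False),
    ("Go-out rate", "behavior", 2, False),
    ("Absences", "behavior", 1, True),
    ("Attention", "behavior", 1, True),
    ("Friendship", "behavior", 1, True),
]

# All ten grade rows in the table carry ranking 1 and support TRUE.
SUBJECTS = ["math", "science", "language", "sports", "art"]
GRADES = [
    (f"{term} {subject} midterm", "grades", 1, True)
    for subject in SUBJECTS
    for term in ("First", "Second")
]

features = RANKED + GRADES

# Names of the features kept by the selection step (support is TRUE).
selected = [name for name, _, _, support in features if support]

# Per data type: (number kept, number available).
totals = Counter(group for _, group, _, _ in features)
kept = Counter(group for _, group, _, support in features if support)
summary = {group: (kept[group], totals[group]) for group in totals}
```

Under these assumptions `summary` shows every grade feature retained, seven of the fifteen behavioral features retained, and no demographic feature retained, which bears directly on the question of which data type has the most influence.
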
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zafari, M.; Sadeghi-Niaraki, A.; Choi, S.-M.; Esmaeily, A. A Practical Model for the Evaluation of High School Student Performance Based on Machine Learning. Appl. Sci. 2021, 11, 11534. https://doi.org/10.3390/app112311534