Predicting Student Success in English Tests Using Artificial Intelligence Algorithm

Huynh-Cam, Thao-Trang; Truong, Dat Tan; Chen, Long-Sheng; Lu, Tzu-Chuen; Nalluri, Venkateswarlu

doi:10.3390/engproc2025098019

Open Access Proceeding Paper

by Thao-Trang Huynh-Cam ¹, Dat Tan Truong ², Long-Sheng Chen ¹,³,*, Tzu-Chuen Lu ¹ and Venkateswarlu Nalluri ¹

¹ Department of Information Management, Chaoyang University of Technology, Taichung 413310, Taiwan

² Department of Educational Management, Faculty of Social Sciences, Dong Thap University, Cao Lanh City 81118, Vietnam

³ Department of Industrial Engineering and Management, National Taipei University of Technology, Taipei 106344, Taiwan

* Author to whom correspondence should be addressed.

Presented at the 2024 4th International Conference on Social Sciences and Intelligence Management (SSIM 2024), Taichung, Taiwan, 20–22 December 2024.

Eng. Proc. 2025, 98(1), 19; https://doi.org/10.3390/engproc2025098019

Published: 20 June 2025

(This article belongs to the Proceedings of 2024 4th International Conference on Social Sciences and Intelligence Management (SSIM 2024))


Abstract

In Vietnam, English proficiency is a graduation requirement and improves students’ chances of winning scholarships and finding employment after graduation. Universities in the Mekong Delta region (MDR) often face challenges in fostering students’ English proficiency despite the continuous assistance they offer. Although students have taken online supplementary courses (OSC) delivered through e-learning systems to support their formal English classes for several years, students’ success in English tests with such supplementary courses and the predictors of this success remain unknown. Therefore, we developed a model to predict students’ success in English final tests based on behaviors and grades in OSC using logistic regression (LR) and classification and regression tree (CART) classifiers. A total of 109 OSC students at a target university in MDR participated in this study, and the results showed that CART (area under the curve (AUC) = 89.3%) performed better than LR (AUC = 73.7%). The outcomes of this study contribute to students’ success in English tests and to the effectiveness of online supplementary courses for English improvement.

Keywords:

student success; English tests; prediction models; important factors for student success; artificial intelligence algorithms; Mekong Delta region in Vietnam

1. Introduction

In Vietnam, English proficiency is a requirement for graduation from college and offers students great opportunities to win scholarships to study abroad and be employed after graduation. In academics, English is mandatory in the academic curriculum at all educational levels from compulsory education to universities and foreign language centers [1,2,3]. University students are required to obtain at least a Level 3 (B1) certificate of English Proficiency aside from the degree diploma [3,4]. In workplaces, English proficiency is a compulsory requirement for graduates’ employability [3,5].

To satisfy the increasing demand for English learning, English courses, short- and long-term training programs, and tests are offered nationwide [2,3]. Every year, universities in the Mekong Delta region (MDR) offer diverse English courses and high-quality English teaching and learning environments [2]. However, in practice, many students attain low English proficiency even though they have passed the English entrance examination. This results from low English exposure, teacher-dependent learning habits, and limited time in crowded classes [2].

Universities often face challenges in fostering students’ English language proficiency despite continuously offering students various English supplementary courses in e-learning systems. Although students take these supplementary courses in addition to their formal English classes for several years, student success in English tests and the predictors of this success remain unknown.

Big data, including students’ information, academic results, and examination marks and grades in different courses and programs recorded in the school database, is used to obtain information related to student success [6]. For instance, Gardner et al. [7] investigated educational assessment in automated essay scoring systems and computerized adaptive tests. Reference [8] explored the impacts of education level, gender, place, and course attendance on the TOEFL iBT scores of listening, reading, and writing skills of university students in Egypt whose first language was Arabic. Reference [2] measured the effectiveness of MOODLE-based assessment in English reading and listening skills in a Vietnamese university in MDR.

Recently, artificial intelligence (AI) algorithms have been used widely in student assessments because they are accurate, automated, and explainable. For example, Bujang et al. [6] utilized a decision tree (DT), support vector machine (SVM), naïve Bayes (NB), k-nearest neighbor (KNN), logistic regression (LR), and random forest (RF) to predict student grades in first-semester courses in Malaysia. RF, XGBoost, SVM, and voting algorithms were also used to forecast the pre-English course performance of students at the International School of the Vietnam National University in Hanoi, Vietnam [9]. Following this trend, we used DT (classification and regression trees, CART) and LR to predict student success in English tests in this study.

Although research on student success in English has received theoretical and practical attention in non-English-speaking countries, researchers have focused only on student success in international learning environments, where motivation to use English in daily life and academics is strong. For the first time, we used AI algorithms to study university students’ success in English tests with supplementary online courses in the e-learning system in MDR, where motivation to learn English is lower. The outcomes contribute to student success in English tests and provide a reference for suggestions to enhance the effectiveness of online supplementary courses for English improvement.

2. Materials and Methods

Research Method

This study was conducted at Dong Thap University, located in MDR. Non-English-major students at this university are required to take English preparation courses and pass the English Proficiency Tests (EPTs) to graduate. However, the average pass rate is low every year. Therefore, the university has continuously provided various support programs. Online supplementary courses (OSC) delivered through e-learning systems are one of these programs and support students’ formal English classes. The main purpose of these courses is to improve student learning autonomy and computer skills. In these courses, students access sample practice quizzes/tests.

The courses contain one pre-test and fifteen practice tests that follow the authentic EPT framework. Test scores range from 0 to 10. The time allotted for each test is 100 min, including review and submission time. After submitting a practice test, students receive their score immediately, without waiting for teachers to grade it, and can review their mistakes through the automated feedback and explanation functions available in the learning management system. Although students may retake these tests as often as they need, only the score of the first attempt was recorded for this research.

Figure 1 depicts the five-step research procedure employed in this study.

Step 1: Data collection and cleaning

The dataset was obtained from the database of a foreign languages center of a Vietnamese university in MDR in Semester 1 of the 2021–2022 academic year. Students’ identities were anonymized for ethical purposes. Because we focused on student success in English tests, only students who had taken English supplementary online courses in the e-learning system were selected. After collecting the data, we used Microsoft Excel 2016 to clean the original data and transform them into usable data. All missing values and unrelated fields (e.g., name, ID, and enrolled class) were removed. We transformed categorical values into numeric/binary ones and normalized the data. After cleaning, the final dataset included information on 109 students. Among them, 62 (56.9%) passed the English final examination, while 47 (43.1%) failed. The input and output factors are described in Table 1. Figure 2 shows the correlation matrix among them.
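The cleaning operations named in Step 1 can be illustrated with a short sketch. The authors performed this work in Microsoft Excel; the Python version below, the record layout, the field names, and the choice of min-max scaling as the normalization are all assumptions made for illustration only.

```python
# Hypothetical raw records; field names and values are illustrative, not the study's data.
raw = [
    {"name": "A", "id": "001", "completed_osc": "Yes", "quiz_1": 6.0},
    {"name": "B", "id": "002", "completed_osc": "No", "quiz_1": 3.0},
    {"name": "C", "id": "003", "completed_osc": "Yes", "quiz_1": None},  # missing value
]

IDENTIFYING_FIELDS = {"name", "id"}


def clean(records):
    """Drop rows with missing values, remove identifying fields, encode Yes/No as 1/0."""
    cleaned = []
    for row in records:
        if any(value is None for value in row.values()):
            continue  # records with missing values are removed
        kept = {k: v for k, v in row.items() if k not in IDENTIFYING_FIELDS}
        kept["completed_osc"] = 1 if kept["completed_osc"] == "Yes" else 0
        cleaned.append(kept)
    return cleaned


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to 0.0 (assumed normalization)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


rows = clean(raw)
scaled_quiz_1 = min_max([row["quiz_1"] for row in rows])
```

Here the third record is dropped because of its missing score, and the two remaining scores are rescaled to the ends of the unit interval.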

Step 2: Prediction model construction

CART and LR classifiers were used to construct prediction models in Jupyter Notebook (version 6.5.4) with the scikit-learn package in Python 3 (ipykernel). The dataset was divided into training and testing data with the ratio shown in Table 2. Each model was constructed using these training–testing datasets. The mean value and standard deviation (SD) of each model’s performance metrics were used to compare prediction accuracy.
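The split sizes in Table 2 and the mean/SD summary can be sketched as follows. This is a standard-library stand-in rather than the authors’ scikit-learn code; the function name `split_indices`, the seed, the three accuracy values, and the use of the sample standard deviation are assumptions, since the paper does not report the number of runs or the SD formula.

```python
import random
import statistics


def split_indices(n_samples, test_ratio=0.2, seed=0):
    """Shuffle sample indices and return (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]


# 109 students with a 20% test share gives 87 training and 22 testing samples.
train_idx, test_idx = split_indices(109)

# Hypothetical accuracy values from repeated runs, used only to show the summary step.
run_accuracies = [0.70, 0.75, 0.80]
mean_accuracy = statistics.mean(run_accuracies)
sd_accuracy = statistics.stdev(run_accuracies)  # sample standard deviation (assumed)
```

Repeating the split with different seeds and summarizing each metric this way is one way to arrive at mean and SD columns like those in Table 3.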

For better performance, we transformed the English final score into a “Pass/Fail” class label and conducted two classification cases. Case A used the original dataset, while Case B oversampled the “Fail” class. Figure 3 shows the two cases of data classification.
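A minimal sketch of the labeling and oversampling described above, assuming random duplication of “Fail” records until both classes are the same size. The paper does not name the oversampling technique or the target ratio, so `oversample_minority`, the seed, and the placeholder scores are hypothetical; only the pass mark of 5 and the 62/47 class counts come from the text.

```python
import random

PASS_MARK = 5.0  # Table 1: a final score of 5 or more is "Pass" (1), below 5 is "Fail" (0)


def to_label(final_score):
    return 1 if final_score >= PASS_MARK else 0


def oversample_minority(rows, labels, minority=0, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_majority = len(labels) - len(minority_rows)
    extra = [rng.choice(minority_rows) for _ in range(n_majority - len(minority_rows))]
    return rows + extra, labels + [minority] * len(extra)


# Placeholder final scores: 62 passes and 47 fails, matching the class counts in Step 1.
scores = [7.0] * 62 + [3.5] * 47
labels = [to_label(s) for s in scores]
balanced_scores, balanced_labels = oversample_minority(scores, labels)
```

In common practice, oversampling is applied to the training split only, so that duplicated records do not leak into the test set; whether the authors did so is not stated.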

Step 3: Evaluation

We calculated accuracy, F1, the area under the receiver operating characteristic (ROC) curve (AUC), and a confusion matrix to evaluate model performance. Accuracy, F1, and AUC range from 0 to 1, where 1 indicates excellent prediction performance [10].
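For readers who want to see how these metrics are computed, the sketch below implements them from their standard definitions using only the Python standard library. It is not the authors’ evaluation code; the labels and predicted probabilities are hypothetical, and treating “Pass” as the positive class with a 0.5 decision threshold is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with 1 = Pass as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, y_score):
    """Probability that a random Pass is scored above a random Fail (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, predicted probabilities of "Pass", and thresholded predictions.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]
```

Accuracy and F1 depend on the thresholded predictions, whereas AUC depends only on how the scores rank passing students relative to failing ones, which is why two classifiers can tie on the first two metrics and still differ on AUC.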

Step 4: Comparison

After comparing the prediction results of two classification cases and classifiers, we selected the best case and classifier to retrieve important factors associated with student success in English tests.
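The selection in Step 4 can be expressed as a small ranking over the mean scores reported in Table 3. The ranking rule used here (accuracy first, then F1, then AUC as the tie-breaker) is an assumption; the paper states only that the best case and classifier were selected, and `best_model` is a hypothetical helper.

```python
# Mean scores (%) copied from Table 3; keys are (case, classifier).
table3 = {
    ("A", "CART"): {"accuracy": 69.50, "f1": 59.17, "auc": 65.67},
    ("A", "LR"): {"accuracy": 68.33, "f1": 60.17, "auc": 81.50},
    ("B", "CART"): {"accuracy": 74.67, "f1": 76.50, "auc": 89.33},
    ("B", "LR"): {"accuracy": 74.67, "f1": 76.50, "auc": 73.67},
}


def best_model(results):
    """Rank by accuracy, then F1, then AUC, so AUC breaks ties on the first two."""
    return max(
        results,
        key=lambda k: (results[k]["accuracy"], results[k]["f1"], results[k]["auc"]),
    )


best_case, best_classifier = best_model(table3)
```

Under this rule the two Case B models tie on accuracy and F1, and AUC decides between them.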

Step 5: Discussion and conclusions

The research results were compared, and contributions to theories and practices of predictions on student success in English tests were verified on a basic map. Solutions for enhancing the success of students and online English supplement courses were proposed based on these extracted factors.

3. Results

3.1. Classification

Table 3 presents the classification results of the two data classification cases. Accuracy and F1 were higher in Case B than in Case A for both classifiers. The AUC of CART rose from 65.67% in Case A to 89.33% in Case B, whereas the AUC of LR fell from 81.50% to 73.67%. In Case B, the CART and LR classifiers showed the same overall accuracy and F1, but CART achieved the higher AUC (89.33% vs. 73.67%). Hence, CART was used for further analysis.

For in-depth comparison, confusion matrices were computed to evaluate the classifiers’ performance. Figure 4 shows the confusion matrices for the two classification cases. The correct-prediction rate for the “Fail” class increased from 0.556 in Case A to 0.923 in Case B, indicating that Case B was better than Case A. In Case B, the CART classifier predicted the “Pass” class at a rate of 1.0 and the “Fail” class at a rate of 0.923; that is, it correctly identified all passing students and 92.3% of failing students. Hence, CART was used for important feature selection.
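One plausible reading of the values above is that Figure 4 reports row-normalized confusion matrices, in which each diagonal entry is the share of a class that was predicted correctly. The sketch below shows that normalization under this assumption. The raw counts are hypothetical and were chosen only so that the diagonal reproduces the 0.923 and 1.0 reported for Case B; they are not taken from the paper.

```python
def row_normalize(matrix):
    """Divide each row by its total so the diagonal holds per-class correct-prediction rates."""
    return [[count / sum(row) if sum(row) else 0.0 for count in row] for row in matrix]


# Hypothetical counts (rows = actual class, columns = predicted class, order: Fail, Pass).
counts = [
    [12, 1],  # actual Fail: 12 predicted Fail, 1 predicted Pass
    [0, 9],   # actual Pass: 0 predicted Fail, 9 predicted Pass
]
rates = row_normalize(counts)
fail_rate, pass_rate = rates[0][0], rates[1][1]
```

Reading the matrix this way makes the two classes comparable even when they contain different numbers of students.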

3.2. Feature Selection

Figure 5 displays the rankings of feature importance from the CART model constructed in Case B. Four factors of “Quiz 7 score” (F10), “Quiz 3 score” (F6), “Quiz 11 score” (F14), and “Number of quizzes completed in the online supplement courses” (F1) significantly impacted student success in the English final examination.
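The paper does not state how the CART feature importances were computed. A common choice, and the default criterion for decision trees in scikit-learn, is the Gini impurity reduction contributed by the splits on each feature. The sketch below computes that reduction for a single split; the quiz scores, outcomes, and threshold are hypothetical, and `impurity_decrease` is an illustrative helper rather than a library function.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def impurity_decrease(values, labels, threshold):
    """Weighted Gini reduction from splitting at `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Hypothetical quiz scores and Pass (1) / Fail (0) outcomes.
quiz_7 = [2.0, 3.5, 4.0, 6.5, 7.0, 8.5]
passed = [0, 0, 0, 1, 1, 1]
gain = impurity_decrease(quiz_7, passed, threshold=5.0)
```

In this toy case the split separates the classes perfectly, so the whole parent impurity of 0.5 is removed. A feature whose splits remove more impurity across the tree ranks higher in a chart such as Figure 5.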

4. Discussion and Conclusions

Table 4 lists important factors. The testing content and difficulty level among the provided test sets might be different.

Vietnamese universities need to provide additional support to students who are expected to pass the English final tests so that they can better prepare for examinations. For example, schools and English centers must offer practice tests and/or mock tests in online supplementary courses to minimize students’ poor preparation and lack of experience with online tests caused by low levels of technology and computer skills [2]. Importantly, sample practice and mock tests must be aligned with national English standards to help students become familiar with the real tests. Additionally, English centers and teachers must frequently double-check the difficulty level of the quizzes provided in these supplementary courses. The testing contents, quiz items, and answer keys also need to be assessed frequently to assure reliability and validity.

English proficiency is important for students and universities in non-native English-speaking countries. It is a graduation requirement and a great opportunity for better scholarships and high-income employment. Despite continuously offering students various English courses and tests such as supplementary online courses via e-learning systems, the real status of student success in English tests and the predictors of this success remain unknown. By using CART and LR, successful and unsuccessful students in English supplementary online courses can be classified. The results show that CART is better than LR, with an AUC of approximately 89%. The scores of quizzes 7, 3, and 11, and the number of quizzes students completed in these courses, significantly affected whether students passed or failed the English final tests.

This study’s results significantly contribute to formulating theories and practices in online supplementary course designs for student success in English tests. From a theoretical perspective, an effective feature selection method was identified for teaching and learning English in non-native English-speaking countries, including Vietnam’s MDR. For better accuracy, future research is necessary to address imbalanced data issues, even though the classes in this dataset were only mildly imbalanced (56.9% pass vs. 43.1% fail). Additionally, evaluation metrics other than accuracy, F1, AUC, and a confusion matrix must be applied. From a practical perspective, more predictors for student success in English tests need to be developed through sample practice tests in supplementary online courses.

Even though the current study contributes to the theories and practices of predictions on student success in English tests, limitations must be addressed. First, we only considered the scores of examinations provided in the online supplementary courses on an e-learning system as inputs. In future research, it is necessary to apply other factors to determine better predictions. Moreover, the research data of this study were collected from a single foreign language center of a single Vietnamese university; therefore, the results might not be generalized, necessitating the inclusion of more data.

Author Contributions

Conceptualization, T.-T.H.-C., L.-S.C. and D.T.T.; methodology, T.-T.H.-C. and L.-S.C.; software, T.-T.H.-C.; validation, T.-T.H.-C., T.-C.L. and L.-S.C.; formal analysis, T.-T.H.-C.; investigation, D.T.T.; resources, D.T.T.; data curation, T.-T.H.-C. and D.T.T.; writing—original draft preparation, T.-T.H.-C.; writing—review and editing, L.-S.C. and V.N.; visualization, V.N.; supervision, L.-S.C.; project administration, D.T.T. and T.-C.L.; funding acquisition, D.T.T. and T.-C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study because it posed minimal risk. Any risk to the research subjects is not higher than that faced by those who do not participate in the study.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data are unavailable due to privacy restrictions.

Acknowledgments

We are grateful to Dong Thap University, Vietnam and Chaoyang University of Technology, Taiwan for providing access to their facilities, which allowed us to conduct the experiments reported in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Nguyen, T.N. Thirty years of English language and English education in Vietnam. Eng. Today 2017, 33, 33–35.
2. Huynh-Cam, T.T.; Agrawal, S.; Chen, L.S.; Fan, T.L. Using MOODLE-based e-assessment in English listening and reading courses: A Vietnamese case study. J. Inst. Res. South East Asia 2021, 19, 66–92.
3. Ngo, M.T.; Tran, L.T. Current English education in Vietnam: Policy, practices, and challenges. In English Language Education for Graduate Employability in Vietnam; Springer Nature: Singapore, 2023; pp. 49–69.
4. Ministry of Education and Training (MOET). Suggestions for Implementing the National Foreign Language Project 2020 in 2024 at the Local Units. Available online: https://thuvienphapluat.vn/cong-van/Giao-duc/Cong-van-259-BGDDT-DANN-2023-de-xuat-trien-khai-De-an-Ngoai-ngu-Quoc-gia-tai-don-vi-552539.aspx (accessed on 11 November 2024).
5. Prime Minister. Approving, Revising, and Amending the National Foreign Language Teaching and Learning Project in the National Education System for the Period 2017–2025. Available online: https://datafiles.chinhphu.vn/cpp/files/vbpq/2017/12/2080.signed.pdf (accessed on 11 November 2024).
6. Bujang, S.D.A.; Selamat, A.; Ibrahim, R.; Krejcar, O.; Herrera-Viedma, E.; Fujita, H.; Ghani, N.A.M. Multiclass prediction model for student grade prediction using machine learning. IEEE Access 2021, 9, 95608–95621.
7. Gardner, J.; O’Leary, M.; Yuan, L. Artificial intelligence in educational assessment: ‘Breakthrough? Or buncombe and ballyhoo?’. J. Comp. Assis. Learn. 2021, 37, 1207–1216.
8. Hassan, K.M.; Khafagy, M.H.; Thabet, M. Mining educational data to analyze the student’s performance in TOEFL iBT Reading, Listening and Writing Scores. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 327–334.
9. Quynh, T.D.; Dong, N.D.; Thuan, N.Q. A case study of student performance predictions in English course: The data mining approach. In International Congress on Information and Communication Technology; Springer Nature: Singapore, 2024; pp. 419–429.
10. Huynh-Cam, T.T.; Chen, L.S.; Lu, T.C. Early prediction models and crucial factor extraction for first-year undergraduate student dropouts. J. Appl. Res. High. Educ. 2025, 17, 624–639.

Figure 1. Research workflow of this study.

Figure 2. Correlation matrix.

Figure 3. Two cases of data classification.

Figure 4. Confusion matrices of two classification cases.

Figure 5. Important features from CART.

Table 1. Input and output factor description.

Factor ID	Factors	Factor Description and Transformed Values
F1	Number of quizzes students completed in the OSC	5–15
F2	Students complete the OSC	1 = Yes, 0 = No
F3	Pre-test score	0–10
F4	Quiz 1 score	0–7.71
F5	Quiz 2 score	0–8.57
F6	Quiz 3 score	0–9.71
F7	Quiz 4 score	0–9.14
F8	Quiz 5 score	0–9.43
F9	Quiz 6 score	0–9.43
F10	Quiz 7 score	0–9.14
F11	Quiz 8 score	0–9.14
F12	Quiz 9 score	0–9.44
F13	Quiz 10 score	0–10
F14	Quiz 11 score	0–9.71
F15	Quiz 12 score	0–10
F16	Quiz 13 score	0–10
F17	Quiz 14 score	0–9.71
F18	Quiz 15 score	0–10
Output: Pass	English final exam scores	1 = Pass: 5.0–10; 0 = Fail: <5.0

Table 2. Data division for prediction models.

Dataset	Number of Students	Percentage
Training set	87	80%
Testing set	22	20%
Total	109	100%

Table 3. Classification results for two cases.

Classification Case	Classifier	Overall Accuracy Mean (%)	Overall Accuracy SD	F1 Mean (%)	F1 SD	AUC Mean (%)	AUC SD
Case A: Original datasets	CART	69.50	7.45	59.17	8.59	65.67	6.65
Case A: Original datasets	LR	68.33	5.85	60.17	8.64	81.50	6.38
Case B: Oversampling datasets	CART	74.67	7.87	76.50	10.99	89.33	6.02
Case B: Oversampling datasets	LR	74.67	7.87	76.50	10.99	73.67	4.93

Table 4. Rank order of the most important factors for student success in English final tests.

Rank Order	Factors
1	F10. Quiz 7 score
2	F6. Quiz 3 score
3	F14. Quiz 11 score
4	F1. Number of quizzes completed in the OSC

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
