Proceeding Paper

Predicting Student Success in English Tests Using Artificial Intelligence Algorithm †

by Thao-Trang Huynh-Cam 1, Dat Tan Truong 2, Long-Sheng Chen 1,3,*, Tzu-Chuen Lu 1 and Venkateswarlu Nalluri 1

1 Department of Information Management, Chaoyang University of Technology, Taichung 413310, Taiwan
2 Department of Educational Management, Faculty of Social Sciences, Dong Thap University, Cao Lanh City 81118, Vietnam
3 Department of Industrial Engineering and Management, National Taipei University of Technology, Taipei 106344, Taiwan
* Author to whom correspondence should be addressed.
† Presented at the 2024 4th International Conference on Social Sciences and Intelligence Management (SSIM 2024), Taichung, Taiwan, 20–22 December 2024.
Eng. Proc. 2025, 98(1), 19; https://doi.org/10.3390/engproc2025098019
Published: 20 June 2025

Abstract

In Vietnam, English proficiency is a graduation requirement and improves students’ chances of winning scholarships and finding employment after graduation. Universities in the Mekong Delta region (MDR) often face challenges in fostering students’ English proficiency despite the continuous assistance they offer. Although students have taken online supplementary courses (OSC) delivered through e-learning systems to support their formal English classes for several years, students’ success in English tests after taking such supplementary courses and the predictors of that success remain unknown. Therefore, we developed a model to predict students’ success in English final tests based on their behaviors and grades in OSC using logistic regression (LR) and classification and regression tree (CART) classifiers. A total of 109 OSC students at a target university in the MDR participated in this study, and the results showed that CART (area under the curve (AUC) = 89.3%) was slightly better than LR. The outcomes of this study contribute to students’ success in English tests and to enhancing the effectiveness of online supplementary courses for English improvement.

1. Introduction

In Vietnam, English proficiency is a requirement for graduation from college and improves students’ opportunities to win scholarships to study abroad and to find employment after graduation. In education, English is mandatory in the curriculum at all levels, from compulsory education to universities and foreign language centers [1,2,3]. University students are required to obtain at least a Level 3 (B1) certificate of English proficiency in addition to their degree diploma [3,4]. In workplaces, English proficiency is a compulsory requirement for graduates’ employability [3,5].
To satisfy the increasing demand for English learning, English courses, short- and long-term training programs, and tests are offered nationwide [2,3]. Every year, universities in the Mekong Delta region (MDR) offer diverse English courses and high-quality English teaching and learning environments [2]. However, in practice, many students attain only low English proficiency even though they have passed the English entrance examination. This results from low English exposure, teacher-dependent learning habits, and limited time in crowded classes [2].
Universities often face challenges in fostering students’ English language proficiency despite continuously offering various supplementary English courses in e-learning systems. Although students take these supplementary courses in addition to their formal English classes for several years, student success in English tests and the predictors of that success remain unknown.
Big data, including students’ information, academic results, and examination marks and grades in different courses and programs recorded in the school database, is used to obtain information related to student success [6]. For instance, Gardner et al. [7] investigated educational assessment in automated essay scoring systems and computerized adaptive tests. Reference [8] explored the impacts of education level, gender, place, and course attendance on the TOEFL iBT scores of listening, reading, and writing skills of university students in Egypt whose first language was Arabic. Reference [2] measured the effectiveness of MOODLE-based assessment in English reading and listening skills in a Vietnamese university in MDR.
Recently, artificial intelligence (AI) algorithms have been used widely in student assessments because of their accuracy, automation, and explainability. For example, Bujang et al. [6] utilized a decision tree (DT), support vector machine (SVM), naïve Bayes (NB), k-nearest neighbor (KNN), logistic regression (LR), and random forest (RF) to predict student grades in first-semester courses in Malaysia. RF, XGBoost, SVM, and voting algorithms were also used to forecast the pre-English course performance of students at the International School of the Vietnam National University in Hanoi, Vietnam [9]. Following this trend, we used DT (classification and regression trees, CART) and LR to predict student success in English tests in this study.
Although research on student success in English has received theoretical and practical attention in non-English-speaking countries, researchers have emphasized only student success in international learning environments, where the motivation to use English in daily life and academics is strong. For the first time, we studied university students’ success in English tests with supplementary online courses in an e-learning system in the MDR, where the motivation to use English is lower, using AI algorithms. The outcomes contribute to student success in English tests and provide a reference for suggestions to enhance the effectiveness of online supplementary courses for English improvement.

2. Materials and Methods

Research Method

This study was conducted at Dong Thap University, located in the MDR. Non-English-major students at this university are required to take English preparation courses and pass the English Proficiency Tests (EPTs) for graduation. However, the average pass rate is low every year. Therefore, the university has continuously provided various support programs. Online supplementary courses (OSC) delivered through e-learning systems are one of these programs and support the students’ formal English classes. The main purpose of these courses is to improve student learning autonomy and computer skills. In these courses, students access sample practice quizzes/tests.
The courses contain one pre-test and fifteen practice tests that follow the authentic EPT framework. The test score ranges from 0 to 10. The time allotted for each test is 100 min, including review and submission time. After submitting these practice tests, students receive their test scores immediately, without waiting for teachers to grade them, and can review their mistakes through automated feedback and explanation functions, which are available in the learning management system. Although these tests can be taken repeatedly based on students’ needs, only the score of the first attempt was recorded for this research.
Figure 1 depicts the five-step research procedure employed in this study.
  • Step 1: Data collection and cleaning
The dataset was obtained from the database of a foreign languages center of a Vietnamese university in the MDR in Semester 1 of the 2021–2022 academic year. Students’ identities were anonymized for ethical purposes. Because we focused on student success in English tests, only students who had taken English supplementary online courses in the e-learning system were selected. After collecting the data, we used Microsoft Excel 2016 to clean the original data and transform them into usable data. All missing and unrelated values (i.e., name, ID, enrolled class, etc.) were removed. We transformed categorical values into numeric/binary ones and normalized the data. After cleaning, the final data included 109 students’ information. Among them, 62 (56.9%) passed the English final examination, while 47 (43.1%) failed. The input and output factors are described in Table 1. Figure 2 shows the correlation matrix among them.
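The transformation and normalization in Step 1 can be sketched in a few lines of Python. The normalization method is not stated here, so min-max scaling is an assumption, and the helper names `encode_yes_no` and `min_max_normalize` are hypothetical; the sample values are illustrative and not taken from the study's dataset.

```python
def encode_yes_no(value):
    """Map a categorical Yes/No answer (e.g., factor F2) to 1/0."""
    return 1 if str(value).strip().lower() == "yes" else 0


def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] (assumed normalization method)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Illustrative values only, not taken from the study's dataset.
completed_course = [encode_yes_no(v) for v in ["Yes", "No", "yes"]]
quiz_scores = min_max_normalize([0.0, 5.0, 10.0])
print(completed_course, quiz_scores)
```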
  • Step 2: Prediction model constructions
CART and LR classifiers were used to construct prediction models in Jupyter Notebook (version 6.5.4) with the Scikit-learn package in Python 3 (ipykernel). The dataset was divided into training and testing data with the ratio shown in Table 2. Each model was constructed using these training–testing datasets. The mean value and standard deviation (SD) of each model’s performance were used for comparing prediction accuracy.
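A minimal sketch of this model construction with Scikit-learn is given below. It runs on synthetic stand-in data with the same shape as the study's dataset (109 students, 18 input factors), so its scores say nothing about the reported results. The number of repeated splits (`N_REPEATS`), the stratified splitting, and the hyperparameters are assumptions, since they are not reported here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 109 students, 18 input factors (F1-F18).
rng = np.random.default_rng(0)
X = rng.random((109, 18))
y = np.array([1] * 62 + [0] * 47)  # 62 "Pass", 47 "Fail"

N_REPEATS = 10  # assumed; the number of repetitions is not reported


def evaluate(make_model):
    """Return {metric: (mean, SD)} over repeated 80/20 splits."""
    scores = {"accuracy": [], "f1": [], "auc": []}
    for seed in range(N_REPEATS):
        # Stratified splitting is an assumption of this sketch.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=seed
        )
        model = make_model().fit(X_train, y_train)
        predicted = model.predict(X_test)
        pass_probability = model.predict_proba(X_test)[:, 1]
        scores["accuracy"].append(accuracy_score(y_test, predicted))
        scores["f1"].append(f1_score(y_test, predicted))
        scores["auc"].append(roc_auc_score(y_test, pass_probability))
    return {name: (np.mean(v), np.std(v)) for name, v in scores.items()}


results = {
    "CART": evaluate(lambda: DecisionTreeClassifier(random_state=0)),
    "LR": evaluate(lambda: LogisticRegression(max_iter=1000)),
}
for name, metrics in results.items():
    print(name, {k: (round(m, 3), round(s, 3)) for k, (m, s) in metrics.items()})
```

With 109 rows and `test_size=0.2`, `train_test_split` yields 87 training and 22 testing rows, which matches the division in Table 2.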
We used “Pass/Fail”, derived from the English final score, as the class label and considered two classification cases. Case A used the original datasets, while Case B oversampled the class “fail” to improve performance. Figure 3 shows the two cases of data classification.
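The label transformation and the oversampling in Case B can be illustrated as follows. The oversampling technique is not specified here, so random duplication of minority-class rows is assumed, and the function names `to_label` and `oversample_minority` are hypothetical. The rows are placeholders that only reproduce the class sizes reported in Step 1.

```python
import random

PASS_MARK = 5.0  # from Table 1: a final score of 5 or more is a pass


def to_label(final_score):
    """Convert an English final score (0-10) to 1 = Pass, 0 = Fail."""
    return 1 if final_score >= PASS_MARK else 0


def oversample_minority(rows, labels, seed=0):
    """Duplicate random minority-class rows until both classes are equal."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    shortfall = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(shortfall)]
    return rows + extra, labels + [minority] * shortfall


# Placeholder rows with the reported class sizes (62 Pass, 47 Fail).
labels = [1] * 62 + [0] * 47
rows = [[i] for i in range(len(labels))]
balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(balanced_labels.count(1), balanced_labels.count(0))  # 62 62
```

A common precaution, not described in this study, is to oversample only the training portion so that duplicated rows do not appear in both the training and testing sets.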
  • Step 3: Evaluation
We calculated accuracy, F1, and the area under the curve (AUC) of the receiver operating characteristic (ROC), and computed a confusion matrix to evaluate the model performance. The values of these metrics range from 0 to 1, where 1 refers to excellent prediction performance [10].
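For reference, the metrics named in Step 3 can be computed from first principles as in the sketch below. The labels and scores are made up for illustration, the 0.5 decision threshold is an assumption, and the AUC is computed through its rank interpretation (the probability that a randomly chosen passing student receives a higher score than a randomly chosen failing student).

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) with 1 = Pass as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def auc(y_true, scores):
    """Share of (Pass, Fail) pairs ranked correctly; ties count as half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up example: five students with predicted pass probabilities.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(accuracy(y_true, y_pred), f1(y_true, y_pred), auc(y_true, scores))
```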
  • Step 4: Comparison
After comparing the prediction results of two classification cases and classifiers, we selected the best case and classifier to retrieve important factors associated with student success in English tests.
  • Step 5: Discussion and Conclusions
The research results were compared, and contributions to theories and practices of predictions on student success in English tests were verified on a basic map. Solutions for enhancing the success of students and online supplementary English courses were proposed based on these extracted factors.

3. Results

3.1. Classification

Table 3 presents the classification results of the two data classification cases. The accuracy, F1, and AUC of Case B were higher than those of Case A, except for the AUC of LR. Although the CART and LR classifiers showed the same overall accuracy and F1 results in Case B, CART was better than LR with an AUC of 89.33% versus 73.67%. Hence, CART was used for further analysis.
For in-depth comparison, a confusion matrix was computed to evaluate the classifiers’ performance. Figure 4 shows the confusion matrix results in the two classification cases. The rate of correctly predicted “Fail” students increased from 0.556 in Case A to 0.923 in Case B, indicating that Case B was better than Case A. The CART classifier in Case B correctly predicted the “Pass” class at a rate of 1.0 and the “Fail” class at a rate of 0.923. The CART model thus correctly predicted nearly all passing and failing students in Case B. Hence, CART was used for important feature selection.
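Per-class rates such as those above are obtained by dividing each row of a confusion matrix by its row total. The sketch below shows this normalization; the counts are hypothetical, chosen only so that the resulting rates resemble those reported for Case B, because the raw counts are not given in the text. The function name `per_class_rate` is also hypothetical.

```python
def per_class_rate(matrix):
    """Divide each row (actual class) of a confusion matrix by its total."""
    rates = []
    for row in matrix:
        total = sum(row)
        rates.append([count / total if total else 0.0 for count in row])
    return rates


# Hypothetical counts, not the study's data.
# Rows: actual Fail, actual Pass; columns: predicted Fail, predicted Pass.
example = [
    [12, 1],
    [0, 9],
]
rates = per_class_rate(example)
print([[round(r, 3) for r in row] for row in rates])
```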

3.2. Feature Selection

Figure 5 displays the rankings of feature importance from the CART model constructed in Case B. Four factors, namely “Quiz 7 score” (F10), “Quiz 3 score” (F6), “Quiz 11 score” (F14), and “Number of quizzes completed in the online supplement courses” (F1), significantly impacted student success in the English final examination.
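To illustrate how a tree-based ranking of this kind arises, the sketch below computes the impurity decrease of a single split. It assumes the Gini criterion, which is the default of Scikit-learn's `DecisionTreeClassifier`; the criterion actually used is not stated here. Scikit-learn's `feature_importances_` attribute reports the normalized total of such decreases per feature. The quiz scores and labels are made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def impurity_decrease(values, labels, threshold):
    """Gini decrease obtained by splitting on values <= threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Made-up example: a quiz score that separates Fail (0) from Pass (1).
quiz_score = [2.0, 3.0, 7.0, 8.0]
passed = [0, 0, 1, 1]
print(impurity_decrease(quiz_score, passed, threshold=5.0))  # 0.5
```

A feature whose splits remove more impurity across the tree receives a higher importance, which is how a factor such as a single quiz score can rank first.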

4. Discussion and Conclusions

Table 4 lists the most important factors. That the scores of specific quizzes rank highest suggests that the testing content and difficulty level might differ among the provided test sets.
Vietnamese universities need to provide additional support to students who are expected to pass the English final tests so that they can better prepare for examinations. For example, schools and English centers must offer practice tests and/or mock tests in online supplementary courses to minimize students’ poor preparation and lack of experience with online tests caused by low levels of technology and computer skills [2]. Importantly, sample practice and mock tests must be aligned with national English standards to help students become familiar with the real tests. Additionally, English centers and teachers must frequently double-check the difficulty level of the quizzes provided in these supplementary courses. The testing contents, quiz items, and answer keys also need to be assessed frequently to assure reliability and validity.
English proficiency is important for students and universities in non-native English-speaking countries. It is a graduation requirement and improves opportunities for scholarships and high-income employment. Although universities continuously offer students various English courses and tests, such as supplementary online courses via e-learning systems, the real status of student success in English tests and the predictors of that success remain unknown. By using CART and LR, successful and unsuccessful students in English supplementary online courses can be classified. The results show that CART is better than LR, with an AUC of approximately 90%. The scores of quizzes 7, 3, and 11, and the number of quizzes students completed in these courses, significantly affected whether students passed or failed the English final tests.
This study’s results significantly contribute to formulating theories and practices in online supplementary course designs for student success in English tests. From a theoretical perspective, an effective feature selection method was identified for teaching and learning English in non-native English-speaking countries, including the MDR in Vietnam. For better accuracy performance, future research is necessary to address imbalanced data issues although the classes in the datasets were equal. Additionally, more evaluation metrics other than F1, AUC, and a confusion matrix must be applied. From a practical perspective, more predictors for student success in English tests need to be developed through sample practice tests in supplementary online courses.
Even though the current study contributes to the theories and practices of predictions on student success in English tests, limitations must be addressed. First, we only considered the scores of examinations provided in the online supplementary courses on an e-learning system as inputs. In future research, it is necessary to apply other factors to determine better predictions. Moreover, the research data of this study were collected from a single foreign language center of a single Vietnamese university; therefore, the results might not be generalized, necessitating the inclusion of more data.

Author Contributions

Conceptualization, T.-T.H.-C., L.-S.C. and D.T.T.; methodology, T.-T.H.-C. and L.-S.C.; software, T.-T.H.-C.; validation, T.-T.H.-C., T.-C.L. and L.-S.C.; formal analysis, T.-T.H.-C.; investigation, D.T.T.; resources, D.T.T.; data curation, T.-T.H.-C. and D.T.T.; writing—original draft preparation, T.-T.H.-C.; writing—review and editing, L.-S.C. and V.N.; visualization, V.N.; supervision, L.-S.C.; project administration, D.T.T. and T.-C.L.; funding acquisition, D.T.T. and T.-C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the lowest risk. Any risk suffered by the research subjects is not higher than those who do not participate in the study.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data are unavailable due to privacy restrictions.

Acknowledgments

We are grateful to Dong Thap University, Vietnam and Chaoyang University of Technology, Taiwan for providing access to their facilities, which allowed us to conduct the experiments reported in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Nguyen, T.N. Thirty years of English language and English education in Vietnam. Eng. Today 2017, 33, 33–35.
  2. Huynh-Cam, T.T.; Agrawal, S.; Chen, L.S.; Fan, T.L. Using MOODLE-based e-assessment in English listening and reading courses: A Vietnamese case study. J. Inst. Res. South East Asia 2021, 19, 66–92.
  3. Ngo, M.T.; Tran, L.T. Current English education in Vietnam: Policy, practices, and challenges. In English Language Education for Graduate Employability in Vietnam; Springer Nature: Singapore, 2023; pp. 49–69.
  4. Ministry of Education and Training (MOET). Suggestions for Implementing the National Foreign Language Project 2020 in 2024 at the Local Units. Available online: https://thuvienphapluat.vn/cong-van/Giao-duc/Cong-van-259-BGDDT-DANN-2023-de-xuat-trien-khai-De-an-Ngoai-ngu-Quoc-gia-tai-don-vi-552539.aspx (accessed on 11 November 2024).
  5. Prime Minister. Approving, Revising, and Amending the National Foreign Language Teaching and Learning Project in the National Education System for the Period 2017–2025. Available online: https://datafiles.chinhphu.vn/cpp/files/vbpq/2017/12/2080.signed.pdf (accessed on 11 November 2024).
  6. Bujang, S.D.A.; Selamat, A.; Ibrahim, R.; Krejcar, O.; Herrera-Viedma, E.; Fujita, H.; Ghani, N.A.M. Multiclass prediction model for student grade prediction using machine learning. IEEE Access 2021, 9, 95608–95621.
  7. Gardner, J.; O’Leary, M.; Yuan, L. Artificial intelligence in educational assessment: ‘Breakthrough? Or buncombe and ballyhoo?’. J. Comp. Assis. Learn. 2021, 37, 1207–1216.
  8. Hassan, K.M.; Khafagy, M.H.; Thabet, M. Mining educational data to analyze the student’s performance in TOEFL iBT Reading, Listening and Writing Scores. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 327–334.
  9. Quynh, T.D.; Dong, N.D.; Thuan, N.Q. A case study of student performance predictions in English course: The data mining approach. In International Congress on Information and Communication Technology; Springer Nature: Singapore, 2024; pp. 419–429.
  10. Huynh-Cam, T.T.; Chen, L.S.; Lu, T.C. Early prediction models and crucial factor extraction for first-year undergraduate student dropouts. J. Appl. Res. High. Educ. 2025, 17, 624–639.
Figure 1. Research workflow of this study.
Figure 2. Correlation matrix.
Figure 3. Two cases of data classification.
Figure 4. Confusion matrices of two classification cases.
Figure 5. Important features from CART.
Table 1. Input and output factor description.
Factor ID | Factors | Factor Description and Transformed Values
F1 | Number of quizzes students completed in the OSC | 5–15
F2 | Students complete the OSC | 1 = Yes, 0 = No
F3 | Pre-test score | 0–10
F4 | Quiz 1 score | 0–7.71
F5 | Quiz 2 score | 0–8.57
F6 | Quiz 3 score | 0–9.71
F7 | Quiz 4 score | 0–9.14
F8 | Quiz 5 score | 0–9.43
F9 | Quiz 6 score | 0–9.43
F10 | Quiz 7 score | 0–9.14
F11 | Quiz 8 score | 0–9.14
F12 | Quiz 9 score | 0–9.44
F13 | Quiz 10 score | 0–10
F14 | Quiz 11 score | 0–9.71
F15 | Quiz 12 score | 0–10
F16 | Quiz 13 score | 0–10
F17 | Quiz 14 score | 0–9.71
F18 | Quiz 15 score | 0–10
Output: Pass | English final exam scores | 1 = Pass (5.0–10); 0 = Fail (<5.0)
Table 2. Data division for prediction models.
Dataset | Number of Students | Percentage
Training set | 87 | 80%
Testing set | 22 | 20%
Total | 109 | 100%
Table 3. Classification results for two cases.
Classification Case | Classifier | Overall Accuracy Mean (%) | Overall Accuracy SD (%) | F1 Mean (%) | F1 SD (%) | AUC Mean (%) | AUC SD (%)
Case A: Original datasets | CART | 69.50 | 7.45 | 59.17 | 8.59 | 65.67 | 6.65
Case A: Original datasets | LR | 68.33 | 5.85 | 60.17 | 8.64 | 81.50 | 6.38
Case B: Oversampling datasets | CART | 74.67 | 7.87 | 76.50 | 10.99 | 89.33 | 6.02
Case B: Oversampling datasets | LR | 74.67 | 7.87 | 76.50 | 10.99 | 73.67 | 4.93
Table 4. Rank order of the most important factors for student success in English final tests.
Rank Order | Factors
1 | F10. Quiz 7 score
2 | F6. Quiz 3 score
3 | F14. Quiz 11 score
4 | F1. Number of quizzes completed in the OSC
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
