Article

Evaluating the Effectiveness of Chatbot-Assisted Learning in Enhancing English Conversational Skills Among Secondary School Students

by
Abdullah Alenezi
1,* and
Abdulhameed Alenezi
2
1
Department of Curriculum and Instructional Technology, College of Humanities and Social Sciences, Northern Border University, Arar 91431, Saudi Arabia
2
Department of Instructional Technology, Education College, Jouf University, Al-Jawf 20151, Saudi Arabia
*
Author to whom correspondence should be addressed.
Educ. Sci. 2025, 15(9), 1136; https://doi.org/10.3390/educsci15091136
Submission received: 26 July 2025 / Revised: 28 August 2025 / Accepted: 28 August 2025 / Published: 1 September 2025
(This article belongs to the Special Issue Computer-Assisted Language Learning at the Dawn of the AI Revolution)

Abstract

The growing application of artificial intelligence in education has created new avenues for second language learning. This research explores the impact of chatbot-assisted learning on English conversation among secondary students in the Northern Borders Region of Saudi Arabia. The quasi-experimental design involved 30 students divided into two groups: an experimental group that used a GPT-powered chatbot intervention for three weeks, and a control group that received traditional teaching. Pre- and post-tests were given to assess conversational competence. At the same time, students' attitudes toward the chatbot-assisted learning experience were measured through questionnaires, teacher observation, and chatbot usage logs. Results showed statistically significant improvement in the experimental group's speaking competence (mean gain = 5.24, p < 0.001). Students showed high motivation, elevated confidence, and high satisfaction with the chatbot-supported learning experience (overall attitude mean = 4.35/5). Teacher observations confirmed that students were markedly more engaged and spontaneous, and chatbot usage was positively correlated with score gains (r = 0.61). The outcomes indicate that chatbot-based learning is a practical approach for developing spoken English, particularly in low-resource learning environments. The research provides empirical support for the incorporation of interactive AI into EFL teaching in Saudi secondary schools.

1. Introduction

Over the past few years, artificial intelligence (AI) technologies have rapidly revolutionised education, reshaping students' engagement with teaching material, instructors, and classmates. Among these innovations, AI-based chatbots, computer programmes that model conversation in a human-like form, have become significant instruments for second language learning. Their ability to respond instantaneously, interact personally, and remain judgement-free makes them promising pathways for improving students' communicative competence in settings where language practice opportunities beyond the classroom are few (Du & Daniel, 2024; Li et al., 2025).
The core questions for the research in this investigation are the following:
  • How much does chatbot-based teaching enhance the English speaking skills of students compared to conventional teaching?
  • What attitudes and experiences do students hold about the use of chatbots for practising English conversations?
  • How do teachers interpret students’ participation, progress, and difficulties in practising orally through chatbots?
Through the responses to these questions, this research aims to advance the literature base in AI in language education, provide practical guidelines for secondary educators in the Kingdom of Saudi Arabia, and drive consequential educational technology decision-making in linguistically and geographically heterogeneous environments. Ultimately, the focus here is on the transformative power of AI-powered chatbots as assistants, rather than competitors, for teachers in building communicative competence, autonomy, and digital literacy in 21st-century classrooms.

2. Literature Review

Saudi Arabia has seen growing interest in using digital technology in the classroom while teaching English as a Foreign Language (EFL) to support the country’s Vision 2030 agenda, prioritising education innovation and skills development (Al-Amri & Ahlam Mohammed, 2024). However, regardless of extensive investment in educational technology, the typical Saudi EFL classroom may still be teacher-dominated and test-based, restricting students’ chances for authentic, spontaneous language use (Mohamed, 2023; Alhammad, 2024). As such, students might not get enough practice in attaining communicative fluency, especially in speaking, a skill universally accepted as the most difficult and anxiety-provoking for EFL learners (Klímová & Ibna Seraj, 2023; Wu & Li, 2024).
Chatbots are an attractive response to such an obstacle. Existing literature indicates that conversational agents can afford learners low-stakes, readily available chances for actual dialogues, corrective feedback, and fluency building (Duong & Chen, 2025; Davar et al., 2025). Various studies under varying learning contexts support gains in learners’ vocabulary, sentence length, pronunciation, and confidence after adopting chatbot-based interventions (Hutauruk et al., 2024; Wiboolyasarin et al., 2024). Pilot tests in Saudi higher education, using software like the Bashayer chatbot 2023 and those utilising platforms driven by ChatGPT-3.5, have shown positive outcomes on motivation, participation, and language achievement (Al-Abdullatif et al., 2023; Aldowsari & Aljebreen, 2024). However, only limited investigations have subjected such tools’ application at the secondary level to rigorous analysis, particularly in peripheral areas such as the Northern Borders of the Kingdom of Saudi Arabia.
Such a geographical gap is meaningful. Students in the Northern Borders Region face unique educational challenges, including fewer native English speakers, constrained school budgets, and fewer activities that support language skills. Such barriers may slow the acquisition of oral competence, widening the educational gap between this region and others (Alruwaili & Kianfar, 2025; Elsheikh et al., 2025). Given these conditions, it is timely to study the use of chatbots to enhance the English speaking skills of learners in the region.
What is more, since quantitative research now reports improved test results and changing attitudes among English learners who use chatbots (Alsalem, 2024; Al-Zahrani & Alasmari, 2025), research is essential to combine those results with qualitative comments obtained from both students and instructors. Investigating learners’ experiences with AI chatbots, which cover ease of use, involvement, and educational gains, helps educators improve and equalise their digital initiatives (Bin-Hady et al., 2024; Meo et al., 2024). It is also important to consider how teachers feel about learning management systems, as this helps sustain the introduction of these technologies in curricula and respond to issues related to necessary training, technology, and curricular fit.
The current study examines how AI-driven chatbots can improve English conversational practice among secondary students in the Northern Borders Region of Saudi Arabia. The study design compares a group that learns with chatbots against a group that receives conventional instruction. Semi-structured interviews and lesson observations are also used to evaluate students' views and detect any behavioural improvements. Research has recently focused on using chatbots for language teaching in higher education and K–12 settings (Annamalai et al., 2023; Su et al., 2025; Rashed, 2024). The current research aims to provide additional evidence of the impact of chatbots on language improvement for learners in different educational settings.

3. Materials and Methods

The quasi-experimental approach involved two parallel groups: an experimental group learning through a chatbot-based intervention and a control group receiving conventional English instruction. This design allowed a comparative assessment of how effectively chatbot-based learning improves English conversation among Saudi secondary school students, while preserving the standard educational setting, avoiding disruption, and ensuring ethical practicality within the school.

3.1. Participants

The research involved 30 students from the Northern Borders Region of the Kingdom of Saudi Arabia, selected through purposive sampling according to availability and parental consent. The students were evenly divided into two groups. The experimental group, comprising 15 students, practised English conversation through a chatbot, while the control group, also comprising 15 students, followed the standard English as a Foreign Language (EFL) lesson plan without a chatbot. The experimental group consisted of male and female students aged 15–17 years, all with proven intermediate English skills according to the placement tools used at the schools (Sa'ad et al., 2023). Full administrative approval was obtained from the schools involved, along with ethical approval from the concerned authorities. Guardians' informed consent and students' assent were obtained before data collection.

3.2. Instruments and Tools

3.2.1. Pre- and Post-Tests

The English conversational skills of the students were measured through structured pre- and post-tests. The five aspects covered in the test rubric (Table 1) were fluency, vocabulary, grammar, pronunciation, and contextual relevance. All students were marked out of 20, while improvement was measured in terms of the difference between post-test and pre-test marks.
The pre- and post-test activities were conversation tasks designed to assess students' real-time speaking skills in English. Both tests elicited spontaneous language use through role-play scenarios. Both the experimental and control groups were tested once at the beginning of the first week to establish a performance baseline, and again during the fourth week, after the three-week intervention. The tests were intended to quantify changes in conversational skill over time, particularly in fluency, vocabulary, grammar, pronunciation, and contextual competence, with a standardised rubric used so that scores were recorded consistently.
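As an illustration, the rubric-based scoring can be sketched as follows. The weighting is an assumption (five criteria at 0–4 points each, summing to the 20-point scale); the study does not publish the exact point allocation.

```python
# Hypothetical rubric scorer: the five criteria from Table 1, each
# ASSUMED to be worth 0-4 points so totals fall on the 20-point scale.
CRITERIA = ["fluency", "vocabulary", "grammar", "pronunciation", "contextual_relevance"]

def total_score(ratings: dict) -> int:
    """Sum per-criterion ratings (0-4 each) into a 0-20 test score."""
    assert set(ratings) == set(CRITERIA), "rate all five criteria"
    assert all(0 <= r <= 4 for r in ratings.values())
    return sum(ratings.values())

def gain(pre: dict, post: dict) -> int:
    """Improvement = post-test total minus pre-test total."""
    return total_score(post) - total_score(pre)

pre = dict(fluency=2, vocabulary=2, grammar=2, pronunciation=2, contextual_relevance=2)
post = dict(fluency=3, vocabulary=3, grammar=3, pronunciation=4, contextual_relevance=3)
print(gain(pre, post))  # 6
```

Summing per-criterion ratings keeps the improvement measure a simple post-minus-pre difference, as the rubric description implies.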

3.2.2. Chatbot Application

In this study, the chatbot is based on OpenAI's GPT-3.5-turbo model, accessed through the OpenAI API and embedded in a browser-based interface specially designed for education. The interface was available on tablets provided by the school, allowing students to practise conversations in a structured way within a familiar and safe digital space. The system prompt for a given session was personalised according to the theme of that week's learning. For example, a session on ordering food might start by instructing the chatbot with the phrase, "You are a nice English teacher who helps the client to practice the conversations about ordering food in a restaurant". These curriculum-based prompts enabled diverse, improvised input by learners. There were no pre-determined responses; the language model dynamically generated the chatbot's replies based on what the student typed and the prompt.
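A minimal sketch of how such a session could be wired up, assuming the OpenAI Chat Completions endpoint; the theme names, prompt wording, and function names below are illustrative, since the study's actual interface code is not published:

```python
import json
import os
import urllib.request

# Illustrative weekly themes (ASSUMED, not the study's exact syllabus).
WEEKLY_THEMES = {
    1: "greetings and introductions",
    2: "ordering food in a restaurant",
    3: "shopping and travel",
}

def build_messages(week: int, history: list, student_turn: str) -> list:
    """Assemble the message list: a theme-specific system prompt,
    the prior turns, then the student's latest utterance."""
    system = (
        "You are a friendly English teacher helping a secondary student "
        f"practise conversations about {WEEKLY_THEMES[week]}. "
        "Keep replies short and ask follow-up questions."
    )
    return [{"role": "system", "content": system}, *history,
            {"role": "user", "content": student_turn}]

def chat(messages: list) -> str:
    """POST to the OpenAI chat completions endpoint (needs OPENAI_API_KEY)."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps({"model": "gpt-3.5-turbo", "messages": messages}).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

msgs = build_messages(2, [], "Hello, I would like a pizza, please.")
print(msgs[0]["content"])
```

Because the system prompt alone carries the weekly theme, the same loop serves every lesson; only the `WEEKLY_THEMES` entry changes.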
To provide a safe and appropriate learning experience, the OpenAI moderation API was used to filter any potentially harmful or off-topic text. Moreover, each student's activity was automatically logged, recording the number of sessions, the duration of each session, and the total interaction time; all sessions were conducted under supervision in the computer laboratories. Teachers reviewed weekly transcripts to check the level of engagement and the proper functioning of the chatbot. The typical student in the experimental group participated in 9–15 sessions (ca. 14.8 min each), totalling ca. 186 min of talking time over three weeks. This arrangement gave students regular, low-stress opportunities to rehearse spoken English in real time, guided by adaptive prompts that promoted fluency, vocabulary, and situational precision. The chatbot served not just as a conversational partner but also as a formative learning tool, reinforcing speaking behaviours that conformed to classroom instruction.
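The per-student usage metrics described above (session count, mean session length, total interaction time) reduce to a simple aggregation over logged session durations. A sketch, with an invented log format since the study's logging schema is not published:

```python
from statistics import mean

# Hypothetical per-student log: minutes of chatbot interaction per
# supervised session (field names and values are illustrative only).
def summarise(session_minutes: list) -> dict:
    """Aggregate one student's sessions into the reported usage metrics."""
    return {
        "session_count": len(session_minutes),
        "mean_minutes": round(mean(session_minutes), 1),
        "total_minutes": round(sum(session_minutes), 1),
    }

log = [14.0, 15.5, 14.8, 16.0, 13.2, 15.1, 14.9, 14.6, 15.3, 14.4, 15.0, 14.7]
print(summarise(log))
```

Aggregates of this kind are what allow the later correlation between total chatbot minutes and score gains to be computed per student.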

3.2.3. Student Attitude Questionnaire

After the intervention, students in the experimental group completed a 10-item Likert-scale questionnaire evaluating their perceptions of the chatbot's helpfulness, engagement, usability, and impact on confidence. Each item was rated on a scale from 1 (Strongly Disagree) to 5 (Strongly Agree), and each participant's responses were averaged (see Appendix A).
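The scoring described above can be sketched as follows; the ratings shown are illustrative, not the study's data, and the two-level averaging (per student, then across students) is how the reported overall mean of 4.35/5 would be obtained:

```python
from statistics import mean

# Illustrative scoring of the 10-item questionnaire: each participant's
# 1-5 ratings are averaged into a per-student attitude score, and those
# scores are averaged again for the group mean (values are made up).
def attitude_score(ratings: list) -> float:
    assert len(ratings) == 10 and all(1 <= r <= 5 for r in ratings)
    return mean(ratings)

group = [
    [5, 4, 5, 4, 4, 5, 4, 5, 4, 4],
    [4, 4, 4, 5, 4, 4, 5, 4, 4, 4],
]
print(round(mean(attitude_score(s) for s in group), 2))  # 4.3
```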

3.2.4. Teacher Observation Rubric

Data triangulation was strengthened by having two English language instructors rate the same set of 15 students from the experimental group using an observation rubric. The students were rated on 5-point scales for engagement, confidence, vocabulary use, sentence structure, and spontaneity of speech. The ratings were averaged and correlated with the students' performance scores.

3.2.5. Usage Logs and Demographics

Chatbot usage logs, including session count, session length, and total usage time, were recorded automatically. Demographic information, including age and gender, was collected through a background questionnaire to examine performance trends by variable.

3.3. Procedure

The intervention spanned four weeks. In week one, all participants underwent a baseline speaking test (pre-test), followed by orientation and brief training on the chatbot interface for the experimental group. In weeks one to three, both groups practised communication in English, although in different ways. The experimental group worked with the GPT-driven chatbot in supervised 30 min laboratory sessions, three times a week. Within each session, an average of 14.8 min was spent communicating with the chatbot, with the remaining time devoted to reviewing past chat logs, vocabulary consolidation, and brief teacher facilitation. On average, a student spent 182.5 min with the chatbot across the intervention period.
By contrast, the control group followed the traditional EFL classroom model with the same thematic items every week (e.g., greetings, shopping, travel), without chatbot integration. Their classes included textbook-based dialogues, guided pair-work, roleplay, vocabulary drills, and teacher-led pronunciation activities. Instructional time and exposure to lesson content were equivalent across the two groups.
It should be noted that the key variable in this experiment was not the amount of practice time but the mode of practice. Although the two groups covered comparable teaching topics within comparable time frames, the experimental group engaged in active, AI-enhanced conversations with real-time feedback, while the control group practised through traditional peer-to-peer and peer-teacher interactions. This difference characterises the pedagogical change introduced by chatbot integration: a shift towards instructor-independent, self-adaptive conversation practice.

3.4. Data Analysis

Test scores, chatbot usage, and questionnaire responses were analysed using descriptive statistics (means, standard deviations). Paired t-tests were performed to analyse improvement within groups. Independent t-tests contrasted the post-test scores for the groups. Teacher ratings were analysed descriptively and cross-correlated with the students’ improvement measures to determine consistency between performance measures.

4. Results

The pre- and post-test scores of the 15 experimental-group students were compared to assess the effect of chatbot-assisted learning. As indicated in Figure 1, every student scored higher on the post-test than at baseline.
Descriptive statistics revealed the following. The experimental group significantly improved its speaking skills relative to the control group (Table 2b). The experimental group's mean rose from 10.40 (SD = 1.56) at pre-test to 15.64 (SD = 2.11) at post-test, an average gain of 5.24 points (p < 0.001). By comparison, the control group improved by 1.08 points (p = 0.12). These outcomes correspond to previous studies indicating that AI-guided chatbots may provide quantifiable improvements in speaking skills within relatively short periods (Du & Daniel, 2024; Liu et al., 2025; Tai & Chen, 2024). The larger enhancement in the experimental group indicates that incorporating chatbot-based practice, even in short, regular sessions, can generate greater language acquisition gains than conventional EFL methods alone (Al-Abdullatif et al., 2023; Aldowsari & Aljebreen, 2024). Minimum and maximum scores also rose, from a pre-test range of 8.04–12.36 to a post-test range of 13.05–18.52, reflecting that all learners benefited, albeit to varying degrees. These gains were further validated with a paired-sample t-test, t(14) = 12.78, p < 0.001, confirming that the post-test increase was not chance-based but the result of the chatbot-assisted learning. This supports earlier research suggesting that AI-based technology can effectively enhance EFL learners' speaking skills (Liu et al., 2025; Duong & Chen, 2025).
The comparative effectiveness of chatbot-assisted learning was also validated against the control group's pre- and post-test scores (n = 15). These students received standard EFL instruction without chatbots. Unlike the experimental group, the control group's performance showed no significant increase over the three weeks.
The control group's average gain of 1.08 contrasts with the 5.24-point gain in the experimental group. This implies that although traditional instruction produced a positive change, it was significantly smaller than that achieved with the chatbot. Table 2b presents the comparative gains of both groups.

4.1. Student Attitude Towards Chatbot Use

The 10-item Likert-scale questionnaire was administered to gauge students' subjective experiences with the chatbot intervention, complementing the objective language gain assessment. The tool measured several dimensions: enjoyment, motivation, confidence, perceived language gain, and satisfaction. The findings, presented in Table 3, indicate students' attitudes towards applying chatbot technology in English language learning. Each item was rated on a 5-point scale (1 = strongly disagree, 5 = strongly agree) to allow a detailed analysis of cognitive and affective responses. According to student responses, chatbot-facilitated learning was well accepted (overall mean = 4.35/5). Increased speaking confidence (4.5) and ease of use (4.5) were the highest-rated dimensions. Such results align with Alsalem (2024), who reported that Saudi EFL students find AI chatbots to be non-judgemental, engaging, and motivating learning companions. These findings also reflect those of Bin-Hady et al. (2024) and Meo et al. (2024), who emphasised that conversational AI tools can make learners more engaged and more comfortable during communication.

4.2. Teacher Observations

To acquire an objective assessment of students' visible progress, two English language instructors separately rated ten students from the experimental group through independent, structured observation using a 5-point Likert-scale, behaviourally oriented rubric. The rating centred on observable behavioural and language indicators, such as engagement, confidence, vocabulary use, and sentence complexity. Teacher ratings verified student-reported improvements; engagement (4.63) and confidence (4.52) were very high. Such findings align with previous studies suggesting that AI-mediated conversation encourages spontaneous speech, broader vocabulary use, and more complex sentence structure (Zhang, 2025; Wiboolyasarin et al., 2024). Teacher ratings provided essential triangulation, externally verifying students' self-reported attitudes and test scores. Table 4 displays the average ratings in six core areas, providing insight into teachers' perceptions of the effectiveness of chatbot-assisted instruction in facilitating real-time language learning.

4.3. Chatbot Usage and Performance Correlation

In order to determine the level of engagement the students had with the chatbot tool, usage logs for all 15 students in the experimental group were gathered and analysed. The usage logs recorded the number of usage sessions, session duration, and the total minutes spent using the tool for the three-week intervention. The significant findings from the usage data appear in Table 5.
The statistics indicate that students interacted with the chatbot reliably during the intervention, averaging 12.3 sessions and 182.5 min of engagement in total. Most importantly, the session length averaged 14.8 min, showing that students maintained topic-based conversations during each session. Usage log analysis showed a significant positive relationship between time spent with the chatbot and increased speaking scores, with a moderate-to-strong correlation (r = 0.61). This result confirms Silitonga et al. (2023) and Law (2024), who claimed that frequent exposure to AI chatbots predicts more successful language acquisition. The relatively narrow spread in usage metrics (minimum to maximum) also suggests that chatbot sessions were seamlessly incorporated into students' learning habits. Such findings complement earlier arguments that AI-powered language tools are convenient and engaging for digital-native learners (Law, 2024; Alsalem, 2024; Tai & Chen, 2024). Furthermore, the persistence of interaction over time likely contributed to the observed improvements in speaking proficiency and learner confidence.
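The usage-gain relationship reported above is a Pearson correlation over per-student totals. A sketch with invented minutes and gains (the study's raw per-student data are not published):

```python
from math import sqrt

def pearson_r(x: list, y: list) -> float:
    """Pearson's r: covariance of x and y divided by the product of
    their standard deviations (computed from sums of squares)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented per-student totals: chatbot minutes vs. score gains.
minutes = [150, 160, 170, 180, 190, 200, 210]
gains = [3.0, 4.5, 4.0, 5.5, 5.0, 6.5, 6.0]
print(round(pearson_r(minutes, gains), 2))
```

An r of 0.61, as reported, would indicate that students who logged more chatbot minutes tended to show larger score gains, without implying a strictly linear dose-response.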

5. Discussion

This project sought to determine the effectiveness of chatbot-assisted learning in improving English conversational skills among secondary students in the Northern Borders Region, Kingdom of Saudi Arabia. By combining test performance, students' perceptions, teachers' observations, and chatbot usage data, the analysis provides an in-depth understanding of the impact that artificial intelligence (AI)-powered speech tools may have on language learning in actual classrooms. The discussion below interprets the findings, positions them against the literature, examines their contextual significance for education in Saudi Arabia, and identifies limitations and directions for future work.

5.1. Chatbots and Measurable Language Gains

The effectiveness of chatbot-assisted instruction is underscored by the statistically significant difference in conversation-test gains between the experimental (mean gain = 5.24, p < 0.001) and control (mean gain = 1.08, p = 0.12) groups. The results of the present study confirm previous evidence that AI-based chatbots can improve speaking proficiency even within a relatively short intervention time frame (Du & Daniel, 2024; Liu et al., 2025; Wiboolyasarin et al., 2025). The observed increase can also be attributed to spaced repetition and immediate feedback, principles known to enhance retention and fluency (Li et al., 2025; Aldowsari & Aljebreen, 2024). The adaptive prompts used in this study enabled students to make corrections during the learning process, in keeping with the principles of communicative language teaching (Duong & Chen, 2025; Law, 2024).
In addition, the nature of chatbot practice (brief, relatively frequent episodes) corresponds with what cognitive theorists term spaced repetition and incremental practice, both demonstrated to support lasting retention and skill development (Li et al., 2025). The chatbot thus acted as both a practice partner and a formative assessment tool, facilitating the internalisation of correct usage.

5.2. Affective Benefits: Confidence, Enjoyment, and Motivation

The experimental group reported highly positive perceptions of chatbot-enhanced learning (overall attitude score = 4.35/5), with exceptionally high scores for speaking confidence (4.5) and enjoyment of use (4.5). This reflects the work of Alsalem (2024), Alhammad (2024), and Meo et al. (2024), who found that AI chatbots help mitigate learners' anxiety and facilitate engagement. Chatbots can be especially valuable in gender-segregated classrooms, or wherever constraints restrict authentic interaction between female and male students (Alruwaili & Kianfar, 2025). The outcome is an increased readiness to sustain longer stretches of talk, which is essential in developing oral proficiency (Bin-Hady et al., 2024).
EFL learners who feel apprehension about communicating in class found in the chatbot an uncritical, safe space for speaking practice. In the Saudi context specifically, linguistic confidence among female learners may be dampened by social norms or the threat of public error (Alruwaili & Kianfar, 2025). The chatbot therefore served as an anxiety-reduction tool and consequently enhanced learners' willingness to speak, a core element in developing speaking proficiency (Meo et al., 2024).
These emotional benefits complement the argument made by Alhammad (2024) that chatbots improve learner attitudes by providing an enabling environment for exploration, ultimately encouraging risk-taking and self-expression in English. Embedding motivational technology also complements the broader education goals of Saudi Arabia's Vision 2030 regarding digital innovation and learner-centred teaching (Al-Zahrani & Alasmari, 2025).

5.3. Teacher Observations: Validating Performance Shifts

Whereas self-report data can be impressionistic and test scores narrow in scope, teacher observations offered crucial triangulation (see Figure 2). Instructors recorded significant progress among all students on behavioural and language indicators, especially engagement (4.63), confidence (4.52), and speech fluency (4.33). Vocabulary range and syntactic variety were observed to improve, and pronunciation was smoother after the intervention. These results resonate with the claim that multimodal, iterative communication promotes real-time language processing (Bin-Hady et al., 2024; Zhang, 2025). The alignment of teacher assessments, student performance data, and attitude data increases the overall plausibility of the intervention's effect. It further indicates that such tools are both theoretically viable and practically usable in real-life classroom environments.

5.4. Chatbot Usage as a Predictor of Performance

Chatbot usage was examined by analysing the usage logs. Students accumulated an average of 182.5 total minutes over three weeks, with a moderate-to-strong positive correlation (r = 0.61) between usage and score improvement. This confirms the work of Silitonga et al. (2023) and Law (2024), who argued that regular, structured use of AI tools better predicts learning gains. The correlation indicates that consistency of use, rather than mere access, is an important factor in learning, and provides a strong rationale for programming routine chatbot use into the EFL syllabus (Al-Abdullatif et al., 2023). It also offers empirical support for planned implementation in schools: routine, supervised chatbot use can be integrated into regular lesson cycles, maximising student value.
Notably, the usage logs provide more than a reflection of engagement; they are an instrument for teachers and schools planning further integration of chatbots into the curriculum. Trends in session frequency and duration can help teachers identify optimal time-on-task levels or support less frequent users (Al-Abdullatif et al., 2023). Schools can also use this information to monitor chatbot use and set standards for combining AI with instructor-delivered teaching. Such data-informed planning keeps chatbot implementation effective, personalised, and scalable.

5.5. Comparison to Prior Research

The findings of this study support the global literature on the effectiveness of chatbots in language learning. For instance, Annamalai et al. (2023) and Su et al. (2025) assert that the interactive, human-like environments created by chatbots enhance language learning and learner independence. Nonetheless, in contrast with many earlier experiments on writing and vocabulary, the present work analysed spoken language development exclusively, finding sound empirical gains in fluency, vocabulary recall, and sentence complexity (Duong & Chen, 2025; Aldowsari & Aljebreen, 2024). It also combines teacher assessments with real-time usage logs, providing an enhanced, practice-based insight beyond post-surveys alone. In addition, the positive orientations seen here correspond with Rashed (2024), who reported that Saudi students using AI apps showed greater language autonomy and satisfaction, provided the tools were available and user-friendly.
Whereas the existing literature has largely limited itself to vocabulary learning or writing support (Duong & Chen, 2025; Aldowsari & Aljebreen, 2024), the current study targets the underrepresented area of spoken language development. The emphasis on real-time conversational skills reflects the fact that they are pedagogically challenging in EFL environments, particularly in settings that lack opportunities to interact with others in the target language (Duong & Chen, 2025). This article adds a new layer to the body of knowledge by demonstrating that chatbots can achieve measurable fluency improvements within a very short intervention period and are well suited to synchronous oral practice.

6. Implications for Saudi Secondary Education

The Northern Borders Region is an area in Saudi Arabia that is less urbanised and less resourced, and students might find fewer opportunities for in-context English language use. For this reason, the findings in this study are particularly pertinent to educational equity. As noted by Al-Amri and Ahlam Mohammed (2024), digital transformation in Saudi schools is not uniform, and such studies contribute towards the argument for more inclusive education policies aimed at reaching underrepresented areas.
This study also supplements the limited literature on chatbot use at the secondary level, since the overwhelming majority of existing studies have sampled university students (AbuSahyon et al., 2023; Al-Abdullatif et al., 2023). It appears that, with proper scaffolding, even younger learners benefit greatly from chatbot-supported conversation practice. The findings also carry cultural significance. In gender-segregated educational systems, chatbots offer a promising means of giving female students greater voice and practice in English while sidestepping the social hesitation and constraint of human-to-human interaction (Alruwaili & Kianfar, 2025). In this sense, chatbots may be well suited to the sociolinguistic conditions of KSA's educational system.

7. Conclusions

This research analysed the effect of chatbot-assisted learning on the English conversational ability of secondary students in Northern Saudi Arabia. Applying a quasi-experimental design that combined performance tests, student attitude scales, teacher observations, and chatbot usage metrics, the study demonstrated that an AI-powered chatbot greatly enhanced students' English conversational skills in just three weeks. The quantitative outcomes indicated statistically significant gains in post-test performance, with a mean increase of 5.24 points (p < 0.001). Students were delighted, motivated, and confident in using the chatbot, with an overall attitude score averaging 4.35 out of 5. Teacher observations corroborated these findings, notably in engagement, vocabulary use, and spontaneous speech. In addition, the usage metrics showed a positive correlation between time spent with the chatbot and performance gains (r = 0.61), underscoring the value of sustained use.
These results complement international research indicating that conversational AI tools can contribute significantly to foreign language development when implemented intentionally in the curriculum. In the Saudi context, specifically in underprivileged or rural areas, chatbots offer a scalable, adaptive, and learner-centric route to greater fluency in English. Although the work is limited in sample size and duration, it contributes to the emerging empirical work on AI-enhanced language learning in secondary education. Longitudinal analyses, larger samples, and investigations of chatbot voices and tailored scaffolding mechanisms remain avenues for subsequent research. Finally, this study confirms that chatbot-assisted learning is not merely the latest fad, but an effective teaching method for enhancing the communicative abilities of Saudi students in today's globalised world.
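The pre/post comparison above reports significance (p < 0.001) without naming the exact statistical test; a paired-samples t-test is the conventional choice for this one-group pre/post design. The sketch below illustrates the computation with purely hypothetical scores (not the study's data), under that assumed test.

```python
import math

def paired_t(pre, post):
    """t statistic for a paired-samples t-test on pre/post scores.
    (Assumed test; the paper reports p < 0.001 without naming one.)"""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n)  # mean gain over its standard error

# Illustrative pre/post scores for six hypothetical students (not study data).
pre = [9.5, 10.0, 10.5, 11.0, 11.5, 12.0]
post = [14.5, 15.2, 15.8, 16.3, 17.0, 15.0]
t = paired_t(pre, post)  # large positive t: gains unlikely to be chance
```

With 30 students and a mean gain of 5.24 points, even a moderate spread of individual gains would drive the t statistic far beyond conventional significance thresholds, consistent with the reported p < 0.001.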

Limitations

Although the outcomes of this work are encouraging, several limitations must be noted. First, the participants were limited to 30 students in a single region of Saudi Arabia, restricting the generalisability of the outcomes to wider populations. Second, the short intervention duration of just three weeks provides insight into short-term improvements, but not into longer-term retention or the transferability of the speaking skills. Third, the quasi-experimental design, adopted primarily for practical reasons within the schools, entailed non-random group assignment, the potential for selection bias, and a limited ability to rule out alternative explanations. Finally, the conclusions are tied to the particular chatbot model and interface used here; a different model or interface would likely yield different outcomes. These considerations point toward further work with larger, more diverse groups of students, randomised controlled trials, and intervention durations longer than three weeks, in order to understand whether this form of language learning is practical and sustainable in other educational settings in Saudi Arabia.

Author Contributions

Both authors, A.A. (Abdullah Alenezi) and A.A. (Abdulhameed Alenezi), contributed to all stages of the work, from conceptualisation to investigation, analysis, writing, and review. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the College Scientific Committee of Northern Border University (protocol code: 251228; 6 September 2024).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Student Attitude Questionnaire Items

The following 10 items were administered in the post-intervention attitude questionnaire for the experimental group. Responses were given on a 5-point Likert scale (1 = Strongly Disagree, 5 = Strongly Agree):
  • I enjoyed using the chatbot to practice English.
  • Using the chatbot increased my interest in learning English.
  • I was motivated to use the chatbot regularly.
  • Chatbot practice helped me feel more confident when speaking English.
  • I learned new vocabulary through my conversations with the chatbot.
  • The chatbot helped improve my grammar and sentence accuracy.
  • The feedback I received from the chatbot was helpful for my learning.
  • The chatbot interface was easy to use and navigate.
  • I would recommend this chatbot to other students learning English.
  • Using the chatbot helped me improve my English speaking skills overall.

References

  1. AbuSahyon, A. S. E., Alshorman, O., & Al-Absi, B. (2023). Investigating the impact of AI-driven chatbots on the acquisition of English as a foreign language among Saudi undergraduate students. International Journal of Membrane Science and Technology, 10(2), 3075–3088.
  2. Al-Abdullatif, A. M., Al-Dokhny, A. A., & Drwish, A. M. (2023). Implementing the Bashayer chatbot in Saudi higher education: Measuring the influence on students’ motivation and learning strategies. Frontiers in Psychology, 14, 1129070.
  3. Al-Amri, N. A., & Ahlam Mohammed, A.-A. (2024). Drivers of chatbot adoption among K–12 teachers in Saudi Arabia. Education Sciences, 14(9), 1034.
  4. Aldowsari, B. I., & Aljebreen, S. G. (2024). The impact of using a ChatGPT-based application to enhance Saudi students’ EFL vocabulary learning. International Journal of Language and Literary Studies, 6(4), 380–397.
  5. Alhammad, A. I. (2024). The impact of ChatGPT on developing Saudi EFL learners’ literature appreciation. World Journal of English Language, 14(2), 331.
  6. Alruwaili, A. R., & Kianfar, Z. (2025). Investigating EFL female Saudi teachers’ attitudes toward the use of ChatGPT in English language teaching. Forum for Linguistic Studies, 7(2).
  7. Alsalem, M. S. (2024). EFL students’ perception and attitude towards the use of ChatGPT to promote English speaking skills in the Saudi context. Arab World English Journal, 15(4), 73–84.
  8. Al-Zahrani, A. M., & Alasmari, T. M. (2025). A comprehensive analysis of AI adoption, implementation strategies, and challenges in higher education across the Middle East and North Africa (MENA) region. Education and Information Technologies, 30, 11339–11389.
  9. Annamalai, N., Rashid, R., Hashmi, U., Mohamed, M., Alqaryouti, M., & Sadeq, A. (2023). Using chatbots for English language learning in higher education. Computers and Education: Artificial Intelligence, 5, 100153.
  10. Bin-Hady, W. R. A., Ali, J. K. M., & Al-humari, M. A. (2024). The effect of ChatGPT on EFL students’ social and emotional learning. Journal of Research in Innovative Teaching & Learning, 17(2), 243–255.
  11. Davar, N. F., Dewan, M. A. A., & Zhang, X. (2025). AI chatbots in education: Challenges and opportunities. Information, 16(3), 235.
  12. Du, J., & Daniel, B. K. (2024). Transforming language education: A systematic review of AI-powered chatbots for English as a foreign language speaking practice. Computers and Education: Artificial Intelligence, 6, 100230.
  13. Duong, T.-N.-A., & Chen, H.-L. (2025). An AI chatbot for EFL writing: Students’ usage tendencies, writing performance, and perceptions. Journal of Educational Computing Research, 63, 406–430.
  14. Elsheikh, O., Osman, E., Elamin, Y. M., Elyasa, Y. M., Osama Mudawe Nurain Mudawe, I. A. A., Syeda Sumaira Khurram, F., A., G. A. M., Ali, & Musa, M. (2025). Integration of ICT tools in elementary EFL education: A mixed methods study of teacher perspectives and implementation challenges in Saudi Arabia. Forum for Linguistic Studies, 7, 697–713.
  15. Hutauruk, B. S., Purba, R., Sihombing, S., & Nainggolan, M. (2024). The effectiveness of artificial intelligence by chatbot in enhancing the students’ vocabulary. JETAL: Journal of English Teaching & Applied Linguistics, 6(1), 13–19.
  16. Klímová, B., & Ibna Seraj, P. M. (2023). The use of chatbots in university EFL settings: Research trends and pedagogical implications. Frontiers in Psychology, 14, 1131506.
  17. Law, L. (2024). Application of generative artificial intelligence (GenAI) in language teaching and learning: A scoping literature review. Computers and Education Open, 6, 100174.
  18. Li, Y., Zhou, X., Yin, H., & Chiu, T. K. F. (2025). Design language learning with artificial intelligence (AI) chatbots based on activity theory from a systematic review. Smart Learning Environments, 12(1), 24.
  19. Liu, Z., Zhang, W., & Yang, P. (2025). Can AI chatbots effectively improve EFL learners’ learning effects?—A meta-analysis of empirical research from 2022–2024. Computer Assisted Language Learning, 1–27.
  20. Meo, M. J., Alqahtani, N., Albedah, F., & Banu, S. (2024). Empowering Saudi EFL learners using ChatGPT: An analysis of challenges and educational opportunities. Forum for Linguistic Studies, 6(6), 516–527.
  21. Mohamed, A. M. (2023). Exploring the potential of an AI-based chatbot (ChatGPT) in enhancing English as a foreign language (EFL) teaching: Perceptions of EFL faculty members. Education and Information Technologies, 29, 3195–3217.
  22. Rashed, A. (2024). AI application (ChatGPT) and Saudi Arabian primary school students’ autonomy in online classes: Exploring students and teachers’ perceptions. The International Review of Research in Open and Distributed Learning, 25(3), 1–18.
  23. Sa’ad, A., Alzyoud, A., AlShorman, O., & Al-Absi, B. (2023). AI-driven technology and chatbots as tools for enhancing English language learning in the context of second language acquisition: A review study. International Journal of Membrane Science and Technology, 10(1), 1209–1223.
  24. Silitonga, L. M., Hawanti, S., Aziez, F., Furqon, M., Siraj, D., Anjarani, S., & Wu, T. (2023). The impact of AI chatbot-based learning on students’ motivation in the English writing classroom. In Innovative technologies and learning. ICITL 2023. Lecture notes in computer science (pp. 542–549). Springer.
  25. Su, Y., Luo, M., & Zhong, C. (2025). To chat or not: Pre-service English teachers’ perceptions of and needs in chatbots’ educational application. SAGE Open, 15(1), 21582440251321853.
  26. Tai, T.-Y., & Chen, H. H.-J. (2024). Navigating elementary EFL speaking skills with generative AI chatbots: Exploring individual and paired interactions. Computers & Education, 220, 105112.
  27. Wiboolyasarin, W., Wiboolyasarin, K., Tiranant, P., Boonyakitanont, P., & Jinowat, N. (2024). Designing chatbots in language classrooms: An empirical investigation from user learning experience. Smart Learning Environments, 11(1), 32.
  28. Wiboolyasarin, W., Wiboolyasarin, K., Tiranant, P., Jinowat, N., & Boonyakitanont, P. (2025). AI-driven chatbots in second language education: A systematic review of their efficacy and pedagogical implications. Ampersand, 14, 100224.
  29. Wu, X., & Li, R. (2024). Unraveling effects of AI chatbots on EFL learners’ language skill development: A meta-analysis. The Asia-Pacific Education Researcher.
  30. Zhang, J. (2025). Integrating chatbot technology into English language learning to enhance student engagement and interactive communication skills. Journal of Computational Methods in Sciences and Engineering, 25(3), 2288–2299.
Figure 1. Pre- and post-test scores (experimental group).
Figure 2. Teacher Observations.
Table 1. English Conversation Skills Assessment Rubric.
Criteria | 4—Excellent | 3—Good | 2—Fair | 1—Needs Improvement
Fluency | Speaks smoothly with no noticeable hesitation; ideas flow naturally and coherently. | Minor hesitation; generally smooth delivery with occasional pauses. | Noticeable pauses and hesitation interrupt flow; some difficulty maintaining conversation. | Frequent long pauses; speech is fragmented and difficult to follow.
Vocabulary | Uses a wide range of vocabulary with precision and appropriateness. | Adequate vocabulary for the task; some repetition or limited word choice. | Basic vocabulary with limited variation; occasional inappropriate word use. | Minimal vocabulary; frequent inappropriate or incorrect word use.
Grammar | Uses complex and accurate grammatical structures consistently. | Mostly accurate grammar with minor errors that do not affect comprehension. | Frequent grammatical errors that may affect understanding. | Grammar errors are frequent and severely hinder comprehension.
Pronunciation | Precise and accurate pronunciation; easily understood with natural intonation. | Generally clear with minor mispronunciations; understandable. | Some pronunciation issues; occasional difficulty for the listener. | Pronunciation impedes understanding; the listener struggles to comprehend.
Contextual Relevance | Responses are relevant, coherent, and appropriate to the topic; they show excellent understanding. | Mostly relevant and appropriate responses; demonstrates understanding of the context. | Occasionally off-topic; responses show limited understanding or relevance. | Frequently off-topic; responses do not show understanding of the conversation’s context.
Table 2. (a) Descriptive Statistics of Pre- and Post-Test Scores (Experimental Group), (b) Descriptive Statistics of Pre- and Post-Test Scores (Control Group), (c) Pre- and Post-Test Comparison for Experimental and Control Groups.
(a)
Measure | Pre-Test | Post-Test | Improvement
Mean | 10.40 | 15.64 | 5.24
Standard Deviation | 1.56 | 2.11 | 1.63
Minimum | 8.04 | 12.36 | 3.17
Maximum | 13.05 | 18.52 | 7.78
(b)
Measure | Pre-Test | Post-Test | Improvement
Mean | 10.73 | 11.81 | 1.08
Standard Deviation | 1.41 | 1.65 | 1.03
Minimum | 8.36 | 9.42 | 0.67
Maximum | 13.05 | 14.20 | 1.82
(c)
Group | Pre-Test Mean | Post-Test Mean | Improvement
Experimental | 10.40 | 15.64 | 5.24
Control | 10.73 | 11.81 | 1.08
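As a quick arithmetic check of Table 2, each group's improvement equals its post-test mean minus its pre-test mean; a minimal Python sketch using only the means reported above:

```python
# Gain = post-test mean - pre-test mean, using the means reported in Table 2(c).
groups = {
    "Experimental": {"pre": 10.40, "post": 15.64},
    "Control": {"pre": 10.73, "post": 11.81},
}
gains = {name: round(s["post"] - s["pre"], 2) for name, s in groups.items()}
print(gains)  # {'Experimental': 5.24, 'Control': 1.08}
```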
Table 3. Mean Ratings on Student Attitude Items.
Student Attitude Item | Mean Rating
Enjoyment using the chatbot | 4.3
Increased interest in English | 4.4
Motivation to engage regularly | 4.2
Boosted speaking confidence | 4.5
Vocabulary improvement | 4.3
Grammar/sentence accuracy | 4.1
Helpful feedback from the chatbot | 4.2
Ease of access and usability | 4.5
Willingness to recommend to peers | 4.4
Overall effectiveness for speaking skills | 4.4
Overall Mean Score | 4.35
Table 4. Mean Teacher Ratings (n = 10).
Teacher Rating Criterion | Mean Score
Engagement | 4.63
Confidence | 4.52
Speaking Spontaneity | 4.33
Vocabulary Use | 4.23
Sentence Complexity | 4.02
Overall Improvement | 4.35
Table 5. Chatbot Usage Summary.
Metric | Mean | Min | Max
Sessions per Student | 12.3 | 9 | 15
Average Minutes per Session | 14.8 | 10.2 | 19.3
Total Chatbot Minutes | 182.5 | 129.0 | 247.5
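The usage summary above underlies the reported positive usage–gain correlation (r = 0.61); assuming this denotes a Pearson coefficient (the standard reading of r), the computation can be sketched as follows, with illustrative values only (not the study's data):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative values only (not the study's data): total chatbot minutes
# and post-test score gains for five hypothetical students.
minutes = [130, 150, 170, 200, 240]
gains = [3.0, 5.5, 4.0, 6.5, 6.0]
r = pearson_r(minutes, gains)  # about 0.73 for these illustrative values
```

A coefficient in this range, like the study's r = 0.61, indicates a moderately strong positive association: students who spent more time with the chatbot tended to gain more, though the relationship is far from deterministic.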