Article

Pre-Service EFL Primary Teachers Adopting GenAI-Powered Game-Based Instruction: A Practicum Intervention

by Akbota Raimkulova 1, Kalibek Ybyraimzhanov 1, Medera Halmatov 2, Gulmira Mailybayeva 1,* and Yerlan Khaimuldanov 1

1 Department of Pedagogy and Psychology, Zhetysu University, Taldykorgan 040009, Kazakhstan
2 Department of Child Development, Bilecik Şeyh Edebali University, Bilecik 11000, Turkey
* Author to whom correspondence should be addressed.
Educ. Sci. 2025, 15(10), 1326; https://doi.org/10.3390/educsci15101326
Submission received: 4 September 2025 / Revised: 1 October 2025 / Accepted: 1 October 2025 / Published: 7 October 2025
(This article belongs to the Section Technology Enhanced Education)

Abstract

The rapid proliferation of generative artificial intelligence (GenAI) in educational settings has created unprecedented opportunities for language instruction, yet empirical evidence regarding its efficacy in primary-level English as a Foreign Language contexts remains scarce, particularly concerning pre-service teachers’ implementation experiences during formative practicum periods. This investigation, conducted in a public school in a non-Anglophone country in spring 2025, examined the impact of GenAI-driven gamified activities on elementary pupils’ English language competencies while exploring novice educators’ professional development trajectories through a mixed-methods quasi-experimental approach with comparison groups. Four third-grade classes (n = 119 pupils aged 8–9) were assigned to either ChatGPT-mediated voice-interaction games (n = 58) or conventional non-digital activities (n = 61) across six 45 min lessons spanning three weeks, with four female student-teachers serving as instructors during their culminating practicum. Quantitative assessments of grammar, listening comprehension, and pronunciation occurred at baseline, post-intervention, and one-month follow-up intervals, while reflective journals captured instructors’ evolving perceptions. Linear mixed-effects modeling revealed differential outcomes across linguistic domains: pronunciation demonstrated substantial advantages for GenAI-assisted learners at both immediate and delayed assessments, listening comprehension showed moderate benefits with superior overall performance in the experimental condition, while grammar improvements remained statistically equivalent between groups. Thematic analysis uncovered pre-service teachers’ progression from technical preoccupations toward sophisticated pedagogical reconceptualization, identifying connectivity challenges and assessment complexities as primary barriers alongside reduced performance anxiety and individualized pacing as key facilitators. These findings suggest selective efficacy of GenAI across language skills while highlighting the transformative potential and implementation challenges inherent in technology-enhanced elementary language education.

1. Introduction

The landscape of education has undergone a notable transformation in recent years, propelled by the rapid ascent of artificial intelligence (AI) technologies that mimic human-like cognition and interaction (Rafikova & Voronin, 2025; K. Wang et al., 2024). Educational systems worldwide have progressively integrated digital technologies, with language learning emerging as a particularly fertile ground for technological innovation (Mohebbi, 2025). This broader digital transformation has created new paradigms for teaching and learning, fundamentally altering how knowledge is constructed and shared in classroom environments. Once confined to speculative fiction, tools like generative AI (GenAI) now permeate classrooms, offering dynamic ways to reshape how knowledge is imparted and absorbed. This surge mirrors broader societal trends, where GenAI’s footprint expands from everyday conveniences to sophisticated applications in learning environments (e.g., Perifanou & Economides, 2025; L. Zhang et al., 2025a). Within this context, second language (L2) education (referring to the learning of any language beyond one’s native tongue) has experienced particular disruption, as digital tools offer unprecedented opportunities for authentic language practice and personalized instruction. In the realm of English as a Foreign Language (EFL) instruction, particularly at the primary level, the integration of such technologies holds promise for addressing persistent challenges in L2 skill development (Y. Chen et al., 2025b; Mahmoudi-Dehaki & Nasr-Esfahani, 2025). Yet, even as GenAI becomes increasingly present in educational settings, its adoption in game-based formats by pre-service teachers navigating their initial forays into the profession remains a relatively unexplored pedagogical approach, particularly in primary education contexts, where innovation must be reconciled with the practical demands of classroom realities.
This study delves into the implementation of GenAI-driven games within EFL practicum settings, focusing on third-grade pupils in a public school. Over a three-week intervention, pre-service teachers employed voice-mode interactions via the ChatGPT 4o mini app to facilitate individualized, game-like activities aimed at bolstering grammar, listening comprehension, and pronunciation skills. In contrast, comparison groups relied on traditional non-digital games, allowing for a comparative analysis of outcomes. The mixed-methods approach captured not only quantitative improvements in pupil performance but also qualitative insights into the teachers’ evolving perceptions and professional growth during their culminating practicum.

1.1. Problem Statement

Despite the hype surrounding AI’s potential in education, substantial gaps persist in understanding how GenAI can be effectively harnessed in primary EFL contexts, especially through game-based instruction led by novice educators. First, while voice-based AI assistants have shown promise in language learning contexts (Alpysbayeva et al., 2025; Koç & Savaş, 2025; Lee & Jeon, 2024), research specifically examining GenAI-mediated gamification in primary EFL settings remains notably absent. Second, existing research often skews toward higher education or self-reported perceptions (Lee et al., 2025), leaving primary-level interventions underexplored, particularly in non-native settings where language barriers and resource constraints amplify implementation hurdles.
Moreover, the professional development trajectories of pre-service teachers implementing such technologies during practicum periods have received insufficient scholarly attention. This gap is exemplified by a recent study (Gamlem et al., 2025) that surveyed student-teachers about their general attitudes and intentions regarding GenAI. The research found that second-year students perceived GenAI as less useful compared to other cohorts, with the authors hypothesizing that this was because first-year students had not yet undertaken their practicum, while second-year students were encountering teaching complexities for the first time during their six-week mandatory teaching period. However, the study did not explore whether participants implemented GenAI tools during their practicum placements or how they experienced doing so.
Particularly susceptible to identity conflicts related to harnessing generative technology, pre-service teachers might be discouraged from investing in learning-to-teach practices facilitated by GenAI (Y. Zhang et al., 2025b). This oversight likely hampers the cultivation of innovative teaching practices that could spur young learner engagement and skill retention, while also neglecting the barriers that might deter widespread adoption in diverse classroom environments.

1.2. Aim and Potential Contribution of This Study

This investigation stands to bridge critical voids by providing empirical evidence on GenAI’s efficacy in fostering specific EFL skills among young learners, while illuminating the professional trajectories of pre-service teachers. The primary aim is to examine the effectiveness of GenAI-powered game-based instruction on English language skills among elementary EFL learners, while simultaneously investigating pre-service teachers’ experiences during their practicum deployment of this pedagogical approach. The following research questions guided the inquiry:
  • To what extent do GenAI-powered EFL games enhance English grammar skills in non-native primary pupils relative to non-digital games?
  • To what extent do GenAI-powered EFL games enhance English listening comprehension in non-native primary pupils relative to non-digital games?
  • To what extent do GenAI-powered EFL games enhance English pronunciation in non-native primary pupils relative to non-digital games?
  • How do pre-service teachers perceive and experience the integration of GenAI-powered game-based instruction during their practicum, and what professional development outcomes emerge from this experience?
  • What barriers and facilitators influence the implementation of GenAI-powered games in EFL primary classrooms, as reported by pre-service teachers?
By situating the research in authentic practicum contexts with young learners, the study provides empirical evidence where it is most urgently needed while contributing to theoretical understanding of how emerging technologies reshape language education at its most fundamental levels. Moreover, the study’s focus on a non-Western context adds nuance to global discourses on GenAI in education, potentially guiding policy toward equitable technology integration that supports both pupil achievement and educator readiness in an era of accelerating digital transformation.

2. Literature Review

2.1. Theoretical Underpinning

At its core, this study partially draws on constructionist learning theory, which posits that learners construct knowledge most effectively by actively creating and manipulating tangible objects or ideas against a social backdrop (Papert, 1980). In the context of chatbot-informed games, students engage in constructing personalized language interactions, such as voicing prompts to generate dynamic EFL scenarios, thereby internalizing grammar, listening, and pronunciation through iterative experimentation. This aligns with the theory’s emphasis on “learning by making,” where GenAI serves as a digital artifact that pupils tinker with, fostering deeper cognitive connections without direct teacher mediation.
Complementing this is Vygotsky’s zone of proximal development, which describes the gap between what learners can achieve independently and what they accomplish with guidance (Vygotsky, 1987). The intervention leverages the conversational agent as a scaffolding tool, providing immediate, adaptive feedback that extends learners’ capabilities in real-time voice exchanges, bridging the zone of proximal development in ways traditional games might not. For pre-service teachers, this framework informs their shift toward facilitative roles, observing and adjusting support to maximize pupil autonomy.

2.2. Gamification in Education

Educational landscapes have increasingly embraced gamification as a mechanism to spark student involvement, transforming routine lessons into compelling experiences through elements like challenges and rewards embedded in non-entertainment contexts (Babu & Moorthy, 2024).
A particularly promising development in language education is the convergence of gamification with Mobile-Assisted Language Learning (MALL), where learning activities are mediated through mobile devices (Gao & Pan, 2023). MALL leverages mobile phones in language teaching and learning, with documented benefits for teacher-student and student-student communication, as well as for the development of speaking and listening skills (Shortt et al., 2023). The ubiquity of mobile devices enables learners to develop language comprehension skills rapidly and flexibly. Furthermore, the integration of smartphone applications with gamification concepts that incorporate play and fun elements has brought about significant improvements in pedagogical methods, inspiring and attracting learners (Ishaq et al., 2021).
Gamified learning environments apply game mechanics and dynamics to non-game contexts to enhance learners’ deep learning and critical thinking (Alturaiki et al., 2025; Gaurina et al., 2025) and to guide specific behaviors (Dehghanzadeh et al., 2021). These mechanics pertain to the rules and systems governing gamified components, such as scoring, while dynamics describe the behaviors and interactions arising from these mechanics, including competition and collaboration (Cheng et al., 2025). Research has established gamification as one of the most enjoyable, engaging, and effective methods for learning English as a foreign language (Far & Taghizadeh, 2024). Various components embedded in gamified environments, such as the quest for achievement, can increase learners’ motivation and interest in learning English (Chan & Lo, 2024) while reducing anxiety and fear of speaking a foreign language in front of others (Ai et al., 2025; S. Zhou, 2024). Scientific evidence indicates that L2 acquisition through games is more effective than non-game learning environments (Almelhes, 2024; G. Liu et al., 2024), a finding that extends to AI-powered gamified activities (Jiang et al., 2025).
This approach extends to serious games tailored for school curricula, which prioritize instructional objectives by mimicking authentic situations and incorporating mechanisms for interaction, evaluation, and critical thinking (Evmenova et al., 2025). Such strategies have proven effective in heightening participation and outcomes, particularly when fused with emerging technologies that amplify their adaptability. The integration of AI with gamified mobile learning represents a particularly powerful synthesis, as AI can enhance gamified learning experiences by offering dynamic, personalized adaptations that continually challenge students and promote skill development (Niño et al., 2025).

2.3. GenAI in Educational Contexts

Artificial intelligence encompasses systems that emulate human reasoning, learning, and task execution via neural architectures, reshaping numerous domains (Kalota, 2024). Public fascination with AI has intensified, driven by its pervasive influence across industries and routines (Pahi et al., 2024; Pham et al., 2025). In classrooms, AI’s integration signifies a leap toward smarter, tailored instruction, enabling educators to innovate models that emphasize customization and efficiency (M. Zhou & Peng, 2025). Opportunities abound in crafting individualized paths, boosting involvement, and equipping instructors with resources to meet diverse needs (Niño et al., 2025).
The practical implementation of GenAI in educational settings reveals both its transformative potential and the challenges educators face in harnessing this technology effectively. Educators often contend with hurdles like accommodating varied learner profiles, sourcing apt resources, delivering precise critiques, and nurturing conducive atmospheres; generative AI emerges as a potent ally in surmounting these (Mulyani et al., 2025). Defined as advanced algorithms that produce diverse outputs from inputs using vast pre-trained models (T. Hsu & Hsu, 2025), GenAI has exploded in prominence since ChatGPT’s debut, spawning myriad educational applications (Chiu, 2024; Honig et al., 2024; Monzon & Hays, 2025; J. Wu et al., 2025b). To harness its full potential safely, experts advocate pairing GenAI with human oversight to refine outputs and contextualize them appropriately (Giannakos et al., 2025).
The synergy between gamification and GenAI stands out as a dynamic force in modern pedagogy. Gamification infuses learning with competitive and rewarding features to elevate interest and results (Gao et al., 2025; Gu & Yan, 2025), while GenAI refines this by delivering bespoke experiences and insights, enabling progression aligned with individual paces (Al-Rousan et al., 2025). Feedback from GenAI often rivals human equivalents in encouragement and efficacy (Z. Zhang et al., 2025c), positioning these tools as collaborative partners that encourage hands-on exploration and deeper comprehension (Salinas-Navarro et al., 2024). Chatbots offer real-time feedback on grammar, pronunciation, and fluency, helping learners recognize and correct errors during practice, which promotes self-awareness and active learning (Ding & Yusof, 2025).
AI’s reach spans disciplines from computing to linguistics and numeracy (Tan et al., 2025), with evidence linking it to heightened engagement and success (Dahri et al., 2025). Syntheses reveal GenAI’s robust effects on achievement, though more pronounced in tertiary than primary levels, and stronger via textual interfaces than multimedia (X. Liu et al., 2025). Calls persist for rigorous, multifaceted studies to unpack GenAI’s enduring effects, especially in linguistic arenas (Belkina et al., 2025; Law, 2024).

2.4. GenAI in Language Learning

In language education, GenAI’s ascent has been meteoric, with reviews noting a predominance of studies on English in university settings, relying heavily on subjective accounts (Lee et al., 2025). Central to language teaching is fostering grammatical proficiency in communication, a domain where chatbots excel by replicating interactive dialogues (Chapelle, 2025). Enthusiasm for GenAI in L2 contexts stems from its utility in ideation, instruction, and evaluation across skills (Y. Wang et al., 2025a; H. Wu et al., 2025a), with aggregated analyses confirming moderate-to-strong benefits for acquisition (M. Li et al., 2025).
Speaking proficiency poses unique difficulties due to scarce conversational practice; GenAI platforms mitigate this by enabling fluid exchanges, including refinements and negotiations vital for customized growth (H. Wu & Liu, 2025). Nonetheless, drawbacks include potentially excessive or off-target responses, alongside infrastructural limitations (Cong-Lem et al., 2025; Kohnke et al., 2025). Investigations disproportionately target writing (J. Chen et al., 2025a; Guan et al., 2025; Kessler et al., 2025; Mi et al., 2025; Yang et al., 2025), underscoring the need to probe areas like oral and auditory competencies (Y. Wang et al., 2025b).
Recent investigations have explored various voice-based AI assistants in language learning contexts, including Google Assistant (H. H. J. Chen et al., 2023), Amazon Alexa (Elmaadaway et al., 2025; H. L. Hsu et al., 2023), and specialized applications like Speeko (Rad, 2024). These studies consistently report improvements in speaking and listening skills, with learners perceiving voice-based AI as inspiring tools for language practice. H. H. J. Chen et al. (2023) found that EFL college students considered Google Assistant useful for improving speaking and listening skills, appreciating its natural pronunciation and comprehensible utterances. Similarly, H. L. Hsu et al. (2023) demonstrated that interactions with Amazon Echo Show significantly improved L2 learners’ speaking skills while reducing speaking anxieties. Elmaadaway et al. (2025) reported enhanced oral reading fluency and reading comprehension among fourth graders using Alexa, suggesting the potential of voice chatbots for younger learners.
In sum, the literature portrays GenAI as a promising yet under-vetted catalyst for language learning, particularly at the primary level and for skills beyond writing. It also depicts pre-service teachers as pivotal mediators whose readiness can make or break classroom integration. The present study responds to those dual gaps through a mixed-methods quasi-experiment that scrutinizes language gains and captures teachers’ reflective voices.

3. Materials and Methods

This study employed a mixed-methods quasi-experimental design with a control group to examine the effects of GenAI-powered game-based instruction on English language skills among third-grade EFL learners, alongside qualitative insights into pre-service teachers’ experiences. The design incorporated pre-test, post-test, and follow-up measurements for quantitative outcomes, supplemented by reflective journals for qualitative data. Four intact third-grade classes from a single public school were assigned to either the intervention or comparison conditions based on practicum placements of pre-service teachers. Assignment occurred non-randomly due to the practicum structure, with two pre-service teachers implementing GenAI activities and two delivering conventional instruction. Classes were selected to ensure comparable average English academic performance (at an average level), as indicated by school-reported grades.
The intervention spanned three weeks in spring 2025, involving six 45 min lessons (twice weekly) within a 12-week culminating practicum for fourth-year Bachelor students majoring in primary education with an EFL focus. While brief interventions may raise concerns about adequate exposure for language acquisition, research suggests that focused practice periods can yield measurable improvements in specific language skills, particularly in controlled experimental contexts (e.g., Tavakoli et al., 2016). Moreover, the six-lesson framework aligns with typical AI-driven intervention durations in primary school studies (Arkoumanis et al., 2025), where practical constraints often limit extended implementation periods (e.g., Hori et al., 2025). Quantitative measures targeted grammar, listening comprehension, and pronunciation, while qualitative data explored pre-service teachers’ perceptions and professional development. Ethical approval was obtained from the university institutional review board, and informed consent was secured from all pupils and their parents/guardians, which was mandatory for participation in GenAI interactions, including voice exchanges. Only entire classrooms meeting this consent criterion were recruited.

3.1. Participants

The sample comprised 119 third-grade pupils (aged 8–9 years, non-native English speakers at an average proficiency level) from four classes in a public primary school. These pupils were distributed across the ChatGPT group (two classes, n = 30 and n = 28, totaling 58) and the conventional group (two classes, n = 31 and n = 30, totaling 61). Table 1 compares participants’ characteristics.
Power calculations for a medium effect size (d = 0.5), alpha = 0.05, and power = 0.80 suggested a minimum of 64 participants per group; the actual samples (n = 58 for ChatGPT, n = 61 for non-digital) fell slightly short of this threshold. All pupils studied English as a foreign language using a standardized textbook. Four pre-service teachers (all female, aged 21–23 years) served as instructors: two for the ChatGPT group and two for the conventional group. The student-teachers assigned to the ChatGPT classes completed a two-week face-to-face training module prior to the practicum, delivered by hybrid educators. This module covered ethical aspects of GenAI use, prompt engineering techniques, the specific game-based activities to administer, supervision strategies, troubleshooting for instructional and technical challenges, and hands-on piloting of GenAI-integrated scenarios.
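For transparency, this a priori calculation can be reproduced in R, the software used for all analyses in this study. The sketch below assumes the pwr package and an independent-samples t-test framing; the authors do not report which tool they actually used.

# Sketch of the a priori power analysis described above. The 'pwr' package
# and the two-sample t-test framing are assumptions, not the authors' method.
library(pwr)

pwr.t.test(
  d = 0.5,                    # medium effect size (Cohen's d)
  sig.level = 0.05,           # alpha
  power = 0.80,               # desired statistical power
  type = "two.sample",
  alternative = "two.sided"
)
# Yields n = 63.77 per group, i.e., 64 participants per group after rounding up.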

3.2. Experimental Procedures

The intervention occurred during the final three weeks of the 12-week practicum, with lessons held twice weekly for a total of six sessions per class. All lessons followed the standardized English textbook curriculum, incorporating blackboard use, visual aids, audio tapes, classroom and home exercises, and teacher explanations. In both groups, approximately 15 min of each 45 min lesson was dedicated to game-based activities to reinforce the lesson’s English topic. The key distinction lay in the game format: the conventional group used non-digital games (e.g., board games, flashcards, or group role-plays facilitated by the teacher), while the ChatGPT group employed voice-mode interactions via the mobile ChatGPT app on pupils’ personal smartphones. Pre-service teachers instructed pupils to install the app after the pre-test and before the intervention started; the research team supplied additional smartphones when needed to ensure all participants could engage.
Activities in the ChatGPT group were individual, with pupils using earphones equipped with microphones for private voice exchanges. Mobile internet was provided via portable Wi-Fi hotspots distributed by pre-service teachers at the start of the GenAI segment, ensuring stable connectivity for the app. Pre-service teachers supervised the process in real time, circulating through the classroom to assist with technical issues or clarifications. Prior to the study, researchers developed concise one-shot prompts in Russian, piloted them successfully with third graders outside the final sample, and distributed them to ChatGPT group pupils on paper slips during lessons. Pupils initiated the app’s voice mode and read these prompts aloud into the microphones, and the app then generated game-based interactions. Pre-service teachers’ instructions and ChatGPT’s responses occurred in the pupils’ L1 (Russian), except for target English phrases, to facilitate comprehension. These interactions aimed to bolster listening comprehension, speaking, and grammar through communicative sequences, encouraging pupils to practice pronunciation in context rather than in isolated words. Although voice assistants cannot directly assess pronunciation due to speech-to-text conversion limitations (e.g., misinterpreting words like “sue” as “see you,” leading to inaccurate feedback), the activities promoted pronunciation improvement by providing modeled English phrases, eliciting repetitions in game scenarios, and supplying feedback on grammar and relevance. This indirect approach was intended to leverage GenAI’s strengths in generating scripts and evaluating responses, fostering natural speaking practice despite the tools’ unsuitability for direct pronunciation evaluation. Table 2 outlines the English topics and corresponding game-based activities for the six lessons in both groups. A sample activity is detailed in Appendix A.

3.3. Data Collection

Drawing on the classes’ average English proficiency level, an assessment battery was designed to gauge the response variables. Data collection occurred in the sampled school during out-of-classroom time, administered by research assistants (in-service English teachers from another school) using paper-pencil formats. Assessments took place one week before the intervention (pre-test), one week after it (post-test), and one month after the post-test (follow-up). Instructions for all tests were translated into Russian by a certified translator to ensure accessibility.
Grammar was assessed using a method adapted from Busse et al. (2021). Specifically, participants translated six short question-answer pairs into English within 20 min. Scoring evaluated word order, auxiliary verb ‘to do’ usage, and verb conjugation, with a maximum of 18 points.
The listening comprehension test was sourced from the A2 Flyers exams (Cambridge, 2018), which are specifically designed for learners aged 6–12 and aligned with the Common European Framework of Reference (CEFR) at the A2 level. The 30 min test featured everyday scenarios on clothing and activities, with four tasks (20 items total) involving audio recordings played by examiners. Pupils selected options, matched illustrations, or answered open questions, earning one point per correct response (range: 0–20). The exam’s original fifth task, in which pupils colour a picture, was omitted.
Pronunciation involved pupils reading aloud six sentences (e.g., “Several children are playing in the park”) within 10 min. Outputs were audio-recorded and scored using a rubric from Hsieh et al. (2023), on a 1–5 scale per sentence (1: many errors in pronunciation and intonation; 5: correct and natural), yielding a maximum of 30 points. Two independent evaluators (university English teachers blind to group assignment and study objective) scored performances across occasions, achieving strong inter-rater reliability (Cohen’s kappa = 0.88). Discrepancies were resolved by averaging scores. Pronunciation was included as an outcome despite GenAI’s limitations in direct evaluation, as the activities encouraged contextual practice through modeled phrases and communicative repetition, potentially yielding indirect benefits observable in independent assessments.
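As an illustration of the agreement check reported above, the following sketch computes Cohen’s kappa with the irr package; the package choice and the ratings are assumptions for demonstration only, since the study reports just the resulting kappa of 0.88.

# Illustrative inter-rater agreement check with Cohen's kappa. The 'irr'
# package and the ratings below are hypothetical; the study reports
# kappa = 0.88 for its two blinded evaluators.
library(irr)

ratings <- data.frame(
  rater1 = c(3, 4, 5, 2, 4, 5, 3, 4),  # hypothetical per-sentence scores (1-5)
  rater2 = c(3, 4, 5, 3, 4, 5, 3, 4)
)
kappa2(ratings, weight = "unweighted")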
The two pre-service teachers from the experimental condition maintained reflective journals throughout the intervention and for one month afterward, responding to six prompts adapted from Fitzpatrick and Leavy (2025) (see Appendix B). These entries captured perceptions of GenAI integration and professional growth.

3.4. Quantitative Analysis

The entire analytical sequence was executed using R software version 4.5.1. To evaluate data distribution before inferential analyses, descriptive statistics were computed for each response variable (grammar, listening comprehension, and pronunciation), including mean, standard deviation (SD), skewness, and kurtosis. Normality of model residuals was assessed using Q-Q plots and Shapiro–Wilk tests for each linear mixed-effects model (LMM). Homogeneity of variances was evaluated using Levene’s test applied to both raw data and model residuals (to check LMM assumptions) across groups for each variable and time point. Linearity was examined through scatterplots of standardized residuals against predicted values from the LMMs. Multicollinearity was tested using variance inflation factors (VIFs) based on linear models fitted for each outcome variable, with a conventional cut-off of 10.
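A minimal sketch of these checks is shown below, assuming a long-format data frame dat with hypothetical columns id, group (a factor), time (a factor), and one column per outcome; the package choices (lme4, car, moments) are likewise assumptions, as the authors do not name the packages they used.

# Minimal sketch of the assumption checks described above, assuming a
# long-format data frame 'dat' with hypothetical columns id, group (factor),
# time (factor), and grammar (one outcome shown; the study checked all three).
library(lme4)      # lmer()
library(car)       # leveneTest(), vif()
library(moments)   # skewness(), kurtosis()

# Descriptive distribution checks per variable and time point
with(subset(dat, time == "pre"),
     c(mean = mean(grammar), sd = sd(grammar),
       skew = skewness(grammar), kurt = kurtosis(grammar)))

# Normality of model residuals: Shapiro-Wilk test and Q-Q plot
m <- lmer(grammar ~ time * group + (1 | id), data = dat)
shapiro.test(resid(m))
qqnorm(resid(m)); qqline(resid(m))

# Homogeneity of variances across groups at each time point
leveneTest(grammar ~ group, data = subset(dat, time == "pre"))

# Linearity: standardized residuals against predicted values
plot(fitted(m), scale(resid(m)))

# Multicollinearity via VIFs from a fixed-effects-only linear model
vif(lm(grammar ~ time + group, data = dat))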
To measure changes across the three measurement points (pre-test, post-test, and follow-up) and intergroup differences, LMMs were applied for each dependent variable. The use of LMMs was warranted due to the nested structure of the data (repeated measures within students) and their robustness to unbalanced designs and missing data points. Participants were treated as random effects with random intercepts, while time, group, and their interaction were included as fixed effects. Due to the small number of classes (n = 4), a class-level random effect was not included in the models. This decision was based on the widely cited recommendation that a minimum of 5 to 10 clusters is needed for random effects to provide reliable variance estimates (Oberpriller et al., 2022). For each parameter, estimates, standard errors (SE), t-values, p-values, marginal R2 (variance explained by fixed effects), and conditional R2 (variance explained by both fixed and random effects) were computed. Cohen’s d was computed as a measure of effect size magnitude.
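A sketch of this specification follows; the packages (lmerTest, performance, effectsize) and object names are assumptions, since the authors do not report their implementation details.

# Sketch of the LMM specification described above: random intercepts per
# participant; time, group, and their interaction as fixed effects.
library(lmerTest)     # lmer() with Satterthwaite-approximated p-values
library(performance)  # r2(): marginal and conditional R2
library(effectsize)   # cohens_d()

m_pron <- lmer(pronunciation ~ time * group + (1 | id), data = dat)
summary(m_pron)       # estimates, SEs, t-values, and p-values for fixed effects

r2(m_pron)            # marginal R2 (fixed) and conditional R2 (fixed + random)

# Effect size for the between-group difference at a given time point
cohens_d(pronunciation ~ group, data = subset(dat, time == "post"))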
Post hoc pairwise intragroup comparisons were conducted using estimated marginal means (EMMs), focusing solely on pre-test versus follow-up contrasts, alongside intergroup post hoc comparisons. The significance level for all statistical tests was set at 0.05. All results were compiled into an Excel workbook for reporting, and line plots were generated to illustrate trends in outcomes.
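The sketch below illustrates such contrasts with the emmeans package (an assumed choice), reusing the hypothetical model object from the previous sketch.

# Sketch of the post hoc comparisons described above, using 'emmeans'
# (an assumed package choice; 'm_pron' is the hypothetical LMM from above).
library(emmeans)

# Intragroup contrasts over time, from which the pre-test vs. follow-up
# comparison is the one reported in the study
emm_time <- emmeans(m_pron, ~ time | group)
contrast(emm_time, method = "revpairwise")

# Intergroup contrasts at each measurement point
emm_group <- emmeans(m_pron, ~ group | time)
pairs(emm_group)      # evaluated against the 0.05 significance level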

3.5. Qualitative Analysis

Thematic analysis of de-identified reflective journal entries was performed by two linguistics graduates experienced in text coding. The corpus comprised 28 journal entries (seven entries per teacher across the intervention and follow-up period), totaling approximately 10,000 words. An inductive approach was applied using the constant comparative method: initial open coding identified patterns, followed by axial coding to refine categories, and selective coding to identify overarching themes and sub-categories (Korseberg & Stalheim, 2025). The analytic process involved three iterative cycles: first-cycle coding of individual entries, second-cycle pattern identification across entries, and third-cycle theme refinement through coder dialogue. Analysts independently coded a subset of entries, met once to compare emerging categories and resolve discrepancies through discussion, achieving inter-coder reliability of Cohen’s kappa = 0.76. Final themes were derived from consensus, focusing on pre-service teachers’ perceptions of GenAI integration, challenges, and professional development outcomes. Exemplar excerpts were selected to illustrate each theme based on their representativeness and clarity in conveying the identified patterns.

4. Quantitative Results

Assumption checks revealed that skewness values ranged from −0.819 to 1.226, while kurtosis values ranged from −1.496 to 0.15 across all variables and time points. Shapiro–Wilk tests on the residuals yielded p-values of 0.001 for all three outcomes. Moreover, Q-Q plots corroborated the non-normality of model residuals. However, these findings are unlikely to compromise the subsequent inferential analyses, as LMM estimates are generally robust to departures from distributional assumptions (Schielzeth et al., 2020). Visual inspection of scatterplots identified a random spread of points around the zero horizontal line, confirming that the linearity assumption was met. Levene’s tests indicated homogeneity of variances across groups for all variables and time points (p > 0.05). Multicollinearity was not a concern, as all VIF values were below the conventional threshold of 10, ranging from 3.0 to 8.3.
The modest conditional R2 values (ranging from 9.4% to 20.7%) observed across all three models in Table 3 indicate that the models explain only a small proportion of variance in the data. This finding reflects the complex, multifaceted nature of language learning in authentic classroom settings, where numerous unmeasured factors, including individual differences in aptitude, home language exposure, and prior learning experiences, contribute to performance variability. The low R2 values are also possibly due to the relatively small number of clusters (only four classes). Alternatively, this finding may indicate that the experimental manipulations (instructional approach and temporal progression) captured the primary sources of systematic variation in student performance, with limited additional unexplained individual heterogeneity. Scandola and Tidoni (2024) recommend that if the conditional R2 is <0.6, researchers should perform pairwise comparisons (e.g., pairwise regressions) on aggregated data, not relying exclusively on estimated-marginal-means techniques. Therefore, a between-group pairwise regression was computed for each variable of interest.
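A minimal sketch of such a regression on participant-level aggregates is given below; object names are hypothetical.

# Minimal sketch of a between-group pairwise regression on aggregated data,
# per Scandola and Tidoni's (2024) recommendation for conditional R2 < 0.6.
# Object names are hypothetical; one outcome shown.
agg <- aggregate(pronunciation ~ id + group, data = dat, FUN = mean)

summary(lm(pronunciation ~ group, data = agg))  # between-group comparison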
Descriptive statistics indicated that for grammar skills, the ChatGPT group had mean scores of 9.22 (SD = 3.41) at baseline, which increased to 11.98 (SD = 3.77) at post-test and further to 12.53 (SD = 4.11) at one-month follow-up. In comparison, the non-digital group exhibited mean scores of 9.92 (SD = 3.40) at pre-test, 11.00 (SD = 3.67) at post-intervention evaluation, and 11.30 (SD = 3.85) at the delayed assessment. Within-group pairwise comparisons showed that both cohorts made a statistically significant improvement from pre-test to follow-up (p = 0.001). The LMM analysis (Table 3) revealed that both group × time interactions were non-significant: pre-test to post-test (p = 0.995) and baseline to follow-up (p = 0.58). This indicates that the improvement trajectories in grammar skills were statistically equivalent between the experimental condition students and those who received conventionally gamified instruction. As illustrated in Figure 1, both groups improved over time, with the AI-supported learners demonstrating a steeper increase, particularly from baseline to post-test. Nonetheless, the overall between-group comparison was non-significant (B = 0.509, SE = 0.397, t = 1.284, p = 0.202, d = 0.235), indicating no substantial difference in grammar performance across the entire study period.
Within-group post hoc comparisons of listening comprehension scores corroborated that both groups gained significantly (p = 0.001). The LMM analysis found a statistically significant group × time (pre-test to post-test) interaction (p = 0.015), with a positive estimate (1.525) indicating greater improvement in the ChatGPT group from baseline to the conclusion of the game-based sessions. However, the group × time (pre-test to follow-up) interaction was non-significant (p = 0.436). In line with this result, Figure 2 shows that both groups progressed at similar rates across the full study period. The GenAI-assisted students had mean scores of 7.98 (SD = 2.36) at the entry measurement, which rose to 9.0 (SD = 2.86) at post-intervention and further to 9.52 (SD = 2.13) at the retention assessment. The comparison group had mean scores of 7.03 (SD = 2.28) at pre-test, 8.02 (SD = 2.41) at post-assessment, and 8.44 (SD = 2.24) at the retention measurement. At the same time, the overall between-group comparison was significant (B = 1.008, SE = 0.245, t = 4.116, p = 0.001, d = 0.755), suggesting that the chatbot-assisted participants achieved superior performance compared to the conventional group when considering aggregate performance across the study period.
Regarding pronunciation skills, within-group pairwise comparisons revealed significant gains over the study span in both cohorts (p = 0.001). Figure 3 shows that chatbot users made greater initial gains from pre-test to post-test and also surpassed the conventional group at the retention assessment. Specifically, the GenAI group averaged 19.28 (SD = 3.43) at pre-evaluation, which shifted to 22.97 (SD = 3.21) at post-test and 23.47 (SD = 3.94) at the delayed measurement. The comparison group exhibited mean scores of 18.98 (SD = 3.18) at baseline, 20.46 (SD = 3.19) at post-measurement, and 21.02 (SD = 2.83) at follow-up.
The LMM analysis detected a significant group × time (pre-test to post-test) interaction (p = 0.01), with a positive estimate (2.241) indicating greater improvement in the experimental group during the gamification period. However, the group × time (pre-test to follow-up) interaction was non-significant (p = 0.257). The between-group post hoc analysis qualified this finding, showing no significant difference at baseline (B = 0.292, SE = 0.606, t = 0.482, p = 0.63, d = 0.088) but highly significant differences in favor of the ChatGPT group at both post-test (B = 2.507, SE = 0.587, t = 4.273, p = 0.001, d = 0.784) and follow-up (B = 2.449, SE = 0.627, t = 3.906, p = 0.001). The overall between-group comparison was also highly significant (B = 1.749, SE = 0.343, t = 5.094, p = 0.001, d = 0.934), supporting the advantage of GenAI-powered instruction for pronunciation skill development.

5. Qualitative Results

5.1. Perceptions and Professional Development (RQ4)

5.1.1. Initial Motivations and Expectations

Pre-service teachers initially emphasized practical motivations for incorporating AI gamification. Ann’s early entries focused on engagement: “I want to use AI games because traditional activities sometimes feel repetitive for the kids. GenAI can create different scenarios each time, so the spy training game will not become boring.” Helen approached AI integration from a differentiation perspective: “My main idea is using AI to give each child individual attention during speaking practice, something I struggle to provide with 30 learners.”
Their pre-intervention reflections revealed contrasting concerns. Ann worried primarily about technical failures: “What if the app crashes mid-lesson? I do not have backup plans for every possible glitch.” Helen expressed deeper pedagogical anxiety: “I am concerned about losing control over the learning content. With traditional games, I know exactly what language will emerge. With AI, anything could happen.”

5.1.2. Learning About GenAI as Learners

Pre-service teachers discovered unexpected aspects of AI interaction through their own preparation. Ann reflected on her experience piloting the prompts: “When I tested the spy game myself, ChatGPT’s responses varied based on what I said, though sometimes it misunderstood or gave generic feedback.” This helped her realize that AI interactions were less predictable than she had expected.
Helen’s learning focused on the unpredictability she initially feared: “I learned that GenAI’s variability is not a bug, it is a feature. The same prompt generated completely different family scenarios for me each time I tested it. This could actually help prevent children from memorizing responses.”

5.1.3. Professional Growth and Skill Development

By mid-intervention, both pre-service teachers identified specific changes in their teaching approaches. Ann noted a shift in her monitoring strategies: “I have learned to circulate differently. Instead of checking if students are following instructions, I am listening for whether they are actually communicating. This has made me more attentive to individual progress.” She linked this change to a specific moment: “During lesson three, I realized I was still thinking like a traditional teacher, trying to control every interaction. But AI interactions taught me that learning can happen without my constant mediation.”
Helen’s growth centered on embracing uncertainty: “My biggest change is accepting that I do not need to predict every learning outcome. When children started asking ChatGPT follow-up questions about their hobbies, I initially wanted to redirect them. Then I realized they were using English spontaneously.” She identified lesson four as pivotal: “A pupil’s conversation with AI went completely off-script, but she was practicing past tense naturally. That is when I understood that sometimes the best teaching happens when we step back.”

5.1.4. Evolving Understanding of AI Role

Their final reflections revealed developing pedagogical insights regarding AI’s educational function. Ann articulated three key learnings: “First, AI does not replace teacher judgment. It creates space for it. Second, children respond differently to AI feedback than human feedback. Third, individual practice time is more valuable than I realized.” Her remaining questions focused on scalability: “How can we integrate this approach beyond short interventions? What other subjects could benefit from AI gamification?”
Helen’s concluding insights emphasized pedagogical transformation: “I learned that AI can be a patient practice partner, that students take different risks with technology than with humans, and that my role shifted from information deliverer to learning facilitator.” Her persistent question addressed assessment: “How do we evaluate learning that happens in private AI conversations that we cannot fully monitor?”

5.2. Barriers and Facilitators in Implementation (RQ5)

5.2.1. Technical and Logistical Challenges

Connectivity issues were identified as the primary barrier. Ann’s entries described practical solutions: “Portable hotspots helped, but when internet stuttered during lesson two, I learned to have children work in pairs temporarily.” Helen focused on device management: “Some students struggled with earphone adjustment, others spoke too softly for voice recognition. I started the AI segment with a ‘tech check’ routine.”
The pre-determined prompts generated mixed responses. Ann appreciated their structure: “Having tested prompts removed my anxiety about pupils receiving inappropriate content.” Helen found them occasionally limiting: “By lesson five, some children wanted to extend conversations beyond the prompts. I am still uncertain whether allowing this supports or distracts from learning objectives.”

5.2.2. Pedagogical Facilitators and Obstacles

AI’s consistent patience facilitated learning differently than human interaction. Ann observed: “ChatGPT never gets frustrated when children repeat the same mistake. This allowed more natural error correction.” Helen appreciated the reduced social pressure: “Students attempted pronunciation they would not try in front of classmates. The private nature of earphone conversations created a safe practice space.”
Language scaffolding emerged as both facilitator and concern. Ann valued the bilingual capability: “Russian-language instructions helped students understand game contexts while focusing cognitive energy on English production.” However, Helen worried about dependency: “I noticed some children immediately asking for Russian translations instead of attempting to understand English first. This tension between support and challenge remains unresolved.”

5.2.3. Classroom Dynamics and Management

AI interactions altered the assessment strategies of both pre-service teachers. Ann found it difficult to evaluate learning that occurred privately: “I could not hear everything children said to ChatGPT, so I had to rely on their willingness to share interesting exchanges afterward.” Helen struggled with timing: “Traditional games have natural endpoints, but AI conversations could potentially continue indefinitely. I had to learn when to signal wrap-up without cutting off productive practice.”
Engagement patterns were identified as key facilitators by both pre-service teachers. Ann noted sustained attention: “Unlike traditional games where enthusiasm peaks early, AI interactions maintained consistent child interest throughout the intervention.” Helen emphasized individual agency: “Students seemed more invested in conversations they initiated with AI than in teacher-directed activities.”

5.2.4. Future Implementation

Both pre-service teachers closed their journals with recognizably “unfinished” thinking—less a verdict, more a set of working hypotheses. For Ann, the focus remained on balance and feasibility. She framed GenAI gamification as a worthwhile but resource-sensitive tool: “I would use AI games again, but selectively, maybe once a month as a special event.” Her choice of “special” hinted not at gimmickry but at preserving novelty so that the practice retained its pull without overburdening lesson planning. She was candid about the hidden labor: aligning AI output with curricular aims, testing prompts, and managing devices all required time that competed with other professional priorities. In her mind, frequency would have to be low enough to keep preparation tolerable yet high enough that pupils did not need to re-learn the routines from scratch every time.
Helen’s closing reflections turned that caution up several notches: “Before I expand this, I need more insight into what really happens in those conversations.” She was acutely aware that what made the intervention engaging for children (its semi-private, unpredictable exchanges) was also what made it hard to document and assess in pedagogically meaningful ways. Helen saw this less as an insurmountable problem and more as a warning against rushing into large-scale adoption without better observation and evaluation tools.
Their differences here were sharpened by their respective priorities: Ann’s lens was operational and pragmatic, measuring AI’s place amid the many moving cogs of classroom life; Helen’s was diagnostic, treating AI activities as experiments whose side-effects had yet to be charted. If Ann was inclined to place AI gamification on the timetable in limited bursts, Helen was still conducting her internal cost–benefit analysis, where some cells in the ledger remained stubbornly blank. Beneath these divergences lay a shared recognition that the intervention had opened a door they could not entirely close. Both acknowledged that AI had altered child behavior in ways that traditional tasks rarely did. Yet the same characteristics that generated these benefits also introduced complexities: harder-to-monitor learning, potential overreliance on scaffolds, and performance patterns that did not fit neatly into lesson objectives. The teachers’ journals thus ended not with slogans or self-congratulation, but with a kind of professionally sober curiosity: an acceptance that the technology was neither a panacea nor a passing fad, and that working out its optimal role would require both patience and deliberate experimentation.

6. Discussion

This study was designed to ascertain the effectiveness of a GenAI-powered gamified intervention on specific English language competencies among primary school pupils, while simultaneously charting the professional journey of pre-service teachers tasked with its deployment. The investigation sought to provide a granular view of how this emerging technology performs against conventional methods and how novice educators adapt to its pedagogical demands. The findings illuminate a complex interplay of technological affordances, instructional design, and teacher development, offering a nuanced perspective on the integration of generative tools into foundational language education.
The results provide distinct answers to the guiding research questions. For grammar skills (RQ1), the intervention yielded no statistically significant advantage for the GenAI group over the non-digital instruction cohort, although both groups demonstrated improvement over time. This null finding may reflect the fundamental nature of grammar learning, which typically requires explicit instruction (Roehr-Brackin, 2024), structured assistance (Chia & Xavier, 2025), and metalinguistic awareness (i.e., regarding language as an object for reflection, moving beyond its instrumental use) (Delgado-Garza & Mayo, 2025) that may not be optimally supported through conversational AI interactions. The game-based activities in this study emphasized communicative practice and implicit learning through interaction, approaches that may be more conducive to developing fluency and comprehension (Fathi et al., 2025; Goh & Aryadoust, 2025) than grammatical accuracy. Additionally, the three-week intervention period may have been insufficient for measurable grammatical restructuring, as grammar acquisition often requires extended exposure and practice cycles (Chung & Révész, 2021).
In contrast, for listening comprehension (RQ2) and pronunciation (RQ3), the AI-guided activities conferred a substantial and statistically significant benefit, with the experimental group consistently outperforming the comparison group. Regarding pre-service teacher experiences (RQ4), the exploratory qualitative findings revealed a clear professional evolution, as instructors transitioned from initial technical anxieties to a sophisticated, facilitator-centric pedagogical stance. Finally, the major barriers to implementation (RQ5) were identified as logistical hurdles like connectivity and device management, while key facilitators included the virtual assistant’s non-judgmental patience and the engaging, low-pressure environment it fostered for pupils.
Situating the quantitative outcomes within the existing literature reveals both parallels and divergences. The pronounced improvement in pronunciation skills aligns with the findings of Tai and Chen (2024), who reported that individual interactions with a GenAI chatbot significantly bolstered the speaking skills of elementary EFL learners compared to conventional instruction. Although their focus was on broader speaking proficiency, the current study corroborates the principle that individualized, AI-mediated oral practice can yield superior results. However, a direct comparison of the grammar and listening comprehension findings with prior research is challenging, as the field is in its nascent stages. The application of chatbot-led games to target these specific skills in primary EFL populations represents a largely uncharted research frontier, leaving a dearth of directly comparable studies. It is noteworthy, though, that a comprehensive meta-analysis of L2 listening instruction conducted before the GenAI boom found only a small effect size for conventional teaching methods (Chang, 2024), which makes the gains observed in the treatment group in this study all the more remarkable.
The qualitative insights into pre-service teacher development resonate strongly with recent explorations of generative conversational assistants in teacher training. The journey from viewing GenAI as a simple tool to leveraging it for creative and critical pedagogical purposes mirrors the trajectory described by Huang et al. (2025), whose participants similarly evolved from basic use to nuanced engagement, upgrading their prompt refinement and critical evaluation. Likewise, the themes of personal and professional growth, overcoming initial skepticism, and reflecting on future applications, as identified by Wood and Moss (2024), were clearly evident in the reflective journals of the pre-service teachers in this study. The present findings extend this work by embedding the teacher development process within a live, high-stakes practicum intervention, demonstrating how theoretical training translates into dynamic, real-world pedagogical adaptation.
Several mechanisms may explain these multifaceted findings. The significant gains in listening and pronunciation likely stem from the core architecture of the intervention: individualized, real-time, voice-based interaction. The generative platform provided a tireless and non-judgmental conversational partner, offering pupils repeated exposure to modeled English and compelling them to produce comprehensible speech in a low-stakes environment, thereby reducing the affective filter that often inhibits oral practice in a group setting. The lack of a significant intergroup difference in terms of grammar may suggest that the three-week duration was insufficient to effect measurable change in this more complex linguistic domain. For the student-teachers, the crucible of the practicum—where they were forced to troubleshoot technical glitches and adapt pedagogical strategies on the fly—likely accelerated their professional growth far more effectively than a purely theoretical training module could have. This hands-on, problem-solving context reportedly transformed them from passive recipients of knowledge into active constructors of their own pedagogical practice with technology.
The contributions of this study are twofold. For research, it offers one of the first mixed-methods empirical investigations into the use of chatbot-powered games in a primary EFL classroom, providing crucial baseline data on its effects on grammar, listening, and pronunciation. It also enriches the literature on teacher education by documenting the authentic, in situ professional development of pre-service teachers grappling with cutting-edge technology. For practice, the findings provide a tangible model for embedding GenAI into language curricula, highlighting the importance of structured prompts, robust technical support, and the cultivation of a facilitative teaching role. The study underscores that effective GenAI implementation is not merely a matter of deploying a tool, but of thoughtfully designing an entire pedagogical ecosystem around it.

6.1. Limitations

While this study offers valuable insights, its design incorporates several nuances that warrant careful consideration. The quasi-experimental nature, with non-random assignment of classes to conditions, introduces the possibility of pre-existing group differences that were not captured by the baseline measures, despite efforts to ensure comparable academic performance.
It should be acknowledged that both groups were slightly underpowered relative to the calculated requirement (n = 64), which may have reduced the ability to detect smaller effects and could explain the non-significant between-group difference in grammar scores. However, this represents a minor deviation that is unlikely to fundamentally compromise the study’s ability to detect educationally meaningful effects. Reassuringly, recent methodological work cautions against equating slightly underpowered with scientifically uninformative: First, credible evidence can still emerge from modestly powered studies. Lengersdorff and Lamm (2025) show that once a result is statistically significant, low power by itself is not a sufficient reason to downgrade its credibility; under both frequentist and Bayesian perspectives, such findings still contribute meaningful support for the tested hypothesis. Second, replicability is driven more by effect size than by sample size. In a database of 307 original–replication pairs, X. Li et al. (2024) found that the original sample size was essentially unrelated to replication success, whereas the magnitude of the effect was a robust positive predictor. Third, Nakagawa et al. (2024) advocate the idea that rigorous research design contributes more to the credibility and generalizability of results than merely achieving a nominal 80% statistical power target.
Furthermore, the design may be susceptible to the Hawthorne effect (Gunnarsson et al., 2025): the heightened engagement and performance of pupils in the experimental group could be partially attributable to their awareness of participating in a research study and receiving increased attention from instructors, rather than to the pedagogical merits of the intervention itself. Additionally, the grammar assessment method (translation tasks) may not have optimally aligned with the communicative, game-based pedagogical approach employed in the intervention, potentially obscuring genuine improvements in grammatical competence expressed through oral communication. By contrast, an intervention among university students (Hung et al., 2025) demonstrated that a digital game-based learning approach produced considerable improvements in both simple and difficult translation techniques. The discrepancy between our findings and the literature may therefore reflect methodological considerations rather than true intervention ineffectiveness. Among language-specific skills, grammar is seldom addressed in game-based language learning, even though it is one of the most challenging areas for EFL students (Taye & Mengesha, 2024).
Another layer of complexity is the potential for a teacher effect; with only two pre-service teachers per condition, individual teaching styles and their proficiency in managing their respective instructional modes could have influenced the outcomes independently of the intervention itself. Moreover, the study’s implementation in a single school within a specific cultural and educational context limits the generalizability of findings to other primary EFL settings. Finally, the reliance on a single GenAI platform (ChatGPT) means the findings are contingent on its specific functionalities and limitations, and may not be directly applicable to other GenAI tools with different interaction designs or capabilities.

6.2. Suggestions for Practice and Further Research

Drawing from the findings and limitations, several avenues for practice and future inquiry emerge. For educators and teacher training programs, it is crucial to move beyond basic technical instruction and focus on developing pre-service teachers’ pedagogical design capacity with GenAI. This includes training in prompt engineering, strategies for scaffolding student interactions, and methods for assessing learning that occurs in private, AI-mediated conversations. Creating repositories of curriculum-aligned, pre-piloted prompts could also lower the barrier to entry for novice teachers; a hypothetical entry schema is sketched below. The reflective journaling component proved to be a powerful tool for professional growth and should be integrated into practicum experiences involving technology adoption.
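A repository entry need not be elaborate. The following sketch shows one possible record structure in Python; every field name is a hypothetical illustration of what “curriculum-aligned, pre-piloted” could mean operationally, not a format used in the study (the sample prompt echoes the Lesson 2 activity in Table 2).

```python
# Hypothetical schema for one entry in a shared prompt repository.
# All field names are illustrative; none come from the study itself.
PROMPT_ENTRY = {
    "lesson_topic": "Family Members",        # aligns with a curriculum unit
    "target_skills": ["listening", "pronunciation"],
    "age_range": "8-9",
    "prompt_l1": "...",                      # prompt in pupils' first language, read aloud
    "prompt_en": (
        "Let's play a family adventure: describe a family scene in English, "
        "ask me to respond, and correct my verb forms."
    ),
    "piloted": True,                         # trialled by a mentor before classroom use
    "known_issues": ["model may speak too quickly for beginners"],
}
```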
Future research should aim to build upon this study’s foundation by addressing its limitations. Longitudinal studies spanning a full academic year are needed to assess the durability of learning gains and to track whether the initial novelty effect subsides over time. Researchers should also explore the differential impacts of various generative platforms and interaction modalities. Drawing inspiration from the work of Pallant et al. (2025), a particularly fruitful line of inquiry would be to design studies that compare different pedagogical approaches to GenAI use. For instance, a future quasi-experiment could contrast a procedural approach, where students follow pre-set game prompts (as in the present study), with a mastery-oriented approach, where students are guided to use conversational agents to construct or augment their own knowledge, such as co-creating interactive stories or designing their own language games. Such a study could illuminate whether fostering greater student agency in GenAI interactions leads to deeper and more transferable learning outcomes.

7. Conclusions

This investigation provides evidence that GenAI-led games can serve as a viable tool for enhancing the listening and pronunciation skills of primary EFL learners, significantly outperforming traditional non-digital games in these domains. However, the exploratory nature of this study and its methodological constraints necessitate cautious interpretation of these findings. Concurrently, the study documents the professional maturation of pre-service teachers who, when immersed in the practical challenges of enactment, evolve from cautious novices into reflective, adaptive facilitators of technology-enhanced learning. This research is among the first to empirically dissect the impact of a GenAI-gamified intervention within a primary school practicum setting through a mixed-methods lens, offering a granular account of both pupil outcomes and teacher development. The findings suggest that generative technology, when carefully integrated, can complement teachers’ work by creating personalized, engaging, and scalable opportunities for language practice.
Despite the inherent limitations of this exploratory work, its findings offer a foundation for continued exploration of the intersections between GenAI, game-based learning, and early language education. While transformative change is gradual, the tools shaping its trajectory are already in the hands of teachers and students. Understanding their optimal use is no longer a futuristic abstraction but a present-day imperative. The insights gleaned here illuminate a path forward, one that encourages thoughtful experimentation, robust teacher training, and a steadfast focus on creating effective learning environments for the next generation of global communicators.

Author Contributions

Conceptualization, A.R.; methodology: A.R.; project administration, K.Y.; formal analysis, M.H.; visualization, M.H.; validation, G.M. and Y.K.; writing—original draft, A.R., K.Y., and G.M.; writing—review & editing, M.H. and Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of Zhetysu University on 22 January 2025 (Ref. No. 1871).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Sample ChatGPT Activity

  • Spy Training Game
In this activity, ChatGPT acts as a spy trainer, providing English phrases for the pupil to repeat and offering corrective feedback. The activity targets pronunciation indirectly through repeated, communicative exchanges and is suitable for average-level 8–9-year-olds.
Sample prompt in Russian (read aloud by the pupil):
Я третьеклассник, изучаю английский; давай сыграем в шпионскую тренировку: говори мне фразу на английском, проси меня повторить её и поправляй меня, если говорю неправильно; потом давай другую фразу; повторяй цикл.
English translation:
I am a third-grader studying English; let’s play spy training: tell me a phrase in English, ask me to repeat it and correct me if I say it wrong; then give another phrase; repeat the cycle.
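In the study itself, pupils interacted with ChatGPT’s voice mode directly. For readers who wish to experiment with a scripted, text-only version of the same game loop, the sketch below uses the OpenAI chat completions API; the model name, system prompt, and helper function are illustrative assumptions rather than the study’s configuration.

```python
# Hypothetical text-only recreation of the Spy Training loop via the OpenAI API;
# the study used the ChatGPT app's voice mode, not this script.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = (
    "You are a friendly spy trainer for an 8-9-year-old EFL pupil. "
    "Say one short English phrase at a time, ask the pupil to repeat it, "
    "gently correct mistakes, then give the next phrase."
)

history = [{"role": "system", "content": SYSTEM_PROMPT}]

def trainer_turn(pupil_utterance: str) -> str:
    """Send the pupil's (transcribed) utterance; return the trainer's reply."""
    history.append({"role": "user", "content": pupil_utterance})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat-capable model works
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(trainer_turn("Let's start the spy training!"))
```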

Appendix B. Reflective Prompts

  • What are the main AI-based gamification ideas I want to incorporate in my English lesson? Why?
  • Open reflection on the AI-based gamification task carried out in class. What did I learn about generative AI as a learner?
  • Open reflection on the upcoming AI-based gamification task (key considerations, hopes, and concerns).
  • Identify key changes in your knowledge, understanding, skills or dispositions. Describe the changes and identify experiences or key points in the intervention that influenced your teaching.
  • Three things I learned, two things I want to know more about, one question I still have.
  • Open post-teaching reflection on educational AI experience.

References

  1. Ai, Y., Hu, Y., & Zhao, W. (2025). Interactive learning: Harnessing technology to alleviate foreign language anxiety in foreign literature education. Interactive Learning Environments, 33(3), 2440–2459.
  2. Almelhes, S. A. (2024). Gamification for teaching the Arabic language to non-native speakers: A systematic literature review. Frontiers in Education, 9, 1371955.
  3. Alpysbayeva, N., Zholtaeva, G., Tazhinova, G., Syrlybayeva, G., & Assylova, R. (2025). Fostering pre-service primary teachers’ capacity to employ an interactive learning tool. Qubahan Academic Journal, 5(1), 662–673.
  4. Al-Rousan, A. H., Ayasrah, M. N., Khasawneh, M. A. S., Obeidat, L. M., & Obeidat, S. S. (2025). AI-enhanced gamification in education: Developing and validating a scale for measuring engagement and motivation among secondary school students: Insights from the network analysis perspective. European Journal of Education, 60(3), e70153.
  5. Alturaiki, S. M., Gaballah, M. K., & El Arab, R. A. (2025). Enhancing nursing students’ engagement and critical thinking in anatomy and physiology through gamified teaching: A non-equivalent quasi-experimental study. Nursing Reports, 15(9), 333.
  6. Arkoumanis, G., Sofos, A., Ventista, O. M., Ventistas, G., & Tsani, P. (2025). The impact of artificial intelligence on elementary school students’ learning: A meta-analysis. Computers in the Schools. Advance online publication.
  7. Babu, S. S., & Moorthy, A. D. (2024). Application of artificial intelligence in adaptation of gamification in education: A literature review. Computer Applications in Engineering Education, 32(1), e22683.
  8. Belkina, M., Daniel, S., Nikolic, S., Haque, R., Lyden, S., Neal, P., Grundy, S., & Hassan, G. M. (2025). Implementing generative AI (GenAI) in higher education: A systematic review of case studies. Computers and Education Artificial Intelligence, 8, 100407.
  9. Busse, V., Hennies, C., Kreutz, G., & Roden, I. (2021). Learning grammar through singing? An intervention with EFL primary school learners. Learning and Instruction, 71, 101372.
  10. Cambridge Assessment English. (2018). Pre A1 Starters, A1 Movers and A2 Flyers: Sample papers for young learners. For exams from 2018 (Vol. 2). Cambridge Assessment English.
  11. Chan, S., & Lo, N. (2024). Enhancing EFL/ESL instruction through gamification: A comprehensive review of empirical evidence. Frontiers in Education, 9, 1395155.
  12. Chang, A. C. S. (2024). The effect of listening instruction on the development of L2 learners’ listening competence: A meta-analysis. International Journal of Listening, 38(2), 131–149.
  13. Chapelle, C. A. (2025). Generative AI as game changer: Implications for language education. System, 132, 103672.
  14. Chen, H. H. J., Yang, C. T. Y., & Lai, K. K. W. (2023). Investigating college EFL learners’ perceptions toward the use of Google Assistant for foreign language learning. Interactive Learning Environments, 31(3), 1335–1350.
  15. Chen, J., Huang, K., Lai, C., & Jin, T. (2025a). The impact of GenAI-based collaborative inquiry on critical thinking in argumentation: A case study of blended argumentative writing pedagogy. TESOL Quarterly. Advance online publication.
  16. Chen, Y., Ke, N., Huang, L., & Luo, R. (2025b). The role of GenAI in EFL speaking: Effects on oral proficiency, anxiety and risk-taking. RELC Journal. Advance online publication.
  17. Cheng, J., Lu, C., & Xiao, Q. (2025). Effects of gamification on EFL learning: A quasi-experimental study of reading proficiency and language enjoyment among Chinese undergraduates. Frontiers in Psychology, 16, 1448916.
  18. Chia, A., & Xavier, C. A. (2025). Grammar as a meaning-making resource: Fostering meaningful, situated literacy development. TESOL Journal, 16, e70015.
  19. Chiu, T. K. F. (2024). The impact of Generative AI (GenAI) on practices, policies and research direction in education: A case of ChatGPT and Midjourney. Interactive Learning Environments, 32(10), 6187–6203.
  20. Chung, Y., & Révész, A. (2021). Investigating the effect of textual enhancement in post-reading tasks on grammatical development by child language learners. Language Teaching Research, 28(2), 632–653.
  21. Cong-Lem, N., Soyoof, A., & Tsering, D. (2025). A systematic review of the limitations and associated opportunities of ChatGPT. International Journal of Human–Computer Interaction, 41(7), 3851–3866.
  22. Dahri, N. A., Yahaya, N. B., Al-rahmi, W. M., Almuqren, L., Almgren, A. S., Alshimai, A., & Al-Adwan, A. S. (2025). The effect of AI gamification on students’ engagement and academic achievement in Malaysia: SEM analysis perspectives. IEEE Access, 13, 70791–70810.
  23. Dehghanzadeh, H., Fardanesh, H., Hatami, J., Talaee, E., & Noroozi, O. (2021). Using gamification to support learning English as a second language: A systematic review. Computer Assisted Language Learning, 34(7), 934–957.
  24. Delgado-Garza, P., & Mayo, M. P. G. (2025). Can we train young EFL learners to ‘notice the gap’? Exploring the relationship between metalinguistic awareness, grammar learning and the use of metalinguistic explanations in a dictogloss task. International Review of Applied Linguistics in Language Teaching, 63(3), 1573–1597.
  25. Ding, D., & Yusof, A. M. B. (2025). Investigating the role of AI-powered conversation bots in enhancing L2 speaking skills and reducing speaking anxiety: A mixed methods study. Humanities and Social Sciences Communications, 12, 1223.
  26. Elmaadaway, M. A. N., El-Naggar, M. E., & Abouhashesh, M. R. I. (2025). Improving primary school students’ oral reading fluency through voice chatbot-based AI. Journal of Computer Assisted Learning, 41(2), e70019.
  27. Evmenova, A. S., Regan, K., Mergen, R., & Hrisseh, R. (2025). Educational games and the potential of AI to transform writing across the curriculum. Education Sciences, 15(5), 567.
  28. Far, F. F., & Taghizadeh, M. (2024). Comparing the effects of digital and non-digital gamification on EFL learners’ collocation knowledge, perceptions, and sense of flow. Computer Assisted Language Learning, 37(7), 2083–2115.
  29. Fathi, J., Rahimi, M., & Teo, T. (2025). Applying intelligent personal assistants to develop fluency and comprehensibility, and reduce accentedness in EFL learners: An empirical study of Google Assistant. Language Teaching Research. Advance online publication.
  30. Fitzpatrick, M., & Leavy, A. (2025). Reciprocal interplays in becoming STEM learners and teachers: Preservice teachers’ evolving understandings of integrated STEM education. International Journal of Mathematical Education in Science and Technology. Advance online publication.
  31. Gamlem, S. M., McGrane, J., Brandmo, C., Moltudal, S., Sun, S. Z., & Hopfenbeck, T. N. (2025). Exploring pre-service teachers’ attitudes and experiences with generative AI: A mixed methods study in Norwegian teacher education. Educational Psychology. Advance online publication.
  32. Gao, Y., & Pan, L. (2023). Learning English vocabulary through playing games: The gamification design of vocabulary learning applications and learner evaluations. Language Learning Journal, 51(4), 451–471.
  33. Gao, Y., Zhang, J., He, Z., & Zhou, Z. (2025). Feasibility and usability of an artificial intelligence-powered gamification intervention for enhancing physical activity among college students: Quasi-experimental study. JMIR Serious Games, 13, e65498.
  34. Gaurina, M., Alajbeg, A., & Weber, I. (2025). The power of play: Investigating the effects of gamification on motivation and engagement in physics classroom. Education Sciences, 15(1), 104.
  35. Giannakos, M., Azevedo, R., Brusilovsky, P., Cukurova, M., Dimitriadis, Y., Hernandez-Leo, D., Järvelä, S., Mavrikis, M., & Rienties, B. (2025). The promise and challenges of generative AI in education. Behaviour and Information Technology, 44(11), 2518–2544.
  36. Goh, C. C. M., & Aryadoust, V. (2025). Developing and assessing second language listening and speaking: Does AI make it better? Annual Review of Applied Linguistics, 45, 179–199.
  37. Gu, J., & Yan, Z. (2025). Effects of GenAI interventions on student academic performance: A meta-analysis. Journal of Educational Computing Research, 63(6), 1460–1492.
  38. Guan, L., Zhang, E. Y., & Gu, M. M. (2025). Examining generative AI–mediated informal digital learning of English practices with social cognitive theory: A mixed-methods study. ReCALL, 37(3), 315–331.
  39. Gunnarsson, K. U., Collier, E. S., & Bendtsen, M. (2025). Research participation effects and where to find them: A systematic review of studies on alcohol. Journal of Clinical Epidemiology, 179, 111668.
  40. Honig, C. D., Desu, A., & Franklin, J. (2024). GenAI in the classroom: Customized GPT roleplay for process safety education. Education for Chemical Engineers, 49, 55–66.
  41. Hori, R., Fujii, M., Toguchi, T., Wong, S., & Endo, M. (2025). Impact of an EFL digital application on learning, satisfaction, and persistence in elementary school children. Early Childhood Education Journal, 53(5), 1851–1862.
  42. Hsieh, W. M., Yeh, H. C., & Chen, N. S. (2023). Impact of a robot and tangible object (R&T) integrated learning system on elementary EFL learners’ English pronunciation and willingness to communicate. Computer Assisted Language Learning, 38(4), 773–798.
  43. Hsu, H. L., Chen, H. H. J., & Todd, A. G. (2023). Investigating the impact of the Amazon Alexa on the development of L2 listening and speaking skills. Interactive Learning Environments, 31(9), 5732–5745.
  44. Hsu, T., & Hsu, T. (2025). Teaching AI with games: The impact of generative AI drawing on computational thinking skills. Education and Information Technologies. Advance online publication.
  45. Huang, T., Wu, C., & Wu, M. (2025). Developing pre-service language teachers’ GenAI literacy: An interventional study in an English language teacher education course. Discover Artificial Intelligence, 5, 163.
  46. Hung, H. T., Yang, J. C., & Chung, C. J. (2025). Effects of performance goal orientations on English translation techniques in digital game-based learning. International Journal of Human–Computer Interaction, 41(9), 5575–5590.
  47. Ishaq, K., Zin, N. A. M., Rosdi, F., Jehanghir, M., Ishaq, S., & Abid, A. (2021). Mobile-assisted and gamification-based language learning: A systematic literature review. PeerJ Computer Science, 7, e496.
  48. Jiang, X., Wang, R., Hoang, T., Ranaweera, C., Dong, C., & Myers, T. (2025). AI-powered gamified scaffolding: Transforming learning in virtual learning environment. Electronics, 14(13), 2732.
  49. Kalota, F. (2024). A primer on generative artificial intelligence. Education Sciences, 14(2), 172.
  50. Kessler, M., Valle, J. M. R., Çekmegeli, K., & Farrell, S. (2025). Generative AI for learning languages other than English: L2 writers’ current uses and perceptions of ethics. Foreign Language Annals, 58, 508–531.
  51. Koç, F. Ş., & Savaş, P. (2025). The use of artificially intelligent chatbots in English language learning: A systematic meta-synthesis study of articles published between 2010 and 2024. ReCALL, 37(1), 4–21.
  52. Kohnke, L., Zou, D., & Su, F. (2025). Exploring the potential of GenAI for personalised English teaching: Learners’ experiences and perceptions. Computers and Education Artificial Intelligence, 8, 100371.
  53. Korseberg, L., & Stalheim, O. R. (2025). The role of digital technology in facilitating epistemic fluency in professional education. Professional Development in Education. Advance online publication.
  54. Law, L. (2024). Application of generative artificial intelligence (GenAI) in language teaching and learning: A scoping literature review. Computers and Education Open, 6, 100174.
  55. Lee, S., Choe, H., Zou, D., & Jeon, J. (2025). Generative AI (GenAI) in the language classroom: A systematic review. Interactive Learning Environments. Advance online publication.
  56. Lee, S., & Jeon, J. (2024). Visualizing a disembodied agent: Young EFL learners’ perceptions of voice-controlled conversational agents as language partners. Computer Assisted Language Learning, 37(5–6), 1048–1073.
  57. Lengersdorff, L. L., & Lamm, C. (2025). With low power comes low credibility? Toward a principled critique of results from underpowered tests. Advances in Methods and Practices in Psychological Science, 8(1).
  58. Li, M., Wang, Y., & Yang, X. (2025). Can generative AI chatbots promote second language acquisition? A meta-analysis. Journal of Computer Assisted Learning, 41(4), e70060.
  59. Li, X., Liu, J., Gao, W., & Cohen, G. L. (2024). Challenging the N-Heuristic: Effect size, not sample size, predicts the replicability of psychological science. PLoS ONE, 19(8), e0306911.
  60. Liu, G., Fathi, J., & Rahimi, M. (2024). Using digital gamification to improve language achievement, foreign language enjoyment, and ideal L2 self: A case of English as a foreign language learners. Journal of Computer Assisted Learning, 40(4), 1347–1364.
  61. Liu, X., Guo, B., He, W., & Hu, X. (2025). Effects of generative artificial intelligence on K-12 and higher education students’ learning outcomes: A meta-analysis. Journal of Educational Computing Research, 63(5), 1249–1291.
  62. Mahmoudi-Dehaki, M., & Nasr-Esfahani, N. (2025). Utilising GenAI to create a culturally responsive EFL curriculum for pre-teen learners in the MENA region. Education 3-13, 53(7), 1175–1189.
  63. Mi, Y., Rong, M., & Chen, X. (2025). Exploring the affordances and challenges of GenAI feedback in L2 writing instruction: A comparative analysis with peer feedback. ECNU Review of Education. Advance online publication.
  64. Mohebbi, A. (2025). Enabling learner independence and self-regulation in language education using AI tools: A systematic review. Cogent Education, 12(1), 2433814.
  65. Monzon, N., & Hays, F. A. (2025). Leveraging generative AI to improve motivation and retrieval in higher education learners. JMIR Medical Education, 11, e59210.
  66. Mulyani, H., Istiaq, M. A., Shauki, E. R., Kurniati, F., & Arlinda, H. (2025). Transforming education: Exploring the influence of generative AI on teaching performance. Cogent Education, 12(1), 2448066.
  67. Nakagawa, S., Lagisz, M., Yang, Y., & Drobniak, S. M. (2024). Finding the right power balance: Better study design and collaboration can reduce dependence on statistical power. PLoS Biology, 22(1), e3002423.
  68. Niño, J. R. G., Delgado, L. P. A., Chiappe, A., & González, E. O. (2025). Gamifying learning with AI: A pathway to 21st-century skills. Journal of Research in Childhood Education, 39(4), 735–750.
  69. Oberpriller, J., de Souza Leite, M., & Pichler, M. (2022). Fixed or random? On the reliability of mixed-effects models for a small number of levels in grouping variables. Ecology and Evolution, 12(7), e9062.
  70. Pahi, K., Hawlader, S., Hicks, E., Zaman, A., & Phan, V. (2024). Enhancing active learning through collaboration between human teachers and generative AI. Computers and Education Open, 6, 100183.
  71. Pallant, J. L., Blijlevens, J., Campbell, A., & Jopp, R. (2025). Mastering knowledge: The impact of generative AI on student learning outcomes. Studies in Higher Education. Advance online publication.
  72. Papert, S. (1980). Mindstorms: Children, computers, and powerful ideas. Basic Books.
  73. Perifanou, M., & Economides, A. A. (2025). Collaborative uses of GenAI tools in project-based learning. Education Sciences, 15(3), 354.
  74. Pham, T. D., Karunaratne, N., Exintaris, B., Liu, D., Lay, T., Yuriev, E., & Lim, A. (2025). The impact of generative AI on health professional education: A systematic review in the context of student learning. Medical Education. Advance online publication.
  75. Rad, S. H. (2024). Revolutionizing L2 speaking proficiency, willingness to communicate, and perceptions through artificial intelligence: A case of Speeko application. Innovation in Language Learning and Teaching, 18(4), 364–379.
  76. Rafikova, A., & Voronin, A. (2025). Human–chatbot communication: A systematic review of psychologic studies. AI and Society. Advance online publication.
  77. Roehr-Brackin, K. (2024). Explicit and implicit knowledge and learning of an additional language: A research agenda. Language Teaching, 57(1), 68–86.
  78. Salinas-Navarro, D. E., Vilalta-Perdomo, E., Michel-Villarreal, R., & Montesinos, L. (2024). Using generative artificial intelligence tools to explain and enhance experiential learning for authentic assessment. Education Sciences, 14(1), 83.
  79. Scandola, M., & Tidoni, E. (2024). Reliability and feasibility of linear mixed models in fully crossed experimental designs. Advances in Methods and Practices in Psychological Science, 7(1).
  80. Schielzeth, H., Dingemanse, N. J., Nakagawa, S., Westneat, D. F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N. A., Garamszegi, L. Z., & Araya-Ajoy, Y. G. (2020). Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution, 11(9), 1141–1152.
  81. Shortt, M., Tilak, S., Kuznetcova, I., Martens, B., & Akinkuolie, B. (2023). Gamification in mobile-assisted language learning: A systematic review of Duolingo literature from public release of 2012 to early 2020. Computer Assisted Language Learning, 36(3), 517–554.
  82. Tai, T., & Chen, H. H. (2024). Navigating elementary EFL speaking skills with generative AI chatbots: Exploring individual and paired interactions. Computers and Education, 220, 105112.
  83. Tan, X., Cheng, G., & Ling, M. H. (2025). Artificial intelligence in teaching and teacher professional development: A systematic review. Computers and Education Artificial Intelligence, 8, 100355.
  84. Tavakoli, P., Campbell, C., & McCormack, J. (2016). Development of speech fluency over a short period of time: Effects of pedagogic intervention. TESOL Quarterly, 50(2), 447–471.
  85. Taye, T., & Mengesha, M. (2024). Identifying and analyzing common English writing challenges among regular undergraduate students. Heliyon, 10(17), e36876.
  86. Vygotsky, L. (1987). Zone of proximal development. In Mind in society: The development of higher psychological processes. Harvard University Press.
  87. Wang, K., Ruan, Q., Zhang, X., Fu, C., & Duan, B. (2024). Pre-service teachers’ GenAI anxiety, technology self-efficacy, and TPACK: Their structural relations with behavioral intention to design GenAI-assisted teaching. Behavioral Sciences, 14(5), 373.
  88. Wang, Y., Derakhshan, A., & Ghiasvand, F. (2025a). EFL teachers’ generative artificial intelligence (GenAI) literacy: A scale development and validation study. System, 133, 103791.
  89. Wang, Y., Zhang, T., Yao, L., & Seedhouse, P. (2025b). A scoping review of empirical studies on generative artificial intelligence in language education. Innovation in Language Learning and Teaching. Advance online publication.
  90. Wood, D., & Moss, S. H. (2024). Evaluating the impact of students’ generative AI use in educational contexts. Journal of Research in Innovative Teaching and Learning, 17(2), 152–167.
  91. Wu, H., & Liu, W. (2025). Exploring mechanisms of effective informal GenAI-supported second language speaking practice: A cognitive-motivational model of achievement emotions. Discover Computing, 28, 119.
  92. Wu, H., Zeng, Y., Chen, Z., & Liu, F. (2025a). GenAI competence is different from digital competence: Developing and validating the GenAI competence scale for second language teachers. Education and Information Technologies. Advance online publication.
  93. Wu, J., Wang, J., Lei, S., Wu, F., & Gao, X. (2025b). The impact of metacognitive scaffolding on deep learning in a GenAI-supported learning environment. Interactive Learning Environments. Advance online publication.
  94. Yang, C., Li, R., & Yang, L. (2025). Revisiting trends in GenAI-assisted second language writing: Retrospect and prospect. Journal of Educational Computing Research, 63, 1819–1863.
  95. Zhang, L., Yao, Z., & Moghaddam, A. H. (2025a). Designing GenAI tools for personalized learning implementation: Theoretical analysis and prototype of a multi-agent system. Journal of Teacher Education, 76(3), 280–293.
  96. Zhang, Y., Lai, C., & Gu, M. M. Y. (2025b). Becoming a teacher in the era of AI: A multiple-case study of pre-service teachers’ investment in AI-facilitated learning-to-teach practices. System, 133, 103746.
  97. Zhang, Z., Aubrey, S., Huang, X., & Chiu, T. K. F. (2025c). The role of generative AI and hybrid feedback in improving L2 writing skills: A comparative study. Innovation in Language Learning and Teaching. Advance online publication.
  98. Zhou, M., & Peng, S. (2025). The usage of AI in teaching and students’ creativity: The mediating role of learning engagement and the moderating role of AI literacy. Behavioral Sciences, 15(5), 587.
  99. Zhou, S. (2024). Gamifying language education: The impact of digital game-based learning on Chinese EFL learners. Humanities and Social Sciences Communications, 11, 1518.
Figure 1. Grammar scores over time. Point with error bar: mean and 95% confidence interval. Dot: individual score. Both groups improved over time, but no significant between-group difference emerged.
Figure 2. Listening comprehension scores over time. Point with error bar: mean and 95% confidence interval. Dot: individual score. The ChatGPT group had superior listening comprehension scores compared to the conventional group across all time points.
Figure 3. Pronunciation scores over time. Point with error bar: mean and 95% confidence interval. Dot: individual score. The ChatGPT group showed consistently larger gains in pronunciation than the comparison group.
Table 1. Participants’ Basic Demographics.

Group       | N (%)     | Female    | Male      | Sig a | Age (Years), Mean (SD) | Sig b
Non-digital | 61 (51.3) | 37 (60.7) | 24 (39.3) | 0.084 | 8.59 (0.50)            | 0.051
ChatGPT     | 58 (48.7) | 26 (44.8) | 32 (55.2) |       | 8.76 (0.43)            |
Note: a chi-square; b independent t-test (two-tailed); SD, standard deviation.
Table 2. English Topics and Game-based Activities Across the Six Lessons.

Lesson | Topic | Non-Digital Group | ChatGPT Group
1 | Introductory Activity | Group circle game with flashcards: Pupils pass cards and practice common English phrases in pairs, with teacher-led corrections. | Individual spy training: GenAI provides phrases, prompts repetition, and offers feedback on grammar and speaking.
2 | Family Members | Board game with dice: Pupils roll to name family roles and describe them using simple sentences, with group verification. | Family adventure quest: GenAI describes a family scenario in English, asks the pupil to respond with related phrases, and encourages speaking aloud with correct verb forms.
3 | Daily Routines | Flashcard matching: Pupils match pictures to routine verbs, then role-play in small groups with teacher guidance. | Time traveler game: GenAI narrates a daily routine in English, prompts the pupil to echo and add their own, checking grammar in responses.
4 | Hobbies and Activities | Charades: Pupils act out hobbies non-verbally; others guess and form sentences like “I like playing soccer.” | Hobby explorer challenge: GenAI suggests an activity phrase in English, urges repetition with enthusiasm, and evaluates sentence structure.
5 | Food and Meals | Memory card game: Pupils flip cards to match foods and discuss preferences in pairs, using basic grammar. | Chef competition: GenAI proposes a meal description in English, asks the pupil to repeat and suggest additions, providing feedback on word order.
6 | Places in Town | Map drawing relay: Groups draw town places and label them, then describe directions orally. | City detective puzzle: GenAI gives location clues in English, prompts the pupil to respond with directions, correcting grammar and encouraging clear speaking.
Table 3. Results of Linear Mixed Model Analysis.

Variable      | Term                  | Estimate | SE    | t      | p     | R2 Marg | R2 Cond
Grammar       | Intercept             | 9.918    | 0.475 | 20.889 | 0.001 | 0.085   | 0.094
Grammar       | Group × Time (T1–T2)  | −0.006   | 0.967 | −0.006 | 0.995 | 0.085   | 0.094
Grammar       | Group × Time (T1–T3)  | 0.530    | 0.957 | 0.554  | 0.580 | 0.085   | 0.094
Listening     | Intercept             | 7.033    | 0.306 | 23.008 | 0.001 | 0.100   | 0.100
Listening     | Group × Time (T1–T2)  | 1.525    | 0.619 | 2.462  | 0.015 | 0.100   | 0.100
Listening     | Group × Time (T1–T3)  | 0.484    | 0.619 | 0.781  | 0.436 | 0.100   | 0.100
Pronunciation | Intercept             | 18.984   | 0.424 | 44.820 | 0.001 | 0.207   | 0.207
Pronunciation | Group × Time (T1–T2)  | 2.241    | 0.858 | 2.612  | 0.010 | 0.207   | 0.207
Pronunciation | Group × Time (T1–T3)  | 0.975    | 0.858 | 1.137  | 0.257 | 0.207   | 0.207
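For readers wishing to reproduce analyses of this kind, the sketch below fits a random-intercept model analogous to the one summarized in Table 3, assuming scores stored in long format (one row per pupil per time point); the file name, column names, and reference level are assumptions, and the authors’ exact random-effects specification may differ.

```python
# Minimal linear mixed-model sketch with statsmodels; an illustration of the
# kind of model summarized in Table 3, not the authors' exact specification.
import pandas as pd
import statsmodels.formula.api as smf

# Assumed long-format data: columns pupil, group, time (T1/T2/T3), score.
df = pd.read_csv("scores_long.csv")

# Fixed effects: group, time, and their interaction; random intercept per pupil.
model = smf.mixedlm("score ~ C(group) * C(time, Treatment('T1'))",
                    data=df, groups=df["pupil"])
result = model.fit()
print(result.summary())  # estimates, SEs, and p-values comparable to Table 3
```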
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
