Systematic Review

Artificial Intelligence in EFL Speaking Instruction: A Systematic Review of Pedagogical Design, Affective Conditions and Instructional Input

Learning Institute for Empowerment, Multimedia University, Melaka 75450, Malaysia
Encyclopedia 2026, 6(4), 74; https://doi.org/10.3390/encyclopedia6040074
Submission received: 4 February 2026 / Revised: 16 March 2026 / Accepted: 20 March 2026 / Published: 27 March 2026
(This article belongs to the Section Arts & Humanities)

Abstract

Speaking proficiency remains one of the most challenging skills for learners of English as a Foreign Language (EFL), particularly in contexts where sustained spoken interaction is limited. This systematic review synthesises 36 empirical studies (2015–2025) identified through a PRISMA-guided Scopus search to examine how artificial intelligence (AI)-mediated instruction supports EFL speaking development. The included studies were analysed according to AI modality, pedagogical integration, instructional input characteristics, and linguistic and affective outcomes. Findings indicate that AI tools—such as chatbots, automatic speech recognition systems, and large language models—consistently support affective outcomes, including reduced speaking anxiety and increased willingness to communicate. Improvements in fluency, pronunciation, and accuracy were frequently reported, particularly when AI tools were embedded within task-based and pedagogically structured instructional designs. However, evidence for sustained development of higher-order communicative competence was more variable. The review proposes a mediated input framework conceptualising AI as a design-sensitive instructional resource rather than an autonomous teaching agent.

1. Introduction

Speaking proficiency is widely recognised as one of the most demanding skills for learners of English as a Foreign Language (EFL). Unlike receptive skills such as reading and listening, speaking requires learners to process linguistic input in real time while simultaneously managing accuracy, fluency, pronunciation, and affective factors such as confidence and anxiety [1,2]. In many EFL contexts, particularly those characterised by limited exposure to sustained or authentic interaction, learners continue to struggle to develop spoken competence despite years of formal instruction [3]. These persistent challenges have prompted renewed interest in instructional approaches that foreground learners’ access to meaningful language exposure as a foundation for oral language development.
Input-oriented approaches have long emphasised the role of comprehensible input in facilitating language acquisition, proposing that learners benefit when exposure to language precedes pressured or premature output [4,5]. From this perspective, sustained listening, reading, and interactional exposure enable learners to internalise linguistic patterns and establish form–meaning connections that support spoken production. Empirical research suggests that such approaches can contribute to gains in both fluency and accuracy in speaking [6]. However, in many instructional settings—particularly large, examination-driven classrooms—providing sufficiently rich, individualised, and frequent input remains a longstanding pedagogical constraint.
Recent advances in artificial intelligence (AI) have expanded the ways in which speaking opportunities and language exposure can be provided in EFL classrooms. Technologies such as conversational chatbots, automatic speech recognition systems, and large language models allow learners to engage in responsive, repeatable, and low-anxiety interaction beyond the temporal and spatial limits of classroom instruction [7,8]. A growing body of empirical research has examined the use of AI-mediated tools for speaking practice, pronunciation training, and automated feedback, often reporting improvements in oral fluency, accuracy, and learner confidence [9,10,11]. Despite this expanding evidence base, existing studies vary considerably in how AI is pedagogically implemented and theoretically interpreted.
As a result, the pedagogical role of AI in EFL speaking development remains insufficiently synthesised. In particular, there is limited clarity regarding how different forms of AI-mediated speaking support relate to established constructs in second language acquisition, including learners’ engagement with language input, interaction, and affective conditions for learning. Prior studies are frequently fragmented across technologies, learner populations, and outcome measures, with many emphasising short-term performance gains or learner perceptions rather than offering integrated, theory-informed interpretations [12]. This fragmentation has constrained the field’s ability to draw coherent conclusions about how AI-mediated speaking instruction functions across instructional contexts.
This systematic review addresses this gap by synthesising empirical studies on AI-mediated instruction for EFL speaking development. The review identifies recurring instructional functions, pedagogical approaches, and learning outcomes associated with AI-supported speaking activities. Drawing on input-oriented and task-based perspectives as interpretive lenses, the review further examines how AI-mediated practices may support learners’ engagement with spoken language and oral development under different instructional conditions. Rather than advancing prescriptive claims about AI’s instructional role, the review provides an evidence-informed synthesis intended to support theoretically grounded research and principled pedagogical decision-making.
This review is informed by three complementary theoretical perspectives from second language acquisition research: input-based theory, interactionist perspectives on language learning, and sociocultural approaches to mediated learning. Input-oriented frameworks emphasise the importance of comprehensible and meaningful exposure to language as a foundation for acquisition, while interactionist accounts highlight how participation in dialogue and feedback processes supports linguistic development. Sociocultural perspectives further stress the role of mediation, scaffolding, and instructional design in shaping learning outcomes. Together, these perspectives provide a coherent interpretive framework for analysing how AI-mediated speaking environments influence learner engagement with instructional input, interaction, and affective conditions for language development.
Accordingly, the review addresses the following research questions:
  • RQ1: What AI technologies and pedagogical approaches have been employed to support EFL/ESL speaking development, and how are these pedagogically positioned (practice, feedback, or interaction)?
  • RQ2: What linguistic and affective outcomes are associated with AI-supported speaking instruction, and under what instructional conditions are these outcomes sustained?
  • RQ3: What affordances and limitations of AI-mediated speaking instruction emerge when interpreted through input-oriented and task-based perspectives?
From an educational perspective, understanding how AI-mediated speaking activities are designed and embedded within instructional contexts is essential for translating technological potential into sustainable classroom practice.

2. Literature Review

2.1. Speaking Development and the Input–Output Relationship

Speaking is widely recognized as one of the most demanding skills in second language learning because it requires learners to process linguistic input in real time while coordinating multiple dimensions of performance, including fluency, accuracy, pronunciation, and interactional competence [1,2]. Unlike receptive skills, speaking places immediate cognitive and affective demands on learners, often resulting in heightened anxiety and reduced willingness to communicate, particularly in EFL contexts where opportunities for authentic interaction are limited [3,8]. As a result, many learners struggle to develop spoken competence despite prolonged exposure to formal instruction.
Research in second language acquisition has long debated the relationship between input and output in speaking development. While output-oriented perspectives emphasize the role of pushed production in promoting linguistic accuracy and noticing [13], input-based perspectives argue that oral proficiency is fundamentally grounded in sustained exposure to meaningful and comprehensible input [4]. From this view, speaking emerges as a consequence of internalized linguistic knowledge rather than as its primary driver.

2.2. Input-Based Instruction as a Foundation for Oral Proficiency

Input-based instruction (IBI) builds on the assumption that learners acquire language most effectively when they are first exposed to structured, meaningful input before being required to produce output. Central to this perspective is the concept of comprehensible input, often described as language slightly beyond the learner’s current proficiency level (i + 1), which promotes acquisition through understanding rather than explicit rule learning [4].
Empirical studies have demonstrated that input-oriented approaches can support speaking development by strengthening form–meaning connections and reducing processing load during production. Processing Instruction, for example, has been shown to improve grammatical accuracy in spoken output by guiding learners to interpret linguistic forms more effectively [5]. Similarly, approaches such as Input Flood and Input Enhancement increase exposure to target forms and promote noticing, which can lead to improvements in both fluency and accuracy [6,14].
Recent research further suggests that meaning-focused and lexical input approaches can support oral fluency and spontaneous speech, while more form-focused input enhances accuracy and pronunciation control [6]. Together, these findings suggest that speaking proficiency develops most effectively when learners are given sufficient time and support to process input before engaging in output.

2.3. Technology-Enhanced Input and AI-Mediated Language Learning

Advances in educational technology have expanded the possibilities for delivering rich, repeated, and contextualized language input beyond traditional classroom constraints. Earlier forms of technology-enhanced language learning, including digital storytelling, video-based instruction, and mobile-assisted language learning (MALL), have been shown to increase learner engagement while broadening access to spoken language in meaningful contexts. These tools help address a persistent limitation of classroom-based instruction, namely learners’ restricted exposure to authentic and frequent input.
More recently, artificial intelligence (AI) has introduced a further shift in how input is delivered, personalized, and experienced. Conversational chatbots, automatic speech recognition systems, and large language model-based tools enable learners to engage in interactive dialogue, receive adaptive responses, and practice speaking in relatively low-anxiety environments. Existing studies increasingly suggest that AI-mediated speaking activities can support gains in fluency, pronunciation, and learner confidence, particularly by expanding opportunities for repeated practice and immediate feedback [15,16,17].
Recent research published in 2024–2025 provides further evidence that AI-mediated speaking support is most effective when it combines low-stakes interaction, immediate feedback, and pedagogically guided task design. Studies of AI chatbots and mobile conversational agents report improvements in speaking confidence, reduced anxiety, and greater willingness to communicate, especially among learners who may be hesitant to speak in teacher-fronted or peer-fronted settings [18,19,20]. Other recent studies show that AI-supported speech evaluation and feedback tools can improve fluency, pronunciation, speaking performance, and confidence, while also increasing motivation and classroom willingness to communicate [21,22,23,24].
Beyond learner performance outcomes, recent scholarship has also examined broader pedagogical implications of artificial intelligence for language learning and classroom practice. For example, ref. [25] discusses how AI-mediated environments influence language acquisition and linguistic development, highlighting both opportunities and emerging instructional challenges. Similarly, another study analyses EFL teachers’ perceptions of AI in relation to academic integrity and classroom pedagogy, emphasising the need for responsible and pedagogically informed integration of AI technologies. Related research on technology-supported engagement further indicates that digital innovations such as gamification can strengthen teacher–student interaction and enhance learners’ willingness to communicate in language classrooms [26,27].
Despite these promising developments, recent scholarship cautions against assuming that increased interaction time automatically produces fuller communicative development. Critical studies note that AI feedback may be inaccurate, overly generic, or too heavily focused on surface-level form. They also emphasize that pragmatic competence, discourse management, and socially situated communication still depend strongly on teacher mediation, blended pedagogy, and opportunities for transfer to human interaction [28,29,30,31,32].
These findings suggest that AI should not be viewed merely as a tool for feedback, assessment, or isolated speaking practice. Rather, it can be more productively conceptualized as a mediated input-and-interaction resource that provides learners with repeated exposure, adaptive response, and affective support. This perspective offers a stronger theoretical bridge between AI-mediated instruction and established second language acquisition frameworks, while also acknowledging that the educational value of AI depends on how it is pedagogically designed and integrated.

2.4. Conceptualizing AI as a Source of Comprehensible Input

From an input-based perspective, AI-mediated interaction can be viewed as a dynamic form of comprehensible input rather than merely a technological supplement. AI systems are capable of adjusting linguistic complexity, providing repeated exposure to lexical and grammatical patterns, and sustaining interaction over extended periods. These features closely align with key input characteristics identified in Second Language Acquisition (SLA) research, including frequency, salience, comprehensibility, and interactional relevance [6,33].
In addition, AI-mediated environments often reduce affective barriers associated with speaking, such as fear of negative evaluation, which has been shown to inhibit oral production [3]. By lowering anxiety and increasing opportunities for risk-free interaction, AI tools may create conditions that allow learners to process input more deeply before producing speech. This suggests that improvements in speaking performance may result not only from increased practice, but from enhanced quality and accessibility of input.
In this review, AI-mediated input is understood as linguistically meaningful language exposure that is generated or shaped by AI systems and that learners must process in order to understand, interpret, or make use of it. Such input does not occur only in traditionally receptive activities, but often emerges within speaking-oriented tasks. Examples include model responses produced by conversational agents, reformulations or recast-like replies to learner output, repeated exposure to lexical and syntactic patterns across interactions, and increased perceptual salience created through feedback and repetition. Importantly, the framework does not suggest that all AI-mediated speaking activities are input-based. Rather, it proposes that many production-oriented AI tasks incorporate input functions that influence spoken development indirectly by supporting learners’ processing, noticing, and engagement with language under more affectively accessible conditions.
A comparison between earlier studies (2018–2020) and more recent research (2024–2025) reveals several important developments in AI-supported speaking instruction. Earlier studies primarily focused on automatic speech recognition systems and structured chatbot interactions designed to improve pronunciation and fluency through repetitive practice. In contrast, recent research increasingly examines generative AI tools, conversational agents, and adaptive feedback systems that enable more interactive and personalised speaking practice. Moreover, contemporary studies place greater emphasis on affective variables such as speaking anxiety, motivation, confidence, and willingness to communicate. This shift reflects a broader movement toward learner-centred and socially mediated perspectives on technology-enhanced speaking instruction.

2.5. Research Gap and Direction for the Present Review

Although recent studies demonstrate growing interest in AI-mediated speaking instruction, the literature remains conceptually dispersed and lacks a unified theoretical synthesis. Many investigations focus on short-term performance gains without explicitly linking findings to input-based theory or examining how AI-mediated interaction functions as comprehensible input. Consequently, there is limited synthesis explaining which input characteristics consistently contribute to speaking development across contexts.
To address this gap, the present systematic review examines empirical studies on AI-mediated instruction for EFL speaking development through the lens of input-based theory. By synthesizing findings across studies and proposing a conceptual framework that positions AI as a dynamic provider of comprehensible input, the review seeks to clarify the theoretical role of AI in speaking development and to inform future research and pedagogical practice.
The present review is therefore guided by the following overarching question: How does AI-mediated interaction function as a source of comprehensible input for the development of EFL speaking skills?

3. Methodology

3.1. Research Design

This study adopts a systematic review methodology conducted in full accordance with the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to ensure transparency, rigour, and replicability [28]. A completed PRISMA checklist is provided as Supplementary Materials, and the study selection process is illustrated in the PRISMA flow diagram (Figure 1). The review protocol was not registered in a public registry prior to data extraction.

3.2. Search Strategy

A structured and systematic search strategy was employed to identify empirical studies examining the role of artificial intelligence (AI) in supporting English as a Foreign Language (EFL) speaking development. The search focused on studies addressing AI-mediated speaking practice, conversational agents, automated feedback systems, and related forms of technology-enhanced oral language learning.
The literature search was conducted in the Scopus database. Scopus was selected because of its broad interdisciplinary coverage, strong representation of applied linguistics, educational technology, and language education research, and its rigorous indexing standards. The use of a single database also supported consistency and transparency in the identification and screening process.
The search was conducted between November 2024 and January 2025. A combination of keywords and Boolean operators was used to retrieve relevant studies. The search terms were designed to capture three core dimensions of the review: artificial intelligence technologies, language learning context, and speaking-related outcomes. A representative search string was as follows:
(“artificial intelligence” OR AI OR chatbot OR “speech recognition” OR “generative AI” OR “large language model”) AND (“EFL” OR “ESL” OR “second language”) AND (“speaking skills” OR “oral proficiency” OR pronunciation OR fluency OR “spoken communication”)
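The three-dimensional structure of the search string can be made explicit by composing it from keyword groups. The sketch below is illustrative only (it is not the authors' actual search tooling); it quotes multi-word terms as in Scopus advanced-search syntax, so single-word terms such as EFL appear unquoted, unlike in the string above.

```python
# Illustrative sketch: composing the three-dimensional Scopus search
# string from keyword groups, so each dimension can be audited or
# extended independently. Not part of the review's actual workflow.

def build_query(groups):
    """Join keyword groups with AND; OR-join terms inside each group.
    Multi-word terms are quoted, following Scopus advanced-search syntax."""
    def fmt(term):
        return f'"{term}"' if " " in term else term
    return " AND ".join(
        "(" + " OR ".join(fmt(t) for t in terms) + ")" for terms in groups
    )

dimensions = [
    # 1. AI technologies
    ["artificial intelligence", "AI", "chatbot", "speech recognition",
     "generative AI", "large language model"],
    # 2. Language-learning context
    ["EFL", "ESL", "second language"],
    # 3. Speaking-related outcomes
    ["speaking skills", "oral proficiency", "pronunciation", "fluency",
     "spoken communication"],
]

query = build_query(dimensions)
print(query)
```

Keeping the groups separate in this way makes it straightforward to document, for replication purposes, exactly which terms covered each dimension of the review.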
To improve relevance, the search was limited to English-language publications and focused on peer-reviewed journal articles and conference proceedings. The review considered studies published between 2015 and 2025, a period chosen to capture both earlier work on AI-supported speaking practice and more recent developments related to generative AI and advanced conversational systems.
Following retrieval, titles and abstracts were screened to identify studies directly related to AI-supported speaking instruction or speaking-related learning outcomes in EFL or ESL contexts. Studies that appeared relevant were then examined in full text and assessed according to the inclusion and exclusion criteria described in the following section.
This search strategy was intended to provide a focused and analytically robust body of literature for examining how AI-mediated technologies support speaking development, while maintaining methodological transparency in the review process.

3.3. Study Selection Process

The initial search yielded 119 records. As all records were retrieved from a single database and screening was conducted manually, no duplicate records were identified. Title screening excluded 74 studies that did not focus on AI-mediated instruction or interaction related to EFL speaking, leaving 45 studies for abstract screening.
Abstract screening was conducted against the predefined inclusion criteria. Nine records were excluded due to the absence of speaking-related outcomes, lack of learner-focused AI intervention, assessment-only applications without instructional or feedback components, or non-empirical study designs. A total of 36 empirical studies were retained for inclusion in the qualitative synthesis. Studies by the same authors were retained as separate records where they represented distinct publications. The study selection process is illustrated in the PRISMA flow diagram (Figure 1).
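The screening counts reported above can be laid out as a simple ledger whose stages must reconcile. The snippet below is an illustrative consistency check, not part of the review's actual workflow.

```python
# A minimal ledger of the PRISMA screening counts reported above,
# with a consistency check at each stage (illustrative only).

records_identified = 119
duplicates_removed = 0        # single database, manual screening
excluded_at_title = 74        # no AI-mediated speaking focus
excluded_at_abstract = 9      # no speaking outcomes, no learner-focused
                              # intervention, assessment-only, or non-empirical

after_dedup = records_identified - duplicates_removed
after_title = after_dedup - excluded_at_title
included = after_title - excluded_at_abstract

assert after_title == 45      # records retained for abstract screening
assert included == 36         # studies in the qualitative synthesis
print(after_title, included)
```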

3.4. Inclusion and Exclusion Criteria

The inclusion and exclusion criteria were designed to ensure alignment with the objectives of the review and to support a transparent, theory-informed synthesis of empirical research on AI-mediated speaking instruction in EFL contexts. Studies were screened based on study design, participant profile, instructional context, AI technology, and outcome focus.
Specifically, the review included empirical studies (quantitative, qualitative, or mixed-methods) investigating the use of AI-mediated instructional or interactional tools to support speaking development among EFL or ESL learners in formal or semi-formal educational settings. Eligible studies reported speaking-related outcomes such as fluency, accuracy, pronunciation, oral performance, or speaking anxiety and were published in English-language, peer-reviewed journals between 2015 and 2025.
Studies were excluded if they were non-empirical, focused exclusively on non-speaking skills, examined AI applications limited to assessment or scoring without instructional or feedback components, did not involve learner participants, or were inaccessible in full-text form.
Table 1 summarises the eligibility criteria applied during the study selection process.

3.5. Data Extraction and Analysis

Data extraction focused on key characteristics of each study, including research context, participant profile, type of AI technology, instructional design, and reported speaking-related outcomes. The extracted information was analysed using thematic synthesis, which enabled patterns to be identified across studies regarding how AI-mediated instruction has been used to support speaking development in EFL and ESL contexts.
The synthesis was guided by four analytical dimensions: (a) AI modality, (b) pedagogical integration, (c) characteristics of instructional input, and (d) speaking-related outcomes. These dimensions were used to organise findings across heterogeneous studies and to support comparison at both descriptive and interpretive levels.
Given the diversity of research designs, participant populations, and outcome measures among the included studies, formal quality appraisal or risk-of-bias scoring was not undertaken. Instead, greater emphasis was placed on patterns that recurred across multiple studies, instructional designs, and learning contexts. As a result, the synthesis advances interpretive and explanatory insights into the pedagogical role of AI-mediated speaking instruction rather than making causal or broadly generalisable claims about effectiveness.
Patterns reported in the results were derived through qualitative thematic synthesis and frequency comparison across the included studies. Recurring outcomes were identified by examining the distribution of reported linguistic and affective effects across the dataset, allowing the analysis to distinguish between consistently reported outcomes and more variable or context-dependent findings.
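The frequency comparison described above can be sketched as a coding-and-tallying step: each included study is coded with its AI modality and reported outcomes, and outcomes reported across multiple studies are flagged as recurring. The records below are hypothetical examples, not the review's actual coded dataset.

```python
from collections import Counter

# Illustrative sketch of the thematic frequency comparison: tally
# outcome codes across studies and separate recurring outcomes from
# ones reported only once. Records are hypothetical.

coded_studies = [
    {"modality": "chatbot", "outcomes": ["fluency", "reduced anxiety"]},
    {"modality": "ASR feedback", "outcomes": ["pronunciation", "fluency"]},
    {"modality": "hybrid task-based", "outcomes": ["fluency", "pragmatics"]},
]

outcome_counts = Counter(
    outcome for study in coded_studies for outcome in study["outcomes"]
)

# Outcomes reported by more than one study count as "recurring";
# the rest are treated as variable or context-dependent findings.
recurring = {o for o, n in outcome_counts.items() if n > 1}
print(sorted(recurring))
```

In this toy dataset only fluency recurs, while anxiety reduction, pronunciation, and pragmatics each appear once and would be reported as context-dependent.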
This analytic approach supports a pedagogically oriented synthesis by highlighting how instructional design choices mediate the educational value of AI-supported speaking activities across diverse learning contexts.
Accordingly, the review adopts a design-sensitive and theory-building orientation. Rather than estimating aggregate effect sizes across heterogeneous studies, the objective of the synthesis is to identify recurring pedagogical patterns and to interpret how AI-mediated speaking practices function within instructional systems. Given the diversity of research designs, participant populations, and outcome measures across the included studies, a theory-building synthesis provides a more appropriate analytical approach than purely quantitative aggregation. This orientation enables the review to connect empirical findings to established constructs in second language acquisition, particularly those related to instructional input, interaction, and pedagogical mediation.

3.6. Methodological Limitations

Despite its strengths, this review is subject to certain limitations. The reliance on a single database (Scopus) may have resulted in the omission of relevant studies indexed exclusively in other databases such as ERIC or Web of Science; the findings should therefore be interpreted as analytically representative of pedagogically oriented AI-mediated speaking research rather than as a comprehensive census of all AI-related speaking studies. Nevertheless, given Scopus’s broad disciplinary coverage and rigorous indexing standards, the included studies provide a robust and reliable foundation for synthesis. Future reviews may benefit from multi-database search strategies to further expand coverage.
As summarised in Table 2, the reviewed studies are characterised by a strong concentration in higher education EFL contexts and a predominant focus on affective outcomes such as anxiety reduction and learner engagement. This distribution provides important context for interpreting the speaking outcomes reported across AI modalities.

4. Results

This review synthesised findings from 36 empirical studies investigating the use of artificial intelligence (AI) to support EFL learners’ speaking development [34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67]. The studies were conducted predominantly in higher education and secondary school contexts, with strong representation from Asian EFL settings. Methodologically, the corpus comprised experimental and quasi-experimental designs, mixed-methods investigations, and qualitative case studies, reflecting the heterogeneous nature of AI-mediated language learning research.
Across the dataset, AI technologies were positioned in three principal pedagogical roles: interactional speaking partners (e.g., chatbots and large language models), feedback providers (e.g., automatic speech recognition-based systems), and hybrid instructional tools embedded within structured pedagogical frameworks such as task-based learning or blended instruction [34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67]. While most studies reported positive trends in speaking-related outcomes, the magnitude and durability of these effects varied, suggesting that the evidence should be interpreted as contextually bounded rather than universally generalisable. Table 3 provides a structured overview of the included studies, summarising AI modality, research design, speaking focus, affective outcomes, instructional context, and publication source.

4.1. AI as Interactional Speaking Partner

A substantial subset of the reviewed studies conceptualised AI as an interactional speaking partner, enabling learners to engage in simulated dialogue through chatbots, conversational agents, or large language models [34,36,37,47]. Across these studies, AI-mediated interaction was consistently associated with increased learner engagement, enhanced speaking confidence, and greater willingness to communicate, particularly among learners who reported anxiety in human-mediated speaking contexts.
In both adolescent and university-level settings, AI-supported dialogue was linked to increased speaking frequency, longer turns at talk, and more voluntary participation [32,49]. Several studies attributed these effects to the perceived psychological safety of AI interlocutors, which appeared to reduce fear of negative evaluation and encourage risk-taking in spoken production [29,34]. These participation-related gains were especially pronounced among lower-proficiency learners.
With respect to linguistic development, improvements were most frequently reported in fluency-related dimensions, including reduced hesitation, smoother delivery, and greater continuity of speech. However, evidence for gains in interactional complexity, discourse management, and pragmatic appropriateness was less consistent. Some studies observed that AI exchanges tended to remain structurally predictable or lexically constrained, thereby limiting opportunities for negotiation of meaning or context-sensitive adaptation [16,24].
Overall, these findings indicate that AI interlocutors are effective in lowering participation barriers and increasing speaking practice. Linguistic development beyond fluency, however, appears more variable and contingent on instructional framing, suggesting that interactional quantity alone does not guarantee deeper communicative competence.

4.2. AI as Feedback Provider

Another prominent strand of research conceptualised AI as a provider of automated speaking feedback, most commonly through ASR-based pronunciation tools, speech evaluation systems, and corrective feedback technologies [45,46,50,56,60].
Across these studies, learners demonstrated measurable improvements in pronunciation accuracy and fluency, particularly in segmental features, stress patterns, and speech rate. Gains in phonological control were among the most consistently reported linguistic outcomes across the dataset.
The immediacy, repeatability, and consistency of AI-generated feedback were frequently identified as key affordances, enabling repeated practice and fostering self-regulated learning [45,50,56]. In several studies, sustained exposure to ASR-mediated feedback was also associated with reductions in speaking anxiety and increased learner confidence [38,45,46].
However, the scope of linguistic development supported by automated systems appeared narrower than that observed in hybrid instructional designs. Feedback systems primarily targeted measurable phonological or fluency features, with limited attention to discourse-level competence, interactional responsiveness, or pragmatic appropriateness [50,56,60].
Taken together, the evidence indicates that AI-based feedback systems are particularly effective in supporting foundational speaking skills, especially pronunciation and fluency, but are less consistently associated with higher-order communicative development.

4.3. Hybrid and Task-Based AI-Integrated Instruction

Studies adopting hybrid instructional designs, in which AI tools were embedded within pedagogically structured tasks, reported comparatively more robust and sustained speaking outcomes than studies relying on AI interaction or feedback alone [40,42,54].
In these contexts, AI functioned as a scaffold rather than a replacement for instruction. Tools were used for preparatory rehearsal, input enhancement, guided practice, or reflective feedback within clearly sequenced learning activities. For example, AI-assisted task-based and production-oriented approaches were associated with improvements in impromptu speaking performance, pragmatic competence, and sustained learner engagement [40,42]. Mobile-assisted and blended designs further extended opportunities for structured out-of-class practice while maintaining alignment with curricular goals [41,44].
Compared to stand-alone chatbot or ASR implementations, hybrid designs more frequently reported gains beyond pronunciation and fluency, including improvements in communicative appropriateness and task performance. However, even within this group, outcome strength varied depending on the clarity of instructional integration. Where AI tools were introduced without explicit task alignment or pedagogical sequencing, gains were uneven and learner uptake inconsistent [39,54].
These findings suggest that structured pedagogical integration is associated with broader and more transferable speaking outcomes.

4.4. Affective Outcomes Associated with AI-Mediated Speaking

Affective outcomes emerged as a consistent cross-cutting theme across the reviewed studies. Reductions in speaking anxiety, increases in willingness to communicate, enhanced learner enjoyment, and improved confidence were reported in studies employing AI interlocutors, feedback systems, and hybrid instructional designs [34,38,45,50,51,52]. These affective gains were observed across both university and secondary contexts and were particularly pronounced among learners who initially reported high levels of speaking apprehension.
Notably, affective improvements were frequently documented even in cases where measurable linguistic gains were modest or limited to specific dimensions such as fluency or pronunciation [45,50]. This pattern suggests that emotional and motivational benefits may emerge independently of, or prior to, broader communicative development. Several studies further emphasised that reductions in anxiety and increased willingness to communicate were associated with learners’ perceptions of AI tools as non-judgmental, repeatable practice environments [34,38,45].
At the same time, a number of investigations cautioned that affective improvements were context-sensitive and dependent on sustained exposure and instructional support [38,39]. Without structured opportunities to transfer AI-mediated confidence and participation to human-mediated speaking tasks, affective gains risk remaining situational rather than enduring.

4.5. Summary of Findings

Overall, the reviewed evidence indicates that AI technologies can play a meaningful supportive role in EFL speaking instruction by expanding practice opportunities, reducing affective barriers, and providing immediate feedback [34,36,38,45,50]. However, the effectiveness of AI-mediated speaking support is neither uniform nor automatic. Studies that positioned AI within pedagogically grounded, task-oriented designs reported more consistent and transferable outcomes than those relying on AI interaction or feedback in isolation [40,41,42,54]. Conversely, research examining stand-alone chatbot or feedback implementations reported more variable gains, particularly in higher-order communicative competence [34,48,52].
Accordingly, the findings support a cautious, context-aware interpretation of AI’s role in speaking development. AI appears most effective when functioning as a mediating instructional resource embedded within structured pedagogical frameworks rather than as a standalone technological solution [40,41,42,54]. Its pedagogical value therefore remains contingent on thoughtful instructional integration and sustained human guidance.

5. Discussion

5.1. Interpreting the Pedagogical Role of AI in EFL Speaking Development

The findings of this review invite a more precise interpretation of how AI-mediated tools function within EFL speaking instruction. While many reviewed studies report positive speaking-related outcomes, these effects cannot be straightforwardly attributed to AI as an autonomous instructional agent. Rather, the evidence suggests that AI most often functions as a pedagogical mediator, with its impact emerging through interaction with instructional design, learner engagement, and affective conditions. Interpreted in this way, the findings complicate technologically deterministic narratives that portray AI as inherently transformative and align with long-standing arguments in ELT and CALL, which assert that learning outcomes are shaped primarily by pedagogical orchestration rather than technological affordances alone [68,69,70].
Importantly, variation in outcome magnitude and durability across the 36 studies indicates that AI does not exert a uniform or self-sustaining influence on speaking development. Studies reporting stronger and more sustained gains typically embedded AI within structured instructional sequences, whereas stand-alone AI interventions were more often associated with limited, short-term, or context-bound effects [34,36,38,45]. From a theoretical perspective, this pattern supports the view that technologies become pedagogically meaningful when normalised within instructional systems rather than introduced as external innovations. Accordingly, AI is better conceptualised not as a disruptive replacement for speaking instruction, but as a contingent instructional resource whose pedagogical value depends on alignment with learning objectives, task design, and learner mediation processes, as synthesised in the proposed mediated input framework.
To synthesise these patterns, Figure 2 presents a mediated input framework that organises how AI-supported speaking instruction functions through the interaction of pedagogical design, affective conditions, and instructional input.

5.2. AI-Mediated Interaction and the Nature of Speaking Practice

A substantial proportion of the reviewed literature conceptualises AI as an interactional speaking partner, most commonly through chatbots and large language models [3,13,14,71]. From an interactionist perspective, reported increases in speaking frequency, turn length, and willingness to communicate provide tentative support for the claim that expanded interactional opportunities may facilitate oral development by encouraging output and engagement [9,10]. At a surface level, these findings appear to align with interaction-based explanations of AI-supported speaking gains.
Prevailing interpretations of AI-mediated speaking instruction tend to conceptualise AI primarily as an interactional partner or a feedback mechanism, implicitly assuming that increased output opportunities or corrective feedback are sufficient drivers of speaking development. However, the synthesis presented in this review indicates that such interpretations do not adequately explain three recurring empirical patterns: (a) why affective gains often precede and exceed linguistic gains, (b) why increased interaction frequency does not consistently result in higher interactional complexity or pragmatic development, and (c) why pedagogical sequencing and task integration exert a stronger influence on outcomes than the technological sophistication of AI systems themselves. The mediated input framework addresses these explanatory gaps by repositioning AI-mediated interaction as a source of accessible, repeatable instructional input whose effectiveness is contingent on pedagogical mediation rather than interaction quantity alone.
However, closer examination reveals important theoretical constraints. While AI-mediated interaction reliably increased participation, evidence for sustained development in interactional complexity, pragmatic appropriateness, and discourse management was uneven. Several studies examining earlier AI language learning systems between 2018 and 2020 reported that AI exchanges were often lexically repetitive, structurally predictable, or limited in contingent responsiveness, thereby constraining opportunities for negotiation of meaning and interactionally driven learning [59,64,65]. These earlier systems were typically based on rule-based chatbots or limited conversational architectures, which restricted the depth and variability of learner interaction.
More recent studies conducted between 2024 and 2025, however, indicate that advances in large language models and AI conversational agents have improved the naturalness and responsiveness of interaction, allowing learners to engage in longer and more varied exchanges. Nevertheless, even in these newer systems, researchers note that AI-mediated conversations may still fall short in supporting pragmatic negotiation, discourse management, and socially situated communication. In interactionist terms, therefore, increased output quantity does not necessarily correspond to qualitatively richer interactional work [11].
This distinction is theoretically consequential. It suggests that AI-mediated dialogue may function primarily as low-stakes or preparatory interaction, supporting fluency, confidence, and willingness to communicate, without fully reproducing the sociocognitive demands of human interaction. Spoken interaction, as described in discourse and sociolinguistic research, involves emergent meaning-making, pragmatic calibration, and sensitivity to social cues that remain difficult for AI systems to simulate consistently [12,72]. Without pedagogical mediation, AI interaction therefore risks privileging surface-level engagement over deeper communicative competence, highlighting the limits of interaction alone as an explanatory mechanism for AI-supported speaking development.

5.3. Automated Feedback, Accuracy, and the Limits of Measurement

Another prominent strand of literature positions AI as a provider of automated speaking feedback, particularly through ASR-based pronunciation and fluency tools [12,68,69]. Across studies, improvements in segmental pronunciation accuracy, speech rate, and learner confidence were frequently reported, often alongside reductions in speaking anxiety [45,50]. These findings suggest that AI-mediated feedback is well suited to supporting form-level aspects of spoken performance, particularly those amenable to repeated practice and self-regulated learning.
At the same time, the reviewed studies highlight a structural limitation inherent in AI-driven feedback systems: feedback is constrained to features that are computationally detectable. Consequently, discourse-level competence, pragmatic appropriateness, and interactional responsiveness remain underrepresented in both feedback provision and outcome measurement [22,68]. This limitation is not merely technical but theoretical, reflecting a broader misalignment between accuracy-oriented metrics and the multidimensional nature of spoken communication [13,14].
From an SLA perspective, these patterns echo long-standing concerns regarding the privileging of measurable accuracy gains at the expense of communicative competence [13,14]. While AI feedback systems appear effective in supporting foundational phonological and fluency development, they cannot substitute for human-mediated evaluation of meaning-making, pragmatic intent, and interactional appropriateness. Accordingly, AI-mediated feedback is best understood as complementary rather than comprehensive, reinforcing the need for pedagogical frameworks that integrate automated feedback with human judgement and discourse-level instruction.

5.4. Pedagogical Integration as a Key Mechanism of Effectiveness

Across the reviewed corpus, pedagogical integration emerged as a consistently differentiating factor between more effective and more limited forms of AI-mediated speaking instruction. Studies embedding AI within task-based, production-oriented, or blended instructional designs tended to report more stable and transferable speaking outcomes than those employing AI in isolation [40,41,42,54]. In these contexts, AI served clearly defined pedagogical functions, such as task rehearsal, input enhancement, or reflective feedback.
This pattern aligns closely with sociocultural perspectives that foreground mediation, scaffolding, and goal-directed activity as central to language development [71]. By contrast, studies lacking clear pedagogical integration frequently reported uneven learner uptake and fragile outcomes [39], underscoring that technological sophistication alone does not guarantee instructional effectiveness.
Taken together, these findings support a design-sensitive interpretation in which learning outcomes are co-constructed through the interaction of tools, tasks, learners, and instructional intent, rather than being driven by technological affordances in isolation.

5.5. Affective Gains as Enabling Conditions for Speaking Development

Affective outcomes emerged as one of the most consistently reported patterns across the reviewed studies. Reductions in speaking anxiety and increases in willingness to communicate were observed across AI interlocutor, feedback-based, and hybrid instructional designs [34,38,45,50], with particularly notable effects among lower-proficiency learners and those with prior negative speaking experiences. These findings suggest that AI-mediated environments may exert their most immediate influence at the level of affective accessibility to speaking opportunities rather than directly on higher-order communicative competence.
From an affective-filter perspective, the reviewed evidence indicates that AI-mediated environments can lower psychological barriers to participation, thereby increasing learners’ readiness to engage with spoken input and output [4]. Across studies, reduced anxiety and enhanced willingness to communicate appeared to function as enabling conditions that facilitated sustained engagement with speaking tasks, particularly in contexts where fear of negative evaluation constrained participation [3,8]. In this sense, affect operates less as an outcome and more as a condition that shapes learners’ access to instructional input, interaction, and feedback.
At the same time, affective improvements were not consistently accompanied by proportional gains in higher-order communicative competence. Several studies cautioned that without opportunities to transfer AI-mediated confidence and fluency to human-mediated interaction, affective gains may remain situational and context-bound [34,38]. This pattern aligns with interactionist and sociocultural perspectives, which emphasise that speaking development is ultimately shaped through socially situated meaning-making rather than isolated participation [10,71]. Reduced anxiety may enable participation, but it does not guarantee the development of pragmatic control, discourse management, or interactional sensitivity.

5.6. Conceptual Framework: Interpreting AI as a Mediated Input Resource

Building on the patterns identified across the reviewed studies, this section presents a conceptual framework that organises how AI-mediated instruction supports EFL speaking development through the interaction of instructional input, affective conditions, and pedagogical design (see Figure 2). Rather than positioning AI as an autonomous instructional agent, the framework reflects how prior research has used AI as a mediated input resource whose pedagogical value depends on alignment with instructional goals and task design.
At the centre of the framework is AI-mediated input characterised by adaptability, repeatability, and accessibility. Across the reviewed studies, AI tools provided learners with sustained exposure to level-appropriate linguistic input through simulated interaction, shadowing, and feedback-driven practice. This input appeared most effective when it was embedded within pedagogically structured activities, including sequenced tasks, goal-oriented speaking activities, and teacher-guided integration.
Surrounding this input component are affective conditions, particularly reduced speaking anxiety and increased willingness to communicate. Within the framework, affective support is interpreted as an enabling condition that facilitates learners’ engagement with input and interaction rather than as an instructional outcome in its own right. Lower affective barriers were associated with increased access to speaking opportunities and greater learner participation.
Together, these elements contribute to speaking development outcomes, including gains in fluency, accuracy, pronunciation, and speaking confidence. The framework also highlights that higher-order communicative competence, such as pragmatic appropriateness and discourse management, was most consistently reported when AI-mediated input was integrated into socially meaningful and pedagogically structured speaking tasks rather than used as a stand-alone practice tool.
Overall, the framework offers an integrative way of organising existing evidence on AI-mediated speaking instruction. It clarifies the pedagogical conditions under which AI-supported practices appear most effective, while also delineating the limits of technology-driven approaches when instructional mediation is weak. The framework is not intended to account for all instructional contexts, and its explanatory power appears strongest in settings where learners possess sufficient receptive proficiency to benefit from AI-mediated input and where pedagogical structures support transfer to human-mediated interaction.

6. Conclusions and Implications

This systematic review synthesised empirical research on AI-mediated instruction for EFL speaking development, with particular attention to how AI-supported practices are pedagogically designed and how they function within instructional contexts. Across the 36 studies reviewed, AI was most commonly employed as an interactional partner, a source of automated feedback, or a hybrid form of instructional support. Rather than functioning as an autonomous instructional agent, AI-mediated tools were most effective when embedded within purposeful pedagogical designs that structured learners’ engagement with spoken language.
One pattern emerging consistently from the review is that the most stable benefits of AI-mediated speaking instruction were observed in affective domains, including reductions in speaking anxiety, increased willingness to communicate, and enhanced learner confidence. While gains in fluency, pronunciation, and accuracy were frequently reported, evidence for sustained development of higher-order interactional competence was more variable. These findings suggest that AI-supported speaking activities may function primarily as preparatory or enabling environments that lower affective barriers to participation, rather than as substitutes for socially situated communicative interaction.
Importantly, the effectiveness of AI-mediated input appeared to depend less on technological features per se than on how AI tools were pedagogically integrated. Studies that embedded AI within task-based, production-oriented, or blended instructional designs tended to report more transferable and durable outcomes than those relying on stand-alone or tool-driven implementations. This design-sensitive pattern highlights the importance of instructional intent, task structure, and teacher mediation in shaping the pedagogical value of AI-supported speaking practice.
From a theoretical perspective, the review offers a clearer account of how AI-mediated speaking activities can be interpreted through input-oriented and task-based perspectives. AI environments may provide adaptive, repeatable, and affectively supportive forms of language exposure that facilitate learners’ engagement with spoken input. However, affective gains should be understood as enabling conditions rather than endpoints of acquisition, and opportunities for goal-oriented, socially meaningful interaction remain essential for the development of communicative competence.
For future research, several directions emerge from the findings of this review. First, more longitudinal investigations are required to examine whether improvements observed in AI-mediated speaking environments lead to sustained development in real communicative settings over time. Many studies included in this review measured short-term improvements in fluency, pronunciation, or learner confidence, but fewer examined whether these gains transfer to authentic human interaction in classroom or professional contexts. Second, future research should explore higher-order communicative competence, including pragmatic appropriateness, discourse management, interactional responsiveness, and negotiation of meaning. While AI tools appear effective in supporting foundational speaking skills, their role in fostering more complex communicative abilities remains less well understood. Third, greater attention should be given to teacher mediation and instructional design in AI-supported speaking environments. Investigating how teachers scaffold AI-supported activities, integrate them with classroom interaction, and guide learners’ reflection on AI-generated feedback may provide deeper insights into effective pedagogical models. Finally, future studies should address the ethical and pedagogical implications of AI-based speaking assessment, including issues related to feedback reliability, algorithmic bias, transparency of evaluation criteria, and the responsible use of automated scoring systems in language learning contexts.
For practice, the review suggests that AI is best adopted as a supportive instructional resource rather than a replacement for communicative pedagogy or teacher expertise. AI tools appear especially well suited for preparatory speaking practice, pronunciation and fluency rehearsal, anxiety-sensitive scaffolding, and extending speaking opportunities beyond classroom time. In addition, institutions and instructors should critically evaluate the ethical implications of AI-supported feedback and assessment systems, ensuring transparency, fairness, and responsible use of automated evaluation tools in language education.
Therefore, this review contributes to a more nuanced understanding of the pedagogical role of AI in EFL speaking development by clarifying how instructional design mediates effectiveness and by situating AI-supported speaking practice within established educational principles. Such an approach supports more principled integration of AI into language teaching while avoiding technologically deterministic assumptions.
As AI continues to enter mainstream educational settings, design-sensitive and pedagogy-first syntheses such as this review are essential for ensuring that technological adoption remains aligned with educational rather than purely technical priorities.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/encyclopedia6040074/s1, PRISMA checklist [28].

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analysed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Bygate, M. Speaking. In The Cambridge Guide to Teaching English to Speakers of Other Languages; Carter, R., Nunan, D., Eds.; Cambridge University Press: Cambridge, UK, 2001; pp. 14–20. [Google Scholar]
  2. Thornbury, S. How to Teach Speaking; Longman: London, UK, 2005. [Google Scholar]
  3. Horwitz, E.K. Language anxiety and achievement. Annu. Rev. Appl. Linguist. 2001, 21, 112–126. [Google Scholar] [CrossRef]
  4. Krashen, S.D. Principles and Practice in Second Language Acquisition; Pergamon: Oxford, UK, 1982. [Google Scholar]
  5. VanPatten, B. Input processing in second language acquisition. In Theories in Second Language Acquisition, 2nd ed.; VanPatten, B., Williams, J., Eds.; Routledge: London, UK, 2015; pp. 113–134. [Google Scholar]
  6. Ellis, R. The Study of Second Language Acquisition, 2nd ed.; Oxford University Press: Oxford, UK, 2008. [Google Scholar]
  7. Godwin-Jones, R. Using mobile technology to develop language skills and cultural understanding. Lang. Learn. Technol. 2018, 22, 1–17. [Google Scholar]
  8. MacIntyre, P.D.; Clément, R.; Dörnyei, Z.; Noels, K.A. Conceptualizing willingness to communicate in a L2: A situational model of L2 confidence and affiliation. Mod. Lang. J. 1998, 82, 545–562. [Google Scholar] [CrossRef]
  9. Swain, M. Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In Input in Second Language Acquisition; Gass, S.M., Madden, C.G., Eds.; Newbury House: Rowley, MA, USA, 1985; pp. 235–253. [Google Scholar]
  10. Long, M.H. The role of the linguistic environment in second language acquisition. In Handbook of Second Language Acquisition; Academic Press: San Diego, CA, USA, 1996. [Google Scholar]
  11. Gass, S.M.; Mackey, A. Input, Interaction, and Output in Second Language Acquisition; Lawrence Erlbaum: Mahwah, NJ, USA, 2007. [Google Scholar]
  12. McCarthy, M.J.; O’Keeffe, A. Research in the teaching of speaking. Annu. Rev. Appl. Linguist. 2004, 24, 26–43. [Google Scholar]
  13. Ellis, R.; Barkhuizen, G. Analysing Learner Language; Oxford University Press: Oxford, UK, 2005. [Google Scholar]
  14. Norris, J.M.; Ortega, L. Towards an Organic Approach to Investigating CAF in Instructed SLA: The Case of Complexity. Appl. Linguist. 2009, 30, 555–578. [Google Scholar] [CrossRef]
  15. Xing, C. A systematic review on artificial intelligence technologies in ESL/EFL speaking skills. Int. J. TESOL Stud. 2025, 8, 240–270. [Google Scholar]
  16. Safitri, E.I.; Hidayati, S.; Ciptaningrum, D. The impact of AI chatbots on English language learners’ speaking proficiency: A systematic review. J. Res. Engl. Lang. Learn. 2025, 6, 317–329. [Google Scholar] [CrossRef]
  17. Wang, Z.; Li, L. Does AI-Assisted Instruction Facilitate Listening and Speaking in Junior High EFL Classrooms? Int. J. Engl. Lang. Stud. 2025, 7, 1–4. [Google Scholar] [CrossRef]
  18. Muthmainnah, M. AI-CiciBot as Conversational Partners in EFL Education, focusing on Intelligent Technology Adoption (ITA) to Mollify Speaking Anxiety. J. Engl. Lang. Teach. Appl. Linguist. 2024, 6, 76–85. [Google Scholar] [CrossRef]
  19. Ma, M.; Noordin, N.; Razali, A.B. Effects of an AI Chatbot Mobile Application on Foreign Language Anxiety among Chinese EFL Undergraduates. Int. J. Acad. Res. Prog. Educ. Dev. 2024, 13, 3828–3839. [Google Scholar]
  20. Nguyen, L.A.D.; Le, T.T.P. Exploring the Effects of an AI Chatbot on Emotional Engagement in English Speaking Lessons: Insights from Call Annie. Int. J. AI Lang. Educ. 2025, 2, 79–99. [Google Scholar] [CrossRef]
  21. Farooqi, S.-U.-H. Efficacy of AI-Generated Feedback by SmallTalk2Me for Improving Speaking Skill of Saudi EFL Learners. Forum Linguist. Stud. 2025, 7, 714–728. [Google Scholar] [CrossRef]
  22. Fauzi, I.; Hartono, R.; Rukmini, D.; Pratama, H. AI Applications for EFL Learners: Enhancing Speaking Performance and Reducing Anxiety with Gender-Based Analysis. Forum Linguist. Stud. 2025, 7, 282–301. [Google Scholar] [CrossRef]
  23. Zou, B.; Xie, S.; Wang, C. Students’ Willingness to Communicate (WTC) in Using Artificial Intelligence (AI) Technology in English-Speaking Practice. Int. J. Inf. Commun. Technol. Educ. 2025, 21, 1–18. [Google Scholar] [CrossRef]
  24. Nguyễn, Q.N.; Lê, H.V.; Nguyen, T.T.T. The Impact of AI-Supported Speaking Practice on EFL Learners’ Confidence in Vietnam. Int. J. Adv. Multidiscip. Res. 2025, 5, 384–393. [Google Scholar]
  25. John, A. Gamification in English language teaching: A pathway to fostering teacher–student rapport, teacher immediacy and students’ willingness to communicate. XLinguae 2024, 17, 47–58. [Google Scholar] [CrossRef]
  26. John, A. Exploring the impact of artificial intelligence on language acquisition, linguistic development, and language use: A case study from India. Forum Linguist. Stud. 2025, 7, 1104–1117. [Google Scholar] [CrossRef]
  27. Praveen, R.; Irudayasamy, J.; Garlapati, B.S.; Nithyasri, S.; John, A.; Praveena, S. Analysing EFL teachers’ perceptions of AI’s role in academic integrity and pedagogy with Bert-LSTM. In Proceedings of the 2025 Global Conference in Emerging Technology (GINOTECH); IEEE: Piscataway, NJ, USA, 2025; pp. 1–6. [Google Scholar] [CrossRef]
  28. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  29. Handley, Z. Has artificial intelligence rendered language teaching obsolete? Mod. Lang. J. 2024, 108, 548–555. [Google Scholar] [CrossRef]
  30. Huang, X.; Han, X.; Dou, A. Generative AI in EFL Speaking Instruction: Teachers’ Reflections on Effectiveness and Implementation Barriers. J. High. Vocat. Educ. 2024, 1, 13–17. [Google Scholar] [CrossRef]
  31. Wei, W.; Zhao, A.; Ma, H. Understanding How AI Chatbots Influence EFL Learners’ Oral English Learning Motivation and Outcomes: Evidence From Chinese Learners. IEEE Access 2025, 13, 56699–56716. [Google Scholar] [CrossRef]
  32. Ballıdağ, M.; Aydın, S. A comparison of the effects of AI-based chatbots and peer interactions on speaking anxiety among EFL learners. Future Educ. Res. 2025, 3, 224–238. [Google Scholar] [CrossRef]
  33. Zou, B.; Du, Y.; Wang, Z.; Chen, J.; Zhang, W. An investigation into artificial intelligence speech evaluation programs with automatic feedback for developing EFL learners’ speaking skills. SAGE Open 2023, 13, 21582440231. [Google Scholar] [CrossRef]
  34. Matiienko-Silnytska, A.; Mikava, N.; Savranchuk, I.; Tkhor, N.; Poliakova, H. Conversational Analysis of Learner–AI Chatbot Interactions in Developing Spoken Fluency. Arab World Engl. J. 2025, 16, 224–238. [Google Scholar] [CrossRef]
  35. Li, L.; Zhang, X.; Zou, B.; Yang, Q. AI partner or peer partner? Exploring AI-mediated interaction in EFL pronunciation from a socio-cultural perspective. Learn. Cult. Soc. Interact. 2025, 55, 100958. [Google Scholar] [CrossRef]
  36. Ma, Y.; Wang, Z.; Pang, H. The more the merrier? Examining the effects of a conversational agent on EFL learners’ speaking in three conditions. Comput. Educ. 2025, 239, 105442. [Google Scholar] [CrossRef]
  37. Behforouz, B.; Al-Maqbali, A.; Al-Ghaithi, A.; Poorghorban, A. Mobile Interaction Meets AI Tutoring: Using ChatGPT-4o to Boost Speaking Skills in EFL Classrooms. Int. J. Interact. Mob. Technol. 2025, 19, 34–49. [Google Scholar] [CrossRef]
  38. Rahman, G.; Mudhsh, B.A.; Almutairi, M.; Kouki, M. Optimizing ESL Learners’ Speech Act Performance: The Role of AI-Powered Chatbots in Pragmatic Competence Development. Theory Pract. Lang. Stud. 2025, 15, 3187–3198. [Google Scholar] [CrossRef]
  39. Shi, W.; Shakibaei, G. Insights Into the Effectiveness of Artificial Intelligence-Integrated Speaking Instruction in Enhancing Speaking Skills and Social–Emotional Competence as Well as Reducing Demotivation and Shyness. Eur. J. Educ. 2025, 60, e70174. [Google Scholar]
  40. Yang, G.; Wang, Y.; Zhang, Y.; Yang, M.; Zeng, Q.; Song, Z. An Empirical Study of AI-Supported Interleaved Training Strategy to Improve EFL Students’ English Impromptu Speaking Performance, Learning Engagement, Technology Acceptance and Epistemic Network Structure. Asia-Pac. Educ. Res. 2025, 34, 1519–1540. [Google Scholar]
  41. Zhou, Q.; Hashim, H.; Sulaiman, N.A. Supporting English speaking practice in higher education: The impact of AI chatbot-integrated mobile-assisted blended learning framework. Educ. Inf. Technol. 2025, 30, 14629–14660. [Google Scholar] [CrossRef]
  42. Juan, W.; Ismail, H.H.; Mansor, A.Z. Enhancing Chinese EFL Learners’ Speaking Proficiency through AI-Integrated POA. Educ. Sci. Theory Pract. 2025, 25, 44–57. [Google Scholar]
  43. Pituxcoosuvarn, M.; Tanimura, M.; Murakami, Y.; White, J.S. Enhancing EFL speaking skills with AI-powered word guessing: A comparison of human and AI partners. Information 2025, 16, 427. [Google Scholar] [CrossRef]
  44. Muniandy, J.; Selvanathan, M. ChatGPT, a partnering tool to improve ESL learners’ speaking skills: Case study in a Public University, Malaysia. Teach. Public Adm. 2025, 43, 4–20. [Google Scholar] [CrossRef]
  45. Aljabr, F. ASR using Speechnotes for EFL learners: A Study of the Effects on English Pronunciation and Prosody Skills. J. Ecohumanism 2025, 4, 979–987. [Google Scholar] [CrossRef]
  46. Li, W.; Mohamad, M.; You, H.W. Exploring the effects of using automatic speech recognition on EFL university students with high speaking anxiety. Int. J. Inf. Educ. Technol. 2025, 15, 187–194. [Google Scholar] [CrossRef]
  47. Zheng, Y.; Zhou, Y.; Chen, X.; Ye, X. The influence of large language models as collaborative dialogue partners on EFL English oral proficiency and foreign language anxiety. Comput. Assist. Lang. Learn. 2025, 1–27. [Google Scholar] [CrossRef]
  48. Lee, J. Speaking English with AI or Humans: What Engages EFL Learners More? Engl. Teach. 2025, 80, 41–66. [Google Scholar] [CrossRef]
  49. Dakhil, T.A.; Karimi, F.; Al-Jashami, R.A.U.; Ghapanchi, Z. The Effect of Artificial Intelligence (AI)-Mediated Speaking Assessment on Speaking Performance and Willingness to Communicate of Iraqi EFL Learners. Int. J. Lang. Test. 2025, 15, 1–18. [Google Scholar]
  50. Hsu, H.W. Utilizing shadowing practice and automatic speech recognition technology to enhance EFL learners’ pronunciation accuracy and speaking fluency. J. Comput. Educ. 2025. [Google Scholar] [CrossRef]
  51. Tai, T.Y.; Chen, H.H.J. Impact of generative AI chatbots and interaction modes on the speaking proficiency of adolescent EFL learners. Comput. Assist. Lang. Learn. 2025, 1–30. [Google Scholar] [CrossRef]
  52. Kang, E.Y. Enhancing L2 Learners’ Affective Outcomes and Oral Proficiency Through AI-Chatbot Interaction. Korean J. Engl. Lang. Linguist. 2025, 25, 1299–1314. [Google Scholar] [CrossRef]
  53. Li, W.; Mohamad, M.; You, H.W. Impact of the application of mobile integrated speech recognition and automated writing evaluation software on university learners’ EFL speaking competence. Int. J. Inf. Educ. Technol. 2025, 15, 1997–2012. [Google Scholar] [CrossRef]
  54. Zhou, W.; Xu, X.; Luo, X.; Yang, G. AI-assisted instruction: An empirical study of an objective, observation, and organization-based SVVR approach to promote students’ English impromptu speaking performance, speaking anxiety, and cognitive network structure. Interact. Learn. Environ. 2025, 1–18. [Google Scholar] [CrossRef]
  55. Li, W.; Mohamad, M.; You, H.W. Integrating automatic speech recognition and automated writing evaluation to reduce speaking anxiety and enhance speaking competence among Chinese EFL learners. Cogent Educ. 2025, 12. [Google Scholar] [CrossRef]
  56. Shadiev, R.; Feng, Y.; Zhussupova, R.; Altinay, F. Effects of speech-enabled corrective feedback technology on EFL speaking skills, anxiety and confidence. Comput. Assist. Lang. Learn. 2024, 1–37. [Google Scholar] [CrossRef]
  57. Kemelbekova, Z.; Degtyareva, X.; Yessenaman, S.; Ismailova, D.; Seidaliyeva, G. AI in teaching English as a foreign language: Effectiveness and prospects in Kazakh higher education. XLinguae 2024, 17, 69–83. [Google Scholar] [CrossRef]
  58. Tai, T.Y.; Chen, H.H.J. The impact of intelligent personal assistants on adolescent EFL learners’ speaking proficiency. Comput. Assist. Lang. Learn. 2024, 37, 1224–1251. [Google Scholar] [CrossRef]
  59. Bashori, M.; van Hout, R.; Strik, H.; Cucchiarini, C. Effects of ASR-based websites on EFL learners’ vocabulary, speaking anxiety, and language enjoyment. System 2021, 99, 102496. [Google Scholar] [CrossRef]
  60. Yan, H.; Singh, M.K.S.; Rawian, R.M. AI-Based Corrective Feedback in EFL Interactive Speaking: Insights from Interactionist SLA Theory. Educ. Sci. Theory Pract. 2024, 24, 306–322. [Google Scholar]
  61. Tsai, S.C. Learning with Mobile Augmented Reality- and Automatic Speech Recognition-Based Materials for English Listening and Speaking Skills: Effectiveness and Perceptions of Non-English Major English as a Foreign Language Students. J. Educ. Comput. Res. 2023, 61, 444–465. [Google Scholar] [CrossRef]
  62. Aufi, A.; Naqvi, S.; Naidu, V.R.; Homani, Y.A. Integrating HTML5-based Speech Recognition with Learning Management System to Enhance EFL Learners’ Pronunciation Skills. J. Teach. Engl. Spec. Acad. Purp. 2023, 11, 507–520. [Google Scholar]
  63. Hsu, M.H.; Chen, P.S.; Yu, C.S. Proposing a task-oriented chatbot system for EFL learners speaking practice. Interact. Learn. Environ. 2023, 31, 4297–4308. [Google Scholar] [CrossRef]
  64. Yang, H.; Kim, H.; Lee, J.H.; Shin, D. Implementation of an AI chatbot as an English conversation partner in EFL speaking classes. ReCALL 2022, 34, 327–343. [Google Scholar] [CrossRef]
  65. Ye, Y.; Deng, J.; Liang, Q.; Liu, X. Using a Smartphone-Based Chatbot in EFL Learners’ Oral Tasks. Int. J. Mob. Blended Learn. 2022, 14, 1–17. [Google Scholar] [CrossRef]
  66. Çakmak, F. Chatbot-Human Interaction and Its Effects on EFL Students’ L2 Speaking Performance and Anxiety. Novitas-ROYAL 2022, 16, 113–131. [Google Scholar]
  67. Ahn, T.Y.; Lee, S.M. User experience of a mobile speaking application with automatic speech recognition for EFL learning. Br. J. Educ. Technol. 2016, 47, 778–786. [Google Scholar] [CrossRef]
  68. Chapelle, C.A. Computer Applications in Second Language Acquisition; Cambridge University Press: Cambridge, UK, 2001. [Google Scholar]
  69. Hubbard, P. A general introduction to CALL. In Computer Assisted Language Learning: Critical Concepts in Linguistics; Routledge: London, UK, 2009. [Google Scholar]
  70. Bax, S. CALL—Past, present and future. System 2003, 31, 13–28. [Google Scholar] [CrossRef]
  71. Lantolf, J.P.; Thorne, S.L. Sociocultural Theory and the Genesis of Second Language Development; Oxford University Press: Oxford, UK, 2006. [Google Scholar]
  72. Hall, J.K. Essentials of SLA for L2 Teachers: A Transdisciplinary Framework; Routledge: New York, NY, USA, 2019. [Google Scholar]
Figure 1. PRISMA 2020 flow diagram of study identification, screening, and inclusion.
Figure 2. AI as a Mediated Input Framework for EFL Speaking Development. Note: This framework conceptualises artificial intelligence (AI) as a mediated instructional input resource rather than as an autonomous teaching agent. AI-mediated input (e.g., chatbot interaction, ASR-supported practice, automated feedback) operates within pedagogically structured tasks and instructional designs shaped by teacher guidance and curricular goals. Affective conditions, particularly reduced speaking anxiety and increased willingness to communicate, function as enabling conditions that support learners’ engagement with input and interaction. Speaking development outcomes, including fluency, accuracy, pronunciation, and speaking confidence, emerge through the interaction of instructional input, affective accessibility, and pedagogical mediation. The framework does not suggest that AI-mediated input alone is sufficient for higher-order communicative competence; rather, such competence appears most likely to develop when AI-supported activities are integrated into socially meaningful, goal-oriented speaking tasks.
Table 1. Eligibility Criteria for Study Selection.
Criteria | Inclusion | Exclusion
Study Design | Empirical investigations (quantitative, qualitative, or mixed-methods) examining AI-mediated speaking instruction | Purely theoretical papers, opinion pieces, reviews, or non-empirical commentaries
Participants | EFL or ESL learners in secondary, tertiary, or adult education contexts | Studies involving non-language learners, non-educational users, or participants not clearly identified
Context | Formal or semi-formal educational settings (e.g., schools, universities, language programmes) with pedagogical implementation of AI | Commercial, informal, or self-study applications without instructional design
Technology | AI-mediated tools supporting speaking development (e.g., chatbots, ASR systems, large language models, AI-driven feedback) | Non-AI digital tools or AI systems limited to assessment/scoring only
Outcome Focus | Speaking-related outcomes (e.g., fluency, accuracy, pronunciation, oral performance, speaking anxiety) | Studies focusing exclusively on non-speaking skills
Analytical Depth | Systematic analysis of learning processes, instructional design, or speaking outcomes | Superficial or anecdotal reporting
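The eligibility criteria in Table 1 amount to a conjunctive filter: a record is retained only if it satisfies every inclusion criterion and triggers no exclusion criterion. A minimal sketch of that screening logic follows; the field names (design, participants, and so on) are illustrative assumptions, not the review's actual coding scheme.

```python
# Hypothetical sketch of Table 1's screening logic. Field names are
# illustrative, not taken from the review's coding instrument.
EMPIRICAL_DESIGNS = {"quantitative", "qualitative", "mixed-methods"}
ELIGIBLE_LEARNERS = {"EFL", "ESL"}

def is_eligible(record: dict) -> bool:
    """Return True only if a record meets every inclusion criterion
    and no exclusion criterion from Table 1."""
    return (
        record["design"] in EMPIRICAL_DESIGNS       # empirical study design
        and record["participants"] in ELIGIBLE_LEARNERS
        and record["formal_setting"]                # schools, universities, programmes
        and record["ai_mediated"]                   # chatbots, ASR, LLMs, AI feedback
        and record["speaking_outcomes"]             # fluency, accuracy, anxiety, etc.
        and not record["assessment_only"]           # AI limited to scoring is excluded
    )

sample = {"design": "mixed-methods", "participants": "EFL",
          "formal_setting": True, "ai_mediated": True,
          "speaking_outcomes": True, "assessment_only": False}
print(is_eligible(sample))  # True
```

Because the criteria are conjunctive, failing any single condition (for example, an AI system used only for scoring) is sufficient for exclusion.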
Table 2. PRISMA Summary of Study Selection.
PRISMA Stage | Number of Records
Records identified from Scopus | 119
Records after duplicates removed | 119
Records screened (title screening) | 119
Records excluded (title screening) | 74
Records assessed for eligibility (abstracts) | 45
Records excluded after abstract screening | 9
Studies included in qualitative synthesis | 36
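The counts in Table 2 form a simple subtraction chain, which can be verified directly; the variable names below are illustrative only.

```python
# Sanity-check the PRISMA selection counts reported in Table 2.
identified = 119
after_duplicates = 119                 # single-database search; no duplicates removed
excluded_by_title = 74
assessed_abstracts = after_duplicates - excluded_by_title   # 119 - 74
excluded_by_abstract = 9
included = assessed_abstracts - excluded_by_abstract        # 45 - 9

print(assessed_abstracts)  # 45
print(included)            # 36
```

Each stage's count equals the previous stage minus the records excluded at that stage, so the 36 included studies are fully accounted for by the reported exclusions.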
Table 3. Overview of Included Studies.
No. | Author(s) (Year) | AI Technology | Study Design | Speaking Focus | Affective/Engagement Outcomes | Context | Journal
1 | Matiienko-Silnytska et al. (2025) [34] | Voice-based chatbot | Experimental | Fluency, discourse features | ↓ anxiety, ↑ WTC | University EFL | Arab World English Journal
2 | Li et al. (2025) [35] | AI pronunciation partner | Mixed-methods | Pronunciation | ↑ enjoyment, ↓ anxiety | University EFL | Learning, Culture and Social Interaction
3 | Ma et al. (2025) [36] | Generative AI conversational agent | Experimental (EEG-supported) | Fluency, confidence | ↑ confidence, interest | University EFL | Computers & Education
4 | Behforouz et al. (2025) [37] | ChatGPT-4o | Experimental | Accuracy, fluency | ↑ motivation | University EFL | International Journal of Interactive Mobile Technologies
5 | Rahman et al. (2025) [38] | AI chatbot | Mixed-methods | Pragmatic competence | ↓ anxiety, ↑ confidence | University ESL | Theory and Practice in Language Studies
6 | Shi & Shakibaei (2025) [39] | AI-integrated instruction | Quasi-experimental | Speaking skills | ↓ shyness, ↓ demotivation | Secondary EFL | European Journal of Education
7 | Yang et al. (2025) [40] | AI-supported interleaved training | Quasi-experimental | Impromptu speaking | ↑ engagement | University EFL | Asia-Pacific Education Researcher
8 | Zhou et al. (2025) [41] | AI chatbot + MABL | Quasi-experimental | Fluency, appropriacy | ↑ satisfaction | University EFL | Education and Information Technologies
9 | Juan et al. (2025) [42] | ChatGPT-integrated POA | Mixed-methods | Fluency, accuracy | ↓ anxiety | University EFL | Educational Sciences: Theory and Practice
10 | Pituxcoosuvarn et al. (2025) [43] | AI word-guessing partner | Experimental | Fluency, complexity | ↓ intimidation | University EFL | Information (Switzerland)
11 | Muniandy & Selvanathan (2025) [44] | ChatGPT | Mixed-methods | Speaking performance | ↑ engagement | University ESL | Teaching Public Administration
12 | Aljabr (2025) [45] | ASR (Speechnotes) | Mixed-methods | Pronunciation, prosody | ↑ confidence | University EFL | Journal of Ecohumanism
13 | Li et al. (2025) [46] | ASR | Mixed-methods | Speaking anxiety reduction | ↓ anxiety | University EFL | International Journal of Information and Education Technology
14 | Zheng et al. (2025) [47] | Large Language Model (GPT-4) | RCT | Oral proficiency | ↓ anxiety, ↑ WTC | University EFL | Computer Assisted Language Learning
15 | Lee (2025) [48] | AI vs. peer interaction | Within-subjects | Speaking proficiency | ↓ engagement over time | University EFL | English Teaching (South Korea)
16 | Dakhil et al. (2025) [49] | AI-mediated assessment | Experimental | Accuracy, fluency | ↑ WTC | University EFL | International Journal of Language Testing
17 | Hsu (2025) [50] | ASR + shadowing | Experimental | Pronunciation, fluency | ↑ motivation | University EFL | Journal of Computers in Education
18 | Tai & Chen (2025) [51] | Generative AI chatbot | Experimental | Speaking proficiency | ↓ anxiety | Secondary EFL | Computer Assisted Language Learning
19 | Kang (2025) [52] | ChatGPT-based chatbot | Experimental | Grammar, vocabulary | ↓ anxiety, ↑ WTC | University EFL | Korean Journal of English Language and Linguistics
20 | Li et al. (2025) [53] | ASR + AWE | Quasi-experimental | Speaking competence | ↑ confidence | University EFL | International Journal of Information and Education Technology
21 | Zhou et al. (2025) [54] | AI-assisted instruction | Experimental | Impromptu speaking | ↓ anxiety | University EFL | Interactive Learning Environments
22 | Li et al. (2025) [55] | ASR + AWE | Mixed-methods | Speaking competence | ↓ anxiety | University EFL | Cogent Education
23 | Shadiev et al. (2024) [56] | Speech-enabled corrective feedback | Experimental | Pronunciation, fluency | ↓ anxiety, ↑ confidence | University EFL | Computer Assisted Language Learning
24 | Kemelbekova et al. (2024) [57] | AI chatbots | Experimental | Speaking proficiency | Mixed attitudes | University EFL | XLinguae
25 | Tai & Chen (2024) [58] | Intelligent personal assistants | Experimental | Speaking proficiency | ↑ enjoyment | Secondary EFL | Computer Assisted Language Learning
26 | Bashori et al. (2021) [59] | ASR-enabled websites | Quasi-experimental | Vocabulary, pronunciation | ↓ anxiety, ↑ enjoyment | Secondary EFL | System
27 | Yan et al. (2024) [60] | AI corrective feedback | Qualitative | Speaking accuracy | Mixed affective impact | University EFL | Educational Sciences: Theory and Practice
28 | Zou et al. (2023) [33] | AI speech evaluation | Mixed-methods | Speaking skills | ↑ self-efficacy | University EFL | SAGE Open
29 | Tsai (2023) [61] | AR + ASR | Experimental | Listening & speaking | ↑ satisfaction | University EFL | Journal of Educational Computing Research
30 | Al Aufi et al. (2023) [62] | HTML5-based ASR | Exploratory | Pronunciation | ↑ autonomy | University EFL | Journal of Teaching English for Specific and Academic Purposes
31 | Hsu et al. (2023) [63] | Task-oriented chatbot | Experimental | Speaking practice | ↓ anxiety | University EFL | Interactive Learning Environments
32 | Yang et al. (2022) [64] | AI voice chatbot | Experimental | Interactional speaking | ↑ engagement | School EFL | ReCALL
33 | Ye et al. (2022) [65] | Smartphone chatbot | Experimental | Grammar, pronunciation | ↑ confidence | University EFL | International Journal of Mobile and Blended Learning
34 | Çakmak (2022) [66] | Chatbot-human interaction | Experimental | Speaking performance | Mixed anxiety effects | University EFL | Novitas-ROYAL
35 | Ahn & Lee (2016) [67] | Mobile ASR application | Qualitative | Speaking practice | Positive UX | Secondary EFL | British Journal of Educational Technology

Share and Cite

MDPI and ACS Style

Bhar, S.K. Artificial Intelligence in EFL Speaking Instruction: A Systematic Review of Pedagogical Design, Affective Conditions and Instructional Input. Encyclopedia 2026, 6, 74. https://doi.org/10.3390/encyclopedia6040074
