Systematic Review

Integrating Artificial Intelligence and Extended Reality in Language Education: A Systematic Literature Review (2017–2024)

College of Education, Purdue University, West Lafayette, IN 47907, USA
* Author to whom correspondence should be addressed.
Educ. Sci. 2025, 15(8), 1066; https://doi.org/10.3390/educsci15081066
Submission received: 3 July 2025 / Revised: 24 July 2025 / Accepted: 14 August 2025 / Published: 19 August 2025
(This article belongs to the Section Technology Enhanced Education)

Abstract

This systematic literature review examines the integration of Artificial Intelligence (AI) and Extended Reality (XR) technologies in language education, synthesizing findings from 32 empirical studies published between 2017 and 2024. Guided by the PRISMA framework, we searched four databases—ERIC, Web of Science, Scopus, and IEEE Xplore—to identify studies that explicitly integrated both AI and XR to support language learning. The review explores publication trends, educational settings, target languages, language skills, learning outcomes, and theoretical frameworks; analyzes how AI–XR technologies have been pedagogically integrated; and identifies affordances, challenges, design considerations, and future directions of AI–XR integration. Key integration strategies involve coupling XR environments with AI technologies such as automatic speech recognition, natural language processing, computer vision, and conversational agents to support skills such as speaking, vocabulary, writing, and intercultural competence. The reported affordances span technical, pedagogical, and affective dimensions. However, challenges persist in terms of technical limitations, pedagogical constraints, scalability and generalizability, ethical and human-centered concerns, and infrastructure and cost barriers. Design recommendations and future directions emphasize the need for adaptive AI dialogue systems, broader pedagogical applications, longitudinal studies, learner-centered interaction, scalable and accessible design, and systematic evaluation. This review offers a comprehensive synthesis to guide researchers, educators, and developers in designing effective AI–XR language learning experiences.

1. Introduction

The integration of emerging technologies in language education has been a growing area of interest among educators and researchers seeking to enhance pedagogical effectiveness and learner engagement, and to address critical needs and issues in language education (Bozkir et al., 2024; Godwin-Jones, 2023; Lowell & Yan, 2024; Yan & Lowell, 2024). Among these innovations, Artificial Intelligence (AI) and Extended Reality (XR), the latter encompassing Virtual Reality (VR), Augmented Reality (AR), Mixed Reality (MR), and the Metaverse, have demonstrated significant potential to transform the landscape of language learning (Yan et al., 2024). XR technologies can immerse students in contextualized, authentic environments conducive to experiential and situated learning, while AI enables adaptive, personalized support through intelligent feedback and data-driven instruction (Y.-L. Chen et al., 2022; Divekar et al., 2022; Li et al., 2024a; Yan & Lowell, 2025; Yan et al., 2024; Yang & Wu, 2024).
The affordances—the specific educational benefits and capabilities—of XR technologies in language learning are well-documented, with each component—VR, AR, MR, and the Metaverse—offering unique benefits (Hwang & Lee, 2024; Karacan & Akoğlu, 2021; Lowell & Yan, 2024; Tafazoli, 2024; Yan et al., 2024). VR creates simulated, interactive three-dimensional environments that promote deep engagement and authentic language practice (C. Chen et al., 2021; Lowell & Yan, 2023; Yan & Lowell, 2024), while AR enriches real-world settings with digital overlays to support contextualized learning (Nazeer et al., 2024; Parmaxi & Demetriou, 2020). MR merges virtual and physical elements, facilitating collaborative and situated learning interactions (J. Chen et al., 2022). The Metaverse expands these affordances into persistent, socially interactive spaces where learners can engage in meaningful, task-based language practices (Godwin-Jones, 2023). These XR technologies support situated learning by immersing learners in contexts that replicate real-world communicative scenarios (Yan & Lowell, 2024; Yan et al., 2024).
In parallel, AI has increasingly been adopted to augment language learning by providing scalable, personalized, and adaptive educational experiences (Y.-C. Chen, 2024; Li et al., 2024b; Makeleni et al., 2023). AI-driven applications—such as natural language processing (NLP), automatic speech recognition (ASR), text-to-speech (TTS), machine learning (ML), computer vision (CV) (e.g., object detection, gesture recognition, image detection), real-time feedback systems (RTFS), conversational agents (CA), multimodal AI agents (MMAA), machine translation (MT), and large language models (LLMs)—are capable of assessing learners’ speaking, listening, writing, and reading comprehension dynamically, supporting immediate corrective feedback and allowing for self-paced, autonomous personalized learning (Aslan et al., 2025; Crum et al., 2024; Escalante et al., 2023; Godwin-Jones, 2023; Li et al., 2024c; Liu, 2023; Yan & Lowell, 2025).
Given their respective strengths, AI and XR are increasingly recognized as synergistic technologies capable of facilitating authentic, learner-centered, and interactive language education (Rangel-de Lazaro & Duart, 2023). To date, however, systematic literature reviews have largely examined AI or XR independently in the context of language learning. For instance, Makhenyane (2024) focused on mobile-based AR in language education, while Schorr et al. (2024) synthesized design principles from AR-based foreign language learning environments. Similarly, Hamilton et al. (2020) and Huang et al. (2021) investigated immersive VR’s impact on language learning outcomes and user experiences. On the AI front, Crompton and Burke (2023) reviewed applications of AI in vocabulary acquisition and language instruction, and Almelhes (2023) explored AI tools for pronunciation training. Furthermore, Vall and Araya (2023) highlighted both the benefits and limitations of AI in language learning, particularly the importance of preserving human interaction. A recent review by Rangel-de Lazaro and Duart (2023) provided preliminary insights into AI and XR integration in online higher education, primarily in the post-pandemic context.
Despite this body of research, the existing literature largely investigates AI and XR in isolation: most studies focus on either technology independently, without exploring how their integration can jointly enhance language learning through complementary capabilities. There is currently no comprehensive systematic review that synthesizes how AI and XR are integrated in language education, nor are there established frameworks for understanding their integration strategies—such as how AI-driven adaptivity, feedback, or personalization is embedded within immersive XR environments to support language learning—alongside their combined affordances, challenges, design considerations, and future directions. This gap is significant given the increasing convergence of these technologies in educational settings. Therefore, the purpose of this systematic literature review is to examine how AI and XR have been integrated to support language learning, with a particular focus on their integration strategies, affordances, challenges, design considerations, and future research directions. To structure this investigation and provide a comprehensive understanding of the current landscape, this review is guided by the following research questions:
RQ1: What is the current landscape of AI–XR integration in language education in terms of publication trends, geographic distribution, educational contexts, reported learning outcomes, technological applications, and theoretical frameworks?
RQ2: How have AI and XR technologies been pedagogically integrated to support language learning, and what affordances have been reported in these implementations?
RQ3: What challenges, design considerations, and future research directions have been identified in the integration of AI and XR in language education?

2. Methodology

Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (Page et al., 2021; PRISMA, 2015), we employed a systematic approach encompassing a structured search, study selection, and data synthesis. The study selection process adhered to the four PRISMA phases of Identification, Screening, Eligibility, and Inclusion, as detailed below.

2.1. Search Strategy (Identification)

On 18 December 2023, we conducted comprehensive literature searches across four electronic databases: ERIC, Web of Science, Scopus, and IEEE Xplore. These databases were chosen to capture both educational and technical research perspectives. For instance, ERIC was selected for its extensive coverage of education literature, Web of Science for its high-quality, multidisciplinary indexing and citation tracking, Scopus for its broad academic coverage, and IEEE Xplore for its focus on engineering and computer science (relevant to AI and XR research). We devised Boolean search strings that combined synonyms for artificial intelligence, extended reality, and language education. For example, search terms included “AI” OR “Artificial Intelligence” (and related terms like “machine learning” or “natural language processing”) AND “XR” OR its variants “Virtual Reality”, “Augmented Reality”, “Mixed Reality”, “Extended Reality” (including abbreviations VR, AR, MR, XR, and “metaverse”) AND language learning keywords (e.g., “language learning”, “language teaching”, “language education”). These terms were adjusted for each database’s search syntax but maintained the same logic of requiring at least one AI-related term, one XR-related term, and a language education term in each result. This search strategy yielded a total of 924 records (before removing duplicates) across the four databases. All references were exported to the review management tool Rayyan (Ouzzani et al., 2016) for the next phase.
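To make the search logic concrete, the sketch below assembles the three required term groups into a generic Boolean string. The helper function, synonym lists, and resulting query are illustrative assumptions, not the verbatim strings submitted to each database, which were adapted to each platform's syntax.

```python
# Illustrative sketch: building a generic Boolean query from the three
# required term groups. Exact terms and field tags per database differ.
ai_terms = ["AI", "artificial intelligence", "machine learning",
            "natural language processing"]
xr_terms = ["XR", "extended reality", "virtual reality", "augmented reality",
            "mixed reality", "VR", "AR", "MR", "metaverse"]
edu_terms = ["language learning", "language teaching", "language education"]

def or_group(terms):
    """Join a synonym list into a parenthesized OR group."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Each record must match at least one term from every group.
query = " AND ".join(or_group(g) for g in (ai_terms, xr_terms, edu_terms))
print(query)
```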

2.2. Inclusion and Exclusion Criteria (Eligibility)

We defined clear inclusion and exclusion criteria a priori to guide the selection of studies, focusing on the relevance to AI–XR integration in language education:

2.2.1. Inclusion Criteria

Studies had to be published between 2017 and 2024 (to capture the latest developments in AI and XR technology), be written in English, and report empirical research (quantitative, qualitative, or mixed-methods) that explicitly combined an AI component with an XR component in the context of language learning or language education. We included both journal articles and peer-reviewed conference papers, provided they presented original data (e.g., experiments, design cases, or user studies) on AI–XR learning experiences for language teaching or learning. Only studies available in full text were considered.

2.2.2. Exclusion Criteria

We excluded publications that were literature reviews, opinion pieces, or purely conceptual papers without new empirical data. We also excluded studies that, despite appearing in the search results, did not actually focus on language learning (for example, works that applied AI or XR in other domains). Studies that involved only AI or only XR (but not both) were excluded, as our interest was specifically in their integration—for instance, a paper on an AI tutoring system without any XR component, or vice versa, would be omitted. Additionally, if a study mentioned AI or XR only in passing (e.g., in an introduction or as a future possibility) without integrating these technologies into the research design, it was excluded. This ensured that all included studies had a genuine dual focus on AI and XR in language education.

2.3. Screening and Selection

After running the searches, we imported all 924 references into Rayyan (Ouzzani et al., 2016), a web-based application designed to facilitate systematic review screening. Rayyan expedited the process by automating duplicate identification and allowing independent, blinded screening of titles and abstracts. First, Rayyan identified and removed 143 duplicate records among the 924, yielding a set of unique articles for screening. We then conducted a title and abstract screening on the remaining articles. Two reviewers (the first two authors of this review) independently screened each title/abstract against the inclusion criteria, marking studies as “include” or “exclude” within Rayyan. The tool’s semi-automation and blinding features helped maintain screening consistency and efficiency.
During the abstract screening, the two reviewers initially disagreed on a small number of studies (only eight conflicts arose). These conflicts were resolved through discussion and consensus, after which inter-rater agreement was very high, indicating consistent application of the criteria. All studies that either reviewer judged as potentially relevant were promoted to the next stage. In line with PRISMA recommendations, we documented the counts at each stage of screening (identification, screening, eligibility, inclusion) for transparency.
Following the title/abstract screening, we obtained the full texts of all remaining candidate studies and assessed each against the inclusion/exclusion criteria in detail (eligibility check). Studies that did not meet all inclusion criteria upon full-text reading (for example, a paper that turned out not to involve an XR component in practice) were excluded at this stage, with reasons noted (e.g., “not a language learning study,” “XR not actually used”). Finally, a total of 32 articles, listed in Appendix A, satisfied all criteria and were included in the review for data extraction and analysis. (Figure 1, the PRISMA flow diagram, summarizes the selection process, including the numbers of records identified, screened, excluded, and included at each phase.)
For each of the 32 included studies, we performed systematic data extraction using a predefined coding schema. We recorded key information from each study, as shown in Table 1.
Data extraction was carried out carefully to ensure accuracy, and critical data points for a subset of studies were cross-checked to validate consistency.
After extracting these details, we analyzed the data using a qualitative coding approach. We developed a coding framework that combined deductive categories (aligned with our research questions and the above data fields) and inductive coding to capture emergent themes. In practice, this meant that while we initially categorized studies by predefined aspects (e.g., types of AI, types of XR), we remained open to new patterns or themes that arose from the data. We employed a grounded coding methodology inspired by Corbin and Strauss’s grounded theory approach (Corbin & Strauss, 2014). This approach allowed patterns and themes to emerge from the data without being constrained by an overly rigid preconceived framework. For example, as we coded the studies, common themes began to surface, such as specific affordances of AI–XR integration (like real-time feedback in immersive environments) or recurring challenges (like technical complexity or lack of teacher training). We continuously refined our codes using the constant comparative method until we reached a stable set of themes that covered all findings.
Finally, we synthesized the results narratively, grouping the studies and their insights according to these emergent themes and the review questions. This qualitative synthesis was supported by simple descriptive statistics (e.g., counts of how many studies used VR vs. AR, how many focused on a certain language skill, etc.) to characterize the state of research. By following this rigorous methodology for search, selection, and analysis, we aimed to ensure that our review of AI–XR learning experiences in language education is transparent, reproducible, and comprehensive.
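As a concrete illustration of the descriptive statistics used in the synthesis, the minimal sketch below tallies coded studies by category. The records shown are hypothetical placeholders, not the actual coded corpus.

```python
# Minimal sketch of the descriptive tallies used to characterize the corpus.
from collections import Counter

coded_studies = [  # hypothetical coded records for illustration only
    {"id": "S01", "xr": "VR", "skill": "speaking"},
    {"id": "S02", "xr": "AR", "skill": "vocabulary"},
    {"id": "S03", "xr": "VR", "skill": "speaking"},
]

xr_counts = Counter(s["xr"] for s in coded_studies)
skill_counts = Counter(s["skill"] for s in coded_studies)
print(xr_counts)     # e.g., Counter({'VR': 2, 'AR': 1})
print(skill_counts)  # e.g., Counter({'speaking': 2, 'vocabulary': 1})
```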

3. Results

3.1. Demographic Information of Selected Studies

3.1.1. Geographic Locations

Table 2 and Figure 2 present the geographic distribution of authors who have contributed to the selected publications on AI–XR learning experiences in language education, published between 2017 and 2024.
Table 2 highlights notable diversity in the authors’ geographic locations, indicating broad interest in AI–XR learning experiences in language education research across different regions. Twenty-nine studies involved international collaboration (e.g., Guo et al., 2017; Hwang et al., 2024; H. Lee et al., 2023; Shadiev et al., 2021). For entries with authors from multiple countries, each country was counted individually so that the contribution of every country listed in a multi-country entry is accurately represented.
The author distribution across the 32 articles shows a strong regional concentration in Asia, contributing 55% of the studies. Mainland China leads with 10 articles, followed by South Korea (5), Taiwan (4), Japan (2), and one each from Hong Kong and Singapore. North America follows with 9 articles (8 from the U.S., 1 from Canada), while Europe is moderately represented with 6 articles from the UK (3), Spain (1), Italy (1), and Russia (1). Africa and South America contribute only two articles each, from countries such as Morocco, Egypt, Brazil, and Uruguay. These findings highlight research concentration in East Asia and North America, with limited involvement from other regions, suggesting disparities in research activity and capacity.

3.1.2. Publication Trend

Figure 3 illustrates the publication trend from 2017 to 2024, highlighting growing scholarly interest in AI-integrated XR learning experiences in language education. Starting with only one publication each in 2017 and 2018, the number of articles surged to eight in 2019, marking the initial rise in research activity. Output then declined to three publications each in 2020 and 2022, with none in 2021, likely due to the COVID-19 pandemic. The trend rebounded with six articles in 2023, and the upward momentum continued, culminating in a peak of ten publications in 2024. This progression suggests increasing academic engagement with, and recognition of, this interdisciplinary research area over time.

3.1.3. Scholarly Sources by Type

Figure 4 illustrates the distribution of publication types among the selected articles. Eighteen conference proceedings represent the largest share (56%), followed by 13 journal articles (41%) and a single book chapter (3%), reflecting a strong emphasis on conference-based dissemination in this research domain.
The predominance of conference proceedings in AI–XR integration research for language education can be attributed primarily to the field’s emerging and interdisciplinary nature, which encourages rapid dissemination of innovative but preliminary findings. In reviewing the papers, we found that these researchers often focus on prototype development, system design, and applied solutions rather than fully validated theoretical models, aligning better with the practical orientation of conferences. In addition, we believe that the fast-paced evolution of AI and XR technologies incentivizes early sharing of research through conferences with shorter publication cycles, allowing timely feedback and relevance. Journals, requiring more extensive empirical validation and theoretical depth, are less suited to the early-stage, design-based studies that currently characterize much of the AI–XR integration work in language learning.

3.1.4. Educational Levels

Figure 5 displays the distribution of studies across educational levels: Higher Education, K-12, both Higher Education and K-12, and Not Specified. Among the 32 selected studies, a significant majority (22 articles, 69%) were conducted in higher education contexts, highlighting a strong research emphasis on the application of AI-integrated XR technologies at the tertiary level. In contrast, only five articles (16%) focused on the K-12 setting, and two studies (6%) addressed both K-12 and higher education contexts. Additionally, three studies (9%) did not specify the educational level, often concentrating on general language learning without situating the research within a particular stage of education. This distribution indicates that the integration of AI and XR in language education is currently more prevalent in higher education, potentially due to more advanced technological infrastructure, better funding, and stronger research engagement in university settings.

3.1.5. Target Languages

Figure 6 illustrates that a predominant focus was placed on English language learning, with 18 articles (55%) targeting English as the language of instruction or acquisition. Mandarin Chinese followed with five studies (15%), while Japanese appeared in two articles (6%). The remaining languages—Amazigh, Arabic, French, Russian, and Spanish—were each addressed in a single study (3% respectively), and three studies (9%) did not specify the language being taught. This distribution reveals a strong concentration of research efforts on English, reflecting its status as a global lingua franca and the widespread demand for English language proficiency in academic, professional, and digital domains. The limited representation of other languages suggests a need for broader exploration into how AI and XR can support diverse linguistic and cultural contexts, particularly for less commonly taught or regional languages.

3.1.6. Language Foci

Figure 7 illustrates the distribution of language skills emphasized across the 32 selected studies. Speaking skills emerged as the most prominently targeted area, with over half of the studies (n = 16) focusing on oral communication. Listening skills were addressed in three studies, while pronunciation training was explicitly targeted in two. A smaller number of studies focused on other foundational or receptive skills, including reading, grammar, and alphabet learning, each covered by a single study. Three studies explored cross-cultural communication and intercultural competence, emphasizing the integration of language learning with authentic cultural contexts through immersive simulations or AI-enhanced exploratory tasks. Notably, seven studies did not specify a particular language skill focus, often due to broader exploratory or integrative aims, such as examining learner experience, system architecture, or pedagogical design within AI–XR language learning environments.

3.1.7. Number of Participants

Figure 8 indicates that among the 32 reviewed studies, sample sizes varied across those that included empirical testing. Specifically, seven studies involved fewer than 30 participants, three studies had between 30 and 60 participants, and another seven reported sample sizes between 61 and 100. Notably, 15 studies did not specify a sample size, as many of them focused on exploratory system development or application design and did not include user studies or empirical testing.

3.1.8. Length of Intervention

The duration of experiments varied across the reviewed studies, as shown in Figure 9. Seven studies were conducted as single-session interventions, while four lasted between one and four weeks. Two studies spanned five to eight weeks, and five extended over a longer period, ranging from 10 to 32 weeks. However, 14 studies did not specify the duration of the intervention, often due to their focus on system development or conceptual exploration rather than empirical implementation.

3.1.9. Learning Outcomes

As shown in Table 3, the studies reported a variety of cognitive, affective, linguistic, and cultural learning outcomes. Affective outcomes included increased motivation (n = 11), enhanced engagement (n = 10), and reduced anxiety (n = 3). Linguistic outcomes were reported in areas such as vocabulary acquisition (n = 6), overall speaking proficiency (n = 3), listening comprehension (n = 4), speaking fluency (n = 2), and pronunciation improvement (n = 1). Cognitive outcomes involved knowledge transfer and retention (n = 3), learner autonomy (n = 2), enhanced interaction (n = 3), and personalized learning experiences (n = 1), reflecting the development of higher-order thinking skills and tailored learning processes. Cultural learning outcomes were identified in two studies (n = 2), highlighting gains in intercultural awareness and competence. Notably, 10 studies did not specify the learning outcomes.

3.1.10. XR Applications

Table 4 illustrates a strong emphasis on immersive and interactive language learning through XR technologies, with 19 studies utilizing VR, 7 incorporating AR, 3 employing MR, and 3 exploring Metaverse-based applications. These modalities were employed to enhance different dimensions of language acquisition, such as speaking, listening, writing, reading, and cross-cultural competence, by immersing learners in authentic and context-rich environments.
VR emerged as the most prevalent modality, adopted in 19 studies. These VR applications were realized through a range of design platforms and assets. Four studies used Unity 3D, a versatile game engine favored for creating interactive 3D content (e.g., Bottega et al., 2023; Park et al., 2019; Smuts et al., 2019; Tazouti et al., 2019). Two others combined Unity 3D with 3D image-based environments (e.g., Y.-L. Chen et al., 2022; Yang & Wu, 2024), while 360° video was utilized in two cases to enable panoramic, immersive visualizations for contextual writing tasks (e.g., Shadiev et al., 2021; Y. Wang et al., 2022). Other formats included the use of static 3D images (n = 3) for environment-based vocabulary learning (e.g., Nakamura et al., 2024; Seow, 2023; Song et al., 2023), and commercially available VR software (n = 2) tailored for educational deployment (e.g., Y.-C. Chen, 2024; Gorham et al., 2019). Notably, six studies did not specify the underlying development software.
AR was employed in seven studies and offered more mobile, accessible XR formats by overlaying virtual content onto the physical environment. Four AR studies did not report their technical development frameworks, while the rest used platforms such as Unity 3D, often coupled with tools like 3D Max and Vuforia to support object tracking and real-time interaction (Xin & Shi, 2024). One study utilized 3D images within the AR framework for vocabulary enhancement (Tolba et al., 2024), and one study used Unity 3D to design an AR application that allows users to practice their target language with immediate feedback at any time and from any location (Hollingworth & Willett, 2023). These AR tools emphasized context-sensitive learning and spontaneous speaking, fostering vocabulary retention and pronunciation practice.
MR was used in three studies, which blended virtual and physical spaces more seamlessly than AR. A notable example is the use of a 360° cylindrical panoramic environment for collaborative, multimodal learning (Allen et al., 2019; Chabot et al., 2020; Divekar et al., 2022). These MR environments allowed multiple learners to engage in real time with both peers and digital objects via gesture and voice input, providing culturally immersive language practice grounded in situated learning theory.
Metaverse, represented in three studies, reflects a growing interest in persistent, networked virtual environments for long-term language engagement. These studies deployed 3D image-based platforms (S. Lee et al., 2025) or commercial metaverse solutions (Hwang et al., 2024) to create social XR spaces for language use, identity exploration, and intercultural communication. One study used Unity 3D to build a persistent, avatar-based world for immersive conversational English practice (Yu, 2023). These applications extend beyond task-based learning, supporting social presence, agency, and motivation in language development.
Unity 3D emerged as the most frequently used development platform across VR, AR, MR, and Metaverse studies, reflecting its flexibility for immersive instructional design. Other tools such as Vuforia, 3D Max, and commercial XR platforms (e.g., SPOT Virtual Program) were also employed to support object recognition, animation, and scalability. Regardless of the specific XR modality, these systems consistently emphasized authentic, situated, and multimodal learning experiences to scaffold learners’ engagement and proficiency across a range of language domains.

3.1.11. AI Applications

As shown in Table 5 and Figure 10, a wide array of AI technologies was employed to enhance language learning experiences, each contributing unique affordances to support learners’ language development. Automatic Speech Recognition (ASR) emerged as the most frequently used AI technology, appearing in 22 studies. ASR systems enable real-time transcription of learners’ spoken language, providing immediate feedback on pronunciation and fluency. This technology is particularly useful for supporting speaking practice and fostering self-awareness of oral language errors (e.g., Allen et al., 2019; Bottega et al., 2023; Chabot et al., 2020; Y.-C. Chen, 2024; Divekar et al., 2022).
Natural Language Processing (NLP) was the second most common technology, found in 14 studies. NLP tools are widely utilized for analyzing learners’ written or spoken inputs to provide semantic, syntactic, and grammar-based feedback. For example, NLP-powered applications help learners revise written texts or analyze dialogue coherence in speaking tasks, offering personalized support and enhancing learners’ awareness of language structure and usage (e.g., Guo et al., 2017; Hollingworth & Willett, 2023; H. Lee et al., 2023). Closely following NLP, Computer Vision (CV) was incorporated in 13 studies, especially those involving object detection, image recognition, gesture recognition, or motion tracking. CV technologies detect and interpret learners’ non-verbal communication, body posture, and facial expressions, contributing to more immersive and multimodal learning environments (e.g., Hajahmadi et al., 2024; H. Lee et al., 2023; Mirzaei et al., 2018; Song et al., 2023).
Text-to-speech (TTS) systems were used in 11 studies. These technologies convert written text into spoken language, supporting listening comprehension and pronunciation modeling for learners (e.g., Nakamura et al., 2024; Obari et al., 2020; Park et al., 2019; Seow, 2023). In addition, Real-Time Feedback Systems (RTFS) and Conversational Agents (CA) were each implemented in seven studies. RTFS provided automated evaluative feedback or scaffolding mechanisms that adapted to learners’ performance, while CAs facilitated human–computer interaction through dialogues with AI-powered avatars or chatbots that simulated conversational practice (e.g., Y.-C. Chen, 2024; Guo et al., 2017; Hajahmadi et al., 2024; Shukla et al., 2019).
Large Language Models (LLMs), such as GPT-based systems, appeared in six studies. These systems were used to generate fluent responses, evaluate learner input, or support creative writing and reflection tasks (e.g., Bottega et al., 2023; Hajahmadi et al., 2024; Hollingworth & Willett, 2023). Meanwhile, Machine Learning (ML) algorithms were employed in five studies to predict learner behavior, personalize content, or improve the accuracy of speech or text evaluation (e.g., Divekar et al., 2022; Gorham et al., 2019; Shukla et al., 2019). Multimodal AI Agents (MMAA), which integrate speech, vision, and gesture input/output modalities, were reported in three studies, contributing to highly interactive, embodied learning environments (e.g., Y.-L. Chen et al., 2022; Divekar et al., 2022; Tazouti et al., 2019). Finally, only one study employed MT, underscoring a relatively limited but targeted use of AI for cross-linguistic support in language learning contexts (Shadiev et al., 2021).
This distribution reveals a clear emphasis on speech-related AI technologies—such as ASR and TTS—highlighting the field’s interest in enhancing oral proficiency through automated means. At the same time, the increasing use of LLMs and MMAAs reflects a growing trend toward more integrated, adaptive, and immersive learning experiences.

3.1.12. Theoretical Framework

Figure 11 indicates that learning theories or design frameworks are rarely applied in the 32 reviewed studies. Notably, 27 articles do not specify the theoretical underpinnings guiding their interventions or design decisions. Only a few studies explicitly draw on established theories: Smuts et al. (2019) employ Cognitive Learning Theory to design and evaluate a VR-AI environment for teaching English prepositions; Gorham et al. (2019) apply Embodied Cognition Theory to highlight the affordances of immersive VR combined with machine learning for writing Japanese kanji characters; and Y.-L. Chen et al. (2022) adopt Engagement Theory to demonstrate how a contextualized AI-VR environment enhances learner engagement. Yu (2023) proposes an immersive and adaptive virtual learning system grounded in Experiential Learning, Adaptive Learning, and Maslow’s Hierarchy of Human Needs to deliver personalized instruction. Additionally, Shadiev et al. (2021) integrate Cultural Convergence Theory, Contextual Learning Theory, and the Cross-Cultural Competence Model to guide the design of a 360-degree video-based, AI-supported learning activity aimed at fostering cross-cultural communication.

3.2. Integration Strategies and Integration Affordances

3.2.1. Integration Strategies

Table 6 presents the various integration strategies used in speaking, listening, writing, vocabulary, and cross-cultural learning, the primary language foci of the selected studies.
AI–XR Integration Strategies in Speaking
In 15 studies, four key strategies emerge for enhancing speaking skills through the integration of XR with different AI technologies—ASR, TTS, CA, and CV—as indicated in Table 6.
Integration of XR with Automatic Speech Recognition (ASR)
The most prevalent integration strategy involves coupling XR environments with ASR technologies. This pairing allows learners to engage in immersive speaking tasks where their speech is captured, transcribed, and analyzed in real time. ASR provides immediate, automated feedback on pronunciation, fluency, and accuracy, enabling learners to self-monitor and adjust their speaking performance. For example, in VR environments (Y.-C. Chen, 2024; Guo et al., 2017), learners practice conversation or public speaking in contextually rich scenarios where ASR transcribes their utterances and assesses performance. In MR settings (Divekar et al., 2022), ASR enables learners to interact with holographic agents and spatial interfaces through voice commands, encouraging natural language use and embodied engagement. AR-based systems (Hollingworth & Willett, 2023; H. Lee et al., 2023; Shukla et al., 2019) similarly utilize ASR to recognize learner speech during guided language tasks anchored in real-world settings. This strategy is particularly effective for developing pronunciation, fluency, and real-time interaction skills, supporting situated and experiential learning in both simulated and authentic contexts.
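The minimal sketch below illustrates the feedback loop this strategy describes: capture a learner utterance, transcribe it, and score it against a target phrase. The speech_recognition library and the string-similarity heuristic are assumptions for illustration; the reviewed systems embedded a variety of ASR engines and scoring methods.

```python
# Minimal sketch of an ASR speaking-feedback loop, assuming the
# speech_recognition package (and a microphone via PyAudio) is available.
import difflib
import speech_recognition as sr

def speaking_feedback(target_phrase: str) -> str:
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:          # capture one learner utterance
        audio = recognizer.listen(source)
    try:
        transcript = recognizer.recognize_google(audio)  # cloud ASR service
    except sr.UnknownValueError:
        return "Speech not recognized; please try again."
    # Crude accuracy score: string similarity between target and transcript.
    score = difflib.SequenceMatcher(
        None, target_phrase.lower(), transcript.lower()).ratio()
    return f"You said: '{transcript}' (match with target: {score:.0%})"

print(speaking_feedback("Could I have the bill, please?"))
```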
Integration of XR with Text-to-Speech (TTS)
A second strategy involves integrating XR with TTS technologies to simulate natural language output from AI agents or virtual interlocutors. TTS enables XR systems to deliver verbal feedback, instructions, or conversational responses, enriching the immersive experience by giving learners exposure to native-like speech models.
In Y.-L. Chen et al. (2022), VR was used in conjunction with a robot tutor that communicated with learners through synthesized speech generated by a TTS engine. This allowed for more engaging and interactive exchanges. Similarly, in Park et al. (2019), TTS was employed in a VR system to provide learners with auditory models of target phrases, reinforcing correct pronunciation and intonation patterns. The integration of TTS is particularly beneficial for listening comprehension, pronunciation modeling, and turn-taking practice, as learners not only speak but also respond to spoken language in context, mimicking natural conversational dynamics.
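A minimal sketch of the output side of this strategy appears below: a virtual interlocutor voices a model phrase for the learner to imitate. The pyttsx3 engine and the reduced speaking rate are illustrative stand-ins for whatever TTS component a given XR system embeds.

```python
# Sketch of TTS-based pronunciation modeling, assuming the offline
# pyttsx3 engine as a stand-in for an embedded XR speech synthesizer.
import pyttsx3

def speak_model_phrase(phrase: str, rate: int = 150) -> None:
    engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # slow the voice slightly for learners
    engine.say(phrase)
    engine.runAndWait()               # block until playback finishes

speak_model_phrase("Where is the nearest train station?")
```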
Integration of XR with AI-Driven Conversational Agents (CA)
A more advanced strategy combines XR with CAs or chatbots that can interpret learner input (via ASR and NLP) and generate appropriate, context-aware responses. These agents are often embedded in VR or AR environments, functioning as virtual speaking partners or tutors.
Yun et al. (2024) and Mirzaei et al. (2018) incorporated such agents into their systems, enabling learners to engage in task-based or scenario-driven dialogues. The agents responded to learner input with semantically and grammatically appropriate replies, allowing for sustained interaction. Some systems also included adaptive feedback mechanisms, using learner data to tailor responses and scaffold learning progressively. This strategy is especially effective for practicing dialogue, pragmatic competence, and interactive speaking skills, as it mirrors real-world communication through responsive and intelligent conversational flow.
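The sketch below shows this interaction pattern in its simplest form: interpret an already transcribed learner utterance and return a scenario-appropriate reply. The keyword-based intent matching is a deliberately simplified, hypothetical stand-in for the NLP and LLM pipelines the reviewed systems actually use.

```python
# Toy conversational-agent loop for a role-play scenario; the scenario,
# intents, and replies are invented for illustration.
SCENARIO = "ordering at a cafe"

INTENTS = {
    ("coffee", "tea", "latte"): "Certainly. Would you like it for here or to go?",
    ("how much", "price", "cost"): "That will be three dollars fifty.",
    ("thank",): "You're welcome. Enjoy your drink!",
}

def agent_reply(learner_utterance: str) -> str:
    text = learner_utterance.lower()
    for keywords, reply in INTENTS.items():
        if any(k in text for k in keywords):
            return reply
    # Fallback keeps the dialogue anchored in the scenario.
    return f"Sorry, could you say that another way? (Scenario: {SCENARIO})"

print(agent_reply("I'd like a latte, please."))
print(agent_reply("How much does it cost?"))
```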
Integration of XR with Computer Vision (CV)
CV technologies—such as gesture tracking, facial recognition, and object detection—have been integrated into XR-based speaking environments to interpret learners’ non-verbal cues and promote more holistic language learning. For example, J.-H. Wang et al. (2020) used Microsoft Kinect v2.0 to capture skeletal and motion data, offering feedback on body language in VR-based speaking tasks. Similarly, Mirzaei et al. (2018) mirrored learners’ gestures onto avatars using motion capture, enhancing embodied interaction. These applications support the development of non-verbal fluency and presentation skills alongside spoken language.
Object detection, a key application of computer vision in XR environments, has been employed to enhance speaking skills by contextualizing conversations and enabling real-time interaction with visual elements. In Y.-L. Chen et al. (2022), a Unity AI plug-in enabled object recognition, allowing a robot to perceive and respond to learners’ visual inputs during language tasks, thereby simulating more natural, visually grounded dialogues. Bottega et al. (2023) integrated YOLOv5 for real-time object detection in language learning games, enabling learners to identify, track, and refer to objects during conversations, thus reinforcing vocabulary and situational fluency. Similarly, H. Lee et al. (2023) employed YOLOv7-tiny with AR glasses to detect surrounding objects and dynamically tailor conversation prompts, fostering more authentic and context-aware speaking practice.
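As a rough illustration of this pipeline, the sketch below loads a pretrained YOLOv5 model, detects objects in an image of the learner's surroundings, and turns the labels into speaking prompts. The model variant, placeholder image path, and prompt template are illustrative assumptions, not the configurations reported in the cited studies.

```python
# Sketch of object-detection-driven conversation prompts using YOLOv5
# via torch.hub (downloads model weights on first run).
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

results = model("scene.jpg")  # "scene.jpg" is a placeholder image path
labels = results.pandas().xyxy[0]["name"].unique()  # detected class names

for obj in labels:
    # Turn each detected object into a speaking prompt grounded in the scene.
    print(f"I can see a {obj}. Can you describe the {obj} in English?")
```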
AI–XR Integration Strategies in Listening
In three studies, two key strategies emerge for enhancing listening skills through the integration of XR with different AI technologies—ASR and NLP.
Integration of XR with Automatic Speech Recognition (ASR)
The first strategy combines XR with ASR to support real-time interaction and listening comprehension. For instance, in Guo et al. (2017), learners enter fully immersive VR environments where they engage in simulated conversations with virtual agents. ASR technology captures and transcribes the learners’ spoken input, which not only helps assess their speech but also enables the system to respond appropriately. This interaction fosters active listening as learners must comprehend spoken prompts in authentic contexts and respond meaningfully, promoting a two-way exchange that mirrors real-life communication.
Integration of XR with Natural Language Processing (NLP)
The second strategy integrates XR with NLP to enhance understanding of complex language input and provide adaptive support. In Allen et al. (2019), learners interact in an MR environment enriched with AI agents capable of processing learner responses using NLP. These agents analyze semantic content and adjust the dialogue flow accordingly, offering clarification or repetition when comprehension issues are detected. Unlike ASR, which focuses on transcribing spoken language, NLP interprets meaning and intent, making it possible to scaffold listening tasks based on learner needs. This strategy allows for more personalized and context-aware support, deepening the learner’s engagement with the listening material.
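A minimal sketch of this kind of NLP-based scaffolding follows: check whether a learner's reply covers the expected content of a listening prompt and offer targeted repetition when it does not. The spaCy model and the lemma-overlap heuristic are assumed simplifications of the semantic analysis real agents perform.

```python
# Lemma-overlap comprehension check, assuming spaCy and its small
# English model (en_core_web_sm) are installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def comprehension_check(learner_reply: str, expected_keywords: set) -> str:
    lemmas = {tok.lemma_.lower() for tok in nlp(learner_reply) if tok.is_alpha}
    missing = expected_keywords - lemmas
    if not missing:
        return "Good: your answer covers the key points."
    # Scaffold listening by flagging the content the learner missed.
    return f"Let me repeat that part; listen for: {', '.join(sorted(missing))}"

print(comprehension_check("The train leaves at nine.", {"train", "leave", "ticket"}))
```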
AI–XR Integration Strategies in Writing
In three studies, three strategies emerge for enhancing writing skills through the integration of XR with different AI technologies—RTFS, TTS, and ML.
Integration of XR with Real-Time Feedback System (RTFS)
One key strategy involves the integration of XR with RTFS to support context-rich writing tasks. In Y. Wang et al. (2022), students engaged with spherical video-based VR (SVVR) environments that simulated authentic cultural settings. After completing writing tasks related to the immersive scenarios, an AI-driven feedback system provided individualized suggestions to improve accuracy and coherence. This combination of experiential immersion and adaptive feedback supports the development of situationally appropriate written expression.
Integration of XR with Text-to-Speech (TTS)
Another approach combines XR with TTS technologies to aid vocabulary acquisition and writing fluency. Xu et al. (2019) used an AR-based mobile app that allowed learners to write English words by hand on cards, which were then scanned and read aloud using TTS. This multimodal interaction reinforced spelling and pronunciation while encouraging learners to write more confidently and accurately.
Integration of XR with Machine Learning (ML)
A third strategy applies ML techniques in XR-enhanced environments for post-hoc analysis of writing performance. In Gorham et al. (2019), students used VR tools (Oculus Rift with Kingspray) for writing-related tasks, and unsupervised learning methods (K-means clustering) were later used to categorize writing patterns. This allowed researchers to evaluate writing development trends across learners, offering insights for refining instructional design.
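To illustrate this post-hoc analysis, the sketch below clusters hypothetical per-learner writing features with K-means. The feature set and the choice of three clusters are assumptions for illustration, not the setup used by Gorham et al. (2019).

```python
# Sketch of unsupervised K-means clustering of writing-performance features.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-learner features: [words written, error rate, minutes on task]
features = np.array([
    [120, 0.10, 15],
    [300, 0.04, 25],
    [ 90, 0.15, 10],
    [280, 0.05, 30],
    [150, 0.12, 12],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
for learner, cluster in enumerate(kmeans.labels_):
    print(f"Learner {learner}: writing-pattern cluster {cluster}")
```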
AI–XR Integration Strategies in Vocabulary
In three studies, two strategies emerge for learning vocabulary through the integration of XR with different AI technologies—ASR and CV.
Integration of XR with Automatic Speech Recognition (ASR)
One key strategy involves the integration of XR with ASR to facilitate active vocabulary practice. In Chabot et al. (2020), learners interacted with immersive XR environments using voice input, where ASR converted their spoken words into text. This allowed learners to actively produce target vocabulary in context-rich scenarios and receive immediate feedback, reinforcing both pronunciation and retention. Similarly, Allen et al. (2019) used ASR within MR environments to enable learners to listen to and repeat vocabulary terms, with the system evaluating accuracy in real time. These applications support vocabulary acquisition by linking verbal practice to interactive, immersive experiences.
Integration of XR with Computer Vision (CV)
Another approach leverages XR with CV to enhance object-based vocabulary learning. For example, Song et al. (2023) employed object detection through CV to identify and label real-world items in AR environments. Learners interacted with these objects using mobile AR devices, receiving vocabulary prompts and contextual explanations. This strategy bridges digital content with physical surroundings, helping learners associate new words with visual and spatial cues in authentic settings.
AI–XR Integration Strategies in Cross-Cultural Learning
In three studies, two key strategies emerge for enhancing cross-cultural learning through the integration of XR with different AI technologies—ASR and CA.
Integration of XR with Automatic Speech Recognition (ASR)
A key integration strategy in cross-cultural language learning involves combining VR with ASR to support immersive, interactive speaking practice. For instance, Yang and Wu (2024) used fully immersive 3D VR environments where learners engaged in conversations with virtual characters, while ASR enabled real-time feedback on their spoken input. Similarly, Shadiev et al. (2021) employed 360-degree VR videos enhanced with ASR, allowing learners to interact with culturally rich scenarios through speech, which was transcribed and assessed to guide learning. These approaches merge authentic cultural exposure with immediate language feedback, fostering both linguistic and intercultural competence.
Integration of XR with Conversational Agent (CA)
Another strategy involves integrating AI-powered CAs in VR. In Mirzaei et al. (2018), learners interacted with responsive avatars in role-play scenarios, receiving contextualized, voice-based feedback. This setup promotes pragmatic awareness and fluency by situating learners in realistic cross-cultural dialogues. Together, these strategies highlight how AI–XR environments can create meaningful, culturally grounded language learning experiences.
The AI–XR integration and learning benefits are summarized in Figure 12.

3.2.2. Affordances

AI–XR integration in language education affords learners immersive, interactive, and individualized experiences that support not only skill development but also motivation, autonomy, and well-being. These affordances represent a paradigm shift from conventional classroom instruction toward adaptive, multimodal, and learner-centered environments.
Technical Affordances
AI-integrated XR systems provide advanced multimodal input/output capabilities that enable learners to engage in naturalistic, real-time interaction through speech, gesture, gaze, and spatial movement. Many platforms support hands-free, headset-free experiences through panoramic visuals or AR overlays (e.g., Divekar et al., 2022; Chabot et al., 2020; Hajahmadi et al., 2024), while others use speech technologies such as ASR and TTS, along with CAs, to create interactive, responsive environments (e.g., H. Lee et al., 2023; Park et al., 2019). Systems often include real-time transcription, pronunciation evaluation, grammar correction, and vocabulary scaffolding, supported by RTFS. Furthermore, scalability is enhanced through web-based or mobile deployment, markerless AR, and integration with LMS dashboards (e.g., Xu et al., 2019; Yun et al., 2024).
Pedagogical Affordances
These systems allow learners to engage in authentic, contextualized, and task-based learning scenarios—such as virtual marketplaces, tourist sites, or role-plays in restaurants—that mirror real-life language use (e.g., Allen et al., 2019; Y.-L. Chen et al., 2022). Many platforms support learner autonomy, enabling self-paced exploration and dialogic interaction with AI avatars or chatbots (e.g., Guo et al., 2017; Nakamura et al., 2024). AI personalization further enhances these environments by adapting content, difficulty, and feedback based on learner behavior, performance, or preferences (e.g., Seow, 2023; Yang & Wu, 2024). Gamified learning elements—like quests, rankings, and interactive simulations—promote engagement and retention while supporting differentiated instruction (e.g., Tazouti et al., 2019; Yu, 2023). Reflection and self-regulation are also encouraged through features such as conversation recording, envisioning, and feedback dashboards (e.g., Kizilkaya et al., 2019; Mirzaei et al., 2018).
Affective Affordances
AI-integrated XR learning experiences often reduce learner anxiety by providing private, judgment-free environments for practicing language skills, especially speaking, without the fear of social embarrassment (e.g., Y.-C. Chen, 2024; Hollingworth & Willett, 2023). These environments support inclusivity by offering adaptive interfaces and visual-semantic support for learners with diverse needs, such as those with dyslexia (e.g., Hajahmadi et al., 2024). Intercultural competence is supported through simulated cross-cultural interactions and native-speaker avatars (e.g., Hwang et al., 2024; Obari et al., 2020).

3.3. Challenges, Design Considerations and Future Directions

3.3.1. Challenges

While AI–XR integration offers significant promise in transforming language education, it is still limited by technical immaturity, lack of scalability, instructional gaps, ethical concerns, and high implementation costs.
Technical Limitations
Many systems face limitations in AI capability, such as a lack of real-time adaptive dialogue, poor speech recognition accuracy, and limited natural interaction (e.g., Y.-L. Chen et al., 2022; H. Lee et al., 2023; Park et al., 2019). Dialogue models often rely on templates or predefined responses, which reduces interactivity and personalization (Chabot et al., 2020; Guo et al., 2017). AR and VR platforms also struggle with hardware constraints like display resolution, camera tracking, and latency (Hollingworth & Willett, 2023; Xin & Shi, 2024).
Pedagogical Constraints
Several studies highlight limitations in instructional design and learning effectiveness. AI-integrated XR learning experiences often lack long-term empirical validation and are mostly evaluated in short-term or low-sample-size studies (Y.-C. Chen, 2024; Divekar et al., 2022). Some tools show limited improvement in language proficiency, especially when they emphasize scripted interactions or neglect grammar correction (Nakamura et al., 2024; Obari et al., 2020). Cognitive overload is also a concern, particularly when learners must navigate complex multimodal environments or simultaneously process visual and verbal input (Seow, 2023; Y. Wang et al., 2022).
Scalability and Generalizability
Many projects rely on small pilot studies and limited sample sizes, which hinders the generalizability of results (Y.-L. Chen et al., 2022; Song et al., 2023). Manual setup, such as content tagging and system calibration, further restricts scalability (Chabot et al., 2020). Additionally, some systems are still in the prototype stage or focused on narrow learning goals (e.g., only one phoneme or specific dialogue types), limiting their applicability to broader language curricula (Tazouti et al., 2019; Tolba et al., 2024).
Ethical and Human-Centered Concerns
AI systems often lack the emotional intelligence of human teachers, which limits their ability to provide empathetic or nuanced feedback (Hajahmadi et al., 2024). Learners may become overreliant on AI-generated feedback, potentially affecting their ability to self-regulate or seek human interaction (Yu, 2023). Issues such as data privacy, screen time management, and the psychological safety of immersive environments remain unresolved (Bottega et al., 2023; Hwang et al., 2024).
Infrastructure and Cost Barriers
AI–XR systems often require sophisticated hardware (e.g., HMDs, Kinect sensors, AR glasses) and technical support, which can be cost-prohibitive for many educational institutions (Allen et al., 2019; J.-H. Wang et al., 2020). Development and maintenance costs are high due to the need for interdisciplinary collaboration among educators, designers, and programmers (Yang & Wu, 2024). This limits accessibility, particularly in low-resource or non-urban educational settings. Some systems also demand high digital literacy from both learners and teachers (Hajahmadi et al., 2024; S. Lee et al., 2025).

3.3.2. Design Considerations

The instructional design considerations identified across the reviewed studies fall into five interconnected categories: technological design, pedagogical alignment, learner-centered interaction, accessibility and scalability, and evaluation and iteration. These categories reflect the thoughtful integration of technical architecture with educational principles and user needs.
Technological Design
Many studies emphasize modular and flexible system architectures to support scalability and integration with different devices and platforms. Unity 3D was the preferred development environment due to its cross-platform compatibility and scripting capabilities (Yang & Wu, 2024; Yu, 2023). Systems were often designed with parallel processing, gesture recognition, and real-time feedback engines to enhance interactivity (Allen et al., 2019; Guo et al., 2017). Several systems also employed AI modeling frameworks, such as reinforcement learning and semantic knowledge matrices, to drive adaptation and personalization (H. Lee et al., 2023; Smuts et al., 2019). To streamline development and deployment, studies used pre-built assets, open-source APIs, and browser-based delivery models (Park et al., 2019; Seow, 2023).
Pedagogical Alignment
Effective AI-integrated XR learning experiences were grounded in recognized instructional frameworks such as task-based language teaching (TBLT), experiential learning (Kolb, 2014), and CEFR-aligned tasks (Divekar et al., 2022; H. Lee et al., 2023; Y. Wang et al., 2022). Learning was structured through scenario-based modules, quests, or level progressions with increasing linguistic complexity (Y.-L. Chen et al., 2022; Tazouti et al., 2019). Game-based design elements such as time constraints, rewards, and adaptive feedback loops were frequently employed to motivate learners and promote engagement (Bottega et al., 2023; Song et al., 2023). Cultural and intercultural learning was also emphasized, particularly through avatar customization and global English variation representation (S. Lee et al., 2025).
Learner-Centered Interaction
Systems prioritized natural and multimodal input/output, including voice, gesture, eye-gaze, and touch-based controls (Chabot et al., 2020; Guo et al., 2017). Real-time feedback, role-switching, and interactive dialogues were used to personalize experiences and encourage active learner participation (Allen et al., 2019; Nakamura et al., 2024). Many platforms incorporated features for learner reflection, such as session logs, audio recordings, or envisioning tasks, to support metacognitive development (Kizilkaya et al., 2019; Mirzaei et al., 2018). Learner modeling was used to adapt pacing, feedback style, and content based on individual profiles and performance analytics (Hajahmadi et al., 2024; Yu, 2023).
Accessibility and Scalability
Designers aimed to ensure low-friction access across devices and learner demographics. Systems were built to function offline, on mobile/tablets, or through web apps to reduce infrastructure dependency (Y.-C. Chen, 2024; Tolba et al., 2024; Xu et al., 2019). Simple, intuitive interfaces helped reduce cognitive load and support diverse learning needs (Hajahmadi et al., 2024; Shukla et al., 2019). Cost-effectiveness and ease of deployment were also critical, particularly in school settings, where AR markers, shared devices, and simple gesture interfaces enabled classroom-wide use (Xin & Shi, 2024).
Evaluation and Iteration
Design-based research (DBR) and iterative development were central to many studies. Systems evolved based on user testing, learner feedback, and performance analytics (Y.-L. Chen et al., 2022; Hollingworth & Willett, 2023). Customizable dashboards and authoring tools enabled teachers to monitor learner progress, adjust content, and create new materials without coding expertise (Chabot et al., 2020; Shukla et al., 2019). Ethical considerations such as cultural sensitivity, data privacy, and emotional well-being were incorporated into design guidelines and training models (Hwang et al., 2024; Seow, 2023).

3.3.3. Future Directions

Based on the synthesis of the 32 studies, several key gaps and future directions emerge regarding AI-integrated XR learning experiences in language education. These insights span across technological, pedagogical, methodological, and ethical domains, highlighting both the evolving potential of AI-integrated XR learning experiences and the challenges that must be addressed for broader, more effective implementation.
Technological Advancements and Integration
A significant gap across many studies is the limited adaptability and realism of AI dialogue systems. Future work should prioritize enhancing AI’s capacity for unscripted, context-aware, and emotionally responsive interactions (Y.-L. Chen et al., 2022; Nakamura et al., 2024). Advancements in automatic pronunciation error detection, sentiment analysis, and multimodal feedback (e.g., facial expression, gesture recognition) are seen as crucial to creating more natural and human-like learning environments (Bottega et al., 2023; Guo et al., 2017). Additionally, there is a call for the development of lightweight, mobile-compatible versions of XR systems to support access outside of labs and increase usability (H. Lee et al., 2023; Yang & Wu, 2024).
Pedagogical Expansion and Personalization
Many studies emphasize the need to extend AI–XR learning experiences beyond current limited scenarios (e.g., alphabet drills or scripted dialogues) toward broader language skills such as grammar, writing, and intercultural communication (Hajahmadi et al., 2024; Tazouti et al., 2019). There is growing interest in incorporating affective computing and learner analytics to personalize learning based on emotion, behavior, and individual profiles (Smuts et al., 2019; Yu, 2023). Long-term learning trajectories and cross-modal integrations (e.g., combining speaking with writing, or VR with peer collaboration) are also proposed to better reflect real-world language use (Obari et al., 2020; Y. Wang et al., 2022).
Methodological Rigor and Longitudinal Validation
A recurring gap is the lack of robust empirical testing. Many projects are in early prototype stages with small sample sizes or short deployment durations. Future research must include longitudinal studies, larger participant groups, and mixed-method evaluations to establish long-term effectiveness and generalizability (Y.-C. Chen, 2024; Song et al., 2023). Benchmarking tools, standardized assessments within XR, and comparative studies across modalities (e.g., VR vs. AR vs. mobile) are recommended to better evaluate outcomes (Gorham et al., 2019; Seow, 2023).
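As one example of the benchmarking such comparisons require, the function below computes a pooled-standard-deviation effect size (Cohen's d) for a pre/post score comparison, a statistic often used when contrasting modalities. The scores here are invented placeholders, and the pooled formula treats the two score sets as independent groups for simplicity.

```python
from statistics import mean, stdev

def cohens_d(pre: list[float], post: list[float]) -> float:
    """Effect size for a pre/post comparison using a pooled standard deviation."""
    n1, n2 = len(pre), len(post)
    s1, s2 = stdev(pre), stdev(post)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(post) - mean(pre)) / pooled

# Placeholder vocabulary test scores before and after an XR intervention
print(round(cohens_d([60, 62, 58, 65], [70, 74, 69, 72]), 2))
```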
Curriculum Integration and Scalability
There is a clear need to align AI-integrated XR learning experiences with formal curricula and assessment standards. Current systems often operate as supplementary or experimental tools, limiting their adoption in mainstream education. Integration with learning management systems (LMSs), classroom dashboards, and teacher-facing analytics would bridge this gap (Shukla et al., 2019; Yun et al., 2024). Additionally, designing teacher-friendly tools and training modules is essential for sustainable implementation in varied educational contexts (Hwang et al., 2024; Yu, 2023).
Ethical Frameworks and Inclusive Design
As AI becomes more embedded in immersive environments, ethical concerns such as data privacy, screen time, cultural bias in AI responses, and emotional safety are increasingly important (Bottega et al., 2023; Hajahmadi et al., 2024). Future directions call for ethical guidelines, diverse linguistic and cultural representation in AI agents, and inclusive interfaces that support learners with special needs (Xin & Shi, 2024). Figure 13 presents a summary of affordances, challenges, design considerations, and future directions.

4. Discussion

In this systematic review, we examined the integration of Artificial Intelligence (AI) and Extended Reality (XR) in language education by synthesizing findings from 32 empirical studies published between 2017 and 2024. We aimed to map the current landscape, explore pedagogical integration and affordances, and identify the key challenges and future directions for this emerging field. We found that the field is rapidly evolving, shaped by technological innovation, pedagogical experimentation, and increasing global interest, particularly in East Asia and North America. Our review underscores both the promise and complexity of AI–XR integration, revealing patterns of innovation as well as notable disparities in design quality and learner contexts. In this section, we interpret these findings by considering our research questions and their broader implications for educational research, design, and practice.

4.1. Synthesis and Interpretation of Findings

4.1.1. The Current Landscape of AI–XR in Language Education (RQ1)

We found that research on AI-integrated XR in language education has expanded significantly in recent years, with the majority of studies published after 2019. This surge appears to be linked to technological advancements and the pedagogical shifts prompted by the COVID-19 pandemic. Despite the growth of the field, research remains geographically concentrated in East Asia and North America, indicating that access to the required infrastructure and funding may still be limited in other regions, particularly in Africa and South America. This geographic concentration may limit the generalizability of the findings, particularly in regions where linguistic diversity, technological infrastructure, and pedagogical practices differ significantly from East Asian and North American contexts.
Beyond regional patterns, the studies also revealed distinct technological trends in the use of AI and XR tools. From a technological standpoint, virtual reality (VR) emerged as the most used XR modality, followed by augmented reality (AR), mixed reality (MR), and Metaverse environments. This finding is consistent with previous literature that emphasizes the immersive potential of VR for language learning (Hamilton et al., 2020; Lowell & Yan, 2023). Among AI technologies, automatic speech recognition (ASR), natural language processing (NLP), and computer vision (CV) were dominant, particularly in supporting speaking and listening skills through real-time feedback. These patterns reflect a broader movement in language education toward enhancing communicative competence through intelligent systems (Crompton & Burke, 2023; Liu, 2023).
Educationally, we observed that most studies were situated in higher education contexts and focused on English language learning. While this mirrors global trends, it also underscores the need for more research on less commonly taught languages and K-12 learners. Reported learning outcomes varied, with motivation, engagement, and vocabulary acquisition being the most frequently cited benefits. However, although many studies reported positive affective outcomes such as motivation and engagement, relatively few employed standardized language assessments or longitudinal measures of proficiency. Most relied on short-term, self-reported data or limited post-tests, raising concerns about the validity and generalizability of these findings and underscoring the need for more robust, standardized, and longitudinal evaluation methods.

4.1.2. Pedagogical Integration and Affordances (RQ2)

Our review revealed that AI and XR technologies were integrated using diverse pedagogical strategies tailored to specific language skills. For speaking, the most common approach involved combining XR environments with ASR, allowing learners to receive immediate feedback on pronunciation and fluency. Text-to-speech (TTS) tools and conversational agents further enhanced these environments by simulating natural dialogue and modeling target-language use. These approaches align with constructivist and sociocultural theories, which emphasize the role of interaction, feedback, and contextualized learning in language acquisition (Kolb, 2014; Vygotsky, 1978).
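The typical speaking-practice loop described above can be summarized abstractly as follows. The interfaces are hypothetical placeholders (no particular ASR, agent, or TTS product is implied); the point is only to show how the components hand off a learner's conversational turn.

```python
from typing import Protocol

class ASR(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class Agent(Protocol):
    def reply(self, utterance: str) -> str: ...

class TTS(Protocol):
    def speak(self, text: str) -> bytes: ...

def speaking_turn(audio: bytes, asr: ASR, agent: Agent, tts: TTS) -> bytes:
    """One turn of the loop: learner speech -> transcript -> reply -> avatar audio.
    A pronunciation or fluency score could be attached to the transcript here."""
    transcript = asr.transcribe(audio)  # recognize the learner's utterance
    response = agent.reply(transcript)  # agent models target-language dialogue
    return tts.speak(response)          # synthesized speech voiced by the avatar

# Toy text-based stand-ins so the loop runs end to end
class EchoASR:
    def transcribe(self, audio: bytes) -> str: return audio.decode()

class ParrotAgent:
    def reply(self, utterance: str) -> str: return f"You said: {utterance}"

class TextTTS:
    def speak(self, text: str) -> bytes: return text.encode()

print(speaking_turn(b"Where is the museum?", EchoASR(), ParrotAgent(), TextTTS()))
```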
In vocabulary learning and writing, AI-driven feedback systems and CV were used to provide contextualized, multimodal practice. For instance, learners engaged in writing tasks within immersive environments and received automated feedback on coherence and grammatical accuracy. Vocabulary learning was enhanced through object recognition and environmental cues, linking digital content with real-world referents.
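The object-recognition pattern can likewise be sketched as a mapping from detected labels to in-situ vocabulary cards. The glossary entries, function names, and choice of Spanish as the target language are illustrative assumptions only.

```python
# Hypothetical: a computer-vision detector returns English object labels;
# we map each label to a target-language word plus a contextual prompt.
GLOSSARY = {
    "cup": ("la taza", "¿De qué color es la taza?"),
    "book": ("el libro", "¿Dónde está el libro?"),
}

def annotate_scene(detected_labels: list[str]) -> list[dict]:
    """Turn detected objects into vocabulary cards anchored in the learner's scene."""
    cards = []
    for label in detected_labels:
        if label in GLOSSARY:
            word, prompt = GLOSSARY[label]
            cards.append({"object": label, "target_word": word, "prompt": prompt})
    return cards  # labels without glossary entries are simply skipped

print(annotate_scene(["cup", "plant"]))
# -> [{'object': 'cup', 'target_word': 'la taza', 'prompt': '¿De qué color es la taza?'}]
```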
We identified several key affordances of AI–XR integration across technical, pedagogical, and affective domains. Technically, these systems enable multimodal interaction through voice, gesture, and gaze, supporting more naturalistic communication. Pedagogically, they foster task-based learning, adaptive feedback, and learner autonomy. Affective affordances included reduced anxiety, increased motivation, and enhanced engagement, especially in private environments that allowed for repeated, judgment-free practice. These findings reinforce prior research on the motivational and emotional benefits of immersive and adaptive learning technologies (Godwin-Jones, 2023; Tafazoli, 2024). These pedagogical applications demonstrate the potential of AI–XR to create rich, interactive language learning experiences when aligned with evidence-based instructional strategies.

4.1.3. Challenges, Design Considerations, and Research Gaps (RQ3)

Despite their promise, we found that AI-integrated XR systems face several persistent challenges. Technically, many systems exhibited limitations in AI capability, including low speech recognition accuracy, poor adaptability to user input, and overreliance on scripted interactions. Hardware-related issues, such as latency, discomfort, and low display resolution, further constrained the user experience. These technical shortcomings highlight the need for ongoing refinement and innovation to develop more responsive, accessible, and robust systems.
Pedagogically, we observed that many studies were exploratory, often involving prototypes with small sample sizes and short durations. Most studies employed pilot designs, small-scale quasi-experiments, or single-group pre–post evaluations, often with fewer than 50 participants. Few studies used randomized designs, mixed-methods triangulation, or long-term tracking of learning outcomes. This restricts the generalizability of findings and highlights the need for longer-term, mixed-method evaluations. Some systems were also weakly aligned with curricular goals or failed to support holistic language development, instead focusing on isolated skills or activities.
Beyond technical and instructional issues, implementation challenges also arose around equity and sustainability, with scalability and access emerging as critical barriers. High development and maintenance costs, along with the need for interdisciplinary expertise, continue to limit widespread implementation, especially in low-resource settings. Furthermore, we noted limited attention to ethical issues, such as learner privacy, excessive screen time, and the emotional impact of immersive environments. As AI and XR technologies become more integrated into education, these ethical concerns will require clear frameworks and guidelines.
Despite these challenges, several studies highlighted promising design principles that may guide future development, including modular system architecture, pedagogical alignment, learner-centered interaction, and iterative evaluation. We found that systems grounded in established instructional approaches, such as task-based language teaching (TBLT) and experiential learning, were more likely to report positive outcomes. Core features, such as real-time feedback, adaptive content, and multimodal interaction, appeared particularly effective in supporting learner engagement and autonomy.

4.2. Theoretical and Conceptual Insights

Our findings resonate strongly with several foundational learning theories, including constructivism, sociocultural theory, and embodied cognition. We found that immersive AI–XR environments reflect constructivist principles by enabling learners to construct knowledge through active, situated experiences. Sociocultural theory was evident in the use of AI agents and avatars, which mediated social interaction and supported the co-construction of meaning in simulated communicative contexts. Embodied cognition was supported through the use of gestures, spatial navigation, and physical interaction, which grounded language learning in sensory experience and helped reinforce memory and comprehension.
Notably, however, many studies lacked explicit theoretical grounding in their design rationale, often defaulting to assumptions about increased engagement or novelty. Embedding theoretical constructs more deliberately could enhance the instructional coherence and evaluative clarity of future AI–XR implementations.
Although theory was underutilized in many studies, aligning empirical patterns with established learning frameworks can strengthen both design logic and practical outcomes. Linking our findings to these frameworks provides a conceptual foundation for the continued design and study of AI–XR environments. These perspectives highlight the potential of such technologies to transform language learning from a passive, text-based activity into an embodied, socially mediated process.

4.3. Contextual Nuances and Emerging Patterns

Beyond general trends, our analysis revealed several contextual nuances that influence how AI–XR technologies function across settings. The predominance of English and higher education contexts underscores the need for more inclusive research that addresses diverse learner populations, educational levels, and target languages. We also noted that, despite high levels of learner engagement and motivation, some studies reported only limited gains in language proficiency. This suggests that immersive technologies, while beneficial for affective outcomes, must be paired with strong instructional design and clear learning objectives to support meaningful language development.
We further observed that the integration of AI and XR was often uneven. In several studies, one technology was prioritized over the other, or the integration was superficial rather than deeply embedded in pedagogical strategy. For example, in some studies, XR was primarily used for immersion while AI components were limited to background analytics or feedback modules, rather than fully integrated into interactive learning tasks. This lack of synergy can diminish the potential for adaptive, dialogic learning experiences, highlighting the importance of intentional, theory-driven design that fully leverages the complementary affordances of AI and XR to support dynamic, responsive language learning environments.

4.4. Critical Reflections on the Review Process

In conducting this review, we employed a comprehensive search strategy, a multidisciplinary scope, and a rigorous coding methodology. This approach enabled a holistic view of both system-level features and learner-level outcomes across diverse contexts. Our use of both deductive and inductive coding allowed for a rich synthesis of technological, pedagogical, and affective dimensions. Nevertheless, we recognize several limitations. First, our inclusion criteria restricted us to English-language publications, introducing potential language bias. Second, the heterogeneity of study designs, reporting standards, and evaluation measures posed challenges for synthesis. Finally, given the rapid pace of technological change, it is likely that new tools and applications have emerged since our final search date.

4.5. Positioning the Review Within the Broader Discourse

We believe this review offers a timely and significant contribution to the discourse on educational technology by systematically examining the pedagogical integration of AI and XR in language learning, two domains that have often been studied in isolation. Our synthesis not only clarifies the current state of the field but also lays the groundwork for future innovation and research.

4.5.1. Bridging Fragmented Research Silos

Previous reviews have typically focused on either XR (e.g., VR or AR) or AI applications in language education. For example, Hamilton et al. (2020) and Makhenyane (2024) examined immersive XR environments, while Crompton and Burke (2023) and Almelhes (2023) reviewed AI-based tools for vocabulary and pronunciation. We extend this work by analyzing how these technologies function synergistically when co-designed and co-deployed in integrated learning environments. This integrated lens addresses a gap in the literature and reflects the convergence of intelligent and immersive technologies in increasingly hybrid educational ecosystems.

4.5.2. Advancing Pedagogical and Design Frameworks

We also contribute to the field by emphasizing instructional design and pedagogical alignment, areas that have often been secondary in technology-driven research. By aligning integration strategies with learning theories such as constructivism, sociocultural theory, and embodied cognition, we provide a conceptual foundation that can inform future system development. We also highlight key design principles, including learner-centered interaction, iterative refinement, and accessibility, which can serve as a roadmap for researchers and designers aiming to create scalable, inclusive, and pedagogically meaningful AI–XR learning environments.

4.5.3. Setting a Research Agenda

Based on our synthesis, we identify several pressing gaps that future research should address:
Underrepresentation of K-12 learners and less commonly taught languages,
A lack of longitudinal, large-scale, and mixed-method studies,
Limited integration with formal curricula and teacher professional development,
Insufficient exploration of ethical concerns, including data privacy, screen time, and emotional well-being.
We encourage future research that is methodologically rigorous, ethically grounded, and inclusive of diverse educational settings and learner populations.

4.5.4. Reframing the Role of AI and XR in Language Education

Finally, our review prompts a broader rethinking of the role of AI and XR in language education. Rather than displacing teachers, we argue that these technologies should augment human instruction by enabling personalized, immersive, and responsive learning experiences. This human-centered vision aligns with recent calls for responsible, pedagogically informed innovation in educational technology (Godwin-Jones, 2023; Vall & Araya, 2023). As the field evolves, interdisciplinary collaboration and ethical foresight will be critical to realizing the transformative potential of these technologies in equitable and sustainable ways.
For language educators and technology developers, the findings highlight several actionable insights. Teachers remain pivotal in AI- and XR-enhanced learning environments; effective integration requires careful preparation, clear pedagogical objectives, and digital training. Instructors need to develop the skills to select appropriate XR applications and plan supportive tasks, providing scaffolding and guidance to ensure that immersive experiences truly augment language learning rather than overwhelm learners. Educational institutions can facilitate this by offering professional development workshops on using XR and AI tools in the classroom.
On the development side, software designers should collaborate with educators to create high-quality, interactive XR learning environments tailored to linguistic objectives. Such partnerships help ensure that AI-driven features (e.g., speech recognition, virtual tutors, adaptive feedback) are grounded in sound pedagogy and user-centered design, thereby enhancing student engagement and motivation. Practical design considerations emerging from the review include incorporating immersive role-play and authentic cultural scenarios, while also ensuring interfaces are inclusive (e.g., accessible to learners with disabilities, adaptable to different learning styles) and mitigating issues like motion discomfort or cognitive overload in XR settings. Educators should approach AI–XR tools as supplements, not replacements, for sound teaching practices (e.g., leveraging these technologies to provide authentic language contexts and individualized support), and developers should focus on usability, inclusivity, and alignment with educational needs.
Research-wise, the integration of AI and XR in language education offers rich theoretical implications. The review indicates that these technologies draw upon and reinforce multiple learning theories, from sociocultural and constructivist perspectives (through collaborative, context-rich interaction) to cognitive multimedia learning and embodied learning theories (through immersive, multisensory engagement). The convergence of AI and XR also creates novel learning conditions, such as AI-driven conversational agents embedded in virtual environments, that allow researchers to examine second language acquisition processes in new ways (for example, the role of immersion in lowering anxiety or the impact of immediate, AI-generated feedback on language uptake). Notably, however, many of the studies reviewed did not explicitly ground their interventions in established learning theory, pointing to an area for improvement. This gap suggests that future research should more deliberately employ and test theoretical frameworks when studying AI-integrated XR learning experiences, thereby contributing to theory-building in computer-assisted language learning. The review's findings encourage researchers to refine existing theories or develop new models that explain how immersive, AI-guided learning experiences affect motivation, engagement, and language development. In particular, aspects such as learner presence in virtual environments, the agency of AI tutors, and data-driven personalization call for theoretical exploration to understand their pedagogical effectiveness. Furthermore, given the inherently interdisciplinary nature of AI–XR integration, this work underscores the value of cross-domain collaboration. Researchers in language education are urged to partner with experts in computer science, human–computer interaction, linguistics, and ethics to address the multifaceted questions raised by these technologies. Such collaborations can expand methodological approaches and ensure that emerging theories account for the technical, linguistic, and social dimensions of AI-integrated XR learning environments.

4.5.5. Limitations

Despite its comprehensive scope, this systematic review has several limitations that warrant caution in interpreting the results. First, there are limitations related to the literature selection that may introduce bias. The review was confined to studies published between 2017 and 2024 and primarily to English-language, peer-reviewed sources. By relying on specific academic databases and keywords, it is possible that some relevant studies were not captured, especially work reported in other languages or in grey literature.
Second, the body of evidence synthesized in this review is still nascent, with many studies having short durations and small sample sizes. The majority of the included empirical studies were exploratory or pilot implementations of AI or XR tools, often involving only one class or a limited group of learners over a relatively brief period. Such small-scale interventions limit the generalizability of the findings; positive learning outcomes observed in a short experimental setting may not translate into long-term gains or extend to broader populations.
Additionally, many of the AI–XR systems described were novel prototypes or first-of-a-kind implementations, lacking extensive empirical validation. Few studies tracked students’ progress beyond the immediate intervention, so there is little evidence regarding sustained language development or retention over time. The innovative nature of these tools means that their full impact (and potential pitfalls) is not yet well understood, and claims of effectiveness should be considered preliminary. It is also noteworthy that technology in this domain is evolving at a rapid pace. Developments in AI (for example, the emergence of advanced generative AI language models) and improvements in XR hardware are continuous and can quickly render earlier studies’ findings or technical setups outdated. Thus, some limitations of current systems may be resolved by newer technology, while new challenges will undoubtedly emerge.
In light of these factors, there is a clear need for more large-scale and longitudinal research on AI–XR integration. Scholars have pointed out that future research must involve broader, longer-term studies to rigorously evaluate educational impacts. Robust experimental or quasi-experimental research with diverse cohorts, as well as longitudinal studies that follow learners over months or years, will help confirm whether the promising short-term results reported to date hold up in the long run. Until such evidence is available, the conclusions of this review should be interpreted as indicative but not conclusive regarding long-term efficacy.
Third, there are notable gaps in the linguistic and regional diversity of both the underlying literature and, consequently, our review’s coverage. A large portion of the studies on AI and XR in language education have focused on English as the target or medium of instruction. Far fewer studies have examined the integration of these technologies for other languages, especially less commonly taught languages or indigenous languages.

5. Conclusions

This systematic review synthesized 32 empirical studies to investigate how AI and XR technologies have been integrated to support language education between 2017 and 2024. Our analysis reveals a growing interest in this intersection, particularly in East Asia and North America, and a strong research emphasis on higher education contexts and English language learning. The integration of AI and XR in language learning environments has demonstrated significant potential in enhancing learner engagement, motivation, and proficiency, particularly in speaking skills, through immersive, interactive, and personalized learning experiences.
The review identified diverse integration strategies that combine AI technologies—such as automatic speech recognition (ASR), natural language processing (NLP), text-to-speech (TTS), and computer vision (CV)—with various XR modalities, including virtual reality (VR), augmented reality (AR), mixed reality (MR), and Metaverse platforms. These integrations foster real-time feedback, multimodal communication, and authentic situational practice. Furthermore, the affordances of these systems extend beyond linguistic outcomes to include reduced anxiety, enhanced autonomy, and increased learner confidence. Pedagogically, these technologies support task-based, experiential, and adaptive learning approaches that align with constructivist and sociocultural learning theories.
Despite these affordances, the review highlights several critical challenges. Many studies are limited by small sample sizes, short intervention durations, and a lack of rigorous evaluation. Technical constraints, such as limited ASR accuracy and non-adaptive AI responses, often undermine the naturalness and effectiveness of interaction. Additionally, barriers related to cost, accessibility, teacher training, and ethical considerations—such as data privacy and learner well-being—pose significant obstacles to widespread adoption. These issues suggest that while the field is advancing, it remains in an exploratory phase, requiring further development to achieve scalable and sustainable impact.
Moving forward, future research should focus on enhancing the adaptability and emotional responsiveness of AI agents, designing inclusive and low-cost XR systems, and conducting longitudinal, mixed-method studies to assess the long-term effects of AI–XR integration on language acquisition. There is also a clear need for better alignment with formal curricula and greater attention to diverse linguistic and cultural contexts, especially in underrepresented regions and less commonly taught languages. Embedding theoretical frameworks more explicitly into system design and evaluation will be essential for establishing stronger pedagogical coherence and guiding evidence-based practice.
In conclusion, the integration of AI and XR in language education holds immense potential for addressing longstanding critical needs in education. By addressing current limitations and building on emerging best practices, researchers and educators can harness the full potential of these technologies to create more equitable, engaging, and effective language learning environments.

Author Contributions

Conceptualization, W.Y. and B.L.; Methodology, B.L.; Software, W.Y. and B.L.; Validation, W.Y.; Formal analysis, W.Y.; Investigation, W.Y. and B.L.; Resources, W.Y.; Data curation, W.Y.; Writing—original draft, W.Y., B.L. and V.L.L.; Writing—review & editing, W.Y., B.L. and V.L.L.; Visualization, W.Y.; Supervision, W.Y. and V.L.L.; Project administration, W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
AR: Augmented Reality
ASR: Automatic Speech Recognition
CA: Conversational Agents
CV: Computer Vision
DBR: Design-Based Research
ERIC: Education Resources Information Center
LLMs: Large Language Models
ML: Machine Learning
MMAA: Multimodal AI Agents
MR: Mixed Reality
MT: Machine Translation
NLP: Natural Language Processing
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RTFS: Real-Time Feedback Systems
TTS: Text-to-Speech
TBLT: Task-Based Language Teaching
VR: Virtual Reality
XR: Extended Reality

Appendix A

Table A1. References Included in the Systematic Review.
Author | Title | Journal or Conference Proceedings
1. (Allen et al., 2019) | The Rensselaer Mandarin Project—A Cognitive and Immersive Language Learning Environment | Proceedings of the AAAI Conference on Artificial Intelligence
2. (Bottega et al., 2023) | Jubileo: An Immersive Simulation Framework for Social Robot Design | Journal of Intelligent & Robotic Systems
3. (Chabot et al., 2020) | A Collaborative, Immersive Language Learning Environment Using Augmented Panoramic Imagery | 2020 6th International Conference of the Immersive Learning Research Network (iLRN)
4. (Y.-C. Chen, 2024) | Effects of technology-enhanced language learning on reducing EFL learners’ public speaking anxiety | Computer Assisted Language Learning
5. (Y.-L. Chen et al., 2022) | Robot-Assisted Language Learning: Integrating Artificial Intelligence and Virtual Reality into English Tour Guide Practice | Education Sciences
6. (Divekar et al., 2022) | Foreign language acquisition via artificial intelligence and extended reality: Design and evaluation | Computer Assisted Language Learning
7. (Gorham et al., 2019) | Assessing the efficacy of VR for foreign language learning using multimodal learning analytics | Professional Development in CALL: A Selection of Papers
8. (Guo et al., 2017) | SeLL: Second language learning paired with VR and AI | SIGGRAPH Asia 2017 Symposium on Education
9. (Hajahmadi et al., 2024) | ARELE-bot: Inclusive Learning of Spanish as a Foreign Language Through a Mobile App Integrating Augmented Reality and ChatGPT | 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)
10. (Hollingworth & Willett, 2023) | FluencyAR: Augmented Reality Language Immersion | Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology
11. (Hwang et al., 2024) | Integrating AI chatbots into the metaverse: Pre-service English teachers’ design works and perceptions | Education and Information Technologies
12. (Kizilkaya et al., 2019) | Design Prompts for Virtual Reality in Education | Artificial Intelligence in Education
13. (H. Lee et al., 2023) | VisionARy: Exploratory research on Contextual Language Learning using AR glasses with ChatGPT | Proceedings of the 15th Biannual Conference of the Italian SIGCHI Chapter
14. (S. Lee et al., 2025) | Enhancing Pre-Service Teachers’ Global Englishes Awareness with Technology: A Focus on AI Chatbots in 3D Metaverse Environments | TESOL Quarterly
15. (Mirzaei et al., 2018) | Language learning through conversation envisioning in virtual reality: A sociocultural approach | Future-Proof CALL: Language Learning as Exploration and Encounters—Short Papers from EUROCALL 2018
16. (Nakamura et al., 2024) | LingoAI: Language Learning System Integrating Generative AI with 3D Virtual Character | Proceedings of the 2024 International Conference on Advanced Visual Interfaces
17. (Obari et al., 2020) | The Impact of Using AI and VR with Blended Learning on English as a Foreign Language Teaching | CALL for Widening Participation: Short Papers from EUROCALL 2020
18. (Park et al., 2019) | Interactive AI for Linguistic Education Built on VR Environment Using User Generated Contents | 2019 21st International Conference on Advanced Communication Technology (ICACT)
19. (Seow, 2023) | LingoLand: An AI-Assisted Immersive Game for Language Learning | Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology
20. (Shadiev et al., 2021) | Cross-cultural learning in virtual reality environment: Facilitating cross-cultural understanding, trait emotional intelligence, and sense of presence | Educational Technology Research and Development
21. (Shukla et al., 2019) | iLeap: A Human-AI Teaming Based Mobile Language Learning Solution for Dual Language Learners in Early and Special Educations | International Association for Development of the Information Society (IADIS) International Conference on Mobile Learning
22. (Smuts et al., 2019) | Towards Dynamically Adaptable Immersive Spaces for Learning | 2019 11th Computer Science and Electronic Engineering (CEEC)
23. (Song et al., 2023) | Developing a ‘Virtual Go mode’ on a mobile app to enhance primary students’ vocabulary learning engagement: An exploratory study | Innovation in Language Learning and Teaching
24. (Tazouti et al., 2019) | ImALeG: A Serious Game for Amazigh Language Learning | International Journal of Emerging Technologies in Learning (iJET)
25. (Tolba et al., 2024) | Interactive Augmented Reality System for Learning Phonetics Using Artificial Intelligence | IEEE Access
26. (J.-H. Wang et al., 2020) | Digital Learning Theater with Automatic Instant Assessment of Body Language and Oral Language Learning | 2020 IEEE 20th International Conference on Advanced Learning Technologies (ICALT)
27. (Y. Wang et al., 2022) | An Integrated Automatic Writing Evaluation and SVVR Approach to Improve Students’ EFL Writing Performance | Sustainability
28. (Xin & Shi, 2024) | Application of Hybrid Image Processing Based on Artificial Intelligence in Interactive English Teaching | ACM Transactions on Asian and Low-Resource Language Information Processing
29. (Xu et al., 2019) | Design and Implementation of an English Lesson Based on Handwriting Recognition and Augmented Reality in Primary School | International Association for Development of the Information Society
30. (Yang & Wu, 2024) | Design and Implementation of Chinese Language Teaching System Based on Virtual Reality Technology | Scalable Computing: Practice and Experience
31. (Yu, 2023) | AI-Empowered Metaverse Learning Simulation Technology Application | 2023 International Conference on Intelligent Metaverse Technologies & Applications (iMETA)
32. (Yun et al., 2024) | Interactive Learning Tutor Service Platform Based on Artificial Intelligence in a Virtual Reality Environment | Intelligent Human Computer Interaction

References

  1. Allen, D., Divekar, R. R., Drozdal, J., Balagyozyan, L., Zheng, S., Song, Z., Zou, H., Tyler, J., Mou, X., Zhao, R., Zhou, H., Yue, J., Kephart, J. O., & Su, H. (2019). The rensselaer mandarin project—A cognitive and immersive language learning environment. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1), 9845–9846. [Google Scholar] [CrossRef]
  2. Almelhes, S. A. (2023). A review of artificial intelligence adoption in second-language learning. Theory and Practice in Language Studies, 13(5), 1259–1269. [Google Scholar] [CrossRef]
  3. Aslan, S., Alyuz, N., Li, B., Durham, L. M., Shi, M., Sharma, S., & Nachman, L. (2025). An early investigation of collaborative problem solving in conversational AI-mediated learning environments. Computers and Education: Artificial Intelligence, 8, 100393. [Google Scholar] [CrossRef]
  4. Bottega, J. A., Kich, V. A., Jesus, J. C. D., Steinmetz, R., Kolling, A. H., Grando, R. B., Guerra, R. D. S., & Gamarra, D. F. T. (2023). Jubileo: An immersive simulation framework for social robot design. Journal of Intelligent & Robotic Systems, 109(4), 91. [Google Scholar] [CrossRef]
  5. Bozkir, E., Özdel, S., Lau, K. H. C., Wang, M., Gao, H., & Kasneci, E. (2024, July 8–10). Embedding large language models into extended reality: Opportunities and challenges for inclusion, engagement, and privacy. 6th ACM Conference on Conversational User Interfaces (Vol. 38, pp. 1–7), Luxembourg. [Google Scholar] [CrossRef]
  6. Chabot, S., Drozdal, J., Peveler, M., Zhou, Y., Su, H., & Braasch, J. (2020, June 21–25). A collaborative, immersive language learning environment using augmented panoramic imagery. 2020 6th International Conference of the Immersive Learning Research Network (iLRN) (pp. 225–229), San Luis Obispo, CA, USA. [Google Scholar] [CrossRef]
  7. Chen, C., Hung, H., & Yeh, H. (2021). Virtual reality in problem-based learning contexts: Effects on the problem-solving performance, vocabulary acquisition and motivation of English language learners. Journal of Computer Assisted Learning, 37(3), 851–860. [Google Scholar] [CrossRef]
  8. Chen, J., Dai, J., Zhu, K., & Xu, L. (2022). Effects of extended reality on language learning: A meta-analysis. Frontiers in Psychology, 13, 1016519. [Google Scholar] [CrossRef] [PubMed]
  9. Chen, Y.-C. (2024). Effects of technology-enhanced language learning on reducing EFL learners’ public speaking anxiety. Computer Assisted Language Learning, 37(4), 789–813. [Google Scholar] [CrossRef]
  10. Chen, Y.-L., Hsu, C.-C., Lin, C.-Y., & Hsu, H.-H. (2022). Robot-assisted language learning: Integrating artificial intelligence and virtual reality into English tour guide practice. Education Sciences, 12(7), 437. [Google Scholar] [CrossRef]
  11. Corbin, J., & Strauss, A. (2014). Basics of qualitative research: Techniques and procedures for developing grounded theory. Sage Publications. [Google Scholar]
  12. Crompton, H., & Burke, D. (2023). Artificial intelligence in higher education: The state of the field. International Journal of Educational Technology in Higher Education, 20, 22. [Google Scholar] [CrossRef]
  13. Crum, S., Li, B., & Kou, X. (2024). Generative artificial intelligence and interactive learning platforms: Second language vocabulary acquisition. In C. Stephanidis, M. Antona, S. Ntoa, & G. Salvendy (Eds.), HCI international 2024 posters. HCII 2024. Communications in computer and information science (Vol. 2117). Springer. [Google Scholar] [CrossRef]
  14. Divekar, R. R., Drozdal, J., Chabot, S., Zhou, Y., Su, H., Chen, Y., Zhu, H., Hendler, J. A., & Braasch, J. (2022). Foreign language acquisition via artificial intelligence and extended reality: Design and evaluation. Computer Assisted Language Learning, 35(9), 2332–2360. [Google Scholar] [CrossRef]
  15. Escalante, J., Pack, A., & Barrett, A. (2023). AI-generated feedback on writing: Insights into efficacy and ENL student preference. International Journal of Educational Technology in Higher Education, 20, 57. [Google Scholar] [CrossRef]
  16. Godwin-Jones, R. (2023). Presence and agency in real and virtual spaces: The promise of extended reality for language learning. Language Learning & Technology, 27(3), 6–26. [Google Scholar]
  17. Gorham, T., Jubaed, S., Sanyal, T., & Starr, E. L. (2019). Assessing the efficacy of VR for foreign language learning using multimodal learning analytics. In C. N. Giannikas, E. Kakoulli Constantinou, & S. Papadima-Sophocleous (Eds.), Professional development in CALL: A selection of papers (pp. 101–116). Research-Publishing.net. [Google Scholar] [CrossRef]
  18. Guo, J., Chen, Y., Pei, Q., Ren, H., Huang, N., Tian, H., Zhang, M., Liu, Y., Fu, G., Hu, H., & Zhang, X. (2017, November 27–30). SeLL: Second language learning paired with VR and AI. SA ‘17: SIGGRAPH Asia 2017 Symposium on Education (pp. 1–2), Bangkok, Thailand. [Google Scholar] [CrossRef]
  19. Hajahmadi, S., Clementi, L., Jiménez López, M. D., & Marfia, G. (2024, March 16–21). ARELE-bot: Inclusive learning of Spanish as a foreign language through a mobile app integrating augmented reality and ChatGPT. 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW) (pp. 335–340), Orlando, FL, USA. [Google Scholar] [CrossRef]
  20. Hamilton, D. E., McKechnie, J., Edgerton, E., & Wilson, C. (2020). Immersive virtual reality as a pedagogical tool in education: A systematic literature review of quantitative learning outcomes and experimental design. Journal of Computers in Education, 8(1), 1–32. [Google Scholar] [CrossRef]
  21. Hollingworth, S. L. C., & Willett, W. (2023, October 29–November 1). FluencyAR: Augmented reality language immersion. Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (pp. 1–3), San Francisco, CA, USA. [Google Scholar] [CrossRef]
  22. Huang, X., Zou, D., Cheng, G., & Xie, H. (2021). A systematic review of AR and VR enhanced language learning. Sustainability, 13(9), 4639. [Google Scholar] [CrossRef]
  23. Hwang, Y., & Lee, J. (2024). Exploring pre-service English teachers’ perceptions and technological acceptance of metaverse language classroom design. Sage Open, 14(4), 21582440241300543. [Google Scholar] [CrossRef]
  24. Hwang, Y., Lee, S., & Jeon, J. (2024). Integrating AI chatbots into the metaverse: Pre-service English teachers’ design works and perceptions. Education and Information Technologies, 30(4), 4099–4130. [Google Scholar] [CrossRef]
  25. Karacan, C. G., & Akoğlu, K. (2021). Educational augmented reality technology for language learning and teaching: A comprehensive review. Shanlax International Journal of Education, 9(2), 68–79. [Google Scholar] [CrossRef]
  26. Kizilkaya, L., Vince, D., & Holmes, W. (2019). Design prompts for virtual reality in education. In S. Isotani, E. Millán, A. Ogan, P. Hastings, B. McLaren, & R. Luckin (Eds.), Artificial intelligence in education (Vol. 11626, pp. 133–137). Springer International Publishing. [Google Scholar] [CrossRef]
  27. Kolb, D. A. (2014). Experiential learning: Experience as the source of learning and development. FT Press. [Google Scholar]
  28. Lee, H., Hsia, C.-C., Tsoy, A., Choi, S., Hou, H., & Ni, S. (2023, September 20–22). VisionARy: Exploratory research on contextual language learning using AR glasses with ChatGPT. 15th Biannual Conference of the Italian SIGCHI Chapter (p. 22), Torino, Italy. [Google Scholar] [CrossRef]
  29. Lee, S., Jeon, J., & Choe, H. (2025). Enhancing pre-service teachers’ global English awareness with technology: A focus on AI chatbots in 3D metaverse environments. TESOL Quarterly, 59(1), 49–74. [Google Scholar] [CrossRef]
  30. Li, B., Bonk, C. J., Wang, C., & Kou, X. (2024a). Reconceptualizing self-directed learning in the era of generative AI: An exploratory analysis of language learning. IEEE Transactions on Learning Technologies, 17(3), 1515–1529. [Google Scholar] [CrossRef]
  31. Li, B., Lowell, V., Watson, & Wang, C. (2024b). A systematic review of the first year of publications on ChatGPT and language education: Examining research on ChatGPT’s use in language learning and teaching. Computers and Education: Artificial Intelligence, 100, 100266. [Google Scholar] [CrossRef]
  32. Li, B., Wang, C., Bonk, C. J., & Kou, X. (2024c). Exploring inventions in self-directed language learning with generative AI: Implementations and perspectives of YouTube content creators. TechTrends, 68, 803–819. [Google Scholar] [CrossRef]
  33. Liu, M. (2023). Exploring the application of artificial intelligence in foreign language teaching: Challenges and future development. SHS Web of Conferences, 168, 03025. [Google Scholar] [CrossRef]
  34. Lowell, V. L., & Yan, W. (2023). Facilitating foreign language conversation simulations in virtual reality for authentic learning. In T. Cherner, & A. Fegely (Eds.), Bridging the XR technology-to-practice gap: Methods and strategies for blending extended realities into classroom instruction (Vol. I, pp. 119–133). Association for the Advancement of Computing in Education and Society for Information Technology and Teacher Education. Available online: https://www.learntechlib.org/p/222242/ (accessed on 18 December 2023).
  35. Lowell, V. L., & Yan, W. (2024). Applying systems thinking for designing immersive virtual reality learning experiences in education. TechTrends, 68(1), 149–160. [Google Scholar] [CrossRef]
  36. Makeleni, S., Mutongoza, B. H., & Linake, M. A. (2023). Language education and artificial intelligence: An exploration of challenges confronting academics in global south universities. Journal of Culture and Values in Education, 6(2), 158–171. [Google Scholar] [CrossRef]
  37. Makhenyane, L. E. (2024). The use of augmented reality in the teaching and learning of isiXhosa poetry. Journal of the Digital Humanities Association of Southern Africa (DHASA), 5(1). [Google Scholar] [CrossRef]
  38. Mirzaei, M. S., Zhang, Q., Van der Struijk, S., & Nishida, T. (2018). Language learning through conversation envisioning in virtual reality: A sociocultural approach. In P. Taalas, J. Jalkanen, L. Bradley, & S. Thouësny (Eds.), Future-proof CALL: Language learning as exploration and encounters—Short papers from EUROCALL 2018 (pp. 207–213). Research-Publishing.net. [Google Scholar] [CrossRef]
  39. Nakamura, H., Nakazato, H., & Tobita, H. (2024, June 3–7). LingoAI: Language learning system integrating generative AI with 3D virtual character. 2024 International Conference on Advanced Visual Interfaces (pp. 1–2), Arenzano, Italy. [Google Scholar] [CrossRef]
  40. Nazeer, I., Jamshaid, S., & Khan, N. M. (2024). Linguistic impact of augmented reality (AR) on English language use. Journal of Asian Development Studies, 13(1), 350–362. [Google Scholar] [CrossRef]
  41. Obari, H., Lambacher, S., & Kikuchi, H. (2020). The impact of using AI and VR with blended learning on English as a foreign language teaching. In K.-M. Frederiksen, S. Larsen, L. Bradley, & S. Thouësny (Eds.), CALL for widening participation: Short papers from EUROCALL 2020 (pp. 253–258). Research-Publishing.net. [Google Scholar] [CrossRef]
  42. Ouzzani, M., Hammady, H., Fedorowicz, Z., & Elmagarmid, A. (2016). Rayyan—A web and mobile app for systematic reviews. Systematic Reviews, 5, 210. [Google Scholar] [CrossRef] [PubMed]
  43. Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., … Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, n71. [Google Scholar] [CrossRef]
  44. Park, W., Park, D., Ahn, B., Kang, S., Kim, H., Kim, R., & Na, J. (2019, February 17–20). Interactive AI for linguistic education built on VR environment using user generated contents. 2019 21st International Conference on Advanced Communication Technology (ICACT) (pp. 385–389), PyeongChang, Republic of Korea. [Google Scholar] [CrossRef]
  45. Parmaxi, A., & Demetriou, A. A. (2020). Augmented reality in language learning: A state-of-the-art review of 2014–2019. Journal of Computer Assisted Learning, 36(6), 861–875. [Google Scholar] [CrossRef]
  46. PRISMA. (2015). Transparent reporting of systematic reviews and meta-analyses. Available online: http://www.prisma-statement.org/ (accessed on 18 December 2023).
  47. Rangel-de Lazaro, G., & Duart, J. M. (2023). You can handle, you can teach it: Systematic review on the use of extended reality and artificial intelligence technologies for online higher education. Sustainability, 15(4), 3507. [Google Scholar] [CrossRef]
  48. Schorr, I., Plecher, D. A., Eichhorn, C., & Klinker, G. (2024). Foreign language learning using augmented reality environments: A systematic review. Frontiers in Virtual Reality, 5, 1288824. [Google Scholar] [CrossRef]
  49. Seow, O. (2023, October 29–November 1). LingoLand: An AI-assisted immersive game for language learning. UIST’23 Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (p. 120), San Francisco, CA, USA. [Google Scholar] [CrossRef]
  50. Shadiev, R., Wang, X., & Huang, Y.-M. (2021). Cross-cultural learning in virtual reality environment: Facilitating cross-cultural understanding, trait emotional intelligence, and sense of presence. Educational Technology Research and Development, 69(5), 2917–2936. [Google Scholar] [CrossRef]
  51. Shukla, S., Shivakumar, A., Vasoya, M., Pei, Y., & Lyon, A. F. (2019, April 11–19). iLeap: A human-AI teaming based mobile language learning solution for dual language learners in early and special educations. International Association for Development of the Information Society (IADIS) International Conference on Mobile Learning (pp. 57–64), Utrecht, The Netherlands. [Google Scholar]
  52. Smuts, M. G., Callaghan, V., & Gutierrez, A. G. (2019, September 18–20). Towards dynamically adaptable immersive spaces for learning. 2019 11th Computer Science and Electronic Engineering (CEEC) (pp. 113–117), Colchester, UK. [Google Scholar] [CrossRef]
  53. Song, Y., Wen, Y., Yang, Y., & Cao, J. (2023). Developing a ‘Virtual Go mode’ on a mobile app to enhance primary students’ vocabulary learning engagement: An exploratory study. Innovation in Language Learning and Teaching, 17(2), 354–363. [Google Scholar] [CrossRef]
  54. Tafazoli, D. (2024). From virtual reality to cultural reality: Integration of virtual reality into teaching culture in foreign language education. Journal for Multicultural Education, 18(1/2), 6–24. [Google Scholar] [CrossRef]
  55. Tazouti, Y., Boulaknadel, S., & Fakhri, Y. (2019). ImALeG: A serious game for amazigh language learning. International Journal of Emerging Technologies in Learning (IJET), 14(18), 28–38. [Google Scholar] [CrossRef]
  56. Tolba, R. M., Elarif, T., Taha, Z., & Hammady, R. (2024). Interactive augmented reality system for learning phonetics using artificial intelligence. IEEE Access, 12, 78219–78231. [Google Scholar] [CrossRef]
  57. Vall, R. R. F. d. l., & Araya, F. G. (2023). Exploring the benefits and challenges of AI-language learning tools. International Journal of Social Sciences and Humanities Invention, 10(01), 7569–7576. [Google Scholar] [CrossRef]
  58. Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes (Vol. 86). Harvard University Press. [Google Scholar]
  59. Wang, J.-H., Chen, Y.-H., Yu, S.-Y., Huang, Y.-L., & Chen, G.-D. (2020, July 6–9). Digital learning theater with automatic instant assessment of body language and oral language learning. 2020 IEEE 20th International Conference on Advanced Learning Technologies (ICALT) (pp. 218–222), Tartu, Estonia. [Google Scholar] [CrossRef]
  60. Wang, Y., Luo, X., Liu, C.-C., Tu, Y.-F., & Wang, N. (2022). An integrated automatic writing evaluation and SVVR approach to improve students’ EFL writing performance. Sustainability, 14(18), 11586. [Google Scholar] [CrossRef]
  61. Xin, D., & Shi, C. (2024). Application of hybrid image processing based on artificial intelligence in interactive English teaching. ACM Transactions on Asian and Low-Resource Language Information Processing, 3626822. [Google Scholar] [CrossRef]
  62. Xu, J., He, S., Jiang, H., Yang, Y., & Cai, S. (2019). Design and implementation of an English lesson based on handwriting recognition and augmented reality in primary school (pp. 171–178). International Association for Development of the Information Society. [Google Scholar]
  63. Yan, W., & Lowell, V. L. (2024). Design and evaluation of task-based role-play speaking activities in a VR environment for authentic learning: A design-based research approach. The Journal of Applied Instructional Design, 13(4), 14. [Google Scholar] [CrossRef]
  64. Yan, W., & Lowell, V. L. (2025). The evolution of virtual reality in foreign language education: From text-based MUDs to AI-enhanced immersive environments. TechTrends, 69, 853–858. [Google Scholar] [CrossRef]
  65. Yan, W., Lowell, V. L., & Yang, L. (2024). Developing English language learners’ speaking skills through applying a situated learning approach in VR-enhanced learning experiences. Virtual Reality, 28, 167. [Google Scholar] [CrossRef]
  66. Yang, T., & Wu, J. (2024). Design and implementation of Chinese language teaching system based on virtual reality technology. Scalable Computing: Practice and Experience, 25(3), 1564–1577. [Google Scholar] [CrossRef]
  67. Yu, D. (2023, September 18–20). AI-empowered metaverse learning simulation technology application. 2023 International Conference on Intelligent Metaverse Technologies & Applications (iMETA) (pp. 1–6), Tartu, Estonia. [Google Scholar] [CrossRef]
  68. Yun, C.-O., Jung, S.-J., & Yun, T.-S. (2024). Interactive learning tutor service platform based on artificial intelligence in a virtual reality environment. In B. J. Choi, D. Singh, U. S. Tiwary, & W. Y. Chung (Eds.), Intelligent human computer interaction. IHCI 2023. Lecture notes in computer science (Vol. 14531, pp. 367–373). Springer. [Google Scholar] [CrossRef]
Figure 1. PRISMA Flow Diagram.
Figure 2. Author Distribution by Region.
Figure 3. Number of Publications by Year (2017–2024).
Figure 4. Publication Types of Selected Studies.
Figure 5. Distribution of Studies on Educational Levels.
Figure 6. Distribution of Target Languages.
Figure 7. Language Foci.
Figure 8. Number of Participants.
Figure 9. Length of Intervention.
Figure 10. AI Technology Applications in Selected Studies.
Figure 11. Theoretical Frameworks in Selected Studies.
Figure 12. AI–XR Integration Strategies and Learning Benefits in Language Education.
Figure 13. Summary of Affordances, Challenges, Design Considerations, and Future Directions.
Table 1. Coding Schema.
Category: Description
Bibliographic Details: Author(s), publication year, location, and type of scholarly source (e.g., peer-reviewed journal articles, peer-reviewed conference proceedings, peer-reviewed book chapters)
Study Aims and Context: Purpose of the study and the educational setting (e.g., K-12 classroom, university language course, informal learning environment), target language, and language focus
Participants and Sample Size: Number of participants
Length of Intervention: Duration of instructional implementation
Learning Outcomes: Cognitive, linguistic, affective, and cultural learning outcomes assessed (e.g., knowledge retention, speaking proficiency, learner engagement, cultural competence)
AI Technology Used: Type of AI used (e.g., automatic speech recognition, text-to-speech, natural language processing, conversational agent)
XR Technology Used: Type of XR used (VR, AR, MR, Metaverse) and platform/hardware details
Theoretical Frameworks: Structured lens through which researchers interpret, analyze, and connect key concepts, guiding the design, implementation, and evaluation of a study (e.g., embodied cognition, experiential learning, engagement theory)
Integration Strategies: Description of how the AI and XR components interacted or were integrated (e.g., VR with automatic speech recognition, AR with text-to-speech)
Affordances: Any pedagogical, affective, or technological advantages (e.g., enhanced learner engagement, increased language immersion, personalized feedback through AI, real-time pronunciation correction, improved motivation) that show how the integration of AI and XR supports or extends language learning processes
Challenges: Any problems, limitations, or drawbacks noted (e.g., technical issues, small sample sizes, usability challenges, pedagogical limitations) that could inform future research
Design Considerations: Factors or guidelines to consider when designing an AI–XR learning experience in language learning (e.g., pedagogical alignment, learner-centered interaction, accessibility, scalability)
Future Directions: Suggestions for what researchers could explore or investigate next based on the study’s findings and limitations
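To illustrate how a coding schema of this kind might be operationalized during data extraction, the sketch below models one coded study as a Python dataclass. This is a minimal, hypothetical illustration of the categories in Table 1, assuming a Python-based extraction workflow; all field names are our own and do not represent the tooling actually used in the review.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CodedStudy:
    """One record per reviewed study, mirroring the Table 1 coding schema.

    Field names are illustrative placeholders for the schema categories.
    """
    authors: str
    year: int
    location: str
    source_type: str               # e.g., "journal article", "conference proceedings"
    setting: str                   # e.g., "university language course"
    target_language: str
    language_focus: List[str]      # e.g., ["speaking", "vocabulary"]
    n_participants: Optional[int]  # None when not reported
    intervention_length: Optional[str]
    learning_outcomes: List[str]   # cognitive, linguistic, affective, cultural
    ai_technologies: List[str]     # e.g., ["ASR", "TTS", "NLP"]
    xr_technologies: List[str]     # e.g., ["VR"], ["AR"], ["MR"], ["Metaverse"]
    theoretical_frameworks: List[str]
    integration_strategies: List[str]
    affordances: List[str]
    challenges: List[str]
    design_considerations: List[str]
    future_directions: List[str]
```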
Table 2. Author Geographic Distribution of Selected Studies.
Table 3. Synthesis of Learning Outcomes in AI–XR Language Education.
Learning Outcome (n = number of studies): Articles
Increased Engagement (n = 10): (Bottega et al., 2023; Chabot et al., 2020; Y.-L. Chen et al., 2022; Gorham et al., 2019; Kizilkaya et al., 2019; Song et al., 2023; Tazouti et al., 2019; Xin & Shi, 2024; Xu et al., 2019; Yu, 2023)
Increased Motivation (n = 11): (Bottega et al., 2023; Y.-L. Chen et al., 2022; Gorham et al., 2019; Hollingworth & Willett, 2023; Shadiev et al., 2021; Tazouti et al., 2019; Tolba et al., 2024; J.-H. Wang et al., 2020; Y. Wang et al., 2022; Xin & Shi, 2024; Xu et al., 2019)
Vocabulary Acquisition (n = 6): (Bottega et al., 2023; Chabot et al., 2020; Y.-L. Chen et al., 2022; Divekar et al., 2022; Hollingworth & Willett, 2023; Xu et al., 2019)
Reduction in Anxiety (n = 4): (Y.-C. Chen, 2024; H. Lee et al., 2023; Y. Wang et al., 2022; Yu, 2023)
Cultural Learning (n = 2): (Mirzaei et al., 2018; Shadiev et al., 2021)
Overall Speaking Proficiency Improvement (n = 3): (Divekar et al., 2022; Kizilkaya et al., 2019; Obari et al., 2020)
Knowledge Transfer and Retention (n = 3): (Y.-L. Chen et al., 2022; Divekar et al., 2022; Tolba et al., 2024)
Listening Comprehension Improvement (n = 4): (Y.-L. Chen et al., 2022; Divekar et al., 2022; Obari et al., 2020; Tolba et al., 2024)
Autonomy Improvement (n = 2): (Y.-L. Chen et al., 2022; Song et al., 2023)
Speaking Fluency Improvement (n = 2): (Y.-L. Chen et al., 2022; Kizilkaya et al., 2019)
Pronunciation Improvement (n = 1): (Tolba et al., 2024)
Interaction Improvement (n = 3): (Chabot et al., 2020; Song et al., 2023; Tolba et al., 2024)
Personalized Learning (n = 1): (Yu, 2023)
Not Specified (n = 10): (Allen et al., 2019; Guo et al., 2017; Hajahmadi et al., 2024; Nakamura et al., 2024; Park et al., 2019; Seow, 2023; Shukla et al., 2019; Smuts et al., 2019; Yang & Wu, 2024; Yun et al., 2024)
Table 4. Summary of XR Applications in Language Learning.
XR Technology / Design Software (n = number of studies): Articles
VR / 3D Images (n = 3): (Nakamura et al., 2024; Seow, 2023; Song et al., 2023)
VR / Commercial Software (n = 2): (Y.-C. Chen, 2024; Gorham et al., 2019)
VR / 360° Video (n = 2): (Shadiev et al., 2021; Y. Wang et al., 2022)
VR / Unity 3D (n = 4): (Bottega et al., 2023; Park et al., 2019; Smuts et al., 2019; Tazouti et al., 2019)
VR / Unity 3D and 3D Images (n = 2): (Y.-L. Chen et al., 2022; Yang & Wu, 2024)
VR / Not Specified (n = 6): (Guo et al., 2017; Kizilkaya et al., 2019; Mirzaei et al., 2018; Obari et al., 2020; J.-H. Wang et al., 2020; Yun et al., 2024)
AR / 3D Images (n = 1): (Tolba et al., 2024)
AR / Unity 3D (n = 1): (Hollingworth & Willett, 2023)
AR / Unity 3D, 3D Max, Vuforia (n = 1): (Xin & Shi, 2024)
AR / Not Specified (n = 4): (Hajahmadi et al., 2024; H. Lee et al., 2023; Shukla et al., 2019; Xu et al., 2019)
MR / 360° Panoramic Displays (n = 3): (Allen et al., 2019; Chabot et al., 2020; Divekar et al., 2022)
Metaverse / 3D Images (n = 1): (S. Lee et al., 2025)
Metaverse / Commercial Platform (n = 1): (Hwang et al., 2024)
Metaverse / Unity 3D (n = 1): (Yu, 2023)
Table 5. AI Technologies Used in XR Language Learning Studies.
AI Technology Application (n = number of studies): Articles
Automatic Speech Recognition (ASR) (n = 22): (Allen et al., 2019; Bottega et al., 2023; Chabot et al., 2020; Y.-L. Chen et al., 2022; Y.-C. Chen, 2024; Divekar et al., 2022; Guo et al., 2017; Hajahmadi et al., 2024; Hollingworth & Willett, 2023; Kizilkaya et al., 2019; H. Lee et al., 2023; Mirzaei et al., 2018; Nakamura et al., 2024; Obari et al., 2020; Park et al., 2019; Seow, 2023; Shadiev et al., 2021; Shukla et al., 2019; Tolba et al., 2024; J.-H. Wang et al., 2020; Yang & Wu, 2024; Yun et al., 2024)
Natural Language Processing (NLP) (n = 14): (Allen et al., 2019; Chabot et al., 2020; Y.-C. Chen, 2024; Divekar et al., 2022; Guo et al., 2017; Hollingworth & Willett, 2023; H. Lee et al., 2023; Mirzaei et al., 2018; Obari et al., 2020; Park et al., 2019; Seow, 2023; Xu et al., 2019; Yang & Wu, 2024; Yu, 2023)
Large Language Models (LLMs) (n = 6): (Bottega et al., 2023; Hajahmadi et al., 2024; Hollingworth & Willett, 2023; H. Lee et al., 2023; Nakamura et al., 2024; Seow, 2023)
Machine Learning (ML) (n = 5): (Divekar et al., 2022; Gorham et al., 2019; Shukla et al., 2019; Smuts et al., 2019; Yang & Wu, 2024)
Text-to-Speech (TTS) (n = 11): (Allen et al., 2019; Bottega et al., 2023; Y.-L. Chen et al., 2022; H. Lee et al., 2023; Nakamura et al., 2024; Obari et al., 2020; Park et al., 2019; Seow, 2023; Shadiev et al., 2021; Shukla et al., 2019; Xu et al., 2019)
Computer Vision (CV): Object and Image Detection; Gesture and Motion Tracking (n = 13): (Allen et al., 2019; Bottega et al., 2023; Chabot et al., 2020; Y.-L. Chen et al., 2022; Divekar et al., 2022; Gorham et al., 2019; Hajahmadi et al., 2024; H. Lee et al., 2023; Mirzaei et al., 2018; Song et al., 2023; J.-H. Wang et al., 2020; Xu et al., 2019; Yu, 2023)
AI-driven Real-Time Feedback System (FS) (n = 7): (Y.-C. Chen, 2024; Guo et al., 2017; Hajahmadi et al., 2024; Shukla et al., 2019; Tolba et al., 2024; J.-H. Wang et al., 2020; Y. Wang et al., 2022)
Multimodal AI Agent (MMAA) (n = 3): (Y.-L. Chen et al., 2022; Divekar et al., 2022; Tazouti et al., 2019)
AI-powered Conversational Agent and Chatbot (CA) (n = 7): (Chabot et al., 2020; Hwang et al., 2024; S. Lee et al., 2025; Mirzaei et al., 2018; Nakamura et al., 2024; Park et al., 2019; Yun et al., 2024)
Machine Translation (MT) (n = 1): (Shadiev et al., 2021)
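Many of the studies in Table 5 chain several of these technologies into a single conversational loop inside the XR environment: ASR transcribes learner speech, an NLP component or LLM produces a reply, and TTS voices that reply through an embodied agent. The Python sketch below shows the general shape of such a loop under stated assumptions; recognize(), respond(), and speak() are hypothetical stubs standing in for whatever ASR, dialogue, and TTS services a given system connects to, not any specific study’s implementation.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    """One completed exchange between the learner and the XR agent."""
    learner_utterance: str
    agent_reply: str

def recognize(audio: bytes) -> str:
    """Hypothetical ASR stub: transcribe learner speech captured by the headset mic."""
    raise NotImplementedError  # e.g., a cloud or on-device speech recognizer

def respond(transcript: str, history: list[Turn]) -> str:
    """Hypothetical dialogue stub: an NLP model or LLM generates a pedagogical reply."""
    raise NotImplementedError  # e.g., an intent classifier or a prompted LLM

def speak(text: str) -> bytes:
    """Hypothetical TTS stub: synthesize audio for the agent's avatar in the scene."""
    raise NotImplementedError

def conversation_turn(audio: bytes, history: list[Turn]) -> bytes:
    """Run one ASR -> dialogue -> TTS cycle, keeping history for adaptive follow-ups."""
    transcript = recognize(audio)             # ASR
    reply = respond(transcript, history)      # NLP / LLM
    history.append(Turn(transcript, reply))   # context enables personalization
    return speak(reply)                       # TTS played back in the XR scene
```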
Table 6. AI–XR Integration Strategies in Language Learning.
Language Focus (n = number of articles); Integration Strategies; Example Studies
Speaking (n = 15). Integration strategies: Extended Reality with Automatic Speech Recognition; Extended Reality with Text-to-Speech; Extended Reality with Conversational Agents; Extended Reality with Computer Vision. Example studies: (Bottega et al., 2023; Y.-L. Chen et al., 2022; Y.-C. Chen, 2024; Divekar et al., 2022; Guo et al., 2017; Hollingworth & Willett, 2023; Kizilkaya et al., 2019; H. Lee et al., 2023; Mirzaei et al., 2018; Nakamura et al., 2024; Park et al., 2019; Seow, 2023; Shukla et al., 2019; J.-H. Wang et al., 2020; Yun et al., 2024)
Listening (n = 3). Integration strategies: Extended Reality with Automatic Speech Recognition; Extended Reality with Natural Language Processing. Example studies: (Allen et al., 2019; Chabot et al., 2020; Guo et al., 2017)
Writing (n = 3). Integration strategies: Extended Reality with Real-Time Feedback System; Extended Reality with Text-to-Speech; Extended Reality with Machine Learning. Example studies: (Gorham et al., 2019; Y. Wang et al., 2022; Xu et al., 2019)
Vocabulary (n = 3). Integration strategies: Extended Reality with Automatic Speech Recognition; Extended Reality with Computer Vision. Example studies: (Allen et al., 2019; Chabot et al., 2020; Song et al., 2023)
Cross-Cultural Learning (n = 3). Integration strategies: Extended Reality with Automatic Speech Recognition; Extended Reality with Conversational Agents. Example studies: (Mirzaei et al., 2018; Shadiev et al., 2021; Yang & Wu, 2024)
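For vocabulary-focused designs, the “Extended Reality with Computer Vision” strategy in Table 6 typically means detecting objects in the learner’s surroundings and overlaying their target-language labels. A minimal sketch of that pattern follows, assuming a hypothetical detect_objects() stub in place of any particular object-detection model; the glossary entries are purely illustrative.

```python
# Hypothetical sketch: label detected objects with target-language vocabulary.
# detect_objects() stands in for any object-detection model; GLOSSARY is illustrative.

GLOSSARY = {"cup": "taza", "book": "libro", "chair": "silla"}  # English -> Spanish

def detect_objects(frame: bytes) -> list[str]:
    """Stub for a CV model returning class names of objects seen in the camera frame."""
    raise NotImplementedError

def vocabulary_overlays(frame: bytes) -> dict[str, str]:
    """Map each recognized object to the target-language label the AR layer renders."""
    return {obj: GLOSSARY[obj] for obj in detect_objects(frame) if obj in GLOSSARY}
```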