The Effects of Virtual Reality-Assisted Language Learning: A Meta-Analysis

Abstract: Existing literature reflects that VR technology is widely used in language learning settings. Although many studies have identified the multiple benefits and affordances of Virtual Reality (VR) technologies in language learning, most are qualitative studies that do not provide substantial evidence of the impact of this technology on language learning. To this end, this study conducted a meta-analysis of 21 quantitative studies with 1144 participants published between 2010 and 2021. The study's main purpose was to examine the effects of VR on students' language learning performance, including linguistic gains and affective gains. The results indicated that, compared to non-VR conditions, VR-assisted language learning had a medium effect on students' linguistic gains (Hedges' g = 0.662, 95% CI [0.398–0.925], p < 0.001) and affective gains (Hedges' g = 0.570, 95% CI [0.309–0.831], p < 0.001). The study also analyzed the impact of several moderator variables, such as education levels, hardware types, language skills, target language, and L1/L2, on language learning gains. The research indicates that VR technology has great potential to improve language learning as an educational resource and provides suggestions for further research and practice on the use of VR-assisted language learning.


Introduction
Language serves as a bridge for communication between people and society. With the growing trend of globalization, language education has also developed rapidly. For example, English, the recognized international language, has become a compulsory subject in many countries and regions [1]. However, language learning remains a time-consuming and challenging process for many students. The main reason for language learning difficulties is the lack of an authentic language environment: learners cannot personally engage with relevant contexts in which to use the target language to achieve learning goals [2]. It is therefore essential that language learners are provided with an authentic learning environment and meaningful tasks [3]. The advancement of computer technology as a learning tool has provided new methods and created realistic environments to improve language learning [4]. As a novel technology, virtual reality (VR) has provided numerous alternative learning opportunities to language learners in the past decade [5].
VR technology and devices first appeared in the early 1960s, and many studies on VR and its applications have been carried out in recent years [6]. Virtual Reality (VR) refers to a three-dimensional (3D) environment generated by computer technology, which can provide a context that simulates vision and other senses [7]. It allows users to communicate with people, machines, and other entities in the virtual environment by using computers and various devices [8,9]. There are two main types of virtual reality devices, namely immersive VR and non-immersive VR. Immersive VR involves high immersion and high costs, as with cave automatic virtual environment (CAVE) and headset VR devices. Non-immersive VR commonly refers to desktop VR, in which users interact with the computer through a mouse, controller, and other devices [7,10]. The dramatic reduction in the cost of devices and technology has driven a rapid growth of VR applications in educational fields such as medicine, science, and mathematics in recent years, where their effects have been shown to be positive [11–13]. Learners experience the situation through their senses, which can help them improve their motivation, participation, and learning ability [5]. Moreover, VR has also been applied to language learning, where it has shown its importance and potential as a support for language learners.

Applications of VR in Language Learning
In recent years, a large number of studies have documented the application of technology-enhanced language learning. VR has also received a great deal of attention in this field. VR technology is characterized by immersion, interaction, and involvement [14]. It breaks through the limitations of traditional media, provides language learners with a realistic simulated language learning environment, and effectively supports their language learning. For example, Wehner, Gump, and Downey [15] investigated the effect of Second Life on the motivation of English-speaking undergraduates to learn Spanish. They found that Second Life increased participants' motivation. Huang, Hwang, and Chang [7] developed a Chinese writing learning system based on spherical video-based virtual reality (SVVR) and compared the effects of this method with traditional technology-supported learning methods in high school writing classes. Compared to a control group that participated in traditional technology-supported learning classes, the SVVR approach improved students' writing performance and self-efficacy while also reducing their cognitive load. Similarly, Tai and Chen [16] randomly assigned 72 middle school students to an experimental group (mobile headset virtual device) and a control group (traditional multimedia) to investigate the effectiveness of mobile headset virtual devices for learners' English listening comprehension compared with video learning. The results showed that listening comprehension was significantly higher with the mobile headset virtual devices than with video. In summary, these studies indicate that VR can promote language skills and foster motivation and self-efficacy.

Previous Review Studies on VR-Assisted Language Learning
As far as we know, most of the existing literature [10,17–20] consists of systematic reviews that describe VR applications in language learning through qualitative variables. These studies have shown that the application of VR in language learning is steadily growing and is used effectively in language learning settings. Lin and Lan [17] conducted a content analysis of VR applications in language learning from 2004 to 2013, and they found that the research topics on VR-enhanced language learning were mainly interactive communication, behavior, affect, and task-based instruction. Solak and Erdem [18] also conducted a content analysis of 40 papers published from 1995 to 2015 on foreign language learning and teaching through virtual reality and identified that the preferred data collection tool was document analysis, and that the effectiveness of VR and game-based learning were the two prominent topics in this field. Huang, Zou, Cheng, and Xie [20] analyzed VR in language learning from five perspectives: ways of integrating VR tools in language learning, primary users, major research findings, why VR can effectively promote language learning, and its impact. Qiu et al. [10] conducted a systematic literature review that included 150 articles on VR/AR in language learning from 2008 to 2019. The study found that VR was most often applied in higher education and that the most reported advantages of VR in language learning concerned learners' learning behaviors, learning attitudes, and learning performance. It also showed that task-based learning and game-based learning were the most common learning strategies.
Furthermore, one meta-analysis measuring the impact of VR on language learning has been published. Wang et al. [21] conducted a meta-analysis to examine the effect size of 3D virtual worlds applied to language learning in the past two decades. The research included 13 primary studies and found that the overall effect sizes of linguistic and affective gains were 0.832 and 0.531, respectively. However, the aforementioned review studies present certain limitations. Most of the existing literature reviews describe and summarize the findings on VR technology in language learning through a qualitative approach and do not use quantitative methods to measure the impact of this technology on language learning. Qualitative approaches make it difficult to evaluate the actual overall effectiveness of VR applications in language learning and how moderator variables affect that effectiveness. The present research addresses this gap. We conducted a meta-analysis to examine and quantify studies on the effectiveness of VR-assisted language learning and to determine which moderator variables influence the effectiveness of virtual reality, in order to better guide the future application of virtual reality technology in language learning.

Purpose of This Meta-Analysis
This meta-analysis used the PICO framework [22] as the research framework. The PICO framework includes four major components: (1) Population. It refers to students of different education levels who may be involved in language learning through VR applications. (2) Intervention. It covers the treatments (e.g., the hardware types, target language, and L1/L2) of studies on VR-assisted language learning. (3) Comparison. It refers to the teaching strategy implemented in the control groups, including traditional teaching methods and other educational technology resources. (4) Outcomes. It focuses on students' learning performance when VR is applied to language learning, such as the acquisition of language skills and affective gains.
Accordingly, this study had three purposes: to summarize the status quo of the application of virtual reality technology in language learning, to determine the overall effect size of virtual reality applications on students' language learning outcomes, and to identify whether moderator variables influence the impact of virtual reality on students' language learning gains. The research questions were formulated as follows:
1. What is the research status of VR technology-assisted language learning? Specifically, what education levels, hardware types, language skills, target languages, and L1/L2 were involved in the existing studies in this field?
2. What is the overall effectiveness of virtual reality applications on students' language learning achievement?
3. What kinds of moderator variables influence the effectiveness of virtual reality on students' language learning outcomes?

Method
We followed the procedure of meta-analysis proposed by Glass, McGaw, and Smith [23]. The procedure included (1) collecting studies, (2) coding the features of the studies, (3) calculating the effect size of each study, and (4) investigating the moderating effects of the studies' characteristics.

Data Sources and Search Strategy
The literature search procedure of this study followed the PRISMA guidelines [24] to collect literature related to VR-supported language learning. To obtain a sufficient number of relevant scholarly papers, this study searched journal articles published from 2010 to 2021. The major databases used for the searches were the Web of Science Core Collection, Springer Link, Wiley Online Library, and Scopus. Two sets of search keywords were used: (1) VR technology-related keywords, including VR, Virtual Reality, Virtual Environment, and Virtual Worlds; and (2) language-related keywords, including Language, Second Language, Foreign Language, Second Language Acquisition, EFL, Listening, Speaking, Reading, Writing, and Vocabulary. These two sets of keywords were combined using Boolean operators [25]. Moreover, the major journals of educational technology (e.g., Computers & Education, Computer Assisted Language Learning, Journal of Computer Assisted Learning, British Journal of Educational Technology, Language Learning & Technology, Interactive Learning Environments, and ReCALL) and the reference lists of the collected papers were perused as a supplement.
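The exact query strings submitted to each database are not reported. As a minimal sketch, assuming the conventional pattern of OR within a keyword set and AND between the two sets, the combination could be assembled as follows. The helper names `or_group` and `build_query` are hypothetical, and real databases each add their own field tags and syntax.

```python
# Keyword sets exactly as listed in the search strategy above.
VR_TERMS = ["VR", "Virtual Reality", "Virtual Environment", "Virtual Worlds"]
LANGUAGE_TERMS = [
    "Language", "Second Language", "Foreign Language",
    "Second Language Acquisition", "EFL", "Listening",
    "Speaking", "Reading", "Writing", "Vocabulary",
]


def or_group(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*term_sets):
    """Join the OR-groups with AND (assumed combination rule)."""
    return " AND ".join(or_group(terms) for terms in term_sets)


QUERY = build_query(VR_TERMS, LANGUAGE_TERMS)
print(QUERY)
```

A record matches this query only if it contains at least one VR-related term and at least one language-related term.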

Inclusion and Exclusion Criteria
We followed the inclusion and exclusion criteria in Table 1 to select literature that met the requirements of the meta-analysis. All the papers had to be written in English and published between 2010 and 2021. The studies had to be empirical and had to have adopted either an experimental or quasi-experimental research design in which an experimental group that used VR to assist language learning was compared with control groups that used other learning methods. Furthermore, the studies had to provide sufficient information to calculate effect sizes, such as standard deviations, means, and sample sizes. Finally, the experimental results had to include learning achievement (e.g., different language skills) or affective gains, and the learning content of the experimental and control groups had to be the same.

Search Results
The process of literature identification, screening, eligibility, and eventual inclusion in this meta-analysis is shown in Figure 1. During the first phase of literature selection, we found 561 journal articles related to VR-assisted language learning published from 2010 to 2021 and 8 articles from other sources. After removing 117 duplicates, two researchers read the title and abstract of each article and judged whether the article was relevant to VR-assisted language learning. A total of 82 full-text articles were retained after this identification and screening process.
In the second phase, the articles retained from the first phase were screened further. Only experimental and quasi-experimental studies involving VR-assisted language learning were carefully examined to identify whether the VR technology and other teaching/learning methods were included. At this stage, theoretical studies, qualitative research, and survey research were all excluded. After the second screening stage, 29 articles remained for consideration in the meta-analysis.
In the final stage, we examined the experimental data reported in each study. Studies that did not provide sufficient data to calculate effect sizes (e.g., mean, standard deviation, sample size) were excluded. After this stage, 21 articles were included in the meta-analysis. Of the 21 articles, two examined only affective gains, and six examined both linguistic and affective gains.
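The counts reported in the three screening stages can be tallied as a quick consistency check. The sketch below uses only the figures stated above and rests on two assumptions that the text does not spell out: that the 8 articles from other sources were pooled with the 561 database records before duplicates were removed, and that every included article reported at least one of the two outcome types.

```python
# Counts taken from the screening stages described above.
identified_databases = 561
identified_other = 8
duplicates_removed = 117
full_text_retained = 82
after_design_screening = 29
included = 21

# Outcome breakdown reported for the 21 included articles.
affective_only = 2
both_outcomes = 6

# Assumption: records from both sources were pooled before de-duplication.
total_identified = identified_databases + identified_other
screened_titles = total_identified - duplicates_removed

# Assumption: each included article reports linguistic gains, affective gains, or both.
linguistic_only = included - affective_only - both_outcomes

# Number of articles contributing to each pooled outcome.
k_linguistic = linguistic_only + both_outcomes
k_affective = affective_only + both_outcomes

print(total_identified, screened_titles, linguistic_only, k_linguistic, k_affective)
```

Under these assumptions, 19 articles contribute to the linguistic-gain analysis and 8 to the affective-gain analysis.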

Coding Scheme
The coding scheme of the present study was composed of five main categories: basic research information, research participants' level, control treatment, research treatments, and learning outcomes. Apart from the basic research information, the other four categories and their moderator variables correspond to the components of the PICO framework. The coding scheme is described in detail as follows:

Basic Research Information
This referred to the author's last name, year of publication, region, and article's title.

Research Participants' Level
The research participants' level corresponded to the "Population" of the PICO framework and was coded by their educational levels, including elementary school, middle school (junior or senior), and college.

Control Treatment
The control treatment of the included studies corresponded to the "Comparison" component, referring to the different control group treatments. Previous meta-analyses had considered "control treatments" as moderator variables to compare experimental treatments with different control treatments [26,27]. Garzón and Acevedo [27] divided the control treatments into three types: multimedia, traditional lectures, and traditional pedagogical tools. In this meta-analysis, the control group treatments were classified into two categories: (1) traditional, which refers to traditional lectures and traditional pedagogical tools, such as curriculum, conventional teacher introduction, or textbooks, and (2) multimedia, which refers to educational resources accessed using videos, images, animation, or computer-assisted instruction.

Treatments
In the meta-analysis, the treatments of all the included studies corresponded to the PICO framework's "Intervention", which covered the hardware types, language skills, target languages, and L1/L2. The specific descriptions of these treatments were as follows:
1. Hardware types. According to Qiu et al.'s [10] classification of the immersion level of VR devices, VR can be divided into non-immersive and immersive devices. The non-immersive devices included desktop VR, smartphones, and tablet computers. CAVE systems and head-mounted displays, such as Samsung Gear VR, phone-based cardboard viewers, and CAVE-like VR, were classified as immersive devices.
2. Language skills. The language skills targeted by the included studies were coded as listening, speaking, writing, vocabulary, and mixed.
3. Target languages. This study classified the coding variables of different target languages into English, Chinese, Spanish, German, and Korean.
4. L1/L2. According to Hwang and Fu's [4] review study on mobile language learning applications, L1 referred to the native or first language that learners acquired from birth. L2 usually referred to a language that learners acquired after their first language.

Learning Outcomes
The learning outcomes corresponded to the PICO framework's "Outcome". They comprised two categories of learning performance: linguistic gains (e.g., language skills and knowledge or content learning) and affective gains (e.g., learning attitudes, motivation, and self-efficacy) [4,21].
Two researchers coded the 21 articles according to the abovementioned coding rules. The coding followed a consistent process, and whenever a question about coding arose, the researchers held discussions to ensure that the coding of the article was consistent. If the two researchers had different coding opinions, the article was passed to a third researcher for coding until agreement was reached on all codes. In the end, the coding was reviewed again to ensure that it was accurate (Table A1).

Data Analysis

Computing Effect Sizes
Comprehensive Meta-Analysis software was used to conduct the meta-analysis and compute the effect sizes. To compare effect sizes across studies, we used Hedges' g as the standardized measure of effect size. Hedges' g is an adjusted form of Cohen's d, which divides the mean difference between the two groups by the pooled standard deviation; the adjustment corrects for small-sample bias [29]. Cohen's d was calculated as follows:

$$d = \frac{M_E - M_C}{\sqrt{\dfrac{(N_E - 1)\,S_E^2 + (N_C - 1)\,S_C^2}{N_E + N_C - 2}}}$$

where $M_E$ and $M_C$ were the estimated means of the experimental and control groups, respectively, $N_E$ and $N_C$ the sample sizes of the two groups, and $S_E^2$ and $S_C^2$ the respective variances (squared standard deviations). Hedges' g and Cohen's d are similar for large sample sizes, but Hedges' g performs better for small samples because Cohen's d is multiplied by a correction factor J that adjusts for small-sample bias:

$$J = 1 - \frac{3}{4(N - 2) - 1}$$

where $N = N_E + N_C$ was the total sample size, so that

$$g = J \times d$$

Only one effect size was calculated for each article in the present study because a study contributing more than one effect size might bias the overall effect size [30]. If a study had multiple effect sizes, they were combined into one value [29]. In addition, the I² statistic was used to examine heterogeneity: I² < 25% indicated low heterogeneity, 25-75% moderate heterogeneity, and ≥75% substantial heterogeneity [31]. When I² exceeded 50%, heterogeneity was considered significant. If there was heterogeneity, the random-effects model was used; otherwise, the fixed-effect model was chosen.
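The computation of Hedges' g from group means, standard deviations, and sample sizes can be sketched in a few lines. This is an illustrative sketch of the standard formulas for Cohen's d with a pooled standard deviation and the small-sample correction factor J, not the internal routine of the software used in the study. The input values are hypothetical and do not come from any included article.

```python
import math


def cohens_d(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Mean difference divided by the pooled standard deviation."""
    pooled_var = ((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2)
    return (mean_e - mean_c) / math.sqrt(pooled_var)


def correction_factor(n_total):
    """Small-sample correction J = 1 - 3 / (4(N - 2) - 1)."""
    return 1.0 - 3.0 / (4.0 * (n_total - 2) - 1.0)


def hedges_g(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Hedges' g = J x Cohen's d."""
    d = cohens_d(mean_e, mean_c, sd_e, sd_c, n_e, n_c)
    return correction_factor(n_e + n_c) * d


# Hypothetical post-test scores: experimental 78 (SD 10), control 72 (SD 10),
# with 30 students in each group.
d_example = cohens_d(78.0, 72.0, 10.0, 10.0, 30, 30)
g_example = hedges_g(78.0, 72.0, 10.0, 10.0, 30, 30)
print(round(d_example, 3), round(g_example, 3))
```

With groups of this size the correction is small (J is about 0.987), which illustrates why g and d nearly coincide for large samples and diverge only when samples are small.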

Analyses of Publication Bias
Publication bias occurs when researchers publish only favorable results [29]. Publication bias in this study was evaluated with the funnel plot and the fail-safe N test. We first visually inspected the shape of the funnel plot. If there is no publication bias, the plot resembles a symmetrical inverted funnel; otherwise, the plot is asymmetric [32]. In addition to the funnel plot, the fail-safe N was calculated. The fail-safe N refers to the number of unpublished studies required to reduce the effect size to an insignificant level [33]. Publication bias was considered absent if the fail-safe N was larger than 5n + 10 (where n was the number of effect sizes).

Descriptive Information
Twenty-one articles were included in this meta-analysis after the literature search, screening, coding, and extraction process. Table 2 describes the basic information of the 21 articles included in this study. These articles were published between 2010 and 2021, and the studies used quasi-experimental designs. VR-assisted language learning had been researched in different regions, and Taiwan had the most relevant studies. The sample sizes ranged from 10 to 106 participants in both the treatment and control groups, and the total sample size was 1144 (treatment group = 563 and control group = 581). The learning outcomes included linguistic gains, affective gains, and both.
Table 3 shows the different moderator variables and their corresponding percentages. Regarding control treatment, multimedia teaching accounted for the largest proportion (57.1%). In terms of educational levels, the largest proportion of studies involved college students (47.6%), and the second largest group was middle school students (38.1%). As for hardware types, immersive devices (57.1%) were used more widely than non-immersive devices (42.9%). In terms of language skills, the most frequently studied were vocabulary (26.3%), writing (26.3%), and speaking (21.1%), followed by mixed (15.8%). Finally, the target language was mostly English (66.6%), followed by Chinese (19.0%), and most studies (85.7%) addressed L2 learning, with the rest addressing L1 learning (14.3%).

Overall Effect Size
Table 4 shows the overall effect sizes of linguistic gains and affective gains. Because the Q and I² statistics revealed substantial heterogeneity in the effect sizes for linguistic gains (Q = 73.476, I² = 75.502, p < 0.001) and moderate heterogeneity for affective gains (Q = 14.955, I² = 53.317, p < 0.05), the random-effects model was used to pool the data [30]. Furthermore, this indicated that the heterogeneity was attributable to one or more moderators rather than to sampling error alone [52], and further moderator analysis was required, as reported in the moderator analyses below. Using a random-effects model to pool the effect sizes, the overall effect sizes showed a statistically significant difference in linguistic gains (g = 0.662, CI [0.398-0.925], p < 0.001) and affective gains (g = 0.570, CI [0.309-0.831], p < 0.001) for VR-assisted language learning compared to other learning methods. According to Cohen [53], an effect size ≤ 0.2 indicates a small effect, one between 0.2 and 0.8 a moderate effect, and one ≥ 0.8 a large effect. Accordingly, VR-assisted language learning had a moderate impact on students' linguistic gains and affect in language learning classrooms. In other words, language learning conditions using VR technologies had significantly better learning outcomes than non-VR conditions. These findings provide quantitative evidence for the advantages of VR devices in language learning and complement previous qualitative review research on VR-assisted language learning [20].
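The reported heterogeneity statistics and confidence intervals can be cross-checked with the usual identities I² = (Q − df)/Q × 100 and SE = (upper − lower)/(2 × 1.96). The sketch below is a reader-side check, not part of the original analysis. It assumes k = 19 effect sizes for linguistic gains and k = 8 for affective gains, values inferred from the article counts in the search results rather than stated with Table 4, and it labels heterogeneity with the cut-offs given in the Method section.

```python
def i_squared(q, k):
    """I^2 in percent from Cochran's Q and the number of effect sizes k."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def heterogeneity_label(i2):
    """Cut-offs as stated in the Method section (<25 low, 25-75 moderate, >=75 substantial)."""
    if i2 < 25:
        return "low"
    if i2 < 75:
        return "moderate"
    return "substantial"


def z_from_ci(estimate, lower, upper, z_crit=1.96):
    """Back-calculate the z statistic from a 95% confidence interval."""
    standard_error = (upper - lower) / (2 * z_crit)
    return estimate / standard_error


# Assumed k values: 19 (linguistic) and 8 (affective), inferred from the article counts.
i2_linguistic = i_squared(73.476, 19)
i2_affective = i_squared(14.955, 8)

z_linguistic = z_from_ci(0.662, 0.398, 0.925)
z_affective = z_from_ci(0.570, 0.309, 0.831)

print(round(i2_linguistic, 3), heterogeneity_label(i2_linguistic))
print(round(i2_affective, 3), heterogeneity_label(i2_affective))
print(round(z_linguistic, 2), round(z_affective, 2))
```

Under these assumptions the linguistic I² is reproduced closely, the affective I² to within rounding of the reported Q, and both z statistics exceed 3.29, the two-sided threshold for p < 0.001.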

The Effect Sizes of Moderator Variables on Linguistic Gains
To examine what moderated the linguistic gains, this study analyzed the impact of the moderator variables on linguistic gains. Table 5 presents the effect sizes for the moderator variables.

Control Treatments
Table 5 indicates that VR had a moderate impact on language learning. However, it was necessary to verify that VR contributed to this effect. In other words, it was important to establish that this effect was the result of the intervention with VR and not of other interventions. To make this clear, the effect of the VR treatment was examined against each type of control treatment. The results shown in Table 5 suggested that using VR technologies was more effective than using other teaching resources, including traditional (g = 0.840, p < 0.001) and multimedia (g = 0.514, p < 0.01) teaching resources. These findings showed that the improvement in language learning appeared to be related to the use of VR and not just to the presence of an intervention, even though there was no significant difference (Q_B = 1.469, p = 0.226) between the categories of control treatment.

Educational Levels
With respect to educational levels, middle school had the largest effect on linguistic gains (g = 0.845, p < 0.001), followed by college (g = 0.546, p < 0.05). The effect size for elementary school (g = 0.468, p = 0.182) was not significant. Q_B also did not reach the 0.05 significance level (Q_B = 1.322, p = 0.516), suggesting that there was no significant difference between the categories of educational levels.

Hardware Types
There was a significant difference between the categories of hardware types (Q_B = 8.178, p < 0.01). Non-immersive devices had a large effect size (g = 1.091, p < 0.001), whereas immersive devices obtained a moderate effect size (g = 0.409, p < 0.01).

Language Skills
There was no significant difference among the categories of language skills (Q_B = 3.620, p = 0.460). Except for writing skills (g = 0.319, p = 0.220) and mixed skills (g = 0.522, p = 0.124), which had no significant effect size, the language skills showed significant effect sizes. The effect size was largest for speaking skills (g = 1.032, p < 0.01), followed by listening (g = 0.850, p < 0.05) and vocabulary (g = 0.761, p < 0.01). VR can enhance performance in language expression and comprehension, such as speaking, within a short period of time. For example, Chen and Hwang [36] found that the use of VR can enhance participants' move structures and levels of learning motivation in their speaking learning compared with conventional multimedia learning.

Target Languages
English obtained a moderate-to-high effect size (g = 0.731, p < 0.001), while there was no significant effect size for Chinese (g = 0.497, p = 0.115) or German (g = 0.316, p = 0.618). However, Q_B did not reach the 0.05 significance level (Q_B = 0.754, p = 0.686), which indicated that there was no significant difference between target languages. In summary, the results showed that VR applications for English learning were effective. Chinese and German did not achieve a significant effect, probably due to the small sample size.

L1/L2
The Q_B statistic also did not reach the 0.05 significance level, indicating that there was no significant difference between L1 and L2 (Q_B = 0.605, p = 0.437). The effect size for L2 reached a moderate-to-high level (g = 0.701, p < 0.001), while there was no significant effect size for L1 (g = 0.361, p = 0.380).
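Each between-group statistic Q_B reported above is referred to a chi-square distribution with degrees of freedom equal to the number of subgroups minus one. The sketch below recomputes the p-values for the linguistic-gain moderators using only the Python standard library. The degrees of freedom are assumptions inferred from the categories named in the text (for example, five language-skill categories give df = 4), and `chi2_sf` is a hypothetical helper written for this illustration.

```python
import math


def chi2_sf(q, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with x = q / 2.
    """
    x = q / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-x)              # df = 2 base case
    else:
        a, tail = 0.5, math.erfc(math.sqrt(x))   # df = 1 base case
    while a < df / 2.0 - 1e-9:
        tail += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return tail


# (Q_B, assumed df) for each moderator of linguistic gains.
REPORTED = {
    "control treatment": (1.469, 1),
    "educational level": (1.322, 2),
    "hardware type": (8.178, 1),
    "language skill": (3.620, 4),
    "target language": (0.754, 2),
    "L1/L2": (0.605, 1),
}

P_VALUES = {name: chi2_sf(q, df) for name, (q, df) in REPORTED.items()}
for name, p in P_VALUES.items():
    print(f"{name}: p = {p:.3f}")
```

Under these assumed degrees of freedom, the recomputed p-values agree with the reported ones to about three decimals, and hardware type is the only moderator whose p-value falls below 0.01.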

The Effect Sizes of Moderator Variables on Affective Gains
To examine what moderated the affective gains, this study analyzed the impact of the moderator variables on affective gains. Table 6 shows the results of the moderator analysis on the affective gains.
Note: k, the number of effect sizes; g, Hedges' g; SE, standard error; Q-value, Q value of the heterogeneity test between the subgroups; CI, confidence interval; * p < 0.05; ** p < 0.01; *** p < 0.001.

Control Treatments
There was no significant difference between the control treatments (Q_B = 0.448, p = 0.503). Multimedia had a moderate effect size (g = 0.636, p < 0.001), while there was no significant effect size for traditional (g = 0.421, p = 0.125).

Educational Levels
Table 6 showed that Q_B did not reach the 0.05 significance level (Q_B = 1.603, p = 0.449), indicating no significant difference among the educational levels. College students showed a moderate-to-high effect size on affective gains (g = 0.719, p < 0.001), while elementary school (g = 0.339, p = 0.376) and middle school (g = 0.380, p = 0.137) did not obtain significant effect sizes. Djigunovic (2014) reported that, due to a lack of contact with the language and of world knowledge, children relied on teachers' guidance, which may reduce children's motivation to learn L2 in a situational language learning environment. The reduction in affective gain was influenced by self-assessment rather than language experience, which was similar to the findings of Wang et al. [21]. This may explain why the affective gains of college students were larger than those of elementary and middle school students, although the difference between levels was not statistically significant.

Hardware Types
There was no significant difference between the categories of hardware types (Q_B = 2.513, p = 0.113). Non-immersive devices obtained a moderate-to-high effect size (g = 0.780, p < 0.001), followed by immersive devices (g = 0.394, p < 0.05).

Target Languages
As shown in Table 6, English obtained a moderate effect size (g = 0.521, p < 0.01), while there was no significant effect size for Chinese (g = 0.834, p = 0.057), Spanish (g = 0.700, p = 0.143), or Korean (g = 0.570, p = 0.191). Q_B also did not reach the significance level (Q_B = 0.486, p = 0.922), which indicated that there was no significant difference between target languages.

L1/L2
The Q_B statistic also did not reach the significance level, suggesting that there was no significant difference between L1 and L2 (Q_B = 0.300, p = 0.584). L2 obtained a moderate effect size (g = 0.701, p < 0.01), and L1 also obtained a moderate effect size (g = 0.529, p < 0.05).

Evaluation of the Publication Bias
The funnel plot and the classic fail-safe N were adopted in this study to examine whether the included studies were affected by publication bias. If there is no publication bias, the funnel plot resembles a symmetrical inverted funnel [32]. Visual inspection of Figures 2 and 3 showed that the funnel plots were centered and symmetrical, indicating no publication bias. To further check for publication bias, we computed the classic fail-safe N. The results showed that a total of 444 studies with null results would be required to invalidate the effect size for linguistic gains, and a total of 77 studies would be needed to invalidate the effect size for affective gains. In summary, we found no evidence of publication bias, so the effect sizes are unlikely to have been inflated.
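The fail-safe N values can be compared with the 5n + 10 criterion stated in the Method section. The sketch below is a simple check under the assumption that n = 19 effect sizes entered the linguistic-gain analysis and n = 8 the affective-gain analysis. Both values are inferred from the article counts reported in the search results, and the function names are hypothetical.

```python
def failsafe_threshold(n_effect_sizes):
    """Tolerance level 5n + 10 for the fail-safe N."""
    return 5 * n_effect_sizes + 10


def exceeds_threshold(failsafe_n, n_effect_sizes):
    """True when the fail-safe N is larger than 5n + 10."""
    return failsafe_n > failsafe_threshold(n_effect_sizes)


# Reported fail-safe N values paired with the assumed number of effect sizes.
linguistic_ok = exceeds_threshold(444, 19)
affective_ok = exceeds_threshold(77, 8)

print(failsafe_threshold(19), linguistic_ok)
print(failsafe_threshold(8), affective_ok)
```

Under these assumptions both fail-safe N values clear their thresholds, with the affective-gain result having the narrower margin.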

Discussion
Many studies have suggested the potential of VR-assisted language learning. However, there is little consensus on whether it actually improves language learning. By integrating the findings of published empirical studies on VR-assisted language learning, our research provides concrete evidence on the overall effect sizes of VR-assisted language learning for students' linguistic and affective gains and on how moderator variables influence its effectiveness.

The Effectiveness of VR-Assisted Language Learning
Based on the results of 21 articles (N = 1144), this meta-analysis finds evidence for the overall effectiveness of VR applications on students' language learning achievement. The overall effect sizes were positive, at 0.662 for linguistic gains and 0.570 for affective gains, suggesting that VR can enhance students' language learning achievement compared to non-VR conditions. These overall effect sizes can be considered medium [53], which is consistent with the findings of Wang et al. [21] that VR-assisted language learning can facilitate language knowledge acquisition and enhance affective outcomes. The positive findings may be attributed to several features of virtual reality: (1) immersive learning can effectively promote language learning [49]; (2) interaction between VR and learners can improve language skills [5,54]; (3) authentic language learning settings can lower learners' affective filter (e.g., learning anxiety) [7,49]. Our meta-analysis reveals the potential of using VR applications in language learning and provides teachers with options to support their teaching.
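As a consistency check on the pooled estimates, the standard error, z statistic, and p value can be recovered from the effect sizes and 95% confidence intervals reported in the abstract. The sketch below assumes a normal approximation (CI = g ± z × SE); the helper names `se_from_ci` and `z_and_p` are hypothetical, and the recovered values are approximations limited by the rounding of the reported figures.

```python
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Recover the standard error from a symmetric normal-theory CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z_crit)


def z_and_p(g, se):
    """Two-tailed z test of the pooled effect against zero."""
    z = g / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hedges' g and 95% CI bounds as reported in the abstract.
reported = {
    "linguistic gains": (0.662, 0.398, 0.925),
    "affective gains": (0.570, 0.309, 0.831),
}

for outcome, (g, lower, upper) in reported.items():
    se = se_from_ci(lower, upper)
    z, p = z_and_p(g, se)
    print(f"{outcome}: SE = {se:.3f}, z = {z:.2f}, p < 0.001 is {p < 0.001}")
```

Both recovered p values fall below 0.001, which agrees with the significance levels reported for the two outcomes.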

The Moderator Analyses of VR-Assisted Language Learning
In this study, we identify six moderator variables and investigate how the design of the intervention influenced the effectiveness of VR-supported language learning. Although the overall results are positive, the results of individual studies may vary with factors such as control treatment, education level, hardware type, target language, language skill, and L1/L2. However, our moderator analysis indicates that only one moderator (hardware type) is significant at p < 0.01. Compared to immersive devices, non-immersive devices had a larger effect on linguistic gains. One possible reason is that long-term use of immersive devices can cause physical discomfort, such as dizziness [10]. Furthermore, students might focus only on the interesting content rather than the learning content and thus fail to achieve the learning goals [21]. This finding also suggests that future research should directly compare the effects of immersive and non-immersive devices on language learning.
Regarding control treatment, we compared VR applications as a teaching resource with other types of teaching resources, including multimedia and traditional resources. We did not find significant differences among the control treatments, which suggests that VR applications promote language learners' linguistic and affective gains compared to both multimedia and traditional resources. Moreover, higher-education students were the main research subjects, consistent with other research on VR-assisted language learning [10,21]. This may be because most researchers who study VR-supported language learning work in colleges and may find it easier to collect data from college students than in K-12 schools. While VR had a large effect on the linguistic gains of middle education students and a medium-to-large effect on the affective gains of higher education students, the differences among educational levels were not statistically significant. Overall, language learning at the different educational levels was positively influenced by VR applications, which is consistent with previous research results [21].
In terms of language domains, the moderator analysis indicated a larger effect size for speaking than for other language skills. However, the differences among language skills were not statistically significant. It should be noted that research on reading is lacking, which should be addressed in future research. The results related to the target language and L1/L2 revealed that most studies on VR-assisted language learning have focused on English as an L2. This may be because learning an L2 is far more challenging than learning an L1 [17]. Our results regarding this moderator suggest that VR technology could be used in both L1 and L2 learning, indicating that it has good potential as an educational resource.
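A subgroup comparison such as immersive versus non-immersive hardware is commonly tested with a between-groups Q statistic. The following is a minimal sketch of that test for exactly two subgroups, assuming each subgroup's pooled Hedges' g and its standard error are already available. The function name `q_between_two_groups` and all numeric inputs are hypothetical and chosen only for illustration; they are not the values from Tables 5 and 6.

```python
import math


def q_between_two_groups(effects, std_errors):
    """Between-groups Q test for two subgroup pooled effects.

    Q = sum of w_j * (g_j - g_bar)^2 with w_j = 1 / SE_j^2, compared
    against a chi-square distribution with 1 degree of freedom.
    """
    if len(effects) != 2 or len(std_errors) != 2:
        raise ValueError("this sketch handles exactly two subgroups")
    weights = [1.0 / se ** 2 for se in std_errors]
    # Inverse-variance weighted mean of the two subgroup effects.
    g_bar = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    q = sum(w * (g - g_bar) ** 2 for w, g in zip(weights, effects))
    df = 1
    # Chi-square survival function for df = 1, written with erfc.
    p = math.erfc(math.sqrt(q / 2.0))
    return q, df, p


# Hypothetical subgroup estimates (NOT taken from the paper).
hypothetical_g = [0.40, 1.00]    # e.g., immersive, non-immersive
hypothetical_se = [0.15, 0.18]
q, df, p = q_between_two_groups(hypothetical_g, hypothetical_se)
print(f"Q_between = {q:.2f}, df = {df}, p = {p:.4f}")
```

A significant Q between groups indicates that the subgroup effects differ by more than sampling error would explain, which is the sense in which hardware type is described as a significant moderator. With more than two subgroups, the same Q would be compared against a chi-square distribution with (number of groups − 1) degrees of freedom.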

Limitations of the Study
As a limitation of our meta-analysis, only journal articles published from 2010 to 2021 in the two major databases and major journals were included. Although publication bias was found to be unlikely in this meta-analysis, the findings on linguistic and affective gains might still be influenced by studies with non-significant results that were rejected because of their small samples. Moreover, articles excluded by the eligibility criteria might also alter the meta-analysis results. More studies are needed to assess the true value of large effect sizes obtained from small samples. Therefore, future research should include conference papers, doctoral theses, and reviews in addition to journal articles, and should also consider articles in languages other than English. Moreover, other databases (e.g., ProQuest or JSTOR) and longer time periods should be considered when searching for relevant articles.

Conclusions
Based on the above discussion, several suggestions for VR-assisted language learning are put forward. First, future research should enrich the diversity of applications of VR-assisted language learning. The essence of VR-assisted language learning is the use of VR technology to improve the effectiveness of language teaching and learning methods. Research should not be limited to higher education but should give more attention to kindergarten and K-12 education. For example, Cerezo, Calderón, and Romero [55] adopted VR-assisted language learning to help preschool children practice the pronunciation of basic English vocabulary. The results showed that VR-assisted language learning significantly impacted the children's motivation and improved their performance compared to traditional methods. Furthermore, most of the studies focused on vocabulary-related knowledge (e.g., vocabulary and writing), and research on grammar knowledge and reading skills is lacking. Language learning programs mainly involve English as a foreign language; few studies have focused on learning other languages as a second or foreign language, and there has been little research on native language teaching. Thus, future research should consider different languages and different skills to expand the diversity of VR-assisted language learning.
Second, future research should examine the effects of different VR devices on language learning. The results of our moderator analysis show that only device type produced significant differences in this meta-analysis. Non-immersive devices performed better than immersive devices for students' linguistic gains. Existing studies have mainly compared VR with traditional teaching, and there is a lack of research on the effects of different VR devices in language learning. Therefore, it is worthwhile to examine the effects of different VR devices on language learning in future research.
Third, future research should focus not only on language knowledge acquisition and affective enhancement but also on the development of higher-order thinking in the process of language acquisition. In the studies included in this meta-analysis, linguistic gains were mainly measured through post-intervention tests, and affective gains were mainly measured through questionnaires. However, there is a lack of deeper exploration of the mechanisms of VR-assisted language learning, such as higher-order cognition or behavior evaluation. With the development of emerging technologies such as artificial intelligence and big data, more innovative approaches, such as learning behavior analysis, should be used to understand the nature of VR-supported language learning. For example, there is evidence from neuroscience research to support immersion learning for L2 acquisition [56]. Future research should not only consider learning benefits but also examine the development of, or changes in, higher-order thinking or abilities.

Figure 2. Funnel plot assessing publication bias on linguistic gains.

Figure 3. Funnel plot assessing publication bias on affective gains.

Table 1. Inclusion and exclusion criteria.

Table 2. Basic information of the included studies.
Note: E, experimental group; C, control group.

Table 3. Categories of the 21 included articles.

Table 4. Effect sizes of linguistic and affective gains in the random-effects model.
Note: k, the number of effect sizes; g, Hedges' g; SE, standard error; Q-value, Q value of the heterogeneity test between the subgroups; CI, confidence interval; * p < 0.05; *** p < 0.001.

Table 5. Descriptive statistics for the moderator analysis on linguistic gains.

Table 6. Descriptive statistics for the moderator analysis on affective gains.