English-Medium Instruction as a Pedagogical Strategy for the Sustainable Development of EFL Learners in the Chinese Context: A Meta-Analysis of Its Effectiveness

With English-medium instruction (EMI) as a pedagogical strategy being practiced worldwide in higher education (HE), extensive research has explored stakeholders' attitudes toward, and perceived benefits and challenges of, EMI based on self-report data. However, little evidence has accrued on the actual effectiveness of EMI for students' subject content and English language learning achievements as tested with objective measures. This meta-analysis synthesized 44 independent samples (32 in medical disciplines) from 36 studies. The results show that EMI students performed significantly better in both subject content and English learning than students in Chinese-medium courses, but it should be noted that the difference in content learning was found only with students from medical disciplines. Discipline was the only significant factor moderating content learning, while discipline, research design, and instruction time in English significantly moderated English learning. The findings provide implications for implementing EMI in similar contexts and highlight the importance of rigorous future research to examine the benefits of EMI.


Introduction
Internationalization has been widely embraced in higher education institutions (HEIs), one collateral impact of which is "Englishization" in HEIs manifested in adopting English-medium instruction (EMI) as a pedagogical strategy [1]. Generally, EMI refers to the use of English to teach non-language academic subjects in "countries or jurisdictions where the first language of the majority of the population is not English" [2] (p. 37). It has often been likened to "content and language integrated learning" (CLIL) or "integrating content and language in higher education", notions used in Europe. However, while both CLIL and EMI share the pedagogical objective of improving "students' L2 proficiency by teaching subject matter through L2" [3] (p. 663), the "language component" is usually not the focus of teaching and assessment in EMI programs, and EMI instructors are usually not L2 specialists [4]. Despite the ostensibly straightforward definition of EMI, its actual practices vary greatly at geographical, institutional, and even classroom levels [2]. For instance, in the context of mainland China (henceforth referred to as China), the official term used to describe the adoption of English in non-language courses, partially or fully, is shuangyu jiaoxue (bilingual teaching) or shuangyu jiaoyu (bilingual education) [5]. However, bilingual teaching or education practices in China differ hugely from bilingual education models in international contexts [6]. While bilingual education in Western contexts offers lengthy exposure to two languages for various reasons, such as maintaining heritage language or promoting additive bilingualism, bilingual education in China mainly aims to promote content and language learning simultaneously through offering semester-based English-taught courses (for an in-depth discussion of the differences, see [6]). To be more

Perceptions and Measured Effectiveness of EMI
Previous research on EMI has generated rich descriptions of learners' and teachers' perceptions of EMI, including their attitudes and perceived challenges and obstacles [9][10][11][12][13][14]. Generally, students' positive attitudes toward EMI have been reported. For instance, in their questionnaire survey and interviews conducted among 579 students at 12 HEIs and 28 staff members at 8 universities in Japan and China, Galloway et al. [10] reported that students mostly had a positive image about their EMI courses or programs, with nearly 90% of the students agreeing that EMI is appropriate at the university level. Notably, EMI was perceived to do a better job at improving their overall English language proficiency than their learning of subject knowledge. A similar positive result was reported in Kong and Wei's [13] survey of 282 Chinese students. In this study, the item "EMI helps improve my English language proficiency" received an endorsement from 70.3% of the respondents, with a mean score of 3.81 on a 5-point scale. In a recent systematic review on attitudes toward EMI in East Asia and the Gulf, Graham and Eslami [15] found that of the 20 studies included, 4 studies reported positive attitudes and 7 slightly positive attitudes, which accounted for 55% of the studies being reviewed. Teachers' positive attitudes toward EMI have also been documented. In Galloway et al.'s [10] research, faculty members believed that EMI was beneficial for students' overall language knowledge and subject knowledge. Twenty-five university teachers from Austria, Italy, and Poland in Dearden and Macaro's [9] interview study expressed positive attitudes toward EMI and believed that EMI could improve students' English.
However, despite the optimistic picture about EMI that has surfaced in the literature, many challenges and concerns have been raised, particularly in English-as-a-foreign-language (EFL) contexts. Ample criticism has been directed at the gaps between national-level (macro) and/or university-level (meso) EMI policies and classroom-level EMI practices (micro) [7,16,17]. Rose and colleagues' [16,17] analyses of government and institutional documents, as well as interviews with stakeholders (i.e., teachers and students), have shown that due to a lack of detailed stipulations or guidelines, policy enactment at the classroom level varied, which caused differing degrees of English provision in actual practice. Consequently, this could confuse faculty members and confound any intended efforts to examine the effectiveness of EMI as a pedagogical strategy.
Another repeatedly reported factor that affects the effectiveness of EMI is the English proficiency of both teachers and students [18][19][20][21][22]. Although instructors of content courses may have met the qualifications for EMI given their overseas academic training or other experiences, many of them are not sufficiently competent to use the English language at will. In Doiz et al.'s [19] focus group interviews with 13 EMI lecturers in Spain, the lecturers all mentioned that teaching in a foreign language was one of the biggest challenges, which resulted in a reduction of vocabulary at their disposal and reduced details in their explanations regarding the content subject. Due to inadequate English proficiency, EMI instructors may have to stay close to the textbooks and be unable to teach in "a spontaneous, interactive, freewheeling manner" [23] (p. 35). This may then reduce the transmission of lesson content and trigger cognitively lower-order student responses [24] and limit the variety of pedagogical activities otherwise available [25,26]. To worsen this situation, many EMI contexts lack sufficient and effective in-service training for EMI instructors to improve their pedagogical and language skills in managing their teaching [11,27].
Similarly, students' low English proficiency is a widely recognized obstacle that hinders them from comprehending lesson content [11,28]. For instance, Kim [29] found that 60% of the students understood less than 70% of class content in beginner classes in a Korean EMI course. This can result in consequences such as EMI teachers falling back on using students' first language (L1) as a learning resource [28,29], repeatedly explaining specific or technical vocabulary, or simplifying disciplinary content [11,19]. Consequently, EMI's initially assumed benefits for improving both English and content learning may not materialize.
Based on the mixed perceptions of EMI presented above, a critical issue worth investigating is to what extent EMI effectively brings about both subject content and language learning achievements. Hitherto, many studies have focused on the effectiveness of EMI as perceived by students. There is a dearth of studies concerning the objective measurement of such effectiveness (see [21,22,30,31,32,33,34]), and inconclusive findings have been reported. For instance, while significant gains in English vocabulary, morphological awareness, and reading comprehension were identified in Li's [35] study of 53 students majoring in Early Childhood Education in one-semester EMI, when this group was compared to a "historical" group, the EMI students scored slightly lower than the "historical" group. In Lei and Hu's [21] study, no significant difference was found between 64 students in an EMI program and 72 students in a Chinese-medium instruction (CMI) program in their scores on the national standardized College English Test Band 6 (CET 6). Targeting students majoring in medical sciences, Yang et al. [30] reported no statistical difference between EMI students and CMI students in their scores on the Chinese Medical Practitioner Examination (CMPE) taken in both 2015 and 2016. Joe and Lee's [32] study among 64 Korean medical students, based on the comparison of the students' comprehension of a 50 min lecture in English and in Korean, respectively, showed that the medium of instruction had no effect on students' lecture comprehension, although more than half of the students perceived EMI as less satisfying than the Korean-medium instruction. These findings suggest that no consensus can be reached on the impact of EMI on students' subject content and English learning achievements.
Stronger evidence has been obtained from quasi-experimental research. For instance, Guo, Tong, Wang, Min, and Tang [36] conducted a quasi-experimental study among 18 EMI students and 25 CMI students in China and showed that while EMI students scored numerically higher on post-test of academic achievement and English language, the differences were not significant after adjusting for their performance in the pre-test. Similarly, in Han's [37] quasi-experimental study, 137 EMI students performed equally well on an end-of-year mathematics test compared to their 137 non-EMI counterparts. These results seemed to have prompted researchers to dismiss the common concern documented in the literature as to whether EMI carries a detrimental effect on students' content area learning [32,36].
While mixed findings on the impact or effectiveness of EMI on language and subject content learning have arisen from individual studies, systematic reviews or meta-analyses have been conducted to provide a fuller understanding of the issues in question [2,15,38,39]. Macaro et al. [2] systematically reviewed 83 studies on EMI in higher education in different geographical regions and concluded that "the research evidence to date is insufficient to assert that EMI benefits language learning nor that it is clearly detrimental to content learning" (p. 36). In addition to Graham and Eslami's [15] systematic review on attitudes toward EMI among students from East Asia and the Gulf, as mentioned before, Tong et al. [39] systematically reviewed research on bilingual education in Chinese higher education published in English and Chinese. Tong et al.'s [39] study has attempted to explore what aspects of bilingual education in the Chinese context were explored and in what ways, as well as the publication trend. Although these reviews have contributed essential insights into EMI in different regions, they hardly employed quantitative techniques to precisely measure the effectiveness of EMI on promoting content and/or language learning. To this purpose, meta-analysis as a more powerful approach is highly needed. Meta-analysis is "a quantitative method of synthesizing empirical research results in the form of effect sizes" [40] (p. 7). In other words, the effect size (ES) computed in a meta-analysis can provide substantial evidence for assessing the existence of effects. Lo and Lo [38] conducted a meta-analysis in which they synthesized the results of 24 studies on EMI education in secondary schools in Hong Kong between 1970 and 2010. To the best of our knowledge, Lo and Lo's [38] study represented the first attempt to conduct a meta-analysis on EMI. 
However, the effectiveness of EMI programs at the tertiary level remains mostly unknown, which requires more endeavors to perform such a meta-analysis to shed light on this field.

EMI Practices in Tertiary Education in Mainland China
EMI practices in China have gained momentum since the turn of this century when the Ministry of Education of China [41] issued a directive stipulating that universities should strive to offer 5-10% of undergraduate courses in English or other foreign languages within three years. This top-down policy has been driven by multiple benefits assumed to be brought about by EMI. It is believed that EMI enables students to keep abreast of cutting-edge scientific and technological achievements [7]. Additionally, the adoption of EMI is expected to raise the quality of higher education and thus increase the competitiveness of HEIs. At a fundamental level, EMI is generally believed to fulfil two goals at once, that is, enhancing both subject content and language learning [5].
Despite what the term EMI may suggest, actual practices of EMI in China are complicated. Hu [6] summarized four bilingual models: foreign language teaching in mainstream education, maintenance bilingual instruction, transitional bilingual instruction, and immersion bilingual instruction. The four models can be viewed as situated on a continuum between exclusive use of Chinese (i.e., foreign language teaching in mainstream education) and the predominant use of English (i.e., immersion bilingual instruction) in teaching curriculum content. In the case of maintenance bilingual instruction, Chinese is still used as the main medium of instruction, while English is used more frequently to provide explanations, descriptions, and illustrations; in contrast, the transitional bilingual instruction model is a reversal where English is more regularly used than Chinese [6]. A relatively simplified classification was proposed by Xu [42], who categorized three EMI models based on the amount of English and language used by instructors: Chinese as the main medium of instruction (CMMI), English and Chinese balanced instruction (ECBI), and English as the main medium (EMMI). In these three models, the proportion of English used is less than 30%, about 40-60%, and higher than 70%, respectively. Based on his review of the literature on EMI in China and his observations of EMI classes on the tertiary level, Xu [42] Sustainability 2021, 13, 5637 5 of 20 noted that the textbooks, handouts, or PowerPoint slides used in EMI courses are often in English, and different models of EMI mainly differ in the oral language used.
Conforming to the international research landscape in EMI presented above, similar positive attitudes, concerns, and challenges perceived by students and instructors have been reported in many empirical studies of EMI in this local context (e.g., [7,11,12,13,14,42]). However, only a handful of studies have concentrated on examining the actual effectiveness of EMI (e.g., [36,37]), leaving this issue unresolved. Hence, this meta-analysis aims to address this issue by synthesizing studies that have statistically assessed students' subject content and/or English learning achievements in EMI courses compared to their counterparts in corresponding CMI courses. Additionally, we considered that the difference in learning achievements between EMI and CMI students may be moderated by academic disciplines, language provision in class, research designs, and even the degree of detail disseminated in different types of work, such as unpublished theses and published articles in English (mostly in international journals) or Chinese (in local journals). Hence, we also examined the possible moderating effect of five variables: (a) discipline (medical disciplines, other hard disciplines, and soft disciplines); (b) research design (non-experimental, randomized controlled trial, and quasi-experimental); (c) instruction time in English (50% or above, below 50%, and unreported); (d) publication type (published articles and theses); and (e) journal type (in English, in Chinese). Further information about the categories of these variables is presented in the next section.

Research Questions and Context
This study was guided by the following research questions:

1. Are there any significant differences in subject content and English learning achievements between students taking EMI and CMI courses?
2. To what extent do the variables of discipline, research design, instruction time in English, publication type, and journal type moderate the differences in subject content and English learning achievements between students taking EMI and CMI courses?
We originally classified discipline into a broad contrast between "hard" disciplines (e.g., medical science, physics) and "soft" disciplines (e.g., Mass Media, Education) in Becher and Trowler's [43] parlance. However, this variable was later categorized into medical disciplines, other hard disciplines, and soft disciplines, because 32 out of 44 independent samples from the collected studies were in the field of medical science, 8 in other hard disciplines (i.e., Physics, Mathematics, Computer Science, Chemistry-related), and 4 in soft disciplines (i.e., Advertising, Mass Media, Early Childhood Education, Business). Separating medical disciplines from other hard disciplines allowed us to perform fine-grained analyses on other hard disciplines (e.g., physics, mathematics). Referencing Tong et al. [39], we categorized research design into the non-experimental design, randomized controlled trial (i.e., experimental design), and quasi-experimental design. Although Xu's [42] abovementioned framework for classifying English provision in teaching is useful, we found that many of the collected studies did not report the actual or approximate percentage of English used in teaching. Hence, instruction time in English was divided into three types: 50% or above of the instruction time, below 50% of the instruction time, and unreported. Publication type had two categories: published articles and unpublished theses. Journal type was classified into English and Chinese in terms of the language used.

Literature Search
The literature search was performed using the keywords "medium of instruction", "English medium instruction" or "bilingual program" and "China" or "Chinese", targeting studies published or completed between January 2001 and July 2020. The databases searched included Web of Science, ERIC, SCOPUS, and ProQuest Dissertations and Theses. We also searched the China National Knowledge Infrastructure Net (CNKI), which is the largest academic database integrating journal articles and unpublished master's and doctoral theses in China. The thematic words used were quanying jiaoxue (full English teaching), shuangyu jiaoyu (bilingual education), or shuangyu jiaoxue (bilingual instruction/teaching). Since an extremely large number of studies was returned, the majority of which were not empirical but simply involved theoretical discussion, we decided to add the controlled term xiaoguo (effect/effectiveness) in the Abstract field. Initial searches yielded a total of 4610 studies, of which 79 were journal articles in English, 3917 were journal articles in Chinese, and 614 were unpublished theses in Chinese. Moreover, we performed forward and backward searches by consulting relevant published review articles, including Macaro et al. [2] and Tong et al. [39]. An additional 35 studies were then obtained, of which 32 were journal articles in Chinese, 2 were book chapters in English, and 1 was a book in Chinese.
In further screening the literature [44], the following inclusion criteria were set up: (a) the study was conducted in tertiary education institutions in mainland China; (b) the study was empirical and involved two groups of students studying the same course in EMI and CMI, respectively; (c) the academic achievement was measured by final exams, tests, or other forms of standardized measures rather than gauged by students' self-reports. Meanwhile, the exclusion criteria employed in this analysis were as follows: (a) the study targeted students majoring in English; (b) the study was conducted in Hong Kong, Macao, Taiwan, or other regions outside mainland China; (c) the study examined bilingual programs involving Mandarin Chinese and an ethnic minority language; (d) the study compared the effectiveness of pedagogical methods such as flipped classrooms or problem-based learning on students' academic achievement within bilingual programs. Authors of studies with insufficient data were contacted, which enabled us to include Li's [35] study. Later analysis detected an outlier [45], which was removed (as described later). Altogether, 36 studies involving 44 independent samples fit our criteria, of which 32 samples were notably from medical disciplines. The total sample was 7582 students, of whom 3117 took EMI and 4465 CMI courses.

Coding Procedure
All studies were coded by the two authors independently. We first established a protocol that specified the features to be coded in light of the current research purposes and previous systematic reviews [2,39]. The features include the basic information (e.g., authors, year of publication, title of the work, title of journal or edited book) and the moderators to be examined (e.g., discipline, research design, instruction time in English, publication type, and journal type in terms of English or Chinese). To ensure the reliability of the coding, the first author conducted three rounds of coding, after which the second author independently coded all the studies. While high agreement was achieved (98.33%), the disagreements concerned the coding of the variable of instruction time in English. This variable was initially coded into four categories in terms of the percentage of instruction time in which English was used (50% or above, between 20% and 50%, below 20%, and unreported). Ostensibly, this seemed a low-inference variable, but it was found that many of the studies did not report the actual or estimated percentage of time when English was used as the MI. After negotiation, we decided to reduce the coding to three categories (50% or above, below 50%, and unreported), and intercoder reliability was 100%.
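The agreement figures above can be illustrated with a minimal sketch. It assumes that agreement was computed as simple percent agreement (the share of coding decisions on which the two coders matched); the source does not name the index used, and the category labels and codes below are hypothetical, not the study's data.

```python
def percent_agreement(codes_a, codes_b):
    """Return the percentage of coding decisions on which two coders agree.

    Assumption: simple percent agreement, not a chance-corrected index.
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same items")
    if not codes_a:
        raise ValueError("no coding decisions supplied")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical codes for "instruction time in English" across six studies.
coder_1 = ["50% or above", "below 50%", "unreported",
           "unreported", "50% or above", "below 50%"]
coder_2 = ["50% or above", "below 50%", "unreported",
           "below 50%", "50% or above", "below 50%"]

agreement = percent_agreement(coder_1, coder_2)  # 5 of 6 decisions match
```

Collapsing categories, as described above, tends to raise such a figure because fewer distinctions remain on which coders can differ.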

Computing Effect Size
To examine the effectiveness of EMI on students' academic achievements, the standardized mean difference (Cohen's d) in academic achievements between students receiving EMI and CMI was used as the ES. The magnitude of effect sizes was gauged as small, medium, and large, with the values of Cohen's d being 0.20, 0.50, and 0.80, respectively [46].
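As an illustration of the effect size described above, the following sketch computes Cohen's d from group means, standard deviations, and sample sizes, together with a commonly used large-sample approximation of its variance. The pooled-SD formulation, the variance approximation, the function names, and the example numbers are assumptions made for illustration; the source does not report which formulas or software were used. The benchmarks 0.20, 0.50, and 0.80 are read here as lower bounds, and the "negligible" label is added only for completeness.

```python
import math


def cohens_d(mean_emi, sd_emi, n_emi, mean_cmi, sd_cmi, n_cmi):
    """Standardized mean difference (EMI minus CMI) using the pooled SD."""
    pooled_var = ((n_emi - 1) * sd_emi ** 2 + (n_cmi - 1) * sd_cmi ** 2) / (
        n_emi + n_cmi - 2
    )
    return (mean_emi - mean_cmi) / math.sqrt(pooled_var)


def d_variance(d, n_emi, n_cmi):
    """Large-sample approximation of the sampling variance of d."""
    return (n_emi + n_cmi) / (n_emi * n_cmi) + d ** 2 / (2 * (n_emi + n_cmi))


def magnitude(d):
    """Label |d| with the benchmarks 0.20 / 0.50 / 0.80 as lower bounds."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # label not used in the source


# Hypothetical exam scores: EMI group 78 (SD 10, n 40), CMI group 73 (SD 10, n 40).
d_example = cohens_d(78, 10, 40, 73, 10, 40)
var_example = d_variance(d_example, 40, 40)
```

Under this reading, the summary values reported later (for example, d = 0.673 and d = 1.583) fall in the medium and large bands, respectively.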
Random-effects models were chosen for this meta-analysis. Random-effects models are based on the assumption that the true effect may vary across the studies included in the analysis, which contrasts with their counterparts, fixed-effects models, which assume homogeneity of the true effect in all studies [47]. Random-effects models yield more conservative estimates and are generally preferred over fixed-effects models [47]. Additionally, random-effects models are more appropriate for making generalized conclusions or unconditional inferences about the magnitude of the association between variables based on the meta-analytic findings [40].
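To make the contrast between the two models concrete, the sketch below pools effect sizes under a random-effects model. It assumes the DerSimonian-Laird estimator of the between-study variance, which is one common choice; the source does not state which estimator was used, and the function name and input values are hypothetical.

```python
import math


def random_effects(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model.

    Assumption: the DerSimonian-Laird tau^2 estimator; other estimators exist.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")

    # Fixed-effect (inverse-variance) weights and mean, needed to compute Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * di for wi, di in zip(w, effects)) / sum_w
    q = sum(wi * (di - fixed_mean) ** 2 for wi, di in zip(w, effects))
    df = k - 1

    # Between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to every study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * di for wi, di in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {"mean": mean, "se": se, "z": mean / se, "q": q, "df": df, "tau2": tau2}


# Hypothetical effect sizes from three samples with equal sampling variance.
pooled = random_effects([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights, so the two models give the same summary estimate; otherwise the random-effects standard error is larger, which is the sense in which its estimates are more conservative.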
Since the sample size was small, outliers were examined because their presence would have a significant impact on the results. Following Li's [48] practice, outliers were detected by transforming the effect sizes from independent samples into z scores. Any resultant absolute z score larger than 2.0 indicated the presence of an outlier. This procedure was performed in the analyses of both subject content and English learning achievements. Consequently, Zhang's [45] study was identified as an outlier in both analyses and removed from further analysis. Table 1 shows the coded characteristics of the studies included in this meta-analysis.
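The outlier screen described above can be sketched as follows. The sketch standardizes the effect sizes with the sample standard deviation, which is an assumption, since the source does not specify whether the sample or population formula was applied; the function name and the effect sizes are hypothetical.

```python
import statistics


def flag_outliers(effect_sizes, threshold=2.0):
    """Return indices of effect sizes whose |z| exceeds the threshold.

    Assumption: z scores use the sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(effect_sizes)
    sd = statistics.stdev(effect_sizes)
    if sd == 0:
        return []  # identical effect sizes: nothing can be flagged
    return [
        index
        for index, d in enumerate(effect_sizes)
        if abs((d - mean) / sd) > threshold
    ]


# Hypothetical effect sizes; the last value is deliberately extreme.
example_effects = [0.10, 0.30, 0.20, 0.40, 0.25, 0.35, 0.15, 0.30, 4.00]
flagged = flag_outliers(example_effects)
```

Because an extreme value inflates both the mean and the standard deviation, a z-score screen of this kind is most informative when, as here, only one or very few values are extreme.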

Assessing Publication Bias
Publication bias refers to the possibility that studies reporting unexpected or nonsignificant results are less likely to be published, resulting in the phenomenon that the published literature may not represent the whole picture on a research topic [40]. To address this issue, we included unpublished works such as dissertations or theses in the literature search [47]. In addition, we also assessed the potential impact of publication bias on the current meta-analysis in two ways. The first was to produce two funnel plots (one for subject content achievement and the other for English achievement), shown in Figures 1 and 2. The funnel plots show the distribution of effect sizes in relation to their standard error. As shown in Figures 1 and 2, the studies included were generally distributed symmetrically, which does not suggest publication bias [40]. Second, nonsignificant results were obtained from the Begg and Mazumdar rank correlation test

Differences in Subject Content and English Learning Achievements
The first research question concerned the differences in subject content and English achievements between students taking EMI and CMI courses. The results of the meta-analysis are shown in Table 2. Because Peng's [53] study did not examine the participants' subject content learning achievement, the comparison of content learning was conducted with 43 independent samples. In these samples, EMI students performed better in subject content learning, with a medium and significant ES, d = 0.673, SE = 0.134; Z = 5.032, p < 0.001 (see Appendix A for the forest plot showing effect sizes for content subject achievements). The homogeneity test showed that these effect sizes were heterogeneous, Q(42) = 47.817, p < 0.001, T = 0.852, I² = 12.166%. This result justified the use of random-effects models and the need to explore the possible presence of moderators.
It is also seen in Table 2 that in the 12 independent samples whose English learning achievement was tested, EMI students had better achievement, with a large and significant ES, d = 1.583, SE = 0.344; Z = 4.598, p < 0.001 (see Appendix B for the forest plot showing effect sizes for English achievements). Heterogeneity of effect sizes was also detected, Q(11) = 41.911, p < 0.001, T = 1.345, I² = 73.754%.
At this juncture, it should be noted that the significant and medium summary effect size for subject content learning was unexpected and somewhat inconsistent with the general view on the effectiveness of EMI found in the literature. This unexpected result may be attributed to the fact that out of the 43 independent samples, 32 (74.42%) were from medical disciplines. The overrepresentation of medical disciplines may obscure the findings from other disciplines. Hence, we performed another round of analyses on studies in disciplines other than medical disciplines. The results showed that these EMI students had lower performance than CMI students in subject content learning, but the difference was not significant, d = −0.064, 95% CI (−0.175, 0.046), Z = −1.143, p > 0.05. The EMI students had higher English achievement than their CMI counterparts, with a significant and medium ES, d = 0.572, 95% CI (0.429, 0.715), Z = 7.832, p < 0.001. The homogeneity test showed non-significant results for both achievement in academic subjects, Q(11) = 13.322, p > 0.05, T = 0.091, I² = 24.934%, and English learning, Q(5) = 3.596, p > 0.05, T = 0.000, I² = 0.000%. These results implied that studies from disciplines other than medical science did not seem to vary in their findings on the effectiveness of EMI. In other words, the studies in non-medical disciplines collected in this analysis may form a homogeneous group in terms of academic and English learning achievements.
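The relation between the Q statistic and the I² index reported above can be checked with a short sketch. It uses the standard definition I² = max(0, (Q − df) / Q) × 100; the function name is hypothetical, and the inputs are the Q values and degrees of freedom reported for the overall analyses and for English learning in non-medical disciplines.

```python
def i_squared(q, df):
    """I^2 in percent: the share of observed variation beyond sampling error."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Q statistics and degrees of freedom as reported in the text.
content_overall = i_squared(47.817, 42)      # subject content, 43 samples
english_overall = i_squared(41.911, 11)      # English learning, 12 samples
english_non_medical = i_squared(3.596, 5)    # Q below df, so I^2 is truncated at 0
```

With Q rounded to three decimals, these inputs give values that agree with the reported 12.166%, 73.754%, and 0.000% to within rounding.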

Moderator Variables
To address the second research question, we assessed the effect sizes, 95% CI and between-group heterogeneity for the five moderators (discipline, research design, instruction time in English, publication type, and journal type) using the random-effects model [47]. The results for subject content learning achievement are shown in Table 3. As seen in Table 3, only one of the five variables (i.e., discipline) was a significant moderator, Q(2) = 14.190, p < 0.001. Specifically, the mean ES of studies in medical disciplines (d = 0.933) was significantly larger than those from other disciplines. Follow-up pairwise comparisons indicated that the mean ES in medical disciplines was considerably larger than that in other hard disciplines, Q(1) = 10.286, p < 0.001, and soft disciplines, Q(1) = 4.078, p < 0.05. No significant difference was found between studies in other hard disciplines and soft disciplines, Q(1) = 0.126, p > 0.05. The other four categorical variables did not significantly moderate the difference in subject content learning achievement between EMI and CMI students. Table 4 shows the results of moderator analysis for English achievement. Three moderators were found to be significant: discipline, Q(2) = 9.087, p < 0.05; design, Q(2) = 17.388, p < 0.001; and instruction time in English, Q(1) = 6.163, p < 0.05. Additionally, follow-up pairwise comparisons on discipline showed that the mean ES in medical disciplines was significantly larger than that in other hard disciplines, Q(1) = 6.307, p < 0.05, but not significantly larger than that in soft disciplines, Q(1) = 2.687, p > 0.05. No significant difference in mean ES was found between studies in other hard disciplines and soft disciplines, Q(1) = 0.847, p > 0.05. 
Post hoc pairwise comparisons on research design indicated that studies employing a randomized controlled trial yielded a significantly larger ES than non-experimental studies, Q(1) = 6.270, p < 0.05, and quasi-experimental studies, Q(1) = 9.783, p < 0.01. No significant difference was found between non-experimental studies and quasi-experimental studies, Q(1) = 3.763, p > 0.05. While instruction time in English was a significant moderator, half of the 12 studies did not report the actual or estimated percentage of instruction time during which English was used. The percentage of instruction time in English was 50% or above in all the remaining six studies. Therefore, no further comparisons on this factor could be made among the six studies.
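The significance levels attached to the between-group Q statistics for research design can be reproduced with a brief sketch, assuming that each Q is referred to a chi-square distribution with the stated degrees of freedom, as is standard for moderator analyses of this kind. The helper below is hypothetical and covers only df = 1 and df = 2, for which closed-form upper-tail probabilities exist.

```python
import math


def chi2_upper_tail(q, df):
    """Upper-tail probability of a chi-square variate, for df = 1 or 2 only."""
    if q < 0:
        raise ValueError("Q must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(q / 2.0))
    if df == 2:
        return math.exp(-q / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")


# Q values for research design as reported in the text.
p_design = chi2_upper_tail(17.388, 2)          # omnibus test across three designs
p_rct_vs_nonexp = chi2_upper_tail(6.270, 1)    # randomized vs. non-experimental
p_rct_vs_quasi = chi2_upper_tail(9.783, 1)     # randomized vs. quasi-experimental
p_nonexp_vs_quasi = chi2_upper_tail(3.763, 1)  # non-experimental vs. quasi-experimental
```

The last comparison is worth noting: Q(1) = 3.763 lies just below the 5% critical value of 3.841, so the non-significant result is a narrow one.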

Academic Achievements
This study aimed to systematically analyze the effectiveness of EMI in the mainland Chinese context through synthesizing empirical evidence from 36 studies with 44 independent samples. It was found that EMI students performed significantly better in subject content learning than their CMI counterparts, which suggested that EMI was beneficial for subject content learning. This result was somewhat unexpected, given the many concerns raised about students' inability to comprehend EMI lectures on curriculum content [29]. However, this result echoed Joe and Lee's [32] finding that students' content knowledge increased over the EMI course despite their dissatisfaction with the course, suggesting a mismatch between students' perceived and actual performance of lecture comprehension. Further analysis showed that the significant effect of EMI on subject content learning was only identified among students in medical disciplines. Likewise, Tong et al.'s [39] systematic review also found that in studies showing a significant difference in learning outcomes favoring bilingual programs, the majority were from medical science. This may be partly explained by medical education in the current context. Many medical colleges in China offer seven- or eight-year programs that combine undergraduate and postgraduate degrees [80], which require students to invest much more effort in their disciplinary studies. Additionally, it has been found that medical students' English matriculation scores significantly predicted their academic performance in college, which indicated the important role of English proficiency for medical students in this context [81]. The special prominence given to EMI in this field may be because this field has become a global enterprise entailing growing collaboration between medical schools in different countries, and thus, EMI is an essential means to the globalization of medical education [30,32]. That said, caution is warranted when interpreting the current findings, since the high number of medical-related studies included in this meta-analysis means that the results may not reflect the genuine effects of EMI on content learning in general domains.
The finding that the significant difference in subject content learning between EMI and CMI was identified only in medical disciplines, a somewhat puzzling result, may be explained by variations in the amount of English used across different disciplines. As can be found in Table 1, in the research articles from medical disciplines, the instruction time in English for 17 out of the 32 independent samples (17/32 = 53.13%) was not reported. This means that the larger gains in content knowledge among EMI medical students may rest largely on the 53.13% of samples for which English provision in class was unknown. It is likely that English was not the major MI in these studies, and thus, far fewer linguistic barriers were imposed on these students. In contrast, of the 11 independent samples from other disciplines, 8 samples (8/11 = 72.73%) received 50% English provision or above in class (see Table 1). The higher percentage of English provision in these studies in non-medical disciplines may demand higher levels of English proficiency from students and thus result in a non-significant difference in content learning between EMI and CMI. Therefore, while the current finding of the beneficial effect of EMI on subject content learning may be consistent with students' and faculty members' favorable perceptions of EMI learning [10,13,15], it can only be interpreted within the medical disciplines in the current context and cannot be generalized to other disciplines. A case in comparison is Lo and Lo's [38] meta-analysis, which showed that EMI secondary students in Hong Kong performed significantly worse than the CMI group in science, history, and geography, while the difference in mathematics learning between the two groups was not significant. Hence, the non-significant difference in subject content learning favoring CMI students found in non-medical disciplines may still call into question the taken-for-granted positive effect of EMI on content knowledge learning.
Nevertheless, the combination of the current findings and the literature [32,36] appears to suggest that EMI does not necessarily hamper students' content knowledge learning.
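The sample proportions cited from Table 1 in the preceding discussion can be reproduced with a few lines of arithmetic. This is a minimal sketch; the helper name `percent` is illustrative, and half-up rounding to two decimals is an assumption about how the percentages were rounded. Under that assumption, 17/32 comes to 53.13% and 8/11 comes to 72.73% (72.72% if the value is truncated rather than rounded).

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(part: int, whole: int) -> Decimal:
    """Return part/whole as a percentage, rounded half-up to two decimals."""
    value = Decimal(part) / Decimal(whole) * Decimal(100)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Medical samples that did not report instruction time in English.
medical_unreported = percent(17, 32)

# Non-medical samples with 50% or more English provision in class.
non_medical_high_english = percent(8, 11)

print(f"Medical, English time unreported: {medical_unreported}%")
print(f"Non-medical, >= 50% English: {non_medical_high_english}%")
```

`Decimal` is used instead of the built-in `round` because binary floating-point rounding applies round-half-to-even, which would turn 53.125 into 53.12.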
Compared to the mixed findings on the effectiveness of EMI on content learning, this meta-analysis revealed that EMI students obtained significantly higher achievement in English than their CMI peers. These findings were expected and consistent with the widely upheld beliefs that EMI can enhance English learning [9,10,13] and the results of Lo and Lo's [38] meta-analysis targeting secondary students in Hong Kong. It is reasonable that frequent exposure to English language input can facilitate students' English learning. Based on these findings, it may be tentatively concluded that EMI programs are beneficial for enhancing students' English learning in the current EFL context.

Moderators
This study also examined five factors that possibly moderated the differences in academic achievements between EMI and CMI students. It was found that discipline was the only significant moderator for subject content learning achievement. The studies in medical disciplines showed that EMI students performed significantly better than CMI students. In contrast, no significant difference was found between the two respective groups of students from other hard and soft disciplines. While the saliency of EMI in medical education has been addressed above, this result may also be accounted for from the perspective of future career development. Students in medical disciplines tend to enter the medical profession and thus may expend much effort in acquiring their curriculum knowledge. In contrast, the four studies in other hard disciplines included in this meta-analysis (see Table 4) tested content learning achievement in EMI courses such as Physics [51], Data Structure and Algorithm or Photonics and Lasers [54], Mathematics [37], and Chemistry [61], most of which are foundation rather than professional courses in other hard disciplines. These students may, therefore, not expend as much effort as medical students. Regarding soft disciplines, it has been strongly argued that knowledge in this field is structured "through new interpretations, or negotiations of phenomena, between contemporaries", and academic literacy in these disciplines benefits more from using the local language rather than a foreign language as the medium of instruction [27] (p. 114). From this perspective, the significantly smaller ES found in soft disciplines than in medical disciplines seems plausible.
Unlike the case of content learning achievement, the difference in English learning achievement between EMI and CMI students was significantly moderated by discipline, research design, and instruction time in English. As indicated by the current results, EMI students in medical disciplines and soft disciplines appeared to benefit more in terms of English learning than students in other hard disciplines. As previously addressed, due to the pivotal consequences of their professional courses, medical EMI students may spend more time and effort on tackling English language problems to ensure comprehension, whereas EMI students in other disciplines may not expend as much effort on solving language difficulties in learning foundation courses that may not play decisive roles in their future professions. Therefore, discipline was a significant moderator for English learning achievement.
Research design and instruction time in English were also significant moderators. The most robust type, the randomized controlled trial, yielded the largest ES in the difference between EMI and CMI students. This result suggested the importance of adopting sound methodology when examining the effectiveness of EMI, which was also stressed by Tong et al. [39]. However, although instruction time in English was a significant factor, further inference about this factor's moderation was not possible because half of the included studies did not report the instruction time using English (see Table 4).
It should be noted that many of the studies included in this meta-analysis displayed several methodological issues, such as a lack of transparency regarding the procedures of implementing EMI and the percentage of instruction time in English; a lack of information about the subjects (e.g., age, gender) and test instruments; and insufficient evidence regarding the reliability and validity of research instruments. Many of the issues identified in this meta-analysis mirrored the problems identified in Tong et al.'s [39] systematic review of EMI studies, such as "nonrandom assignment, group incomparability" or "a lack of information on its implementation" (p. 16). These issues jointly compromised the validity of the findings of those studies and largely prevented an unequivocal conclusion on the effectiveness of EMI. It would be fair to say that, given the evidence synthesized in this analysis, more empirical evidence gathered with methodologically sound approaches is needed before firm conclusions can be drawn.
With the abovementioned methodological pitfalls in mind, several implications may be proposed. First, overall, it seemed that EMI did not undermine subject content learning based on the current findings, which should send a reassuring message to EMI teachers and program designers. However, careful decisions about, for example, the allocation of English instruction and English materials are still in order, particularly in non-medical disciplines where small mean differences between EMI and CMI students were found. In cases where both teachers' and students' English proficiency seems insufficient, the practice of translanguaging may be an alternative to ensure that the transmission of subject content is not compromised [28]. The use of students' L1 in class, as expected by EMI students [32], can be a resource for assisting students' English language learning. EMI teachers may also expose students to authentic English more frequently, for instance, through using videotaped lectures by proficient English speakers that are widely available today. Further, to secure the effectiveness of EMI programs, English language enhancement courses or workshops for in-service teachers are needed to prepare for potential EMI implementation [11,27]. Rubio-Alcalá and Mallorquín [82] presented a detailed framework for designing training activities for teachers in plurilingual programs. They emphasized training in three dimensions of teacher competencies (language, pedagogical, and emotional) and recommended activities taking such forms as workshops, team teaching, and reflective teaching. In addition, supplementing EMI courses with English-language support classes open to EMI students may also be an alternative worth exploring, since teachers' expertise from the two fields could be complementary in a way that brings out optimal learning opportunities for the students [10,16].
Meanwhile, although students' English score in the college entrance examination is often taken as the benchmark for registering for EMI courses, scholars have challenged this practice and argued that continued language support should still be offered for students to maximize their gains in EMI courses [18,20–22]. This is supported by compelling evidence from Sánchez-Pérez's recent study [83], which found that the frequencies of students' use of certain disciplinary-literacy variables such as moves, cohesive devices, technical words, and passive voice were significant indicators of their content proficiency in laboratory reports written in English. Therefore, as stressed by Sánchez-Pérez [83], providing subject-specific language and literacy support should be highlighted in EMI programs. Overall, as Aizawa and Rose [16] pointed out, a better understanding of how teachers comprehend and act on macro- and meso-level policies and what contextual obstacles face stakeholders is required to assist the implementation of EMI in various disciplines.
To measure the effectiveness of EMI and pinpoint when and how EMI facilitates students' subject content learning and English learning, more robust research designs and longitudinal studies are needed. EMI research so far appears to be dominated by studies relying on students' self-reports, which cannot establish actual effects in practice. As Macaro et al. [2] rightly stated, the absence of research using objective tests indicates that "any cost-benefit evaluation of EMI is inconclusive at best and impossible at worst" (p. 64). Further, since it takes time for changes induced by EMI to occur, longitudinal studies are invaluable for discerning the impact of EMI on students' academic achievements. In particular, to reveal the associations between teachers' EMI practice and students' academic gains, contextualized data about EMI practice enacted in classrooms are much needed and can be obtained through methods such as classroom observations. Future research is also expected to examine individual and contextual factors that may influence students' academic success in EMI programs, such as students' learning motivation, affect, and engagement in in-class and out-of-class learning activities. In other words, the effectiveness of EMI may also rest on factors beyond the classroom setting.

Conclusions
This meta-analysis examined 36 studies involving 44 independent samples that employed statistical methods comparing differences in subject content and English learning achievements between EMI and CMI students in the mainland Chinese context. It was found that EMI students generally showed significantly higher academic performance than their CMI peers. However, when the samples from medical disciplines were set aside, there was no significant difference in subject content learning between the two groups, although the EMI students still had significantly higher achievement in English learning, which is probably because of the higher initial English proficiency needed for EMI students to meet registration requirements. Given the methodological insufficiency identified in the collected studies, caution is advised with regard to interpreting the validity of these findings, and more empirical evidence is needed.
EMI has become a trend and has been implemented as a pedagogical strategy in many regions and countries worldwide in the quest for the internationalization and globalization of higher education. In EFL contexts, top-down policies have been issued to promote EMI, especially in HEIs. Although a large body of research has explored the perceptions of stakeholders such as students and teachers regarding EMI and its perceived effectiveness, studies employing objective measures are still scarce. There are even fewer meta-analyses that assess the effectiveness of EMI drawing on effect sizes synthesized from empirical evidence.
This meta-analysis, albeit collecting studies conducted in the Chinese context, is directly pertinent to other contexts where EMI has continued to be valued for promoting both content and English language learning. The obstacles and challenges faced by EMI students reported in other contexts [16,19,27,32] are likewise found to be present in the Chinese context [7,11,28]. Therefore, the collective empirical evidence stemming from this meta-analysis can offer a sound frame of reference for policymakers and educators to weigh various factors at the macro, meso, and micro levels [16,17] when planning to adopt EMI as a pedagogical strategy. Findings of this study have also accentuated the importance of assessing the effectiveness of EMI using objective and rigorous measurements.

Acknowledgments:
The authors would like to sincerely thank Miao Li for her generous sharing of data values which were necessary and significant to this study.

Conflicts of Interest:
The authors declare no conflict of interest.

Figure A2. Forest plot showing effect sizes for English achievement.