A Meta-Analysis of the Cognitive, Affective, and Interpersonal Outcomes of Flipped Classrooms in Higher Education

This paper aims to quantify the effects of flipped classrooms in higher education by reviewing 43 empirical studies of students' cognitive, affective, and interpersonal outcomes. The innovative pedagogy of a flipped classroom in higher education fosters a sustainable, interactive, and student-centered learning environment (as opposed to the traditional lecture style, in which there is little room for interaction). This study's results show the positive effects of flipped classrooms on students' educational outcomes in studies published between 2012 and 2017. Under a random effects model, the overall effect size across the three outcome domains was medium (effect size (ES) = 0.35, 95% confidence interval (CI) = 0.24 to 0.47). By outcome domain, the effect sizes were, in descending order, affective (ES = 0.59), interpersonal (ES = 0.53), and cognitive (ES = 0.24). However, the results indicated that flipped classrooms benefitted students studying chemistry, engineering, mathematics, and physics less than they did students studying other subjects.


Introduction
The flipped classroom is an innovative instructional model that is gaining popularity in higher education because it provides active and student-centered learning and enhances students' educational outcomes [1]. Rahman, Mohamed, Aris, and Zaid [2] state that flipped classrooms were initially introduced in college-level technology classes. In the flipped classroom, students study instructional materials before class, typically online lectures, and apply what they learned in in-class activities [3]. Unlike teacher-centered teaching (e.g., the traditional college lecture style), flipped classrooms provide students with engaging, interactive learning experiences in which they can develop complex reasoning, written communication, and critical thinking skills [4].
The needs of students and society often evolve faster than traditional teaching methods. Thus, there is an urgent need to reconstruct college education [5]. An increasing number of stakeholders, including students and instructors, see the traditional, teacher-centered lecture style as obsolete. Consequently, universities are responding by developing, systematizing, and delivering courses and programs in new and innovative ways, which they hope will engage students and meet their educational needs and demands. However, transitioning from traditional lecture-based learning to a new classroom model requires a paradigm shift from teacher-centered to student-centered learning [6]. Although some scholars debate whether the dichotomy of lectures versus active learning is meaningful in today's higher education classrooms [7,8], this paper assumes that flipped classrooms represent a different instructional model that can complement, rather than replace, traditional approaches to education.
Flipped classrooms have been shown to improve student motivation [24], student satisfaction [21,25], and confidence [21]. However, some studies have shown that flipped classrooms had a negative impact on students' satisfaction and attitudes [16,26].
Interpersonal outcomes refer to learning that aims to improve student action and performance, including interaction and engagement (e.g., active learning). Flipped classrooms have been found to improve student-teacher interaction, student engagement, student-to-student interaction, individual education, active learning, and debate competence [6,21,27].

Negative Outcomes of Flipped Classrooms
Not all studies on flipped classrooms report positive results. Some report mixed or negative results. Ryan and Reid [28] demonstrated that low-achieving students in flipped classrooms performed better on exams. However, Jensen, Kummer, and Godoy [16] indicated that flipped classrooms did not improve student performance outcomes regardless of whether students were high achievers or low achievers. Missildine, Fountain, Summers, et al. [26] showed that introducing flipped classrooms improved learning gains but did not improve students' satisfaction. Lucke [29] indicated that students enjoyed their flipped classes but showed no improvement in cognition and understanding. Vliet, Winnips, and Brouwer [18] pointed out that positive learning gains from flipped classroom environments were only temporary.
Few meta-analyses exist on the effects of flipped classrooms. Further, there is little empirical evidence regarding flipped classrooms' utility in improving student performance in higher education [30]. This study is the first to examine the effects of flipped classrooms in higher education using a meta-analysis.

Research Problem
This study conducts a meta-analysis to explore the effects of flipped classrooms on cognitive, affective, and interpersonal educational outcomes. The meta-analysis synthesizes the effects of flipped classrooms in higher education and attempts to answer the following research questions: (a) What is the overall effect of the flipped classroom approach in the context of higher education? (b) Which outcome variables have the most influence on the measured flipped classroom effect size? (c) Are any effects of the flipped classroom approach moderated by study characteristics (e.g., department, subject area, and publication year)?

Method
Meta-analysis involves formulating a problem, collecting data, coding data, analysis, and interpretation [31]. This study's meta-analysis followed the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analysis) guidelines [32].

Literature Search
This paper examines journal articles and dissertations about flipped classrooms in the context of higher education that were published between 2012 and 2017. The authors searched five electronic databases for empirical articles: the Education Resources Information Center (ERIC), ProQuest, Web of Science, PsycINFO, and Google Scholar. To capture a range of potentially eligible studies, we employed the following search keywords in titles and abstracts: "flipped classroom," "flipped class," "flipped learning," "inverted class," "inverted classroom," "smart learning," and "blended learning." The authors found forty-three studies that met the study's inclusion and exclusion criteria (Figure 1).

Inclusion and Exclusion Criteria
Studies with the following features met this study's inclusion and exclusion criteria: they must be quantitative studies on student learning or reasoning processes in flipped classrooms; they must provide sufficient information to calculate effect sizes; they must define the flipped classroom approach as including the use of video or audio materials before class and featuring in-class activities; they must compare flipped classrooms' effects with those of traditional classrooms; they must feature students in higher education settings; they must have been published between January 2012 and June 2017; and they must be an empirical, peer-reviewed journal article or dissertation.

Coding Studies
The data were extracted from studies that met the inclusion criteria (Table 1). The studies' characteristics were coded as possible moderating variables to investigate the variance of flipped classrooms' effects. Two researchers independently coded each study. We developed a coding manual to maintain the reliability of the coding procedures, which covered study characteristics, effect size calculation, and report characteristics. All discrepancies between the two coders were resolved prior to data analysis; when the two coders could not reach agreement, an independent third expert resolved the discrepancy.

Table 1 (column headings only): First Author, Year, Publication, Effect Size.

Computation of Effect Sizes
Effect sizes in this meta-analysis were computed from three different data formats: treatment vs. control group design, pre-post design, and standardized mean change difference (pre-post measures with both treatment and control groups). In each case, the pooled estimate of the standard deviation was used to account for different sample sizes between the flipped and non-flipped classroom groups. All effect sizes were calculated using the Comprehensive Meta-Analysis (CMA) program to estimate a mean effect size [67]. Effect sizes were reported as positive when flipped classroom students performed better than students in the control groups. The effect size was evaluated as follows: 0.20 = small effect, 0.50 = medium effect, and 0.80 = large effect [68].
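To make the treatment vs. control case concrete, the following minimal sketch computes a standardized mean difference with a pooled standard deviation and a common large-sample approximation of its sampling variance. It is an illustration under stated assumptions, not the CMA program's implementation; the function names and the sample values are hypothetical.

```python
import math

def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Treatment-minus-control mean difference divided by the pooled SD."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

def smd_variance(d, n_t, n_c):
    """Large-sample approximation of the sampling variance of d."""
    return (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))

# Hypothetical exam scores: flipped class (treatment) vs. lecture class (control).
d = standardized_mean_difference(78.0, 10.0, 30, 72.0, 10.0, 30)
v = smd_variance(d, 30, 30)
print(round(d, 3), round(v, 4))
```

A positive value means the flipped group scored higher, which matches the sign convention described above.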

Combining Effect Sizes
We employed a two-step process to synthesize the effects of flipped classroom outcomes. First, we calculated the effect size and variance of each outcome in the primary study. Second, we calculated the weighted mean effect size (ES) using inverse variance weights. To select the analysis model, we conducted a homogeneity test using two measures of variability: Q and $I^2$. The Q test examined whether the variability around the weighted average ES exceeds what sampling error alone would produce [69]. $I^2$ is an alternative measure of heterogeneity, which is less sensitive to sample size than Q. $I^2$ indicates the proportion of the observed variance that reflects differences in true effect sizes [67]. To evaluate $I^2$ statistics, this study followed Higgins and Green's [70] guidelines: 0% to 40% might not be important; 30% to 60% may represent moderate heterogeneity; 50% to 90% may represent substantial heterogeneity; and 75% to 100% may represent considerable heterogeneity. The null hypothesis of the homogeneity test was that all outcomes came from the same population. If the effect sizes were homogeneous, this study would use a fixed effects model, which assumes a common effect size and considers only sampling variance. If they were heterogeneous, this study would use a random effects model, which assumes no common effect size and considers both sampling variance and true differences between studies [71]. Based on the homogeneity test and investigation of flipped classroom primary studies, this study used random effects models to synthesize the main effects and sub-group analyses.
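The two-step synthesis can be sketched as follows. The source does not say which between-study variance estimator was used, so this sketch assumes the DerSimonian-Laird estimator, a common default; the function name `pool_effects` and the input values are hypothetical.

```python
import math

def pool_effects(effects, variances, random_effects=True):
    """Inverse-variance weighted mean effect size with a 95% CI.

    With random_effects=True, the between-study variance (tau^2) is
    estimated with the DerSimonian-Laird method (an assumption here).
    """
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * g for wi, g in zip(w, effects)) / sum_w
    tau2 = 0.0
    if random_effects:
        q = sum(wi * (g - fixed_mean) ** 2 for wi, g in zip(w, effects))
        df = len(effects) - 1
        c = sum_w - sum(wi ** 2 for wi in w) / sum_w
        tau2 = max(0.0, (q - df) / c)
    # Random effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, se, (mean - 1.96 * se, mean + 1.96 * se), tau2

# Hypothetical effect sizes and variances, not data from the reviewed studies.
effects = [0.1, 0.3, 0.5]
variances = [0.01, 0.01, 0.01]
fixed = pool_effects(effects, variances, random_effects=False)
random_ = pool_effects(effects, variances, random_effects=True)
print(fixed[0], fixed[1], random_[0], random_[1], random_[3])
```

In this toy example both models give the same mean, but the random effects standard error is larger because it also reflects true differences between studies.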

Publication Bias
Publication bias happens when the results of published studies are different from the results of unpublished studies because studies with positive results, large effects, and large sample sizes are overrepresented in the literature [67,72]. To examine publication bias, this study adopts a funnel plot, exploring symmetrical distributions around the weighted mean effect sizes [73]. Funnel plots are scatter plots of effect sizes from studies in the meta-analysis, where the horizontal axis represents effect sizes and the vertical axis represents standard errors [72]. An asymmetrical pattern in the results of the funnel plot indicates a possible publication bias.
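As a rough illustration of how a funnel plot is laid out, the sketch below prepares the plotted points (effect size on the horizontal axis, standard error on the vertical axis) and the pseudo 95% limits around a pooled effect. The left/right count is only a crude stand-in for visual inspection of symmetry, not a formal test, and all names and values are hypothetical.

```python
def funnel_limits(pooled_es, se, z=1.96):
    """Pseudo 95% limits of the funnel at a given standard error."""
    return pooled_es - z * se, pooled_es + z * se

def side_counts(effects, pooled_es):
    """Count studies to the left and right of the pooled effect."""
    left = sum(1 for g in effects if g < pooled_es)
    right = sum(1 for g in effects if g > pooled_es)
    return left, right

# Hypothetical (effect size, standard error) pairs, not data from the review.
studies = [(0.10, 0.20), (0.30, 0.10), (0.35, 0.05), (0.40, 0.10), (0.60, 0.20)]
pooled = 0.35

low, high = funnel_limits(pooled, 0.10)
left, right = side_counts([g for g, _ in studies], pooled)
print(low, high, left, right)
```

Roughly equal counts on both sides, with wider scatter at larger standard errors, are what a symmetrical funnel looks like in numeric form.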

Analyzing Variances in Effect Sizes Across Studies
Finally, this study examined the variances in the effect sizes using sub-group analysis and meta-regression [74]. Meta-analysts should test whether the effect sizes are homogeneous before calculating the overall effect size in a meta-analysis. This study used homogeneity test results to select an analysis model and to decide whether to perform a sub-group analysis. The Q statistic was used to assess the heterogeneity of the effect sizes. A significant Q statistic (p < 0.05) suggests that the studies in the meta-analysis have heterogeneous effects. A random effects model was adopted to calculate the overall effect size in this study. The homogeneity statistic is calculated as follows:
$$Q = \sum_{i=1}^{k} w_i \left(g_i - \bar{g}\right)^2, \qquad \bar{g} = \frac{\sum_{i=1}^{k} w_i g_i}{\sum_{i=1}^{k} w_i},$$
where $g_i$ is the $i$-th effect size, $k$ is the number of effect sizes, and $w_i = 1/v(g_i)$ is the inverse variance weight. The Q statistic is also used to determine whether the primary results are homogeneous for subgroup analysis. Effect sizes were interpreted as small (0.2), medium (0.5), or large (0.8) according to Cohen's rule of thumb [68].
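A minimal sketch of the heterogeneity statistics follows, using inverse variance weights and the usual definition of $I^2$ as the share of Q in excess of its degrees of freedom. The function name and the data are hypothetical. A p-value for Q would come from a chi-square distribution with k − 1 degrees of freedom (for example via `scipy.stats.chi2.sf`), which is omitted here to keep the sketch dependency-free.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I^2 in percent."""
    w = [1.0 / v for v in variances]
    mean = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - mean) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

# Hypothetical effect sizes and variances, not data from the reviewed studies.
q, df, i2 = heterogeneity([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(q, df, i2)
```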

Dependence
This meta-analysis included a total of 43 studies and 218 effect sizes. When a primary study contributes more than one effect size, reviewers must address the assumption of independence because effect sizes from the same study are dependent. Selecting only one effect size per study maintains independence but causes information loss, whereas keeping multiple effect sizes per study violates the independence assumption. To balance these concerns, this study adopted the "shifting unit of analysis" method [75], which offers a compromise between information loss and violation of the independence assumption. To calculate the overall effect size, the study was used as the unit of analysis to satisfy the independence assumption. To perform sub-group analysis, the effect size of each sub-group was used as the unit of analysis.
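The shifting unit of analysis can be illustrated with a short sketch: effect sizes are collapsed to one value per study for the overall estimate, and to one value per study and sub-group for moderator analyses. Simple averaging is assumed here as the collapsing rule, and the records and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def aggregate(records, keys):
    """Average effect sizes within each combination of the given keys."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["es"])
    return {group: mean(values) for group, values in groups.items()}

# Hypothetical records: several effect sizes nested within two studies.
records = [
    {"study": "A", "domain": "cognitive", "es": 0.20},
    {"study": "A", "domain": "cognitive", "es": 0.40},
    {"study": "A", "domain": "affective", "es": 0.60},
    {"study": "B", "domain": "cognitive", "es": 0.10},
]

per_study = aggregate(records, ["study"])                   # overall analysis
per_study_domain = aggregate(records, ["study", "domain"])  # sub-group analysis
print(per_study)
print(per_study_domain)
```

Study A contributes one value to the overall analysis but two values (one per outcome domain) to the sub-group analysis, so no study is counted twice within the same comparison.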

Results
As mentioned earlier, the 43 studies included in the meta-analysis contributed a total of 218 effect sizes, an average of 5.1 effect sizes per study. As multiple effect sizes existed within studies, the reviewers considered the dependence of effect sizes in each study. Figure 2 shows the study characteristics for all 43 studies, including effect size (i.e., standard difference in means), standard error, variance, confidence interval, Z-value, and p-value in a forest plot. Black squares on the forest plot's horizontal lines show the effect size of an individual study, and the horizontal lines indicate the confidence interval for each estimate. The small diamond shape at the bottom represents the overall effect size of all studies. According to the forest plot, the smallest effect size value is −0.933, and the highest effect size value is 1.666. Thirty-nine studies had positive effect sizes, while four had negative effect sizes. Thus, the implementation of flipped classrooms had a positive effect in 39 of the 43 studies.
The results of the homogeneity test show that the effect sizes are heterogeneous (Table 2). Thus, the studies in the analysis did not share a common effect size, and the null hypothesis of the homogeneity test can be rejected. We therefore used the random effects model to estimate the overall effect size and to compare sub-group differences using the study characteristics (e.g., outcome variables, report characteristics variables, and study characteristics variables). The results of the random effects model analysis are displayed in Table 3. The overall effect size of flipped classrooms was 0.35, indicating that flipped classrooms had a medium effect in terms of Cohen's rule of thumb [68]. The difference in outcomes between flipped classrooms and traditional lecture-based classrooms in higher education was significant overall (ES = 0.35, 95% CI = 0.24 to 0.47).
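As a quick arithmetic check on the reported overall estimate, the sketch below back-calculates an approximate standard error and Z-value from the published confidence interval. It assumes a symmetric normal-based 95% interval; the reported bounds are rounded, so the result is only approximate, and the helper name is hypothetical.

```python
def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error implied by a normal-based 95% CI."""
    return (upper - lower) / (2 * z)

es, lower, upper = 0.35, 0.24, 0.47  # overall estimate reported in the text
se = se_from_ci(lower, upper)
z_value = es / se
print(round(se, 4), round(z_value, 2))
```

Because the interval excludes zero, the implied Z-value exceeds 1.96, which is consistent with the statement that the overall difference is significant.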

Outcomes of Flipped Classroom (Research Question 2)
This meta-analysis used a random effects model to investigate the differences between sub-groups, as the results from each sub-group were heterogeneous. The categorical variables are as follows: outcome domains (cognitive, affective, and interpersonal), department, subject, data format, and publication status. We conducted a meta-regression analysis using publication year as a covariate. In the random effects categorical analysis by outcome, shown in Table 4, the results of implementing flipped classrooms varied. By outcome domain, the effect sizes were, in descending order, affective (ES = 0.59), interpersonal (ES = 0.53), and cognitive (ES = 0.24).
In the context of higher education, flipped classrooms appear to have more significant effects on students' affective and interpersonal outcomes than on their cognitive outcomes. Regarding affective outcomes, students' immersion (ES = 1.

Effects of Characteristics (Research Question 3)
Tables 5 and 6 list the effect sizes measured by this study, separated by department and subject area. This study investigated a variety of subject areas to determine whether the flipped classroom approach is more beneficial in some contexts or subjects than it is in others. In the primary studies reviewed in this research, the data are generally represented in three different formats: pre-post design, treatment vs. control group design, and pre-post with treatment vs. control group (standardized mean change difference). The effect sizes for each format are as follows: treatment vs. control, ES = 0.25 (95% CI = 0.21 to 0.28); pre-post design, ES = 0.38 (95% CI = 0.35 to 0.42); and standardized mean change difference, ES = 0.47 (95% CI = 0.41 to 0.53). The effect sizes differ noticeably across formats, so study design may contribute to the variation in effect sizes. Regarding publication type, the effect size of dissertations (ES = 0.61, 95% CI = 0.54 to 0.68) was larger than the effect size of journal articles (ES = 0.29, 95% CI = 0.26 to 0.31), but the difference was not significant (Table 7). Regarding year of publication, this study conducted a meta-regression analysis in which the effect sizes of flipped classrooms were regressed on year of publication as a moderator. The slope of the meta-regression by publication year is negative overall and statistically significant (Table 8), indicating that publication year significantly moderates the effect of flipped classrooms.
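The meta-regression on publication year can be sketched as a weighted least squares fit of effect size on year, with inverse variance weights. For simplicity this sketch assumes a fixed effect meta-regression (no between-study variance term), which may differ from the model the authors fitted; the function name and the data are hypothetical.

```python
import math

def meta_regression(x, effects, variances):
    """Weighted least squares slope, intercept, slope SE, and Z-value."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)
    return slope, intercept, se_slope, slope / se_slope

# Hypothetical data: effect sizes that shrink in later publication years.
years = [2013, 2014, 2015, 2016]
effects = [0.6, 0.5, 0.4, 0.3]
variances = [0.01, 0.01, 0.01, 0.01]
slope, intercept, se_slope, z = meta_regression(years, effects, variances)
print(round(slope, 3), round(se_slope, 4), round(z, 2))
```

A negative slope whose Z-value is beyond ±1.96 corresponds to the pattern described above: effect sizes decline with publication year, and the decline is statistically significant.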

Publication Bias
The funnel plot (Figure 3) shows that the effect sizes are distributed symmetrically around the mean effect size, providing no evidence of publication bias in the overall effect size. No missing studies were identified, and no effect sizes were imputed to adjust for publication bias.
Educ. Sci. 2020, 10, x FOR PEER REVIEW

Discussion
This study conducted a meta-analysis of the effects of flipped classrooms on students' cognitive, affective, and interpersonal outcomes in higher education. It extends the discussions and findings from recent meta-analyses that found that flipped classrooms had a significant effect on students' cognitive outcomes in higher education: for example, by improving their test scores, grades, knowledge, skills, and self-directed learning (e.g., [9,76,77]). This study expands the evidence for flipped classroom effectiveness in improving college students' academic outcomes as compared to traditional, lecture-based classrooms.
The first research question was regarding the overall effect of flipped classrooms on students' cognitive, affective, and interpersonal outcomes. The study found that flipped classrooms had a medium effect on academic outcomes; the average scores of students in flipped classrooms were 0.35 standard deviations above the average scores of students in traditional, lecture-based classrooms. It also confirmed the results of previous, related studies (e.g., ES = 0.36 [3]; ES = 0.35 [9]; ES = 0.53 [77]; ES = 0.21 [78]). In short, its findings demonstrate that flipped classrooms can improve college students' academic outcomes in various ways, could provide an effective way to inculcate essential 21st-century skills in students [79], and may assist students with special educational needs in performing better than they would in traditional, lecture-based classrooms.
The second research question concerned the outcomes influenced by the introduction of the flipped classroom method. In descending order, the overall effect sizes were as follows: affective outcomes (ES = 0.59, SE = 0.03, 95% CI = 0.53 to 0.65), interpersonal outcomes (ES = 0.53, SE = 0.31, CI = 0.47 to 0.59), and cognitive outcomes (ES = 0.24, SE = 0.24, 95% CI = 0.19 to 0.36). This study's results suggest that flipped classrooms improve college students' cognitive, affective, and interpersonal outcomes and that flipped classrooms have larger effects on affective and interpersonal outcomes than on cognitive outcomes. This result can be explained by the features of the flipped classroom that encourage active engagement and learner-centered interactions. Furthermore, this study's findings indicate that flipped classrooms indirectly affect cognitive outcomes because affective outcomes have a strong influence on cognitive outcomes [23], in part by improving students' motivation and willingness to learn [80]. However, affective outcomes (e.g., attitudes and satisfaction) in the flipped classroom are not necessarily positive in higher education. This study's results regarding the high effect sizes of interpersonal outcomes in flipped classrooms are consistent with the results of Shi, Ma, Macleod, et al. [77]. Further, the results can be explained by instructors' tendency to design active in-class activities in flipped classrooms to increase student participation and interaction [61] through discussion, small group activities, feedback, collaborative group work, and group projects [81]. These active, in-class activities enhance students' interpersonal skills and encourage them to become active and self-directed learners who are deeply involved in the learning process [82,83].
This study's third research question addressed the effects of study characteristics on how the effect sizes of flipped classrooms were measured. To answer this question, the study performed subgroup analyses using subject area, department, publication year, and study design as moderators. These moderators accounted for a small amount of the relatively large levels of heterogeneity between studies. The results indicated that flipped classrooms can be applied in a variety of subject areas and still effectively improve educational outcomes, as discussed in Rahman, Mohamed, Aris, and Zaid [2]. Although instructors' individual approaches can influence the success of flipped classrooms, this study found that English, Engineering, Math, Physics, and Chemistry classrooms showed small effect sizes. These results are in line with other meta-analyses of flipped classrooms (e.g., [3,78]).
Regarding publication bias and publication type, this study found that the primary literature on flipped classrooms did not indicate publication bias, even though dissertations (ES = 0.61) had a greater effect size than journal articles (ES = 0.29). The funnel plot likewise showed no evidence of publication bias. Thus, publication type can be treated as a moderator in future flipped classroom interventions.

Limitations and Future Research Directions
This meta-analysis has several limitations. First, the meta-analysis gains ecological validity by including only quantitative field studies (experimental or quasi-experimental research), which supports generalizing the results to real-life settings. However, it sacrifices some internal validity relative to more controlled designs, such as randomized controlled trials [84]. Second, this meta-analysis includes only quantitative findings despite the fact that many flipped classroom studies employ qualitative research methods [31,85]. Because this study excluded qualitative studies from its analysis, its results should be interpreted with caution. Qualitative findings help researchers arrive at deeper understandings [86] and generate new knowledge [87]. Some studies show that flipped classrooms have been particularly effective among low-achieving learners [28] because low achievers require more interaction and motivation to attain good learning outcomes. We encourage researchers to implement flipped classrooms with various student bodies in a variety of academic settings to better define the degree to which these results are transferable [16].
The flipped classroom is not a panacea, and its effectiveness depends in large part on whether students actually use the available pre-class time effectively [30]. We therefore propose repeated use of flipped classrooms and related, modified strategies on a trial-and-error basis. Ratta [6] argued that flipped classroom instruction is congruent with today's digitally savvy college students; it is also important to understand the various influences of today's student culture, study styles, study habits, and use of devices. Further study may be warranted to allow more detailed conclusions about student performance to be drawn [88].

Conclusions
This study synthesized the results of 43 studies regarding the effects of flipped classrooms on students' cognitive, affective, and interpersonal outcomes in higher education. It examined the overall effect sizes of flipped classrooms compared to traditional, lecture-based classrooms and found that flipped classrooms had a medium effect on various student learning outcomes. In particular, the study found that the flipped classroom has a larger effect on affective and interpersonal outcomes than on cognitive outcomes. This result can be explained by the features of the flipped classroom that encourage active engagement and learner-centered interactions. Instructors and other educational leaders in higher education institutions can pursue instructional redesigns and educational supports to implement flipped classrooms as an effective pedagogical practice. Additionally, the mixed results of adopting flipped classroom instruction across departments and subjects suggest that instructional forms and strategies help determine its effectiveness in improving educational outcomes. Thus, future research must explore the relationship between various forms of flipped classrooms and educational outcomes to inform pedagogical decisions for instructional development.