Good Statistical Practices for Contemporary Meta-Analysis: Examples Based on a Systematic Review on COVID-19 in Pregnancy

Systematic reviews and meta-analyses have been increasingly used to pool research findings from multiple studies in medical sciences. The reliability of the synthesized evidence depends highly on the methodological quality of a systematic review and meta-analysis. In recent years, several tools have been developed to guide the reporting and evidence appraisal of systematic reviews and meta-analyses, and much statistical effort has been devoted to improving their methodological quality. Nevertheless, many contemporary meta-analyses continue to employ conventional statistical methods, which may be suboptimal compared with several alternative methods available in the evidence synthesis literature. Based on a recent systematic review on COVID-19 in pregnancy, this article provides an overview of select good practices for performing meta-analyses from statistical perspectives. Specifically, we suggest that meta-analysts (1) provide sufficient information about included studies, (2) provide information for reproducibility of meta-analyses, (3) use appropriate terminologies, (4) double-check presented results, (5) consider alternative estimators of between-study variance, (6) consider alternative confidence intervals, (7) report prediction intervals, (8) assess small-study effects whenever possible, and (9) consider one-stage methods. We use worked examples to illustrate these good practices. Relevant statistical code is also provided. The conventional and alternative methods could produce noticeably different point and interval estimates in some meta-analyses and thus affect their conclusions. In such cases, researchers should interpret the results from conventional methods with great caution and consider using alternative methods.


Introduction
Systematic reviews and meta-analyses have been widely used to synthesize results from multiple studies on the same research topic in medical sciences [1,2]. The reliability of the synthesized evidence depends critically on appropriate methods used to perform meta-analyses [3,4]. However, despite the mass production of meta-analyses, it has been found that many meta-analyses need improvements in their methodological quality [5][6][7][8][9][10]. This is a particularly crucial issue in the COVID-19 pandemic because of the concerns about the expedited peer-review process [11][12][13][14].
This article uses a systematic review on COVID-19, recently published in The BMJ, to illustrate some good practices for performing a meta-analysis from statistical perspectives. Many non-statistical recommendations and quality assessments for a systematic review and meta-analysis can be found in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklists [3,15,16], the GRADE (Grading of Recommendations Assessment, Development and Evaluation) approaches [17,18], the AMSTAR (A MeaSurement Tool to Assess systematic Reviews) tools, etc. [19,20]. In recent years, meta-analyses have begun to adopt these non-statistical recommendations, but there is still much room for improvement in terms of statistical analyses.
For example, several papers pointed out that the well-known statistical method for the random-effects meta-analysis proposed by DerSimonian and Laird [21] is suboptimal [22][23][24]. Various methods with potentially better performance are available and can be readily implemented with various statistical software programs [25][26][27][28][29]. Nevertheless, the DerSimonian-Laird (DL) method continues to dominate contemporary meta-analyses [9]. Some popular software programs for meta-analysis (e.g., Review Manager) use the DL method as the default and perhaps the only option.
In this article, based on the aforementioned systematic review on COVID-19, we aim to explore potential issues in the statistical methods used for its meta-analyses and to illustrate potentially better alternatives. Reproducible code for all analyses is provided. We hope these materials will help practitioners accurately use appropriate statistical methods to perform high-quality meta-analyses in the future.

Case Study
We use the data of meta-analyses reported by Allotey et al. [30] as our examples. This study conducted a living systematic review, which will be updated periodically to incorporate evidence from new studies. We use the version of update 1 of the original article published on 1 September 2020. The systematic review identified a total of 192 studies and performed multiple meta-analyses to investigate the prevalence, clinical manifestations, risk factors, and maternal and perinatal outcomes in pregnant and recently pregnant women (henceforth, pregnant women) with COVID-19. We select this systematic review for illustrations due to several considerations. It deals with the important research topic of COVID-19, where the appropriate use of statistical analyses is particularly crucial for timely and accurate decision-making. Also, this review covers a wide range of meta-analysis settings; the included meta-analyses had diverse outcomes, types of studies (non-comparative and comparative), numbers of studies, sample sizes, extents of heterogeneity, etc.
This article uses three meta-analyses from this systematic review to illustrate several statistical advances. The first two meta-analyses synthesize comparative studies; their outcomes are fever and cough in pregnant women compared with non-pregnant women of reproductive age with COVID-19. Each meta-analysis contains 11 studies. The original meta-analysis on fever yielded a pooled odds ratio (OR) of 0.49 with 95% confidence interval (CI) (0.38, 0.63) and I² = 40.8%, suggesting moderate heterogeneity. The original meta-analysis on cough yielded a pooled OR of 0.72 with 95% CI (0.50, 1.03) and I² = 63.6%, suggesting moderately high heterogeneity. Overall, pregnant women with COVID-19 were less likely to have fever and cough than non-pregnant women with COVID-19. The association with fever was statistically significant, while that with cough was not. For illustrative purposes, Figure 1 shows the forest plot of the meta-analysis on cough. The third meta-analysis combines non-comparative data from 60 studies to obtain a pooled prevalence of COVID-19 in pregnant women; Figure 2 presents its forest plot. The original analysis gave a pooled prevalence of 7% with 95% CI (5%, 8%) and I² = 98.0%, suggesting extremely high heterogeneity.

Providing Sufficient Information of Included Studies
Meta-analysts should provide sufficient information about included studies so that peer reviewers and other researchers can reproduce the meta-analyses and validate the results. The PRISMA statement and its extensions give comprehensive overviews of the reporting of meta-analyses [15,16,31,32]; meta-analysts are advised to carefully follow these guidelines for general purposes. Here, we focus on the reporting from statistical perspectives; the non-statistical parts (e.g., study selection) are not discussed, although they are equally critical for validating meta-analyses. The statistical data from individual studies can be feasibly provided in meta-analyses of aggregate data. However, this practice may be challenging for meta-analyses of individual participant data (IPD), which could involve concerns about data privacy. In such situations, meta-analysts may describe detailed procedures by which other researchers can apply for access to the de-identified participant-level data. In the following, we restrict the discussions to meta-analyses of aggregate data.
In all three examples, the meta-analyses use aggregate data, i.e., the number of subjects with fever or cough and the sample sizes of pregnant and non-pregnant women in the comparative studies, and the number of cases of COVID-19 and the sample size of pregnant women in the prevalence data. These aggregate data are transparently provided by Allotey et al. [30], displayed in the corresponding forest plots; see, e.g., Figures 1 and 2. With these data available, we can reproduce the results, such as the prevalence and OR, of each individual study. They also permit us to employ alternative meta-analysis methods (detailed later).

Providing Information for Reproducibility of Meta-Analyses
In addition to the information of individual studies, reproducibility of meta-analyses also requires transparency in the statistical analyses, including the choice of measures for quantifying the study results, models for pooling the individual-study data, methods for assessing heterogeneity between studies and small-study effects, software program and its version used for performing the analyses, as well as subgroup analyses and sensitivity analyses (if applicable).
For example, Allotey et al. [30] specified that the OR was used for pooling the comparative dichotomous data with random-effects models. If comparative continuous data needed to be pooled with dichotomous data, the standardized mean difference was used as the effect measure of the continuous data and was transformed to the log OR using the method by Chinn [33]. For the prevalence data, the Freeman-Tukey double-arcsine transformation was applied to the proportion estimate from each study to stabilize its sample variance [34]. The authors used the I² statistic to assess heterogeneity [35,36], but they did not assess small-study effects or publication bias. When the random-effects model is used, it is also critical to specify the estimator of the between-study variance, which is a key parameter in this model and could greatly affect the pooled results, particularly 95% CIs. The authors only specified that the DL method was used for pooling prevalence data but did not specify the estimator used for pooling comparative data. We have reproduced their meta-analyses of comparative data and found that the DL method was also used for comparative data. The original meta-analyses were all performed with Stata 16, which is widely used in the current literature of meta-analyses.
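The two transformations named above can be sketched in a few lines. This is a minimal illustration, not the code used by Allotey et al.: the function names are ours, the Chinn conversion is written as log OR = SMD × π/√3, and the double-arcsine transformation is written in its sum form with approximate variance 1/(n + 0.5). Some software uses a half-scaled version of the latter, so the exact convention of a given program should be checked before comparing numbers.

```python
import math

def smd_to_log_or(smd):
    """Convert a standardized mean difference to a log odds ratio
    (logistic-distribution approximation attributed to Chinn)."""
    return smd * math.pi / math.sqrt(3)

def freeman_tukey(events, n):
    """Freeman-Tukey double-arcsine transformation of a proportion events/n.
    Returns the transformed value and its approximate variance 1/(n + 0.5).
    Assumption: sum form; some programs report half of this value."""
    t = math.asin(math.sqrt(events / (n + 1))) \
        + math.asin(math.sqrt((events + 1) / (n + 1)))
    return t, 1.0 / (n + 0.5)

# Hypothetical inputs, for illustration only.
print(smd_to_log_or(0.5))
print(freeman_tukey(5, 100))
```

The variance of the transformed proportion depends only on the sample size, which is what "stabilizing the variance" refers to.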

Using Appropriate Terminologies
To our knowledge, it is not uncommon for inappropriate terminologies to be used for meta-analysis methods. For example, in the systematic review by Allotey et al. [30], the prevalence was incorrectly referred to as "rate ratio" in the meta-analyses of prevalence (Figures 2 and 3 in the original article). As its name suggests, the rate ratio is a ratio of incidence rates for comparative studies, while the prevalence (or proportion) is a type of non-comparative data. The incidence rate also includes certain time elements (e.g., person-year), while the prevalence does not include such elements.
Besides these minor issues in this case study, Ioannidis [37] explored the problem of massive citations in detail. Additional examples include referring to the forest plot [38] as "Forrest plots", "honoring the nonexistent Dr. Forrest," and the funnel plot [39,40] for assessing small-study effects as "Beggar's funnel plot," "apparently copy-pasting from some original source(s) that mistyped Colin Begg's funnel plot." Moreover, the commonly used Q test for heterogeneity is often referred to as the "Cochrane's Q" or "Cochran's Q." The former wrongly relates the Q test to the Cochrane Collaboration. The latter is used in many meta-analyses due to the paper by William G. Cochran [41], although it was not designed for testing for heterogeneity in Cochran's original work [42].
In order to use appropriate statistical methods for a meta-analysis, the first step is to specify their names correctly. When referring to certain meta-analysis methods, we suggest that researchers always read and cite the original methodological articles or tutorials that proposed, introduced, or reviewed the methods.

Double-Checking Presented Results
A meta-analysis has the power to yield more precise results than individual studies, but it could also inherit potential research errors from individual studies. It may be difficult to discover and correct the errors hidden in individual studies. Potentially erroneous results from suspect studies could be removed from the meta-analysis, or sensitivity analyses could be conducted to evaluate such studies' impact on the pooled results.
Additional errors could occur when pooling the individual studies; researchers should try their best to avoid such errors when inputting the data and outputting the results. For example, a systematic review team may assign two or more researchers to independently extract individual studies' data, perform meta-analyses, check the results, and proofread the final manuscript. With sufficient information provided, it becomes easier to check for potential internal reporting discrepancies. Several examples of internal reporting discrepancies are discussed by Puljak et al. [43].
Taking the systematic review by Allotey et al. [30] as an example, several discrepancies appeared. In the meta-analysis on fever comparing pregnant women with non-pregnant women with COVID-19 (Figure 5 in the original article), the OR of the study "Wei L 2020" was reported as 0.40 with 95% CI (0.11, 0.77), and the OR of another study "Wang Z 2020" was reported as 0.29 with 95% CI (0.11, 0.77). The reported CIs were identical, while the point estimates of the ORs were different, and the CIs were displayed differently in the forest plot. Based on the forest plot, the CI of "Wei L 2020" encompassed the null value 1, so the reported CI of this study was likely an error introduced when copying and pasting the numeric results in the forest plot. Fortunately, because the event counts and sample sizes (8 and 17 for pregnant women and 18 and 26 for non-pregnant women) were reported for this study, we can derive the correct 95% CI as (0.11, 1.40). A similar issue occurred in the study "Zambrano LD 2020," whose OR was reported as 0.52 with 95% CI (0.50, 0.50). The CI did not even encompass the point estimate; again, this was likely due to a typesetting error. In the meta-analysis on cough (also Figure 5 in the original article), the total sample sizes of pregnant and non-pregnant women across the 11 studies were reported as 5468 and 75,053, respectively. These total sample sizes were also apparently erroneous because they are smaller than the sample sizes in the single study "Zambrano LD 2020." The correct total sample sizes should be 17,806 and 222,493 (Figure 1).
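The check on "Wei L 2020" can be repeated from the reported counts alone. The sketch below assumes the usual Wald interval on the log OR scale, with standard error equal to the square root of the summed reciprocal cell counts; the function name is ours, and no continuity correction is applied, so it is not suitable for tables with zero cells.

```python
import math

def odds_ratio_ci(events_1, n_1, events_2, n_2, z=1.959964):
    """Odds ratio and Wald 95% CI from event counts and sample sizes.
    Group 1 = pregnant women, group 2 = non-pregnant women."""
    a, b = events_1, n_1 - events_1   # events / non-events, group 1
    c, d = events_2, n_2 - events_2   # events / non-events, group 2
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# Counts reported for "Wei L 2020": 8/17 versus 18/26.
or_hat, lower, upper = odds_ratio_ci(8, 17, 18, 26)
print(f"OR = {or_hat:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Running this kind of recomputation for every study in a forest plot is a cheap way to catch copy-and-paste discrepancies before publication.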

Considering Alternative Estimators of Between-Study Variance
As mentioned earlier, the example meta-analyses were performed with the random-effects model, and the well-known DL method was used to estimate the between-study variance. The DL estimator is based on the method of moments. This method is popular possibly because it is a simple, non-iterative method with a closed form [21]. Many alternative estimators have been proposed for the between-study variance [44][45][46]. Although the DL estimator retains its usefulness in some situations (e.g., large sample sizes) [22], it could bias the estimated between-study variance, and the restricted maximum-likelihood (REML) estimator generally performs better among various frequentist methods [25,26,47]. Bayesian methods can also be good alternatives as they have the ability to incorporate prior information (e.g., from external evidence or experts' opinions) in the final estimates [48,49]. The method used to estimate heterogeneity plays a crucial role in a meta-analysis because it could greatly affect the estimated overall effect, particularly the width of its CI and thus the statistical significance. Therefore, we suggest that researchers explore the alternative options for estimating the between-study variance offered by the software programs used for their meta-analyses. In many cases, the alternative estimators may produce similar results to the DL estimator, and the DL estimator may be considered reliable. However, if these estimators yield fairly different results, researchers may consider alternative estimators.
In the example meta-analysis on fever, the DL method estimated the between-study variance as 0.053, leading to an overall OR estimate of 0.488 with 95% CI (0.377, 0.632). Using the REML method, the estimate became 0.127, leading to an overall OR estimate of 0.453 with 95% CI (0.326, 0.629). In the example meta-analysis on cough, the DL method estimated the between-study variance as 0.168, leading to an overall OR estimate of 0.719 with 95% CI (0.502, 1.031). Using the REML method, the estimate became 0.239, leading to an overall OR estimate of 0.711 with 95% CI (0.476, 1.061).
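To make the "simple, non-iterative, closed-form" nature of the DL estimator concrete, the following sketch computes it from study-level effect estimates and within-study variances. The function name and the input values are hypothetical (they are not the fever or cough data), and the code is an illustration of the method-of-moments formula rather than the implementation in any particular package. REML, by contrast, requires iterative optimization and is best obtained from established software.

```python
import math

def dersimonian_laird(y, v, z=1.959964):
    """Random-effects pooling with the DerSimonian-Laird estimator.
    y: study effects (e.g., log ORs); v: their within-study variances."""
    k = len(y)
    w = [1.0 / vi for vi in v]                       # fixed-effect weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))  # Q statistic
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # truncated at zero
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]           # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"tau2": tau2, "I2": i2, "mu": mu, "se": se,
            "ci": (mu - z * se, mu + z * se)}

# Hypothetical log ORs and variances, for illustration only.
res = dersimonian_laird([-0.9, -0.4, -0.7, -0.1], [0.20, 0.05, 0.10, 0.02])
print(res["tau2"], math.exp(res["mu"]),
      tuple(math.exp(b) for b in res["ci"]))
```

Because the between-study variance enters every random-effects weight, swapping in a different estimate of it changes both the pooled estimate and the width of its CI, which is the pattern seen in the fever and cough results above.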

Considering Alternative Confidence Intervals
Conventionally, the CI of the overall estimate in a meta-analysis is produced assuming normality (e.g., for the log OR). However, this normality assumption might be questionable in some situations [50]; as such, the normality-based CI may not have the desired coverage probability (e.g., 95%). Hartung and Knapp [51,52] and Sidik and Jonkman [53,54] independently introduced a refined CI based on the t-distribution for the random-effects meta-analysis. This t-based CI has been shown to have better coverage probabilities than the standard normality-based CI by various simulation studies, particularly when a meta-analysis only contains a few studies [29,55,56,57]. Of note, this CI was designed for the random-effects meta-analysis, and it is inappropriate to apply it to the fixed-effect (also known as common-effect) meta-analysis that assumes no heterogeneity.
The t-based 95% CI of the overall OR in the example meta-analysis on fever was (0.352, 0.678) and (0.316, 0.650) using the DL and REML methods, respectively, both wider than their counterparts of normality-based 95% CIs (0.377, 0.632) and (0.326, 0.629). Similarly, in the example meta-analysis on cough, the t-based 95% CI of the overall OR was (0.463, 1.118) and (0.453, 1.114) using the DL and REML methods, respectively. Also, both were wider than their counterparts of normality-based 95% CIs (0.502, 1.031) and (0.476, 1.061).
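For reference, the two intervals compared above can be written side by side. The notation is ours and is not taken from the original analyses: y_i and s_i^2 denote the effect estimate (e.g., log OR) and within-study variance of study i, k is the number of studies, and \hat{\tau}^2 is whichever between-study variance estimate (DL or REML) is plugged in.

```latex
\begin{align*}
w_i &= \frac{1}{s_i^2 + \hat{\tau}^2}, \qquad
\hat{\mu} = \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i}, \\[4pt]
\text{normality-based CI:}\quad
 & \hat{\mu} \pm z_{0.975} \sqrt{\frac{1}{\sum_{i=1}^{k} w_i}}, \\[4pt]
\text{$t$-based CI:}\quad
 & \hat{\mu} \pm t_{k-1,\,0.975}
   \sqrt{\frac{\sum_{i=1}^{k} w_i \,(y_i - \hat{\mu})^2}{(k-1)\sum_{i=1}^{k} w_i}}.
\end{align*}
```

The t-based interval replaces the normal quantile with a t quantile on k − 1 degrees of freedom and rescales the variance by the weighted residual spread, which is why it typically widens when the number of studies is small.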

Reporting Prediction Intervals
Heterogeneity between studies frequently appears and is generally expected in a meta-analysis [58]. Standard meta-analysis approaches use the random-effects model to account for the heterogeneity and use the estimated between-study variance τ² and/or the I² statistic to quantify it. However, it is difficult to apply these metrics to clinical practice for future research. Over the last decade, much effort has been made to promote the reporting of the prediction interval (PI) in a meta-analysis, but only a small proportion of meta-analyses adopt this recommendation in the current literature [9,59,60,61,62,63]. The PI represents the expected range of the true effects in future studies, making it easier to apply meta-analysis results to clinical practice. The PI is wider than the CI due to the heterogeneity between existing studies in a meta-analysis and future studies. A meta-analysis may have a CI not encompassing the null value (thus implying a statistically significant effect), but its PI could encompass the null, indicating that a future study could have opposite results [64].
Despite the attractive features of the PI, researchers should note that the PI could be subject to large uncertainties when the number of studies in a meta-analysis is relatively small (e.g., <10). In the presence of small-study effects (detailed in the following subsection), the PI could have poor coverage due to biased estimates. Therefore, the PI should be interpreted with caution in these situations. Also, the PI is designed for a random-effects meta-analysis; it is not sensible for a fixed-effect meta-analysis.
In the example meta-analysis on fever, based on the REML estimator of the between-study variance, the 95% PI of the overall OR is (0.186, 1.104), encompassing the null value 1. Recall that the 95% CI of this meta-analysis is (0.326, 0.629), not encompassing 1. Therefore, although the meta-analysis concludes a statistically significant association between fever and pregnancy, this conclusion could be changed in a new study.
In the example meta-analysis on cough, based on the REML estimator of the between-study variance, the 95% PI of the overall OR is (0.214, 2.356), much wider than its 95% CI (0.476, 1.061). The PI can be incorporated into the forest plot [60,65], as shown in Figure 1. Of note, the results in Figure 1 were produced using the DL method to reproduce the original results in Allotey et al. [30], so the interval estimates were different from the foregoing results based on the REML method.
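The PIs above can be approximately recovered from the summary numbers already reported in this article. The sketch below assumes the commonly used formula in which the PI is the pooled log OR plus or minus a t quantile with k − 2 degrees of freedom times the square root of the between-study variance plus the squared standard error of the pooled estimate; the original analysis does not state its formula here, so this is our assumption. The standard error is back-calculated from the reported normality-based CI, and the t quantile for 9 degrees of freedom (k = 11 studies) is hard-coded to keep the example free of third-party dependencies.

```python
import math

def prediction_interval(or_hat, ci_low, ci_high, tau2, t_crit, z=1.959964):
    """95% prediction interval for the OR from reported summary results."""
    mu = math.log(or_hat)
    # Back-calculate the SE of the pooled log OR from the reported CI.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return math.exp(mu - half_width), math.exp(mu + half_width)

T_975_DF9 = 2.262157  # 97.5th percentile of the t-distribution, 9 df

# REML results reported above for fever and cough (11 studies each).
fever_pi = prediction_interval(0.453, 0.326, 0.629, 0.127, T_975_DF9)
cough_pi = prediction_interval(0.711, 0.476, 1.061, 0.239, T_975_DF9)
print(fever_pi, cough_pi)
```

Because the inputs are rounded to three decimals, the output agrees with the reported PIs only up to rounding error; the point of the exercise is that readers can verify a reported PI when the pooled estimate, its CI, and the between-study variance are all disclosed.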

Assessing Small-Study Effects Whenever Possible
Small-study effects refer to the phenomenon that smaller studies containing fewer subjects have substantially different results from larger studies with more subjects. They could be caused by publication bias, when small studies with statistically significant findings or effect estimates in the desired direction are more likely published in the literature than those with non-significant findings or effect estimates in the opposite direction [66,67]. Assessing small-study effects is a crucial step for validating the synthesized evidence from a meta-analysis; if substantial small-study effects appear, the certainty of the synthesized evidence should be rated down [3,68,69]. Common approaches to assessing small-study effects include graphical tools, such as the funnel plot [39,40], and quantitative methods, such as Egger's test, Begg's test, and skewness [70][71][72][73][74]. The asymmetry in a funnel plot is an indicator of potential small-study effects. Additional contours that depict areas of various statistical significance levels can be further added to the usual funnel plot, referred to as the contour-enhanced funnel plot [40,75,76]. They help distinguish publication bias from other potential factors (e.g., subgroup effects) that might cause small-study effects.
We assessed small-study effects in the meta-analyses on fever and cough. In Figure 3A, the funnel plot shows that the (log) ORs from the 11 studies on fever were distributed asymmetrically. Smaller studies with larger standard errors tended to have smaller ORs away from the null value 1, indicating small-study effects. The potential missing studies at the lower right part of the funnel plot are likely located within the white area with p-values > 0.1. Therefore, this contour-enhanced funnel plot supports the existence of publication bias. Nevertheless, the p-value of Egger's test was 0.275, suggesting that publication bias was not statistically significant. Of note, if the potential missing studies were located in areas with very small p-values, then the small-study effects may not be explained by publication bias. In such cases, meta-analysts are encouraged to explore the factors that might cause the funnel plot's asymmetry, e.g., by performing subgroup analyses to examine whether the asymmetry was attributable to heterogeneity [40,77].
Although small-study effects were not statistically significant based on Egger's test in this case study, this does not mean that the assessment of small-study effects was unnecessary. Statistical methods for detecting small-study effects usually have low power, particularly in meta-analyses with only a few studies. As such, the significance level for detecting small-study effects is typically set to 0.1, higher than the most popular cutoff of 0.05 [78].
Here, meta-analysts should distinguish the p-value of tests for small-study effects from the p-values of individual studies' effect estimates. The significance levels depicted in the contour-enhanced funnel plot are intended for the latter. In addition, if a meta-analysis contains fewer than 10 studies, it might be inappropriate to use the funnel plot to detect small-study effects because it is hard to distinguish chance from real asymmetry [40].
In Figure 3B, the funnel plot for the meta-analysis on cough does not show apparent missing studies in the white area of non-significance. Therefore, it does not support the existence of publication bias.
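As an illustration of how Egger's test works, the sketch below fits the underlying regression in its classical form: each study's standardized effect (estimate divided by standard error) is regressed on its precision (one over the standard error), and the intercept measures funnel-plot asymmetry. The function name and the data are hypothetical, and the p-value is deliberately left out because it requires a t-distribution routine from a statistics library; the t statistic would be compared with a t-distribution on k − 2 degrees of freedom.

```python
import math

def egger_regression(y, se):
    """Ordinary least squares of y/se on 1/se (classical Egger regression).
    Returns the intercept, the slope, and the intercept's standard error."""
    k = len(y)
    z = [yi / si for yi, si in zip(y, se)]   # standardized effects
    x = [1.0 / si for si in se]              # precisions
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = rss / (k - 2)                       # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept

# Hypothetical log ORs and standard errors, for illustration only.
log_or = [-0.90, -0.60, -0.50, -0.40, -0.35]
std_err = [0.50, 0.35, 0.25, 0.15, 0.10]
a, b, se_a = egger_regression(log_or, std_err)
print("intercept:", a, "t statistic:", a / se_a)
```

When all studies share the same effect regardless of size, the intercept is zero; an intercept far from zero relative to its standard error signals that smaller studies report systematically different effects.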

Considering One-Stage Methods
Conventionally, meta-analyses are performed with two-stage methods; that is, within-study estimates are first obtained, and then the study-specific estimates are pooled together as an overall estimate. The two-stage methods are usually simple and intuitive; the study-specific estimates provided by them are also necessary for producing the forest plot for visualizing a meta-analysis and the funnel plot for assessing small-study effects. Nevertheless, they suffer from several limitations. First, the study-specific estimates in the two-stage methods are typically assumed to approximately follow normal distributions. For this purpose, certain transformations are applied to the original effect measures. For example, the OR is typically analyzed on the logarithmic scale, and the Freeman-Tukey double-arcsine transformation is widely used to transform proportion estimates, as in the original analyses by Allotey et al. [30]. The transformed estimates may approximately follow normal distributions when the sample sizes are sufficiently large, while the approximation may be inaccurate for studies with small sample sizes [50]. In recent years, there have also been growing concerns about the appropriateness of the Freeman-Tukey double-arcsine transformation for meta-analyses of proportions [79][80][81]. Second, the variances of the effects from individual studies need to be estimated in the two-stage methods, and the estimated within-study variances are typically treated as fixed, known values. Again, this practice may be valid for large-sample settings, but it is questionable for studies with small sample sizes [57]. For example, the (log) OR's variance depends on the event counts, which are actually random variables instead of fixed values. The (log) OR and its variance are thus intrinsically associated, and such association could lead to non-negligible biases for small sample sizes and/or low event rates [82][83][84].
With the recent development of statistical methods for meta-analysis, many software programs have commands to pool data via one-stage methods, such as generalized linear mixed models (GLMMs) and Bayesian hierarchical models. The one-stage methods assume exact likelihood functions for the observed data (e.g., the binomial likelihood for the event count from a group of patients). They do not need the estimation for each individual study and thus avoid some unrealistic assumptions made by the two-stage methods. Moreover, these methods are widely applicable to many types of meta-analyses, including comparative studies, proportions, and diagnostic tests [27,85,86,87,88,89,90]. In the following, we illustrate the use of GLMMs and Bayesian hierarchical models with two example meta-analyses.
In the meta-analysis on cough, recall that the overall OR was 0.719 with 95% CI (0.502, 1.031) based on the original analysis (the DL estimation) by Allotey et al. [30]; it was 0.711 with 95% CI (0.476, 1.061) using the REML estimation. We re-analyzed this dataset using the GLMM and Bayesian hierarchical models with a logit link function. For the Bayesian models, we used the vague normal prior N(0, 100²) for the overall log OR and the uniform prior U(0, 5) for the between-study standard deviation τ. We also considered the informative log-normal prior LN(−2.89, 1.91²) for τ², which was derived by Turner et al. [48] based on a large Cochrane database. The GLMM estimated the overall OR as 0.710 with 95% CI (0.493, 1.022). The Bayesian model with the U(0, 5) prior for τ produced an estimated OR of 0.701 with 95% credible interval (CrI) (0.415, 1.143), and that with the LN(−2.89, 1.91²) prior for τ² gave 0.709 with 95% CrI (0.462, 1.047).
In the meta-analysis of the prevalence of COVID-19 in pregnant women, we re-analyzed the data using the GLMM and Bayesian model with a logit link function, in addition to the original two-stage method used by Allotey et al. [30] (i.e., the DL estimation with the Freeman-Tukey double-arcsine transformation). Based on the original two-stage method, the overall prevalence was estimated as 6.77% with 95% CI (5.28%, 8.44%). Based on the GLMM, the estimated overall prevalence became 5.44% with 95% CI (4.09%, 7.19%). The Bayesian model with U(0, 5) for τ produced an estimated prevalence of 5.44% with 95% CrI (4.04%, 7.34%). The prevalence estimates by both one-stage methods were smaller than that by the two-stage method by over 1 percentage point.
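For readers less familiar with one-stage models, a common way to write a random-intercept logistic GLMM for prevalence data is given below. This is a generic formulation in our own notation, offered as an assumption about the model class rather than a statement of the exact specification fitted here: x_i is the number of cases and n_i the sample size in study i, \mu is the overall prevalence on the logit scale, and \tau is the between-study standard deviation.

```latex
\begin{align*}
x_i \mid p_i &\sim \mathrm{Binomial}(n_i,\, p_i), \\
\operatorname{logit}(p_i) &= \mu + u_i, \qquad u_i \sim N(0, \tau^2), \\
\text{overall prevalence} &= \frac{1}{1 + e^{-\mu}}.
\end{align*}
```

The binomial likelihood is used directly for the counts, so no study-level transformation or within-study variance estimate is needed; a Bayesian version adds priors on \mu and \tau such as those described above.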

Conclusions
This article provided a summary of good practices for performing a meta-analysis from statistical perspectives. We illustrated these practices using meta-analyses published in a recent systematic review on COVID-19 in pregnancy. We hope they may help improve the methodological quality of future meta-analyses. To help researchers implement the methods reviewed in this article, the Supplemental File gives all code for our analyses.
Due to the urgent need for COVID-19 research, the conduct and peer review of meta-analyses have been dramatically expedited. Nevertheless, it is critical to safeguard the integrity of scientific evidence during this challenging period of accelerated publishing [14]. This article shows that some statistical methods used in the example meta-analyses may be suboptimal. In our re-analyses with better alternatives, some meta-estimates had noticeable changes. Also, potential small-study effects might exist. Extra attention is needed to examine whether such effects might continue to exist in the future updates of this living systematic review after including new studies. This article has several limitations because we were only able to focus on select statistical advances for meta-analysis based on a single case study on COVID-19. For example, for assessing small-study effects or publication bias, some selection models may be applied as sensitivity analyses to examine the robustness of synthesized results to potential bias [91,92]. Alternative meta-analysis methods are available to offer some benefits over the traditional fixed-effect and random-effects models under specific cases [93][94][95]. In addition, the current literature has debates on the choice of effect measures, e.g., relative risk, in meta-analyses [96,97]. This article has also not covered topics on meta-analyses of diagnostic tests [98]. All examples are meta-analyses of aggregate data, while meta-analyses of IPD may involve additional issues and require specific methods [99]. For a more comprehensive review of meta-analysis methods, one may refer to the Cochrane Handbook [100].
Systematic reviews and meta-analyses are a type of transdisciplinary research. Therefore, in addition to many statistical considerations reviewed in this article, non-statistical guidance is also crucial for conducting high-quality meta-research. For example, heterogeneity between studies may be assessed beyond the statistical perspectives [101]. To aid the statistical assessment of small-study effects, researchers are encouraged to search for relevant unpublished studies (e.g., on preprint servers and trial registries), include them in meta-analyses, and explore their potential differences from the published studies [100]. Of course, because the unpublished studies are not peer-reviewed, they could be subject to a high risk of bias. The risk of bias must be carefully appraised when incorporating such studies in the systematic review [102].
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/biomedinformatics1020005/s1: code for producing the results presented in the main content.
Funding: This research received no external funding.
Institutional Review Board Statement: Ethical review and approval were waived for this study because we focused on statistical methods for meta-analysis and used published data.