Mathematics
  • Article
  • Open Access

1 November 2025

Adjusting for Publication Bias in Meta-Analysis with Continuous Outcomes: A Comparative Study

1 Department of Applied Biology, HAS Green Academy, Onderwijsboulevard 221, 5223 DE ’s-Hertogenbosch, The Netherlands
2 Department of Mathematics and Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
This article belongs to the Special Issue Statistical Methods in Epidemiology: Latest Advances and Prospects

Abstract

Publication bias is a long-standing problem for meta-analysts, and many methods adjusting for it have been proposed in the literature. Comparative studies of these methods exist, but they are limited in scope. We investigated and compared the performance of five methods adjusting for publication bias in the case of continuous outcomes. Three measures of continuous treatment effect were considered: the mean difference, Cohen’s d, and Hedges’ g. The methods studied were Copas, PET-PEESE, p-uniform, Trim and Fill, and the limit meta-analysis. In addition, the performance of the random-effects meta-analysis using the DerSimonian–Laird estimator was also investigated. The analysis was conducted using a case study and an extensive simulation study covering different scenarios. In general, the Copas and PET-PEESE methods were found to be the least biased methods adjusting for publication bias. However, the Copas method, like other likelihood-based methods, can have convergence issues. In addition, the PET-PEESE method is robust to heteroscedasticity, making it a preferable technique to adjust for publication bias.

1. Introduction

Publication bias, also known as the file drawer problem or small-study problem, occurs when the published meta-analysis is not representative of the population of all studies, and can lead to incorrect conclusions. Publication bias is mainly attributed to the direction and the statistical significance of results, but other causes have been documented in the literature, e.g., language bias, availability bias and cost bias [].
One popular method for adjusting for publication bias is the Trim and Fill method [,]. Assuming funnel plot symmetry, the method starts by estimating the number of unreported studies. The treatment effect is then re-estimated after removing the most extreme studies on the left side of the funnel plot. Using the new treatment effect estimate, a new number of unreported studies is estimated and removed. This process is repeated until the estimated number of unreported studies is stable. Then, values for this number of studies are imputed on the right side of the funnel plot, and the treatment effect is estimated using the augmented set of data. The method has been evaluated in the literature for binary outcomes and was found to be more biased in the case of treatment effect heterogeneity [,].
Methods correcting for publication bias via a selection model assume a mechanism for the publication process while modeling how the data are generated in the absence of publication bias. The first selection method, developed by Hedges [,,], assumed that only statistically significant results were published. Iyengar and Greenhouse [] extended the Hedges method by using a weight function based on the likelihood that a study with a non-significant result is published relative to that of a study with a significant result. Later, the Hedges and the Iyengar–Greenhouse methods were extended to accommodate more complicated forms of publication bias []. Vevea and Hedges [] proposed creating weights using a step function of the one-tailed p-values. The Vevea and Hedges method is computationally complex, since all weights need to be estimated along with the effect size and the between-study variance component; this can be especially problematic when the number of studies in a meta-analysis is small. For this reason, Vevea and Woods [] proposed applying the Vevea and Hedges method with a priori specified weight functions. A further disadvantage of the Vevea and Hedges method is that it assumes study-specific covariates are available. Dear and Begg [] introduced a similar method using a weighted likelihood function, with the weights calculated using limits from the normal distribution based on each study’s p-value.
Another adjustment method based on a selection model, which uses the effect size and its standard error instead of the p-values, was developed as a sensitivity analysis by Copas and Shi [,]. The probability of selection was modeled as a linear function of a standard normally distributed variable, with the function parameters chosen subjectively. Next, a conditional density function was constructed from the distribution of the treatment effects and the selection probability, after which likelihood methods were applied to obtain estimates of the treatment effect. The sensitivity of the estimated treatment effect was assessed over a range of values of the selection-function parameters; reasonable estimates of the treatment effect were taken to be those associated with a p-value of a publication bias test of just above 0.1.
Another group of selection methods uses the distribution of the p-values [,,]. The p-curve and p-uniform methods both estimate the effect size by minimizing the distance between the observed p-value distribution and the uniform distribution, each using a different distance metric. One disadvantage of the p-curve method is that it does not estimate a confidence interval for the bias-adjusted effect size []. More selection methods can be found in the literature; see [,,,] for an overview.
Regression methods have also been applied to adjust for publication bias. Moreno et al. [] regress the treatment effect on its variance, using the inverse variance as a weight, and use the predicted value for an infinitely large study as an adjusted estimate of the treatment effect. Moreno’s method is related to the PET-PEESE approach, in which Stanley and Doucouliagos [,] suggest using the slope estimate from the Egger model as the bias-adjusted overall treatment effect estimate if the null hypothesis of zero treatment effect is not rejected. If this null hypothesis is rejected, the method uses the intercept estimate from Moreno’s regression equation as the bias-adjusted overall estimate. Although the PET-PEESE method has been shown to perform well with continuous outcomes, its estimate has been found to be less efficient when there is heterogeneity in treatment effects []. Rücker et al. [] developed the limit meta-analysis method to adjust for publication bias. The authors extended the random-effects meta-analysis model by introducing a publication bias parameter and suggested the maximum likelihood method to estimate the model parameters. They noted that the estimates could be obtained using a linear regression model on the so-called generalized radial plot, with the standardized effect size regressed on the inverse of the standard error.
Comparison studies of methods adjusting for publication bias can be found in the literature. Terrin et al. [] compared the performance of the Trim and Fill method with that of the Hedges selection model in the case of odds ratios. The authors concluded that the Hedges method performed better than the Trim and Fill method but noted the convergence problems of the iterative Hedges method. Schwarzer et al. [] compared the performance of the Trim and Fill method with that of the Copas method in the case of binary outcomes, using the arcsine difference as an effect size. The authors concluded that the Trim and Fill method produced larger standard errors and wider confidence intervals, i.e., it was more conservative and less efficient than the Copas method. Reed [] compared six meta-analysis estimators under publication bias using a simulation study. Publication bias was created based on two scenarios: bias towards statistically significant estimates and bias against studies with wrong-signed estimated effects. The simulation set-up used was suitable for both binary and continuous outcomes. The six estimators considered were the fixed-effects estimator, the weighted least squares estimator, the random-effects estimator, the PET estimator [,], and the PEESE estimator [,], with the arithmetic mean of estimated effects used as the benchmark estimator. Reed concluded that the fixed-effects and the weighted least squares estimators were as efficient as, and sometimes more efficient than, the PET and PEESE estimators, while the random-effects estimator was often more biased than the other estimators [].
McShane et al. [] compared the performance of a number of publication bias adjustment methods using the mean difference as an effect size and applying one-sided selection. The authors compared the p-curve and p-uniform approaches to the maximum likelihood estimation approaches of Hedges and of Iyengar and Greenhouse []. The p-curve and p-uniform approaches were found to perform well under the setting they were designed for: namely, when only studies with results that were statistically significant and directionally consistent were published and effect sizes were homogeneous across studies. However, when one or both of these assumptions were violated, the p-curve and p-uniform methods became less accurate than the Iyengar and Greenhouse method. Although McShane et al. [] compared the performance of a few methods in the case of continuous outcomes, the authors did not consider methods not based on selection models and studied only one type of effect size.
Carter et al. [] investigated the performance of eight methods correcting for publication bias. The authors carried out an extensive simulation study but considered only Cohen’s d as the effect size. Carter et al. concluded that no single method clearly outperformed the others. Hong and Reed [] compared 10 methods under different simulation settings, including different publication selection mechanisms. The authors concluded that the size of the meta-analysis sample and effect heterogeneity influenced estimator performance, and they showed how these characteristics could help meta-analysts choose the most appropriate estimator for their research circumstances. However, the authors did not concentrate explicitly on continuous outcomes with its different effect size measures. Neither Carter et al. nor Hong and Reed included the popular Copas method.
This brief review shows the lack of a comprehensive comparative study of methods adjusting for publication bias in the case of continuous outcomes. Therefore, our research question was as follows: Which publication bias adjustment method works best in the case of continuous outcomes? The objective was to investigate the performance of methods adjusting for publication bias in the case of continuous outcomes, for all corresponding measures of the effect size. Investigating all adjustment methods in the literature was obviously beyond the scope of this article. We selected five publication bias adjustment methods intended to represent different techniques and applied them to continuous outcomes using the mean difference, Cohen’s d, and Hedges’ g as treatment effect measures. One popular method based on selection models, the Copas method, was considered, as was a method based on the p-values, the p-uniform method. The performance of the PET-PEESE method, a regression-based method, was investigated. Finally, two funnel-plot-based methods, the Trim and Fill method and the limit meta-analysis method, were considered. In addition, the traditional random-effects meta-analysis method based on the DerSimonian–Laird estimator was included. The layout of this article is as follows: In Section 2, we describe the selected methods used to adjust for publication bias. Section 3 describes the case study, the simulation model, and the selection model used to simulate publication bias. The results and the discussion are found in Section 4 and Section 5, respectively.

2. Adjustment Methods

A meta-analysis of continuous outcomes usually consists of m studies, indexed by i (i = 1, 2, …, m). Each study has n_i participants, with n_ij subjects assigned to treatment arm j, j ∈ {0, 1}, where j = 1 and j = 0 indicate the treatment and the control arms, respectively. Define ŷ_i1 and S_i1 as the average and the standard deviation of the treatment arm, respectively, and ŷ_i0 and S_i0 as the average and the standard deviation of the control arm. Traditionally, the mean difference ŷ_i1 − ŷ_i0 is used as the treatment effect for study i. Another treatment effect measure is Cohen’s d [], given by (ŷ_i1 − ŷ_i0)/s_i, where s_i = √{[(n_i1 − 1)S_i1² + (n_i0 − 1)S_i0²]/(n_i1 + n_i0 − 2)} is the pooled standard deviation. However, Cohen’s d has been found to be biased [], and the corrected, approximately unbiased estimate is referred to as Hedges’ g: it multiplies Cohen’s d by the correction factor J = 1 − 3/[4(n_i1 + n_i0 − 2) − 1]. From here on, we denote the treatment effect for study i by ŷ_i, which can refer to the mean difference, Cohen’s d, or Hedges’ g.
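As a concrete illustration, the three effect size measures can be computed per study as follows (a minimal Python sketch; the function name and example values are ours):

```python
import math

def effect_sizes(mean1, sd1, n1, mean0, sd0, n0):
    """Mean difference, Cohen's d, and Hedges' g for one two-arm study."""
    md = mean1 - mean0
    # Pooled standard deviation across the two arms.
    s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n0 - 1) * sd0**2) / (n1 + n0 - 2))
    d = md / s_pooled
    # Hedges' small-sample correction factor J.
    J = 1.0 - 3.0 / (4.0 * (n1 + n0 - 2) - 1.0)
    return md, d, J * d
```

For equal arm sizes and equal arm standard deviations, d equals the mean difference divided by the common standard deviation, and g shrinks d slightly toward zero.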

2.1. Copas Selection Model

Copas and Shi [,] assume that all studies follow the traditional random-effects meta-analysis model given by
y_i = y + θ_i + ϵ_i,    (1)
with y the overall treatment effect, θ_i ∼ N(0, τ²) and ϵ_i ∼ N(0, V_i²). However, the authors assume that only a subset of the studies has been published. They introduce a latent selection variable z_i = A_1 + A_2/ŝe_i + δ_i, with δ_i ∼ N(0, 1) correlated with ϵ_i through ρ = corr(ϵ_i, δ_i), and ŷ_i is only reported if z_i > 0. The conditional probability density function is given by P(ŷ_i | z_i > 0, ŝe_i), and the log-likelihood function is constructed as []
L(y, τ²) = Σ_{i=1}^{m} log P(ŷ_i | z_i > 0, ŝe_i).    (2)
For fixed values of A_1 and A_2, L(y, τ²) is maximized, and y and τ² are estimated along with their confidence intervals. We used the R function “copas”, which is part of the R-package meta (R version 4.1.2), to carry out the Copas method [,].
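Under the Copas selection equation, the marginal probability that study i is published is Φ(A_1 + A_2/ŝe_i). A small Python sketch of this ingredient (the full conditional likelihood maximized by the copas function is more involved; the values of A_1 and A_2 in the usage below are illustrative):

```python
from math import erf, sqrt

def publish_prob(se, a1, a2):
    """Marginal publication probability P(z_i > 0) = Phi(a1 + a2/se_i)
    under the selection equation z_i = a1 + a2/se_i + delta_i."""
    z = a1 + a2 / se
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

With a2 > 0, smaller standard errors (larger studies) yield higher publication probabilities, which is the mechanism the sensitivity analysis varies through A_1 and A_2.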

2.2. The p-Uniform Method

The p-uniform method is a selection model approach, with the selection model assuming that the probability of publishing a statistically significant treatment effect and that of publishing a non-significant treatment effect are each constant but may differ from each other []. Let F_{X_i} denote the distribution function of the normal distribution N(y, ŝe_i²) for a candidate effect size y, and let ẏ_i be the critical value for rejecting the null hypothesis of no treatment effect at significance level α in study i. The authors define the conditional p-value []
p_i(y) = [1 − F_{X_i}(ŷ_i)] / [1 − F_{X_i}(ẏ_i)].    (3)
Using the gamma-distributed test statistic L(y) = −Σ_{i=1}^{m} log p_i(y), the authors introduce the bias-adjusted estimate ŷ of y as the solution of L(ŷ) = m, the expected value of a Gamma(m, 1) variable. The p-uniform lower and upper limits ŷ_L and ŷ_U of the point estimate are given by the solutions of L(ŷ_L) = Γ_{1−α/2}(m, 1) and L(ŷ_U) = Γ_{α/2}(m, 1), respectively, with Γ_q(m, 1) the qth quantile of the gamma distribution with shape m and scale 1 []. We used the R package ‘puniform’ (R version 4.1.2) to perform the p-uniform method [].
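The estimation step can be sketched as follows: for candidate values of y, compute the conditional p-values p_i(y) on the log scale and bisect for L(y) = m. This is an illustrative stdlib-only re-implementation of the idea, not the puniform package; it assumes every study is significant at level α, and the default search bracket is our assumption, suited to effects on a standardized scale.

```python
import math
from statistics import NormalDist

def _log_sf(x):
    """log of the standard normal survival function; stable far in the upper tail."""
    return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))

def p_uniform_estimate(y_hat, se, alpha=0.05, lo=-2.0, hi=2.0):
    """Solve L(y) = -sum_i log p_i(y) = m by bisection, where p_i(y) is the
    p-value of study i conditional on significance at level alpha.
    Assumes all studies are significant; (lo, hi) must bracket the root."""
    m = len(y_hat)
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical z-value

    def L(y):
        total = 0.0
        for yi, si in zip(y_hat, se):
            # log p_i(y) = log SF((y_hat_i - y)/se_i) - log SF(crit - y/se_i)
            total -= _log_sf((yi - y) / si) - _log_sf(crit - y / si)
        return total

    if not (L(lo) > m > L(hi)):  # L is decreasing in y
        raise ValueError("bracket (lo, hi) does not contain the root")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if L(mid) > m:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the published studies are conditioned on significance, the adjusted estimate typically lands below the naive average of the significant effects.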

2.3. The PET-PEESE Method

Stanley and Doucouliagos [,] start with the linear regression model used in the Egger test, given by
ŷ_i/ŝe_i = α_0 + α_1/ŝe_i + ε_i,    (4)
with α_0 and α_1 an intercept and a regression coefficient, respectively, and ε_i ∼ N(0, σ²). The authors suggest using α̂_1 in model (4) as a bias-adjusted estimate of the overall treatment effect if the null hypothesis α_1 = 0 is not rejected by a t-test in the Egger equation (PET). If this null hypothesis is rejected, the model y_i = γ_0 + γ_1 ŝe_i² + ϵ_i, with ϵ_i ∼ N(0, σ²), is fitted using 1/ŝe_i² as weights, and γ̂_0 is used as a bias-adjusted estimate of the overall treatment effect (PEESE). Note that the latter regression model is the one suggested by Moreno et al. (2011) []. We used the GLM procedure in SAS version 9.4 to carry out the PET-PEESE method.
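Both regressions are weighted least squares fits and can be sketched with numpy (an illustrative re-implementation, not the SAS GLM procedure; the significance-based switch between PET and PEESE is omitted for brevity):

```python
import numpy as np

def pet_peese(y, se):
    """Return the PET and PEESE effect estimates.
    PET:   WLS of y on (1, se)   with weights 1/se^2; the intercept is the effect.
    PEESE: WLS of y on (1, se^2) with weights 1/se^2; the intercept is the effect.
    Both fits are carried out in the standardized (radial) form y/se = ..."""
    y = np.asarray(y, float)
    se = np.asarray(se, float)
    # Columns after dividing by se: [1/se, 1]; the coef of 1/se is the PET effect.
    X_pet = np.column_stack([np.ones_like(se), se]) / se[:, None]
    pet = np.linalg.lstsq(X_pet, y / se, rcond=None)[0][0]
    # Columns after dividing by se: [1/se, se]; the coef of 1/se is the PEESE effect.
    X_peese = np.column_stack([np.ones_like(se), se**2]) / se[:, None]
    peese = np.linalg.lstsq(X_peese, y / se, rcond=None)[0][0]
    return pet, peese
```

In the full procedure, the PET estimate is t-tested against zero, and the PEESE estimate is reported only when that test rejects.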

2.4. Trim and Fill Method

The Trim and Fill method can be briefly described as follows [,]. The studies are first ranked based on their distance from the pooled treatment effect ŷ_RM estimated by the random-effects model (1), i.e., ranking the distances |ŷ_i − ŷ_RM|. Next, the number of unobserved studies is estimated using the estimator L_0 = [4T_m − m(m + 1)]/(2m − 1), where T_m is the Wilcoxon rank-sum statistic computed from the ranks of the studies with ŷ_i > ŷ_RM (here, it is assumed that ŷ_RM is positive, so that studies with effect sizes below ŷ_RM are the ones potentially missing). Then, the L_0 most extreme studies (i.e., the studies with positive effect sizes furthest away from zero) are trimmed off, and the pooled treatment effect ŷ_RM is re-estimated without these studies. All studies are then ranked again based on their distance to the new pooled estimate, and L_0 is re-computed. This procedure is repeated until L_0 no longer changes, yielding a final estimate ŷ_RM and a final estimate L_0 of the number of missing studies. Then, L_0 studies are imputed by mirroring the L_0 studies with the highest effect sizes around the final estimate ŷ_RM, each imputed study receiving the standard error ŝe_i of the study it mirrors. After imputation, a final pooled estimate ŷ_TF with its standard error is obtained by applying the random-effects model to all m + L_0 studies. We used the function “trimfill” in the R package “metafor” (R version 4.1.2), with the average treatment effect estimated using the random-effects model [].

2.5. The Limit Meta-Analysis Method

The limit meta-analysis can be described as follows []. The authors extend the random-effects meta-analysis model with a publication bias parameter:
ŷ_i = y + √(ŝe_i² + τ²) (ϵ_i + α),    (5)
with α a parameter representing publication bias. The authors propose using ŷ + τ̂α̂ as a publication-bias-adjusted estimator for the effect size y, with the parameters estimated by maximum likelihood. The authors note that the maximum likelihood estimates of y and α correspond to the slope and intercept of a linear regression on a generalized radial plot, in which the response variable and the independent variable are given by ŷ_i/√(ŝe_i² + τ̂²) and 1/√(ŝe_i² + τ̂²), respectively []. We used the R package “metasens” (R version 4.1.2) to carry out the limit meta-analysis method [].
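Given a value for τ², the radial-plot regression and the adjusted estimate can be sketched as follows (τ² is taken as given here, whereas the method estimates it by maximum likelihood; an illustrative sketch, not the metasens implementation):

```python
import numpy as np

def limit_meta_adjusted(y, se, tau2):
    """Fit the generalized radial plot y_i/s_i = alpha + beta * (1/s_i),
    with s_i = sqrt(se_i^2 + tau2); beta estimates the effect, alpha the
    bias, and the adjusted estimate is beta + tau * alpha."""
    y = np.asarray(y, float)
    se = np.asarray(se, float)
    s = np.sqrt(se**2 + tau2)
    X = np.column_stack([np.ones_like(s), 1.0 / s])  # columns: [alpha, beta]
    alpha, beta = np.linalg.lstsq(X, y / s, rcond=None)[0]
    return beta + np.sqrt(tau2) * alpha
```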

3. Case Study, Simulation, and Selection Models

3.1. Case Study

As a case study, we used the results of 25 studies investigating the effect of the experience of weight on the concept of importance. Experience of weight, exemplified by heaviness and lightness, is metaphorically associated with concepts of seriousness and importance []. Rabelo et al. [] reported 25 studies that investigated the effect of weight on the concept of importance. This case study was selected for the following reasons. All published studies had continuous outcomes for the expression of importance, and all applied Cohen’s d as an effect size. Moreover, there was evidence that these published studies were too good to be true, indicating the presence of publication bias []. In addition, the authors conducted their own research on this topic after applying the p-uniform method’s test for publication bias to these published studies [] and finding evidence of significant publication bias []. The dataset containing the results of the 25 studies can be found here [].

3.2. Simulation Model

The simulation model used is described in Section 3.2.1. It explores different scenarios, including different numbers of studies, different numbers of units per treatment arm per study, treatment effect homogeneity, and different levels of treatment effect heterogeneity. In addition, scenarios with equal and unequal study variances for the treatment and control groups are presented. A separate model for selecting studies based on specific criteria, used to simulate publication bias, is described in Section 3.2.2.

3.2.1. Aggregate Simulation Model

The simulation model applied can be described as follows [,]. In total, m studies were generated; for the ith study, the arm sizes n_ij, j = 0, 1, were independently drawn from a Poisson distribution with parameter δ. The estimated treatment effect for the jth treatment group in study i was given by ȳ_ij = η + θ_i·t_ij + ε_ij, with η a fixed treatment effect, θ_i ∼ N(0, τ²) a random effect for study i, t_ij a treatment indicator variable with a value of 1 when j = 1 and 0 otherwise, and ε_ij ∼ N(0, S_ij²/n_ij) an error term. For the equal study variance scenario, S_i0² = S_i1² = S_i², while for the unequal study variance scenario, the variances for the control and the treatment groups were created by S_i0² = 0.8 S_i² and S_i1² = S_i², respectively. The parameter values used to generate the data were m ∈ {10, 30}, δ ∈ {15, 30}, η = 5, τ² ∈ {0, 1, 5}, and S_i² was generated using S_i² ∼ N(100, 100). The mean difference, Cohen’s d, and Hedges’ g were then calculated as described in Section 2.
Note that some choices for τ 2 made here are unrealistic for the corresponding treatment effect. Examples are the choice of τ 2 = 1 for the mean difference and the choice of τ 2 = 5 for Cohen’s d and Hedges’ g. However, as Morris et al. [] point out, including unrealistically extreme data generation mechanisms can help show under which circumstances the considered methods fail.
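The data-generating process of Section 3.2.1 can be sketched as follows (a Python sketch; the guard on minimum arm size and the returned tuple layout are our choices, and we return the generating variances S_ij² rather than simulating sample standard deviations):

```python
import numpy as np

def simulate_meta(m=10, delta=15, eta=5.0, tau2=1.0, equal_var=True, rng=None):
    """Generate one meta-analysis of m two-arm studies following the
    aggregate simulation model: y_bar_ij = eta + theta_i * t_ij + eps_ij."""
    rng = np.random.default_rng(rng)
    studies = []
    for _ in range(m):
        n = rng.poisson(delta, size=2)            # arm sizes n_i0, n_i1
        n = np.maximum(n, 2)                      # guard against tiny arms (our addition)
        S2 = max(rng.normal(100.0, 10.0), 1e-6)   # S_i^2 ~ N(100, 100), i.e. sd = 10
        if equal_var:
            S2_arm = (S2, S2)                     # control, treatment
        else:
            S2_arm = (0.8 * S2, S2)
        theta = rng.normal(0.0, np.sqrt(tau2))    # study-level random effect
        means = [eta + theta * j + rng.normal(0.0, np.sqrt(S2_arm[j] / n[j]))
                 for j in (0, 1)]
        studies.append((means[0], means[1], int(n[0]), int(n[1]),
                        S2_arm[0], S2_arm[1]))
    return studies
```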

3.2.2. Selection Model Based on Significant Effect Size

Selection models based on the p-value of the study effect have been proposed in the literature [,,,]. Let α be the significance level and t_{q,ν_i} the qth quantile of a Student’s t distribution with ν_i degrees of freedom for study i. When the effect size of study i is significant (assuming selection favors positive effect sizes), i.e., when its t statistic satisfies d_i > t_{1−α/2,ν_i}, the study is always included. To add randomness to the selection of non-significant studies, a uniformly distributed random variable U(0, 1) and a parameter π_pub are used: if the drawn uniform value is smaller than or equal to 1 − π_pub, the non-significant study is included too; otherwise, it is excluded. The value of π_pub is chosen to obtain the desired percentage of published studies; here, α = 0.05 and π_pub was chosen to obtain a publishing rate of around 80%. For the equal study variance scenario, we applied ν_i = n_i0 + n_i1 − 2. For the unequal variance scenario, we applied the Satterthwaite approximation for the degrees of freedom, ν_i = (S_iC²/n_iC + S_iT²/n_iT)² / [(S_iC²/n_iC)²/(n_iC − 1) + (S_iT²/n_iT)²/(n_iT − 1)] [].
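The selection step and the Satterthwaite degrees of freedom can be sketched as follows (a Python sketch; the critical values are passed in precomputed, since the standard library has no Student t quantile function):

```python
import numpy as np

def select_studies(t_stats, crit, pi_pub, rng=None):
    """Significance-based selection: study i is always published when its
    t statistic exceeds crit[i]; otherwise it is published only when a
    U(0, 1) draw is at most 1 - pi_pub."""
    rng = np.random.default_rng(rng)
    t_stats = np.asarray(t_stats, float)
    crit = np.asarray(crit, float)
    u = rng.uniform(size=len(t_stats))
    return (t_stats > crit) | (u <= 1.0 - pi_pub)

def satterthwaite_df(s2_c, n_c, s2_t, n_t):
    """Satterthwaite approximation to the degrees of freedom, used in the
    unequal variance scenario."""
    num = (s2_c / n_c + s2_t / n_t) ** 2
    den = (s2_c / n_c) ** 2 / (n_c - 1) + (s2_t / n_t) ** 2 / (n_t - 1)
    return num / den
```

With equal variances and equal arm sizes, the Satterthwaite formula reduces to the pooled value 2(n − 1).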

4. Results

4.1. Results for the Case Study

Table 1 shows the treatment effect estimates for the case study described in Section 3.1 using the five adjustment methods explained in Section 2. Also presented is the treatment effect estimate using the traditional random-effects meta-analysis method based on the DerSimonian–Laird estimator (DL).
Table 1. Results of treatment effect estimates for the case study.
For the mean difference, the Copas and limit meta-analysis methods have the lowest point estimates, and their mean difference estimates do not significantly deviate from zero. The PET-PEESE method is the only method with a negative estimate, while the Trim and Fill method gives the largest positive estimate; both the PET-PEESE and the Trim and Fill mean difference estimates do not significantly deviate from zero. The p-uniform and the DL methods both give positive mean difference estimates that significantly deviate from zero.
In the case of Cohen’s d and Hedges’ g, all methods produce a treatment effect estimate between 0.5 and 0.6, with the exception of the PET-PEESE and the limit meta-analysis methods; the limit meta-analysis method produces a lower treatment effect estimate. The p-uniform, PET-PEESE, and limit meta-analysis methods are the only ones indicating that there is no treatment effect, since their estimates do not significantly deviate from zero.

4.2. Results for the Simulation Study

4.2.1. Average Number of Studies After Selection

Table 2 shows the average number of remaining studies per simulated meta-analysis after applying the selection model explained in Section 3.2.2 for all simulation settings. We tried to keep the average number of remaining studies at around 80% of the original number of studies for comparison purposes. This percentage is based on earlier research reporting that the proportion of published studies with positive outcomes was 82% in emergency or general medicine [].
Table 2. Average number of remaining studies after applying the selection model for the different simulation settings.
Table 3, Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9 show the average mean squared error (AMSE), the bias, and the coverage probabilities (COV) of all the methods considered. Note that the results of the equal and unequal variance scenarios were comparable for all methods except the PET-PEESE method. For this reason, the results of the unequal variance scenario are only shown for the PET-PEESE method; the results of this scenario for all other methods can be found in the Supplementary Materials.
Table 3. An overview of the AMSE, bias, and coverage probabilities for the Copas method under equal variance assumption for the different simulation settings.
Table 4. An overview of the AMSE, bias, and coverage probabilities for the p-uniform method under equal variance assumption for the different simulation settings.
Table 5. An overview of the AMSE, bias, and coverage probabilities for the PET-PEESE method under equal variance assumption for the different simulation settings.
Table 6. An overview of the AMSE, bias, and coverage probabilities for the PET-PEESE method under unequal variance assumption for the different simulation settings.
Table 7. An overview of the AMSE, bias, and coverage probabilities for the Trim and Fill method under equal variance assumption for the different simulation settings.
Table 8. An overview of the AMSE, bias, and coverage probabilities for the limit meta-analysis method under equal variance assumption for the different simulation settings.
Table 9. An overview of the AMSE, bias, and coverage probabilities for the random-effects model under equal variance assumption for the different simulation settings.

4.2.2. Copas Selection Model

For the mean difference, the AMSE and bias increased as the treatment effect heterogeneity increased when m = 10 for the equal study variance scenario; for the unequal study variance scenario, they increased with heterogeneity for both m = 10 and m = 30. Both the AMSE and bias were slightly higher for the unequal study variance scenario. Coverage probabilities were much lower than those of Cohen’s d and Hedges’ g. For Cohen’s d, for both m = 10 and m = 30, the AMSE and bias of the Copas method were higher for the unequal study variance scenario when n_i = 15 but not always when n_i = 30. The coverage probability for Cohen’s d was comparable for the equal and unequal study variance scenarios; however, it decreased when m = 30, for both homogeneous and heterogeneous treatment effects. Hedges’ g showed comparable values of AMSE, bias, and coverage probability for m = 10 and m = 30 and for n_i = 15 across different values of τ², for both the equal and unequal study variance scenarios; however, the AMSE, bias, and coverage probability clearly decreased when n_i = 30. Overall, Hedges’ g had a smaller AMSE and bias and higher coverage probabilities than Cohen’s d. The AMSE and bias of the method under the equal variance scenario were lower than under the unequal variance scenario. The statistical power of the Copas method was lower under the equal variance scenario.

4.2.3. The p-Uniform Method

For the mean difference, the AMSE and bias increased as the treatment effect heterogeneity increased for both the equal and unequal study variance scenarios; both were slightly higher for the unequal study variance scenario. For Cohen’s d, the AMSE and bias of the p-uniform method were comparable for the equal and unequal study variance scenarios for all simulation settings; the AMSE and bias decreased, while the coverage probability increased, as the treatment effect heterogeneity increased. Hedges’ g showed comparable values of AMSE, bias, and coverage probability when n_i = 15 for different values of τ² and for both the equal and unequal study variance scenarios. However, the AMSE, bias, and coverage probability mostly increased when n_i = 30, except when τ² = 5. The coverage probability mostly increased for Hedges’ g as the treatment effect heterogeneity increased. Cohen’s d and Hedges’ g had comparable bias, AMSE, and coverage probabilities. The AMSE and bias of the p-uniform method under the equal variance scenario were lower than under the unequal variance scenario, while the statistical power under the two scenarios was comparable.

4.2.4. PET-PEESE

For the mean difference, the AMSE increased as the treatment effect heterogeneity increased for the equal and unequal study variance scenarios. The bias decreased as the treatment effect heterogeneity increased. The AMSE and bias were clearly higher for the equal study variance scenario than under the unequal study variance scenario. Coverage probabilities were conservative compared to those of Cohen’s d and Hedges’ g.
For Cohen’s d, the AMSE of the PET-PEESE method was higher for the equal study variance scenario than the unequal variance scenario for all simulation combinations. The bias was higher for the equal study variance scenario when m = 10 and n i = 15 . The AMSE increased and the bias decreased when the between-study variance increased. The coverage probability was, in general, nominal or almost nominal for all simulation settings. These results held for both homogeneity and heterogeneity of the treatment effect. The coverage probability, however, was comparable for different values of τ 2 .
Hedges’ g showed higher values for the AMSE for the equal study variance scenario than the unequal study variance scenario. The bias of Hedges’ g decreased in general as the treatment effect heterogeneity increased. The statistical power was nominal or almost nominal for all simulation settings.
The AMSE for Cohen’s d and Hedges’ g were comparable. When n_i = 15, Hedges’ g showed a lower bias than Cohen’s d in the equal study variance scenario, but this difference reduced or disappeared when n_i = 30. For the unequal study variance scenario, no clear pattern could be observed. Coverage probabilities were comparable for Cohen’s d and Hedges’ g for all simulation scenarios.
The AMSE and bias of the PET-PEESE method under the equal variance scenario were higher than under the unequal variance scenario. The statistical power of the method was higher under the equal variance scenario.

4.2.5. Trim and Fill Method

For the mean difference, the AMSE and bias increased as the treatment effect heterogeneity increased when m = 10 for the equal study variance scenario; for the unequal study variance scenario, they increased with heterogeneity for both m = 10 and m = 30. Both the AMSE and bias were slightly higher for the unequal study variance scenario. Coverage probabilities were lower than those of Cohen’s d and Hedges’ g.
For Cohen’s d, the AMSE and bias of the Trim and Fill method were higher for the equal study variance scenario when n i = 15 but not when n i = 30 . The AMSE and bias increased when τ 2 = 5 for the equal and unequal study variance scenarios. The coverage probability for Cohen’s d was comparable for the equal and unequal study variance scenarios. Hedges’ g showed higher values of the AMSE and bias when m = 10 and n i = 15 for the equal study variance scenario, across different values of τ 2 , than when m = 30 . The AMSE and bias increased when τ 2 = 5 for the equal and unequal study variance scenarios. The AMSE and bias were generally higher for the equal study variance scenario than for the unequal study variance scenario. The coverage probabilities for Hedges’ g were comparable for the equal and unequal study variance scenarios. However, the coverage probability decreased when m = 30 and n i = 30 . The AMSE and bias of Cohen’s d were higher than those of Hedges’ g for the equal study variance scenario, except when m = 30 and n i = 30 and also when τ 2 = 5 . The coverage probabilities of Cohen’s d and Hedges’ g had comparable values except when τ 2 = 5 for the equal study variance scenario, where Hedges’ g had a higher coverage probability.
The AMSE and bias of the Trim and Fill method under the equal study variance scenario were lower than under the unequal variance scenario. The coverage probability of the method was lower under the equal variance scenario.

4.2.6. The Limit Meta-Analysis Method

For the mean difference, the AMSE and bias increased as the treatment effect heterogeneity increased for the equal and unequal study variance scenarios. Coverage probabilities were much lower than those of Cohen’s d and Hedges’ g.
For Cohen’s d, the AMSE and bias of the limit meta-analysis method were higher for the equal study variance scenario except when m = 30 and n i = 30 . The bias increased as the treatment effect heterogeneity increased, especially when τ 2 = 5 . The coverage probability for Cohen’s d was comparable for the equal and unequal study variance scenarios. For the equal variance scenario, the coverage probability was in general above the nominal level, with the exception of the case m = 10 and n i = 30 , where it became very low. In the unequal variance scenario, when m = 10 , the coverage probability was above the nominal level except for the cases n i = 15 and τ 2 = 5 , and n i = 30 and τ 2 = 1 . When m = 30 , the coverage probability was either equal to or slightly less than the nominal value. For the equal variance scenario, the AMSE and bias of Cohen’s d were higher than those of Hedges’ g when n i = 15 . The values of the AMSE and bias of the two measures were comparable for the unequal variance scenario. Hedges’ g showed higher values of the AMSE and bias when n i = 15 for both the equal and unequal study variance scenarios. For both measures, the AMSE and bias increased as the treatment effect heterogeneity increased for both the equal and unequal study variance scenarios. The coverage probability for Hedges’ g was comparable for the equal and unequal study variance scenarios, but it was slightly liberal when m = 10 . (A liberal coverage probability is lower than the nominal value, while a conservative one is higher.) Coverage probabilities were comparable for Cohen’s d and Hedges’ g for all simulation scenarios.
The AMSE and bias of the limit meta-analysis method under the equal variance scenario were lower than those under the unequal variance scenario. The statistical power of the method under the equal and unequal variance scenarios were comparable.

4.2.7. Random-Effects Model Using the DerSimonian–Laird Estimate

For the mean difference, the AMSE and bias increased when τ 2 = 5 for both the equal and unequal study variance scenarios. Coverage probabilities were slightly lower than those of Cohen’s d and Hedges’ g.
For Cohen’s d, the AMSE and bias of the random-effects model were higher for the equal study variance scenario when n i = 15 but not when n i = 30 . The AMSE and bias increased when τ 2 = 5 for the equal and unequal study variance scenarios. The coverage probability for Cohen’s d was comparable for the equal and unequal study variance scenarios. Hedges’ g showed lower values of the AMSE and bias when n i = 15 than when n i = 30 , across different values of τ 2 , for both the equal and unequal study variance scenarios. The AMSE and bias of Hedges’ g increased when τ 2 = 5 for both the equal and unequal study variance scenarios. The AMSE and bias were generally higher for the unequal study variance scenario than for the equal study variance scenario. The coverage probabilities for both Cohen’s d and Hedges’ g were generally low and decreased when n i = 30 . The coverage probabilities of Cohen’s d and Hedges’ g had comparable values except when τ 2 = 5 for the equal study variance scenario, where Hedges’ g had a higher coverage probability.
The AMSE and bias of the random-effects model using the DerSimonian–Laird estimate method under the equal variance scenario were lower than those under the unequal variance scenario. The statistical power of the method was lower under the equal variance scenario.

4.3. Comparison of Results of the Methods Correcting for Publication Bias

4.3.1. AMSE

For the mean difference, the lowest AMSE was seen for the limit meta-analysis method when m = 30 , and the highest AMSE was observed for the PET-PEESE method. As for Cohen’s d and Hedges’ g, the p-uniform method had the lowest AMSE, followed by the Copas and Trim and Fill methods; the highest AMSE values were found for the PET-PEESE and limit meta-analysis methods.

4.3.2. Bias

For the mean difference, the PET-PEESE method was almost always negatively biased but had the lowest bias when the sign is disregarded. The p-uniform method had the highest bias among all methods. The Copas, Trim and Fill, and random-effects model using the DerSimonian–Laird variance component had comparable biases. As for Cohen’s d and Hedges’ g, the limit meta-analysis method was generally positively biased, and its bias became smaller when m = 30 ; in addition, it had a lower absolute bias than the PET-PEESE method. The Copas method was positively biased but had a relatively small bias. The Trim and Fill and random-effects meta-analysis methods were positively biased and had comparable biases. The p-uniform method was also positively biased and had the largest bias among all the methods.

4.3.3. Coverage Probability

The PET-PEESE method usually had a coverage probability below the nominal value, except in the case of m = 10 and n i = 15 . As mentioned above, the limit meta-analysis had a liberal coverage probability when m = 10 but a slightly conservative coverage probability when m = 30 . The Copas and Trim and Fill methods had comparable coverage probabilities; both were clearly below the nominal level, and their coverage probabilities decreased when n i = 30 . The p-uniform and random-effects meta-analysis methods had the lowest coverage probabilities.

5. Discussion

The aim of this article was to compare the performance of five methods adjusting for publication bias: the Copas, p-uniform, PET-PEESE, Trim and Fill, and limit meta-analysis methods. In addition, the random-effects meta-analysis method based on the DerSimonian–Laird estimate was included. This study focused on continuous outcomes and investigated the adjustment methods using the mean difference, Cohen’s d, and Hedges’ g. The performance of the adjustment methods was compared using a case study and a simulation study. The simulation settings included scenarios with different numbers of studies, different numbers of individuals per study, treatment effect homogeneity, and different levels of treatment effect heterogeneity. In addition, the two scenarios of equal and unequal variances for the treatment and control groups were investigated. The performance of the adjustment methods was assessed by calculating the average mean squared error (AMSE), the bias, and the coverage probability.
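These three performance measures are straightforward to compute from simulation replicates. The sketch below (in Python, for concreteness; the analyses in this article were run in R, and the function name and toy data here are ours, not part of the study's code) shows how the bias, AMSE, and coverage probability of one method would be summarised:

```python
import numpy as np

def performance_metrics(estimates, ci_lower, ci_upper, true_effect):
    """Summarise one method over simulation replicates: bias, average
    mean squared error (AMSE), and coverage probability of the 95% CIs."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_effect            # mean estimation error
    amse = np.mean((estimates - true_effect) ** 2)   # mean squared error
    covered = (np.asarray(ci_lower) <= true_effect) & \
              (true_effect <= np.asarray(ci_upper))
    coverage = covered.mean()                        # fraction of CIs covering truth
    return bias, amse, coverage

# Toy check: unbiased normal estimates with valid 95% intervals should give
# bias ~ 0, AMSE ~ 0.01 (the sampling variance), and coverage ~ 0.95.
rng = np.random.default_rng(1)
est = rng.normal(0.5, 0.1, size=10_000)
bias, amse, cov = performance_metrics(est, est - 1.96 * 0.1,
                                      est + 1.96 * 0.1, true_effect=0.5)
```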
The weak performance of the random-effects meta-analysis model had been reported earlier in the literature []. The PET-PEESE method performed slightly worse as the treatment effect heterogeneity increased, a result also stated earlier in the literature []. The p-uniform method overestimated the treatment effect, and its performance deteriorated as the treatment effect heterogeneity increased, a result also presented earlier in the literature []. The limit meta-analysis was too conservative compared to the other methods investigated. This is probably the result of the equivalence of the method to a regression analysis and the fact that the limits are basically prediction limits. These prediction limits not only take the precision of the mean value into account but also the precision of the between-study variance component and that of the parameter representing publication bias.
In the literature, the Copas method has been compared to the Trim and Fill method using the odds ratio, and was preferred since it produced smaller standard errors []. Our analysis for Cohen’s d and Hedges’ g also showed that the Copas method had a smaller bias and higher coverage probabilities than the Trim and Fill method. However, the Copas method performed worse as treatment effect heterogeneity increased.
Regarding the mean difference, all methods performed worse in the unequal variance scenario, with the exception of the PET-PEESE and the p-uniform methods. This can be explained by the fact that PET-PEESE applies weighted regression, taking heteroscedasticity into account. No direct explanation could be found for the robust performance of the p-uniform method under heteroscedasticity.
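To make the weighted-regression point concrete, a minimal conditional PET-PEESE estimator can be sketched as two weighted least-squares fits. This is an illustrative Python sketch only: the one-sided cut-off z ≈ 1.282 for the PET step is one common convention (published implementations use t-based tests), and this study used existing R implementations rather than the code below.

```python
import numpy as np

def _wls_intercept(y, x, w):
    """Weighted least squares of y on (1, x); return the intercept
    and its model-based standard error."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w                          # same as X.T @ diag(w)
    XtWX = XtW @ X
    beta = np.linalg.solve(XtWX, XtW @ y)
    resid = y - X @ beta
    sigma2 = (w * resid ** 2).sum() / (len(y) - 2)
    se_b0 = np.sqrt(sigma2 * np.linalg.inv(XtWX)[0, 0])
    return beta[0], se_b0

def pet_peese(theta, se, z_crit=1.282):
    """Conditional PET-PEESE: PET regresses the effect sizes on their
    standard errors (weights 1/se^2); if the PET intercept test detects
    an effect, PEESE regresses on the variances instead. The intercept
    is the publication-bias-adjusted estimate."""
    theta, se = np.asarray(theta, float), np.asarray(se, float)
    w = 1.0 / se ** 2
    b0, se_b0 = _wls_intercept(theta, se, w)            # PET step
    if b0 / se_b0 > z_crit:                             # effect detected
        b0, se_b0 = _wls_intercept(theta, se ** 2, w)   # PEESE step
    return b0, se_b0

# No publication bias, true effect 0.5: the intercept should recover ~0.5.
rng = np.random.default_rng(7)
se = rng.uniform(0.05, 0.30, size=200)
theta = 0.5 + se * rng.normal(size=200)
est, est_se = pet_peese(theta, se)
```

Because the weights 1/se² downweight imprecise studies, the fit remains valid when study variances differ, which is the robustness under heteroscedasticity referred to above.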
In general, no clear difference in performance was observed between Cohen’s d and Hedges’ g for any of the methods under the different scenarios. This can be explained by the similarity of the two measures, as both assume equal variances and use a pooled variance. Both Cohen’s d and Hedges’ g performed better under the equal variance scenario for all methods, with the exception of the PET-PEESE method. The robustness of the PET-PEESE method in the case of heteroscedasticity can be explained by the fact that both the PET and PEESE estimates are based on weighted regression models, which take heteroscedasticity into account.
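The similarity of the two measures is direct: Hedges’ g is Cohen’s d multiplied by the small-sample correction factor J = 1 − 3/(4·df − 1) with df = n_t + n_c − 2, and both divide by the same pooled standard deviation. A minimal Python sketch (variable names and toy data are ours):

```python
import numpy as np

def cohens_d(x_t, x_c):
    """Standardised mean difference with the pooled SD; assumes the two
    arms share a common variance."""
    n_t, n_c = len(x_t), len(x_c)
    s2_pooled = ((n_t - 1) * np.var(x_t, ddof=1)
                 + (n_c - 1) * np.var(x_c, ddof=1)) / (n_t + n_c - 2)
    return (np.mean(x_t) - np.mean(x_c)) / np.sqrt(s2_pooled)

def hedges_g(x_t, x_c):
    """Cohen's d shrunk by the approximate small-sample correction
    J = 1 - 3 / (4 * df - 1)."""
    df = len(x_t) + len(x_c) - 2
    return (1.0 - 3.0 / (4.0 * df - 1.0)) * cohens_d(x_t, x_c)

x_t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # treatment arm
x_c = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # control arm
d = cohens_d(x_t, x_c)   # 1 / sqrt(2.5), about 0.632
g = hedges_g(x_t, x_c)   # (28/31) * d, about 0.571
```

Since J is always below one, g is slightly smaller in magnitude than d, and the difference fades as the sample sizes grow, consistent with the n i = 30 results reported above.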

6. Conclusions

Disregarding the direction of the estimation bias, the Copas and PET-PEESE methods were the least biased methods. However, the Copas method, like all likelihood-based methods, can have convergence problems, rendering it less applicable for meta-analysts. This disadvantage of the Copas method works in favour of the PET-PEESE method. In addition, the PET-PEESE method is robust in the case of unequal variances for the treatment and control arms. For these two reasons, meta-analysts are advised to apply the PET-PEESE method. Given the comparable performance of Cohen’s d and Hedges’ g, practitioners can use either measure for the meta-analysis.

7. Study Limitations

There are many selection models in the literature introduced to correct for publication bias [,,,]. Including all these methods was beyond the scope of this study. In addition, no Bayesian methods were included in this study, although these have also been applied to correct for publication bias [].

Highlights

Comparative studies of publication bias adjustment methods are available in the literature but are limited in scope. This study provides a broader comparison focusing on continuous outcomes. Impact: recommendations for meta-analysts on which method to apply, based on the least bias, among other criteria.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math13213487/s1, Table S1: An overview of the AMSE, bias and coverage probabilities for the Copas method under unequal variance assumption for the different simulation settings; Table S2: An overview of the AMSE, bias and coverage probabilities for the p-uniform method under unequal variance assumption for the different simulation settings; Table S3: An overview of the AMSE, bias and coverage probabilities for the Trim & Fill method under unequal variance assumption for the different simulation settings; Table S4: An overview of the AMSE, bias and coverage probabilities for the limit meta-analysis method under unequal variance assumption for the different simulation settings; Table S5: An overview of the AMSE, bias and coverage probabilities for the random effects model under unequal variance assumption for the different simulation settings.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Hedges, L.V. Modeling publication selection effects in meta-analysis. Stat. Sci. 1992, 7, 246–255. [Google Scholar] [CrossRef]
  2. Duval, S.; Tweedie, R. A nonparametric “trim and fill” method of accounting for publication bias in meta-analysis. J. Am. Stat. Assoc. 2000, 95, 89–98. [Google Scholar] [PubMed]
  3. Duval, S.; Tweedie, R. Trim and Fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 2000, 56, 455–463. [Google Scholar] [CrossRef]
  4. Terrin, N.; Schmid, C.H.; Lau, J.; Olkin, I. Adjusting for publication bias in the presence of heterogeneity. Stat. Med. 2003, 22, 2113–2126. [Google Scholar] [CrossRef]
  5. Peters, J.L.; Sutton, A.J.; Jones, D.R.; Abrams, K.R.; Rushton, L. Performance of the trim and fill method in the presence of publication bias and between-study heterogeneity. Stat. Med. 2007, 26, 4544–4562. [Google Scholar] [CrossRef] [PubMed]
  6. Hedges, L.V. Distribution theory for Glass’s estimator of effect size and related estimators. J. Educ. Stat. 1981, 6, 107–128. [Google Scholar] [CrossRef]
  7. Borenstein, M. Effect sizes for continuous data. Effect sizes for dichotomous data. In The Handbook of Research Synthesis and Meta-Analysis, 2nd ed.; Cooper, H., Hedges, L.V., Valentine, J.C., Eds.; Russell Sage Foundation: New York, NY, USA, 2009; pp. 221–235. [Google Scholar]
  8. Iyengar, S.; Greenhouse, J.B. Selection models and the file drawer problem. Stat. Sci. 1988, 3, 109–117. [Google Scholar] [CrossRef]
  9. Vevea, J.L.; Hedges, L.V. A general linear model for estimating effect size in the presence of publication bias. Psychometrika 1995, 60, 419–435. [Google Scholar] [CrossRef]
  10. Vevea, J.L.; Woods, C.M. Publication Bias in Research Synthesis: Sensitivity Analysis Using A Priori Weight Functions. Psychol. Methods 2005, 10, 428–443. [Google Scholar] [CrossRef]
  11. Dear, K.B.; Begg, C.B. An approach for assessing publication bias prior to performing a meta-analysis. Stat. Sci. 1992, 7, 237–245. [Google Scholar] [CrossRef]
  12. Copas, J.; Shi, J.Q. Meta-analysis, funnel plots and sensitivity analysis. Biostatistics 2000, 1, 247–262. [Google Scholar] [CrossRef] [PubMed]
  13. Copas, J.; Shi, J.Q. A sensitivity analysis for publication bias in systematic reviews. Stat. Methods Med. Res. 2000, 10, 251–265. [Google Scholar] [CrossRef] [PubMed]
  14. Simonsohn, U.; Nelson, L.D.; Simmons, J.P. p-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results. Perspect. Psychol. Sci. 2014, 9, 666–681. [Google Scholar] [CrossRef] [PubMed]
  15. Simonsohn, U.; Nelson, L.D.; Simmons, J.P. P-curve: A Key to The File Drawer. J. Exp. Psychol. Gen. 2014, 143, 534–547. [Google Scholar] [CrossRef]
  16. van Assen, M.A.; van Aert, R.; Wicherts, J.M. Meta-analysis using effect size distributions of only statistically significant studies. Psychol. Methods 2015, 20, 293. [Google Scholar] [CrossRef]
  17. Hedges, L.V.; Vevea, J.L. Selection method approaches. In Publication Bias in Meta-Analysis: Prevention, Assessment and Adjustments; Rothstein, H.R., Sutton, A.J., Borenstein, M., Eds.; John Wiley & Sons: Chichester, UK, 2005; pp. 145–174. [Google Scholar]
  18. Jin, Z.C.; Zhou, X.-H.; He, J. Statistical methods for dealing with publication bias in meta-analysis. Stat. Med. 2015, 34, 343–360. [Google Scholar] [CrossRef]
  19. McShane, B.B.; Böckenholt, U.; Hansen, K.T. Adjusting for Publication Bias in Meta-Analysis: An Evaluation of Selection Methods and Some Cautionary Notes. Perspect. Psychol. Sci. 2016, 11, 730–749. [Google Scholar] [CrossRef]
  20. Sutton, A.J.; Song, F.; Gilbody, S.M.; Abrams, K.R. Modelling publication bias in meta-analysis: A review. Stat. Methods Med. Res. 2000, 9, 421–445. [Google Scholar] [CrossRef]
  21. Moreno, S.G.; Sutton, A.J.; Ades, A.E.; Cooper, N.J.; Abrams, K.R. Adjusting for publication biases across similar interventions performed well when compared with gold standard data. J. Clin. Epidemiol. 2011, 64, 1230–1241. [Google Scholar] [CrossRef]
  22. Stanley, T.D. Meta-Regression Methods for Detecting and Estimating Empirical Effects in the Presence of Publication Selection. Oxf. Bull. Econ. Stat. 2008, 70, 103–127. [Google Scholar] [CrossRef]
  23. Stanley, T.D.; Doucouliagos, H. Meta-regression approximations to reduce publication selection bias. Res. Synth. Methods 2013, 5, 60–78. [Google Scholar] [CrossRef]
  24. Alinaghi, N.; Reed, W.R. Meta-analysis and publication bias: How well does the FAT-PET-PEESE procedure work? Res. Synth. Methods 2018, 9, 285–311. [Google Scholar] [CrossRef] [PubMed]
  25. Rücker, G.; Schwarzer, G.; Carpenter, J.R.; Binder, H.; Schumacher, M. Treatment-effect estimates adjusted for small-study effects via a limit meta-analysis. Biostatistics 2011, 12, 122–142. [Google Scholar] [CrossRef] [PubMed]
  26. Schwarzer, G.; Carpenter, J.; Rücker, G. Empirical evaluation suggests Copas selection model preferable to trim-and-fill method for selection bias in meta-analysis. J. Clin. Epidemiol. 2010, 63, 282–288. [Google Scholar] [CrossRef]
  27. Reed, W.R. A Monte Carlo Analysis of Alternative Meta-Analysis Estimators in the Presence of Publication Bias. Economics 2015, 9, 20150030. [Google Scholar] [CrossRef]
  28. Carter, E.C.; Schönbrodt, F.D.; Gervais, W.M.; Hilgard, J. Correcting for bias in psychology: A comparison of meta-analytic methods. Adv. Methods Pract. Psychol. Sci. 2019, 2, 115–144. [Google Scholar] [CrossRef]
  29. Hong, S.; Reed, W.R. Using Monte Carlo experiments to select meta-analytic estimators. Res. Synth. Methods 2021, 12, 192–215. [Google Scholar] [CrossRef]
  30. Cohen, J. A power primer. Psychol. Bull. 1992, 112, 155–159. [Google Scholar] [CrossRef]
  31. van Aert, R.C.M. Meta-Analysis Methods Correcting for Publication Bias. 2021. R Package Version 0.2.8. Available online: https://github.com/RobbievanAert/puniform/issues (accessed on 1 September 2023).
  32. Viechtbauer, W. Conducting Meta-Analyses in R with the metafor Package. J. Stat. Softw. 2010, 36, 1–48. [Google Scholar] [CrossRef]
  33. Schwarzer, G.; Carpenter, J.R.; Rücker, G. metasens: Statistical Methods for Sensitivity Analysis in Meta-Analysis. 2025. R Package Version 1.5-3. Available online: https://github.com/guido-s/metasens (accessed on 10 September 2023).
  34. Ackerman, J.M.; Nocera, C.C.; Bargh, J.A. Incidental haptic sensations influence social judgments and decisions. Science 2010, 328, 1712–1715. [Google Scholar] [CrossRef]
  35. Rabelo, A.L.A.; Keller, V.N.; Pilati, R.; Wicherts, J.M. No Effect of Weight on Judgments of Importance in the Moral Domain and Evidence of Publication Bias from a Meta-Analysis. PLoS ONE 2015, 10, e0134808. [Google Scholar] [CrossRef]
  36. Francis, G.; Tanzman, J.; Matthews, W.J. Excess success for psychology articles in the journal Science. PLoS ONE 2014, 9, e114255. [Google Scholar] [CrossRef] [PubMed]
  37. Ning, J.; Chen, Y.; Piao, J. Maximum likelihood estimation and EM algorithm of copas-like selection model for publication bias correction. Biostatistics 2017, 18, 495–504. [Google Scholar] [CrossRef] [PubMed]
  38. Friedrich, J.O.; Adhikari, N.K.J.; Beyene, J. The ratio of means method as an alternative to mean differences for analyzing continuous outcome variables in meta-analysis: A simulation study. BMC Med. Res. Methodol. 2008, 8, 32. [Google Scholar] [CrossRef] [PubMed]
  39. Morris, T.P.; White, R.I.; Crowther, M.J. Using simulation studies to evaluate statistical methods. Stat. Med. 2019, 38, 2074–2102. [Google Scholar] [CrossRef]
  40. Satterthwaite, F.E. An approximate distribution of estimates of variance components. Biom. Bull. 1946, 2, 110–114. [Google Scholar] [CrossRef]
  41. Moscati, R.; Jehle, D.; Ellis, D.; Fiorello, A.; Landi, M. Positive-outcome bias: Comparison of emergency medicine and general medicine literatures. Acad. Emerg. Med. 1994, 1, 267–271. [Google Scholar] [CrossRef]
  42. Bartoš, F.; Maier, M.; Wagenmakers, E.J.; Doucouliagos, H.; Stanley, T.D. Robust Bayesian meta-analysis: Model-averaging across complementary publication bias adjustment methods. Res. Synth. Methods 2023, 14, 99–116. [Google Scholar] [CrossRef]