A Meta-Analysis for Simultaneously Estimating Individual Means with Shrinkage, Isotonic Regression and Pretests

Meta-analyses combine the estimators of individual means to estimate the common mean of a population. However, the common mean can be undefined or uninformative in scenarios where the individual means are "ordered" or "sparse". In such scenarios, the assessment of individual means, rather than the common mean, becomes relevant. In this article, we propose simultaneous estimation of individual means using James–Stein shrinkage estimators, which improve upon the individual studies' estimators. We also propose isotonic regression estimators for ordered means, and pretest estimators for sparse means. We provide theoretical explanations and simulation results demonstrating the superiority of the proposed estimators over the individual studies' estimators. The proposed methods are illustrated by two datasets: one comes from gastric cancer patients and the other from COVID-19 patients.


Introduction
Meta-analysis is a statistical method used to summarize results from published studies [1]. While it has been applied to all areas of science, meta-analysis has been a particularly important tool in educational studies [2,3] and medical studies [4,5]. In particular, meta-analysis has played an important role in studies on the impact of COVID-19 [6][7][8].
The most basic method in meta-analyses uses a fixed-effect model, where all of the studies have a common mean [5,9]. This model has long been recognized in mathematical statistics and stratified sampling designs, where the common mean estimator has rigorously been investigated (pp. 55-103 of [10]; [11][12][13]).
Another basic model is the random-effects model [14], where the studies have different means. This model imposes normally distributed random effects to account for between-study deviations from the common mean [15]. In either model, the goal of meta-analyses is to estimate the common mean by combining the estimators of individual means.
However, in some scenarios, the common mean is undefined or uninformative. In these scenarios, the focus of meta-analyses can be on individual studies' means. While individual means are estimated by published studies, they can be improved by statistical techniques. For instance, Raudenbush and Bryk [16] pointed out the possibility of improving individual studies' estimators in meta-analyses by using empirical Bayes estimators under normal random effects. Schmid [17] also considered similar Bayes estimators that improve upon individual studies' estimators. Röver and Friede [18,19] tried to improve an individual estimator with the aid of another individual estimator. All these Bayes estimators are defined by posterior means under normal priors (random effects). Evidently, the posterior means are related to, but different from, the individual studies' estimates. However, there are some cases where the normal priors (random effects) do not apply (Section 2.2). This motivates us to develop non-Bayesian estimation.
In this article, we consider some improved estimation methods of individual means without normal random-effects. First, we will explain how to employ shrinkage estimators to obtain improved estimators of individual means. Unlike the existing Bayes estimators, our frequentist shrinkage estimators do not involve any prior distribution. Next, we will provide theoretical and numerical results to show the superiority of the proposed estimators over the individual studies' estimators. Finally, we will analyze two datasets to illustrate the proposed methods.
This article is organized as follows. Section 2 provides a background including a review of meta-analyses. Section 3 proposes the estimators of individual means and studies their theoretical properties. Section 4 conducts simulation studies, and Section 5 analyzes real datasets. Section 6 concludes the article.

Meta-Analysis
This subsection reviews the basic models for meta-analysis under the normal distribution, including the fixed-effect model and random-effects model [1,5].
First, we introduce some mathematical notations and basic assumptions for meta-analyses. Let $G > 0$ be the number of studies ($G$ stands for Groups) in a meta-analysis. For $i = 1, 2, \ldots, G$, the $i$-th study provides an estimate $y_i$ for an unknown mean $\mu_i$ with a known variance $\sigma_i^2 > 0$. This implies that each study provides $\sigma_i$ as the standard error (SE) of the estimate $y_i$. Hence, in meta-analyses, the variance $\sigma_i^2$ is known. We note that there could be a different setting, where the variance is unknown [20,21].
Next, we further assume the normal model $y_i \sim N(\mu_i, \sigma_i^2)$. This model's special case, the fixed-effect model, imposes the common mean assumption; that is, $\mu_1 = \cdots = \mu_G \equiv \mu$. The random-effects model imposes normally distributed means given by $\mu_i \sim N(\mu, \tau^2)$ for $i = 1, 2, \ldots, G$. In either case, the main goal of meta-analyses is to estimate $\mu$ by optimally combining the $y_i$s [1,5,9].
In a meta-analysis, observed data consist of $(y_i, \sigma_i^2)$, $i = 1, 2, \ldots, G$. An example is the two-sample t-test for a continuous outcome, where $y_i$ is the (standardized) mean difference estimator from the $i$-th study, and $\sigma_i^2$ is the pooled variance estimator [5]. Another example is the $2 \times 2$ table analysis for association between an event and a risk factor, where $y_i$ is the logarithm of the risk ratio (or odds ratio), and $\sigma_i^2$ is its estimated variance [5]. Another example is a survival analysis for censored data, where $y_i$ can be the logarithm of an estimate of relative risk from the Cox model. Without loss of generality, we assume that $\mu_i = 0$ corresponds to the null value.
Finally, we describe the main idea of meta-analyses. The idea is to optimally combine the results from the $G$ studies, based on either the fixed-effect model or random-effects model. The first model leads to the fixed-effect meta-analysis under the assumption $\mu_1 = \cdots = \mu_G \equiv \mu$. The second model leads to the random-effects meta-analysis under the assumption $\mu_i \sim N(\mu, \tau^2)$. In either case, the target is to estimate the common (overall) mean $\mu$ by the weighted average $\hat{\mu} = \sum_{i=1}^{G} \hat{w}_i y_i / \sum_{i=1}^{G} \hat{w}_i$ with $\hat{w}_i = 1/(\sigma_i^2 + \hat{\tau}^2)$, where the value of $\hat{\tau}^2$ is an estimator from the data ($\hat{\tau}^2 \equiv 0$ under the fixed-effect model) [5,14].
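For concreteness, the inverse-variance combination above can be sketched in a few lines of code (a minimal illustration with made-up numbers; the helper name `fixed_effect_mean` is our own, and the fixed-effect case $\hat{\tau}^2 = 0$ is assumed):

```python
import numpy as np

def fixed_effect_mean(y, var):
    """Inverse-variance weighted estimate of the common mean (tau^2 = 0)."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(var, dtype=float)  # weights w_i = 1 / sigma_i^2
    mu_hat = np.sum(w * y) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))           # SE of the combined estimate
    return mu_hat, se

# Three hypothetical studies: estimates y_i with known variances sigma_i^2
y = [0.30, 0.50, 0.10]
var = [0.04, 0.09, 0.01]
mu_hat, se = fixed_effect_mean(y, var)
```

The most precise study (smallest variance) dominates the weighted average, which is exactly the behavior the inverse-variance weights are designed to produce.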

Does the Estimation of the Common Mean Make Sense?
Given the data $(y_i, \sigma_i^2)$, $i = 1, 2, \ldots, G$, one usually conducts the fixed-effect meta-analysis and/or random-effects meta-analysis. The Q test for homogeneity can help select between the fixed-effect and random-effects models [1,5]. If the fixed-effect model is rejected (i.e., the random-effects model is selected), one might explore a systematic source of heterogeneity by meta-regression. One might also test the goodness-of-fit of the normal random-effects model [22]. In any case, statistical inference for the common mean depends on some statistical models.
However, the above statistical models do not always fit the data at hand. If the studies have a monotonic effect, e.g., $\mu_1 \le \cdots \le \mu_G$, the model is neither fixed nor random. Thus, there is no general way to estimate the common parameter $\mu$ unless some specific models are imposed. If a covariate is available to explain some structure among the $\mu_i$s, a meta-regression may be used (Chapter 20 of [1]). However, we do not wish to assume the presence of covariates to conduct meta-regression in this article.
While the common mean estimate can be an informative summary of individual estimates, this single value rarely describes the whole results. Thus, meta-analyses typically display the individual estimates $y_1, \ldots, y_G$, providing an opportunity for learning the data. For instance, forest plots show the individual estimates $y_1, \ldots, y_G$ along with their 95% confidence intervals (CIs) $(y_i - 1.96\sigma_i,\; y_i + 1.96\sigma_i)$, $i = 1, \ldots, G$. Funnel plots show $y_1, \ldots, y_G$ against $\sigma_1, \ldots, \sigma_G$; see [1,5,23,24] for these plots. In some cases, one is interested in the $\mu_i$ that has the largest size [16]. Therefore, reporting individual estimates is an important part of meta-analysis.

Individual Estimates Could Be Improved
Despite the potential advantages of improving the individual estimators by shrinkage methods [11,12,25,26], the development of such estimators has rarely been discussed in the literature on meta-analyses. In other words, the goal of meta-analyses has been the estimation of the common mean, disregarding the individual means. This causes a gap between meta-analyses and shrinkage estimation methods. Therefore, there is room for improving the standard individual estimators $y_1, \ldots, y_G$ for $\mu_1, \ldots, \mu_G$ in some scenarios. We raise scenarios where the common mean is undefined or uninformative. The first example is the case of a monotonic effect, $\mu_1 \le \cdots \le \mu_G$, where the common mean is not well-defined. In this case, an alternative is to perform meta-regression with the aid of observable covariates. We, however, do not assume the availability of covariates in this article.
The second example is the case of sparse normal means [27,28], where most of the $\mu_i$s are equal to zero, e.g., $(\mu_1, \ldots, \mu_{10}) = (5, 0, 0, 0, 0, 0, 0, 0, 0, 5)$. This is relevant to meta-analyses that fail to reject many null hypotheses. In this case, the interest may be in the estimation of the nonzero means rather than the common mean.

Proposed Estimators
We propose a variety of shrinkage estimators for simultaneously estimating the individual means $\mu_1, \ldots, \mu_G$, and discuss their properties. In particular, we theoretically prove that the proposed estimators have better precision than the individual studies' estimates $y_1, \ldots, y_G$ in terms of a mean squared error criterion under some conditions. The problem setting is as follows: Let $y_i \sim N(\mu_i, \sigma_i^2)$ be an estimator for an unknown mean $\mu_i$ under a known variance $\sigma_i^2 > 0$ for $i = 1, 2, \ldots, G$. As mentioned in Section 2, the variance $\sigma_i^2$ is known from the $i$-th study. Observed data consist of $(y_i, \sigma_i^2)$, $i = 1, 2, \ldots, G$. We regard $\mathbf{y} \equiv (y_1, \ldots, y_G)$ as a simultaneous estimator of $\boldsymbol{\mu} \equiv (\mu_1, \ldots, \mu_G)$. That is, we regard $\mathbf{y}$ as an observable random vector to estimate the unknown vector $\boldsymbol{\mu}$.
We then call $\hat{\boldsymbol{\mu}} \equiv (\hat{\mu}_1, \ldots, \hat{\mu}_G)$ an improved estimator of $\mathbf{y}$ if

(1) $\mathrm{WMSE}(\hat{\boldsymbol{\mu}}, \boldsymbol{\mu}) < \mathrm{WMSE}(\mathbf{y}, \boldsymbol{\mu})$ for some $(\mu_1, \ldots, \mu_G)$;
(2) $\mathrm{WMSE}(\hat{\boldsymbol{\mu}}, \boldsymbol{\mu}) \le \mathrm{WMSE}(\mathbf{y}, \boldsymbol{\mu})$ for $\forall (\mu_1, \ldots, \mu_G)$,

where $\mathrm{WMSE}(\hat{\boldsymbol{\mu}}, \boldsymbol{\mu}) \equiv \sum_{i=1}^{G} E[(\hat{\mu}_i - \mu_i)^2]/\sigma_i^2$. We call $\mathbf{y}$ the standard unbiased estimator because of the unbiasedness $E[y_i] = \mu_i$. It is a benchmark estimator for comparing different estimators. Recall that $\mathbf{y}$ is simply the collection of the individual studies' estimates. If "$\forall (\mu_1, \ldots, \mu_G)$" in the criterion (2) holds only for a restricted parameter space, $\hat{\boldsymbol{\mu}}$ is only locally improved. The criteria (1) and (2) employ the weighted mean squared errors (WMSEs). The inverse-variance weights guarantee the equal contributions of all studies to the estimation error. The weights also make it convenient to apply the classical decision theory. The special case of $\sigma_1 = \cdots = \sigma_G = 1$ gives the total MSE (TMSE), defined as $\mathrm{TMSE}(\hat{\boldsymbol{\mu}}, \boldsymbol{\mu}) \equiv \sum_{i=1}^{G} E[(\hat{\mu}_i - \mu_i)^2]$ [11,25,29]. We use the WMSE for theoretical convenience, while the TMSE is also a relevant criterion. We will use both the WMSE and TMSE in our numerical assessments.
Indeed, there are infinitely many estimators that improve upon $\mathbf{y}$, including very complex and unrealistic ones [25]. In addition, it is quite easy to find an estimator that locally improves upon $\mathbf{y}$, such as a constant estimator. While the problem of deriving/assessing improved estimators has been intensively discussed in statistical decision theory, it has rarely been appreciated in meta-analytical settings. The goal of this article is to facilitate the applications of shrinkage estimators in the context of meta-analyses.
In the subsequent sections, we will introduce three estimators that help reduce the WMSE and TMSE by shrinking $\mathbf{y}$ toward a restricted space of $(\mu_1, \ldots, \mu_G)$. Section 3.1 will discuss the shrinkage toward the zero vector $\{(\mu_1, \ldots, \mu_G): \mu_1 = \cdots = \mu_G = 0\}$, the most traditional shrinkage scheme. Section 3.2 will consider the shrinkage toward the space under ordering constraints $\{(\mu_1, \ldots, \mu_G): \mu_1 \le \cdots \le \mu_G\}$. Section 3.3 will explore the shrinkage toward the sparse space $\{(\mu_1, \ldots, \mu_G): \sum_{i=1}^{G} I(\mu_i \neq 0) \ll G\}$, where $I(\cdot)$ denotes the indicator function.

Shrinkage Estimation for Means
To estimate $(\mu_1, \ldots, \mu_G)$, we propose the James–Stein (JS) estimator of the form
$$\hat{\mu}_i^{JS} = \left(1 - \frac{G-2}{\sum_{j=1}^{G} y_j^2/\sigma_j^2}\right) y_i, \quad i = 1, 2, \ldots, G.$$
This estimator is a modification of the original JS estimator [29] that was derived under the unit variances ($\sigma_i = 1$ for $\forall i$). See Appendix A.1 for the details. The JS estimator reduces variance by shrinking the vector $\mathbf{y}$ toward $\mathbf{0}$ while it produces bias. The degree of shrinkage is determined by the factor $(G-2)/\sum_{j=1}^{G} y_j^2/\sigma_j^2$, which usually ranges from 0 (0% shrinkage) to 1 (100% shrinkage), and occasionally becomes greater than 1 (overshrinkage).
It can be shown (Appendix A.1) that $\hat{\boldsymbol{\mu}}^{JS}$ has the following WMSE:
$$\mathrm{WMSE}(\hat{\boldsymbol{\mu}}^{JS}, \boldsymbol{\mu}) = G - (G-2)^2\, E[\{\chi_G^2(\lambda)\}^{-1}],$$
where $\chi_G^2(\lambda)$ is a random variable having a noncentral chi-squared distribution with the noncentrality parameter $\lambda = \sum_{i=1}^{G} \mu_i^2/\sigma_i^2$ and $G$ degrees of freedom. Thus, $\hat{\boldsymbol{\mu}}^{JS}$ has a smaller WMSE than $\mathbf{y}$ when $G \ge 3$. Indeed, the WMSE is minimized at $\boldsymbol{\mu} = \mathbf{0}$, at which the WMSE is equal to 2 by the inverse moment $E[\{\chi_G^2(0)\}^{-1}] = 1/(G-2)$ of the central chi-squared distribution with $\lambda = 0$. Thus, the JS estimator gains the greatest advantage if all the individual means are zero. This gain is appealing for meta-analyses of small individual effects (true means close to zero). Even if $\boldsymbol{\mu} \neq \mathbf{0}$, the JS estimator has a smaller WMSE than $\mathbf{y}$. The reduction of the WMSE diminishes as $\boldsymbol{\mu}$ departs from zero.
One might ask where the special formula of the JS estimator comes from. The JS estimator can be derived as an empirical Bayes estimator under the prior $\mu_i \sim N(0, \tau^2\sigma_i^2)$: the Bayes estimator is $\hat{\mu}_i^{Bayes} = (1 - B) y_i$ with the shrinkage factor $B = 1/(1 + \tau^2)$; the Bayes estimator minimizes the Bayes risk, and hence, is the optimal estimator under the prior. The JS estimator arises by replacing $B$ with its unbiased estimator $(G-2)/\sum_{j=1}^{G} y_j^2/\sigma_j^2$.
A minor modification to the JS estimator can reduce the WMSE further. The modification is made in order to avoid the effect of an overshrinkage, $(G-2)/\sum_{j=1}^{G} y_j^2/\sigma_j^2 > 1$, by which all the signs of $\hat{\mu}_i^{JS}$ are reversed. The overshrinkage phenomenon occurs with a small probability, and therefore, the modification is minor in the majority of cases. The modified estimator is the positive-part JS estimator
$$\hat{\mu}_i^{JS+} = \left(1 - \frac{G-2}{\sum_{j=1}^{G} y_j^2/\sigma_j^2}\right)^{+} y_i, \quad i = 1, 2, \ldots, G,$$
where $(x)^{+} \equiv \max(0, x)$. Consequently, $\hat{\boldsymbol{\mu}}^{JS+}$ has a smaller WMSE than $\hat{\boldsymbol{\mu}}^{JS}$ (p. 356, Theorem 5.4 of [25]). In summary, this subsection proposes two estimators ($\hat{\boldsymbol{\mu}}^{JS}$ and $\hat{\boldsymbol{\mu}}^{JS+}$) that improve upon the standard unbiased estimator $\mathbf{y}$.
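As a concrete sketch, the two shrinkage estimators can be computed as follows (our own minimal implementation under the notation of this subsection; `js_estimators` is a hypothetical helper name, not code from the article):

```python
import numpy as np

def js_estimators(y, var):
    """James-Stein (JS) and positive-part JS estimators that shrink the
    vector y toward zero, given known study variances var = sigma_i^2."""
    y = np.asarray(y, dtype=float)
    var = np.asarray(var, dtype=float)
    G = len(y)
    # shrinkage factor (G - 2) / sum_j y_j^2 / sigma_j^2
    factor = (G - 2) / np.sum(y**2 / var)
    js = (1.0 - factor) * y               # signs reverse if factor > 1
    js_plus = max(0.0, 1.0 - factor) * y  # positive part avoids sign reversal
    return js, js_plus

# Weak effects close to zero: the factor exceeds 1 (overshrinkage), so the
# plain JS estimate flips every sign while the positive part returns zeros.
js, js_plus = js_estimators([0.10, -0.20, 0.15, 0.05], [0.04] * 4)
```

With stronger effects (e.g., multiplying the estimates by ten), the factor drops far below 1 and both estimators nearly coincide with the original data, illustrating how the WMSE gain diminishes away from zero.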

Estimation under Ordered Means
Our next proposal is a shrinkage estimator under ordered means. We consider the case where the ordering constraints $\mu_1 \le \cdots \le \mu_G$ are known by the study design. Thus, the parameter is known to be restricted to the space $\{(\mu_1, \ldots, \mu_G): \mu_1 \le \cdots \le \mu_G\}$. For instance, suppose that $i = 1, 2, \ldots, G$ represents the time index ($i = 1$ for the oldest study, and $i = G$ for the newest study) at which a treatment effect is estimated. Then, one may assume a trend $\mu_1 \le \cdots \le \mu_G$ due to the improvements of treatments over time. For instance, the true means may be $(\mu_1, \ldots, \mu_5) = (-2, -1, 0, 1, 2)$. This trend could be modeled by a meta-regression $\mu_i = a + b i$, where the values $a$ and $b$ are unknown. In practice, one does not know any structure of the means (e.g., linear regression) except for $\mu_1 \le \cdots \le \mu_G$. If some knowledge, such as a linear model on covariates, is true, one could use meta-regression. However, we do not adopt any model, permitting various nonlinear settings such as $(-2, -2, 0, 0, 0)$ and $(-2, -1, 4, 4, 5)$. The use of the standard unbiased estimator $(y_1, \ldots, y_G)$ is not desirable under the ordering constraints. Due to random variations, the estimator $(y_1, \ldots, y_G)$ can be outside the parameter space, namely, $\mathbf{y} \notin \{(\mu_1, \ldots, \mu_G): \mu_1 \le \cdots \le \mu_G\}$. Under this setting, an estimator accounting for the parameter restriction improves upon the unrestricted estimator $\mathbf{y}$, even though the former is a biased estimator [30]. The restricted maximum likelihood (RML) estimator satisfying $\hat{\mu}_1^{RML} \le \cdots \le \hat{\mu}_G^{RML}$ is calculated by the pool-adjacent-violators algorithm (PAVA)
$$\hat{\mu}_i^{RML} = \max_{s \le i}\, \min_{t \ge i}\, \frac{\sum_{j=s}^{t} y_j/\sigma_j^2}{\sum_{j=s}^{t} 1/\sigma_j^2}, \quad i = 1, 2, \ldots, G.$$
This gives the RML estimator $\hat{\boldsymbol{\mu}}^{RML} \equiv (\hat{\mu}_1^{RML}, \ldots, \hat{\mu}_G^{RML})$, which has a smaller WMSE than $\mathbf{y}$. For an example of $G = 3$, one has the data $(y_1, y_2, y_3)$, and the PAVA pools adjacent estimates whenever the ordering is violated. Hence, $\hat{\mu}_i^{RML}$ is equal to $y_i$ itself or a weighted average of adjacent estimates including $y_i$. Of course, $\hat{\mu}_i^{RML} = y_i$ for $\forall i$ if $y_1 \le \cdots \le y_G$. For theories and applications of the PAVA, we refer to [31][32][33]. The max–min formula written above can be found in Chapter 8 of [30] or [34].
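The PAVA itself is short. The following is a minimal weighted implementation (a sketch under the notation above, not the authors' code; `pava` is our own function name):

```python
import numpy as np

def pava(y, var):
    """Pool-adjacent-violators algorithm: the order-restricted (RML)
    estimator under mu_1 <= ... <= mu_G with weights w_i = 1/sigma_i^2."""
    w = [1.0 / v for v in var]
    means, weights, sizes = [], [], []
    for yi, wi in zip(map(float, y), w):
        means.append(yi); weights.append(wi); sizes.append(1)
        # pool adjacent blocks while the monotonicity is violated
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, s2 = means.pop(), weights.pop(), sizes.pop()
            m1, w1, s1 = means.pop(), weights.pop(), sizes.pop()
            means.append((w1 * m1 + w2 * m2) / (w1 + w2))
            weights.append(w1 + w2)
            sizes.append(s1 + s2)
    return np.array([m for m, s in zip(means, sizes) for _ in range(s)])
```

For example, `pava([1.0, 3.0, 2.0], [1.0, 1.0, 1.0])` pools the violating pair (3.0, 2.0) into their average, returning `[1.0, 2.5, 2.5]`, while an already ordered input is returned unchanged.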
It is clear that $\hat{\boldsymbol{\mu}}^{RML}$ is different from the order statistics $y_{(1)} \le \cdots \le y_{(G)}$, which are a permutation of $(y_1, \ldots, y_G)$ and also improve the WMSE in some cases [34]. However, the permuted estimator loses the information of individual studies' identifications, and therefore, is not considered in this article.
Below, we further improve $\hat{\boldsymbol{\mu}}^{RML}$ with the aid of the JS estimator. Let $I(\cdot)$ be the indicator function; $I(A) = 1$ or $0$ if $A$ is true or false, respectively. We adjust the estimator of Chang [35], who proposed the JS-type estimator under the order restriction, as follows:
$$\hat{\mu}_i^{RJS} = \left(1 - \frac{G-2}{\sum_{j=1}^{G} (\hat{\mu}_j^{RML})^2/\sigma_j^2}\right) \hat{\mu}_i^{RML}, \quad i = 1, 2, \ldots, G,$$
where "RJS" stands for "Restricted JS". Note that $\hat{\boldsymbol{\mu}}^{RJS}$ has a smaller WMSE than $\hat{\boldsymbol{\mu}}^{RML}$, and hence, the former improves upon the latter [35].
We further improve the RJS estimator by the positive-part RJS estimator given by
$$\hat{\mu}_i^{RJS+} = \left(1 - \frac{G-2}{\sum_{j=1}^{G} (\hat{\mu}_j^{RML})^2/\sigma_j^2}\right) I\!\left(\frac{G-2}{\sum_{j=1}^{G} (\hat{\mu}_j^{RML})^2/\sigma_j^2} < 1\right) \hat{\mu}_i^{RML}, \quad i = 1, 2, \ldots, G.$$
Consequently, $\hat{\boldsymbol{\mu}}^{RJS+}$ has a smaller WMSE than $\hat{\boldsymbol{\mu}}^{RJS}$ (Theorem 5.4 of Lehmann and Casella [25]). Note that, if $y_1 \le \cdots \le y_G$ is not satisfied, then $\hat{\boldsymbol{\mu}}^{RML} \neq \mathbf{y}$. In summary, this subsection proposes three estimators ($\hat{\boldsymbol{\mu}}^{RML}$, $\hat{\boldsymbol{\mu}}^{RJS}$, and $\hat{\boldsymbol{\mu}}^{RJS+}$) that improve upon the standard unbiased estimator $\mathbf{y}$ under $\mu_1 \le \cdots \le \mu_G$.
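One way to compute the three order-restricted estimators is sketched below. The RJS step applies a JS-type shrinkage factor to the PAVA output; this is our reading of the construction, offered as an illustrative sketch rather than a verbatim transcription of Chang's formula (`pava_w` and `order_restricted_estimators` are our own names):

```python
import numpy as np

def pava_w(y, w):
    """Weighted pool-adjacent-violators algorithm (isotonic regression)."""
    means, weights, sizes = [], [], []
    for yi, wi in zip(map(float, y), w):
        means.append(yi); weights.append(wi); sizes.append(1)
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, s2 = means.pop(), weights.pop(), sizes.pop()
            m1, w1, s1 = means.pop(), weights.pop(), sizes.pop()
            means.append((w1 * m1 + w2 * m2) / (w1 + w2))
            weights.append(w1 + w2); sizes.append(s1 + s2)
    return np.array([m for m, s in zip(means, sizes) for _ in range(s)])

def order_restricted_estimators(y, var):
    """RML (isotonic), RJS (JS-shrunken RML), and positive-part RJS."""
    y = np.asarray(y, dtype=float)
    var = np.asarray(var, dtype=float)
    G = len(y)
    rml = pava_w(y, 1.0 / var)
    factor = (G - 2) / np.sum(rml**2 / var)
    rjs = (1.0 - factor) * rml
    rjs_plus = max(0.0, 1.0 - factor) * rml  # truncate overshrinkage at zero
    return rml, rjs, rjs_plus

rml, rjs, rjs_plus = order_restricted_estimators(
    [2.0, 1.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
```

Since the shrinkage multiplies the isotonic fit by a common nonnegative constant, all three outputs remain nondecreasing, so the order restriction is never broken by the shrinkage step.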

Estimation under Sparse Means
Our third proposal is a shrinkage estimator under sparse normal means, where most of the $\mu_i$s are zero [27,28]. The vector $(\mu_1, \ldots, \mu_G)$ is called sparse if the number $\sum_{i=1}^{G} I(\mu_i \neq 0)$ is much smaller than $G$, e.g., $(\mu_1, \ldots, \mu_{10}) = (5, 0, 0, 0, 0, 0, 0, 0, 0, 5)$. In practice, one does not know which components are zero, or how many components are zero. Nonetheless, one could assume that many of $\mu_1, \ldots, \mu_G$ are zero. However, the elements of $(y_1, \ldots, y_G)$ are almost always nonzero, which disagrees with the true values $(\mu_1, \ldots, \mu_G)$. Under the sparse means, it is quite reasonable to estimate $\mu_i$ as exactly zero if $y_i$ is close to zero. Accordingly, one can use a thresholding estimator $y_i I(|y_i|/\sigma_i > c)$ for a critical value $c > 0$. The idea was proposed by Bancroft [36], who formulated pretest estimators that incorporate a preliminary hypothesis test into estimation. Judge and Bock [37] extensively studied pretest estimators with applications to econometrics; see also more recent works [38][39][40][41][42][43]. Among all, we particularly note that Shih et al. [41] proposed the general pretest (GPT) estimator that includes empirical Bayes and Types I–II shrinkage pretest estimators for the univariate normal mean.
We modify the GPT estimator to be adapted to meta-analyses as follows:
$$\hat{\mu}_i^{GPT} = y_i\, I\!\left(\frac{|y_i|}{\sigma_i} > z_{p}\right) + \delta\, \phi(y_i)\, y_i\, I\!\left(z_{q} < \frac{|y_i|}{\sigma_i} \le z_{p}\right), \quad i = 1, 2, \ldots, G,$$
for $0 \le \delta \le 1$, $\phi: (-\infty, \infty) \mapsto [0, 1]$, and $z_p$, the upper $p$-th quantile of $N(0, 1)$ for $0 < p < 1$. To implement the GPT estimator, the values of $\delta$, $p$, and $q$, and the probability function $\phi(\cdot)$ must be chosen. They cannot be chosen to minimize the WMSE and TMSE criteria since pretest estimators do not permit tractable forms of MSEs [38,41]. For the GPT estimator, we set $\delta = 1/2$ (50% shrinkage) as suggested by Shih et al. [41]. To facilitate the interpretability of pretests, we chose $p = 0.025$ (5% level) and $q = 0.05$ (10% level). Thus, we propose the estimators
$$\hat{\mu}_i^{PT} = y_i\, I\!\left(\frac{|y_i|}{\sigma_i} > z_{0.025}\right), \qquad \hat{\mu}_i^{GPT} = y_i\, I\!\left(\frac{|y_i|}{\sigma_i} > z_{0.025}\right) + \frac{y_i}{2}\, I\!\left(z_{0.05} < \frac{|y_i|}{\sigma_i} \le z_{0.025}\right).$$
Thus, if $H_0: \mu_i = 0$ is rejected at the 5% level, we set $\hat{\mu}_i^{GPT} = y_i$. If $H_0: \mu_i = 0$ is accepted at the 5% level, but rejected at the 10% level, we set $\hat{\mu}_i^{GPT} = y_i/2$. If $H_0: \mu_i = 0$ is accepted at the 10% level, we set $\hat{\mu}_i^{GPT} = 0$. Thus, $\hat{\boldsymbol{\mu}}^{GPT}$ gives a weaker shrinkage than $\hat{\boldsymbol{\mu}}^{PT}$ does. Obviously, we obtain the relationship $|\hat{\mu}_i^{PT}| \le |\hat{\mu}_i^{GPT}| \le |y_i|$.
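The two pretest rules can be written compactly. Here is a minimal sketch using the standard normal quantile from the Python standard library (the function name `pretest_estimators` and the example numbers are ours):

```python
import numpy as np
from statistics import NormalDist

def pretest_estimators(y, sigma, p=0.025, q=0.05, delta=0.5):
    """Pretest (PT) and general pretest (GPT) estimators.
    PT keeps y_i only if H0: mu_i = 0 is rejected at the two-sided 2p level;
    GPT additionally keeps delta * y_i in the intermediate zone."""
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    z = np.abs(y) / sigma
    z_p = NormalDist().inv_cdf(1.0 - p)  # upper p-th quantile, ~1.96 for p=0.025
    z_q = NormalDist().inv_cdf(1.0 - q)  # ~1.645 for q=0.05
    pt = np.where(z > z_p, y, 0.0)
    gpt = np.where(z > z_p, y, np.where(z > z_q, delta * y, 0.0))
    return pt, gpt

# |y|/sigma = 3.0 (rejected at 5%), 1.8 (rejected at 10% only), 0.5 (accepted)
pt, gpt = pretest_estimators([3.0, -1.8, 0.5], [1.0, 1.0, 1.0])
```

The intermediate study is set to zero by PT but only halved by GPT, which illustrates the relationship $|\hat{\mu}_i^{PT}| \le |\hat{\mu}_i^{GPT}| \le |y_i|$.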
The GPT estimator introduced above is not an empirical Bayes estimator. If $z_p = 0$, $\delta = 1$, and $\phi(y_i) = 1 - \sigma_i^2/\max(y_i^2, \sigma_i^2)$ were chosen, the resultant GPT estimator would be an empirical Bayes estimator [41]. However, we do not consider this estimator in our analysis.
In summary, this subsection proposes two estimators ($\hat{\boldsymbol{\mu}}^{PT}$ and $\hat{\boldsymbol{\mu}}^{GPT}$) that improve upon the standard unbiased estimator $\mathbf{y}$ provided that the means are sparse.

Simulation
We conducted Monte Carlo simulations to examine the performance of the proposed estimators. The purpose of the simulations is to clarify how the proposed estimators improve upon the standard unbiased estimator in terms of the WMSE and TMSE, defined as
$$\mathrm{WMSE} \equiv \sum_{i=1}^{G} \frac{E[(\hat{\mu}_i - \mu_i)^2]}{\sigma_i^2}, \qquad \mathrm{TMSE} \equiv \sum_{i=1}^{G} E[(\hat{\mu}_i - \mu_i)^2].$$
We also examined the bias, defined as $\mathrm{Bias}_i \equiv E[\hat{\mu}_i] - \mu_i$, $i = 1, 2, \ldots, G$.
For the known variances, we generated $\sigma_i^2 \sim \chi_{\mathrm{df}}^2/4$, restricted to the interval $(0.009, 0.6)$, as previously considered [15,44], leading to $E[\sigma_i^2] \approx 0.173$. Tables 1–4 summarize the simulation results for comparing the eight estimators: $\mathbf{y}$, $\hat{\boldsymbol{\mu}}^{JS}$, $\hat{\boldsymbol{\mu}}^{JS+}$, $\hat{\boldsymbol{\mu}}^{RML}$, $\hat{\boldsymbol{\mu}}^{RJS}$, $\hat{\boldsymbol{\mu}}^{RJS+}$, $\hat{\boldsymbol{\mu}}^{PT}$, and $\hat{\boldsymbol{\mu}}^{GPT}$. The JS estimators ($\hat{\boldsymbol{\mu}}^{JS}$ and $\hat{\boldsymbol{\mu}}^{JS+}$) improved upon $\mathbf{y}$ in all the scenarios (see the WMSE and TMSE in Tables 1–4). Scenario (b) produced the greatest reduction in the values of the WMSE and TMSE, where all the true means are equal to zero (Table 2). This is exactly what the theoretical results imply (Section 3.1). The two estimators $\hat{\boldsymbol{\mu}}^{JS}$ and $\hat{\boldsymbol{\mu}}^{JS+}$ showed similar performance except for Scenario (b), where $\hat{\boldsymbol{\mu}}^{JS+}$ exhibited smaller values of the WMSE and TMSE than $\hat{\boldsymbol{\mu}}^{JS}$ (Table 2). In summary, we observed a clear advantage of the two JS estimators ($\hat{\boldsymbol{\mu}}^{JS}$ and $\hat{\boldsymbol{\mu}}^{JS+}$) over the standard estimator ($\mathbf{y}$) in all the scenarios examined.
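The tabulated WMSE and TMSE values are Monte Carlo approximations of the expectations above. A schematic re-implementation of the evaluation loop (not the authors' code; `mc_risks` is our own name, and the sanity check below uses the standard unbiased estimator, whose WMSE equals $G$ and TMSE equals $\sum_i \sigma_i^2$ exactly):

```python
import numpy as np

def mc_risks(estimator, mu, var, n_rep=20000, seed=1):
    """Monte Carlo approximation of the WMSE and TMSE of `estimator`,
    a function mapping (y, var) -> mu_hat, under y_i ~ N(mu_i, sigma_i^2)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    wmse = tmse = 0.0
    for _ in range(n_rep):
        y = rng.normal(mu, np.sqrt(var))
        sq = (estimator(y, var) - mu) ** 2
        wmse += np.sum(sq / var)
        tmse += np.sum(sq)
    return wmse / n_rep, tmse / n_rep

# Sanity check with the standard unbiased estimator y itself (G = 8):
mu, var = np.zeros(8), np.full(8, 0.2)
wmse_y, tmse_y = mc_risks(lambda y, v: y, mu, var)
```

Plugging any of the proposed estimators into `mc_risks` in place of the identity map reproduces the kind of risk comparison reported in the tables.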

Simulation Results
The order-restricted estimators ($\hat{\boldsymbol{\mu}}^{RML}$, $\hat{\boldsymbol{\mu}}^{RJS}$, and $\hat{\boldsymbol{\mu}}^{RJS+}$) were even more advantageous than $\hat{\boldsymbol{\mu}}^{JS}$ and $\hat{\boldsymbol{\mu}}^{JS+}$ according to the smallest values of the WMSE and TMSE in Tables 1 and 3. The substantially reduced values of the WMSE and TMSE were brought about by the strict constraints imposed by isotonic regression. We found that $\hat{\boldsymbol{\mu}}^{RJS+}$ had the smallest values of the WMSE and TMSE under Scenarios (a) and (c) (Tables 1 and 3). However, we observed large and systematic biases of the order-restricted estimators, which are caused by isotonic regression.
The pretest estimator $\hat{\boldsymbol{\mu}}^{PT}$ showed the best performance under Scenario (d) (Table 4), followed by the general pretest estimator $\hat{\boldsymbol{\mu}}^{GPT}$. However, the comparison between $\hat{\boldsymbol{\mu}}^{PT}$ and $\hat{\boldsymbol{\mu}}^{GPT}$ depends on the scenarios: $\hat{\boldsymbol{\mu}}^{PT}$ was better in Scenarios (b) and (d), while $\hat{\boldsymbol{\mu}}^{GPT}$ was better in Scenarios (a) and (c). Both $\hat{\boldsymbol{\mu}}^{PT}$ and $\hat{\boldsymbol{\mu}}^{GPT}$ did not perform well in Scenarios (a) and (c), where the sparsity hypothesis $H_0: \mu_i = 0$ fails. Naturally, the sparsity of individual means is essential for the pretest estimators to perform well.
In summary, our simulations have shown that the proposed estimators have their own advantageous scenarios: the JS estimators for Scenario (b); the order-restricted estimators for Scenarios (a) and (c); the pretest estimators for Scenario (d). These proposed estimators substantially reduced the values of the WMSE and TMSE over the standard estimator $\mathbf{y}$. Furthermore, the proposed estimators $\hat{\boldsymbol{\mu}}^{JS}$, $\hat{\boldsymbol{\mu}}^{JS+}$, $\hat{\boldsymbol{\mu}}^{RML}$, $\hat{\boldsymbol{\mu}}^{RJS}$, and $\hat{\boldsymbol{\mu}}^{RJS+}$ improved upon $\mathbf{y}$ across all the scenarios. We therefore conclude that there are good reasons to apply the proposed estimators to estimate $(\mu_1, \ldots, \mu_G)$ in order to improve the accuracy of estimation.

Data Analyses
We analyze two datasets to illustrate the proposed method. One dataset comes from gastric cancer patients and the other from COVID-19 pneumonia patients.

Gastric Cancer Data
We illustrate the proposed methods using a dataset from the Global Advanced/Adjuvant Stomach Tumor Research International Collaboration (GASTRIC) [45]. The data provide survival times for gastric cancer patients from 14 studies. One of the goals of this collaboration is to estimate the effects of chemotherapy on disease-free survival (DFS) for gastric cancer patients [46]. We used the data available in the R package surrosurv [47].
Let $y_i$ be the estimator of the Cox regression coefficient for a treatment indicator (1 = treatment vs. 0 = control), and let $\sigma_i$ be its SE for $i = 1, \ldots, 14$. Table 5 summarizes $y_i$, $\sigma_i$, and the sample sizes for the 14 studies. It is reasonable to assume the asymptotic normal approximation $y_i \sim N(\mu_i, \sigma_i^2)$ since the sample size of each study is large enough. Table 5 also shows the p-values for testing $H_0: \mu_i = 0$ through the Z-values $y_i/\sigma_i$. Table 5 shows that only two studies reached 5% significance. Thus, it is reasonable to assume $\mu_i \approx 0$ for $\forall i$, and therefore, the JS estimators are suitable. It is also reasonable to assume $\mu_i = 0$ for most of the $\mu_i$s, and therefore, the PT and GPT estimators are suitable. We therefore calculated $\hat{\boldsymbol{\mu}}^{JS}$, $\hat{\boldsymbol{\mu}}^{JS+}$, $\hat{\boldsymbol{\mu}}^{PT}$, and $\hat{\boldsymbol{\mu}}^{GPT}$ to estimate $(\mu_1, \ldots, \mu_{14})$, as shown in Table 6. These estimators are expected to estimate $\mu_1, \ldots, \mu_{14}$ more accurately than $y_1, \ldots, y_{14}$ do. The reason is that the data structure resembles Scenario (b), where $\hat{\boldsymbol{\mu}}^{JS}$ and $\hat{\boldsymbol{\mu}}^{JS+}$ performed the best (Table 2), or Scenario (d), where $\hat{\boldsymbol{\mu}}^{PT}$ and $\hat{\boldsymbol{\mu}}^{GPT}$ performed the best (Table 4).

COVID-19 Data
We illustrate the proposed methods using data from Pranata et al. [7], who examined the effect of hypertension on mortality for patients with COVID-19 pneumonia. One of their conclusions was that hypertension increases mortality. We obtained mortality data from Pranata et al. [7], consisting of 11 published studies (Table 7). Let $y_i$, $i = 1, \ldots, 11$, be the log of the risk ratio (RR) calculated from a two-by-two contingency table examining the association (mortality vs. hypertension), and let $\sigma_i$ be the SE for $i = 1, \ldots, 11$. Table 7 summarizes the 11 studies. Table 7 also shows the p-values for testing the no association ($H_0: \mu_i = 0$) by the Z-values $y_i/\sigma_i$, assuming $y_i \sim N(\mu_i, \sigma_i^2)$. Before applying the proposed methods, we changed the order of the 11 studies by the percentages of male patients (Table 8). The largest percentage (82%) gave the smallest RR ($\mathrm{RR} = 0.46$), while the smallest percentage (45%) gave the largest RR ($\mathrm{RR} = 2.89$). However, the values of the $y_i$s are not monotonically increasing (Table 8). While Pranata et al. [7] regressed the $y_i$s on the percentages by meta-regression, we prefer not to assume a linear model. Alternatively, we considered the order-restricted estimators ($\hat{\boldsymbol{\mu}}^{RML}$, $\hat{\boldsymbol{\mu}}^{RJS}$, and $\hat{\boldsymbol{\mu}}^{RJS+}$) by merely assuming that the true means are ordered by the percentages of male patients. Table 8 shows the proposed estimators $\hat{\boldsymbol{\mu}}^{JS}$, $\hat{\boldsymbol{\mu}}^{JS+}$, $\hat{\boldsymbol{\mu}}^{RML}$, $\hat{\boldsymbol{\mu}}^{RJS}$, and $\hat{\boldsymbol{\mu}}^{RJS+}$. The JS shrinkage estimators were $\hat{\mu}_i^{JS} = (1 - 0.05) y_i$. They reduced $y_i$ by only 5%, and hence gave very weak shrinkage toward zero. This is because the $\mu_i$s are unlikely to be zero. The order-restricted estimator $\hat{\boldsymbol{\mu}}^{RML}$ successfully enforced the order restrictions $\hat{\mu}_1^{RML} \le \cdots \le \hat{\mu}_{11}^{RML}$, which were not satisfied by $\mathbf{y}$. Since the order restrictions were not met for $\mathbf{y}$, we had $\hat{\boldsymbol{\mu}}^{RML} \neq \mathbf{y}$ by definition. Thus, the last three estimators can clearly incorporate the knowledge that hypertension-related mortality is monotonically influenced by gender, without imposing a linear model.
In summary, it would be advantageous and informative to add the order restriction to individual studies' estimators of the association (mortality vs. hypertension). The resultant order-restricted estimators are expected to provide more accurate estimates than the individual studies' estimators. The reason is that the data structure resembles Scenario (c), where $\hat{\boldsymbol{\mu}}^{RML}$, $\hat{\boldsymbol{\mu}}^{RJS}$, and $\hat{\boldsymbol{\mu}}^{RJS+}$ performed the best (Table 3).

Conclusions and Future Extensions
We have shown that the proposed shrinkage estimators can be more precise than the standard unbiased estimators for estimating individual means in meta-analyses. Theoretical evidence (Section 3) and simulation studies (Section 4) identified the remarkable advantages of the proposed estimators under certain scenarios of the true individual means. However, no single estimator is the best for all the scenarios. Hence, it is important to choose a suitable estimator for each scenario. We have provided two data examples to demonstrate the scenarios under which the proposed estimators are beneficial (Section 5). However, the choice of estimators depends on the goal of research, and hence, it is difficult to set a purely statistical strategy to choose the estimator. What we have concluded in this article is that there is good potential for the proposed estimators by their careful applications to real meta-analyses.
Recall that the shrinkage effects of the general pretest estimators $\hat{\mu}_1^{GPT}, \ldots, \hat{\mu}_G^{GPT}$ are component-wise, and hence, do not involve the JS-type shrinkage as in Sections 3.1 and 3.2. In the presence of pretests, it is unclear how the JS-type shrinkage can be introduced. This issue is an interesting topic, yet it is beyond the scope of the article.
This article focused on the shrinkage toward zero because a meta-analysis is typically a collection of individual studies of weak effects. However, other shrinkage schemes can also be considered, such as the shrinkage toward an arbitrary mean vector, a truncated space, a convex polyhedron, or a positive half-plane [12,25,[48][49][50][51][52][53][54]. Different shrinkage schemes could be considered depending on the goals of meta-analyses. Therefore, there is good potential for shrinkage estimation methods to work with meta-analyses.
This article focuses on univariate response. One may consider extensions of this article to multiple (e.g., bivariate) responses. For instance, educational research may involve meta-analyses of bivariate test scores on verbal and mathematics tests [55], and mathematics and statistics tests [44]. In our data example of gastric cancer patients (Section 5.1), bivariate correlated endpoints are of great interest [55]. Correlations between responses should be employed to improve the efficiency and reduce the bias of univariate analyses [56][57][58][59][60][61][62][63][64][65] and to predict a primary outcome by the secondary outcome [66][67][68][69]. Multivariate shrinkage estimators of multivariate normal means, such as [70,71], can be considered for this extension.
One may also consider extensions of this article to non-normal (e.g., gamma distributed) responses. For non-normal distributions, shrinkage estimators exist, such as [72] for gamma distribution. Thus, introduction of shrinkage estimators to multivariate and/or non-normal models, and even their regression models, is a promising future direction.
Finally, we note that all meta-analyses are valid under the questionable assumption of no publication bias. Frequentist and Bayesian models for checking publication bias are often considered [73,74]. It would be interesting to see if the proposed shrinkage methods can mitigate the effect of publication bias.