We propose a variety of shrinkage estimators for simultaneously estimating individual means $\left({\mu}_{1},\dots ,{\mu}_{G}\right)$, and discuss their properties. In particular, we theoretically prove that, under some conditions, the proposed estimators have better precision than the individual studies’ estimates $\left({y}_{1},\dots ,{y}_{G}\right)$ in terms of a mean squared error criterion.
Indeed, there are infinitely many estimators
$\mathit{\delta}\left(\mathit{Y}\right)$ that improve upon
$\mathit{Y}$, including very complex and unrealistic ones [
25]. In addition, it is quite easy to find an estimator that locally improves upon
$\mathit{Y}$, such as
$\mathit{\delta}\left(\mathit{Y}\right)=\mathbf{0}$. While the problem of deriving and assessing estimators has been intensively discussed in statistical decision theory, it has rarely been appreciated in meta-analytical settings. The goal of this article is to facilitate the application of shrinkage estimators in the context of meta-analyses.
In the subsequent sections, we will introduce three estimators that help reduce the WMSE and TMSE by shrinking
$\mathit{Y}$ toward a restricted space of
$\left({\mu}_{1},\dots ,{\mu}_{G}\right)$.
Section 3.1 will discuss the shrinkage towards the zero vector
$\mathbf{0}=\left\{\left({\mu}_{1},\dots ,{\mu}_{G}\right):{\mu}_{1}=\cdots ={\mu}_{G}=0\right\}$, the most traditional shrinkage scheme.
Section 3.2 will consider the shrinkage toward
$\mathbf{0}$ under constraints
$\left\{\left({\mu}_{1},\dots ,{\mu}_{G}\right):{\mu}_{1}\le \dots \le {\mu}_{G}\right\}$.
Section 3.3 will explore the shrinkage towards the
sparse space
$\{\left({\mu}_{1},\dots ,{\mu}_{G}\right):{{\displaystyle \sum}}_{i=1}^{G}I\left({\mu}_{i}\ne 0\right)<G\}$.
3.1. Shrinkage Estimation for Means
To estimate
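$\mathit{\mu}=\left({\mu}_{1},\dots ,{\mu}_{G}\right)$, we propose the James–Stein (JS) estimator of the form
$${\mathit{\delta}}^{\mathrm{JS}}\equiv \left(1-\frac{G-2}{{\sum}_{i=1}^{G}{Y}_{i}^{2}/{\sigma}_{i}^{2}}\right)\mathit{Y}.$$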
This estimator is a modification of the original JS estimator [
29] that was derived under the unit variances (
${\sigma}_{i}=1$ for all
$i$). See
Appendix A.1 for the details. The JS estimator reduces variance by shrinking the vector
$\mathit{Y}$ toward
$\mathbf{0}$ while it produces bias. The degree of shrinkage is determined by the factor
$\left(G-2\right)/({{\displaystyle \sum}}_{i=1}^{G}{Y}_{i}^{2}/{\sigma}_{i}^{2})$ that usually ranges from 0 (0% shrinkage) to 1 (100% shrinkage), and occasionally becomes greater than 1 (overshrinkage).
It can be shown in
Appendix A.1 that
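${\mathit{\delta}}^{\mathrm{JS}}$ has the following WMSE
$$\mathrm{WMSE}\left({\mathit{\delta}}^{\mathrm{JS}}\right)=G-{\left(G-2\right)}^{2}E\left[\frac{1}{{\chi}_{G}^{2}\left(\lambda \right)}\right],$$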
where
${\chi}_{G}^{2}\left(\lambda \right)$ is a random variable having a noncentral
${\chi}^{2}$-distribution with the noncentrality parameter
$\lambda ={{\displaystyle \sum}}_{i=1}^{G}{\mu}_{i}^{2}/{\sigma}_{i}^{2}$ and the degrees of freedom
$G$. Thus,
${\mathit{\delta}}^{\mathrm{JS}}$ has a smaller WMSE than
$\mathit{Y}$ when
$G\ge 3$. Indeed, the WMSE is minimized at
$\mathit{\mu}=\mathbf{0}$ at which the WMSE is
$G-{\left(G-2\right)}^{2}/\left(G-2\right)=2$ by the inverse moment $E\left[1/{\chi}_{G}^{2}\left(0\right)\right]=1/\left(G-2\right)$ of the central
${\chi}^{2}$-distribution with
$\lambda =0$. Thus, the JS estimator gains the greatest advantage if all the individual means are zero. This gain is appealing for meta-analyses of small individual effects (true means close to zero). Even if
$\mathit{\mu}\ne \mathbf{0}$, the JS estimator has a smaller WMSE than
$\mathit{Y}$. The reduction of the WMSE diminishes as
$\lambda $ departs from zero.
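The WMSE comparison above can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original analysis: it assumes that $Y_i\sim N(\mu_i,\sigma_i^2)$ independently with known $\sigma_i$, that the WMSE is the expected sum of squared errors each divided by $\sigma_i^2$, and that the JS estimator uses the shrinkage factor stated in the text. The function names (`js_estimate`, `wmse`, `simulate_wmse`) and the simulation settings are hypothetical choices for illustration.

```python
import numpy as np

def js_estimate(y, sigma):
    """JS estimate shrinking each row of y toward 0 (sigma: known standard errors)."""
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    G = y.shape[-1]
    # Sum of Y_i^2 / sigma_i^2, computed separately for each simulated data set.
    s = np.sum((y / sigma) ** 2, axis=-1, keepdims=True)
    return (1.0 - (G - 2) / s) * y

def wmse(estimates, mu, sigma):
    """Monte Carlo average of sum_i (delta_i - mu_i)^2 / sigma_i^2."""
    return float(np.mean(np.sum(((estimates - mu) / sigma) ** 2, axis=-1)))

def simulate_wmse(mu, sigma, n_rep=100_000, seed=0):
    """Return (WMSE of Y, WMSE of JS) estimated from n_rep simulated data sets."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    y = rng.normal(loc=mu, scale=sigma, size=(n_rep, mu.size))
    return wmse(y, mu, sigma), wmse(js_estimate(y, sigma), mu, sigma)

# Illustrative settings (assumed): G = 10 studies with unequal standard errors.
sigma = np.linspace(0.5, 2.0, 10)
at_zero = simulate_wmse(np.zeros(10), sigma)       # lambda = 0
away = simulate_wmse(np.full(10, 3.0), np.ones(10))  # lambda = 90
print("mu = 0:  WMSE(Y), WMSE(JS) =", at_zero)
print("mu = 3:  WMSE(Y), WMSE(JS) =", away)
```

Under these assumptions the simulated WMSE of $\mathit{Y}$ should be close to $G$, and the gain from the JS estimator should be largest at $\mathit{\mu}=\mathbf{0}$ and shrink as $\lambda$ grows.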
One might ask where the special formula of the JS estimator comes from. The JS estimator
${\mathit{\delta}}^{\mathrm{JS}}$ can be derived as an empirical Bayes estimator under the prior
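${\mu}_{i}\sim N\left(0,{\sigma}_{i}^{2}{\tau}^{2}\right)$:
$${\mathit{\delta}}^{\mathrm{JS}}=\left(1-\widehat{\left(1/\left(1+{\tau}^{2}\right)\right)}\right)\mathit{Y},$$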
where the shrinkage factor
$\widehat{\left(1/\left(1+{\tau}^{2}\right)\right)}\equiv \left(G-2\right)/{{\displaystyle \sum}}_{i=1}^{G}{Y}_{i}^{2}/{\sigma}_{i}^{2}$ is the estimator of
$1/\left(1+{\tau}^{2}\right)$. See
Appendix A.2 for the detailed derivations. Thus, if
${\mu}_{i}\sim N\left(0,{\sigma}_{i}^{2}{\tau}^{2}\right)$, the JS estimator minimizes the Bayes risk, and hence, is the optimal estimator under the prior.
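A short sketch of why this shrinkage factor arises may help; it is our own summary under the stated prior, assuming independence across $i$ and $G\ge 3$, and it is not a substitute for the derivation in Appendix A.2.

```latex
\begin{align*}
% Posterior mean under Y_i | mu_i ~ N(mu_i, sigma_i^2) and mu_i ~ N(0, sigma_i^2 tau^2)
E\left[\mu_i \mid Y_i\right]
  &= \frac{\sigma_i^2\tau^2}{\sigma_i^2\tau^2+\sigma_i^2}\,Y_i
   = \left(1-\frac{1}{1+\tau^2}\right)Y_i, \\
% Marginal distribution of Y_i after integrating out mu_i
Y_i &\sim N\left(0,\ \sigma_i^2\left(1+\tau^2\right)\right)
  \quad\Longrightarrow\quad
  \sum_{i=1}^{G}\frac{Y_i^2}{\sigma_i^2}\ \sim\ \left(1+\tau^2\right)\chi_G^2, \\
% Inverse moment of the central chi-square: E[1/chi^2_G] = 1/(G-2)
E\left[\frac{G-2}{\sum_{i=1}^{G}Y_i^2/\sigma_i^2}\right]
  &= \frac{G-2}{1+\tau^2}\,E\left[\frac{1}{\chi_G^2}\right]
   = \frac{1}{1+\tau^2}.
\end{align*}
```

Hence the factor $\left(G-2\right)/{\sum}_{i=1}^{G}{Y}_{i}^{2}/{\sigma}_{i}^{2}$ is unbiased for $1/\left(1+{\tau}^{2}\right)$ under the marginal distribution, and plugging it into the posterior mean gives the JS form.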
A minor modification to the JS estimator can reduce the WMSE further. The modification is made in order to avoid the effect of an overshrinkage,
$\left(G-2\right)/({{\displaystyle \sum}}_{i=1}^{G}{Y}_{i}^{2}/{\sigma}_{i}^{2})>1$, by which all the signs of
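$\mathit{Y}$ are reversed. The overshrinkage phenomenon occurs with a small probability, and therefore, the modification is minor in the majority of cases. A modified estimator is the
positive-part JS estimator
$${\mathit{\delta}}^{\mathrm{JS}+}\equiv {\left(1-\frac{G-2}{{\sum}_{i=1}^{G}{Y}_{i}^{2}/{\sigma}_{i}^{2}}\right)}^{+}\mathit{Y},$$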
where
${(.)}^{+}\equiv \mathrm{max}\left(0,.\right)$. Consequently,
${\mathit{\delta}}^{\mathrm{JS}+}$ has a smaller WMSE than
${\mathit{\delta}}^{\mathrm{JS}}$ (p. 356, Theorem 5.4 of [25]).
In summary, this subsection proposes two estimators (${\mathit{\delta}}^{\mathrm{JS}}$ and ${\mathit{\delta}}^{\mathrm{JS}+}$) that improve upon the standard unbiased estimator $\mathit{Y}$.
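To illustrate the overshrinkage that the positive-part version guards against, here is a small sketch. The helper names (`shrinkage_factor`, `js`, `js_plus`) and the toy data are assumptions made for illustration only; the formulas follow the shrinkage factor and the ${(.)}^{+}$ operator described in this subsection.

```python
import numpy as np

def shrinkage_factor(y, sigma):
    """(G - 2) / sum_i (Y_i^2 / sigma_i^2), the degree of shrinkage."""
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return (y.size - 2) / np.sum((y / sigma) ** 2)

def js(y, sigma):
    """JS estimator: the multiplier 1 - factor may become negative."""
    return (1.0 - shrinkage_factor(y, sigma)) * np.asarray(y, dtype=float)

def js_plus(y, sigma):
    """Positive-part JS estimator: the multiplier is truncated at zero."""
    return max(0.0, 1.0 - shrinkage_factor(y, sigma)) * np.asarray(y, dtype=float)

# Toy data (assumed): all estimates are tiny relative to their standard errors,
# so the shrinkage factor exceeds 1 and JS flips every sign.
y_small = np.array([0.1, -0.2, 0.3, 0.1])
sigma = np.ones(4)
print(shrinkage_factor(y_small, sigma))
print(js(y_small, sigma))       # signs reversed relative to y_small
print(js_plus(y_small, sigma))  # shrunk all the way to zero instead

# When the factor is below 1, the two estimators coincide.
y_large = np.array([3.0, -4.0, 5.0, 2.0])
print(js(y_large, sigma), js_plus(y_large, sigma))
```

The design choice in `js_plus` is simply to cap the shrinkage at 100%, so the estimate never crosses the shrinkage target.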
3.2. Estimation under Ordered Means
Our next proposal is a shrinkage estimator under ordered means. We consider the case where the ordering constraints ${\mu}_{1}\le \dots \le {\mu}_{G}$ are known from the study design. Thus, the parameter $\mathit{\mu}$ is known to be restricted to the space $\left\{\left({\mu}_{1},\dots ,{\mu}_{G}\right):{\mu}_{1}\le \dots \le {\mu}_{G}\right\}$. For instance, suppose that $i\left(=1,2,\dots ,G\right)$ represents the time index ($i=1$ for the oldest study, and $i=G$ for the newest study) at which a treatment effect ${\mu}_{i}$ is estimated. Then, one may assume a trend ${\mu}_{1}\le \dots \le {\mu}_{G}$ due to the improvements of treatments over time.
For instance, the true means may be $\left({\mu}_{1},\dots ,{\mu}_{5}\right)=\left(-2,-1,0,1,2\right)$. This trend could be modeled by a meta-regression with ${\mu}_{i}=a+bi$, where values $a$ and $b$ are unknown. In practice, one does not know any structure of the means (e.g., linear regression) except for ${\mu}_{1}\le \dots \le {\mu}_{5}$. If some knowledge, such as a linear model on covariates, is true, one could use meta-regression. However, we do not adopt any model, permitting various non-linear settings such as $\left(-2,-2,0,0,0\right)$ and $\left(-2,-1,4,4,5\right)$.
The use of the standard unbiased estimator
$\mathit{Y}=\left({Y}_{1},\dots ,{Y}_{G}\right)$ is not desirable under the ordering constraints. Due to random variations, the estimator
$\mathit{Y}=\left({Y}_{1},\dots ,{Y}_{G}\right)$ can be outside the parameter space, namely,
$\mathit{Y}\notin \left\{\left({\mu}_{1},\dots ,{\mu}_{G}\right):{\mu}_{1}\le \dots \le {\mu}_{G}\right\}$. Under this setting, an estimator accounting for the parameter restriction improves upon the unrestricted estimator
$\mathit{Y}$, even though the former is a biased estimator [
30].
The restricted maximum likelihood (RML) estimator satisfying
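${\delta}_{1}\le \dots \le {\delta}_{G}$ is calculated by the pool-adjacent-violators algorithm (PAVA):
$${\delta}_{i}^{\mathrm{RML}}=\underset{1\le s\le i}{\mathrm{max}}\ \underset{i\le t\le G}{\mathrm{min}}\ \frac{{\sum}_{j=s}^{t}{\sigma}_{j}^{-2}{Y}_{j}}{{\sum}_{j=s}^{t}{\sigma}_{j}^{-2}},\quad i=1,\dots ,G.$$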
This gives the RML estimator
${\mathit{\delta}}^{\mathrm{RML}}\equiv \left({\delta}_{1}^{\mathrm{RML}},\dots ,{\delta}_{G}^{\mathrm{RML}}\right)$, which has a smaller WMSE than
$\mathit{Y}$. For an example of
$G=3$, one has the data of
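$\left({Y}_{1},{Y}_{2},{Y}_{3}\right)$, and the PAVA results in
$${\delta}_{1}^{\mathrm{RML}}=\mathrm{min}\left\{{Y}_{1},{\overline{Y}}_{12},{\overline{Y}}_{123}\right\},\quad {\delta}_{2}^{\mathrm{RML}}=\mathrm{max}\left\{\mathrm{min}\left\{{\overline{Y}}_{12},{\overline{Y}}_{123}\right\},\mathrm{min}\left\{{Y}_{2},{\overline{Y}}_{23}\right\}\right\},\quad {\delta}_{3}^{\mathrm{RML}}=\mathrm{max}\left\{{\overline{Y}}_{123},{\overline{Y}}_{23},{Y}_{3}\right\},$$
where ${\overline{Y}}_{s\cdots t}$ denotes the average of ${Y}_{s},\dots ,{Y}_{t}$ weighted by $1/{\sigma}_{j}^{2}$.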
Hence,
${\delta}_{i}^{\mathrm{RML}}$ is equal to
${Y}_{i}$ itself or an average including
${Y}_{i}$. Of course,
${\delta}_{i}^{\mathrm{RML}}={Y}_{i}$ for all $i$ if
${Y}_{1}\le \dots \le {Y}_{G}$. For theories and applications of the PAVA, we refer to [
31,
32,
33]. The max min formula written above can be found in Chapter 8 of [
30] or [
34].
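As a concrete illustration of the pooling idea, the following is a minimal sketch of a weighted PAVA. It assumes that the restricted maximum likelihood fit under normal errors with known ${\sigma}_{i}$ is the isotonic regression of $\mathit{Y}$ with weights $1/{\sigma}_{i}^{2}$; the function name `pava` and the block representation are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def pava(y, sigma):
    """Nondecreasing weighted least-squares fit of y with weights 1 / sigma^2."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    # Each block stores [weighted mean, total weight, number of pooled studies].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / total, total, n1 + n2])
    # Expand each block back to one fitted value per study.
    return np.concatenate([np.full(n, m) for m, _, n in blocks])

# Studies 2 and 3 violate the ordering and are replaced by their average.
print(pava([1.0, 3.0, 2.0], [1.0, 1.0, 1.0]))
# Already ordered data are returned unchanged.
print(pava([-2.0, -1.0, 0.0, 1.0, 2.0], np.ones(5)))
```

Each fitted value is therefore either the original estimate or a weighted average over a pooled block that contains it.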
It is clear that
${\mathit{\delta}}^{\mathrm{RML}}$ is different from order statistics
${Y}_{\left(1\right)}\le \dots \le {Y}_{\left(G\right)}$ that are a permutation of
$\left({Y}_{1},\dots ,{Y}_{G}\right)$ and also improve the WMSE in some cases [
34]. However, the permuted estimator loses the identities of the individual studies, and therefore, is not considered in this article.
Below, we further improve
${\mathit{\delta}}^{\mathrm{RML}}$ with the aid of the JS estimator. Let
$I(.)$ be the indicator function;
$I\left(A\right)=1$ or
$I\left(A\right)=0$ if
$A$ is true or false, respectively. We adjust the estimator of Chang [
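35] who proposed the JS-type estimator under the order restriction as follows:
$${\mathit{\delta}}^{\mathrm{RJS}}\equiv \left(1-\frac{G-2}{{\sum}_{i=1}^{G}{Y}_{i}^{2}/{\sigma}_{i}^{2}}I\left({Y}_{1}\le \dots \le {Y}_{G}\right)\right){\mathit{\delta}}^{\mathrm{RML}},$$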
where “RJS” stands for “Restricted JS”. Note that
${\mathit{\delta}}^{\mathrm{RJS}}$ has a smaller WMSE than
${\mathit{\delta}}^{\mathrm{RML}}$, and hence, the former improves upon the latter [
35].
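We further improve the RJS estimator by the positive-part RJS estimator given by
$${\mathit{\delta}}^{\mathrm{RJS}+}\equiv {\left(1-\frac{G-2}{{\sum}_{i=1}^{G}{Y}_{i}^{2}/{\sigma}_{i}^{2}}I\left({Y}_{1}\le \dots \le {Y}_{G}\right)\right)}^{+}{\mathit{\delta}}^{\mathrm{RML}}.$$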
Consequently,
${\mathit{\delta}}^{\mathrm{RJS}+}$ has a smaller WMSE than
${\mathit{\delta}}^{\mathrm{RJS}}$ (Theorem 5.4 of Lehmann and Casella [
25]). Note that, if
${Y}_{1}\le \dots \le {Y}_{G}$ is not satisfied, then
${\mathit{\delta}}^{\mathrm{RML}}={\mathit{\delta}}^{\mathrm{RJS}}={\mathit{\delta}}^{\mathrm{RJS}+}$.
In summary, this subsection proposes three estimators (${\mathit{\delta}}^{\mathrm{RML}}$, ${\mathit{\delta}}^{\mathrm{RJS}}$, and ${\mathit{\delta}}^{\mathrm{RJS}+}$) that improve upon the standard unbiased estimator $\mathit{Y}$ under ${\mu}_{1}\le \dots \le {\mu}_{G}$.
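The three estimators of this subsection can be sketched together as follows. This is a hedged illustration resting on two assumptions that should be checked against the original formulas: (i) the RML estimator is the max-min of averages weighted by $1/{\sigma}_{j}^{2}$, and (ii) the RJS estimator multiplies the RML estimator by one minus the JS shrinkage factor only when ${Y}_{1}\le \dots \le {Y}_{G}$ holds, which is consistent with the remark that all three estimators coincide otherwise. The names `rml_maxmin` and `rjs` are hypothetical.

```python
import numpy as np

def rml_maxmin(y, sigma):
    """Order-restricted fit via the max-min formula (brute force, for small G)."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    G = y.size

    def avg(s, t):
        # Weighted average of y[s], ..., y[t] (0-based, inclusive).
        return np.sum(w[s:t + 1] * y[s:t + 1]) / np.sum(w[s:t + 1])

    return np.array([
        max(min(avg(s, t) for t in range(i, G)) for s in range(i + 1))
        for i in range(G)
    ])

def rjs(y, sigma, positive_part=False):
    """Restricted JS estimator (assumed form); positive_part=True gives RJS+."""
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    G = y.size
    ordered = bool(np.all(np.diff(y) >= 0))  # indicator I(Y_1 <= ... <= Y_G)
    multiplier = 1.0 - ordered * (G - 2) / np.sum((y / sigma) ** 2)
    if positive_part:
        multiplier = max(0.0, multiplier)
    return multiplier * rml_maxmin(y, sigma)

# Ordering violated: no JS-type shrinkage, so RML = RJS = RJS+.
y_bad = np.array([1.0, 3.0, 2.0])
print(rml_maxmin(y_bad, np.ones(3)), rjs(y_bad, np.ones(3)))

# Ordering satisfied: the RML fit equals Y and is then shrunk toward 0.
y_ok = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(rjs(y_ok, np.ones(5)), rjs(y_ok, np.ones(5), positive_part=True))
```

The brute-force max-min evaluation is cubic in $G$ and is used here only because it mirrors the formula directly; a pooling implementation of the PAVA would be preferable for large $G$.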
3.3. Estimation under Sparse Means
Our third proposal is a shrinkage estimator under
sparse normal means where most of the
${\mu}_{i}$s are zero [
27,
28]. The vector
$\left({\mu}_{1},\dots ,{\mu}_{G}\right)$ is called
sparse if the number
${\sum}_{i=1}^{G}I\left({\mu}_{i}\ne 0\right)$ is much smaller than
$G$, e.g.,
$\left({\mu}_{1},\dots ,{\mu}_{10}\right)=\left(-5,0,0,0,0,0,0,0,0,5\right)$. In practice, one knows neither which components are zero nor how many. Nonetheless, one could assume that many of
$\left({\mu}_{1},\dots ,{\mu}_{G}\right)$ are zero. However, the elements of
$\left({Y}_{1},\dots ,{Y}_{G}\right)$ are almost always nonzero, which disagrees with the true values
$\left({\mu}_{1},\dots ,{\mu}_{G}\right)$.
Under the sparse means, it is quite reasonable to estimate
${\mu}_{i}$ as exactly zero if
${Y}_{i}$ is close to zero. Accordingly, one can use a thresholding estimator
${Y}_{i}I(\left|{Y}_{i}\right|>{c}_{i})$ for a critical value
${c}_{i}>0$. The idea was proposed by Bancroft [
36] who formulated
pretest estimators that incorporate a preliminary hypothesis test into estimation. Judge and Bock [
37] extensively studied pretest estimators with applications to econometrics; see also more recent works [
38,
39,
40,
41,
42,
43]. Among all, we particularly note that Shih et al. [
41] proposed the
general pretest (GPT) estimator that includes empirical Bayes and Types I-II shrinkage pretest estimators for the univariate normal mean.
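We adapt the GPT estimator to meta-analyses as follows:
$${\delta}_{i}^{\mathrm{GPT}}\equiv {Y}_{i}I\left(\left|\frac{{Y}_{i}}{{\sigma}_{i}}\right|>{z}_{{\alpha}_{1}}\right)+q\left({Y}_{i}\right){Y}_{i}I\left({z}_{{\alpha}_{2}}<\left|\frac{{Y}_{i}}{{\sigma}_{i}}\right|\le {z}_{{\alpha}_{1}}\right),\quad i=1,\dots ,G,$$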
for
$0\le {\alpha}_{1}\le {\alpha}_{2}\le 1$,
$q:\left(-\infty ,\infty \right)\to \left(0,1\right)$, where
${z}_{p}$ is the upper
$p$-th quantile of
$N\left(0,1\right)$ for
$0<p<1$. To implement the GPT estimator, the values of
${\alpha}_{1}$ and
${\alpha}_{2}$, and the probability function
$q(.)$ must be chosen. They cannot be chosen to minimize the WMSE and TMSE criteria since pretest estimators do not permit tractable forms of MSEs [
38,
41]. Fortunately, for any value of
${\alpha}_{1}$ and
${\alpha}_{2}$, and a function
$q$, one can show that
${\mathit{\delta}}^{\mathrm{GPT}}\equiv \left({\delta}_{1}^{\mathrm{GPT}},\dots ,{\delta}_{G}^{\mathrm{GPT}}\right)$ has smaller WMSE and TMSE values than
$\mathit{Y}$ provided
$\mathit{\mu}\approx \mathbf{0}$; see
Appendix A.3 for the proof.
For the above reasons, we apply statistically interpretable choices of
${\alpha}_{1}$,
${\alpha}_{2}$, and
$q(.)$. The special case of
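${\alpha}_{1}={\alpha}_{2}=\alpha =0.025$ leads to the usual pretest estimator
$${\delta}_{i}^{\mathrm{PT}}\equiv {Y}_{i}I\left(\left|\frac{{Y}_{i}}{{\sigma}_{i}}\right|>{z}_{0.025}\right),\quad {z}_{0.025}=1.96,$$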
for which
$q(.)$ is arbitrary. Thus, we retain
${Y}_{i}$ if
${H}_{0}:{\mu}_{i}=0$ is rejected in favor of
${H}_{1}:{\mu}_{i}\ne 0$ at the 5% level. Otherwise, we discard
${Y}_{i}$, and conclude
${\mu}_{i}=0$.
For the GPT estimator, we set
$q\left(z\right)=1/2$ (50% shrinkage) as suggested by Shih et al. [
41]. To facilitate the interpretability of pretests, we chose
${\alpha}_{1}=0.025$ (5% level) and
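${\alpha}_{2}=0.05$ (10% level). Thus, we propose the estimator
$${\delta}_{i}^{\mathrm{GPT}}={Y}_{i}I\left(\left|\frac{{Y}_{i}}{{\sigma}_{i}}\right|>1.96\right)+\frac{{Y}_{i}}{2}I\left(1.645<\left|\frac{{Y}_{i}}{{\sigma}_{i}}\right|\le 1.96\right).$$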
In words, if ${H}_{0}:{\mu}_{i}=0$ is rejected at the 5% level, we set ${\delta}_{i}^{\mathrm{GPT}}={Y}_{i}$. If ${H}_{0}:{\mu}_{i}=0$ is not rejected at the 5% level but is rejected at the 10% level, we set ${\delta}_{i}^{\mathrm{GPT}}={Y}_{i}/2$. If ${H}_{0}:{\mu}_{i}=0$ is not rejected at the 10% level, we set ${\delta}_{i}^{\mathrm{GPT}}=0$. Hence, ${\delta}_{i}^{\mathrm{GPT}}$ gives a weaker shrinkage than ${\delta}_{i}^{\mathrm{PT}}$ does, and we obtain the relationship $\left|{\delta}_{i}^{\mathrm{PT}}\right|\le \left|{\delta}_{i}^{\mathrm{GPT}}\right|\le \left|{Y}_{i}\right|$.
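The three-way rule described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: it assumes two-sided tests based on $\left|{Y}_{i}/{\sigma}_{i}\right|$ with upper normal quantiles ${z}_{{\alpha}_{1}}$ and ${z}_{{\alpha}_{2}}$, and a constant shrinkage $q=1/2$ in the intermediate zone. The function names `upper_quantile`, `gpt`, and `pt` and the toy data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def upper_quantile(p):
    """Upper p-th quantile z_p of N(0, 1), i.e. P(Z > z_p) = p."""
    return NormalDist().inv_cdf(1.0 - p)

def gpt(y, sigma, alpha1=0.025, alpha2=0.05, q=0.5):
    """General pretest estimate: keep Y_i, shrink it by q, or set it to zero."""
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    z = np.abs(y / sigma)
    keep = z > upper_quantile(alpha1)              # rejected at the stricter level
    shrink = (z > upper_quantile(alpha2)) & ~keep  # rejected only at the looser level
    return np.where(keep, y, np.where(shrink, q * y, 0.0))

def pt(y, sigma, alpha=0.025):
    """Usual pretest estimate: the special case alpha1 = alpha2 = alpha."""
    return gpt(y, sigma, alpha1=alpha, alpha2=alpha)

# Toy data (assumed): unit standard errors, so y is already on the z-scale.
y = np.array([2.5, 1.8, 1.0, -1.7, -3.0])
sigma = np.ones(5)
print(pt(y, sigma))
print(gpt(y, sigma))
```

With these toy values, the estimates 1.8 and -1.7 fall between the two critical values, so the GPT estimator halves them while the usual pretest estimator sets them to zero.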
The GPT estimator introduced above is not an empirical Bayesian estimator. If
${\alpha}_{1}=0$,
${\alpha}_{2}=1$, and
$q\left(y\right)=1-{\sigma}_{i}^{2}/\mathrm{max}\left\{{\sigma}_{i}^{2},{y}^{2}\right\}$ were chosen, the resultant GPT estimator would be an empirical Bayes estimator [
41]. However, we do not consider this estimator in our analysis.
In summary, this subsection proposes two estimators (${\mathit{\delta}}^{\mathrm{PT}}$ and ${\mathit{\delta}}^{\mathrm{GPT}}$) that improve upon the standard unbiased estimator $\mathit{Y}$ provided $\mathit{\mu}\approx \mathbf{0}$.