On Application of the Empirical Bayes Shrinkage in Epidemiological Settings

This paper aims to provide direct and indirect evidence on setting up rules for the application of the empirical Bayes shrinkage (EBS), and offers cautionary remarks concerning its applicability. In epidemiology, there is still a lack of relevant criteria for the application of the EBS. The bias of the shrinkage estimator is investigated in terms of the sums of errors, squared errors and absolute errors, for both the total and individual groups. The study reveals that assessing the underlying exchangeability assumption is important for the appropriate use of the EBS. The performance of the EBS is indicated by a ratio statistic f of the between-group to within-group mean variances. If there are significant differences between the sample means, the EBS is likely to produce erratic and even misleading information.


Introduction
There has been widespread interest in and application of "shrinkage" estimators in epidemiology and demographic analysis for the purposes of smoothing spatial fluctuations, stabilizing estimates, and reducing sampling and non-sampling errors [1][2][3][4]. Prior research has also demonstrated that coefficient shrinkage is potentially useful for the selection of epidemiological models and the control of multiple confounders using modern hierarchical modeling techniques [5,6]. The term shrinkage refers to the statistical phenomenon whereby the posterior estimate of a group mean is shifted from the sample mean towards the prior mean [7]. The Bayesian approach to shrinkage estimation is to use the prior distribution and the likelihood (based on the data) to determine the posterior distribution. The approach is termed empirical Bayes shrinkage (EBS) when there is no information about the prior and the observed data are employed to postulate the prior distribution, assuming the sample means were drawn from the same population [8].
The shrinkage estimator was first proposed by Stein [9] in the 1950s as an alternative to the ordinary least squares (OLS) estimator, i.e. the sample mean, to produce a smaller mean squared error. In epidemiology, the EBS has been increasingly used for stabilizing disease incidence, prevalence and mortality estimates, as well as for improving the reliability of the estimates [10][11][12][13][14]. Although the underlying principles of the EBS estimator are still controversial [15][16][17], it is generally believed to provide an improvement over the OLS for reducing error risk in decision making [18]. Nevertheless, the EBS is subject to bias, error and arbitrary judgment [6]. Evidence also exists that this dedicated statistical technique has been misused without due consideration [15,19,20]. Recently, the Australian Bureau of Statistics applied the EBS estimator to adjust the Indigenous population estimates for Australian states and territories in an attempt to reduce standard errors, resulting in 9% and 4% reductions in the magnitude of the population estimates for Western Australia and the Northern Territory respectively, and increases of 9% for Victoria and Tasmania [21]. This methodology has substantial repercussions for the allocation of Indigenous services funding, and needs to be justified.
Dating back to Efron and Morris in the early 1970s, the high risk of EBS estimation has been recognized for individual parameters far from the mean of the prior distribution [22,23]. Since then, a series of improved Stein estimators have been developed to overcome this deficiency, including the limited-translation, positive-part and generalized Bayes estimators [e.g., [24][25][26]]; see [27] for a review of the historical details. Another strategy to reduce the risk is estimation preceded by testing, known as the preliminary-test estimator, to determine whether it is efficacious to shrink or not [28][29][30][31][32]. In epidemiological and demographic practice, these caveats appear to be largely overlooked.
In light of the ongoing debate among mathematicians and statisticians on how to improve the EBS and its applications, there is a lack of relevant criteria for assisting decision-making in the possible application of the EBS in epidemiological settings. This paper provides empirical evidence on setting up rules for the EBS, and offers cautionary remarks concerning its applications. In the next section, the EBS is briefly reviewed and the problems concerning it are specified. A statistic is proposed to determine its applicability, and simulation studies are conducted to investigate and illustrate its properties. In particular, the nature of the bias in the estimator is explored. Two illustrative examples are then presented, followed by a discussion.

Empirical Bayes Estimator
Consider an ensemble of $k$ group parameters $\theta_1, \ldots, \theta_k$ to be estimated, each with $n$ independent observations $y_{ij}$ ($i = 1, \ldots, n$; $j = 1, \ldots, k$). In analogy with [9], the EBS estimator for $\theta_j$ is

$\hat{x}_j = B\bar{y} + (1 - B)\bar{y}_j$,    (1)

where $\bar{y} = \frac{1}{k}\sum_j \bar{y}_j$ is the overall sample (grand) mean, $\bar{y}_j = \frac{1}{n}\sum_{i=1}^{n} y_{ij}$ is the sample mean for group $j$, and $B$ is a shrinkage factor valued between 0 and 1 inclusive. Here, $B = 0$ represents that the sample means should not be 'shrunk' to the grand mean, whereas $B = 1$ indicates that the sample means should be fully 'shrunk' to, and replaced by, the grand mean. Estimation of $B$ is straightforward [33,34]:

$\hat{B} = (k - 3)\,\hat{\sigma}^2_{\bar{y}} \big/ \sum_{j=1}^{k} (\bar{y}_j - \bar{y})^2$,    (2)

where $\hat{\sigma}^2_{\bar{y}} = \hat{\sigma}^2/n$ is the estimated variance of a group mean and $\hat{\sigma}^2$ is the pooled within-group variance. If the within-group mean variance is small relative to the between-group mean variance, the shrinkage factor will be small, and vice versa. An iterative estimating procedure has been developed for the unequal-variance situation [34]. The EBS is believed to be an optimal combination of the sample mean and the grand mean, and to increase the reliability of the estimates because of its smaller sum of squared errors (SSE):

$\mathrm{SSE} = \sum_{j=1}^{k} (\hat{x}_j - \theta_j)^2$.    (3)

The definition of risk by the quadratic loss function provides a useful means for risk minimization in decision making [35]. In the simulation study below, the bias (or accuracy) of the estimators will also be evaluated in terms of the sum of errors (SE), defined as $\sum_j (\hat{x}_j - \theta_j)$, and the sum of absolute errors (SAE) [36]. Because the task is to estimate $\theta_j$, the performance of the estimator is assessed for each $\theta_j$ by

$\mathrm{SE}_j = \sum_{q=1}^{Q} (\hat{x}_j^{(q)} - \theta_j)$, $\mathrm{SSE}_j = \sum_{q=1}^{Q} (\hat{x}_j^{(q)} - \theta_j)^2$, $\mathrm{SAE}_j = \sum_{q=1}^{Q} |\hat{x}_j^{(q)} - \theta_j|$,

with $Q$ being the total number of simulations. If $\mathrm{SE}_j$ is close to zero, the bias is small for $\theta_j$. Unlike $\mathrm{SSE}_j$ and $\mathrm{SAE}_j$, $\mathrm{SE}_j$ can be either positive or negative.
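As a concrete illustration, the equal-variance estimator above can be sketched in a few lines of Python. This is a minimal sketch assuming the Efron–Morris form of the shrinkage factor with the pooled within-group variance plugged in when the variance is unknown; the function name is illustrative.

```python
import numpy as np

def ebs_estimates(y, sigma2=None):
    """Empirical Bayes shrinkage of k group means towards the grand mean.

    y      : (k, n) array of observations, n per group.
    sigma2 : known within-group variance; estimated from the data if None.
    Returns (shrunken estimates, shrinkage factor B).
    """
    k, n = y.shape
    ybar_j = y.mean(axis=1)                      # group sample means
    ybar = ybar_j.mean()                         # grand mean
    if sigma2 is None:
        sigma2 = y.var(axis=1, ddof=1).mean()    # pooled within-group variance
    s2_ybar = sigma2 / n                         # variance of a group mean
    S = np.sum((ybar_j - ybar) ** 2)
    # Efron-Morris style shrinkage factor, truncated to [0, 1]
    B = min(1.0, (k - 3) * s2_ybar / S)
    x = ybar + (1 - B) * (ybar_j - ybar)         # shrunken group estimates
    return x, B
```

Each shrunken estimate lies between its sample mean and the grand mean, with $B$ governing how far it is pulled.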

Problem with the EBS Estimator
Two examples from the literature [33,34] suggested that the EBS method can produce a smaller SSE than the sample mean, i.e.,

$\sum_j (\hat{x}_j - \tilde{y}_j)^2 < \sum_j (\bar{y}_j - \tilde{y}_j)^2$,    (4)

with the total number of observations $N > n$ being finite. Referring to the basketball example [34], $N = 82$ is the total number of games, $n = 10$, and $\tilde{y}_j$ is the average score for the remaining 72 games. This opens up two questions. First, what happens to the SSE if, instead of the remainder average, the total average $\bar{Y}_j$ (the final score in the examples) is used, which is really the matter of concern? The use of $\tilde{y}_j$ as the assessment standard for $\theta_j$ in the SSE equation (4) is problematic, especially when $N$ is small. Second, a small overall SSE does not necessarily reflect either good accuracy or high precision for all groups. This begs further questions: how are the errors distributed across groups, and how will the EBS behave if the SE and SAE criteria are adopted rather than SSE?

Simulation Study and Analysis of Variance
A simulation study uses computer-intensive procedures to provide insights into the appropriateness and accuracy of a statistical method under particular assumptions [37]. The objectives of the simulations are (i) to see if the EBS generally outperforms the OLS; (ii) to investigate under what conditions the EBS performs better; and (iii) to demonstrate explicitly the discriminative feature of the EBS estimator in terms of bias for individual groups. The simulation settings are devised to cover a wide range of possible combinations of within-group and between-group variances. The OLS was chosen for comparison partly because of the ease of computation and partly because the OLS corresponds to the maximum likelihood estimator under a normal distribution, which is common in epidemiological settings. In the simulations, $\sigma^2_{\bar{y}}$ is always estimated by $\hat{\sigma}^2_{\bar{y}}$, even though $\sigma^2$ is known.
The performance of $\hat{x}_j$ is then analysed using the ratio $f$ of the between-group mean variance to the within-group mean variance:

$f = \dfrac{n \sum_{j=1}^{k} (\bar{y}_j - \bar{y})^2 / (k - 1)}{\hat{\sigma}^2}$.    (5)

Suppose the prior distribution is $\theta_j \sim N(\mu, \tau^2)$, so that the sample means satisfy $\bar{y}_j \sim N(\mu, \tau^2 + \sigma^2/n)$ and $E\big[\sum_j (\bar{y}_j - \bar{y})^2/(k - 1)\big] = \tau^2 + \sigma^2/n$, as given by Everson [34]. Dividing both the numerator and the denominator of $f$ by $\sigma^2/n$ then gives $E(f) \approx 1 + n\tau^2/\sigma^2$, which further leads to the relation between the ratio statistic and the shrinkage factor:

$\hat{B} = \dfrac{k - 3}{(k - 1)\,f}$.    (6)

This statistic is similar in spirit to that of Sclove, Morris and Radhakrishnan [29]. Note that the $f$ statistic is inversely proportional to $\hat{B}$.
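In code, the $f$ statistic is simply the one-way ANOVA variance ratio. A sketch under the equal-variance setting (the function name is illustrative):

```python
import numpy as np

def f_ratio(y):
    """Ratio of between-group to within-group mean variance for a (k, n)
    array of observations; identical to the one-way ANOVA F statistic."""
    k, n = y.shape
    ybar_j = y.mean(axis=1)                      # group means
    ybar = ybar_j.mean()                         # grand mean
    ms_between = n * np.sum((ybar_j - ybar) ** 2) / (k - 1)
    ms_within = y.var(axis=1, ddof=1).mean()     # pooled within-group variance
    return ms_between / ms_within
```

With this definition, the Efron–Morris shrinkage factor equals $(k-3)/((k-1)f)$, so a large $f$ value implies very little shrinkage.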

Simulations
The number of replications $Q$ is set to 1,000, which is considered sufficient (>500) for detecting a permissible difference of 0.02 (one-fifth of the difference between the smallest group means), given the variance 0.25, $n$ = 20, a type I error of 0.05 and a power of 0.95 [37]. The first part of the study compares the SSE and SAE of the EBS estimator with those of the OLS estimator. Note that SE is excluded from this comparison because positive and negative errors cancel in the total. The proportions of the 1,000 simulations for which the SSE of the EBS estimator is smaller than its OLS counterpart are recorded in Table 1. The simulation results show that the EBS estimator can outperform the OLS estimator (proportion > 50%) when the parameter $\theta_j$ and the remainder average $\tilde{y}_j$ are used for assessment, when $\sigma^2$ is large and the differences between sample means are small ($\theta_j = j/10$ or $\theta_j = j$). The EBS estimator, however, performs slightly worse than the OLS estimator when the total mean $\bar{Y}_j$ is used for assessment and $n$ is large, and particularly when $\sigma^2$ is large. The errors are next assessed for each individual $\theta_j$ in the second part of the simulation study. The individual $\mathrm{SE}_j$, $\mathrm{SSE}_j$ and $\mathrm{SAE}_j$ analyses unveil some undesirable features of the EBS estimator. Table 2 shows that the EBS estimator has a positive bias for groups with sample means far below the grand mean (for example, $j$ = 1), and tends to have a negative bias for groups with sample means far above the grand mean (for example, $j$ = 9). The EBS estimator thus introduces a statistical bias towards the grand mean, which is skewed against marginal values. This is clearly illustrated in Figure 1: panel (a) shows that $\mathrm{SE}_1$ for the EBS estimate $\hat{x}_1$ is positively skewed, $\mathrm{SE}_5$ for $\hat{x}_5$ has a symmetric distribution, whereas $\mathrm{SE}_9$ for $\hat{x}_9$ is negatively skewed.
By comparison, panel (b) clearly indicates that, regardless of the magnitude of the means, the distributions of $\mathrm{SE}_j$ for all three OLS estimators $\bar{y}_1$, $\bar{y}_5$ and $\bar{y}_9$ overlap and are symmetrical. These plots confirm the presence of bias in the EBS estimator and the lack of bias in the OLS estimator. Furthermore, the bias from the EBS is negatively correlated with the marginal position of the parameter in relation to the other parameters.
Table 3 shows that there are only a few cases where the EBS $\mathrm{SSE}_j$ is smaller than the OLS $\mathrm{SSE}_j$. In most other cases of Table 3, for groups with values far from the grand mean (e.g., $j$ = 1, 9), the EBS $\mathrm{SSE}_j$ is larger than the OLS $\mathrm{SSE}_j$, whereas for groups with values close to the grand mean (e.g., $j$ = 5), the EBS $\mathrm{SSE}_j$ is smaller than or equal to the OLS $\mathrm{SSE}_j$. The results indicate that the EBS estimator reallocates the sum of squared errors unevenly across the groups: less for the central values and more for the minimum and maximum values. Again, the simulation results for individual $\mathrm{SAE}_j$ are generally in agreement with those for $\mathrm{SSE}_j$ and are thus omitted for brevity.
In view of the above results, the EBS estimator may not increase the reliability of the estimates. When $f$ is small, the EBS estimator can increase the reliability more for the means close to the grand mean, but less for the means far away from the grand mean. When $f$ is large, the EBS estimator actually decreases the reliability, especially for the means very different from the grand mean. The overall smaller SSE for which the EBS is designed does not necessarily lead to an increase in precision for all groups. For the marginal groups it is very likely that the EBS will produce both greater bias and less precision if the $f$ value is large. When $f$ exceeds the critical value of the corresponding $F$ distribution, the EBS estimator ceases to be preferable to the OLS estimator, given the statistical bias introduced. In this case, potential confounder(s) need to be identified, and further division of the ensemble or stratification is necessary to ensure that the $f$ value is not exceedingly large before the EBS is applied.
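The simulation design above can be sketched as follows. This is a minimal sketch assuming one reported setting ($\theta_j = j/10$, variance 0.25, $n$ = 20) and the Efron–Morris shrinkage factor; the function name is illustrative, and the variance is estimated from the data even though it is known, as in the study.

```python
import numpy as np

def simulate(theta, sigma, n=20, Q=1000, seed=0):
    """Monte Carlo comparison of the EBS and OLS (sample-mean) estimators.

    Returns the per-group mean error (SE_j / Q) of the EBS estimator and the
    fraction of replications in which EBS achieves a smaller total SSE than OLS.
    """
    rng = np.random.default_rng(seed)
    k = len(theta)
    ebs_err = np.zeros(k)
    ebs_wins = 0
    for _ in range(Q):
        y = rng.normal(theta[:, None], sigma, size=(k, n))
        ybar_j = y.mean(axis=1)
        ybar = ybar_j.mean()
        sigma2_hat = y.var(axis=1, ddof=1).mean()   # variance estimated, not assumed known
        B = min(1.0, (k - 3) * (sigma2_hat / n) / np.sum((ybar_j - ybar) ** 2))
        x = ybar + (1 - B) * (ybar_j - ybar)        # EBS estimates
        ebs_err += x - theta
        ebs_wins += np.sum((x - theta) ** 2) < np.sum((ybar_j - theta) ** 2)
    return ebs_err / Q, ebs_wins / Q
```

Running this with widely spread group means reproduces the qualitative pattern reported above: positive mean error for the lowest group and negative mean error for the highest group.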

Examples
Two examples using real data are provided below to demonstrate instances where the OLS estimator generates a lower SSE than the EBS estimator. In both examples, the inadvisability of using the EBS estimator is signalled by the $f$ statistic criterion.

Example 1: Mumps
The first application concerns mumps notifications per 100,000 by State/Territory from the Australian National Notifiable Diseases Surveillance System [38]. The data from 2001 to 2007 are taken to predict the 2008 notification rate, and the year-to-date 2008 notification rate is used to evaluate the EBS estimate; see Table 4. Suppose the notification rates follow a normal distribution and the EBS is applicable. Because of the differences in population size between States/Territories, unequal variances are considered appropriate and the shrinkage factors are estimated iteratively [34]. The estimated shrinkage factors and the corresponding EBS estimates of the 2008 notification rates by State/Territory are listed in the bottom rows of Table 4. The SSE for the EBS estimator is 267.6, much greater than the SSE of 202.5 for the OLS estimator. The EBS estimators do not provide better estimates than the OLS estimators (in terms of SSE) in this situation.

Example 2: Birth Weights
The second example concerns mean birth weights by district. Here $f$ = 13.09 is much greater than the critical value of 2.57, and the EBS performed badly, with an SSE of 648, much greater than the SSE of 601 for the OLS estimator. The birth weights are then stratified by identifying and separating out non-Aboriginal infants. The $f$ value decreases to 3.71, indicating that the performance of the EBS estimator has improved substantially. According to the $f$ statistic criterion, however, the EBS is still not applicable after stratification, indicating that further potential confounders (such as rurality) may operate. Owing to the small number of districts, further division of the ensemble based on rurality is not performed. In these two examples, the relative merits of the EBS and OLS estimators are reversed compared with the sport examples advocating the EBS estimator [33,34].
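The $f$ statistic screen applied in these examples can be sketched as follows. This is an illustrative sketch for the equal-variance case (function and variable names are assumptions); the critical value, such as the 2.57 quoted above, would come from an $F$ table or from scipy.stats.f.ppf(0.95, k - 1, k * (n - 1)).

```python
import numpy as np

def ebs_applicable(y, f_crit):
    """Preliminary test of exchangeability: compute the variance ratio f for a
    (k, n) data array and compare it with a user-supplied critical F value.
    Returns (f, True) if shrinkage appears admissible, (f, False) otherwise."""
    k, n = y.shape
    ybar_j = y.mean(axis=1)
    ybar = ybar_j.mean()
    ms_between = n * np.sum((ybar_j - ybar) ** 2) / (k - 1)
    ms_within = y.var(axis=1, ddof=1).mean()
    f = ms_between / ms_within
    return f, bool(f <= f_crit)
```

When the group means are as far apart as in the two examples above, the ratio greatly exceeds the critical value and shrinkage towards the grand mean is flagged as inadvisable.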

Discussion
The EBS estimator is sometimes considered a possible solution to the problem of unstable estimates and a way to reduce standard errors. This study demonstrates that when the variance ratio statistic $f$ is large, the EBS estimator offers little reduction in standard errors across the groups; instead, it can increase both the standard errors and the bias for the marginal groups.
The EBS rests on some important implicit assumptions, such as a unimodal probability distribution and exchangeability [17]. To make the assumptions explicit: for the EBS to be valid, the groups within each ensemble have to be "similar", exchangeable random quantities from the same bell-shaped prior distribution. If the $f$ value indicates that they are unlikely to be similar groups from the same distribution, then the underlying assumptions are violated. A remedy to this problem is to stratify or partition the data into credible ensembles according to confounders in order to satisfy these assumptions. In doing so, each ensemble will have its own prior distribution with little between-group heterogeneity relative to within-group sampling error. Alternatively, if additional covariate or potential confounder information is available, a hierarchical regression, multilevel or mixed model appears more appropriate, allowing the prior parameters to vary at more than one level and enabling structural prior information to be incorporated into the parameter estimates [39][40][41]. Multivariate coefficient shrinkage, rather than EBS, seems better suited to addressing confounding and collinearity issues [5]. Forcing the EBS without consideration of exchangeability may lead to the loss of most of the statistical gains [42].
The rationale behind shrinkage is to minimize the risk under a prescribed loss function, rather than to estimate the parameter without bias. The improvement in risk is significant if the individual components are close to the point towards which the estimators shrink and the ensemble point estimator is of primary interest [23]. Many authors have contributed to improving both the ensemble and the individual properties of shrinkage estimators, including the preliminary-test approach [29][30][31][43]. The main appeal of the EBS estimator is a sacrifice of unbiasedness for improved precision. The $f$ value plays a role in suggesting the situations under which this trade-off is beneficial and those under which it is not. When $f$ becomes large, the benefits of improved precision diminish and are offset by unacceptably large bias and a greater degree of volatility for the marginal groups. This process can be interpreted as a preliminary test for exchangeability. First, the null hypothesis of exchangeability is tested with the $f$ statistic. If $f$ exceeds the critical value of the corresponding $F$ distribution, the hypothesis is rejected at the significance level $\alpha$; the $\theta_j$'s are not really exchangeable and the EBS is not indicated.
Epidemiologists and practitioners may not be fully aware of the potentially problematic and differential nature of both the bias and the volatility resulting from EBS estimation, with benefits directed towards groups with large populations while disadvantaging those with small populations and sample sizes. Such differential shrinkage is often counter-intuitive. Arbitrary and unjustified shrinkage may be regarded as unfair, or as mere data manipulation, by those being evaluated, especially when the precisions of the individual group estimates are of equal interest, as distinct from the general research situation in which the overall precision is of primary interest.
In summary, the purpose of the EBS estimator is to reduce "risk" in terms of SSE. To apply the EBS estimator appropriately, epidemiologists need to assess the underlying exchangeability assumption. If there are significant differences between the sample means, EBS is likely to produce erratic and even misleading information.