Empirical Likelihood-Based ANOVA for Trimmed Means

In this paper, we introduce an alternative to Yuen’s test for the comparison of several population trimmed means. This nonparametric ANOVA type test is based on the empirical likelihood (EL) approach and extends the results for one population trimmed mean from Qin and Tsao (2002). The results of our simulation study indicate that for skewed distributions, with and without variance heterogeneity, Yuen’s test performs better than the new EL ANOVA test for trimmed means with respect to control over the probability of a type I error. This finding is in contrast with our simulation results for the comparison of means, where the EL ANOVA test for means performs better than Welch’s heteroscedastic F test. The analysis of a real data example illustrates the use of Yuen’s test and the new EL ANOVA test for trimmed means for different trimming levels. Based on the results of our study, we recommend the use of Yuen’s test for situations involving the comparison of population trimmed means between groups of interest.


Introduction
The comparison of the means of several populations is frequently encountered in the statistical analysis of data from environmental research and public health studies. Typically, ANOVA is used to compare these means of interest, for example, to compare mean blood lead levels between groups of children receiving different interventions. Practical situations may involve complications such as unbalanced designs (i.e., unequal sample sizes for the groups), variance heterogeneity, and departures from normality. For instance, the distributions underlying the data from each group may be truly heavy tailed or skewed, but such departures from normality may also be due to a few observations located in the tails of the distribution, away from the bulk of the data. It is well known that the classical ANOVA F test is not robust to such violations of its assumptions and, as a consequence, may fail to control the probability of a type I error at the specified nominal level. Heteroscedasticity and/or outliers can completely invalidate the results of the ANOVA F test when not properly taken into account (see, for example, [1]). Given this limitation of the ANOVA F test, there is a need for ANOVA type tests that are robust to both heteroscedasticity and outliers.
A statistical test that satisfies these requirements is the test developed by Yuen [2], who proposed a modified version of Welch's heteroscedastic F test [3]. The latter test is designed to deal with heteroscedasticity for normally distributed data, and it uses the sample means and sample variances to estimate their population counterparts. Since the sample mean and the sample variance are not robust to outliers, Yuen [2] proposed to replace them with a pair of robust estimators: the trimmed mean and the Winsorized variance. This approach provides better control of the probability of a type I error for one-way ANOVA situations involving unbalanced designs and skewed distributions (see [4]). Two comments are in order. First, the construction of Yuen's test is somewhat ad hoc, since it simply replaces the least squares estimators with robust versions. Second, Yuen's test is no longer a test for the comparison of population means; rather, it is a test for the comparison of population trimmed means. It may be preferable to make inferences about the population trimmed means rather than the population means when the underlying distributions for the groups are skewed, since trimmed means are more representative of the bulk of the data in those situations [5].
In this paper, we present an alternative to Yuen's test: a new nonparametric test, based on the empirical likelihood (EL) approach to statistical inference [6][7][8], that can be used to compare several trimmed means. The EL method (see [9] for a detailed overview) is a popular nonparametric approach that does not require normality (or other distributional assumptions) and can be regarded as a data-adaptive method. We develop an EL-based ANOVA test for the comparison of trimmed means that takes advantage of the nonparametric nature of the EL approach, by extending the results of Qin and Tsao [8], who introduced the EL method for a trimmed mean (see also the results from [10]). All technical details regarding the tests considered in this paper (including the asymptotic results for the new EL-based ANOVA for trimmed means) are provided in Appendices A-D.
The paper is organized as follows. In Section 2, we present and interpret the results of a simulation study that compares the performance of the EL-based ANOVA for trimmed means and means with alternative methods under several scenarios involving skewed distributions. In Section 3, we analyze a real data set using different types of tests for the comparison of population trimmed means and population means. We end the paper by presenting conclusions in Section 4.

Simulation Study
For simplicity, we present only situations in which we compare three population trimmed means or three population means (k = 3) using samples of equal size. We consider scenarios involving skewed distributions, with and without variance heterogeneity. For the EL ANOVA for trimmed means, we consider only symmetric trimming, in which every sample is trimmed by the same proportion c in each tail ($\alpha_i = \beta_i = c$). Although we are primarily interested in the performance of the tests for the comparison of trimmed means, namely the EL ANOVA for trimmed means (panel ELT) and Yuen's test (panel Yuen), for completeness we also include the results for the tests for the comparison of means, specifically the classical ANOVA F test (panel F test), Welch's heteroscedastic F test for means (panel Welch), and the EL ANOVA for means (panel EL). For Welch's test and Yuen's test, we used the R function t1way (see Wilcox [11]). The R functions implementing the EL ANOVA methods for trimmed means and means are available from the corresponding author upon request.
For the simulation study, we investigate the potential effect of the shape of the distributions on the empirical probability of a type I error. We consider several skewed distributions with and without variance heterogeneity, using a simulation design similar to that of [5], where the (trimmed) means of only two independent skewed populations are compared. For the scenario with homogeneous variances (scenario 1), we simulate data from three independent skewed distributions. We consider the $\chi^2_3$ distribution, the lognormal distribution with normal mean $\mu = 0$ and normal scale $\sigma = 1$, the gamma distribution with shape parameter $\alpha = 2$ and scale parameter $\sigma = 1$, and the skew-normal distribution with location parameter $\xi = 0$, scale parameter $\omega = 1$, and slant parameter $\alpha = 1$ (see [12]). For the scenario with heterogeneous variances (scenario 2), we further transform the data simulated from the three independent skewed distributions so that the ratios between the variances are either 1:4:9 or 1:1:36. To ensure that the relevant null hypothesis ($H_0^T$ of equal trimmed means or $H_0$ of equal means) is true, we center the data, before altering the variances, using the theoretical trimmed means (for the tests that compare trimmed means) or means (for the tests that compare means). We use 10,000 Monte Carlo replications to calculate the empirical probability of a type I error for the tests performed at the nominal 0.05 significance level.
Table 1 presents the empirical probability of a type I error for the different tests for the situation involving skewed distributions with homogeneous variances (scenario 1). Regarding the comparison of trimmed means, the results for Yuen's test are closer to the nominal significance level than those for the EL ANOVA test for trimmed means. By contrast, among the tests that compare means, the results of the EL ANOVA test for means are closest to the nominal significance level.
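As an illustration of the centering-then-scaling step described above, the following Python sketch generates one replicate of scenario 2 for the lognormal case only. It is a minimal sketch, not the authors' simulation code; the helper names `lognormal_trimmed_mean` and `simulate_groups` are hypothetical. The theoretical symmetrically trimmed mean of the lognormal distribution with $\mu = 0$, $\sigma = 1$ follows from $E[e^Z 1\{a < Z < b\}] = e^{1/2}\{\Phi(b-1) - \Phi(a-1)\}$ for $Z \sim N(0,1)$. Because the trimmed mean is location and scale equivariant, multiplying the centered data by the square root of the variance ratio keeps every population trimmed mean at zero.

```python
import math
import random
from statistics import NormalDist


def lognormal_trimmed_mean(c):
    """Theoretical symmetrically trimmed mean of LN(0, 1) at trimming level c.

    Uses E[e^Z; a < Z < b] = e^{1/2} [Phi(b - 1) - Phi(a - 1)], Z ~ N(0, 1),
    with a and b the c and 1 - c quantiles of Z.
    """
    if c == 0:
        return math.exp(0.5)  # no trimming: the ordinary mean
    nd = NormalDist()
    a, b = nd.inv_cdf(c), nd.inv_cdf(1 - c)
    return math.exp(0.5) * (nd.cdf(b - 1) - nd.cdf(a - 1)) / (1 - 2 * c)


def simulate_groups(n, ratios, c, rng):
    """One replicate: k lognormal samples whose trimmed means all equal 0.

    Each observation is centred at the theoretical trimmed mean and then
    multiplied by sqrt(ratio), so the variances follow the given ratios.
    """
    center = lognormal_trimmed_mean(c)
    return [[math.sqrt(r) * (rng.lognormvariate(0.0, 1.0) - center)
             for _ in range(n)]
            for r in ratios]
```

For example, `simulate_groups(20, (1, 4, 9), 0.2, random.Random(1))` would give three samples of (arbitrarily chosen) size 20 with variance ratios 1:4:9 and common population 20% trimmed mean zero; with `c = 0` the data are centered at the ordinary mean instead, as needed for the tests of equal means.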
Tables 2 and 3 present the corresponding results for the same tests in situations involving skewed distributions with heterogeneous variances (scenario 2). We note that it is more difficult to control the probability of a type I error when the ratios between variances are 1:1:36 than when they are 1:4:9. As in the homogeneous variances scenario, the results for the heterogeneous variances scenario suggest that Yuen's test performs best among the tests for the comparison of trimmed means, while the EL ANOVA test for means performs best among the tests for the comparison of means.
Table 1. Empirical probability of type I error for various tests for the equality of means and trimmed means of three independent skewed distributions with homogeneous variances. For methods involving trimmed means, symmetric trimming at level $\alpha_i = \beta_i = c$, $i = 1, 2, 3$, is used.
Table 2. Empirical probability of type I error for various tests for the equality of means and trimmed means of three independent skewed distributions with the ratios between variances being 1:4:9.
Table 3. Empirical probability of type I error for various tests for the equality of means and trimmed means of three independent skewed distributions with the ratios between variances being 1:1:36. For methods involving trimmed means, symmetric trimming at level $\alpha_i = \beta_i = c$, $i = 1, 2, 3$, is used.

Real Data Example
To illustrate the use of the EL ANOVA for trimmed means and means, we use the Oslo Transect data set [13]. This real data set includes 360 observations corresponding to different plants collected along a 120 km transect running through the city of Oslo, Norway. The concentrations of 25 chemical elements found in these plants were recorded, together with factors that may influence the mineral concentrations. Apart from two chemical elements, Au and Na, which are not included, these data are available in the R package rrcov [14] as the OsloTransect data set. We analyze this version of the data, and, thus, only 23 chemical elements are included in Table 4. To preserve the skewness of the data, we use the raw data rather than log-transforming them as in [13]. After removing the observations with missing values, 332 observations remain. We consider the 23 concentrations of chemical elements as the response variables and the lithology as a group variable with four levels.
As in the simulation study, even though our main interest is in tests that compare population trimmed means, for completeness we also provide the results from the tests that compare population means. We consider three symmetric trimming strategies similar to those used in the simulation study. Table 4 provides the p-values from the tests for the comparison of population means and population trimmed means. We note that, for each trimming strategy, the p-values from the EL ANOVA for trimmed means (panel ELT) and Yuen's test (panel Yuen) are very similar. In addition, the p-values from the EL ANOVA for means (panel EL) and Welch's heteroscedastic F test (panel Welch) are also very similar.

Conclusions
In this paper, we introduce a new nonparametric ANOVA type test for the comparison of population trimmed means. The new method is derived from the general principles of the empirical likelihood approach, whereas Yuen's test is obtained from Welch's heteroscedastic F test in a somewhat ad hoc manner. Nevertheless, the results of our simulation study in situations involving skewed distributions indicate that, unless the sample sizes per group are very large, the new EL ANOVA method for trimmed means performs worse than Yuen's test with respect to control over the probability of a type I error. This is in contrast with our simulation results for the comparison of means, where the EL ANOVA for means performs better than Welch's heteroscedastic F test. In the real data example, the new EL ANOVA method for trimmed means and Yuen's test yield similar p-values for different trimming levels, and the EL ANOVA for means and Welch's heteroscedastic F test also yield similar p-values.
Based on these results, we recommend the use of Yuen's test in situations where the research question involves the comparison of population trimmed means between groups of interest. The choice of the specific trimming strategy is an important and complex issue, since different trimming strategies imply that different null hypotheses are being tested. As such, the selection of the trimming strategy should be based on subject matter considerations that take into account what experts know about the data under investigation. Alternatively, in the absence of expert knowledge, different trimming strategies could be used to evaluate the sensitivity of the results to the choice of trimming strategy.
Acknowledgments: The authors would like to thank the academic editor and the reviewers for their thoughtful and constructive suggestions.
Author Contributions: Janis Valeinis and George Luta proposed the new EL ANOVA for trimmed means; Mara Velina proved the asymptotic results; Mara Velina and Luca Greco performed the simulation study; and Mara Velina performed the data analysis. All authors wrote and edited the paper and all authors read and approved the final manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Statistical Tests Not Based on EL
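Let $Y_{i1}, \ldots, Y_{in_i}$, $i = 1, \ldots, k$, be independent random samples from $k$ different distributions with population means $\mu_i$. We are interested in testing the null hypothesis of equal population means
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k. \quad (A1)$$
Under the assumption of equal variances (homoscedasticity) and normally distributed data in each group, i.e., $Y_{ij} \sim N(\mu_i, \sigma^2)$, one can use the classical ANOVA F test
$$F = \frac{\frac{1}{k-1}\sum_{i=1}^{k} n_i(\bar{Y}_{i\cdot} - \bar{Y})^2}{\frac{1}{N-k}\sum_{i=1}^{k} (n_i - 1)S_i^2}, \qquad N = \sum_{i=1}^{k} n_i,$$
where
$$\bar{Y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}, \qquad S_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$$
are the sample mean and sample variance of the $i$-th group, respectively, and
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{k} n_i \bar{Y}_{i\cdot}$$
is the pooled sample mean. The null hypothesis in (A1) is rejected at level $c$ if $F > F_{c,k-1,N-k}$, where $F_{c,k-1,N-k}$ is the critical value based on the F distribution with $k-1$ and $N-k$ degrees of freedom.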
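A direct transcription of the classical F statistic is sketched below in Python; the function name `anova_f` is hypothetical. It returns the statistic together with its degrees of freedom; if SciPy is available, a p-value could then be obtained with `scipy.stats.f.sf(F, k - 1, N - k)`.

```python
from statistics import mean, variance


def anova_f(groups):
    """Classical one-way ANOVA F statistic and its degrees of freedom."""
    k = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    means = [mean(g) for g in groups]
    grand = sum(ni * mi for ni, mi in zip(n, means)) / N  # pooled sample mean
    # between-group mean square
    between = sum(ni * (mi - grand) ** 2 for ni, mi in zip(n, means)) / (k - 1)
    # within-group mean square built from the group variances S_i^2
    within = sum((ni - 1) * variance(g) for ni, g in zip(n, groups)) / (N - k)
    return between / within, (k - 1, N - k)
```

For the toy groups `[[1, 2, 3], [2, 3, 4], [5, 6, 7]]` the group means are 2, 3 and 6, each group variance is 1, and the statistic is 13 on (2, 6) degrees of freedom.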
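Let us suppose now that $Y_{ij} \sim N(\mu_i, \sigma_i^2)$ for $i = 1, \ldots, k$. Welch's heteroscedastic F test [3] is designed to be robust to violations of the assumption of equal group variances. The main difference from the classical ANOVA F test is that the following weights are used:
$$w_i = \frac{n_i}{S_i^2}, \qquad W = \sum_{i=1}^{k} w_i, \qquad \tilde{Y} = \frac{1}{W}\sum_{i=1}^{k} w_i \bar{Y}_{i\cdot}.$$
Welch's heteroscedastic F test statistic is defined by
$$F_W = \frac{\frac{1}{k-1}\sum_{i=1}^{k} w_i(\bar{Y}_{i\cdot} - \tilde{Y})^2}{1 + \frac{2(k-2)}{k^2-1}\sum_{i=1}^{k}\frac{(1 - w_i/W)^2}{n_i - 1}},$$
and the null hypothesis (A1) is rejected at level $c$ if $F_W \ge F_{c,k-1,\nu_W}$, where
$$\nu_W = \left[\frac{3}{k^2-1}\sum_{i=1}^{k}\frac{(1 - w_i/W)^2}{n_i - 1}\right]^{-1}.$$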
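Welch's statistic can be sketched in the same way. The function name `welch_f` is hypothetical; the sketch assumes the weights $w_i = n_i/S_i^2$ and the usual Welch approximation $\nu_W = (k^2-1)/\{3\sum_i (1-w_i/W)^2/(n_i-1)\}$ for the denominator degrees of freedom.

```python
from statistics import mean, variance


def welch_f(groups):
    """Welch's heteroscedastic F statistic and (k - 1, nu_W)."""
    k = len(groups)
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]  # w_i = n_i / S_i^2
    W = sum(w)
    center = sum(wi * mi for wi, mi in zip(w, means)) / W  # weighted grand mean
    a = sum(wi * (mi - center) ** 2 for wi, mi in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / W) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    b = 2 * (k - 2) / (k ** 2 - 1) * lam  # correction term in the denominator
    nu = (k ** 2 - 1) / (3 * lam)
    return a / (1 + b), (k - 1, nu)
```

For the equal-variance toy groups `[[1, 2, 3], [2, 3, 4], [5, 6, 7]]`, the statistic is 78/7 on (2, 4) degrees of freedom, somewhat smaller than the classical F value of 13 for the same data because of the correction term.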
Yuen's test, i.e., the robust modification of Welch's heteroscedastic F test, is designed to be robust to departures from normality. The test is obtained by using the sample trimmed means and Winsorized variances instead of the sample means and variances. Let $Y_{i(1)} \le Y_{i(2)} \le \cdots \le Y_{i(n_i)}$ denote the order statistics for the $i$-th sample. Let $q_i = [n_i \alpha_i] + 1$ and $r_i = n_i - [n_i \beta_i]$, where $0 < \alpha_i < 1/2$ and $0 < \beta_i < 1/2$ represent the proportions of observations trimmed from the left and right tails of the distribution, respectively, and $[x]$ denotes the largest integer less than or equal to $x$.
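Then $m_i = r_i - q_i + 1$ represents the effective sample size after trimming, and the sample trimmed mean of the $i$-th group is defined as
$$\bar{Y}_{\alpha\beta i} = \frac{1}{m_i}\sum_{j=q_i}^{r_i} Y_{i(j)}.$$
Let $W_{ij}$ represent the new observations obtained by replacing the trimmed observations in the lower and upper tails with the lowest and highest untrimmed values of the sample, i.e.,
$$W_{ij} = \begin{cases} Y_{i(q_i)}, & Y_{ij} \le Y_{i(q_i)},\\ Y_{ij}, & Y_{i(q_i)} < Y_{ij} < Y_{i(r_i)},\\ Y_{i(r_i)}, & Y_{ij} \ge Y_{i(r_i)}. \end{cases}$$
The sample Winsorized variance for the $i$-th group is computed as
$$S_{wi}^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(W_{ij} - \bar{W}_{i\cdot})^2, \qquad \bar{W}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} W_{ij}.$$
With $w_i = m_i(m_i - 1)/\{(n_i - 1)S_{wi}^2\}$, $W = \sum_{i=1}^{k} w_i$ and $\tilde{Y}_{\alpha\beta} = W^{-1}\sum_{i=1}^{k} w_i \bar{Y}_{\alpha\beta i}$, Yuen's test statistic is given by
$$F_{YT} = \frac{\frac{1}{k-1}\sum_{i=1}^{k} w_i(\bar{Y}_{\alpha\beta i} - \tilde{Y}_{\alpha\beta})^2}{1 + \frac{2(k-2)}{k^2-1}\sum_{i=1}^{k}\frac{(1 - w_i/W)^2}{m_i - 1}}.$$
The null hypothesis of equal (trimmed) means is rejected at level $c$ if $F_{YT} \ge F_{c,k-1,\nu_{YT}}$, where
$$\nu_{YT} = \left[\frac{3}{k^2-1}\sum_{i=1}^{k}\frac{(1 - w_i/W)^2}{m_i - 1}\right]^{-1}.$$
Note that this test reduces to Welch's heteroscedastic F test when there is no trimming.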
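The trimming indices, the Winsorized variance, and Yuen's statistic can be sketched as follows. This is not the `t1way` implementation used in the paper: the function names are hypothetical, only symmetric trimming $\alpha_i = \beta_i = c$ is handled, and the sketch assumes the Yuen-type weights $w_i = m_i(m_i-1)/\{(n_i-1)S_{wi}^2\}$. The tiny `1e-9` guard inside the floor is an implementation choice that protects $[n_i c]$ against floating-point rounding; it is not part of the definition. With `c = 0`, every $m_i = n_i$ and $S_{wi}^2 = S_i^2$, so the function returns Welch's statistic, which gives a convenient sanity check.

```python
import math
from statistics import variance


def trim_indices(n, alpha, beta):
    """1-based q = [n*alpha] + 1, r = n - [n*beta], and m = r - q + 1."""
    q = math.floor(n * alpha + 1e-9) + 1  # guard against rounding in n*alpha
    r = n - math.floor(n * beta + 1e-9)
    return q, r, r - q + 1


def trimmed_stats(y, alpha, beta):
    """Sample trimmed mean, Winsorized variance S_w^2, and effective size m."""
    ys = sorted(y)
    q, r, m = trim_indices(len(ys), alpha, beta)
    low, high = ys[q - 1], ys[r - 1]             # Y_(q) and Y_(r)
    tmean = sum(ys[q - 1:r]) / m                 # average of Y_(q), ..., Y_(r)
    wins = [min(max(v, low), high) for v in ys]  # Winsorized values W_ij
    return tmean, variance(wins), m              # variance divides by n - 1


def yuen_f(groups, c):
    """Yuen's F_YT and (k - 1, nu_YT) under symmetric trimming at level c."""
    k = len(groups)
    st = [trimmed_stats(g, c, c) for g in groups]
    w = [m * (m - 1) / ((len(g) - 1) * s2) for (_, s2, m), g in zip(st, groups)]
    W = sum(w)
    center = sum(wi * t for wi, (t, _, _) in zip(w, st)) / W
    a = sum(wi * (t - center) ** 2 for wi, (t, _, _) in zip(w, st)) / (k - 1)
    lam = sum((1 - wi / W) ** 2 / (m - 1) for wi, (_, _, m) in zip(w, st))
    f_yt = a / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    return f_yt, (k - 1, (k ** 2 - 1) / (3 * lam))
```

For instance, with the toy sample `[1, 2, 3, 4, 100]` and `c = 0.2`, one observation is trimmed from each tail, the trimmed mean is 3, and the Winsorized values are `[2, 2, 3, 4, 4]`, so the outlying value 100 no longer inflates the variance estimate.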

Appendix B. EL-Based ANOVA for Means
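Let $F_i$ denote a candidate for the true unknown distribution $F_{i0}$, and let $v_{ij} = F_i\{Y_{ij}\}$ denote the jump of $F_i$ at $Y_{ij}$. The EL for the $i$-th sample is $L(F_i) = \prod_{j=1}^{n_i} v_{ij}$, which corresponds to a multinomial distribution defined on the $i$-th sample by attaching a weight $v_{ij}$ to each $Y_{ij}$. The weights $v_{ij} = v_{ij}(\mu_i)$ satisfy the conditions
$$v_{ij} \ge 0, \qquad \sum_{j=1}^{n_i} v_{ij} = 1, \qquad \sum_{j=1}^{n_i} v_{ij}(Y_{ij} - \mu_i) = 0.$$
The function $L(F_i)$ attains its maximum value when $v_{ij} = n_i^{-1}$. Similar to the classical approach based on the parametric likelihood, the profile EL ratio function is defined as
$$R(\mu_i) = \max\Big\{\prod_{j=1}^{n_i} n_i v_{ij} : v_{ij} \ge 0,\ \sum_{j=1}^{n_i} v_{ij} = 1,\ \sum_{j=1}^{n_i} v_{ij}(Y_{ij} - \mu_i) = 0\Big\}.$$
For an ANOVA model, Owen [15] defined the $k$-sample EL as the product of $k$ group-specific empirical likelihoods. Therefore, given the $k$ samples, the profile EL ratio function can be defined as follows:
$$R(\mu) = \max\Big\{\prod_{i=1}^{k}\prod_{j=1}^{n_i} n_i v_{ij} : v_{ij} \ge 0,\ \sum_{j=1}^{n_i} v_{ij} = 1,\ \sum_{j=1}^{n_i} v_{ij}(Y_{ij} - \mu) = 0,\ i = 1, \ldots, k\Big\}, \quad (B1)$$
where $v_{ij} = v_{ij}(\mu)$. Under the null hypothesis (A1), the $k - 1$ contrasts between means are constrained to be zero, and if $\mu = \mu_0 + O(n_0^{-1/2})$, where $\mu_0$ is the true unknown common mean and $n_0 = \min_{1 \le i \le k} n_i$, then as $n_0 \to \infty$,
$$-2\log R(\mu) = \sum_{i=1}^{k} w_i(\bar{Y}_{i\cdot} - \mu)^2 + o_p(1), \qquad -2\log R(\bar{Y}) \xrightarrow{d} \chi^2_{k-1}, \quad (B2)$$
where $\bar{Y}_{i\cdot}$ is the sample mean for the $i$-th sample, $\bar{Y}$ is the common mean estimator, and the weights $w_i$ are inversely proportional to the sample variances $S_i^2$, i.e., $w_i = n_i/S_i^2$.
Similar to (B1), define the profile EL ratio function over the trimmed samples, that is, as if the $m_i$ observations in each trimmed sample were independent:
$$R(\mu_{\alpha\beta}) = \max\Big\{\prod_{i=1}^{k}\prod_{j=q_i}^{r_i} m_i v_{ij} : v_{ij} \ge 0,\ \sum_{j=q_i}^{r_i} v_{ij} = 1,\ \sum_{j=q_i}^{r_i} v_{ij}(Y_{i(j)} - \mu_{\alpha\beta}) = 0,\ i = 1, \ldots, k\Big\}. \quad (D1)$$
It is important to note that the weights are no longer a function of the common population mean but of the common population trimmed mean, that is, $v_{ij} = v_{ij}(\mu_{\alpha\beta})$. As a consequence, we obtain an EL ratio test (ELRT) for a different null hypothesis, claiming the equality of population trimmed means (see [8,10]), that is,
$$H_0^T: \mu_{\alpha\beta 1} = \mu_{\alpha\beta 2} = \cdots = \mu_{\alpha\beta k} = \mu_{\alpha\beta}. \quad (D2)$$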
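Numerically, each group's profile EL ratio reduces to a one-dimensional root-finding problem for a Lagrange multiplier, and the $k$-sample statistic is the minimum, over the common value, of the sum of the group contributions $-2\log R_i$. The Python sketch below is not the authors' R implementation: the names are hypothetical, and bisection for the multiplier and ternary search for the common value are our choices. It computes $\min_\mu\{-2\log R(\mu)\}$ for (B1) and, when a trimming level `c > 0` is given, the analogous quantity over the trimmed order statistics as in (D1). The ternary search is justified because $-2\log R$ is convex in the common value: given $\sum_j v_{ij} = 1$, the mean constraint reads $\sum_j v_{ij} Y_{ij} = \mu$, which is linear in $(v, \mu)$ jointly. For the means, the statistic could be referred to $\chi^2_{k-1}$ (e.g., `scipy.stats.chi2.sf(stat, k - 1)` if SciPy is available); the calibration of the trimmed version follows the asymptotic results for trimmed means in the appendices and is not reproduced here.

```python
import math


def solve_lambda(d, iters=100):
    """Lagrange multiplier: root of sum_j d_j / (1 + lam * d_j) = 0.

    Needs min(d) < 0 < max(d). The sum is strictly decreasing in lam on
    (-1/max(d), -1/min(d)), the interval on which every 1 + lam * d_j > 0.
    """
    lo, hi = -1.0 / max(d), -1.0 / min(d)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(dj / (1 + mid * dj) for dj in d) > 0:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def neg2_log_el(y, mu):
    """-2 log R(mu) for one sample; +inf unless min(y) < mu < max(y)."""
    d = [v - mu for v in y]
    if not min(d) < 0 < max(d):
        return math.inf
    lam = solve_lambda(d)
    return 2 * sum(math.log1p(lam * dj) for dj in d)


def trim(y, c):
    """Y_(q), ..., Y_(r) with q = [n c] + 1 and r = n - [n c]."""
    ys = sorted(y)
    k = math.floor(len(ys) * c + 1e-9)  # small guard against rounding in n * c
    return ys[k:len(ys) - k]


def el_anova(groups, c=0.0, iters=60):
    """min over mu of sum_i -2 log R_i(mu): (B1) for c = 0, (D1) for c > 0."""
    samples = [trim(g, c) for g in groups]
    lo = max(min(s) for s in samples)
    hi = min(max(s) for s in samples)
    if lo >= hi:
        return math.inf, None  # no common value inside every sample's range
    f = lambda mu: sum(neg2_log_el(s, mu) for s in samples)
    for _ in range(iters):  # ternary search on a convex function
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    mu_hat = 0.5 * (lo + hi)
    return f(mu_hat), mu_hat
```

For two identical samples the statistic is (numerically) zero at their common mean, while `el_anova([[1, 2, 3, 4, 5], [3, 4, 5, 6, 7]])` locates the common value at 4 by symmetry and returns a clearly positive statistic.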
When the underlying distribution of the data in each group is symmetric, the two hypotheses (A1) and (D2) are equivalent if symmetric trimming is performed. This equivalence does not hold for skewed distributions, for which it may be preferable to compare trimmed means rather than means [5]. The following result holds.
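Proof. By a Lagrange multiplier argument, it can be shown (see, for example, [9]) that the $v_{ij}$, $i = 1, 2, \ldots, k$, that maximize $R(\mu_{\alpha\beta})$ are given by
$$v_{ij} = \frac{1}{m_i\{1 + \lambda_i(Y_{i(j)} - \mu_{\alpha\beta})\}}, \qquad j = q_i, \ldots, r_i, \quad (D5)$$
where the Lagrange multiplier $\lambda_i$ is the solution to
$$\sum_{j=q_i}^{r_i}\frac{Y_{i(j)} - \mu_{\alpha\beta}}{1 + \lambda_i(Y_{i(j)} - \mu_{\alpha\beta})} = 0.$$
Then, by substituting $v_{ij}$ from (D5) into expression (D1), we obtain
$$-2\log R(\mu_{\alpha\beta}) = 2\sum_{i=1}^{k}\sum_{j=q_i}^{r_i}\log\{1 + \lambda_i(Y_{i(j)} - \mu_{\alpha\beta})\}.$$
The maximum empirical likelihood estimator $\bar{Y}_{\alpha\beta}$ is the solution to $\partial R(\mu_{\alpha\beta})/\partial\mu_{\alpha\beta} = 0$. Since each $\lambda_i$ satisfies the equation above and, consequently, $\sum_{j=q_i}^{r_i}\{1 + \lambda_i(Y_{i(j)} - \mu_{\alpha\beta})\}^{-1} = m_i$, differentiation gives
$$\frac{\partial}{\partial\mu_{\alpha\beta}}\{-2\log R(\mu_{\alpha\beta})\} = -2\sum_{i=1}^{k} m_i\lambda_i(\mu_{\alpha\beta}).$$
It follows that $\bar{Y}_{\alpha\beta}$ satisfies $\sum_{i=1}^{k} m_i\lambda_i(\bar{Y}_{\alpha\beta}) = 0$, and expression (D3) follows. Since, according to Theorem C1, the corresponding one-sample result holds for each of the trimmed samples $i = 1, \ldots, k$ at the true value $\mu_0$, summing over the groups and using the same arguments that lead to the result stated in (B2) proves the result stated in (D3).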
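Two facts used in the proof can be checked numerically: the weights (D5) form a probability vector that satisfies the trimmed-mean constraint, and the maximum EL estimator of the common trimmed mean satisfies $\sum_i m_i\lambda_i(\bar{Y}_{\alpha\beta}) = 0$ (differentiating $2\sum_i\sum_j \log\{1 + \lambda_i(Y_{i(j)} - \mu)\}$ and using the multiplier equation gives $-2\sum_i m_i\lambda_i$). The sketch below uses made-up toy data and hypothetical helper names, and it locates $\bar{Y}_{\alpha\beta}$ by a ternary search on the convex function $-2\log R$ rather than by solving the score equation directly.

```python
import math


def solve_lambda(d, iters=100):
    """Root of sum_j d_j / (1 + lam * d_j) = 0 with every 1 + lam * d_j > 0."""
    lo, hi = -1.0 / max(d), -1.0 / min(d)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(dj / (1 + mid * dj) for dj in d) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def trim(y, c):
    """Symmetrically trimmed order statistics Y_(q), ..., Y_(r)."""
    ys = sorted(y)
    k = math.floor(len(ys) * c + 1e-9)
    return ys[k:len(ys) - k]


def neg2_log_r(samples, mu):
    """-2 log R(mu) from (D1) via the weights (D5), plus the lambda_i(mu)."""
    lams = [solve_lambda([v - mu for v in s]) for s in samples]
    stat = 2 * sum(math.log1p(lam * (v - mu))
                   for lam, s in zip(lams, samples) for v in s)
    return stat, lams


def mele(groups, c, iters=60):
    """Ternary search for the maximum EL estimator of the common trimmed mean."""
    samples = [trim(g, c) for g in groups]
    lo = max(s[0] for s in samples)
    hi = min(s[-1] for s in samples)
    if lo >= hi:
        raise ValueError("trimmed samples have no common interior point")
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if neg2_log_r(samples, m1)[0] < neg2_log_r(samples, m2)[0]:
            hi = m2
        else:
            lo = m1
    mu_hat = 0.5 * (lo + hi)
    return mu_hat, samples, neg2_log_r(samples, mu_hat)[1]


# Hypothetical toy data (not from the paper), trimmed at c = 0.1
groups = [[2.1, 3.4, 1.9, 5.6, 2.8, 3.3, 9.7, 2.5, 3.0, 4.1],
          [1.2, 2.2, 2.9, 3.8, 2.4, 6.5, 3.1, 2.0, 2.7, 3.6],
          [3.5, 4.4, 2.6, 3.9, 12.3, 3.2, 4.0, 2.9, 3.7, 4.8]]
mu_hat, samples, lams = mele(groups, 0.1)
# MELE condition from the proof: sum_i m_i * lambda_i should be close to 0
print(sum(len(s) * lam for s, lam in zip(samples, lams)))
```

Each toy sample has $n_i = 10$, so trimming at `c = 0.1` removes one observation from each tail and leaves $m_i = 8$ order statistics per group.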