Testing Equality of Multiple Population Means under Contaminated Normal Model Using the Density Power Divergence

This paper considers the problem of comparing several means under the one-way Analysis of Variance (ANOVA) setup. In ANOVA, outliers and heavy-tailed error distribution can seriously hinder the treatment effect, leading to false positive or false negative test results. We propose a robust test of ANOVA using an M-estimator based on the density power divergence. Compared with the existing robust and non-robust approaches, the proposed testing procedure is less affected by data contamination and improves the analysis. The asymptotic properties of the proposed test are derived under some regularity conditions. The finite-sample performance of the proposed test is examined via a series of Monte-Carlo experiments and two empirical data examples—bone marrow transplant dataset and glucose level dataset. The results produced by the proposed testing procedure are favorably compared with the classical ANOVA and robust tests based on Huber’s M-estimator and Tukey’s MM-estimator.


Introduction
The analysis of variance (ANOVA) has become one of the most useful and powerful statistical approaches in diverse applications, such as biology, physics, chemistry, genetics, engineering, economics, psychology, and medicine. This omnibus procedure, developed by [1], has often been applied to the continuous data from more than two independent samples for exploratory and confirmatory data analysis (cf. [2]). ANOVA is statistically appealing since this approach specifies and quantifies the effect of different treatments on the observed outcomes by comparing two sources of variabilities, i.e., variations within and between groups, to assess the equality of group means (or to test the null hypothesis of no treatment effects) (cf. [3,4]).
The classical ANOVA test requires some restrictive assumptions, such as normality of the errors, homogeneity of group variances, and absence of outliers, which may not be satisfied in practice (cf. [3,[5][6][7]). In particular, one crucial issue that requires special attention is the presence of outliers that differ from the bulk of the data (cf. [8]). Outliers caused by the measurement error, recording error, and naturally atypical observations may be masked and have adverse effects on the efficiency of traditional estimators [8]. As a result, even a small deviation from the ideal conditions can make the test meaningless and lead to unreliable results. To downplay this problem, practitioners often use some ad-hoc methods to remove outliers. However, such an approach has shortcomings as it can lead to a significant loss of efficiency. Hence, a more appropriate strategy is to use a Some concluding remarks are given in Section 8, and the proofs of the technical results are shown in the Appendix A and Supplementary Materials.

Generalized Analysis of Variance Model
Let us consider the generalized ANOVA model as follows: y ij = µ i + ε ij , i = 1, 2, · · · , k; j = 1, 2, · · · , n i , (1) where y ij is the j-th observation under the i-th categorical group and ∑ k i=1 n i = N. Here, µ i is the unobserved fixed effect of the i-th group. We assume that the random errors ε ij are independent random variables with mean zero and finite variance. As we are dealing with a robust estimator, we do not assume that the error term necessarily follows a normal distribution but rather a contaminated normal distribution with p proportion outliers, where 0 ≤ p < 0.5. However, the target distribution for ε ij is N(0, σ 2 ) for all i = 1, 2, · · · , k and j = 1, 2, · · · , n i . Thus, the model parameter θ = (µ 1 , µ 2 , · · · , µ k , σ 2 ) T , with θ ∈ Θ, is robustly estimated to match the target distribution. We denote the target distribution of y ij , i.e., N(µ i , σ 2 ), as f θ (y ij |i), or simply f θ (y ij ). It is also referred to as the model distribution.
The following assumption is needed to define the true data generating distribution. Assumption 1. Suppose the true data generating distribution g(y ij ) contains p proportion outliers from an arbitrary distribution χ(y ij ), i.e., g(y ij ) = (1 − p) f θ 0 (y ij ) + pχ(y ij ), where 0 ≤ p < 0.5 and θ 0 ∈ Θ. We assume that there exist a small positive number γ 0 , such that η(γ) = max i y ij f γ θ 0 (y ij )χ(y ij )dy ij is sufficiently small for γ > γ 0 . A small value of η(γ) ensures that χ(·) is an outlying distribution as the effective mass of χ(·) lies at the tail of the model distribution f θ (·) [39]. Here, we relaxed the normality assumption from the classical ANOVA model; however, the main structure of the true distribution should be normal, only the tails may be different. If the main structure of the block distributions is not normal, one may consider a different model for f θ (y ij ). Although all the calculations in this paper are based on the normal model, one may follow the same techniques for an arbitrary model.
We also eliminate another crucial constraint from the classical ANOVA model: the error distributions are identical. We only need them to be mutually independent. Here, g(y ij ) is the true density of the i-th block, and different blocks may have different variances without violating Assumption 1. Thus, our approach allows heteroscedasticity if the outlying distribution causes it.

Density Power Divergence
Let us consider a family of models {F θ , θ ∈ Θ} with density f θ . We denote G as the class of all distributions having densities with respect to the Lebesgue measure. Suppose G ∈ G is the true distribution with density g. Then, the DPD measure between the model density f θ and the true density g is defined as follows: where γ is a tuning parameter [34]. Note that G is not necessarily a member of the model family F θ . Further, for γ = 0, the DPD measure is obtained as a limiting case of γ → 0 + , and is the same as the Kullback-Leibler (KL) divergence. Given a parametric model, we estimate θ by minimizing the DPD measure with respect to θ over its parametric space Θ. We call the estimator the minimum power divergence estimator (MDPDE). It is well-known that, for γ = 0, minimization of the KL-divergent is equivalent to maximization of the log-likelihood function. Thus, the maximum likelihood estimator (MLE) can be considered a special case of the MDPDE when γ = 0.
Let θ = (µ 1 , µ 2 , · · · , µ k , σ 2 ) T denote the parameter of the generalized ANOVA model (1). We have the model density For γ > 0, the DPD measure can empirically be written as where c(γ) = 1 Nγ ∑ k i=1 ∑ n i j=1 y ij g 1+γ (y ij )dy ij does not depend on θ. Using Equation (B.1) in Supplementary Materials, Equation (3) can be written as The MDPDE of θ is then obtained by minimizing d γ ( f θ , g) over θ ∈ Θ. Note that if the j-th observation under the i-th block is an outlier, then the value of f θ (y ij ) is very small compared to other observations. In that case, its contribution in the second term of Equation (3) is negligible when γ > 0; thus, the corresponding MDPDE becomes robust against outliers. On the other hand, when γ = 0, the KL divergent can be written For an outlying observation, the KL divergence measure diverges as f θ (y ij ) → 0. Therefore, the MLE breaks down in the presence of outliers as they dominate the loss function. In fact, the tuning parameter γ controls the trade-off between efficiency and robustness of the MDPDErobustness measure increases if γ increases, but at the same time, efficiency decreases.
The MDPDE of θ is obtained by directly minimizing the DPD measure given in (4). Alternatively, by solving the estimating equations (given in Supplementary Material C), an iterative algorithm for the MDPDE is as follows: for i = 1, 2, · · · , k, The above algorithm needs initial values for µ i and σ. To protect against outliers, we use the i-th block median for µ i for i = 1, 2, · · · , k, and a scaled median absolute deviation (MAD) for σ. The following lemma gives the interpretation of the parameter in the contaminated model g(·).
If η(γ) is sufficiently small, then, under the contaminated model, d γ ( f θ 0 , g) is the minimum for all θ ∈ Θ. Thus, the true value of θ is always θ 0 for γ > γ 0 . It ensures that the interpretation of θ 0 has the same meaning as the classical ANOVA model where the error distribution is normal. Therefore, we keep the target parameter free of γ in the subsequent sections.

Asymptotic Distribution of the MDPDE
In this section, we present the asymptotic distribution of the MDPDE when the data generating distribution G(y) is not necessarily a contaminated model. Let us define the score function as u θ (y ij ) = ∂ ∂θ log f θ (y ij ). For i = 1, 2, · · · , k and j = 1, 2, · · · , n i , we define The form of u θ (y ij ) is given in Supplementary Material D. We further define Here, as N → ∞, we also need n i /N → c i , such that c i > 0 for all i = 1, 2, · · · , k and ∑ i c i = 1. For the consistency and asymptotic distribution of the MDPDE, we need the following assumptions: (A1) The true density g(y ij ) is supported over the entire real line R.
There is an open subset ω ∈ Θ containing the best fitting parameter θ such that J is positive definite for all θ ∈ ω.
(A4) We denote δ(·) as the indicator function. Then, for all r and s, we have Under the independent heterogeneous setup, the above conditions are required to stabilize the matrices J and K for the existence of the asymptotic distribution (cf. [40][41][42]). These assumptions are satisfied by the true density g(y ij ) defined in Assumption 1. However, the following theorem is proved for a more general form of g(y ij ). Theorem 1. Under the regularity conditions (A1)-(A5), with probability tending to 1 as N → ∞, there exists θ, such that (i) θ is consistent for θ, and (ii) the asymptotic distribution of θ is given by Proof. The proof of the theorem is given in Appendix A.
The independent and non-identically distributed samples leading to heterogeneity in variances technically impose a computational burden (cf. [42]). Hence, a positive definite matrix J in assumption (A2) is required to stabilize the asymptotic variance of the MDPDE. Furthermore, the assumption (A4) and a generalized version of Khinchin's weak law of large numbers (cf. [43]) are needed to ensure consistency, while the asymptotic normality is guaranteed by the assumption (A5) and a multivariate extension of the Lindeberg-Levy central limit theorem.
Further calculations in the supplementary materials show that for the uncontaminated where S is a k × k dimensional diagonal matrix with i-th diagonal element n i /N. Thus, the variance of each component of µ increases as γ increases. Therefore, the efficiency of the MDPDE decreases as γ increases-the MLE being the most efficient estimator in pure data. However, our simulation studies show that the loss of efficiency is minimal unless γ is too large. On the other hand, the gain in robustness is significant for contaminated data.

Influence Function of the MDPDE
We access the extent of the resistance to outliers of our proposed estimator using the influence function approach of [26]. It measures the rate of asymptotic bias of an estimator to infinitesimal contamination in the distribution. A bounded influence function suggests that the corresponding estimator is robust against extreme outliers. Note that the MDPDE is an M-estimator [25] as the estimating equation can be written as This is obtained by differentiating d γ ( f θ , g) with respect to θ in Equation (3). Let G(y) be the true distribution function Y, and θ = T γ (G) be functional for the MDPDE. Following [34], the influence function of the MDPDE is given by where J is evaluated at the model when g = f θ , and ξ (ij) , given in Equation (E.6), is a fixed vector that does not depend on index i and j.

Remark 1.
Note that the score function u θ (y ij ) in Equation (D.3) of the Supplementary Materials is unbounded in y ij . As a result, the influence function of the MLE, i.e., the MDPDE with γ = 0, is unbounded. On the other hand, u θ (y ij ) f γ θ (y ij ) is bounded in y ij when γ > 0 as the corresponding terms can be written as y ij exp(y 2 ij ). So, the influence function of the MDPDE of θ is bounded in y ij when γ > 0. Moreover, IF(y ij , T γ , G) tends to zero as |y ij | → ∞, indicating a redescending effect for large vertical outliers. The higher the value of γ, the larger the down-weighting effect on the outliers.

Choice of the Optimum γ
One important use of the asymptotic distribution of the MDPDE is the selection of the optimum value of the DPD parameter γ. As the performance of the test depends on the corresponding estimator, we choose γ that is optimum in terms of robustness and efficiency of µ = ( µ 1 , µ 2 , · · · , µ k ) T . In practice, the user may work with a fixed value of γ depending on the desired level of robustness measure at the cost of efficiency. Alternatively, we may select a data-driven optimum γ. Following [44], we minimize the mean square error (MSE) of µ to obtain the optimum value of γ adaptively. Suppose Σ µ is the asymptotic variance of µ obtained from Theorem 1, assuming that the true distribution belongs to the model family. Let Σ µ be the estimate of Σ µ . The empirical estimate of the MSE, as the function of a pilot estimator µ P , is given by From Supplementary Material H, we find that Σ µ = (1+γ) 3 σ 2 (1+2γ) 3 2 S −1 . In particular, we recommend that a robust estimator, such as the MDPDE with γ ∈ (0.3, 0.5), can be used as a pilot estimator. One should then iterate this process by taking the previous stage's optimum γ as the current stage's pilot estimator and proceeding until convergence. It eliminates the sensitivity in the initial value of µ P as long as the initial estimate is robust. In our numerical examples, we have used this iterative procedure. Lemma 1 shows that the target parameter is the same for all γ for the contaminated model. Moreover, Theorem 1 proves that all µ converge to the target parameter. However, their small sample performance may be different depending on the contaminated proportion (p) and closeness of the contaminated distribution (χ) to the model distribution ( f θ ). Thus, selecting the DPD parameter γ in finite samples is important to get the best performance.

Testing of Hypothesis
Let us now consider the ANOVA test, where the null hypothesis assumes no treatment effects, i.e., The following m(·) function imposes k − 1 restrictions for the null hypothesis: where 0 k−1 is a zero vector of length k − 1.

Definition 1.
Let θ be the MDPDE of θ. The family of proposed Wald-type test statistics for testing the null hypothesis in (15) is given by where M(θ) = ∂m T (θ) ∂θ . When γ = 0, the Wald-type test statistic reduces to the classical Wald test for testing the null hypothesis in (15). We define a k × (k − 1)-dimensional matrix Then, the (k  (17) is simplified as where S is a k × k dimensional diagonal matrix with i-th diagonal element n i /N. In the following theorem, we present the asymptotic distribution of W N .

Theorem 2.
The asymptotic null distribution of the proposed Wald-type test statistics given in (19) is chi-square with k − 1 degrees of freedom.
Proof. The proof follows from Theorem 1 using the derivation given in [45].

Numerical Results
To investigate the empirical performance of our proposed method, an extensive simulation study under different sample sizes, block sizes, error distributions, and outlier types is performed. The performance of the proposed method is compared with the classical ANOVA test and two robust alternative methods based on Huber's M-estimator and Tukey's MM-estimator [25]. The latter two tests are implemented in R by combining the 'rlm' and 'Anova' functions from the 'MASS' and 'car' packages, respectively. For those estimators, we have used the default tuning parameters given in the corresponding functions. The robustness properties of the MDPDE depend on the choice of the tuning parameter, and thus, four fixed values of γ = 0.1, 0.2, 0.3, and 0.4 are considered. The optimum value of γ is determined based on the data-driven adaptive choice of γ as discussed in Section 4.2 and the pilot estimator is used iteratively until convergence. From now on, the DPD with optimum γ is abbreviated as "DPD(Opt.)".

Levels for Different Block Sizes and Error Distributions
We consider the generalized ANOVA model in (1) with k = 3 blocks of sizes; n 1 = 30, n 2 = 25, and n 3 = 35. First, we consider the standard normal errors ε ij ∼ N(0, 1) for all i and j. To check the empirical levels of different tests, the dataset is generated from the null hypothesis where µ 1 = µ 2 = µ 3 = 0. The empirical level is computed as the proportion of test statistics in 5000 replications that exceed the nominal χ 2 critical value at a 5% level of significance. The results are reported in the first column of Table 1. From the results, all the values are close to the nominal level. In addition, the MSE of µ (times N) for all the estimators is reported in the second column of Table 1. The ANOVA test is based on the MLE, theoretically the most efficient estimator under normal errors. The simulated results also show that the MLE gives the smallest MSE in pure data. The MSE of the MDPDE increases as the value of γ increases. In DPD(Opt.), we minimize the MSE of the block means ( µ), and the mean value of optimum γ comes out to be 0.0507, which is close to zero. However, as the algorithm uses a dummy value of the true parameter µ P iteratively, its efficiency is lower than the actual fixed γ that produces the minimum MSE. Thus, the corresponding empirical level is also slightly inflated. In the following simulation, the error distribution is changed to the Cauchy distribution, and we consider an additional block of size n 4 = 20. The empirical levels and MSEs computed for this case are reported in the third and fourth columns of Table 1, respectively.
From the results, the MLE breaks down, and the N times MSE of µ becomes 1.3 × 10 10 when the errors are heavy-tailed. The MDPDEs with small γ are affected by the heavy-tailed errors, and the corresponding tests become very conservative. On the other hand, the DPD tests with higher values of γ properly maintain the level of the test. The mean value of optimum γ is 0.5865. So, the algorithm adaptively selects a higher value of γ as the data contains some extreme values. The tests based on Huber and Tukey's estimators properly maintain the level of the test; however, the MSEs of µ are much higher than DPD(Opt.).
Furthermore, we consider two additional cases with k = 5 and k = 6, where additional block sizes are n 5 = 30 and n 6 = 50. The error distributions are the standard normal and t-distribution with 3 degrees of freedom (t 3 ), respectively. The results from the third simulation with k = 5 are similar to the first case with k = 3. On the other hand, the MSE of the MLE is still too large in the fourth case as t 3 is a heavy-tailed distribution, although less extreme than the Cauchy distribution. The empirical level of the ANOVA test improves; however, as demonstrated in the later part of our numerical results, the power of the test is considerably affected in such situations.

Remark 2.
Note that in our numerical analyses, the Cauchy and t-distributions, which follow the form of the true density g(y ij ) in Assumption 1, are considered as the structure of the central region resembles the normal model, i.e., only the tails are different. On the other hand, a chi-square error distribution with smaller degrees of freedom deviates much from the normal model, and thus, it creates a discrepancy in the empirical levels and loss of power for the DPD tests. As discussed in Section 2, in such cases, one needs to assume a different model for f θ (y ij ) and compute the test statistic accordingly.

Levels for Different Sample Sizes
Let us consider the generalized ANOVA Model in (1) with k = 4 blocks and equal sample size (n) in all blocks, i.e., n 1 = n 2 = n 3 = n 4 = n. The performance of the estimators is examined under the standard normal errors ε ij ∼ N(0, 1) for the increasing number of sample size per block (n = 20 to 100). The plot at the top left in Figure 1 displays the empirical levels of all tests for different values of the sample sizes. To avoid overlapping plots, we present the results only for one DPD test, DPD(Opt.), excluding the tests with fixed γ. From the results, DPD(Opt.) shows inflated levels, but it settles down rapidly around the nominal level as the sample size increases. Other tests, including the DPDs with fixed γ (not presented in the plot), perform well in maintaining the level of the test even in small sample sizes.

Effect of Outliers
In the following setups, the robustness properties of the estimators are evaluated in the presence of different types of outliers. A certain percentage (p%) of outliers are inserted using two different scenarios to generate contaminated data; (a) random contamination and (b) concentrated contamination. Following Assumption 1, the contamination schemes are as follows.

1.
The random outliers in the y-direction, i.e., random vertical outliers, are obtained by replacing p% original standard normal errors with ε it ∼ N(10, 1) in the generalized ANOVA model (1).

2.
Concentrated vertical outliers are generated by substituting p% errors in the first block by ε it ∼ N(10, 1).
The plots at the top right, bottom left, and bottom right in Figure 1 present the empirical levels of different tests in contaminated data. The plot on the top right in Figure 1 presents the results when the dataset is contaminated at random locations with 5% outliers. In this case, all the methods produce similar performance with their performance obtained when no outlier is present in the data (i.e., the plot on the top left in Figure 1). Outliers do not alter the level even for the classical ANOVA test, as all the blocks are equally affected by the outliers. DPD(Opt.) is slightly liberal as the optimum γ is estimated from the data. In the bottom plots, the first block is contaminated by the clustered outliers with 5% (left) and 10% (right) contamination levels. The results indicate that the clustered outliers drastically inflate the empirical levels of the ANOVA test. The Huber test eventually fails to maintain its level when the proportion of clustered outliers is large. On the other, the empirical levels of the DPD(Opt.) test are very close to the nominal level.

Empirical Powers
The empirical powers of the test procedures for all the outlier types and contamination levels are presented in Figure 2. To compute the power of the tests, the dataset is generated under the alternative hypothesis using the block means µ = (−0.4, 0.2, −0.1, 0.3) T . The top left plot in Figure 2 shows that the power of all four tests is similar when no outlier is present in the data, and the power converges to one as the sample size increases. DPD(Opt.) shows slightly higher power in small sample sizes. However, the level corrected power (not presented in the plot) is equivalent to other tests. From other plots in Figure 2, the classical ANOVA test is severely affected by both types of outliers. While the power of the Huber test is relatively high, it loses sufficient power, especially in the presence of clustered outliers at a large percentage. In other words, the Huber test is not fully robust to the clustered outliers. Compared with other tests, the proposed DPD(Opt.) produces improved power values in all cases. Moreover, it produces higher power that is not affected by the outlier types and contamination levels. While Tukey's test produces higher power than the classical ANOVA, the proposed DPD test gives even better power than Tukey's test even after level correction.
In a nutshell, the results produced by our simulation studies suggest that the performance of the proposed DPD test is similar to the classical ANOVA test when no outlier is present in the data. On the other hand, the DPD test with large values of γ yields an improved level and power values than the classical ANOVA and the test based on the Huber estimator. In addition, the data-dependent optimum MDPDE successfully produces the optimum performance and adequately balances the efficiency in pure data and robustness properties in the contaminated data. Moreover, our results indicate that the proposed method produces a competitive or even better level and power than the tests based on other M-estimators.

Remark 3.
We note that the error distributions are not identically distributed in the case of the clustered outliers. In addition, the error variance is different in the first block, i.e., the model is heteroscedastic because of the outliers. In this case, some of the assumptions for the classical ANOVA test are not satisfied, and thus, the empirical level breaks down, and the test losses significant power.
On the other hand, our proposed DPD test produces consistent results and successfully relaxes those assumptions.

Bone Marrow Transplant Dataset
The bone marrow transplant dataset, originally reported by [46], describes several hematologic diseases for 187 children and adolescents (112 males and 75 females) diagnosed with malignant (n = 155) and nonmalignant disorders (n = 32). The patients underwent unmanipulated allogeneic unrelated donor hematopoietic stem cell transplantation between 2000 and 2008. Their median age at transplant is 9.6 years (range: 0.6-20.2 years). With this dataset, our aim is to test if the average time to platelet recovery is related to the type of hematologic diseases. The platelet recovery is defined as a recovery of platelet count greater than 50 × 10 9 /L without transfusion support for three consecutive days. The first day of three consecutive days is regarded as the day of platelet engraftment. Because of some missing cases or patients without platelet recovery, 17 observations containing platelet recovery time of 10 6 days are excluded from the data. The remaining dataset has 142 patients with malignant disorders and 28 nonmalignant cases. Among the malignant disorders patients, there are cases of 62 acute lymphoblastic leukemia (ALL), 31 acute myelogenous leukemia (AML), 42 chronic myelogenous leukemia, and 7 lymphomas. Figure 3 presents the box plots and normal kernel density plots of platelet recovery time (in days) for different groups of patients. From Figure 3, the dataset has some large outliers, which motivates us to apply our proposed method to robustly test the equality of the average times to platelet recovery of different types of hematologic diseases.
The results are presented in Table 2. In this table, the first four columns denote the results of the ANOVA, DPD(Opt.), Huber, and Tukey's tests, respectively. From the results, the ANOVA test based on the MLE is considerably affected by the outliers, and the computed block means for this method are higher than the other methods. Note that the block medians of the five groups are 23, 25, 17.5, 21, and 17, respectively. In addition, the MLE produces a considerably larger standard deviation, σ = 36.83, compared with the robust methods, yielding a large p-value = 0.2298 for the ANOVA test. On the other hand, the robust methods produce smaller estimates for σ. Thus, the p-values obtained by the DPD and Tukey's tests are significant at a 5% level of significance, while Huber's test is on the borderline.  In this dataset, the block lymphoma has only seven patients, i.e., it includes only 4.12% of observations. Therefore, the test results may be biased due to the unbalanced case. To overcome this problem, we remove this block and re-compute the results for all the methods. The results are presented in the last four columns of Table 2. From the results, the p-value computed from the ANOVA is still large. On the other hand, the robust tests comfortably reject the null hypothesis at a 5% level of significance. Consequently, the results indicate that the platelet recovery time varies significantly depending on the type of hematologic disease. However, the classical ANOVA test fails to detect the difference because of the impacts of the large outliers.

Glucose Level Dataset
We analyze the glucose level dataset, where we are interested in determining the significant difference in average blood glucose levels among work types. The original dataset, available at https://healthdata.gov/ (accessed on 28 July 2022), is used to predict cerebral stroke based on 11 features (see [47]). The dataset contains 43,400 observations, including 6156 children aged below 16. The adults are categorized into four groups based on their work type-government, never worked, private, and self-employed. Figure 4 presents the box plots and normal kernel density plots of average glucose levels for different groups. This figure denotes that all the distributions have long tails to the right. In other words, the population of each group may follow a contaminated normal distribution. Therefore, it is expected that the robust tests may produce better results compared to the classical ANOVA.  Our results for the glucose level dataset are presented in the first four columns of Table 3. The null hypothesis claims that the means of the average glucose level in all categories are equal. All the tests excluding the proposed DPD(Opt.) come out to be significant as the p-values are almost identical to zero. Here, the group medians are 88.52, 92.35, 88.57, 91.61, and 94.68, respectively. From Table 3, the group means computed by the ANOVA are inflated dramatically because of the outliers. On the other hand, the estimates of the group mean obtained by the DPD(Opt.) are not significantly different. Similar to the bone marrow transplant dataset, this dataset is also unbalanced as only 117 people (0.41% of the sample size) have never worked. The other categories have sufficient sample sizes-6156 children, 5440 government jobs, 24,834 private jobs, and 6793 self-employed. Thus, the 117 observations belonging to the category 'never worked' are discarded from the dataset to obtain a balanced design. The results obtained for this balanced case are presented in the last four columns of Table 3. From the results, the computed p-values obtained by ANOVA (1.2 × 10 −161 ) and Huber (3.7 × 10 −61 ) tests are still very small. In this case, Tukey's test produces a large p-value (0.2890), and the corresponding estimates of the group means are close to 90. The p-value obtained by the DPD test is reduced compared with the one obtained from the unbalanced case, but it is still insignificant at the 5% level. Thus, it is evident from the results that the ANOVA and Huber tests produce false positive results for this dataset. On the other hand, the proposed and Tukey's tests show strong robustness against outliers.

Conclusions
In this study, we propose a robust procedure for testing the main effect in the one-way ANOVA model under mild restrictions. The test has a tuning parameter that controls the efficiency and robustness of the MDPDE of the treatment effect. In addition, we propose an adaptive method that estimates the tuning parameter without prior knowledge of the outliers. The proposed test can be used even if the normality assumption is violated at the tails of the distribution or errors are heteroscedastic because of the outliers. The empirical performance of the proposed method is evaluated via an extensive simulation study, and the results are favorably compared with existing robust and non-robust testing procedures. Our results indicate that the proposed method produces similar results to the classical ANOVA when no outlier is present in the data. On the other hand, the proposed method produces competitive or even significantly better results than the existing robust methods when outliers contaminate the data. Through several empirical data examples, we demonstrate that the proposed test can uncover both masking effects caused by outliers-blurring the actual difference when one exists and detecting a difference when none exists.
There are several ways in which the present study can be further extended. For instance, using a flexible formulation of the hypotheses obtained by a convenient contrast matrix as discussed by [23]; the proposed test can be extended to the more complex structure of the designs, such as factorial ANOVA. In addition, the proposed method can be used with the non-parametric inference procedures, such as the one proposed by [48], to incorporate the uncertainty associated with the underlying effect estimators and to handle the right-censored survival data. Differentiating Equation (3), it can be written as an M-estimator as follows where Let θ g be the true value of θ, then Taking a Taylor series expansion of Equation (A2), we get Under regularity condition (A1)-(A3), it can be easily shown that the reminder term √ NR N = o p (1). Now, we will show that Therefore, from Equation (A5), we will prove the theorem as First Part: So Equation (6) gives Second Part: From Equation (A4), we get Therefore, Finally, the asymptotic normality will be proved using the central limit theorem for the independent but not identical random variables using assumption (A5). A bound can be shown following Section 2.7 of [49] or Section 5 of [50] (also see [42]).