Empirical Likelihood Ratio Tests for Homogeneity of Multiple Populations in the Presence of Auxiliary Information

Abstract: The empirical likelihood ratio test (ELRT) statistic is constructed for testing the homogeneity of several nonparametric populations in the presence of some auxiliary information. It is shown that, under some regularity conditions and under the null hypothesis that all distribution functions of the populations are equal, the asymptotic distribution of the ELRT statistic is a chi-squared distribution. The proposed ELRT could be more powerful than the Kruskal–Wallis test, as the extra information can be efficiently employed by ELRT. The advantage of ELRT over T&P (2006) is that researchers do not need to select approximately normal statistics for inter-group comparisons, and ELRT is more suitable for the multi-population homogeneity test with a small sample size.
MSC: 62E20


Introduction
Suppose that there are $k$ ($k \ge 2$) populations, and the distribution function of the $i$-th population is $F(x; \theta_i)$ ($1 \le i \le k$), where $\theta_i \in A \subseteq \mathbb{R}^p$ are parameter vectors and $A$ is the parameter space. In other words, the $k$ populations share the same type of distribution but may differ in structure as $\theta_i$ ($1 \le i \le k$) varies. Consider the hypothesis $H_0: \theta_1 = \theta_2 = \cdots = \theta_k$.
This test for homogeneity arises, for example, in the comparison of a number of different treatments, processes, varieties, or locations, when one wishes to test whether these differences have any effect on an outcome X, where X can be a scalar or a vector.
If F is the normal distribution function, the standard analysis of variance (ANOVA) for testing the above hypothesis has been widely used by a number of investigators. For example, Dou [1] employed this method in a parametric study of a developed statistical model. However, the standard ANOVA is not suitable for other distributions.
Due to the complexity of the real world, the form of F may not be known in many applications. In this nonparametric setting, the Kruskal-Wallis test (KWT) provides tests of the null hypothesis that independent samples from two or more groups come from identical populations. Refer to Lehmann [2] for the theory and applications of KWT.
Here, we provide a brief definition of KWT and its limiting distribution. First, the observations from all samples are pooled and arranged in ascending order, and each observation is assigned its rank in this ordering. Tied observations are assigned the average of the rank positions they occupy. For example, if two equal observations occupy positions 3 and 4, each receives the rank 3.5. The KWT statistic for the $k$ independent samples, the $i$-th of size $n_i$, is
$$T = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1),$$
where $n = \sum_{i=1}^{k} n_i$ and $R_i$ is the sum of the ranks (from all samples pooled) for the $i$-th sample, that is,
$$R_i = \sum_{j=1}^{n_i} R_{ij},$$
where $R_{ij}$ is the rank (from all samples pooled) of the $j$-th observation in the $i$-th sample.
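As a minimal sketch of the ranking procedure described above, the statistic can be computed in plain Python as follows. The function names `pooled_ranks` and `kruskal_wallis` are illustrative and do not come from the paper. The standard form of the Kruskal-Wallis statistic is assumed, and no tie correction factor is applied beyond averaging the tied ranks.

```python
def pooled_ranks(samples):
    """Rank all observations jointly; tied values share their average rank."""
    pooled = sorted((x, i, j) for i, s in enumerate(samples) for j, x in enumerate(s))
    ranks = [[0.0] * len(s) for s in samples]
    pos = 0
    while pos < len(pooled):
        end = pos
        # Extend the block [pos, end] over all observations tied with pooled[pos].
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg_rank = (pos + 1 + end + 1) / 2.0  # ranks are 1-based
        for _, i, j in pooled[pos:end + 1]:
            ranks[i][j] = avg_rank
        pos = end + 1
    return ranks


def kruskal_wallis(samples):
    """Standard form T = 12/(n(n+1)) * sum(R_i^2 / n_i) - 3(n+1)."""
    ranks = pooled_ranks(samples)
    n = sum(len(s) for s in samples)
    total = sum(sum(r) ** 2 / len(r) for r in ranks)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


if __name__ == "__main__":
    groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    # Rank sums are 6, 15 and 24, so T equals 7.2 up to floating-point rounding.
    print(kruskal_wallis(groups))
```

With three fully separated groups of size three, the rank sums are 6, 15 and 24, which makes the example easy to verify by hand.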
The null hypothesis of this test is that all $k$ distribution functions are equal. It is shown that, under the null hypothesis and some regularity conditions, $T \xrightarrow{d} \chi^2_{k-1}$. As mentioned above, the form of $F$ may not be known in many applications, and KWT can then be used to test the homogeneity of multiple populations. As KWT constructs its statistic from sample ranks, its power is good when the sample size is large. However, when the sample size is small, a statistic based on ranks carries much less of the information in the sample, and the power of KWT can deteriorate markedly. We therefore turn to the empirical likelihood method, which is briefly described next.
The empirical likelihood method, a nonparametric technique for statistical inference, was introduced by Owen [3,4] and has many advantages over other nonparametric methods such as the normal-approximation-based method and the bootstrap method, as put forward by Hall and La Scala [5] and Hall [6]. Wilks' theorem, the Bartlett correction and the ability to use auxiliary information are three striking properties of the empirical likelihood method. Chen and Qin [7] showed that the empirical likelihood method can be applied naturally to finite population estimation problems, and that more accurate statistical inference can be obtained through the effective use of auxiliary information. Zhang [8] developed a new class of M function estimators and quantile estimators with some auxiliary information, using the empirical likelihood technique. A natural question is whether and how an empirical likelihood method can efficiently use the auxiliary information to decide whether several samples should be regarded as coming from the same population. In this paper, an empirical likelihood ratio test (ELRT) statistic is constructed for testing the homogeneity of several nonparametric populations in the presence of some auxiliary information. Since the auxiliary information is not employed in the KWT method, KWT may be less powerful than ELRT for testing the homogeneity of population distributions. A comparison between ELRT and KWT is presented in Section 3.
We note that there exist a few other approaches that allow auxiliary information to be incorporated into statistical testing. For example, the method of Tarima and Pavlov [9] (T&P (2006)), which is based on auxiliary information in the form of vectors of unbiased estimates, may be used in the context of this article. The asymptotic properties of that method are analyzed by Albertus [10]. Tarima and Pavlov used the additional information to construct parameter estimators, improving the estimation by adding data sources. We therefore compare ELRT with T&P (2006) separately in the numerical simulations.
The form of $F$ is unknown in the present study. However, it is assumed that some auxiliary information about the distribution function $F(x; \theta)$ is available in the sense that there exist $r$ ($r > p$) known functions $g_1(x; \theta), g_2(x; \theta), \cdots, g_r(x; \theta)$ such that
$$E\{g(X; \theta)\} = 0, \qquad (2)$$
where $X \sim F(x; \theta)$ and $g(x; \theta) = (g_1(x; \theta), g_2(x; \theta), \cdots, g_r(x; \theta))^{\tau}$ is an $r$-dimensional vector.
Equation (2) defines a group of estimating equations. Those equations are widely applicable and particularly powerful when the data model is not specified by a full parametric likelihood function, as elaborated by Hansen [11] and Godambe and Heyde [12] among many others. Qin and Lawless [13] showed that an empirical likelihood approach produces a semiparametric efficient parameter estimate. In this study, r > p is required. Excellent explanations related to this requirement are given by Qin and Lawless [13] and Zhang [8]. More related results of statistical inference using the estimating equations can be found in Wang and Chen [14] and Zhou et al. [15], among others.
Assumption (2) is natural in practice, since in most commonly used distribution families the distribution is determined by some of its moments, such as the mean, variance, skewness, kurtosis and so on. For example, if $X$ is the amount of a type of grain and one suspects that the amount of fertilizer causes differences in the amount of grain among several populations, we may set $\theta^{(1)} = EX$ and $g_1(x; \theta^{(1)}) = x - \theta^{(1)}$. On the other hand, if one suspects that the use of the fertilizer may change not only the amount of grain but also the variance of $X$, then we may set $\theta = (\theta^{(1)}, \theta^{(2)})^{\tau}$, where $\theta^{(2)}$ is the variance of $X$. These suspicions could be initially assessed by comparing the histograms of the data sets from the populations under consideration. In addition to the above (partial) information, we may know some extra information. For example, we may know some moments of $X$ or may know that the distribution of $X$ is symmetric about some point.
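As an illustration of how such extra information can enter (2), consider a scalar parameter ($p = 1$) with $\theta = EX$, and suppose it is additionally known that the distribution of $X$ is continuous and symmetric about its mean. The choice of $g_2$ below is an assumed example for this setting, not one taken from the simulations in this paper.

```latex
g(x;\theta) = \begin{pmatrix} g_1(x;\theta) \\ g_2(x;\theta) \end{pmatrix}
            = \begin{pmatrix} x - \theta \\ \mathbf{1}\{x \le \theta\} - \tfrac{1}{2} \end{pmatrix},
\qquad
E\{g(X;\theta)\} = \begin{pmatrix} EX - \theta \\ P(X \le \theta) - \tfrac{1}{2} \end{pmatrix} = 0 .
```

Here $r = 2 > p = 1$: the first component identifies the mean, and the second encodes the symmetry, since a continuous distribution that is symmetric about $\theta$ has median $\theta$.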
Based on (2), we will construct an empirical likelihood ratio test (ELRT) to test $H_0$. It is shown that the limiting distribution of the ELRT statistic under $H_0$ is $\chi^2_{(r-p)(k-1)}$, the chi-squared distribution with $(r-p)(k-1)$ degrees of freedom, and thus the testing method for $H_0$ is ready to use.
The rest of the paper is organized as follows. The main results of this study are presented in Section 2. Results of a simulation study on the finite sample performance of the ELRT are reported in Section 3. We conclude and give some remarks on our future work in Section 4. Finally, the proof of the main results is presented in Section 5.

Main Results
For $1 \le i \le k$, suppose that the data $X_{ij}$ ($j = 1, 2, \cdots, n_i$) are independently distributed as $F(x; \theta_i)$ (unknown) and that all $X_{ij}$ ($j = 1, 2, \cdots, n_i$; $i = 1, 2, \cdots, k$) are independent. Let $n = \sum_{i=1}^{k} n_i$ and $\theta_i \in \Theta \subset \mathbb{R}^p$, where the parameter space $\Theta$ is an open subset of $\mathbb{R}^p$. Let […]. Here, $p_{ij}$ is the probability mass with which the random variable $g(X, \theta)$ takes the value $g(X_{ij}, \theta)$; the $p_{ij}$ are non-negative and sum to 1. Similarly, $q_{ij}$ is the probability mass with which $g(X, \theta)$ takes the value $g(X_{ij}, \theta)$ when […]. Applying the method proposed by Qin and Lawless [13], the ELRT statistic for testing $H_0$ can be defined as […]. The ELRT rejects $H_0$ for large values of $-2 \log \lambda_n$.
For every $1 \le i \le k$, assume that 0 is in the convex hull of $\{g(X_{ij}; \theta), 1 \le j \le n_i\}$. Then, according to the Lagrange multiplier method, one can obtain […]. Similarly, for […]. Hence, […]. The log-empirical likelihood function for the data $X_{ij}$ ($j = 1, 2, \cdots, n_i$) is therefore defined by […]. Suppose that there is a $\hat{\theta}_n$ that maximizes $\ell_n(\theta)$; then $\hat{\theta}_n$ is called the maximum empirical likelihood estimator (MELE) of $\theta$. Suppose that, in addition, $\ell_n(\theta)$ is differentiable in $\theta$; then $\hat{\theta}_n$ will be a solution of the empirical likelihood equation […], where […]. Similarly, for $1 \le i \le k$, the log-empirical likelihood function for $X_{ij}$ ($j = 1, 2, \cdots, n_i$) is defined by […]. The MELE of $\theta$ under the $i$-th sample is denoted by $\hat{\theta}_{ni}$, which is a solution of the empirical likelihood equations […]. We assume that $\hat{\theta}_n$ and all $\hat{\theta}_{ni}$ are consistent estimators of $\theta$ as $\min_{1 \le i \le k} n_i \to \infty$. Then $\lambda_n$ can be rewritten as […].
Let $X$ be a population with distribution $F(x; \theta)$, $\theta_0$ be the true value of $\theta$, and $\|M\|$ be the $L_2$-norm of a matrix $M$. To obtain the asymptotic distribution of $\lambda_n$, we need the following regularity conditions, taken from Qin and Lawless [13] (pp. 305-306): (A) $E\{g(X; \theta_0) g^{\tau}(X; \theta_0)\}$ is positive definite, $\partial g(X; \theta)/\partial\theta$ is continuous in a neighborhood of $\theta_0$, $\|\partial g(x; \theta)/\partial\theta\|$ and $\|g(x; \theta)\|^3$ are bounded by a function $G(x)$ in this neighborhood, where $E\{G(X)\} < \infty$, and the rank of $E\{\partial g(X; \theta_0)/\partial\theta\}$ is $p$.
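The Lagrange multiplier step can be illustrated in the simplest case of a single sample and a scalar estimating function. The sketch below assumes $g(x; \mu) = x - \mu$ for a hypothesized mean $\mu$ and uses the standard empirical likelihood forms of Owen [3,4], namely weights $p_j = 1/\{n(1 + t g_j)\}$ with $t$ solving $\sum_j g_j/(1 + t g_j) = 0$. It is not the multi-sample statistic $\lambda_n$ of this paper, and the function names are hypothetical.

```python
import math


def el_multiplier(g, iters=200):
    """Solve sum(g_j / (1 + t*g_j)) = 0 for t by bisection (scalar g only)."""
    g_min, g_max = min(g), max(g)
    if not (g_min < 0.0 < g_max):
        raise ValueError("0 must lie strictly inside the convex hull of the g values")
    # All weights stay positive only for t in the open interval (-1/g_max, -1/g_min).
    lo, hi = -1.0 / g_max, -1.0 / g_min
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        score = sum(gj / (1.0 + mid * gj) for gj in g)
        # The score is strictly decreasing in t, so a positive score means t is too small.
        if score > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def el_weights_and_stat(x, mu):
    """Return the weights p_j and 2*sum(log(1 + t*g_j)) for g(x; mu) = x - mu."""
    g = [xj - mu for xj in x]
    t = el_multiplier(g)
    n = len(g)
    weights = [1.0 / (n * (1.0 + t * gj)) for gj in g]
    stat = 2.0 * sum(math.log(1.0 + t * gj) for gj in g)
    return weights, stat


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 7.0]
    weights, stat = el_weights_and_stat(data, mu=3.0)
    print(sum(weights))                                        # close to 1
    print(sum(w * (xj - 3.0) for w, xj in zip(weights, data)))  # close to 0
    print(stat)                                                 # positive
```

The convex hull assumption appears here as the requirement that the values $g_j$ take both signs. When $\mu$ equals the sample mean, the multiplier is zero, every weight equals $1/n$, and the statistic vanishes.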
The main results of this study are presented as follows.

Remark 1.
If we use $\lambda_n$ in Equation (11) instead of Equation (6) as the original definition, where $\hat{\theta}_n$ and $\hat{\theta}_{ni}$ are roots of the corresponding empirical likelihood equations, then Theorem 1 still holds. This can be seen from the proof of Theorem 1. In other words, $\hat{\theta}_n$ and $\hat{\theta}_{ni}$ do not need to be the MELEs for the conclusion of Theorem 1 to hold.
To sum up, we constructed an ELRT statistic for testing the homogeneity of several nonparametric populations in the presence of some auxiliary information when the population distribution is unknown, and proved that the asymptotic distribution of the ELRT statistic is a chi-square distribution under some regularity conditions when the null hypothesis is true. In the next section, we calculate the rejection rates of ELRT under several alternatives by numerical simulation and compare them with those of the Kruskal-Wallis test, and thus compare the powers of the two tests.

Simulation Results
Several commonly used distribution families were used in our simulations. The specific distributions and their parameters are listed in Table 1.
In this study, only three populations were compared. In the simulations, it was supposed that only the means of the populations are known. First, for each combination of sample sizes, we generated three samples from the distribution with its true parameter value under the null hypothesis and calculated the value of $-2 \log \lambda_n$. The simulation was repeated 5000 times to obtain 5000 corresponding values of $-2 \log \lambda_n$. Then, the quantiles of these simulated values were compared with the quantiles of the chi-square distribution in Theorem 1. Finally, Q-Q plots were drawn for ELRT and, under the same conditions, for KWT (Figures 1-6). Here the abscissa is the theoretical quantile, and the ordinate is the quantile of the simulated statistic. It can be seen from Figures 1-6 that, when the null hypothesis is true, the Q-Q plots of ELRT and KWT support the conclusion that the test statistics asymptotically follow the chi-square distribution.
At the same time, the simulated rejection rates of ELRT and KWT under several alternatives were compared using 5000 Monte Carlo trials with various sample sizes. The rejection rate is calculated as follows: reject.rate.KWT = sum(reslt.KWT > quantl.KWT)/m, where reslt.KWT contains the values of the KWT statistic calculated from the samples generated under the alternatives, quantl.KWT is the corresponding critical value from the chi-square distribution, and m is the number of simulated samples. The significance level was always set to 0.05 in the simulations. The results of these comparisons are reported in Table 2. In addition, we simulated the rejection rates of KWT and ELRT under the null hypothesis, and the results are shown in Table 3. From these results, it can be seen that the simulated powers of both tests are quite good even for moderate sample sizes and improve as the sample sizes increase, and that ELRT performs better than KWT.
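The rejection-rate calculation for KWT can be sketched as follows. This is only an illustration under stated assumptions: the three populations are taken to be normal with unit variance (the distributions of Table 1 are not reproduced here), the function names are hypothetical, and the critical value uses the fact that a chi-square distribution with 2 degrees of freedom is exponential with mean 2, which restricts the sketch to three populations.

```python
import math
import random


def kw_stat(samples):
    """Kruskal-Wallis statistic for continuous data (ties are assumed absent)."""
    pooled = sorted((x, i) for i, s in enumerate(samples) for x in s)
    rank_sums = [0.0] * len(samples)
    for rank, (_, i) in enumerate(pooled, start=1):
        rank_sums[i] += rank
    n = len(pooled)
    total = sum(r * r / len(s) for r, s in zip(rank_sums, samples))
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def rejection_rate(means, size, m, alpha=0.05, seed=0):
    """Share of m simulated data sets in which KWT rejects at level alpha."""
    if len(means) != 3:
        raise ValueError("this sketch only handles three populations")
    rng = random.Random(seed)
    # Chi-square with 2 degrees of freedom: P(X > q) = exp(-q/2), so q = -2*log(alpha).
    quantile = -2.0 * math.log(alpha)
    rejections = 0
    for _ in range(m):
        # Assumed data model: normal populations with unit variance.
        samples = [[rng.gauss(mu, 1.0) for _ in range(size)] for mu in means]
        if kw_stat(samples) > quantile:
            rejections += 1
    return rejections / m


if __name__ == "__main__":
    print(rejection_rate([0.0, 0.0, 0.0], size=20, m=2000))  # null: near the level 0.05
    print(rejection_rate([0.0, 0.5, 1.0], size=20, m=2000))  # alternative: simulated power
```

Calling the function with equal means estimates the size of the test, and calling it with unequal means estimates its power, which mirrors the roles of Table 3 and Table 2.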
In addition, since T&P (2006) also uses additional information for parameter estimation, we separately compare the proposed ELRT with T&P (2006). The results are shown in Table 4, and they reveal some interesting features. For example, T&P (2006) depends more strongly on normality: when the samples being compared are close to normal, T&P (2006) performs very well, whereas when the samples deviate from normality, T&P (2006) shows poorer power than ELRT. In other words, the advantage of ELRT over T&P (2006) is that researchers do not need to select approximately normal statistics for inter-group comparisons. Moreover, in terms of sample size, ELRT is more suitable for the multi-population homogeneity test with a small sample size.

Conclusions
In this study, we discussed testing the homogeneity of populations when the population distribution is unknown, and constructed an ELRT statistic for testing the homogeneity of several nonparametric populations in the presence of some auxiliary information. We established the asymptotic distribution of the ELRT statistic theoretically and checked it numerically, and we calculated the rejection rates of ELRT under several alternatives and compared them with those of the Kruskal-Wallis test. In addition, the performance of ELRT and T&P (2006) was compared separately.
The results show that, firstly, the asymptotic distribution of the ELRT statistic is a chi-square distribution under some regularity conditions when the null hypothesis is true. Secondly, the rejection rates of ELRT under the alternatives are larger than those of KWT, including when the sample sizes are small, and they increase with the sample sizes. In other words, the proposed ELRT could be more powerful than the Kruskal-Wallis test, as the extra information can be efficiently employed by ELRT. Thirdly, the advantage of ELRT over T&P (2006) is that researchers do not need to select approximately normal statistics for inter-group comparisons. At the same time, compared with T&P (2006), ELRT is more suitable for the multi-population homogeneity test with a small sample size.
The proposed test can be applied in the field of biological information. For example, when two samples come from an experimental group and a control group, the statistic we constructed can be used to test whether the experimental treatment is effective. If the distributions of the two groups are equal, the treatment is ineffective; otherwise, it is effective. Although the main conclusions and simulation results obtained in this paper are encouraging, many problems remain to be discussed in the future. On the one hand, the study presented in this paper is based on simple random samples, so more complex cases (such as mixed cases or dependent samples) should be considered. On the other hand, only simulations for one-parameter distributions were completed in this paper, while simulations for multi-parameter distributions still need to be carried out. Therefore, in future work, we will carry out the simulations of the ELRT homogeneity test for multi-parameter distributions and construct new ELRT statistics for multi-population homogeneity under complex sampling.

Proofs
We first state a lemma which will be used in the proof of Theorem 1.

Lemma 1.
Let $A_k = (a_{ij})$ be a $k \times k$ ($k \ge 2$) symmetric matrix with $a_{ii} = r_i^{-1} - 1$ for $1 \le i \le k$ and $a_{ij} = -1$ for $i \ne j$, $1 \le i, j \le k$, where $r_i > 0$ for all $1 \le i \le k$ and $\sum_{i=1}^{k} r_i = 1$. Let $B_k = (b_{ij})$ be a $k \times k$ diagonal matrix with $b_{ii} = r_i^{1/2}$ for $1 \le i \le k$, and let $C_k = B_k A_k B_k$. Then $C_k^{\tau} = C_k$ and $C_k$ is an idempotent matrix with $\mathrm{tr}(C_k) = k - 1$.
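Before turning to the proof, the claim of Lemma 1 can be checked numerically for a particular choice of weights. The sketch below builds $A_k$ with off-diagonal entries equal to $-1$, forms $C_k = B_k A_k B_k$, and reports how far $C_k^2$ is from $C_k$ together with the trace. The helper names and the weights $(0.2, 0.3, 0.5)$ are chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    k = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(k)) for j in range(k)] for i in range(k)]


def build_c(r):
    """Form C_k = B_k A_k B_k from weights r_i > 0 that sum to 1."""
    k = len(r)
    a = [[(1.0 / r[i] - 1.0) if i == j else -1.0 for j in range(k)] for i in range(k)]
    b = [[r[i] ** 0.5 if i == j else 0.0 for j in range(k)] for i in range(k)]
    return matmul(matmul(b, a), b)


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


if __name__ == "__main__":
    weights = [0.2, 0.3, 0.5]
    c = build_c(weights)
    print(max_abs_diff(matmul(c, c), c))              # close to 0: C is idempotent
    print(sum(c[i][i] for i in range(len(weights))))  # close to k - 1 = 2
```

A check of this kind does not replace the proof, but it guards against sign or index slips in the definition of $A_k$.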
Proof of Lemma 1. Let $R_k = B_k^2$ and $1_k = (1, 1, \cdots, 1)^{\tau}$. Then $A_k = R_k^{-1} - 1_k 1_k^{\tau}$. It can be shown that $R_k^{1/2} 1_k 1_k^{\tau} R_k^{1/2} = ((r_i r_j)^{1/2})_{k \times k}$, where $(r_i r_j)^{1/2}$ is the $(i, j)$ element of the matrix. Combining this with $\sum_{i=1}^{k} r_i = 1$, one can show that $((r_i r_j)^{1/2})_{k \times k}$ is an idempotent matrix with trace 1. Notice that $C_k = I_k - R_k^{1/2} 1_k 1_k^{\tau} R_k^{1/2}$. It follows that $C_k$ is an idempotent matrix and $\mathrm{tr}(C_k) = k - 1$. The proof of Lemma 1 is thus complete.
Proof of Theorem 1. Let $S_{11} = -E\{g(X; \theta_0) g^{\tau}(X; \theta_0)\}$, $S_{12} = E\{\partial g(X; \theta)/\partial\theta |_{\theta = \theta_0}\}$, […]. Throughout the proof, we assume that $H_0$ holds true and the true value of $\theta$ is $\theta_0$. Rewrite $\lambda_n$ as […]. Employing the result in the proof of Theorem 2 in Qin and Lawless [13], we have […], where […] with A being an identity matrix. Similarly, for $1 \le i \le k$, […]. It follows that […], where $\otimes$ is the Kronecker product and $M_k = (m_{ij})$ is a $k \times k$ symmetric matrix with $m_{ii} = n/n_i - 1$ for $1 \le i \le k$ and $m_{ij} = -1$ for $i \ne j$, $1 \le i, j \le k$. Let $N_k = \mathrm{diag}((n/n_1)^{1/2}, \cdots, (n/n_k)^{1/2})_{k \times k}$ and
$$Z_n = \{N_k \otimes (-S_{11})^{-1/2}\} (n^{-1/2} Y_1^{\tau}, \cdots, n^{-1/2} Y_k^{\tau})^{\tau}. \qquad (16)$$
It can be shown, by the properties of the Kronecker product, that […], where $S = (N_k^{-1} M_k N_k^{-1}) \otimes \{(-S_{11})^{1/2} A (-S_{11})^{1/2}\}$. It is clear that $(-S_{11})^{1/2} A (-S_{11})^{1/2}$ is symmetric and idempotent with trace equal to $r - p$. On the other hand, using Lemma 1 with $r_i = n_i/n$, we can see that $N_k^{-1} M_k N_k^{-1}$ is symmetric and idempotent with trace equal to $k - 1$. It follows that $S$ is symmetric and idempotent with trace equal to $(r - p)(k - 1)$.
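The last step rests on standard properties of the Kronecker product, written out here for a symmetric idempotent $k \times k$ matrix $P$ and a symmetric idempotent $r \times r$ matrix $Q$. The symbols $P$ and $Q$ are introduced only for this illustration and stand for the two factors of $S$.

```latex
(P \otimes Q)^{\tau} = P^{\tau} \otimes Q^{\tau} = P \otimes Q, \qquad
(P \otimes Q)(P \otimes Q) = P^{2} \otimes Q^{2} = P \otimes Q, \qquad
\operatorname{tr}(P \otimes Q) = \operatorname{tr}(P)\,\operatorname{tr}(Q).
```

With $\operatorname{tr}(P) = k - 1$ and $\operatorname{tr}(Q) = r - p$, the trace of $S$ is $(r - p)(k - 1)$. Since the rank of an idempotent matrix equals its trace, this is the number of degrees of freedom of the limiting chi-squared distribution.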

Informed Consent Statement: Not applicable.
Data Availability Statement: Data sharing not applicable to this article, as no datasets were generated or analyzed during the current study. All data generated or analyzed in this study are generated by the corresponding probability distribution, and its parameters are presented in Table 2 of the numerical simulation.

Conflicts of Interest:
The authors declare no conflict of interest.