Minimum Phi-Divergence Estimators and Phi-Divergence Test Statistics in Contingency Tables with Symmetry Structure: An Overview

In recent years minimum phi-divergence estimators (MφE) and phi-divergence test statistics (φTS) have been introduced as a very good alternative to the classical likelihood ratio test and maximum likelihood estimator for different statistical problems. The main purpose of this paper is to present an overview of the main results obtained to date for contingency tables with symmetry structure on the basis of MφE and φTS.


Introduction
An interesting problem in a two-way contingency table is to investigate whether there are symmetric patterns in the data: cell probabilities on one side of the main diagonal are a mirror image of those on the other side. This problem was first discussed by Bowker [1], who gave the maximum likelihood estimator as well as a large-sample chi-square-type test for the null hypothesis of symmetry. The minimum discrimination information estimator was proposed in [2] and the minimum chi-squared estimator in [3]. In [4][5][6][7] new families of test statistics, based on ϕ-divergence measures, were introduced. These families contain as particular cases the test statistic given by [1] as well as the likelihood ratio test.
Let X and Y denote two ordinal response variables, each having I levels. When we classify subjects on both variables, there are I² possible combinations of classifications. The responses (X, Y) of a subject randomly chosen from some population have a probability distribution. Let p_ij = Pr(X = i, Y = j), with p_ij > 0, i, j = 1, ..., I. We display this distribution in a square table having I rows for the categories of X and I columns for the categories of Y. Consider a random sample of size n on (X, Y), and denote by n_ij the observed frequency in the (i, j)th cell, (i, j) ∈ I × I, with ∑_{i=1}^{I} ∑_{j=1}^{I} n_ij = n. The classical problem of testing for symmetry is

H_0 : p_ij = p_ji for all (i, j), (1)

versus

H*_1 : p_ij ≠ p_ji , for at least one (i, j) pair. (2)

This problem was considered for the first time by Bowker [1] using the Pearson test statistic

X² = ∑_{i<j} (n_ij − n_ji)² / (n_ij + n_ji), (3)

for which he established that X² ∼ χ²_k for large n, where k = ½ I(I − 1). In some real problems (e.g., medicine, psychology, sociology) the categorical response variables (X, Y) represent the measure after and before a treatment. In such situations our interest is to determine the treatment effect, i.e., whether X ≥ Y (we assume that X represents the measure after the treatment and Y before the treatment). In the following we understand that X is preferred or indifferent to Y, according to the joint likelihood ratio ordering, if and only if (iff) p_ij ≥ p_ji ∀ i ≥ j. In this situation the alternative hypothesis is

H_1 : p_ij ≥ p_ji ∀ i ≥ j, with p_ij > p_ji for at least one pair (i, j).

This problem was first considered by El Barmi and Kochar [8], who presented the likelihood ratio test for this testing problem and applied it to a real-life problem: they tested whether the vision of both eyes, for 7477 women, is the same against the alternative that the right eye has better vision than the left eye.
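Bowker's statistic involves only the off-diagonal pairs of counts. As a minimal numeric sketch (the `counts` table below is an illustrative example, not data from the paper):

```python
# Bowker's chi-square statistic for the symmetry hypothesis H0: p_ij = p_ji.
# X^2 = sum_{i<j} (n_ij - n_ji)^2 / (n_ij + n_ji), with df = I(I-1)/2.

def bowker_statistic(n):
    """Return Bowker's X^2 and its degrees of freedom for a square count table."""
    I = len(n)
    x2 = 0.0
    for i in range(I):
        for j in range(i + 1, I):
            if n[i][j] + n[j][i] > 0:  # empty symmetric pairs contribute nothing
                x2 += (n[i][j] - n[j][i]) ** 2 / (n[i][j] + n[j][i])
    df = I * (I - 1) // 2
    return x2, df

counts = [[20, 10, 5],
          [4, 30, 8],
          [2, 6, 25]]
x2, df = bowker_statistic(counts)
```

Under H_0, `x2` is compared with the upper α-quantile of a chi-square distribution with `df` degrees of freedom.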
In [5] these results were extended using ϕ-divergence measures.
In this paper we present an overview of contingency tables with symmetry structure on the basis of divergence measures. We pay special attention to the family of ϕ-divergence test statistics for testing H_0 versus H*_1, H_0 against H_1, and also for testing H_1 against the alternative H_2 of no restriction on the p_ij's. It is interesting to observe that we consider not only ϕ-divergence test statistics but also minimum ϕ-divergence estimators in order to estimate the parameters of the model.

Hypothesis Testing: H_0 versus H*_1
The hypothesis (1) can be written as

H_0 : p = p(g(β)), β ∈ B,

where the parameter β is defined by β = (g_ij ; i, j = 1, ..., I, (i, j) ≠ (I, I)), collecting the free cell probabilities of the symmetry model. The maximum likelihood estimator (MLE) of β can be defined as

β̂ = arg min_{β ∈ B} D_KL(p̂, p(g(β))),

where D_KL(p̂, p(g(β))) is the Kullback-Leibler divergence measure (see [13,14]) defined by

D_KL(p̂, p(g(β))) = ∑_{i=1}^{I} ∑_{j=1}^{I} p̂_ij log ( p̂_ij / p_ij(g(β)) ).

We denote θ̂ = g(β̂) and p(θ̂) = (p_11(θ̂), ..., p_II(θ̂))^T. It is well known that

p_ij(θ̂) = (p̂_ij + p̂_ji)/2, i = 1, ..., I, j = 1, ..., I.

Using the ideas developed in [15], we can consider the minimum ϕ_2-divergence estimator (Mϕ_2E), replacing the Kullback-Leibler divergence by a ϕ_2-divergence measure:

β̂_{ϕ_2} = arg min_{β ∈ B} D_{ϕ_2}(p̂, p(g(β))),

where

D_{ϕ_2}(p̂, p(g(β))) = ∑_{i=1}^{I} ∑_{j=1}^{I} p_ij(g(β)) ϕ_2 ( p̂_ij / p_ij(g(β)) ),

and we have (see [7,16]) an asymptotic expansion of β̂_{ϕ_2} around β whose linear term involves certain functions h_ij of the cell probabilities. It is not difficult to establish that the matrix I^S_F(θ) appearing in the asymptotic variance can be written in terms of I_F(β), the Fisher information matrix corresponding to β ∈ B.
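To make the role of the divergence concrete, the sketch below evaluates a ϕ-divergence D_ϕ(p, q) = ∑_ij q_ij ϕ(p_ij/q_ij) numerically, with the Kullback-Leibler choice ϕ(x) = x log x − x + 1 and the symmetrized table p_ij(θ̂) = (p̂_ij + p̂_ji)/2. The 2×2 table is a hypothetical illustration, not data from the paper:

```python
import math

def phi_divergence(p, q, phi):
    """D_phi(p, q) = sum_ij q_ij * phi(p_ij / q_ij), phi convex with phi(1) = 0."""
    I = len(p)
    return sum(q[i][j] * phi(p[i][j] / q[i][j]) for i in range(I) for j in range(I))

phi_kl = lambda x: x * math.log(x) - x + 1  # yields the Kullback-Leibler divergence

p_hat = [[0.2, 0.1], [0.3, 0.4]]            # hypothetical observed proportions
sym = [[(p_hat[i][j] + p_hat[j][i]) / 2 for j in range(2)] for i in range(2)]
d = phi_divergence(p_hat, sym, phi_kl)      # distance of the data to symmetry
```

The divergence vanishes iff p = q, so `d` measures how far the observed table is from its symmetrized projection.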
If we consider the family of power divergences

ϕ_(λ)(x) = (x^{λ+1} − x − λ(x − 1)) / (λ(λ + 1)), λ ≠ 0, −1,

we get the minimum power-divergence estimator θ̂^{S,λ} of θ, under the hypothesis of symmetry, whose expression is given by

p̂_ij^{S,λ} = ( (p̂_ij^{λ+1} + p̂_ji^{λ+1})/2 )^{1/(λ+1)} / ∑_{r=1}^{I} ∑_{s=1}^{I} ( (p̂_rs^{λ+1} + p̂_sr^{λ+1})/2 )^{1/(λ+1)}.

For λ = 0 we get

p̂_ij^{S,0} = (p̂_ij + p̂_ji)/2, i, j = 1, ..., I;

hence, we obtain the maximum likelihood estimator for symmetry introduced by [1]. For λ = −1 we obtain as a limit case

p̂_ij^{S,−1} = (p̂_ij p̂_ji)^{1/2} / ∑_{r=1}^{I} ∑_{s=1}^{I} (p̂_rs p̂_sr)^{1/2}, i, j = 1, ..., I,

i.e., the minimum discrimination information estimator for symmetry introduced and studied in [2]. For λ = 1 we get the minimum chi-squared estimator for symmetry introduced in [3],

p̂_ij^{S,1} = ( (p̂_ij² + p̂_ji²)/2 )^{1/2} / ∑_{r=1}^{I} ∑_{s=1}^{I} ( (p̂_rs² + p̂_sr²)/2 )^{1/2}.

We denote θ̂_{ϕ_2} = g(β̂_{ϕ_2}) and by p(θ̂_{ϕ_2}) the Mϕ_2E of the probability vector that characterizes the symmetry model. Based on p(θ̂_{ϕ_2}) it is possible to define a new family of statistics for testing (1) that contains as particular cases the Pearson test statistic as well as the likelihood ratio test. This family of statistics is given by

T_n^{ϕ_1}(θ̂_{ϕ_2}) = (2n / ϕ_1''(1)) D_{ϕ_1}(p̂, p(θ̂_{ϕ_2})). (13)

We can observe that the family (13) involves two functions ϕ_1 and ϕ_2, both belonging to Φ*. We use the function ϕ_2 to obtain the Mϕ_2E and ϕ_1 to obtain the family of statistics. If we consider ϕ_1(x) = ½(x − 1)² and ϕ_2(x) = x log x − x + 1, we get the Pearson test statistic whose expression was given in (3), and for ϕ_1(x) = ϕ_2(x) = x log x − x + 1 we get the likelihood ratio test

G² = 2 ∑_{i=1}^{I} ∑_{j=1}^{I} n_ij log ( 2n_ij / (n_ij + n_ji) ).

In the following theorem we present the asymptotic distribution of T_n^{ϕ_1}(θ̂_{ϕ_2}).

Theorem 1. Under the null hypothesis of symmetry given in (1), T_n^{ϕ_1}(θ̂_{ϕ_2}) converges in law to a chi-square distribution with m = ½ I(I − 1) degrees of freedom.

Proof. See Chapter 8 in [12].

Thus, for a given significance level α ∈ (0, 1), the critical value of T_n^{ϕ_1}(θ̂_{ϕ_2}) may be approximated by χ²_{m,α}, the upper 100α% point of the chi-square distribution with m degrees of freedom; i.e., we reject the hypothesis of symmetry iff

T_n^{ϕ_1}(θ̂_{ϕ_2}) > χ²_{m,α}. (15)

Now we are going to analyze the power of the test. Let q = (q_11, ..., q_II)^T be a point in the alternative hypothesis, i.e., there exist at least two indexes i and j for which q_ij ≠ q_ji. We denote by θ_a^{ϕ_2} the point of Θ verifying

θ_a^{ϕ_2} = arg min_{θ ∈ Θ_0} D_{ϕ_2}(q, p(θ)),

where Θ_0 is the subset of Θ corresponding to the symmetry model. The notation f_ij(q) indicates that the elements of the vector θ_a^{ϕ_2} depend on q. For instance, for the power-divergence family ϕ_(λ)(x) we have
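The closed forms above for λ ∈ {−1, 0, 1} are easy to check numerically. A minimal sketch (the function name and the input table are illustrative; λ = −1 is handled as the geometric-mean limit):

```python
import math

def min_power_divergence_symmetry(p_hat, lam):
    """Minimum power-divergence estimator of the symmetric cell probabilities.
    lam = 0  -> (p_ij + p_ji)/2 (MLE for symmetry);
    lam = -1 -> normalized geometric means (min. discrimination information);
    lam = 1  -> normalized quadratic means (min. chi-squared)."""
    I = len(p_hat)
    if lam == -1:  # limiting case of the general formula
        raw = [[math.sqrt(p_hat[i][j] * p_hat[j][i]) for j in range(I)]
               for i in range(I)]
    else:
        raw = [[((p_hat[i][j] ** (lam + 1) + p_hat[j][i] ** (lam + 1)) / 2)
                ** (1 / (lam + 1)) for j in range(I)] for i in range(I)]
    total = sum(sum(row) for row in raw)        # renormalize to a probability table
    return [[x / total for x in row] for row in raw]
```

For λ = 0 the normalizing constant is 1 and the estimator reduces exactly to the symmetrized table (p̂_ij + p̂_ji)/2.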

f_ij(q) = ( (q_ij^{λ+1} + q_ji^{λ+1})/2 )^{1/(λ+1)} / ∑_{r=1}^{I} ∑_{s=1}^{I} ( (q_rs^{λ+1} + q_sr^{λ+1})/2 )^{1/(λ+1)}, i, j = 1, ..., I.

We also denote θ̂^{S,ϕ_2} = (p̂_ij^{S,ϕ_2} ; i, j = 1, ..., I, (i, j) ≠ (I, I))^T, and then θ_a^{ϕ_2} = f(q), where f = (f_ij ; i, j = 1, ..., I)^T. If the alternative q is true, we have that p̂ tends to q and p(θ̂^{S,ϕ_2}) tends to p(θ_a^{ϕ_2}) in probability. If we define the function

ψ(p) = D_{ϕ_1}(p, p(f(p))),

we have ψ(p̂) = D_{ϕ_1}(p̂, p(θ̂^{S,ϕ_2})) and ψ(q) = D_{ϕ_1}(q, p(θ_a^{ϕ_2})). Then the random variables √n (ψ(p̂) − ψ(q)) and the first-order term of its Taylor expansion have the same asymptotic distribution. If we define

l_ij = ∂ψ(q)/∂q_ij (16)

and l = (l_ij ; i, j = 1, ..., I)^T, we have

√n ( D_{ϕ_1}(p̂, p(θ̂^{S,ϕ_2})) − D_{ϕ_1}(q, p(θ_a^{ϕ_2})) ) →_L N(0, σ²_{ϕ_1,ϕ_2}(q)), with σ²_{ϕ_1,ϕ_2}(q) = l^T Σ_q l,

where Σ_q = diag(q) − qq^T. If we consider the maximum likelihood estimator instead of the minimum ϕ-divergence estimator, we get the corresponding variance with ϕ_2(x) = x log x − x + 1. It is also interesting to observe that, if we consider the power-divergence measure, the cases λ → 0 and λ = 1 yield the likelihood ratio and Pearson statistics, respectively, and the corresponding asymptotic variances follow from (16). Based on the previous result we can formulate the following theorem.

Theorem 2
The asymptotic power of the test given in (15), at the alternative q, is given by

β_{n,ϕ_1,ϕ_2}(q) = 1 − Φ_n ( (√n / σ_{ϕ_1,ϕ_2}(q)) ( (ϕ_1''(1) / (2n)) χ²_{m,α} − D_{ϕ_1}(q, p(θ_a^{ϕ_2})) ) ),

where Φ_n(x) is a sequence of distribution functions tending uniformly to the standard normal distribution function Φ(x).
We now consider a contiguous sequence of alternative hypotheses that approaches the null hypothesis of symmetry. Consider the multinomial probability vector

p_n = p(g(β)) + d/√n, (18)

where d = (d_11, ..., d_II)^T is a fixed vector with ∑_{i=1}^{I} ∑_{j=1}^{I} d_ij = 0; recall that n is the total count parameter of the multinomial distribution and β ∈ B. As n → ∞, the sequence of multinomial probabilities {p_n}_{n∈N}, with p_n = (p_{n,ij}, i = 1, ..., I, j = 1, ..., I)^T, converges to a multinomial probability in H_0 at the rate O(n^{−1/2}). In the next theorem we present the asymptotic distribution of the family of test statistics defined in (13), under the contiguous alternative hypotheses given in (18); it is a noncentral chi-square distribution with m degrees of freedom and a noncentrality parameter depending on d.
An interesting simulation study can be seen in [7]. In that study some interesting alternatives to the classical Pearson test statistic and likelihood ratio test appear.
In expression (19), p̂ is the maximum likelihood estimator (MLE) of p, given by p̂ = [p̂_ij] with p̂_ij = n_ij/n, and p̂^(0) and p̂^(1) denote the MLEs of p under H_0 and H_1, respectively. These MLEs were obtained by [8]. Let

θ_ij = p_ij / (p_ij + p_ji), for i > j;

then H_0 : θ_ij = 1/2 (for i > j) and H_1 : θ_ij ≥ 1/2 (for i > j), and θ̂_ij = p̂_ij / (p̂_ij + p̂_ji). It follows that p̂^(0) and p̂^(1) are given by

p̂_ij^(0) = (p̂_ij + p̂_ji)/2, i, j = 1, ..., I,

and, for i > j,

p̂_ij^(1) = (p̂_ij + p̂_ji) max(θ̂_ij, 1/2), p̂_ji^(1) = (p̂_ij + p̂_ji)(1 − max(θ̂_ij, 1/2)),

with p̂_ii^(1) = p̂_ii. Then we have explicit expressions for D_ϕ(p̂, p̂^(0)) and D_ϕ(p̂, p̂^(1)). To solve the problem of testing H_1 against H_2, [8] considered the likelihood ratio test statistic T_12. This statistic is such that

T_12 = 2n D_KL(p̂, p̂^(1)),

where D_KL(p̂, p̂^(1)) is the Kullback-Leibler divergence given by (20) with ϕ(x) = ϕ_(0)(x) defined above. Then the likelihood ratio test statistic is based on the closeness, in terms of the Kullback-Leibler divergence measure, between the probability distributions p̂ and p̂^(1). Thus, one could measure the closeness between the two probability distributions using a more general divergence measure, provided we are able to obtain its asymptotic distribution. One appropriate family of divergence measures for that purpose is the family of ϕ-divergence measures. As a generalization of the test statistic given in (20) for testing H_1 against H_2 we introduce the family of test statistics

T_12^ϕ = (2n / ϕ''(1)) D_ϕ(p̂, p̂^(1)). (21)

To test H_0 against H_1, El Barmi and Kochar [8] considered the likelihood ratio test statistic T_01. It is clear that

T_01 = 2n ( D_KL(p̂, p̂^(0)) − D_KL(p̂, p̂^(1)) ).

As a generalization of this test statistic we consider in this paper the family of test statistics

T_01^ϕ = (2n / ϕ''(1)) ( D_ϕ(p̂, p̂^(0)) − D_ϕ(p̂, p̂^(1)) ). (22)

We will refer here to the example of [18, Section 9.5], where the test proposed by Bowker [1] is applied. The tests proposed in this paper may be used in situations where it is hoped that a new formulation of a drug will reduce some side-effects.
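A sketch of these restricted MLEs, assuming the pooling form in which θ̂_ij is truncated at 1/2 for i > j as described above (the helper name and the input table are illustrative):

```python
def restricted_mles(p_hat):
    """MLEs of the cell probabilities under H0 (symmetry) and under
    H1 (p_ij >= p_ji for i >= j): theta_ij = p_ij/(p_ij + p_ji) is
    replaced by max(theta_ij, 1/2) for i > j, pooling the pair total."""
    I = len(p_hat)
    p0 = [[(p_hat[i][j] + p_hat[j][i]) / 2 for j in range(I)] for i in range(I)]
    p1 = [row[:] for row in p_hat]
    for i in range(I):
        for j in range(i):                 # cells below the diagonal, i > j
            s = p_hat[i][j] + p_hat[j][i]  # total mass of the symmetric pair
            theta = p_hat[i][j] / s
            p1[i][j] = s * max(theta, 0.5)
            p1[j][i] = s - p1[i][j]
    return p0, p1
```

When the order restriction already holds in the data, p̂^(1) coincides with p̂; otherwise the violating pair is pooled to equality.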
Example. We consider 158 patients who have been treated with the old formulation and for whom records of any side-effects are available. We now treat each patient with the new formulation and note the incidence of side-effects. On the other hand, if we consider the usual Pearson test statistic X², the value of this statistic is 9.33. In this case, using the chi-squared distribution with 3 degrees of freedom, the corresponding asymptotic distribution found by Bowker [1], we obtain Pr(χ²_3 > X²) = 0.025. Then for all the considered statistics there is evidence of a differing incidence rate of side-effects under the two formulations; moreover, this difference is towards less severe side-effects under the new formulation. Therefore, the two considered tests lead to the same conclusion: there is strong evidence of a higher incidence rate of side-effects under the old formulation. The conclusion obtained in [18] is in accordance with ours.
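The quoted p-value can be checked numerically. For 3 degrees of freedom the chi-square survival function has the closed form Pr(χ²_3 > x) = erfc(√(x/2)) + √(2x/π) e^{−x/2}; a quick sketch (9.33 is the X² value from the example):

```python
import math

def chi2_sf_3df(x):
    """Survival function of the chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_3df(9.33)  # close to the 0.025 reported for Bowker's test
```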

Table 1. Side-effect levels for old and new formulation.
Table 1 shows a possible outcome for such an experiment. Do the data in Table 1 provide any evidence regarding a lower severity of side-effects with the new formulation of the drug? The two test statistics given in (21) and (22) are appropriate for this problem. For T_01^ϕ given in (22), the null hypothesis is that for all off-diagonal counts in the table the associated probabilities satisfy p_ij = p_ji. The alternative is that p_ij ≥ p_ji for all i ≥ j. We have computed the members of the family {T_01^λ} given in Remark 7 and the corresponding asymptotic p-values P_01^λ = Pr(χ²_3 > T_01^λ), which are given in the following table: