1. Introduction
An interesting problem in a two-way contingency table is to investigate whether there are symmetric patterns in the data: cell probabilities on one side of the main diagonal are a mirror image of those on the other side. This problem was first discussed by Bowker [1], who gave the maximum likelihood estimator as well as a large-sample chi-square-type test for the null hypothesis of symmetry. The minimum discrimination information estimator was proposed in [2] and the minimum chi-squared estimator in [3]. In [4,5,6,7] new families of test statistics, based on $\phi$-divergence measures, were introduced. These families contain as particular cases the test statistic given by [1] as well as the likelihood ratio test.
Let $X$ and $Y$ denote two ordinal response variables, $X$ and $Y$ having $I$ levels. When we classify subjects on both variables, there are $I^{2}$ possible combinations of classifications. The responses $(X,Y)$ of a subject randomly chosen from some population have a probability distribution. Let $p_{ij}=\Pr(X=i,Y=j)$, with $p_{ij}>0$, $i,j=1,\dots,I$. We display this distribution in a rectangular table having $I$ rows for the categories of $X$ and $I$ columns for the categories of $Y$. Consider a random sample of size $n$ on $(X,Y)$, and denote by $n_{ij}$ the observed frequency in the $(i,j)$th cell for $i,j=1,\dots,I$, with $\sum_{i=1}^{I}\sum_{j=1}^{I}n_{ij}=n$.
The classical problem of testing for symmetry is given by
$$H_{0}\colon p_{ij}=p_{ji}\ \text{for all}\ i\neq j \quad\text{versus}\quad H_{1}\colon p_{ij}\neq p_{ji}\ \text{for at least one pair}\ (i,j). \qquad (1)$$
This problem was considered for the first time by Bowker [1] using the Pearson test statistic
$$X^{2}=\sum_{i<j}\frac{(n_{ij}-n_{ji})^{2}}{n_{ij}+n_{ji}}, \qquad (3)$$
for which he established that $X^{2}$ is asymptotically chi-squared distributed with $m$ degrees of freedom for large $n$, where $m=I(I-1)/2$.
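To make the computation concrete, here is a minimal sketch of Bowker's statistic and its asymptotic p-value; the function name and the 3×3 counts are ours (hypothetical), and SciPy is assumed only for the chi-square tail probability.

```python
import numpy as np
from scipy.stats import chi2

def bowker_test(counts):
    """Bowker's chi-square test of symmetry for an I x I table of counts."""
    counts = np.asarray(counts, dtype=float)
    I = counts.shape[0]
    iu = np.triu_indices(I, k=1)            # off-diagonal cells with i < j
    num = (counts[iu] - counts.T[iu]) ** 2  # (n_ij - n_ji)^2
    den = counts[iu] + counts.T[iu]         # n_ij + n_ji (assumed > 0)
    stat = float(np.sum(num / den))
    m = I * (I - 1) // 2                    # degrees of freedom
    return stat, m, chi2.sf(stat, m)

table = [[20, 10, 5],
         [ 7, 30, 6],
         [ 2,  8, 25]]                      # hypothetical data
stat, m, pval = bowker_test(table)
print(f"X2 = {stat:.3f}, df = {m}, p-value = {pval:.4f}")
```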
In some real problems (e.g., medicine, psychology, sociology, etc.) the categorical response variables $X$ and $Y$ represent the measures after and before a treatment. In such situations our interest is to determine the treatment effect, i.e., whether $X$ tends to take larger values than $Y$ (we assume that $X$ represents the measure after the treatment and $Y$ before the treatment). In the following we understand that $X$ is preferred or indifferent to $Y$, according to the joint likelihood ratio ordering, if and only if (iff) $p_{ij}\geq p_{ji}$ for all $i>j$. In this situation the alternative hypothesis is
$$H_{2}\colon p_{ij}\geq p_{ji}\ \text{for all}\ i>j,\ \text{with strict inequality for at least one pair}\ (i,j). \qquad (4)$$
This problem was first considered by El Barmi and Kochar [8], who presented the likelihood ratio test for the problem of testing $H_{0}$ against $H_{2}$ and considered its application to a real-life problem: they tested whether the vision of both eyes, for 7477 women, is the same against the alternative that the right eye has better vision than the left eye. In [5] these results were extended using $\phi$-divergence measures.
In this paper we present an overview on contingency tables with symmetry structure on the basis of divergence measures. We pay special attention to the family of $\phi$-divergence test statistics for testing $H_{0}$ versus $H_{1}$, $H_{0}$ against $H_{2}$, and also for testing $H_{2}$ against the alternative $H_{1}$ of no restrictions over the $p_{ij}$'s, i.e.,
$$H_{1}\colon p_{ij}\ \text{unrestricted},\quad i,j=1,\dots,I. \qquad (5)$$
It is interesting to observe that we consider not only $\phi$-divergence test statistics but also minimum $\phi$-divergence estimators in order to estimate the parameters of the model.
2. Phi-divergence Measures
We consider the set
$$\Delta_{I^{2}}=\Big\{p=(p_{11},\dots,p_{II})^{T}\colon p_{ij}>0,\ \sum_{i=1}^{I}\sum_{j=1}^{I}p_{ij}=1\Big\},$$
and we denote its elements by $p=(p_{11},\dots,p_{II})^{T}$, or equivalently by the $I\times I$ matrix $(p_{ij})_{i,j=1,\dots,I}$.
The $\phi$-divergence between two probability distributions $p,q\in\Delta_{I^{2}}$ was introduced independently by [9] and [10]. It is defined as follows:
$$D_{\phi}(p,q)=\sum_{i=1}^{I}\sum_{j=1}^{I}q_{ij}\,\phi\Big(\frac{p_{ij}}{q_{ij}}\Big),\qquad \phi\in\Phi^{*},$$
where $\Phi^{*}$ is the class of all convex functions $\phi\colon(0,\infty)\to\mathbb{R}$, such that $\phi(1)=0$, $\phi''(1)>0$; and we define $0\,\phi(0/0)=0$ and $0\,\phi(p/0)=p\lim_{u\to\infty}\phi(u)/u$. For every $\phi\in\Phi^{*}$ that is differentiable at $x=1$, the function $\psi$ given by
$$\psi(x)=\phi(x)-\phi'(1)(x-1)$$
also belongs to $\Phi^{*}$. Then we have $D_{\psi}(p,q)=D_{\phi}(p,q)$, and $\psi$ has the additional property that $\psi'(1)=0$. Because the two divergence measures are equivalent, we can consider the set $\Phi^{*}$ to be equivalent to the set $\Phi\equiv\Phi^{*}\cap\{\phi\colon\phi'(1)=0\}$.
An important family of $\phi$-divergences in statistical problems is the power-divergence family
$$\phi_{(\lambda)}(x)=\frac{x^{\lambda+1}-x-\lambda(x-1)}{\lambda(\lambda+1)},\qquad \lambda\neq 0,\ \lambda\neq -1, \qquad (8)$$
which was introduced and studied by [11]. Notice that $\phi_{(0)}(x)=\lim_{\lambda\to 0}\phi_{(\lambda)}(x)=x\log x-x+1$ and $\phi_{(-1)}(x)=\lim_{\lambda\to -1}\phi_{(\lambda)}(x)=-\log x+x-1$, so the family is defined for every $\lambda\in\mathbb{R}$. In the following we shall denote the power-divergence measures by $D_{\phi_{(\lambda)}}(p,q)$, $\lambda\in\mathbb{R}$. For more details about $\phi$-divergence measures see [12].
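As a small illustration of these definitions, the following sketch evaluates $D_{\phi}(p,q)$ for members of the power-divergence family; the function names are ours, and the limit cases $\lambda=0$ and $\lambda=-1$ are coded explicitly.

```python
import numpy as np

def phi_power(lam):
    """phi_(lambda) of the power-divergence family (8); limits at 0 and -1."""
    if lam == 0.0:
        return lambda x: x * np.log(x) - x + 1.0   # Kullback-Leibler
    if lam == -1.0:
        return lambda x: -np.log(x) + x - 1.0      # modified Kullback-Leibler
    return lambda x: (x ** (lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))

def phi_divergence(p, q, phi):
    """D_phi(p, q) = sum_i q_i * phi(p_i / q_i), for strictly positive p, q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * phi(p / q)))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])
for lam in (1.0, 2 / 3, 0.0, -0.5):
    print(f"lambda = {lam:5.2f}  D = {phi_divergence(p, q, phi_power(lam)):.5f}")
```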
3. Hypothesis Testing: $H_0$ versus $H_1$
We define $\theta$ as the vector of cell probabilities $p_{ij}$ with $i\leq j$ (dropping one of them, since the probabilities add up to one), so that $\theta$ ranges over an open set $\Theta\subset\mathbb{R}^{I(I+1)/2-1}$; the hypothesis (1) can be written as
$$H_{0}\colon p=g(\theta),\quad \theta\in\Theta,$$
where the function $g\colon\Theta\to\Delta_{I^{2}}$ is defined by $g(\theta)=(g_{11}(\theta),\dots,g_{II}(\theta))^{T}$, with
$$g_{ij}(\theta)=p_{ij}\ \text{for}\ i\leq j\quad\text{and}\quad g_{ij}(\theta)=p_{ji}\ \text{for}\ i>j.$$
Note that the symmetry model is a regular parametric multinomial model, where the number of free parameters is $I(I+1)/2-1$.
The maximum likelihood estimator (MLE) of $\theta$ can be defined as
$$\hat\theta=\arg\min_{\theta\in\Theta}D_{KL}\big(\hat p,g(\theta)\big),$$
where $D_{KL}$ is the Kullback–Leibler divergence measure (see [13,14]) defined by
$$D_{KL}(p,q)=\sum_{i=1}^{I}\sum_{j=1}^{I}p_{ij}\log\frac{p_{ij}}{q_{ij}}.$$
We denote by $\hat p=(n_{11}/n,\dots,n_{II}/n)^{T}$ the vector of relative frequencies and by $\hat p^{S}=g(\hat\theta)$ the MLE of $p$ under the symmetry model. It is well known that
$$\hat p^{S}_{ij}=\frac{n_{ij}+n_{ji}}{2n},\quad i,j=1,\dots,I.$$
Using the ideas developed in [15], we can consider the minimum $\phi$-divergence estimator ($\hat\theta^{\phi_{2}}$), replacing the Kullback–Leibler divergence by a $\phi$-divergence measure in the following way:
$$\hat\theta^{\phi_{2}}=\arg\min_{\theta\in\Theta}D_{\phi_{2}}\big(\hat p,g(\theta)\big),$$
where $\phi_{2}\in\Phi^{*}$. We denote $\hat p^{\phi_{2}}=g(\hat\theta^{\phi_{2}})$, and we have (see [7,16]) that, under $H_{0}$ with true parameter $\theta_{0}$,
$$\hat\theta^{\phi_{2}}=\theta_{0}+A(\theta_{0})\big(\hat p-g(\theta_{0})\big)+o\big(\|\hat p-g(\theta_{0})\|\big),$$
where $A(\theta_{0})$ is a matrix that depends on the model $g$ but not on $\phi_{2}$; in particular, all the minimum $\phi$-divergence estimators share the same asymptotic distribution. It is not difficult to establish that the asymptotic variance–covariance matrix of $\sqrt n\,(\hat\theta^{\phi_{2}}-\theta_{0})$ can be written as $I_{F}(\theta_{0})^{-1}$, where $I_{F}(\theta)$ is the Fisher information matrix corresponding to the model $g$; that is, $\hat\theta^{\phi_{2}}$ is a best asymptotically normal (BAN) estimator.
If we consider the family of power divergences we get the minimum power-divergence estimator, $\hat p^{(\lambda)}$, of $p$, under the hypothesis of symmetry, whose expression is given by
$$\hat p^{(\lambda)}_{ij}=\frac{\Big(\dfrac{n_{ij}^{\lambda+1}+n_{ji}^{\lambda+1}}{2}\Big)^{1/(\lambda+1)}}{\sum_{a=1}^{I}\sum_{b=1}^{I}\Big(\dfrac{n_{ab}^{\lambda+1}+n_{ba}^{\lambda+1}}{2}\Big)^{1/(\lambda+1)}},\quad i,j=1,\dots,I.$$
For $\lambda=0$ we get
$$\hat p^{(0)}_{ij}=\frac{n_{ij}+n_{ji}}{2n};$$
hence, we obtain the maximum likelihood estimator for symmetry introduced by [1]. For $\lambda=-1$, we obtain as a limit case
$$\hat p^{(-1)}_{ij}=\frac{(n_{ij}n_{ji})^{1/2}}{\sum_{a=1}^{I}\sum_{b=1}^{I}(n_{ab}n_{ba})^{1/2}},$$
i.e., the minimum discrimination information estimator for symmetry introduced and studied in [2]. For $\lambda=1$ we get the minimum chi-squared estimator for symmetry introduced in [3],
$$\hat p^{(1)}_{ij}=\frac{\big((n_{ij}^{2}+n_{ji}^{2})/2\big)^{1/2}}{\sum_{a=1}^{I}\sum_{b=1}^{I}\big((n_{ab}^{2}+n_{ba}^{2})/2\big)^{1/2}}.$$
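A minimal numerical sketch of this closed-form estimator (the function name is ours): for $\lambda=0$ it reproduces the arithmetic-mean MLE $(n_{ij}+n_{ji})/(2n)$, and $\lambda=-1$ is coded as its geometric-mean limit.

```python
import numpy as np

def min_power_div_symmetry(counts, lam):
    """Minimum power-divergence estimator of p under symmetry (closed form)."""
    counts = np.asarray(counts, dtype=float)
    if lam == -1.0:
        a = np.sqrt(counts * counts.T)   # limit case: geometric mean
    else:
        a = ((counts ** (lam + 1) + counts.T ** (lam + 1)) / 2.0) ** (1.0 / (lam + 1))
    return a / a.sum()                   # normalize to a probability vector

counts = np.array([[20., 10., 5.],
                   [ 7., 30., 6.],
                   [ 2.,  8., 25.]])     # hypothetical data
print(min_power_div_symmetry(counts, 0.0))   # MLE: (n_ij + n_ji) / (2n)
print(min_power_div_symmetry(counts, 1.0))   # minimum chi-squared estimator
```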
We denote by $\hat p^{\phi_{2}}=g(\hat\theta^{\phi_{2}})$ the minimum $\phi$-divergence estimator of the probability vector that characterizes the symmetry model. Based on $\hat p^{\phi_{2}}$ it is possible to define a new family of statistics for testing (1) that contains as particular cases the Pearson test statistic as well as the likelihood ratio test. This family of statistics is given by
$$T_{n}^{\phi_{1},\phi_{2}}=\frac{2n}{\phi_{1}''(1)}\,D_{\phi_{1}}\big(\hat p,\hat p^{\phi_{2}}\big). \qquad (13)$$
We can observe that the family (13) involves two functions, $\phi_{1}$ and $\phi_{2}$, both belonging to $\Phi^{*}$. We use the function $\phi_{2}$ to obtain the minimum $\phi$-divergence estimator $\hat p^{\phi_{2}}$ and $\phi_{1}$ to obtain the family of statistics. If we consider $\phi_{1}(x)=\tfrac12(x-1)^{2}$ and $\phi_{2}(x)=x\log x-x+1$ we get the Pearson test statistic whose expression was given in (3), and for $\phi_{1}(x)=\phi_{2}(x)=x\log x-x+1$ we get the likelihood ratio test given by
$$G^{2}=2\sum_{i=1}^{I}\sum_{j=1}^{I}n_{ij}\log\frac{2n_{ij}}{n_{ij}+n_{ji}}.$$
In the following theorem the asymptotic distribution of $T_{n}^{\phi_{1},\phi_{2}}$ is obtained.
Theorem 1 The asymptotic distribution of $T_{n}^{\phi_{1},\phi_{2}}$, under the hypothesis of symmetry, is chi-squared with $m=I(I-1)/2$ degrees of freedom.
Thus, for a given significance level $\alpha$, the critical value of $T_{n}^{\phi_{1},\phi_{2}}$ may be approximated by $\chi^{2}_{m,\alpha}$, the upper $100\alpha$ percentage point of the chi-square distribution with $m$ degrees of freedom; i.e., reject the hypothesis of symmetry iff
$$T_{n}^{\phi_{1},\phi_{2}}\geq\chi^{2}_{m,\alpha}. \qquad (15)$$
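Combining the previous sketches, the family (13) and the rejection rule (15) can be coded in a few lines; `phi_power`, `phi_divergence` and `min_power_div_symmetry` are the helpers defined in the earlier sketches, and all counts are assumed positive so that the divergences are finite.

```python
import numpy as np
from scipy.stats import chi2

def symmetry_test(counts, lam1, lam2, alpha=0.05):
    """T_n^{phi1,phi2} of (13) with phi1 = phi_(lam1) and phi2 = phi_(lam2);
    note phi''(1) = 1 for every member of the power-divergence family."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p_hat = counts / n                               # empirical distribution
    p_sym = min_power_div_symmetry(counts, lam2)     # estimator under symmetry
    stat = 2.0 * n * phi_divergence(p_hat.ravel(), p_sym.ravel(), phi_power(lam1))
    m = counts.shape[0] * (counts.shape[0] - 1) // 2
    return stat, chi2.sf(stat, m), bool(stat >= chi2.ppf(1 - alpha, m))

print(symmetry_test(counts, lam1=1.0, lam2=0.0))   # Bowker's statistic (3)
print(symmetry_test(counts, lam1=0.0, lam2=0.0))   # likelihood ratio test
```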
Now we are going to analyze the power of the test. Let $q=(q_{11},\dots,q_{II})^{T}$ be a point in the alternative hypothesis, i.e., there exist at least two indexes $i$ and $j$ for which $q_{ij}\neq q_{ji}$. We denote by $q^{\phi_{2}}$ the point on $\mathcal{S}$ verifying
$$D_{\phi_{2}}\big(q,q^{\phi_{2}}\big)=\min_{p\in\mathcal{S}}D_{\phi_{2}}(q,p),$$
where $\mathcal{S}$ is given by the set $\{g(\theta)\colon\theta\in\Theta\}$ of symmetric probability vectors. It is clear that $q\notin\mathcal{S}$ and $q^{\phi_{2}}\in\mathcal{S}$, with $q^{\phi_{2}}_{ij}=q^{\phi_{2}}_{ji}$ for all $i,j$. The notation $q^{\phi_{2}}$ indicates that the elements of the vector depend on $\phi_{2}$. For instance, for the power-divergence family $\phi_{(\lambda)}$ we have
$$q^{(\lambda)}_{ij}=\frac{\Big(\dfrac{q_{ij}^{\lambda+1}+q_{ji}^{\lambda+1}}{2}\Big)^{1/(\lambda+1)}}{\sum_{a=1}^{I}\sum_{b=1}^{I}\Big(\dfrac{q_{ab}^{\lambda+1}+q_{ba}^{\lambda+1}}{2}\Big)^{1/(\lambda+1)}}.$$
We also denote by $\hat p$ the vector of relative frequencies and by $\hat p^{\phi_{2}}$ the minimum $\phi_{2}$-divergence estimator; if the alternative $q$ is true we have that $\hat p$ tends to $q$ and $\hat p^{\phi_{2}}$ to $q^{\phi_{2}}$ in probability.
If we define the function $f(p)=D_{\phi_{1}}(p,p^{\phi_{2}})$, we have, by a first-order Taylor expansion,
$$f(\hat p)=f(q)+\nabla f(q)^{T}(\hat p-q)+o(\|\hat p-q\|).$$
Then the random variables $\sqrt n\,\big(f(\hat p)-f(q)\big)$ and $\sqrt n\,\nabla f(q)^{T}(\hat p-q)$ have the same asymptotic distribution. If we define $\Sigma_{q}=\operatorname{diag}(q)-qq^{T}$ and $\sigma^{2}_{\phi_{1}}(q)=\nabla f(q)^{T}\Sigma_{q}\nabla f(q)$, we have
$$\sqrt n\,\big(f(\hat p)-f(q)\big)\xrightarrow{\ L\ }N\big(0,\sigma^{2}_{\phi_{1}}(q)\big),$$
where $\xrightarrow{\ L\ }$ denotes convergence in law.
If we consider the maximum likelihood estimator instead of the minimum $\phi$-divergence estimator, i.e., $\phi_{2}(x)=x\log x-x+1$, we get the same expression with $q^{\phi_{2}}$ replaced by the symmetric vector with entries $(q_{ij}+q_{ji})/2$. It is also interesting to observe, if we consider the power divergence measure, that the gradient $\nabla f(q)$, and hence $\sigma^{2}_{\phi_{1}}(q)$, can be obtained in explicit form. For $\lambda=1$ and $\lambda=0$ we get the Pearson and the likelihood ratio test statistics, respectively, and the corresponding asymptotic variances follow from the general expression for $\sigma^{2}_{\phi_{1}}(q)$.
Based on the previous result we can formulate the following theorem.
Theorem 2 The asymptotic power for the test given in (15), at the alternative $q$, is given by
$$\beta_{n}(q)=1-\Phi_{n}\!\left(\frac{\sqrt n}{\sigma_{\phi_{1}}(q)}\left(\frac{\phi_{1}''(1)}{2n}\,\chi^{2}_{m,\alpha}-D_{\phi_{1}}\big(q,q^{\phi_{2}}\big)\right)\right),$$
where $\{\Phi_{n}\}$ is a sequence of distribution functions tending uniformly to the standard normal distribution function $\Phi$.
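The finite-sample accuracy of this normal approximation can be checked by simulation; the sketch below estimates the power of the test (15) at a fixed asymmetric alternative by Monte Carlo, reusing `symmetry_test` from the previous sketch (the alternative `q_alt` is hypothetical, and a small continuity guard replaces empty cells).

```python
import numpy as np
from numpy.random import default_rng

def mc_power(q, lam1, lam2, n, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo estimate of the power of the test (15) at alternative q."""
    rng = default_rng(seed)
    q = np.asarray(q, dtype=float)
    I = q.shape[0]
    rejections = 0
    for _ in range(reps):
        sample = rng.multinomial(n, q.ravel()).reshape(I, I).astype(float)
        sample = np.maximum(sample, 0.5)   # guard against empty cells
        rejections += symmetry_test(sample, lam1, lam2, alpha)[2]
    return rejections / reps

q_alt = np.array([[0.20, 0.12, 0.05],
                  [0.08, 0.25, 0.08],
                  [0.03, 0.04, 0.15]])     # hypothetical asymmetric alternative
print(mc_power(q_alt, lam1=1.0, lam2=0.0, n=200))
```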
We consider a contiguous sequence of alternative hypotheses that approaches the null hypothesis $p=g(\theta_{0})$, for some unknown $\theta_{0}$, at the rate $O(n^{-1/2})$. Consider the multinomial probability vector
$$p_{n}=g(\theta_{0})+\frac{d}{\sqrt n}, \qquad (18)$$
where $d=(d_{11},\dots,d_{II})^{T}$ is a fixed $I^{2}\times 1$ vector such that $\sum_{i=1}^{I}\sum_{j=1}^{I}d_{ij}=0$; recall that $n$ is the total count parameter of the multinomial distribution, so that $p_{n}\in\Delta_{I^{2}}$ for $n$ large enough. As $n\to\infty$, the sequence of multinomial probabilities $\{p_{n}\}$, with $p_{n}$ given in (18), converges to a multinomial probability in $H_{0}$ at the rate of $O(n^{-1/2})$. Let
$$H_{1,n}\colon p=p_{n}.$$
In the next theorem we present the asymptotic distribution of the family of test statistics $T_{n}^{\phi_{1},\phi_{2}}$ defined in (13), under the contiguous alternative hypotheses given in (18).
Theorem 3 Under $H_{1,n}$, given in (18), the family of test statistics $T_{n}^{\phi_{1},\phi_{2}}$ is asymptotically noncentral chi-squared distributed with $m=I(I-1)/2$ degrees of freedom and noncentrality parameter $\delta$ depending on $d$ and $\theta_{0}$.
An interesting simulation study can be seen in [7]. In that study some interesting alternatives to the classical Pearson and likelihood ratio test statistics emerge.
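Once the noncentrality parameter $\delta$ of Theorem 3 is available, the local power at level $\alpha$ is a one-line computation with the noncentral chi-square distribution; the value of $\delta$ below is purely illustrative.

```python
from scipy.stats import chi2, ncx2

def local_power(delta, m, alpha=0.05):
    """Asymptotic power against a contiguous alternative (Theorem 3)."""
    return ncx2.sf(chi2.ppf(1 - alpha, m), m, delta)

print(local_power(delta=4.0, m=3))   # e.g. I = 3, so m = I(I-1)/2 = 3
```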
4. Hypothesis Testing: $H_0$ versus $H_2$ and $H_2$ versus $H_1$
In this section we consider the three hypotheses $H_{0}$, $H_{2}$, $H_{1}$ given in (1), (4), (5) respectively, and some test statistics based on the $\phi$-divergences
$$D_{\phi}\big(\hat p^{(2)},\hat p^{(0)}\big)\quad\text{and}\quad D_{\phi}\big(\hat p,\hat p^{(2)}\big) \qquad (19)$$
for testing $H_{0}$ against $H_{2}$ and $H_{2}$ against $H_{1}$.
In the expression (19), $\hat p$ is the maximum likelihood estimator (MLE) of $p$ given by $\hat p=(\hat p_{11},\dots,\hat p_{II})^{T}$, where $\hat p_{ij}=n_{ij}/n$, and $\hat p^{(0)}$ and $\hat p^{(2)}$ denote the MLEs of $p$ under $H_{0}$ and $H_{2}$ respectively. These MLEs were obtained by [8]. Let
$$s_{ij}=\frac{n_{ij}+n_{ji}}{2};$$
then $\hat p^{(2)}$ coincides with the unrestricted MLE, $\hat p^{(2)}_{ij}=n_{ij}/n$ (for the pairs $i>j$ with $n_{ij}\geq n_{ji}$), pools the two symmetric cells, $\hat p^{(2)}_{ij}=\hat p^{(2)}_{ji}=s_{ij}/n$ (for the pairs $i>j$ with $n_{ij}<n_{ji}$), and $\hat p^{(2)}_{ii}=n_{ii}/n$. It follows that $\hat p^{(0)}$ and $\hat p^{(2)}$ are given by
$$\hat p^{(0)}_{ij}=\frac{s_{ij}}{n},\quad i,j=1,\dots,I,$$
and
$$\hat p^{(2)}_{ij}=\begin{cases}\max(n_{ij},s_{ij})/n, & i>j,\\ \min(n_{ij},s_{ij})/n, & i<j,\\ n_{ij}/n, & i=j.\end{cases}$$
Then we have $\hat p^{(2)}=\hat p$ whenever the sample itself satisfies the order restriction, and $\hat p^{(2)}=\hat p^{(0)}$ whenever the restriction is reversed in every pair of symmetric cells.
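A sketch of the three estimators as reconstructed above (the function name is ours; counts are assumed positive):

```python
import numpy as np

def mle_three_hypotheses(counts):
    """MLEs of p: unrestricted (H1), under symmetry (H0), and under the
    likelihood-ratio ordering (H2), via the max/min construction of [8]."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p_unres = counts / n                        # MLE under H1
    s = (counts + counts.T) / 2.0               # pooled counts s_ij
    p_sym = s / n                               # MLE under H0
    low = np.tril_indices_from(counts, k=-1)    # cells with i > j
    p_ord = counts.copy()
    p_ord[low] = np.maximum(counts[low], s[low])       # i > j: max(n_ij, s_ij)
    p_ord.T[low] = np.minimum(counts.T[low], s[low])   # i < j: min(n_ij, s_ij)
    return p_unres, p_sym, p_ord / n            # diagonal stays n_ii / n

counts = np.array([[20., 10., 5.],
                   [ 7., 30., 6.],
                   [ 2.,  8., 25.]])            # hypothetical data
p_hat, p0, p2 = mle_three_hypotheses(counts)
```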
To solve the problem of testing $H_{0}$ against $H_{2}$, [8] considered the likelihood ratio test statistic
$$T_{01}=2\sum_{i=1}^{I}\sum_{j=1}^{I}n_{ij}\log\frac{\hat p^{(2)}_{ij}}{\hat p^{(0)}_{ij}}. \qquad (20)$$
This statistic is such that
$$T_{01}=2n\,D_{KL}\big(\hat p^{(2)},\hat p^{(0)}\big),$$
where $D_{KL}$ is the Kullback–Leibler divergence measure defined in Section 3, with $\hat p^{(2)}$ and $\hat p^{(0)}$ defined above. Then the likelihood ratio test statistic is based on the closeness, in terms of the Kullback–Leibler divergence measure, between the probability distributions $\hat p^{(2)}$ and $\hat p^{(0)}$. Thus, one could measure the closeness between the two probability distributions using a more general divergence measure, provided we are able to obtain its asymptotic distribution. One appropriate family of divergence measures for that purpose is the family of $\phi$-divergence measures.
As a generalization of the test statistic given in (20) for testing $H_{0}$ against $H_{2}$, we introduce the family of test statistics
$$S_{01}^{\phi}=\frac{2n}{\phi''(1)}\,D_{\phi}\big(\hat p^{(2)},\hat p^{(0)}\big). \qquad (21)$$
To test $H_{2}$ against $H_{1}$, El Barmi and Kochar [8] considered the likelihood ratio test statistic
$$T_{12}=2\sum_{i=1}^{I}\sum_{j=1}^{I}n_{ij}\log\frac{\hat p_{ij}}{\hat p^{(2)}_{ij}}.$$
It is clear that
$$T_{12}=2n\,D_{KL}\big(\hat p,\hat p^{(2)}\big).$$
As a generalization of this test statistic we consider in this paper the family of test statistics
$$S_{12}^{\phi}=\frac{2n}{\phi''(1)}\,D_{\phi}\big(\hat p,\hat p^{(2)}\big). \qquad (22)$$
If $\phi(x)=x\log x-x+1$ then $S_{01}^{\phi}=T_{01}$ and $S_{12}^{\phi}=T_{12}$, and hence the families of test statistics $S_{01}^{\phi}$ and $S_{12}^{\phi}$ can be considered as generalizations of the test statistics $T_{01}$ and $T_{12}$, respectively.
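Reusing the helpers from the earlier sketches (`phi_power`, `phi_divergence`, `mle_three_hypotheses`), the power-divergence versions of (21) and (22) are immediate; again all counts are assumed positive.

```python
def s01_s12(counts, lam):
    """Power-divergence statistics S01 of (21) and S12 of (22)."""
    n = np.asarray(counts, dtype=float).sum()
    p_hat, p0, p2 = mle_three_hypotheses(counts)
    phi = phi_power(lam)                 # phi''(1) = 1 for this family
    s01 = 2.0 * n * phi_divergence(p2.ravel(), p0.ravel(), phi)
    s12 = 2.0 * n * phi_divergence(p_hat.ravel(), p2.ravel(), phi)
    return s01, s12

print(s01_s12(counts, lam=0.0))          # (T01, T12) of El Barmi and Kochar [8]
```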
In order to get the asymptotic distribution of the test statistics given in (21) and (22), we first define the so-called chi-bar squared distribution with $n$ degrees of freedom, denoted by $\bar\chi^{2}_{n}$.
Definition 4 Let $U=\max(0,Z)$, where $Z\sim N(0,1)$, so that the c.d.f. of $U$ is given by
$$F_{U}(u)=\Phi(u)\ \text{for}\ u\geq 0\quad\text{and}\quad F_{U}(u)=0\ \text{for}\ u<0,$$
where $\Phi$ denotes the standard normal cumulative distribution function. Let $V=\sum_{i=1}^{n}U_{i}^{2}$, where $U_{1},\dots,U_{n}$ are independent and distributed like $U$; then $V\sim\bar\chi^{2}_{n}$.
It is readily shown that
$$\Pr\big(U^{2}\leq c\big)=\tfrac12+\tfrac12\Pr\big(\chi^{2}_{1}\leq c\big),\quad c\geq 0.$$
This distribution is related to the $\chi^{2}$ distribution. It can be readily shown that
$$\Pr\big(\bar\chi^{2}_{n}\leq c\big)=\sum_{l=0}^{n}\binom{n}{l}\Big(\frac{1}{2}\Big)^{n}\Pr\big(\chi^{2}_{l}\leq c\big),$$
where $\chi^{2}_{0}$ denotes the distribution degenerate at zero, by conditioning on $L$, the number of non-zero $U_{i}$s.
Furthermore, like the $\chi^{2}_{n}$ distribution, the $\bar\chi^{2}_{n}$ distribution is stochastically increasing with $n$. If $V\sim\bar\chi^{2}_{n}$ and $V'\sim\bar\chi^{2}_{n'}$, where $n<n'$, then $V$ is stochastically smaller than $V'$. This follows since
$$V'\stackrel{d}{=}V+W,$$
where $W\sim\bar\chi^{2}_{n'-n}$, with $V$ and $W$ independent. For more details about the chi-bar squared distribution see [17].
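The binomial mixture in Definition 4 translates directly into code; the following sketch (our function name) computes the tail probability $\Pr(\bar\chi^{2}_{n}\geq c)$, which is what the tests below need as a p-value.

```python
from math import comb
from scipy.stats import chi2

def chibar2_sf(c, n):
    """P(chibar2_n >= c): a Binomial(n, 1/2) mixture of chi-square tails;
    the l = 0 component is a point mass at zero and contributes nothing."""
    if c <= 0:
        return 1.0
    return sum(comb(n, l) * 0.5 ** n * chi2.sf(c, l) for l in range(1, n + 1))

print(chibar2_sf(4.0, 3))
```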
The following theorem presents the asymptotic distribution of $S_{01}^{\phi}$.
Theorem 5 Under $H_{0}$, as $n\to\infty$,
$$S_{01}^{\phi}\xrightarrow{\ L\ }\bar\chi^{2}_{m},$$
where $m=I(I-1)/2$.
If we consider the family of power divergences given in (8), we have the power divergence family of test statistics defined as
$$S_{01}^{(\lambda)}=\frac{2n}{\lambda(\lambda+1)}\sum_{i=1}^{I}\sum_{j=1}^{I}\hat p^{(2)}_{ij}\left[\Big(\frac{\hat p^{(2)}_{ij}}{\hat p^{(0)}_{ij}}\Big)^{\lambda}-1\right],\quad \lambda\neq 0,-1,$$
with the cases $\lambda=0$ and $\lambda=-1$ obtained as limits, which can be used for testing $H_{0}$ against $H_{2}$. Therefore some important statistics can now be expressed as members of the power divergence family of test statistics $S_{01}^{(\lambda)}$; that is, $S_{01}^{(1)}$ is the Pearson-type test statistic, $S_{01}^{(-1/2)}$ is the Freeman–Tukey test statistic, $S_{01}^{(-2)}$ is the Neyman-modified test statistic, $S_{01}^{(-1)}$ is the modified loglikelihood ratio test statistic, $S_{01}^{(0)}$ is the loglikelihood ratio test statistic ($T_{01}$) introduced by [8], and $S_{01}^{(2/3)}$ is the Cressie–Read test statistic (see [11]).
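As a usage example, the sketch below evaluates several members of this family together with their asymptotic $\bar\chi^{2}_{m}$ p-values, reusing `s01_s12` and `chibar2_sf` from the previous sketches (the counts are the same hypothetical table as before).

```python
I = counts.shape[0]
m = I * (I - 1) // 2
for lam in (1.0, 2 / 3, 0.0, -0.5, -1.0, -2.0):
    s01, _ = s01_s12(counts, lam)
    print(f"lambda = {lam:5.2f}  S01 = {s01:7.3f}  p = {chibar2_sf(s01, m):.4f}")
```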
Theorem 6 Under $H_{2}$, as $n\to\infty$,
$$S_{12}^{\phi}\xrightarrow{\ L\ }\bar\chi^{2}_{M},$$
where $M$ is the number of elements in the set $\{(i,j)\colon i>j,\ p_{ij}=p_{ji}\}$.
If we consider the family of power divergences given in (8), we have the power divergence family of test statistics defined as
$$S_{12}^{(\lambda)}=\frac{2n}{\lambda(\lambda+1)}\sum_{i=1}^{I}\sum_{j=1}^{I}\hat p_{ij}\left[\Big(\frac{\hat p_{ij}}{\hat p^{(2)}_{ij}}\Big)^{\lambda}-1\right],\quad \lambda\neq 0,-1,$$
which we can use for testing $H_{2}$ against $H_{1}$.
Remark 7 In the same way as previously we can obtain the test statistics $S_{12}^{(1)}$, $S_{12}^{(-1/2)}$, $S_{12}^{(-2)}$, $S_{12}^{(-1)}$, $S_{12}^{(0)}$ and $S_{12}^{(2/3)}$.
We will refer here to the example of [18, Section 9.5], where the test proposed by Bowker [1] is applied. The tests proposed in this paper may be used in a situation in which it is hoped that a new formulation of a drug will reduce some side-effects.
Example We consider 158 patients who have been treated with the old formulation and for whom records are available of any side-effects. We now treat each patient with the new formulation and note the incidence of side-effects. Table 1 shows a possible outcome for such an experiment. Do the data in Table 1 provide any evidence of less severe side-effects with the new formulation of the drug?
The two test statistics given in (21) and (22) are appropriate for this problem. For the test statistic $S_{12}^{\phi}$ given in (22), the null hypothesis is that for all off-diagonal counts in the table the associated probabilities are such that $p_{ij}\geq p_{ji}$ for all $i>j$; the alternative places no restriction on the $p_{ij}$'s. We have computed the members of the family $S_{12}^{(\lambda)}$ given in Remark 7 and the corresponding asymptotic p-values.
On the other hand, if we consider the usual Pearson test statistic $X^{2}$ given in (3), using the chi-squared distribution with 3 degrees of freedom (the corresponding asymptotic distribution found by Bowker [1]), the asymptotic p-value also leads to rejection. Then for all the considered statistics there is evidence of a differing incidence rate for side-effects under the two formulations; moreover, this difference is towards less severe side-effects under the new formulation. Therefore, the two considered tests lead to the same conclusion: there is strong evidence of a higher incidence rate of side-effects under the old formulation. The conclusion obtained in [18] is in accordance with our conclusion.