Contingency Table Analysis and Inference via Double Index Measures

Meselidis, Christos; Karagrigoriou, Alex

doi:10.3390/e24040477


by Christos Meselidis * and Alex Karagrigoriou

Laboratory of Statistics and Data Analysis, Department of Statistics and Actuarial-Financial Mathematics, University of the Aegean, Karlovasi, GR-83200 Samos, Greece

* Author to whom correspondence should be addressed.

Entropy 2022, 24(4), 477; https://doi.org/10.3390/e24040477

Submission received: 20 February 2022 / Revised: 17 March 2022 / Accepted: 28 March 2022 / Published: 29 March 2022

(This article belongs to the Special Issue Information and Divergence Measures)


Abstract:

In this work, we focus on a general family of measures of divergence for estimation and testing with emphasis on conditional independence in cross tabulations. For this purpose, a restricted minimum divergence estimator is used for the estimation of parameters under constraints and a new double index (dual) divergence test statistic is introduced and thoroughly examined. The associated asymptotic theory is provided and the advantages and practical implications are explored via simulation studies.

Keywords:

double index divergence test statistic; multivariate data analysis; conditional independence; cross tabulations

1. Introduction

The concept of distance or divergence has been known since at least the time of Pearson, who, in 1900, addressed the classical goodness-of-fit (gof) problem by considering the distance between observed and expected frequencies. The problem, for both discrete and discretized continuous distributions, has been at the center of attention for more than a century. The classical set-up is the one considered by Pearson, where a hypothesized m-dimensional multinomial distribution, say $Multi (N, p_{1}, \dots, p_{m})$, is examined as being the underlying distributional mechanism for producing a given sample of size N. The problem can be extended to examine the homogeneity (in terms of the distributional mechanisms) between two independent samples or the independence between two population characteristics. In all such problems we are dealing with cross tabulations or crosstabs (or contingency tables). Problems of this nature appear frequently in a great variety of fields including biosciences, socio-economic and political sciences, actuarial science, finance, business, accounting, and marketing. The need to establish, for instance, whether the mechanisms producing two phenomena are the same or not is vital for altering economic policies, preventing socio-economic crises, or applying the same economic or financial decisions to groups with similar underlying mechanisms (e.g., retaining the insurance premium in case of similarity or having different premiums in case of diversity). It is important to note that divergence measures also play a pivotal role in statistical inference in continuous settings. For example, in [1] the authors investigate the multivariate normal case, while in a recent work [2] the modified skew-normal-Cauchy (MSNC) distribution is considered against normality.

Let us consider the general case of two m-dimensional multinomial distributions for which each probability depends on an s-dimensional unknown parameter, say $θ = (θ_{1}, \dots, θ_{s})^{⊤}$. A general family of measures introduced by [3] is the $d_{Φ}^{α}$ family defined by

$d_{Φ}^{α} (p (θ), q (θ)) = \sum_{i = 1}^{m} q_{i} (θ)^{1 + α} Φ \left( \frac{p_{i} (θ)}{q_{i} (θ)} \right); \quad α > 0, \; Φ \in F^{*}$

(1)

where $α$ is a positive indicator (index) value, $p (θ) = (p_{1} (θ), \dots, p_{m} (θ))^{⊤}$ and $q (θ) = (q_{1} (θ), \dots, q_{m} (θ))^{⊤}$, and $F^{*}$ is a class of functions such that

$F^{*} = \{Φ (\cdot) : Φ (x) \text{ strictly convex}, \; x \in R^{+}, \; Φ (1) = Φ^{'} (1) = 0, \; Φ^{″} (1) \neq 0 \text{ and, by convention, } Φ (0 / 0) = 0 \text{ and } 0 Φ (p / 0) = \lim_{x \to \infty} [Φ (x) / x]\}.$

Note that the well-known Csiszar family of measures [4] is obtained for the special case where the indicator is taken to be equal to 0, while the classical Kullback–Leibler (KL) distance [5] is obtained if the indicator $α$ is equal to 0 and, at the same time, the function $Φ (\cdot)$ is taken to be $Φ (x) \equiv Φ_{KL} (x) = x \log (x)$ or $x \log (x) - x + 1$.

The function

$Φ_{λ} (x) = \frac{1}{λ (λ + 1)} [x (x^{λ} - 1) - λ (x - 1)] \in F^{*}, \quad λ \neq 0, - 1$

is associated with the Freeman–Tukey test when $λ = - 1 / 2$, with the recommended Cressie and Read (CR) power divergence [6] when $λ = 2 / 3$, with Pearson's chi-squared divergence [7] when $λ = 1$, and with the classical KL distance when $λ \to 0$.

Finally, the function

$Φ_{α} (x) \equiv (λ + 1) Φ_{λ} (x) |_{λ = α} = \frac{1}{α} [x (x^{α} - 1) - α (x - 1)], \quad α \neq 0$

produces the BHHJ or $Φ_{α}$-power divergence [8] given by

$d_{Φ_{α}}^{α} (p (θ), q (θ)) = \sum_{i = 1}^{m} q_{i}^{α} (θ) \{q_{i} (θ) - p_{i} (θ)\} + \frac{1}{α} \sum_{i = 1}^{m} p_{i} (θ) \{p_{i}^{α} (θ) - q_{i}^{α} (θ)\}.$
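The closed form of the BHHJ divergence follows from substituting $Φ_{α}$ into (1). As a quick numerical sanity check, the following minimal Python sketch evaluates both expressions and compares them. The function names and the two probability vectors are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def phi_alpha(x, alpha):
    """Phi_alpha(x) = [x (x^alpha - 1) - alpha (x - 1)] / alpha, alpha != 0."""
    return (x * (x ** alpha - 1.0) - alpha * (x - 1.0)) / alpha


def d_general(p, q, alpha):
    """General form (1): sum_i q_i^(1 + alpha) * Phi_alpha(p_i / q_i)."""
    return sum(qi ** (1.0 + alpha) * phi_alpha(pi / qi, alpha)
               for pi, qi in zip(p, q))


def d_bhhj(p, q, alpha):
    """Closed form of the Phi_alpha-power (BHHJ) divergence."""
    first = sum(qi ** alpha * (qi - pi) for pi, qi in zip(p, q))
    second = sum(pi * (pi ** alpha - qi ** alpha) for pi, qi in zip(p, q)) / alpha
    return first + second


# Hypothetical probability vectors, used only for illustration.
p = [0.2, 0.5, 0.3]
q = [0.25, 0.25, 0.5]
for alpha in (0.01, 0.1, 0.5, 1.5):
    # The two columns should agree up to floating-point rounding.
    print(alpha, d_general(p, q, alpha), d_bhhj(p, q, alpha))
```

Both forms vanish when the two vectors coincide and are positive otherwise, as expected for a strictly convex $Φ_{α}$ with $Φ_{α} (1) = Φ_{α}^{'} (1) = 0$.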

Assume that the underlying true distribution of an m-dimensional multinomial random variable with N experiments is

$X = (X_{1}, \dots, X_{m})^{⊤} \sim Multi (N, p = (p_{1}, \dots, p_{m})^{⊤})$

where $p$ is, in general, unknown, belonging to the parametric family

$P = \{p (θ) = (p_{1} (θ), \dots, p_{m} (θ))^{⊤} : θ = (θ_{1}, \dots, θ_{s})^{⊤} \in Θ \subset R^{s}\}.$

(2)

The sample estimate $\hat{p} = (\hat{p}_{1}, \dots, \hat{p}_{m})^{⊤}$ of $p$ is easily obtained by $\hat{p}_{i} = x_{i} / N$, where $x_{i}$ is the observed frequency for the i-th category (or class).

Divergence measures can be used for estimation purposes by minimizing the associated measure. The classical estimating technique is the one where, in (1), we take $α = 0$ and $Φ (x) = Φ_{KL} (x)$. Then, the resulting KL minimization is equivalent to the classical maximization of the likelihood, producing the well-known Maximum Likelihood Estimator (MLE, see ([9], Section 5.2)). In general, the minimization of the divergence measure with respect to the parameter of interest gives rise to the corresponding minimum divergence estimator (see, e.g., [6,10,11]). The constrained case associated with Csiszar’s family of measures was recently investigated in [12]. For further references, please refer to [13,14,15,16,17,18,19,20,21].
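The equivalence between KL minimization and maximum likelihood can be illustrated numerically. The sketch below is an assumption-laden toy example rather than anything used in the paper: it takes a hypothetical one-parameter trinomial model of Hardy-Weinberg form, made-up frequencies, and a simple golden-section search, and compares the KL minimizer with the closed-form MLE of that particular model.

```python
import math


def hw_probs(theta):
    """Hypothetical one-parameter trinomial model (Hardy-Weinberg form)."""
    return [theta ** 2, 2.0 * theta * (1.0 - theta), (1.0 - theta) ** 2]


def kl(p_hat, p_theta):
    """Expression (1) with alpha = 0 and Phi(x) = x log x - x + 1."""
    total = 0.0
    for ph, pt in zip(p_hat, p_theta):
        x = ph / pt
        phi = x * math.log(x) - x + 1.0 if x > 0 else 1.0
        total += pt * phi
    return total


def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0


counts = [30, 50, 20]                      # illustrative frequencies
N = sum(counts)
p_hat = [x / N for x in counts]

theta_kl = golden_section_min(lambda t: kl(p_hat, hw_probs(t)), 1e-6, 1 - 1e-6)
theta_mle = (2 * counts[0] + counts[1]) / (2 * N)   # closed-form MLE of this toy model
print(theta_kl, theta_mle)
```

The two estimates should coincide up to the tolerance of the search, because the KL divergence between $\hat{p}$ and $p (θ)$ differs from the negative log-likelihood (divided by N) only by a term that does not depend on $θ$.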

Consider the hypothesis

$H_{0} : p = p (θ_{0}) \quad \text{vs.} \quad H_{1} : p \neq p (θ_{0}), \quad θ_{0} = (θ_{01}, \dots, θ_{0 s})^{⊤} \in Θ \subset R^{s}$

(3)

where $p$ is the vector of the true but unknown probabilities of the underlying distribution and $p (θ_{0})$ is the vector of the corresponding probabilities of the hypothesized distribution, which is unknown and belongs to the family $P$, with the unknown parameters satisfying, in general, certain constraints, e.g., of the form $c (θ) = 0$, under which the estimation of the parameter will be performed. The purpose of this work is twofold: taking the divergence measure given in (1) as a reference, we first propose a general double index divergence class of measures and make inference regarding the parameter estimators involved. We then proceed to the hypothesis testing problem, with emphasis on the concept of conditional independence. The innovative idea proposed in this work is the duality in choosing among the members of the general class of divergences: one for estimation and one for testing purposes, which need not be the same. In that sense, we propose a double index divergence test statistic offering the greatest possible range of options, both for the strictly convex function $Φ$ and the indicator value $α > 0$.

Thus, the estimation problem can be examined by considering expression (1) with a function $Φ_{2} \in F^{*}$ and an indicator $α_{2} > 0$:

$d_{Φ_{2}}^{α_{2}} (p, p (θ)) = \sum_{i = 1}^{m} p_{i}^{1 + α_{2}} (θ) Φ_{2} \left( \frac{p_{i}}{p_{i} (θ)} \right)$

(4)

the minimization of which with respect to the unknown parameter produces the restricted minimum $(Φ_{2}, α_{2})$ divergence (rMD) estimator

$\hat{θ}_{(Φ_{2}, α_{2})}^{r} = \arg \inf_{θ \in Θ : c (θ) = 0} d_{Φ_{2}}^{α_{2}} (\hat{p}, p (θ))$

(5)

for some constraints $c (θ) = 0$. Observe that the unknown vector of underlying probabilities has been replaced by the vector of the corresponding sample frequencies $\hat{p}$. Then, the testing problem will be based on

$d_{Φ_{1}}^{α_{1}} (\hat{p}, p (\hat{θ}_{(Φ_{2}, α_{2})}^{r})) = \sum_{i = 1}^{m} p_{i}^{1 + α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r}) Φ_{1} \left( \frac{\hat{p}_{i}}{p_{i} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})} \right)$

(6)

where $Φ_{1} (\cdot)$ and $α_{1}$ may be different from the corresponding quantities used for the estimation problem in (4). Finally, the duality of the proposed methodology surfaces when the testing problem is explored via the dual divergence test statistic formulated on the basis of the double-$α$-double-$Φ$ divergence given by

$d_{Φ_{1}}^{α_{1}} (\hat{p}, p (\hat{θ}_{(Φ_{2}, α_{2})}^{r}))$

(7)

where $Φ_{1}, Φ_{2} \in F^{*}$ and $α_{1}, α_{2} > 0$.

The remainder of this work is organized as follows: Section 2 presents the formal definition and the asymptotic properties of the rMD estimator (rMDE). Section 3 deals with the general testing problem using the rMDE. The associated set-up for three-way contingency tables is developed in Section 4, together with a simulation study emphasizing the conditional independence of three random variables. We close this work with some conclusions.

2. Restricted Minimum $(Φ, α)$-Power Divergence Estimator

In what follows, we will provide the formal definition and the expansion of the rMD estimator and prove its asymptotic normality. The assumptions required for establishing the results of this section for the rMD estimator under constraints, are provided below:

Assumption 1.

$(A_{0})$: $f_{1} (θ), \dots, f_{ν} (θ)$ are the constraint functions on the s-dimensional parameter $θ$, $f_{k} (θ) = 0$, $k = 1, \dots, ν$ and $ν < s < m - 1$;
$(A_{1})$: There exists a value $θ_{0} \in Θ$ such that $X = (X_{1}, \dots, X_{m})^{⊤} \sim Multi (N, p (θ_{0}))$;
$(A_{2})$: Each constraint function $f_{k} (θ)$ has continuous second partial derivatives;
$(A_{3})$: The $ν \times s$ and $m \times s$ matrices

$Q (θ_{0}) = \left( \frac{\partial f_{k} (θ_{0})}{\partial θ_{j}} \right)_{\substack{k = 1, \dots, ν \\ j = 1, \dots, s}} \quad \text{and} \quad J (θ_{0}) = \left( \frac{\partial p_{i} (θ_{0})}{\partial θ_{j}} \right)_{\substack{i = 1, \dots, m \\ j = 1, \dots, s}}$

are of full rank;
$(A_{4})$: $p (θ)$ has continuous second partial derivatives in a neighbourhood of $θ_{0}$;
$(A_{5})$: $θ_{0}$ satisfies the Birch regularity conditions (see Appendix A and [22]).

Definition 1.

Under Assumptions $(A_{0})$–$(A_{3})$ the rMD estimator of $θ_{0}$ is any vector in $Θ$ such that

$\hat{θ}_{(Φ, α)}^{r} = \arg \inf_{\{θ \in Θ \subset R^{s} : f_{k} (θ) = 0, \; k = 1, \dots, ν\}} d_{Φ}^{α} (\hat{p}, p (θ)).$

(8)

In order to derive the decomposition of $\hat{θ}_{(Φ, α)}^{r}$, the Implicit Function Theorem (IFT) is exploited, according to which, if a function has an invertible derivative at a point, then the function itself is invertible in a neighbourhood of this point, although the inverse cannot in general be expressed in closed form [23].

Theorem 1.

Under Assumptions $(A_{0})$–$(A_{5})$, the rMD estimator of $θ_{0}$ is such that

$\hat{θ}_{(Φ, α)}^{r} = θ_{0} + H (θ_{0}) (B (θ_{0})^{⊤} B (θ_{0}))^{- 1} B (θ_{0})^{⊤} \mathrm{diag} (p (θ_{0})^{α / 2}) \, \mathrm{diag} (p (θ_{0})^{- 1 / 2}) (\hat{p} - p (θ_{0})) + o (∥ \hat{p} - p (θ_{0}) ∥)$

(9)

where $\hat{θ}_{(Φ, α)}^{r}$ is unique in a neighbourhood of $θ_{0}$ and

$H (θ_{0}) = I - (B (θ_{0})^{⊤} B (θ_{0}))^{- 1} Q (θ_{0})^{⊤} (Q (θ_{0}) (B (θ_{0})^{⊤} B (θ_{0}))^{- 1} Q (θ_{0})^{⊤})^{- 1} Q (θ_{0}),$

$B (θ_{0}) = \mathrm{diag} (p (θ_{0})^{α / 2}) A (θ_{0})$, while $A (θ_{0}) = \mathrm{diag} (p (θ_{0})^{- 1 / 2}) J (θ_{0})$.

Proof.

Let V be a neighbourhood of $θ_{0}$ on which $p (\cdot) : Θ \to P \subset l^{m}$ has continuous second partial derivatives, where $l^{m}$ is the interior of the unit cube of dimension m. Let

$F = (F_{1}, \dots, F_{ν + s}) : l^{m} \times R^{ν + s} \to R^{ν + s}$

with

$F_{j} (p, λ, θ) = \begin{cases} f_{j} (θ), & j = 1, \dots, ν \\ \frac{\partial d_{Φ}^{α} (p, p (θ))}{\partial θ_{j - ν}} + \sum_{k = 1}^{ν} λ_{k} \frac{\partial f_{k} (θ)}{\partial θ_{j - ν}}, & j = ν + 1, \dots, ν + s \end{cases}$

where $(p, λ, θ) = (p_{1}, \dots, p_{m}, λ_{1}, \dots, λ_{ν}, θ_{1}, \dots, θ_{s})$ and $λ_{k}$, $k = 1, \dots, ν$, are the coefficients (Lagrange multipliers) of the constraints.

It holds that

$F_{j} (p_{1} (θ_{0}), \dots, p_{m} (θ_{0}), 0, \dots, 0, θ_{01}, \dots, θ_{0 s}) = 0, \quad j = 1, \dots, ν + s$

and, denoting $γ = (γ_{1}, \dots, γ_{ν + s}) = (λ_{1}, \dots, λ_{ν}, θ_{1}, \dots, θ_{s})$, the matrix

$\frac{\partial F}{\partial γ} = \left( \frac{\partial F_{j}}{\partial γ_{k}} \right)_{\substack{j = 1, \dots, ν + s \\ k = 1, \dots, ν + s}} = \begin{pmatrix} 0_{ν \times ν} & Q (θ_{0}) \\ Q (θ_{0})^{⊤} & Φ^{″} (1) B (θ_{0})^{⊤} B (θ_{0}) \end{pmatrix}$

is nonsingular at $(p, λ, θ) = (p (θ_{0}), γ_{0}) = (p_{1} (θ_{0}), \dots, p_{m} (θ_{0}), 0, \dots, 0, θ_{01}, \dots, θ_{0 s})$ with $γ_{0} = (0_{ν}, θ_{0})$.

By the IFT, there exist a neighbourhood U of $(p (θ_{0}), γ_{0})$ on which $\partial F / \partial γ$ is nonsingular and a unique differentiable function $γ^{*} = (λ^{*}, θ^{*}) : A \subset l^{m} \to R^{ν + s}$ such that $p (θ_{0}) \in A$, $\{(p, γ) \in U : F (p, γ) = 0\} = \{(p, γ^{*} (p)) : p \in A\}$ and $γ^{*} (p (θ_{0})) = (λ^{*} (p (θ_{0})), θ^{*} (p (θ_{0}))) = γ_{0}$. By the chain rule and for $p = p (θ_{0})$ we obtain

$\frac{\partial F}{\partial p (θ_{0})} + \frac{\partial F}{\partial γ_{0}} \frac{\partial γ_{0}}{\partial p (θ_{0})} = 0.$

Then

$\frac{\partial γ_{0}}{\partial p (θ_{0})} = \begin{pmatrix} E (θ_{0}) \\ W (θ_{0}) \end{pmatrix}$

where

$E (θ_{0}) = Φ^{″} (1) (Q (θ_{0}) (B (θ_{0})^{⊤} B (θ_{0}))^{- 1} Q (θ_{0})^{⊤})^{- 1} Q (θ_{0}) (B (θ_{0})^{⊤} B (θ_{0}))^{- 1} B (θ_{0})^{⊤} \mathrm{diag} (p (θ_{0})^{α / 2}) \, \mathrm{diag} (p (θ_{0})^{- 1 / 2})$

and

$W (θ_{0}) = H (θ_{0}) (B (θ_{0})^{⊤} B (θ_{0}))^{- 1} B (θ_{0})^{⊤} \mathrm{diag} (p (θ_{0})^{α / 2}) \, \mathrm{diag} (p (θ_{0})^{- 1 / 2})$

(10)

since

$\frac{\partial F}{\partial p (θ_{0})} = \begin{pmatrix} 0_{ν \times m} \\ - Φ^{″} (1) B (θ_{0})^{⊤} \mathrm{diag} (p (θ_{0})^{α / 2}) \, \mathrm{diag} (p (θ_{0})^{- 1 / 2}) \end{pmatrix}.$

Expanding $θ^{*} (p)$ around $p (θ_{0})$ and using (10) gives, for $θ^{*} (p (θ_{0})) = θ_{0}$,

$θ^{*} (p) = θ_{0} + H (θ_{0}) (B (θ_{0})^{⊤} B (θ_{0}))^{- 1} B (θ_{0})^{⊤} \mathrm{diag} (p (θ_{0})^{α / 2}) \, \mathrm{diag} (p (θ_{0})^{- 1 / 2}) (p - p (θ_{0})) + o (∥ p - p (θ_{0}) ∥).$

Since $\hat{p} \overset{p}{\to} p (θ_{0})$, eventually $\hat{p} \in A$, and then $γ^{*} (\hat{p}) = (λ^{*} (\hat{p}), θ^{*} (\hat{p}))$ is the unique solution of the system

$\begin{cases} f_{k} (θ) = 0, & k = 1, \dots, ν \\ \frac{\partial d_{Φ}^{α} (\hat{p}, p (θ))}{\partial θ_{j}} + \sum_{k = 1}^{ν} λ_{k} \frac{\partial f_{k} (θ)}{\partial θ_{j}} = 0, & j = 1, \dots, s \end{cases}$

and $(\hat{p}, γ^{*} (\hat{p})) \in U$. Hence, $θ^{*} (\hat{p})$ coincides with the rMDE $\hat{θ}_{(Φ, α)}^{r}$ given in (9). □
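To make the matrices of Theorem 1 concrete, the following minimal Python sketch builds $B (θ_{0})$, $H (θ_{0})$ and $W (θ_{0})$ for a small hypothetical example: a 2 x 2 table under independence with a saturated parameterization and a single constraint. The example, the value of the index, and the helper functions are assumptions made for illustration only. The sketch then checks two algebraic consequences of the definition of $H (θ_{0})$, namely $Q (θ_{0}) H (θ_{0}) = 0$ and $H (θ_{0})^{2} = H (θ_{0})$.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Hypothetical example: theta = (p1, p2, p3), p4 = 1 - p1 - p2 - p3,
# with the single constraint f(theta) = p1 * p4 - p2 * p3 = 0.
p0 = [0.12, 0.28, 0.18, 0.42]          # satisfies the constraint exactly
alpha = 0.5                            # illustrative index value
m, s = 4, 3

J = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, -1.0, -1.0]]  # m x s
Q = [[p0[3] - p0[0], -(p0[0] + p0[2]), -(p0[0] + p0[1])]]                     # nu x s

# B = diag(p^(alpha/2)) diag(p^(-1/2)) J
B = [[p0[i] ** ((alpha - 1.0) / 2.0) * J[i][j] for j in range(s)] for i in range(m)]
G = inverse(matmul(transpose(B), B))               # (B^T B)^(-1)
Qt = transpose(Q)
middle = inverse(matmul(matmul(Q, G), Qt))         # (Q G Q^T)^(-1)
X = matmul(matmul(matmul(G, Qt), middle), Q)
H = [[float(i == j) - X[i][j] for j in range(s)] for i in range(s)]

# W = H (B^T B)^(-1) B^T diag(p^(alpha/2)) diag(p^(-1/2)), an s x m matrix
D = [[p0[i] ** ((alpha - 1.0) / 2.0) if i == j else 0.0 for j in range(m)]
     for i in range(m)]
W = matmul(matmul(matmul(H, G), transpose(B)), D)

print(matmul(Q, H))    # numerically a zero row vector
```

Since $Q (θ_{0}) H (θ_{0}) = 0$, the first-order term in (9) moves the estimator only along directions that preserve the linearized constraints.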

The theorem below establishes the asymptotic normality of the rMDE. It is a straightforward extension of Theorem 2.4 of [11], since by the Central Limit Theorem we know that

$\sqrt{N} (\hat{p} - p (θ_{0})) \xrightarrow[N \to \infty]{L} N (0, Σ_{p (θ_{0})})$

(11)

with the asymptotic variance-covariance matrix $Σ_{p (θ_{0})}$ given by $\mathrm{diag} (p (θ_{0})) - p (θ_{0}) p (θ_{0})^{⊤}$.
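The covariance matrix in (11) is singular, because the components of $\hat{p}$ sum to one. The short Python sketch below, using hypothetical probabilities chosen only for illustration, builds $Σ_{p (θ_{0})}$ and confirms that each of its rows sums to zero.

```python
def multinomial_cov(p):
    """Asymptotic covariance of sqrt(N) * (p_hat - p): diag(p) - p p^T."""
    m = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(m)]
            for i in range(m)]


p0 = [0.1, 0.2, 0.3, 0.4]               # illustrative probabilities
sigma = multinomial_cov(p0)

# Each row sums to p_i - p_i * sum_j p_j = 0, so the vector of ones is in
# the null space of the matrix.
row_sums = [sum(row) for row in sigma]
print(row_sums)
```

This singularity is the reason the limiting chi-squared distributions in the following section start from $m - 1$, rather than $m$, degrees of freedom.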

Theorem 2.

Under Assumptions $(A_{0})$–$(A_{5})$, by (11) and for $W (θ_{0})$ given in (10), the asymptotic distribution of the rMDE is the s-dimensional Normal distribution given by

$\sqrt{N} (\hat{θ}_{(Φ, α)}^{r} - θ_{0}) \xrightarrow[N \to \infty]{L} N_{s} (0, W (θ_{0}) Σ_{p (θ_{0})} W (θ_{0})^{⊤}).$

Remark 1.

The proposed class forms a family of estimators that goes beyond a fixed value of the indicator $α$: it is easy to see that the estimators associated with Csiszar’s $φ$ family are obtained for $α = 0$ in (1), and also under the standard equiprobable model.

3. Statistical Inference

In this section, we introduce the double index divergence test statistic

$T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r}) = \frac{2 N}{Φ_{1}^{″} (1)} d_{Φ_{1}}^{α_{1}} (\hat{p}, p (\hat{θ}_{(Φ_{2}, α_{2})}^{r}))$

(12)

with $Φ_{1}, Φ_{2} \in F^{*}$ and $α_{1}, α_{2} > 0$, and make the additional assumptions under which we focus on Csiszar’s family of measures for testing purposes (the notation $φ$ is used for clarity) and on the equiprobable model:

Assumption 2.

$(A_{6})$: $p_{i} = 1 / m, \forall i$
$(A_{7})$: $Φ_{1} = φ, α_{1} = 0$ .

The theorem below provides the asymptotic distribution of (12) under Assumptions $(A_{0})$–$(A_{7})$. Assumption $(A_{7})$ will be relaxed later, and a general asymptotic result will be presented in the next subsection. Assumption $(A_{6})$ will also be discussed in the sequel.
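The two-step structure of (12), namely estimation with $(Φ_{2}, α_{2})$ followed by testing with $(Φ_{1}, α_{1})$, can be sketched in a few lines of Python. The sketch makes several assumptions that are not taken from the paper: it uses $Φ_{1} = Φ_{α_{1}}$ and $Φ_{2} = Φ_{α_{2}}$ (for which $Φ_{α}^{″} (1) = 1 + α$), it works with a hypothetical 2 x 2 independence model whose constraint is built into the parameterization $(a, b)$, and it replaces a proper constrained optimizer with a crude grid-refinement search.

```python
def phi_alpha(x, alpha):
    return (x * (x ** alpha - 1.0) - alpha * (x - 1.0)) / alpha


def divergence(p_hat, p_model, alpha):
    """d_{Phi_alpha}^{alpha}(p_hat, p_model) as in (1)."""
    return sum(q ** (1.0 + alpha) * phi_alpha(ph / q, alpha)
               for ph, q in zip(p_hat, p_model))


def indep_probs(a, b):
    """2 x 2 independence model; the constraint is built into (a, b)."""
    return [a * b, a * (1.0 - b), (1.0 - a) * b, (1.0 - a) * (1.0 - b)]


def fit_restricted(p_hat, alpha2, rounds=80):
    """Crude grid-refinement minimizer over (a, b); for illustration only."""
    a, b, width = 0.5, 0.5, 0.5
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    for _ in range(rounds):
        candidates = [(min(max(a + u * width, 1e-6), 1 - 1e-6),
                       min(max(b + v * width, 1e-6), 1 - 1e-6))
                      for u in grid for v in grid]
        a, b = min(candidates,
                   key=lambda ab: divergence(p_hat, indep_probs(*ab), alpha2))
        width *= 0.75
    return a, b


def dual_statistic(counts, alpha1, alpha2):
    """Statistic (12): estimate with alpha2, then test with alpha1."""
    n = sum(counts)
    p_hat = [c / n for c in counts]
    a, b = fit_restricted(p_hat, alpha2)
    second_derivative_at_one = 1.0 + alpha1        # Phi_alpha''(1) = 1 + alpha
    d = divergence(p_hat, indep_probs(a, b), alpha1)
    return 2.0 * n * d / second_derivative_at_one, (a, b)


# Illustrative counts: the first table is exactly independent, the second is not.
print(dual_statistic([12, 28, 18, 42], alpha1=0.05, alpha2=0.05))
print(dual_statistic([30, 10, 10, 30], alpha1=0.05, alpha2=0.5))
```

For the exactly independent table the statistic is numerically zero, while for the second table it is clearly positive. In practice a constrained optimizer would replace `fit_restricted`.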

Theorem 3.

Under Assumptions $(A_{0})$–$(A_{7})$ and for the hypothesis in (3) we have

$T_{φ}^{0} (\hat{θ}_{(Φ_{2}, α_{2})}^{r}) = \frac{2 N}{φ^{″} (1)} d_{φ} (\hat{p}, p (\hat{θ}_{(Φ_{2}, α_{2})}^{r})) \xrightarrow[N \to \infty]{L} χ_{m - 1 - s + ν}^{2}$

with $\hat{θ}_{(Φ_{2}, α_{2})}^{r}$ given in (9).

Proof.

It is straightforward that

$p (\hat{θ}_{(Φ_{2}, α_{2})}^{r}) = p (θ_{0}) + J (θ_{0}) (\hat{θ}_{(Φ_{2}, α_{2})}^{r} - θ_{0}) + o (∥ \hat{θ}_{(Φ_{2}, α_{2})}^{r} - θ_{0} ∥)$

which, by Theorem 2, expression (11), and for $M (θ_{0}) = J (θ_{0}) W (θ_{0})$, reduces to

$p (\hat{θ}_{(Φ_{2}, α_{2})}^{r}) - p (θ_{0}) = M (θ_{0}) (\hat{p} - p (θ_{0})) + o_{p} (N^{- 1 / 2})$

which implies that

$\sqrt{N} (p (\hat{θ}_{(Φ_{2}, α_{2})}^{r}) - p (θ_{0})) \xrightarrow[N \to \infty]{L} N (0, M (θ_{0}) Σ_{p (θ_{0})} M (θ_{0})^{⊤}).$

(13)

Combining the above we obtain

$\sqrt{N} \begin{pmatrix} \hat{p} - p (θ_{0}) \\ p (\hat{θ}_{(Φ_{2}, α_{2})}^{r}) - p (θ_{0}) \end{pmatrix} \xrightarrow[N \to \infty]{L} N \left( 0, \begin{pmatrix} I \\ M (θ_{0}) \end{pmatrix} Σ_{p (θ_{0})} (I, M (θ_{0})^{⊤}) \right)$

and

$\sqrt{N} (\hat{p} - p (\hat{θ}_{(Φ_{2}, α_{2})}^{r})) \xrightarrow[N \to \infty]{L} N (0, L (θ_{0}))$

where

$L (θ_{0}) = Σ_{p (θ_{0})} - M (θ_{0}) Σ_{p (θ_{0})} - Σ_{p (θ_{0})} M (θ_{0})^{⊤} + M (θ_{0}) Σ_{p (θ_{0})} M (θ_{0})^{⊤}.$

(14)

The expansion of $d_{φ} (p, q)$ around $(p (θ_{0}), p (θ_{0}))$ yields

$T_{φ}^{0} (\hat{θ}_{(Φ_{2}, α_{2})}^{r}) = \sum_{i = 1}^{m} \frac{N}{p_{i} (θ_{0})} (\hat{p}_{i} - p_{i} (\hat{θ}_{(Φ_{2}, α_{2})}^{r}))^{2} + o_{p} (1) = X^{⊤} X + o_{p} (1)$

where

$X = \sqrt{N} \, \mathrm{diag} (p (θ_{0})^{- 1 / 2}) (\hat{p} - p (\hat{θ}_{(Φ_{2}, α_{2})}^{r})) \xrightarrow[N \to \infty]{L} N (0, T (θ_{0})).$

Then, under $(A_{7})$, $T (θ_{0})$ (see (14)) is a projection matrix of rank $m - 1 - s + ν$, since the traces of the matrices

$A (θ_{0}) (A (θ_{0})^{⊤} A (θ_{0}))^{- 1} A (θ_{0})^{⊤}$

and

$A (θ_{0}) (A (θ_{0})^{⊤} A (θ_{0}))^{- 1} Q (θ_{0})^{⊤} (Q (θ_{0}) (A (θ_{0})^{⊤} A (θ_{0}))^{- 1} Q (θ_{0})^{⊤})^{- 1} Q (θ_{0}) (A (θ_{0})^{⊤} A (θ_{0}))^{- 1} A (θ_{0})^{⊤}$

are equal to s and $ν$, respectively.

Then, the result follows from the fact (see ([24], p. 57)) that $X^{⊤} X$ has a chi-squared distribution with degrees of freedom equal to the rank of the variance-covariance matrix of the random vector $X$, as long as it is a projection matrix. □

Remark 2.

Relaxation of Assumption $(A_{6})$: Arguing as in [11], when the true model is not the equiprobable one, the result of Theorem 3 holds true as long as $α_{2} = 0$ and approximately true when $α_{2} \to 0$.

Asymptotic Theory of the Dual Divergence Test Statistic

Having established the two main results of the work, namely the decomposition of the proposed restricted estimator (Theorem 1) together with its asymptotic properties (Theorem 2), as well as the asymptotic distribution of the associated test statistic under the class of Csiszar $φ$-functions (Theorem 3), we now extend in a natural way the results of [11] to the dual divergence test statistic. The extensions presented in this section are considered vital due to their practical implications for the cross tabulations discussed in Section 4. The proofs are omitted since both results (Theorems 4 and 5) follow along the lines of previous results (see Theorems 3.4 and 3.9 of [11]). In what follows we adopt the following notation:

$b = m^{- α_{1}}, \quad p_{(1)}^{α_{1}} = \min_{i \in \{1, \dots, m\}} p_{i} (θ_{0})^{α_{1}}, \quad p_{(m)}^{α_{1}} = \max_{i \in \{1, \dots, m\}} p_{i} (θ_{0})^{α_{1}}, \quad k = m - 1 - s + ν.$

Theorem 4.

Under Assumptions $(A_{0})$–$(A_{7})$ we have

$T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r}) \xrightarrow[N \to \infty]{L} b χ_{k}^{2}.$

Remark 3.

Consider the case where Assumption $(A_{6})$ is relaxed. Then, the asymptotic distribution of the test statistic $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ is estimated to be approximately $b χ_{k}^{2}$ where

$b = \frac{p_{(1)}^{α_{1}} + p_{(m)}^{α_{1}}}{2}$

(15)

as long as $α_{2} = 0$ or $α_{2} \to 0$. For further elaboration of this remark we refer to [11].

Remark 4.

Observe that if $α_{1} \to 0$ then $b \to 1$ and the asymptotic distribution becomes $χ_{k}^{2}$, while for $α_{1}$ away from 0 the distribution is proportional to $χ_{k}^{2}$ with proportionality index $b \neq 1$. However, for non-equiprobable models these statements hold true as long as $α_{2}$ is close to zero.

Consider now the hypothesis with contiguous alternatives [25,26]

$H_{0} : p = p (θ_{0}) \quad \text{vs.} \quad H_{1, N} : p = p (θ_{0}) + \frac{d}{\sqrt{N}}$

(16)

where $d$ is an m-dimensional vector of known real values with components $d_{i}$ satisfying the assumption $\sum_{i = 1}^{m} d_{i} = 0$.

Observe that as N tends to infinity, the local contiguous alternative converges to the null hypothesis at the rate $O (N^{- 1 / 2})$. Alternatives such as those in (16) are known as Pitman transition alternatives, Pitman (local) alternatives, or local contiguous alternatives to the null hypothesis $H_{0}$ [25].

Theorem 5.

Under Assumptions $(A_{0})$–$(A_{7})$ and for the hypothesis (16) we have

$T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r}) \xrightarrow[N \to \infty]{L} b χ_{k}^{2} (ξ^{⊤} ξ)$

which represents a non-central chi-squared distribution with k degrees of freedom and non-centrality parameter $ξ^{⊤} ξ$, for which

$ξ = \mathrm{diag} (p (θ_{0})^{- 1 / 2}) (I - J (θ_{0}) W (θ_{0})) d.$

Remark 5.

Observe that under Assumption $(A_{6})$ ($p_{i} = 1 / m$) the asymptotic distribution is independent of $Φ$, $α_{1}$ and $α_{2}$. As a result, the associated power of the test is $\Pr (χ_{k}^{2} (ξ^{⊤} ξ) \geq χ_{k, a}^{2})$, where $χ_{k, a}^{2}$ is the $100 (1 - a) \%$ percentile of the $χ_{k}^{2}$ distribution. If Assumption $(A_{6})$ is relaxed, then the distribution is approximately non-central chi-squared with proportionality index $b = \frac{p_{(1)}^{α_{1}} + p_{(m)}^{α_{1}}}{2}$.

4. Cross Tabulations and Dual Divergence Test Statistic

In this section, we take advantage of the methodology proposed earlier for the analysis of cross tabulations. In particular, we focus on the case of three categorical variables, say $X$, $Y$, and $Z$, with $I$, $J$, and $K$ categories, respectively. Assume that the probability mass of a realization of a randomly selected subject is denoted by $p_{i j k} (θ) = \Pr (X = i, Y = j, Z = k) > 0$, where here and in what follows $i = 1, \dots, I$, $j = 1, \dots, J$, $k = 1, \dots, K$, unless otherwise stated. The associated probability vector is given as $p (θ) = \{p_{i j k} (θ)\}$ where

$p_{i j k} (θ) = \begin{cases} θ_{i j k}, & (i, j, k) \neq (I, J, K) \\ 1 - \sum_{(i, j, k) \neq (I, J, K)} θ_{i j k}, & (i, j, k) = (I, J, K) \end{cases}$

and the parameter space as $Θ = \{θ_{i j k}, (i, j, k) \neq (I, J, K)\}$. The sample estimator of $p_{i j k} (θ)$ is $\hat{p}_{i j k} = n_{i j k} / N$, where $n_{i j k}$ is the frequency of the corresponding $(i, j, k)$ cell.

In this set-up the dual divergence test statistic is given as

$T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r}) = \frac{2 N}{Φ_{1}^{″} (1)} \sum_{i = 1}^{I} \sum_{j = 1}^{J} \sum_{k = 1}^{K} p_{i j k} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})^{1 + α_{1}} Φ_{1} \left( \frac{\hat{p}_{i j k}}{p_{i j k} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})} \right)$

(17)

with $\hat{p}_{i j k}$ as above and the rMD estimator given as

$\hat{θ}_{(Φ_{2}, α_{2})}^{r} = \arg \inf_{\{θ \in Θ \subset R^{s} : f_{k} (θ) = 0, \; k = 1, \dots, ν\}} \sum_{i = 1}^{I} \sum_{j = 1}^{J} \sum_{k = 1}^{K} p_{i j k} (θ)^{1 + α_{2}} Φ_{2} \left( \frac{\hat{p}_{i j k}}{p_{i j k} (θ)} \right).$

(18)

For $α_{1}, α_{2} = 0$ and special cases of the functions $Φ_{1}$ and $Φ_{2}$, classical restricted minimum divergence estimators and associated test statistics can be derived from (18) and (17), respectively. For example, for $α_{1}, α_{2} = 0$ and $Φ_{1}, Φ_{2} = Φ_{KL}$ the likelihood ratio test statistic with the restricted maximum likelihood estimator, $G^{2} (\hat{θ}^{r})$, can be derived, while for $Φ_{1}, Φ_{2} = Φ_{λ}$ and $λ = 1$ we obtain the chi-squared test statistic with the restricted minimum chi-squared estimator, $X^{2} (\hat{θ}_{X^{2}}^{r})$. For $Φ_{1}, Φ_{2} = Φ_{λ}$ and $λ = 2 / 3$ the dual divergence test statistic reduces to the power divergence test statistic with the restricted minimum power divergence estimator, $CR (\hat{θ}_{CR}^{r})$, whereas for $λ = - 1 / 2$ it reduces to the Freeman–Tukey test statistic with the restricted minimum Freeman–Tukey estimator, $FT (\hat{θ}_{FT}^{r})$.

The hypothesis of conditional independence of $X$ and $Y$ given $Z$ is given, for any triplet $(i, j, k)$, by

$H_{0} : p_{i j k} (θ_{0}) = \frac{p_{i * k} (θ_{0}) p_{* j k} (θ_{0})}{p_{* * k} (θ_{0})}, \quad θ_{0} \in Θ \text{ unknown}$

where

$p_{i * k} (θ_{0}) = \sum_{j = 1}^{J} p_{i j k} (θ_{0}), \quad p_{* j k} (θ_{0}) = \sum_{i = 1}^{I} p_{i j k} (θ_{0}) \quad \text{and} \quad p_{* * k} (θ_{0}) = \sum_{i = 1}^{I} \sum_{j = 1}^{J} p_{i j k} (θ_{0}).$

Under the $(I - 1) (J - 1) K$ constraint functions

$f_{i j k} (θ) = p_{11 k} (θ) p_{i j k} (θ) - p_{1 j k} (θ) p_{i 1 k} (θ) = 0, \quad i = 2, \dots, I, \; j = 2, \dots, J, \; k = 1, \dots, K$

the above hypothesis $H_{0}$, with $θ_{0}$ unknown, becomes

$H_{0} : p = p (θ_{0}), \quad \text{for } θ_{0} \in Θ_{0},$

where

$Θ_{0} = \{θ \in Θ : f_{i j k} (θ) = 0, \; i = 2, \dots, I, \; j = 2, \dots, J, \; k = 1, \dots, K\}.$
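The constraint functions $f_{i j k}$ can be evaluated directly on any three-way table. The following Python sketch builds a conditionally independent table from hypothetical conditional distributions (the numbers are assumptions, not data from the paper) and checks that all $(I - 1) (J - 1) K$ constraints vanish.

```python
def constraint_values(p, I, J, K):
    """f_ijk = p_11k p_ijk - p_1jk p_i1k for i = 2..I, j = 2..J, k = 1..K.

    p is a dict keyed by 1-based triples (i, j, k).
    """
    return {(i, j, k): p[1, 1, k] * p[i, j, k] - p[1, j, k] * p[i, 1, k]
            for i in range(2, I + 1)
            for j in range(2, J + 1)
            for k in range(1, K + 1)}


def conditionally_independent_table(p_z, p_x_given_z, p_y_given_z):
    """Build p_ijk = P(Z = k) P(X = i | Z = k) P(Y = j | Z = k)."""
    return {(i + 1, j + 1, k + 1): pz * px[i] * py[j]
            for k, (pz, px, py) in enumerate(zip(p_z, p_x_given_z, p_y_given_z))
            for i in range(len(px))
            for j in range(len(py))}


# Illustrative 2 x 3 x 2 table (values are assumptions, not from the paper).
p = conditionally_independent_table(
    p_z=[0.4, 0.6],
    p_x_given_z=[[0.3, 0.7], [0.5, 0.5]],
    p_y_given_z=[[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]],
)
f = constraint_values(p, I=2, J=3, K=2)

# Number of constraints and the largest absolute constraint value.
print(len(f), max(abs(v) for v in f.values()))
```

Perturbing a single cell of such a table makes the corresponding constraint nonzero, which is the kind of departure from $Θ_{0}$ that the test statistic is designed to detect.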

Remark 6.

For practical purposes, the choice of the values of the indices is motivated by the work of [8], where, in an attempt to achieve a compromise between robustness and efficiency of estimators, the authors recommended the use of small values in the $(0, 1)$ region. In the following subsection, our analysis reconfirms their findings: as will be seen, values of both indices closer to zero than to one are associated with good performance, not only in terms of estimation but also in terms of goodness of fit, as reflected in the size and the power of the test.

Simulation Study

In this simulation study, we use the rMD estimator and the associated dual divergence test statistic for the analysis of cross tabulations. Specifically, we compare, in terms of size and power, classical tests with those that can be derived through the proposed methodology, for the problem of conditional independence of three random variables in contingency tables. We test the hypothesis of conditional independence for a $2 \times 2 \times 2$ contingency table; thus, in this case, we have $m = 8$ probabilities of the multinomial model, $s = 7$ unknown parameters to estimate, and two constraint functions ($ν = 2$), which are given by

$f_{221} (θ) = θ_{111} θ_{221} - θ_{121} θ_{211} \quad \text{and} \quad f_{222} (θ) = θ_{112} \left( 1 - \sum_{(i, j, k) \neq (2, 2, 2)} θ_{i j k} \right) - θ_{122} θ_{212}.$

For a better understanding of the behaviour of the dual divergence test statistic given in (17), we compare it with the four classical tests-of-fit mentioned earlier in Section 4, namely $G^{2} (\hat{θ}^{r})$, $X^{2} (\hat{θ}_{X^{2}}^{r})$, $CR (\hat{θ}_{CR}^{r})$ and $FT (\hat{θ}_{FT}^{r})$. The proposed test $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ is applied for $Φ_{1} = Φ_{α_{1}}$, $Φ_{2} = Φ_{α_{2}}$ and six different values of $α_{1}$ and $α_{2}$: $α_{1}, α_{2} = 10^{- 7}$, $0.01$, $0.05$, $0.10$, $0.50$, and $1.50$. Note that the critical values used in this simulation study are the asymptotic critical values based on the asymptotic distribution $b χ_{2}^{2}$, with b as in (15), for the double index family of test statistics, and on the $χ_{2}^{2}$ distribution for the classical test statistics. For the analysis we used 100,000 simulations and sample sizes equal to $n = 20, 25$ (small sample sizes) and $n = 40, 45$ (moderate sample sizes).
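The asymptotic critical values described above are easy to compute when $k = 2$, because the upper tail of the $χ_{2}^{2}$ distribution is $\exp (- x / 2)$. The Python sketch below evaluates b from (15) and the resulting critical value at the 5% level. It assumes that $p (θ_{0})$ is taken to be the vector of null probabilities $π_{i j k}$ listed later in this section; how b is evaluated in the actual study is not stated here, so this plug-in choice is an assumption.

```python
import math


def b_factor(p0, alpha1):
    """Proportionality index (15): mean of the min and max of p_i(theta_0)^alpha1."""
    powered = [pi ** alpha1 for pi in p0]
    return (min(powered) + max(powered)) / 2.0


def chi2_df2_upper_quantile(a):
    """Upper-a quantile of chi-squared with 2 df, from P(X > x) = exp(-x / 2)."""
    return -2.0 * math.log(a)


# Null probabilities pi_ijk quoted in this section for the 2 x 2 x 2 model.
pi_null = [0.036254, 0.164994, 0.092809, 0.133645,
           0.092809, 0.133645, 0.237591, 0.108253]

for alpha1 in (1e-7, 0.01, 0.05, 0.10, 0.50, 1.50):
    crit = b_factor(pi_null, alpha1) * chi2_df2_upper_quantile(0.05)
    print(alpha1, round(crit, 4))
```

As $α_{1}$ approaches zero, b approaches one and the critical value approaches the usual $χ_{2}^{2}$ value of about 5.99, in line with Remark 4.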

In this study, we have used the model previously considered by [27], given by

$\begin{array}{ll} p_{111} = π_{111} - π_{111} w, & p_{211} = π_{211} + π_{222} w - π_{111} w, \\ p_{112} = π_{112} + π_{111} w - π_{222} w, & p_{212} = π_{212} + π_{111} w - π_{222} w, \\ p_{121} = π_{121} + π_{222} w, & p_{221} = π_{221} + π_{222} w - π_{111} w, \\ p_{122} = π_{122} + π_{111} w, & p_{222} = π_{222} - π_{222} w \end{array}$

where $0 \leq w < 1$ and $π_{i j k} = p_{i * *} \times p_{* j *} \times p_{* * k}$, $i, j, k = 1, 2$, with

$\begin{array}{llll} π_{111} = 0.036254, & π_{112} = 0.164994, & π_{121} = 0.092809, & π_{122} = 0.133645, \\ π_{211} = 0.092809, & π_{212} = 0.133645, & π_{221} = 0.237591, & π_{222} = 0.108253. \end{array}$

For $w = 0$ we obtain the model under the null hypothesis of conditional independence, while for values $w \neq 0$ we obtain the models under the alternative hypotheses. We considered the following values: $w = 0.00$, $0.30$, $0.60$, and $0.90$. Note that the larger the value of w, the more we deviate from the null model. For the simulation study we used the R software [28], while for the constrained optimization we used the auglag function from the nloptr package [29].
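The simulation model above can be reproduced directly from the listed $π_{i j k}$ values. The following Python sketch (function names are hypothetical) builds the cell probabilities for a given w, confirms that they remain a probability vector, and evaluates the two constraint functions $f_{221}$ and $f_{222}$ to show how the departure from conditional independence grows with w.

```python
PI = {(1, 1, 1): 0.036254, (1, 1, 2): 0.164994,
      (1, 2, 1): 0.092809, (1, 2, 2): 0.133645,
      (2, 1, 1): 0.092809, (2, 1, 2): 0.133645,
      (2, 2, 1): 0.237591, (2, 2, 2): 0.108253}


def model_probs(w):
    """Cell probabilities p_ijk of the simulation model for a given w."""
    a, b = PI[1, 1, 1] * w, PI[2, 2, 2] * w
    shift = {(1, 1, 1): -a, (2, 1, 1): b - a,
             (1, 1, 2): a - b, (2, 1, 2): a - b,
             (1, 2, 1): b, (2, 2, 1): b - a,
             (1, 2, 2): a, (2, 2, 2): -b}
    return {cell: PI[cell] + shift[cell] for cell in PI}


def departure(p):
    """Constraint functions f_221 and f_222 evaluated at p."""
    return (p[1, 1, 1] * p[2, 2, 1] - p[1, 2, 1] * p[2, 1, 1],
            p[1, 1, 2] * p[2, 2, 2] - p[1, 2, 2] * p[2, 1, 2])


for w in (0.0, 0.3, 0.6, 0.9):
    p = model_probs(w)
    print(w, round(sum(p.values()), 6), departure(p))
```

At $w = 0$ both constraint values are zero up to the rounding of the published $π_{i j k}$, and their magnitude increases as w moves away from zero.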

From Table 1, we can observe that, in terms of size, the performance of $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ is adequate for values of $α_{1}, α_{2} \leq 0.5$, both for small and moderate sample sizes. In addition, we can see that for $α_{1} \leq 0.10$, $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ appears to be liberal, while for $α_{1} \geq 0.5$ it appears to be conservative. We also note that the size becomes smaller as $α_{1}$ and $α_{2}$ increase with $α_{1} \geq α_{2}$. Table 2 provides the size of the classical tests-of-fit, from which we can observe that $CR (\hat{θ}_{CR}^{r})$ has the best performance among all competing tests for every sample size. In contrast, $FT (\hat{θ}_{FT}^{r})$ has the worst performance among all competing tests and appears to be very liberal. Furthermore, $X^{2} (\hat{θ}_{X^{2}}^{r})$ appears to be conservative, while $G^{2} (\hat{θ}^{r})$ appears to be liberal. Note that for $α_{1} \in [0.01, 0.5]$ and $α_{2} \leq 0.10$, $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ behaves better than the $G^{2} (\hat{θ}^{r})$ test statistic, and its performance is quite close to that of $X^{2} (\hat{θ}_{X^{2}}^{r})$.

In order to examine the closeness of the estimated (true) size to the nominal size $α = 0.05$, we consider the criterion given by Dale [30]. The criterion involves the following inequality

$\left| \operatorname{logit} (1 - \hat{α}_{n}) - \operatorname{logit} (1 - α) \right| \leq d, \qquad (19)$

where $\operatorname{logit} (p) = \log (p / (1 - p))$ and $\hat{α}_{n}$ is the estimated (true) size. The estimated (true) size is considered to be close to the nominal size if (19) is satisfied with $d = 0.35$. Note that in this situation the estimated (true) size is close to the nominal one if $\hat{α}_{n} \in [0.0357, 0.0695]$; such sizes are presented in bold in Table 1 and Table 2. This criterion has previously been used by, among others, [27,31].
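The interval quoted above follows directly from inverting inequality (19). The short sketch below, with hypothetical helper names `dale_interval` and `satisfies_dale`, shows the arithmetic for a given nominal size and tolerance $d$. It is meant as an illustration of the criterion, not as code from the study.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def dale_interval(nominal, d=0.35):
    """Estimated sizes a with |logit(1 - a) - logit(1 - nominal)| <= d."""
    center = logit(1.0 - nominal)
    lower = 1.0 - inv_logit(center + d)
    upper = 1.0 - inv_logit(center - d)
    return lower, upper


def satisfies_dale(estimated, nominal, d=0.35):
    """Direct check of inequality (19) for one estimated size."""
    return abs(logit(1.0 - estimated) - logit(1.0 - nominal)) <= d


lower, upper = dale_interval(0.05)
print(f"acceptable estimated sizes for nominal 0.05: [{lower:.4f}, {upper:.4f}]")

# Two entries of Table 2, converted from percentages to proportions.
print(satisfies_dale(0.04703, 0.05))  # CR, n = 45
print(satisfies_dale(0.14715, 0.05))  # FT, n = 20
```

The endpoints agree with the interval stated in the text up to rounding in the last digit. Calling `dale_interval` over a grid of nominal sizes gives the band against which estimated sizes can be compared for other nominal levels.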

Regarding the proposed test, we can see that for small sample sizes the estimated (true) size is close to the nominal for $α_{1} \in [0.10, 0.50]$ and $α_{2} \leq 0.10$, while for moderate sample sizes this holds for $α_{1} \in [10^{-7}, 0.50]$ and $α_{2} \leq 0.10$. With reference to the classical tests-of-fit, we can observe that the size of $CR (\hat{θ}_{CR}^{r})$ is close to the nominal for every sample size, whereas the size of $G^{2} (\hat{θ}^{r})$ and $X^{2} (\hat{θ}_{X^{2}}^{r})$ is close only for moderate sample sizes. Finally, we note that the estimated (true) size of $FT (\hat{θ}_{FT}^{r})$ fails to be close to the nominal both for small and moderate sample sizes.

In Table 3, Table 4 and Table 5, we provide the results regarding the power of the proposed family of test statistics for the three alternatives and sample sizes $n = 20, 25, 40, 45$, while Table 2 provides the results regarding the power of the classical tests-of-fit. The performance tends to be better as we deviate from the null model and as the sample size increases, both for the classical and the proposed tests.

As general comments regarding the behaviour of the proposed and the classical tests-of-fit in terms of power, we state that the best results for $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ are obtained for small values of $α_{1}$ in the range $(0, 0.1]$ and large values of $α_{2}$ with $α_{1} \leq α_{2}$. Note that although the power results become better as $α_{2}$ increases, the sizes are adequate only for $α_{2} \leq 0.5$. In addition, we can observe that the performance of $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ is better than that of $CR (\hat{θ}_{CR}^{r})$ and $X^{2} (\hat{θ}_{X^{2}}^{r})$ for every alternative and every sample size for $α_{1} \leq 0.1$ and $α_{2} \leq 0.5$, and slightly better than that of $G^{2} (\hat{θ}^{r})$ for small values of $α_{1}$ and large values of $α_{2}$, for example for $α_{1} = 0.01$ and $α_{2} = 0.50$. Furthermore, we can observe that for $α_{1} = 0.1$ and $α_{2} \leq 0.1$ the size of the test is better than the size of $G^{2} (\hat{θ}^{r})$ and slightly worse than the size of the $CR (\hat{θ}_{CR}^{r})$ and $X^{2} (\hat{θ}_{X^{2}}^{r})$ test statistics, while its power is considerably better than the power of $CR (\hat{θ}_{CR}^{r})$ and $X^{2} (\hat{θ}_{X^{2}}^{r})$ and slightly worse than that of $G^{2} (\hat{θ}^{r})$. Additionally, we can see that as $α_{1}$ and $α_{2}$ tend to 0, the behaviour of the $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ test statistic coincides with that of the $G^{2} (\hat{θ}^{r})$ test both in terms of size and power, as expected.

In order to gain better insight into the behaviour of the test statistics, we apply Dale’s criterion not only for the nominal size $α = 0.05$ but also for a range of nominal sizes that are of interest. Based on the previous analysis, besides the classical tests, we focus on $T_{Φ_{1}}^{0.05} (\hat{θ}_{(Φ_{2}, 0.05)}^{r})$, $T_{Φ_{1}}^{0.10} (\hat{θ}_{(Φ_{2}, 0.10)}^{r})$, and $T_{Φ_{1}}^{0.20} (\hat{θ}_{(Φ_{2}, 0.20)}^{r})$. The following simplified notation is used in every figure: FT ≡ $FT (\hat{θ}_{FT}^{r})$, ML ≡ $G^{2} (\hat{θ}^{r})$, CR ≡ $CR (\hat{θ}_{CR}^{r})$, Pe ≡ $X^{2} (\hat{θ}_{X^{2}}^{r})$, T1 ≡ $T_{Φ_{1}}^{0.05} (\hat{θ}_{(Φ_{2}, 0.05)}^{r})$, T2 ≡ $T_{Φ_{1}}^{0.10} (\hat{θ}_{(Φ_{2}, 0.10)}^{r})$, and T3 ≡ $T_{Φ_{1}}^{0.20} (\hat{θ}_{(Φ_{2}, 0.20)}^{r})$. From Figure 1a, we can see that for small sample sizes $(n = 25)$, $T_{Φ_{1}}^{0.20} (\hat{θ}_{(Φ_{2}, 0.20)}^{r})$ and $CR (\hat{θ}_{CR}^{r})$ satisfy Dale’s criterion for every nominal size, while $T_{Φ_{1}}^{0.10} (\hat{θ}_{(Φ_{2}, 0.10)}^{r})$ and $X^{2} (\hat{θ}_{X^{2}}^{r})$ do so for nominal sizes greater than $0.03$ and $0.06$, respectively. Note that the dashed line in Figure 1 denotes the situation in which the estimated (true) size equals the nominal size; thus, lines that lie above this reference line refer to liberal tests, while those that lie below refer to conservative ones. On the other hand, for moderate sample sizes $(n = 45)$ all chosen test statistics satisfy Dale’s criterion except $FT (\hat{θ}_{FT}^{r})$.

Taking into account the fact that the actual size of each test differs from the targeted nominal size, we have to make an adjustment in order to proceed further with the comparison of the tests in terms of power. We focus on those tests that satisfy Dale’s criterion and follow the method proposed in [32], which involves the so-called receiver operating characteristic (ROC) curves. In particular, let $G (t) = \Pr (T \geq t)$ be the survivor function of a general test statistic $T$, and let $c = \inf \{t : G (t) \leq α\}$ be the critical value; then ROC curves can be formulated by plotting the power $G_{1} (c)$ against the size $G_{0} (c)$ for various values of the critical value $c$. Note that $G_{0} (t)$ denotes the survivor function of the test statistic under the null hypothesis and $G_{1} (t)$ the survivor function under the alternative.
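The construction just described can be sketched with empirical survivor functions computed from simulated values of a test statistic. The snippet below is a minimal illustration under that assumption. The helpers `survivor` and `empirical_roc` are hypothetical names, and the two short lists are toy numbers standing in for simulated statistics, not results from the study.

```python
def survivor(stats, c):
    """Empirical survivor function: share of simulated statistics that are >= c."""
    return sum(1 for t in stats if t >= c) / len(stats)


def empirical_roc(null_stats, alt_stats):
    """(size, power) pairs obtained by sweeping the critical value c."""
    thresholds = sorted(set(null_stats) | set(alt_stats), reverse=True)
    return [(survivor(null_stats, c), survivor(alt_stats, c)) for c in thresholds]


# Toy values standing in for simulated test statistics (not results from the paper).
null_stats = [0.4, 1.1, 2.0, 3.5, 6.2]
alt_stats = [1.5, 3.0, 4.8, 6.9, 9.1]

roc = empirical_roc(null_stats, alt_stats)
for size, power in roc:
    print(f"size = {size:.2f}, power = {power:.2f}")
```

Because each point pairs the power with the size actually attained at the same critical value, two tests can be compared at equal empirical size even when their nominal and actual sizes differ.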

Since results are similar for every alternative, we restrict ourselves to $w = 0.60$, which refers to an alternative that is neither too close to nor too far from the null. For small sample sizes $(n = 25)$, results are presented in Figure 2, where we can see that the proposed test is superior to the classical tests-of-fit in terms of power. However, for moderate sample sizes $(n = 45)$ we can observe in Figure 3 that $G^{2} (\hat{θ}^{r})$ has the best performance among all competing tests, followed by the proposed test-of-fit.

From the conducted analysis we conclude that, regarding the proposed test, there is a trade-off between size and power for different choices of the indices $α_{1}$ and $α_{2}$. In particular, we can see that as $α_{1}$ increases the size becomes smaller at the expense of smaller power, while as $α_{2}$ increases the power becomes better and the tests more liberal. In conclusion, we could state that for values of $α_{1}$ and $α_{2}$ in the range $(0.05, 0.25)$ the resulting test statistic provides a fair balance between size and power, which makes it an attractive alternative to the classical tests-of-fit. For small sample sizes larger values of the indices are preferable, whereas for moderate sample sizes smaller ones are recommended.

5. Conclusions

In this work, a general divergence family of test statistics is presented for hypothesis testing problems as in (3), under constraints. For estimation purposes, we introduce, discuss and use the rMD (restricted minimum divergence) estimator presented in (8). The proposed double index (dual) divergence test statistic involves two pairs of elements, namely $(Φ_{2}, α_{2})$, used for the estimation problem, and $(Φ_{1}, α_{1})$, used for the testing problem. The duality refers to the fact that the two pairs may or may not be the same, providing the researcher with the greatest possible flexibility.

The asymptotic distribution of the dual divergence test statistic is found to be proportional to the chi-squared distribution irrespective of the nature of the multinomial model, as long as the values of the two indicators involved are relatively close to zero (less than $0.5$). Such values are known to provide a satisfactory balance between efficiency and robustness (see, for instance, [8] or [3]).

The methodology developed in this work can be used in the analysis of contingency tables, which is applicable in various scientific fields: biosciences, such as genetics [33] and epidemiology [34]; finance, such as the evaluation of investment effectiveness or business performance [35]; insurance science [36]; or socioeconomics [37]. This work concludes with a comparative simulation study between classical test statistics and members of the proposed family, where the focus is placed on the conditional independence of three random variables. Results indicate that, by wisely selecting the values of the $α_{1}$ and $α_{2}$ indices, we can derive a test statistic that can be thought of as a powerful and reliable alternative to the classical tests-of-fit, especially for small sample sizes.

Author Contributions

Conceptualization, A.K. and C.M.; data curation, C.M.; methodology, A.K. and C.M.; software, C.M.; formal analysis, A.K. and C.M.; writing - original draft preparation, C.M.; writing - review and editing, A.K. and C.M.; supervision, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

The research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors wish to express their appreciation to the anonymous referees and the Associate Editor for their valuable comments and suggestions. The authors also wish to express their appreciation to Professor A. Batsidis of the University of Ioannina for bringing to their attention citation [31], which greatly helped the comparative analysis performed in this work. This work was completed as part of the first author’s PhD thesis and falls within the research activities of the Laboratory of Statistics and Data Analysis of the University of the Aegean.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The Birch regularity conditions mentioned in Assumption (A5) of Section 2 are stated below (for details see [22])

The point $θ_{0}$ is an interior point of $Θ$ ;
$p_{i} = p_{i} (θ_{0}) > 0$ for $i = 1, \dots, m$ ;
The mapping $p (θ) : Θ \to P$ is totally differentiable at $θ_{0}$ so that the partial derivatives of $p_{i} (θ_{0})$ with respect to each $θ_{j}$ exist at $θ_{0}$ and $p (θ)$ has a linear approximation at $θ_{0}$ given by

$p_{i} (θ) = p_{i} (θ_{0}) + \sum_{j = 1}^{s} (θ_{j} - θ_{0 j}) \frac{\partial p_{i} (θ_{0})}{\partial θ_{j}} + o (∥ θ - θ_{0} ∥), i = 1, \dots, m$

as $θ \to θ_{0}$ .
The Jacobian matrix

$J (θ_{0}) = {(\frac{\partial p (θ)}{\partial θ})}_{θ = θ_{0}} = {(\frac{\partial p_{i} (θ_{0})}{\partial θ_{j}})}_{\begin{matrix} i = 1, \dots, m \\ j = 1, \dots, s \end{matrix}}$

is of full rank;
The mapping inverse to $θ \to p (θ)$ exists and is continuous at $θ_{0}$ ;
The mapping $p : Θ \to P$ is continuous at every point $θ \in Θ$ .

References

1. Salicru, M.; Morales, D.; Menendez, M.; Pardo, L. On the Applications of Divergence Type Measures in Testing Statistical Hypotheses. J. Multivar. Anal. 1994, 51, 372–391.
2. Contreras-Reyes, J.E.; Kahrari, F.; Cortés, D.D. On the modified skew-normal-Cauchy distribution: Properties, inference and applications. Commun. Stat. Theory Methods 2021, 50, 3615–3631.
3. Mattheou, K.; Karagrigoriou, A. A New Family of Divergence Measures for Tests of Fit. Aust. N. Z. J. Stat. 2010, 52, 187–200.
4. Csiszár, I. Eine Informationstheoretische Ungleichung und Ihre Anwendung auf Beweis der Ergodizitaet von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kut. Int. Koezl. 1963, 8, 85–108.
5. Kullback, S.; Leibler, R.A. On Information and Sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
6. Cressie, N.; Read, T.R.C. Multinomial Goodness-of-Fit Tests. J. R. Stat. Soc. Ser. B Methodol. 1984, 46, 440–464.
7. Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1900, 50, 157–175.
8. Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M.C. Robust and Efficient Estimation by Minimising a Density Power Divergence. Biometrika 1998, 85, 549–559.
9. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman and Hall/CRC: New York, NY, USA, 2006.
10. Morales, D.; Pardo, L.; Vajda, I. Asymptotic Divergence of Estimates of Discrete Distributions. J. Stat. Plan. Inference 1995, 48, 347–369.
11. Meselidis, C.; Karagrigoriou, A. Statistical Inference for Multinomial Populations Based on a Double Index Family of Test Statistics. J. Stat. Comput. Simul. 2020, 90, 1773–1792.
12. Pardo, J.; Pardo, L.; Zografos, K. Minimum φ-divergence Estimators with Constraints in Multinomial Populations. J. Stat. Plan. Inference 2002, 104, 221–237.
13. Read, T.R.; Cressie, N.A. Goodness-of-Fit Statistics for Discrete Multivariate Data; Springer: New York, NY, USA, 1988.
14. Alin, A.; Kurt, S. Ordinary and Penalized Minimum Power-divergence Estimators in Two-way Contingency Tables. Comput. Stat. 2008, 23, 455–468.
15. Toma, A. Optimal Robust M-estimators Using Divergences. Stat. Probab. Lett. 2009, 79, 1–5.
16. Jiménez-Gamero, M.; Pino-Mejías, R.; Alba-Fernández, V.; Moreno-Rebollo, J. Minimum ϕ-divergence Estimation in Misspecified Multinomial Models. Comput. Stat. Data Anal. 2011, 55, 3365–3378.
17. Kim, B.; Lee, S. Minimum density power divergence estimator for covariance matrix based on skew t distribution. Stat. Methods Appl. 2014, 23, 565–575.
18. Neath, A.A.; Cavanaugh, J.E.; Weyhaupt, A.G. Model Evaluation, Discrepancy Function Estimation, and Social Choice Theory. Comput. Stat. 2015, 30, 231–249.
19. Ghosh, A. Divergence based robust estimation of the tail index through an exponential regression model. Stat. Methods Appl. 2016, 26, 181–213.
20. Jiménez-Gamero, M.D.; Batsidis, A. Minimum Distance Estimators for Count Data Based on the Probability Generating Function with Applications. Metrika 2017, 80, 503–545.
21. Basu, A.; Ghosh, A.; Mandal, A.; Martin, N.; Pardo, L. Robust Wald-type tests in GLM with random design based on minimum density power divergence estimators. Stat. Methods Appl. 2021, 30, 973–1005.
22. Birch, M.W. A New Proof of the Pearson-Fisher Theorem. Ann. Math. Stat. 1964, 35, 817–824.
23. Krantz, S.G.; Parks, H.R. The Implicit Function Theorem: History, Theory, and Applications; Birkhäuser: Basel, Switzerland, 2013.
24. Ferguson, T.S. A Course in Large Sample Theory; Chapman and Hall: Boca Raton, FL, USA, 1996.
25. McManus, D.A. Who Invented Local Power Analysis? Econom. Theory 1991, 7, 265–268.
26. Neyman, J. “Smooth” Test for Goodness of Fit. Scand. Actuar. J. 1937, 1937, 149–199.
27. Pardo, J.A. An approach to multiway contingency tables based on φ-divergence test statistics. J. Multivar. Anal. 2010, 101, 2305–2319.
28. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2016.
29. Johnson, S.G. The NLopt Nonlinear-Optimization Package. 2014. Available online: http://ab-initio.mit.edu/nlopt (accessed on 27 March 2022).
30. Dale, J.R. Asymptotic Normality of Goodness-of-Fit Statistics for Sparse Product Multinomials. J. R. Stat. Soc. Ser. B Methodol. 1986, 48, 48–59.
31. Batsidis, A.; Martin, N.; Pardo Llorente, L.; Zografos, K. φ-Divergence Based Procedure for Parametric Change-Point Problems. Methodol. Comput. Appl. Probab. 2016, 18, 21–35.
32. Lloyd, C.J. Estimating test power adjusted for size. J. Stat. Comput. Simul. 2005, 75, 921–933.
33. Dubrova, Y.E.; Grant, G.; Chumak, A.A.; Stezhka, V.A.; Karakasian, A.N. Elevated Minisatellite Mutation Rate in the Post-Chernobyl Families from Ukraine. Am. J. Hum. Genet. 2002, 71, 801–809.
34. Znaor, A.; Brennan, P.; Gajalakshmi, V.; Mathew, A.; Shanta, V.; Varghese, C.; Boffetta, P. Independent and combined effects of tobacco smoking, chewing and alcohol drinking on the risk of oral, pharyngeal and esophageal cancers in Indian men. Int. J. Cancer 2003, 105, 681–686.
35. Merková, M. Use of Investment Controlling and its Impact into Business Performance. Procedia Econ. Financ. 2015, 34, 608–614.
36. Geenens, G.; Simar, L. Nonparametric tests for conditional independence in two-way contingency tables. J. Multivar. Anal. 2010, 101, 765–788.
37. Bartolucci, F.; Scaccia, L. Testing for positive association in contingency tables with fixed margins. Comput. Stat. Data Anal. 2004, 47, 195–210.

Figure 1. Estimated (true) sizes against nominal sizes. The shaded area refers to Dale’s criterion. (a) $n = 25$. (b) $n = 45$.

Figure 2. (a) Empirical ROC curves for n = 25. (b) The same curves magnified over a relevant range of empirical sizes.

Figure 3. (a) Empirical ROC curves for n = 45. (b) The same curves magnified over a relevant range of empirical sizes.

Table 1. Size ($w = 0.00$) calculations (%) of the $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ test statistic for sample sizes $n = 20, 25, 40, 45$. Sizes that satisfy Dale’s criterion are presented in bold.

	$α_{2}$
$α_{1}$	$10^{- 7}$	0.01	0.05	0.10	0.50	1.50	$10^{- 7}$	0.01	0.05	0.10	0.50	1.50
	$n = 20$						$n = 25$
$10^{- 7}$	8.256	8.257	8.260	8.263	9.216	13.856	7.863	7.865	7.878	7.920	8.927	13.192
0.01	8.207	8.206	8.209	8.224	9.224	13.623	7.753	7.754	7.763	7.817	8.797	12.930
0.05	7.896	7.849	7.879	7.886	8.719	12.916	7.340	7.334	7.327	7.350	8.313	12.277
0.10	7.403	7.404	7.378	7.356	8.046	11.994	6.965	6.959	6.940	6.934	7.675	11.364
0.50	3.873	3.850	3.769	3.612	3.023	4.050	3.857	3.819	3.722	3.604	3.191	4.304
1.50	0.920	0.893	0.807	0.758	0.509	0.202	1.046	1.019	0.948	0.885	0.602	0.203
	$n = 40$						$n = 45$
$10^{- 7}$	7.016	7.016	7.027	7.055	7.887	11.362	6.858	6.858	6.870	6.908	7.732	11.099
0.01	6.933	6.933	6.940	6.957	7.778	11.183	6.760	6.760	6.770	6.805	7.601	10.941
0.05	6.590	6.589	6.580	6.593	7.342	10.505	6.427	6.422	6.415	6.426	7.153	10.340
0.10	6.246	6.239	6.228	6.222	6.794	9.758	6.082	6.070	6.053	6.043	6.612	9.586
0.50	3.854	3.832	3.762	3.661	3.367	4.362	3.813	3.789	3.716	3.635	3.331	4.269
1.50	1.172	1.160	1.115	1.066	0.760	0.383	1.183	1.170	1.119	1.068	0.773	0.437

Table 2. Size ($w = 0.00$) and power ($w = 0.30, 0.60, 0.90$) calculations (%) for the classical tests-of-fit. Sizes that satisfy Dale’s criterion are presented in bold.

Sample size	$F T$	$G^{2}$	$C R$	$X^{2}$	$F T$	$G^{2}$	$C R$	$X^{2}$
	$w = 0.00$				$w = 0.30$
$n = 20$	14.715	8.261	4.219	3.140	18.366	9.072	4.200	2.966
$n = 25$	13.664	7.865	4.333	3.477	19.674	9.846	4.783	3.646
$n = 40$	11.154	7.016	4.722	4.059	21.920	12.192	6.935	5.548
$n = 45$	10.787	6.858	4.703	4.082	22.467	12.992	7.471	6.081
	$w = 0.60$				$w = 0.90$
$n = 20$	29.707	14.936	7.096	4.910	47.859	26.721	13.789	9.704
$n = 25$	35.768	18.966	9.469	7.118	62.810	38.023	20.147	15.296
$n = 40$	48.366	31.513	18.780	15.030	85.773	69.599	47.644	39.481
$n = 45$	50.821	35.381	22.367	18.217	89.108	76.685	57.000	48.451

Table 3. Power ($w = 0.30$) calculations (%) of the $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ test statistic for sample sizes $n = 20, 25, 40, 45$.

	$α_{2}$
$α_{1}$	$10^{- 7}$	0.01	0.05	0.10	0.50	1.50	$10^{- 7}$	0.01	0.05	0.10	0.50	1.50
	$n = 20$						$n = 25$
$10^{- 7}$	9.073	9.072	9.071	9.076	9.993	15.062	9.846	9.846	9.868	9.895	10.924	15.729
0.01	8.990	8.989	8.988	9.006	9.948	14.724	9.630	9.630	9.651	9.727	10.712	15.343
0.05	8.350	8.278	8.340	8.357	9.231	13.819	9.033	9.008	8.990	9.022	9.876	14.332
0.10	7.694	7.696	7.626	7.616	8.273	12.656	8.225	8.216	8.194	8.188	8.890	13.111
0.50	3.751	3.717	3.607	3.418	2.889	4.199	3.797	3.761	3.656	3.581	3.252	4.620
1.50	0.793	0.764	0.676	0.630	0.415	0.163	0.820	0.810	0.756	0.718	0.479	0.158
	$n = 40$						$n = 45$
$10^{- 7}$	12.192	12.193	12.207	12.231	13.142	17.775	12.992	12.992	13.003	13.052	14.014	18.490
0.01	11.935	11.934	11.942	11.979	12.853	17.387	12.724	12.724	12.730	12.764	13.721	18.148
0.05	11.075	11.075	11.069	11.074	11.844	16.046	11.799	11.786	11.760	11.768	12.628	16.815
0.10	10.072	10.060	10.039	10.022	10.565	14.549	10.747	10.729	10.688	10.669	11.218	15.183
0.50	4.863	4.842	4.743	4.648	4.342	5.815	5.214	5.179	5.078	4.977	4.648	6.116
1.50	0.979	0.970	0.928	0.890	0.662	0.379	1.032	1.019	0.978	0.928	0.693	0.412

Table 4. Power ($w = 0.60$) calculations (%) of the $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ test statistic for sample sizes $n = 20, 25, 40, 45$.

	$α_{2}$
$α_{1}$	$10^{- 7}$	0.01	0.05	0.10	0.50	1.50	$10^{- 7}$	0.01	0.05	0.10	0.50	1.50
	$n = 20$						$n = 25$
$10^{- 7}$	14.928	14.937	14.932	14.944	16.186	22.900	18.965	18.964	19.004	19.042	20.607	27.684
0.01	14.807	14.813	14.808	14.833	16.117	22.486	18.565	18.564	18.598	18.702	20.235	27.069
0.05	13.711	13.583	13.726	13.735	14.939	21.143	17.436	17.383	17.360	17.422	18.733	25.365
0.10	12.612	12.619	12.529	12.525	13.217	19.545	15.794	15.767	15.743	15.726	16.869	23.368
0.50	6.088	5.994	5.811	5.416	4.553	6.403	6.879	6.821	6.656	6.473	5.912	8.458
1.50	1.118	1.077	0.944	0.889	0.553	0.215	1.275	1.240	1.152	1.081	0.729	0.260
	$n = 40$						$n = 45$
$10^{- 7}$	31.513	31.518	31.533	31.608	33.469	40.799	35.381	35.381	35.404	35.465	37.411	44.556
0.01	30.904	30.903	30.925	30.999	32.868	40.221	34.848	34.845	34.863	34.941	36.744	43.942
0.05	28.949	28.946	28.938	28.956	30.509	37.756	32.727	32.716	32.697	32.715	34.310	41.510
0.10	26.504	26.485	26.434	26.398	27.631	34.747	30.146	30.110	30.051	30.014	31.289	38.456
0.50	11.949	11.867	11.598	11.409	10.830	14.703	14.052	13.966	13.632	13.321	12.731	16.901
1.50	1.797	1.761	1.692	1.578	1.142	0.716	1.973	1.945	1.870	1.776	1.295	0.838

Table 5. Power ($w = 0.90$) calculations (%) of the $T_{Φ_{1}}^{α_{1}} (\hat{θ}_{(Φ_{2}, α_{2})}^{r})$ test statistic for sample sizes $n = 20, 25, 40, 45$.

	$α_{2}$
$α_{1}$	$10^{- 7}$	0.01	0.05	0.10	0.50	1.50	$10^{- 7}$	0.01	0.05	0.10	0.50	1.50
	$n = 20$						$n = 25$
$10^{- 7}$	26.712	26.710	26.707	26.711	28.495	37.924	38.017	38.016	38.132	38.191	40.982	50.954
0.01	26.589	26.586	26.585	26.613	28.718	37.421	37.365	37.364	37.456	37.645	40.482	50.206
0.05	25.437	25.267	25.531	25.502	27.170	35.979	35.674	35.559	35.526	35.643	38.260	48.187
0.10	24.287	24.284	24.232	24.172	24.868	33.946	33.014	32.939	32.867	32.854	35.184	45.569
0.50	12.003	11.780	11.424	10.772	8.807	11.665	14.353	14.226	13.870	13.560	12.312	16.886
1.50	1.731	1.662	1.489	1.422	0.904	0.298	2.268	2.226	2.026	1.916	1.387	0.506
	$n = 40$						$n = 45$
$10^{- 7}$	69.599	69.605	69.637	69.755	72.196	79.363	76.685	76.685	76.731	76.805	78.802	84.683
0.01	68.923	68.923	68.954	69.049	71.518	79.003	76.177	76.173	76.192	76.264	78.143	84.344
0.05	66.310	66.309	66.306	66.365	68.576	77.069	73.760	73.745	73.732	73.766	75.748	82.751
0.10	62.500	62.455	62.372	62.343	64.660	74.161	70.295	70.264	70.144	70.131	72.172	80.319
0.50	30.094	29.904	29.349	28.848	27.895	36.902	36.612	36.465	35.792	35.073	34.056	43.732
1.50	3.748	3.678	3.472	3.210	2.269	1.562	4.349	4.274	4.017	3.747	2.665	1.927

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Meselidis, C.; Karagrigoriou, A. Contingency Table Analysis and Inference via Double Index Measures. Entropy 2022, 24, 477. https://doi.org/10.3390/e24040477

