Article

Contingency Table Analysis and Inference via Double Index Measures

by
Christos Meselidis
* and
Alex Karagrigoriou
Laboratory of Statistics and Data Analysis, Department of Statistics and Actuarial-Financial Mathematics, University of the Aegean, Karlovasi, GR-83200 Samos, Greece
*
Author to whom correspondence should be addressed.
Entropy 2022, 24(4), 477; https://doi.org/10.3390/e24040477
Submission received: 20 February 2022 / Revised: 17 March 2022 / Accepted: 28 March 2022 / Published: 29 March 2022
(This article belongs to the Special Issue Information and Divergence Measures)

Abstract

In this work, we focus on a general family of measures of divergence for estimation and testing with emphasis on conditional independence in cross tabulations. For this purpose, a restricted minimum divergence estimator is used for the estimation of parameters under constraints and a new double index (dual) divergence test statistic is introduced and thoroughly examined. The associated asymptotic theory is provided and the advantages and practical implications are explored via simulation studies.

1. Introduction

The concept of distance or divergence has been known at least since the time of Pearson, who, in 1900, addressed the classical goodness-of-fit (gof) problem by considering the distance between observed and expected frequencies. The problem, for both discrete and discretized continuous distributions, has been at the center of attention for more than 100 years. The classical set-up is the one considered by Pearson, where a hypothesized m-dimensional multinomial distribution, say $\mathrm{Multi}(N, p_1, \ldots, p_m)$, is examined as being the underlying distributional mechanism for producing a given sample of size N. The problem can be extended to examine the homogeneity (in terms of the distributional mechanisms) between two independent samples or the independence between two population characteristics. In all such problems we are dealing with cross tabulations or crosstabs (or contingency tables). Problems of this nature appear frequently in a great variety of fields including biosciences, socio-economic and political sciences, actuarial science, finance, business, accounting, and marketing. The need to establish, for instance, whether the mechanisms producing two phenomena are the same or not is vital for altering economic policies, preventing socio-economic crises, or applying the same economic or financial decisions to groups with similar underlying mechanisms (e.g., retaining the insurance premium in case of similarity or having different premiums in case of diversity). It is important to note that divergence measures also play a pivotal role in statistical inference in continuous settings. For example, in [1] the authors investigate the multivariate normal case, while in a recent work [2] the modified skew-normal-Cauchy (MSNC) distribution is considered against normality.
Let us consider the general case of two m-dimensional multinomial distributions for which each probability depends on an s-dimensional unknown parameter, say $\theta = (\theta_1, \ldots, \theta_s)^\top$. A general family of measures introduced by [3] is the $d_\Phi^\alpha$ family defined by
$$d_\Phi^\alpha\big(p(\theta), q(\theta)\big) = \sum_{i=1}^{m} q_i(\theta)^{1+\alpha}\, \Phi\!\left(\frac{p_i(\theta)}{q_i(\theta)}\right); \quad \alpha > 0,\ \Phi \in \mathcal{F} \tag{1}$$
where $\alpha$ is a positive indicator (index) value, $p(\theta) = (p_1(\theta), \ldots, p_m(\theta))^\top$ and $q(\theta) = (q_1(\theta), \ldots, q_m(\theta))^\top$, and $\mathcal{F}$ is the class of functions $\mathcal{F} = \{\Phi(\cdot) : \Phi(x) \text{ strictly convex},\ x \in \mathbb{R}^+,\ \Phi(1) = \Phi'(1) = 0,\ \Phi''(1) \neq 0 \text{ and, by convention, } \Phi(0/0) = 0 \text{ and } 0\,\Phi(p/0) = \lim_{x \to \infty} [\Phi(x)/x]\}$.
Note that the well-known Csiszar family of measures [4] is obtained for the special case where the indicator is taken to be equal to 0, while the classical Kullback–Leibler (KL) distance [5] is obtained if the indicator $\alpha$ is equal to 0 and, at the same time, the function $\Phi(\cdot)$ is taken to be $\Phi(x) \equiv \Phi_{KL}(x) = x \log(x)$ or $x \log(x) - x + 1$.
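As a quick numerical illustration of (1), the following minimal Python sketch evaluates the $d_\Phi^\alpha$ measure for two probability vectors and checks that, with $\alpha = 0$ and $\Phi = \Phi_{KL}$, it reduces to the Kullback–Leibler distance. The function names and the two probability vectors are illustrative assumptions, not taken from the paper, and all cell probabilities are assumed to be strictly positive.

```python
import math

def phi_kl(x):
    """Phi_KL(x) = x*log(x) - x + 1 (value 1 at x = 0 by continuity)."""
    return x * math.log(x) - x + 1.0 if x > 0 else 1.0

def d_phi_alpha(p, q, phi, alpha):
    """sum_i q_i^(1+alpha) * Phi(p_i / q_i); assumes every q_i > 0."""
    return sum(qi ** (1.0 + alpha) * phi(pi / qi) for pi, qi in zip(p, q))

def kullback_leibler(p, q):
    """Classical KL distance sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical probability vectors, used only for illustration.
p = [0.2, 0.5, 0.3]
q = [0.25, 0.25, 0.5]

# With alpha = 0 the measure agrees with KL up to rounding error.
print(d_phi_alpha(p, q, phi_kl, 0.0), kullback_leibler(p, q))
# Another member of the family, here with alpha = 0.5.
print(d_phi_alpha(p, q, phi_kl, 0.5))
```

The agreement at $\alpha = 0$ relies on both vectors summing to one, so that the terms $-x + 1$ of $\Phi_{KL}$ cancel in the sum.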
The function
$$\Phi_\lambda(x) = \frac{1}{\lambda(\lambda+1)}\left[x\left(x^{\lambda} - 1\right) - \lambda(x - 1)\right] \in \mathcal{F}, \quad \lambda \neq 0, -1$$
is associated with the Freeman–Tukey test when $\lambda = -1/2$, with the recommended Cressie and Read (CR) power divergence [6] when $\lambda = 2/3$, with Pearson's chi-squared divergence [7] when $\lambda = 1$, and with the classical KL distance when $\lambda \to 0$.
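The special cases listed above can be checked numerically. The sketch below is an illustration under stated assumptions (hypothetical counts and null probabilities, strictly positive counts): it evaluates $2N \sum_i q_i \Phi_\lambda(\hat{p}_i / q_i)$, i.e., the $\alpha = 0$ member of (1) scaled by $2N$ (note that $\Phi_\lambda''(1) = 1$), and compares it with the textbook Pearson and Freeman–Tukey statistics.

```python
import math

def phi_lambda(x, lam):
    """Cressie-Read function Phi_lambda(x); lam must differ from 0 and -1, x > 0."""
    return (x * (x ** lam - 1.0) - lam * (x - 1.0)) / (lam * (lam + 1.0))

def power_divergence(counts, probs, lam):
    """2N * sum_i q_i * Phi_lambda(p_hat_i / q_i)."""
    n = sum(counts)
    return 2.0 * n * sum(q * phi_lambda((c / n) / q, lam) for c, q in zip(counts, probs))

def pearson_chi2(counts, probs):
    """Sum of (observed - expected)^2 / expected."""
    n = sum(counts)
    return sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, probs))

def freeman_tukey(counts, probs):
    """4 * sum of (sqrt(observed) - sqrt(expected))^2."""
    n = sum(counts)
    return 4.0 * sum((math.sqrt(c) - math.sqrt(n * q)) ** 2 for c, q in zip(counts, probs))

# Hypothetical observed counts and hypothesized probabilities.
counts = [18, 55, 27]
probs = [0.25, 0.50, 0.25]

print(power_divergence(counts, probs, 1.0), pearson_chi2(counts, probs))
print(power_divergence(counts, probs, -0.5), freeman_tukey(counts, probs))
print(power_divergence(counts, probs, 2.0 / 3.0))  # Cressie-Read recommendation
```

For $\lambda = 1$ the function simplifies to $(x-1)^2/2$ and for $\lambda = -1/2$ to $2(\sqrt{x}-1)^2$, which is why the two pairs of printed values coincide.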
Finally, the function
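$$\Phi_\alpha(x) \equiv (\lambda+1)\,\Phi_\lambda(x)\big|_{\lambda=\alpha} = \frac{1}{\alpha}\left[x\left(x^{\alpha}-1\right) - \alpha(x-1)\right], \quad \alpha \neq 0$$
produces the BHHJ or $\Phi_\alpha$-power divergence [8] given by
$$d_{\Phi_\alpha}^{\alpha}\big(p(\theta), q(\theta)\big) = \sum_{i=1}^{m} q_i^{\alpha}(\theta)\big(q_i(\theta) - p_i(\theta)\big) + \frac{1}{\alpha}\sum_{i=1}^{m} p_i(\theta)\big(p_i^{\alpha}(\theta) - q_i^{\alpha}(\theta)\big).$$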
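The closed form above follows by substituting $\Phi_\alpha$ into the general definition of the $d_\Phi^\alpha$ family. The following sketch, with hypothetical probability vectors and function names chosen for illustration only, confirms numerically that the two expressions agree.

```python
def phi_alpha(x, alpha):
    """Phi_alpha(x) = [x (x^alpha - 1) - alpha (x - 1)] / alpha, for alpha != 0."""
    return (x * (x ** alpha - 1.0) - alpha * (x - 1.0)) / alpha

def bhhj_general(p, q, alpha):
    """General form: sum_i q_i^(1+alpha) * Phi_alpha(p_i / q_i)."""
    return sum(qi ** (1.0 + alpha) * phi_alpha(pi / qi, alpha) for pi, qi in zip(p, q))

def bhhj_closed(p, q, alpha):
    """Closed form: sum q^alpha (q - p) + (1/alpha) sum p (p^alpha - q^alpha)."""
    first = sum(qi ** alpha * (qi - pi) for pi, qi in zip(p, q))
    second = sum(pi * (pi ** alpha - qi ** alpha) for pi, qi in zip(p, q)) / alpha
    return first + second

# Hypothetical probability vectors, used only for illustration.
p = [0.2, 0.5, 0.3]
q = [0.25, 0.25, 0.5]

for alpha in (0.05, 0.10, 0.50):
    print(alpha, bhhj_general(p, q, alpha), bhhj_closed(p, q, alpha))
```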
Assume that the underlying true distribution of an m-dimensional multinomial random variable with N experiments is
$$X = (X_1, \ldots, X_m)^\top \sim \mathrm{Multi}\big(N,\ p = (p_1, \ldots, p_m)^\top\big)$$
where p is, in general, unknown, belonging to the parametric family
$$\mathcal{P} = \left\{ p(\theta) = (p_1(\theta), \ldots, p_m(\theta))^\top : \theta = (\theta_1, \ldots, \theta_s)^\top \in \Theta \subseteq \mathbb{R}^s \right\}.$$
The sample estimate $\hat{p} = (\hat{p}_1, \ldots, \hat{p}_m)^\top$ of p is easily obtained by $\hat{p}_i = x_i / N$, where $x_i$ is the observed frequency for the i-th category (or class).
Divergence measures can be used for estimation purposes by minimizing the associated measure. The classical estimating technique is the one where, in (1), we take $\alpha = 0$ and $\Phi(x) = \Phi_{KL}(x)$. Then, the resulting KL minimization is equivalent to the classical maximization of the likelihood, producing the well-known Maximum Likelihood Estimator (MLE, see ([9], Section 5.2)). In general, the minimization of the divergence measure with respect to the parameter of interest gives rise to the corresponding minimum divergence estimator (see, e.g., [6,10,11]). For the case where constraints are involved, the setting associated with Csiszar's family of measures was recently investigated in [12]. For further references, please refer to [13,14,15,16,17,18,19,20,21].
Consider the hypothesis
$$H_0 : p = p(\theta_0) \quad \text{vs.} \quad H_1 : p \neq p(\theta_0), \qquad \theta_0 = (\theta_{01}, \ldots, \theta_{0s})^\top \in \Theta \subseteq \mathbb{R}^s \tag{3}$$
where p is the vector of the true but unknown probabilities of the underlying distribution and $p(\theta_0)$ is the vector of the corresponding probabilities of the hypothesized distribution, which is unknown and falls within the family $\mathcal{P}$, with the unknown parameters satisfying, in general, certain constraints, e.g., of the form $c(\theta) = 0$, under which the estimation of the parameter will be performed. The purpose of this work is twofold: taking the divergence measure given in (1) as a reference, we first propose a general double index divergence class of measures and make inference regarding the parameter estimators involved. Then, we proceed with the hypothesis testing problem, with emphasis on the concept of conditional independence. The innovative idea proposed in this work is the duality in choosing among the members of the general class of divergences: one for estimation and one for testing purposes, which need not be the same. In that sense, we propose a double index divergence test statistic offering the greatest possible range of options, both for the strictly convex function $\Phi$ and the indicator value $\alpha > 0$.
Thus, the estimation problem can be examined considering expression (1) using a function $\Phi_2 \in \mathcal{F}$ and an indicator $\alpha_2 > 0$:
$$d_{\Phi_2}^{\alpha_2}\big(p, p(\theta)\big) = \sum_{i=1}^{m} p_i(\theta)^{1+\alpha_2}\, \Phi_2\!\left(\frac{p_i}{p_i(\theta)}\right) \tag{4}$$
the minimization of which, with respect to the unknown parameter, will produce the restricted minimum $(\Phi_2, \alpha_2)$ divergence (rMD) estimator
$$\hat{\theta}^{r}_{(\Phi_2,\alpha_2)} = \arg \inf_{\theta \in \Theta :\, c(\theta) = 0} d_{\Phi_2}^{\alpha_2}\big(\hat{p}, p(\theta)\big)$$
for some constraints $c(\theta) = 0$. Observe that the unknown vector of underlying probabilities has been replaced by the vector of the corresponding sample frequencies $\hat{p}$. Then, the testing problem will be based on
$$d_{\Phi_1}^{\alpha_1}\big(\hat{p}, p(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)})\big) = \sum_{i=1}^{m} p_i\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big)^{1+\alpha_1}\, \Phi_1\!\left(\frac{\hat{p}_i}{p_i(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)})}\right)$$
where $\Phi_1(\cdot)$ and $\alpha_1$ may be different from the corresponding quantities used for the estimation problem in (4). Finally, the duality of the proposed methodology surfaces when the testing problem is explored via the dual divergence test statistic formulated on the basis of the double-$\alpha$-double-$\Phi$ divergence given by
$$d_{\Phi_1}^{\alpha_1}\big(\hat{p}, p(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)})\big)$$
where $\Phi_1, \Phi_2 \in \mathcal{F}$ and $\alpha_1, \alpha_2 > 0$.
The remainder of this work is organized as follows: Section 2 presents the formal definition and the asymptotic properties of the rMD estimator (rMDE). Section 3 deals with the general testing problem with the use of the rMDE. The associated set-up for the case of three-way contingency tables is developed in Section 4, with a simulation section emphasizing the conditional independence of three random variables. We close this work with some conclusions.

2. Restricted Minimum $(\Phi, \alpha)$-Power Divergence Estimator

In what follows, we provide the formal definition and the expansion of the rMD estimator and prove its asymptotic normality. The assumptions required for establishing the results of this section for the rMD estimator under constraints are provided below:
Assumption 1.
(A0) $f_1(\theta), \ldots, f_\nu(\theta)$ are the constraint functions on the s-dimensional parameter $\theta$, $f_k(\theta) = 0$, $k = 1, \ldots, \nu$, and $\nu < s < m - 1$;
(A1) There exists a value $\theta_0 \in \Theta$ such that $X = (X_1, \ldots, X_m)^\top \sim \mathrm{Multi}(N, p(\theta_0))$;
(A2) Each constraint function $f_k(\theta)$ has continuous second partial derivatives;
(A3) The $\nu \times s$ and $m \times s$ matrices
$$Q(\theta_0) = \left(\frac{\partial f_k(\theta_0)}{\partial \theta_j}\right)_{\substack{k = 1, \ldots, \nu \\ j = 1, \ldots, s}} \quad \text{and} \quad J(\theta_0) = \left(\frac{\partial p_i(\theta_0)}{\partial \theta_j}\right)_{\substack{i = 1, \ldots, m \\ j = 1, \ldots, s}}$$
are of full rank;
(A4) $p(\theta)$ has continuous second partial derivatives in a neighbourhood of $\theta_0$;
(A5) $\theta_0$ satisfies the Birch regularity conditions (see Appendix A and [22]).
Definition 1.
Under Assumptions (A0)–(A3), the rMD estimator of $\theta_0$ is any vector in $\Theta$ such that
$$\hat{\theta}^{r}_{(\Phi,\alpha)} = \arg \inf_{\{\theta \in \Theta \subseteq \mathbb{R}^s :\ f_k(\theta) = 0,\ k = 1, \ldots, \nu\}} d_\Phi^\alpha\big(\hat{p}, p(\theta)\big).$$
In order to derive the decomposition of $\hat{\theta}^{r}_{(\Phi,\alpha)}$, the Implicit Function Theorem (IFT) is exploited, according to which, if a function has an invertible derivative at a point, then the function itself is invertible in a neighbourhood of this point, although the inverse cannot, in general, be expressed in closed form [23].
Theorem 1.
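Under Assumptions (A0)–(A5), the rMD estimator of $\theta_0$ is such that
$$\hat{\theta}^{r}_{(\Phi,\alpha)} = \theta_0 + H(\theta_0)\left(B(\theta_0)^\top B(\theta_0)\right)^{-1} B(\theta_0)^\top\, \mathrm{diag}\!\left(p(\theta_0)^{\alpha/2}\right) \mathrm{diag}\!\left(p(\theta_0)^{-1/2}\right)\big(\hat{p} - p(\theta_0)\big) + o\!\left(\lVert \hat{p} - p(\theta_0) \rVert\right)$$
where $\hat{\theta}^{r}_{(\Phi,\alpha)}$ is unique in a neighbourhood of $\theta_0$ and
$$H(\theta_0) = I - \left(B(\theta_0)^\top B(\theta_0)\right)^{-1} Q(\theta_0)^\top \left[Q(\theta_0)\left(B(\theta_0)^\top B(\theta_0)\right)^{-1} Q(\theta_0)^\top\right]^{-1} Q(\theta_0),$$
$B(\theta_0) = \mathrm{diag}\!\left(p(\theta_0)^{\alpha/2}\right) A(\theta_0)$, while $A(\theta_0) = \mathrm{diag}\!\left(p(\theta_0)^{-1/2}\right) J(\theta_0)$.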
Proof. 
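Let V be a neighbourhood of $\theta_0$ on which $p(\cdot) : \Theta \to \mathcal{P} \subset l^m$ has continuous second partial derivatives, where $l^m$ is the interior of the unit cube of dimension m. Let
$$F = (F_1, \ldots, F_{\nu+s}) : l^m \times \mathbb{R}^{\nu+s} \to \mathbb{R}^{\nu+s}$$
with
$$F_j(p, \lambda, \theta) = \begin{cases} f_j(\theta), & j = 1, \ldots, \nu \\[4pt] \dfrac{\partial d_\Phi^\alpha\big(p, p(\theta)\big)}{\partial \theta_{j-\nu}} + \displaystyle\sum_{k=1}^{\nu} \lambda_k \dfrac{\partial f_k(\theta)}{\partial \theta_{j-\nu}}, & j = \nu+1, \ldots, \nu+s \end{cases}$$
where $(p, \lambda, \theta) = (p_1, \ldots, p_m, \lambda_1, \ldots, \lambda_\nu, \theta_1, \ldots, \theta_s)$ and $\lambda_k$, $k = 1, \ldots, \nu$, are the coefficients of the constraints.
It holds that
$$F_j\big(p_1(\theta_0), \ldots, p_m(\theta_0), 0, \ldots, 0, \theta_{01}, \ldots, \theta_{0s}\big) = 0, \quad j = 1, \ldots, \nu + s$$
and by denoting $\gamma = (\gamma_1, \ldots, \gamma_{\nu+s}) = (\lambda_1, \ldots, \lambda_\nu, \theta_1, \ldots, \theta_s)$, the matrix
$$\frac{\partial F}{\partial \gamma} = \left(\frac{\partial F_j}{\partial \gamma_k}\right)_{\substack{j = 1, \ldots, \nu+s \\ k = 1, \ldots, \nu+s}} = \begin{pmatrix} 0_{\nu \times \nu} & Q(\theta_0) \\ Q(\theta_0)^\top & \Phi''(1)\, B(\theta_0)^\top B(\theta_0) \end{pmatrix}$$
is nonsingular at $(p, \lambda, \theta) = (p(\theta_0), \gamma_0) = (p_1(\theta_0), \ldots, p_m(\theta_0), 0, \ldots, 0, \theta_{01}, \ldots, \theta_{0s})$ with $\gamma_0 = (0_\nu, \theta_0)$.
Using the IFT, there exist a neighbourhood U of $(p(\theta_0), \gamma_0)$ on which $\partial F / \partial \gamma$ is nonsingular and a unique differentiable function $\gamma = (\lambda, \theta) : A \subset l^m \to \mathbb{R}^{\nu+s}$, such that $p(\theta_0) \in A$, $\{(p, \gamma) \in U : F(p, \gamma) = 0\} = \{(p, \gamma(p)) : p \in A\}$ and $\gamma(p(\theta_0)) = (\lambda(p(\theta_0)), \theta(p(\theta_0))) = \gamma_0$. By the chain rule and for $p = p(\theta_0)$ we obtain
$$\frac{\partial F}{\partial p(\theta_0)} + \frac{\partial F}{\partial \gamma_0}\, \frac{\partial \gamma_0}{\partial p(\theta_0)} = 0.$$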
Then
$$\frac{\partial \gamma_0}{\partial p(\theta_0)} = \begin{pmatrix} E(\theta_0) \\ W(\theta_0) \end{pmatrix}$$
where
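$$E(\theta_0) = \Phi''(1)\left[Q(\theta_0)\left(B(\theta_0)^\top B(\theta_0)\right)^{-1} Q(\theta_0)^\top\right]^{-1} Q(\theta_0)\left(B(\theta_0)^\top B(\theta_0)\right)^{-1} B(\theta_0)^\top\, \mathrm{diag}\!\left(p(\theta_0)^{\alpha/2}\right) \mathrm{diag}\!\left(p(\theta_0)^{-1/2}\right)$$
and
$$W(\theta_0) = H(\theta_0)\left(B(\theta_0)^\top B(\theta_0)\right)^{-1} B(\theta_0)^\top\, \mathrm{diag}\!\left(p(\theta_0)^{\alpha/2}\right) \mathrm{diag}\!\left(p(\theta_0)^{-1/2}\right)$$
since
$$\frac{\partial F}{\partial p(\theta_0)} = \begin{pmatrix} 0_{\nu \times m} \\ -\Phi''(1)\, B(\theta_0)^\top\, \mathrm{diag}\!\left(p(\theta_0)^{\alpha/2}\right) \mathrm{diag}\!\left(p(\theta_0)^{-1/2}\right) \end{pmatrix}.$$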
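Expanding $\theta(p)$ around $p(\theta_0)$ and using (10) gives, for $\theta(p(\theta_0)) = \theta_0$,
$$\theta(p) = \theta_0 + H(\theta_0)\left(B(\theta_0)^\top B(\theta_0)\right)^{-1} B(\theta_0)^\top\, \mathrm{diag}\!\left(p(\theta_0)^{\alpha/2}\right) \mathrm{diag}\!\left(p(\theta_0)^{-1/2}\right)\big(\hat{p} - p(\theta_0)\big) + o\!\left(\lVert \hat{p} - p(\theta_0) \rVert\right).$$
Since $\hat{p} \xrightarrow{P} p(\theta_0)$, eventually $\hat{p} \in A$ and then $\gamma(\hat{p}) = (\lambda(\hat{p}), \theta(\hat{p}))$ is the unique solution of the system
$$\begin{cases} f_k(\theta) = 0, & k = 1, \ldots, \nu \\[4pt] \dfrac{\partial d_\Phi^\alpha\big(p, p(\theta)\big)}{\partial \theta_j} + \displaystyle\sum_{k=1}^{\nu} \lambda_k \dfrac{\partial f_k(\theta)}{\partial \theta_j} = 0, & j = 1, \ldots, s \end{cases}$$
and $(\hat{p}, \gamma(\hat{p})) \in U$. Hence, $\theta(\hat{p})$ coincides with the rMDE $\hat{\theta}^{r}_{(\Phi,\alpha)}$ given in (9). □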
The theorem below establishes the asymptotic normality of the rMDE, which is a straightforward extension of Theorem 2.4 of [11], since by the Central Limit Theorem we know that
$$\sqrt{N}\big(\hat{p} - p(\theta_0)\big) \xrightarrow[N \to \infty]{L} N\big(0, \Sigma_{p(\theta_0)}\big)$$
with the asymptotic variance-covariance matrix $\Sigma_{p(\theta_0)}$ given by $\mathrm{diag}(p(\theta_0)) - p(\theta_0)\, p(\theta_0)^\top$.
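To make the structure of this covariance matrix concrete, the short sketch below builds $\mathrm{diag}(p) - p\,p^\top$ for a hypothetical probability vector (an assumption made for illustration) and verifies that every row sums to zero. The matrix is therefore singular, which is the source of the "$-1$" in the degrees of freedom $m - 1 - s + \nu$ appearing in the results of Section 3.

```python
def multinomial_covariance(p):
    """Return diag(p) - p p^T as a nested list."""
    m = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(m)]
            for i in range(m)]

# Hypothetical probability vector, used only for illustration.
p0 = [0.1, 0.2, 0.3, 0.4]
sigma = multinomial_covariance(p0)

# Each row sums to p_i - p_i * sum_j p_j = 0, so sigma has rank at most m - 1.
row_sums = [sum(row) for row in sigma]
print(row_sums)
```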
Theorem 2.
Under Assumptions (A0)–(A5), by (11) and for $W(\theta_0)$ given in (10), the asymptotic distribution of the rMDE is the s-dimensional Normal distribution given by
$$\sqrt{N}\big(\hat{\theta}^{r}_{(\Phi,\alpha)} - \theta_0\big) \xrightarrow[N \to \infty]{L} N_s\big(0,\ W(\theta_0)\, \Sigma_{p(\theta_0)}\, W(\theta_0)^\top\big).$$
Remark 1.
The proposed class of estimators forms a family of estimators that goes beyond the indicator $\alpha$, since it is easy to see that the estimators obtained for Csiszar's $\varphi$ family are given for $\alpha = 0$ in (1) and also for the standard equiprobable model.

3. Statistical Inference

In this section, we introduce the double index divergence test statistic
$$T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big) = \frac{2N}{\Phi_1''(1)}\, d_{\Phi_1}^{\alpha_1}\big(\hat{p}, p(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)})\big) \tag{12}$$
with $\Phi_1, \Phi_2 \in \mathcal{F}$ and $\alpha_1, \alpha_2 > 0$, and make the additional assumptions below, by which we focus on Csiszar's family of measures for testing purposes (the notation $\varphi$ is used for clarity) and on the equiprobable model:
Assumption 2.
(A6) $p_i = 1/m$, $\forall i$;
(A7) $\Phi_1 = \varphi$, $\alpha_1 = 0$.
The theorem below provides the asymptotic distribution of (12) under Assumptions (A0)–(A7). Assumption (A7) will later be relaxed and a general asymptotic result will be presented in the next subsection. Assumption (A6) is also discussed in the sequel.
Theorem 3.
Under Assumptions (A0)–(A7) and for the hypothesis in (3) we have
$$T_{\varphi}^{0}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big) = \frac{2N}{\varphi''(1)}\, d_{\varphi}\big(\hat{p}, p(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)})\big) \xrightarrow[N \to \infty]{L} \chi^2_{m-1-s+\nu}$$
with $\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}$ given in (9).
Proof. 
It is straightforward that
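$$p\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big) = p(\theta_0) + J(\theta_0)\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)} - \theta_0\big) + o\!\left(\lVert \hat{\theta}^{r}_{(\Phi_2,\alpha_2)} - \theta_0 \rVert\right)$$
which by Theorem 2, expression (11), and for $M(\theta_0) = J(\theta_0) W(\theta_0)$ reduces to
$$p\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big) - p(\theta_0) = M(\theta_0)\big(\hat{p} - p(\theta_0)\big) + o_p\big(N^{-1/2}\big)$$
which implies that
$$\sqrt{N}\big(p(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}) - p(\theta_0)\big) \xrightarrow[N \to \infty]{L} N\big(0,\ M(\theta_0)\, \Sigma_{p(\theta_0)}\, M(\theta_0)^\top\big).$$
Combining the above we obtain
$$\sqrt{N}\begin{pmatrix} \hat{p} - p(\theta_0) \\ p(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}) - p(\theta_0) \end{pmatrix} \xrightarrow[N \to \infty]{L} N\!\left(0,\ \begin{pmatrix} I \\ M(\theta_0) \end{pmatrix} \Sigma_{p(\theta_0)} \big(I,\ M(\theta_0)^\top\big)\right)$$
and
$$\sqrt{N}\big(\hat{p} - p(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)})\big) \xrightarrow[N \to \infty]{L} N\big(0, L(\theta_0)\big)$$
where
$$L(\theta_0) = \Sigma_{p(\theta_0)} - M(\theta_0)\, \Sigma_{p(\theta_0)} - \Sigma_{p(\theta_0)}\, M(\theta_0)^\top + M(\theta_0)\, \Sigma_{p(\theta_0)}\, M(\theta_0)^\top.$$
The expansion of $d_\varphi(p, q)$ around $(p(\theta_0), p(\theta_0))$ yields
$$T_{\varphi}^{0}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big) = \sum_{i=1}^{m} \frac{N}{p_i(\theta_0)}\Big(\hat{p}_i - p_i\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big)\Big)^2 + o_p(1) = X^\top X + o_p(1)$$
where
$$X = \sqrt{N}\, \mathrm{diag}\!\left(p(\theta_0)^{-1/2}\right)\big(\hat{p} - p(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)})\big) \xrightarrow[N \to \infty]{L} N\big(0, T(\theta_0)\big).$$
Then, under (A7), $T(\theta_0)$ (see (14)) is a projection matrix of rank $m - 1 - s + \nu$, since the traces of the matrices $A(\theta_0)\left(A(\theta_0)^\top A(\theta_0)\right)^{-1} A(\theta_0)^\top$ and $A(\theta_0)\left(A(\theta_0)^\top A(\theta_0)\right)^{-1} Q(\theta_0)^\top \left[Q(\theta_0)\left(A(\theta_0)^\top A(\theta_0)\right)^{-1} Q(\theta_0)^\top\right]^{-1} Q(\theta_0)\left(A(\theta_0)^\top A(\theta_0)\right)^{-1} A(\theta_0)^\top$ are equal to s and $\nu$, respectively.
Then, the result follows from the fact (see ([24], p. 57)) that $X^\top X$ has a chi-squared distribution with degrees of freedom equal to the rank of the variance-covariance matrix of the random vector X, as long as it is a projection matrix. □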
Remark 2.
Relaxation of Assumption (A6): Arguing as in [11], when the true model is not the equiprobable one, the result of Theorem 3 holds true as long as $\alpha_2 = 0$ and approximately true when $\alpha_2 \to 0$.

Asymptotic Theory of the Dual Divergence Test Statistic

Having established the two main results of the work, namely the decomposition of the proposed restricted estimator (Theorem 1) together with its asymptotic properties (Theorem 2), as well as the asymptotic distribution of the associated test statistic under the class of Csiszar $\varphi$-functions (Theorem 3), we continue below by extending in a natural way the results of [11] to the dual divergence test statistic. The extensions presented in this section are considered vital due to their practical implications for the cross tabulations discussed in Section 4. The proofs are omitted since both results (Theorems 4 and 5) follow along the lines of previous results (see Theorems 3.4 and 3.9 of [11]). In what follows we adopt the following notation:
$$b = m^{-\alpha_1}, \quad p_{(1)}^{\alpha_1} = \min_{i \in \{1, \ldots, m\}} p_i(\theta_0)^{\alpha_1}, \quad p_{(m)}^{\alpha_1} = \max_{i \in \{1, \ldots, m\}} p_i(\theta_0)^{\alpha_1}, \quad k = m - 1 - s + \nu.$$
Theorem 4.
Under Assumptions (A0)–(A7) we have
$$T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big) \xrightarrow[N \to \infty]{L} b\, \chi^2_k.$$
Remark 3.
Consider the case where Assumption (A6) is relaxed. Then, the asymptotic distribution of the test statistic $T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big)$ is estimated to be approximately $b\, \chi^2_k$ where
$$b = \frac{p_{(1)}^{\alpha_1} + p_{(m)}^{\alpha_1}}{2}$$
as long as $\alpha_2 = 0$ or $\alpha_2 \to 0$. For further elaboration of this remark we refer to [11].
Remark 4.
Observe that if $\alpha_1 \to 0$ then $b \to 1$ and the asymptotic distribution becomes $\chi^2_k$, while for $\alpha_1$ away from 0 the distribution is proportional to $\chi^2_k$ with proportionality index $b \neq 1$. However, for non-equiprobable models these statements hold true as long as $\alpha_2$ is close to zero.
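The role of the proportionality index can be illustrated with a small computation. The sketch below is an illustration under stated assumptions: the probability vector is hypothetical, the function names are ours, and the degrees of freedom are fixed at $k = 2$ (the value arising for the $2 \times 2 \times 2$ table of Section 4) because the $\chi^2_2$ survivor function is $\exp(-x/2)$ and its critical value is available in closed form. For other values of k a chi-squared quantile routine would be required.

```python
import math

def b_equiprobable(m, alpha1):
    """b = m^(-alpha1), the scaling constant under the equiprobable model."""
    return m ** (-alpha1)

def b_general(p, alpha1):
    """b = (p_(1)^alpha1 + p_(m)^alpha1) / 2, from the smallest and largest cell probability."""
    return (min(p) ** alpha1 + max(p) ** alpha1) / 2.0

def chi2_critical_2df(a):
    """Upper-a critical value of chi-squared with 2 df (survivor function exp(-x/2))."""
    return -2.0 * math.log(a)

# Hypothetical non-equiprobable model, used only for illustration.
p0 = [0.05, 0.10, 0.15, 0.20, 0.50]

for alpha1 in (1e-7, 0.10, 0.50):
    b = b_general(p0, alpha1)
    # Asymptotic critical value of the test at level 0.05: b times the chi-squared one.
    print(alpha1, b, b * chi2_critical_2df(0.05))
```

As $\alpha_1$ approaches zero the index b approaches one and the usual chi-squared critical value is recovered, in line with Remark 4. For an equiprobable vector the two expressions for b coincide.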
Consider now the hypothesis with contiguous alternatives [25,26]
$$H_0 : p = p(\theta_0) \quad \text{vs.} \quad H_{1,N} : p = p(\theta_0) + \frac{d}{\sqrt{N}} \tag{16}$$
where d is an m-dimensional vector of known real values with components $d_i$ satisfying the assumption $\sum_{i=1}^{m} d_i = 0$.
Observe that, as N tends to infinity, the local contiguous alternative converges to the null hypothesis at the rate $O(N^{-1/2})$. Alternatives such as those in (16) are known as Pitman transition alternatives, Pitman (local) alternatives, or local contiguous alternatives to the null hypothesis $H_0$ [25].
Theorem 5.
Under Assumptions (A0)–(A7) and for the hypothesis (16) we have
$$T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big) \xrightarrow[N \to \infty]{L} b\, \chi^2_k\big(\xi^\top \xi\big)$$
which represents a non-central chi-squared distribution with k degrees of freedom and non-centrality parameter $\xi^\top \xi$, for which $\xi = \mathrm{diag}\!\left(p(\theta_0)^{-1/2}\right)\big(I - J(\theta_0) W(\theta_0)\big)\, d$.
Remark 5.
Observe that under Assumption (A6) ($p_i = 1/m$) the asymptotic distribution is independent of $\Phi$, $\alpha_1$ and $\alpha_2$. As a result, the associated power of the test is $\Pr\big(\chi^2_k(\xi^\top \xi) > \chi^2_{k,a}\big)$, where $\chi^2_{k,a}$ is the $100(1-a)\%$ percentile of the $\chi^2_k$ distribution. If Assumption (A6) is relaxed, then the distribution is approximately non-central chi-squared with proportionality index $b = \big(p_{(1)}^{\alpha_1} + p_{(m)}^{\alpha_1}\big)/2$.

4. Cross Tabulations and Dual Divergence Test Statistic

In this section, we take advantage of the methodology proposed earlier for the analysis of cross tabulations. In particular, we focus on the case of three categorical variables, say X, Y, and Z, with I, J, and K categories, respectively. Then, assume that the probability mass of a realization of a randomly selected subject is denoted by $p_{ijk}(\theta) = \Pr(X = i, Y = j, Z = k) > 0$, where here and in what follows $i = 1, \ldots, I$, $j = 1, \ldots, J$, $k = 1, \ldots, K$ unless otherwise stated. The associated probability vector is given as $p(\theta) = \{p_{ijk}(\theta)\}$ where
$$p_{ijk}(\theta) = \begin{cases} \theta_{ijk}, & (i, j, k) \neq (I, J, K) \\[4pt] 1 - \displaystyle\sum_{\substack{i=1,\ldots,I;\ j=1,\ldots,J;\ k=1,\ldots,K \\ (i,j,k) \neq (I,J,K)}} \theta_{ijk}, & (i, j, k) = (I, J, K) \end{cases}$$
and the parameter space as $\Theta = \{\theta_{ijk},\ (i, j, k) \neq (I, J, K)\}$. The sample estimator of $p_{ijk}(\theta)$ is $\hat{p}_{ijk} = n_{ijk} / N$, where $n_{ijk}$ is the frequency of the corresponding $(i, j, k)$ cell.
In this set-up, the dual divergence test statistic is given as
$$T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big) = \frac{2N}{\Phi_1''(1)} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} p_{ijk}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big)^{1+\alpha_1}\, \Phi_1\!\left(\frac{\hat{p}_{ijk}}{p_{ijk}(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)})}\right) \tag{17}$$
where $\hat{p}_{ijk}$ is as above and the rMD estimator is given as
$$\hat{\theta}^{r}_{(\Phi_2,\alpha_2)} = \arg \inf_{\{\theta \in \Theta \subseteq \mathbb{R}^s :\ f_k(\theta) = 0,\ k = 1, \ldots, \nu\}} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} p_{ijk}(\theta)^{1+\alpha_2}\, \Phi_2\!\left(\frac{\hat{p}_{ijk}}{p_{ijk}(\theta)}\right). \tag{18}$$
For $\alpha_1, \alpha_2 = 0$ and special cases of the functions $\Phi_1$ and $\Phi_2$, classical restricted minimum divergence estimators and associated test statistics can be derived from (18) and (17), respectively. For example, for $\alpha_1, \alpha_2 = 0$ and $\Phi_1, \Phi_2 = \Phi_{KL}$ the likelihood ratio test statistic with the restricted maximum likelihood estimator ($G^2(\hat{\theta}^{r})$) can be derived, while for $\Phi_1, \Phi_2 = \Phi_\lambda$ and $\lambda = 1$ we obtain the chi-squared test statistic with the restricted minimum chi-squared estimator ($X^2(\hat{\theta}^{r}_{X^2})$). For $\Phi_1, \Phi_2 = \Phi_\lambda$ and $\lambda = 2/3$ the dual divergence test statistic reduces to the power divergence test statistic with the restricted minimum power divergence estimator ($CR(\hat{\theta}^{r}_{CR})$), whereas for $\lambda = -1/2$ it reduces to the Freeman–Tukey test statistic with the restricted minimum Freeman–Tukey estimator ($FT(\hat{\theta}^{r}_{FT})$).
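The testing half of (17) is straightforward to evaluate once fitted cell probabilities are available. The sketch below is an illustration under stated assumptions: the counts and the fitted vector are hypothetical (the fitted vector merely stands in for $p(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)})$ and is not an actual rMD fit), the cells are flattened into a list, $\Phi_1 = \Phi_{\alpha_1}$ is used so that $\Phi_1''(1) = 1 + \alpha_1$, and all counts are strictly positive. It also checks that the statistic approaches the likelihood ratio statistic $G^2$ as $\alpha_1$ tends to zero.

```python
import math

def phi_alpha(x, alpha):
    """Phi_alpha(x) = [x (x^alpha - 1) - alpha (x - 1)] / alpha, for alpha != 0."""
    return (x * (x ** alpha - 1.0) - alpha * (x - 1.0)) / alpha

def dual_statistic(counts, fitted, alpha1):
    """2N / Phi''(1) * sum p^(1+alpha1) * Phi_alpha1(p_hat / p), with Phi''_alpha(1) = 1 + alpha."""
    n = sum(counts)
    total = sum(p ** (1.0 + alpha1) * phi_alpha((c / n) / p, alpha1)
                for c, p in zip(counts, fitted))
    return 2.0 * n * total / (1.0 + alpha1)

def g2(counts, fitted):
    """Likelihood ratio statistic 2 * sum n_i * log(n_i / (N p_i))."""
    n = sum(counts)
    return 2.0 * sum(c * math.log(c / (n * p)) for c, p in zip(counts, fitted) if c > 0)

# Hypothetical cell counts (N = 100) and hypothetical fitted probabilities.
counts = [12, 9, 15, 7, 11, 14, 8, 24]
fitted = [0.11, 0.10, 0.14, 0.08, 0.12, 0.13, 0.09, 0.23]

print(dual_statistic(counts, fitted, 1e-7), g2(counts, fitted))
print(dual_statistic(counts, fitted, 0.10))
```

In an actual analysis the fitted vector would come from the constrained minimization in (18), which may use a different pair $(\Phi_2, \alpha_2)$ from the one used here for testing.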
The hypothesis of conditional independence of X and Y given Z is given, for any triplet $(i, j, k)$, by
$$H_0 : p_{ijk}(\theta_0) = \frac{p_{i \cdot k}(\theta_0)\, p_{\cdot jk}(\theta_0)}{p_{\cdot \cdot k}(\theta_0)}, \quad \theta_0 \in \Theta \text{ unknown}$$
where
$$p_{i \cdot k}(\theta_0) = \sum_{j=1}^{J} p_{ijk}(\theta_0), \quad p_{\cdot jk}(\theta_0) = \sum_{i=1}^{I} p_{ijk}(\theta_0) \quad \text{and} \quad p_{\cdot \cdot k}(\theta_0) = \sum_{i=1}^{I} \sum_{j=1}^{J} p_{ijk}(\theta_0).$$
Under the $(I-1)(J-1)K$ constraint functions
$$f_{ijk}(\theta) = p_{11k}(\theta)\, p_{ijk}(\theta) - p_{1jk}(\theta)\, p_{i1k}(\theta) = 0,$$
$i = 2, \ldots, I$, $j = 2, \ldots, J$, $k = 1, \ldots, K$, the above $H_0$ hypothesis with $\theta_0$ unknown becomes
$$H_0 : p = p(\theta_0), \quad \text{for } \theta_0 \in \Theta_0,$$
where $\Theta_0 = \{\theta \in \Theta : f_{ijk}(\theta) = 0,\ i = 2, \ldots, I,\ j = 2, \ldots, J,\ k = 1, \ldots, K\}$.
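The link between the product form of $H_0$ and the constraint functions can be checked numerically. The sketch below is an illustration under stated assumptions: the $2 \times 3 \times 2$ table is randomly generated and purely hypothetical, the function names are ours, and tables are stored as nested lists indexed from zero. It builds the table $p_{i \cdot k}\, p_{\cdot jk} / p_{\cdot \cdot k}$ from an arbitrary table and verifies that all $(I-1)(J-1)K$ constraints vanish on it.

```python
import random

def conditional_independence_constraints(p):
    """f_ijk = p_11k * p_ijk - p_1jk * p_i1k for i, j >= 2 (keys use 1-based indices)."""
    I, J, K = len(p), len(p[0]), len(p[0][0])
    return {(i + 1, j + 1, k + 1): p[0][0][k] * p[i][j][k] - p[0][j][k] * p[i][0][k]
            for i in range(1, I) for j in range(1, J) for k in range(K)}

def conditional_independence_table(p):
    """Table with cells p_{i.k} * p_{.jk} / p_{..k}, built from the margins of p."""
    I, J, K = len(p), len(p[0]), len(p[0][0])
    p_ik = [[sum(p[i][j][k] for j in range(J)) for k in range(K)] for i in range(I)]
    p_jk = [[sum(p[i][j][k] for i in range(I)) for k in range(K)] for j in range(J)]
    p_k = [sum(p_ik[i][k] for i in range(I)) for k in range(K)]
    return [[[p_ik[i][k] * p_jk[j][k] / p_k[k] for k in range(K)]
             for j in range(J)] for i in range(I)]

# Hypothetical 2 x 3 x 2 table of positive probabilities, normalized to sum to one.
random.seed(1)
I, J, K = 2, 3, 2
raw = [[[random.random() + 0.1 for _ in range(K)] for _ in range(J)] for _ in range(I)]
total = sum(v for plane in raw for row in plane for v in row)
table = [[[v / total for v in row] for row in plane] for plane in raw]

projected = conditional_independence_table(table)
print(len(conditional_independence_constraints(table)))   # number of constraints
print(max(abs(v) for v in conditional_independence_constraints(projected).values()))
```

When applied to the observed relative frequencies, the same product of margins is the familiar closed-form maximum likelihood fit under conditional independence; for general $(\Phi_2, \alpha_2)$ no such closed form is assumed here and a numerical constrained optimizer would be needed.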
Remark 6.
For practical purposes, the choice of the values of the indices is motivated by the work of [8] where, in an attempt to achieve a compromise between robustness and efficiency of estimators, the authors recommended the use of small values in the $(0, 1)$ region. In the following subsection, our analysis reconfirms their findings since, as will be seen, values of both indices closer to zero than to one are associated with good performance not only in terms of estimation but also in terms of goodness of fit, as reflected in the size and the power of the test.

Simulation Study

In this simulation study, we use the rMD estimator and the associated dual divergence test statistic for the analysis of cross tabulations. Specifically, we compare, in terms of size and power, classical tests with those that can be derived through the proposed methodology, for the problem of conditional independence of three random variables in contingency tables. We test the hypothesis of conditional independence for a $2 \times 2 \times 2$ contingency table; thus, in this case we have $m = 8$ probabilities of the multinomial model, $s = 7$ unknown parameters to estimate and two constraint functions ($\nu = 2$), which are given by
$$f_{221}(\theta) = \theta_{111}\theta_{221} - \theta_{121}\theta_{211} \quad \text{and} \quad f_{222}(\theta) = \theta_{112}\Bigg(1 - \sum_{\substack{i,j,k = 1,2 \\ (i,j,k) \neq (2,2,2)}} \theta_{ijk}\Bigg) - \theta_{122}\theta_{212}.$$
For a better understanding of the behaviour of the dual divergence test statistic given in (17), we compare it with the four classical tests-of-fit mentioned earlier in Section 4, namely $G^2(\hat{\theta}^{r})$, $X^2(\hat{\theta}^{r}_{X^2})$, $CR(\hat{\theta}^{r}_{CR})$ and $FT(\hat{\theta}^{r}_{FT})$. The proposed test $T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big)$ is applied for $\Phi_1 = \Phi_{\alpha_1}$, $\Phi_2 = \Phi_{\alpha_2}$ and six different values of $\alpha_1$ and $\alpha_2$: $\alpha_1, \alpha_2 = 10^{-7}, 0.01, 0.05, 0.10, 0.50$, and $1.50$. Note that the critical values used in this simulation study are the asymptotic critical values based on the asymptotic distribution $b\, \chi^2_2$, with b as in (15), for the double index family of test statistics, and on $\chi^2_2$ for the classical test statistics. For the analysis we used 100,000 simulations and sample sizes equal to $n = 20, 25$ (small sample sizes) and $n = 40, 45$ (moderate sample sizes).
In this study, we have used the model previously considered by [27] given by
$$\begin{array}{ll} p_{111} = \pi_{111} - \pi_{111} w & p_{211} = \pi_{211} + \pi_{222} w - \pi_{111} w \\ p_{112} = \pi_{112} + \pi_{111} w - \pi_{222} w & p_{212} = \pi_{212} + \pi_{111} w - \pi_{222} w \\ p_{121} = \pi_{121} + \pi_{222} w & p_{221} = \pi_{221} + \pi_{222} w - \pi_{111} w \\ p_{122} = \pi_{122} + \pi_{111} w & p_{222} = \pi_{222} - \pi_{222} w \end{array}$$
where $0 \leq w < 1$ and $\pi_{ijk} = p_i \times p_j \times p_k$, $i, j, k = 1, 2$, with
$$\begin{array}{llll} \pi_{111} = 0.036254 & \pi_{112} = 0.164994 & \pi_{121} = 0.092809 & \pi_{122} = 0.133645 \\ \pi_{211} = 0.092809 & \pi_{212} = 0.133645 & \pi_{221} = 0.237591 & \pi_{222} = 0.108253. \end{array}$$
For $w = 0$ we obtain the model under the null hypothesis of conditional independence, while for values $w \neq 0$ we obtain the models under the alternative hypotheses. We considered the values $w = 0.00, 0.30, 0.60$, and $0.90$. Note that the larger the value of w, the more we deviate from the null model. For the simulation study we used the R software [28], while for the constrained optimization we used the auglag function from the nloptr package [29].
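The simulation model can be checked with a few lines of code. The sketch below is an illustration under stated assumptions: it takes the reading of the model in which each cell probability is shifted by $\pm\pi_{111} w$ and $\pm\pi_{222} w$ so that the total remains one, the function names are ours, and the constraints are evaluated directly on the cell probabilities. It confirms that the probabilities sum to one for the four values of w, that the two constraint functions vanish (up to the rounding of the published $\pi$ values) at $w = 0$, and that they do not vanish for $w > 0$.

```python
# Null-model probabilities pi_ijk as listed in the text, keyed by (i, j, k).
PI = {(1, 1, 1): 0.036254, (1, 1, 2): 0.164994, (1, 2, 1): 0.092809, (1, 2, 2): 0.133645,
      (2, 1, 1): 0.092809, (2, 1, 2): 0.133645, (2, 2, 1): 0.237591, (2, 2, 2): 0.108253}

def model(w):
    """Cell probabilities p_ijk(w); the sign pattern is an assumed reading of the model."""
    a = PI[(1, 1, 1)] * w
    b = PI[(2, 2, 2)] * w
    shift = {(1, 1, 1): -a, (2, 1, 1): b - a, (1, 1, 2): a - b, (2, 1, 2): a - b,
             (1, 2, 1): b, (2, 2, 1): b - a, (1, 2, 2): a, (2, 2, 2): -b}
    return {cell: PI[cell] + shift[cell] for cell in PI}

def constraints(p):
    """The two constraint functions f_221 and f_222 evaluated on the cell probabilities."""
    f221 = p[(1, 1, 1)] * p[(2, 2, 1)] - p[(1, 2, 1)] * p[(2, 1, 1)]
    f222 = p[(1, 1, 2)] * p[(2, 2, 2)] - p[(1, 2, 2)] * p[(2, 1, 2)]
    return f221, f222

for w in (0.00, 0.30, 0.60, 0.90):
    p = model(w)
    print(w, sum(p.values()), constraints(p))
```

The size of the constraint violations grows with w, which matches the statement that larger values of w correspond to larger departures from the null model.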
From Table 1, we can observe that in terms of size the performance of $T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big)$ is adequate for values of $\alpha_1, \alpha_2 \leq 0.5$, both for small and moderate sample sizes. In addition, we can see that for $\alpha_1 \leq 0.10$, $T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big)$ appears to be liberal, while for $\alpha_1 \geq 0.5$ it appears to be conservative. We also note that the size becomes smaller as $\alpha_1$ and $\alpha_2$ increase with α 1 α 2 . Table 2 provides the size of the classical tests-of-fit, from which we can observe that $CR(\hat{\theta}^{r}_{CR})$ has the best performance among all competing tests for every sample size. In contrast, $FT(\hat{\theta}^{r}_{FT})$ has the worst performance among all competing tests and appears to be very liberal. Furthermore, $X^2(\hat{\theta}^{r}_{X^2})$ appears to be conservative while $G^2(\hat{\theta}^{r})$ appears to be liberal. Note that for $\alpha_1 \in [0.01, 0.5]$ and $\alpha_2 \leq 0.10$, $T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big)$ behaves better than the $G^2(\hat{\theta}^{r})$ test statistic and its performance is quite close to the performance of $X^2(\hat{\theta}^{r}_{X^2})$.
In order to examine the closeness of the estimated (true) size to the nominal size $\alpha = 0.05$, we consider the criterion given by Dale [30]. The criterion involves the following inequality
$$\big|\, \mathrm{logit}(1 - \hat{\alpha}_n) - \mathrm{logit}(1 - \alpha) \,\big| \leq d \tag{19}$$
where $\mathrm{logit}(p) = \log(p / (1 - p))$ and $\hat{\alpha}_n$ is the estimated (true) size. The estimated (true) size is considered to be close to the nominal size if (19) is satisfied with $d = 0.35$. Note that in this situation the estimated (true) size is close to the nominal one if $\hat{\alpha}_n \in [0.0357, 0.0695]$; such values are presented in bold in Table 1 and Table 2. This criterion has been used previously, among others, by [27,31].
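Dale's criterion is easy to apply in code. The minimal sketch below (function names are illustrative assumptions) solves the inequality for $\hat{\alpha}_n$ and recovers, up to rounding in the last digit, the interval quoted above for $\alpha = 0.05$ and $d = 0.35$.

```python
import math

def logit(p):
    """logit(p) = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))

def satisfies_dale(alpha_hat, alpha=0.05, d=0.35):
    """True if |logit(1 - alpha_hat) - logit(1 - alpha)| <= d."""
    return abs(logit(1.0 - alpha_hat) - logit(1.0 - alpha)) <= d

def dale_interval(alpha=0.05, d=0.35):
    """Range of estimated sizes alpha_hat that satisfy the criterion."""
    centre = logit(1.0 - alpha)
    return 1.0 - expit(centre + d), 1.0 - expit(centre - d)

lower, upper = dale_interval()
print(lower, upper)
print(satisfies_dale(0.05), satisfies_dale(0.08))
```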
Regarding the proposed test, we can see that for small sample sizes the estimated (true) size is close to the nominal for $\alpha_1 \in [0.10, 0.50]$ and $\alpha_2 \leq 0.10$, while for moderate sample sizes this holds for $\alpha_1 \in [10^{-7}, 0.50]$ and $\alpha_2 \leq 0.10$. With reference to the classical tests-of-fit, we can observe that the size of $CR(\hat{\theta}^{r}_{CR})$ is close to the nominal for every sample size, whereas the sizes of $G^2(\hat{\theta}^{r})$ and $X^2(\hat{\theta}^{r}_{X^2})$ are close only for moderate sample sizes. Finally, we note that the estimated (true) size of $FT(\hat{\theta}^{r}_{FT})$ fails to be close to the nominal both for small and moderate sample sizes.
In Table 3, Table 4 and Table 5, we provide the results regarding the power of the proposed family of test statistics for the three alternatives and sample sizes $n = 20, 25, 40, 45$, while Table 2 provides the results regarding the power of the classical tests-of-fit. The performance tends to be better as we deviate from the null model and as the sample size increases, both for the classical and the proposed tests.
As general comments regarding the behaviour of the proposed and the classical tests-of-fit in terms of power, we state that the best results for $T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big)$ are obtained for small values of $\alpha_1$ in the range $(0, 0.1]$ and large values of $\alpha_2$ with $\alpha_1 \leq \alpha_2$. Note that although in terms of power the results become better as $\alpha_2$ increases, in terms of size they are adequate only for $\alpha_2 \leq 0.5$. In addition, we can observe that the performance of $T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big)$ is better than that of $CR(\hat{\theta}^{r}_{CR})$ and $X^2(\hat{\theta}^{r}_{X^2})$ for every alternative and every sample size for $\alpha_1 \leq 0.1$ and α 2 0.5 , and slightly better than that of $G^2(\hat{\theta}^{r})$ for small values of $\alpha_1$ and large values of $\alpha_2$, for example for $\alpha_1 = 0.01$ and $\alpha_2 = 0.50$. Furthermore, we can observe that for $\alpha_1 = 0.1$ and $\alpha_2 \leq 0.1$ the size of the test is better than the size of $G^2(\hat{\theta}^{r})$ and slightly worse than the size of the $CR(\hat{\theta}^{r}_{CR})$ and $X^2(\hat{\theta}^{r}_{X^2})$ test statistics, while its power is considerably better than the power of $CR(\hat{\theta}^{r}_{CR})$ and $X^2(\hat{\theta}^{r}_{X^2})$ and slightly worse than that of $G^2(\hat{\theta}^{r})$. Additionally, we can see that as $\alpha_1$ and $\alpha_2$ tend to 0 the behaviour of the $T_{\Phi_1}^{\alpha_1}\big(\hat{\theta}^{r}_{(\Phi_2,\alpha_2)}\big)$ test statistic coincides with that of the $G^2(\hat{\theta}^{r})$ test both in terms of size and power, as expected.
In order to gain better insight into the behaviour of the test statistics, we apply Dale’s criterion not only for the nominal size $\alpha = 0.05$, but also for a range of nominal sizes that are of interest. Based on the previous analysis, besides the classical tests, we focus on $T_{\Phi_1}^{0.05}\big(\hat{\theta}^{r}_{(\Phi_2, 0.05)}\big)$, $T_{\Phi_1}^{0.10}\big(\hat{\theta}^{r}_{(\Phi_2, 0.10)}\big)$, and $T_{\Phi_1}^{0.20}\big(\hat{\theta}^{r}_{(\Phi_2, 0.20)}\big)$. The following simplified notation is used in every figure: FT $\equiv FT(\hat{\theta}^{r}_{FT})$, ML $\equiv G^2(\hat{\theta}^{r})$, CR $\equiv CR(\hat{\theta}^{r}_{CR})$, Pe $\equiv X^2(\hat{\theta}^{r}_{X^2})$, $T_1 \equiv T_{\Phi_1}^{0.05}\big(\hat{\theta}^{r}_{(\Phi_2, 0.05)}\big)$, $T_2 \equiv T_{\Phi_1}^{0.10}\big(\hat{\theta}^{r}_{(\Phi_2, 0.10)}\big)$, and $T_3 \equiv T_{\Phi_1}^{0.20}\big(\hat{\theta}^{r}_{(\Phi_2, 0.20)}\big)$. From Figure 1a, we can see that for small sample sizes ($n = 25$) $T_{\Phi_1}^{0.20}\big(\hat{\theta}^{r}_{(\Phi_2, 0.20)}\big)$ and $CR(\hat{\theta}^{r}_{CR})$ satisfy Dale’s criterion for every nominal size, while $T_{\Phi_1}^{0.10}\big(\hat{\theta}^{r}_{(\Phi_2, 0.10)}\big)$ and $X^2(\hat{\theta}^{r}_{X^2})$ do so for nominal sizes greater than 0.03 and 0.06, respectively. Note that the dashed line in Figure 1 denotes the situation in which the estimated (true) size equals the nominal size; thus, lines that lie above this reference line refer to liberal tests while those that lie below refer to conservative ones. On the other hand, for moderate sample sizes ($n = 45$) all chosen test statistics satisfy Dale’s criterion except $FT(\hat{\theta}^{r}_{FT})$.
Taking into account the fact that the actual size of each test differs from the targeted nominal size, we have to make an adjustment in order to proceed further with the comparison of the tests in terms of power. We focus on those tests that satisfy Dale’s criterion and follow the method proposed in [32], which involves the so-called receiver operating characteristic (ROC) curves. In particular, let $G(t) = \Pr(T \geq t)$ be the survivor function of a general test statistic T, and $c = \inf\{t : G(t) \leq \alpha\}$ be the critical value; then ROC curves can be formulated by plotting the power $G_1(c)$ against the size $G_0(c)$ for various values of the critical value c. Note that $G_0(t)$ denotes the distribution of the test statistic under the null hypothesis and $G_1(t)$ that under the alternative.
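The construction of such a size-adjusted comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the two short lists of test-statistic values are made up (in practice they would be the simulated values of a statistic under the null and under an alternative), the function names are ours, and the survivor functions are replaced by their empirical counterparts.

```python
def survivor(sample, t):
    """Empirical survivor function: share of simulated statistics that are >= t."""
    return sum(1 for v in sample if v >= t) / len(sample)

def roc_points(null_sample, alt_sample):
    """Pairs (size G_0(c), power G_1(c)) over all observed critical values c."""
    thresholds = sorted(set(null_sample) | set(alt_sample))
    return [(survivor(null_sample, c), survivor(alt_sample, c)) for c in thresholds]

# Made-up statistic values, used only for illustration.
null_stats = [0.5, 1.0, 1.5, 2.0]
alt_stats = [0.8, 1.6, 2.5, 3.0]

for size, power in roc_points(null_stats, alt_stats):
    print(size, power)
```

Because both coordinates are evaluated at the same critical value, two tests can be compared at equal actual size rather than at equal nominal size, which is the purpose of the adjustment described above.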
Since results are similar for every alternative, we restrict ourselves to $w = 0.60$, which corresponds to an alternative that is neither too close to nor too far from the null. For small sample sizes ($n = 25$), results are presented in Figure 2, where we can see that the proposed test is superior to the classical tests-of-fit in terms of power. However, for moderate sample sizes ($n = 45$), we observe in Figure 3 that $G^2(\hat{\theta}^{r})$ has the best performance among all competing tests, followed by the proposed test-of-fit.
From the conducted analysis we conclude that, for the proposed test, there is a trade-off between size and power for different choices of the indices $\alpha_1$ and $\alpha_2$. In particular, as $\alpha_1$ increases the size becomes smaller at the expense of smaller power, while as $\alpha_2$ increases the power improves and the tests become more liberal. In conclusion, for values of $\alpha_1$ and $\alpha_2$ in the range $(0.05, 0.25)$ the resulting test statistic provides a fair balance between size and power, which makes it an attractive alternative to the classical tests-of-fit. For small sample sizes, larger values of the indices are preferable, whereas for moderate sample sizes smaller ones are recommended.

5. Conclusions

In this work, a general divergence family of test statistics is presented for hypothesis testing problems as in (3), under constraints. For estimation purposes, we introduce, discuss and use the rMD (restricted minimum divergence) estimator presented in (8). The proposed double index (dual) divergence test statistic involves two pairs of elements, namely $(\Phi_2, \alpha_2)$, to be used for the estimation problem, and $(\Phi_1, \alpha_1)$, to be used for the testing problem. The duality refers to the fact that the two pairs may or may not be the same, providing the researcher with the greatest possible flexibility.
The asymptotic distribution of the dual divergence test statistic is found to be proportional to the chi-squared distribution irrespective of the nature of the multinomial model, as long as the values of the two indices involved are relatively close to zero (less than $0.5$). Such values are known to provide a satisfactory balance between efficiency and robustness (see, for instance, [8] or [3]).
The methodology developed in this work can be used in the analysis of contingency tables, which is applicable in various scientific fields: biosciences, such as genetics [33] and epidemiology [34]; finance, such as the evaluation of investment effectiveness or business performance [35]; insurance science [36]; or socioeconomics [37]. This work concludes with a comparative simulation study between classical test statistics and members of the proposed family, where the focus is placed on the conditional independence of three random variables. Results indicate that, by wisely selecting the values of the indices $\alpha_1$ and $\alpha_2$, we can derive a test statistic that can be thought of as a powerful and reliable alternative to the classical tests-of-fit, especially for small sample sizes.

Author Contributions

Conceptualization, A.K. and C.M.; data curation, C.M.; methodology, A.K. and C.M.; software, C.M.; formal analysis, A.K. and C.M.; writing—original draft preparation, C.M.; writing—review and editing, A.K. and C.M.; supervision, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

The research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors wish to express their appreciation to the anonymous referees and the Associate Editor for their valuable comments and suggestions. The authors also wish to express their appreciation to Professor A. Batsidis of the University of Ioannina for bringing to their attention citation [31], which greatly helped the comparative analysis performed in this work. This work was completed as part of the first author’s PhD thesis and falls within the research activities of the Laboratory of Statistics and Data Analysis of the University of the Aegean.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

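The Birch regularity conditions mentioned in Assumption (A5) of Section 2 are stated below (for details see [22]):
  • The point $\theta_0$ is an interior point of $\Theta$;
  • $p_i = p_i(\theta_0) > 0$ for $i = 1, \ldots, m$;
  • The mapping $p(\theta) : \Theta \to \mathcal{P}$ is totally differentiable at $\theta_0$, so that the partial derivatives of $p_i(\theta_0)$ with respect to each $\theta_j$ exist at $\theta_0$ and $p(\theta)$ has a linear approximation at $\theta_0$ given by
    $$p_i(\theta) = p_i(\theta_0) + \sum_{j=1}^{s} (\theta_j - \theta_{0j}) \frac{\partial p_i(\theta_0)}{\partial \theta_j} + o(\|\theta - \theta_0\|), \quad i = 1, \ldots, m,$$
    as $\theta \to \theta_0$;
  • The Jacobian matrix
    $$J(\theta_0) = \left( \frac{\partial p(\theta)}{\partial \theta} \right)_{\theta = \theta_0} = \left( \frac{\partial p_i(\theta_0)}{\partial \theta_j} \right)_{\substack{i = 1, \ldots, m \\ j = 1, \ldots, s}}$$
    is of full rank;
  • The mapping inverse to $\theta \to p(\theta)$ exists and is continuous at $\theta_0$;
  • The mapping $p : \Theta \to \mathcal{P}$ is continuous at every point $\theta \in \Theta$.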
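
Two of the conditions above, positivity of the cell probabilities and full rank of the Jacobian, can be checked numerically for a concrete model. The sketch below is only an illustration under stated assumptions: the model (independence in a $2 \times 2$ table with $\theta = (a, b)$, so $m = 4$ and $s = 2$), the point $\theta_0$, and the function names are hypothetical choices and are not taken from this work.

```python
import numpy as np


def cell_probs(theta):
    """Hypothetical model: independence in a 2x2 table, theta = (a, b)."""
    a, b = theta
    return np.array([a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)])


def numerical_jacobian(p, theta0, h=1e-6):
    """Central-difference approximation of the m x s matrix d p_i / d theta_j."""
    theta0 = np.asarray(theta0, dtype=float)
    columns = []
    for j in range(theta0.size):
        step = np.zeros_like(theta0)
        step[j] = h
        columns.append((p(theta0 + step) - p(theta0 - step)) / (2 * h))
    return np.column_stack(columns)


theta0 = np.array([0.3, 0.6])  # assumed interior point of the parameter space
J = numerical_jacobian(cell_probs, theta0)

positive_cells = bool(np.all(cell_probs(theta0) > 0))
full_rank = bool(np.linalg.matrix_rank(J) == theta0.size)

print(J)
print("all cells positive:", positive_cells, "| full column rank:", full_rank)
```

A finite-difference Jacobian is used only to keep the sketch independent of the model; for a model with closed-form derivatives, the analytic Jacobian would be the natural choice.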

References

  1. Salicru, M.; Morales, D.; Menendez, M.; Pardo, L. On the Applications of Divergence Type Measures in Testing Statistical Hypotheses. J. Multivar. Anal. 1994, 51, 372–391.
  2. Contreras-Reyes, J.E.; Kahrari, F.; Cortés, D.D. On the modified skew-normal-Cauchy distribution: Properties, inference and applications. Commun. Stat. Theory Methods 2021, 50, 3615–3631.
  3. Mattheou, K.; Karagrigoriou, A. A New Family of Divergence Measures for Tests of Fit. Aust. N. Z. J. Stat. 2010, 52, 187–200.
  4. Csiszár, I. Eine Informationstheoretische Ungleichung und Ihre Anwendung auf Beweis der Ergodizitaet von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kut. Int. Koezl. 1963, 8, 85–108.
  5. Kullback, S.; Leibler, R.A. On Information and Sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
  6. Cressie, N.; Read, T.R.C. Multinomial Goodness-of-Fit Tests. J. R. Stat. Soc. Ser. B Methodol. 1984, 46, 440–464.
  7. Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1900, 50, 157–175.
  8. Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M.C. Robust and Efficient Estimation by Minimising a Density Power Divergence. Biometrika 1998, 85, 549–559.
  9. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman and Hall/CRC: New York, NY, USA, 2006.
  10. Morales, D.; Pardo, L.; Vajda, I. Asymptotic Divergence of Estimates of Discrete Distributions. J. Stat. Plan. Inference 1995, 48, 347–369.
  11. Meselidis, C.; Karagrigoriou, A. Statistical Inference for Multinomial Populations Based on a Double Index Family of Test Statistics. J. Stat. Comput. Simul. 2020, 90, 1773–1792.
  12. Pardo, J.; Pardo, L.; Zografos, K. Minimum φ-divergence Estimators with Constraints in Multinomial Populations. J. Stat. Plan. Inference 2002, 104, 221–237.
  13. Read, T.R.; Cressie, N.A. Goodness-of-Fit Statistics for Discrete Multivariate Data; Springer: New York, NY, USA, 1988.
  14. Alin, A.; Kurt, S. Ordinary and Penalized Minimum Power-divergence Estimators in Two-way Contingency Tables. Comput. Stat. 2008, 23, 455–468.
  15. Toma, A. Optimal Robust M-estimators Using Divergences. Stat. Probab. Lett. 2009, 79, 1–5.
  16. Jiménez-Gamero, M.; Pino-Mejías, R.; Alba-Fernández, V.; Moreno-Rebollo, J. Minimum ϕ-divergence Estimation in Misspecified Multinomial Models. Comput. Stat. Data Anal. 2011, 55, 3365–3378.
  17. Kim, B.; Lee, S. Minimum density power divergence estimator for covariance matrix based on skew t distribution. Stat. Methods Appl. 2014, 23, 565–575.
  18. Neath, A.A.; Cavanaugh, J.E.; Weyhaupt, A.G. Model Evaluation, Discrepancy Function Estimation, and Social Choice Theory. Comput. Stat. 2015, 30, 231–249.
  19. Ghosh, A. Divergence based robust estimation of the tail index through an exponential regression model. Stat. Methods Appl. 2016, 26, 181–213.
  20. Jiménez-Gamero, M.D.; Batsidis, A. Minimum Distance Estimators for Count Data Based on the Probability Generating Function with Applications. Metrika 2017, 80, 503–545.
  21. Basu, A.; Ghosh, A.; Mandal, A.; Martin, N.; Pardo, L. Robust Wald-type tests in GLM with random design based on minimum density power divergence estimators. Stat. Methods Appl. 2021, 30, 973–1005.
  22. Birch, M.W. A New Proof of the Pearson-Fisher Theorem. Ann. Math. Stat. 1964, 35, 817–824.
  23. Krantz, S.G.; Parks, H.R. The Implicit Function Theorem: History, Theory, and Applications; Birkhäuser: Basel, Switzerland, 2013.
  24. Ferguson, T.S. A Course in Large Sample Theory; Chapman and Hall: Boca Raton, FL, USA, 1996.
  25. McManus, D.A. Who Invented Local Power Analysis? Econom. Theory 1991, 7, 265–268.
  26. Neyman, J. “Smooth” Test for Goodness of Fit. Scand. Actuar. J. 1937, 1937, 149–199.
  27. Pardo, J.A. An approach to multiway contingency tables based on φ-divergence test statistics. J. Multivar. Anal. 2010, 101, 2305–2319.
  28. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2016.
  29. Johnson, S.G. The NLopt Nonlinear-Optimization Package. 2014. Available online: http://ab-initio.mit.edu/nlopt (accessed on 27 March 2022).
  30. Dale, J.R. Asymptotic Normality of Goodness-of-Fit Statistics for Sparse Product Multinomials. J. R. Stat. Soc. Ser. B Methodol. 1986, 48, 48–59.
  31. Batsidis, A.; Martin, N.; Pardo Llorente, L.; Zografos, K. φ-Divergence Based Procedure for Parametric Change-Point Problems. Methodol. Comput. Appl. Probab. 2016, 18, 21–35.
  32. Lloyd, C.J. Estimating test power adjusted for size. J. Stat. Comput. Simul. 2005, 75, 921–933.
  33. Dubrova, Y.E.; Grant, G.; Chumak, A.A.; Stezhka, V.A.; Karakasian, A.N. Elevated Minisatellite Mutation Rate in the Post-Chernobyl Families from Ukraine. Am. J. Hum. Genet. 2002, 71, 801–809.
  34. Znaor, A.; Brennan, P.; Gajalakshmi, V.; Mathew, A.; Shanta, V.; Varghese, C.; Boffetta, P. Independent and combined effects of tobacco smoking, chewing and alcohol drinking on the risk of oral, pharyngeal and esophageal cancers in Indian men. Int. J. Cancer 2003, 105, 681–686.
  35. Merková, M. Use of Investment Controlling and its Impact into Business Performance. Procedia Econ. Financ. 2015, 34, 608–614.
  36. Geenens, G.; Simar, L. Nonparametric tests for conditional independence in two-way contingency tables. J. Multivar. Anal. 2010, 101, 765–788.
  37. Bartolucci, F.; Scaccia, L. Testing for positive association in contingency tables with fixed margins. Comput. Stat. Data Anal. 2004, 47, 195–210.
Figure 1. Estimated (true) sizes against nominal sizes. The shaded area refers to Dale’s criterion. (a) $n = 25$. (b) $n = 45$.
Figure 2. (a) Empirical ROC curves for n = 25. (b) The same curves magnified over a relevant range of empirical sizes.
Figure 3. (a) Empirical ROC curves for n = 45. (b) The same curves magnified over a relevant range of empirical sizes.
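Table 1. Size ($w = 0.00$) calculations (%) of the $T_{\Phi_1}^{\alpha_1}(\hat{\theta}^{r}_{(\Phi_2, \alpha_2)})$ test statistic for sample sizes $n = 20, 25, 40, 45$. Sizes that satisfy Dale’s criterion are presented in bold.

| $n$ | $\alpha_1$ | $\alpha_2 = 10^{-7}$ | $\alpha_2 = 0.01$ | $\alpha_2 = 0.05$ | $\alpha_2 = 0.10$ | $\alpha_2 = 0.50$ | $\alpha_2 = 1.50$ |
|---|---|---|---|---|---|---|---|
| 20 | $10^{-7}$ | 8.256 | 8.257 | 8.260 | 8.263 | 9.216 | 13.856 |
| 20 | 0.01 | 8.207 | 8.206 | 8.209 | 8.224 | 9.224 | 13.623 |
| 20 | 0.05 | 7.896 | 7.849 | 7.879 | 7.886 | 8.719 | 12.916 |
| 20 | 0.10 | 7.403 | 7.404 | 7.378 | 7.356 | 8.046 | 11.994 |
| 20 | 0.50 | 3.873 | 3.850 | 3.769 | 3.612 | 3.023 | 4.050 |
| 20 | 1.50 | 0.920 | 0.893 | 0.807 | 0.758 | 0.509 | 0.202 |
| 25 | $10^{-7}$ | 7.863 | 7.865 | 7.878 | 7.920 | 8.927 | 13.192 |
| 25 | 0.01 | 7.753 | 7.754 | 7.763 | 7.817 | 8.797 | 12.930 |
| 25 | 0.05 | 7.340 | 7.334 | 7.327 | 7.350 | 8.313 | 12.277 |
| 25 | 0.10 | 6.965 | 6.959 | 6.940 | 6.934 | 7.675 | 11.364 |
| 25 | 0.50 | 3.857 | 3.819 | 3.722 | 3.604 | 3.191 | 4.304 |
| 25 | 1.50 | 1.046 | 1.019 | 0.948 | 0.885 | 0.602 | 0.203 |
| 40 | $10^{-7}$ | 7.016 | 7.016 | 7.027 | 7.055 | 7.887 | 11.362 |
| 40 | 0.01 | 6.933 | 6.933 | 6.940 | 6.957 | 7.778 | 11.183 |
| 40 | 0.05 | 6.590 | 6.589 | 6.580 | 6.593 | 7.342 | 10.505 |
| 40 | 0.10 | 6.246 | 6.239 | 6.228 | 6.222 | 6.794 | 9.758 |
| 40 | 0.50 | 3.854 | 3.832 | 3.762 | 3.661 | 3.367 | 4.362 |
| 40 | 1.50 | 1.172 | 1.160 | 1.115 | 1.066 | 0.760 | 0.383 |
| 45 | $10^{-7}$ | 6.858 | 6.858 | 6.870 | 6.908 | 7.732 | 11.099 |
| 45 | 0.01 | 6.760 | 6.760 | 6.770 | 6.805 | 7.601 | 10.941 |
| 45 | 0.05 | 6.427 | 6.422 | 6.415 | 6.426 | 7.153 | 10.340 |
| 45 | 0.10 | 6.082 | 6.070 | 6.053 | 6.043 | 6.612 | 9.586 |
| 45 | 0.50 | 3.813 | 3.789 | 3.716 | 3.635 | 3.331 | 4.269 |
| 45 | 1.50 | 1.183 | 1.170 | 1.119 | 1.068 | 0.773 | 0.437 |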
Table 2. Size ($w = 0.00$) and power ($w = 0.30, 0.60, 0.90$) calculations (%) for the classical tests-of-fit. Sizes that satisfy Dale’s criterion are presented in bold.

| $w$ | Sample size | $FT$ | $G^2$ | $CR$ | $X^2$ |
|---|---|---|---|---|---|
| 0.00 | $n = 20$ | 14.715 | 8.261 | 4.219 | 3.140 |
| 0.00 | $n = 25$ | 13.664 | 7.865 | 4.333 | 3.477 |
| 0.00 | $n = 40$ | 11.154 | 7.016 | 4.722 | 4.059 |
| 0.00 | $n = 45$ | 10.787 | 6.858 | 4.703 | 4.082 |
| 0.30 | $n = 20$ | 18.366 | 9.072 | 4.200 | 2.966 |
| 0.30 | $n = 25$ | 19.674 | 9.846 | 4.783 | 3.646 |
| 0.30 | $n = 40$ | 21.920 | 12.192 | 6.935 | 5.548 |
| 0.30 | $n = 45$ | 22.467 | 12.992 | 7.471 | 6.081 |
| 0.60 | $n = 20$ | 29.707 | 14.936 | 7.096 | 4.910 |
| 0.60 | $n = 25$ | 35.768 | 18.966 | 9.469 | 7.118 |
| 0.60 | $n = 40$ | 48.366 | 31.513 | 18.780 | 15.030 |
| 0.60 | $n = 45$ | 50.821 | 35.381 | 22.367 | 18.217 |
| 0.90 | $n = 20$ | 47.859 | 26.721 | 13.789 | 9.704 |
| 0.90 | $n = 25$ | 62.810 | 38.023 | 20.147 | 15.296 |
| 0.90 | $n = 40$ | 85.773 | 69.599 | 47.644 | 39.481 |
| 0.90 | $n = 45$ | 89.108 | 76.685 | 57.000 | 48.451 |
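Table 3. Power ($w = 0.30$) calculations (%) of the $T_{\Phi_1}^{\alpha_1}(\hat{\theta}^{r}_{(\Phi_2, \alpha_2)})$ test statistic for sample sizes $n = 20, 25, 40, 45$.

| $n$ | $\alpha_1$ | $\alpha_2 = 10^{-7}$ | $\alpha_2 = 0.01$ | $\alpha_2 = 0.05$ | $\alpha_2 = 0.10$ | $\alpha_2 = 0.50$ | $\alpha_2 = 1.50$ |
|---|---|---|---|---|---|---|---|
| 20 | $10^{-7}$ | 9.073 | 9.072 | 9.071 | 9.076 | 9.993 | 15.062 |
| 20 | 0.01 | 8.990 | 8.989 | 8.988 | 9.006 | 9.948 | 14.724 |
| 20 | 0.05 | 8.350 | 8.278 | 8.340 | 8.357 | 9.231 | 13.819 |
| 20 | 0.10 | 7.694 | 7.696 | 7.626 | 7.616 | 8.273 | 12.656 |
| 20 | 0.50 | 3.751 | 3.717 | 3.607 | 3.418 | 2.889 | 4.199 |
| 20 | 1.50 | 0.793 | 0.764 | 0.676 | 0.630 | 0.415 | 0.163 |
| 25 | $10^{-7}$ | 9.846 | 9.846 | 9.868 | 9.895 | 10.924 | 15.729 |
| 25 | 0.01 | 9.630 | 9.630 | 9.651 | 9.727 | 10.712 | 15.343 |
| 25 | 0.05 | 9.033 | 9.008 | 8.990 | 9.022 | 9.876 | 14.332 |
| 25 | 0.10 | 8.225 | 8.216 | 8.194 | 8.188 | 8.890 | 13.111 |
| 25 | 0.50 | 3.797 | 3.761 | 3.656 | 3.581 | 3.252 | 4.620 |
| 25 | 1.50 | 0.820 | 0.810 | 0.756 | 0.718 | 0.479 | 0.158 |
| 40 | $10^{-7}$ | 12.192 | 12.193 | 12.207 | 12.231 | 13.142 | 17.775 |
| 40 | 0.01 | 11.935 | 11.934 | 11.942 | 11.979 | 12.853 | 17.387 |
| 40 | 0.05 | 11.075 | 11.075 | 11.069 | 11.074 | 11.844 | 16.046 |
| 40 | 0.10 | 10.072 | 10.060 | 10.039 | 10.022 | 10.565 | 14.549 |
| 40 | 0.50 | 4.863 | 4.842 | 4.743 | 4.648 | 4.342 | 5.815 |
| 40 | 1.50 | 0.979 | 0.970 | 0.928 | 0.890 | 0.662 | 0.379 |
| 45 | $10^{-7}$ | 12.992 | 12.992 | 13.003 | 13.052 | 14.014 | 18.490 |
| 45 | 0.01 | 12.724 | 12.724 | 12.730 | 12.764 | 13.721 | 18.148 |
| 45 | 0.05 | 11.799 | 11.786 | 11.760 | 11.768 | 12.628 | 16.815 |
| 45 | 0.10 | 10.747 | 10.729 | 10.688 | 10.669 | 11.218 | 15.183 |
| 45 | 0.50 | 5.214 | 5.179 | 5.078 | 4.977 | 4.648 | 6.116 |
| 45 | 1.50 | 1.032 | 1.019 | 0.978 | 0.928 | 0.693 | 0.412 |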
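Table 4. Power ($w = 0.60$) calculations (%) of the $T_{\Phi_1}^{\alpha_1}(\hat{\theta}^{r}_{(\Phi_2, \alpha_2)})$ test statistic for sample sizes $n = 20, 25, 40, 45$.

| $n$ | $\alpha_1$ | $\alpha_2 = 10^{-7}$ | $\alpha_2 = 0.01$ | $\alpha_2 = 0.05$ | $\alpha_2 = 0.10$ | $\alpha_2 = 0.50$ | $\alpha_2 = 1.50$ |
|---|---|---|---|---|---|---|---|
| 20 | $10^{-7}$ | 14.928 | 14.937 | 14.932 | 14.944 | 16.186 | 22.900 |
| 20 | 0.01 | 14.807 | 14.813 | 14.808 | 14.833 | 16.117 | 22.486 |
| 20 | 0.05 | 13.711 | 13.583 | 13.726 | 13.735 | 14.939 | 21.143 |
| 20 | 0.10 | 12.612 | 12.619 | 12.529 | 12.525 | 13.217 | 19.545 |
| 20 | 0.50 | 6.088 | 5.994 | 5.811 | 5.416 | 4.553 | 6.403 |
| 20 | 1.50 | 1.118 | 1.077 | 0.944 | 0.889 | 0.553 | 0.215 |
| 25 | $10^{-7}$ | 18.965 | 18.964 | 19.004 | 19.042 | 20.607 | 27.684 |
| 25 | 0.01 | 18.565 | 18.564 | 18.598 | 18.702 | 20.235 | 27.069 |
| 25 | 0.05 | 17.436 | 17.383 | 17.360 | 17.422 | 18.733 | 25.365 |
| 25 | 0.10 | 15.794 | 15.767 | 15.743 | 15.726 | 16.869 | 23.368 |
| 25 | 0.50 | 6.879 | 6.821 | 6.656 | 6.473 | 5.912 | 8.458 |
| 25 | 1.50 | 1.275 | 1.240 | 1.152 | 1.081 | 0.729 | 0.260 |
| 40 | $10^{-7}$ | 31.513 | 31.518 | 31.533 | 31.608 | 33.469 | 40.799 |
| 40 | 0.01 | 30.904 | 30.903 | 30.925 | 30.999 | 32.868 | 40.221 |
| 40 | 0.05 | 28.949 | 28.946 | 28.938 | 28.956 | 30.509 | 37.756 |
| 40 | 0.10 | 26.504 | 26.485 | 26.434 | 26.398 | 27.631 | 34.747 |
| 40 | 0.50 | 11.949 | 11.867 | 11.598 | 11.409 | 10.830 | 14.703 |
| 40 | 1.50 | 1.797 | 1.761 | 1.692 | 1.578 | 1.142 | 0.716 |
| 45 | $10^{-7}$ | 35.381 | 35.381 | 35.404 | 35.465 | 37.411 | 44.556 |
| 45 | 0.01 | 34.848 | 34.845 | 34.863 | 34.941 | 36.744 | 43.942 |
| 45 | 0.05 | 32.727 | 32.716 | 32.697 | 32.715 | 34.310 | 41.510 |
| 45 | 0.10 | 30.146 | 30.110 | 30.051 | 30.014 | 31.289 | 38.456 |
| 45 | 0.50 | 14.052 | 13.966 | 13.632 | 13.321 | 12.731 | 16.901 |
| 45 | 1.50 | 1.973 | 1.945 | 1.870 | 1.776 | 1.295 | 0.838 |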
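Table 5. Power ($w = 0.90$) calculations (%) of the $T_{\Phi_1}^{\alpha_1}(\hat{\theta}^{r}_{(\Phi_2, \alpha_2)})$ test statistic for sample sizes $n = 20, 25, 40, 45$.

| $n$ | $\alpha_1$ | $\alpha_2 = 10^{-7}$ | $\alpha_2 = 0.01$ | $\alpha_2 = 0.05$ | $\alpha_2 = 0.10$ | $\alpha_2 = 0.50$ | $\alpha_2 = 1.50$ |
|---|---|---|---|---|---|---|---|
| 20 | $10^{-7}$ | 26.712 | 26.710 | 26.707 | 26.711 | 28.495 | 37.924 |
| 20 | 0.01 | 26.589 | 26.586 | 26.585 | 26.613 | 28.718 | 37.421 |
| 20 | 0.05 | 25.437 | 25.267 | 25.531 | 25.502 | 27.170 | 35.979 |
| 20 | 0.10 | 24.287 | 24.284 | 24.232 | 24.172 | 24.868 | 33.946 |
| 20 | 0.50 | 12.003 | 11.780 | 11.424 | 10.772 | 8.807 | 11.665 |
| 20 | 1.50 | 1.731 | 1.662 | 1.489 | 1.422 | 0.904 | 0.298 |
| 25 | $10^{-7}$ | 38.017 | 38.016 | 38.132 | 38.191 | 40.982 | 50.954 |
| 25 | 0.01 | 37.365 | 37.364 | 37.456 | 37.645 | 40.482 | 50.206 |
| 25 | 0.05 | 35.674 | 35.559 | 35.526 | 35.643 | 38.260 | 48.187 |
| 25 | 0.10 | 33.014 | 32.939 | 32.867 | 32.854 | 35.184 | 45.569 |
| 25 | 0.50 | 14.353 | 14.226 | 13.870 | 13.560 | 12.312 | 16.886 |
| 25 | 1.50 | 2.268 | 2.226 | 2.026 | 1.916 | 1.387 | 0.506 |
| 40 | $10^{-7}$ | 69.599 | 69.605 | 69.637 | 69.755 | 72.196 | 79.363 |
| 40 | 0.01 | 68.923 | 68.923 | 68.954 | 69.049 | 71.518 | 79.003 |
| 40 | 0.05 | 66.310 | 66.309 | 66.306 | 66.365 | 68.576 | 77.069 |
| 40 | 0.10 | 62.500 | 62.455 | 62.372 | 62.343 | 64.660 | 74.161 |
| 40 | 0.50 | 30.094 | 29.904 | 29.349 | 28.848 | 27.895 | 36.902 |
| 40 | 1.50 | 3.748 | 3.678 | 3.472 | 3.210 | 2.269 | 1.562 |
| 45 | $10^{-7}$ | 76.685 | 76.685 | 76.731 | 76.805 | 78.802 | 84.683 |
| 45 | 0.01 | 76.177 | 76.173 | 76.192 | 76.264 | 78.143 | 84.344 |
| 45 | 0.05 | 73.760 | 73.745 | 73.732 | 73.766 | 75.748 | 82.751 |
| 45 | 0.10 | 70.295 | 70.264 | 70.144 | 70.131 | 72.172 | 80.319 |
| 45 | 0.50 | 36.612 | 36.465 | 35.792 | 35.073 | 34.056 | 43.732 |
| 45 | 1.50 | 4.349 | 4.274 | 4.017 | 3.747 | 2.665 | 1.927 |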
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Meselidis, C.; Karagrigoriou, A. Contingency Table Analysis and Inference via Double Index Measures. Entropy 2022, 24, 477. https://doi.org/10.3390/e24040477
