1. Introduction
In several fields of study, such as health, business, the social sciences and education, the outcomes of variables are mainly discrete, i.e., the variables take only finitely or countably many values. A discrete variable whose outcome takes only a finite number of values is called a categorical variable [1]. A categorical variable consists of a set of non-overlapping categories [2], and the outcome can be binary (dichotomous), i.e., with just two possible levels, such as “present” or “absent” for a desired condition, or polytomous, i.e., with more than two levels, as is the case for the “Likert” scale [3]. There are two common types of polytomous variables, as can be seen in [4]: the ordinal and the nominal scales of measurement. Categorical variables such as one’s eye colour, ethnicity and affiliations, whose categories cannot be ordered in any way, are nominal, while variables such as a patient’s level of resistance to a drug, level of education and economic status exhibit a natural order and are thus ordinal.
In a study in which all the observed variables are categorical, the most common way of representing the data is a contingency table, which is a cross-tabulation of the variables [5,6]. When there are $m$ variables, the contingency table is an $m$-dimensional table, also known as a multidimensional table when there are more than two attributes. The information in a contingency table is mainly summarized through appropriate measures, such as measures of association, or through models. Association measures, although easy to compute and interpret, lead to a great loss of information, as can be seen in [7]. Models are preferred when a more sensitive analysis is required. A model is a “theory” or a conceptual framework about observations, and the parameters in the model represent the “effects” that particular variables, or combinations of variables, have in determining the values taken by the observations.
The simplest and most common model for a contingency table is the log-linear model [8]. It is constructed by taking the natural logarithms of the cell probabilities, by analogy with the analysis of variance (ANOVA) models, as can be seen in [9,10,11]. Classical log-linear models are sometimes regarded within the framework of the generalized linear model (GLM). They are also important in connection with contingency matrices, as can be seen in [12]. Contemporary problems in categorical data analysis, involving extremely high-dimensional data and demanding computational procedures, require the development of complex models. Much work has been done on the modelling of categorical data, as can be seen in [7,13]. For example, in [14], the author used regression models for modelling categorical data. In our work, we derive new asymptotic results that enable us to obtain confidence ellipsoids and simultaneous confidence intervals, respectively, for the vector of probabilities and its components, allowing us to overcome some inference limitations of the existing procedures.
Inferential statistical analysis requires assumptions about the probability distribution of the response variable. For categorical data, the main distribution is the multinomial distribution. Most of the time, categorical data result from $n$ independent and identical trials, each trial having two or more possible outcomes. When the $n$ independent and identical trials have the same category probabilities, the distribution of the counts in the various categories is the multinomial distribution. The binomial distribution is the special case of the multinomial distribution with just two possible outcomes per trial. Usually, the parameters of the multinomial distribution are not known, and they are often estimated from the sample data by estimation methods such as maximum likelihood estimation (MLE), as can be seen, for instance, in [15], minimum discrimination information (MDI) [16], weighted least squares (WLS) [17] and Bayesian estimation (BA) [18]. In a previous study [19], we wanted to minimize the average cost, so we used statistical decision theory (SDT), since there was only a finite number of possible choices. We point out that we achieved consistency, since the probability of selecting the choice with the least average cost tends to 1 as the sample size tends to infinity.
If we have $n$ realizations of an experiment with $m$ possible results with probabilities $p_1,\dots,p_m$, we have the probability mass function, as can be seen in [20,21]:
$$\Pr(n_1,\dots,n_m)=\frac{n!}{n_1!\cdots n_m!}\prod_{i=1}^{m}p_i^{n_i},\qquad \sum_{i=1}^{m}n_i=n,$$
for the vector $(n_1,\dots,n_m)^{\top}$ of the times we obtain the different results. This probability mass function corresponds to the singular multinomial distribution. We name as multinomial the models describing these sets of independent realizations of experiments with a finite number of results.
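As a brief numerical illustration of this probability mass function, the following sketch (written by us in Python with NumPy and SciPy; the probabilities and counts are hypothetical, not taken from the paper) evaluates the multinomial pmf directly and through scipy.stats.multinomial:

```python
# A minimal sketch, assuming hypothetical values p = (0.2, 0.3, 0.5) and n = 10.
import numpy as np
from math import factorial
from scipy.stats import multinomial

p = np.array([0.2, 0.3, 0.5])  # category probabilities (sum to 1)
n = 10                         # number of independent trials
counts = np.array([2, 3, 5])   # an observed count vector (sums to n)

# Direct evaluation: n! / (n_1! ... n_m!) * prod_i p_i^{n_i}
coef = factorial(n) / np.prod([factorial(k) for k in counts])
print(coef * np.prod(p ** counts))

# Same value from SciPy's multinomial distribution
print(multinomial.pmf(counts, n=n, p=p))
```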
For the vector $\mathbf{p}=(p_1,\dots,p_m)^{\top}$ of probabilities, we have the vector of estimators:
$$\hat{\mathbf{p}}=(\hat{p}_1,\dots,\hat{p}_m)^{\top},$$
with:
$$\hat{p}_i=\frac{n_i}{n},\qquad i=1,\dots,m.$$
Moreover, as can be seen in [22], as $n\rightarrow\infty$:
$$\sqrt{n}\,(\hat{\mathbf{p}}-\mathbf{p})\sim N(\mathbf{0},\boldsymbol{\Sigma}),$$
where $\sim$ indicates the limit distribution, in this case $N(\mathbf{0},\boldsymbol{\Sigma})$, the normal distribution with the null mean vector and covariance matrix:
$$\boldsymbol{\Sigma}=D(\mathbf{p})-\mathbf{p}\mathbf{p}^{\top},$$
where $D(\mathbf{p})$ is the diagonal matrix with principal elements $p_1,\dots,p_m$. This result will play an important role in the asymptotic treatment of the multinomial models, which is this paper’s goal.
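This limit result can be checked by simulation. The following sketch (our own illustration, with hypothetical $\mathbf{p}$ and $n$) draws repeated multinomial samples and compares the empirical covariance of $\sqrt{n}\,(\hat{\mathbf{p}}-\mathbf{p})$ with $D(\mathbf{p})-\mathbf{p}\mathbf{p}^{\top}$:

```python
# A minimal simulation sketch checking sqrt(n)(p_hat - p) ~ N(0, D(p) - p p^T).
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.2, 0.3, 0.5])   # hypothetical category probabilities
n, reps = 10_000, 20_000        # trials per sample, number of replicates

counts = rng.multinomial(n, p, size=reps)   # reps independent count vectors
Z = np.sqrt(n) * (counts / n - p)           # standardized estimation errors

print(np.cov(Z.T))                          # empirical covariance of Z
print(np.diag(p) - np.outer(p, p))          # limit covariance D(p) - p p^T
```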
To carry out that asymptotic treatment, we start by obtaining a convenient version of the continuous mapping theorem [23] in the next section on limit distributions. Then, in Section 3, we obtain confidence regions, namely confidence ellipsoids and simultaneous confidence intervals. In Section 4, we study the algebraic structure of the limit covariance matrix, $\boldsymbol{\Sigma}$.
In Section 5, we obtain chi-square tests for hypotheses on the outcome probabilities, as well as confidence ellipsoids and simultaneous confidence intervals for them. We also consider log-linear models, for which we present a numerical application. We point out that our approach to these models overcomes the hierarchical restriction used to analyse multidimensional contingency tables.
Our use of both the classical and the new version of the parametrized continuous mapping theorem (PCMT) enables us to carry out statistical inference for multinomial models. This inference is similar to ANOVA and related techniques, but the F-tests are replaced by chi-square tests, which is highly convenient, since we now have an infinite number of degrees of freedom for the error.
Finally, we stress the close relationship between our ANOVA-like inference using chi-square tests and the usual treatment of fixed effects models. We point out that the F-tests in that treatment have interesting invariance properties that express the symmetry of those models, especially since those models are associated with orthogonal partitions into sub-spaces that are invariant under rotation.
2. Limit Distributions
Let $\mathcal{C}$ be the class of continuous functions. If $g\in\mathcal{C}$, and the distribution $F_n$ of $\mathbf{X}_n$ converges to the distribution $F$ of $\mathbf{X}$ (that is, $F_n(\mathbf{x})\rightarrow F(\mathbf{x})$ whenever $\mathbf{x}$ is a continuity point of $F$), we have, as can be seen in [23,24,25,26]:
$$g(\mathbf{X}_n)\stackrel{d}{\longrightarrow}g(\mathbf{X}),$$
as follows from the continuous mapping theorem.
If $\mathbf{X}_n$ is obtained by superposing the sub-vectors $\mathbf{Y}_n$ and $\mathbf{Z}_n$, we put $\mathbf{X}_n=(\mathbf{Y}_n^{\top},\mathbf{Z}_n^{\top})^{\top}$. Then, if $\mathbf{Y}_n\stackrel{d}{\longrightarrow}\mathbf{Y}$, with $\boldsymbol{\theta}$ belonging to a compact set $\Theta$, and $\mathbf{Z}_n\stackrel{p}{\longrightarrow}\boldsymbol{\theta}$, putting $\mathbf{X}=(\mathbf{Y}^{\top},\boldsymbol{\theta}^{\top})^{\top}$, to show that:
$$g(\mathbf{X}_n)\stackrel{d}{\longrightarrow}g(\mathbf{X}),\qquad g\in\mathcal{C},$$
it suffices to show that, as can be seen in [27]:
$$\mathbf{X}_n\stackrel{d}{\longrightarrow}\mathbf{X},$$
since the continuous mapping theorem then applies.
With $\mu_n$ and $\mu$ the probability measures associated with $\mathbf{X}_n$ and $\mathbf{X}$, and representing the Cartesian product by $\times$, whatever $\varepsilon>0$, there exists a parallelepiped:
$$R=\times_{i}\,[a_i,b_i],$$
with $\mu(\partial R)=0$, such that $\mu(R)>1-\varepsilon$. Since $\mu(\partial R)=0$, we have $\mu_n(R)\rightarrow\mu(R)$, and so there will be $n_0$ such that, for $n\geq n_0$:
$$\mu_n(R)>1-2\varepsilon.$$
Now, $R$, being closed and bounded, will also be compact. Thus, if $g\in\mathcal{C}$, its restriction to $R$ will be uniformly continuous. So, whatever $\delta>0$, there exists $\eta>0$ such that, if $\mathbf{x}$ and $\mathbf{x}^{\prime}$ belong to $R$ and $\|\mathbf{x}-\mathbf{x}^{\prime}\|<\eta$, then $\|g(\mathbf{x})-g(\mathbf{x}^{\prime})\|<\delta$, where $\|\cdot\|$ indicates the Euclidean norm of a vector.
Let $A_n$ and $B_n$ be the events that occur when $\mathbf{X}_n\in R$ and when $\|\mathbf{Z}_n-\boldsymbol{\theta}\|<\eta$, respectively. We now establish:
Lemma 1. $g(\mathbf{Y}_n,\mathbf{Z}_n)-g(\mathbf{Y}_n,\boldsymbol{\theta})\stackrel{p}{\longrightarrow}\mathbf{0}$.
Proof. Since the restriction of $g$ to $R$ is uniformly continuous:
Thus, we only have to point out that
is arbitrary and that:
so that:
where
, whatever
, to establish the thesis. □
Since:
we have:
thus, also whatever
, there exists
such that, for
, we have:
as well as:
whenever $\mathbf{x}$ is a continuity point of the limit distribution. Thus, we have the parametrized continuous mapping theorem (PCMT).
If $\mathbf{Y}_n\stackrel{d}{\longrightarrow}\mathbf{Y}$ and $\mathbf{Z}_n\stackrel{p}{\longrightarrow}\boldsymbol{\theta}$, with $\Theta$ a compact set:
$$g(\mathbf{Y}_n,\mathbf{Z}_n)\stackrel{d}{\longrightarrow}g(\mathbf{Y},\boldsymbol{\theta}),\qquad g\in\mathcal{C}.$$
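As a numerical illustration of this theorem, the following simulation sketch (our own, with hypothetical values) studentizes a binomial proportion: one component converges in distribution, the other converges in probability to a constant, and the continuous function $g(y,z)=y/z$ of the pair converges in distribution to a standard normal limit:

```python
# A minimal sketch of the PCMT/Slutsky-type result for g(y, z) = y / z.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100_000, 20_000
p1 = 0.3                              # hypothetical success probability

x = rng.binomial(n, p1, size=reps)
p_hat = x / n
Y = np.sqrt(n) * (p_hat - p1)         # Y_n -> N(0, p1 (1 - p1)) in distribution
Z = np.sqrt(p_hat * (1 - p_hat))      # Z_n -> sqrt(p1 (1 - p1)) in probability

T = Y / Z                             # g is continuous for z > 0
print(T.mean(), T.std())              # ~0 and ~1: standard normal limit
```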
Remark 1 (Corollary to PCMT ).
- a
With a parameter both for and ;
- b
when is and , we have:when with a matrix and the covariance matrix of the parameter, .
We will refer to this remark as the corollary of PCMT (CPCMT), as can be seen in [6]. We now consider the case of a sequence of random vectors with the same limit distribution. A sequence of random vectors is mean stable when all its vectors have the same mean vector.
Between the mean stable sequences, we may establish an equivalence relation, writing $\{\mathbf{X}_n\}\approx\{\mathbf{W}_n\}$ if and only if:
$$\mathbf{X}_n-\mathbf{W}_n\stackrel{p}{\longrightarrow}\mathbf{0},$$
where $\stackrel{p}{\longrightarrow}$ means stochastic convergence, i.e., convergence in probability, as can be seen in [27,28]. We now establish:
Proposition 1. If $\{\mathbf{X}_n\}\approx\{\mathbf{W}_n\}$, then $\mathbf{W}_n\stackrel{d}{\longrightarrow}\mathbf{X}$ whenever $\mathbf{X}_n\stackrel{d}{\longrightarrow}\mathbf{X}$.
Proof. Let the vectors in and have m components. Given , we consider the events and .
Taking , and ⇒ to indicate the implication, we have . Thus, with and , we have .
Moreover, since
, whatever
, there exist
such that, for
, so
, and, since
is arbitrary:
so:
since
, we have
and
, so:
and given
is arbitrary and
may be whatever, from
, we obtain:
which completes the proof. □
We then consider normal limit distributions, starting with:
Proposition 2. Let $g$ be such that its component functions have gradients, Hessian matrices, and continuous second-order partial derivatives. Whatever the mean stable sequence with the invariant mean vector $\boldsymbol{\mu}$, taking and with , we have , whenever converges in distribution to .
Proof. We have
with
between
and
. Since
we also have
. Then, with
the radius
sphere with centre
. Now
is a continuous function of
so it will have a maximum
in
that will exceed the supremum of the spectral radius of
in
, so:
thus:
and so:
and the thesis follows from Proposition 1. □
Corollary 3. If , under the hypothesis of Propositions 1 and 2, .
Proof. The thesis follows from Propositions 1 and 2, since the continuous mapping theorem, as can be seen in [23,24], implies that the limit distribution of
24], implies that the limit distribution of
is
. □
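Results of this type yield the familiar delta method: for a smooth transformation $g$, the limit distribution of $\sqrt{n}\,(g(\hat{\mathbf{p}})-g(\mathbf{p}))$ is normal with covariance matrix $\mathbf{G}\boldsymbol{\Sigma}\mathbf{G}^{\top}$, where $\mathbf{G}$ is the Jacobian matrix of $g$ at $\mathbf{p}$. The following simulation sketch (our own illustration, with hypothetical values and $g$ the componentwise logarithm) checks this numerically:

```python
# A minimal sketch of the delta-method consequence: sqrt(n)(g(p_hat) - g(p))
# has limit covariance G Sigma G^T, with G the Jacobian of g at p.
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.2, 0.3, 0.5])   # hypothetical category probabilities
n, reps = 100_000, 20_000

counts = rng.multinomial(n, p, size=reps)
p_hat = counts / n

# g = componentwise log, so the Jacobian at p is G = diag(1 / p)
Z = np.sqrt(n) * (np.log(p_hat) - np.log(p))
G = np.diag(1.0 / p)
Sigma = np.diag(p) - np.outer(p, p)

print(np.cov(Z.T))       # empirical covariance of the transformed statistic
print(G @ Sigma @ G.T)   # covariance predicted by the delta method
```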
3. Confidence Ellipsoids
We start by establishing the following.
Proposition 4. If $\mathbf{X}$ (not necessarily normal) has a covariance matrix $\boldsymbol{\Sigma}$, with a range space $R(\boldsymbol{\Sigma})$, and the mean vector $\boldsymbol{\mu}$, then:
$$\Pr\!\big(\mathbf{X}-\boldsymbol{\mu}\in R(\boldsymbol{\Sigma})\big)=1.$$
Proof. Let $\boldsymbol{\alpha}_1,\dots,\boldsymbol{\alpha}_k$ constitute an orthonormal basis for the orthogonal complement, $R(\boldsymbol{\Sigma})^{\perp}$, of $R(\boldsymbol{\Sigma})$. Then, each $\boldsymbol{\alpha}_i^{\top}(\mathbf{X}-\boldsymbol{\mu})$ will have a null mean value and variance. Thus, according to the Bienaymé–Tchebycheff inequality:
$$\Pr\!\big(\boldsymbol{\alpha}_i^{\top}(\mathbf{X}-\boldsymbol{\mu})=0\big)=1,\qquad i=1,\dots,k.$$
Therefore, we obtain, with $\mathbf{A}$ the matrix with row vectors $\boldsymbol{\alpha}_i^{\top}$:
$$\Pr\!\big(\mathbf{A}(\mathbf{X}-\boldsymbol{\mu})=\mathbf{0}\big)=1,$$
as follows from the Boole generalized inequalities, and so the thesis is established. □
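To see this degeneracy concretely, the following sketch (our own illustration) samples from a normal distribution with the singular covariance matrix $\boldsymbol{\Sigma}=D(\mathbf{p})-\mathbf{p}\mathbf{p}^{\top}$ and checks that the component of $\mathbf{X}-\boldsymbol{\mu}$ along the null space of $\boldsymbol{\Sigma}$ vanishes:

```python
# A minimal sketch of Proposition 4 for the singular covariance matrix
# Sigma = D(p) - p p^T, whose null space is spanned by the all-ones vector.
import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.2, 0.3, 0.5])   # hypothetical probabilities; also the mean here
Sigma = np.diag(p) - np.outer(p, p)

X = rng.multivariate_normal(mean=p, cov=Sigma, size=5)
u = np.ones(3) / np.sqrt(3.0)   # unit vector spanning the null space of Sigma
print((X - p) @ u)              # ~0 for every draw: X - mu stays in R(Sigma)
```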
We then have:
Lemma 2. Given $\mathbf{B}$, a positive semi-definite [definite] matrix with positive eigenvalues $\lambda_1,\dots,\lambda_h$ corresponding to the eigenvectors $\boldsymbol{\alpha}_1,\dots,\boldsymbol{\alpha}_h$, we have $\mathbf{B}=\mathbf{A}D(\lambda_1,\dots,\lambda_h)\mathbf{A}^{\top}$ and, with $+$ indicating the Moore–Penrose inverse, $\mathbf{B}^{+}=\mathbf{A}D(\lambda_1^{-1},\dots,\lambda_h^{-1})\mathbf{A}^{\top}$, with $D(\lambda_1^{-1},\dots,\lambda_h^{-1})$ the diagonal matrix with principal elements $\lambda_1^{-1},\dots,\lambda_h^{-1}$ and $\mathbf{A}=[\boldsymbol{\alpha}_1\,\cdots\,\boldsymbol{\alpha}_h]$.
Proof. It is easy to show that $\mathbf{B}\mathbf{B}^{+}$ and $\mathbf{B}^{+}\mathbf{B}$ are symmetrical and that $\mathbf{B}\mathbf{B}^{+}\mathbf{B}=\mathbf{B}$ and $\mathbf{B}^{+}\mathbf{B}\mathbf{B}^{+}=\mathbf{B}^{+}$, which establishes the thesis. □
We can now establish the following.
Proposition 5. If $\mathbf{X}\sim N(\boldsymbol{\mu},\mathbf{B})$, with $\mathbf{B}^{+}$ the Moore–Penrose inverse of $\mathbf{B}$:
$$(\mathbf{X}-\boldsymbol{\mu})^{\top}\mathbf{B}^{+}(\mathbf{X}-\boldsymbol{\mu})\sim\chi^{2}_{h},$$
where $\chi^{2}_{h}$ is a central chi-square distribution with $h=\mathrm{rank}(\mathbf{B})$ degrees of freedom and $\lambda_i$ is the variance of $\boldsymbol{\alpha}_i^{\top}\mathbf{X}$, $i=1,\dots,h$.
Proof. As stated in Lemma 2, we have:
$$\mathbf{B}^{+}=\mathbf{A}D(\lambda_1^{-1},\dots,\lambda_h^{-1})\mathbf{A}^{\top},$$
with $\mathbf{A}=[\boldsymbol{\alpha}_1\,\cdots\,\boldsymbol{\alpha}_h]$, where $D(\lambda_1^{-1},\dots,\lambda_h^{-1})$ is the diagonal matrix with principal elements $\lambda_1^{-1},\dots,\lambda_h^{-1}$. We now only have to point out that $D(\lambda_1^{-1/2},\dots,\lambda_h^{-1/2})\mathbf{A}^{\top}(\mathbf{X}-\boldsymbol{\mu})\sim N(\mathbf{0},\mathbf{I}_h)$ to establish the thesis, where $\mathbf{I}_h$ is the identity matrix. □
We now consider confidence ellipsoids and simultaneous confidence intervals. Ellipsoids and their support planes are presented in [29]; a point $\mathbf{x}$ belongs to the ellipsoid:
$$(\mathbf{x}-\boldsymbol{\mu})^{\top}\mathbf{B}^{+}(\mathbf{x}-\boldsymbol{\mu})\leq c$$
if and only if:
$$\mathbf{a}^{\top}\mathbf{x}\leq\mathbf{a}^{\top}\boldsymbol{\mu}+\sqrt{c\,\mathbf{a}^{\top}\mathbf{B}\mathbf{a}},\qquad\forall\,\mathbf{a},$$
where $\forall\,\mathbf{a}$ indicates that all possible vectors $\mathbf{a}$ are considered. We now establish:
Proposition 6. If with , the (1-q)-th quantile of (the central chi-square with h degrees of freedom), when . Proof. The proof for the case
directly follows from the previous considerations. Thus, we only have to point out that
is equivalent to, as can be seen in [29]:
□
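Putting these pieces together for the multinomial model, the quadratic form $n(\hat{\mathbf{p}}-\mathbf{p})^{\top}\boldsymbol{\Sigma}^{+}(\hat{\mathbf{p}}-\mathbf{p})$ is asymptotically chi-square with $m-1$ degrees of freedom, which is what makes the confidence ellipsoid work. The following simulation sketch (our own illustration, with hypothetical values) checks the coverage of the resulting region:

```python
# A minimal sketch of the chi-square pivot behind the confidence ellipsoid:
# n (p_hat - p)^T Sigma^+ (p_hat - p) is asymptotically chi-square(m - 1).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
p = np.array([0.2, 0.3, 0.5])   # hypothetical category probabilities
m, n, reps = p.size, 10_000, 20_000

counts = rng.multinomial(n, p, size=reps)
d = counts / n - p

Sigma = np.diag(p) - np.outer(p, p)
Sigma_plus = np.linalg.pinv(Sigma)            # Moore-Penrose inverse

stats = n * np.einsum('ri,ij,rj->r', d, Sigma_plus, d)
q = chi2.ppf(0.95, df=m - 1)                  # 0.95-th quantile of chi-square

print((stats <= q).mean())                    # empirical coverage, ~0.95
```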
Since, as we saw when , we have , because, as we shall see in the next section, .
In the next section, we will obtain results on $\boldsymbol{\Sigma}$ that will be used to obtain chi-square confidence regions for $\mathbf{p}$ and, through duality, to test hypotheses on $\mathbf{p}$.
4. Covariance Matrices
As we saw, for $n\rightarrow\infty$, the limit covariance matrix of $\sqrt{n}\,(\hat{\mathbf{p}}-\mathbf{p})$ is:
$$\boldsymbol{\Sigma}=D(\mathbf{p})-\mathbf{p}\mathbf{p}^{\top},$$
where $D(\mathbf{p})$ is the diagonal matrix with principal elements $p_1,\dots,p_m$ and $\mathbf{p}=(p_1,\dots,p_m)^{\top}$. For the rank of the covariance matrix, we have:
$$\mathrm{rank}(\boldsymbol{\Sigma})\geq m-1,$$
since $\mathrm{rank}(D(\mathbf{p}))=m$ and $\mathrm{rank}(\mathbf{p}\mathbf{p}^{\top})=1$, as follows, as can be seen in [30], page 46, from the fact that the rank of a sum of matrices is at least the difference of their ranks. In addition to this, $\boldsymbol{\Sigma}\mathbf{1}_m=\mathbf{p}-\mathbf{p}=\mathbf{0}$, so $\mathrm{rank}(\boldsymbol{\Sigma})\leq m-1$. Thus:
$$\mathrm{rank}(\boldsymbol{\Sigma})=m-1.$$
Matrix $\boldsymbol{\Sigma}$ is a covariance matrix which, as can be seen in [30], is positive semi-definite. There is therefore an orthogonal matrix $\mathbf{A}$ and a diagonal matrix $\boldsymbol{\Lambda}$ whose principal elements are the eigenvalues $\lambda_1,\dots,\lambda_m$ of $\boldsymbol{\Sigma}$ such that:
$$\boldsymbol{\Sigma}=\mathbf{A}\boldsymbol{\Lambda}\mathbf{A}^{\top}.$$
Since $\mathrm{rank}(\boldsymbol{\Sigma})=m-1$, we may order its eigenvalues to have $\lambda_1\geq\cdots\geq\lambda_{m-1}>0$ and $\lambda_m=0$. With $\boldsymbol{\Lambda}^{+}$ the diagonal matrix with principal elements $\lambda_1^{-1},\dots,\lambda_{m-1}^{-1},0$ and:
$$\boldsymbol{\Sigma}^{+}=\mathbf{A}\boldsymbol{\Lambda}^{+}\mathbf{A}^{\top},$$
we will have:
$$\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{+}\boldsymbol{\Sigma}=\boldsymbol{\Sigma},\qquad\boldsymbol{\Sigma}^{+}\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{+}=\boldsymbol{\Sigma}^{+}.$$
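The following sketch (our own illustration, with hypothetical $\mathbf{p}$) verifies this spectral structure numerically: the rank is $m-1$, the all-ones vector spans the null space, and the Moore–Penrose inverse can be rebuilt from the positive eigenvalues:

```python
# A minimal sketch of the spectral structure of Sigma = D(p) - p p^T.
import numpy as np

p = np.array([0.2, 0.3, 0.5])   # hypothetical category probabilities
m = p.size
Sigma = np.diag(p) - np.outer(p, p)

lam, A = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
print(np.linalg.matrix_rank(Sigma))     # m - 1
print(Sigma @ np.ones(m))               # ~0: the all-ones vector is in N(Sigma)

# Moore-Penrose inverse from the m - 1 positive eigenvalues and eigenvectors
pos = lam > 1e-12
Sigma_plus = A[:, pos] @ np.diag(1.0 / lam[pos]) @ A[:, pos].T
print(np.allclose(Sigma_plus, np.linalg.pinv(Sigma)))   # True
```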
We now establish:
Lemma 3. If the matrices are such that , when , we have .
Proof. With and , there will be linearly independent column vectors of . The vectors in will be linearly independent, since, when , they are orthogonal. Thus, . Moreover, if we join another column vector of , say , to the set , it will linearly depend on the . Thus, the vectors in the extended set will not be linearly independent. Thus, . □
Consider now the Kronecker matrix product, indicated by $\otimes$, as can be seen in [31]. Thus, as can be seen in [30]:
and:
so:
as we wished to establish.
Let $\mathbf{Q}_1,\dots,\mathbf{Q}_w$ now be pairwise orthogonal orthogonal projection matrices (POOPM) with $\sum_{j=1}^{w}\mathbf{Q}_j=\mathbf{I}_m$. Now, $\boldsymbol{\Sigma}$ is $m\times m$ with rank $m-1$. Thus, its nullity space, $N(\boldsymbol{\Sigma})$, will have dimension 1 and, since $\boldsymbol{\Sigma}\mathbf{1}_m=\mathbf{0}$, $\frac{1}{m}\mathbf{J}_m$, with $\mathbf{J}_m=\mathbf{1}_m\mathbf{1}_m^{\top}$, will be the orthogonal projection matrix on $N(\boldsymbol{\Sigma})$. Since $\boldsymbol{\Sigma}$ is symmetrical, its range space $R(\boldsymbol{\Sigma})$ will be the orthogonal complement, $N(\boldsymbol{\Sigma})^{\perp}$, of $N(\boldsymbol{\Sigma})$. The orthogonal projection matrix on $R(\boldsymbol{\Sigma})$ will then be:
$$\mathbf{Q}=\mathbf{I}_m-\frac{1}{m}\mathbf{J}_m.$$
Thus, if
, we will have
as well as
and, according to Lemma 3:
Now:
and
. Thus, we must have:
We now highlight that $\boldsymbol{\Sigma}$ and $\boldsymbol{\Sigma}^{+}$ have the same eigenvectors associated with positive eigenvalues. These eigenvectors constitute an orthonormal basis for $R(\boldsymbol{\Sigma})$. Thus:
and, reasoning as above, we obtain:
Matrices $\mathbf{Q}_j$ naturally appear, as can be seen in [32], when there are factors that cross or groups of nested factors that cross. The sums of squares of the effects and interactions of these factors are associated with the $\mathbf{Q}_j$, while $\frac{1}{m}\mathbf{J}_m$ can be associated with the general mean.
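To make the POOPM structure concrete, the following sketch (our own illustration) builds the two simplest such matrices, the projection on the span of the all-ones vector (the general mean) and the projection on its orthogonal complement, and verifies the defining properties:

```python
# A minimal sketch of pairwise orthogonal orthogonal projection matrices (POOPM).
import numpy as np

m = 4
J = np.ones((m, m))
Q0 = J / m                   # orthogonal projection on span(1): the general mean
Q1 = np.eye(m) - Q0          # orthogonal projection on the orthogonal complement

print(np.allclose(Q0 @ Q0, Q0), np.allclose(Q1 @ Q1, Q1))   # idempotent
print(np.allclose(Q0 @ Q1, np.zeros((m, m))))               # pairwise orthogonal
print(np.allclose(Q0 + Q1, np.eye(m)))                      # they sum to I_m
```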