1. Introduction
In several fields of study, such as health, business, the social sciences and education, the outcomes of variables are mainly discrete, i.e., the variables take only finitely or countably many values. A discrete variable whose outcome takes only a finite number of values is called a categorical variable [1]. A categorical variable consists of a set of non-overlapping categories [2], and the outcome can be binary (dichotomous), i.e., with just two possible levels, such as “present” or “absent” for a desired condition, or polytomous, i.e., with more than two levels, as is the case for the “Likert” scale [3]. There are two common types of polytomous variables, as can be seen in [4]: the ordinal and the nominal scales of measurement. Categorical variables such as one’s eye colour, ethnicity and affiliations, whose categories cannot be ordered in any way, are nominal, while variables such as a patient’s level of resistance to a drug, level of education and economic status exhibit a natural order and are thus ordinal.
In a study in which all the observed variables are categorical, the most common way of representing the data is a contingency table, which is a cross-tabulation of the variables [5,6]. When there are $m$ variables, the contingency table is an $m$-dimensional table, also known as a multidimensional table when there are more than two attributes. The information in a contingency table is mainly summarized through appropriate measures, such as measures of association, or through models. Association measures, although easy to compute and interpret, lead to a great loss of information, as can be seen in [7]. Models are preferred when a more sensitive analysis is required. A model is a “theory” or a conceptual framework about observations, and the parameters in the model represent the “effects” that particular variables, or combinations of variables, have in determining the values taken by the observations.
The simplest and most common model for a contingency table is the log-linear model [8]. It is constructed by taking the natural logarithms of the cell probabilities, by analogy with the analysis of variance (ANOVA) models, as can be seen in [9,10,11]. Classical log-linear models are sometimes regarded within the framework of the generalized linear model (GLM). They are also important in connection with contingency matrices, as can be seen in [12]. Contemporary problems in categorical data analysis, involving extremely high-dimensional data and demanding computational procedures, require the development of complex models. Much work has been done on the modelling of categorical data, as can be seen in [7,13]. For example, in [14], the author used regression models for modelling categorical data. In our work, we derive new asymptotic results that enable us to obtain confidence ellipsoids and simultaneous confidence intervals, respectively, for the vector of probabilities and its components, allowing us to overcome some inference limitations of the existing procedures.
Inferential statistical analysis requires assumptions about the probability distribution of the response variable. For categorical data, the main distribution is the multinomial distribution. Most of the time, categorical data result from $n$ independent and identical trials, each trial having two or more possible outcomes. When the $n$ independent and identical trials have the same category probabilities, the distribution of the counts in the various categories is the multinomial distribution. The binomial distribution is the special case of the multinomial distribution with just two possible outcomes per trial. Usually, the parameters of the multinomial distribution are not known, and they are often estimated from the sample data by estimation methods such as maximum likelihood estimation (MLE), as can be seen, for instance, in [15], minimum discrimination information (MDI) [16], weighted least squares (WLS) [17] and Bayesian estimation (BA) [18]. In a previous study [19], we wanted to minimize the average cost, so we used statistical decision theory (SDT), since there was only a finite number of possible choices. We point out that we achieved consistency, since the probability of selecting the choice with the least average cost tends to 1 as the sample size tends to infinity.
If we have $n$ realizations of an experiment with $m$ possible results with probabilities $p_1,\dots,p_m$, we have the probability mass function, as can be seen in [20,21]:
$$\Pr(n_1,\dots,n_m)=\frac{n!}{n_1!\cdots n_m!}\prod_{i=1}^{m}p_i^{n_i},\qquad \sum_{i=1}^{m}n_i=n,$$
for the vector $(n_1,\dots,n_m)^{\top}$ of the times we obtain the different results. This probability mass function corresponds to the singular multinomial distribution. We name as multinomial the models describing these sets of independent realizations of experiments with a finite number of results.
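As a brief numerical illustration of this probability mass function, the following sketch (written by us in Python with NumPy and SciPy; the probabilities and counts are hypothetical, not taken from the paper) evaluates the multinomial pmf directly and through scipy.stats.multinomial:

```python
# A minimal sketch, assuming hypothetical values p = (0.2, 0.3, 0.5) and n = 10.
import numpy as np
from math import factorial
from scipy.stats import multinomial

p = np.array([0.2, 0.3, 0.5])  # category probabilities (sum to 1)
n = 10                         # number of independent trials
counts = np.array([2, 3, 5])   # an observed count vector (sums to n)

# Direct evaluation: n! / (n_1! ... n_m!) * prod_i p_i^{n_i}
coef = factorial(n) / np.prod([factorial(k) for k in counts])
print(coef * np.prod(p ** counts))

# Same value from SciPy's multinomial distribution
print(multinomial.pmf(counts, n=n, p=p))
```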
For the vector $\mathbf{p}=(p_1,\dots,p_m)^{\top}$ of probabilities, we have the vector of estimators:
$$\hat{\mathbf{p}}=(\hat{p}_1,\dots,\hat{p}_m)^{\top},$$
with:
$$\hat{p}_i=\frac{n_i}{n},\qquad i=1,\dots,m.$$
Moreover, as can be seen in [22], as $n\rightarrow\infty$:
$$\sqrt{n}\,(\hat{\mathbf{p}}-\mathbf{p})\sim N(\mathbf{0},\boldsymbol{\Sigma}),$$
where $\sim$ indicates the limit distribution, in this case $N(\mathbf{0},\boldsymbol{\Sigma})$, the normal distribution with the null mean vector and covariance matrix:
$$\boldsymbol{\Sigma}=D(\mathbf{p})-\mathbf{p}\mathbf{p}^{\top},$$
where $D(\mathbf{p})$ is the diagonal matrix with principal elements $p_1,\dots,p_m$. This result will play an important role in the asymptotic treatment of the multinomial models, which is this paper’s goal.
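This limit result can be checked by simulation. The following sketch (our own illustration, with hypothetical $\mathbf{p}$ and $n$) draws repeated multinomial samples and compares the empirical covariance of $\sqrt{n}\,(\hat{\mathbf{p}}-\mathbf{p})$ with $D(\mathbf{p})-\mathbf{p}\mathbf{p}^{\top}$:

```python
# A minimal simulation sketch checking sqrt(n)(p_hat - p) ~ N(0, D(p) - p p^T).
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.2, 0.3, 0.5])   # hypothetical category probabilities
n, reps = 10_000, 20_000        # trials per sample, number of replicates

counts = rng.multinomial(n, p, size=reps)   # reps independent count vectors
Z = np.sqrt(n) * (counts / n - p)           # standardized estimation errors

print(np.cov(Z.T))                          # empirical covariance of Z
print(np.diag(p) - np.outer(p, p))          # limit covariance D(p) - p p^T
```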
To carry out that asymptotic treatment, we start by obtaining a convenient version of the continuous mapping theorem [23] in the next section on limit distributions. Then, in Section 3, we obtain confidence regions, namely confidence ellipsoids and simultaneous confidence intervals. In Section 4, we study the algebraic structure of the limit covariance matrix, $\boldsymbol{\Sigma}$.
In Section 5, we obtain chi-square tests for hypotheses on the outcome probabilities, as well as confidence ellipsoids and simultaneous confidence intervals for them. We also consider log-linear models, for which we present a numerical application. We point out that our approach to these models overcomes the hierarchical restriction used to analyse multidimensional contingency tables.
Our use of both the classical and the new version of the parametrized continuous mapping theorem (PCMT) enables us to carry out statistical inference for multinomial models. This inference is similar to ANOVA and related techniques, but the F-tests are replaced by chi-square tests, which is highly convenient, since we now have an infinite number of degrees of freedom for the error.
Finally, we stress the close relationship between our ANOVA-like inference using chi-square tests and the usual treatment of fixed effects models. We point out that the F-tests in that treatment have interesting invariance properties that express the symmetry of those models, especially since those models are associated with orthogonal partitions into sub-spaces that are invariant under rotation.
2. Limit Distributions
Let $\mathcal{C}$ be the class of continuous functions. If $g\in\mathcal{C}$, and the distribution $F_n$ of $\mathbf{X}_n$ converges to the distribution $F$ of $\mathbf{X}$ (that is, $F_n(\mathbf{x})\rightarrow F(\mathbf{x})$ whenever $\mathbf{x}$ is a continuity point of $F$), we have, as can be seen in [23,24,25,26]:
$$g(\mathbf{X}_n)\stackrel{d}{\longrightarrow}g(\mathbf{X}),$$
as follows from the continuous mapping theorem.
If $\mathbf{X}_n$ is obtained by superposing the sub-vectors $\mathbf{Y}_n$ and $\mathbf{Z}_n$, we put $\mathbf{X}_n=(\mathbf{Y}_n^{\top},\mathbf{Z}_n^{\top})^{\top}$. Then, if $\mathbf{Y}_n\stackrel{d}{\longrightarrow}\mathbf{Y}$, with $\boldsymbol{\theta}$ belonging to a compact set $\Theta$, and $\mathbf{Z}_n\stackrel{p}{\longrightarrow}\boldsymbol{\theta}$, putting $\mathbf{X}=(\mathbf{Y}^{\top},\boldsymbol{\theta}^{\top})^{\top}$, to show that:
$$g(\mathbf{X}_n)\stackrel{d}{\longrightarrow}g(\mathbf{X}),\qquad g\in\mathcal{C},$$
it suffices to show that, as can be seen in [27]:
$$\mathbf{X}_n\stackrel{d}{\longrightarrow}\mathbf{X},$$
since the continuous mapping theorem then applies.
With $\mu_n$ and $\mu$ the probability measures associated with $\mathbf{X}_n$ and $\mathbf{X}$, and representing the Cartesian product by $\times$, whatever $\varepsilon>0$, there exists a parallelepiped:
$$R=\times_{i}\,[a_i,b_i],$$
with $\mu(\partial R)=0$, such that $\mu(R)>1-\varepsilon$. Since $\mu(\partial R)=0$, we have $\mu_n(R)\rightarrow\mu(R)$, and so there will be $n_0$ such that, for $n\geq n_0$:
$$\mu_n(R)>1-2\varepsilon.$$
Now, $R$, being closed and bounded, will also be compact. Thus, if $g\in\mathcal{C}$, its restriction to $R$ will be uniformly continuous. So, whatever $\delta>0$, there exists $\eta>0$ such that, if $\mathbf{x}$ and $\mathbf{x}^{\prime}$ belong to $R$ and $\|\mathbf{x}-\mathbf{x}^{\prime}\|<\eta$, then $\|g(\mathbf{x})-g(\mathbf{x}^{\prime})\|<\delta$, where $\|\cdot\|$ indicates the Euclidean norm of a vector.
Let $A_n$ and $B_n$ be the events that occur when $\mathbf{X}_n\in R$ and when $\|\mathbf{Z}_n-\boldsymbol{\theta}\|<\eta$, respectively. We now establish:
Lemma 1. $g(\mathbf{Y}_n,\mathbf{Z}_n)-g(\mathbf{Y}_n,\boldsymbol{\theta})\stackrel{p}{\longrightarrow}\mathbf{0}$.
Proof. Since the restriction of $g$ to $R$ is uniformly continuous:
Thus, we only have to point out that
is arbitrary and that:
so that:
where
, whatever
, to establish the thesis. □
Since:
we have:
thus, also whatever
, there exists
such that, for
, we have:
as well as:
whenever $\mathbf{x}$ is a continuity point of the limit distribution. Thus, we have the parametrized continuous mapping theorem (PCMT).
If $\mathbf{Y}_n\stackrel{d}{\longrightarrow}\mathbf{Y}$ and $\mathbf{Z}_n\stackrel{p}{\longrightarrow}\boldsymbol{\theta}$, with $\Theta$ a compact set:
$$g(\mathbf{Y}_n,\mathbf{Z}_n)\stackrel{d}{\longrightarrow}g(\mathbf{Y},\boldsymbol{\theta}),\qquad g\in\mathcal{C}.$$
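As a numerical illustration of this theorem, the following simulation sketch (our own, with hypothetical values) studentizes a binomial proportion: one component converges in distribution, the other converges in probability to a constant, and the continuous function $g(y,z)=y/z$ of the pair converges in distribution to a standard normal limit:

```python
# A minimal sketch of the PCMT/Slutsky-type result for g(y, z) = y / z.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100_000, 20_000
p1 = 0.3                              # hypothetical success probability

x = rng.binomial(n, p1, size=reps)
p_hat = x / n
Y = np.sqrt(n) * (p_hat - p1)         # Y_n -> N(0, p1 (1 - p1)) in distribution
Z = np.sqrt(p_hat * (1 - p_hat))      # Z_n -> sqrt(p1 (1 - p1)) in probability

T = Y / Z                             # g is continuous for z > 0
print(T.mean(), T.std())              # ~0 and ~1: standard normal limit
```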
Remark 1 (Corollary to PCMT ).
- a
With a parameter both for and ;
- b
when is and , we have:when with a matrix and the covariance matrix of the parameter, .
We will refer to this remark as the corollary of PCMT (CPCMT), as can be seen in [6]. We now consider the case of a sequence of random vectors with the same limit distribution. A sequence of random vectors is mean stable when all its vectors have the same mean vector.
Between the mean stable sequences, we may establish an equivalence relation, writing $\{\mathbf{X}_n\}\approx\{\mathbf{W}_n\}$ if and only if:
$$\mathbf{X}_n-\mathbf{W}_n\stackrel{p}{\longrightarrow}\mathbf{0},$$
where $\stackrel{p}{\longrightarrow}$ means stochastic convergence, i.e., convergence in probability, as can be seen in [27,28]. We now establish:
Proposition 1. If $\{\mathbf{X}_n\}\approx\{\mathbf{W}_n\}$, then $\mathbf{W}_n\stackrel{d}{\longrightarrow}\mathbf{X}$ whenever $\mathbf{X}_n\stackrel{d}{\longrightarrow}\mathbf{X}$.
Proof. Let the vectors in and have m components. Given , we consider the events and .
Taking , and ⇒ to indicate the implication, we have . Thus, with and , we have .
Moreover, since
, whatever
, there exist
such that, for
, so
, and, since
is arbitrary:
so:
since
, we have
and
, so:
and given
is arbitrary and
may be whatever, from
, we obtain:
which completes the proof. □
We then consider normal limit distributions, starting with:
Proposition 2. Let $g$ be such that its component functions have gradients, Hessian matrices, and continuous second-order partial derivatives. Whatever the mean stable sequence with the invariant mean vector $\boldsymbol{\mu}$, taking and with , we have , whenever converges in distribution to .
Proof. We have
with
between
and
. Since
we also have
. Then, with
the radius
sphere with centre
. Now
is a continuous function of
so it will have a maximum
in
that will exceed the supremum of the spectral radius of
in
, so:
thus:
and so:
and the thesis follows from Proposition 1. □
Corollary 3. If , under the hypothesis of Propositions 1 and 2, .
Proof. The thesis follows from Propositions 1 and 2, since the continuous mapping theorem, as can be seen in [23,24], implies that the limit distribution of
24], implies that the limit distribution of
is
. □
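Results of this type yield the familiar delta method: for a smooth transformation $g$, the limit distribution of $\sqrt{n}\,(g(\hat{\mathbf{p}})-g(\mathbf{p}))$ is normal with covariance matrix $\mathbf{G}\boldsymbol{\Sigma}\mathbf{G}^{\top}$, where $\mathbf{G}$ is the Jacobian matrix of $g$ at $\mathbf{p}$. The following simulation sketch (our own illustration, with hypothetical values and $g$ the componentwise logarithm) checks this numerically:

```python
# A minimal sketch of the delta-method consequence: sqrt(n)(g(p_hat) - g(p))
# has limit covariance G Sigma G^T, with G the Jacobian of g at p.
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.2, 0.3, 0.5])   # hypothetical category probabilities
n, reps = 100_000, 20_000

counts = rng.multinomial(n, p, size=reps)
p_hat = counts / n

# g = componentwise log, so the Jacobian at p is G = diag(1 / p)
Z = np.sqrt(n) * (np.log(p_hat) - np.log(p))
G = np.diag(1.0 / p)
Sigma = np.diag(p) - np.outer(p, p)

print(np.cov(Z.T))       # empirical covariance of the transformed statistic
print(G @ Sigma @ G.T)   # covariance predicted by the delta method
```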
3. Confidence Ellipsoids
We start by establishing the following.
Proposition 4. If $\mathbf{X}$ (not necessarily normal) has a covariance matrix $\boldsymbol{\Sigma}$, with a range space $R(\boldsymbol{\Sigma})$, and the mean vector $\boldsymbol{\mu}$, then:
$$\Pr\!\big(\mathbf{X}-\boldsymbol{\mu}\in R(\boldsymbol{\Sigma})\big)=1.$$
Proof. Let $\boldsymbol{\alpha}_1,\dots,\boldsymbol{\alpha}_k$ constitute an orthonormal basis for the orthogonal complement, $R(\boldsymbol{\Sigma})^{\perp}$, of $R(\boldsymbol{\Sigma})$. Then, each $\boldsymbol{\alpha}_i^{\top}(\mathbf{X}-\boldsymbol{\mu})$ will have a null mean value and variance. Thus, according to the Bienaymé–Tchebycheff inequality:
$$\Pr\!\big(\boldsymbol{\alpha}_i^{\top}(\mathbf{X}-\boldsymbol{\mu})=0\big)=1,\qquad i=1,\dots,k.$$
Therefore, we obtain, with $\mathbf{A}$ the matrix with row vectors $\boldsymbol{\alpha}_i^{\top}$:
$$\Pr\!\big(\mathbf{A}(\mathbf{X}-\boldsymbol{\mu})=\mathbf{0}\big)=1,$$
as follows from the Boole generalized inequalities, and so the thesis is established. □
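To see this degeneracy concretely, the following sketch (our own illustration) samples from a normal distribution with the singular covariance matrix $\boldsymbol{\Sigma}=D(\mathbf{p})-\mathbf{p}\mathbf{p}^{\top}$ and checks that the component of $\mathbf{X}-\boldsymbol{\mu}$ along the null space of $\boldsymbol{\Sigma}$ vanishes:

```python
# A minimal sketch of Proposition 4 for the singular covariance matrix
# Sigma = D(p) - p p^T, whose null space is spanned by the all-ones vector.
import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.2, 0.3, 0.5])   # hypothetical probabilities; also the mean here
Sigma = np.diag(p) - np.outer(p, p)

X = rng.multivariate_normal(mean=p, cov=Sigma, size=5)
u = np.ones(3) / np.sqrt(3.0)   # unit vector spanning the null space of Sigma
print((X - p) @ u)              # ~0 for every draw: X - mu stays in R(Sigma)
```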
We then have:
Lemma 2. Given $\mathbf{B}$, a positive semi-definite [definite] matrix with positive eigenvalues $\lambda_1,\dots,\lambda_h$ corresponding to the eigenvectors $\boldsymbol{\alpha}_1,\dots,\boldsymbol{\alpha}_h$, we have $\mathbf{B}=\mathbf{A}D(\lambda_1,\dots,\lambda_h)\mathbf{A}^{\top}$ and, with $+$ indicating the Moore–Penrose inverse, $\mathbf{B}^{+}=\mathbf{A}D(\lambda_1^{-1},\dots,\lambda_h^{-1})\mathbf{A}^{\top}$, with $D(\lambda_1^{-1},\dots,\lambda_h^{-1})$ the diagonal matrix with principal elements $\lambda_1^{-1},\dots,\lambda_h^{-1}$ and $\mathbf{A}=[\boldsymbol{\alpha}_1\,\cdots\,\boldsymbol{\alpha}_h]$.
Proof. It is easy to show that $\mathbf{B}\mathbf{B}^{+}$ and $\mathbf{B}^{+}\mathbf{B}$ are symmetrical and that $\mathbf{B}\mathbf{B}^{+}\mathbf{B}=\mathbf{B}$ and $\mathbf{B}^{+}\mathbf{B}\mathbf{B}^{+}=\mathbf{B}^{+}$, which establishes the thesis. □
We can now establish the following.
Proposition 5. If $\mathbf{X}\sim N(\boldsymbol{\mu},\mathbf{B})$, with $\mathbf{B}^{+}$ the Moore–Penrose inverse of $\mathbf{B}$:
$$(\mathbf{X}-\boldsymbol{\mu})^{\top}\mathbf{B}^{+}(\mathbf{X}-\boldsymbol{\mu})\sim\chi^{2}_{h},$$
where $\chi^{2}_{h}$ is a central chi-square distribution with $h=\mathrm{rank}(\mathbf{B})$ degrees of freedom and $\lambda_i$ is the variance of $\boldsymbol{\alpha}_i^{\top}\mathbf{X}$, $i=1,\dots,h$.
Proof. As stated in Lemma 2, we have:
$$\mathbf{B}^{+}=\mathbf{A}D(\lambda_1^{-1},\dots,\lambda_h^{-1})\mathbf{A}^{\top},$$
with $\mathbf{A}=[\boldsymbol{\alpha}_1\,\cdots\,\boldsymbol{\alpha}_h]$, where $D(\lambda_1^{-1},\dots,\lambda_h^{-1})$ is the diagonal matrix with principal elements $\lambda_1^{-1},\dots,\lambda_h^{-1}$. We now only have to point out that $D(\lambda_1^{-1/2},\dots,\lambda_h^{-1/2})\mathbf{A}^{\top}(\mathbf{X}-\boldsymbol{\mu})\sim N(\mathbf{0},\mathbf{I}_h)$ to establish the thesis, where $\mathbf{I}_h$ is the identity matrix. □
We now consider confidence ellipsoids and simultaneous confidence intervals. Ellipsoids and their support planes are presented in [29]; a point $\mathbf{x}$ belongs to the ellipsoid:
$$(\mathbf{x}-\boldsymbol{\mu})^{\top}\mathbf{B}^{+}(\mathbf{x}-\boldsymbol{\mu})\leq c$$
if and only if:
$$\mathbf{a}^{\top}\mathbf{x}\leq\mathbf{a}^{\top}\boldsymbol{\mu}+\sqrt{c\,\mathbf{a}^{\top}\mathbf{B}\mathbf{a}},\qquad\forall\,\mathbf{a},$$
where $\forall\,\mathbf{a}$ indicates that all possible vectors $\mathbf{a}$ are considered. We now establish:
Proposition 6. If with , the (1-q)-th quantile of (the central chi-square with h degrees of freedom), when . Proof. The proof for the case
directly follows from the previous considerations. Thus, we only have to point out that
is equivalent to, as can be seen in [29]:
□
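Putting these pieces together for the multinomial model, the quadratic form $n(\hat{\mathbf{p}}-\mathbf{p})^{\top}\boldsymbol{\Sigma}^{+}(\hat{\mathbf{p}}-\mathbf{p})$ is asymptotically chi-square with $m-1$ degrees of freedom, which is what makes the confidence ellipsoid work. The following simulation sketch (our own illustration, with hypothetical values) checks the coverage of the resulting region:

```python
# A minimal sketch of the chi-square pivot behind the confidence ellipsoid:
# n (p_hat - p)^T Sigma^+ (p_hat - p) is asymptotically chi-square(m - 1).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
p = np.array([0.2, 0.3, 0.5])   # hypothetical category probabilities
m, n, reps = p.size, 10_000, 20_000

counts = rng.multinomial(n, p, size=reps)
d = counts / n - p

Sigma = np.diag(p) - np.outer(p, p)
Sigma_plus = np.linalg.pinv(Sigma)            # Moore-Penrose inverse

stats = n * np.einsum('ri,ij,rj->r', d, Sigma_plus, d)
q = chi2.ppf(0.95, df=m - 1)                  # 0.95-th quantile of chi-square

print((stats <= q).mean())                    # empirical coverage, ~0.95
```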
Since, as we saw when , we have , because, as we shall see in the next section, .
In the next section, we will obtain results on $\boldsymbol{\Sigma}$ that will be used to obtain chi-square confidence regions for $\mathbf{p}$ and, through duality, to test hypotheses on $\mathbf{p}$.
4. Covariance Matrices
As we saw, for $n\rightarrow\infty$, the limit covariance matrix of $\sqrt{n}\,(\hat{\mathbf{p}}-\mathbf{p})$ is:
$$\boldsymbol{\Sigma}=D(\mathbf{p})-\mathbf{p}\mathbf{p}^{\top},$$
where $D(\mathbf{p})$ is the diagonal matrix with principal elements $p_1,\dots,p_m$ and $\mathbf{p}=(p_1,\dots,p_m)^{\top}$. For the rank of the covariance matrix, we have:
$$\mathrm{rank}(\boldsymbol{\Sigma})\geq m-1,$$
since $\mathrm{rank}(D(\mathbf{p}))=m$ and $\mathrm{rank}(\mathbf{p}\mathbf{p}^{\top})=1$, as follows, as can be seen in [30], page 46, from the fact that the rank of a sum of matrices is at least the difference of their ranks. In addition to this, $\boldsymbol{\Sigma}\mathbf{1}_m=\mathbf{p}-\mathbf{p}=\mathbf{0}$, so $\mathrm{rank}(\boldsymbol{\Sigma})\leq m-1$. Thus:
$$\mathrm{rank}(\boldsymbol{\Sigma})=m-1.$$
Matrix $\boldsymbol{\Sigma}$ is a covariance matrix which, as can be seen in [30], is positive semi-definite. There is therefore an orthogonal matrix $\mathbf{A}$ and a diagonal matrix $\boldsymbol{\Lambda}$ whose principal elements are the eigenvalues $\lambda_1,\dots,\lambda_m$ of $\boldsymbol{\Sigma}$ such that:
$$\boldsymbol{\Sigma}=\mathbf{A}\boldsymbol{\Lambda}\mathbf{A}^{\top}.$$
Since $\mathrm{rank}(\boldsymbol{\Sigma})=m-1$, we may order its eigenvalues to have $\lambda_1\geq\cdots\geq\lambda_{m-1}>0$ and $\lambda_m=0$. With $\boldsymbol{\Lambda}^{+}$ the diagonal matrix with principal elements $\lambda_1^{-1},\dots,\lambda_{m-1}^{-1},0$ and:
$$\boldsymbol{\Sigma}^{+}=\mathbf{A}\boldsymbol{\Lambda}^{+}\mathbf{A}^{\top},$$
we will have:
$$\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{+}\boldsymbol{\Sigma}=\boldsymbol{\Sigma},\qquad\boldsymbol{\Sigma}^{+}\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{+}=\boldsymbol{\Sigma}^{+}.$$
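The following sketch (our own illustration, with hypothetical $\mathbf{p}$) verifies this spectral structure numerically: the rank is $m-1$, the all-ones vector spans the null space, and the Moore–Penrose inverse can be rebuilt from the positive eigenvalues:

```python
# A minimal sketch of the spectral structure of Sigma = D(p) - p p^T.
import numpy as np

p = np.array([0.2, 0.3, 0.5])   # hypothetical category probabilities
m = p.size
Sigma = np.diag(p) - np.outer(p, p)

lam, A = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
print(np.linalg.matrix_rank(Sigma))     # m - 1
print(Sigma @ np.ones(m))               # ~0: the all-ones vector is in N(Sigma)

# Moore-Penrose inverse from the m - 1 positive eigenvalues and eigenvectors
pos = lam > 1e-12
Sigma_plus = A[:, pos] @ np.diag(1.0 / lam[pos]) @ A[:, pos].T
print(np.allclose(Sigma_plus, np.linalg.pinv(Sigma)))   # True
```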
We now establish:
Lemma 3. If the matrices are such that , when , we have .
Proof. With and , there will be linearly independent column vectors of . The vectors in will be linearly independent, since, when , they are orthogonal. Thus, . Moreover, if we join another column vector of , say , to the set , it will linearly depend on the . Thus, the vectors in the extended set will not be linearly independent. Thus, . □
Consider now the Kronecker matrix product, indicated by $\otimes$, as can be seen in [31]. Thus, as can be seen in [30]:
and:
so:
as we wished to establish.
Let $\mathbf{Q}_1,\dots,\mathbf{Q}_w$ now be pairwise orthogonal orthogonal projection matrices (POOPM) with $\sum_{j=1}^{w}\mathbf{Q}_j=\mathbf{I}_m$. Now, $\boldsymbol{\Sigma}$ is $m\times m$ with rank $m-1$. Thus, its nullity space, $N(\boldsymbol{\Sigma})$, will have dimension 1 and, since $\boldsymbol{\Sigma}\mathbf{1}_m=\mathbf{0}$, $\frac{1}{m}\mathbf{J}_m$, with $\mathbf{J}_m=\mathbf{1}_m\mathbf{1}_m^{\top}$, will be the orthogonal projection matrix on $N(\boldsymbol{\Sigma})$. Since $\boldsymbol{\Sigma}$ is symmetrical, its range space $R(\boldsymbol{\Sigma})$ will be the orthogonal complement, $N(\boldsymbol{\Sigma})^{\perp}$, of $N(\boldsymbol{\Sigma})$. The orthogonal projection matrix on $R(\boldsymbol{\Sigma})$ will then be:
$$\mathbf{Q}=\mathbf{I}_m-\frac{1}{m}\mathbf{J}_m.$$
Thus, if
, we will have
as well as
and, according to Lemma 3:
Now:
and
. Thus, we must have:
We now highlight that $\boldsymbol{\Sigma}$ and $\boldsymbol{\Sigma}^{+}$ have the same eigenvectors associated with positive eigenvalues. These eigenvectors constitute an orthonormal basis for $R(\boldsymbol{\Sigma})$. Thus:
and, reasoning as above, we obtain:
Matrices $\mathbf{Q}_j$ naturally appear, as can be seen in [32], when there are factors that cross or groups of nested factors that cross. The sums of squares of the effects and interactions of these factors are associated with the $\mathbf{Q}_j$, while $\frac{1}{m}\mathbf{J}_m$ can be associated with the general mean.
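To make the POOPM structure concrete, the following sketch (our own illustration) builds the two simplest such matrices, the projection on the span of the all-ones vector (the general mean) and the projection on its orthogonal complement, and verifies the defining properties:

```python
# A minimal sketch of pairwise orthogonal orthogonal projection matrices (POOPM).
import numpy as np

m = 4
J = np.ones((m, m))
Q0 = J / m                   # orthogonal projection on span(1): the general mean
Q1 = np.eye(m) - Q0          # orthogonal projection on the orthogonal complement

print(np.allclose(Q0 @ Q0, Q0), np.allclose(Q1 @ Q1, Q1))   # idempotent
print(np.allclose(Q0 @ Q1, np.zeros((m, m))))               # pairwise orthogonal
print(np.allclose(Q0 + Q1, np.eye(m)))                      # they sum to I_m
```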