1. Introduction
Classical Statistical Inference (cSI) centers on minimizing the probabilities of errors, whereas in Statistical Decision Theory (SDT) the goal is to minimize the decision costs; see for instance [1,2]. Whether we minimize the probability of errors in cSI or the decision costs in SDT, a desired property is for the procedure to be consistent; that is, we would like the relevant sample statistics to converge to the corresponding population values as the sample size grows, see [3,4]. To the best of our knowledge, the consistency of decision costs in SDT has received far less attention than that of error probabilities in cSI.
The multinomial distribution is often used in modeling categorical data because it describes the probability of a random observation being assigned to one of several mutually exclusive categories. Thus, having $n$ independent realizations of an experiment with a finite or numerable set of incompatible results with probabilities $p_1, p_2, \ldots$, the numbers $n_1, n_2, \ldots$ of times we obtain the $i$th result, $i = 1, 2, \ldots$, follow the multinomial distribution, denoted $M(n, \mathbf{p})$, with parameters $n$ and $\mathbf{p} = (p_1, p_2, \ldots)^\top$. The probability function of this distribution is
$$
\Pr(N_1 = n_1, N_2 = n_2, \ldots) = \frac{n!}{\prod_{i} n_i!} \prod_{i} p_i^{n_i},
$$
where $n_i \geq 0$, $\sum_{i} n_i = n$ and $\sum_{i} p_i = 1$, see [5,6]. To avoid repetitions, we give the expression for the numerable case, since the particularization to the finite set is direct.
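As a small numerical illustration (not part of the formal development), the probability function above can be computed directly; the counts and probabilities below are arbitrary assumptions for the sketch:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Probability of observing counts = (n_1, n_2, ...) in n = sum(counts)
    independent trials with category probabilities probs = (p_1, p_2, ...)."""
    n = sum(counts)
    coef = factorial(n)           # n! / (n_1! n_2! ...)
    for c in counts:
        coef //= factorial(c)
    prob = 1.0
    for c, q in zip(counts, probs):
        prob *= q ** c            # product of p_i ** n_i
    return coef * prob

# n = 3 trials, three categories: 3! * 0.5 * 0.3 * 0.2 = 0.18
print(multinomial_pmf((1, 1, 1), (0.5, 0.3, 0.2)))
```

The same computation agrees with `scipy.stats.multinomial.pmf` when SciPy is available.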
In what follows, we take a double approach to the treatment of multinomial models, covering both the classical and the decision theory approaches. In our classical approach, we show that $\hat{p}_{n,i} = n_i/n$, with $i = 1, 2, \ldots$, is a consistent estimator of $p_i$. This result will play a central part in our paper. We point out that we use a finite sample to obtain a numerable family of jointly consistent estimators. We thus have consistent results in the fold of classical statistical inference.
Now, considering a decision problem, let there be a family $D$ of possible decisions. For each of these decisions, we have a cost that depends on the results of the experiment. These results will have probabilities $p_1, p_2, \ldots$. We thus have, for the $i$th result, the costs $c_i(d)$, $i = 1, 2, \ldots$, $d \in D$. The average cost for decision $d$ will be
$$
c(d) = \sum_{i} p_i \, c_i(d).
$$
Assuming the $c_i(d)$, $i = 1, 2, \ldots$, are known, we can use the estimators $\hat{p}_{n,i} = n_i/n$, where $n_i$ is the number of times that, in $n$ independent realizations of the experiment, we get the $i$th result with cost $c_i(d)$. We then have the estimators for the costs
$$
\hat{c}_n(d) = \sum_{i} \hat{p}_{n,i} \, c_i(d).
$$
We will show that the $\hat{p}_{n,i}$, $i = 1, 2, \ldots$, are jointly consistent even when we have a numerable set of possible results; thus the $\hat{c}_n(d)$, $d \in D$, will also be consistent.

If there is a decision $d^{*}$ with least average cost and $\tilde{d}_n$ is the one with the least estimated average cost when there are $n$ realizations of the experiment, we will show that
$$
\lim_{n \to \infty} \Pr\big(\tilde{d}_n = d^{*}\big) = 1,
$$
so that, see [7], we will have consistency in decision taking in the setup of experiments with a finite or numerable set of incompatible results.
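This setup can be sketched numerically: simulate $n$ realizations, form the estimators $\hat{p}_{n,i} = n_i/n$, and pick the decision with least estimated average cost. The probabilities, costs, and sample size below are illustrative assumptions, not values from the paper:

```python
import random
from collections import Counter

random.seed(1)

# Three results, two decisions (all numbers are assumptions for the sketch).
p = [0.5, 0.3, 0.2]                      # true result probabilities
costs = {"d1": [1.0, 4.0, 9.0],          # c_i(d1)
         "d2": [2.0, 2.0, 5.0]}          # c_i(d2)

# True average costs: c(d1) = 3.5, c(d2) = 2.6, so d2 is the optimal decision.
true_avg = {d: sum(pi * ci for pi, ci in zip(p, c)) for d, c in costs.items()}

n = 100_000
counts = Counter(random.choices(range(len(p)), weights=p, k=n))
p_hat = [counts[i] / n for i in range(len(p))]

# Estimated average costs and the decision that minimizes them.
est_avg = {d: sum(ph * ci for ph, ci in zip(p_hat, c)) for d, c in costs.items()}
best_est = min(est_avg, key=est_avg.get)
print(best_est)
```

With $n$ this large, the decision with least estimated cost agrees with the true optimum with overwhelming probability, which is the consistency property the paper establishes.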
In the subsequent sections, we consider the multinomial model and its estimators in Section 2, where, by limit distributions, we show that the estimators of the multinomial model are consistent. In Section 3, we develop the cost function for multinomial models through statistical decision theory and again show that this function has the property of consistency. A further extension of the cost function is presented in Section 4.
2. Multinomial Models and Estimators
In this section, we obtain and consider estimators for the multinomial model. By limit distributions, see [8], we show that the estimators are consistent.
In $\ell^{1}$, the space of vectors $\mathbf{x} = (x_1, x_2, \ldots)^\top$ with numerable sets of components such that $\sum_{i=1}^{\infty} |x_i| < +\infty$, we can consider
$$
\|\mathbf{x}\| = \sum_{i=1}^{\infty} |x_i|
$$
as a norm, see [9]. The sub-space $S$ of $\ell^{1}$ constituted by the vectors $\mathbf{p}$ with non-negative components that add up to 1 will be bounded, since $\|\mathbf{p}\| = 1$. Given $\mathbf{p}_m \in S$, $m = 1, 2, \ldots$, we have $\|\mathbf{p}_m\| = 1$, and if $\mathbf{p}_m \longrightarrow \mathbf{p}$, we have $\mathbf{p} \in S$, since the components of $\mathbf{p}$ will be non-negative and add up to 1. Thus, $S$ is bounded and closed.
Let us put $\mathbf{p}(k) = (p_1, \ldots, p_k)^\top$ as well as
$$
\dot{p}(k) = \sum_{i=k+1}^{\infty} p_i ,
$$
in order to get
$$
\|\mathbf{p}\| = \|\mathbf{p}(k)\| + \dot{p}(k) = 1 .
$$
Besides this, we have the vector $\hat{\mathbf{p}}_n$ whose components are the estimators $\hat{p}_{n,i} = n_i/n$, $i = 1, 2, \ldots$. Let us put $\hat{\mathbf{p}}_n(k) = (\hat{p}_{n,1}, \ldots, \hat{p}_{n,k})^\top$ as well as
$$
\dot{\hat{p}}_n(k) = \sum_{i=k+1}^{\infty} \hat{p}_{n,i} ,
$$
so that
$$
\|\hat{\mathbf{p}}_n\| = \|\hat{\mathbf{p}}_n(k)\| + \dot{\hat{p}}_n(k) = 1 .
$$
With $\dot{p}(k) \longrightarrow 0$ for $k \longrightarrow \infty$, we get
$$
\|\mathbf{p} - \mathbf{p}(k)\| \longrightarrow 0 ,
$$
since $\|\mathbf{p} - \mathbf{p}(k)\| = \dot{p}(k)$ when we complete $\mathbf{p}(k)$ with null components. We also get
$$
\|\hat{\mathbf{p}}_n - \hat{\mathbf{p}}_n(k)\| = \dot{\hat{p}}_n(k) .
$$
Now, by representing stochastic convergence by $\stackrel{P}{\longrightarrow}$, we establish Proposition 1.

Proposition 1. The estimator $\hat{\mathbf{p}}_n$ is a consistent estimator for $\mathbf{p}$, since $\hat{\mathbf{p}}_n \stackrel{P}{\longrightarrow} \mathbf{p}$, in accordance with the Weak Law of Large Numbers, see [10,11].

Proof. Taking $\mathbf{p}(k)$ and $\hat{\mathbf{p}}_n(k)$ as above, so that for every $k$, we have
$$
\|\hat{\mathbf{p}}_n - \mathbf{p}\| \leq \|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\| + \dot{\hat{p}}_n(k) + \dot{p}(k),
$$
as well as
$$
\dot{\hat{p}}_n(k) \leq \dot{p}(k) + \|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\|,
$$
since
$$
\dot{\hat{p}}_n(k) - \dot{p}(k) = \|\mathbf{p}(k)\| - \|\hat{\mathbf{p}}_n(k)\| \leq \|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\|.
$$
Now
$$
\hat{\mathbf{p}}_n(k) - \mathbf{p}(k) \stackrel{P}{\longrightarrow} \mathbf{0},
$$
since, see [7,8,12],
$$
\sqrt{n}\,\big(\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\big) \stackrel{d}{\longrightarrow} N\big(\mathbf{0}, \boldsymbol{\Sigma}(k)\big),
$$
where $\stackrel{d}{\longrightarrow}$ indicates convergence in distribution, in this case to the normal distribution with null mean vector and covariance matrix
$$
\boldsymbol{\Sigma}(k) = D\big(\mathbf{p}(k)\big) - \mathbf{p}(k)\,\mathbf{p}(k)^\top,
$$
where $D(\mathbf{p}(k))$ is the diagonal matrix whose principal elements are the components of $\mathbf{p}(k)$. Thus
$$
\|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\| \stackrel{P}{\longrightarrow} 0,
$$
and so
$$
\|\hat{\mathbf{p}}_n - \mathbf{p}\| \leq 2\,\|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\| + 2\,\dot{p}(k),
$$
as well as, for every $\varepsilon > 0$,
$$
\Pr\big(\|\hat{\mathbf{p}}_n - \mathbf{p}\| > \varepsilon\big) \leq \Pr\big(2\,\|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\| > \varepsilon - 2\,\dot{p}(k)\big),
$$
whenever $2\,\dot{p}(k) < \varepsilon$.

Now, we also have
$$
\Pr\big(2\,\|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\| > \varepsilon - 2\,\dot{p}(k)\big) \longrightarrow 0;
$$
if we first choose $k$ such that $2\,\dot{p}(k) < \varepsilon$ and then let $n \longrightarrow \infty$, then
$$
\Pr\big(\|\hat{\mathbf{p}}_n - \mathbf{p}\| > \varepsilon\big) \longrightarrow 0,
$$
which establishes the thesis. □
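As a small simulation of this joint consistency for a numerable model (the geometric probabilities $p_i = 2^{-i}$ and the truncation level are assumptions for the sketch), the $\ell^{1}$ distance between the estimated and true probability vectors shrinks as $n$ grows:

```python
import random
from collections import Counter

random.seed(7)

def sample(n):
    """Draw n results from an assumed numerable model with p_i = 2**-i."""
    out = []
    for _ in range(n):
        i = 1
        while random.random() >= 0.5:  # each "failure" moves to the next result
            i += 1
        out.append(i)
    return out

def l1_distance(counts, n, k_max=60):
    """l1 distance between estimated and true probabilities, truncated at
    k_max; the tail beyond k_max is negligible (below 1e-18) here."""
    return sum(abs(counts[i] / n - 0.5 ** i) for i in range(1, k_max + 1))

dists = {}
for n in (100, 100_000):
    counts = Counter(sample(n))
    dists[n] = l1_distance(counts, n)
print(dists)  # the l1 distance decreases markedly from n=100 to n=100000
```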
Corollary 2. If $g$ is a continuous function of $\mathbf{p}$, we have $g(\hat{\mathbf{p}}_n) \stackrel{P}{\longrightarrow} g(\mathbf{p})$. The thesis follows from Proposition 1 and the Slutsky theorem, see [13,14].

Corollary 3. With $\mathbf{i}(h, \mathbf{p})$ the vector of the indexes of the $h$ largest components of $\mathbf{p}$, we have, see [10],
$$
\mathbf{i}(h, \hat{\mathbf{p}}_n) \stackrel{P}{\longrightarrow} \mathbf{i}(h, \mathbf{p}),
$$
if the $h$ largest components of $\mathbf{p}$ are distinct.

Proof. When the $h$ largest components of $\mathbf{p}$ are distinct, $\mathbf{p}$ will be a continuity point of $\mathbf{i}(h, \cdot)$, and the thesis follows from Corollary 2. □
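Corollary 3 can be illustrated numerically: with distinct leading components and a large sample, the index vector of the $h$ largest estimated components matches that of the true vector. The probability vector and $h$ below are illustrative assumptions:

```python
import random
from collections import Counter

random.seed(3)

def top_indexes(p, h):
    """Vector of the indexes of the h largest components of p
    (0-based; ties broken by smaller index)."""
    return sorted(range(len(p)), key=lambda i: (-p[i], i))[:h]

p = [0.05, 0.40, 0.25, 0.20, 0.10]   # distinct leading components
n = 50_000
counts = Counter(random.choices(range(len(p)), weights=p, k=n))
p_hat = [counts[i] / n for i in range(len(p))]

print(top_indexes(p, 3), top_indexes(p_hat, 3))
```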
The results in this section belong to the study of consistency in classical Statistical Inference (cSI). These classical inferences are, in most instances, made without regard to the use to which they will be put. In the next section, we turn to consistency for Statistical Decision Theory.
3. Cost Function for Multinomial Models
In Statistical Decision Theory (SDT), unlike in cSI, the goal is to incorporate more than just sample data in order to arrive at the optimal decision. The knowledge of the possible consequences of a decision is also incorporated, and this knowledge is quantified as the cost incurred for each possible decision that is taken. According to [15], Abraham Wald was the first person to thoroughly examine the inclusion of a cost function in statistical analysis.

The cost function represents the costs associated with taking a particular decision. It is a function that maps every possible decision and outcome to a real-valued cost. The cost function is used to evaluate the performance of various decision rules in terms of their expected cost. The goal of statistical decision theory is to identify the decision rule that minimizes the expected cost, see [16].
Now, we go back to the decision problem we presented in the Introduction and consider the cost function for multinomial models.
Let $c_n(d)$ be the cost for decision $d$, where $\mathbf{p}$ is the vector of probabilities and $\hat{\mathbf{p}}_n$ is the vector of estimated probabilities from the $n$ results. We will assume that this cost is the sum of two components, both non-negative: $c_1(d, \mathbf{p})$, which, in a given decision $d$, depends only on $\mathbf{p}$, the vector of probabilities, and $c_2(d, \hat{\mathbf{p}}_n - \mathbf{p})$, which depends on the estimation errors. Namely, we take
$$
c_n(d) = c_1(d, \mathbf{p}) + c_2(d, \hat{\mathbf{p}}_n - \mathbf{p}), \qquad (32)
$$
with $c_2(d, \mathbf{0}) = 0$ and $c_2(d, \cdot)$ continuous, so
$$
c_2(d, \hat{\mathbf{p}}_n - \mathbf{p}) \stackrel{P}{\longrightarrow} 0,
$$
since, as we saw,
$$
\hat{\mathbf{p}}_n \stackrel{P}{\longrightarrow} \mathbf{p},
$$
we will have
$$
c_n(d) \stackrel{P}{\longrightarrow} c_1(d, \mathbf{p}).
$$
Thus, for every $d$, the limit cost will be $c_1(d, \mathbf{p})$. It is now easy to see that if there is $d^{*}$ such that
$$
c_1(d^{*}, \mathbf{p}) < c_1(d, \mathbf{p}), \quad d \neq d^{*}, \qquad (33)
$$
with $\tilde{d}_n$ as the decision with the least estimated cost, we have
$$
\lim_{n \to \infty} \Pr\big(\tilde{d}_n = d^{*}\big) = 1.
$$
Proposition 4. We have consistency for the cost function given by Equation (32) whenever Equation (33) holds.

If the $h$ largest components of $\mathbf{p}$ are of interest, as an alternative to Equation (32), we may take
$$
c_n(d) = c_1(d, \mathbf{p}) + c_2\big(d, \mathbf{i}(h, \hat{\mathbf{p}}_n) - \mathbf{i}(h, \mathbf{p})\big),
$$
reobtaining Proposition 4, since
$$
\mathbf{i}(h, \hat{\mathbf{p}}_n) \stackrel{P}{\longrightarrow} \mathbf{i}(h, \mathbf{p}),
$$
and so we continue to have
$$
c_2\big(d, \mathbf{i}(h, \hat{\mathbf{p}}_n) - \mathbf{i}(h, \mathbf{p})\big) \stackrel{P}{\longrightarrow} 0.
$$
We may also take
$$
c_2(d, \mathbf{z}) = \mathbf{z}^\top \mathbf{M}\,\mathbf{z},
$$
with $\mathbf{M}$ a positive definite matrix, see [8,9], or
$$
c_2(d, \mathbf{z}) = \|\mathbf{z}\|.
$$
Thus, there is a wide range of possible cost functions. Namely, with $g$ a continuous function, $g(\hat{\mathbf{p}}_n)$ will, according to the Slutsky theorem, be a consistent estimator of $g(\mathbf{p})$.
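A cost of the form $c_1 + c_2$ with a quadratic error penalty $c_2(d, \mathbf{z}) = \mathbf{z}^\top \mathbf{M}\,\mathbf{z}$ can be sketched as follows; the probabilities, the matrix $\mathbf{M}$, and the base cost are all assumptions for the illustration:

```python
import random
from collections import Counter

random.seed(11)

p = [0.5, 0.3, 0.2]
M = [[2.0, 0.0, 0.0],   # an assumed diagonal positive definite matrix
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

def c2(z):
    """Quadratic penalty z^T M z on the estimation errors."""
    return sum(z[i] * M[i][j] * z[j] for i in range(3) for j in range(3))

def cost(c1_value, p_hat):
    """c1 is taken here as a fixed base cost of the decision (an assumption)."""
    z = [ph - pi for ph, pi in zip(p_hat, p)]
    return c1_value + c2(z)

for n in (100, 100_000):
    counts = Counter(random.choices(range(3), weights=p, k=n))
    p_hat = [counts[i] / n for i in range(3)]
    print(n, cost(2.6, p_hat))  # the penalty term vanishes as n grows
```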
Moreover, if we have a cost function $c(d, g(\mathbf{p}))$, where $c(d, \cdot)$ is the cost that depends on $g(\mathbf{p})$ and is continuous, and such that $g(\cdot)$ is continuous, we can use the Slutsky theorem again to get
$$
c\big(d, g(\hat{\mathbf{p}}_n)\big) \stackrel{P}{\longrightarrow} c\big(d, g(\mathbf{p})\big).
$$
We thus extended our previous results on $\mathbf{p}$ to any parameter given by a continuous function of $\mathbf{p}$, such as the cost function.
4. Extension on Cost Functions
In this section, we develop an extension of the previous Section 3 to incorporate a more general form of our results on consistency for the cost function for multinomial models.

For instance, we consider
$$
p(C) = \sum_{i \in C} p_i ,
$$
the sum of the probabilities of the results with indexes in $C$. These results may be of interest, and so we are led to consider their joint probability. A direct extension of this case is given by
$$
g(\mathbf{p}) = \sum_{j=1}^{k} w_j \, p(C_j),
$$
where we consider $k$ sets of results $C_1, \ldots, C_k$. The coefficients $w_1, \ldots, w_k$ value the relevances of the corresponding sets of results.
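The weighted set probabilities above are straightforward to estimate by plug-in; the sets $C_j$ and weights $w_j$ below are illustrative assumptions:

```python
import random
from collections import Counter

random.seed(5)

p = [0.35, 0.25, 0.15, 0.15, 0.10]
sets = [{0, 1}, {2, 3, 4}]   # assumed sets of results C_1, C_2
w = [1.0, 2.0]               # assumed relevance weights w_1, w_2

def g(q):
    """g(q) = sum_j w_j * q(C_j), the weighted sum of set probabilities."""
    return sum(wj * sum(q[i] for i in Cj) for wj, Cj in zip(w, sets))

n = 100_000
counts = Counter(random.choices(range(len(p)), weights=p, k=n))
p_hat = [counts[i] / n for i in range(len(p))]

print(g(p), g(p_hat))  # g(p) = 1*0.6 + 2*0.4 = 1.4; g(p_hat) is close to it
```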
In general, let us have a succession $\mathbf{X}_n$, $n = 1, 2, \ldots$, of observation vectors whose distributions depend on a parameter $\boldsymbol{\theta}$ for which we have a consistent estimator $\tilde{\boldsymbol{\theta}}_n$. Then, if we have a cost function $c(d, g(\boldsymbol{\theta}))$, with $g$ a continuous function of $\boldsymbol{\theta}$ and $c(d, \cdot)$ also continuous, and such that
$$
c\big(d^{*}, g(\boldsymbol{\theta})\big) < c\big(d, g(\boldsymbol{\theta})\big), \quad d \neq d^{*},
$$
we have
$$
\lim_{n \to \infty} \Pr\big(\tilde{d}_n = d^{*}\big) = 1,
$$
which implies consistency for the cost functions. Thus, the two consistency features display the relation we had already found for multinomial models. Namely, we obtain the following result:

Proposition 5. If we have a consistent estimator $\tilde{\boldsymbol{\theta}}_n$ for a parameter $\boldsymbol{\theta}$, we have consistency for cost functions
$$
c\big(d, g(\boldsymbol{\theta})\big),
$$
where $g$ is continuous, with minimum $c\big(d^{*}, g(\boldsymbol{\theta})\big)$.

The extension behind this proposition, and the obtaining of consistent estimators for a numerable set of parameters, the components of $\mathbf{p}$, from a finite sample, are perhaps the most interesting features of our discussion.
5. Final Remark
In this study, based on limit distributions and considering the vector of probabilities for the multinomial model, we showed, using classical Statistical Inference, that the estimators for the vector of probabilities are consistent. Since classical Statistical Inference does not incorporate the knowledge of the possible consequences of a decision, we used a Statistical Decision Theory approach to quantify the cost incurred for each possible decision, obtaining a cost function for the vector of probabilities. We showed that the estimators of the cost function are consistent.

Our results on the consistency of the estimators of the probabilities lead to the consistency of the decision function; with this, we hope to have opened an interesting line of work on multinomial and other models using Statistical Decision Theory.