1. Introduction
Classical Statistical Inference (cSI) centers on minimizing the probabilities of errors, whereas in Statistical Decision Theory (SDT) the goal is to minimize the decision costs; see for instance [1,2]. Whether we minimize the probability of errors in cSI or the decision costs in SDT, a desired property is for the procedure to be consistent; that is, we would like the relevant sample statistics to converge to the corresponding population values as the sample size grows, see [3,4]. To the best of our knowledge, the consistency of decision costs in SDT has received far less attention than that of error probabilities in cSI.
The multinomial distribution is often used in modeling categorical data because it describes the probability of a random observation being assigned to one of several mutually exclusive categories. Thus, having $n$ independent realizations of an experiment with a finite or numerable set of incompatible results with probabilities $p_1, p_2, \ldots$, the numbers $n_1, n_2, \ldots$ of times we obtain the $i$th result, $i = 1, 2, \ldots$, follow the multinomial distribution, denoted $M(n, \mathbf{p})$, with parameters $n$ and $\mathbf{p} = (p_1, p_2, \ldots)^\top$. The probability function of this distribution is
$$
\Pr(N_1 = n_1, N_2 = n_2, \ldots) = \frac{n!}{\prod_{i} n_i!} \prod_{i} p_i^{n_i},
$$
where $n_i \geq 0$, $\sum_{i} n_i = n$ and $\sum_{i} p_i = 1$, see [5,6]. To avoid repetitions, we give the expression for the numerable case, since the particularization to the finite set is direct.
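As a small numerical illustration (not part of the formal development), the probability function above can be computed directly; the counts and probabilities below are arbitrary assumptions for the sketch:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Probability of observing counts = (n_1, n_2, ...) in n = sum(counts)
    independent trials with category probabilities probs = (p_1, p_2, ...)."""
    n = sum(counts)
    coef = factorial(n)           # n! / (n_1! n_2! ...)
    for c in counts:
        coef //= factorial(c)
    prob = 1.0
    for c, q in zip(counts, probs):
        prob *= q ** c            # product of p_i ** n_i
    return coef * prob

# n = 3 trials, three categories: 3! * 0.5 * 0.3 * 0.2 = 0.18
print(multinomial_pmf((1, 1, 1), (0.5, 0.3, 0.2)))
```

The same computation agrees with `scipy.stats.multinomial.pmf` when SciPy is available.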
In what follows, we take a double approach to the treatment of multinomial models, covering both the classical and the decision theory approaches. In our classical approach, we show that $\hat{p}_{n,i} = n_i/n$, with $i = 1, 2, \ldots$, is a consistent estimator of $p_i$. This result will play a central part in our paper. We point out that we use a finite sample to obtain a numerable family of jointly consistent estimators. We thus have consistent results in the fold of classical statistical inference.
Now, considering a decision problem, let there be a family $D$ of possible decisions. For each of these decisions, we have a cost that depends on the results of the experiment. These results will have probabilities $p_1, p_2, \ldots$. We thus have, for the $i$th result, the costs $c_i(d)$, $i = 1, 2, \ldots$, $d \in D$. The average cost for decision $d$ will be
$$
c(d) = \sum_{i} p_i \, c_i(d).
$$
Assuming the $c_i(d)$, $i = 1, 2, \ldots$, are known, we can use the estimators $\hat{p}_{n,i} = n_i/n$, where $n_i$ is the number of times that, in $n$ independent realizations of the experiment, we get the $i$th result with cost $c_i(d)$. We then have the estimators for the costs
$$
\hat{c}_n(d) = \sum_{i} \hat{p}_{n,i} \, c_i(d).
$$
We will show that the $\hat{p}_{n,i}$, $i = 1, 2, \ldots$, are jointly consistent even when we have a numerable set of possible results; thus the $\hat{c}_n(d)$, $d \in D$, will also be consistent.

If there is a decision $d^{*}$ with least average cost and $\tilde{d}_n$ is the one with the least estimated average cost when there are $n$ realizations of the experiment, we will show that
$$
\lim_{n \to \infty} \Pr\big(\tilde{d}_n = d^{*}\big) = 1,
$$
so that, see [7], we will have consistency in decision taking in the setup of experiments with a finite or numerable set of incompatible results.
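This setup can be sketched numerically: simulate $n$ realizations, form the estimators $\hat{p}_{n,i} = n_i/n$, and pick the decision with least estimated average cost. The probabilities, costs, and sample size below are illustrative assumptions, not values from the paper:

```python
import random
from collections import Counter

random.seed(1)

# Three results, two decisions (all numbers are assumptions for the sketch).
p = [0.5, 0.3, 0.2]                      # true result probabilities
costs = {"d1": [1.0, 4.0, 9.0],          # c_i(d1)
         "d2": [2.0, 2.0, 5.0]}          # c_i(d2)

# True average costs: c(d1) = 3.5, c(d2) = 2.6, so d2 is the optimal decision.
true_avg = {d: sum(pi * ci for pi, ci in zip(p, c)) for d, c in costs.items()}

n = 100_000
counts = Counter(random.choices(range(len(p)), weights=p, k=n))
p_hat = [counts[i] / n for i in range(len(p))]

# Estimated average costs and the decision that minimizes them.
est_avg = {d: sum(ph * ci for ph, ci in zip(p_hat, c)) for d, c in costs.items()}
best_est = min(est_avg, key=est_avg.get)
print(best_est)
```

With $n$ this large, the decision with least estimated cost agrees with the true optimum with overwhelming probability, which is the consistency property the paper establishes.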
In the subsequent sections, we consider the multinomial model and its estimators in Section 2, where, by limit distributions, we show that the estimators of the multinomial model are consistent. In Section 3, we develop the cost function for multinomial models through statistical decision theory and again show that this function has the property of consistency. A further extension of the cost function is presented in Section 4.
2. Multinomial Models and Estimators
In this section, we obtain and consider estimators for the multinomial model. By limit distributions, see [8], we show that the estimators are consistent.
In $\ell^{1}$, the space of vectors $\mathbf{x} = (x_1, x_2, \ldots)^\top$ with numerable sets of components such that $\sum_{i=1}^{\infty} |x_i| < +\infty$, we can consider
$$
\|\mathbf{x}\| = \sum_{i=1}^{\infty} |x_i|
$$
as a norm, see [9]. The sub-space $S$ of $\ell^{1}$ constituted by the vectors $\mathbf{p}$ with non-negative components that add up to 1 will be bounded, since $\|\mathbf{p}\| = 1$. Given $\mathbf{p}_m \in S$, $m = 1, 2, \ldots$, we have $\|\mathbf{p}_m\| = 1$, and if $\mathbf{p}_m \longrightarrow \mathbf{p}$, we have $\mathbf{p} \in S$, since the components of $\mathbf{p}$ will be non-negative and add up to 1. Thus, $S$ is bounded and closed.
Let us put $\mathbf{p}(k) = (p_1, \ldots, p_k)^\top$ as well as
$$
\dot{p}(k) = \sum_{i=k+1}^{\infty} p_i ,
$$
in order to get
$$
\|\mathbf{p}\| = \|\mathbf{p}(k)\| + \dot{p}(k) = 1 .
$$
Besides this, we have the vector $\hat{\mathbf{p}}_n$ whose components are the estimators $\hat{p}_{n,i} = n_i/n$, $i = 1, 2, \ldots$. Let us put $\hat{\mathbf{p}}_n(k) = (\hat{p}_{n,1}, \ldots, \hat{p}_{n,k})^\top$ as well as
$$
\dot{\hat{p}}_n(k) = \sum_{i=k+1}^{\infty} \hat{p}_{n,i} ,
$$
so that
$$
\|\hat{\mathbf{p}}_n\| = \|\hat{\mathbf{p}}_n(k)\| + \dot{\hat{p}}_n(k) = 1 .
$$
With $\dot{p}(k) \longrightarrow 0$ for $k \longrightarrow \infty$, we get
$$
\|\mathbf{p} - \mathbf{p}(k)\| \longrightarrow 0 ,
$$
since $\|\mathbf{p} - \mathbf{p}(k)\| = \dot{p}(k)$ when we complete $\mathbf{p}(k)$ with null components. We also get
$$
\|\hat{\mathbf{p}}_n - \hat{\mathbf{p}}_n(k)\| = \dot{\hat{p}}_n(k) .
$$
Now, by representing stochastic convergence by $\stackrel{P}{\longrightarrow}$, we establish Proposition 1.

Proposition 1. The estimator $\hat{\mathbf{p}}_n$ is a consistent estimator for $\mathbf{p}$, since $\hat{\mathbf{p}}_n \stackrel{P}{\longrightarrow} \mathbf{p}$, in accordance with the Weak Law of Large Numbers, see [10,11].

Proof. Taking $\mathbf{p}(k)$ and $\hat{\mathbf{p}}_n(k)$ as above, so that for every $k$, we have
$$
\|\hat{\mathbf{p}}_n - \mathbf{p}\| \leq \|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\| + \dot{\hat{p}}_n(k) + \dot{p}(k),
$$
as well as
$$
\dot{\hat{p}}_n(k) \leq \dot{p}(k) + \|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\|,
$$
since
$$
\dot{\hat{p}}_n(k) - \dot{p}(k) = \|\mathbf{p}(k)\| - \|\hat{\mathbf{p}}_n(k)\| \leq \|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\|.
$$
Now
$$
\hat{\mathbf{p}}_n(k) - \mathbf{p}(k) \stackrel{P}{\longrightarrow} \mathbf{0},
$$
since, see [7,8,12],
$$
\sqrt{n}\,\big(\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\big) \stackrel{d}{\longrightarrow} N\big(\mathbf{0}, \boldsymbol{\Sigma}(k)\big),
$$
where $\stackrel{d}{\longrightarrow}$ indicates convergence in distribution, in this case to the normal distribution with null mean vector and covariance matrix
$$
\boldsymbol{\Sigma}(k) = D\big(\mathbf{p}(k)\big) - \mathbf{p}(k)\,\mathbf{p}(k)^\top,
$$
where $D(\mathbf{p}(k))$ is the diagonal matrix whose principal elements are the components of $\mathbf{p}(k)$. Thus
$$
\|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\| \stackrel{P}{\longrightarrow} 0,
$$
and so
$$
\|\hat{\mathbf{p}}_n - \mathbf{p}\| \leq 2\,\|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\| + 2\,\dot{p}(k),
$$
as well as, for every $\varepsilon > 0$,
$$
\Pr\big(\|\hat{\mathbf{p}}_n - \mathbf{p}\| > \varepsilon\big) \leq \Pr\big(2\,\|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\| > \varepsilon - 2\,\dot{p}(k)\big),
$$
whenever $2\,\dot{p}(k) < \varepsilon$.

Now, we also have
$$
\Pr\big(2\,\|\hat{\mathbf{p}}_n(k) - \mathbf{p}(k)\| > \varepsilon - 2\,\dot{p}(k)\big) \longrightarrow 0;
$$
if we first choose $k$ such that $2\,\dot{p}(k) < \varepsilon$ and then let $n \longrightarrow \infty$, then
$$
\Pr\big(\|\hat{\mathbf{p}}_n - \mathbf{p}\| > \varepsilon\big) \longrightarrow 0,
$$
which establishes the thesis. □
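As a small simulation of this joint consistency for a numerable model (the geometric probabilities $p_i = 2^{-i}$ and the truncation level are assumptions for the sketch), the $\ell^{1}$ distance between the estimated and true probability vectors shrinks as $n$ grows:

```python
import random
from collections import Counter

random.seed(7)

def sample(n):
    """Draw n results from an assumed numerable model with p_i = 2**-i."""
    out = []
    for _ in range(n):
        i = 1
        while random.random() >= 0.5:  # each "failure" moves to the next result
            i += 1
        out.append(i)
    return out

def l1_distance(counts, n, k_max=60):
    """l1 distance between estimated and true probabilities, truncated at
    k_max; the tail beyond k_max is negligible (below 1e-18) here."""
    return sum(abs(counts[i] / n - 0.5 ** i) for i in range(1, k_max + 1))

dists = {}
for n in (100, 100_000):
    counts = Counter(sample(n))
    dists[n] = l1_distance(counts, n)
print(dists)  # the l1 distance decreases markedly from n=100 to n=100000
```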
Corollary 2. If $g$ is a continuous function of $\mathbf{p}$, we have $g(\hat{\mathbf{p}}_n) \stackrel{P}{\longrightarrow} g(\mathbf{p})$. The thesis follows from Proposition 1 and the Slutsky theorem, see [13,14].

Corollary 3. With $\mathbf{i}(h, \mathbf{p})$ the vector of the indexes of the $h$ largest components of $\mathbf{p}$, we have, see [10],
$$
\mathbf{i}(h, \hat{\mathbf{p}}_n) \stackrel{P}{\longrightarrow} \mathbf{i}(h, \mathbf{p}),
$$
if the $h$ largest components of $\mathbf{p}$ are distinct.

Proof. When the $h$ largest components of $\mathbf{p}$ are distinct, $\mathbf{p}$ will be a continuity point of $\mathbf{i}(h, \cdot)$, and the thesis follows from Corollary 2. □
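Corollary 3 can be illustrated numerically: with distinct leading components and a large sample, the index vector of the $h$ largest estimated components matches that of the true vector. The probability vector and $h$ below are illustrative assumptions:

```python
import random
from collections import Counter

random.seed(3)

def top_indexes(p, h):
    """Vector of the indexes of the h largest components of p
    (0-based; ties broken by smaller index)."""
    return sorted(range(len(p)), key=lambda i: (-p[i], i))[:h]

p = [0.05, 0.40, 0.25, 0.20, 0.10]   # distinct leading components
n = 50_000
counts = Counter(random.choices(range(len(p)), weights=p, k=n))
p_hat = [counts[i] / n for i in range(len(p))]

print(top_indexes(p, 3), top_indexes(p_hat, 3))
```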
The results in this section belong to the study of consistency in classical Statistical Inference (cSI). These classical inferences are, in most instances, made without regard to the use to which they will be put. In the next section, we turn to consistency for Statistical Decision Theory.
3. Cost Function for Multinomial Models
In Statistical Decision Theory (SDT), unlike in cSI, the goal is to incorporate more than just sample data in order to arrive at the optimal decision. The knowledge of the possible consequences of a decision is also incorporated, and this knowledge is quantified as the cost incurred for each possible decision that is taken. According to [15], Abraham Wald was the first person to thoroughly examine the inclusion of a cost function in statistical analysis.

The cost function represents the costs associated with taking a particular decision. It is a function that maps every possible decision and outcome to a real-valued cost. The cost function is used to evaluate the performance of various decision rules in terms of their expected cost. The goal of statistical decision theory is to identify the decision rule that minimizes the expected cost, see [16].
Now, we go back to the decision problem we presented in the Introduction and consider the cost function for multinomial models.
Let $c_n(d)$ be the cost for decision $d$, where $\mathbf{p}$ is the vector of probabilities and $\hat{\mathbf{p}}_n$ is the vector of estimated probabilities from the $n$ results. We will assume that this cost is the sum of two components, both non-negative: $c_1(d, \mathbf{p})$, which, in a given decision $d$, depends only on $\mathbf{p}$, the vector of probabilities, and $c_2(d, \hat{\mathbf{p}}_n - \mathbf{p})$, which depends on the estimation errors. Namely, we take
$$
c_n(d) = c_1(d, \mathbf{p}) + c_2(d, \hat{\mathbf{p}}_n - \mathbf{p}), \qquad (32)
$$
with $c_2(d, \mathbf{0}) = 0$ and $c_2(d, \cdot)$ continuous, so
$$
c_2(d, \hat{\mathbf{p}}_n - \mathbf{p}) \stackrel{P}{\longrightarrow} 0,
$$
since, as we saw,
$$
\hat{\mathbf{p}}_n \stackrel{P}{\longrightarrow} \mathbf{p},
$$
we will have
$$
c_n(d) \stackrel{P}{\longrightarrow} c_1(d, \mathbf{p}).
$$
Thus, for every $d$, the limit cost will be $c_1(d, \mathbf{p})$. It is now easy to see that if there is $d^{*}$ such that
$$
c_1(d^{*}, \mathbf{p}) < c_1(d, \mathbf{p}), \quad d \neq d^{*}, \qquad (33)
$$
with $\tilde{d}_n$ as the decision with the least estimated cost, we have
$$
\lim_{n \to \infty} \Pr\big(\tilde{d}_n = d^{*}\big) = 1.
$$
Proposition 4. We have consistency for the cost function given by Equation (32) whenever Equation (33) holds.

If the $h$ largest components of $\mathbf{p}$ are of interest, as an alternative to Equation (32), we may take
$$
c_n(d) = c_1(d, \mathbf{p}) + c_2\big(d, \mathbf{i}(h, \hat{\mathbf{p}}_n) - \mathbf{i}(h, \mathbf{p})\big),
$$
reobtaining Proposition 4, since
$$
\mathbf{i}(h, \hat{\mathbf{p}}_n) \stackrel{P}{\longrightarrow} \mathbf{i}(h, \mathbf{p}),
$$
and so we continue to have
$$
c_2\big(d, \mathbf{i}(h, \hat{\mathbf{p}}_n) - \mathbf{i}(h, \mathbf{p})\big) \stackrel{P}{\longrightarrow} 0.
$$
We may also take
$$
c_2(d, \mathbf{z}) = \mathbf{z}^\top \mathbf{M}\,\mathbf{z},
$$
with $\mathbf{M}$ a positive definite matrix, see [8,9], or
$$
c_2(d, \mathbf{z}) = \|\mathbf{z}\|.
$$
Thus, there is a wide range of possible cost functions. Namely, with $g$ a continuous function, $g(\hat{\mathbf{p}}_n)$ will, according to the Slutsky theorem, be a consistent estimator of $g(\mathbf{p})$.
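A cost of the form $c_1 + c_2$ with a quadratic error penalty $c_2(d, \mathbf{z}) = \mathbf{z}^\top \mathbf{M}\,\mathbf{z}$ can be sketched as follows; the probabilities, the matrix $\mathbf{M}$, and the base cost are all assumptions for the illustration:

```python
import random
from collections import Counter

random.seed(11)

p = [0.5, 0.3, 0.2]
M = [[2.0, 0.0, 0.0],   # an assumed diagonal positive definite matrix
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

def c2(z):
    """Quadratic penalty z^T M z on the estimation errors."""
    return sum(z[i] * M[i][j] * z[j] for i in range(3) for j in range(3))

def cost(c1_value, p_hat):
    """c1 is taken here as a fixed base cost of the decision (an assumption)."""
    z = [ph - pi for ph, pi in zip(p_hat, p)]
    return c1_value + c2(z)

for n in (100, 100_000):
    counts = Counter(random.choices(range(3), weights=p, k=n))
    p_hat = [counts[i] / n for i in range(3)]
    print(n, cost(2.6, p_hat))  # the penalty term vanishes as n grows
```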
Moreover, if we have a cost function $c(d, g(\mathbf{p}))$, where $c(d, \cdot)$ is the cost that depends on $g(\mathbf{p})$ and is continuous, and such that $g(\cdot)$ is continuous, we can use the Slutsky theorem again to get
$$
c\big(d, g(\hat{\mathbf{p}}_n)\big) \stackrel{P}{\longrightarrow} c\big(d, g(\mathbf{p})\big).
$$
We thus extended our previous results on $\mathbf{p}$ to any parameter given by a continuous function of $\mathbf{p}$, such as the cost function.
4. Extension on Cost Functions
In this section, we develop an extension of the previous Section 3 to incorporate a more general form of our results on consistency for the cost function for multinomial models.

For instance, we consider
$$
p(C) = \sum_{i \in C} p_i ,
$$
the sum of the probabilities of the results with indexes in $C$. These results may be of interest, and so we are led to consider their joint probability. A direct extension of this case is given by
$$
g(\mathbf{p}) = \sum_{j=1}^{k} w_j \, p(C_j),
$$
where we consider $k$ sets of results $C_1, \ldots, C_k$. The coefficients $w_1, \ldots, w_k$ value the relevances of the corresponding sets of results.
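The weighted set probabilities above are straightforward to estimate by plug-in; the sets $C_j$ and weights $w_j$ below are illustrative assumptions:

```python
import random
from collections import Counter

random.seed(5)

p = [0.35, 0.25, 0.15, 0.15, 0.10]
sets = [{0, 1}, {2, 3, 4}]   # assumed sets of results C_1, C_2
w = [1.0, 2.0]               # assumed relevance weights w_1, w_2

def g(q):
    """g(q) = sum_j w_j * q(C_j), the weighted sum of set probabilities."""
    return sum(wj * sum(q[i] for i in Cj) for wj, Cj in zip(w, sets))

n = 100_000
counts = Counter(random.choices(range(len(p)), weights=p, k=n))
p_hat = [counts[i] / n for i in range(len(p))]

print(g(p), g(p_hat))  # g(p) = 1*0.6 + 2*0.4 = 1.4; g(p_hat) is close to it
```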
In general, let us have a succession $\mathbf{X}_n$, $n = 1, 2, \ldots$, of observation vectors whose distributions depend on a parameter $\boldsymbol{\theta}$ for which we have a consistent estimator $\tilde{\boldsymbol{\theta}}_n$. Then, if we have a cost function $c(d, g(\boldsymbol{\theta}))$, with $g$ a continuous function of $\boldsymbol{\theta}$ and $c(d, \cdot)$ also continuous, and such that
$$
c\big(d^{*}, g(\boldsymbol{\theta})\big) < c\big(d, g(\boldsymbol{\theta})\big), \quad d \neq d^{*},
$$
we have
$$
\lim_{n \to \infty} \Pr\big(\tilde{d}_n = d^{*}\big) = 1,
$$
which implies consistency for the cost functions. Thus, the two consistency features display the relation we had already found for multinomial models. Namely, we obtain the following result:

Proposition 5. If we have a consistent estimator $\tilde{\boldsymbol{\theta}}_n$ for a parameter $\boldsymbol{\theta}$, we have consistency for cost functions
$$
c\big(d, g(\boldsymbol{\theta})\big),
$$
where $g$ is continuous, with minimum $c\big(d^{*}, g(\boldsymbol{\theta})\big)$.

The extension behind this proposition, and the obtaining of consistent estimators for a numerable set of parameters, the components of $\mathbf{p}$, from a finite sample, are perhaps the most interesting features of our discussion.
5. Final Remark
In this study, based on limit distributions and considering the vector of probabilities for the multinomial model, we showed, using classical Statistical Inference, that the estimators for the vector of probabilities are consistent. Since classical Statistical Inference does not incorporate the knowledge of the possible consequences of a decision, we used a Statistical Decision Theory approach to quantify the cost incurred for each possible decision, obtaining a cost function for the vector of probabilities. We showed that the estimators of the cost function are consistent.

Our results on the consistency of the estimators of the probabilities lead to the consistency of the decision function; with this, we hope to have opened an interesting line of work on multinomial and other models using Statistical Decision Theory.