Article

Asymptotic Distribution of Certain Types of Entropy under the Multinomial Law

Andrea A. Rey, Alejandro C. Frery, Magdalena Lucini, Juliana Gambini, Eduarda T. C. Chagas and Heitor S. Ramos
1. Signal and Image Processing Center, Universidad Tecnológica Nacional, Ciudad Autónoma de Buenos Aires C1179AAQ, Argentina
2. School of Mathematics and Statistics, Victoria University of Wellington, Wellington 6140, New Zealand
3. Department of Mathematics, FaCENA, Universidad Nacional del Nordeste and CONICET, Corrientes W3400AYY, Argentina
4. CIDIA, Universidad Nacional de Hurlingham, Pcia. de Buenos Aires, Argentina, and CPSI, Universidad Tecnológica Nacional, Buenos Aires C1041AAJ, Argentina
5. Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
* Author to whom correspondence should be addressed.
Entropy 2023, 25(5), 734; https://doi.org/10.3390/e25050734
Submission received: 14 February 2023 / Revised: 24 March 2023 / Accepted: 21 April 2023 / Published: 28 April 2023
(This article belongs to the Special Issue Mathematics in Information Theory and Modern Applications)

Simple Summary

We obtain expressions for the asymptotic distributions of the Rényi and Tsallis entropies of order q, and of the Fisher information, when computed on the maximum likelihood estimator of probabilities from multinomial random samples. We recall related results for the Shannon entropy. We build a test for comparing entropies of different types, computed from samples with possibly different numbers of categories.

Abstract

We obtain expressions for the asymptotic distributions of the Rényi and Tsallis entropies of order q, and of the Fisher information, when computed on the maximum likelihood estimator of probabilities from multinomial random samples. We verify that these asymptotic models, two of which (Tsallis and Fisher) are normal, describe a variety of simulated data well. In addition, we obtain test statistics for comparing (possibly different types of) entropies from two samples without requiring the same number of categories. Finally, we apply these tests to social survey data and verify that the results are consistent with, but more general than, those obtained with a χ² test.

1. Introduction

The multinomial distribution is an adequate model for describing how observations fall into categories. Quoting Johnson et al. [1], “The Multinomial distribution, like the Multivariate Normal distribution among the continuous multivariate distributions, consumed a sizable amount of the attention that numerous theoretical as well as applied researchers directed towards the area of discrete multivariate distributions.”
The entropy of a (in our case, multivariate) random variable is a fundamental quantity. It quantifies the predictability of a system whose outputs can be described by such a model. Entropy has several definitions, both conceptual and mathematical. The concept of entropy originated as a way to relate a system's energy and temperature [2]. The same concept was later used to describe the number of ways in which the particles of a system can be arranged.
Entropy has seldom been studied as a random variable. Hutcheson [3] and Hutcheson and Shenton [4] discussed the exact expected value and variance of the Shannon entropy under the multinomial model. These works also provided approximate expressions that circumvent the numerical issues that arise when using the exact values.
Jacquet and Szpankowski [5] studied high-quality analytic approximations of the Rényi entropy, of which the Shannon entropy is a particular case, under the binomial model. With the same approach, Cichoń and Golębiewski [6] obtained expressions for more general functionals that include the multinomial distribution. These works treat the entropy as a fixed quantity. Cook et al. [7] studied almost unbiased estimators of functions of the parameter of the binomial distribution. The authors extended those results to find an almost-unbiased estimator for the entropy under multinomial laws.
Chagas et al. [8] treated the Shannon entropy as a random variable. The authors obtained its asymptotic distribution when it is indexed by the maximum likelihood estimators of the proportions under the multinomial distribution. This result allowed the devising of one-sided and two-sided tests for comparing the entropy of two samples in a very general way. These tests do not require the samples to have the same number of categories.
In this work, our attention is directed toward the asymptotic distribution of other forms of entropy under the multinomial model. This allows the comparison of large samples through their entropies; the samples may have different numbers of classes, and the comparison may even involve different types of entropy. We first apply the multivariate delta method and, in the case of the Rényi entropy, we transform the resulting multivariate normal distribution into that of the logarithm of the absolute value of a normally distributed random variable. Then, we provide the general expression of a test statistic that suits our needs.
This paper unfolds as follows. Section 2 recalls the main properties of the multinomial distribution and defines the four types of entropy we study. In Section 3, we present the central results, i.e., the asymptotic distributions of those entropies; we describe the techniques we used and leave the technical details to Appendix A.1. We validate our results with simulation studies in Section 4: we show the adequacy of the normal distribution as a limit law for the entropies under three probability models, considering various numbers of categories and sample sizes. In Section 5, we show that these asymptotic properties lead to a helpful hypothesis test between samples with different categories. We conclude the article in Section 6. Appendix A.2 comments on the applications that justify our choices of the numbers of categories and sample sizes in the simulation studies. Appendix A.3 discloses relevant computational information, including reproducibility.

2. Entropies and the Multinomial Distribution

Consider a series of $n$ independent trials, in each of which exactly one of $k$ mutually exclusive events $\pi_1, \pi_2, \dots, \pi_k$ must be observed, with probabilities $\mathbf{p} = (p_1, p_2, \dots, p_k)$ such that $p_\ell \geq 0$ and $\sum_{\ell=1}^{k} p_\ell = 1$. Let $\mathbf{N} = (N_1, N_2, \dots, N_k)$ be the random vector that counts the number of occurrences of the events $\pi_1, \pi_2, \dots, \pi_k$ in the $n$ trials, with $N_\ell \geq 0$ and $\sum_{\ell=1}^{k} N_\ell = n$. A sample from $\mathbf{N}$, say $\mathbf{n}$, is a $k$-variate vector of integer values $\mathbf{n} = (n_1, n_2, \dots, n_k)$. Then, the joint distribution of $\mathbf{N}$ is
$$\Pr(\mathbf{N}=\mathbf{n}) = \Pr(N_1=n_1, N_2=n_2, \dots, N_k=n_k) = n! \prod_{\ell=1}^{k} \frac{p_\ell^{n_\ell}}{n_\ell!}.$$
We denote this situation as $\mathbf{N} \sim \mathrm{Mult}(n, \mathbf{p})$.
In practice, one does not know the true values of $\mathbf{p}$, the probabilities that index this multinomial distribution. Such values are estimated by computing $\hat{p}_\ell$, the proportion of times the class (category, event) $\pi_\ell$ was observed among the $k$ possible categories $\pi = \{\pi_1, \pi_2, \dots, \pi_k\}$ during the $n$ trials. The maximum likelihood estimator $\hat{\mathbf{p}} = (\hat{p}_1, \hat{p}_2, \dots, \hat{p}_k)$ is thus the random vector of proportions. This maximum likelihood estimator coincides with the intuitive estimator based on the distribution's first moments, and is the most frequently used in applications.
We study the distribution of several forms of entropy of the random vector $\hat{\mathbf{p}}$ for fixed $k$. Notice that $\hat{\mathbf{p}}$ is computed over a single $k$-variate measurement of random proportions corresponding to a single random sample from $\mathbf{N} \sim \mathrm{Mult}(n, \mathbf{p})$. The asymptotic behaviors we derive hold for the typical cases in which $n \gg k$.
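As a concrete illustration (a minimal sketch, not taken from the authors' code; the seed, $k$, $n$, and $\mathbf{p}$ below are arbitrary choices of ours), the following R snippet draws a single multinomial sample and forms the estimator $\hat{\mathbf{p}}$ as the vector of observed proportions:

```r
# Minimal sketch: one multinomial sample of size n over k categories,
# and the maximum likelihood estimator p_hat (vector of proportions).
set.seed(1234)                              # arbitrary, for reproducibility
k <- 6
p <- rep(1 / k, k)                          # illustrative true probabilities
n <- 1000 * k                               # n >> k, as assumed in the text
N <- rmultinom(1, size = n, prob = p)[, 1]  # counts N_1, ..., N_k
p_hat <- N / n                              # MLE of p
```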
The Shannon entropy measures the disorder or unpredictability of systems characterized by a probability distribution. On the one hand, the minimum Shannon value occurs when there is complete knowledge about the system behavior and total confidence in predicting the following observation. On the other hand, when a uniform distribution describes the system’s behavior, that is, when all possibilities have the same probability of occurrence, the knowledge about the behavior of the data is minimal. In Chagas et al. [8], we studied the asymptotic distribution of the Shannon entropy. In this work, we extend those results to three other forms of entropy.
Other types of descriptors have been proposed in the literature to extract additional information not captured by the Shannon entropy. Tsallis [9] and Rényi [10], for instance, proposed parametric versions, which include the Shannon entropy.
The Fisher information [11] is defined through the expectation of the squared logarithmic derivative of a continuous probability density function. In the case of discrete densities, this measure can be approximated using differences of probabilities between consecutive elements of the distribution. While the Shannon entropy captures the degree of unpredictability of a system, the Fisher information is related to the rate of change between consecutive observations and, thus, quantifies small changes and perturbations.
Given a type of entropy $H$, we are interested in the distribution of $H(\mathbf{p})$ when indexed by $\hat{\mathbf{p}}$, the maximum likelihood estimator of $\mathbf{p}$. Our problem then becomes finding the distribution of $H(\hat{\mathbf{p}})$ for the following (a short computational sketch follows the list):
  • The Shannon entropy
    $$H_S(\hat{\mathbf{p}}) = -\sum_{\ell=1}^{k} \hat{p}_\ell \log \hat{p}_\ell ,$$
  • The Tsallis entropy with index $q \in \mathbb{R} \setminus \{1\}$
    $$H_T^q(\hat{\mathbf{p}}) = \sum_{\ell=1}^{k} \frac{\hat{p}_\ell - \hat{p}_\ell^{\,q}}{q-1},$$
  • The Rényi entropy of order $q \in \mathbb{R}^+ \setminus \{1\}$
    $$H_R^q(\hat{\mathbf{p}}) = \frac{1}{1-q} \log \sum_{\ell=1}^{k} \hat{p}_\ell^{\,q},$$
  • The Fisher information, also termed “Fisher Information Measure” in the literature, with renormalization coefficient $F_0 = 4$,
    $$H_F(\hat{\mathbf{p}}) = F_0 \sum_{\ell=1}^{k-1} \left(\sqrt{\hat{p}_{\ell+1}} - \sqrt{\hat{p}_\ell}\right)^2 .$$
    Among other possibilities, we used Equation (2.7) from Ref. [12].
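The quantities above can be computed as follows in R (an illustrative sketch; the function names and the convention $0\log 0 = 0$ are ours, not the paper's):

```r
# Plug-in versions of the Shannon, Tsallis, Renyi, and Fisher quantities.
shannon <- function(p)         { s <- p[p > 0]; -sum(s * log(s)) }   # 0 log 0 := 0
tsallis <- function(p, q)      (sum(p) - sum(p^q)) / (q - 1)
renyi   <- function(p, q)      log(sum(p^q)) / (1 - q)
fisher  <- function(p, F0 = 4) F0 * sum((sqrt(p[-1]) - sqrt(p[-length(p)]))^2)

p_ex <- c(0.5, 0.2, 0.15, 0.1, 0.05)        # illustrative proportions
c(shannon(p_ex), tsallis(p_ex, 3/2), renyi(p_ex, 1/3), fisher(p_ex))
```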

3. Asymptotic Distributions of Entropies

The main results of this section are the asymptotic distributions of the Shannon (2), Tsallis of order q (3), and Rényi of order q (4) entropies, and Fisher information (5). These results are presented, respectively, in Equations (30)–(32) and (35). Notice that the Rényi entropy is not asymptotically normally distributed, while the other three are.
We recall the following theorems known respectively as the delta method and its multivariate version. We refer to Lehmann and Casella [13] for their proofs.
Theorem 1.
Let $X_n$ be a sequence of independent and identically distributed random variables such that $\sqrt{n}\,[X_n - \theta]$ converges in distribution to $N(0, \sigma^2)$. If $h'(\theta)$ exists and does not vanish, then $\sqrt{n}\,[h(X_n) - h(\theta)]$ converges in distribution to $N\!\left(0, \sigma^2 [h'(\theta)]^2\right)$.
Theorem 2.
Let $\mathbf{X}_n = (X_{1n}, X_{2n}, \dots, X_{kn})$ be a sequence of independent and identically distributed vectors of random variables such that $\sqrt{n}\,[X_{1n}-\theta_1, X_{2n}-\theta_2, \dots, X_{kn}-\theta_k]$ converges in distribution to a multivariate normal distribution $N(\mathbf{0}, \Sigma)$, where $\Sigma$ is the covariance matrix. Suppose that $h_1, h_2, \dots, h_k$ are real functions continuously differentiable in a neighborhood of the parameter point $\boldsymbol{\theta} = (\theta_1, \theta_2, \dots, \theta_k)$ and such that the matrix of partial derivatives $B = (\partial h_\ell/\partial\theta_j)_{\ell, j=1}^{k}$ is non-singular in that neighborhood. Then, the following convergence in distribution holds:
$$\sqrt{n}\,\Big[h_1(\mathbf{X}_n)-h_1(\boldsymbol{\theta}),\, h_2(\mathbf{X}_n)-h_2(\boldsymbol{\theta}),\, \dots,\, h_k(\mathbf{X}_n)-h_k(\boldsymbol{\theta})\Big] \xrightarrow{\mathcal{D}} N\!\left(\mathbf{0},\, B\,\Sigma\,B^\top\right),$$
where $B^\top$ denotes the transpose of $B$.
Now, we focus on the case $\mathbf{N} \sim \mathrm{Mult}(n, \mathbf{p})$. Let $\hat{\mathbf{p}} = \mathbf{N}/n$ be the vector of sample proportions, which coincides with the maximum likelihood estimator (MLE) of $\mathbf{p}$, and let $\mathbf{Y}_n = \sqrt{n}\,(\hat{\mathbf{p}} - \mathbf{p})$. Then
$$\mathbf{Y}_n \xrightarrow{\mathcal{D}} N\!\left(\mathbf{0},\, D_{\mathbf{p}} - \mathbf{p}\,\mathbf{p}^\top\right),$$
where $D_{\mathbf{p}} = \mathrm{Diag}(p_1, p_2, \dots, p_k)$.
Let us explore the covariance matrix in this case:
$$D_{\mathbf{p}} - \mathbf{p}\,\mathbf{p}^\top =
\begin{pmatrix} p_1 & 0 & \cdots & 0 \\ 0 & p_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & p_k \end{pmatrix}
- \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_k \end{pmatrix}
\begin{pmatrix} p_1 & p_2 & \cdots & p_k \end{pmatrix}
= \begin{pmatrix} p_1 - p_1^2 & -p_1 p_2 & \cdots & -p_1 p_k \\ -p_2 p_1 & p_2 - p_2^2 & \cdots & -p_2 p_k \\ \vdots & \vdots & \ddots & \vdots \\ -p_k p_1 & -p_k p_2 & \cdots & p_k - p_k^2 \end{pmatrix}.$$
This means that the covariance matrix $\Sigma_{\mathbf{p}} \in \mathbb{R}^{k\times k}$ we are interested in is of the form
$$(\Sigma_{\mathbf{p}})_{\ell j} = \begin{cases} p_\ell\,(1-p_\ell) & \text{if } \ell = j, \\ -p_\ell\, p_j & \text{if } \ell \neq j. \end{cases}$$
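As a quick numerical sanity check (purely illustrative; all names and settings below are ours), one can build $\Sigma_{\mathbf{p}} = D_{\mathbf{p}} - \mathbf{p}\mathbf{p}^\top$ in R and compare it with the empirical covariance of $\sqrt{n}(\hat{\mathbf{p}} - \mathbf{p})$ over many replicates:

```r
# Compare the theoretical covariance D_p - p p^T with a Monte Carlo estimate.
set.seed(1)
k <- 4; p <- c(0.4, 0.3, 0.2, 0.1); n <- 5000
Sigma_p <- diag(p) - tcrossprod(p)               # D_p - p p^T
counts  <- rmultinom(2000, size = n, prob = p)   # 2000 independent samples
Y       <- sqrt(n) * (t(counts) / n - matrix(p, 2000, k, byrow = TRUE))
round(cov(Y), 3)                                 # should be close to Sigma_p
round(Sigma_p, 3)
```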
The statements above are general and well known. In the following, we obtain new results for the Tsallis and Rényi entropies, and for the Fisher information. For the sake of completeness, we also include the results for the Shannon entropy.
In order to apply the delta method using Theorem 2, we consider the following functions:
$$h_\ell^S(p_1, p_2, \dots, p_k) = p_\ell \log p_\ell,$$
$$h_\ell^T(p_1, p_2, \dots, p_k) = p_\ell - p_\ell^{\,q},$$
$$h_\ell^R(p_1, p_2, \dots, p_k) = p_\ell^{\,q},$$
$$h_\ell^F(p_1, p_2, \dots, p_k) = \left(\sqrt{p_{\ell+1}} - \sqrt{p_\ell}\right)^2,$$
for $\ell = 1, 2, \dots, k$, except for $h_\ell^F$, which is defined for $\ell = 1, 2, \dots, k-1$. The assumptions are verified, and thus,
$$\frac{\partial h_\ell^S}{\partial p_\ell} = \log p_\ell + 1 \quad\text{and}\quad \frac{\partial h_\ell^S}{\partial p_j} = 0 \text{ if } j \neq \ell,$$
$$\frac{\partial h_\ell^T}{\partial p_\ell} = 1 - q\,p_\ell^{\,q-1} \quad\text{and}\quad \frac{\partial h_\ell^T}{\partial p_j} = 0 \text{ if } j \neq \ell,$$
$$\frac{\partial h_\ell^R}{\partial p_\ell} = q\,p_\ell^{\,q-1} \quad\text{and}\quad \frac{\partial h_\ell^R}{\partial p_j} = 0 \text{ if } j \neq \ell,$$
$$\frac{\partial h_\ell^F}{\partial p_j} = \left(\sqrt{p_{\ell+1}} - \sqrt{p_\ell}\right)\frac{(-1)^{\ell+j+1}}{\sqrt{p_j}} \text{ if } j = \ell, \ell+1, \quad\text{and}\quad \frac{\partial h_\ell^F}{\partial p_j} = 0 \text{ if } j \neq \ell, \ell+1.$$
Finally, we need the covariance matrix of the multivariate normal limit distribution, which is
$$\Sigma_{\mathbf{p}}^{\Delta M} = B_M\, \Sigma_{\mathbf{p}}\, B_M^\top, \qquad B_M = \left(\frac{\partial h_\ell^M}{\partial p_j}\right)_{\ell, j=1}^{k},$$
where $M \in \{S, T, R, F\}$. Since $B_M$ is a diagonal matrix for $M \in \{S, T, R\}$, we can use Equation (A1) to conclude that
$$(\Sigma_{\mathbf{p}}^{\Delta S})_{\ell j} = \begin{cases} (p_\ell - p_\ell^2)\,(\log p_\ell + 1)^2 & \text{if } \ell = j, \\ -p_\ell\,p_j\,(\log p_\ell + 1)(\log p_j + 1) & \text{if } \ell \neq j; \end{cases}$$
$$(\Sigma_{\mathbf{p}}^{\Delta T})_{\ell j} = \begin{cases} (p_\ell - p_\ell^2)\,(1 - q\,p_\ell^{\,q-1})^2 & \text{if } \ell = j, \\ -p_\ell\,p_j\,(1 - q\,p_\ell^{\,q-1})(1 - q\,p_j^{\,q-1}) & \text{if } \ell \neq j; \end{cases}$$
$$(\Sigma_{\mathbf{p}}^{\Delta R})_{\ell j} = \begin{cases} q^2\,(p_\ell - p_\ell^2)\,p_\ell^{\,2(q-1)} & \text{if } \ell = j, \\ -q^2\,(p_\ell\,p_j)^q & \text{if } \ell \neq j. \end{cases}$$
In the case of $\Sigma_{\mathbf{p}}^{\Delta F}$, from Equations (A3) and (A4) we have the following:
  • For $\ell, j = 1, 2, \dots, k-2$ with $j \neq \ell-1, \ell, \ell+1$:
    $$(\Sigma_{\mathbf{p}}^{\Delta F})_{\ell j} = \left(\sqrt{p_{\ell+1}}-\sqrt{p_\ell}\right)\left(\sqrt{p_{j+1}}-\sqrt{p_j}\right)\left(\sqrt{p_{\ell+1}p_j}+\sqrt{p_\ell p_{j+1}}-\sqrt{p_\ell p_j}-\sqrt{p_{\ell+1}p_{j+1}}\right).$$
  • For $\ell = 2, \dots, k-2$:
    $$(\Sigma_{\mathbf{p}}^{\Delta F})_{\ell, \ell-1} = \left(\sqrt{p_{\ell+1}}-\sqrt{p_\ell}\right)\left(\sqrt{p_\ell}-\sqrt{p_{\ell-1}}\right)\left(\sqrt{p_{\ell+1}p_{\ell-1}}+p_\ell-1-\sqrt{p_\ell p_{\ell-1}}-\sqrt{p_{\ell+1}p_\ell}\right).$$
  • For $\ell = 1, 2, \dots, k-2$:
    $$(\Sigma_{\mathbf{p}}^{\Delta F})_{\ell \ell} = \left(\sqrt{p_{\ell+1}}-\sqrt{p_\ell}\right)^2\left(2-p_\ell-p_{\ell+1}+2\sqrt{p_\ell\,p_{\ell+1}}\right).$$
  • For $\ell = 1, 2, \dots, k-2$:
    $$(\Sigma_{\mathbf{p}}^{\Delta F})_{\ell, \ell+1} = \left(\sqrt{p_{\ell+1}}-\sqrt{p_\ell}\right)\left(\sqrt{p_{\ell+2}}-\sqrt{p_{\ell+1}}\right)\left(p_{\ell+1}-1+\sqrt{p_\ell\,p_{\ell+2}}-\sqrt{p_\ell\,p_{\ell+1}}-\sqrt{p_{\ell+1}\,p_{\ell+2}}\right).$$
  • For $j = 1, 2, \dots, k-2$:
    $$(\Sigma_{\mathbf{p}}^{\Delta F})_{k-1, j} = \left(\sqrt{p_k}-\sqrt{p_{k-1}}\right)\left(\sqrt{p_{j+1}}-\sqrt{p_j}\right)\left(\sqrt{p_k\,p_{j+1}}-\sqrt{p_{k-1}\,p_j}\right).$$
  • Finally,
    $$(\Sigma_{\mathbf{p}}^{\Delta F})_{k-1, k-1} = \left(\sqrt{p_k}-\sqrt{p_{k-1}}\right)^2\left(1-p_{k-1}\right).$$
Hence, we conclude that
$$\sqrt{n}\,\Big[h_1^M(\hat{p}_1)-h_1^M(p_1),\, h_2^M(\hat{p}_2)-h_2^M(p_2),\, \dots,\, h_{k'}^M(\hat{p}_{k'})-h_{k'}^M(p_{k'})\Big] \xrightarrow{\mathcal{D}} N\!\left(\mathbf{0},\, \Sigma_{\mathbf{p}}^{\Delta M}\right),$$
where $M \in \{S, T, R, F\}$ and $k' = k$ in all cases except that of the Fisher information, for which $k' = k-1$. An equivalent expression is
$$\sqrt{n}\,\Big[h_1^M(\hat{p}_1),\, h_2^M(\hat{p}_2),\, \dots,\, h_{k'}^M(\hat{p}_{k'})\Big] \xrightarrow{\mathcal{D}} N\!\left(\sqrt{n}\begin{pmatrix} h_1^M(p_1) \\ h_2^M(p_2) \\ \vdots \\ h_{k'}^M(p_{k'}) \end{pmatrix},\, \Sigma_{\mathbf{p}}^{\Delta M}\right).$$
If $\mathbf{Y}$ is a vector of random variables such that $\sqrt{n}\,\mathbf{Y} \xrightarrow{\mathcal{D}} N(\sqrt{n}\,\boldsymbol{\mu}, \Sigma)$, then it can be proved that $\mathrm{E}(\sqrt{n}\,\mathbf{Y}) \approx \sqrt{n}\,\boldsymbol{\mu}$ and $\mathrm{Var}(\sqrt{n}\,\mathbf{Y}) \approx \Sigma$. By well-known properties, it holds that $\mathrm{E}(\mathbf{Y}) \approx \boldsymbol{\mu}$ and $\mathrm{Var}(\mathbf{Y}) \approx \frac{1}{n}\Sigma$. Applying this to (28),
$$\Big[h_1^M(\hat{p}_1),\, h_2^M(\hat{p}_2),\, \dots,\, h_{k'}^M(\hat{p}_{k'})\Big] \xrightarrow{\mathcal{D}} N\!\left(\begin{pmatrix} h_1^M(p_1) \\ h_2^M(p_2) \\ \vdots \\ h_{k'}^M(p_{k'}) \end{pmatrix},\, \frac{1}{n}\Sigma_{\mathbf{p}}^{\Delta M}\right).$$
Now, using (29), we find the asymptotic distribution of (2)–(5). In order to do so, we need to know the distribution of the sum of k Gaussian random variables with different means and an arbitrary covariance matrix.
For any $k$-dimensional multivariate normal random vector $\mathbf{Z} \sim N(\boldsymbol{\mu}, \Sigma)$, with $\boldsymbol{\mu} \in \mathbb{R}^k$ and covariance matrix $\Sigma = (\sigma_{\ell j})$, it holds that the distribution of $W = \mathbf{a}^\top \mathbf{Z}$, with $\mathbf{a} \in \mathbb{R}^k$, is $N\!\left(\mathbf{a}^\top\boldsymbol{\mu},\; \sum_{\ell=1}^{k} a_\ell^2\,\sigma_{\ell\ell} + 2\sum_{\ell=1}^{k-1}\sum_{j=\ell+1}^{k} a_\ell\,a_j\,\sigma_{\ell j}\right)$. Using the limit distribution presented in (29) and $\mathbf{a} = (-1, -1, \dots, -1)$, we directly obtain the asymptotic distribution of the Shannon entropy as follows:
$$H_S(\hat{\mathbf{p}}) = -\sum_{\ell=1}^{k} \hat{p}_\ell \log \hat{p}_\ell \xrightarrow{\mathcal{D}} N\!\left(-\sum_{\ell=1}^{k} p_\ell \log p_\ell,\;\; \frac{1}{n}\sum_{\ell=1}^{k} p_\ell(1-p_\ell)(\log p_\ell + 1)^2 - \frac{2}{n}\sum_{j=1}^{k-1}\sum_{\ell=j+1}^{k} p_\ell\,p_j\,(\log p_\ell + 1)(\log p_j + 1)\right).$$
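A short Monte Carlo sketch (ours, with arbitrary settings) illustrates this result under the Linear model of Section 4; the asymptotic variance is computed as the quadratic form $\mathbf{g}^\top \Sigma_{\mathbf{p}}\, \mathbf{g}/n$ with $g_\ell = \log p_\ell + 1$, which equals the expression above:

```r
# Monte Carlo check of the asymptotic law of the plug-in Shannon entropy.
set.seed(42)
k <- 6; p <- 2 * (1:k) / (k * (k + 1)); n <- 1000 * k        # Linear model
shannon <- function(pr) { s <- pr[pr > 0]; -sum(s * log(s)) }
H_rep <- replicate(300, shannon(rmultinom(1, n, p)[, 1] / n))
g        <- log(p) + 1                                       # gradient of p log p
Sigma_p  <- diag(p) - tcrossprod(p)
asy_mean <- -sum(p * log(p))
asy_var  <- drop(t(g) %*% Sigma_p %*% g) / n                 # same as the formula above
c(mc_mean = mean(H_rep), asy_mean = asy_mean)
c(mc_var  = var(H_rep),  asy_var  = asy_var)
```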
With similar arguments and $\mathbf{a} = \frac{1}{q-1}(1, 1, \dots, 1)$, we obtain the asymptotic distribution for the Tsallis entropy of order $q$:
$$H_T^q(\hat{\mathbf{p}}) = \sum_{\ell=1}^{k} \frac{\hat{p}_\ell - \hat{p}_\ell^{\,q}}{q-1} \xrightarrow{\mathcal{D}} N\!\left(\sum_{\ell=1}^{k} \frac{p_\ell - p_\ell^{\,q}}{q-1},\;\; \frac{\sum_{\ell=1}^{k} (p_\ell - p_\ell^2)(1 - q\,p_\ell^{\,q-1})^2}{n\,(q-1)^2} - \frac{2\sum_{j=1}^{k-1}\sum_{\ell=j+1}^{k} p_\ell\,p_j\,(1 - q\,p_\ell^{\,q-1})(1 - q\,p_j^{\,q-1})}{n\,(q-1)^2}\right).$$
The procedure is analogous for the Fisher information, but with $\mathbf{a} = (1, 1, \dots, 1) \in \mathbb{R}^{k-1}$. Hence, it can be proved that
$$H_F(\hat{\mathbf{p}}) = F_0 \sum_{\ell=1}^{k-1}\left(\sqrt{\hat{p}_{\ell+1}}-\sqrt{\hat{p}_\ell}\right)^2 \xrightarrow{\mathcal{D}} N\!\left(F_0 \sum_{\ell=1}^{k-1}\left(\sqrt{p_{\ell+1}}-\sqrt{p_\ell}\right)^2,\; \frac{F_0}{n}\,\Sigma^*\right),$$
where
$$\begin{aligned}
\Sigma^* = {} & \left(\sqrt{p_k}-\sqrt{p_{k-1}}\right)^2\left(1-p_{k-1}\right) + \sum_{\ell=1}^{k-2}\left(\sqrt{p_{\ell+1}}-\sqrt{p_\ell}\right)^2\left(2\sqrt{p_\ell\,p_{\ell+1}}-p_\ell-p_{\ell+1}+2\right) \\
& + 2\sum_{\ell=3}^{k-2}\sum_{j=1}^{\ell-2}\left(\sqrt{p_{\ell+1}}-\sqrt{p_\ell}\right)\left(\sqrt{p_{j+1}}-\sqrt{p_j}\right)\left(\sqrt{p_{\ell+1}p_j}+\sqrt{p_\ell p_{j+1}}-\sqrt{p_\ell p_j}-\sqrt{p_{\ell+1}p_{j+1}}\right) \\
& + 2\sum_{j=1}^{k-2}\left(\sqrt{p_k}-\sqrt{p_{k-1}}\right)\left(\sqrt{p_{j+1}}-\sqrt{p_j}\right)\left(\sqrt{p_k\,p_{j+1}}-\sqrt{p_{k-1}\,p_j}\right) \\
& + 2\sum_{\ell=2}^{k-2}\left(\sqrt{p_{\ell+1}}-\sqrt{p_\ell}\right)\left(\sqrt{p_\ell}-\sqrt{p_{\ell-1}}\right)\left(\sqrt{p_{\ell+1}p_{\ell-1}}-\sqrt{p_\ell p_{\ell-1}}-\sqrt{p_{\ell+1}p_\ell}+p_\ell-1\right).
\end{aligned}$$
To obtain expression (33), we use the symmetry of the covariance matrix, which implies that $\sum_{\ell=1}^{k-1}\sum_{j=\ell+1}^{k} a_\ell a_j \sigma_{\ell j} = \sum_{\ell=2}^{k}\sum_{j=1}^{\ell-1} a_\ell a_j \sigma_{\ell j}$. It is worth noticing that the expression of the covariance matrix for the Fisher information is more complicated than those of the previously analyzed entropies, since the matrix of partial derivatives is not diagonal in this case.
The case of the Rényi entropy is different because, following the previous methodology, we can prove that
$$\sum_{\ell=1}^{k} \hat{p}_\ell^{\,q} \xrightarrow{\mathcal{D}} N\!\left(\sum_{\ell=1}^{k} p_\ell^{\,q},\;\; \frac{1}{n}\sum_{\ell=1}^{k} q^2 (p_\ell - p_\ell^2)\, p_\ell^{\,2(q-1)} - \frac{2}{n}\sum_{\ell=1}^{k-1}\sum_{j=\ell+1}^{k} q^2 (p_j\, p_\ell)^q\right).$$
Hence,
$$H_R^q(\hat{\mathbf{p}}) = \frac{1}{1-q}\log \sum_{\ell=1}^{k} \hat{p}_\ell^{\,q} \xrightarrow{\mathcal{D}} P_R^q,$$
where
$$P_R^q(x) = \frac{|1-q|}{\sigma^*\sqrt{2\pi}}\,\exp\!\big[(1-q)x - \log(k)\big]\,\exp\!\left(-\frac{1}{2}\left[\frac{\exp\!\big[(1-q)x - \log(k)\big]-\mu^*}{\sigma^*}\right]^2\right),$$
with $\mu^* = \sum_{\ell=1}^{k} p_\ell^{\,q}$ and $\sigma^{*2} = \frac{1}{n}\sum_{\ell=1}^{k} q^2 (p_\ell - p_\ell^2)\, p_\ell^{\,2(q-1)} - \frac{2}{n}\sum_{\ell=1}^{k-1}\sum_{j=\ell+1}^{k} q^2 (p_\ell\, p_j)^q$. Notice that this is not a normal distribution but the distribution of the logarithm of the absolute value of a normally distributed random variable.
Often, in practice, these entropies are scaled to lie in $[0, 1]$; these are called “normalized entropies”. The following modifications must be considered for the normalized versions. For the normalized Shannon entropy, the asymptotic mean and variance are multiplied by $1/\log k$ and $1/(\log k)^2$, respectively. In the case of the normalized Tsallis entropy, the asymptotic mean and variance are multiplied by $(q-1)/(1-k^{1-q})$ and $(q-1)^2/(1-k^{1-q})^2$, respectively. Finally, the asymptotic distribution of the normalized Rényi entropy is $\widetilde{P}_R^q(x) = \log k \cdot P_R^q(x \log k)$. Notice that normalized entropies do not depend on the basis of the logarithm. The Fisher information, as defined in (5), is already normalized.
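A tiny helper sketch (our function names) makes these scalings explicit; each entropy is divided by its maximum value, attained at the uniform distribution:

```r
# Normalizations described above (the Fisher information is already in [0, 1]).
norm_shannon <- function(H, k)    H / log(k)
norm_tsallis <- function(H, k, q) H * (q - 1) / (1 - k^(1 - q))
norm_renyi   <- function(H, k)    H / log(k)
```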

4. Analysis and Validation

In this section, we study the empirical distribution of the entropies computed from $\hat{\mathbf{p}}$ under three models, four numbers of categories ($k \in \{6, 24, 120, 720\}$), and three sample sizes ($n \in \{10^2 k, 10^3 k, 10^4 k\}$) that depend on the number of categories. These choices of $k$ and $n$ are based on the values that appear in signal analysis with ordinal patterns; see details of this technique in Appendix A.2.
We considered the following probability functions $\mathbf{p} = (p_1, p_2, \dots, p_k)$:
  • Linear: $p_\ell = 2\ell/(k(k+1))$, $1 \leq \ell \leq k$.
  • One-Almost-Zero: $p_\ell = 1/k$ for $1 \leq \ell \leq k-2$, $p_{k-1} = \epsilon_0$, and $p_k = 2/k - \epsilon_0$, with $\epsilon_0 = 2.220446 \times 10^{-16}$ (the smallest positive number for which, on our computer platform, $1 + \epsilon_0 > 1$).
  • Half-and-Half: $p_\ell = 1/k + \epsilon/k$ for $1 \leq \ell \leq k/2$, and $p_\ell = 1/k - \epsilon/k$ for $k/2 + 1 \leq \ell \leq k$, with $\epsilon \in \{0.1, 0.3, 0.5, 0.8\}$.
These probability functions are illustrated, for $k = 6$ and $\epsilon = 0.3$, in Figure 1. We studied the behavior of the Shannon entropy, the Rényi entropy with $q \in \{1/3, 2/3\}$, the Tsallis entropy with $q \in \{1/2, 3/2\}$, and the Fisher information, computed on samples of sizes $n \in \{10^2 k, 10^3 k, 10^4 k\}$. We used 300 independent samples (replicates).
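The three probability models can be generated, for instance, as follows (a sketch with our own function names; eps0 is the machine epsilon quoted above, and the Half-and-Half model assumes an even k, as is the case for every k used here):

```r
# Probability vectors for the Linear, One-Almost-Zero, and Half-and-Half models.
linear_model <- function(k) 2 * (1:k) / (k * (k + 1))
oaz_model    <- function(k, eps0 = .Machine$double.eps)
  c(rep(1 / k, k - 2), eps0, 2 / k - eps0)
hah_model    <- function(k, eps)
  c(rep(1 / k + eps / k, k / 2), rep(1 / k - eps / k, k / 2))

sapply(list(linear_model(6), oaz_model(6), hah_model(6, 0.3)), sum)  # all equal 1
```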
Although Equation (35) shows that the Rényi entropy is not asymptotically normal, we verified that its density is similar to that of a Gaussian distribution. With this in mind, we also checked the normality of the Rényi entropies. We used the Anderson–Darling test to verify the null hypothesis that the data follow a normal distribution. We chose this test because it uses the hypothesized distribution in the calculation of critical values, which makes it more sensitive than other alternatives; see, for instance, the book by Lehmann and Romano [14].
From Table 1, we notice that the Fisher information is the one that most often fails to pass the normality test at the 1% level. The situation that appears with p-value = 0.0010 in the table has, in fact, p-value = $9.606130 \times 10^{-3}$; the table shows rounded values. Figure 2 shows four of these cases, namely for $k = 6$, $n = 600$, and $\epsilon = 0.1, 0.3, 0.5, 0.8$. We notice that the deviation from the normal hypothesis is more prevalent in both tails, where the observations are larger than the theoretical quantiles.
The normality hypothesis was rejected at the 1% level by the Anderson–Darling test in only 24 out of 432 situations, showing that the asymptotic Gaussian model for the entropies is a good description for these data. Table 1 shows those situations.
To assess the goodness of fit of the asymptotic models, we applied the Kolmogorov–Smirnov test to fifty replicates of the samples. Table 2 shows the situations in which the p-value of the test is at least 0.05.
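The following sketch (illustrative settings of ours, not those of Table 2) reproduces the spirit of this check for the Shannon entropy: fifty plug-in entropies are compared with the asymptotic normal model via the base R ks.test:

```r
# Kolmogorov-Smirnov check of the asymptotic normal model for H_S.
set.seed(7)
k <- 6; n <- 1000 * k
p <- c(rep(1.3 / k, k / 2), rep(0.7 / k, k / 2))             # Half-and-Half, eps = 0.3
shannon <- function(pr) { s <- pr[pr > 0]; -sum(s * log(s)) }
H_rep   <- replicate(50, shannon(rmultinom(1, n, p)[, 1] / n))
g       <- log(p) + 1
Sigma_p <- diag(p) - tcrossprod(p)
ks.test(H_rep, "pnorm",
        mean = -sum(p * log(p)),
        sd   = sqrt(drop(t(g) %*% Sigma_p %*% g) / n))$p.value
```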
It is worth noticing that, even in those cases where the p-value is less than 0.05, the asymptotic models fit the data well, as can be seen in the examples exhibited in Figure 3. The Fisher information shows the worst fit. Additionally, notice in Figure 3d that, although the asymptotic distribution of the Rényi entropy is not normal, its probability density function is visually very close to the Gaussian model. We verified this similarity in all the cases we considered.

5. Application

Inspired by an example from Agresti [16] (p. 200), we extracted data from the General Social Survey (GSS, a project of the independent research organization NORC at the University of Chicago, with principal funding from the National Science Foundation, available at https://gss.norc.org/. The data were downloaded on 24 December 2022). Table 3 shows the level of agreement to the assertion “Religious people are often too intolerant” as measured in three years.
The p-values of pairwise χ² tests for the null hypotheses that the underlying probabilities are equal are
  • 1998 and 2008: $3.43 \times 10^{-22}$,
  • 1998 and 2018: $2.01 \times 10^{-8}$,
  • 2008 and 2018: $1.06 \times 10^{-3}$.
On the one hand, these values attest that 1998 differs markedly from both 2008 and 2018. On the other hand, although still significant, the change between 2008 and 2018 is much less pronounced.
Table 4 shows the asymptotic mean and variance (in normalized-entropy units) of the entropies of the proportions reported in Table 3.
We perform the same hypothesis test with the asymptotic quantities presented in Table 4. Table 5 shows the p-values of the null hypothesis that the entropies are equal, using the test discussed by Chagas et al. [8] (Section 5):
$$\text{p-value} \approx 2\left[1 - \Phi\!\left(\frac{\big|H(\hat{\mathbf{p}}_1) - H(\hat{\mathbf{p}}_2)\big|}{\sqrt{\widehat{\sigma^2}(n_1, \hat{\mathbf{p}}_1) + \widehat{\sigma^2}(n_2, \hat{\mathbf{p}}_2)}}\right)\right],$$
where $\Phi$ is the cumulative distribution function of a standard normal random variable, $H$ is any of the considered entropies computed with the observed proportions $\hat{\mathbf{p}}_i$, $i = 1, 2$, and $\widehat{\sigma^2}(n_i, \hat{\mathbf{p}}_i)$ is the corresponding sample asymptotic variance, which takes into account the sample size $n_i$. Notice that the test based on entropies compares only these features, and not the underlying distributions.
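A direct implementation of this test is straightforward; the sketch below (our names) takes two entropy values and their asymptotic variances (already divided by the respective sample sizes, as reported in Tables 4 and 6) and returns the two-sided p-value. As a check, feeding it the normalized Shannon values of 1998 and 2008 from Table 4 reproduces the essentially null p-value reported in Table 5.

```r
# Two-sample entropy test of Equation (37).
entropy_test <- function(H1, v1, H2, v2) {
  z <- abs(H1 - H2) / sqrt(v1 + v2)   # standardized difference
  2 * (1 - pnorm(z))                  # two-sided asymptotic p-value
}
entropy_test(0.914, 0.0000724, 0.839, 0.0001081)   # H_S, 1998 vs. 2008 (Table 4)
```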
The results in Table 5 are consistent with those provided by the χ² tests, i.e., the most significant differences arise between 1998 and 2008 and between 1998 and 2018. Moreover, the tests based on entropies do not reject the null hypothesis for the pair 2008–2018, except for the Rényi entropy of order 2/3. The increased p-values are a consequence of the information reduction: whereas the χ² test compares count by count, the tests based on entropies compare two scalars.
In the second part of this application, we will illustrate the use of test statistics based on entropies for comparing samples with different categories. Situations like this may appear when applying alternative versions of the same questionnaire in a series of surveys.
We collapsed the categories of 1998 into three: “agreement” (by adding “strongly agree” and “agree”), “indifference” (“not agree/disagree”), and “disagreement” (by adding “disagree” and “strong disagree”). The resulting asymptotic mean entropies and asymptotic variances are shown in Table 6.
Table 7 presents the p-values of the tests that verify the null hypothesis of the same entropy between the collapsed 1998 data (three categories), and 2008 and 2018 (five categories). These results agree with those presented in Table 5. Such an agreement suggests that, although the number of categories was reduced in 1998 from five to three, the tests based on entropies cope with the loss of information.

6. Conclusions

We presented expressions for the asymptotic distributions of the Rényi and Tsallis entropies of order q, and of the Fisher information. The Fisher information and the Tsallis and Shannon entropies have normal limit distributions, with means and variances that depend on the underlying probabilities of the patterns and on the number of patterns. The Rényi entropy follows, asymptotically, a different distribution, cf. (35), but a Gaussian law approximates it well. These expressions pose no numerical challenges other than setting $0 \log 0 \equiv 0$. We verified that the asymptotic distributions are good models for data arising both from simulations with a variety of models and from the analysis of actual data.
On the one hand, the Fisher information is the measure that most frequently fails the Anderson–Darling normality test. On the other hand, it provides no evidence against the same hypothesis under the One-Almost-Zero model.
The distributions we present here can be used for building test statistics, as discussed by Chagas et al. [8]. Moreover, Equation (37) allows performing tests with mixed types of distributions, a situation that may appear in Internet of Things applications, in which, citing Borges et al. [17], one has to deal with “large time series data generated at different rates, of different types and magnitudes, possibly having issues concerning uncertainty, inconsistency, and incompleteness due to missing readings and sensor failures.”

Author Contributions

Conceptualization, A.A.R., A.C.F. and J.G.; methodology, A.A.R., A.C.F., M.L., J.G. and H.S.R.; software, A.A.R., A.C.F., M.L., J.G. and E.T.C.C.; validation, A.A.R., A.C.F., M.L., J.G., E.T.C.C. and H.S.R.; formal analysis, A.A.R., A.C.F., M.L., J.G., E.T.C.C. and H.S.R.; investigation, A.A.R., A.C.F., M.L., J.G., E.T.C.C. and H.S.R.; resources, A.C.F. and H.S.R.; data curation, A.A.R., A.C.F., M.L., J.G., E.T.C.C. and H.S.R.; writing—original draft preparation, A.A.R., A.C.F., M.L., J.G. and E.T.C.C.; writing—review and editing, A.A.R., A.C.F., M.L., J.G., E.T.C.C. and H.S.R.; visualization, A.A.R., A.C.F., M.L., J.G. and E.T.C.C.; supervision, A.C.F. and H.S.R.; project administration, A.A.R., A.C.F. and J.G.; funding acquisition, A.C.F. and H.S.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research and the APC were funded by Project 410695 from Victoria University of Wellington. It was also partially funded by Project 2020/05121-4 from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), and project APQ-00426-22 from Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

$\mathbf{p}$: vector of probabilities
$\mathbf{p}^\top$: the transpose of $\mathbf{p}$
$\hat{\mathbf{p}}$: an estimator of $\mathbf{p}$
$\mathbf{N}$: multivariate discrete random variable
$\mathbf{n}$: a sample from $\mathbf{N}$
$H_S$: Shannon entropy
$H_T^q$: Tsallis entropy of order $q$
$H_R^q$: Rényi entropy of order $q$
$H_F$: Fisher information measure
$\Sigma$: covariance matrix

Appendix A

Appendix A.1. Matrix Operations

Consider a real matrix $M \in \mathbb{R}^{k\times k}$, and denote by $M^\top$ its transpose. If $D = \mathrm{Diag}(d_1, d_2, \dots, d_k) \in \mathbb{R}^{k\times k}$, then
$$(D M D)_{ij} = \sum_{r=1}^{k} (DM)_{ir} D_{rj} = \sum_{r=1}^{k}\sum_{s=1}^{k} D_{is} M_{sr} D_{rj} = D_{ii} M_{ij} D_{jj} = \begin{cases} d_i^2\, M_{ii} & \text{if } i = j, \\ d_i\, d_j\, M_{ij} & \text{if } i \neq j. \end{cases} \tag{A1}$$
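A one-line numerical check of this identity (with arbitrary illustrative matrices of ours) is, for example:

```r
# (D M D)_{ij} = d_i d_j M_{ij} for diagonal D.
set.seed(3)
k <- 4
d <- runif(k); M <- matrix(rnorm(k * k), k, k)
all.equal(diag(d) %*% M %*% diag(d), outer(d, d) * M)   # TRUE
```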
We now consider
$$B = \begin{pmatrix}
b_{11} & b_{12} & 0 & \cdots & 0 & 0 \\
0 & b_{22} & b_{23} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{k-1,k-1} & b_{k-1,k} \\
0 & 0 & 0 & \cdots & 0 & b_{kk}
\end{pmatrix} \in \mathbb{R}^{k\times k}.$$
Analogously to the computation in Equation (A1), it can be seen that
$$(B M B^\top)_{ij} = \sum_{r=1}^{k}\sum_{s=1}^{k} B_{is} M_{sr} (B^\top)_{rj} = \sum_{r=1}^{k}\sum_{s=1}^{k} B_{is} M_{sr} B_{jr}. \tag{A2}$$
Due to the form of $B$, if $i, j = 1, 2, \dots, k-1$, then
$$(B M B^\top)_{ij} = \sum_{r=1}^{k} \big(B_{ii} M_{ir} + B_{i,i+1} M_{i+1,r}\big) B_{jr} = \big(B_{ii} M_{ij} + B_{i,i+1} M_{i+1,j}\big) B_{jj} + \big(B_{ii} M_{i,j+1} + B_{i,i+1} M_{i+1,j+1}\big) B_{j,j+1}. \tag{A3}$$
If $i = k$, replacing in Equation (A2),
$$(B M B^\top)_{kj} = \sum_{r=1}^{k} B_{kk} M_{kr} B_{jr} = \begin{cases} B_{kk}\big(M_{kj} B_{jj} + M_{k,j+1} B_{j,j+1}\big) & \text{if } j \neq k, \\ B_{kk}^2\, M_{kk} & \text{if } j = k. \end{cases} \tag{A4}$$

Appendix A.2. Ordinal Patterns

Symbolic data analysis [18] encompasses methods that study the statistical properties of data aggregated by criteria that meet some scientific question. Such methods have attracted lots of attention because they present competitive results in many data analysis applications [17,19,20].
Ordinal patterns [21] belong to this class of techniques. They impose low computational complexity and are inherently robust. This approach consists of constructing a set of symbolic ordinal patterns based on intrinsic data characteristics without any prior model. Ordinal patterns often reveal and quantify the underlying time series dynamics. In spite of their successful application to biomedicine, economics, mechanics and electronics engineering, image analysis and remote sensing, to name a few (see, for instance, Refs. [20,22,23]), little is known about the statistical properties of the features they induce. One of these features is entropy, in its several forms.
Signal analysis with ordinal patterns requires coding $D$ consecutive observations into one of $k = D!$ categories, where $D$ is typically small [8,20,24]. Motivated by these applications, we chose $k \in \{6, 24, 120, 720\}$, which allows checking the results for various numbers of categories. Bear in mind that, when using ordinal patterns, successive patterns are not independent and, thus, the multinomial distribution is an approximation.
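For reference, a minimal ordinal-pattern coder in R could look as follows (our own sketch, not the authors' implementation); each window of D consecutive values is mapped to one of k = D! permutation patterns, and the pattern proportions play the role of p̂:

```r
# Proportions of ordinal patterns of length D in a time series x.
ordinal_proportions <- function(x, D = 3) {
  n_win <- length(x) - D + 1
  pats  <- vapply(seq_len(n_win),
                  function(i) paste(order(x[i:(i + D - 1)]), collapse = ""),
                  character(1))
  table(pats) / n_win        # estimated probabilities of the k = D! patterns
}
set.seed(10)
round(ordinal_proportions(rnorm(6000), D = 3), 3)   # near 1/6 each for white noise
```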

Appendix A.3. Computational Information

This article was written in Rmarkdown and is fully reproducible. We used RStudio version 2022.07.2 and R version 4.2.1. The code and data are available at https://gitlab.ecs.vuw.ac.nz/freryal/asymptotic-distribution-of-various-types-of-entropyunder-the-multinomial-law, accessed on 26 April 2023.

References

  1. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Discrete Multivariate Distributions; Wiley-Interscience: Hoboken, NJ, USA, 1997. [Google Scholar]
  2. Modis, T. Links between entropy, complexity, and the technological singularity. Technol. Forecast. Soc. Chang. 2022, 176, 121457. [Google Scholar] [CrossRef]
  3. Hutcheson, K. A test for comparing diversities based on the Shannon formula. J. Theor. Biol. 1970, 29, 151–154. [Google Scholar] [CrossRef] [PubMed]
  4. Hutcheson, K.; Shenton, L.R. Some moments of an estimate of Shannon’s measure of information. Commun. Stat. Theory Methods 1974, 3, 89–94. [Google Scholar] [CrossRef]
  5. Jacquet, P.; Szpankowski, W. Entropy computations via analytic depoissonization. IEEE Trans. Inf. Theory 1999, 45, 1072–1081. [Google Scholar] [CrossRef]
  6. Cichoń, J.; Golębiewski, Z. On Bernoulli Sums and Bernstein Polynomials. In Proceedings of the 23rd International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms, Montreal, QC, Canada, 18–22 June 2012; pp. 179–190. [Google Scholar]
  7. Cook, G.W.; Kerridge, D.F.; Pryce, J.D. Estimations of Functions of a Binomial Parameter. Sankhyā Indian J. Stat. Ser. A 1974, 36, 443–448. [Google Scholar]
  8. Chagas, E.T.C.; Frery, A.C.; Gambini, J.; Lucini, M.M.; Ramos, H.S.; Rey, A.A. Statistical Properties of the Entropy from Ordinal Patterns. Chaos Interdiscip. J. Nonlinear Sci. 2022, 32, 113118. [Google Scholar] [CrossRef] [PubMed]
  9. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487. [Google Scholar] [CrossRef]
  10. Rényi, A. On Measures of Entropy and Information. In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1961; Volume 1, pp. 547–561. [Google Scholar]
  11. Frieden, B.R. Science from Fisher Information: A Unification; Cambridge University Press: Cambridge, UK, 2010. [Google Scholar]
  12. Sánchez-Moreno, P.; Yáñez, R.J.; Dehesa, J.S. Discrete densities and Fisher information. In Proceedings of the 14th International Conference on Difference Equations and Applications, Istanbul, Turkey, 19–23 October 2009; pp. 291–298. [Google Scholar]
  13. Lehmann, E.L.; Casella, G. Theory of Point Estimation; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
  14. Lehmann, E.L.; Romano, J.P. Testing Statistical Hypotheses, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  15. Freedman, D.; Diaconis, P. On the histogram as a density estimator: L2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 1981, 57, 453–476. [Google Scholar] [CrossRef]
  16. Agresti, A. An Introduction to Categorical Data Analysis; Wiley-Interscience: Hoboken, NJ, USA, 2007. [Google Scholar]
  17. Borges, J.B.; Ramos, H.S.; Loureiro, A.A.F. A Classification Strategy for Internet of Things Data Based on the Class Separability Analysis of Time Series Dynamics. ACM Trans. Internet Things 2022, 3, 1–30. [Google Scholar] [CrossRef]
  18. Beranger, B.; Lin, H.; Sisson, S. New models for symbolic data analysis. Adv. Data Anal. Classif. 2022, 1–41. [Google Scholar] [CrossRef]
  19. Borges, J.B.; Medeiros, J.P.S.; Barbosa, L.P.A.; Ramos, H.S.; Loureiro, A.A. IoT Botnet Detection based on Anomalies of Multiscale Time Series Dynamics. IEEE Trans. Knowl. Data Eng. 2022; Early Access. [Google Scholar] [CrossRef]
  20. Chagas, E.T.C.; Frery, A.C.; Rosso, O.A.; Ramos, H.S. Analysis and Classification of SAR Textures using Information Theory. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 663–675. [Google Scholar] [CrossRef]
  21. Bandt, C.; Pompe, B. Permutation entropy: A natural complexity measure for time series. Phys. Rev. Lett. 2002, 88, 174102. [Google Scholar] [CrossRef] [PubMed]
  22. Zanin, M.; Zunino, L.; Rosso, O.A.; Papo, D. Permutation Entropy and Its Main Biomedical and Econophysics Applications: A Review. Entropy 2012, 14, 1553–1577. [Google Scholar] [CrossRef]
  23. Sigaki, H.Y.D.; Perc, M.; Ribeiro, H.V. History of art paintings through the lens of entropy and complexity. Proc. Natl. Acad. Sci. USA 2018, 115, E8585–E8594. [Google Scholar] [CrossRef] [PubMed]
  24. Chagas, E.T.C.; Queiroz-Oliveira, M.; Rosso, O.A.; Ramos, H.S.; Freitas, C.G.S.; Frery, A.C. White Noise Test from Ordinal Patterns in the Entropy-Complexity Plane. Int. Stat. Rev. 2022, 90, 374–396. [Google Scholar] [CrossRef]
Figure 1. Linear, One-Almost-Zero, and Half-and-Half probability functions for k = 6 and ϵ = 0.3 .
Figure 2. Empirical densities and normal QQ-plots of the Fisher information in situations that fail to pass the normality test at 1%.
Figure 3. Examples of cases where the null hypothesis of the Kolmogorov–Smirnov test is rejected. The histograms are computed with samples of size 300 using the Freedman–Diaconis rule [15], and the green lines are the asymptotic probability density functions. (a) Type: $H_S$, Model: OAZ, $k = 120$, $n = 10^4 k$, p-val = 0.00202; (b) Type: $H_F$, Model: OAZ, $k = 120$, $n = 10^4 k$, p-val = 0.00013; (c) Type: $H_R^{1/3}$, Model: Linear, $k = 24$, $n = 10^4 k$, p-val ≈ 0; (d) Type: $H_T^{1/2}$, Model: HaH, $\epsilon = 0.8$, $k = 6$, $n = 10^4 k$, p-val = 0.04297.
Table 1. Situations for which the p-values of the Anderson–Darling test for the normality of samples of size 300 are less than 0.01 (“HF” stands for the Fisher information; “HaH” and “OAZ” are the Half-And-Half and One-Almost-Zero models).
Type        Model    ε     k     n         p-value
H_F         HaH      0.8   6     600       0.0030
H_F         HaH      0.1   6     600       0.0000
H_F         HaH      0.1   24    2400      0.0000
H_F         HaH      0.8   24    2400      0.0064
H_F         HaH      0.1   6     6000      0.0000
H_F         HaH      0.3   6     600       0.0000
H_F         HaH      0.1   120   120,000   0.0089
H_F         HaH      0.1   24    24,000    0.0004
H_F         Linear   0     6     600       0.0000
H_F         Linear   0     24    2400      0.0000
H_R^{1/3}   HaH      0.1   6     600       0.0000
H_R^{1/3}   HaH      0.1   24    2400      0.0001
H_R^{1/3}   OAZ      0     24    2400      0.0000
H_R^{2/3}   HaH      0.1   6     600       0.0000
H_R^{2/3}   HaH      0.1   24    2400      0.0001
H_R^{2/3}   OAZ      0     24    2400      0.0000
H_S         HaH      0.1   6     600       0.0001
H_S         HaH      0.1   24    2400      0.0002
H_S         OAZ      0     24    2400      0.0000
H_T^{1/2}   HaH      0.1   6     600       0.0000
H_T^{1/2}   HaH      0.1   24    2400      0.0001
H_T^{1/2}   OAZ      0     24    2400      0.0000
H_T^{3/2}   HaH      0.1   6     600       0.0001
H_T^{3/2}   HaH      0.1   24    2400      0.0003
H_T^{3/2}   OAZ      0     24    2400      0.0000
Table 2. Situations for which the p-values of the Kolmogorov–Smirnov test of samples of size 50 are larger than or equal to 0.05 (“HaH” and “OAZ” are the Half-And-Half and One-Almost-Zero models).
Type        Model    ε     k            n
H_S         HaH      0.1   6, 24        10^3 k, 10^4 k
                           120, 720     10^4 k
                     0.3   6, 120       for all
                           24           10^3 k
                           720          10^3 k, 10^4 k
                     0.5   6, 24        for all
                           120, 720     10^3 k, 10^4 k
                     0.8   6            10^2 k, 10^3 k
                           24, 720      for all
                           120          10^3 k, 10^4 k
            Linear   0     6, 24, 120   for all
                           720          10^3 k, 10^4 k
            OAZ      0     6, 24        for all
H_F         HaH      0.1   6            10^3 k
                     0.3   6, 24        for all
                     0.5   6            10^2 k, 10^3 k
                           24           10^3 k, 10^4 k
                           120          10^4 k
                     0.8   6            10^2 k, 10^3 k
                           24           for all
            Linear   0     6            10^3 k, 10^4 k
                           24           10^4 k
            OAZ      0     6, 24        for all
H_T^{1/2}   HaH      0.1   6, 24        10^3 k, 10^4 k
                           120, 720     10^4 k
                     0.3   6, 120       for all
                           24           10^3 k
                           720          10^3 k, 10^4 k
                     0.5   6, 24        for all
                           120, 720     10^3 k, 10^4 k
                     0.8   6            10^2 k, 10^3 k
                           24           for all
                           120, 720     10^3 k, 10^4 k
            Linear   0     for all      for all
            OAZ      0     6, 24        for all
                           120          10^3 k, 10^4 k
                           720          10^4 k
H_T^{3/2}   HaH      0.1   6, 24        10^3 k, 10^4 k
                           120, 720     10^4 k
                     0.3   6, 120       for all
                           24           10^3 k
                           720          10^3 k, 10^4 k
                     0.5   6, 24        for all
                           120, 720     10^3 k, 10^4 k
                     0.8   6            10^2 k, 10^3 k
                           24, 720      for all
                           120          10^3 k, 10^4 k
            Linear   0     6, 24, 120   for all
                           720          10^2 k, 10^4 k
            OAZ      0     6, 24        for all
H_R^{1/3}   HaH      0.3   6            10^2 k, 10^3 k
                     0.5   6            for all
                     0.8   6            10^2 k, 10^3 k
            Linear   0     6            for all
            OAZ      0     6            for all
                           24           10^2 k
H_R^{2/3}   HaH      0.3   6            for all
                     0.5   6            for all
                     0.8   6            10^2 k, 10^3 k
            Linear   0     6            for all
            OAZ      0     6            10^2 k, 10^3 k
Table 3. GSS data about religious intolerance.
Year                   1998   2008   2018
STRONGLY AGREE          148    285    186
AGREE                   429    602    496
NOT AGREE/DISAGREE      278    210    229
DISAGREE                275    196    181
STRONG DISAGREE          72     30     38
Total                  1202   1323   1130
Table 4. Asymptotic mean and variance of entropies.
            Mean                     Variance
            1998    2008    2018     1998        2008        2018
H_S         0.914   0.839   0.863    0.0000724   0.0001081   0.0001175
H_R^{1/3}   0.967   0.933   0.947    0.0001172   0.0002375   0.0002122
H_R^{2/3}   0.939   0.881   0.902    0.0000317   0.0000500   0.0000512
H_T^{1/2}   0.932   0.868   0.892    0.0000500   0.0000871   0.0000842
H_T^{3/2}   0.919   0.849   0.870    0.0000622   0.0001089   0.0001183
H_F         0.516   0.702   0.642    0.0008028   0.0011355   0.0011917
Table 5. p-values of the hypothesis of equal entropies.
            1998–2008   1998–2018   2008–2018
H_S         0.0000000   0.0002606   0.1029
H_R^{1/3}   0.0705681   0.2520262   0.5316
H_R^{2/3}   0.0000000   0.0000478   0.0404
H_T^{1/2}   0.0000000   0.0004471   0.0690
H_T^{3/2}   0.0000001   0.0002316   0.1672
H_F         0.0000219   0.0045777   0.2118
Table 6. Asymptotic mean and variance of entropies of the collapsed entries of 1998.
            Mean    Variance
H_S         0.955   0.0000678
H_R^{1/3}   0.985   0.0000171
H_R^{2/3}   0.970   0.0000083
H_T^{1/2}   0.971   0.0000281
H_T^{3/2}   0.949   0.0000896
H_F         0.192   0.0000678
Table 7. p-values of the hypotheses of equal entropies using collapsed data in 1998.
            1998–2008   1998–2018
H_S         0.00000     0.0000
H_R^{1/3}   0.00115     0.0107
H_R^{2/3}   0.00000     0.0000
H_T^{1/2}   0.00000     0.0000
H_T^{3/2}   0.00000     0.0000
H_F         0.00000     0.0000

