Measures of Qualitative Variation in the Case of Maximum Entropy

Asymptotic behavior of qualitative variation statistics, including entropy measures, can be modeled well by normal distributions. In this study, we test the normality of various qualitative variation measures in general. We find that almost all indices tend to normality as the sample size increases, and that they are highly correlated. However, for all of these qualitative variation statistics, maximum uncertainty is a serious factor that prevents normality. Among these, we study the properties of two qualitative variation statistics, VarNC and StDev, in the case of maximum uncertainty, since these two statistics show lower sampling variability and utilize all sample information. We derive the probability density functions of these statistics and prove that they are consistent. We also discuss the relationship between VarNC and the normalized form of Tsallis (α = 2) entropy in the case of maximum uncertainty.


Introduction
Whenever the scale of measurement is nominal or ordinal, classical measures of dispersion such as the standard deviation and the variance cannot be used. In such cases, the only way to measure dispersion is to use measures based on the frequencies of the observed categories. Wilcox [1] made the first attempt to gather qualitative variation indices together, pointing out the utility of these measures for the statistical handling of qualitative data. One of the rare attempts to derive probability functions for qualitative measures can be seen in Swanson [2].
Qualitative measures are widely used in the social and biological sciences. Depending on the area of application, qualitative variation indices may be preferred over diversity indices, or vice versa. A diversity index is a quantitative measure that accounts for the number of categories in a dataset. Measures of diversity are distinguished from measures of variation in that the former refer to counting the numbers of discrete types [3]. Diversity is the antonym of concentration and a near synonym of variety. The term concentration is more common in some areas of ecology and economics, whereas diversity is more common in sociology and communication [4]. For example, in economics, concentration is a measure of competitiveness in a market: the more concentrated the market, the less competitive it is. Heip et al. [5] introduced a list of diversity and evenness indices, most of which could also be seen as qualitative variation indices. The same parallelism between qualitative variation indices and diversity indices can be found in [6][7][8][9][10] as well.
Although these measures are explained in somewhat different ways, some mathematical analogies between them are straightforward and useful for inferential purposes. For this reason, it is not surprising to find a concentration measure that was originally proposed for measuring diversity before it was reversed (a measure is "reversed" by taking its reciprocal, by subtracting it from its maximum value, etc.). For example, Simpson's D statistic is a measure of diversity, and its "reversed form" gives the Gini concentration index, which is a special case of Tsallis entropy. The Herfindahl-Hirschman concentration index is obtained by subtracting Simpson's D from 1.
Statistical entropy is a measure of the uncertainty of an experiment as well as a measure of qualitative variation. Once the experiment has been carried out, uncertainty is no longer present [11]. Thus, entropy can also be interpreted as a measure of the information one can obtain through sampling, or equivalently of the ignorance before experimentation [12]. Jaynes [13] proposes that maximizing Shannon's entropy provides the most appropriate interpretation of the amount of uncertainty. The principle of maximum entropy also coincides with Laplace's well-known principle of insufficient reason. Pardo [14] and Esteban and Morales [15] provide the theoretical background for different entropy measures.
In our study, we first present the three most common entropy measures, namely the Shannon, Rényi, and Tsallis entropies, together with their variances, and then we discuss their asymptotic behaviour. Moreover, we list various qualitative variation indices as axiomatized by Wilcox [1], including the Shannon and Tsallis entropies and the VarNC and StDev statistics. By simulations, we check the normality of these measures under various entropy assumptions. We observe that maximum entropy is a serious factor that prevents normality. We formulate the probability density functions of the VarNC and StDev statistics and their first two moments under the assumption of maximum uncertainty. We also show that VarNC is a special case of normalized Tsallis (α = 2) entropy under the same assumptions. Finally, we discuss the relationship between qualitative variation indices and power-divergence statistics, since entropy measures the divergence of a distribution from maximum uncertainty.

Common Entropy Measures
There are various entropy measures formulated by various authors in the literature. The most commonly used ones are the Shannon, Rényi, and Tsallis entropies. We give the basic properties of these three measures below.

Shannon Entropy
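In his study on the mathematical theory of communication, Shannon [16] developed a measure of uncertainty, or entropy, which was later named "Shannon entropy". If the discrete random variable X takes on the values $x_1, x_2, \ldots, x_K$ with respective probabilities $p_1, p_2, \ldots, p_K$, Shannon entropy is defined as:

$$H_S = -\sum_{i=1}^{K} p_i \log p_i. \qquad (1)$$

In the case of maximum uncertainty (i.e., the case in which all probabilities are equal, $p_i = 1/K$), this becomes $H_S = \log K$. Thus, the upper limit of Shannon entropy depends on the number of categories K. The estimator of Shannon entropy, $\hat{H}_S$, is calculated from sample information as:

$$\hat{H}_S = -\sum_{i=1}^{K} \hat{p}_i \log \hat{p}_i, \qquad (2)$$

where the probabilities $\hat{p}_i = f_i/n$ ($f_i$ being the observed frequency of category i) are estimated by the maximum likelihood method. Although this estimator is biased, the amount of bias can be reduced by increasing the sample size [17]. Zhang Xing [18] gives the variance of the Shannon entropy estimator for sample size n (natural logarithms) as follows:

$$\operatorname{Var}(\hat{H}_S) = \frac{1}{n}\left(\sum_{i=1}^{K} p_i \log^2 p_i - H_S^2\right) + \frac{K-1}{2n^2} + O\!\left(\frac{1}{n^3}\right). \qquad (3)$$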
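The plug-in estimator and the variance approximation quoted above can be sketched in a few lines of Python. This is a minimal illustration assuming natural logarithms and the commonly cited expansion Var ≈ (Σ p_i ln² p_i − H_S²)/n + (K − 1)/(2n²); the function names are hypothetical. At the uniform distribution the 1/n term vanishes and only the 1/n² term is left, which is consistent with the special role of maximum entropy discussed later in this study.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy H_S = -sum p_i ln p_i; terms with p_i = 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def shannon_plugin(counts):
    """Plug-in (maximum likelihood) estimate: replace p_i by f_i / n."""
    n = sum(counts)
    return shannon_entropy([f / n for f in counts])


def shannon_variance(probs, n):
    """Approximate variance of the plug-in estimator (natural logarithms):
    (sum p_i ln^2 p_i - H_S^2) / n + (K - 1) / (2 n^2)."""
    K = len(probs)
    h = shannon_entropy(probs)
    second = sum(p * math.log(p) ** 2 for p in probs if p > 0)
    return (second - h ** 2) / n + (K - 1) / (2 * n ** 2)


if __name__ == "__main__":
    n = 1000
    print(shannon_plugin([260, 240, 250, 250]), math.log(4))  # estimate vs log K
    print(shannon_variance([0.4, 0.3, 0.2, 0.1], n))          # 1/n term dominates
    print(shannon_variance([0.25] * 4, n))                    # only (K-1)/(2n^2) is left
```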

Rényi Entropy
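Rényi entropy [19] is defined as:

$$H_R = \frac{1}{1-\alpha} \log \sum_{i=1}^{K} p_i^{\alpha}, \qquad \alpha > 0,\ \alpha \neq 1. \qquad (4)$$

Shannon entropy is a special case of Rényi entropy for α → 1. The variance of the Rényi entropy estimator can be approximated by:

$$\operatorname{Var}(\hat{H}_R) \approx \frac{\alpha^2}{n(1-\alpha)^2} \cdot \frac{\sum_{i=1}^{K} p_i^{2\alpha-1} - \left(\sum_{i=1}^{K} p_i^{\alpha}\right)^2}{\left(\sum_{i=1}^{K} p_i^{\alpha}\right)^2}. \qquad (5)$$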
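A short Python sketch can make the Rényi definition and its limit behaviour concrete. It assumes natural logarithms and uses a first-order delta-method approximation of the variance, α²/(n(1 − α)²) · (Σ p_i^(2α−1) − S²)/S² with S = Σ p_i^α, which we take to be the approximation meant above; the function names are hypothetical.

```python
import math


def renyi_entropy(probs, alpha):
    """Renyi entropy H_R = ln(sum p_i^alpha) / (1 - alpha), for alpha > 0, alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    s = sum(p ** alpha for p in probs if p > 0)
    return math.log(s) / (1 - alpha)


def renyi_variance(probs, alpha, n):
    """First-order (delta-method) variance of the plug-in Renyi estimator."""
    s = sum(p ** alpha for p in probs if p > 0)            # sum p_i^alpha
    t = sum(p ** (2 * alpha - 1) for p in probs if p > 0)  # sum p_i^(2 alpha - 1)
    return alpha ** 2 / (n * (1 - alpha) ** 2) * (t - s ** 2) / s ** 2


if __name__ == "__main__":
    probs = [0.4, 0.3, 0.2, 0.1]
    shannon = -sum(p * math.log(p) for p in probs)
    print(renyi_entropy(probs, 1.000001), shannon)  # close: Shannon is the alpha -> 1 limit
    print(renyi_variance(probs, 2.0, 1000))
    print(renyi_variance([0.25] * 4, 2.0, 1000))    # first-order term is zero at maximum entropy
```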

Tsallis Entropy
Another generalization of Shannon entropy is mainly due to Constantino Tsallis. Tsallis entropy is also known as q-entropy and is a monotonic function of Rényi entropy. It is given by (see [20]):

$$H_T = \frac{1}{\alpha-1}\left(1 - \sum_{i=1}^{K} p_i^{\alpha}\right), \qquad \alpha \neq 1. \qquad (6)$$

For α = 2, Tsallis entropy is identical to the Gini concentration index [21]. The variance of this entropy estimator is given by [14]:

$$\operatorname{Var}(\hat{H}_T) \approx \frac{\alpha^2}{n(\alpha-1)^2}\left[\sum_{i=1}^{K} p_i^{2\alpha-1} - \left(\sum_{i=1}^{K} p_i^{\alpha}\right)^2\right]. \qquad (7)$$

Asymptotic Sampling Distributions of Entropy Measures
Agresti and Agresti [22] present some information about the sampling properties of the Gini concentration index. They also introduce some tests to compare the qualitative variation of two groups. Magurran [23] discusses some statistical tests for comparing the entropies of two samples. Agresti [24] provides a method for deriving the sampling distributions of qualitative variation statistics. Pardo [14] emphasizes that entropy-based uncertainty statistics can also be derived from divergence statistics, and he discusses some inferential issues in detail. For the asymptotic behaviour of entropy measures under the condition of maximum uncertainty, one may refer to Zhang Xing [18] and Evren and Ustaoğlu [25].
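Returning to the Tsallis definition, the following Python sketch checks two of its stated properties: for α = 2 it coincides with the Gini concentration index 1 − Σ p_i², and it is a monotone function of Rényi entropy of the same order, H_T = (1 − exp((1 − α) H_R))/(α − 1), which follows directly from the two definitions. The variance function is a first-order delta-method approximation, assumed here to match the form cited from [14]; all names are hypothetical.

```python
import math


def tsallis_entropy(probs, alpha):
    """Tsallis entropy H_T = (1 - sum p_i^alpha) / (alpha - 1), alpha != 1."""
    if alpha == 1:
        raise ValueError("alpha = 1 is the Shannon limit; use a value different from 1")
    return (1 - sum(p ** alpha for p in probs if p > 0)) / (alpha - 1)


def tsallis_variance(probs, alpha, n):
    """First-order variance of the plug-in Tsallis estimator."""
    s = sum(p ** alpha for p in probs if p > 0)
    t = sum(p ** (2 * alpha - 1) for p in probs if p > 0)
    return alpha ** 2 / (n * (alpha - 1) ** 2) * (t - s ** 2)


def tsallis_from_renyi(h_renyi, alpha):
    """Monotone map between Renyi and Tsallis entropies of the same order alpha."""
    return (1 - math.exp((1 - alpha) * h_renyi)) / (alpha - 1)


if __name__ == "__main__":
    probs = [0.4, 0.3, 0.2, 0.1]
    gini = 1 - sum(p ** 2 for p in probs)  # Gini concentration index
    print(tsallis_entropy(probs, 2), gini)  # identical for alpha = 2
    h_r = math.log(sum(p ** 3 for p in probs)) / (1 - 3)  # Renyi entropy of order 3
    print(tsallis_from_renyi(h_r, 3), tsallis_entropy(probs, 3))
```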

Qualitative Variation Statistics
In this section, we give a list of qualitative variation indices axiomatized by Wilcox [1] and discuss their sampling properties and their relationship with the power-divergence statistic. Wilcox [1] notes that textbook treatments of measures of variation present and discuss the range, semi-interquartile range, average deviation, and standard deviation. However, measures of variation suitable for a nominal scale are often completely absent from these treatments. His paper represents a first attempt to gather and to generate alternative indices of qualitative variation at an introductory level.

Axiomatizing Qualitative Variation
Wilcox points out that any measure of qualitative variation must satisfy the following:
1. Variation is between zero and one;
2. When all of the observations are identical, variation is zero;
3. When all of the observations are different, the variation is one.

Selected Indices of Qualitative Variation
Selected qualitative variation statistics, some of which are known under different names in various economic, ecological, and statistical studies, are listed in Table 1.
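Table 1. A list of selected qualitative variation indices ($f_i$ denotes the frequency of category i, $p_i = f_i/n$, n is the sample size, and K is the number of categories).

| Index | Defining formula | Min | Max | Explanation |
|---|---|---|---|---|
| Variation ratio or Freeman's index (VR) | $1 - f_{mode}/n$ | 0 | $(K-1)/K$ | $f_{mode}$ is the frequency of the modal class. |
| Index of deviations from the mode (ModVR) | $\frac{K(n - f_{mode})}{n(K-1)}$ | 0 | 1 | Normalized form of the variation ratio. |
| Index based on a range of frequencies (RanVR) | $1 - \frac{f_{mode} - f_{min}}{f_{mode}}$ | 0 | 1 | $f_{min}$ and $f_{mode}$ are the minimum and maximum frequencies. |
| Average deviation (AVDEV) | $1 - \frac{\sum_{i=1}^{K} \lvert f_i - n/K \rvert}{2(n/K)(K-1)}$ | 0 | 1 | Analogous to the mean deviation. K is the number of categories. |
| Variation index based on the variance of cell frequencies (VarNC) | $1 - \frac{\sum_{i=1}^{K} (f_i - n/K)^2}{n^2(K-1)/K}$ | 0 | 1 | Normalized form of Tsallis entropy when α = 2. Analogous to the variance. |
| StDev | $1 - \sqrt{\frac{\sum_{i=1}^{K} (f_i - n/K)^2}{n^2(K-1)/K}}$ | 0 | 1 | Analogous to the standard deviation. |
| Shannon entropy (H) | $-\sum_{i=1}^{K} p_i \log p_i$ | 0 | $\log K$ | The base of the logarithm is immaterial. |
| Normalized entropy (HRel) | $-\sum_{i=1}^{K} p_i \log p_i / \log K$ | 0 | 1 | Normalization is used to force the index between 0 and 1. |
| B index | $1 - \sqrt{1 - \left(\prod_{i=1}^{K} \frac{f_i K}{n}\right)^{2/K}}$ | 0 | 1 | The B index considers the geometric mean of cell probabilities. |
| M1 (Tsallis entropy for α = 2) | $1 - \sum_{i=1}^{K} p_i^2$ | 0 | $(K-1)/K$ | It is also known as the Gini concentration index. |
| Heip index (HI) | $\frac{e^{H} - 1}{K - 1}$ | 0 | 1 | H is the Shannon entropy based on natural logarithms. |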
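To make the indices in Table 1 concrete, here is a minimal Python sketch that computes them from a vector of category frequencies. The function name `qualitative_indices` is hypothetical, and the B index is coded in the geometric-mean form 1 − sqrt(1 − G²), with G the geometric mean of f_i K / n; that form is our reading of Wilcox [1] and should be checked against it. Evaluating the sketch at the two extreme frequency vectors illustrates Wilcox's axioms: the normalized indices are 0 when all observations fall in one category and 1 when all categories are equally frequent.

```python
import math


def qualitative_indices(counts):
    """Selected indices of qualitative variation (cf. Table 1) from category frequencies."""
    K, n = len(counts), sum(counts)
    if K < 2 or n <= 0:
        raise ValueError("need at least two categories and a positive sample size")
    p = [f / n for f in counts]
    f_mode, f_min = max(counts), min(counts)
    expected = n / K                                       # n/K under maximum entropy
    ssd = sum((f - expected) ** 2 for f in counts)         # sum of squared deviations
    shannon = -sum(q * math.log(q) for q in p if q > 0)    # natural logarithms
    geo = math.prod(f * K / n for f in counts) ** (1 / K)  # geometric mean of f_i K / n
    return {
        "VR": 1 - f_mode / n,
        "ModVR": K * (n - f_mode) / (n * (K - 1)),
        "RanVR": 1 - (f_mode - f_min) / f_mode,
        "AVDEV": 1 - sum(abs(f - expected) for f in counts) / (2 * expected * (K - 1)),
        "VarNC": 1 - ssd / (n ** 2 * (K - 1) / K),
        "StDev": 1 - math.sqrt(ssd / (n ** 2 * (K - 1) / K)),
        "H": shannon,
        "HRel": shannon / math.log(K),
        "B": 1 - math.sqrt(max(0.0, 1 - geo ** 2)),        # assumed form of Wilcox's B index
        "M1": 1 - sum(q ** 2 for q in p),
        "HI": (math.exp(shannon) - 1) / (K - 1),
    }


if __name__ == "__main__":
    for counts in ([100, 0, 0, 0], [40, 30, 20, 10], [25, 25, 25, 25]):
        print(counts, {k: round(v, 4) for k, v in qualitative_indices(counts).items()})
```

VR and M1 are left unnormalized here, so at the uniform distribution they reach (K − 1)/K rather than 1.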

Normalizing (Standardizing) an Index
In general, if an index I fails to satisfy any of the requirements in Section 2.2.1, the following transformation can be used as a remedy:

$$I^{*} = \frac{I - I_{\min}}{I_{\max} - I_{\min}}. \qquad (8)$$

Note that this has the same form as the distribution function of a continuous uniform distribution on $[I_{\min}, I_{\max}]$. Since any distribution function is bounded between 0 and 1, this transformation is useful in improving the situation. The term "normalization" or "standardization" is not related to the normal distribution. Rather, it is intentionally used to indicate that any "normalized" index takes values in the interval [0, 1]. For example, $VR = 1 - f_{mode}/n$, and whenever all observations come from one category, the variation ratio VR is zero. On the other hand, when all categories have equal frequencies, $f_{mode} = n/K$ and VR attains its maximum value $(K-1)/K$, which is less than one. Dividing VR by $(K-1)/K$ therefore normalizes VR. In other words, normalizing VR this way produces the index of deviations from the mode, $ModVR = K \cdot VR/(K-1)$.
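The normalization step can be illustrated with a small Python sketch, using hypothetical helper names: VR is mapped from its range [0, (K − 1)/K] onto [0, 1], and the result agrees with ModVR computed directly as K(n − f_mode)/(n(K − 1)).

```python
def normalize(value, lower, upper):
    """Map an index from [lower, upper] onto [0, 1]: (I - I_min) / (I_max - I_min)."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return (value - lower) / (upper - lower)


def variation_ratio(counts):
    """VR = 1 - f_mode / n; its range is [0, (K - 1) / K]."""
    return 1 - max(counts) / sum(counts)


def mod_vr(counts):
    """ModVR obtained by normalizing VR with I_min = 0 and I_max = (K - 1) / K."""
    K = len(counts)
    return normalize(variation_ratio(counts), 0.0, (K - 1) / K)


if __name__ == "__main__":
    counts = [50, 30, 15, 5]
    K, n = len(counts), sum(counts)
    print(mod_vr(counts), K * (n - max(counts)) / (n * (K - 1)))  # same value
```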

Power-Divergence Statistic and Qualitative Variation
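Loosely speaking, for discrete cases, a statistic of qualitative variation measures the divergence between the distribution under study and the discrete uniform distribution. For a general exposition of statistical divergence measures, one may refer to Basseville [26] and Bhatia and Singh [27]. Cressie and Read [28] show that the ordinary chi-square and log-likelihood ratio test statistics for goodness of fit can be taken as special cases of the power-divergence statistic. Chen et al. [29] and Harremoës [30] explain the family of power-divergence statistics based on different parametrizations. The power-divergence statistic is an envelope for goodness-of-fit testing and is defined as:

$$PD(\lambda) = \frac{2}{\lambda(\lambda+1)} \sum_{i=1}^{K} f_i\left[\left(\frac{f_i}{e_i}\right)^{\lambda} - 1\right], \qquad (9)$$

where $f_i$ is the observed frequency, $e_i$ is the expected frequency under the null hypothesis, and λ is a constant (λ ≠ 0, −1). Under the assumption of maximum uncertainty, $e_i = n/K$; writing $f_i = np_i$, the statistic becomes:

$$PD(\lambda) = \frac{2n}{\lambda(\lambda+1)}\left(K^{\lambda}\sum_{i=1}^{K} p_i^{\lambda+1} - 1\right). \qquad (10)$$

By substituting α = λ + 1 in Equation (6), Tsallis entropy can be formulated alternatively as:

$$H_T = \frac{1}{\lambda}\left(1 - \sum_{i=1}^{K} p_i^{\lambda+1}\right). \qquad (11)$$

Since $H_T$ attains its maximum $(1 - K^{-\lambda})/\lambda$ when all $p_i = 1/K$, the normalized Tsallis entropy ($H_{TN}$) can be found as:

$$H_{TN} = \frac{1 - \sum_{i=1}^{K} p_i^{\lambda+1}}{1 - K^{-\lambda}}. \qquad (12)$$

From Equation (12), we obtain the power-divergence statistic as:

$$PD(\lambda) = \frac{2n\left(K^{\lambda} - 1\right)}{\lambda(\lambda+1)}\left(1 - H_{TN}\right). \qquad (13)$$

This result is in agreement with intuition: in the case of maximum entropy, the normalized Tsallis entropy equals one and PD(λ) = 0, as expected.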
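The identity linking the power-divergence statistic to normalized Tsallis entropy can be checked numerically. The Python sketch below, with hypothetical function names, computes PD(λ) directly from the Cressie-Read definition with uniform expected counts n/K, and again as 2n(K^λ − 1)/(λ(λ + 1)) · (1 − H_TN) with α = λ + 1. It is restricted to λ > 0 so that empty cells cause no division by zero; for λ = 1 the statistic reduces to Pearson's chi-square.

```python
def power_divergence(counts, lam):
    """Cressie-Read statistic against uniform expected counts e_i = n / K.
    Sketch restricted to lam > 0 so that empty cells (f_i = 0) are harmless."""
    if lam <= 0:
        raise ValueError("this sketch assumes lam > 0")
    n, K = sum(counts), len(counts)
    e = n / K
    return 2 / (lam * (lam + 1)) * sum(f * ((f / e) ** lam - 1) for f in counts)


def normalized_tsallis(probs, alpha):
    """Tsallis entropy divided by its maximum (1 - K^(1 - alpha)) / (alpha - 1)."""
    K = len(probs)
    h = (1 - sum(p ** alpha for p in probs)) / (alpha - 1)
    h_max = (1 - K ** (1 - alpha)) / (alpha - 1)
    return h / h_max


def power_divergence_via_tsallis(counts, lam):
    """PD = 2n(K^lam - 1) / (lam(lam + 1)) * (1 - H_TN), with alpha = lam + 1."""
    n, K = sum(counts), len(counts)
    h_tn = normalized_tsallis([f / n for f in counts], lam + 1)
    return 2 * n * (K ** lam - 1) / (lam * (lam + 1)) * (1 - h_tn)


if __name__ == "__main__":
    counts = [40, 30, 20, 10]
    for lam in (2 / 3, 1, 2):
        print(lam, power_divergence(counts, lam), power_divergence_via_tsallis(counts, lam))
```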

Tests for Normality and Scenarios Used for the Evaluation
In order to test the normality of the qualitative indices given above under various entropy values, distributions with four, six, and eight categories are studied. These distributions are chosen to investigate the differences between the behaviour of these indices in cases of both maximum entropy and lower entropy. Samples of 1000, 2000, and 5000 units are taken, with corresponding simulation runs for all distributions. The distributions are shown in Table 2, labelled from 1 to 6. Note that the odd-numbered distributions correspond to maximum entropy cases. We have observed that none of the indices is normally distributed for the maximum entropy distributions, no matter how large the sample size is. Therefore, we present only the results for the lower entropy distributions. As a general tendency, as the sample size increases, nine of the eleven indices tend to normality for all non-maximum entropy distributions. Nevertheless, the normality of two indices, namely RanVR and the B index, is affected by dimensionality and sample size. Moreover, the sampling variability of these two indices is found to be considerably higher than that of the other nine indices. This phenomenon can be seen in the coefficient of variation diagrams of the indices in Figure 1 for the six distributions in Table 2 with three different sample sizes.
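The replication scheme described above can be sketched with the Python standard library alone. The parent distribution, the number of replications, and the seed below are hypothetical choices, not the distributions of Table 2; the sketch draws multinomial samples, evaluates VarNC on each, and reports the coefficient of variation, the quantity plotted in Figure 1. Formal normality tests such as Shapiro-Wilk would require an external library (for example SciPy) and are omitted.

```python
import random
import statistics


def varnc(counts):
    """VarNC = 1 - sum (f_i - n/K)^2 / (n^2 (K - 1) / K)."""
    K, n = len(counts), sum(counts)
    ssd = sum((f - n / K) ** 2 for f in counts)
    return 1 - ssd / (n ** 2 * (K - 1) / K)


def simulate_index(probs, n, reps, index=varnc, seed=0):
    """Draw `reps` multinomial samples of size n and return the index value of each."""
    rng = random.Random(seed)
    K = len(probs)
    values = []
    for _ in range(reps):
        counts = [0] * K
        for category in rng.choices(range(K), weights=probs, k=n):
            counts[category] += 1
        values.append(index(counts))
    return values


def coefficient_of_variation(values):
    return statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    skewed = [0.4, 0.3, 0.2, 0.1]  # hypothetical non-maximum-entropy parent distribution
    for n in (1000, 2000, 5000):
        print(n, coefficient_of_variation(simulate_index(skewed, n, reps=200)))
```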

Test Results for Cases of Maximum Entropy
When the entropy is at its maximum, the variability of the VarNC, StDev, Shannon, HRel, and M1 statistics is comparatively low, as seen in Figure 1. On the other hand, when the level of uncertainty is lower, the variability of the VarNC and StDev statistics still remains among the lowest. In addition, because of the close relationship of the VarNC and StDev statistics with the chi-square distribution in the case of maximum entropy, the sampling properties of these two statistics can be deduced exactly; we address this issue in Section 4.
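The chi-square connection rests on a simple identity: with X denoting Pearson's goodness-of-fit statistic against equal expected counts n/K, VarNC = 1 − X/(n(K − 1)) and StDev = 1 − sqrt(X/(n(K − 1))), where n(K − 1) is the largest value X can take. A short Python sketch with hypothetical function names confirms this against the direct definition.

```python
import math


def pearson_chi2_uniform(counts):
    """Pearson's X = sum (f_i - n/K)^2 / (n/K) against equal expected counts."""
    n, K = sum(counts), len(counts)
    e = n / K
    return sum((f - e) ** 2 / e for f in counts)


def varnc_from_chi2(counts):
    """VarNC written through Pearson's statistic: 1 - X / (n (K - 1))."""
    n, K = sum(counts), len(counts)
    return 1 - pearson_chi2_uniform(counts) / (n * (K - 1))


def stdev_from_chi2(counts):
    """StDev written through Pearson's statistic: 1 - sqrt(X / (n (K - 1)))."""
    n, K = sum(counts), len(counts)
    return 1 - math.sqrt(pearson_chi2_uniform(counts) / (n * (K - 1)))


if __name__ == "__main__":
    counts = [40, 30, 20, 10]
    print(pearson_chi2_uniform(counts), varnc_from_chi2(counts), stdev_from_chi2(counts))
    print(pearson_chi2_uniform([100, 0, 0, 0]))  # the maximum, n (K - 1)
```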

VarNC Statistic
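The VarNC statistic is the analogue of the variance for analysing nominal distributions. It also equals the normalized form of Tsallis entropy when α = 2. The VarNC statistic can be evaluated in analogy to the variance of discrete distributions. It is defined as:

$$\mathrm{VarNC} = 1 - \frac{\sum_{i=1}^{K}\left(f_i - n/K\right)^2}{n^2(K-1)/K}. \qquad (14)$$

Under the maximum entropy assumption, the quantity

$$X = \sum_{i=1}^{K}\frac{\left(f_i - n/K\right)^2}{n/K}, \qquad (15)$$

which is Pearson's goodness-of-fit statistic against the uniform distribution, asymptotically follows a chi-square distribution with K − 1 degrees of freedom [1]. Thus, the probability density of X can be written as:

$$f(x) = \frac{x^{(K-1)/2 - 1}\, e^{-x/2}}{2^{(K-1)/2}\,\Gamma\!\left(\frac{K-1}{2}\right)}, \qquad x > 0. \qquad (16)$$

If we let Z = VarNC = a + bX with a = 1 and $b = -1/(n(K-1))$, then with $f(z) = f(x)\left|dx/dz\right|$ we obtain:

$$f(z) = c\,(1-z)^{(K-3)/2}\exp\!\left[-\frac{n(K-1)(1-z)}{2}\right], \qquad z < 1, \qquad (17)$$

where $c = \left[n(K-1)/2\right]^{(K-1)/2} / \Gamma\!\left(\frac{K-1}{2}\right)$. It can be shown that VarNC equals the normalized version of Tsallis (α = 2) entropy, with

$$\mathrm{VarNC} = \frac{\hat{H}_T(2)}{1 - 1/K} = \frac{K}{K-1}\left(1 - \sum_{i=1}^{K}\hat{p}_i^{2}\right). \qquad (18)$$

VarNC is also a decreasing linear function of the variance of the cell proportions. The variance of Tsallis (α = 2) entropy can be found by direct substitution of α = 2 in Equation (7):

$$\operatorname{Var}\left(\hat{H}_T(2)\right) \approx \frac{4}{n}\left[\sum_{i=1}^{K} p_i^{3} - \left(\sum_{i=1}^{K} p_i^{2}\right)^{2}\right]. \qquad (19)$$

By Equation (18), the variance of VarNC is

$$\operatorname{Var}(\mathrm{VarNC}) = \left(\frac{K}{K-1}\right)^{2}\operatorname{Var}\left(\hat{H}_T(2)\right), \qquad (20)$$

which is larger than the variance of Tsallis (α = 2) entropy, since K/(K − 1) > 1.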
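As a sanity check on the density of VarNC derived above, f(z) = c(1 − z)^((K−3)/2) exp[−n(K − 1)(1 − z)/2] with c = [n(K − 1)/2]^((K−1)/2)/Γ((K − 1)/2), the Python sketch below integrates it numerically with a midpoint rule and confirms that it has total mass one and mean (n − 1)/n. The truncation point (X = 200, far in the chi-square tail for small K), the step count, and the function names are illustrative assumptions; c is evaluated on the log scale to avoid overflow for large n.

```python
import math


def varnc_pdf(z, n, K):
    """Chi-square-based density of VarNC under maximum entropy (defined for z < 1)."""
    if z >= 1:
        return 0.0
    m = n * (K - 1)
    log_c = (K - 1) / 2 * math.log(m / 2) - math.lgamma((K - 1) / 2)
    return math.exp(log_c + (K - 3) / 2 * math.log(1 - z) - m * (1 - z) / 2)


def integrate(f, a, b, steps=100_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    n, K = 1000, 4
    lower = 1 - 200 / (n * (K - 1))  # corresponds to X = 200, negligible tail beyond
    print(integrate(lambda z: varnc_pdf(z, n, K), lower, 1))      # close to 1
    print(integrate(lambda z: z * varnc_pdf(z, n, K), lower, 1))  # close to (n - 1) / n
```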
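By deriving the moments, one can show the consistency of the VarNC statistic. Since Z = VarNC = 1 − X/(n(K − 1)) with E(X) = K − 1 and Var(X) = 2(K − 1), we have:

$$E(\mathrm{VarNC}) = 1 - \frac{K-1}{n(K-1)} = \frac{n-1}{n}, \qquad (21)$$

$$\operatorname{Var}(\mathrm{VarNC}) = \frac{2(K-1)}{n^2(K-1)^2} = \frac{2}{n^2(K-1)}. \qquad (22)$$

Under the assumption of maximum entropy, VarNC is biased since E(VarNC) = (n − 1)/n ≠ 1; however, it is consistent since $\lim_{n\to\infty} E(\mathrm{VarNC}) = 1$ and $\lim_{n\to\infty} \operatorname{Var}(\mathrm{VarNC}) = 0$ (see [31]). Finally, it can be noted that for larger K values, VarNC can be approximated by a normal distribution with µ = (n − 1)/n and σ² = 2/(n²(K − 1)).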
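The moment formulas E(VarNC) = (n − 1)/n and Var(VarNC) = 2/(n²(K − 1)) can be compared with a small Monte Carlo experiment that draws equiprobable multinomial samples, i.e., the maximum entropy case. The sample size, K, number of replications, and seed are hypothetical choices, and the function names are illustrative.

```python
import random
import statistics


def varnc_moments(n, K):
    """Mean and variance of VarNC implied by X ~ chi-square(K - 1)."""
    return (n - 1) / n, 2 / (n ** 2 * (K - 1))


def simulate_varnc(n, K, reps, seed=1):
    """VarNC values from `reps` equiprobable multinomial samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        counts = [0] * K
        for category in rng.choices(range(K), k=n):  # equal weights: maximum entropy
            counts[category] += 1
        ssd = sum((f - n / K) ** 2 for f in counts)
        out.append(1 - ssd / (n ** 2 * (K - 1) / K))
    return out


if __name__ == "__main__":
    n, K = 200, 4
    sims = simulate_varnc(n, K, reps=2000)
    print(varnc_moments(n, K))
    print(statistics.mean(sims), statistics.variance(sims))
```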

StDev Statistic
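The StDev statistic was proposed by Wilcox [1] as the analogous formulation of the ordinary standard deviation for qualitative distributions. It is defined as:

$$\mathrm{StDev} = 1 - \sqrt{\frac{\sum_{i=1}^{K}\left(f_i - n/K\right)^2}{n^2(K-1)/K}}. \qquad (23)$$

The statistic Y = StDev is a function of X in Equation (15), since

$$Y = 1 - \sqrt{\frac{X}{n(K-1)}}; \qquad (24)$$

that is, $Y = a - \sqrt{bX}$ with a = 1 and $b = 1/(n(K-1))$, where X has the same probability density function as in Equation (16). Then, by the transformation of probability densities, $f(y) = f(x)\left|dx/dy\right|$ with $x = n(K-1)(1-y)^2$, one obtains:

$$f(y) = 2c\,(1-y)^{K-2}\exp\!\left[-\frac{n(K-1)(1-y)^2}{2}\right], \qquad y < 1, \qquad (25)$$

where $c = \left[n(K-1)/2\right]^{(K-1)/2} / \Gamma\!\left(\frac{K-1}{2}\right)$.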
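The StDev density f(y) = 2c(1 − y)^(K−2) exp[−n(K − 1)(1 − y)²/2], with the same constant c as for VarNC, can be checked the same way. The Python sketch below integrates it numerically for two illustrative (n, K) pairs; the truncation at X = 200, the step count, and the function names are assumptions made for the example.

```python
import math


def stdev_pdf(y, n, K):
    """Chi-square-based density of StDev under maximum entropy (defined for y < 1)."""
    if y >= 1:
        return 0.0
    m = n * (K - 1)
    log_c = (K - 1) / 2 * math.log(m / 2) - math.lgamma((K - 1) / 2)
    return 2 * math.exp(log_c + (K - 2) * math.log(1 - y) - m * (1 - y) ** 2 / 2)


def integrate(f, a, b, steps=100_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    for n, K in ((1000, 4), (500, 7)):
        lower = 1 - math.sqrt(200 / (n * (K - 1)))  # corresponds to X = 200
        print(n, K, integrate(lambda y: stdev_pdf(y, n, K), lower, 1))  # close to 1
```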
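By deriving the moments, one can show the consistency of the StDev statistic. By Equation (24), Y = 1 − g(X) with $g(x) = \sqrt{x/(n(K-1))}$, so we write E(Y) = 1 − E[g(X)]. By a Taylor series expansion about E(X) = K − 1, we have:

$$g(X) \approx g(K-1) + g'(K-1)\,(X - K + 1) + \frac{g''(K-1)}{2}\,(X - K + 1)^2,$$

with $g(K-1) = 1/\sqrt{n}$, $g'(K-1) = 1/\left(2(K-1)\sqrt{n}\right)$, and $g''(K-1) = -1/\left(4(K-1)^2\sqrt{n}\right)$. Then, using Var(X) = 2(K − 1), we obtain:

$$E(Y) \approx 1 - \frac{1}{\sqrt{n}}\left(1 - \frac{1}{4(K-1)}\right).$$

Similarly, ignoring the quadratic and higher terms in the Taylor series expansion yields:

$$\operatorname{Var}(Y) \approx \left[g'(K-1)\right]^2 \operatorname{Var}(X) = \frac{1}{2n(K-1)}.$$

StDev is biased but consistent since, as n → ∞, E(Y) → 1 and Var(Y) → 0.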
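The Taylor approximations E(Y) ≈ 1 − (1 − 1/(4(K − 1)))/√n and Var(Y) ≈ 1/(2n(K − 1)) can be compared with exact values computed under the assumption that X is exactly chi-square with K − 1 degrees of freedom. In that case √X follows a chi distribution with E(√X) = √2 Γ(K/2)/Γ((K − 1)/2) and Var(√X) = (K − 1) − [E(√X)]². The Python sketch below uses hypothetical function names; it shows that both approximations are close and that the variance shrinks to zero as n grows.

```python
import math


def stdev_moments_taylor(n, K):
    """Taylor approximations of E(Y) (second order) and Var(Y) (first order), Y = StDev."""
    mean = 1 - (1 - 1 / (4 * (K - 1))) / math.sqrt(n)
    var = 1 / (2 * n * (K - 1))
    return mean, var


def stdev_moments_exact(n, K):
    """Exact moments if X ~ chi-square(K - 1), using the chi distribution of sqrt(X)."""
    nu = K - 1
    e_sqrt = math.sqrt(2) * math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2))
    var_sqrt = nu - e_sqrt ** 2
    m = n * (K - 1)
    return 1 - e_sqrt / math.sqrt(m), var_sqrt / m


if __name__ == "__main__":
    for K in (4, 10):
        print(K, stdev_moments_taylor(1000, K), stdev_moments_exact(1000, K))
```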

Probability Distribution of VarNC and StDev under Maximum Entropy
The probability distributions of the statistics VarNC and StDev under the assumption of maximum entropy are shown in Figure 2, for two different n and K values.

Discussion
All categorical distributions can be modelled by the multinomial distribution. As the sample size increases indefinitely, the multinomial distribution tends to a multivariate normal distribution. All qualitative variation statistics discussed here are functions of the cell counts of the multinomial distribution. This fact implies, via the delta method, the asymptotic normality of the various qualitative variation measures, which are simply functions of the cell counts or probabilities themselves, provided that their first-order derivatives with respect to the cell probabilities do not all vanish. In our simulation studies, we have observed this tendency for most of the investigated qualitative variation statistics whenever the uncertainty is not at its maximum, except for RanVR and the B index for some dimensionalities and sample sizes. The RanVR statistic mainly uses two special numbers, the minimum and maximum frequencies. In other words, the RanVR statistic is not sufficient, since it does not use all relevant sample information. For this reason, higher sampling variability is expected a priori. This situation is especially important as the dimensionality (K) increases. On the other hand, the B index is a function of the geometric mean of the probabilities. This multiplicative formulation of uncertainty causes higher sampling variability and may be a factor preventing normality.
None of the indices that we studied is normally distributed in the case of maximum entropy, no matter how large the sample size is. This implies that maximum entropy is a factor preventing normality. Moreover, when there is little or no information about the cell probabilities of the multinomial distribution, the principle of insufficient reason justifies assuming maximum entropy distributions. In such cases, VarNC and StDev may be used to model qualitative variation, since the probability distributions of these two statistics can be derived from the relation between VarNC, StDev, and the chi-square distribution. In this study, we have derived the probability functions of these two statistics and shown that both statistics are consistent. We have also shown that the variance of VarNC is less than that of the StDev statistic and that VarNC has some additional appealing properties, because it is simply the normalized version of Tsallis (α = 2) entropy when the uncertainty is at its maximum.

Table 2. Six parent discrete distributions used in the simulations.

To test the asymptotic normality of the indices, the Shapiro-Wilk W, Anderson-Darling, Martinez-Iglewicz, Kolmogorov-Smirnov, D'Agostino skewness, D'Agostino kurtosis, and D'Agostino omnibus tests are used. The success rates of these tests are shown in percentages for the non-maximum uncertainty distributions 2, 4, and 6 in Table 3. For instance, a rate of 71% means that five of the seven tests mentioned above accepted normality (5/7 = 0.714; the numbers are rounded to the nearest integer).
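The success-rate arithmetic can be written out as a tiny Python helper. The 0.05 significance level and the seven p-values below are hypothetical and purely illustrative, not results from Table 3; the helper rounds half up to the nearest integer, so five acceptances out of seven give 71%.

```python
import math


def accepted_at(p_values, level=0.05):
    """A test 'accepts' normality when its p-value is not below the significance level."""
    return [p >= level for p in p_values]


def success_rate(accepted):
    """Percentage of tests that accepted normality, rounded half up to the nearest integer."""
    if not accepted:
        raise ValueError("need at least one test result")
    return math.floor(100 * sum(accepted) / len(accepted) + 0.5)


if __name__ == "__main__":
    # Hypothetical p-values for seven tests (illustrative only).
    p_values = [0.21, 0.08, 0.03, 0.47, 0.01, 0.33, 0.12]
    print(success_rate(accepted_at(p_values)))  # 5 of 7 accepted
```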