Entropy and Effective Support Size

The notion of the effective size of support (Ess) of a random variable is introduced. A small set of natural requirements that a measure of Ess should satisfy is presented. The measure with the prescribed properties is in a direct (exp-) relationship to the family of Rényi's α-entropies, which includes also Shannon's entropy H. Considerations on the choice of the value of α imply that exp(H) appears to be the most appropriate measure of Ess. Thanks to their log/exp relationship, entropy and Ess can be viewed as two aspects of the same thing. In Probability and Statistics, the Ess aspect could appear more basic than the entropic one.


Introduction
The interpretation of Shannon's entropy H(p) is usually developed in the context of an experiment, where the entropy is described as a measure of uncertainty; cf. [6], [5], [7]. Motivated by a simple (and well-known) observation that exp(H(p)) is equal to the size of support of the underlying random variable for the uniform distribution, in this short note we introduce the concept of the effective size of support (Ess). A measure of Ess should satisfy a small set of natural requirements. The class of Ess measures $S(\cdot, \alpha) = \left(\sum_{i=1}^{m} p_i^{\alpha}\right)^{1/(1-\alpha)}$ which satisfy the requirements is in a direct relationship to the family of Rényi's α-entropies, which includes Shannon's entropy as a special case. We address the issue of selecting the value of α such that the corresponding S(·, α) would be the most appropriate measure of Ess. Unlike entropy, Ess has an obvious meaning. From the point of view of Probability or Statistics, Ess can be seen as a more natural concept than entropy.

Effective size of support
Let X be a discrete random variable which can take on values from a finite set $\mathcal{X}$ of m elements, with probabilities specified by the probability mass function (pmf) p. The support of X is the set $S(p(X)) = \{i : p_i > 0,\ i = 1, 2, \dots, m\}$. Let |S(p(X))| denote the size of the support.
While the pmf p = [0.5, 0.5] makes both outcomes equally likely, the pmf q = [0.999, 0.001] characterizes a random variable that can take on almost exclusively only one of its two values. However, both p and q have the same size of support. This motivates a need for a quantity that could measure the size of support of a random variable in a different way, so that the random variable is placed in the range [1, m] according to its pmf. We will call the new quantity/measure the effective support size (Ess), and denote it by S(p(X)); S(p) or S(X), for short. The example makes it obvious that S(·) should be such that S(q) is close to 1, while to p it should assign the value S(p) = 2.

Properties of Ess
Ess should have certain properties, dictated by common sense.

P1) S(p) should be a continuous, symmetric function (i.e., invariant under exchanges of the components of p).

P2) $S(u_m) = m$ and $S(\delta_m) = 1$.

P3) $S([p_m, 0]) = S(p_m)$.

P4) $S(p(X, Y)) = S(p(X))\, S(p(Y))$, if X and Y are independent random variables.

Here u_m denotes the uniform pmf on an m-element support, δ_m denotes an m-element pmf with probability concentrated at one point, and p_m denotes a pmf on an m-element support. The first two properties are obvious. The third one states that extending the support by an impossible outcome should leave Ess unchanged. Only the fourth property needs, perhaps, some little discussion. Or, better, an example. Let p(X) = [1, 1, 1]/3 and p(Y) = [1, 1]/2, and let X be independent of Y. Then p(X, Y) = [1, 1, 1, 1, 1, 1]/6. According to P2), S(p(X)) = 3, S(p(Y)) = 2, and S(p(X, Y)) = 6 = S(p(X)) S(p(Y)). It is reasonable to require the product relationship to hold for independent random variables with arbitrary distributions.
The properties P1–P4 are satisfied by $S(p, \alpha) = \left(\sum_{i=1}^{m} p_i^{\alpha}\right)^{1/(1-\alpha)}$, where α is a positive real number different from 1. Note that S(·, α) of this form is the exp of Rényi's α-entropy. For α → 1, S(p, α) also satisfies P1–P4 and takes the form exp(H(p)), where $H(p) = -\sum_{i=1}^{m} p_i \log p_i$ is Shannon's entropy; cf. [1]. It is thus reasonable to define S(p, α) for α = 1 this way (with the convention 0 log 0 = 0), so that S(·, α) then becomes a continuous function of α.
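Since S(p, α) drives everything that follows, a minimal implementation may help; the function name `ess` is our own, and the sketch assumes natural logarithms (any fixed base works, provided exp is taken in the same base):

```python
import math

def ess(p, alpha):
    """Effective support size S(p, alpha) = (sum_i p_i^alpha)^(1/(1 - alpha)).

    For alpha = 1 the limiting value exp(H(p)) is used, with H Shannon's
    entropy and the convention 0 * log 0 = 0 (zero entries are dropped).
    """
    p = [pi for pi in p if pi > 0]
    if alpha == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** alpha for pi in p) ** (1.0 / (1.0 - alpha))

# P2: the uniform pmf on m points has Ess equal to m, for every alpha
print(ess([1/3, 1/3, 1/3], 0.5))  # ≈ 3
print(ess([1/3, 1/3, 1/3], 1))    # ≈ 3
# P3: padding with an impossible outcome changes nothing
print(ess([0.5, 0.5, 0.0], 2))    # 2.0
# The near-degenerate pmf q from above has Ess close to 1
print(ess([0.999, 0.001], 1))     # ≈ 1.008
```

P4 can be checked the same way: the uniform pmf on 6 points has Ess 6, the product of the Ess values 3 and 2 of its independent marginals.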

Selecting α
The requirements P1–P4 define an entire class of measures of effective support size. This opens the problem of selecting α.
It is instructive to begin addressing the problem with a consideration of the behavior of S(p(X), α) at the limit values of α. It can easily be seen that as α → 0, S(p(X), α) → |S(p(X))|, i.e., the size of the support. Thus, the closer α is to zero, the more S(·, α) behaves like the standard support size |S(p(X))|.
For α → ∞, S(p(X), α) → 1/p*(X), where $p^*(X) = \max_{i=1,2,\dots,m} p_i$. Thus, the higher the α, the more S(·, α) judges a pmf solely by its component with the highest probability. In the limit, all pmf's with the same p*(X) are seen as entirely equivalent.
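Both limits can be observed numerically; the helper `ess` below is our own illustrative naming, and the pmf is an arbitrary example:

```python
import math

def ess(p, alpha):
    """S(p, alpha) = (sum_i p_i^alpha)^(1/(1 - alpha)); exp(H(p)) at alpha = 1."""
    p = [pi for pi in p if pi > 0]
    if alpha == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** alpha for pi in p) ** (1.0 / (1.0 - alpha))

p = [0.7, 0.2, 0.05, 0.05]
# alpha -> 0 recovers the plain support size |S(p)| = 4
print(round(ess(p, 1e-6), 3))    # 4.0
# alpha -> inf sees only the largest component: 1/0.7 ≈ 1.429
print(round(ess(p, 1000.0), 3))  # 1.429
```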
Table 1: S(p, α) for α = 0.001, 0.1, 0.5, 0.9, 1.0, 1.5, 2.0, 10, ∞ and different p's.

Based on the table, in this simplest case of a two-valued random variable we would opt for S(·, ∞) as the good measure of Ess. However, for larger |S| this choice becomes less attractive. As was already noted, S(·, ∞) = 1/p* and all pmf's with the same p* are seen to have the same Ess. For instance, p = [0.95, 0.05] and q = [0.95, x], where x stands for the remaining 99 components each with the value 0.05/99 ≈ 0.0005, are by S(·, ∞) judged to have the same Ess, equal to 1.053. Just for comparison, S(p, 1) = 1.220, while S(q, 1) = 1.535. This undesirable feature of S(·, ∞) manifests itself even more sharply in the case of continuous random variables.
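The quoted values for p and q can be reproduced in a few lines; the helper `exp_H` (our own naming) computes S(·, 1) = exp(H):

```python
import math

def exp_H(p):
    """exp of Shannon's entropy, i.e. S(p, 1), with zero entries dropped."""
    return math.exp(-sum(pi * math.log(pi) for pi in p if pi > 0))

p = [0.95, 0.05]
q = [0.95] + [0.05 / 99] * 99
# S(., inf) judges both pmfs only by their largest component:
print(round(1 / max(p), 3), round(1 / max(q), 3))  # 1.053 1.053
# S(., 1) = exp(H) distinguishes them:
print(round(exp_H(p), 3), round(exp_H(q), 3))      # 1.22 1.535
```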

Ess in the continuous case
The continuous-case analogue is $S_c(f, \alpha) = \left(\int f^{\alpha}(x)\, dx\right)^{1/(1-\alpha)}$, where f(x) denotes a density with respect to the Lebesgue measure. The continuous-case S_c, though always positive, can naturally be smaller than one. And the discrete-case upper bound m is now replaced by ∞. It is worth stressing that S_c behaves with respect to shift and scale transformations in the desired manner. Indeed, if Y = aX + b, then $S_c(f_Y, \alpha) = |a|\, S_c(f_X, \alpha)$.

For the Gaussian $n(\mu, \sigma^2)$ distribution, $S_c(\cdot, \alpha) = \sqrt{2\pi\sigma^2}\, \alpha^{1/(2(\alpha-1))}$; cf. [8]. For α → ∞ this converges to $\sqrt{2\pi\sigma^2}$, so that for σ² = 1 it becomes √(2π) = 2.5067. It is worth comparing with $S_c(\cdot, 1) = \sqrt{2e\pi\sigma^2}$ (cf. [9]), which reduces in the case of σ² = 1 to 4.1327. This makes much more sense.
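Assuming the Gaussian closed form above, both quoted values can be reproduced, and the formula can be cross-checked by direct numerical integration; the helper name `ess_gaussian` and the Riemann-sum granularity are our own choices:

```python
import math

def ess_gaussian(sigma2, alpha):
    """Closed-form S_c(., alpha) for the N(mu, sigma^2) density:
    sqrt(2*pi*sigma2) * alpha**(1/(2*(alpha - 1))); the alpha -> 1
    limit is sqrt(2*pi*e*sigma2) = exp(differential entropy)."""
    if alpha == 1:
        return math.sqrt(2 * math.pi * math.e * sigma2)
    return math.sqrt(2 * math.pi * sigma2) * alpha ** (1 / (2 * (alpha - 1)))

print(round(ess_gaussian(1.0, 1), 3))    # 4.133  (= sqrt(2*pi*e))
print(round(ess_gaussian(1.0, 1e6), 3))  # 2.507  (alpha -> inf: sqrt(2*pi))

# Cross-check the closed form at alpha = 2 with a crude Riemann sum
f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
dx = 1e-3
integral = sum(f(-10 + k * dx) ** 2 * dx for k in range(20000))
print(round(integral ** (1 / (1 - 2)), 3))  # 3.545  (= 2*sqrt(pi))
```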
That S(·, ∞) is not the appropriate measure of Ess can be seen even more clearly in the case of the Exponential distribution. For $f(x) = \beta e^{-\beta x}$ with β = 1, S(·, ∞) = 1 while S(·, 1) = e.

Adding another property
The above considerations suggest that S(·, 1) might be the most appropriate of the Ess measures which satisfy the requirements P1–P4. The question is whether there is some other requirement that could reasonably be added to the already employed properties, such that it would narrow down the set of S(·, α) to S(·, 1).
To this end, let us consider two random variables X, Y that, in general, might be dependent. It is natural to extend the requirement P4 to this more general setting, by requiring that

P4*) $S(p(X))\, S(p(Y)) \ge S(p(X, Y))$,

with equality if and only if X and Y are independent. For α ≠ 1, it might in some cases happen that instead of ≥ the opposite relation < holds true. Indeed, consider for instance the following bivariate discrete random variable with pmf p(X, Y)
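A violation of P4*) for α ≠ 1 can be exhibited numerically; the 2×2 pmf below is our own illustrative choice (it need not coincide with the text's example):

```python
import math

def ess(p, alpha):
    """S(p, alpha) = (sum_i p_i^alpha)^(1/(1 - alpha)); exp(H(p)) at alpha = 1."""
    p = [pi for pi in p if pi > 0]
    if alpha == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** alpha for pi in p) ** (1.0 / (1.0 - alpha))

# A dependent bivariate pmf on {1, 2} x {1, 2}
joint = [[0.3, 0.3],
         [0.3, 0.1]]
px = [sum(row) for row in joint]        # marginal of X: [0.6, 0.4]
py = [sum(col) for col in zip(*joint)]  # marginal of Y: [0.6, 0.4]
flat = [pij for row in joint for pij in row]

# For alpha = 10 the product of marginal Ess values falls BELOW the joint Ess,
# so P4*) fails:
print(ess(px, 10) * ess(py, 10))  # ≈ 3.100
print(ess(flat, 10))              # ≈ 3.372
# For alpha = 1 (Shannon), subadditivity guarantees P4*):
print(ess(px, 1) * ess(py, 1) >= ess(flat, 1))  # True
```

The α = 1 case always satisfies P4*) because exp(H(X)) exp(H(Y)) / exp(H(X, Y)) = exp(I(X; Y)) ≥ 1, with equality exactly under independence.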