Article

On the Measurement of Randomness (Uncertainty): A More Informative Entropy

Tarald O. Kvålseth 1,2
1 Department of Mechanical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
2 Department of Industrial & Systems Engineering, University of Minnesota, Minneapolis, MN 55455, USA
Entropy 2016, 18(5), 159; https://doi.org/10.3390/e18050159
Submission received: 11 February 2016 / Revised: 7 April 2016 / Accepted: 15 April 2016 / Published: 26 April 2016
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract: As a measure of randomness or uncertainty, the Boltzmann–Shannon entropy H has become one of the most widely used summary measures of a variety of attributes (characteristics) in different disciplines. This paper points out an often overlooked limitation of H: comparisons between differences in H-values are not valid. An alternative entropy $H_K$ is introduced as a preferred member of a new family of entropies for which difference comparisons are proved to be valid by satisfying a given value-validity condition. The $H_K$ is shown to have the appropriate properties for a randomness (uncertainty) measure, including a close linear relationship to a measurement criterion based on the Euclidean distance between probability distributions. This last point is demonstrated by means of computer-generated random distributions. The results are also compared with those of another member of the entropy family. A statistical inference procedure for the entropy $H_K$ is formulated.

1. Introduction

For some probability distribution $P_n = (p_1, \ldots, p_n)$, with $p_i \geq 0$ for $i = 1, \ldots, n$ and $\sum_{i=1}^{n} p_i = 1$, the entropy $H(P_n)$, or simply H, is defined by:
$H = -\sum_{i=1}^{n} p_i \log p_i \in [0, \log n]$   (1)
where the logarithm is the natural (base-e) logarithm. The probabilities $p_i$ ($i = 1, \ldots, n$) may be associated with a set of quantum states of a physical system in statistical mechanics or physics, a set of symbols or messages in a communication system, or, most generally, a set of mutually exclusive and exhaustive events. First used by Boltzmann [1] in statistical mechanics (as kH with k being the so-called Boltzmann constant) and later introduced by Shannon [2] as the basis for information theory (with base-2 logarithm and bits as the unit of measurement), this entropy H can appropriately be called the Boltzmann–Shannon entropy.
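As a minimal illustration of Equation (1), the following Python sketch computes H for a given distribution (the function name is illustrative, not from the paper):

```python
import math

def shannon_entropy(p):
    """Boltzmann-Shannon entropy H = -sum p_i log p_i (natural log), Equation (1)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# H ranges from 0 (degenerate distribution) to log n (uniform distribution).
print(shannon_entropy([1.0, 0.0, 0.0]))      # 0.0: a degenerate distribution has no uncertainty
print(shannon_entropy([1/3, 1/3, 1/3]))      # log 3 ~ 1.0986 for the uniform distribution
```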
Although interpreted in a number of different ways, the most common and general interpretation of H is as a measure of randomness or uncertainty of a set of random events (e.g., [3] (pp. 67–97), [4], [5] (Chapter 2), [6] (pp. 12, 13, 90)). A number of alternative entropy formulations have been proposed as parameterized generalizations of H in Equation (1) (see, e.g., [7,8,9]), but with limited success or impact. The most notable exception is the following one-parameter family of entropies by Rényi [10]:
$H_R = \frac{1}{1-\alpha} \log_2 \sum_{i=1}^{n} p_i^{\alpha}, \quad \alpha > 0, \ \alpha \neq 1$
which reduces to Equation (1), with base-2 logarithm, when $\alpha \to 1$. This entropy family has, for instance, been used as a fractal dimension [11] (pp. 686–688). Another such family of entropies is that of Tsallis [12] defined as:
$H_T = \frac{1}{\alpha - 1}\left(1 - \sum_{i=1}^{n} p_i^{\alpha}\right), \quad -\infty < \alpha < \infty$
which includes the H in Equation (1) as the limiting case when $\alpha \to 1$.
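Both families can be checked numerically to approach H as $\alpha \to 1$; a small sketch (function names are mine, chosen for illustration; base-2 logarithm for the Rényi form and natural logarithm for the Tsallis form, as above):

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy with base-2 logarithm, alpha > 0, alpha != 1."""
    return math.log2(sum(pi ** alpha for pi in p if pi > 0)) / (1.0 - alpha)

def tsallis_entropy(p, alpha):
    """Tsallis entropy, alpha != 1."""
    return (1.0 - sum(pi ** alpha for pi in p if pi > 0)) / (alpha - 1.0)

p = [0.5, 0.3, 0.2]
H_nats = -sum(pi * math.log(pi) for pi in p)     # Equation (1), natural log
H_bits = H_nats / math.log(2)

# As alpha -> 1, H_R approaches H in bits and H_T approaches H in nats.
print(round(renyi_entropy(p, 1.0001), 4), round(H_bits, 4))
print(round(tsallis_entropy(p, 1.0001), 4), round(H_nats, 4))
```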
Since its origins in statistical physics and information theory, the entropy H in Equation (1) has proved to be remarkably versatile as a summary measure of a wide variety of attributes (characteristics) within diverse fields of study, ranging from psychology (e.g., [13]) to fractal geometry [11] (pp. 678–687). However, such widespread use of H has led to misuse, improper applications, and misleading results due to the fact that, although H has a number of desirable properties [14] (Chapter 1), it does suffer from one serious limitation: comparisons between differences in H-values are not valid. The basis for this limitation is explained in the next section of this paper.
As a clarification of such comparisons in general, consider some summary measure M and probability distributions $P_n, Q_m, R_t, S_u$. The various types of potential comparisons can then be defined as follows:
Size (order) comparisons: $M(P_n) > M(Q_m)$   (2a)
Difference comparisons: $M(P_n) - M(Q_m) > M(R_t) - M(S_u)$   (2b)
Proportional difference comparisons: $M(P_n) - M(Q_m) = c\,[M(R_t) - M(S_u)]$   (2c)
where c is a constant. While, because of the properties of H in Equation (1), there is no particular reason to doubt the validity of the size comparison in Equation (2a) involving H, the difference comparisons in Equations (2b) and (2c) are not valid for H, as discussed below.
In this paper, an alternative and equally simple entropy is introduced as:
$H_K = \left(\sum_{i=1,\, i \neq m}^{n} \sqrt{p_i}\right)^2 + \sum_{i=1,\, i \neq m}^{n} p_i, \quad p_m = \max_i\{p_i\}$   (3)
The term entropy is used for this measure of randomness or uncertainty since (a) it has many of the same properties as H in Equation (1) and (b) the entropy term has been used in such a variety of measurement situations for which $H_K$ can similarly be used. As is established in this paper, $H_K$ has the important advantage of being more informative than H in the sense that $H_K$ meets the conditions for valid difference comparisons as in Equations (2b) and (2c). It will also be argued that $H_K$ is the preferred member of a family of entropies with similar properties. A statistical inference procedure for $H_K$ will also be outlined.
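A direct transcription of Equation (3) into code may help fix ideas; the helper name below is illustrative, and the sketch simply drops the modal (largest) probability before summing:

```python
import math

def hk_entropy(p):
    """H_K of Equation (3): exclude the modal probability p_m, then
    H_K = (sum of sqrt of the remaining p_i)**2 + (sum of the remaining p_i)."""
    m = max(range(len(p)), key=lambda i: p[i])          # index of the modal probability
    rest = [pi for i, pi in enumerate(p) if i != m]
    return sum(math.sqrt(pi) for pi in rest) ** 2 + sum(rest)

print(hk_entropy([1.0, 0.0, 0.0, 0.0]))                 # 0.0 for the degenerate distribution
print(hk_entropy([0.25, 0.25, 0.25, 0.25]))             # 3.0 = n - 1 for the uniform distribution
print(round(hk_entropy([0.40, 0.30, 0.20, 0.10]), 2))   # 2.32, a value used again in Section 5.2
```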

2. Conditions for Valid Difference Comparisons

Consider that M is a measure of randomness (uncertainty) such that its value $M(P_n)$ for any probability distribution $P_n = (p_1, \ldots, p_n)$ is bounded as:
$M(P_n^0) \leq M(P_n) \leq M(P_n^1)$   (4)
where $P_n^0$ and $P_n^1$ are the degenerate and uniform distributions
$P_n^0 = (1, 0, \ldots, 0), \quad P_n^1 = (1/n, \ldots, 1/n)$   (5)
and where one can set $M(P_n^0) = 0$. In order for the difference comparisons in Equations (2b) and (2c) to be permissible or valid, some condition needs to be imposed on M (see [15]). Specifically, all intermediate values $M(P_n)$ in Equation (4) have to provide numerical representations of the extent of randomness (uncertainty) that are true or realistic with respect to some acceptable criterion. While different types of validity are used in measurement theory [16] (pp. 129–134), value validity will be used here for this required property of M. In order to establish specific requirements for M to have value validity, a particular probability distribution proves useful and Euclidean distances will be used as a criterion.
Therefore, consider the recently introduced lambda distribution:
$P_n^{\lambda} = \left(1 - \lambda + \frac{\lambda}{n}, \frac{\lambda}{n}, \ldots, \frac{\lambda}{n}\right), \quad \lambda \in [0, 1]$   (6)
where λ is a uniformity (evenness) parameter and with $P_n^0$ and $P_n^1$ in Equation (5) being special (extreme) cases [17]. This $P_n^{\lambda}$ is simply the following weighted mean of $P_n^0$ and $P_n^1$:
$P_n^{\lambda} = \lambda P_n^1 + (1 - \lambda) P_n^0$   (7)
From Equations (4) and (6), it follows that, for any given $P_n$:
$M(P_n) = M(P_n^{\lambda})$ for one unique λ   (8)
so that validity conditions on $M(P_n)$ can equivalently be determined in terms of $M(P_n^{\lambda})$. With probability distributions considered as points (vectors) in n-dimensional space and with the Euclidean distance function d being the basis for a validity criterion, the following requirement seems most natural and obvious [17]:
$\frac{M(P_n^1) - M(P_n^{\lambda})}{M(P_n^1) - M(P_n^0)} = \frac{d(P_n^{\lambda}, P_n^1)}{d(P_n^0, P_n^1)} = 1 - \lambda$   (9)
Since $M(P_n^0) = 0$, i.e., there is no randomness when one $p_i = 1$, it follows from Equation (9) that:
$M(P_n^{\lambda}) = \lambda M(P_n^1)$   (10)
as a value-validity condition. This condition also follows immediately from Equation (7) as:
$M(P_n^{\lambda}) = M[\lambda P_n^1 + (1 - \lambda) P_n^0] = \lambda M(P_n^1) + (1 - \lambda) M(P_n^0)$   (11)
which equals Equation (10) for $M(P_n^0) = 0$.
For the case when λ = 0.5, $P_n^{0.5}$ is the midpoint of $P_n^0$ and $P_n^1$ with coordinates $\left(\tfrac{1}{2} + \tfrac{1}{2n}, \tfrac{1}{2n}, \ldots, \tfrac{1}{2n}\right)$. Then:
$M(P_n^{0.5}) = M\left(\frac{P_n^0 + P_n^1}{2}\right) = \frac{M(P_n^0) + M(P_n^1)}{2}$   (12)
$= \frac{M(P_n^1)}{2}$ for $M(P_n^0) = 0$   (13)
which is exactly as stated in Equations (10) and (11) with λ = 0.5. Of course, Equations (12) and (13) represent a weaker value-validity condition than Equations (10) and (11). Note also that it is not assumed a priori that M is a linear function of λ. This linearity is a consequence of Equations (7)–(9).
The entropy H in Equation (1) with $H(P_n^0) = 0$ and $H(P_n^1) = \log n$ does not meet these validity conditions. For example, for n = 2 and λ = 0.5, $H(P_2^{0.5}) = H(0.75, 0.25) = 0.56$, which far exceeds the requirement $(\log 2)/2 = 0.35$ in Equation (13). Similarly, $H(P_4^{0.5}) = H(0.625, 0.125, 0.125, 0.125) = 1.07 \gg (\log 4)/2 = 0.69$ and $H(P_{20}^{0.5}) = 2.09 \gg (\log 20)/2 = 1.50$. It can similarly be verified that $H(P_n^{\lambda})$ for all n and $0 < \lambda < 1$, and hence $H(P_n)$ for all $P_n$ from Equation (8), overstates the true or realistic extent of the randomness (uncertainty) that H is supposed to measure. Consequently, difference comparisons as in Equations (2b) and (2c) based on H are invalid. An alternative measure that meets the validity conditions for such comparisons will be introduced next.
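The numerical comparisons above are easy to reproduce; the following sketch (with an illustrative lambda_distribution helper) contrasts $H(P_n^{0.5})$ with the value-validity target $(\log n)/2$ of Equation (13):

```python
import math

def shannon_entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def lambda_distribution(n, lam):
    """P_n^lambda of Equation (6): lam * uniform + (1 - lam) * degenerate."""
    return [1 - lam + lam / n] + [lam / n] * (n - 1)

for n in (2, 4, 20):
    p_half = lambda_distribution(n, 0.5)
    target = 0.5 * math.log(n)          # value-validity requirement: lambda * H(P_n^1)
    print(n, round(shannon_entropy(p_half), 2), round(target, 2))
# n=2: 0.56 vs 0.35;  n=4: 1.07 vs 0.69;  n=20: 2.09 vs 1.50, as quoted above
```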

3. The New Entropy

3.1. Derivation of $H_K$

The logic or reasoning behind $H_K$ in Equation (3) as a measure of randomness or uncertainty may be outlined as follows:
(1)
As a matter of convenience and as used throughout the rest of the paper, all individual probabilities will be considered ordered such that:
$p_1 \geq p_2 \geq \cdots \geq p_n$   (14)
(2)
Due to the constraint that $\sum_{i=1}^{n} p_i = 1$, there is no loss of generality or information by focusing on $p_2, \ldots, p_n$.
(3)
Instead of considering $\sum_{i=2}^{n} f(p_i)$ or a weighted mean or sum of $f(p_2), \ldots, f(p_n)$ for some function f of the individual $p_i$'s, one could consider the sum of the means of all pairs of the $p_i$ ($i = 2, \ldots, n$). Since an entropy measure needs to be zero-indifferent (expansible), i.e., unaffected by the addition of events with zero probabilities (e.g., [14] (Chapter 1)), a logical choice of pairwise means would be the geometric means $\sqrt{p_i p_j}$ for all i, j = 2, …, n (since obviously $\sqrt{p_i \cdot 0} = 0$). Therefore, the measure consisting of the means $\sqrt{p_i p_j}$, including those for i = j, can be expressed as:
$H_K = 2 \sum_{2 \leq i \leq j \leq n} \sqrt{p_i p_j} = \sum_{i=2}^{n} \sum_{j=2}^{n} \sqrt{p_i p_j} + \sum_{i=2}^{n} p_i$   (15)
where the multiplication factor 2 is included so that $H_K(P_n^1) = H_K(1/n, \ldots, 1/n) = n - 1$ instead of (n − 1)/2. With $p_1$ being the modal (largest) probability, this $H_K$ in Equation (15) is twice the sum of the pairwise geometric means of all the non-modal probabilities. Furthermore, from the fact that, for a set of numbers $\{x_i\}$, $\left(\sum_{i=2}^{n} x_i\right)^2 = \sum_{i=2}^{n} \sum_{j=2}^{n} x_i x_j$, and then setting $x_i = \sqrt{p_i}$, it follows from the second expression in Equation (15) that:
$H_K = \left(\sum_{i=2}^{n} \sqrt{p_i}\right)^2 + \sum_{i=2}^{n} p_i$   (16)
which is the same as the formula in Equation (3).
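A quick numerical check that the pairwise-mean form in Equation (15) and the closed form in Equation (16) agree (function names are illustrative):

```python
import math

def hk_pairwise(p_sorted):
    """Equation (15): twice the sum of pairwise geometric means sqrt(p_i p_j), i <= j
    (diagonal included), over the non-modal probabilities p_2 >= ... >= p_n."""
    rest = p_sorted[1:]                     # drop the modal probability p_1
    n = len(rest)
    return 2 * sum(math.sqrt(rest[i] * rest[j]) for i in range(n) for j in range(i, n))

def hk_closed_form(p_sorted):
    """Equation (16): (sum sqrt(p_i))^2 + sum p_i over the non-modal probabilities."""
    rest = p_sorted[1:]
    return sum(math.sqrt(pi) for pi in rest) ** 2 + sum(rest)

p = sorted([0.40, 0.30, 0.20, 0.10], reverse=True)
print(round(hk_pairwise(p), 6), round(hk_closed_form(p), 6))    # both ~2.319
```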
As an alternative approach, one could begin by considering the power sum, or sum of order α, $S_{\alpha} = \left(\sum_i p_i^{\alpha}\right)^{1/\alpha}$ (e.g., [18] (pp. 138–139)). Strict Schur-concavity, which is discussed below as an important property of an entropy and one that H in Equation (1) has, requires that the parameter α < 1 [18] (pp. 138–139). Since $p_i \geq 0$ (i = 1, …, n), a further restriction is that α be positive and hence 0 < α < 1 for the power sum $S_{\alpha}$. In order for $S_{\alpha}$ to comply with the value-validity condition in Equation (11), it is clear that $S_{\alpha}$ can only be the power sum of the non-modal probabilities so that:
$S_{\alpha}(p_2, \ldots, p_n) = \left(\sum_{i=2}^{n} p_i^{\alpha}\right)^{1/\alpha}, \quad 0 < \alpha < 1$   (17)
with $S_{\alpha}(P_n^0) = 0$ and $S_{\alpha}(P_n^1) = (n-1)^{1/\alpha}/n$ for the probability distributions in Equation (5). A reasonable upper bound (the value at $P_n^1$) would be $n - 1$. This requirement is met for α = 1/2 in Equation (17) together with the addition of $\sum_{i=2}^{n} p_i$, resulting in:
$S_{1/2}(p_2, \ldots, p_n) + S_1(p_2, \ldots, p_n) = \left(\sum_{i=2}^{n} \sqrt{p_i}\right)^2 + \sum_{i=2}^{n} p_i$   (18)
which is the same as Equation (16) and for which $H_K(P_n^1) = n - 1$.

3.2. Properties of $H_K$

The properties of $H_K$ in Equation (16), some of which are readily apparent from its definition, can be outlined as follows:
Property 1.
$H_K$ is a continuous function of all its individual arguments $p_1, \ldots, p_n$.
Property 2.
$H_K$ is (permutation) symmetric in the $p_i$ ($i = 1, \ldots, n$).
Property 3.
$H_K$ is zero-indifferent (expansible), i.e.,
$H_K(p_1, \ldots, p_n, 0, \ldots, 0) = H_K(p_1, \ldots, p_n)$
Property 4.
For any given $P_n = (p_1, \ldots, p_n)$ and the $P_n^0$ and $P_n^1$ in Equation (5):
$H_K(P_n^0) \leq H_K(P_n) \leq H_K(P_n^1); \quad H_K(P_n^0) = 0, \ H_K(P_n^1) = n - 1$   (19)
Property 5.
From Equation (19), $H_K(P_n^1)$ is strictly increasing in n.
Property 6.
$H_K$ is strictly Schur-concave so that, if $P_n = (p_1, \ldots, p_n)$ is majorized by $Q_n = (q_1, \ldots, q_n)$ (denoted by $P_n \prec Q_n$):
$P_n \prec Q_n \Rightarrow H_K(P_n) \geq H_K(Q_n)$   (20)
with strict inequality unless $P_n$ is simply a permutation of $Q_n$.
Property 7.
$H_K$ is concave, but not strictly concave.
Property 8.
$H_K$ meets the value-validity condition in Equation (10), with $H_K(P_n^{\lambda}) = \lambda(n-1)$.
Proof of Property 6. 
The strict Schur-concavity of $H_K = \left(\sum_{i=2}^{n} \sqrt{p_i}\right)^2 + 1 - p_1$ follows immediately from the partial derivatives:
$\frac{\partial H_K}{\partial p_1} = -1; \quad \frac{\partial H_K}{\partial p_i} = p_i^{-1/2} \sum_{j=2}^{n} \sqrt{p_j}, \quad i = 2, \ldots, n$
and the fact that $\partial H_K / \partial p_i$ is strictly increasing in i = 1, …, n (unless $p_i = p_{i+1}$) ([18] (p. 84)). The majorization $P_n \prec Q_n$ in Equation (20) is a more precise statement than the vague notion that the components of $P_n$ are "more nearly equal" or "less spread out" than are those of $Q_n$. By definition, if $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i$ (and with the ordering in Equation (14) for $P_n$ and $Q_n$):
$P_n \prec Q_n$ if $\sum_{i=1}^{j} p_i \leq \sum_{i=1}^{j} q_i, \quad j = 1, \ldots, n-1$
([18] (p. 8)). ☐
Proof of Property 7. 
From Equation (16) and for the probability distributions $P_n$ and $Q_n$ and all $\lambda \in [0, 1]$:
$H_K[\lambda P_n + (1-\lambda) Q_n] = \left[\sum_{i=2}^{n} (\lambda p_i + (1-\lambda) q_i)^{1/2}\right]^2 + \lambda \sum_{i=2}^{n} p_i + (1-\lambda) \sum_{i=2}^{n} q_i$   (21)
From Minkowski's inequality (e.g., [19] (p. 175)):
$\left[\sum_{i=2}^{n} (\lambda p_i + (1-\lambda) q_i)^{1/2}\right]^2 \geq \left[\sum_{i=2}^{n} (\lambda p_i)^{1/2}\right]^2 + \left[\sum_{i=2}^{n} ((1-\lambda) q_i)^{1/2}\right]^2$   (22)
so that, from Equations (21) and (22):
$H_K[\lambda P_n + (1-\lambda) Q_n] \geq \lambda \left[\left(\sum_{i=2}^{n} \sqrt{p_i}\right)^2 + \sum_{i=2}^{n} p_i\right] + (1-\lambda) \left[\left(\sum_{i=2}^{n} \sqrt{q_i}\right)^2 + \sum_{i=2}^{n} q_i\right] = \lambda H_K(P_n) + (1-\lambda) H_K(Q_n)$   (23)
proving that $H_K$ is concave. However, and importantly, $H_K$ is not strictly concave since the inequality in Equation (23) is not strict for all $P_n$ and $Q_n$, such as for $P_n = P_n^1$ and $Q_n = P_n^0$ in Equation (5), when
$H_K(P_n^{\lambda}) = H_K[\lambda P_n^1 + (1-\lambda) P_n^0] = \lambda H_K(P_n^1) + (1-\lambda) H_K(P_n^0) = \lambda(n-1)$   (24)
as required by the value-validity conditions in Equations (10) and (11). ☐
  • Note 1: If a measure (function) M is strictly concave so that, instead of Equation (23), the inequality $M[\lambda P_n + (1-\lambda) Q_n] > \lambda M(P_n) + (1-\lambda) M(Q_n)$ is strict for all $P_n$, $Q_n$, and $\lambda \in (0, 1)$, then the condition in Equation (11) cannot be met. The H in Equation (1) is one such measure.
  • Note 2: The extremal value $H_K(P_n^1) = H_K(1/n, \ldots, 1/n) = n - 1$ for a measure of randomness or uncertainty is also a logical requirement for valid difference comparisons. As a particular case of the proportional difference comparisons in Equation (2c), and for any integer m < n:
    $H_K(P_{n+m}^1) - H_K(P_n^1) = H_K(P_n^1) - H_K(P_{n-m}^1)$   (25)
    i.e., adding an amount m to n results in the same absolute change in the value of $H_K$ as does subtracting m from n in the equiprobable case. Or, in terms of the function f where $H_K(P_n^1) = f(n)$, Equation (25) can be expressed more conveniently as:
    $f(n+m) - f(n) = f(n) - f(n-m)$   (26)
    The general solution of the functional equation in Equation (26) is $f(n) = an + b$ with real constants a and b [20] (p. 82), which equals $H_K(P_n^1)$ for a = 1 and b = −1.
  • Note 3: For the binary case of n = 2, $H_K(P_2^1) = H_K(0.5, 0.5) = 1$, which equals H(0.5, 0.5) in Equation (1) if the base-2 logarithm is used. In fact, H(0.5, 0.5) = 1 is an axiom or required property, the normalization axiom, frequently used in information theory to justify the use of the base-2 logarithm in Equation (1) and bits as the unit of measurement [14] (Chapter 1). The binary entropy $H_K(1-p, p) = 2p$ for $p \leq 0.5$, or:
    $H_K(1-p, p) = 1 - |1 - 2p|$ for all $p \in [0, 1]$
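A brief numerical spot-check of Property 8 and of the binary formula in Note 3 (a sketch; hk_entropy is an illustrative helper, not notation from the paper):

```python
import math

def hk_entropy(p):
    """H_K of Equation (16): drop the modal probability, then (sum of sqrt)^2 + sum."""
    m = max(range(len(p)), key=lambda i: p[i])
    rest = [pi for i, pi in enumerate(p) if i != m]
    return sum(math.sqrt(pi) for pi in rest) ** 2 + sum(rest)

# Property 8: H_K(P_n^lambda) = lambda * (n - 1) for the lambda distribution of Equation (6).
n, lam = 5, 0.3
p_lam = [1 - lam + lam / n] + [lam / n] * (n - 1)
print(round(hk_entropy(p_lam), 6), lam * (n - 1))          # both 1.2

# Note 3: binary case H_K(1 - p, p) = 1 - |1 - 2p|.
for p in (0.1, 0.5, 0.8):
    print(round(hk_entropy([1 - p, p]), 6), 1 - abs(1 - 2 * p))
```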

4. Generalization of $H_K$

Instead of the pairwise geometric means in Equation (15), one could consider power means, or arithmetic means of order α, and hence the following parameterized family of entropies:
$H_{K\alpha} = \sum_{i=2}^{n} \sum_{j=2}^{n} M_{\alpha}(p_i, p_j) + \sum_{i=2}^{n} p_i; \quad M_{\alpha}(p_i, p_j) = \left(\frac{p_i^{\alpha} + p_j^{\alpha}}{2}\right)^{1/\alpha}, \quad -\infty < \alpha \leq 0$   (27)
of which $H_K$ in Equations (15) and (16) is the particular member $H_{K0}$ as $\alpha \to 0$. Since a measure of randomness (uncertainty) should be zero-indifferent (see Property 3 of $H_K$), it is clear from the formula in Equation (27) that α cannot be positive, i.e., $\alpha \leq 0$, where α = 0 means the limit as $\alpha \to 0$. If $p_i = 0$, $p_j = 0$, or $p_i = p_j = 0$, $M_{\alpha}(p_i, p_j)$ is taken to be 0 for $\alpha \leq 0$ (see, e.g., [21] (Chapter 2) for the properties of such power means). One of the important properties of $M_{\alpha}(p_i, p_j)$ is that it is a non-decreasing function of α (for $-\infty < \alpha < \infty$) and is strictly increasing unless $p_i = p_j$. Besides this $M_{\alpha}$, there are other types of means that could be considered (e.g., [18] (pp. 139–145), [22]).
Since $M_{\alpha}(p_i, p_j)$ is strictly increasing in α (if $p_i \neq p_j$), it follows from Equation (27) that, for any probability distribution $P_n = (p_1, \ldots, p_n)$:
$H_L(P_n) \leq H_{K\alpha}(P_n) \leq H_K(P_n)$ for all $\alpha \in (-\infty, 0)$   (28)
where the lower limit $H_L(P_n) = H_{K(-\infty)}(P_n)$ is the limit of $H_{K\alpha}$ as $\alpha \to -\infty$, and $H_K(P_n)$ is defined in Equations (15) and (16) and is the limit of $H_{K\alpha}(P_n)$ as $\alpha \to 0$. The inequalities in Equation (28) are strict unless $P_n$ equals $P_n^0$ or $P_n^1$ in Equation (5).
Each member of $H_{K\alpha}$ has the same types of properties as those of $H_K$ discussed above. The strict Schur-concavity of $H_{K\alpha}$ follows from the fact that (a) $H_{K\alpha}$ is (permutation) symmetric in the $p_i$ ($i = 1, \ldots, n$) and (b) the partial derivatives, after setting $\sum_{i=2}^{n} p_i = 1 - p_1$ in Equation (27):
$\frac{\partial H_{K\alpha}}{\partial p_1} = -1, \quad \frac{\partial H_{K\alpha}}{\partial p_i} = \sum_{j=2}^{n} \left(\frac{p_j^{\alpha} p_i^{-\alpha} + 1}{2}\right)^{(1-\alpha)/\alpha}, \quad i = 2, \ldots, n$
are clearly strictly increasing in i = 1, …, n (for $p_i > p_{i+1}$) for all $\alpha \in (-\infty, 1)$. The case when $\alpha \to 0$ was proved in the preceding subsection.
As with any reasonable measure of randomness or uncertainty, each member of $H_{K\alpha}$ in Equation (27) is a compound measure consisting of two components: the dimension of the distribution or vector $P_n$ and the uniformity (evenness) with which the elements of $P_n$ are distributed. For any probability distribution $P_n = (p_1, \ldots, p_n)$, this fact can be most simply represented by:
$H_{K\alpha}(P_n) = H_{K\alpha}(P_n^1)\, H_{K\alpha}^*(P_n), \quad H_{K\alpha}^*(P_n) \in [0, 1]$   (29)
where $H_{K\alpha}(P_n^1) = n - 1$ for the uniform distribution $P_n^1$ in Equation (5) and where $H_{K\alpha}^*(P_n) = H_{K\alpha}(P_n)/(n-1)$ reflects the uniformity (evenness) of $P_n$. The $H_{K\alpha}^*$ basically controls for n. For the distribution in Equation (6), $H_{K\alpha}^*(P_n^{\lambda}) = \lambda$.
The limiting member $H_L$ in Equation (28), as $\alpha \to -\infty$, is defined by:
$H_L = \sum_{i=2}^{n} \sum_{j=2}^{n} \min\{p_i, p_j\} + \sum_{i=2}^{n} p_i = \sum_{i=1}^{n} \sum_{j=1}^{n} \min\{p_i, p_j\} - 1$   (30a)
$= 2 \sum_{i=2}^{n} (i-1)\, p_i$   (30b)
where the expression in Equation (30b) can easily be seen to follow directly from Equation (30a) (remembering again the ordering in Equation (14)). The second expression in Equation (30a) has been briefly mentioned by Morales et al. [23], and the form in Equation (30b), divided by 2, has been suggested by Patil and Taillie [24] as one potential measure of diversity.
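The family in Equation (27) and its limiting member $H_L$ are straightforward to compute; the sketch below (with illustrative helper names) also illustrates the ordering in Equation (28):

```python
import math

def m_alpha(x, y, alpha):
    """Power mean of order alpha (alpha < 0 here); taken to be 0 if either argument is 0."""
    if x == 0.0 or y == 0.0:
        return 0.0
    return ((x ** alpha + y ** alpha) / 2.0) ** (1.0 / alpha)

def hk_alpha(p, alpha):
    """H_K_alpha of Equation (27): pairwise power means of the non-modal probabilities plus their sum."""
    rest = sorted(p, reverse=True)[1:]
    return sum(m_alpha(x, y, alpha) for x in rest for y in rest) + sum(rest)

def h_k(p):
    """H_K of Equation (16), the alpha -> 0 member."""
    rest = sorted(p, reverse=True)[1:]
    return sum(math.sqrt(pi) for pi in rest) ** 2 + sum(rest)

def h_l(p):
    """H_L of Equation (30b): 2 * sum_{i>=2} (i - 1) p_i, with p ordered decreasingly."""
    q = sorted(p, reverse=True)
    return 2 * sum(i * q[i] for i in range(1, len(q)))

p = [0.40, 0.30, 0.20, 0.10]
print(round(h_l(p), 3), [round(hk_alpha(p, a), 3) for a in (-8.0, -2.0, -0.5)], round(h_k(p), 3))
# H_L <= H_K_alpha <= H_K, increasing in alpha, as in Equation (28)
```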

5. Comparative Analysis

5.1. Why the Preference for $H_K$

From a practical point of view, what sets $H_K$ in Equation (3) or Equation (16) apart from any other member of the family $H_{K\alpha}$ in Equation (27) is its ease of computation. Its values can easily be computed on a hand calculator for any probability distribution $P_n$, even when the dimension n is quite large. For other members of $H_{K\alpha}$, $(n-1)^2$ pairwise means have to be computed, which becomes practically impossible without the use of a computer program even when n is not large. The computational effort for the member $H_L$ (when $\alpha \to -\infty$) is somewhat less than for other members. Nevertheless, the apparently simpler formula for $H_L$ in Equation (30b) requires that all $p_i$ be ordered as in Equation (14), which can be very tedious if done manually and nearly impossible if n is large.
The $H_K$ is also favored over other members of $H_{K\alpha}$ when considering the agreement with some other measure based on the Euclidean distance and the familiar standard deviation. Specifically, for any probability distribution $P_n = (p_1, \ldots, p_n)$ and $P_n^1 = (1/n, \ldots, 1/n)$, consider the following linearly decreasing function of the distance $d(P_n, P_n^1)$:
$D(P_n) = (n-1)\left[1 - \sqrt{\frac{n}{n-1}}\, d(P_n, P_n^1)\right] = (n-1)\left(1 - \frac{n}{\sqrt{n-1}}\, s_n\right) = (n-1)\,\mathrm{CNV}$   (31)
where $s_n$ is the standard deviation of $p_1, \ldots, p_n$ (with divisor n rather than n − 1) and CNV is the coefficient of nominal variation [25,26]. It is clear from Equation (31) that, for $P_n^0$ and $P_n^1$ in Equation (5), $D(P_n^0) = 0$ and $D(P_n^1) = n - 1$. Also, for the lambda distribution in Equation (6), $D(P_n^{\lambda}) = \lambda D(P_n^1) = \lambda(n-1)$, so that D satisfies the value-validity condition in Equation (10). Of course, D is not zero-indifferent (see Property 3 for $H_K$).
Since the Euclidean distance and the standard deviation are such universally used measures, it is to be expected that an acceptable measure of randomness (uncertainty) should not differ substantially from D in Equation (31). From numerical examples, it is seen that the values of $H_K$ in Equation (16) tend to be closer to those of D in Equation (31) than are the values of any other member of the $H_{K\alpha}$ family in Equation (27). In order to demonstrate this fact, a computer simulation was used to generate a number of random distributions using the following algorithm. For each randomly generated probability distribution $P_n = (p_1, \ldots, p_n)$, n was first generated as a random integer between 3 and 20, inclusive. Then, with the ordering in Equation (14), each $p_i$ was generated as a random number (to 5 decimal places) within the following intervals:
$\frac{1}{n} \leq p_1 \leq 1$
$\frac{1 - \sum_{j=1}^{i-1} p_j}{n - (i-1)} \leq p_i \leq \min\left\{p_{i-1},\ 1 - \sum_{j=1}^{i-1} p_j\right\}, \quad i = 2, \ldots, n-1$
and finally
$p_n = 1 - \sum_{i=1}^{n-1} p_i$
For each such generated distribution, the values of $H_K$, $H_L$, and D were computed according to the formulas in Equations (16), (30b) and (31), as were their corresponding uniformity (evenness) indices from Equation (29). After excluding some (five) distributions $P_n$ that were nearly equal to $P_n^0$ or $P_n^1$ in Equation (5), the results for 30 different distributions are summarized in Table 1.
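A sketch of this simulation scheme, under the stated 3–20 range and 5-decimal rounding, might look as follows (the seed and helper names are illustrative, so the generated distributions will not reproduce Table 1):

```python
import math
import random

def random_distribution(rng):
    """Generate one ordered random distribution following the sampling scheme described above."""
    n = rng.randint(3, 20)
    p = [round(rng.uniform(1.0 / n, 1.0), 5)]                   # p_1
    for i in range(2, n):
        remaining = 1.0 - sum(p)
        low = remaining / (n - (i - 1))
        high = min(p[-1], remaining)
        p.append(round(rng.uniform(low, high), 5))              # p_i, i = 2, ..., n-1
    p.append(round(1.0 - sum(p), 5))                            # p_n
    return p

def h_k(p):
    rest = sorted(p, reverse=True)[1:]
    return sum(math.sqrt(max(pi, 0.0)) for pi in rest) ** 2 + sum(rest)

def h_l(p):
    q = sorted(p, reverse=True)
    return 2 * sum(i * q[i] for i in range(1, len(q)))

def d_measure(p):
    """D of Equation (31), a linearly decreasing function of d(P_n, P_n^1)."""
    n = len(p)
    dist = math.sqrt(sum((pi - 1.0 / n) ** 2 for pi in p))
    return (n - 1) * (1.0 - math.sqrt(n / (n - 1)) * dist)

rng = random.Random(0)          # illustrative seed, not from the paper
for _ in range(5):
    p = random_distribution(rng)
    print(len(p), round(h_k(p), 2), round(h_l(p), 2), round(d_measure(p), 2))
```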
It is apparent from the data in Table 1 that $H_K$ agrees quite closely with D, and clearly more so than does $H_L$. Exceptions are Data Sets 1, 11, and 26, where the $H_K$-values differ considerably from those of D, but still less so than do the $H_L$-values. If D is used to predict $H_K$ (i.e., for the fitted model $\hat{H}_K = D$), it is found for the 30 data sets in Table 1 that the coefficient of determination, when properly computed as $R^2 = 1 - \sum(H_K - D)^2 / \sum(H_K - \bar{H}_K)^2$ [27], becomes $R^2 = 0.98$ (i.e., 98% of the variation of $H_K$ is explained by the fitted model $\hat{H}_K = D$), as compared to $R^2 = 0.91$ in the case of $H_L$. Also, the root mean square (RMS) of the differences between the values of $H_K$ and D is found to be 0.64, as compared to 1.33 for $H_L$ and D. Similarly, when comparing the values of the indices $H_K^*$, $H_L^*$, and $D^*$, the $H_K^*$-values are considerably closer to the $D^*$-values than are the $H_L^*$-values, with $\mathrm{RMS}(H_K^*, D^*) = 0.05$ and $\mathrm{RMS}(H_L^*, D^*) = 0.10$.
No other member of $H_{K\alpha}$ in Equation (27) is generally in as close agreement with D as is $H_K$, although more closely than $H_L$. This can be explained by the fact that (a) whenever there is a notable difference between the values of $H_K$ and D, those of $H_K$ tend to be less than those of D, as seen from Table 1; and (b) $H_{K\alpha}(P_n)$ is a strictly increasing function of α for any given $P_n$ other than $P_n^0$ and $P_n^1$ in Equation (5).

5.2. Comparative Weights on $p_1, \ldots, p_n$

The difference between the values of $H_K$ and $H_L$ demonstrated in Table 1, or between any of the members of the $H_{K\alpha}$ family, is due to the fact that $H_{K\alpha}$ places different weights or emphases on the $p_i$ (i = 1, …, n) depending upon α. When considering each pairwise mean $M_{\alpha}(p_i, p_j)$ in Equation (27), $p_i$ and $p_j$ are weighted equally only when α = 1. Then, since (a) $M_{\alpha}(p_i, p_j)$ is strictly increasing in α (for $p_i \neq p_j$) and (b) $H_{K\alpha}$ is zero-indifferent (Property 3 of $H_K$) only for $\alpha \leq 0$, the $H_{K0} = H_K$ in Equations (15) and (16) is the zero-indifferent member of $H_{K\alpha}$ that is always closest in value to $H_{K1}$ and whose pairwise means $M_0(p_i, p_j)$ are always closest to $M_1(p_i, p_j)$ for all i and j.
Besides the weights placed on each component of all pairs $(p_i, p_j)$, the weights given to each individual $p_i$ ($i = 2, \ldots, n$) can also be examined by expressing the $H_{K\alpha}$ in Equation (27) as the following weighted sum:
$H_{K\alpha} = \sum_{i=2}^{n} \left[\sum_{j=2}^{n} \left(\frac{p_i^{\alpha} + p_j^{\alpha}}{2}\right)^{1/\alpha} \left(\frac{1}{p_i}\right) + 1\right] p_i = \sum_{i=2}^{n} w_{\alpha i}\, p_i$   (32)
which shows that the weights $w_{\alpha i}$ ($i = 2, \ldots, n$) are increasing in both α and i. In the case of $H_K$, with $\alpha \to 0$ in Equation (32), $w_{0i} = \sum_{j=2}^{n} \sqrt{p_j/p_i} + 1$ for i = 2, …, n, whereas, for $H_L$ when $\alpha \to -\infty$, $w_{-\infty i} = \sum_{j=2}^{n} \min\{p_i, p_j\}\, p_i^{-1} + 1$ for i = 2, …, n. These weights for $H_K$ are basically a compromise between the weights for $H_L$ and those for $H_{K1}$. Note also that these weights for $H_L$ from Equation (32) can differ substantially from those in Equation (30b), as can the weights for $H_{K1}$ from Equation (32) when compared with the weights from the expression $H_{K1} = n \sum_{i=2}^{n} p_i$.
When comparing $H_K$ and $H_L$, small $p_i$'s are given more weight by $H_K$ than by $H_L$, and the addition of low-probability components to a set of events has more effect on $H_K$ than on $H_L$. However, when weighing the pros and cons of such relative sensitivity to small $p_i$'s, it is important to keep in mind the relationship in Equation (29) and not jump to conclusions. For example, when going from $P_4 = (0.40, 0.30, 0.20, 0.10)$ to $Q_6 = (0.40, 0.30, 0.20, 0.05, 0.04, 0.01)$, $H_K$ increases from $H_K(P_4) = 2.32$ to $H_K(Q_6) = 2.91$, a 25% increase, while $H_L(P_4) = 2.00$ and $H_L(Q_6) = 2.12$, a 6% increase. However, from Equation (29), the dimensional component of both $H_K$ and $H_L$ increased by 67% (from n − 1 = 3 to n − 1 = 5), whereas the uniformity (evenness) component decreased by 25% in the case of $H_K$ (from 2.32/3 to 2.91/5) and by 37% for $H_L$ (from 2.00/3 to 2.12/5). In this regard, the 25% increase in randomness (uncertainty) as measured by $H_K$ does not appear unreasonable.
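The $P_4$ and $Q_6$ figures quoted above can be verified directly; a sketch with illustrative helpers:

```python
import math

def h_k(p):
    rest = sorted(p, reverse=True)[1:]
    return sum(math.sqrt(pi) for pi in rest) ** 2 + sum(rest)

def h_l(p):
    q = sorted(p, reverse=True)
    return 2 * sum(i * q[i] for i in range(1, len(q)))

p4 = [0.40, 0.30, 0.20, 0.10]
q6 = [0.40, 0.30, 0.20, 0.05, 0.04, 0.01]
for dist in (p4, q6):
    n = len(dist)
    hk, hl = h_k(dist), h_l(dist)
    # total value and the uniformity (evenness) component of Equation (29)
    print(n, round(hk, 2), round(hk / (n - 1), 3), round(hl, 2), round(hl / (n - 1), 3))
# H_K: 2.32 -> 2.91 (about +25%); H_L: 2.00 -> 2.12 (about +6%), as in the text
```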

5.3. Inconsistent Orderings

Although all members of the family $H_{K\alpha}$ in Equation (27) have the same types of properties, including the value-validity property in Equation (10), this does not necessarily imply that different members will always produce the same results for the comparisons in Equation (2). Such lack of consistency is inevitable whenever measures are used to summarize data sets into a single number. However, as stated by Patil and Taillie [24] (p. 551), "Inconsistent measures…are a familiar problem and should not be a cause for undue pessimism", pointing out, for instance, that the arithmetic mean and the median are not consistent measures of average (central tendency) and that the standard deviation and the mean absolute deviation are inconsistent measures of variability (spread). One type of consistent result for all members of $H_{K\alpha}$ is the size (order) comparison $H_{K\alpha}(P_n) > H_{K\alpha}(Q_n)$ in Equation (2a) whenever $P_n$ is majorized by $Q_n$ and $P_n$ is not a permutation of $Q_n$. This is the result of Equation (20) and the fact, as proved above, that $H_{K\alpha}$ is strictly Schur-concave for all $\alpha \in (-\infty, 1)$.
It is only when two measures $M_1$ and $M_2$ have a perfect linear relationship that (a) the comparison results from Equation (2) will always be consistent and (b) compliance by $M_1$ with the value-validity conditions in Equations (10) and (11) also implies compliance by $M_2$. In the case of $H_K$ and $H_L$, and from the simulation results in Table 1, Pearson's correlation coefficient between $H_K$ and $H_L$ is found to be r = 0.993, indicating a near perfect linear relationship between $H_K$ and $H_L$. However, since the linearity is not truly perfect or exact, $H_K$ and $H_L$ will not always give the same results for the comparisons in Equation (2), as is evident from some of the data in Table 1.

6. Discussion

The value-validity condition in Equation (10), as a necessary requirement for valid difference comparisons as in Equation (2), is based on Euclidean distances. Such distances are also used as the basis for the preference of $H_K$ over other potential members of the family of entropies in Equation (27). This distance metric is the standard one in engineering and science. The use of any other "distance" measures, such as the directed divergences discussed below, would seem to require particular justification in the context of value-validity assessment.
As a simple numerical example illustrating the reasoning behind the value-validity arguments in Equations (6)–(13) and the use of Euclidean distances, consider the following probability distributions based on $P_n^{\lambda}$ in Equation (6):
$P_5^0 = \{p_i(0)\} = (1, 0, 0, 0, 0)$
$P_5^{0.5} = \{p_i(0.5)\} = (0.6, 0.1, 0.1, 0.1, 0.1)$
$P_5^1 = \{p_i(1)\} = (0.2, 0.2, 0.2, 0.2, 0.2)$
Note that the Euclidean distances satisfy $d(P_5^{0.5}, P_5^0) = d(P_5^{0.5}, P_5^1)$ and $|p_i(0.5) - p_i(0)| = |p_i(0.5) - p_i(1)|$ for i = 1, …, 5. A measure of uncertainty (randomness) M that takes on reasonable numerical values within the general bounds $M(P_n^0)$ and $M(P_n^1)$ should in this example satisfy the equality $|M(P_5^{0.5}) - M(P_5^0)| = |M(P_5^{0.5}) - M(P_5^1)|$, so that, with $M(P_5^0) = 0$, $M(P_5^{0.5}) = M(P_5^1)/2$. That is, since $P_5^{0.5}$ is the same distance from $P_5^0$ as it is from $P_5^1$ and each element of $P_5^{0.5}$ is the same distance from the corresponding element of $P_5^0$ as it is from that of $P_5^1$, M would reflect this fact by taking on the value $M(P_5^{0.5}) = M(P_5^1)/2$. The $H_K$ in Equation (3) or Equation (16) meets this requirement with $H_K(P_5^{0.5}) = H_K(P_5^1)/2 = (5-1)/2 = 2$. However, in the case of H in Equation (1), $H(P_5^{0.5}) = 1.23 \gg H(P_5^1)/2 = 0.80$, a substantial overstatement of the extent of the uncertainty or randomness.
A similar comparison between $H_K$ and H for $P_5^{\lambda}$ with λ = 0.25 and λ = 0.75 is given in Table 2, together with the results for some other probability distributions. The results are also given in terms of the normalized measures $H_K^*(P_n) = H_K(P_n)/(n-1)$ and $H^*(P_n) = H(P_n)/\log n$, as well as $D^*(P_n) = D(P_n)/(n-1)$ for D in Equation (31). As seen from Table 2, while $H_K^*(P_5^{\lambda}) = D^*(P_5^{\lambda}) = \lambda$, $H^*(P_5^{\lambda}) \gg \lambda$ for both λ-values. For all distributions in Table 2, the values of $H_K^*$ are quite comparable to those of $D^*$, but those of $H^*$ are all considerably greater.
The distributions $P_{10}^{(3)}$–$P_5^{(8)}$ are included in Table 2 to exemplify the types of contradictory results that may be obtained when making the difference comparisons in Equation (2) based on $H_K$ versus H. The distributions $P_{10}^{(3)}$ and $P_{14}^{(4)}$ in Table 2 are real data for the market shares (proportions) of the carbonated soft-drink industry in the U.S. and the world-wide market shares of cell phones, respectively (obtained by Googling "market shares" by industries). Some of the smaller market shares are not given in Table 2 because of space limitations, but they were included in the computations. The H, which has been used as a measure of market concentration, or rather of its converse, deconcentration (e.g., [28]), would indicate that these two industries have nearly the same market deconcentration. By contrast, when considered in terms of $H_K$, for which such a comparison is valid because of the value-validity property of $H_K$, the results in Table 2 show that the cell-phone industry is about 20% more deconcentrated than the soft-drink industry. Similarly, for the fictitious distributions $P_4^{(5)}$–$P_5^{(8)}$ in Table 2, the type of difference comparison in Equation (2b) shows that $H_K(P_5^{(8)}) - H_K(P_4^{(5)}) = 1.13 > H_K(P_5^{(6)}) - H_K(P_5^{(7)}) = 0.98$, whereas the result would have been the reverse had H been used for this comparison, with $H(P_5^{(8)}) - H(P_4^{(5)}) = 0.28 < H(P_5^{(6)}) - H(P_5^{(7)}) = 0.35$.
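The contradictory difference comparisons for the fictitious distributions can be reproduced as follows (illustrative helper names; natural logarithm for H):

```python
import math

def h_k(p):
    rest = sorted(p, reverse=True)[1:]
    return sum(math.sqrt(pi) for pi in rest) ** 2 + sum(rest)

def shannon_entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p4_5 = [0.40, 0.30, 0.20, 0.10]
p5_6 = [0.50, 0.20, 0.10, 0.10, 0.10]
p5_7 = [0.70, 0.10, 0.10, 0.05, 0.05]
p5_8 = [0.30, 0.20, 0.20, 0.20, 0.10]

# Difference comparison of Equation (2b): H_K and H rank the two differences oppositely.
print(round(h_k(p5_8) - h_k(p4_5), 2), round(h_k(p5_6) - h_k(p5_7), 2))        # 1.13 > 0.98
print(round(shannon_entropy(p5_8) - shannon_entropy(p4_5), 2),
      round(shannon_entropy(p5_6) - shannon_entropy(p5_7), 2))                 # 0.28 < 0.35
```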
Instead of using the Euclidean metric to formulate the value-validity conditions in Section 2, one could perhaps consider other potential "distance" measures such as divergences, also referred to as "statistical distances". The best known such measure of the divergence of the distribution $P_n = (p_1, \ldots, p_n)$ from the distribution $Q_n = (q_1, \ldots, q_n)$ is the Kullback–Leibler divergence [29] defined as:
$KLD(P_n : Q_n) = \sum_{i=1}^{n} p_i \log\left(\frac{p_i}{q_i}\right)$
This measure is directional or asymmetric in $P_n$ and $Q_n$. A symmetric measure is the so-called Jensen–Shannon divergence (JSD) (e.g., [30,31,32,33]), which can be expressed in terms of the Kullback–Leibler divergence (KLD) as:
$JSD(P_n, Q_n) = \frac{1}{2} KLD(P_n : M_n) + \frac{1}{2} KLD(Q_n : M_n)$
where $M_n = (P_n + Q_n)/2$. Neither KLD nor JSD is a metric, but $\sqrt{JSD}$ is [34].
Consider now the family of distributions $P_n^{\lambda}$ in Equation (6) and the extreme members $P_n^0$ and $P_n^1$ in Equation (5). For the case of n = 5, for example, it is found that $KLD(P_5^0 : P_5^{0.5}) = 0.51$ (while $KLD(P_5^{0.5} : P_5^0)$ is undefined) and $KLD(P_5^1 : P_5^{0.5}) = 0.33$. In the case of JSD, $M_5(P_5^{0.5}, P_5^0) = (0.8, 0.05, \ldots, 0.05)$, so that $JSD(P_5^{0.5}, P_5^0) = 0.16$. Similarly, $M_5(P_5^{0.5}, P_5^1) = (0.40, 0.15, \ldots, 0.15)$ and $JSD(P_5^{0.5}, P_5^1) = 0.09$. These results differ greatly from those based on Euclidean distances, for which $d(P_5^{0.5}, P_5^0) = d(P_5^{0.5}, P_5^1)$. The fact that $d(P_n^{0.5}, P_n^0) = d(P_n^{0.5}, P_n^1)$ for all n, which is also reflected by the normalized $H_K^*(P_n^{0.5}) = 0.5$, corresponds to the fact that each component of $P_n^{0.5}$ is of equal distance from the corresponding components of $P_n^0$ and $P_n^1$. However, no such correspondence exists for the divergence measures KLD and JSD.
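The divergence values quoted above follow from the definitions; a short sketch (illustrative function names, natural logarithms):

```python
import math

def kld(p, q):
    """Kullback-Leibler divergence KLD(P : Q) = sum p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KLD of P and Q from their midpoint M."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)

p0     = [1.0, 0.0, 0.0, 0.0, 0.0]      # P_5^0
p_half = [0.6, 0.1, 0.1, 0.1, 0.1]      # P_5^0.5
p1     = [0.2] * 5                      # P_5^1

print(round(kld(p0, p_half), 2), round(kld(p1, p_half), 2))     # 0.51, 0.33
print(round(jsd(p_half, p0), 2), round(jsd(p_half, p1), 2))     # 0.16, 0.09
# By contrast, the Euclidean distances d(P_5^0.5, P_5^0) and d(P_5^0.5, P_5^1) are equal.
```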
The derivation of $H_K$ in Equation (3) or Equation (16) is based on the exclusion of the modal probability $p_1$. Of course, $p_1$ can enter the expression for $H_K$ since $\sum_{i=2}^{n} p_i = 1 - p_1$. One may wonder what the result would be if a different $p_i$ were to be excluded. If the smallest $p_i$ is excluded, the measure would not be zero-indifferent (expansible). If any $p_i$ other than $p_1$ is excluded, then the measure would not be strictly Schur-concave, as can be verified from the proof of Property 6 of $H_K$. This property is essential for any measure of uncertainty (randomness). In fact, the exclusion of $p_1$ makes $H_K$ unique in this regard.
It should also be emphasized that even though the entropy H in Equation (1) lacks the value-validity property, it has many of the same properties as $H_K$ and undoubtedly has numerous useful and appropriate applications, as demonstrated in the extensive published literature. The problems with H arise when it is used uncritically and indiscriminately in fields far from its origin as a statistical concept of communication theory. Both Shannon [35] and Wiener [36] cautioned against such uncritical applications. It is when H or its normalized form $H^*$ is used as a summary measure (statistic) of various attributes (characteristics), and when its values are interpreted and compared, that its lack of the value-validity property can lead to incorrect and misleading results and conclusions. This is the motivation for introducing the new entropy $H_K$ as a measure that overcomes the lack of value validity of H.

7. Statistical Inference about $H_K$

Consider now the case when each $p_i = n_i/N$ is the multinomial sample estimate of the unknown population probability $\pi_i$ for i = 1, …, n, based on the sample size $N = \sum_{i=1}^{n} n_i$. It may then be of interest to investigate the potential statistical bias of $H_K(P_n)$ and to construct confidence intervals for the population measure $H_K(\Pi_n)$ of the population probability distribution $\Pi_n = (\pi_1, \ldots, \pi_n)$.

7.1. Bias

Using bold letters to distinguish random variables from their sample values and expanding $H_K(\mathbf{P}_n)$ in Equation (16) into a Taylor series about $\mathbf{p}_i = \pi_i$ for $i = 2, \ldots, n$, the following result is obtained:
$H_K(\mathbf{P}_n) = H_K(\Pi_n) + \sum_{i=2}^{n} \left(\frac{1}{\sqrt{\pi_i}} \sum_{j=2}^{n} \sqrt{\pi_j} + 1\right)(\mathbf{p}_i - \pi_i) - \frac{1}{4} \sum_{i=2}^{n} \left(\pi_i^{-3/2} \sum_{j=2}^{n} \sqrt{\pi_j} - \pi_i^{-1}\right)(\mathbf{p}_i - \pi_i)^2 + \frac{1}{4} \sum_{i=2}^{n} \sum_{\substack{j=2 \\ j \neq i}}^{n} \frac{1}{\sqrt{\pi_i \pi_j}}\, (\mathbf{p}_i - \pi_i)(\mathbf{p}_j - \pi_j) + \cdots$   (33)
Taking the expected value of each side of Equation (33) and using the well-known expectations $E(\mathbf{p}_i - \pi_i) = 0$, $E(\mathbf{p}_i - \pi_i)^2 = \pi_i(1 - \pi_i)/N$, and $E[(\mathbf{p}_i - \pi_i)(\mathbf{p}_j - \pi_j)] = -\pi_i \pi_j/N$ for $i \neq j$, $i, j = 2, \ldots, n$, it is found that:
$E[H_K(\mathbf{P}_n)] = H_K(\Pi_n) - \frac{1}{4N}\left[\left(\sum_{i=2}^{n} \sqrt{\pi_i}\right) \sum_{i=2}^{n} \frac{1}{\sqrt{\pi_i}} - (n-1)\right] + O(N^{-3/2})$   (34)
Equation (34) shows that the estimator $H_K(\mathbf{P}_n)$, while asymptotically unbiased, does have a small bias for finite sample size N. However, unless N is small, this bias can effectively be ignored for all practical purposes.

7.2. Confidence Interval Construction

Under multinomial sampling and based on the delta method of large-sample theory (e.g., [37] (Chapter 14)), the following convergence to the normal distribution holds:
$\sqrt{N}\,[H_K(\mathbf{P}_n) - H_K(\Pi_n)] \xrightarrow{d} \mathrm{Normal}(0, \sigma^2)$   (35)
where, in terms of partial derivatives:
$\sigma^2 = \sum_{i=1}^{n} \pi_i \left(\frac{\partial H_K(\Pi_n)}{\partial \pi_i}\right)^2 - \left[\sum_{i=1}^{n} \pi_i\, \frac{\partial H_K(\Pi_n)}{\partial \pi_i}\right]^2$   (36)
That is, for large N, $H_K(\mathbf{P}_n)$ is approximately normally distributed with mean $H_K(\Pi_n)$ and variance $\mathrm{Var}[H_K(\mathbf{P}_n)] = \sigma^2/N$, or standard error $SE = \sigma/\sqrt{N}$. The limiting normal distribution in Equation (35) also holds when, as is necessary in practice, $\sigma^2$ is replaced with the estimated variance $\hat{\sigma}^2$ obtained by substituting the sample estimates (proportions) $p_i$ for the population probabilities $\pi_i$ for i = 1, …, n, resulting in the estimated standard error $SE = \hat{\sigma}/\sqrt{N}$. It then readily follows from Equations (16) and (36) that:
$SE = N^{-1/2}\left\{n\left[H_K(P_n) - \sum_{i=2}^{n} p_i\right] + H_K(P_n)\left[1 - H_K(P_n)\right]\right\}^{1/2}$   (37)
As a simple numerical example, consider the sample distribution $P_4 = (0.4, 0.3, 0.25, 0.05)$ based on a sample of size N = 100. From Equation (16), $H_K(P_4) = 2.22$ and, from Equation (37), SE = 0.19. Therefore, because of Equation (35), an approximate 95% confidence interval for the population measure $H_K(\Pi_4)$ becomes 2.22 ± 1.96(0.19), or [1.85, 2.59]. Statistical hypotheses such as $H_0: H_K(\Pi_n) = H_K(\Pi_m)$ versus $H_1: H_K(\Pi_n) > H_K(\Pi_m)$ can also be tested based on Equations (35) and (37).
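The example can be reproduced from Equations (16) and (37); a sketch with illustrative helper names (the interval is formed from the rounded values, as in the text):

```python
import math

def h_k(p):
    """H_K of Equation (16)."""
    rest = sorted(p, reverse=True)[1:]
    return sum(math.sqrt(pi) for pi in rest) ** 2 + sum(rest)

def hk_standard_error(p, N):
    """Estimated standard error of H_K under multinomial sampling, Equation (37)."""
    n = len(p)
    hk = h_k(p)
    non_modal_sum = sum(sorted(p, reverse=True)[1:])
    variance = n * (hk - non_modal_sum) + hk * (1.0 - hk)
    return math.sqrt(variance / N)

p, N = [0.40, 0.30, 0.25, 0.05], 100
hk = round(h_k(p), 2)                       # 2.22
se = round(hk_standard_error(p, N), 2)      # 0.19
print(hk, se, (round(hk - 1.96 * se, 2), round(hk + 1.96 * se, 2)))  # (1.85, 2.59)
```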

8. Conclusions

Since the ubiquitous Boltzmann–Shannon entropy H is only valid for making size (order or "larger than") comparisons, the entropy $H_K$ is being introduced as an alternative measure of randomness (uncertainty) that is more informative than H in the sense that $H_K$ can also be used for making valid difference comparisons as in Equations (2b) and (2c). The $H_K$, which is a particular member of the family of entropies $H_{K\alpha}$ and is basically a compromise between the members $H_{K1}$ and $H_L = H_{K(-\infty)}$, has the types of desirable properties one would reasonably expect of a randomness (uncertainty) measure. One of the differences between $H_K$ and $H_L$ is that small probabilities have a greater influence on $H_K$ than on $H_L$. The addition of some small-probability events causes a larger increase in $H_K$ than in $H_L$, but causes a smaller decrease in the uniformity (evenness) index $H_K^*$ than in $H_L^*$ as defined in Equation (29).
Besides being computationally the simplest, which is certainly a practical advantage, $H_K$ is also the member of $H_{K\alpha}$ that appears to be most nearly linearly (and decreasingly) related to the Euclidean distance between the points $P_n = (p_1, \ldots, p_n)$ and $P_n^1 = (1/n, \ldots, 1/n)$, or to the standard deviation $s_n$ of $p_1, \ldots, p_n$. The $s_n$ is the usual measure of variability (spread) for a set of data, although it is not resistant against "outliers" (extreme and suspect data points). However, "outliers" are not a concern when dealing with probabilities $p_i$ ($i = 1, \ldots, n$). Therefore, $s_n$ cannot justifiably be criticized for being excessively influenced by large or small $p_i$'s, with the same argument extending to $H_K$.

Acknowledgments

The author would like to thank the reviewers for constructive and helpful comments.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Von Boltzmann, L. Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen. In Sitzungsberichte der Kaiserliche Akademie der Wissenschaften, II Abteil; (Vol. 66, Pt. 2); K.-K. Hof- und Staatsdruckerei in Commission bei C. Gerold's Sohn: Wien, Austria, 1872; pp. 275–370. (In German)
2. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656.
3. Klir, G.J. Uncertainty and Information: Foundations of a Generalized Information Theory; Wiley: Hoboken, NJ, USA, 2006.
4. Ruelle, D. Chance and Chaos; Princeton University Press: Princeton, NJ, USA, 1991.
5. Han, T.S.; Kobayashi, K. Mathematics of Information and Coding; American Mathematical Society: Providence, RI, USA, 2002.
6. Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, USA, 1949.
7. Arndt, C. Information Measures: Information and Its Description in Science and Engineering; Springer: Berlin/Heidelberg, Germany, 2004.
8. Kapur, J.N. Measures of Information and Their Applications; Wiley: New Delhi, India, 1994.
9. Kvålseth, T.O. Entropy. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2011; Part 5; pp. 436–439.
10. Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1961; University of California Press: Berkeley, CA, USA, 1961; Volume 1, pp. 547–561.
11. Peitgen, H.-O.; Jürgens, H.; Saupe, D. Chaos and Fractals: New Frontiers of Science, 2nd ed.; Springer: New York, NY, USA, 2004.
12. Tsallis, C. Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
13. Norwich, K.H. Information, Sensation, and Perception; Academic Press: San Diego, CA, USA, 1993.
14. Aczél, J.; Daróczy, Z. On Measures of Information and Their Characterizations; Academic Press: New York, NY, USA, 1975.
15. Kvålseth, T.O. Entropy evaluation based on value validity. Entropy 2014, 16, 4855–4873.
16. Hand, D.J. Measurement Theory and Practice; Wiley: London, UK, 2004.
17. Kvålseth, T.O. The Lambda distribution and its applications to categorical summary measures. Adv. Appl. Stat. 2011, 24, 83–106.
18. Marshall, A.W.; Olkin, I.; Arnold, B.C. Inequalities: Theory of Majorization and Its Applications, 2nd ed.; Springer: New York, NY, USA, 2011.
19. Bullen, P.S. A Dictionary of Inequalities; Addison Wesley Longman: Essex, UK, 1998.
20. Aczél, J. Lectures on Functional Equations and Their Applications; Academic Press: New York, NY, USA, 1966.
21. Hardy, G.H.; Littlewood, J.E.; Pólya, G. Inequalities; Cambridge University Press: Cambridge, UK, 1934.
22. Ebanks, B. Looking for a few good means. Am. Math. Mon. 2012, 119, 658–669.
23. Morales, D.; Pardo, L.; Vajda, I. Uncertainty of discrete stochastic systems: General theory and statistical inference. IEEE Trans. Syst. Man Cybern. Part A 1996, 26, 681–697.
24. Patil, G.P.; Taillie, C. Diversity as a concept and its measurement. J. Am. Stat. Assoc. 1982, 77, 548–567.
25. Kvålseth, T.O. Coefficients of variation for nominal and ordinal categorical data. Percept. Mot. Skills 1995, 80, 843–847.
26. Kvålseth, T.O. Variation for categorical variables. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2011; Part 22; pp. 1642–1645.
27. Kvålseth, T.O. Cautionary note about R2. Am. Stat. 1985, 39, 279–285.
28. Nawrocki, D.; Carter, W. Industry competitiveness using Herfindahl and entropy concentration indices with firm market capitalization data. Appl. Econ. 2010, 42, 2855–2863.
29. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
30. Lin, J. Divergence measures based on Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151.
31. Wong, A.K.C.; You, M. Entropy and distance of random graphs with application to structural pattern recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1985, PAMI-7, 599–609.
32. Sagar, R.P.; Laguna, H.G.; Guevara, N.L. Electron pair density information measures in atomic systems. Int. J. Quantum Chem. 2011, 111, 3497–3504.
33. Antolin, J.; Angulo, J.C.; López-Rosa, S. Fisher and Jensen–Shannon divergences: Quantitative comparisons among distributions. Application to position and momentum atomic densities. J. Chem. Phys. 2009, 130, 074110.
34. Endres, D.M.; Schindelin, J.E. A new metric for probability distributions. IEEE Trans. Inf. Theory 2003, 49, 1858–1860.
35. Shannon, C.E. The bandwagon. IRE Trans. Inf. Theory 1956, 2, 3.
36. Wiener, N. What is information theory? IRE Trans. Inf. Theory 1956, 2, 48.
37. Bishop, Y.M.M.; Fienberg, S.E.; Holland, P.W. Discrete Multivariate Analysis: Theory and Practice; MIT Press: Cambridge, MA, USA, 1975.
Table 1. Values of $H_K$, $H_L$, and D in Equations (16), (30b), and (31), and of their corresponding uniformity (evenness) indices from Equation (29), based on 30 randomly generated probability distributions.
Data Set | n | H_K | H_L | D | H_K* | H_L* | D*
1 | 16 | 2.17 | 1.28 | 3.89 | 0.14 | 0.09 | 0.26
2 | 19 | 11.14 | 10.32 | 11.22 | 0.62 | 0.57 | 0.62
3 | 3 | 0.96 | 0.82 | 0.97 | 0.48 | 0.41 | 0.49
4 | 18 | 4.28 | 3.21 | 5.05 | 0.25 | 0.19 | 0.15
5 | 15 | 1.84 | 1.40 | 2.08 | 0.13 | 0.10 | 0.15
6 | 15 | 12.65 | 11.42 | 12.57 | 0.90 | 0.82 | 0.90
7 | 13 | 10.37 | 9.93 | 10.34 | 0.86 | 0.83 | 0.86
8 | 15 | 4.13 | 3.01 | 3.76 | 0.30 | 0.22 | 0.27
9 | 7 | 5.38 | 4.83 | 5.25 | 0.90 | 0.81 | 0.88
10 | 4 | 0.78 | 0.65 | 0.86 | 0.26 | 0.22 | 0.29
11 | 12 | 1.67 | 1.23 | 3.83 | 0.15 | 0.11 | 0.35
12 | 17 | 14.71 | 14.56 | 14.71 | 0.92 | 0.91 | 0.92
13 | 14 | 4.09 | 3.88 | 4.10 | 0.31 | 0.30 | 0.32
14 | 8 | 6.04 | 5.39 | 5.95 | 0.86 | 0.77 | 0.85
15 | 17 | 10.01 | 9.03 | 10.13 | 0.63 | 0.56 | 0.63
16 | 5 | 0.57 | 0.41 | 0.74 | 0.14 | 0.10 | 0.19
17 | 10 | 4.85 | 3.82 | 5.10 | 0.54 | 0.42 | 0.57
18 | 5 | 1.00 | 0.73 | 1.27 | 0.25 | 0.18 | 0.32
19 | 5 | 2.05 | 1.16 | 2.09 | 0.51 | 0.40 | 0.52
20 | 19 | 12.00 | 11.05 | 12.06 | 0.67 | 0.61 | 0.67
21 | 19 | 13.78 | 13.64 | 13.78 | 0.77 | 0.76 | 0.77
22 | 17 | 6.93 | 6.33 | 7.40 | 0.43 | 0.40 | 0.46
23 | 12 | 8.16 | 7.71 | 8.17 | 0.74 | 0.70 | 0.74
24 | 20 | 13.56 | 11.98 | 13.83 | 0.71 | 0.63 | 0.73
25 | 17 | 3.47 | 3.07 | 3.58 | 0.22 | 0.19 | 0.22
26 | 13 | 1.15 | 0.72 | 2.29 | 0.10 | 0.06 | 0.19
27 | 14 | 10.69 | 9.66 | 10.65 | 0.82 | 0.74 | 0.82
28 | 12 | 1.68 | 1.66 | 1.68 | 0.15 | 0.15 | 0.15
29 | 20 | 9.72 | 6.88 | 11.09 | 0.51 | 0.36 | 0.58
30 | 9 | 5.23 | 4.20 | 5.36 | 0.65 | 0.53 | 0.67
Table 2. Comparative results for H in Equation (1) and $H_K$ in Equation (16) and their normalized forms, as well as $D^*$ from Equation (31), for various probability distributions.
P_n^(i) | H_K | H_K* | H | H* | D*
P_5^(1) = P_5^0.75 = (0.40, 0.15, 0.15, 0.15, 0.15) | 3.00 | 0.75 | 1.50 | 0.93 | 0.75
P_5^(2) = P_5^0.25 = (0.80, 0.05, 0.05, 0.05, 0.05) | 1.00 | 0.25 | 0.78 | 0.48 | 0.25
P_10^(3) = (0.26, 0.15, 0.15, 0.10, ..., 0.03) | 6.87 | 0.76 | 2.09 | 0.91 | 0.77
P_14^(4) = (0.27, 0.25, 0.15, 0.07, ..., 0.01) | 8.22 | 0.63 | 2.10 | 0.80 | 0.67
P_4^(5) = (0.40, 0.30, 0.20, 0.10) | 2.32 | 0.77 | 1.28 | 0.92 | 0.74
P_5^(6) = (0.50, 0.20, 0.10, 0.10, 0.10) | 2.45 | 0.61 | 1.36 | 0.84 | 0.61
P_5^(7) = (0.70, 0.10, 0.10, 0.05, 0.05) | 1.47 | 0.37 | 1.01 | 0.63 | 0.37
P_5^(8) = (0.30, 0.20, 0.20, 0.20, 0.10) | 3.45 | 0.86 | 1.56 | 0.97 | 0.84
