On the Measurement of Randomness (Uncertainty): A More Informative Entropy

Abstract: As a measure of randomness or uncertainty, the Boltzmann–Shannon entropy $H$ has become one of the most widely used summary measures of a variety of attributes (characteristics) in different disciplines. This paper points out an often overlooked limitation of $H$: comparisons between differences in $H$-values are not valid. An alternative entropy $H_K$ is introduced as a preferred member of a new family of entropies for which difference comparisons are proved to be valid by satisfying a given value-validity condition. The $H_K$ is shown to have the appropriate properties for a randomness (uncertainty) measure, including a close linear relationship to a measurement criterion based on the Euclidean distance between probability distributions. This last point is demonstrated by means of computer-generated random distributions. The results are also compared with those of another member of the entropy family. A statistical inference procedure for the entropy $H_K$ is formulated.


Introduction
For some probability distribution $P_n = (p_1, \ldots, p_n)$, with $p_i \geq 0$ for $i = 1, \ldots, n$ and $\sum_{i=1}^n p_i = 1$, the entropy $H(P_n)$, or simply $H$, is defined by:

$$H = -\sum_{i=1}^n p_i \log p_i \in [0, \log n] \quad (1)$$

where the logarithm is the natural (base-$e$) logarithm. The probabilities $p_i$ $(i = 1, \ldots, n)$ may be associated with a set of quantum states of a physical system in statistical mechanics or physics, a set of symbols or messages in a communication system, or, most generally, a set of mutually exclusive and exhaustive events. First used by Boltzmann [1] in statistical mechanics (as $kH$ with $k$ being the so-called Boltzmann constant) and later introduced by Shannon [2] as the basis for information theory (with base-2 logarithm and bits as the unit of measurement), this entropy $H$ can appropriately be called the Boltzmann–Shannon entropy. Although interpreted in a number of different ways, the most common and general interpretation of $H$ is as a measure of randomness or uncertainty of a set of random events (e.g., [3] (pp. 67-97), [4], [5] (Chapter 2), [6] (pp. 12, 13, 90)).

A number of alternative entropy formulations have been proposed as parameterized generalizations of $H$ in Equation (1) (see, e.g., [7][8][9]), but with limited success or impact. The most notable exception is the one-parameter family of entropies by Rényi [10], $H_\alpha = (1-\alpha)^{-1} \log \sum_{i=1}^n p_i^\alpha$ for $\alpha > 0$ and $\alpha \neq 1$, which includes $H$ as the limiting case $\alpha \to 1$.

For a measure $M$ of randomness (uncertainty) and probability distributions $P_n$, $Q_m$, $R_t$, and $S_u$, the following types of comparisons may be made:

$$\text{Size comparisons:} \quad M(P_n) > M(Q_m) \quad (2a)$$

$$\text{Difference comparisons:} \quad M(P_n) - M(Q_m) > M(R_t) - M(S_u) \quad (2b)$$

$$\text{Proportional difference comparisons:} \quad M(P_n) - M(Q_m) = c\,[M(R_t) - M(S_u)] \quad (2c)$$

where $c$ is a constant. While, because of the properties of $H$ in Equation (1), there is no particular reason to doubt the validity of the size comparison in Equation (2a) involving $H$, the difference comparisons in Equations (2b) and (2c) are not valid for $H$, as discussed below. In this paper, an alternative and equally simple entropy is introduced as:

$$H_K = \left(\sum_{i=2}^n \sqrt{p_i}\right)^2 + 1 - p_1 \in [0, n-1] \quad (3)$$

where $p_1$ is the largest (modal) probability. The term entropy is used for this measure of randomness or uncertainty since (a) it has many of the same properties as $H$ in Equation (1) and (b) the entropy term has been used in such a variety of measurement situations for which $H_K$ can similarly be used. As is established in this paper, $H_K$ has the important advantage of being more informative than $H$ in the sense that $H_K$ meets the conditions for valid difference comparisons as in Equations (2b) and (2c). It will also be argued that $H_K$ is the preferred member of a family of entropies with similar properties. A statistical inference procedure for $H_K$ will also be outlined.
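For readers who want to experiment, the two definitions are one-liners in code. The following Python sketch is our own illustration (the function names are invented, not from the paper):

```python
import math

def shannon_entropy(p):
    """Boltzmann-Shannon entropy H (Equation (1)), natural logarithm."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def hk_entropy(p):
    """Entropy H_K (Equation (3)): (sum_{i>=2} sqrt(p_i))^2 + 1 - p_1,
    where p_1 is the largest (modal) probability."""
    q = sorted(p, reverse=True)   # enforce the ordering p_1 >= ... >= p_n
    return sum(math.sqrt(x) for x in q[1:]) ** 2 + 1.0 - q[0]

if __name__ == "__main__":
    pn = [0.4, 0.3, 0.2, 0.1]
    print(shannon_entropy(pn))    # H(P_4)   ~ 1.28, in [0, log 4]
    print(hk_entropy(pn))         # H_K(P_4) ~ 2.32, in [0, n-1] = [0, 3]
```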

Conditions for Valid Difference Comparisons
Consider that $M$ is a measure of randomness (uncertainty) such that its value $M(P_n)$ for any probability distribution $P_n = (p_1, \ldots, p_n)$ is bounded as:

$$M(P_n^0) \leq M(P_n) \leq M(P_n^1) \quad (4)$$

where $P_n^0$ and $P_n^1$ are the degenerate and uniform distributions

$$P_n^0 = (1, 0, \ldots, 0), \quad P_n^1 = (1/n, \ldots, 1/n) \quad (5)$$

and where one can set $M(P_n^0) = 0$. In order for the difference comparisons in Equations (2b) and (2c) to be permissible or valid, some condition needs to be imposed on $M$ (see [15]). Specifically, all intermediate values $M(P_n)$ in Equation (4) have to provide numerical representations of the extent of randomness (uncertainty) that are true or realistic with respect to some acceptable criterion. While different types of validity are used in measurement theory [16] (pp. 129-134), value validity will be used here for this required property of $M$. In order to establish specific requirements for $M$ to have value validity, a particular probability distribution proves useful, and Euclidean distances will be used as a criterion.
Therefore, consider the recently introduced lambda distribution:

$$P_n^\lambda = \left(\frac{\lambda}{n} + 1 - \lambda, \frac{\lambda}{n}, \ldots, \frac{\lambda}{n}\right), \quad \lambda \in [0, 1] \quad (6)$$

where $\lambda$ is a uniformity (evenness) parameter and with $P_n^0$ and $P_n^1$ in Equation (5) being special (extreme) cases [17]. This $P_n^\lambda$ is simply the following weighted mean of $P_n^0$ and $P_n^1$:

$$P_n^\lambda = \lambda P_n^1 + (1 - \lambda) P_n^0 \quad (7)$$

From Equations (4) and (6), it follows that, for any given $P_n$:

$$M(P_n) = M(P_n^\lambda) \ \text{for one unique } \lambda \quad (8)$$

so that validity conditions on $M(P_n)$ can equivalently be determined in terms of $M(P_n^\lambda)$. With probability distributions considered as points (vectors) in $n$-dimensional space and with the Euclidean distance function $d$ being the basis for a validity criterion, the following requirement seems most natural and obvious [17]:

$$\frac{M(P_n^\lambda) - M(P_n^0)}{M(P_n^1) - M(P_n^\lambda)} = \frac{d(P_n^\lambda, P_n^0)}{d(P_n^\lambda, P_n^1)} = \frac{\lambda}{1 - \lambda} \quad (9)$$

Since $M(P_n^0) = 0$, i.e., there is no randomness when one $p_i = 1$, it follows from Equation (9) that:

$$M(P_n^\lambda) = \lambda M(P_n^1) \quad (10)$$

as a value-validity condition. This condition also follows immediately from Equation (7) as:

$$M(P_n^\lambda) = M[\lambda P_n^1 + (1 - \lambda) P_n^0] = \lambda M(P_n^1) + (1 - \lambda) M(P_n^0) \quad (11)$$

which equals Equation (10) for $M(P_n^0) = 0$. For the case when $\lambda = 0.5$, $P_n^{0.5}$ is the midpoint of $P_n^0$ and $P_n^1$ with coordinates $(1/2 + 1/2n, 1/2n, \ldots, 1/2n)$. Then:

$$d(P_n^{0.5}, P_n^0) = d(P_n^{0.5}, P_n^1) \quad (12)$$

$$M(P_n^{0.5}) = M(P_n^1)/2 \quad (13)$$

which is exactly as stated in Equations (10) and (11) with $\lambda = 0.5$. Of course, Equations (12) and (13) represent a weaker value-validity condition than Equations (10) and (11). Note also that it is not assumed a priori that $M$ is a linear function of $\lambda$. This linearity is a consequence of Equations (7)-(9).

The entropy $H$ in Equation (1) with $H(P_n^0) = 0$ and $H(P_n^1) = \log n$ does not meet these validity conditions. For example, for $n = 2$ and $\lambda = 0.5$, $H(P_2^{0.5}) = H(0.75, 0.25) = 0.56$, which far exceeds the requirement $(\log 2)/2 = 0.35$ in Equation (13). Similarly, $H(P_4^{0.5}) = H(0.625, 0.125, 0.125, 0.125) = 1.07 \gg (\log 4)/2 = 0.69$ and $H(P_{20}^{0.5}) = 2.09 \gg (\log 20)/2 = 1.50$. It can similarly be verified that $H(P_n^\lambda)$ for all $n$ and $0 < \lambda < 1$, and hence $H(P_n)$ for all $P_n$ from Equation (8), overstates the true or realistic extent of the randomness (uncertainty) that $H$ is supposed to measure. Consequently, difference comparisons as in Equations (2b) and (2c) based on $H$ are invalid. An alternative measure that meets the validity conditions for such comparisons will be introduced next.
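The failure of $H$ against the requirement of Equation (13) is easy to check numerically. A minimal sketch of our own (helper names invented):

```python
import math

def shannon_entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def lambda_distribution(n, lam):
    """P_n^lambda of Equation (6): lam * P_n^1 + (1 - lam) * P_n^0."""
    return [lam / n + 1.0 - lam] + [lam / n] * (n - 1)

for n in (2, 4, 20):
    p = lambda_distribution(n, 0.5)
    # Value validity (Eq. (13)) would require H(P^0.5) = H(P^1)/2 = (log n)/2
    print(n, round(shannon_entropy(p), 2), round(math.log(n) / 2, 2))
# -> 2 0.56 0.35 ; 4 1.07 0.69 ; 20 2.09 1.5  (H overstates in every case)
```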

Derivation of $H_K$
The logic or reasoning behind the $H_K$ in Equation (3) as a measure of randomness or uncertainty may be outlined as follows: (1) As a matter of convenience and as used throughout the rest of the paper, all individual probabilities will be considered ordered such that:

$$p_1 \geq p_2 \geq \cdots \geq p_n \quad (14)$$

(2) Due to the constraint that $\sum_{i=1}^n p_i = 1$, the modal probability $p_1$ is determined by the non-modal ones, so that a measure of randomness may be based on pairwise means of the non-modal probabilities $p_2, \ldots, p_n$. (3) Since an entropy measure needs to be zero-indifferent (expansible), i.e., unaffected by the addition of events with zero probabilities (e.g., [14] (Chapter 1)), a logical choice of pairwise means would be the geometric means $\sqrt{p_i p_j}$ for all $i, j = 2, \ldots, n$ (since obviously $\sqrt{p_i \cdot 0} = 0$). Therefore, the measure consisting of the means $\sqrt{p_i p_j}$, including those for $i = j$, can be expressed as:

$$H_K = 2 \sum_{2 \leq i \leq j \leq n} \sqrt{p_i p_j} = \sum_{i=2}^n \sum_{j=2}^n \sqrt{p_i p_j} + \sum_{i=2}^n p_i \quad (15)$$

where the multiplication factor 2 is included so that $H_K(P_n^1) = H_K(1/n, \ldots, 1/n) = n - 1$ instead of $(n-1)/2$. With $p_1$ being the modal (largest) probability, this $H_K$ in Equation (15) is twice the sum of the pairwise geometric means of all the non-modal probabilities. Furthermore, from the fact that, for a set of numbers $\{x_i\}$, $\left(\sum_{i=2}^n x_i\right)^2 = \sum_{i=2}^n \sum_{j=2}^n x_i x_j$, and then setting $x_i = \sqrt{p_i}$, it follows from the second expression in Equation (15) that:

$$H_K = \left(\sum_{i=2}^n \sqrt{p_i}\right)^2 + 1 - p_1 \quad (16)$$

which is the same as the formula in Equation (3).
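The identity between the pairwise-mean form (15) and the closed form (16) can be checked numerically; a small sketch of our own:

```python
import math
from itertools import combinations_with_replacement

def hk_pairwise(p):
    """H_K via Equation (15): twice the sum of geometric means sqrt(p_i p_j)
    over all non-modal pairs with 2 <= i <= j <= n."""
    tail = sorted(p, reverse=True)[1:]
    return 2 * sum(math.sqrt(a * b)
                   for a, b in combinations_with_replacement(tail, 2))

def hk_closed(p):
    """H_K via Equation (16)/(3)."""
    q = sorted(p, reverse=True)
    return sum(math.sqrt(x) for x in q[1:]) ** 2 + 1.0 - q[0]

pn = [0.4, 0.3, 0.2, 0.1]
print(hk_pairwise(pn), hk_closed(pn))   # both ~ 2.3190
```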
As an alternative approach, one could begin by considering the power sum, or sum of order $\alpha$,

$$S_\alpha(P_n) = \left(\sum_{i=1}^n p_i^\alpha\right)^{1/\alpha}$$

(e.g., [18] (pp. 138-139)). Strict Schur-concavity, which is discussed below as an important property of an entropy and one that $H$ in Equation (1) has, requires that the parameter $\alpha < 1$ [18] (pp. 138-139). Since $p_i \geq 0$ ($i = 1, \ldots, n$), a further restriction is that $\alpha$ be positive, and hence $0 < \alpha < 1$ for the power sum $S_\alpha$. In order for $S_\alpha$ to comply with the value-validity condition in Equation (11), it is clear that $S_\alpha$ can only be the power sum of the non-modal probabilities, so that:

$$S_\alpha(p_2, \ldots, p_n) = \left(\sum_{i=2}^n p_i^\alpha\right)^{1/\alpha} \quad (17)$$

with $S_\alpha(P_n^0) = 0$ and $S_\alpha(P_n^1) = (n-1)^{1/\alpha}/n$ for the probability distributions in Equation (5). A reasonable upper bound would be $S_\alpha(P_n^1) = n - 1$. This requirement is met for $\alpha = 1/2$ in Equation (17) and by the addition of $\sum_{i=2}^n p_i$, resulting in:

$$S_{1/2}(p_2, \ldots, p_n) + S_1(p_2, \ldots, p_n) = \left(\sum_{i=2}^n \sqrt{p_i}\right)^2 + 1 - p_1 \quad (18)$$

which is the same as Equation (16) and for which $H_K(P_n^1) = n - 1$.
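A quick check of the power-sum route (again our own sketch): for $\alpha = 1/2$, adding $S_1 = \sum_{i=2}^n p_i$ to $S_{1/2}$ reproduces $H_K$, and at the uniform distribution the sum equals $n - 1$ as in Equation (18).

```python
def power_sum(xs, alpha):
    """Power sum of order alpha (Equation (17) when xs = (p_2, ..., p_n))."""
    return sum(x ** alpha for x in xs) ** (1.0 / alpha)

n = 5
uniform_tail = [1.0 / n] * (n - 1)                        # p_2, ..., p_n of P_n^1
print(power_sum(uniform_tail, 0.5))                       # (n-1)^2/n = 3.2
print(power_sum(uniform_tail, 0.5) + sum(uniform_tail))   # n-1 = 4.0, as in Eq. (18)
```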

Properties of $H_K$
The properties of $H_K$ in Equation (16), some of which are readily apparent from its definition, can be outlined as follows:

Property 1. $H_K$ is a continuous function of all its individual arguments $p_1, \ldots, p_n$.

Property 3. $H_K$ is zero-indifferent (expansible), i.e., $H_K(p_1, \ldots, p_n, 0) = H_K(p_1, \ldots, p_n)$.

Property 6. $H_K$ is strictly Schur-concave so that, if $P_n = (p_1, \ldots, p_n)$ is majorized by $Q_n = (q_1, \ldots, q_n)$ (denoted by $P_n \prec Q_n$):

$$H_K(P_n) \geq H_K(Q_n) \ \text{whenever } P_n \prec Q_n \quad (20)$$

with strict inequality unless $P_n$ is simply a permutation of $Q_n$.

Property 7. $H_K$ is concave, but not strictly concave.

Property 8. $H_K$ meets the value-validity condition in Equation (10) with $H_K(P_n^\lambda) = \lambda(n-1)$.

Proof of Property 6. The strict Schur-concavity of $H_K = \left(\sum_{i=2}^n \sqrt{p_i}\right)^2 + 1 - p_1$ follows immediately from the partial derivatives:

$$\frac{\partial H_K}{\partial p_1} = -1, \qquad \frac{\partial H_K}{\partial p_i} = \frac{1}{\sqrt{p_i}} \sum_{j=2}^n \sqrt{p_j}, \quad i = 2, \ldots, n$$

and the fact that $\partial H_K / \partial p_i$ is strictly increasing in $i = 1, \ldots, n$ (unless $p_i = p_{i+1}$) ([18] (p. 84)). The majorization $P_n \prec Q_n$ in Equation (20) is a more precise statement than the vague notion that the components of $P_n$ are "more nearly equal" or "less spread out" than are those of $Q_n$. By definition, $P_n \prec Q_n$ if $\sum_{i=1}^k p_i \leq \sum_{i=1}^k q_i$ for $k = 1, \ldots, n-1$ (and with the ordering in Equation (14) for $P_n$ and $Q_n$), the sums for $k = n$ both being 1.

Proof of Property 7. From Equation (16) and for the probability distributions $P_n$ and $Q_n$ and all $\lambda \in [0, 1]$:

$$H_K[\lambda P_n + (1-\lambda) Q_n] = \left(\sum_{i=2}^n \sqrt{\lambda p_i + (1-\lambda) q_i}\right)^2 + 1 - \lambda p_1 - (1-\lambda) q_1 \quad (21)$$

From Minkowski's inequality (e.g., [19] (p. 175)):

$$\left(\sum_{i=2}^n \sqrt{\lambda p_i + (1-\lambda) q_i}\right)^2 \geq \lambda \left(\sum_{i=2}^n \sqrt{p_i}\right)^2 + (1-\lambda) \left(\sum_{i=2}^n \sqrt{q_i}\right)^2 \quad (22)$$

so that, from Equations (21) and (22):

$$H_K[\lambda P_n + (1-\lambda) Q_n] \geq \lambda H_K(P_n) + (1-\lambda) H_K(Q_n) \quad (23)$$

proving that $H_K$ is concave. However, and importantly, $H_K$ is not strictly concave, since the inequality in Equation (23) is not strict for all $P_n$ and $Q_n$, such as for $P_n = P_n^1$ and $Q_n = P_n^0$ in Equation (5), when equality in Equation (23) holds for all $\lambda$ as required by the value-validity conditions in Equations (10) and (11).
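Property 6 can be probed numerically: if $P_n \prec Q_n$ (the partial sums of the ordered $P_n$ never exceed those of $Q_n$), then $H_K(P_n) \geq H_K(Q_n)$. A small sketch of our own, reusing the invented `hk_entropy` helper from above:

```python
import math
from itertools import accumulate

def hk_entropy(p):
    q = sorted(p, reverse=True)
    return sum(math.sqrt(x) for x in q[1:]) ** 2 + 1.0 - q[0]

def majorizes(q, p):
    """True if p is majorized by q (p ≺ q): all partial sums of the sorted p
    are <= the corresponding partial sums of the sorted q."""
    ps = list(accumulate(sorted(p, reverse=True)))
    qs = list(accumulate(sorted(q, reverse=True)))
    return all(a <= b + 1e-12 for a, b in zip(ps, qs))

p = [0.4, 0.3, 0.2, 0.1]
q = [0.7, 0.2, 0.1, 0.0]
print(majorizes(q, p))                  # True: p ≺ q
print(hk_entropy(p) >= hk_entropy(q))   # True, consistent with Eq. (20)
```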
• Note 1: If a measure (function) $M$ is strictly concave so that, instead of Equation (23), the inequality $M[\lambda P_n + (1-\lambda) Q_n] > \lambda M(P_n) + (1-\lambda) M(Q_n)$ is strict for all $P_n$, $Q_n$, and $\lambda \in (0, 1)$, then the condition in Equation (11) cannot be met. The $H$ in Equation (1) is one such measure.

• Note 2: The extremal value $H_K(P_n^1) = H_K(1/n, \ldots, 1/n) = n - 1$ for a measure of randomness or uncertainty is also a logical requirement for valid difference comparisons. As a particular case of the proportional difference comparisons in Equation (2c), and for any integer $m < n$:

$$H_K(P_{n+m}^1) - H_K(P_n^1) = H_K(P_n^1) - H_K(P_{n-m}^1) \quad (25)$$

i.e., adding an amount $m$ to $n$ results in the same absolute change in the value of $H_K$ as does subtracting $m$ from $n$ in the equiprobable case. Or, in terms of the function $f$ where $H_K(P_n^1) = f(n)$, Equation (25) can be expressed more conveniently as:

$$f(n + m) - f(n) = f(n) - f(n - m) \quad (26)$$

The general solution of the functional equation in Equation (26) is $f(n) = an + b$ with real constants $a$ and $b$ [20] (p. 82), which equals $H_K(P_n^1)$ for $a = 1$ and $b = -1$.

• Note 3: For the binary case of $n = 2$, $H_K(P_2^1) = H_K(0.5, 0.5) = 1$, which equals $H(0.5, 0.5)$ in Equation (1) if the base-2 logarithm is used. In fact, $H(0.5, 0.5) = 1$ is an axiom or required property, the normalization axiom, frequently used in information theory to justify the use of the base-2 logarithm in Equation (1) and bits as the unit of measurement [14] (Chapter 1). The binary entropy is $H_K(1-p, p) = 2p$ for $p \leq 0.5$, or:

$$H_K(1-p, p) = 2\min\{p, 1-p\}$$

Instead of the pairwise geometric means in Equation (15), one could consider power means, or arithmetic means of order $\alpha$, and hence the following parameterized family of entropies:

$$H_{K\alpha}(P_n) = 2 \sum_{2 \leq i \leq j \leq n} M_\alpha(p_i, p_j), \qquad M_\alpha(p_i, p_j) = \left(\frac{p_i^\alpha + p_j^\alpha}{2}\right)^{1/\alpha} \quad (27)$$

of which $H_K$ in Equations (15) and (16) is the particular member $H_{K0}$ as $\alpha \to 0$. Since a measure of randomness (uncertainty) should be zero-indifferent (see Property 3 of $H_K$), it is clear from the formula in Equation (27) that $\alpha$ cannot be positive, i.e., $\alpha \leq 0$, where $\alpha = 0$ means the limit as $\alpha \to 0$. If $p_i = 0$, $p_j = 0$, or $p_i = p_j = 0$, $M_\alpha(p_i, p_j)$ is taken to be 0 for $\alpha \leq 0$ (see, e.g., [21] (Chapter 2) for the properties of such power means). One of the important properties of $M_\alpha(p_i, p_j)$ is that it is a non-decreasing function of $\alpha$ (for $-\infty < \alpha < \infty$) and is strictly increasing unless $p_i = p_j$. Besides this $M_\alpha$, there are other types of means that could be considered (e.g., [18] (pp. 139-145), [22]).

Since $M_\alpha(p_i, p_j)$ is strictly increasing in $\alpha$ (if $p_i \neq p_j$), it follows from Equation (27) that, for any probability distribution $P_n = (p_1, \ldots, p_n)$:

$$H_L(P_n) \leq H_{K\alpha}(P_n) \leq H_K(P_n), \quad -\infty < \alpha \leq 0 \quad (28)$$

where the lower limit $H_L(P_n) = H_{K(-\infty)}(P_n)$ is the limit of $H_{K\alpha}$ as $\alpha \to -\infty$ and $H_K(P_n)$ is defined in Equations (15) and (16) and is the limit of $H_{K\alpha}(P_n)$ as $\alpha \to 0$. The inequalities in Equation (28) are strict unless $P_n$ equals $P_n^0$ or $P_n^1$ in Equation (5). Each member of $H_{K\alpha}$ has the same types of properties as those of $H_K$ discussed above. The strict Schur-concavity of $H_{K\alpha}$ follows from the fact that (a) $H_{K\alpha}$ is (permutation) symmetric in the $p_i$ $(i = 1, \ldots, n)$ and (b) the partial derivatives, after setting $\sum_{i=2}^n M_\alpha(p_i, p_i) = \sum_{i=2}^n p_i = 1 - p_1$ in Equation (27),

$$\frac{\partial H_{K\alpha}}{\partial p_1} = -1, \qquad \frac{\partial H_{K\alpha}}{\partial p_i} = p_i^{\alpha-1} \sum_{\substack{j=2 \\ j \neq i}}^n \left(\frac{p_i^\alpha + p_j^\alpha}{2}\right)^{(1-\alpha)/\alpha}, \quad i = 2, \ldots, n$$

are clearly strictly increasing in $i = 1, \ldots, n$ (for $p_i > p_{i+1}$) for all $\alpha \in (-\infty, 1)$. The case when $\alpha \to 0$ was proved in the preceding subsection.

As with any reasonable measure of randomness or uncertainty, each member of $H_{K\alpha}$ in Equation (27) is a compound measure consisting of two components: the dimension of the distribution or vector $P_n$ and the uniformity (evenness) with which the elements of $P_n$ are distributed. For any probability distribution $P_n = (p_1, \ldots, p_n)$, this fact can be most simply represented by:

$$H_{K\alpha}(P_n) = H_{K\alpha}(P_n^1)\, H^*_{K\alpha}(P_n), \qquad H^*_{K\alpha}(P_n) \in [0, 1] \quad (29)$$

where $H_{K\alpha}(P_n^1) = n - 1$ for the uniform distribution $P_n^1$ in Equation (5) and where $H^*_{K\alpha}(P_n) = H_{K\alpha}(P_n)/(n-1)$ reflects the uniformity (evenness) of $P_n$. The $H^*_{K\alpha}$ basically controls for $n$. For the distribution in Equation (6), $H^*_{K\alpha}(P_n^\lambda) = \lambda$.

The limiting member $H_L$ in Equation (28) as $\alpha \to -\infty$ is defined by:

$$H_L(P_n) = \lim_{\alpha \to -\infty} H_{K\alpha}(P_n) = 2 \sum_{2 \leq i \leq j \leq n} \min\{p_i, p_j\} \quad (30a)$$

$$H_L(P_n) = 2 \sum_{i=2}^n (i - 1)\, p_i \quad (30b)$$

where the expression in Equation (30b) can easily be seen to follow directly from Equation (30a) (remembering again the order in Equation (14)). The second expression in Equation (30a) has been briefly mentioned by Morales et al. [23], and the form in Equation (30b), divided by 2, has been suggested by Patil and Taillie [24] as one potential measure of diversity.
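A sketch (our own) of the family in Equation (27), using power means of the non-modal probabilities; it checks that $H_{K\alpha}$ is non-decreasing in $\alpha$ (Equation (28)) and that the $\alpha \to -\infty$ member matches the closed form (30b):

```python
import math
from itertools import combinations_with_replacement

def power_mean(a, b, alpha):
    """M_alpha(a, b) of Equation (27); limits alpha -> 0 and -inf handled."""
    if a == 0.0 or b == 0.0:
        return 0.0                      # convention for alpha <= 0
    if alpha == 0.0:
        return math.sqrt(a * b)         # geometric mean (limit alpha -> 0)
    if alpha == float("-inf"):
        return min(a, b)
    return ((a ** alpha + b ** alpha) / 2.0) ** (1.0 / alpha)

def hk_alpha(p, alpha):
    """H_K_alpha of Equation (27): twice the sum of M_alpha over 2 <= i <= j <= n."""
    tail = sorted(p, reverse=True)[1:]
    return 2 * sum(power_mean(a, b, alpha)
                   for a, b in combinations_with_replacement(tail, 2))

def h_l(p):
    """H_L via Equation (30b): 2 * sum of (i-1) p_i over the ordered p's."""
    q = sorted(p, reverse=True)
    return 2 * sum(i * q[i] for i in range(1, len(q)))

pn = [0.4, 0.3, 0.2, 0.1]
for a in (float("-inf"), -4.0, -1.0, 0.0):
    print(a, round(hk_alpha(pn, a), 4))   # non-decreasing in alpha (Eq. (28))
print(h_l(pn))                            # equals hk_alpha(pn, -inf) = 2.0
```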

Why the Preference for $H_K$
From a practical point of view, what sets $H_K$ in Equation (3) or Equation (16) apart from any other member of the family $H_{K\alpha}$ in Equation (27) is its ease of computation. Its values can easily be computed on a hand calculator for any probability distribution $P_n$, even when the dimension $n$ is quite large. For other members of $H_{K\alpha}$, $(n-1)^2$ pairwise means have to be computed, which becomes practically impossible without the use of a computer program even when $n$ is not large. The computational effort for the member $H_L$ (when $\alpha \to -\infty$) is somewhat less than for other members. Nevertheless, the apparently simpler formula for $H_L$ in Equation (30b) requires that all $p_i$ be ordered as in Equation (14), which can be very tedious if done manually and nearly impossible if $n$ is large.
The $H_K$ is also favored over other members of $H_{K\alpha}$ when considering the agreement with some other measure based on the Euclidean distance and the familiar standard deviation. Specifically, for any probability distribution $P_n = (p_1, \ldots, p_n)$ and $P_n^1 = (1/n, \ldots, 1/n)$, consider the following linearly decreasing function of the distance $d(P_n, P_n^1)$:

$$D(P_n) = (n-1)\left[1 - \sqrt{\frac{n}{n-1}}\, d(P_n, P_n^1)\right] = (n-1)(1 - \text{CNV}) \quad (31)$$

where $s_n$ is the standard deviation of $p_1, \ldots, p_n$ (with divisor $n$ rather than $n - 1$) and $\text{CNV} = n s_n / \sqrt{n-1}$ is the coefficient of nominal variation [25,26] (note that $d(P_n, P_n^1) = \sqrt{n}\, s_n$). It is clear from Equation (31) that, for $P_n^0$ and $P_n^1$ in Equation (5), $D(P_n^0) = 0$ and $D(P_n^1) = n - 1$. Also, for the lambda distribution in Equation (6), $D(P_n^\lambda) = \lambda D(P_n^1) = \lambda(n-1)$, so that $D$ satisfies the value-validity condition in Equation (10). Of course, $D$ is not zero-indifferent (see Property 3 for $H_K$).
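A small sketch (ours) of the criterion $D$ in Equation (31), checking its value-validity on a lambda distribution:

```python
import math

def d_measure(p):
    """D(P_n) of Equation (31): linearly decreasing in d(P_n, P_n^1)."""
    n = len(p)
    dist = math.sqrt(sum((pi - 1.0 / n) ** 2 for pi in p))  # Euclidean d(P_n, P_n^1)
    return (n - 1) * (1.0 - math.sqrt(n / (n - 1)) * dist)

n, lam = 5, 0.5
p_lam = [lam / n + 1 - lam] + [lam / n] * (n - 1)   # Equation (6)
print(d_measure(p_lam))                             # lam * (n-1) = 2.0, per Eq. (10)
```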
Since the Euclidean distance and the standard deviation are such universally used measures, it is to be expected that an acceptable measure of randomness (uncertainty) should not differ substantially from $D$ in Equation (31). From numerical examples, it is seen that the values of $H_K$ in Equation (16) tend to be closer to those of $D$ in Equation (31) than are the values of any other member of the $H_{K\alpha}$ family in Equation (27). In order to demonstrate this fact, a computer simulation was used to generate a number of random distributions using the following algorithm. For each randomly generated probability distribution $P_n = (p_1, \ldots, p_n)$, $n$ was first generated as a random integer between 3 and 20, inclusive. Then, with the ordering in Equation (14), each $p_i$ was generated as a random number (to 5 decimal places) within intervals consistent with that ordering and the unit-sum constraint. The values of $H_K$, $H_L$, and $D$ were then computed from Equations (16), (30b), and (31), as were their corresponding uniformity (evenness) indices from Equation (29). After excluding some (five) distributions $P_n$ that were nearly equal to $P_n^0$ or $P_n^1$ in Equation (5), the results for 30 different distributions are summarized in Table 1.

It is apparent from the data in Table 1 that $H_K$ agrees quite closely with $D$, and clearly more so than does $H_L$. Exceptions are Data Sets 1, 11, and 26, for which the $H_K$-values differ considerably from those of $D$, but still less so than do the $H_L$-values. If $D$ is used to predict $H_K$ (i.e., for the fitted model $\hat{H}_K = D$), it is found for the 30 data sets in Table 1 that the coefficient of determination, when properly computed as $R^2 = 1 - \sum(H_K - D)^2 / \sum(H_K - \overline{H}_K)^2$ [27], becomes $R^2 = 0.98$ (i.e., 98% of the variation of $H_K$ is explained by the fitted model $\hat{H}_K = D$), as compared to $R^2 = 0.91$ in the case of $H_L$. Also, the root mean square (RMS) of the differences between the values of $H_K$ and $D$ is found to be 0.64, as compared to 1. No other member of the family $H_{K\alpha}$ in Equation (27) is generally in as close agreement with $D$ as is $H_K$, but more so than $H_L$. This can be explained by the fact that (a) whenever there is a notable difference between the values of $H_K$ and $D$, those of $H_K$ tend to be less than those of $D$, as seen from Table 1; and (b) $H_{K\alpha}(P_n)$ is a strictly increasing function of $\alpha$ for any given $P_n$ other than $P_n^0$ and $P_n^1$ in Equation (5).
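The simulation is straightforward to reproduce in outline. The sketch below is our own reconstruction: the paper's exact sampling intervals are not reproduced, so an illustrative sampler is used instead.

```python
import math
import random

def hk_entropy(p):
    q = sorted(p, reverse=True)
    return sum(math.sqrt(x) for x in q[1:]) ** 2 + 1.0 - q[0]

def h_l(p):
    q = sorted(p, reverse=True)
    return 2 * sum(i * q[i] for i in range(1, len(q)))   # Eq. (30b)

def d_measure(p):
    n = len(p)
    dist = math.sqrt(sum((x - 1.0 / n) ** 2 for x in p))
    return (n - 1) * (1.0 - math.sqrt(n / (n - 1)) * dist)

def random_distribution(rng):
    """Random ordered distribution with n in [3, 20]; the sampling scheme
    here is illustrative, not the paper's exact intervals."""
    n = rng.randint(3, 20)
    xs = sorted((rng.random() for _ in range(n)), reverse=True)
    total = sum(xs)
    return [x / total for x in xs]

def r_squared(y, yhat):
    """R^2 = 1 - SS_res/SS_tot, as used for the fitted model H_K_hat = D."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

rng = random.Random(1)
sample = [random_distribution(rng) for _ in range(30)]
hk = [hk_entropy(p) for p in sample]
hl = [h_l(p) for p in sample]
d = [d_measure(p) for p in sample]
# expect H_K to track D more closely than H_L does (cf. Table 1)
print(r_squared(hk, d), r_squared(hl, d))
```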

Comparative Weights on $p_1, \ldots, p_n$
The difference between the values of $H_K$ and $H_L$ as demonstrated in Table 1, or between any of the members of the $H_{K\alpha}$ family, is due to the fact that $H_{K\alpha}$ places different weights or emphases on the $p_i$ ($i = 1, \ldots, n$) depending upon $\alpha$. When considering each pairwise mean $M_\alpha(p_i, p_j)$ in Equation (27), $p_i$ and $p_j$ are weighted equally only when $\alpha = 1$. Then, since (a) $M_\alpha(p_i, p_j)$ is strictly increasing in $\alpha$ (for $p_i \neq p_j$) and (b) $H_{K\alpha}$ is zero-indifferent (Property 3 of $H_K$) only for $\alpha \leq 0$, the $H_{K0} = H_K$ in Equations (15) and (16) is the zero-indifferent member of $H_{K\alpha}$ that is always closest in value to $H_{K1}$ and whose pairwise means $M_0(p_i, p_j)$ are always closest to $M_1(p_i, p_j)$ for all $i$ and $j$.
Besides the weights placed on each component of all pairs $(p_i, p_j)$, the weights given to each individual $p_i$ ($i = 2, \ldots, n$) can also be examined by expressing the $H_{K\alpha}$ in Equation (27) as the following weighted sum:

$$H_{K\alpha}(P_n) = \sum_{i=2}^n w_{\alpha i}\, p_i$$

which shows that the weights $w_{\alpha i}$ ($i = 2, \ldots, n$) are increasing in both $\alpha$ and $i$. When comparing $H_K$ and $H_L$, small $p_i$'s are given more weight by $H_K$ than by $H_L$, and the addition of low-probability components to a set of events has more effect on $H_K$ than on $H_L$. However, when weighing the pros and cons of such relative sensitivity to small $p_i$'s, it is important to keep in mind the relationship in Equation (29) and not jump to conclusions. For example, when going from $P_4 = (0.40, 0.30, 0.20, 0.10)$ to $Q_6 = (0.40, 0.30, 0.20, 0.05, 0.04, 0.01)$, $H_K$ increases from $H_K(P_4) = 2.32$ to $H_K(Q_6) = 2.91$, a 25% increase, while $H_L(P_4) = 2.00$ and $H_L(Q_6) = 2.12$, a 6% increase. However, from Equation (29), the dimensional component of both $H_K$ and $H_L$ increased by 67% (from $n - 1 = 3$ to $n - 1 = 5$), whereas the uniformity (evenness) components decreased by 25% in the case of $H_K$ (from 2.32/3 to 2.91/5) and 37% for $H_L$ (from 2.00/3 to 2.12/5). In this regard, the 25% increase in randomness (uncertainty) as measured by $H_K$ does not appear unreasonable.
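The $P_4 \to Q_6$ example can be verified directly (same invented helper names as in the earlier sketches):

```python
import math

def hk_entropy(p):
    q = sorted(p, reverse=True)
    return sum(math.sqrt(x) for x in q[1:]) ** 2 + 1.0 - q[0]

def h_l(p):
    q = sorted(p, reverse=True)
    return 2 * sum(i * q[i] for i in range(1, len(q)))

p4 = [0.40, 0.30, 0.20, 0.10]
q6 = [0.40, 0.30, 0.20, 0.05, 0.04, 0.01]

print(hk_entropy(p4), hk_entropy(q6))   # 2.32 -> 2.91, about +25%
print(h_l(p4), h_l(q6))                 # 2.00 -> 2.12, about +6%
# uniformity components H*(P) = H(P)/(n-1) from Equation (29):
print(hk_entropy(p4) / 3, hk_entropy(q6) / 5)   # 0.77 -> 0.58, about -25%
print(h_l(p4) / 3, h_l(q6) / 5)                 # 0.67 -> 0.42, about -37%
```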

Inconsistent Orderings
Although all members of the family $H_{K\alpha}$ in Equation (27) have the same types of properties, including the value-validity property in Equation (10), this does not necessarily imply that different members will always produce the same results for the comparisons in Equation (2). Such lack of consistency is inevitable whenever measures are used to summarize data sets into a single number. However, as stated by Patil and Taillie [24] (p. 551), "Inconsistent measures . . . are a familiar problem and should not be a cause for undue pessimism", pointing out the fact that, for instance, the arithmetic mean and the median are not consistent measures of average (central tendency), and the standard deviation and the mean absolute deviation are inconsistent measures of variability (spread). One type of consistent result for all members of $H_{K\alpha}$ is the size (order) comparison $H_{K\alpha}(P_n) > H_{K\alpha}(Q_n)$ in Equation (2a) whenever $P_n$ is majorized by $Q_n$ and $P_n$ is not a permutation of $Q_n$. This is the result of Equation (20) and the fact, as proved above, that $H_{K\alpha}$ is strictly Schur-concave for all $\alpha \in (-\infty, 1)$.
It is only when two measures $M_1$ and $M_2$ have a perfect linear relationship that (a) the comparison results from Equation (2) will always be consistent and (b) the compliance by $M_1$ with the value-validity conditions in Equations (10) and (11) also implies compliance by $M_2$. In the case of $H_K$ and $H_L$, and from the simulation results in Table 1, Pearson's correlation coefficient between $H_K$ and $H_L$ is found to be $r = 0.993$, indicating a near perfect linear relationship between $H_K$ and $H_L$. However, since the linearity is not truly perfect or exact, $H_K$ and $H_L$ will not always give the same results for the comparisons in Equation (2), as is evident from some of the data in Table 1.

Discussion
The value-validity condition in Equation (10), as a necessary requirement for valid difference comparisons as in Equation (2), is based on Euclidean distances. Such distances are also used as the basis for the preference of $H_K$ over other potential members of the family of entropies in Equation (27). This distance metric is the standard one in engineering and science. The use of any other "distance" measures, such as the directed divergences discussed below, would seem to require particular justification in the context of value-validity assessment.
As a simple numerical example illustrating the reasoning behind the value-validity arguments in Equations (6)-(13) and the use of Euclidean distances, consider the following probability distributions based on $P_n^\lambda$ in Equation (6) with $n = 5$ and $\lambda = 0.5$:

$$P_5^{0.5} = (0.60, 0.10, 0.10, 0.10, 0.10), \qquad P_5^0 = (1, 0, \ldots, 0), \qquad P_5^1 = (0.20, \ldots, 0.20)$$

The Euclidean distances $d(P_5^{0.5}, P_5^0) = d(P_5^{0.5}, P_5^1)$, and the same equidistance holds componentwise: $|0.60 - 1| = |0.60 - 0.20| = 0.40$ and $|0.10 - 0| = |0.10 - 0.20| = 0.10$ for $i = 2, \ldots, 5$. A measure of uncertainty (randomness) $M$ that takes on reasonable numerical values within the general bounds $M(P_n^0)$ and $M(P_n^1)$ should in this example satisfy the equality $|M(P_5^{0.5}) - M(P_5^0)| = |M(P_5^{0.5}) - M(P_5^1)|$, so that, with $M(P_5^0) = 0$, $M(P_5^{0.5}) = M(P_5^1)/2$. That is, since $P_5^{0.5}$ is the same distance from $P_5^0$ as it is from $P_5^1$, and each element of $P_5^{0.5}$ is the same distance from the corresponding element of $P_5^0$ as it is from that of $P_5^1$, $M$ would reflect this fact by taking on the value $M(P_5^{0.5}) = M(P_5^1)/2$. The $H_K$ in Equation (3) or Equation (16) meets this requirement with $H_K(P_5^{0.5}) = H_K(P_5^1)/2 = (5-1)/2 = 2$. However, in the case of $H$ in Equation (1), $H(P_5^{0.5}) = 1.23 \gg H(P_5^1)/2 = 0.80$, a substantial overstatement of the extent of the uncertainty or randomness.

A similar comparison between $H_K$ and $H$ for $P_5^\lambda$ with $\lambda = 0.25$ and $\lambda = 0.75$ is given in Table 2, together with the results from some other probability distributions. The results are also given in terms of the normalized measures $H_K^*(P_n) = H_K(P_n)/(n-1)$ and $H^*(P_n) = H(P_n)/\log n$, as well as $D^*(P_n) = D(P_n)/(n-1)$ for $D$ in Equation (31). As seen from Table 2, while $H_K^*(P_5^\lambda) = D^*(P_5^\lambda) = \lambda$, $H^*(P_5^\lambda) \gg \lambda$ for both $\lambda$-values. For all distributions in Table 2, the values of $H_K^*$ are quite comparable to those of $D^*$, but those of $H^*$ are all considerably greater. The distributions $P_{10}^{(3)}$-$P_5^{(8)}$ are included in Table 2 to exemplify the types of contradictory results that may be obtained when making the difference comparisons in Equation (2) with $H$ as opposed to $H_K$. The last two distributions in Table 2 are real data for the market shares (proportions) of the carbonated soft drinks industry in the U.S. and the world-wide market shares of cell phones, respectively (obtained by Googling "market shares" by industries). Some of the smaller market shares are not given in Table 2 because of space limitations, but were included in the computations. The $H$, which has been used as a measure of market concentration, or rather of its converse, deconcentration (e.g., [28]), would indicate that these two industries have nearly the same market deconcentration. By contrast, when considered in terms of $H_K$, for which such a comparison is valid because of the value-validity property of $H_K$, the results in Table 2 show that the cell-phone industry is about 20% more deconcentrated than the soft-drink industry. Similarly, for the fictitious distributions in Table 2, the difference comparisons based on $H$ and on $H_K$ lead to contradictory conclusions.

Instead of using the Euclidean metric to formulate the value-validity conditions in Section 2, one could perhaps consider other potential "distance" measures such as divergences, also referred to as "statistical distances". The best-known such measure of the divergence of the distribution $P_n = (p_1, \ldots, p_n)$ from the distribution $Q_n = (q_1, \ldots, q_n)$ is the Kullback-Leibler divergence [29] defined as:

$$\text{KLD}(P_n : Q_n) = \sum_{i=1}^n p_i \log\frac{p_i}{q_i}$$

This measure is directional or asymmetric in $P_n$ and $Q_n$. A symmetric measure is the so-called Jensen-Shannon divergence (JSD) (e.g., [30][31][32][33]), which can be expressed in terms of the Kullback-Leibler divergence (KLD) as:

$$\text{JSD}(P_n, Q_n) = \frac{1}{2}\text{KLD}(P_n : M_n) + \frac{1}{2}\text{KLD}(Q_n : M_n), \qquad M_n = \frac{P_n + Q_n}{2}$$

Neither KLD nor JSD is a metric, but the square root of JSD is [34]. Consider now the family of distributions $P_n^\lambda$ in Equation (6) and the extreme members $P_n^0$ and $P_n^1$ in Equation (5). For the case of $n = 5$, for example, it is found that $\text{KLD}(P_5^0 : P_5^{0.5}) = 0.51$ ($\text{KLD}(P_5^{0.5} : P_5^0)$ is undefined) and $\text{KLD}(P_5^1 : P_5^{0.5}) = 0.33$. In the case of JSD, $M_5(P_5^{0.5}, P_5^0) = (0.8, 0.05, \ldots, 0.05)$, so that $\text{JSD}(P_5^{0.5}, P_5^0) = 0.16$. Similarly, $M_5(P_5^{0.5}, P_5^1) = (0.40, 0.15, \ldots, 0.15)$ and $\text{JSD}(P_5^{0.5}, P_5^1) = 0.09$. These results differ greatly from those based on Euclidean distances, for which $d(P_5^{0.5}, P_5^0) = d(P_5^{0.5}, P_5^1)$.

The fact that $d(P_n^{0.5}, P_n^0) = d(P_n^{0.5}, P_n^1)$ for all $n$, which is also reflected by the normalized $H_K^*(P_n^{0.5}) = 0.5$, corresponds to the fact that each component of $P_n^{0.5}$ is of equal distance from the corresponding components of $P_n^0$ and $P_n^1$. However, no such correspondence exists for the divergence measures KLD and JSD.
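The KLD and JSD figures above, and the contrast with the Euclidean picture, are easy to reproduce; a short sketch of our own:

```python
import math

def kld(p, q):
    """Kullback-Leibler divergence KLD(P : Q), natural log; undefined
    (infinite) when some p_i > 0 has q_i = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence via the mixture M = (P + Q)/2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)

p_half = [0.6, 0.1, 0.1, 0.1, 0.1]   # P_5^0.5
p_zero = [1.0, 0.0, 0.0, 0.0, 0.0]   # P_5^0
p_one = [0.2] * 5                    # P_5^1

print(round(kld(p_zero, p_half), 2))   # 0.51
print(round(kld(p_one, p_half), 2))    # 0.33
print(round(jsd(p_half, p_zero), 2))   # 0.16
print(round(jsd(p_half, p_one), 2))    # 0.09
# whereas the Euclidean distances d(P^0.5, P^0) and d(P^0.5, P^1) are equal
```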
The derivation of $H_K$ in Equation (3) or Equation (16) excluded the modal probability $p_1$ from the pairwise geometric means, and one may reasonably ask what the result would be if a different $p_i$ were to be excluded. If the smallest $p_i$ is excluded, the measure would not be zero-indifferent (expansible). If any $p_i$ other than $p_1$ is excluded, then the measure would not be strictly Schur-concave, as can be verified from the proof of Property 6 of $H_K$. This property is essential for any measure of uncertainty (randomness). In fact, the exclusion of $p_1$ makes $H_K$ unique in this regard. It should also be emphasized that, even though the entropy $H$ in Equation (1) lacks the value-validity property, it has many of the same properties as $H_K$ and undoubtedly has numerous useful and appropriate applications, as demonstrated in the extensive published literature. The problems with $H$ arise when it is used uncritically and indiscriminately in fields far from its origin: a statistical concept of communication theory. Both Shannon [35] and Wiener [36] cautioned against such uncritical applications. It is when $H$ or its normalized form $H^*$ is used as a summary measure (statistic) of various attributes (characteristics), and when its values are interpreted and compared, that its lack of the value-validity property can lead to incorrect and misleading results and conclusions.

Conclusions
Since the ubiquitous Boltzmann-Shannon entropy $H$ is only valid for making size (order or "larger than") comparisons, the entropy $H_K$ is introduced as an alternative measure of randomness (uncertainty) that is more informative than $H$ in the sense that $H_K$ can also be used for making valid difference comparisons as in Equations (2b) and (2c). The $H_K$, which is a particular member of the family of entropies $H_{K\alpha}$ and is basically a compromise between the members $H_{K1}$ and $H_L = H_{K(-\infty)}$, has the types of desirable properties one would reasonably expect of a randomness (uncertainty) measure. One of the differences between $H_K$ and $H_L$ is that small probabilities have a greater influence on $H_K$ than on $H_L$. The addition of some small-probability events causes a larger increase in $H_K$ than in $H_L$, but causes a smaller decrease in the uniformity (evenness) index $H_K^*$ than in $H_L^*$ as defined in Equation (29).
Besides being computationally the simplest, which is certainly a practical advantage, $H_K$ is also the member of $H_{K\alpha}$ that appears to be most nearly linearly (and decreasingly) related to the Euclidean distance between the points $P_n = (p_1, \ldots, p_n)$ and $P_n^1 = (1/n, \ldots, 1/n)$, or to the standard deviation $s_n$ of $p_1, \ldots, p_n$. The $s_n$ is the usual measure of variability (spread) for a set of data, although it is not resistant to "outliers" (extreme and suspect data points). However, "outliers" are not a concern when dealing with probabilities $p_i$ ($i = 1, \ldots, n$). Therefore, $s_n$ cannot justifiably be criticized for being excessively influenced by large or small $p_i$'s, with the same argument extending to $H_K$.