Entropy Evaluation Based on Value Validity

Besides its importance in statistical physics and information theory, the Boltzmann-Shannon entropy S has become one of the most widely used, and misused, summary measures of various attributes (characteristics) in diverse fields of study. It has also been the subject of extensive and perhaps excessive generalizations. This paper introduces the concept of, and criteria for, value validity as a means of determining whether an entropy takes on values that reasonably reflect the attribute being measured and that permit different types of comparisons to be made for different probability distributions. While neither S nor its relative-entropy equivalent S* meets the value-validity conditions, certain power functions of S and S* do so to a considerable extent. No parametric generalization offers any advantage over S in this regard. A measure based on Euclidean distances between probability distributions is introduced as a potential entropy that does comply fully with the value-validity requirements, and its statistical inference procedure is discussed.


Introduction
Consider that $p_1, \ldots, p_n$, with $\sum_{i=1}^{n} p_i = 1$, are the probabilities of a set of n quantum states accessible to a system or of a set of n mutually exclusive and exhaustive events of some statistical experiment. Thus, $p_i$ is the probability of the system being in state i or of event i occurring (i = 1, ..., n). The entropy (of the system or set of events) is then defined as:

$$ S = -k \sum_{i=1}^{n} p_i \log p_i \qquad (1) $$

where k is some positive constant and where the logarithm is the natural one. In statistical mechanics, k may be Boltzmann's constant, while, in information theory, $k = 1/\log 2$ so that $S = -\sum_{i=1}^{n} p_i \log_2 p_i$ and the unit of measurement becomes bits as introduced by Shannon [1]. When deriving Equation (1) axiomatically from some basic required properties (axioms), k becomes an arbitrary constant (e.g., [2,3]). For convenience, we shall set k = 1 throughout this paper. The entropy S, which provides a link between statistical mechanics and information theory, is interpreted somewhat differently in the two fields. In statistical mechanics, entropy is often considered to be a measure of the disorder of a system, although it may be argued that a more appropriate measure of disorder is the following dimensionless relative entropy [3] (pp. 366-367):

$$ S^{*} = \frac{S}{S_{\max}} = \frac{-\sum_{i=1}^{n} p_i \log p_i}{\log n} \qquad (2) $$

In information theory, S is typically interpreted as a measure of the uncertainty, information content, or randomness of a set of events, while $S^*$ in (2) is considered as a measure of the efficiency of a noise-free communication channel and $1 - S^*$ as a measure of its redundancy [4] (pp. 109-110).
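As a concrete numerical illustration of Equations (1) and (2), here is a minimal Python sketch (the helper names are ours, not from the paper):

```python
# Sketch: S in Equation (1) with k = 1, and the relative entropy
# S* = S / log(n) in Equation (2).
import math

def shannon_entropy(p):
    """Boltzmann-Shannon entropy S = -sum p_i log p_i (natural log, k = 1)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def relative_entropy(p):
    """Relative (normed) entropy S* = S / log n, lying in [0, 1] for n > 1."""
    return shannon_entropy(p) / math.log(len(p))

print(shannon_entropy([0.5, 0.5]))   # log 2 ≈ 0.693 (1 bit when k = 1/log 2)
print(relative_entropy([0.5, 0.5]))  # 1.0: a uniform distribution is maximal
```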
Boltzmann [5] had used the function S in Equation (1) (or its continuous analog), but what Shannon [1] "did was to give a universal meaning to the function $-\sum_i p_i \log p_i$ and thereby make it possible to find other applications" [6] (p. 476). This function has indeed proved to be remarkably versatile and has been used as a measure of a variety of attributes in various fields of study, ranging from ecology (e.g., [7]) to psychology (e.g., [8]). It has also resulted in literally infinitely many alternative entropy formulations and generalizations, such as the parameterized families of entropies given in Table 1, for each of which the S in Equation (1) is a particular member. The real utility or contributions of those generalization efforts may be questioned, with some calling them "mindless curve-fitting" and stating that "The ratio of papers to ideas has gone to infinity" [9].
This paper is concerned with the use and misuse of S and $S^*$ in Equations (1) and (2) and other proposed entropies. Whatever an entropy measure is being used for, it is not uncommon for comparisons to be made between differences in entropy values and for statements or implications to occur about the absolute and relative values of the attributes (characteristics) being measured by means of the entropy. This can lead to incorrect and misleading results and conclusions unless certain conditions are met, as discussed in this paper. If, using a simplified notation, $e_1, e_2, \ldots$ denote the values of a generic entropy E for the probability distributions $P_n = (p_1, \ldots, p_n)$, $Q_m = (q_1, \ldots, q_m)$, ..., the various types of potential comparisons may be defined as follows:

$$ e_1 > e_2 \quad \text{(size or order comparison)} \qquad (3a) $$
$$ e_1 - e_2 > e_3 - e_4 \quad \text{(difference comparison)} \qquad (3b) $$
$$ e_1 - e_2 = c(e_3 - e_4) \quad \text{(ratio-of-differences comparison)} \qquad (3c) $$

where c is a constant.
In particular, we shall address the following fundamental questions: Which conditions on an entropy are required for the comparisons in Equation (3) to be valid or permissible? Does S or $S^*$ in Equations (1) and (2) meet such valid-comparison conditions, and if not, are there functions of S or $S^*$ that do? Do any of the entropy families in Table 1 have members that are superior to S in this regard? If none of those entropies meet such conditions, is there an alternative entropy formulation that does?

Properties of S
Although the properties of $S(P_n)$, or simply S, in Equation (1) are discussed in various textbooks (e.g., [2-4,10,24]), they will be briefly outlined here so that we can conveniently refer to them throughout this paper. Some of the most important ones are as follows: (P1) S is a continuous function of all its arguments $p_1, \ldots, p_n$ (so that small changes in some of the $p_i$'s result in only a small change in the value of S). (P2) S is (permutation) symmetric in the $p_i$ (i = 1, ..., n). (P3) S is zero-indifferent (expansible), i.e., unaffected by the addition of an event with zero probability: $S(p_1, \ldots, p_n, 0) = S(p_1, \ldots, p_n)$. (P4) S takes on its minimum value of zero when only one event can occur, and (P5) S takes on its maximum value, $\log n$ for k = 1, for the uniform distribution, where the two extreme distributions are defined as:

$$ P_n^0 = (1, 0, \ldots, 0), \quad P_n^1 = (1/n, \ldots, 1/n) \qquad (4) $$
(P6) S is strictly Schur-concave and hence, if $P_n$ is majorized by $Q_n$ (denoted by $P_n \prec Q_n$):

$$ S(P_n) \geq S(Q_n) \qquad (5) $$

with strict inequality unless $Q_n$ is simply a permutation of $P_n$. (P7) S is additive in the following sense. If $\{p_{ij}\}$ is the joint probability distribution for the quantum states for two parts of a system or for the events of two statistical experiments, with marginal probability distributions $\{p_{i+}\}$ and $\{p_{+j}\}$, where $p_{i+} = \sum_{j=1}^{m} p_{ij}$ and $p_{+j} = \sum_{i=1}^{n} p_{ij}$ for i = 1, ..., n and j = 1, ..., m, then, under independence:

$$ S(\{p_{ij}\}) = S(\{p_{i+}\}) + S(\{p_{+j}\}) $$

Most of these properties would seem to be necessary and desirable for any entropy. One could argue about the absolute necessity of Property P7 (e.g., [25]), and, among the families of entropies in Table 1, only $S_1$ and $S_4$ have this property. The essential Property P6 is a precise way of stating that the value of S increases as the components of a probability distribution become "more nearly equal", i.e., $S(P_n) > S(Q_n)$ if the components of $P_n$ are "more nearly equal" or "less spread out" than those of $Q_n$. In terms of majorization, and by definition [26], if the components of $P_n$ are ordered such that:

$$ p_1 \geq p_2 \geq \cdots \geq p_n \qquad (6) $$

and similarly for $Q_n$, then $P_n \prec Q_n$ if:

$$ \sum_{i=1}^{k} p_i \leq \sum_{i=1}^{k} q_i, \quad k = 1, \ldots, n-1 \qquad (7) $$

with $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i = 1$. Of course, not all $P_n$ and $Q_n$ are comparable with respect to majorization.
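The majorization test in Equations (6) and (7) is mechanical; the following small sketch (an illustrative helper of ours, not from the paper) checks it for two distributions of equal length:

```python
# Sketch: p is majorized by q (p ≺ q) if, after sorting both in decreasing
# order, every partial sum of q dominates the corresponding partial sum of p.
def majorized_by(p, q, tol=1e-12):
    """Return True if p ≺ q; assumes len(p) == len(q) and both sum to 1."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    sp = sq = 0.0
    for a, b in zip(ps, qs):
        sp += a
        sq += b
        if sq < sp - tol:
            return False
    return True

# Under Property P6, p ≺ q implies S(p) >= S(q):
print(majorized_by([0.4, 0.35, 0.24, 0.01], [0.4, 0.35, 0.25, 0.0]))  # True
```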

Valid Comparison Conditions
If an entropy has the above Properties P1-P6, there would seem to be no particular reason to doubt that size (order) comparisons are reasonable or permissible. Thus, for S in Equation (1) with k = 1 and for, say, $P_3^{(1)} = (0.90, 0.05, 0.05)$ and $P_2^{(2)} = (0.70, 0.30)$, so that $S(P_3^{(1)}) = 0.39$ and $S(P_2^{(2)}) = 0.61$, it would be reasonable to conclude that the disorder or uncertainty is greater in the second case than in the first. However, for the additional probability distributions $P_2^{(3)} = (0.80, 0.20)$ and $P_4^{(4)} = (0.70, 0.15, 0.10, 0.05)$, the result $S(P_2^{(2)}) - S(P_3^{(1)}) = 0.22$ and $S(P_4^{(4)}) - S(P_2^{(3)}) = 0.41$ simply states that the difference in S-values of 0.22 is less than that of 0.41. There is, however, no basis for assuming or suggesting that this result necessarily reflects the true differences in the disorder of the four systems or the uncertainty of the four sets of events. For such comparisons to be valid, additional conditions need to be imposed. We shall determine such validity conditions in a couple of different ways.
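These S-values are easily reproduced numerically; a minimal sketch:

```python
# Sketch: reproducing the numerical illustration above (values rounded
# as in the text).
import math

def S(p):
    return -sum(x * math.log(x) for x in p if x > 0)

P1 = (0.90, 0.05, 0.05)
P2 = (0.70, 0.30)
P3 = (0.80, 0.20)
P4 = (0.70, 0.15, 0.10, 0.05)
print(round(S(P1), 2), round(S(P2), 2))                  # 0.39 0.61
print(round(S(P2) - S(P1), 2), round(S(P4) - S(P3), 2))  # 0.22 0.41
```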
In measurement theory, "Validity describes how well the measured variable represents the attribute being measured, or how well it captures the concept which is the target of measurement" [27] (p. 129). While there are different forms of validity, we shall use value validity and define it as follows: Definition: A measure has value validity if all its potential values provide numerical representations of the size (extent) of the attribute being measured that are true or realistic with respect to some acceptable criterion.
To determine the conditions for an entropy to have value validity, we shall use the recently introduced lambda distribution defined as:

$$ P_n^{\lambda} = \left(1 - \lambda + \frac{\lambda}{n}, \frac{\lambda}{n}, \ldots, \frac{\lambda}{n}\right), \quad 0 \leq \lambda \leq 1 \qquad (8) $$

where λ is a parameter that reflects the uniformity or evenness of the distribution [28]. The $P_n^0$ and $P_n^1$ in Equation (4) are particular (extreme) cases of this distribution. In fact, $P_n^{\lambda}$ is a weighted mean of $P_n^0$ and $P_n^1$, i.e.:

$$ P_n^{\lambda} = (1 - \lambda) P_n^0 + \lambda P_n^1 \qquad (9) $$

For a generic entropy E that is (strictly) Schur-concave (Property P6), and from the majorization $P_n^1 \prec P_n \prec P_n^0$ for any given $P_n$, as is easily verified from Equations (6) and (7), it follows that $E(P_n^0) \leq E(P_n) \leq E(P_n^1)$, so that, for any given $P_n$, there exists a λ for which:

$$ E(P_n) = E(P_n^{\lambda}) \qquad (10) $$

Consequently, validity conditions on $E(P_n)$ can equivalently be formulated in terms of $E(P_n^{\lambda})$.
By considering $P_n^{\lambda}$, $P_n^0$, and $P_n^1$ as points (vectors) in n-dimensional space, Euclidean distances are then the logical choice as the basis of a criterion for the value validity of entropy E. Then, the following ratio equality presents itself as the natural and obvious requirement:

$$ \frac{E(P_n^1) - E(P_n^{\lambda})}{E(P_n^1) - E(P_n^0)} = \frac{d(P_n^{\lambda}, P_n^1)}{d(P_n^0, P_n^1)} = 1 - \lambda \qquad (11) $$

Besides the standard Euclidean distance function d used in Equation (11), the same result $1 - \lambda$ would be obtained for all members of the Minkowski class of distance metrics. With $E(P_n^0) = 0$, since there is no disorder or uncertainty when one $p_i = 1$ (and the other $p_i$'s equal 0) or when n = 1, (11) can be expressed as:

$$ E(P_n^{\lambda}) = \lambda E(P_n^1) \qquad (12) $$

and, in terms of the relative entropy $E^*(P_n^{\lambda}) = E(P_n^{\lambda}) / E(P_n^1)$:

$$ E^*(P_n^{\lambda}) = \lambda \qquad (13) $$

for all n and λ. This formulation is also an immediate consequence of (9), i.e.:

$$ E(P_n^{\lambda}) = (1 - \lambda) E(P_n^0) + \lambda E(P_n^1) = \lambda E(P_n^1) \qquad (14) $$

If we accept $E(P_n^1) = \log n$ as a reasonable maximum entropy for any given n, which is that of S in Equation (1) (with k = 1), then Equation (12) would become:

$$ E(P_n^{\lambda}) = \lambda \log n \qquad (15) $$

However, a reasonable and justifiable alternative would clearly be $E(P_n^1) = n - 1$, so that Equation (12) becomes:

$$ E(P_n^{\lambda}) = (n - 1)\lambda \qquad (16) $$

Of course, both expressions in Equations (15) and (16) give $E(P_n^{\lambda}) = 0$ for n = 1, as is only reasonable.
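The distance criterion behind Equation (11) can be checked numerically; a sketch (helper names ours):

```python
# Sketch: for the lambda distribution P_n^λ, the Euclidean distance ratio
# d(P_n^λ, P_n^1) / d(P_n^0, P_n^1) equals 1 - λ exactly.
import math

def lam_dist(n, lam):
    return [1 - lam + lam / n] + [lam / n] * (n - 1)

def d(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

n, lam = 5, 0.3
P_lam = lam_dist(n, lam)
P0 = [1.0] + [0.0] * (n - 1)   # P_n^0: zero disorder
P1 = [1.0 / n] * n             # P_n^1: maximum disorder
print(round(d(P_lam, P1) / d(P0, P1), 6))  # 0.7 = 1 - λ
```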
The $E(P_n^1) = n - 1$ and Equation (12) also follow from simple functional equations. With $E(P_n^1) = f(n)$, it seems reasonable and most intuitive to suggest that increasing n by an integer value m (m < n) should result in the same absolute change in the value of the function f as when n is reduced by the same amount m, i.e.:

$$ f(n + m) - f(n) = f(n) - f(n - m) \qquad (17) $$

The general solution to this functional equation is:

$$ f(n) = a + bn \qquad (18) $$

where a and b are arbitrary real constants [29] (p. 82); since f(1) = 0, so that a + b = 0, Equation (18) with b = 1 yields $f(n) = n - 1$. Also, Equation (18) is the solution of Jensen's functional equation for integers [29] (p. 43), i.e.:

$$ f\left(\frac{m + n}{2}\right) = \frac{f(m) + f(n)}{2} \qquad (19) $$

If, instead of Equation (17), one proposes:

$$ f(nm) = f(n) + f(m) \qquad (20) $$

then the most general solution would be $f(n) = a \log n$ with arbitrary constant a [29] (p. 39). By setting a = 1, and hence $E(P_n^1) = \log n$, Equation (12) becomes Equation (15) instead of Equation (16).
Similarly, for any given (fixed) n, $E(P_n^{\lambda})$ becomes a function g of λ only, for which it is proposed that:

$$ g(\lambda + \mu) - g(\lambda) = g(\lambda) - g(\lambda - \mu) \qquad (21) $$

where μ is such that $0 \leq \lambda + \mu \leq 1$ and $0 \leq \lambda - \mu \leq 1$, with the general solution of Equation (21) being:

$$ g(\lambda) = c + d\lambda \qquad (22) $$

with arbitrary constants c and d [29] (p. 82). Since $E(P_n^0) = g(0) = 0$ and $E(P_n^1) = g(1) = d$, Equation (22) results in Equation (12). Consequently, different lines of reasoning lead to Equations (12) and (15) or Equation (16) as conditions for an entropy E to have value validity, thereby making the difference comparisons in Equations (3b) and (3c) permissible. The bases for those conditions are the distance criterion in Equation (11), the mean-value relationship in Equation (14), and the difference relationships represented by the functional equations in Equations (17) and (19)-(21). Those functional equations also directly support the validity of the comparisons in Equations (3b) and (3c).

Value-Valid Functions of S and S *
It is immediately apparent that neither S in Equation (1) nor $S^*$ in Equation (2) meets those validity conditions. It is found that S and $S^*$ consistently overstate the true extent of the attribute being measured, i.e., the attribute of system disorder or event uncertainty. Consider, for example, the lambda distribution in Equation (8) with λ = 0.5 and n = 4, i.e., $P_4^{0.5} = (0.625, 0.125, 0.125, 0.125)$, for which S = 1.07 and $S^*$ = 0.77, which are, respectively, substantially greater than the values $(0.5)\log 4 = 0.69$ and 0.5 required by Equations (15) and (13). Each element of the distribution $P_4^{0.5}$ has the same distance from the corresponding element of $P_4^1 = (0.25, 0.25, 0.25, 0.25)$ as it does from the corresponding element of $P_4^0 = (1, 0, 0, 0)$, i.e., $d(P_4^{0.5}, P_4^1) = d(P_4^{0.5}, P_4^0)$, so that λ = 0.5 is indeed the appropriate value from Equation (11).
As another simple example, consider $P_3 = (0.80, 0.15, 0.05)$, for which S = 0.61 and $S^*$ = 0.56. Since this $P_3$-distribution is much closer to $P_3^0 = (1, 0, 0)$ than it is to $P_3^1 = (1/3, 1/3, 1/3)$, and since $S \in [0, 1.10]$ for n = 3 and $S^* \in [0, 1]$, these values of S = 0.61 and $S^*$ = 0.56 are unreasonably large. By comparison, for $S(0.80, 0.15, 0.05) = S(P_3^{\lambda})$ in Equation (10), it is found that λ = 0.282, so that, from Equations (15) and (13), $0.282 \log 3 = 0.31$ and 0.28, respectively, would have been appropriate values, rather than 0.61 and 0.56, had the entropy (with upper bound log n) had value validity. When comparing the results from these two examples with the respective S-values of 1.07 and 0.61, it would not be a valid inference that the disorder (uncertainty) in the first case was about 75% greater than in the second case (i.e., as a particular case of Equation (3c)). This result would only apply to the S-values themselves and not to the attribute that S is supposed to measure (i.e., the disorder or uncertainty). The appropriate and valid comparison should be between the above entropy values of 0.69 and 0.31, showing a 123% increase in disorder (uncertainty). Even though S and $S^*$ do not meet the conditions for valid difference comparisons, perhaps some functions of S and $S^*$ do. We shall address this next.
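The λ in Equation (10) can be found numerically, since $S(P_n^{\lambda})$ is strictly increasing in λ; a bisection sketch (ours, not from the paper):

```python
# Sketch: given P_n, find the λ for which S(P_n) = S(P_n^λ), as used in
# Equation (10); simple bisection suffices by monotonicity in λ.
import math

def S(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def S_lam(n, lam):
    return S([1 - lam + lam / n] + [lam / n] * (n - 1))

def solve_lambda(p, tol=1e-10):
    n, target = len(p), S(p)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if S_lam(n, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(solve_lambda([0.80, 0.15, 0.05]), 2))  # ≈ 0.28 (the text reports λ = 0.282)
```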

The Case of S
In order to satisfy the validity requirement in Equation (15), we shall explore whether there exists a function (or transformation) f such that:

$$ S(P_n^{\lambda}) = f(\lambda \log n) \qquad (23) $$

from which a transformed entropy $S_T$ could be obtained as:

$$ S_T(P_n^{\lambda}) = f^{-1}[S(P_n^{\lambda})] = \lambda \log n \qquad (24) $$

where $P_n^{\lambda}$ is again the distribution defined in Equation (8). From the graphs of $S(P_n^{\lambda})$ versus $\lambda \log n$ for some different values of n as shown in Figure 1, it is clear that no such function f exists for all λ and n. It is also evident from Figure 1 that S overstates the degree of disorder (uncertainty) throughout the range from 0 to log n and for different n. The absolute extent of such overstatement, or lack of value validity, appears to be greatest when S roughly equals (3/4) log n. Nevertheless, it would appear from Figure 1 that at least a reasonable degree of approximation could be achieved from Equations (23) and (24) if we restrict those functions to cases when, say, $S \leq 0.8 \log n$, or $S^* \leq 0.8$, for all n. When the function (model) $S = \alpha(\lambda \log n)^{\beta}$ is fitted to the different values of n and λ in Table 2 for $S^* \leq 0.8$, regression analysis results in the parameter estimates $\hat{\alpha} = 1.52$ and $\hat{\beta} = 0.78$. When these estimates are replaced with the nearest convenient fractions 3/2 and 4/5 and this fitted function is then inverted as in Equation (24), we obtain the transformed entropy:

$$ S_T(P_n^{\lambda}) = \left[\frac{2}{3} S(P_n^{\lambda})\right]^{5/4} \quad \text{for } S^*(P_n^{\lambda}) \leq 0.8 \qquad (25) $$

so that, for any probability distribution $P_n = (p_1, \ldots, p_n)$ and from Equation (10):

$$ S_T(P_n) = \left[\frac{2}{3} S(P_n)\right]^{5/4} \quad \text{for } S^*(P_n) \leq 0.8 \qquad (26) $$

The values of $S_T(P_n^{\lambda})$ in Equation (25) for various λ and n as given in Table 2 are quite comparable with the corresponding values of $\lambda \log n$. In fact, the coefficient of determination, when properly computed [30], is found to be $R^2 = 0.998$, showing that about 99% of the variation of $\lambda \log n$ is explained (accounted for) by the model in Equation (25).
The entropy $S_T$ has all of the same Properties P1-P6 as S, but it does not have the additivity Property P7. Of course, $S_T$ has the limitation that it is only defined for the restricted range from 0 to $[(2/3)(0.8 \log n)]^{5/4}$. However, $S_T$ in Equation (26) does approximately meet the requirement in Equation (15) over this limited range, so that difference comparisons as in Equations (3b) and (3c) are reasonably valid.
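A minimal sketch of the transform in Equations (25) and (26):

```python
# Sketch: S_T = [(2/3) S]^(5/4), intended to approximate λ log n
# when S* <= 0.8 (Equations (25) and (26)).
import math

def S(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def S_T(p):
    s, n = S(p), len(p)
    if s / math.log(n) > 0.8:
        raise ValueError("S_T is only defined for S* <= 0.8")
    return ((2.0 / 3.0) * s) ** 1.25

# For P_4^0.5 = (0.625, 0.125, 0.125, 0.125), the target λ log n = 0.5 log 4 ≈ 0.69:
print(round(S_T([0.625, 0.125, 0.125, 0.125]), 2))  # ≈ 0.66, far closer to 0.69 than S ≈ 1.07
```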

The Case of S *
For the relative entropy $S^* \in [0, 1]$ in Equation (2), and in order to meet the validity condition in Equation (12) with $E(P_n^1) = 1$, a function f is needed such that $S^*(P_n^{\lambda}) = f(\lambda, n)$, from which a transformed relative entropy $S_T^* \in [0, 1]$ follows as:

$$ S_T^*(P_n) = g[S^*(P_n), n] \qquad (27) $$

It is apparent from Figure 1 that the functions f and g have to have the integer n as a variable. By exploring alternative functions or models for different n and λ, using regression analysis, and expressing parameter estimates as convenient fractions, the result in Equation (28) is obtained, where $S^*$ stands for either $S^*(P_n^{\lambda})$ or $S^*(P_n)$. This function (model) in Equation (28) does indeed provide an excellent fit to the different data points (n, λ), as seen from the results in Table 2: the values of $S_T^*$ are quite comparable with the corresponding values of λ. It may also be noted that $\log_2 n_+$ is frequently referred to as Hartley's measure or entropy ([24], Chapter 2) after Hartley [31].
For the interesting binary case, Equation (28) simplifies to the expression in Equation (29) and, noting that $P_2^{\lambda} = (1 - \lambda/2, \lambda/2)$, to the form in Equation (30). Figure 2 shows a comparison between $S_T^*$ and $S^*$ for the distribution $P_2^{\lambda} = (1 - \lambda/2, \lambda/2)$ (upper graph) and for $P_2 = (1 - p, p)$ (lower graph), with the latter form of the distribution typically being used for depicting binary entropies (e.g., [4,24]). The dashed lines represent the entropy requirement for value validity in Equation (13), which, for the upper and lower graphs, becomes, respectively:

$$ E^*(P_2^{\lambda}) = \lambda, \quad E^*(1 - p, p) = 2\min(p, 1 - p) \qquad (31) $$

Note that, while the derivative of $E^*(1 - p, p)$ with respect to p in Equation (31) does not exist at p = 0.5, $E^*(1 - p, p)$ is continuous at p = 0.5 (Property P1).

Figure 2. Upper graph: $S^*(P_2^{\lambda})$ in Equation (2) (upper curve) and $S_T^*(P_2^{\lambda})$ in Equation (30) (lower curve) as functions of λ. Lower graph: $S^*(p, 1 - p)$ and $S_T^*(p, 1 - p)$ as functions of p. The dashed lines in the two graphs represent Equation (31).
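A quick numerical check of this overstatement against the Equation (31) target (a sketch; helper name ours):

```python
# Sketch: compare S*(1-p, p) with the value-validity target
# E*(1-p, p) = 2 min(p, 1-p) at a few points of the binary case.
import math

def S_star(p):
    s = -sum(x * math.log(x) for x in p if x > 0)
    return s / math.log(len(p))

for p in (0.05, 0.2, 0.35, 0.5):
    print(p, round(S_star([1 - p, p]), 3), round(2 * min(p, 1 - p), 3))
# S* exceeds 2 min(p, 1-p) everywhere except at p = 0 and p = 0.5,
# illustrating the overstatement discussed above.
```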
It may perhaps be tempting to use $S_T^*$ in Equation (28) to propose the following entropies:

$$ S_T'(P_n) = S_T^*(P_n) \log n, \quad S_T''(P_n) = (n - 1) S_T^*(P_n) \qquad (32) $$

which would, respectively, comply with Equations (15) and (16), at least to a high degree of approximation. If, instead of n, the $n_+$ in Equation (29) is used in Equation (32) and for $S_T^*$ in Equation (28), then those two potential entropies $S_T'$ and $S_T''$ would also be zero-indifferent (Property P3). However, neither $S_T'$ nor $S_T''$ can be an acceptable entropy, as exemplified by the two distributions $P_4 = (0.40, 0.35, 0.24, 0.01)$ and $Q_4 = (0.40, 0.35, 0.25, 0)$, for which $S(P_4) = 1.12$ and $S(Q_4) = 1.08$, whereas, from Equations (32) and (28) using $n_+$, $S_T'(P_4) = 0.76$, $S_T'(Q_4) = 0.94$, $S_T''(P_4) = 1.50$, and $S_T''(Q_4) = 1.72$. That is, in spite of the majorization $P_4 \prec Q_4$, when any reasonable entropy should be greater for $P_4$ than for $Q_4$ (Property P6), both $S_T'$ and $S_T''$ give the opposite result. It is easy to find other examples with the same results.

Assessment of Entropy Families
For a parameterized family of entropies $S_i$, such as those defined in Table 1, to be viable beyond being an interesting mathematical exercise or a generalization for its own sake, one could certainly argue that $S_i$ would need to meet some conditions that S in Equation (1) does not. First, $S_i$ should have some properties that may be considered important or desirable and that S lacks. Second, the flexibility provided by the incorporation of one or more parameters into the formulation of $S_i$ should be justifiable by the parameter(s) having some meaning or interpretation relative to the characteristic (attribute) that $S_i$ is supposed to measure.
With respect to the first condition, it is rather obvious from the expressions in Table 1 that none of those entropy families would be favored over S in Equation (1) in terms of their properties. In fact, some of those entropies even lack the essential Schur-concavity property (Property P6 in Section 2.1). The entropy $S_3$ in Table 1, which is a particular subset of $S_8$ with $\beta = \delta = 1$ and $\lambda = k/(1 - \alpha)$, and which was defined for all real α, is strictly Schur-concave only for $\alpha \geq 0$. This follows immediately from the fact that, with the $p_i$'s ordered as in Equation (6), the partial derivative $\partial S_3 / \partial p_i$ is nondecreasing in i = 1, ..., n only if $\alpha > 0$, and strictly so if the inequalities in Equation (6) are all strict [26] (p. 84). For the limiting case when $\alpha \to 0$, $S_3$ reduces to Equation (1), which is strictly Schur-concave [26] (p. 101). Similarly, $S_{10}$ was defined by Good [23] for non-negative integer values of α and β, but it is not Schur-concave for all such α and β values. Baczkowski et al. [32] extended $S_{10}$ to permit α and β to take on real values and determined the rather restrictive (α, β)-regions for the Schur-concavity of $S_{10}$.
A brief comment is warranted about the potential case when the probability distribution $P_n = (p_1, \ldots, p_n)$ is possibly incomplete, i.e., when $\sum_{i=1}^{n} p_i \leq 1$ [10,11]. Then, setting $\lambda = k/(1 - \alpha)$ for some constant k and $\beta = \delta = 1$, the $S_8$ in Table 1 becomes:

$$ S = \frac{k}{1 - \alpha}\left(\frac{\sum_{i=1}^{n} p_i^{\alpha}}{\sum_{i=1}^{n} p_i} - 1\right) \qquad (33) $$

In the limiting case when $\alpha \to 1$, and using L'Hôpital's rule, Equation (33) reduces to:

$$ S = -k \, \frac{\sum_{i=1}^{n} p_i \log p_i}{\sum_{i=1}^{n} p_i} \qquad (34) $$

The entropy in Equation (34) was first proposed by Rényi [11] for k = 1/log 2 or, equivalently, for k = 1 and the base-2 logarithm in Equation (34). In particular, when the probability distribution consists of a single probability $p \in (0, 1)$, Equations (33) and (34) become, respectively, $S = k(p^{\alpha - 1} - 1)/(1 - \alpha)$ and $S = -k \log p$. It is rather apparent from the expressions in Table 1 that none of those entropy families or individual members, including those in Equations (33) and (34), meet the validity conditions in Section 2.2. Clearly, none of them satisfy Equations (15) and (16) or the weaker condition in Equation (12). There appears to be no reason for preferring any of those entropies or their relative (normed) forms over S or $S^*$ in Equations (1) and (2) on the grounds of any substantial superiority with respect to value validity.
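Assuming Equation (33) has the algebraic form shown above (our reading of the garbled source), a sketch verifying the L'Hôpital limit to Equation (34):

```python
# Sketch: assumed form of Equation (33), k/(1-α) (Σ p_i^α / Σ p_i - 1),
# approaching Equation (34), -k Σ p_i log p_i / Σ p_i, as α → 1.
import math

def S33(p, alpha, k=1.0):
    return k / (1 - alpha) * (sum(x ** alpha for x in p) / sum(p) - 1)

def S34(p, k=1.0):
    return -k * sum(x * math.log(x) for x in p) / sum(p)

p = [0.5, 0.3, 0.1]  # incomplete: the probabilities sum to 0.9
for alpha in (0.9, 0.99, 0.999):
    print(alpha, round(S33(p, alpha), 4))
print("limit:", round(S34(p), 4))  # the α → 1 values converge to this
```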
With respect to the flexibility provided by such generalized entropies, one could argue that the entropy parameters may potentially be selected to best fit some given situation or problem [2] (p. 185) [33] (pp. 298-301). However, any parameter selection has to have some meaningful basis or explanation, which is sorely lacking in the published literature. Of the various families of entropies in Table 1, Rényi's entropy $S_1$ has attracted the most attention in information theory and in physics, where it is used, for example, as a generalized measure of fractal dimension in chaos theory [34] (pp. 686-688) [35] (pp. 203-223).
Furthermore, such flexibility can alternatively be achieved by simply considering strictly increasing functions of S in Equation (1). As an example, consider Rényi's entropy $S_1$ in Table 1 with α = 2, i.e., $S_1 = -\log \sum_{i=1}^{n} p_i^2$. For the distribution $P_n^{\lambda}$ in Equation (8) and the values of n and λ in Table 2, and based on regression analysis, the model relating this $S_1$ to S is obtained in Equation (35). It then follows from Equation (10) that the same type of relationship as in Equation (35) should hold approximately for any probability distribution $P_n = (p_1, \ldots, p_n)$.

The Euclidean Entropy
Since neither S in Equation (1) nor any of the entropies in Table 1 meets the validity condition in Equation (12) or in Equations (15) and (16), we shall search for an entropy that does. The most logical starting point is clearly the Euclidean distance relationship in Equation (11). Thus, for any distribution $P_n = (p_1, \ldots, p_n)$, we can define:

$$ S_E^*(P_n) = 1 - \frac{d(P_n, P_n^1)}{d(P_n^0, P_n^1)} \qquad (36) $$

where $P_n^0$ and $P_n^1$ are those in Equation (4). With $P_n = P_n^{\lambda}$ in Equation (8), it is immediately apparent that this $S_E^*$ satisfies the validity condition in Equation (13). Then, an entropy that satisfies the condition in Equation (16) can be defined in terms of Equation (36), applied to the $n_+$ positive probabilities of $P_n$, as:

$$ S_E(P_n) = (n_+ - 1) S_E^*(P_{n_+}) \qquad (37) $$

where $n_+$ is the number of positive probabilities as defined in Equation (29) and $P_{n_+}$ consists of those $n_+$ positive probabilities. It seems appropriate to call $S_E$ the Euclidean entropy since it is based purely on Euclidean distances. The $n_+$ instead of n is used in the definition of $S_E$ to ensure that it is zero-indifferent (Property P3 in Section 2.1).
The $S_E$ can be expressed as:

$$ S_E(P_n) = (n_+ - 1)\left[1 - \sqrt{\frac{n_+ \sum_{i=1}^{n} p_i^2 - 1}{n_+ - 1}}\,\right] = (n_+ - 1)\left(1 - \sqrt{n_+}\, s_{n_+ - 1}\right) \qquad (38) $$

where $s_{n_+ - 1}$ is the standard deviation of the $n_+$ positive probabilities using $n_+ - 1$ instead of $n_+$ as a divisor. From the first expression in Equation (38), we see that, for any given $n_+$, $S_E$ is also a strictly increasing function of the so-called quadratic entropy $1 - \sum_{i=1}^{n} p_i^2$ studied in [36]. Note also that $S_E^*$ in Equations (36) and (37) is the coefficient of nominal variation introduced by [37] as a measure of variation for nominal categorical data. Also, from the Lagrange identity (e.g., [38] (p. 3)) and the second expression in Equation (38), $S_E$ and $S_E^*$ can be expressed in terms of pairwise differences between probabilities as:

$$ S_E(P_n) = (n_+ - 1)\left[1 - \sqrt{\frac{\sum_{i<j} (p_i - p_j)^2}{n_+ - 1}}\,\right] $$

where the sum is over all pairs of the $n_+$ positive probabilities. The $S_E$ can be seen to have all of the properties of S in Equation (1) as outlined in Section 2.1 except for the additivity Property P7. It is strictly Schur-concave (Property P6), since (a) $\sum_{i=1}^{n} p_i^2$ is strictly Schur-convex and (b) $S_E$ is a strictly decreasing function of $\sum_{i=1}^{n} p_i^2$ for any given (fixed) $n_+$ from Equation (38) [26] (Chapter 3). The $S_E$ avoids the limitation pointed out for the potential entropies $S_T'$ and $S_T''$ in Equation (32). That is, the implication under Property P6 also holds when some of the elements of $P_n$ or $Q_n$ are zero. For example, for $P_4 = (0.40, 0.35, 0.24, 0.01)$ and $Q_4 = (0.40, 0.35, 0.25, 0)$, $S_E(P_4) = 1.96 > S_E(Q_4) = 1.74$, which is an appropriate result since $P_4 \prec Q_4$, but for which $S_T'$ and $S_T''$ gave the opposite and unacceptable result.
To prove this last property of $S_E$, it is sufficient to show that, for the distribution $P_n = (p_1, \ldots, p_{n_+}, 0, \ldots, 0)$, using n instead of $n_+$ in the formula in Equation (38), and denoting the result by $S_E(P_n; n)$, the value of $S_E(P_n; n)$ is strictly increasing in n for given (fixed) $n_+$. Treating n as a continuous variable (for mathematical purposes), we obtain from Equation (38) the partial derivative $\partial S_E(P_n; n)/\partial n$ in Equation (39). The first term A of Equation (39) satisfies $A \leq 1/2$ since $\sum_{i=1}^{n} p_i^2 \leq 1$, so that $\partial S_E(P_n; n)/\partial n > 0$ in Equation (39) for all $n \geq n_+ + 1$, which completes the proof. Thus, if $Q_n = (q_1, \ldots, q_n)$ with all $q_i > 0$ is majorized by $P_n = (p_1, \ldots, p_{n_+}, 0, \ldots, 0)$, then $S_E(Q_n) > S_E(P_n; n) > S_E(p_1, \ldots, p_{n_+})$.

Most importantly, the reason for introducing $S_E$ and $S_E^*$ is that they satisfy the validity requirements in Equations (16) and (13), respectively. For $P_n^{\lambda}$ in Equation (8), the expressions for $S_E$ and $S_E^*$ in Equations (37) and (38) become $S_E(P_n^{\lambda}) = (n - 1)\lambda$ and $S_E^*(P_n^{\lambda}) = \lambda$. The $S_E^*$ in Equation (36) also has an appealing interpretation: it is the relative extent to which the distance between $P_n$ and $P_n^1$ is less than that between $P_n^0$ and $P_n^1$. Such an interpretation can also be made in terms of $\max_{P_n} d(P_n, P_n^1)$, which equals $d(P_n^0, P_n^1)$ since $d(P_n, P_n^1)$ is strictly Schur-convex in $P_n$.
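A sketch (helper names ours) computing $S_E$ and $S_E^*$ from Equation (38), reproducing the example values given above and the value-validity identity:

```python
# Sketch: S_E* = 1 - sqrt((n+ Σ p_i^2 - 1)/(n+ - 1)) over the n+ positive
# probabilities, and S_E = (n+ - 1) S_E*, as in Equations (36)-(38).
import math

def S_E_star(p):
    pos = [x for x in p if x > 0]
    n_plus = len(pos)
    if n_plus == 1:
        return 0.0
    t = sum(x * x for x in pos)
    return 1 - math.sqrt((n_plus * t - 1) / (n_plus - 1))

def S_E(p):
    n_plus = len([x for x in p if x > 0])
    return (n_plus - 1) * S_E_star(p)

P4 = [0.40, 0.35, 0.24, 0.01]
Q4 = [0.40, 0.35, 0.25, 0.0]
print(round(S_E(P4), 2), round(S_E(Q4), 2))  # 1.96 1.74, as in the text

# Value validity: for P_n^λ, S_E = (n - 1) λ exactly.
n, lam = 4, 0.5
p_lam = [1 - lam + lam / n] + [lam / n] * (n - 1)
print(round(S_E(p_lam), 2))  # 1.5 = (4 - 1) * 0.5
```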

Statistical Inferences
We shall also consider the situation when the probability distribution $P_n = (p_1, \ldots, p_n)$ consists of multinomial sample estimates $p_i = n_i / N$ for i = 1, ..., n and sample size $N = \sum_{i=1}^{n} n_i$, with the corresponding population distribution being $\Pi_n = (\pi_1, \ldots, \pi_n)$. For a generic entropy E, our interest may then be in making statistical inferences, especially confidence-interval construction, about the unknown population entropy $E(\Pi_n)$, based on the sample distribution $P_n$ and the sample size N. From the delta method of large-sample theory ([39], Chapter 14), the following convergence to the normal distribution holds:

$$ \sqrt{N}\,\left[E(P_n) - E(\Pi_n)\right] \xrightarrow{d} N(0, \sigma^2) \qquad (40) $$

In other words, for large N, $E(P_n)$ is approximately normally distributed with mean $E(\Pi_n)$ and variance $\mathrm{Var}[E(P_n)] = \sigma^2 / N$, or standard error $SE = \sigma / \sqrt{N}$, where $\sigma^2$ is given by:

$$ \sigma^2 = \sum_{i=1}^{n} \pi_i \left(\frac{\partial E}{\partial \pi_i}\right)^2 - \left(\sum_{i=1}^{n} \pi_i \frac{\partial E}{\partial \pi_i}\right)^2 \qquad (41) $$

The limiting normal distribution in Equation (40) still holds when, as is necessary in practice, the estimated variance $\hat{\sigma}^2$ is substituted for $\sigma^2$ by replacing the population probabilities $\pi_i$ in Equation (41) with their sample estimates $p_i$, i = 1, ..., n, yielding the estimated standard error $\widehat{SE} = \hat{\sigma} / \sqrt{N}$.
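A sketch of Equations (40) and (41) using a numerical gradient (helper names ours; the counts below are hypothetical illustration data, not from the paper):

```python
# Sketch: delta-method standard error for a generic entropy E, with
# σ² = Σ π_i (∂E/∂π_i)² - (Σ π_i ∂E/∂π_i)² evaluated at the sample estimates.
import math

def S(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def delta_se(p, E, N, h=1e-6):
    grads = []
    for i in range(len(p)):
        q_plus = list(p); q_plus[i] += h
        q_minus = list(p); q_minus[i] -= h
        grads.append((E(q_plus) - E(q_minus)) / (2 * h))  # central difference
    mean = sum(pi * g for pi, g in zip(p, grads))
    var = sum(pi * g * g for pi, g in zip(p, grads)) - mean ** 2
    return math.sqrt(var / N)

counts = [40, 35, 24, 1]              # hypothetical multinomial counts
N = sum(counts)
p_hat = [c / N for c in counts]
se = delta_se(p_hat, S, N)
# approximate large-sample 95% confidence interval for S(Π_n):
print(round(S(p_hat), 3), "+/-", round(1.96 * se, 3))
```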

Concluding Comments
A number of conclusions may be drawn from this analysis using the concept of value validity of an entropy, based on the lambda distribution and on criteria involving Euclidean distances and simple functional equations. Equations (12)-(16) provide the additional conditions that an entropy E has to meet for E to have the value-validity property, so that difference comparisons as in Equations (3b) and (3c) may be permissible. While neither the Boltzmann-Shannon entropy in Equation (1) nor any of the proposed entropy families in Table 1 satisfies those conditions, the transformed entropy $S_T$ in Equation (26) does for $S(P_n)/\log n \leq 0.8$, and the relative entropy $S_T^*$ in Equation (28) also does to a reasonable degree of approximation.
Since no member of the generalized entropies in Table 1 has the advantage of value validity over S, and some may lack other properties of S as outlined in Section 2.1, one may question the need for what seems to have become almost an embarrassment of riches of entropies. One justifiable exception would be if the parameter(s) of a generalized entropy could be shown to have some particular meaning or interpretation that would be useful for explaining some phenomenon or result. However, the flexibility that may be provided by a parameterized family of entropies can also potentially be achieved by considering functions of S in Equation (1), as exemplified by Equation (35).
Whether an entropy E is used as a measure of the disorder of a system in physics, of the uncertainty (information content) of a set of events in information theory, or of some other attribute or characteristic, the concern is with what types of comparisons can be made between values of E. If we argue that an E, such as S in Equation (1), should only be used for size ("greater than") comparisons as in Equation (3a), such advice will not always be heeded, as demonstrated in the published literature, resulting in invalid and misleading conclusions and interpretations. Such a misuse problem is avoided, and more informative results can be obtained, if E has the value-validity property permitting the difference comparisons in Equations (3b) and (3c) to be made. The Euclidean entropy $S_E$ in Equation (38) is proposed as one such more informative entropy.
As with any measure that summarizes a set of data into a single number, it is advisable that the results be used and interpreted with some caution, and an entropy is no exception. Even though the $S_E$ in Equation (38) has the value-validity property and a number of other desirable properties, so that it can be used for all the comparisons in Equations (3a)-(3c) as reasonable indications of the attribute (characteristic) being measured, this does not necessarily imply that another entropy with all the same properties would produce exactly the same results. Even S in Equation (1) and a member of Rényi's family $S_1$ in Table 1, such as that with α = 2, both of which have the same Properties P1-P7 (Section 2.1), do not necessarily order their values in the same way for all probability distributions unless the distributions are comparable with respect to majorization.
