Abstract
While Shannon’s differential entropy adequately quantifies a dimensioned random variable’s information deficit under a given measurement system, the same cannot be said of differential weighted entropy in its existing formulation. We develop weighted and residual weighted entropies of a dimensioned quantity from their discrete summation origins, exploring the relationship between their absolute and differential forms, and thus derive a “differentialized” absolute entropy based on a chosen “working granularity” consistent with Buckingham’s Π-theorem. We apply this formulation to three common continuous distributions (exponential, Gaussian, and gamma) and consider policies for optimizing the working granularity.
1. Introduction
Informational entropy, introduced by Shannon [] as an analogue of the thermodynamic concept developed by Boltzmann and Gibbs [], represents the expected information deficit prior to an outcome (or message) selected from a set or range of possibilities with known probabilities. Many modern applications of this concept have been developed, such as the so-called maximum entropy method for choosing the “best yet simplest” probabilistic model, from amongst a set of parameterized models, which is statistically consistent with observed data. Informally, this can be stated as: “In order to produce a model which is statistically consistent with the observed results, model all that is known and assume nothing about that which is unknown. Given a collection of facts or observations, choose a model which is consistent with all these facts and observations, but otherwise make the model as ‘uniform’ as possible” [,]. Philosophically, this can be regarded as a quantitative version of “Occam’s razor” from the 14th century - “Entities should not be multiplied without necessity”. Mathematically, this means that we find the parameter values which maximize the entropy of the model, subject to constraints that ensure the model is consistent with the observed data, and MacKay [] has given a Bayesian probabilistic explanation for the basis of Occam’s razor. This maximum entropy approach has found widespread applications in image processing to reconstruct images from noisy data [,] - for example, in astronomy, where signal-to-noise levels are often extremely low - and in speech and language processing, including automatic speech recognition and automated translation [,].
This idea of informational entropy has been expanded and generalized. Tsallis [] proposed alternative definitions to embrace inter-message correlation [], though the information of a potential event remained solely dependent on its unexpectedness or “surprisal”. This is somewhat counterintuitive: “Man tosses 100 consecutive heads with coin” is very surprising but not important enough to justify a front-page headline. Conversely “Sugar rots your teeth” is of great importance but its lack of surprisal disqualifies it as news. “Aliens land in Trafalgar Square” is both surprising and important and we would expect it to be a lead story. To reflect this, Guiaşu [] introduced the concept of “weighted entropy” whereby each possible outcome carried a specific informational importance, an idea expanded by Taneja and Tuteja [], Di Crescenzo and Longobardi [], and several others. Another modification considers the entropy of outcomes subject to specific constraints: for example, “residual entropy” was defined by Ebrahimi [] for lifetime distributions of components surviving beyond a certain minimum interval.
From the outset, Shannon identified two kinds of entropy: the “absolute” entropy of an outcome selected from amongst a set of discrete possibilities [] (p. 12) and the “differential” entropy of a continuous random variable [] (p. 36). The differential version of weighted entropy has found several applications: Pirmoradian et al. [] used it as a quality metric for unoccupied channels in a cognitive radio network and Tsui [] showed how it can characterize scattering in ultrasound detection. However, under Shannon’s definition the differential entropy of a physical variable requires the logarithm of a dimensioned quantity, an operation which necessitates careful interpretation [].
In this paper we examine the implications of this dimensioned logarithm argument for weighted entropy and show how an arbitrary choice of unit system can have profound effects on the nature of the results. We further propose and evaluate a potential remedy for this problem, namely a finite working granularity.
2. Absolute and Differential Entropies
Entropy may be regarded as the expected information gained by sampling a random variable (RV) with a known probability distribution. For example, if X is a discrete RV and p_i = P(X = x_i), then outcome x_i occurs on average once every 1/p_i observations and the mean information may be encoded as −Σ_i p_i log₂ p_i bits. However, it is common to use natural logarithms, for which the information unit is the “nat” (≈1.44 bits). Entropy can therefore be defined as

H(X) = −Σ_{x∈Ω} p(x) ln p(x)   (1)
where Ω is the set of all possible x. (An implicit assumption is that X results from an independent identically distributed process: while Tsallis proposed a more generalized form to embrace inter-sample correlation [,], the current paper assumes independent probabilities.) Shannon extended (1) to cover continuous RVs as “differential” entropy

h(X) = −∫ f(x) ln f(x) dx   (2)
where f(x) is the probability density function (PDF) of X. Two points may be noted: firstly, since in (2) x only affects the integrand through f(x), h(X) is “position-independent”, i.e., h(X + a) = h(X) for all real a. Secondly, h(X) is not, as one might naïvely suppose, the limit of H(X) as the resolution tends to zero (see Theorem 9.3.1 in []). Furthermore, while H(X) is always positive (since 0 ≤ p(x) ≤ 1), h(X) may be negative if values of f(x) greater than 1 predominate. In the extreme case of a Dirac delta-function PDF, representing a deterministic—and therefore non-informative—outcome, the differential entropy would not be zero but minus infinity.
Take for example the Johnson-Nyquist noise in an electrical resistor: if the noise potential is Gaussian with an RMS value of σ volts it is easy to show that h = ln(σ√(2πe)) nats. (Position-independence makes the bias voltage irrelevant.) Suppose that σ is such that the entropy is positive when working in microvolts; converting the same physical quantity to millivolts subtracts ln 10³ ≈ 6.9 nats and can make the entropy negative. If differential entropy truly represented information then a noise sample in microvolts would increase our information but measured in millivolts would decrease it. Thus, h must be regarded as a relative, not an absolute measure and consistent units must be used for different variables to be meaningfully compared.
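To make the unit dependence concrete, the following minimal Python sketch evaluates h = ln(σ√(2πe)) for the same noise level expressed in microvolts and in millivolts (the RMS value of 100 μV is illustrative, not taken from the text); the two results differ by exactly ln 10³ ≈ 6.91 nats and have opposite signs.

```python
import math

def gaussian_diff_entropy(sigma):
    """Differential entropy (nats) of a Gaussian with standard deviation sigma,
    where sigma is a bare number in whatever unit the caller has chosen."""
    return math.log(sigma * math.sqrt(2 * math.pi * math.e))

sigma_uV = 100.0             # hypothetical RMS noise of 100 microvolts
sigma_mV = sigma_uV / 1000   # the same physical quantity expressed in millivolts

h_uV = gaussian_diff_entropy(sigma_uV)   # positive
h_mV = gaussian_diff_entropy(sigma_mV)   # negative
print(f"h = {h_uV:+.3f} nats (microvolts), {h_mV:+.3f} nats (millivolts)")
print(f"difference = {h_uV - h_mV:.3f} nats (= ln 1000 = {math.log(1000):.3f})")
```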
“Residual” entropy, where only outcomes above some threshold t are considered, is given by []

h(X; t) = −∫_t^∞ (f(x)/S(t)) ln{f(x)/S(t)} dx   (3)
where S(t) = P(X > t) is called the “survival function” since in a life-test experiment it represents the proportion of the original component population expected to survive up to time t. Some authors call f(x)/S(t) the “hazard function” (a life-test metric equal to failure rate divided by surviving population) though this is only valid for the case of x = t; it is better interpreted as the PDF of X subject to the condition X > t. This somewhat eliminates the positional independence since a shift in X only produces the same entropy when accompanied by an equal shift in t, i.e., h(X + a; t + a) = h(X; t), but the contribution of each outcome to the total entropy still depends on rarity alone.
Guiaşu’s aforementioned weighted entropy [] introduces an importance “weighting” w_i to outcome x_i whose surprisal remains −ln p_i: the overall information of this outcome is redefined as −w_i ln p_i, so entropy becomes H^w(X) = −Σ_i w_i p_i ln p_i. It seems intuitively reasonable that the differential analogue should be −∫ w(x) f(x) ln f(x) dx, though if w(x) is a monotonic function we could define this more compactly (taking the outcome value itself as the weight) as []:

h^w(X) = −∫ x f(x) ln f(x) dx   (4)
and the residual weighted entropy

h^w(X; t) = −∫_t^∞ x (f(x)/S(t)) ln{f(x)/S(t)} dx   (5)
We have already noted that the logarithms of probability densities behave very differently from those of actual probabilities. Aside from the fact that f(x) may be greater than 1 (a negative entropy contribution) it is also typically a dimensioned quantity: for example if x represents survival time then f(x) has dimension [Time]⁻¹, leading to the importance of unit-consistency already noted. In the next section we explore more deeply the consequences of dimensionality.
3. Dimensionality
The underlying principle of dimensional analysis, sometimes called the “Π-theorem”, was published in 1914 by Buckingham [] and consolidated by Bridgman in 1922 []. In Bridgman’s paraphrase [] (p. 37) an equation is “complete” if it retains the same form when the size of the fundamental units is changed. Newton’s Second Law, for example, states that F = ma where F is the inertial force, m the mass and a the acceleration: if in SI units the mass and acceleration are such that ma = 4 kg·m·s⁻² then the resulting force is 4 N, where the newton N is the SI unit of force. In the CGS system the same mass (in grams) and acceleration (in cm·s⁻²) give 4 × 10⁵ dynes, the exact equivalent of four newtons. The equation is therefore “complete” under the Π-theorem, which requires that each term be expressible as a product of powers of the base units: in this case [Mass][Length][Time]⁻².
The problem of equations including logarithms (and indeed all transcendental functions) of dimensioned quantities has long been recognized. Buckingham opined that “… no purely arithmetic operator, except a simple numerical multiplier, can be applied to an operand which is not a dimensionless number, because we cannot assign any definite meaning to the result of such an operation” ([], p. 346). Bridgman was less dogmatic, citing as a counter-example the thermodynamic formula d(ln p)/dT = λ/(RT²), where T is the absolute temperature, p is pressure, and λ and R are other dimensioned quantities ([], p. 75). It is true that the logarithm returns the index to which the base (e.g., e, 10, …) must be raised in order to obtain the argument: for example if p = 200 Pa (the Pa or pascal being the SI unit of pressure) then to what index must the base be raised in order to obtain that value? It is not simply a matter of obtaining 200 from the exponentiation but 200 pascals. Furthermore, the problem would change if we were to switch from SI to CGS, where the pressure is 2000 barye (1 barye being 1 dyne cm⁻²), though the physical reality behind the numbers would be the same.
However, in the current case it is the derivative of the log pressure which is important, and since d(ln p)/dT = (1/p)(dp/dT) it has dimension [Temperature]⁻¹ and the Π-theorem is therefore satisfied. Unfortunately, Shannon’s differential entropy has no such resolution since it is the absolute value of ln f(x) (not merely its derivative) which must have a numeric value. This kind of expression has historically provoked much debate and though there are several shades of opinion we confine ourselves to two competing perspectives:
Molyneux [] maintains that if a mass m is measured in grams then log m should be correctly interpreted as log(numerical value) + log(gram), and “log(gram)” should be regarded as a kind of “additive dimension” (he suggests the notation ⟨gram⟩ for log₁₀(gram), so that ln(gram) = 2.303 ⟨gram⟩).
Matta et al. [] argue that “log(gram)” has no physical meaning; while Molyneux had dismissed this as pragmatically unimportant, they echo the views of Buckingham [], saying that dimensions are “… not carried at all in a logarithmic function”. According to Matta, log m must be interpreted as log(m/(1 gram)) (the dimension of m cancelled out by the unit).
Since most opinions fall more or less into one or other of these camps it will be sufficient to consider a simple dichotomy: we refer to the first of these as “Molyneux” and the second as “Matta”. Under the Molyneux interpretation the differential entropy of a time variable must be expressed

h(X) = −∫ f(x) ln{f(x)·(1 s)} dx + ln(second)   (6)
which has an additive (and physical) dimension of “log(second)” (or ⟨second⟩) in addition to the multiplicative (and non-physical) dimension of nats. Pragmatically this is not important since entropies of variables governed by different probability distributions may still be directly compared (assuming x is always quantified in the same units). However, when we consider weighted entropy, we find that

h^w(X) = −∫ x f(x) ln{f(x)·(1 s)} dx + E(X)·ln(second)   (7)
where E(X) is the expectation of X. Here Molyneux’s approach collapses since the expression has a multiplicative dimension of nat-seconds and an additive dimension of “E(X) log(second)”. Since the latter depends on the specific distribution, the additive dimension loses any independent meaning; comparing weighted entropies of two different variables would be like comparing the heights of two mountains in feet, defining a foot as 12 inches when measuring Everest and 6 when measuring Kilimanjaro.
So, if Molyneux’s interpretation fails, does Matta’s fare any better? Since Matta requires the elimination of dimensional units, we introduce the symbol u to represent one dimensioned unit of x (for example, if x represents time in seconds then u = 1 s). The Shannon differential entropy now becomes h(X) = −∫ f(x) ln{u·f(x)} dx and the corresponding weighted entropy h^w(X) = −∫ x f(x) ln{u·f(x)} dx. At first glance this appears hopeful since the logarithm arguments are now dimensionless, but let us consider a specific example: the exponential distribution f(x) = λe^(−λx), where the mean outcome μ = 1/λ. This yields h(X) = 1 + ln(μ/u), which is (as one would expect) a monotonically increasing function of μ, tending to −∞ as μ → 0.
However, the weighted entropy h^w(X) = μ[2 + ln(μ/u)], which experiences a finite minimum when μ = u·e⁻³. Though dimensionally valid, this creates a dependence on the unit system used. Figure 1 shows the entropy values plotted against the expectation μ for calculation in seconds and minutes, showing the shift in the minimum weighted entropy between the two unit systems. The absurdity of this becomes apparent when one considers two exponentially distributed random variables X_A and X_B with μ_A = 9 s and μ_B = 15 s: Table 1 shows that h^w(X_A) > h^w(X_B) when computed in nat-hours but h^w(X_A) < h^w(X_B) when computed in nat-seconds (a numerical check appears after the table caption below).
Figure 1.
Weighted and unweighted differential entropies for an exponential distribution plotted against expected outcome, calculation performed in seconds and minutes.
Table 1.
Comparison of the weighted and unweighted entropies for two exponential processes. Entropy units are nats (unweighted) and nat-seconds/nat-hours (weighted).
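The ranking reversal summarized in Table 1 can be reproduced with a few lines of Python, assuming the Matta-style weighted differential entropy of an exponential RV takes the reconstructed form h^w = μ[2 + ln(μ/u)], with μ expressed as a bare number in the chosen unit:

```python
import math

def weighted_entropy_exponential(mu):
    """Matta-style weighted differential entropy of an exponential RV with mean mu,
    where mu is a bare number in the chosen unit (reconstructed form)."""
    return mu * (2 + math.log(mu))

for unit, scale in [("nat-seconds", 1.0), ("nat-hours", 1.0 / 3600)]:
    hA = weighted_entropy_exponential(9 * scale)    # X_A: mean 9 s
    hB = weighted_entropy_exponential(15 * scale)   # X_B: mean 15 s
    order = ">" if hA > hB else "<"
    print(f"{unit}: h_w(X_A) = {hA:.5f} {order} h_w(X_B) = {hB:.5f}")
```

The ordering of X_A and X_B flips between the two unit systems even though the underlying physical quantities are unchanged.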
The underlying problem is as follows: since the logarithm’s polarity depends on whether or not u·f(x) exceeds 1, different sections of the PDF may exert opposing influences on the integral (Figure 2). While this is unimportant for h(X), which has no finite minimum, the weighting x is forced towards zero with decreasing x, which ultimately counteracts the negative-going influence of the logarithm. The two factors therefore operate contrarily: zero surprisal appears as entropy minus infinity and zero importance as entropy zero. Two solutions suggest themselves: (i) combine x and −ln f(x) in an expression to which they both always contribute positively (e.g., a weighted sum, which in fact yields a weighted sum of expectation and unweighted entropy) and (ii) retain the product but redefine the logarithm argument such that surprisal is always positive. With this in mind, the following section considers the fundamental relationship between absolute and differential entropies.
Figure 2.
Positive and negative contributions of probability density to the unweighted and weighted entropies for two exponential distributions: (a) shows the PDFs, (b) the unweighted and (c) the weighted entropies. Positive contributions are shown in green and negative contributions in red.
4. Granularity
All physical quantities are ultimately quantified by discrete units; time for example as a number of regularly-occurring events (e.g., quartz oscillations) between two occurrences, which is ultimately limited by the Planck time (≈5.4 × 10⁻⁴⁴ s), though the smallest temporal resolution ever achieved is of the order of zeptoseconds []. Finite granularity therefore exists in all practical measurements: if the smallest incremental step for a given system is Δ then f(x) is really an approximation of a discrete distribution, outcomes 0, Δ, 2Δ, … having probabilities f(0)Δ, f(Δ)Δ, … etc., so

H(X) ≈ −Σ_k f(kΔ)Δ ln{f(kΔ)Δ}
which may be expanded into two terms (in the manner of [])

H(X) ≈ −Σ_k f(kΔ)Δ ln{u·f(kΔ)} − ln(Δ/u) Σ_k f(kΔ)Δ
and if Δ is sufficiently small

H(X) ≈ −∫₀^∞ f(x) ln{u·f(x)} dx + ln(u/Δ)   (8)
where the logarithm argument in the integral is “undimensionalised” (as per Matta et al. []) and ln(u/Δ) is the information (in nats) needed to represent one dimensioned base-unit u at resolution Δ in the chosen measurement system: this provides the correctional “shift” needed when the unit system is changed and thus makes (8) comply exactly with the Π-theorem.
The corresponding weighted entropy may be dealt with in the same manner

H^w(X) ≈ −∫₀^∞ x f(x) ln{u·f(x)} dx + E(X) ln(u/Δ)   (9)
While the second term in (9) corresponds to the enigmatic “dimension” of (7), it now has an interpretation independent of the measurement system and allows weighted entropies from different distributions to be compared. However, a suitable Δ must be chosen; while this need not correspond to the actual measurement resolution, it is necessary (in order for all entropy contributions to be non-negative) that f(x)Δ ≤ 1 across all random variables X₁, X₂, … whose weighted entropies are to be compared. It must therefore not exceed

Δ_max = 1 / max_i max_x f_i(x)   (10)
Similarly, the residual weighted entropy can be shown to be

H^w(X; t) ≈ −∫_t^∞ x (f(x)/S(t)) ln{u·f(x)/S(t)} dx + E(X | X > t) ln(u/Δ)   (11)
where E(X | X > t) is the expectation of X given X > t. The maximum granularity now becomes

Δ_max = min_i [ S_i(t_i) / max_{x ≥ t_i} f_i(x) ]   (12)
where t_i is the t-value pertinent to the random variable X_i and S_i is the corresponding survival function. Equations (9) and (11) also provide a clue as to the lower acceptable limit of the granularity: if Δ were too small then the second terms in these expressions would dominate, making “weighted entropy” merely an overelaborate measure of expectation. Within this window of acceptable values, a compromise “working granularity” must be found. This will be addressed later.
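As a sketch of how these granular quantities might be evaluated in practice, the following Python fragment computes the residual weighted entropy by numerical quadrature, using the forms of (9) and (11) as reconstructed above (in which the two terms combine into a single integral of x·f(x | X > t)·(−ln[Δ·f(x | X > t)])); the function name and the finite integration cut-off are illustrative:

```python
import numpy as np
from scipy.integrate import quad

def granular_weighted_entropy(pdf, delta, t, upper):
    """Residual weighted entropy at working granularity delta (reconstructed Eqs. (9)/(11)):
    the integral of x * f(x|X>t) * (-ln(delta * f(x|X>t))) over x >= t, with the PDF
    renormalized by the survival function S(t). All quantities are bare numbers in one
    consistent unit; 'upper' is a cut-off beyond which the tail is negligible."""
    S_t, _ = quad(pdf, t, upper)                     # survival function S(t)
    f = lambda x: pdf(x) / S_t                       # conditional (residual) PDF
    integrand = lambda x: x * f(x) * -np.log(delta * f(x))
    H_w, _ = quad(integrand, t, upper)
    return H_w

# Example: exponential with rate 1 s^-1, granularity 0.1 s, no threshold (t = 0).
pdf = lambda x: np.exp(-x)
print(granular_weighted_entropy(pdf, delta=0.1, t=0.0, upper=60.0))
# ~4.303, i.e. mu * (2 - ln(lam * delta)) with mu = 1/lam = 1 s
```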
5. Gamma, Exponential and Gaussian Distributions
For the purpose of studying this granular entropy, the following specific probability distributions were chosen:
- Exponential: this is the distribution of time intervals between independent spontaneous events.
- Gamma: this generalizes the Erlang distribution of a sequence of k consecutive identically distributed independent spontaneous events; the generalization allows k to be a non-integer.
- Gaussian (Normal): this represents the aggregate of many independent random variables in accordance with the central limit theorem. It is also the limit of the gamma as k → ∞ and has the largest possible entropy for a given variance [] (see the numerical sketch after this list).
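A brief numerical illustration of these limiting relationships (a sketch using scipy.stats; the value k = 400 is arbitrary):

```python
import numpy as np
from scipy import stats

mu = 1.0                                   # common mean, in seconds
x = np.linspace(0.5, 1.5, 201)             # points around the shared mean

expon   = stats.expon(scale=mu)                              # exponential, mean mu
gamma1  = stats.gamma(a=1.0, scale=mu / 1.0)                 # gamma with k = 1
gamma_k = stats.gamma(a=400.0, scale=mu / 400.0)             # gamma with large k
gauss   = stats.norm(loc=mu, scale=mu / np.sqrt(400.0))      # Gaussian, same mean and variance

print(np.max(np.abs(gamma1.pdf(x) - expon.pdf(x))))    # ~0: gamma with k = 1 is the exponential
print(np.max(np.abs(gamma_k.pdf(x) - gauss.pdf(x))))   # small compared with the peak density:
                                                       # the gamma tends to the Gaussian as k grows
```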
Figure 3 compares examples of the three distributions with the same mean, showing how the exponential and Gaussian are the limiting cases of the gamma distribution for k equal to 1 and infinity respectively. As before, we assume that x represents a time interval (though it could represent other physical quantities).
Figure 3.
Exponential, gamma and Gaussian distributions with the same mean. The gamma for k = 1 is identical to the exponential, while the Gaussian is the limiting case of the gamma as k → ∞.
5.1. The Exponential Distribution
The exponential distribution models spontaneous events such as the decay of atoms in a radioactive isotope or “soft” electronic component failures. The PDF is f(x) = λe^(−λx) (x ≥ 0), where λ is the “rate parameter”: it has the property that 1/λ is both the expectation and the standard deviation. Applying (8) and (9) we find that the regular and weighted entropies are:

H(X) ≈ 1 − ln(λΔ)   (13)

H^w(X) ≈ (1/λ)[2 − ln(λΔ)]   (14)
Residual weighted entropy is worked out as an example in [] (p. 9): “granularized”, it can be written

H^w(X; t) ≈ (1/λ + t)[1 − ln(λΔ)] + 1/λ   (15)
which is clearly a linear function of t with gradient 1 − ln(λΔ). In the original formulation (with u in place of Δ) this was problematic since the slope could be either positive or negative, but now, by keeping λΔ ≤ 1, we ensure that the weighted entropy never decreases with t and, for fixed t, remains constant or decreases as Δ increases.
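As a check on the reconstructed forms (13) and (14), the sketch below compares them with the exact absolute entropies of the Δ-quantized exponential distribution, whose cell probabilities form a geometric series:

```python
import numpy as np

lam, delta = 1.0, 0.01                         # rate (s^-1) and working granularity (s)

# Exact Delta-quantized exponential: cell k has probability
# p_k = exp(-lam*k*delta) * (1 - exp(-lam*delta)), i.e. a geometric distribution.
k = np.arange(0, 5000)                         # tail beyond k*delta = 50 s is negligible
p = np.exp(-lam * k * delta) * (1 - np.exp(-lam * delta))

H_exact  = -np.sum(p * np.log(p))              # discrete (absolute) entropy
Hw_exact = -np.sum(k * delta * p * np.log(p))  # discrete weighted entropy

H_approx  = 1 - np.log(lam * delta)                  # reconstructed Eq. (13)
Hw_approx = (1 / lam) * (2 - np.log(lam * delta))    # reconstructed Eq. (14)

print(H_exact, H_approx)     # ~5.605 in both cases
print(Hw_exact, Hw_approx)   # ~6.58 vs ~6.61: closer as delta shrinks
```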
5.2. The Gamma Distribution
The PDF of the gamma distribution is:

f(x) = λ^k x^(k−1) e^(−λx) / Γ(k)   (x ≥ 0)   (16)
where Γ(k) = ∫₀^∞ y^(k−1) e^(−y) dy (the gamma function). Since the variance σ² = k/λ² and the expectation μ = k/λ, we can obtain a gamma distribution with any desired expectation and standard deviation by setting k = μ²/σ² and λ = μ/σ². Substituting (16) into (8) yields

H(X) ≈ k + ln Γ(k) + (1 − k)ψ(k) − ln(λΔ)   (17)
where ψ(k) = Γ′(k)/Γ(k) (the digamma function).
Since Γ(1) = 1, (17) simplifies to (13) when k = 1. Similarly, the weighted entropy can be written

H^w(X) ≈ (k/λ)[k + 1 + ln Γ(k) + (1 − k)ψ(k + 1) − ln(λΔ)]   (18)
Using the recursive property ψ(k + 1) = ψ(k) + 1/k [] and recalling that μ = k/λ, we uncover a very simple relationship between the weighted and unweighted entropies:

H^w(X) ≈ μ[H(X) + 1/k]   (19)
(Note that (18) and (19) are consistent with (13) and (14) for k = 1.) In a similar manner, the residual weighted entropy can be shown to be
where Γ(k, z) = ∫_z^∞ y^(k−1) e^(−y) dy (the upper incomplete gamma function) and the remaining integral is not a well-recognized function; nevertheless it converges for all k > 0, may be defined as 0 in the limiting case, and can be computed to any required degree of accuracy using Simpson’s rule. Also note that when k = 1, the term containing it vanishes and (20) simplifies to (15).
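The relationship (19), as reconstructed, can be verified against direct numerical integration of the weighted granular entropy, here for a gamma distribution with k = 5 and mean 1 s:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln, digamma

k, lam, delta = 5.0, 5.0, 0.05            # mean k/lam = 1 s, granularity 0.05 s

# Closed forms as reconstructed in Eqs. (17) and (19).
H = k + gammaln(k) + (1 - k) * digamma(k) - np.log(lam * delta)
Hw = (k / lam) * (H + 1 / k)              # Eq. (19): H_w = mu * (H + 1/k)

# Direct numerical check of the weighted form: -integral of x f(x) ln(delta f(x)).
log_pdf = lambda x: k * np.log(lam) + (k - 1) * np.log(x) - lam * x - gammaln(k)
integrand = lambda x: -x * np.exp(log_pdf(x)) * (np.log(delta) + log_pdf(x))
Hw_num, _ = quad(integrand, 0.0, 30.0)    # tail beyond 30 s is negligible
print(Hw, Hw_num)                         # the two values agree closely (~3.74)
```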
5.3. The Gaussian (or Normal) Distribution
The PDF of the Gaussian distribution is given by

f(x) = (1/(σ√(2π))) exp{−(x − μ)²/(2σ²)}   (21)
where μ is the expectation and σ the standard deviation. While the distribution extends to infinity in both directions (unlike the exponential and gamma which are defined only for x ≥ 0) we have been considering temporal separation which can only be positive; for this reason we impose an additional restriction σ ≤ μ/3 such that P(X < 0) never exceeds 0.0013, which may, for practical purposes, be neglected. The expression for the Shannon differential entropy has already been introduced in Section 2; “granularized”, the expression may be written

H(X) ≈ ln{σ√(2πe)/Δ}   (22)
For the weighted entropy we substitute (21) into (9) and simplify to obtain

H^w(X) ≈ μ ln{σ√(2πe)/Δ} = μ H(X)   (23)
So, the weighted entropy is simply the unweighted entropy multiplied by the expectation. With the exception of the μ/k term this is almost the same as (19), and as k becomes large the two expressions converge. This is to be expected since the central limit theorem [] requires that the sum of many independent random variables behaves as a Gaussian: since the gamma distribution represents the convolution of k exponentials (each with expectation μ/k), when k is large (and thus μ/k small) the gamma and Gaussian acquire near-identical properties for equal mean and variance.
To obtain an expression for the residual weighted entropy of the Gaussian distribution we substitute (21) into (11) and simplify to obtain:
where the auxiliary quantities are expressed in terms of erfc(z) = (2/√π) ∫_z^∞ e^(−y²) dy (the complementary error function).
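A numerical check of the Gaussian results (22) and (23) as reconstructed, confirming that the weighted granular entropy is the unweighted value scaled by the mean:

```python
import numpy as np
from scipy.integrate import quad

mu, sigma, delta = 1.0, 0.3, 0.05         # seconds; sigma <= mu/3 keeps P(X < 0) negligible
pdf = lambda x: np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

lo, hi = mu - 8 * sigma, mu + 8 * sigma   # tails beyond 8 sigma are negligible
H_num, _  = quad(lambda x: -pdf(x) * np.log(delta * pdf(x)), lo, hi)
Hw_num, _ = quad(lambda x: -x * pdf(x) * np.log(delta * pdf(x)), lo, hi)

H_closed = np.log(sigma * np.sqrt(2 * np.pi * np.e) / delta)   # reconstructed Eq. (22)
print(H_num, H_closed)                    # ~3.21 nats each
print(Hw_num, mu * H_num)                 # weighted ~ mu * unweighted (Eq. (23))
```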
5.4. Numerical Calculations
Figure 4 shows the weighted entropies computed for exponential, Gaussian and gamma distributions for a mean outcome of 1 s and two levels of granularity (0.05 and 0.1 s) across a range of standard deviations less than the mean. We make the following observations:
Figure 4.
Gamma, Gaussian and exponential entropies for two different granularities.
- Both the weighted and unweighted entropies for σ = 0 should in principle be zero (since f(x) then becomes a Dirac delta function located at the mean) but would actually tend to minus infinity as σ → 0. Our “granularized” entropy definitions (8) and (9) cease to be meaningful in this region since they approximate absolute entropies, which must be non-negative.
- Meaningful Gaussian curves cannot be computed for σ > μ/3 since this would significantly violate the assumption that all x ≥ 0 (see Section 5.3). Thus, the gamma “takes over” from the Gaussian across the range μ/3 < σ < μ, providing a kind of “bridge” to the exponential case on the far right.
- Although the unweighted entropy ceases to rise significantly at the larger standard deviations, the weighted entropy increases almost linearly up to σ = μ (see the numerical sketch after this list). This is because the expanding upper tail of the distribution, though not significantly increasing the surprisal, nevertheless causes larger outcomes to contribute more significantly.
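These trends can be reproduced from the reconstructed gamma closed forms (17) and (19) by sweeping the standard deviation at a fixed mean of 1 s; the unweighted entropy flattens while the weighted entropy continues to climb towards σ = μ (the granularity of 0.1 s is one of the two used in Figure 4):

```python
import numpy as np
from scipy.special import gammaln, digamma

mu, delta = 1.0, 0.1                       # mean 1 s, granularity 0.1 s
for sigma in (0.25, 0.5, 0.75, 1.0):       # sigma = mu gives k = 1 (the exponential case)
    k = (mu / sigma) ** 2                  # gamma shape for this mean and std. deviation
    lam = k / mu                           # gamma rate
    H = k + gammaln(k) + (1 - k) * digamma(k) - np.log(lam * delta)   # reconstructed Eq. (17)
    Hw = mu * (H + 1 / k)                                             # reconstructed Eq. (19)
    print(f"sigma = {sigma:.2f} s:  H = {H:.3f} nats,  H_w = {Hw:.3f} nat-seconds")
```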
6. Choosing a Working Granularity
In Section 4 we postulated the existence of a “window” from which an acceptable working value of Δ must be chosen. Though we did not specify its limits, we noted that if Δ were too small it would eliminate the nuance of “entropy” from H^w and make it merely an overelaborate measure of expectation. For this reason, we suggest that Δ be as large as possible, though not so large as to exceed the reciprocal of the maximum probability density and thus introduce negative surprisal. Here we test this suggestion and explore its implications for the distributions previously described.
6.1. The Upper Limit
The exponential distribution has the property that the maximum of its residual PDF f(x)/S(t) is always constant and equal to the rate parameter λ. Thus if all distributions to be compared are exponential then Δ_max = 1/λ_max, where λ_max is the largest rate parameter, independent of t. However, this property does not apply to the more general gamma distribution, whose modal value x = (k − 1)/λ when substituted into (16) and (12) (noting that this applies only while t remains below the mode) gives
Similarly for the Gaussian distribution the overall maximum probability density occurs when x = μ, so for thresholds t ≤ μ the maximum value for the range must be
Calculations were performed on a set of four distributions, each with an expected value of 1.0 s:
- Gaussian with standard deviation 0.3 s
- Gamma with k = 5 (standard deviation 0.447 s)
- Gamma with k = 15 (standard deviation 0.258 s)
- Exponential with λ = 1.0 s⁻¹.
Figure 5 shows the residual PDFs for the first of these with the maximum probability density overlaid. Figure 6 compares this with the other three distributions: the maximum for the entire set (for t ≤ 1.5 s) is 6.938 s⁻¹, so from (12) the maximum allowable granularity for comparing their entropies is Δ_max = 1/6.938 = 0.144 s (a numerical sketch reproducing this bound follows the figure captions below).
Figure 5.
Residual probability density functions for the Gaussian distribution; the maximum probability density is overlaid as a function of t.
Figure 6.
Comparison of maximum residual probability densities for four distributions with the same mean (1.0 s). The observed boundary maximum of the probability density for this range is 6.938 s⁻¹.
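A sketch reproducing the bound of Figure 6 under the reconstructed criterion (12): for each of the four distributions the residual density f(x)/S(t) is maximized over x ≥ t and over thresholds 0 ≤ t ≤ 1.5 s, and the reciprocal of the overall maximum gives a working-granularity limit of roughly 0.144 s (the grid resolutions are arbitrary):

```python
import numpy as np
from scipy import stats

# Four distributions with mean 1.0 s (Section 6.1); thresholds t swept over 0..1.5 s.
dists = {
    "Gaussian (sigma = 0.3 s)": stats.norm(loc=1.0, scale=0.3),
    "gamma (k = 5)":            stats.gamma(a=5.0, scale=1.0 / 5.0),
    "gamma (k = 15)":           stats.gamma(a=15.0, scale=1.0 / 15.0),
    "exponential":              stats.expon(scale=1.0),
}

x = np.linspace(0.0, 6.0, 6001)            # grid for the inner maximization over x
overall = 0.0
for name, d in dists.items():
    fx = d.pdf(x)                          # precompute f(x) once per distribution
    peak = max(np.max(fx[x >= t]) / d.sf(t)        # max over x >= t of f(x)/S(t)
               for t in np.linspace(0.0, 1.5, 151))
    print(f"{name}: largest residual density = {peak:.3f} s^-1")
    overall = max(overall, peak)

print(f"Delta_max = 1/{overall:.3f} = {1 / overall:.3f} s")   # ~0.144 s
```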
While this ensures positive surprisal throughout our range of interest, the granularity may nevertheless be subject to other constraints. To investigate further we introduce an alternative calculation for weighted entropy to which (9) and (11) may be compared. Consider a “histogram” of n cells, each Δ wide, the i-th cell having constant probability density and enclosing probability p_i (x_i being the lower cell boundary and p_i the mass of the PDF between x_i and x_i + Δ). Figure 7 shows the histograms for the Gaussian distribution with three different granularities (n taking the minimum value required to cover the range of interest). Clearly, as Δ increases, the discretized distribution bears less and less resemblance to the corresponding continuous distribution. In each case the weighted entropy can be approximated

H^w(X) ≈ −Σ_{i=1}^{n} x̄_i p_i ln p_i   (27)
where x̄_i is the horizontal position of the centroid of the PDF enclosed by cell i and p_i = ∫_{x_i}^{x_i+Δ} f(x) dx.
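A sketch of the histogram construction, assuming the reconstructed form of (27); it is compared with the integral-based weighted entropy (the (9)/(23) route) for the Gaussian of Figure 7, and the two agree for small Δ but drift apart as the cells widen (cell range and grid sizes are illustrative):

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 0.3
d = stats.norm(loc=mu, scale=sigma)

def histogram_weighted_entropy(delta, lo=0.0, hi=2.0):
    """Reconstructed Eq. (27): -sum over cells of xbar_i * p_i * ln(p_i), where p_i is
    the probability mass of cell i and xbar_i the centroid of the PDF within it."""
    edges = np.arange(lo, hi + delta, delta)
    H = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        p = d.cdf(b) - d.cdf(a)                     # probability mass of the cell
        if p > 0:
            xs = np.linspace(a, b, 201)
            xbar = np.sum(xs * d.pdf(xs)) / np.sum(d.pdf(xs))   # PDF centroid of the cell
            H -= xbar * p * np.log(p)
    return H

def integral_weighted_entropy(delta):
    """Closed form of the reconstructed Eq. (23): mu * ln(sigma * sqrt(2*pi*e) / delta)."""
    return mu * np.log(sigma * np.sqrt(2 * np.pi * np.e) / delta)

for delta in (0.05, 0.25, 0.752):                   # 0.752 s is the three-cell case
    print(delta, histogram_weighted_entropy(delta), integral_weighted_entropy(delta))
```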
Figure 7.
Comparison of Gaussian PDF and discrete “histogram” approximation for three granularities.
Figure 8 compares the results of (27) with those of (24) across a range of granularities for the Gaussian of Figure 7, showing that the values are almost identical for small Δ but diverge as the granularity increases. The “upper limit” Δ = 0.752 s (represented by the three-cell distribution in Figure 7), shown by the broken line, appears to represent the lower boundary for large errors, though noticeable discrepancies do exist for all Δ greater than half this value.
Figure 8.
Weighted entropies for a Gaussian with μ = 1.0 s, σ = 0.3 s, computed using (23) and (27). The broken line indicates Δ = 0.752 s.
We therefore define the upper limit of granularity as the maximum Δ for which the two weighted entropy approximations disagree by no more than a fraction ε of their combined average, i.e.,

Δ_max = max{ Δ : |H^w_int − H^w_hist| ≤ (ε/2)(H^w_int + H^w_hist) }   (28)

where H^w_int denotes the integral form (9) or (11) and H^w_hist the histogram form (27),
which may be computed iteratively for any given distribution and ε-value. We choose as our benchmark ε = 0.321 (i.e., 32.1% maximum error), which corresponds to the previously computed Δ = 0.752 s, and plot the upper limits of Δ for a range of t values (see Figure 9).
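An iterative (bisection) sketch of the criterion (28) as reconstructed, applied to the Gaussian of Figure 8. Because the histogram estimate changes discontinuously as the number of cells changes, the discrepancy is not perfectly monotonic in Δ, so the result is an approximate bound and, with the estimator details assumed here, need not reproduce the 0.752 s value exactly:

```python
import numpy as np
from scipy import stats

mu, sigma, eps = 1.0, 0.3, 0.321                    # benchmark error fraction 32.1%
d = stats.norm(loc=mu, scale=sigma)

def weighted_entropy_pair(delta, lo=0.0, hi=2.0):
    """Histogram estimate (reconstructed Eq. (27)) and closed-form estimate (Eq. (23))."""
    edges = np.arange(lo, hi + delta, delta)
    H_hist = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        p = d.cdf(b) - d.cdf(a)
        if p > 0:
            xs = np.linspace(a, b, 201)
            xbar = np.sum(xs * d.pdf(xs)) / np.sum(d.pdf(xs))
            H_hist -= xbar * p * np.log(p)
    H_int = mu * np.log(sigma * np.sqrt(2 * np.pi * np.e) / delta)
    return H_hist, H_int

def discrepancy(delta):
    a, b = weighted_entropy_pair(delta)
    return abs(a - b) / (0.5 * (a + b))             # fraction of the combined average

lo_d, hi_d = 0.05, 1.5        # bracket: small delta satisfies (28), large delta does not
for _ in range(40):
    mid = 0.5 * (lo_d + hi_d)
    lo_d, hi_d = (mid, hi_d) if discrepancy(mid) <= eps else (lo_d, mid)
print(f"Delta_max ~ {lo_d:.3f} s for eps = {eps}")
```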
Figure 9.
Upper granularity limits for 32.1% error between the “theoretical” (12) and “practical” (28) residual weighted entropies for a Gaussian (μ = 1.0 s, σ = 0.1, 0.2, 0.3 s).
The granularity computed from (12) is mostly lower (and never significantly higher) than the value from (28), and the former could be regarded as a cautious “engineering” lower limit: for the range of distributions compared in Figure 6 this is 0.144 s, and Figure 10 shows the weighted entropy calculated using this value across the same range of t. We make the following observations:
Figure 10.
Comparison of residual weighted entropies for the distributions of Figure 6 (expectations again set to 1.0 s).
- The residual weighted entropy for the exponential distribution has the strongest dependence on t; since the distribution shape (and variance) does not depend on t, the increase in weighted entropy is caused entirely by the increased mean weighting.
- For the gamma distribution with k = 5, the residual variance decreases with increasing t, the distribution becoming progressively more concentrated around its mean. This causes the entropy to fall, counteracting somewhat the increased weightings. Thus, the rise in weighted entropy with increasing t is less pronounced than for the exponential distribution.
- For the gamma distribution with k = 15, these competing effects almost cancel each other, the decreased variance compensating almost exactly for the increased average weighting.
- The Gaussian results are similar, the weighted entropy now showing a pronounced decrease with increasing t. Re-plotting the Gaussian graph for smaller Δ-values (Figure 11) shows that a critical granularity exists (in this case Δ = 0.0465 s) where the residual weighted entropy remains almost constant as t is varied.
Figure 11. The effects of mean and variance on the residual weighted entropy of a Gaussian RV (μ = 1 s, σ = 0.3 s) for different values of Δ (noted on the left for each curve), which clearly determines the dominating influence. There exists a critical value (Δ ≈ 0.0465 s) where the residual weighted entropy barely depends upon t.
6.2. The Lower Limit
Having established an upper limit for granularity, we observe the effect of using lower values than this. Figure 11 shows results obtained from the Gaussian PDF, indicating that with different granularities the weighted entropy can both rise and fall with increasing t, a situation not unlike that which arose from applying Molyneux’s dimensionality approach to differential weighted entropy (see Section 3): the largest weighted entropy amongst a group of distributions now depends not on measurement units but on the measurement granularity. This is to be expected since, as Δ decreases, the weighted entropy becomes progressively more “expectation-like” due to the increased influence of the second term in (11). However, we must ask at what point it ceases to be a meaningful “entropy” and becomes merely a measure of expectation. What additional condition might be imposed to prevent this from happening?
One possibility would be to constrain the granularity such that two scenarios, one with higher and the other with lower entropy, should never be allowed to switch over when the granularity is changed. However, there remains the possibility that the acceptable ranges for different distributions to be compared do not overlap, and some may have to be compared with others based solely on an expectation-like weighted entropy.
7. Conclusions
We have identified and attempted to address the dimensionality problem present in Di Crescenzo and Longobardi’s differential residual weighted entropy formulation []—namely the opposing influences of the positive and negative values of ln f(x), which (since f(x) is a dimensioned quantity) depend on the unit system. This does not affect Shannon’s differential entropy [] so long as consistent units are employed, but it does become important when the outcome x appears as an all-positive weighting. We circumvent this problem by applying a “working granularity” Δ to convert differential entropy into a “quasi-absolute” quantity, choosing Δ to be the largest value consistent with f(x)Δ ≤ 1 in all distributions of interest. We demonstrate this formulation using the residual exponential, gamma, and Gaussian distributions. There are many other issues to be investigated: firstly, we have assumed throughout a single random variable whose sample values are uncorrelated. The extension of this idea to the strongly correlated Tsallis [,] entropy definition remains to be explored. Furthermore, the application to joint entropies in multivariate distributions has yet to be investigated.
Author Contributions
Conceptualization, M.T. and G.H.; methodology, M.T. and G.H.; software, M.T.; validation, M.T.; formal analysis, M.T. and G.H.; investigation, M.T.; writing—original draft preparation, M.T.; writing—review and editing, M.T. and G.H.
Funding
This research received no external funding.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
- Hentschke, R. Thermodynamics for Physicists, Chemists and Materials Scientists; Springer: Cham, Switzerland, 2014. [Google Scholar]
- Berger, A.L.; Della Pietra, S.A.; Della Pietra, V.J. A Maximum Entropy Approach to Natural Language Processing. Comput. Linguist. 1996, 22, 1–36. [Google Scholar]
- Jaynes, E.T. How Does the Brain Do Plausible Reasoning? In Maximum Entropy and Bayesian Methods in Science and Engineering; Vol. 1 Foundations; Erikson, G.J., Smith, C.R., Eds.; Kluwer Academic: Norwell, MA, USA, 1988; pp. 1–24. [Google Scholar]
- MacKay, D.J.C. Information Theory, Inference, and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003; Chapter 28; pp. 343–356. [Google Scholar]
- Burch, S.F.; Gull, S.F.; Skilling, J. Image Restoration by a Powerful Maximum Entropy Method. Comput. Vis. Graph. Image Process. 1983, 23, 111–124. [Google Scholar] [CrossRef]
- Gull, S.F.; Skilling, J. Maximum Entropy Method in Image Processing. IEE Proc. F 1984, 131, 646–659. [Google Scholar] [CrossRef]
- Rosenfeld, R. A Maximum Entropy Approach to Adaptive Statistical Language Modelling. Comput. Speech Lang. 1996, 10, 187–228. [Google Scholar] [CrossRef]
- Tsallis, C. Possible Generalization of Boltzmann-Gibbs Statistics. J. Stat. Phys. 1988, 52, 479–487. [Google Scholar] [CrossRef]
- Cartwright, J. Roll Over, Boltzmann. Phys. World 2014, 27, 31–35. [Google Scholar] [CrossRef]
- Guiaşu, S. Weighted Entropy. Rep. Math. Phys. 1971, 2, 165–179. [Google Scholar] [CrossRef]
- Taneja, H.C.; Tuteja, R.K. Characterization of a Quantitative-Qualitative Measure of Inaccuracy. Kybernetika 1986, 22, 393–402. [Google Scholar]
- Di Crescenzo, A.; Longobardi, M. On Weighted Residual and Past Entropies. Sci. Math. Jpn. 2006, 64, 255–266. Available online: https://arxiv.org/pdf/math/0703489.pdf (accessed on 20 August 2019).
- Ebrahimi, N. How to Measure Uncertainty in the Residual Life Time Distribution. Ind. J. Stat. Ser. A 1996, 58, 48–56. [Google Scholar]
- Pirmoradian, M.; Adigun, O.; Politis, C. Entropy-Based Opportunistic Spectrum Access for Cognitive Radio Networks. Trans. Emerg. Telecommun. Technol. 2014. [Google Scholar] [CrossRef]
- Tsui, P.-H. Ultrasound Detection of Scatterer Concentration by Weighted Entropy. Entropy 2015, 17. [Google Scholar] [CrossRef]
- Matta, C.F.; Massa, L.; Gubskaya, A.V.; Knoll, E. Can One Take the Logarithm or the Sine of a Dimensioned Unit or Quantity? Dimensional Analysis Involving Transcendental Functions. J. Chem. Ed. 2011, 88, 67–70. [Google Scholar] [CrossRef]
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 1990; pp. 228–229. [Google Scholar]
- Buckingham, E. On Physically Similar Systems. Phys. Rev. 1914, 4, 345–375. [Google Scholar] [CrossRef]
- Bridgman, P.W. Dimensional Analysis; Yale University Press: London, UK, 1922. [Google Scholar]
- Molyneux, P. The Dimensions of Logarithmic Quantities. J. Chem. Educ. 1991, 68, 467–469. [Google Scholar]
- Boyle, R. Smallest Sliver of Time Yet Measured Sees Electrons Fleeing Atom. New Scientist. 11 November 2016. Available online: https://www.newscientist.com/article/2112537-smallest-sliver-of-time-yet-measured-sees-electrons-fleeing-atom/ (accessed on 30 November 2017).
- Sebah, P.; Gourdon, X. Introduction to the Gamma Function. Available online: http://pwhs.ph/wp-content/uploads/2015/05/gammaFunction.pdf (accessed on 17 March 2017).
- Schenkelberg, F. Central Limit Theorem. Available online: https://accendoreliability.com/central-limit-theorem/ (accessed on 21 May 2017).
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).