
# The Entropy of a Discrete Real Variable

by
Scott Funkhouser
SPAWAR Systems Center Atlantic, Joint Base Charleston, North Charleston, SC 29406, USA
Entropy 2012, 14(8), 1522-1538; https://doi.org/10.3390/e14081522
Submission received: 12 June 2012 / Revised: 3 August 2012 / Accepted: 6 August 2012 / Published: 17 August 2012

## Abstract

The discrete Shannon entropy H was formulated only to measure indeterminacy effected through a set of probabilities, but the indeterminacy in a real-valued discrete variable depends on both the allowed outcomes $x$ and the corresponding probabilities $p$. A fundamental measure that is sensitive to both $x$ and $p$ is derived here from the total differential entropy of a continuous real variable and its conjugate in the discrete limit, where the conjugate is universally eliminated. The asymptotic differential entropy recovers H plus the new measure, named Ξ, which provides a novel probe of intrinsic organization in sequences of real numbers.

## 1. Introduction

Let Y be a discrete variable with $n > 1$ generic outcomes $y = \{y_1, \cdots, y_n\}$ and corresponding probabilities $p = \{p_1, \cdots, p_n\}$ such that
$$\sum_{j=1}^{n} p_j = 1 \tag{1}$$
Expressed in terms of the natural logarithm, the Shannon entropy attributed to Y in connection with $p$ is [1]
$$H(p) = -\sum_{j=1}^{n} p_j \log(p_j) \tag{2}$$
For convenience let
$$H_2(p) \equiv \frac{H(p)}{\log(2)} \tag{3}$$
represent the original form defined in terms of the base-2 logarithm. $H = H(p)$ is bounded according to
$$0 \le H \le \log(n) \tag{4}$$
The maximum is attained only when all outcomes are equally probable. $H(p)$ vanishes in the limit as some $p_j \in p$ approaches unity, which represents a completely predetermined outcome.
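A minimal Python sketch of Equations (2) through (4). The function name `shannon_entropy` and the three sample distributions are illustrative choices, not taken from the article.

```python
import math

def shannon_entropy(p, base=math.e):
    """Discrete Shannon entropy of Equation (2), optionally in another base.

    Outcomes with p_j == 0 contribute nothing (0 log 0 is taken as 0);
    base=2 returns H_2(p) = H(p) / log(2), as in Equation (3).
    """
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1, as in Equation (1)")
    return sum(-pj * math.log(pj) for pj in p if pj > 0.0) / math.log(base)

n = 8
uniform = [1.0 / n] * n              # all outcomes equally probable
skewed = [0.93] + [0.01] * (n - 1)   # one outcome is nearly certain
certain = [1.0] + [0.0] * (n - 1)    # a completely predetermined outcome

print(shannon_entropy(uniform), math.log(n))   # the upper bound of Equation (4)
print(shannon_entropy(uniform, base=2))        # about 3 bits when n = 8
print(shannon_entropy(skewed))                 # far below log(n)
print(shannon_entropy(certain))                # the lower bound, zero
```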
Because it emerges naturally from the basic principles of encoded compression, the discrete Shannon entropy is widely recognized as an informatic measure. Specifically, consider a message $m$ consisting of $N \gg 1$ symbols from a generalized alphabet $s = \{s_1, \cdots, s_a\}$. Let $N_j$ represent the total number of occurrences of a particular symbol $s_j \in s$ within $m$. In the limit as N becomes infinitely large, suppose that each $N_j/N$ approaches a certain constant $w_j$ that is characteristic of the "language" in which $m$ is composed. For sufficiently large N we therefore have $N_j \simeq w_j N$. The total number I of likely messages of length N is consequently fixed by $w = \{w_1, \cdots, w_a\}$ according to
$$I \simeq \prod_{j=1}^{a} w_j^{-N_j} \tag{5}$$
The right side of Equation (5) is simply the inverse of the probability of finding any one particular $m$ among all messages of length N subject to $w$. The inventory of likely messages could be encoded as I different integers. The number β of bits required to register the encoded inventory is given by $\beta = \lceil \log_2(I) \rceil$, and we therefore have the average compression rate
$$\frac{\beta}{N} \simeq H_2(w) \tag{6}$$
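Equation (5) retains only the exponential growth of the message count. As a hedged sketch (the frequencies `w` are an arbitrary illustrative choice), the snippet compares $\log_2$ of Equation (5) with $\log_2$ of the exact number of messages containing exactly $N_j$ copies of each symbol, the multinomial coefficient $N!/\prod_j N_j!$. Both rates per symbol approach $H_2(w)$, and the exact one approaches it from below.

```python
import math

def log2_messages_estimate(counts, w):
    """log2 of Equation (5): I ≈ prod_j w_j ** (-N_j)."""
    return -sum(nj * math.log2(wj) for nj, wj in zip(counts, w))

def log2_messages_exact(counts):
    """log2 of the multinomial coefficient N! / (N_1! ... N_a!), the exact number of
    messages of length N that contain each symbol s_j exactly N_j times."""
    n_total = sum(counts)
    ln_count = math.lgamma(n_total + 1) - sum(math.lgamma(nj + 1) for nj in counts)
    return ln_count / math.log(2)

w = [0.5, 0.25, 0.125, 0.125]                  # illustrative symbol frequencies
h2 = -sum(wj * math.log2(wj) for wj in w)      # H_2(w) = 1.75 bits per symbol

for N in (80, 800, 8000, 80000):
    counts = [round(wj * N) for wj in w]       # N_j = w_j N (exact for these N)
    print(N, log2_messages_estimate(counts, w) / N, log2_messages_exact(counts) / N, h2)
```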
Notwithstanding its significance, Equation (6) represents only a particular example of the expansive utility of the discrete Shannon entropy. Interpreted most broadly, and in accordance with the purpose for which it was derived, $H ( p )$ measures the indeterminacy in a random selection from among n different options with corresponding probabilities $p$ [1]. It is therefore appropriate to attribute a quantity of entropy $H ( p )$ to a discrete variable with probabilities $p$.
Consider a variable Y whose $n \gg 1$ allowed outcomes $y$ are governed by some $p$ such that $H(p)$ is near $\log(n)$. The nearly maximal entropy implies that Y exhibits no significant statistical preference toward any particular outcome or group of outcomes among $y$. Consequently an ideal observer who studied the behavior of Y could develop a successful model only to predict the most trivial details about future outcomes. Stated alternatively, the degree to which Y behaves deterministically is minimal. In contrast, suppose that $H(p)$ is much smaller than $\log(n)$. The nearly minimal entropy implies a strong statistical preference for some comparatively small number of $y_j \in y$. As such, Y would exhibit an appreciable degree of deterministic behavior, and a successful, non-trivial predictive model could therefore be developed from observations of Y.
It is important to emphasize that H is a rigorous measure of indeterminacy in a variable only insofar as the indeterminacy depends on $p$ alone. Let X be a discrete variable whose n allowed outcomes $x = \{x_1, \cdots, x_n\}$ are D-dimensional real vectors, i.e., $x_j \in \Re^D$ for $j = 1, \cdots, n$. The elements of every one-dimensional $x$ are implicitly listed in ascending order throughout the following. Analogously to the real nature of $p$, the real nature of $x$ endows the outcomes of X with definite spatial attributes in which non-trivial deterministic behaviors could manifest. Consequently, as illustrated in the following two examples, H is not generally sufficient as a measure of indeterminacy in real-valued discrete variables.
Consider two variables $X_{ee}$ and $X_{er}$ whose outcomes are confined to the real axis. Let the allowed outcomes of $X_{ee}$ be equally probable and equally spaced, hence the subscript "ee", and let the allowed outcomes of $X_{er}$ be equally probable but randomly spaced, hence the subscript "er". More formally, the allowed outcomes of $X_{ee}$ are of the form $x_e$, which is defined for $D = 1$ to be a set of n equally spaced points on the real axis. Let γ represent the arbitrary spacing in a given $x_e$. For convenience we also define $p_e \equiv \{1/n, \cdots, 1/n\}$ for a given n to represent a set of equal probabilities. The allowed outcomes of $X_{er}$ are of the form $x_r$, which is defined for $D = 1$ to be indistinguishable from a set of n points distributed randomly over some simple finite segment $\mathbb{X}_1$ of the real axis. The intrinsic spatial structure of $x_e$ effects a commensurate degree of deterministic behavior in $X_{ee}$. For instance, the difference between successive outcomes of $X_{ee}$ is always an integer multiple of γ, and is therefore significantly more predictable than the difference between successive outcomes of $X_{er}$. Because $X_{ee}$ and $X_{er}$ differ only in the spatial arrangement of their allowed outcomes, $X_{ee}$ would behave more deterministically. By definition a smaller quantity of entropy should therefore be attributed to $X_{ee}$. The discrete Shannon entropy attributed to both variables, however, would be identically maximal.
Consider next a complementary scenario in which the allowed outcomes are identically valued but the respective probabilities differ. Let $X_{re}$ and $X_{ae}$ be two variables with $n \gg 1$ one-dimensional allowed outcomes of the form $x_e$. The set of probabilities governing $X_{re}$ is of the form $p_r$, which is defined generally to be indistinguishable from n randomly selected positive real numbers, subsequently normalized to unity and listed in random order. The spatial assignment of $p_r$ through $x_e$ would be globally isotropic but locally irregular, despite the intrinsic structure of $x_e$. The outcomes of $X_{ae}$ are matched to a set of probabilities of the form $p_a$, which is defined generally to be indistinguishable from n randomly selected positive real numbers, subsequently normalized to unity and listed in ascending order. The ordered arrangement of $p_a$ effects a non-trivial deterministic bias favoring progressively larger outcomes in $X_{ae}$. Note that the specific form of $p_a$ in this example could be constructed simply by sorting the elements of $p_r$, in which case $X_{ae}$ would behave more deterministically but $H(p_r)$ and $H(p_a)$ would be identical.
The problems raised in the previous two paragraphs do not imply any flaw in H. The discrete Shannon entropy was formulated only to measure indeterminacy effected by a given $p$ in a generic discrete variable. The net indeterminacy in a real-valued discrete variable, however, depends on the respective intrinsic characteristics of $x$ and $p$, and on the manner in which $x$ and $p$ are matched. A special measure is therefore required for discrete real variables. Note that Equation (6) is insensitive to the nature of $s$, and the present considerations are therefore irrelevant in the context of the coding theorems. To be precise, the role of $H_2$ in Equation (6) is not truly that of a measure of indeterminacy.
Although it included no special provisions for discrete real variables, Shannon's seminal paper introduced a separate formulation for the entropy of a continuous real variable. Let $\mathcal{X}$ be a probabilistically determined variable whose allowed outcomes span a continuum of real vectors within some region $\mathbb{X} \subseteq \Re^D$. Let the probability density $\rho(x)$ be defined such that $\rho(x)\,dx$ gives the probability for an outcome of $\mathcal{X}$ to lie within an infinitesimal element $dx$ centered on x, where $x \in \Re^D$ is an independent variable and $dx = dx^{(1)} \cdots dx^{(D)}$ implicitly represents the product of the differentials of each vector component of $x = (x^{(1)}, \cdots, x^{(D)})$. We therefore require $\rho = \rho(x)$ to vanish for all $x \notin \mathbb{X}$. The normalization of ρ is accordingly
$$\int_{\Re^D} \rho \, dx = \int_{\mathbb{X}} \rho \, dx = 1 \tag{7}$$
Henceforth any integration is understood to span $\Re^D$ in the absence of explicit limits. The entropy $\mathcal{H} = \mathcal{H}(\rho)$ attributed to $\mathcal{X}$ in connection with ρ is [1]
$$\mathcal{H} = -\int \rho \log \rho \, dx \tag{8}$$
where $\rho \log(\rho)$ is defined to vanish for $\rho = 0$. The measure in Equation (8) is known as the differential (Shannon) entropy.
In contrast to H, a single quantity of $\mathcal{H}$ is subject to no finite bounds. Consequently a simple comparison between H and $\mathcal{H}$ is not meaningful. For instance, the differential entropy of the normalized Gaussian
$$\rho_G = \left(\frac{\nu}{\pi}\right)^{D/2} e^{-\nu |x|^2} \tag{9}$$
is $\frac{D}{2}\log(\pi e/\nu)$. For ν equal to $\pi e$ the entropy vanishes, which signifies a predetermined outcome in the context of discrete real variables. Furthermore, in the limit as ν becomes infinitely large, $\rho_G$ behaves as a D-dimensional delta distribution $\delta(x)$, which describes a single predetermined real outcome. $\mathcal{H}(\rho_G)$, however, decreases without bound as ν approaches infinity.
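A quick numerical check of this behavior, assuming the Gaussian form $\rho_G = (\nu/\pi)^{1/2} e^{-\nu x^2}$ in $D = 1$ (an assumption about the form of $\rho_G$, for which the exact entropy is $\tfrac{1}{2}\log(\pi e/\nu)$). The midpoint-rule integrator and the truncation of the domain at twelve inverse widths are generic choices, not the author's method.

```python
import math

def differential_entropy(rho, lo, hi, steps=20_000):
    """-∫ rho log rho dx over [lo, hi] by the midpoint rule (0 log 0 is taken as 0)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        r = rho(lo + (i + 0.5) * h)
        if r > 0.0:
            total -= r * math.log(r)
    return total * h

def gaussian_entropy_numeric(nu):
    """Entropy of the assumed form rho_G(x) = sqrt(nu/pi) exp(-nu x^2) in D = 1."""
    c = math.sqrt(nu / math.pi)
    half_width = 12.0 / math.sqrt(nu)     # rho_G is below 1e-62 of its peak beyond this
    return differential_entropy(lambda x: c * math.exp(-nu * x * x), -half_width, half_width)

for nu in (1.0, math.pi * math.e, 100.0, 1e4):
    exact = 0.5 * math.log(math.pi * math.e / nu)
    print(f"nu = {nu:9.3f}   numeric = {gaussian_entropy_numeric(nu):+.8f}   exact = {exact:+.8f}")
```

The entropy crosses zero at $\nu = \pi e$ and keeps falling as the density sharpens toward a delta distribution.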
The apparent inconsistencies between H and $\mathcal{H}$ are reconciled in the following manner. As a generalization of $\rho_G$, let $\rho_0 = \rho_0(\nu, x)$ be some well-behaved, unity-normalized function that becomes $\delta(x)$ in the limit as the positive real parameter ν becomes infinitely large. In association with an arbitrary X, with n allowed outcomes $x$ and corresponding probabilities $p$, let the probability density $\rho_X = \rho_X(\nu, x)$ be defined such that
$$\rho_X(\nu, x) \sim B \sum_{j=1}^{n} p_j \, \rho_0(\nu, x - x_j) \tag{10}$$
and thus
$$\lim_{\nu \to \infty} \rho_X = \sum_{j=1}^{n} p_j \, \delta(x - x_j) \tag{11}$$
where $B = B(\nu)$ is the appropriate normalization coefficient for a given ν, and $x_j$ and $p_j$ span $x$ and $p$ respectively. The symbol "∼" is used in Equation (10) and throughout, where convenient, to signify asymptotic equality as ν approaches infinity. The right side of Equation (11) describes the probabilistic behavior of X. Consider an originally continuous test variable $\mathcal{X}_X$ whose outcomes are governed by $\rho_X$. The entropy attributed to $\mathcal{X}_X$ is $\mathcal{H}(\rho_X)$. Because the peaks of $\rho_X$ cease to overlap and B approaches unity as ν grows, $\mathcal{H}(\rho_X)$ behaves asymptotically as
$$\mathcal{H}(\rho_X) \sim H(p) + \mathcal{H}(\rho_0) \tag{12}$$
When ν is finite but very large, the behavior of $\mathcal{X}_X$ is a hybrid of the continuous traits of $\rho_0$ and the discrete traits of $x$ and $p$. In the limit as $\nu \to \infty$ the two sets of traits become completely dissociated from one another, and $\mathcal{X}_X = X$ is characterized only by $x$ and $p$. The entropy associated with the probabilistic behavior of X should therefore be given by the asymptote of $\mathcal{H}(\rho_X) - \mathcal{H}(\rho_0)$. From Equation (12) we readily obtain
$$\lim_{\nu \to \infty} \left[\mathcal{H}(\rho_X) - \mathcal{H}(\rho_0)\right] = H(p) \tag{13}$$
thereby recovering H from $\mathcal{H}$.
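A numerical sketch of this limit in $D = 1$, assuming a Gaussian kernel $\rho_0 = (\nu/\pi)^{1/2} e^{-\nu x^2}$ and an arbitrary illustrative choice of $p$ and $x$. Because each kernel is unit-normalized and the $p_j$ sum to one, $B = 1$ exactly in this sketch. The gap $\mathcal{H}(\rho_X) - \mathcal{H}(\rho_0)$ never exceeds $H(p)$ and approaches it once the peaks stop overlapping.

```python
import math

def entropy_gap(p, x, nu, steps=40_000):
    """H(rho_X) - H(rho_0) in D = 1 for rho_X = sum_j p_j rho_0(x - x_j), with the
    assumed kernel rho_0(x) = sqrt(nu/pi) exp(-nu x^2). Each kernel is unit-normalized
    and the p_j sum to one, so the coefficient B of Equation (10) equals 1 here."""
    c = math.sqrt(nu / math.pi)
    pad = 12.0 / math.sqrt(nu)
    lo, hi = min(x) - pad, max(x) + pad
    h = (hi - lo) / steps
    h_rho_x = 0.0
    for i in range(steps):                            # midpoint rule for -∫ rho_X log rho_X
        t = lo + (i + 0.5) * h
        r = sum(pj * c * math.exp(-nu * (t - xj) ** 2) for pj, xj in zip(p, x))
        if r > 0.0:
            h_rho_x -= r * math.log(r) * h
    h_rho_0 = 0.5 * math.log(math.pi * math.e / nu)   # exact entropy of the kernel
    return h_rho_x - h_rho_0

p = [0.2, 0.5, 0.3]
x = [0.0, 1.0, 3.0]
H = -sum(pj * math.log(pj) for pj in p)
for nu in (1.0, 100.0, 1e4):
    print(nu, entropy_gap(p, x, nu), H)   # the gap rises toward H(p) as the peaks separate
```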
Although the analysis leading to Equation (13) may be instructive, $\mathcal{H}(\rho_X)$ is asymptotically independent of $x$ and could not produce the desired new measure for discrete real variables. It is reasonable to suspect that some additional quantity of the form $\mathcal{H}$ could be considered in conjunction with $\mathcal{H}(\rho_X)$ to provide the required asymptotic sensitivity to both $x$ and $p$. In order to be meaningful, however, the additional measure should be intrinsically related to $\rho_X$.
Within the paradigm of wave mechanics, two distinct quantities of differential entropy are naturally associated with a given ρ. More precisely, two separate probability density functions are naturally associated with a given probability amplitude $\psi = \psi(x)$, namely $\rho = |\psi|^2$ and the spectral density
$$\hat{\rho} = |\hat{\psi}|^2 \tag{14}$$
where
$$\hat{\psi}(k) = (2\pi)^{-D/2} \int \psi(x) \, e^{-\imath k \cdot x} \, dx \tag{15}$$
is the Fourier conjugate of ψ and $k \in \Re^D$ is an independent variable. We may interpret $\hat{\rho} = \hat{\rho}(k)$ as the probability density governing the outcomes of a variable $\mathcal{K}$ that is the conjugate of the variable $\mathcal{X}$ governed by ρ. The entropy attributed to $\mathcal{K}$ in connection with $\hat{\rho}$ is $\mathcal{H}(\hat{\rho})$. The total differential entropy attributed to the conjugated pair is therefore
$$S(\psi) \equiv \mathcal{H}(|\psi|^2) + \mathcal{H}(|\hat{\psi}|^2) \tag{16}$$
A more formal derivation of $S = S(\psi)$ follows from the total phase-space distribution [9]
$$f = f(x, k) \equiv \rho \hat{\rho} \tag{17}$$
which is subject to the normalization
$$\int f \, dr = 1 \tag{18}$$
where $r \equiv (x, k)$ and $dr \equiv dx\, dk$ for any given x and k. Because f is simply a probability density in a $2D$-dimensional vector space, it is appropriate to attribute a quantity of differential entropy
$$-\int f \log f \, dr = S \tag{19}$$
to the conjugated pair.
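The identification of the phase-space entropy with S takes one line once the logarithm of the product is split. The only inputs are the definition of f and the normalizations $\int \rho\,dx = \int \hat{\rho}\,dk = 1$ (the latter holds for a normalized ψ by the Plancherel theorem under the unitary transform convention):

$$
-\int f \log f \, dr
= -\int\!\!\int \rho\hat{\rho}\,\bigl[\log\rho + \log\hat{\rho}\bigr]\,dx\,dk
= -\int \rho\log\rho\,dx \int \hat{\rho}\,dk \;-\; \int \rho\,dx \int \hat{\rho}\log\hat{\rho}\,dk
= \mathcal{H}(\rho) + \mathcal{H}(\hat{\rho}) = S .
$$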
The total differential entropy S has been the subject of extensive research (see, for instance, [2,3,4,5,6,7,8]). It is worthwhile to mention here two general characteristics that distinguish S as a fundamentally important measure. Because f is dimensionless, S, like the relative entropy, is free from a potentially serious flaw to which the differential Shannon entropy is otherwise vulnerable, namely a dependence on the units in which the variable is measured [10]. Furthermore, whereas $\mathcal{H}(\rho)$ and $\mathcal{H}(\hat{\rho})$ may become infinitely negative, S is subject to the lower bound [2]
$$S \ge D \log(\pi e) \tag{20}$$
Perhaps uniquely, a Gaussian amplitude generates the minimal S allowed by Equation (20) [2].
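A sketch checking Equation (20) in $D = 1$ for two amplitudes whose conjugates are known in closed form under the unitary transform. The first is a Gaussian $\psi \propto e^{-\nu x^2/2}$, which should attain $\log(\pi e) \approx 2.145$ for every width. The second is the Laplace amplitude $\psi = e^{-|x|}$, whose conjugate density is $(2/\pi)(1+k^2)^{-2}$ and whose total entropy works out to $\log(8\pi) - 1 \approx 2.224$, above the bound. The conjugates were derived by hand for this illustration, not computed by a numerical transform.

```python
import math

def entropy_1d(rho, lo, hi, steps):
    """-∫ rho log rho over [lo, hi] by the midpoint rule (0 log 0 is taken as 0)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        r = rho(lo + (i + 0.5) * h)
        if r > 0.0:
            total -= r * math.log(r)
    return total * h

def total_entropy_gaussian(nu):
    """S for psi = (nu/pi)^(1/4) exp(-nu x^2 / 2), so that
    rho = sqrt(nu/pi) exp(-nu x^2) and rho_hat = sqrt(1/(pi nu)) exp(-k^2 / nu)."""
    def rho(x):
        return math.sqrt(nu / math.pi) * math.exp(-nu * x * x)
    def rho_hat(k):
        return math.sqrt(1.0 / (math.pi * nu)) * math.exp(-k * k / nu)
    wx, wk = 12.0 / math.sqrt(nu), 12.0 * math.sqrt(nu)
    return entropy_1d(rho, -wx, wx, 20_000) + entropy_1d(rho_hat, -wk, wk, 20_000)

def total_entropy_laplace():
    """S for psi = exp(-|x|), so that rho = exp(-2|x|) and rho_hat = (2/pi) / (1 + k^2)^2.
    The k-integral is truncated at |k| = 1000, where rho_hat is about 6e-13."""
    def rho(x):
        return math.exp(-2.0 * abs(x))
    def rho_hat(k):
        return (2.0 / math.pi) / (1.0 + k * k) ** 2
    return entropy_1d(rho, -20.0, 20.0, 100_000) + entropy_1d(rho_hat, -1000.0, 1000.0, 200_000)

bound = math.log(math.pi * math.e)                # Equation (20) with D = 1
for nu in (0.1, 1.0, 10.0):
    print(nu, total_entropy_gaussian(nu), bound)  # Gaussian amplitudes attain the bound
print(total_entropy_laplace(), math.log(8 * math.pi) - 1)  # strictly above the bound
```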
The purpose of this article is to demonstrate that a measure of indeterminacy for discrete real variables that is sensitive to both $x$ and $p$ emerges from the total differential entropy of $\mathcal{X}_X$ and its conjugate in the limit as ν diverges. Specifically, the conjugated analogue of Equation (13) includes an additional measure $\Xi = \Xi(p, x)$, expressed most generally in terms of an integral, that behaves naturally as a measure of indeterminacy in the spatial configuration of $x$ and $p$. Furthermore, the conjugate is universally eliminated in the limit as $\mathcal{X}_X$ becomes discrete, and the entire residual entropy $H + \Xi$, named η, is therefore attributed to X. Section 2 contains a derivation of η. The general characteristics of η are examined in Section 3. Section 4 presents a quantitative study demonstrating the basic behaviors of η as a measure of indeterminacy for discrete real variables. The primary conclusions are summarized and interpreted in Section 5.

## 2. Derivation of η

The terms defined in Section 1 retain their definitions throughout the following. Let $\mathcal{K}_X$ be the conjugate of a given $\mathcal{X}_X$. The probability density $\hat{\rho}_X = \hat{\rho}_X(\nu, k)$ governing the outcomes of $\mathcal{K}_X$ is formulated presently. Let $\psi_X = \psi_X(\nu, x)$ be defined such that $\rho_X = |\psi_X|^2$. The total differential entropy of the conjugated pair is thus $S(\psi_X) = \mathcal{H}(\rho_X) + \mathcal{H}(\hat{\rho}_X)$. This section contains a derivation of the asymptotic form of $S(\psi_X)$, expressed analogously to Equation (13).
The first step is to formulate $\psi_X$. Following the standard wave-mechanical prescription, and in accordance with natural law, we attribute a separate amplitude to each allowed outcome [11]. The total amplitude is simply the sum of the individual amplitude waveforms multiplied by an appropriate normalization coefficient. Let the probability amplitude $\phi_0 = \phi_0(\nu, x)$ therefore be defined such that $|\phi_0|^2 = \rho_0$, i.e.,
$$\lim_{\nu \to \infty} |\phi_0|^2 = \delta(x) \tag{21}$$
The waveform assigned to each $x_j \in x$ is
$$\phi_j = \phi_j(\nu, x) \equiv \sqrt{p_j}\, \phi_0(\nu, x - x_j) \tag{22}$$
and the total amplitude is accordingly
$$\psi_X = A \sum_{j=1}^{n} \phi_j \tag{23}$$
where $A = A(\nu)$ is defined such that $\rho_X$ is unity-normalized for all ν. Because the real part of $\phi_0$ becomes ever more sharply peaked with increasing ν, the cross terms $|A|^2 \phi_j \phi_l^*$ (with $j \ne l$) in $|\psi_X|^2$ are negligible for sufficiently large ν and vanish asymptotically. We therefore have
$$\rho_X \sim |A|^2 \sum_{j=1}^{n} |\phi_j|^2 \tag{24}$$
which is equivalent to Equation (10). Note that
$$\lim_{\nu \to \infty} |A|^2 = \lim_{\nu \to \infty} B = 1 \tag{25}$$
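The next step is to determine the respective forms of the conjugate amplitude $\hat{\psi}_X = \hat{\psi}_X(\nu, k)$ and density $\hat{\rho}_X = |\hat{\psi}_X|^2$. Let $\hat{\phi}_0 = \hat{\phi}_0(\nu, k)$ represent the conjugate of $\phi_0$, and let the associated density $|\hat{\phi}_0|^2$ be denoted $\hat{\rho}_0 = \hat{\rho}_0(\nu, k)$. It follows from Equation (15), with $x - x_j$ substituted for x, that
$$\hat{\phi}_j = \sqrt{p_j}\, e^{-\imath k \cdot x_j}\, \hat{\phi}_0 \tag{26}$$
where $\hat{\phi}_j = \hat{\phi}_j(\nu, k)$ is the conjugate of a given $\phi_j$. We therefore have
$$\hat{\psi}_X = A\, \hat{\phi}_0 \sum_{j=1}^{n} \sqrt{p_j}\, e^{-\imath k \cdot x_j} \tag{27}$$
and thus
$$\hat{\rho}_X = |A|^2\, \tau\, \hat{\rho}_0 \tag{28}$$
where
$$\tau = \left| \sum_{j=1}^{n} \sqrt{p_j}\, e^{-\imath k \cdot x_j} \right|^2 \tag{29}$$
An equivalent expression for $\tau = \tau(k)$, following from the Euler identity, is
$$\tau = 1 + 2 \sum_{j=1}^{n-1} \sum_{l=j+1}^{n} \sqrt{p_j p_l}\, \cos\!\left(k \cdot (x_l - x_j)\right) \tag{30}$$
For convenience we define $\zeta = \zeta(k)$ such that $\tau = 1 + 2\zeta$.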
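The equivalence of the modulus form (29) and the cosine form (30) of τ is easy to spot-check numerically. A sketch in $D = 1$ with arbitrary illustrative $p$ and $x$; the function names are hypothetical.

```python
import cmath
import math

def tau_modulus(p, x, k):
    """Equation (29) in D = 1: |sum_j sqrt(p_j) exp(-i k x_j)|^2."""
    z = sum(math.sqrt(pj) * cmath.exp(-1j * k * xj) for pj, xj in zip(p, x))
    return abs(z) ** 2

def tau_cosine(p, x, k):
    """Equation (30) in D = 1: 1 + 2 zeta(k), where
    zeta(k) = sum over pairs j < l of sqrt(p_j p_l) cos(k (x_l - x_j))."""
    n = len(p)
    zeta = sum(math.sqrt(p[j] * p[l]) * math.cos(k * (x[l] - x[j]))
               for j in range(n) for l in range(j + 1, n))
    return 1.0 + 2.0 * zeta

p = [0.1, 0.2, 0.3, 0.4]
x = [-1.5, 0.0, 0.7, 4.2]
for k in (0.0, 0.8, 2.5, -7.1):
    print(k, tau_modulus(p, x, k), tau_cosine(p, x, k))
```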
The asymptotic behavior of $\hat{\rho}_X$ is readily ascertained from the explicit form of $\hat{\phi}_0$ given by Equation (15), with $\phi_0$ expressed as $\rho_0/\phi_0^*$ for all non-vanishing $\phi_0^*$. It follows from the behavior ascribed to $\rho_0$ that
$$\hat{\phi}_0(\nu, k) \sim \frac{(2\pi)^{-D/2}}{\phi_0^*(\nu, 0)} \tag{31}$$
which is independent of k. Because $\int \hat{\rho}_0 \, dk$ must remain unity-normalized, $\hat{\phi}_0$ must vanish everywhere asymptotically. As $\rho_0$ increasingly resembles $\delta(x)$, the real part of $\phi_0^*(0)$ must either increase or decrease without bound, which ensures that the right side of Equation (31) vanishes. Consequently $\hat{\psi}_X$ and $\hat{\rho}_X$ vanish asymptotically for all k.
It is also worthwhile to note that, because $\hat{\rho}_X$ and $\hat{\rho}_0$ are both unity-normalized for all ν, Equations (28) and (30) imply
$$\lim_{\nu \to \infty} \langle \zeta \rangle_0 = 0 \tag{32}$$
where
$$\langle \varepsilon \rangle_0 \equiv \int \hat{\rho}_0\, \varepsilon \, dk \tag{33}$$
is the expected value of any $\varepsilon = \varepsilon(k)$ averaged against $\hat{\rho}_0$. Equation (32) may be understood as a consequence of the asymptotic uniformity of $\hat{\rho}_0$ in conjunction with the usual trigonometric property
$$\lim_{b \to \infty} \frac{1}{2b} \int_{-b}^{b} \cos(k)\, dk = 0 \tag{34}$$
In the limit as ν diverges, $\langle \zeta \rangle_0$ becomes simply the average of ζ over all space, which we denote by $\bar{\zeta}$, and thus vanishes term by term in accordance with Equation (34). Expressed more generally,
$$\lim_{\nu \to \infty} \langle \varepsilon \rangle_0 = \bar{\varepsilon} \tag{35}$$
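The averaging property behind Equations (34) and (35) can be watched directly. This sketch in $D = 1$ uses incommensurate intervals (so ζ is not periodic) and shows the window average $\frac{1}{2b}\int_{-b}^{b}\zeta\,dk$ shrinking roughly as $1/b$. Each cosine term averages exactly to $\sin(cb)/(cb)$ with $c = x_l - x_j$, which gives a closed form to compare against. The configuration is an arbitrary illustration.

```python
import math

def zeta(p, x, k):
    """zeta(k) = sum over pairs j < l of sqrt(p_j p_l) cos(k (x_l - x_j)), with tau = 1 + 2 zeta."""
    n = len(p)
    return sum(math.sqrt(p[j] * p[l]) * math.cos(k * (x[l] - x[j]))
               for j in range(n) for l in range(j + 1, n))

def window_average(f, b, steps=50_000):
    """(1 / 2b) ∫_{-b}^{b} f(k) dk by the midpoint rule."""
    h = 2.0 * b / steps
    return sum(f(-b + (i + 0.5) * h) for i in range(steps)) * h / (2.0 * b)

def window_average_exact(p, x, b):
    """Closed form: each cosine term averages to sin(c b) / (c b), c = x_l - x_j > 0."""
    n = len(p)
    return sum(math.sqrt(p[j] * p[l]) * math.sin((x[l] - x[j]) * b) / ((x[l] - x[j]) * b)
               for j in range(n) for l in range(j + 1, n))

p = [0.2, 0.5, 0.3]
x = [0.0, 1.0, math.sqrt(2.0)]      # incommensurate intervals, so zeta is not periodic
for b in (10.0, 100.0, 1000.0):
    print(b, window_average(lambda k: zeta(p, x, k), b), window_average_exact(p, x, b))
```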
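Returning to the derivation proper, the next step is to evaluate $\mathcal{H}(\hat{\rho}_X)$ in the limit as ν approaches infinity. It follows from Equations (25) and (28) that
$$\mathcal{H}(\hat{\rho}_X) \sim -\int \tau\, \hat{\rho}_0 \log(\hat{\rho}_0)\, dk - \int \hat{\rho}_0\, \tau \log(\tau)\, dk \tag{36}$$
Substituting $1 + 2\zeta$ for τ in the left-most term on the right side of Equation (36) produces
$$\mathcal{H}(\hat{\rho}_X) \sim \mathcal{H}(\hat{\rho}_0) - 2\int \zeta\, \hat{\rho}_0 \log(\hat{\rho}_0)\, dk - \int \hat{\rho}_0\, \tau \log(\tau)\, dk \tag{37}$$
Although $\log(\hat{\rho}_0)$ decreases without bound, it does so with asymptotic uniformity, hence
$$\log\bigl(\hat{\rho}_0(\nu, k)\bigr) \sim \log\bigl(\hat{\rho}_0(\nu, k_0)\bigr) \tag{38}$$
for any $k_0 \in \Re^D$. The middle term on the right side of Equation (37) is therefore asymptotically proportional to $\bar{\zeta}$ and thus vanishes. Equation (37) accordingly becomes
$$\mathcal{H}(\hat{\rho}_X) \sim \mathcal{H}(\hat{\rho}_0) - \langle \tau \log(\tau) \rangle_0 \tag{39}$$
The asymptotic form of the right-most term in Equation (39) is
$$\Xi(p, x) \equiv -\overline{\tau \log(\tau)} \tag{40}$$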
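When τ is periodic, as it is whenever, in each dimension, the components of all intervals $x_l - x_j$ are commensurate (for instance, integer-valued outcomes), $\Xi = \Xi(p, x)$ is identical to the average value of $-\tau \log(\tau)$ taken over some finite region $K_0 \subset \Re^D$ spanning one period in each dimension. We therefore have
$$\Xi = -\frac{1}{\kappa_0} \int_{K_0} \tau \log(\tau)\, dk \tag{41}$$
where
$$\kappa_0 \equiv \int_{K_0} dk \tag{42}$$
An analytical expression for $\int \tau \log(\tau)\, dk$ is possible only for a few rudimentary configurations of $x$ and $p$, and Equation (41) is thus critical for quantifying Ξ.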
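As an illustration of one such rudimentary configuration, take $D = 1$, $n = 2$, $p = p_e$ and $x = \{0, 1\}$. Equation (30) then gives $\tau(k) = 1 + \cos k$ with period $2\pi$, and the average can be evaluated in closed form using the substitution $k = 2v$, $1 + \cos k = 2\cos^2 v$, together with the standard integral $\int_0^{\pi/2}\cos^2 v \,\log(\cos v)\,dv = \frac{\pi}{8}(1 - 2\log 2)$ (obtainable by differentiating the Wallis integral):

$$
\begin{aligned}
\overline{\tau\log\tau} &= \frac{1}{2\pi}\int_0^{2\pi} (1+\cos k)\log(1+\cos k)\,dk
= \frac{2}{\pi}\int_0^{\pi} \cos^2 v\,\bigl[\log 2 + 2\log\lvert\cos v\rvert\bigr]\,dv \\
&= \log 2 + \frac{8}{\pi}\int_0^{\pi/2} \cos^2 v\,\log(\cos v)\,dv
= \log 2 + (1 - 2\log 2) = 1 - \log 2 .
\end{aligned}
$$

Hence $\Xi = \log 2 - 1 \approx -0.307$ and, with $H = \log 2$, $\eta = H + \Xi = 2\log 2 - 1 \approx 0.386$ for this configuration.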
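Finally, Equation (39) may be expressed in terms of a proper limit as
$$\lim_{\nu \to \infty} \left[\mathcal{H}(\hat{\rho}_X) - \mathcal{H}(\hat{\rho}_0)\right] = \Xi(p, x) \tag{43}$$
analogously to Equation (13). Combining Equations (13) and (43) produces the central result
$$\lim_{\nu \to \infty} \left[S(\psi_X) - S(\phi_0)\right] = \eta(p, x) \tag{44}$$
where
$$\eta(p, x) \equiv H(p) + \Xi(p, x) \tag{45}$$
The nature and behavior of $\eta(p, x)$ are examined in the following sections.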
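A sketch of how η can be evaluated from Equations (41) and (45) in $D = 1$. It assumes integer-valued outcomes (as for the integer $x_r$ and prime-based sets of Section 4), so that $K_0 = [0, 2\pi)$ covers a whole number of periods of τ. The default sample count follows the $\lfloor 25\kappa_0/\kappa_* \rfloor$ rule of Section 4; the minimum of 1000 samples is an added assumption, and the function names are hypothetical. For two equally probable outcomes one unit apart the period average can be done by hand, giving $\eta = 2\log 2 - 1 \approx 0.386$, which makes a convenient check.

```python
import cmath
import math

def tau(p, x, k):
    """Equation (29) for D = 1: |sum_j sqrt(p_j) exp(-i k x_j)|^2."""
    z = sum(math.sqrt(pj) * cmath.exp(-1j * k * xj) for pj, xj in zip(p, x))
    return abs(z) ** 2

def xi(p, x, samples=None):
    """Xi(p, x) as the period average of -tau log tau (Equations (40) to (42)).

    Assumes integer-valued outcomes, so that K_0 = [0, 2*pi) spans a whole number
    of periods of tau. The default count follows floor(25 kappa_0 / kappa_*) with
    kappa_* = 2*pi / (x_n - x_1); the floor of 1000 samples is an added safeguard.
    """
    span = max(x) - min(x)
    m = samples if samples is not None else max(int(25 * span), 1000)
    total = 0.0
    for i in range(m):
        t = tau(p, x, 2.0 * math.pi * i / m)
        if t > 0.0:                      # 0 log 0 is taken as 0
            total -= t * math.log(t)
    return total / m

def eta(p, x, samples=None):
    """Equation (45): eta(p, x) = H(p) + Xi(p, x)."""
    h = -sum(pj * math.log(pj) for pj in p if pj > 0.0)
    return h + xi(p, x, samples)

n = 8
p_e = [1.0 / n] * n
x_e = list(range(n))                     # equally spaced outcomes
x_d = [2 ** j for j in range(n)]         # all n(n-1)/2 pairwise intervals distinct
print(eta(p_e, x_e), eta(p_e, x_d), math.log(n))
```

In the usage lines, equally spaced outcomes give a markedly smaller η than outcomes whose pairwise intervals are all distinct, while both stay below $H(p_e) = \log(n)$.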

## 3. General Characteristics of η

By definition $S(\psi_X)$ measures the net indeterminacy in the conjugated pair $\mathcal{X}_X$ and $\mathcal{K}_X$. In the limit as ν becomes infinitely large, however, the pair is characterized completely by $x$ and $p$, independently of $\phi_0$. The entropy attributed asymptotically to the pair is therefore η. Furthermore, as $\mathcal{X}_X$ becomes discrete, $\hat{\rho}_X$ vanishes everywhere and $\mathcal{K}_X$ therefore becomes a "ghost" variable, doomed to exist without ever generating an outcome. The indeterminacy measured by η must therefore be evident in the probabilistic behaviors of X. Because X and its ghost conjugate together are indistinguishable from X alone, we conclude that η is a generally valid measure of indeterminacy in discrete real variables.
Given that Ξ is the asymptotic form of $\mathcal{H}(\hat{\rho}_X) - \mathcal{H}(\hat{\rho}_0)$, its attribution to X is perhaps counterintuitive, despite the elimination of the conjugate. In particular, $\mathcal{H}(\rho)$ and $\mathcal{H}(\hat{\rho})$ are typically anti-correlated among different ρ of a given generalized form. Recall, however, that $\mathcal{H}(\rho_X)$ is asymptotically insensitive to $x$ and therefore is not generally correlated with $\mathcal{H}(\hat{\rho}_X)$ in any appreciable manner among different $\rho_X$. Not only is $\mathcal{H}(\hat{\rho}_X)$ sensitive to both $x$ and $p$, it is also naturally correlated with the intrinsic indeterminacy effected by the definite spatial attributes of $x$. Stated most generally, greater spatial regularity in the configuration of $x$ and $p$ effects greater harmonic regularity in the spatial arrangement of the peaks in $\rho_X$, for $\nu \gg 1$. Greater harmonic regularity in $\rho_X$ implies greater localization in the associated Fourier spectrum, which accordingly generates a smaller $\mathcal{H}(\hat{\rho}_X)$ for a given $\phi_0$. In order to isolate the effects associated with $x$ and $p$, we exclude the contribution from $\mathcal{H}(\hat{\rho}_0)$. In that manner Ξ measures the deterministic effects of real allowed outcomes.
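Lower bounds on η and Ξ follow directly from the universal lower bound on S. Specifically, $S(\psi_X)$ is subject to Equation (20) for all ν. Because η does not depend on the choice of $\phi_0$, we may take $\phi_0$ to be a Gaussian amplitude, for which $S(\phi_0) = D\log(\pi e)$ attains that bound for all ν. Consequently $S(\psi_X) - S(\phi_0) \ge 0$ for all ν, and Equation (44) gives
$$\eta \ge 0 \tag{46}$$
and thus
$$\Xi \ge -H \tag{47}$$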
Regardless of the degree of spatial structure in the configuration of $x$ and $p$, no variable could behave more deterministically than a fixed real number. Equation (46) is therefore consistent with the attribution of zero entropy to a predetermined outcome.
Upper bounds on η and Ξ may be obtained by writing Equation (41) as
$$\Xi = -\int_{K_0} \frac{\tau}{\kappa_0} \log\left(\frac{\tau}{\kappa_0}\right) dk - \log(\kappa_0) \tag{48}$$
where the right-most term has been reduced using
$$\bar{\tau} = \frac{1}{\kappa_0} \int_{K_0} \tau \, dk = 1 \tag{49}$$
which is a consequence of Equation (34). The integral on the right side of Equation (48) is simply the differential Shannon entropy associated with the normalized density $\tau/\kappa_0$ defined over $K_0$. Furthermore, $\log(\kappa_0)$ is the maximal differential Shannon entropy that could be associated with $K_0$. Equation (48) therefore implies
$$\Xi \le 0 \tag{50}$$
hence
$$\eta \le H \tag{51}$$
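A numerical spot check of Equations (47), (49) and (50), not a proof. For random probabilities and random integer outcomes in $D = 1$, the sampled mean of τ over a period should equal 1 and Ξ should lie between $-H$ and 0. The random configurations and seeds are arbitrary, and integer outcomes are assumed so that $[0, 2\pi)$ spans whole periods of τ. (For such sampled averages the two bounds also hold exactly, by Jensen's inequality and by the entropic uncertainty relation for the discrete Fourier transform.)

```python
import cmath
import math
import random

def tau(p, x, k):
    """Equation (29), D = 1."""
    return abs(sum(math.sqrt(pj) * cmath.exp(-1j * k * xj) for pj, xj in zip(p, x))) ** 2

def period_stats(p, x):
    """Sampled period averages of tau and of -tau log tau over [0, 2*pi).

    Assumes integer-valued x, so [0, 2*pi) spans whole periods of tau.
    Returns (mean of tau, Xi) for comparison with Equations (49) and (41).
    """
    m = max(25 * (max(x) - min(x)), 1000)
    taus = [tau(p, x, 2.0 * math.pi * i / m) for i in range(m)]
    mean_tau = sum(taus) / m
    xi = -sum(t * math.log(t) for t in taus if t > 0.0) / m
    return mean_tau, xi

def random_case(rng):
    """A random configuration: n probabilities and n distinct integers in [0, 60)."""
    n = rng.randint(2, 10)
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    p = [v / total for v in raw]
    x = sorted(rng.sample(range(60), n))
    return p, x

rng = random.Random(7)                       # fixed seed so the run is repeatable
for _ in range(20):
    p, x = random_case(rng)
    H = -sum(pj * math.log(pj) for pj in p if pj > 0.0)
    mean_tau, xi = period_stats(p, x)
    # Expect mean(tau) = 1 and -H <= Xi <= 0, i.e. 0 <= eta <= H.
    print(f"n={len(p):2d}  mean(tau)={mean_tau:.12f}  -H={-H:+.4f}  Xi={xi:+.4f}")
```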
The upper bound on η is consistent with the basic expectations about the predictive utility of real outcomes. Consider some X that has been "disguised" by mapping the n allowed outcomes in $x$ to a set of n generic outcomes, while keeping the corresponding probabilities $p$ fixed. An ideal observer who studied only the generic outcomes could form a predictive model only to the extent allowed by $p$. Suppose that such an observer were subsequently allowed to study the real outcomes. The numeric details could never detract from the observer's capability to develop a predictive model because, at the very least, the outcomes from any $x$ could be interpreted simply as non-numeric symbols. The entropy attributed to X, therefore, should not be greater than the entropy attributed to the generic analogue, which is precisely what Equation (51) ensures.
For any $p$, H and η should be similar when the degree of deterministic structure within $x$ is negligible. We therefore expect
$$\eta(p, x_r) \simeq H(p) \tag{52}$$
which is validated in the following section. Among variables with spatially randomized outcomes, indeterminacy depends primarily on the intrinsic characteristics of $p$, hence Equation (52). Conversely, among variables with equally probable outcomes, indeterminacy depends only on $x$. In that manner $\eta(p_e, x)$ measures the intrinsic indeterminacy in the spatial configuration of $x$. It is therefore fitting, as explicitly evident in Equation (30), that τ is sensitive to each of the $n(n-1)/2$ uniquely defined positive intervals $x_l - x_j$ within a given $x$.
Because there is no preferred region within $\Re^D$, any meaningful measure of indeterminacy for discrete real variables must be invariant under spatial transformations of the form $x + \Delta x \equiv \{x_1 + \Delta x, \cdots, x_n + \Delta x\}$ for any finite $\Delta x \in \Re^D$. By inspecting either Equation (29) or Equation (30), we readily find that τ is translationally invariant, hence
$$\eta(p, x + \Delta x) = \eta(p, x) \tag{53}$$
Similarly, transformations of the form $Rx \equiv \{Rx_1, \cdots, Rx_n\}$, where $R$ is a D-dimensional unitary rotation matrix, could not change the indeterminacy in any X. Consequently the entropy attributed to X must be rotationally invariant as well. The effects of such a rotation would be equivalent to replacing the original $\tau(k)$ with $\tau'(k) = \tau(kR)$, which we may write as $\tau'(k) = \tau(k')$ after the variable substitution $k' = kR$. The rotation therefore amounts to nothing more than a cosmetic change of variables, which does not affect the average value of $\tau \log(\tau)$, whether taken over all space or over a finite region spanning a period in each dimension of $k'$. We therefore have
$$\eta(p, Rx) = \eta(p, x) \tag{54}$$
For the same reasons we also have
$$\eta(p, \alpha x) = \eta(p, x) \tag{55}$$
for any non-vanishing $\alpha \in \Re$, where $\alpha x \equiv \{\alpha x_1, \cdots, \alpha x_n\}$. Contrary to Equation (55), suppose that the entropy of a real discrete variable were to increase under transformations of the form $\alpha x$ for $|\alpha| > 1$. Consider applying such transformations successively to the allowed outcomes of some non-trivial X. Given that the entropy must never exceed H, the effects of the transformations must become negligible at some point, which would imply the existence of some preferred spatial scale for a given $x$ and $p$. The indeterminacy, however, could not be sensitive to such a scale. The scale invariance expressed in Equation (55) is therefore necessary for any meaningful measure of indeterminacy in discrete real variables.
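Equations (53) and (55) can be spot-checked in $D = 1$, where the only orthogonal maps are $x \to \pm x$, so the reflection is covered by $\alpha = -1$. The sketch assumes integer-valued outcomes and a sampled period average over $[0, 2\pi)$; the configuration, shift and scale factor are arbitrary illustrative choices. Because 3 does not divide the sample count, the stretched configuration is sampled at the same set of phases, so all three differences should vanish to rounding error.

```python
import cmath
import math

def tau(p, x, k):
    """Equation (29), D = 1."""
    return abs(sum(math.sqrt(pj) * cmath.exp(-1j * k * xj) for pj, xj in zip(p, x))) ** 2

def eta(p, x, m=20_000):
    """eta = H(p) + Xi(p, x), with Xi taken as the mean of -tau log tau over m
    equally spaced points of [0, 2*pi); assumes integer-valued outcomes."""
    taus = (tau(p, x, 2.0 * math.pi * i / m) for i in range(m))
    xi = -sum(t * math.log(t) for t in taus if t > 0.0) / m
    h = -sum(pj * math.log(pj) for pj in p if pj > 0.0)
    return h + xi

p = [0.1, 0.4, 0.2, 0.3]
x = [0, 2, 3, 7]
base = eta(p, x)
shifted = eta(p, [xj + 11 for xj in x])      # translation, Equation (53)
mirrored = eta(p, [-xj for xj in x])         # alpha = -1 in Equation (55)
stretched = eta(p, [3 * xj for xj in x])     # alpha = 3 in Equation (55)
print(base, shifted - base, mirrored - base, stretched - base)
```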

## 4. Quantitative Study

This section presents the results of numerical calculations of η chosen to validate the identification of η as the entropy of a discrete real variable and to demonstrate the utility of Ξ as a probe of deterministic structure in sequences of real numbers or non-numeric symbols. For convenience this introductory study is restricted to scenarios where $D = 1$; the conclusions are readily generalized to higher-dimensional spaces. In each calculation of η reported here, the period-averaged value of $-\tau\log(\tau)$ was obtained from $\lfloor 25\kappa_0/\kappa_* \rfloor$ samples, where $\kappa_* = 2\pi/(x_n - x_1)$ represents the finest periodicity in τ. Note that $x_n - x_1$ is always the largest positive interval within a one-dimensional $x$. All computations were precise to 15 significant figures. The accuracy, in comparison with the "true" value of η produced by an infinite number of samples, is always better than $10^{-6}\,\eta$.
The first sets of calculations examined below involve the basic types of $x$ and $p$ introduced in Section 1, matched in various instructive combinations. Each $p_r$ was constructed from n different real numbers with 15 significant figures selected randomly from $(0, 1)$, and subsequently normalized. Each $x_r$ was constructed by randomly selecting n different integers from $[1, 100n]$. This produces approximately the same effect as using the full range available with 15 significant figures, but dramatically reduces processing requirements. For each calculation involving $p_r$, a corresponding calculation involving $p_a$ was performed, and each $p_a$ was constructed by sorting the elements of the corresponding $p_r$. Finally, each reported quantity of η involving a randomly generated parameter is the average of 100 separate calculations with different randomizations.
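The constructions of $p_r$, $p_a$, $x_r$ and $x_e$ described above might be sketched as follows. Python's double-precision floats stand in for the 15-significant-figure random numbers, and the seed, function names and n are hypothetical. The averaging over 100 randomizations would simply wrap these constructors in a loop.

```python
import random

def make_p_r(n, rng):
    """p_r: n distinct random reals from (0, 1), normalized to unity, in random order."""
    values, seen = [], set()
    while len(values) < n:
        v = rng.random()                 # uniform on [0, 1)
        if v > 0.0 and v not in seen:    # keep values strictly positive and distinct
            seen.add(v)
            values.append(v)
    total = sum(values)
    return [v / total for v in values]

def make_p_a(p_r):
    """p_a: the elements of a given p_r listed in ascending order."""
    return sorted(p_r)

def make_x_r(n, rng):
    """x_r: n distinct integers drawn at random from [1, 100 n], in ascending order."""
    return sorted(rng.sample(range(1, 100 * n + 1), n))

def make_x_e(n, gamma=1):
    """x_e: n equally spaced points with spacing gamma."""
    return [gamma * j for j in range(n)]

rng = random.Random(2012)                # hypothetical seed for one randomization
p_r = make_p_r(10, rng)
p_a = make_p_a(p_r)
x_r = make_x_r(10, rng)
print(x_r, p_a[0], p_a[-1])
```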
Let us begin by examining how η measures indeterminacy effected through different types of $p$. Figure 1 displays calculations of $\eta(p_e, x_e)$, $\eta(p_a, x_e)$ and $\eta(p_r, x_e)$ plotted as functions of n. Recall that $\eta(p_r, x_e)$ and $\eta(p_a, x_e)$ should behave respectively as quantities of entropy attributed to the variables $X_{re}$ and $X_{ae}$ examined in Section 1. We find that $\eta(p_r, x_e)$ is appreciably greater than $\eta(p_a, x_e)$ for all but trivially small n, which is consistent with the expected result. Because the only difference between each $p_a$ and $p_r$ is the order of their respective terms, the differences between the associated quantities of entropy are entirely due to the contribution from Ξ.
Perhaps more importantly, and in opposition to the behavior of H alone, we find that $\eta(p_r, x_e)$ is also appreciably greater than $\eta(p_e, x_e)$ for all non-trivial n. Among generic discrete variables any non-uniformity in $p$ is a benefit to predictability, hence $H(p_r) \le H(p_e)$. Among real-valued discrete variables, however, the predictive benefit of non-uniformities in $p$ depends on the manner in which the probabilities are spatially assigned. In the case of $X_{re}$, the indeterminacy effected by the irregular spatial assignment of $p_r$ evidently outweighs the intrinsic predictive benefit of $p_r$, which implies that $X_{ee}$ behaves more deterministically than $X_{re}$. Though not necessarily expected a priori, that implication is not surprising.
Figure 1. Plots of $\eta(p_e, x_e)$, $\eta(p_a, x_e)$ and $\eta(p_r, x_e)$ using the respective symbols "•", "◃" and "+".
As a complement to the previous scenario, it is instructive to compare the effects of the three types of probabilities when matched to spatially randomized outcomes. Figure 2 displays calculations of $\eta(p_e, x_r)$, $\eta(p_a, x_r)$ and $\eta(p_r, x_r)$ plotted as functions of n. Note that $\eta(p_a, x_r)$ and $\eta(p_r, x_r)$ nearly coincide for each n, and the symbols "⋄" and "*" are always superimposed in Figure 2. The entropy difference $\eta(p_r, x_r) - \eta(p_a, x_r)$ is shown in Figure 3. In contrast to the previous scenario, $p_e$ consistently effects the largest entropy among the three, which is expected from Equation (52) given that $H(p_e)$ is maximal. Note that $\eta(p_e, x_r)$ would be even closer to $H(p_e)$ had the randomness of $x_r$ not been restricted for the sake of computational feasibility. The similarity between $\eta(p_r, x_r)$ and $\eta(p_a, x_r)$ is also expected from Equation (52), given that $H(p_r) = H(p_a)$. Furthermore, insofar as $\eta(p_r, x_r)$ and $\eta(p_a, x_r)$ differ, the former should typically be greater for sufficiently large n, which is confirmed in Figure 3.
Let us next examine calculations demonstrating how η measures indeterminacy effected through $x$, which is perhaps the most significant attribute of η. The most basic comparison corresponds to the scenario involving $X_{ee}$ and $X_{er}$ in Section 1. It is immediately evident from Figure 1 and Figure 2 that $\eta(p_e, x_r)$ is considerably larger than $\eta(p_e, x_e)$ for all n, as expected.
Figure 2. Plots of $\eta(p_e, x_r)$, $\eta(p_a, x_r)$ and $\eta(p_r, x_r)$, using the respective symbols "×", "⋄" and "*". The solid line follows a plot of $\log(n)$ for reference.
Figure 3. The entropy difference Δη plotted here is $\eta(p_r, x_r) - \eta(p_a, x_r)$.
In order to test the sensitivity of η to more subtle differences in spatial structure, the following additional sets of type $\mathbf{x}$ are defined. For a given n, let $\mathbf{x}_p$ consist of the first n prime numbers. It follows from the Prime Number Theorem (PNT) that the local average gap in the vicinity of a given $x_j \in \mathbf{x}_p$ varies as $\log(j)$ for sufficiently large j. Throughout, the term gap refers to a positive interval between consecutive terms in any given one-dimensional $\mathbf{x}$, and the j-th gap is defined as
$g_j \equiv x_{j+1} - x_j$
The global anisotropy associated with the PNT constitutes an intrinsic deterministic spatial characteristic. For a more pronounced anisotropy of the same kind, let $\acute{\mathbf{x}}_p = \{2, 3, \ldots\}$ be defined for a given n such that its sequence of gaps consists of the first n − 1 prime gaps arranged in ascending order. For example, for n = 6 we have $\acute{\mathbf{x}}_p = \{2, 3, 5, 7, 9, 13\}$. Let $\tilde{\mathbf{x}}_p = \{2, 3, \ldots\}$ be defined for a given n such that its sequence of gaps consists of the first n − 1 prime gaps ordered randomly, except that the first gap in $\tilde{\mathbf{x}}_p$ is always 1. Finally, let $\breve{\mathbf{x}}_p = \{2, 3, \ldots\}$ be defined such that its sequence of gaps is $g'_1, g'_3, g'_2, g'_5, g'_4, \ldots$, where $g'_j$ is the j-th prime gap. In other words, the sequence of gaps in $\breve{\mathbf{x}}_p$ is constructed simply by swapping $g'_i$ and $g'_{i+1}$ for every even i no greater than n − 2. Each swap stays within one pair of gaps, so the sum of the first j − 1 gaps is unchanged whenever j is even. Consequently every even-numbered element of $\breve{\mathbf{x}}_p$ is guaranteed to be identical to the corresponding prime. For example, $\breve{\mathbf{x}}_p = \{2, 3, 5, 7, 9, 13, 15, 19\}$ for n = 8. Note that the average gap in the vicinity of some $x_j \in \breve{\mathbf{x}}_p$ differs only negligibly from the corresponding average gap in the primes. Therefore $\breve{\mathbf{x}}_p$ exhibits the same global anisotropy inherent to $\mathbf{x}_p$.
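The four sets of outcomes can be generated directly from the gap definition. The following is a minimal Python sketch, not the author's code. The function names and the random seed are illustrative. Primes are found by plain trial division, which is adequate for modest n. "Ordered randomly" is assumed to mean a uniform shuffle of every gap after the first. The sketch reproduces the two worked examples in the text.

```python
import random


def first_primes(n):
    """Return the first n primes by trial division (adequate for the modest n used here)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % q for q in primes if q * q <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def gaps(x):
    """Gap sequence g_j = x_{j+1} - x_j of an ordered sequence."""
    return [b - a for a, b in zip(x, x[1:])]


def from_gaps(start, g):
    """Rebuild a sequence from its first term and its gap sequence."""
    x = [start]
    for gj in g:
        x.append(x[-1] + gj)
    return x


def prime_variants(n, seed=0):
    """Return (x_p, x_acute, x_tilde, x_breve) for a given n > 1, all starting at 2."""
    x_p = first_primes(n)
    g = gaps(x_p)                           # the first n - 1 prime gaps g'_1 .. g'_{n-1}

    x_acute = from_gaps(2, sorted(g))       # gaps in ascending order

    rest = g[1:]
    random.Random(seed).shuffle(rest)       # random order, but the first gap stays 1
    x_tilde = from_gaps(2, [g[0]] + rest)

    swapped = g[:]
    for i in range(2, n - 1, 2):            # 1-based even i <= n - 2: swap g'_i and g'_{i+1}
        swapped[i - 1], swapped[i] = swapped[i], swapped[i - 1]
    x_breve = from_gaps(2, swapped)

    return x_p, x_acute, x_tilde, x_breve


print(prime_variants(6)[1])   # [2, 3, 5, 7, 9, 13]
print(prime_variants(8)[3])   # [2, 3, 5, 7, 9, 13, 15, 19]
```

Changing `seed` gives a different realization of $\tilde{\mathbf{x}}_p$. The other three sets are fully determined by n.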
Consider four variables with $n \gg 1$ allowed outcomes $\mathbf{x}_p$, $\acute{\mathbf{x}}_p$, $\tilde{\mathbf{x}}_p$ and $\breve{\mathbf{x}}_p$, respectively, all of which are equally probable. As $\tilde{\mathbf{x}}_p$ is the least deterministically configured among the four, $\eta(\mathbf{p}_e, \tilde{\mathbf{x}}_p)$ should be the largest. Furthermore, given that the sequence of gaps in $\acute{\mathbf{x}}_p$ ascends uniformly, the deterministic nature of the associated anisotropy is significantly greater than in $\mathbf{x}_p$ and $\breve{\mathbf{x}}_p$. We therefore expect $\eta(\mathbf{p}_e, \acute{\mathbf{x}}_p)$ to be the smallest entropy among the four. Figure 4 shows $\eta(\mathbf{p}_e, \mathbf{x}_p)$, $\eta(\mathbf{p}_e, \acute{\mathbf{x}}_p)$, $\eta(\mathbf{p}_e, \tilde{\mathbf{x}}_p)$ and $\eta(\mathbf{p}_e, \breve{\mathbf{x}}_p)$ plotted as functions of n. For n smaller than approximately 200, $\eta(\mathbf{p}_e, \tilde{\mathbf{x}}_p)$ and $\eta(\mathbf{p}_e, \breve{\mathbf{x}}_p)$ differ only negligibly. For all larger n, the entropy due to $\tilde{\mathbf{x}}_p$ is the greatest among the four. The small but non-negligible difference between $\eta(\mathbf{p}_e, \tilde{\mathbf{x}}_p)$ and $\eta(\mathbf{p}_e, \breve{\mathbf{x}}_p)$ for large n is attributed to the global anisotropy in $\breve{\mathbf{x}}_p$. The entropy due to $\acute{\mathbf{x}}_p$ is the smallest among the four. The behavior of η in these calculations is therefore consistent with the stated expectations.
The dramatic difference between $\eta(\mathbf{p}_e, \mathbf{x}_p)$ and $\eta(\mathbf{p}_e, \breve{\mathbf{x}}_p)$ is noteworthy and unexpected. The smaller entropy due to $\mathbf{x}_p$ implies a greater degree of intrinsic determinacy in the arrangement of $\mathbf{x}_p$ along the real axis. Note that the implied deterministic structure cannot be attributed to the PNT, since $\breve{\mathbf{x}}_p$ shares the same global anisotropy. A detailed analysis of this finding is beyond the present scope and has become the subject of a separate investigation [12]. For the present purposes it is sufficient to mention a fact the reader may readily verify. Among the first $n \gg 1$ primes, the number of positive intervals of a given size d is strongly correlated with the largest primorial factor of d [12]. That regularity could represent the deterministic property intimated by $\eta(\mathbf{p}_e, \mathbf{x}_p)$.
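The regularity mentioned above can be tabulated directly. This minimal Python sketch makes two assumptions. First, "positive intervals" is taken to mean all pairwise differences $x_k - x_j$ with $k > j$, not only consecutive gaps. Second, the largest primorial factor of d is taken to be the largest product $2 \cdot 3 \cdot 5 \cdots q$ of consecutive primes dividing d, which is 1 for odd d. The function names are illustrative. The sketch only prints the two quantities side by side. It does not reproduce the analysis of [12], and no particular counts are claimed here.

```python
from collections import Counter
from itertools import islice


def primes():
    """Yield the primes 2, 3, 5, ... indefinitely, by trial division."""
    found, candidate = [], 2
    while True:
        if all(candidate % q for q in found if q * q <= candidate):
            found.append(candidate)
            yield candidate
        candidate += 1


def interval_counts(x):
    """Count every positive interval x_k - x_j (k > j) of an ascending sequence by its size d."""
    counts = Counter()
    for j, a in enumerate(x):
        for b in x[j + 1:]:
            counts[b - a] += 1
    return counts


def largest_primorial_factor(d):
    """Largest primorial 2*3*5*...*q dividing the positive integer d (1 when d is odd)."""
    best, primorial = 1, 1
    for q in primes():
        primorial *= q
        if d % primorial:        # happens at the latest once primorial exceeds d
            return best
        best = primorial


x_p = list(islice(primes(), 1000))          # the first 1000 primes
counts = interval_counts(x_p)
for d in (2, 4, 6, 8, 10, 12, 30, 60, 210):
    print(d, counts[d], largest_primorial_factor(d))
```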
Figure 4. Plots of $\eta(\mathbf{p}_e, \mathbf{x}_p)$, $\eta(\mathbf{p}_e, \acute{\mathbf{x}}_p)$, $\eta(\mathbf{p}_e, \breve{\mathbf{x}}_p)$ and $\eta(\mathbf{p}_e, \tilde{\mathbf{x}}_p)$, shown with “•”, “◃”, “⋄” and “*”, respectively. The plots measure intrinsic structure in the primes $\mathbf{x}_p$ and in three variants constructed by changing the order of the first $n - 2$ even prime gaps.

## 5. Summary and Conclusions

For finite ν, the entropy attributed to the variable $X_X$ governed by $\rho_X(\nu, x)$ is simply $H(\rho_X)$. In the limit as $\nu \to \infty$, however, $X_X$ becomes a discrete variable X whose behavior is insensitive to the function $\rho_0$ from which $\rho_X$ is constructed. The entropy attributed asymptotically to X is therefore $H(\rho_X) - H(\rho_0)$, which recovers the discrete Shannon entropy. Similarly, when ν is finite, a quantity of entropy $S(\psi_X)$ is attributed to the conjugate pair $X_X$ and $K_X$. The nature of the pair, however, is asymptotically insensitive to $\phi_0$. The entropy attributed to the pair in the limit as ν diverges is therefore $S(\psi_X) - S(\phi_0)$, whose asymptotic form $H + \Xi$ is defined as η. Furthermore, because the conjugate is asymptotically eliminated, η must be attributed to X alone.
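The first limit can be illustrated numerically. This section does not restate the specific construction of $\rho_X(\nu, x)$ from $\rho_0$, so the Python sketch below makes a hypothetical choice. $\rho_X$ is taken to be a mixture of copies of a Gaussian $\rho_0$ of width σ, centered on the outcomes $x_j$ and weighted by $p_j$, with σ standing in for $1/\nu$. Under that assumption, $H(\rho_X) - H(\rho_0)$ equals the mutual information between the outcome index and the continuous value. That quantity approaches $H(\mathbf{p})$ as the copies stop overlapping. Differential entropies are computed with a midpoint rule. The outcomes and probabilities are illustrative.

```python
import math


def gaussian(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def differential_entropy(density, lo, hi, steps):
    """-integral of rho*log(rho) over [lo, hi] by the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        r = density(lo + (i + 0.5) * dx)
        if r > 0.0:                      # 0*log(0) is taken as 0
            total -= r * math.log(r) * dx
    return total


def entropy_excess(outcomes, probs, sigma, per_sigma=20):
    """H(rho_X) - H(rho_0), assuming (hypothetically) rho_X = sum_j p_j * rho_0(x - x_j)
    with rho_0 a Gaussian of width sigma."""
    def rho_x(x):
        return sum(p * gaussian(x, xj, sigma) for xj, p in zip(outcomes, probs))

    def rho_0(x):
        return gaussian(x, 0.0, sigma)

    lo, hi = min(outcomes) - 10 * sigma, max(outcomes) + 10 * sigma
    h_x = differential_entropy(rho_x, lo, hi, int((hi - lo) / sigma * per_sigma))
    h_0 = differential_entropy(rho_0, -10 * sigma, 10 * sigma, 20 * per_sigma)
    return h_x - h_0


x = [0.0, 1.0, 2.5, 4.0]                  # illustrative outcomes
p = [0.1, 0.2, 0.3, 0.4]                  # illustrative probabilities
shannon = -sum(pj * math.log(pj) for pj in p)
for sigma in (0.5, 0.1, 0.01):            # sigma plays the role of 1/nu (an assumption)
    print(sigma, entropy_excess(x, p, sigma), shannon)
```

With σ = 0.5 the copies overlap, and the printed difference falls visibly below $H(\mathbf{p})$. At σ = 0.01 the two agree to within numerical error. The sketch illustrates only the limit that yields H. It does not construct $\psi_X$ or evaluate Ξ.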
As H is a function only of $\mathbf{p}$, some $\mathbf{x}$-dependent contribution to the entropy of a discrete real variable is expected to exist independently of the introduction of Ξ. The form of H uniquely exhibits the properties required for a self-consistent measure of indeterminacy effected by $\mathbf{p}$ [1]. By contrast, the complete set of requirements for a measure of indeterminacy effected by $\mathbf{x}$ and $\mathbf{p}$ is not readily ascertained a priori. Those required properties certainly include translational invariance, scale invariance, rotational invariance, non-negativity and being bounded from above by H, all of which were proven for η in Section 3. Furthermore, the calculations presented in Section 4 demonstrate that η behaves in the expected manner over a broad range of different configurations of $\mathbf{x}$ and $\mathbf{p}$.
The somewhat counterintuitive role of Ξ is a natural consequence of the relationship between a given wave-mechanical probability density and the corresponding spectral density. More broadly, the wave-mechanical origin of η may be understood as a consequence of a fundamental unity between information theory and quantum theory. Such an interpretation is well motivated given the profound connections between the two that are already known. For instance, the bounds on S lead to an uncertainty relation that is stronger than the most general form of the Heisenberg Uncertainty Principle [2]. Even more strikingly, the basic precepts of quantum theory have been derived from informational principles [13]. It is important to emphasize, however, that the validity of η does not depend on any quantum-mechanical premise.
The distinctive feature of η is its dependence on $\mathbf{x}$, which is necessary for a complete measure of indeterminacy in discrete real variables. Because its form is general, η can readily be applied to probe intrinsic organization in sequences of real numbers. Section 4 presented calculations demonstrating this novel capability of η. Future investigations will explore additional applications of η.

## Acknowledgments

This work was funded by the Naval Innovative Science and Engineering (NISE) Program under USA PUBLIC LAW 110-417, SECTION 219. The author is also grateful to the anonymous reviewers for their helpful suggestions.

## References

1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379.
2. Bialynicki-Birula, I.; Mycielski, J. Uncertainty relations for information entropy in wave mechanics. Commun. Math. Phys. 1975, 44, 129–132.
3. Gadre, S.R.; Bendale, R.D. Rigorous relationships among quantum-mechanical kinetic energy and atomic information entropies: Upper and lower bounds. Phys. Rev. A 1987, 36, 1932–1935.
4. Lalazissis, G.A.; Massen, S.E.; Panos, C.P.; Dimitrova, S.S. Information entropy as a measure of the quality of a nuclear density distribution. Int. J. Mod. Phys. E 1998, 7, 485–494.
5. Guevara, N.L.; Sagar, R.P.; Esquivel, R.O. Shannon-information entropy sum as a correlation measure in atomic systems. Phys. Rev. A 2003, 67, 012507.
6. Massen, S.E. Application of information entropy to nuclei. Phys. Rev. C 2003, 67, 014314.
7. Shi, Q.; Kais, S. Discontinuity of Shannon information entropy for two-electron atoms. J. Chem. Phys. 2004, 309, 127–131.
8. Bialynicki-Birula, I.; Rudnicki, L. Entropic uncertainty relations in quantum physics. In Statistical Complexity; Sen, K.D., Ed.; Springer: Berlin, Heidelberg, Germany, 2007; pp. 1–34.
9. Ghosh, S.K.; Berkowitz, M.; Parr, R.G. Transcription of ground-state density-functional theory into a local thermodynamics. Proc. Natl. Acad. Sci. USA 1984, 81, 8028.
10. Kullback, S.; Leibler, R.A. On information and sufficiency. Annals Math. Stat. 1951, 22, 79–86.
11. Feynman, R.P. Theory of Fundamental Processes; W. A. Benjamin, Inc.: New York, NY, USA, 1962; pp. 1–6.
12. Cartwright, C.; Funkhouser, S.; Sengupta, D.; Williams, B. Periodicity in the intervals between primes. Am. J. Comput. Math. 2012. submitted for publication.
13. Chiribella, G.; D’Ariano, G.M.; Perinotti, P. Informational derivation of quantum theory. Phys. Rev. A 2011, 84, 012311.

Funkhouser, S. The Entropy of a Discrete Real Variable. Entropy 2012, 14, 1522-1538. https://doi.org/10.3390/e14081522
