The Entropy of a Discrete Real Variable

Scott Funkhouser

SPAWAR Systems Center Atlantic, Joint Base Charleston, North Charleston, SC 29406, USA
Entropy 2012, 14(8), 1522-1538; https://doi.org/10.3390/e14081522
Submission received: 12 June 2012 / Revised: 3 August 2012 / Accepted: 6 August 2012 / Published: 17 August 2012

Abstract

The discrete Shannon entropy H was formulated only to measure indeterminacy effected through a set of probabilities, but the indeterminacy in a real-valued discrete variable depends on both the allowed outcomes x and the corresponding probabilities p. A fundamental measure that is sensitive to both x and p is derived here from the total differential entropy of a continuous real variable and its conjugate in the discrete limit, where the conjugate is universally eliminated. The asymptotic differential entropy recovers H plus the new measure, named Ξ, which provides a novel probe of intrinsic organization in sequences of real numbers.

1. Introduction

Let Y be a discrete variable with n > 1 generic outcomes y = {y_1, …, y_n} and corresponding probabilities p = {p_1, …, p_n} such that
\sum_{j=1}^{n} p_j = 1 \qquad (1)
Expressed in terms of the natural logarithm, the Shannon entropy attributed to Y in connection with p is [1]
H(\mathbf{p}) = -\sum_{j=1}^{n} p_j \log(p_j) \qquad (2)
For convenience let
H_2(\mathbf{p}) \equiv H(\mathbf{p}) / \log(2) \qquad (3)
represent the original form defined in terms of the base-2 logarithm. H = H(p) is bounded according to
0 \le H \le \log(n) \qquad (4)
The maximum is generated only when all outcomes are equally probable. H(p) vanishes in the limit as some p_j ∈ p approaches unity, which represents a completely predetermined outcome.
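A direct numerical check of Equations (2)–(4) is straightforward. The short Python sketch below computes H(p) in nats and the base-2 form H_2(p) for a uniform and a sharply peaked distribution; the particular distributions are illustrative only.

```python
import math

def shannon_entropy(p):
    """Discrete Shannon entropy H(p) in nats, Equation (2)."""
    assert abs(sum(p) - 1.0) < 1e-12, "probabilities must sum to unity"
    return -sum(pj * math.log(pj) for pj in p if pj > 0.0)

def shannon_entropy_bits(p):
    """Base-2 form H_2(p) = H(p)/log(2), Equation (3)."""
    return shannon_entropy(p) / math.log(2)

n = 8
uniform = [1.0 / n] * n             # equally probable outcomes
peaked = [0.93] + [0.01] * (n - 1)  # one outcome is nearly certain

# Equation (4): 0 <= H <= log(n), with the maximum reached by the uniform distribution.
print(shannon_entropy(uniform), math.log(n))  # identical
print(shannon_entropy(peaked))                # much smaller, approaching 0
print(shannon_entropy_bits(uniform))          # log2(8) = 3 bits
```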
Because it emerges naturally from the basic principles of encoded compression, the discrete Shannon entropy is widely recognized as an informatic measure. Specifically, consider a message m consisting of N ≫ 1 symbols from a generalized alphabet s = {s_1, …, s_a}. Let N_j represent the total number of occurrences of a particular symbol s_j ∈ s within m. In the limit as N becomes infinitely large suppose that each N_j/N approaches a certain constant w_j that is characteristic of the “language” in which m is composed. For sufficiently large N we therefore have N_j ≈ w_j N. The total number I of likely messages of length N is consequently fixed by w = {w_1, …, w_a} according to
I \approx \prod_{j=1}^{a} w_j^{-N_j} \qquad (5)
The right side of Equation (5) is simply the inverse of the probability for finding any one particular m among all messages of length N subject to w. The inventory of likely messages could be encoded as I different integers. The number β of bits required to register the encoded inventory is given by β = log_2(I), and we therefore have the average compression rate
\beta / N \approx H_2(\mathbf{w}) \qquad (6)
Notwithstanding its significance, Equation (6) represents only a particular example of the expansive utility of the discrete Shannon entropy. Interpreted most broadly, and in accordance with the purpose for which it was derived, H ( p ) measures the indeterminacy in a random selection from among n different options with corresponding probabilities p [1]. It is therefore appropriate to attribute a quantity of entropy H ( p ) to a discrete variable with probabilities p .
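The compression rate in Equation (6) is easy to illustrate numerically. The sketch below is a rough demonstration rather than a coding-theoretic proof: it draws a long pseudo-random message from a hypothetical four-symbol alphabet with fixed weights w, computes H_2(w) from the empirical frequencies N_j/N, and compares it with the bits per symbol achieved by an off-the-shelf compressor (zlib) and by a naive fixed-length code. The alphabet, weights, and message length are arbitrary choices.

```python
import math
import random
import zlib
from collections import Counter

random.seed(0)

alphabet = ["A", "B", "C", "D"]   # hypothetical alphabet s
w = [0.50, 0.25, 0.15, 0.10]      # characteristic weights w

N = 200_000
m = "".join(random.choices(alphabet, weights=w, k=N))

# Empirical frequencies N_j / N and the base-2 entropy H_2(w) of Equation (6).
counts = Counter(m)
H2 = -sum((counts[s] / N) * math.log2(counts[s] / N) for s in alphabet)

# Bits per symbol actually achieved by a general-purpose compressor.
zlib_rate = 8 * len(zlib.compress(m.encode("ascii"), 9)) / N

print(f"H_2(w)            = {H2:.4f} bits/symbol")
print(f"zlib, level 9     = {zlib_rate:.4f} bits/symbol")   # near, but not below, H_2(w)
print(f"fixed-length code = {math.log2(len(alphabet)):.4f} bits/symbol")
```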
Consider a variable Y whose n ≫ 1 allowed outcomes y are governed by some p such that H(p) is near log(n). The nearly maximal entropy implies that Y exhibits no significant statistical preference toward any particular outcome or group of outcomes among y. Consequently an ideal observer who studied the behavior of Y could develop a successful model only to predict the most trivial details about future outcomes. Stated alternatively, the degree to which Y behaves deterministically is minimal. In contrast suppose that H(p) is much smaller than log(n). The nearly minimal entropy implies a strong statistical preference for some comparatively small number of y_j ∈ y. As such Y would exhibit an appreciable degree of deterministic behavior. A successful, non-trivial predictive model could therefore be developed from observations of Y.
It is important to emphasize that H is a rigorous measure of indeterminacy in a variable only insofar as the indeterminacy depends on p alone. Let X be a discrete variable whose n allowed outcomes x = {x_1, …, x_n} are D-dimensional real vectors, i.e., x_j ∈ ℝ^D for j = 1, …, n. The elements of every one-dimensional x are implicitly listed in ascending order throughout the following. Analogously to the real nature of p, the real nature of x endows the outcomes of X with definite spatial attributes in which non-trivial deterministic behaviors could manifest. Consequently, as illustrated in the following two examples, H is not generally sufficient as a measure of indeterminacy in real-valued discrete variables.
Consider two variables X_ee and X_er whose outcomes are confined to the real axis. Let the allowed outcomes of X_ee be equally probable and equally spaced, hence the subscript “ee”, and let the allowed outcomes of X_er be equally probable but randomly spaced, hence the subscript “er”. More formally, the allowed outcomes of X_ee are of the form x_e, which is defined for D = 1 to be a set of n equally spaced points on the real axis. Let γ represent the arbitrary spacing in a given x_e. For convenience we also define p_e ≡ {1/n, …, 1/n} for a given n to represent a set of equal probabilities. The allowed outcomes of X_er are of the form x_r, which is defined for D = 1 to be indistinguishable from a set of n points distributed randomly over some simple finite segment of the real axis. The intrinsic spatial structure of x_e effects a commensurate degree of deterministic behavior in X_ee. For instance, the difference between successive outcomes of X_ee is always an integer multiple of γ, and is therefore significantly more predictable than the difference between successive outcomes of X_er. Because X_ee and X_er differ only in the spatial arrangement of their allowed outcomes, X_ee would behave more deterministically. By definition a smaller quantity of entropy should therefore be attributed to X_ee. The discrete Shannon entropy attributed to both variables, however, would be identically maximal.
Consider next a complementary scenario in which the allowed outcomes are identically valued but the respective probabilities differ. Let X_re and X_ae be two variables with n ≫ 1 one-dimensional allowed outcomes of the form x_e. The set of probabilities governing X_re is of the form p_r, which is defined generally to be indistinguishable from n randomly selected positive real numbers, subsequently normalized to unity and listed in random order. The spatial assignment of p_r through x_e would be globally isotropic but locally irregular, despite the intrinsic structure of x_e. The outcomes of X_ae are matched to a set of probabilities of the form p_a, which is defined generally to be indistinguishable from n randomly selected positive real numbers, subsequently normalized to unity and listed in ascending order. The ordered arrangement of p_a effects a non-trivial deterministic bias favoring progressively larger outcomes in X_ae. Note that the specific form of p_a in this example could be constructed simply by sorting the elements of p_r, in which case X_ae would behave more deterministically but H(p_r) and H(p_a) would be identical.
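For concreteness, the following sketch constructs finite instances of the sets introduced above (x_e, x_r, p_e, p_r, p_a); the values of n, the spacing γ, and the integer range used for x_r are arbitrary choices. The final lines make the difficulty explicit: H is identical for p_r and p_a, and it is blind to the spatial arrangement of the outcomes altogether.

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma = 50, 3.0

# x_e: n equally spaced points with arbitrary spacing gamma (outcomes of X_ee, X_re and X_ae).
x_e = gamma * np.arange(1, n + 1, dtype=float)

# x_r: n distinct points placed randomly on a finite segment of the real axis (outcomes of X_er);
# integer values are used here, as in the quantitative study of Section 4.
x_r = np.sort(rng.choice(np.arange(1, 100 * n + 1), size=n, replace=False)).astype(float)

# p_e: equal probabilities; p_r: random probabilities in random order;
# p_a: the same random probabilities listed in ascending order.
p_e = np.full(n, 1.0 / n)
p_r = rng.random(n)
p_r /= p_r.sum()
p_a = np.sort(p_r)

H = lambda p: -(p @ np.log(p))   # Equation (2)
print(H(p_e), np.log(n))         # maximal
print(H(p_r), H(p_a))            # equal (up to rounding), despite the sorted arrangement of p_a
```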
The problems raised in the previous two paragraphs do not imply any flaw in H. The discrete Shannon entropy was formulated only to measure indeterminacy effected by a given p in a generic discrete variable. The net indeterminacy in a real-valued discrete variable, however, depends on the respective intrinsic characteristics of x and p , and on the manner in which x and p are matched. A special measure is therefore required for discrete real variables. Note that Equation (6) is insensitive to the nature of s , and the present considerations are therefore irrelevant in the context of the coding theorems. To be precise, the role of H 2 in Equation (6) is not truly as a measure of indeterminacy.
Although it included no special provisions for discrete real variables, Shannon’s seminal paper introduced a separate formulation for the entropy of a continuous real variable. Let X be a probabilistically determined variable whose allowed outcomes span a continuum of real vectors within some region 𝕏 ⊆ ℝ^D. Let the probability density ρ(x) be defined such that ρ(x) dx gives the probability for an outcome of X to lie within an infinitesimal element dx centered on x, where x ∈ ℝ^D is an independent variable and dx = dx^(1) ⋯ dx^(D) represents implicitly the product of the differentials of each vector component of x = (x^(1), …, x^(D)). We therefore require ρ = ρ(x) to vanish for all x ∉ 𝕏. The normalization of ρ is accordingly
\int_{\mathbb{R}^D} \rho \, d\mathbf{x} = \int_{\mathbb{X}} \rho \, d\mathbf{x} = 1 \qquad (7)
Henceforth any integration is understood to span ℝ^D in the absence of explicit limits. The entropy ℋ = ℋ(ρ) attributed to X in connection with ρ is [1]
\mathcal{H} = -\int \rho \log(\rho) \, d\mathbf{x} \qquad (8)
where ρ log ( ρ ) is defined to vanish for ρ = 0 . The measure in Equation (8) is known as the differential (Shannon) entropy.
In contrast to H, a single quantity of ℋ is subject to no finite bounds. Consequently a simple comparison between H and ℋ is not meaningful. For instance, the differential entropy of a normalized Gaussian
\rho_G = \rho_G(\nu, \mathbf{x}) \equiv \left( \nu / \sqrt{\pi} \right)^{D} e^{-\nu^2 x^2} \qquad (9)
is D log(√(πe)/ν). For ν equal to √(πe) the entropy vanishes, which signifies a predetermined outcome in the context of discrete real variables. Furthermore in the limit as ν becomes infinitely large ρ_G behaves as a D-dimensional delta-distribution δ(x), which describes a single predetermined real outcome. ℋ(ρ_G), however, decreases without bound as ν approaches infinity.
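The behavior is easy to reproduce numerically. The sketch below evaluates Equation (8) on a grid for the one-dimensional Gaussian of Equation (9) and compares the result with the closed form D log(√(πe)/ν); the grid extent and resolution are arbitrary.

```python
import numpy as np

def differential_entropy_1d(rho, dx):
    """Grid estimate of -integral of rho*log(rho) dx, Equation (8) with D = 1."""
    integrand = np.where(rho > 0.0, rho * np.log(rho), 0.0)
    return -np.sum(integrand) * dx

x = np.linspace(-30.0, 30.0, 400_001)
dx = x[1] - x[0]

for nu in (0.5, np.sqrt(np.pi * np.e), 10.0):
    rho_G = (nu / np.sqrt(np.pi)) * np.exp(-(nu * x) ** 2)   # Equation (9), D = 1
    numeric = differential_entropy_1d(rho_G, dx)
    analytic = np.log(np.sqrt(np.pi * np.e) / nu)            # D log(sqrt(pi e)/nu), D = 1
    print(f"nu = {nu:6.3f}:  numeric = {numeric:+.6f}   analytic = {analytic:+.6f}")
```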
The apparent inconsistencies between H and ℋ are reconciled in the following manner. As a generalization of ρ_G let ρ_0 = ρ_0(ν, x) be some well-behaved, unity-normalized function that becomes δ(x) in the limit as the positive, real parameter ν becomes infinitely large. In association with an arbitrary X, with n allowed outcomes x and corresponding probabilities p, let the probability density ρ_X = ρ_X(ν, x) be defined such that
\rho_X(\nu, \mathbf{x}) \sim B \sum_{j=1}^{n} p_j \, \rho_0(\nu, \mathbf{x} - \mathbf{x}_j) \qquad (10)
and thus
\lim_{\nu \to \infty} \rho_X = \sum_{j=1}^{n} p_j \, \delta(\mathbf{x} - \mathbf{x}_j) \qquad (11)
where B = B(ν) is the appropriate normalization coefficient for a given ν, and x_j and p_j span x and p respectively. The symbol “∼” is used in Equation (10) and throughout, where convenient, to signify asymptotic equality as ν approaches infinity. The right side of Equation (11) describes the probabilistic behavior of X. Consider an originally continuous test variable X_X whose outcomes are governed by ρ_X. The entropy attributed to X_X is ℋ(ρ_X), which behaves asymptotically as
\mathcal{H}(\rho_X) \sim -B \sum_{j=1}^{n} p_j \int \rho_0 \log(B \, p_j \, \rho_0) \, d\mathbf{x} \qquad (12)
When ν is finite but very large the behavior of X_X is a hybrid of the continuous traits of ρ_0 and the discrete traits of x and p. In the limit as ν → ∞ the two sets of traits become completely disassociated from one another, and X_X = X is characterized only by x and p. The entropy associated with the probabilistic behavior of X should therefore be given by the asymptote of ℋ(ρ_X) − ℋ(ρ_0). From Equation (12) we readily obtain
\lim_{\nu \to \infty} \left[ \mathcal{H}(\rho_X) - \mathcal{H}(\rho_0) \right] = -\sum_{j=1}^{n} p_j \log(p_j) \qquad (13)
thereby recovering H from ℋ.
Although the analysis leading to Equation (13) may be instructive, ℋ(ρ_X) is asymptotically independent of x and could not produce the desired new measure for discrete real variables. It is reasonable to suspect that some additional quantity of the form ℋ could be considered in conjunction with ℋ(ρ_X) to provide the required asymptotic sensitivity to both x and p. In order to be meaningful, however, the additional measure should be intrinsically related to ρ_X.
Within the paradigm of wave mechanics two distinct quantities of differential entropy are naturally associated with a given ρ. More precisely, two separate probability density functions are naturally associated with a given probability amplitude ψ = ψ ( x ) , namely ρ = | ψ | 2 and the spectral density
\hat{\rho}(\mathbf{k}) = \left| \hat{\psi}(\mathbf{k}) \right|^2 \qquad (14)
where
\hat{\psi} = \hat{\psi}(\mathbf{k}) = \frac{1}{(2\pi)^{D/2}} \int \psi \, e^{-i \mathbf{k} \cdot \mathbf{x}} \, d\mathbf{x} \qquad (15)
is the Fourier-conjugate of ψ and k ∈ ℝ^D is an independent variable. We may interpret ρ̂ = ρ̂(k) as the probability density governing the outcomes of a variable K that is the conjugate of the variable X governed by ρ. The entropy attributed to K in connection with ρ̂ is ℋ(ρ̂). The total differential entropy attributed to the conjugated pair is therefore
S(\psi) \equiv \mathcal{H}(|\psi|^2) + \mathcal{H}(|\hat{\psi}|^2) \qquad (16)
A more formal derivation of S = S ( ψ ) follows from the total phase-space distribution [9]
f = f(\mathbf{x}, \mathbf{k}) \equiv \rho \, \hat{\rho} \qquad (17)
which is subject to the normalization
\int f \, d\mathbf{r} = 1 \qquad (18)
where r ≡ (x^(1), …, x^(D), k^(1), …, k^(D)) for any given x and k. Because f is simply a probability density in a 2D-dimensional vector space, it is appropriate to attribute a quantity of differential entropy
-\int f \log(f) \, d\mathbf{r} = S \qquad (19)
to the conjugated pair.
The total differential entropy S has been the subject of extensive research (see, for instance, [2,3,4,5,6,7,8]). It is worthwhile to mention here two general characteristics that distinguish S as a fundamentally important measure. Because f is dimensionless, S, like the relative entropy, is free from a potentially serious flaw to which the differential Shannon entropy is otherwise vulnerable [10]. Furthermore, whereas ℋ(ρ) and ℋ(ρ̂) may become infinitely negative, S is subject to the lower bound [2]
S \ge D \log(\pi e) \qquad (20)
Perhaps uniquely, a Gaussian amplitude generates the minimal S allowed by Equation (20) [2].
The purpose of this article is to demonstrate that a measure of indeterminacy for discrete real variables that is sensitive to both x and p emerges from the total differential entropy of X_X and its conjugate in the limit as ν diverges. Specifically, the conjugated analogue of Equation (13) includes an additional measure Ξ = Ξ(p, x), expressed most generally in terms of an integral, that behaves naturally as a measure of indeterminacy in the spatial configuration of x and p. Furthermore the conjugate is universally eliminated in the limit as X_X becomes discrete, and the entire residual entropy H + Ξ, named η, is therefore attributed to X. Section 2 contains a derivation of η. The general characteristics of η are examined in Section 3. Section 4 presents a quantitative study demonstrating the basic behaviors of η as a measure of indeterminacy for discrete real variables. The primary conclusions are summarized and interpreted in Section 5.

2. Derivation of η

The terms defined in Section 1 retain their definitions throughout the following. Let K_X be the conjugate of a given X_X. The probability density ρ̂_X = ρ̂_X(ν, k) governing the outcomes of K_X is formulated presently. Let ψ_X = ψ_X(ν, x) be defined such that ρ_X = |ψ_X|². The total differential entropy of the conjugated pair is thus S(ψ_X) = ℋ(ρ_X) + ℋ(ρ̂_X). This section contains a derivation of the asymptotic form of S(ψ_X) expressed analogously to Equation (13).
The first step is to formulate ψ_X. Following the standard wave-mechanical prescription, and in accordance with natural law, we attribute to each allowed outcome a separate amplitude [11]. The total amplitude is simply the sum of the individual amplitude waveforms multiplied by an appropriate normalization coefficient. Let the probability amplitude ϕ_0 = ϕ_0(ν, x) therefore be defined such that |ϕ_0|² = ρ_0, i.e.,
\lim_{\nu \to \infty} |\phi_0|^2 = \delta(\mathbf{x}) \qquad (21)
The waveform assigned to each x_j ∈ x is
\phi_j = \phi_j(\nu, \mathbf{x}) \equiv \sqrt{p_j} \, \phi_0(\nu, \mathbf{x} - \mathbf{x}_j) \qquad (22)
and the total amplitude is accordingly
\psi_X = A \sum_{j=1}^{n} \phi_j \qquad (23)
where A = A ( ν ) is defined such that ρ X is unity-normalized for all ν. Because the real part of ϕ 0 becomes ever more sharply peaked with increasing ν, the cross terms | A | 2 ϕ j ϕ l * in | ψ X | 2 are negligible for sufficiently large ν and vanish asymptotically. We therefore have
\rho_X \sim |A|^2 \sum_{j=1}^{n} |\phi_j|^2 \qquad (24)
which is equivalent to Equation (10). Note that
\lim_{\nu \to \infty} |A|^2 = \lim_{\nu \to \infty} B = 1 \qquad (25)
The next step is to determine the respective forms of the conjugate amplitude ψ ^ X = ψ ^ X ( ν , k ) and density ρ ^ X = | ψ ^ X | 2 . Let ϕ ^ 0 = ϕ ^ 0 ( ν , k ) represent the conjugate of ϕ 0 , and let the associated density | ϕ ^ 0 | 2 be defined as ρ ^ 0 = ρ ^ 0 ( ν , k ) . It follows from Equation (15), with x - x j substituted for x, that
\hat{\phi}_j = \sqrt{p_j} \, e^{-i \mathbf{k} \cdot \mathbf{x}_j} \, \hat{\phi}_0 \qquad (26)
where ϕ ^ j = ϕ ^ j ( ν , k ) is the conjugate of a given ϕ j . We therefore have
\hat{\psi}_X = A \, \hat{\phi}_0 \sum_{j=1}^{n} \sqrt{p_j} \, e^{-i \mathbf{k} \cdot \mathbf{x}_j} \qquad (27)
and thus
\hat{\rho}_X = |A|^2 \, \tau \, \hat{\rho}_0 \qquad (28)
where
\tau \equiv \left| \sum_{j=1}^{n} \sqrt{p_j} \, e^{-i \mathbf{k} \cdot \mathbf{x}_j} \right|^2 \qquad (29)
An equivalent expression for τ = τ ( k ) following from the Euler identity is
\tau = 1 + 2 \sum_{j=1}^{n-1} \sum_{l=j+1}^{n} \sqrt{p_j p_l} \cos\!\left( (\mathbf{x}_l - \mathbf{x}_j) \cdot \mathbf{k} \right) \qquad (30)
For convenience we define ζ = ζ ( k ) such that τ = 1 + 2 ζ .
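The equivalence of Equations (29) and (30) is easy to verify numerically. The sketch below evaluates both forms of τ for a small, arbitrary one-dimensional configuration of x and p; the specific values carry no significance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
x = np.sort(rng.choice(np.arange(1, 101), size=n, replace=False)).astype(float)  # D = 1 outcomes
p = rng.random(n)
p /= p.sum()

k = np.linspace(-10.0, 10.0, 2001)

# Equation (29): tau = |sum_j sqrt(p_j) exp(-i k x_j)|^2
tau_direct = np.abs(np.sqrt(p) @ np.exp(-1j * np.outer(x, k))) ** 2

# Equation (30): tau = 1 + 2 sum_{j<l} sqrt(p_j p_l) cos((x_l - x_j) k)
tau_euler = np.ones_like(k)
for j in range(n - 1):
    for l in range(j + 1, n):
        tau_euler += 2.0 * np.sqrt(p[j] * p[l]) * np.cos((x[l] - x[j]) * k)

print(np.max(np.abs(tau_direct - tau_euler)))   # agreement at machine precision
```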
The asymptotic behavior of ρ ^ X is ascertained readily from the explicit form of ϕ ^ 0 in Equation (15) with ϕ 0 expressed as ρ 0 / ϕ 0 * for all non-vanishing ϕ 0 * . It follows from the behavior ascribed to ρ 0 that
\hat{\phi}_0 \sim \frac{1}{(2\pi)^{D/2}} \, \frac{1}{\phi_0^*(0)} \qquad (31)
which is independent of k. Because ∫ρ̂_0 dk must remain unity, ϕ̂_0 must vanish everywhere asymptotically. As ρ_0 increasingly resembles δ(x) the real part of ϕ_0*(0) must either increase or decrease without bound, which ensures that the right side of Equation (31) vanishes. Consequently ψ̂_X and ρ̂_X vanish asymptotically for all k.
It is also worthwhile to note that, because ρ ^ X and ρ ^ 0 are both unity-normalized for all ν, Equations (28) and (30) imply
\lim_{\nu \to \infty} \langle \zeta \rangle_0 = 0 \qquad (32)
where
\langle \varepsilon \rangle_0 \equiv \int \hat{\rho}_0 \, \varepsilon \, d\mathbf{k} \qquad (33)
is the expected value of any ε = ε ( k ) averaged against ρ ^ 0 . Equation (32) may be understood as a consequence of the asymptotic uniformity of ρ ^ 0 in conjunction with the usual trigonometric property
\lim_{b \to \infty} \frac{1}{2b} \int_{-b}^{b} \cos(k) \, dk = 0 \qquad (34)
In the limit as ν diverges ⟨ζ⟩_0 becomes simply the average of ζ over all space, which we denote by ζ̄, and thus vanishes term-by-term in accordance with Equation (34). Expressed more generally,
\lim_{\nu \to \infty} \langle \varepsilon \rangle_0 = \bar{\varepsilon} \qquad (35)
Returning to the derivation proper, the next step is to evaluate ℋ(ρ̂_X) in the limit as ν approaches infinity. It follows from Equations (25) and (28) that
\mathcal{H}(\hat{\rho}_X) \sim -\langle \tau \log(\hat{\rho}_0) \rangle_0 - \langle \tau \log(\tau) \rangle_0 \qquad (36)
Substituting 1 + 2 ζ for τ in the left-most term on the right side of Equation (36) produces
\mathcal{H}(\hat{\rho}_X) \sim \mathcal{H}(\hat{\rho}_0) - 2 \langle \zeta \log(\hat{\rho}_0) \rangle_0 - \langle \tau \log(\tau) \rangle_0 \qquad (37)
Although log(ρ̂_0) decreases without bound it does so with asymptotic uniformity, hence
\langle \zeta \log(\hat{\rho}_0) \rangle_0 \sim \langle \zeta \rangle_0 \log\!\left( \hat{\rho}_0(\mathbf{k}_0) \right) \qquad (38)
for any k_0 ∈ ℝ^D. The middle term on the right side of Equation (37) is therefore asymptotically proportional to ζ̄ and thus vanishes. Equation (37) accordingly becomes
\mathcal{H}(\hat{\rho}_X) \sim \mathcal{H}(\hat{\rho}_0) - \langle \tau \log(\tau) \rangle_0 \qquad (39)
The asymptotic form of the right-most term in Equation (39) is
\Xi(\mathbf{p}, \mathbf{x}) \equiv -\overline{\tau \log(\tau)} \qquad (40)
Because τ is periodic, Ξ = Ξ(p, x) is identical to the average value of −τ log(τ) taken over some finite region K_0 ⊂ ℝ^D spanning one period in each dimension. We therefore have
\Xi = -\frac{1}{\kappa_0} \int_{K_0} \tau \log(\tau) \, d\mathbf{k} \qquad (41)
where
\kappa_0 \equiv \int_{K_0} d\mathbf{k} \qquad (42)
An analytical expression for ∫ τ log(τ) dk is possible only for a few rudimentary configurations of x and p, and Equation (41) is thus critical for quantifying Ξ.
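A minimal numerical implementation of Equation (41) for D = 1 is sketched below, under the simplifying assumption that the elements of x are integers, so that τ has period 2π and K_0 may be taken as [0, 2π). The sampling density used here is an arbitrary choice; the prescription actually used for the calculations of Section 4 is stated there.

```python
import numpy as np

def tau_1d(k, p, x):
    """Equation (29) specialized to D = 1."""
    amp = np.zeros_like(k, dtype=complex)
    for pj, xj in zip(p, x):
        amp += np.sqrt(pj) * np.exp(-1j * k * xj)
    return np.abs(amp) ** 2

def xi_1d(p, x, samples=200_000):
    """Equation (41) with K_0 = [0, 2*pi), valid for integer-valued x."""
    k = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    t = tau_1d(k, p, x)
    return -np.mean(np.where(t > 1e-300, t * np.log(t), 0.0))

def eta_1d(p, x):
    """Equation (45): eta = H(p) + Xi(p, x)."""
    return -(p @ np.log(p)) + xi_1d(p, x)

n = 40
p_e = np.full(n, 1.0 / n)
x_e = np.arange(1.0, n + 1.0)                         # equally spaced integer outcomes
rng = np.random.default_rng(3)
x_r = np.sort(rng.choice(np.arange(1, 100 * n + 1), size=n, replace=False)).astype(float)

print(xi_1d(p_e, x_e), xi_1d(p_e, x_r))    # both lie within [-H, 0]; see Equations (47) and (50)
print(eta_1d(p_e, x_e), eta_1d(p_e, x_r))  # eta(p_e, x_r) lies near log(n) = H(p_e)
print(np.log(n))
```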
Finally, Equation (39) may be expressed in terms of a proper limit as
\lim_{\nu \to \infty} \left[ \mathcal{H}(\hat{\rho}_X) - \mathcal{H}(\hat{\rho}_0) \right] = \Xi \qquad (43)
analogously to Equation (13). Combining Equations (13) and (43) produces the central result
\lim_{\nu \to \infty} \left[ S(\psi_X) - S(\phi_0) \right] = \eta(\mathbf{p}, \mathbf{x}) \qquad (44)
where
\eta(\mathbf{p}, \mathbf{x}) \equiv H(\mathbf{p}) + \Xi(\mathbf{p}, \mathbf{x}) \qquad (45)
The nature and behavior of η(p, x) are examined in the following sections.

3. General Characteristics of η

By definition S(ψ_X) measures the net indeterminacy in the conjugated pair X_X and K_X. In the limit as ν becomes infinitely large, however, the pair is characterized completely by x and p, independently of ϕ_0. The entropy attributed asymptotically to the pair is therefore η. Furthermore as X_X becomes discrete ρ̂_X vanishes everywhere and K_X therefore becomes a “ghost” variable, doomed to exist without ever generating an outcome. The indeterminacy measured by η must therefore be evident in the probabilistic behaviors of X. Because X and its ghost conjugate together are indistinguishable from X alone we conclude that η is a generally valid measure of indeterminacy in discrete real variables.
Given that Ξ is the asymptotic form of ℋ(ρ̂_X) − ℋ(ρ̂_0), its attribution to X is perhaps counterintuitive, despite the elimination of the conjugate. In particular ℋ(ρ) and ℋ(ρ̂) are typically anti-correlated among different ρ of a given generalized form. Recall, however, that ℋ(ρ_X) is asymptotically insensitive to x and therefore is not generally correlated to ℋ(ρ̂_X) in any appreciable manner among different ρ_X. Not only is ℋ(ρ̂_X) sensitive to both x and p, it is also naturally correlated to the intrinsic indeterminacy effected by the definite spatial attributes of x. Stated most generally, greater spatial regularity in the configuration of x and p effects greater harmonic regularity in the spatial arrangement of the peaks in ρ_X, for ν ≫ 1. Greater harmonic regularity in ρ_X implies greater localization in the associated Fourier spectrum, which accordingly generates a smaller ℋ(ρ̂_X), for a given ϕ_0. In order to isolate the effects associated with x and p we exclude the contribution from ℋ(ρ̂_0). In that manner Ξ measures the deterministic effects of real allowed outcomes.
Lower bounds on η and Ξ follow directly from the universal lower bound on S. Specifically, both S(ψ_X) and S(ϕ_0) are subject to Equation (20) for all ν. We therefore have
\eta \ge 0 \qquad (46)
and thus
\Xi \ge -H \qquad (47)
Regardless of the degree of spatial structure in the configuration of x and p , no variable could behave more deterministically than a fixed real number. Equation (46) is therefore consistent with the attribution of zero entropy to a predetermined outcome.
Upper bounds on η and Ξ may be obtained by writing Equation (41) as
\Xi = -\int_{K_0} \frac{\tau}{\kappa_0} \log\!\left( \frac{\tau}{\kappa_0} \right) d\mathbf{k} - \log(\kappa_0) \qquad (48)
where the right-most term has been reduced using
\bar{\tau} = \frac{1}{\kappa_0} \int_{K_0} \tau \, d\mathbf{k} = 1 \qquad (49)
which is a consequence of Equation (34). The integral on the right side of Equation (48) is simply the differential Shannon entropy associated with a normalized density τ/κ_0 defined over K_0. Furthermore log(κ_0) is the maximal differential Shannon entropy that could be associated with K_0. Equation (48) therefore implies
\Xi \le 0 \qquad (50)
hence
\eta \le H \qquad (51)
The upper bound on η is consistent with the basic expectations about the predictive utility of real outcomes. Consider some X that has been “disguised” by mapping the n allowed outcomes in x to a set of n generic outcomes, while keeping the corresponding probabilities p fixed. An ideal observer who studied only the generic outcomes could form a predictive model only to the extent allowed by p. Suppose that such an observer were subsequently allowed to study the real outcomes. The numeric details could never detract from the observer’s capability to develop a predictive model because, at the very least, the outcomes from any x could be interpreted simply as non-numeric symbols. The entropy attributed to X, therefore, should not be greater than the entropy attributed to the generic analogue, which is precisely what Equation (51) ensures.
For any p , H and η should be similar when the degree of deterministic structure within x is negligible. We therefore expect
\eta(\mathbf{p}, \mathbf{x}_r) \approx H(\mathbf{p}) \qquad (52)
which is validated in the following section. Among variables with spatially randomized outcomes, indeterminacy depends primarily on the intrinsic characteristics of p, hence Equation (52). Conversely, among variables with equally probable outcomes, indeterminacy depends only on x. In that manner η(p_e, x) measures the intrinsic indeterminacy in the spatial configuration of x. It is therefore fitting, as explicitly evident in Equation (30), that τ is sensitive to each of the n(n − 1)/2 uniquely defined positive intervals x_l − x_j within a given x.
Because there is no preferred region within ℝ^D, any meaningful measure of indeterminacy for discrete real variables must be invariant under spatial transformations of the form x + Δx ≡ {x_1 + Δx, …, x_n + Δx} for any finite Δx ∈ ℝ^D. By inspecting either Equation (29) or Equation (30) we readily find that τ is translationally invariant, hence
\eta(\mathbf{p}, \mathbf{x} + \Delta\mathbf{x}) = \eta(\mathbf{p}, \mathbf{x}) \qquad (53)
Similarly transformations of the form Rx ≡ {Rx_1, …, Rx_n}, where R is a D-dimensional unitary rotation matrix, could not change the indeterminacy in any X. Consequently the entropy attributed to X must be rotationally invariant as well. The effects of such a rotation would be equivalent to replacing the original τ(k) with τ′(k) = τ(kR), which we may write as τ′(k′) = τ(k) after the variable substitution k′ = kR⁻¹. The rotation therefore amounts to nothing more than a cosmetic change of variables, which does not affect the average value of τ log(τ), whether taken over all space or taken over a finite region spanning a period in each dimension of k. We therefore have
\eta(\mathbf{p}, R\mathbf{x}) = \eta(\mathbf{p}, \mathbf{x}) \qquad (54)
For the same reasons we also have
\eta(\mathbf{p}, \alpha\mathbf{x}) = \eta(\mathbf{p}, \mathbf{x}) \qquad (55)
for any non-vanishing α, where αx ≡ {αx_1, …, αx_n}. Contrary to Equation (55), suppose that the entropy of a real discrete variable were to increase under transformations of the form αx for |α| > 1. Consider applying such transformations successively to the allowed outcomes of some non-trivial X. Given that the entropy must never exceed H the effects of the transformations must become negligible at some point, which would imply the existence of some preferred spatial scale for a given x and p. The indeterminacy, however, could not be sensitive to such a scale. The scale-invariance expressed in Equation (55) is therefore necessary for any meaningful measure of indeterminacy in discrete real variables.
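The translational and scale invariances of Equations (53) and (55) amount to the statement that the period average of −τ log(τ) is unchanged by those transformations, and they can be checked directly. The self-contained sketch below (one dimension, integer-valued x, an arbitrary shift and an arbitrary scale factor) recovers the same Ξ in all three cases to within the sampling error.

```python
import numpy as np

def tau_1d(k, p, x):
    """Equation (29), D = 1."""
    amp = np.zeros_like(k, dtype=complex)
    for pj, xj in zip(p, x):
        amp += np.sqrt(pj) * np.exp(-1j * k * xj)
    return np.abs(amp) ** 2

def xi_1d(p, x, period, samples=200_000):
    """Equation (41): average of -tau log(tau) over one period of tau."""
    k = np.linspace(0.0, period, samples, endpoint=False)
    t = tau_1d(k, p, x)
    return -np.mean(np.where(t > 1e-300, t * np.log(t), 0.0))

rng = np.random.default_rng(4)
n = 30
x = np.sort(rng.choice(np.arange(1, 3001), size=n, replace=False)).astype(float)
p = rng.random(n)
p /= p.sum()

alpha, shift = 2.5, 137.0
original   = xi_1d(p, x,         period=2.0 * np.pi)          # integer x: period 2*pi
translated = xi_1d(p, x + shift, period=2.0 * np.pi)          # Equation (53)
rescaled   = xi_1d(p, alpha * x, period=2.0 * np.pi / alpha)  # Equation (55)
print(original, translated, rescaled)   # the three values agree to within sampling error
```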

4. Quantitative Study

This section presents the results of numerical calculations of η chosen to validate the identification of η as the entropy of a discrete real variable and to demonstrate the utility of Ξ as a probe of deterministic structure in sequences of real numbers or non-numeric symbols. For convenience this introductory study is restricted to scenarios where D = 1; the conclusions are readily generalized to higher-dimensional spaces. In each calculation of η reported here the period-averaged value of −τ log(τ) was obtained from 25⌊κ_0/κ_*⌋ samples, where κ_* = 2π/(x_n − x_1) represents the finest periodicity in τ. Note that x_n − x_1 is always the largest positive interval within a one-dimensional x. All computations were precise to 15 significant figures. The accuracy, in comparison to the “true” value of η produced by an infinite number of samples, is always better than 10⁻⁶ η.
The first sets of calculations examined below involve the basic types of x and p introduced in Section 1, matched in various instructive combinations. Each p r was constructed from n different real numbers with 15 significant figures selected randomly from ( 0 , 1 ) , and subsequently normalized. Each x r was constructed by randomly selecting n different integers from [ 1 , 100 n ] . This produces approximately the same effect as using the full range available with 15 significant figures, but dramatically reduces processing requirements. For each calculation involving p r a corresponding calculation involving p a is performed, and each p a is constructed by sorting the elements of the corresponding p r . Finally, each reported quantity of η involving a randomly generated parameter is the average of 100 separate calculations with different randomizations.
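A sketch of how such calculations might be organized is given below. It constructs p_r, p_a and x_r as just described, samples −τ log(τ) at 25⌊κ_0/κ_*⌋ points per period as specified above (taking K_0 = [0, 2π) for integer-valued outcomes), and averages η over several randomizations. The values of n and the number of trials are reduced here for speed and are not those used for the figures.

```python
import numpy as np

rng = np.random.default_rng(5)

def tau_1d(k, p, x):
    """Equation (29), D = 1."""
    amp = np.zeros_like(k, dtype=complex)
    for pj, xj in zip(p, x):
        amp += np.sqrt(pj) * np.exp(-1j * k * xj)
    return np.abs(amp) ** 2

def eta_1d(p, x):
    """Equation (45), with Xi sampled at 25*floor(kappa_0/kappa_*) points,
    kappa_* = 2*pi/(x_n - x_1), and K_0 = [0, 2*pi) for integer-valued x."""
    kappa_0 = 2.0 * np.pi
    kappa_star = 2.0 * np.pi / (x[-1] - x[0])
    samples = 25 * int(np.floor(kappa_0 / kappa_star))
    k = np.linspace(0.0, kappa_0, samples, endpoint=False)
    t = tau_1d(k, p, x)
    xi = -np.mean(np.where(t > 1e-300, t * np.log(t), 0.0))
    return -(p @ np.log(p)) + xi

def random_instance(n):
    """One randomization: p_r, its ascending counterpart p_a, and random integer outcomes x_r."""
    p_r = rng.random(n)
    p_r /= p_r.sum()
    x_r = np.sort(rng.choice(np.arange(1, 100 * n + 1), size=n, replace=False)).astype(float)
    return p_r, np.sort(p_r), x_r

n, trials = 50, 20                 # the article averages over 100 randomizations
x_e = np.arange(1.0, n + 1.0)
p_e = np.full(n, 1.0 / n)

rows = [(eta_1d(p_r, x_e), eta_1d(p_a, x_e), eta_1d(p_e, x_r))
        for p_r, p_a, x_r in (random_instance(n) for _ in range(trials))]
means = np.mean(rows, axis=0)
print("eta(p_r,x_e), eta(p_a,x_e), eta(p_e,x_r):", means, " log(n):", np.log(n))
```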
Let us begin by examining how η measures indeterminacy effected through different types of p. Figure 1 displays calculations of η(p_e, x_e), η(p_a, x_e) and η(p_r, x_e) plotted as functions of n. Recall that η(p_r, x_e) and η(p_a, x_e) should behave respectively as quantities of entropy attributed to the variables X_re and X_ae examined in Section 1. We find that η(p_r, x_e) is appreciably greater than η(p_a, x_e) for all but trivially small n, which is consistent with the expected result. Because the only difference between each p_a and p_r is the order of their respective terms, the differences between the associated quantities of entropy are entirely due to the contribution from Ξ.
Perhaps more importantly, and in opposition to the behavior of H alone, we find that η(p_r, x_e) is also appreciably greater than η(p_e, x_e) for all non-trivial n. Among generic discrete variables any non-uniformity in p is a benefit to predictability, hence H(p_r) ≤ H(p_e). Among real-valued discrete variables, however, the predictive benefit of non-uniformities in p depends on the manner in which the probabilities are spatially assigned. In the case of X_re the indeterminacy effected by the irregular spatial assignment of p_r evidently outweighs the intrinsic predictive benefit of p_r, which implies that X_ee behaves more deterministically than X_re. Though not necessarily expected a priori, that implication is not surprising.
Figure 1. Plots of η(p_e, x_e), η(p_a, x_e) and η(p_r, x_e) using the respective symbols “•”, “◃” and “+”.
As a complement to the previous scenario it is instructive to compare the effects of the three types of probabilities when matched to spatially randomized outcomes. Figure 2 displays calculations of η(p_e, x_r), η(p_a, x_r) and η(p_r, x_r) plotted as functions of n. Note that η(p_a, x_r) and η(p_r, x_r) nearly coincide for each n, and the symbols “⋄” and “*” are always superimposed in Figure 2. The entropy-difference η(p_r, x_r) − η(p_a, x_r) is shown in Figure 3. In contrast to the previous scenario p_e consistently effects the largest entropy among the three, which is expected from Equation (52) given that H(p_e) is maximal. Note that η(p_e, x_r) would be even closer to H(p_e) had the randomness of x_r not been restricted for the sake of computational feasibility. The similarity between η(p_r, x_r) and η(p_a, x_r) is also expected from Equation (52) given that H(p_r) = H(p_a). Furthermore insofar as η(p_r, x_r) and η(p_a, x_r) differ the former should be typically greater for sufficiently large n, which is confirmed in Figure 3.
Let us next examine calculations demonstrating how η measures indeterminacy effected through x, which is perhaps the most significant attribute of η. The most basic comparison corresponds to the scenario involving X_ee and X_er in Section 1. It is immediately evident from Figure 1 and Figure 2 that η(p_e, x_r) is considerably larger than η(p_e, x_e) for all n, which is expected.
Figure 2. Plots of η(p_e, x_r), η(p_a, x_r) and η(p_r, x_r), using the respective symbols “×”, “⋄” and “*”. The solid line follows a plot of log(n) for reference.
Figure 3. The entropy difference Δη plotted here is η(p_r, x_r) − η(p_a, x_r).
In order to test the sensitivity of η to more subtle differences in spatial structure the following additional sets of type x are defined. For a given n let x_p consist of the first n prime numbers. It follows from the Prime Number Theorem (PNT) that the local average gap in the vicinity of a given x_j ∈ x_p varies as log(j) for sufficiently large j. The term gap is used throughout in reference to a positive interval between consecutive terms in any given one-dimensional x and is defined such that the j-th gap is
g_j \equiv x_{j+1} - x_j \qquad (56)
The global anisotropy associated with the PNT constitutes an intrinsic deterministic spatial characteristic. For a more pronounced anisotropy of the same kind, let x́_p = {2, 3, …} be defined for a given n such that its sequence of gaps consists of the first n − 1 prime gaps arranged in ascending order. As an example, for n = 6 we have x́_p = {2, 3, 5, 7, 9, 13}. Let x̃_p = {2, 3, …} be defined for a given n such that its sequence of gaps consists of the first n − 1 prime gaps ordered randomly, with the exception that the first gap in x̃_p is always 1. Finally let x̆_p = {2, 3, …} be defined such that its sequence of gaps is g_1, g_3, g_2, g_5, g_4, …, where g_j is the j-th prime gap. In other words, the sequence of gaps in x̆_p is constructed simply by alternating the order of g_i and g_{i+1} for all even i no greater than n − 2. Consequently every even-numbered element of x̆_p is guaranteed to be identical to the corresponding prime. As an example we have x̆_p = {2, 3, 5, 7, 9, 13, 15, 19} for n = 8. Note that the average gap in the vicinity of some x_j ∈ x̆_p differs only negligibly from the corresponding average gap in the primes, and x̆_p therefore exhibits the same global anisotropy inherent to x_p.
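The four outcome sets are straightforward to generate programmatically. The sketch below builds x_p with a simple sieve and derives x́_p, x̃_p and x̆_p by reordering the prime gaps as described; the sieve bound and the random seed are arbitrary. The printed prefixes reproduce the examples quoted above ({2, 3, 5, 7, 9, 13} and {2, 3, 5, 7, 9, 13, 15, 19}).

```python
import numpy as np

def first_primes(n):
    """The first n primes via a simple sieve; the bound follows loosely from the PNT."""
    limit = max(15, int(n * (np.log(n) + np.log(np.log(max(n, 3)))) * 1.3))
    sieve = np.ones(limit + 1, dtype=bool)
    sieve[:2] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = False
    return np.flatnonzero(sieve)[:n].astype(float)

def prime_variants(n, seed=0):
    """Return (x_p, x'_p, x~_p, x˘_p) for a given n, as defined in the text."""
    x_p = first_primes(n)
    gaps = np.diff(x_p)                  # the first n-1 prime gaps, Equation (56)

    sorted_gaps = np.sort(gaps)          # x'_p: gaps in ascending order
    shuffled = gaps.copy()               # x~_p: gaps shuffled, first gap kept equal to 1
    np.random.default_rng(seed).shuffle(shuffled[1:])
    breve = gaps.copy()                  # x˘_p: swap (g_2, g_3), (g_4, g_5), ...
    for i in range(1, len(gaps) - 1, 2):
        breve[i], breve[i + 1] = breve[i + 1], breve[i]

    build = lambda g: np.concatenate(([2.0], 2.0 + np.cumsum(g)))
    return x_p, build(sorted_gaps), build(shuffled), build(breve)

print(prime_variants(6)[1])   # x'_p for n = 6: [ 2.  3.  5.  7.  9. 13.]
print(prime_variants(8)[3])   # x˘_p for n = 8: [ 2.  3.  5.  7.  9. 13. 15. 19.]
```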
Consider four variables with n ≫ 1 allowed outcomes x_p, x́_p, x̃_p and x̆_p, respectively, all of which are equally probable. As x̃_p is the least deterministically configured among the four, η(p_e, x̃_p) should be largest. Furthermore, given that the sequence of gaps in x́_p ascends uniformly, the deterministic nature of the associated anisotropy is significantly greater than in x_p and x̆_p. We therefore expect η(p_e, x́_p) to be the smallest entropy among the four. Figure 4 shows η(p_e, x_p), η(p_e, x́_p), η(p_e, x̃_p) and η(p_e, x̆_p) plotted as functions of n. Although η(p_e, x̃_p) and η(p_e, x̆_p) differ only negligibly for n smaller than approximately 200, for all larger n the entropy due to x̃_p is the greatest among the four. The small but non-negligible difference between η(p_e, x̃_p) and η(p_e, x̆_p) for large n is attributed to the global anisotropy in x̆_p. The entropy due to x́_p is the smallest among the four. The behavior of η in these calculations is therefore consistent with the stated expectations.
The dramatic difference between η(p_e, x_p) and η(p_e, x̆_p) is noteworthy and unexpected. The smaller entropy due to x_p implies a greater degree of intrinsic determinacy in the arrangement of x_p along the real axis. Note that the implied deterministic structure could not be associated with the PNT. A detailed analysis of this finding is beyond the present scope and has become the subject of a separate investigation [12]. For the present purposes it is sufficient to mention, as the reader may verify readily, that the number of positive intervals of a given size d among the first n ≫ 1 primes is strongly correlated to the largest primorial factor of d [12]. That regularity could represent the deterministic property intimated by η(p_e, x_p).
Figure 4. Plots of η(p_e, x_p), η(p_e, x́_p), η(p_e, x̆_p) and η(p_e, x̃_p), shown with “•”, “◃”, “⋄” and “*”, respectively, measuring intrinsic structure in the primes x_p and in three variants constructed by changing the order of the first n − 2 even prime gaps.

5. Summary and Conclusions

For finite ν the entropy attributed to the variable X_X governed by ρ_X(ν, x) is simply ℋ(ρ_X). In the limit as ν → ∞, however, X_X becomes a discrete variable X whose behavior is insensitive to the function ρ_0 from which ρ_X is constructed. The entropy attributed asymptotically to X is therefore ℋ(ρ_X) − ℋ(ρ_0), which recovers the discrete Shannon entropy. Similarly when ν is finite a quantity of entropy S(ψ_X) is attributed to the conjugated pair X_X and K_X. The nature of the pair, however, is asymptotically insensitive to ϕ_0. The entropy attributed to the pair in the limit as ν diverges is therefore S(ψ_X) − S(ϕ_0), whose asymptotic form H + Ξ is defined as η. Furthermore, because the conjugate is asymptotically eliminated, η must be attributed to X alone.
As H is a function only of p, the existence of some x-dependent contribution to the entropy of a discrete real variable is expected independently of the introduction of Ξ. Whereas the form of H uniquely exhibits the properties required for a self-consistent measure of indeterminacy effected by p [1], the complete set of requirements for a measure of indeterminacy effected by x and p is not readily ascertained a priori. Among those required properties are certainly translational invariance, scale-invariance, rotational invariance, non-negativity and being bounded from above by H, all of which were proven for η in Section 3. Furthermore the calculations presented in Section 4 demonstrate that η behaves in the expected manner over a broad range of different configurations of x and p.
The somewhat counterintuitive role of Ξ is a natural consequence of the relationship between a given wave-mechanical probability density and the corresponding spectral density. More broadly interpreted, the wave-mechanical origin of η may be understood as a natural consequence of a fundamental unity between information theory and quantum theory. Such an interpretation is well-motivated given the profound connections already known. For instance, the bounds on S lead to an uncertainty relation that is stronger than the most general form of the Heisenberg Uncertainty Principle [2]. Even more striking, the basic precepts of quantum theory have been derived from informatic principles [13]. It is important to specify, however, that the validity of η is contingent upon no quantum-mechanical premise.
The distinctive feature of η is its dependence on x, which is necessary for a complete measure of indeterminacy in discrete real variables. Because of the generality of its form, η can be readily manipulated to probe intrinsic organization in sequences of real numbers. Section 4 presented calculations demonstrating the novel capability of η in such applications. Future investigations will explore additional applications of η.

Acknowledgments

This work was funded by the Naval Innovative Science and Engineering (NISE) Program under USA PUBLIC LAW 110-417, SECTION 219. The author is also grateful to the anonymous reviewers for their helpful suggestions.

References

1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
2. Bialynicki-Birula, I.; Mycielski, J. Uncertainty relations for information entropy in wave mechanics. Commun. Math. Phys. 1975, 44, 129–132.
3. Gadre, S.R.; Bendale, R.D. Rigorous relationships among quantum-mechanical kinetic energy and atomic information entropies: Upper and lower bounds. Phys. Rev. A 1987, 36, 1932–1935.
4. Lalazissis, G.A.; Massen, S.E.; Panos, C.P.; Dimitrova, S.S. Information entropy as a measure of the quality of a nuclear density distribution. Int. J. Mod. Phys. E 1998, 7, 485–494.
5. Guevara, N.L.; Sagar, R.P.; Esquivel, R.O. Shannon-information entropy sum as a correlation measure in atomic systems. Phys. Rev. A 2003, 67, 012507.
6. Massen, S.E. Application of information entropy to nuclei. Phys. Rev. C 2003, 67, 014314.
7. Shi, Q.; Kais, S. Discontinuity of Shannon information entropy for two-electron atoms. Chem. Phys. 2004, 309, 127–131.
8. Bialynicki-Birula, I.; Rudnicki, L. Entropic uncertainty relations in quantum physics. In Statistical Complexity; Sen, K.D., Ed.; Springer: Berlin, Heidelberg, Germany, 2007; pp. 1–34.
9. Ghosh, S.K.; Berkowitz, M.; Parr, R.G. Transcription of ground-state density-functional theory into a local thermodynamics. Proc. Natl. Acad. Sci. USA 1984, 81, 8028.
10. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
11. Feynman, R.P. Theory of Fundamental Processes; W. A. Benjamin, Inc.: New York, NY, USA, 1962; pp. 1–6.
12. Cartwright, C.; Funkhouser, S.; Sengupta, D.; Williams, B. Periodicity in the intervals between primes. Am. J. Comput. Math. 2012, submitted for publication.
13. Chiribella, G.; D'Ariano, G.M.; Perinotti, P. Informational derivation of quantum theory. Phys. Rev. A 2011, 84, 012311.
