A Non-Negative Measure of Information for Continuous Probability Distributions

Machu, François Xavier; Cocks, Jeremy; Wang, Ru Julie; El Kaabouchi, Aziz; Zhu, Yueqing; Lhernault, Maryam; Wang, Qiuping Alexandre

doi:10.3390/math14132311


by François Xavier Machu ¹,*, Jeremy Cocks ¹, Ru Julie Wang ², Aziz El Kaabouchi ³, Yueqing Zhu ⁴, Maryam Lhernault ¹ and Qiuping Alexandre Wang ¹

¹ EsieaLab, Systèmes Complexes et Information Quantique, ESIEA, 9 Rue Vésale, 75005 Paris, France
² JOLIBRAIN, 77 Rue Pargaminières, 31000 Toulouse, France
³ ESTACA, Parc Universitaire Laval-Changé, Rue Georg, 53000 Laval, France
⁴ College of Mathematics and Computer Science, Wuhan Textile University, Wuhan 430079, China
* Author to whom correspondence should be addressed.

Mathematics 2026, 14(13), 2311; https://doi.org/10.3390/math14132311

Submission received: 27 April 2026 / Revised: 5 June 2026 / Accepted: 23 June 2026 / Published: 30 June 2026

(This article belongs to the Special Issue New Developments in Calculus of Variations)


Abstract

In this work, we investigate whether varentropy, an information measure previously proposed for discrete probability distributions, can serve as a measure of the probabilistic uncertainty of continuous probability distributions. We show that varentropy avoids the negative values and some other undesirable features of informational entropy that arise when the Boltzmann–Shannon formula (and others) is applied to continuous probability distributions.

Keywords:

information; entropy; differential entropy; measure of probabilistic uncertainty

MSC:

54C70; 82M30

1. Introduction

Entropy and information are among the most fundamental concepts in thermodynamics, statistical mechanics and information theory [1,2,3,4,5,6,7]. According to the common view, entropy and information (entropy for short in what follows) are two names for the same thing. They are a measure of disorder or statistical uncertainty associated with probability. The relationship between entropy and probability has been subject to rigorous mathematical study since the work of Shannon [2] and Khinchin [7]. Nevertheless, some confusion persists concerning the expression of entropy for continuous probability distributions [4,5,8].

The best known and most widely used statistical expression of informational entropy is a logarithmic functional of the probability distribution (Boltzmann–Shannon entropy). This form was first proposed by Boltzmann in his H-theorem [1] for a continuous distribution of particles, then used by Gibbs in his work on statistical mechanics [3] and by Shannon in his information theory [2]. An axiomatic derivation (or proof of the uniqueness) of the formula was given in [2] using discrete probability distributions. For a system having $W$ discrete microstates, each having probability $p_{i}$ ($i = 1, 2, \dots, W$), the Boltzmann–Shannon (BS) entropy $S$ is given by

$$S = - \sum_{i = 1}^{W} p_{i} \ln p_{i} \tag{1}$$

(We suppose here that the Boltzmann constant $k_{B} = 1$.) This information measure is never negative because $1 \geq p_{i} \geq 0$.

When the variable $x$ of the states becomes continuous, the entropy is sometimes called differential or continuous entropy and is given by [2,3,4,8]:

$$S = - \int \rho(x) \ln \rho(x)\,dx, \tag{2}$$

where $\rho(x)$ is the probability density giving the probability $dp(x) = \rho(x)\,dx$ of finding the system in the state interval between $x$ and $x + dx$, and the integral runs over the range of all possible states. The first use of this integral form dates back to Boltzmann [1]. Gibbs mentioned this formula in his book [3]. Shannon [2] likewise took Equation (2) for granted as an analogue of Equation (1), without giving a mathematical derivation or proof of its uniqueness. As far as we know, an axiomatic derivation of Equation (2), as was done for Equation (1) by Shannon [2] and Khinchin [7], is still missing to date.

In this work, we focus on one specific mathematical property of Equation (2): its sign. As mentioned above, in Equation (1) with a discrete probability distribution, $p_{i}$ is non-negative and no larger than 1, so $\ln p_{i} \leq 0$, which guarantees $S \geq 0$ [2]. However, for a continuous probability distribution as in Equation (2), $\rho(x)$ can be larger than 1, so that $\ln \rho(x) > 0$ is possible and the informational entropy can be negative. Reference [8] lists continuous entropies calculated from Equation (2) for many probability density distributions, most of which can be negative; the list includes common distributions such as the uniform, normal and exponential distributions. This negative entropy problem can also affect many other entropy measures proposed in different contexts as generalizations of the BS formula (see, for example, [9]). As all these formulas contain adjustable parameters and recover the BS formula when the parameters take specific values, they risk negative values with continuous probability distributions in the same way as Equation (2). For example, the continuous Tsallis entropy $S_{q} = \frac{1 - \int \rho^{q}(x)\,dx}{q - 1}$ [10] and the Rényi entropy $R_{q} = \frac{\ln \int \rho^{q}(x)\,dx}{1 - q}$ [11] are both negative when $\int \rho^{q}(x)\,dx > 1$ with $q > 1$, or when $\int \rho^{q}(x)\,dx < 1$ with $q < 1$; both situations can arise where $\rho > 1$. Most of the generalized continuous entropies containing the integral $\int \rho^{q}(x)\,dx$ (counterpart of $\sum_{i} p_{i}^{q}$) [12] can take negative values.
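
As a hedged illustration of how the sign of these generalized entropies depends on both the density and $q$, the sketch below evaluates the two formulas for a uniform density of width $w$, for which $\int \rho^{q}(x)\,dx = w^{1-q}$. The function names and the sampled values are illustrative assumptions, not part of the cited works.

```python
import math


def power_integral_uniform(width, q):
    """Integral of rho**q for the uniform density rho = 1/width."""
    return width ** (1.0 - q)


def tsallis_uniform(width, q):
    """Continuous Tsallis entropy (1 - I) / (q - 1), with I the power integral."""
    return (1.0 - power_integral_uniform(width, q)) / (q - 1.0)


def renyi_uniform(width, q):
    """Continuous Renyi entropy ln(I) / (1 - q), with I the power integral."""
    return math.log(power_integral_uniform(width, q)) / (1.0 - q)


# Narrow support (rho = 2 > 1) and wide support (rho = 0.5 < 1), q on both sides of 1.
for width in (0.5, 2.0):
    for q in (0.5, 2.0):
        print(width, q, tsallis_uniform(width, q), renyi_uniform(width, q))
```

For the narrow support both entropies come out negative for either value of $q$, while for the wide support they are positive.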

As is well known, thermodynamic entropy cannot be negative due to the third law of thermodynamics [6]. It may contain a constant in addition to the function of probability, as proposed by Boltzmann, Gibbs and Shannon [1,2,3]. It is this function, depending only on probability, that assumes the role of entropy as a measure of disorder or probabilistic uncertainty. Logically, this function is expected to be zero in a non-probabilistic situation, as is the case for all the different forms of entropy, generalized or not, for a discrete probability distribution in which one state has $p_{i} = 1$ and all other states have $p_{j \neq i} = 0$. Hence, the functions of probability of both thermodynamic entropy and informational entropy are expected to have a non-negative lower bound. This is not the case for the entropy functions of continuous distributions mentioned above. Many of them contradict the definition of entropy as an uncertainty measure and even lose the finite lower bound. For example, the continuous entropy of the uniform distribution $\rho(x) = \frac{1}{b - a}$ on the interval $a \leq x \leq b$ is $S = \ln(b - a)$, as given by Equation (2) [8,13]. We have $S = 0$ if $b - a = 1$, although the probabilistic uncertainty of such a distribution should not be zero. On the other hand, when $b \to a$, $\rho(x) \to \infty$, which can be considered equivalent to the Dirac delta function (see Section 3.1 below) with the normalization $\int_{-\infty}^{\infty} \rho(x)\,dx = 1$. The Dirac delta function is the continuous counterpart of the discrete probability distribution with $p_{a} = 1$ and all other $p_{x \neq a} = 0$, a situation where the probabilistic uncertainty should be zero according to Equation (1). However, Equation (2) yields $S = \ln(b - a) \to -\infty$. This loss of a lower bound implies that it is impossible to add a constant to $S$ to shift it to a positive domain, as proposed by Jaynes (see Section 4 below) [4]. A similar paradox occurs with other continuous entropies, for instance $S = 1 - \ln \lambda$ [8] for the exponential distribution $\rho(x) \propto e^{-\lambda x}$ ($0 \leq x < \infty$).
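
The loss of a lower bound can be made concrete with a minimal sketch that evaluates the two closed forms quoted above, $\ln(b - a)$ and $1 - \ln \lambda$, for increasingly concentrated densities. The function names and sample values are chosen here for illustration only.

```python
import math


def entropy_uniform(a, b):
    """Differential entropy ln(b - a) of the uniform density on [a, b]."""
    return math.log(b - a)


def entropy_exponential(lam):
    """Differential entropy 1 - ln(lam) of the density lam * exp(-lam * x), x >= 0."""
    return 1.0 - math.log(lam)


# Shrinking the support (or raising the rate) drives both entropies down without bound.
for width in (1.0, 1e-1, 1e-3, 1e-6):
    print("uniform width", width, "->", entropy_uniform(0.0, width))
for lam in (1.0, math.e, 1e2, 1e6):
    print("exponential rate", lam, "->", entropy_exponential(lam))
```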

These undesirable features of continuous entropy raise the question of whether Equation (2) is still suitable for measuring probabilistic uncertainty and whether it deserves the name entropy. Despite this unsettled situation, continuous entropies have been considered acceptable [8,13] and are widely used in many applications (see [11,14,15], for example), sometimes with their negative values [14]. Another way around the negativity issue is to use the relative entropy, or Kullback–Leibler divergence, instead of Equation (2) [13,14], although this does not resolve the issue of Equation (2) itself.

As is well known, the use of Equation (2) can be traced back to Boltzmann [1]. It was proposed by intuition rather than by a mathematical proof of the kind given for Equation (1) by Shannon [2]. From a mathematical point of view, Equation (1) cannot simply be replaced by Equation (2) because, when $p_{i}$ is replaced by its continuous counterpart $dp(x) = \rho\,dx$, a divergent term $-\ln dx \propto \ln W$ appears [4,5], which makes the replacement of $p_{i}$ by $\rho\,dx$ in Equation (1) questionable. Jaynes tried to avoid negative entropy using a continuous version of the BS entropy, $S = - \int_{1}^{\infty} \rho(x) \ln \frac{\rho(x)}{m(x)}\,dx + \ln W$, where $m(x)$ is called the invariant measure of the density of discrete values of $x$ [4,5]. The term $\ln W$, divergent when $W \to \infty$, was simply removed, giving a continuous entropy $S^{c} = S + \int_{1}^{\infty} \rho(x) \ln m(x)\,dx$, where $S$ on the right-hand side is that of Equation (2). According to Jaynes [4], it is possible to choose the invariant measure $m(x)$ in an appropriate (albeit ad hoc) way for $S^{c}$ to be positive.

In what follows, we present an alternative informational entropy called varentropy, which may help to avoid the negative values and other undesirable properties of continuous entropy. Varentropy is an extension of the fundamental equation of equilibrium thermodynamics $\delta U = T \delta S + \delta W$, where $\delta U$ is a variation of the internal energy $U$ in a reversible process, $\delta S$ a variation of the thermodynamic entropy, $T$ the temperature, and $\delta W$ the work done during the quasi-equilibrium process [6]. We show that varentropy, when extended to continuous probability distributions, allows us to avoid negative informational entropy in many cases where the BS formula yields negative values.

2. Definition of Varentropy

Varentropy was defined on the basis of the variational form of the entropy of the second law of thermodynamics [16,17]. This is a definition from scratch without any prerequisite or postulate about the property of entropy. The motivation was to look for a probabilistic uncertainty measure that has a sound physical background and is maximizable to generate probability distribution. This objective implies that this measure cannot be defined with a given formula because the maximization of a given functional can only yield one or two distributions. One just cannot maximize a given functional for any distribution. For example, the Shannon entropy in Equation (1), which has been widely used as a universal uncertainty measure for any probability distribution, can only be maximized for exponential or uniform distribution, depending on the constraints [4,5] and the Tsallis or Renyi entropy can only be maximized for q-exponential distributions [10,11]. Hence, this measure we looked for should be something more general and fundamental than the formulas of the different entropies [9,12]; more fundamental in that it should have some physical background relative to thermodynamic entropy; more general in that it can reproduce the different entropies for different distributions, more or less in similar ways as a differential equation of physical law generates different trajectories with different interactions, as has been shown in [16,17,18].

The idea of [16] was to start from the first law of equilibrium thermodynamics, $\delta U = T \delta S + \delta W$. As is well known, in classical statistical mechanics the internal energy is the average $\bar{E}$ of the energy $E_{i}$ over all microstates $i$, i.e., $U = \bar{E} = \sum_{i = 1}^{W} p_{i} E_{i}$, and the work done in an infinitesimal reversible process is the average of the energy change $\delta E_{i}$ of each state, $\delta W = \overline{\delta E} = \sum_{i = 1}^{W} p_{i}\,\delta E_{i}$, where $p_{i} = f(E_{i})$ is the probability of finding the system in the state $i = 1, 2, \dots, W$ having energy $E_{i}$. The statistical expression of the fundamental equation $\delta U = T \delta S + \delta W$ then becomes $\delta \sum_{i = 1}^{W} p_{i} E_{i} = T \delta S + \sum_{i = 1}^{W} p_{i}\,\delta E_{i}$, which gives the variation of entropy during the process as $\delta S = \frac{1}{T} \left( \delta \sum_{i = 1}^{W} p_{i} E_{i} - \sum_{i = 1}^{W} p_{i}\,\delta E_{i} \right) = \frac{1}{T} \left( \delta \bar{E} - \overline{\delta E} \right)$. Since $\delta (p_{i} E_{i}) = E_{i}\,\delta p_{i} + p_{i}\,\delta E_{i}$, this reduces to $\delta S = \frac{1}{T} \sum_{i = 1}^{W} E_{i}\,\delta p_{i}$. This variational expression of thermodynamic entropy, in terms of $\delta p_{i}$ multiplied by the random variable energy $E_{i}$, is thus a statistical form of the first law. For us, this expression reveals a general kinship between a measure of probabilistic uncertainty ($S$) and the related random variable ($E_{i}$) with its probability distribution $p_{i}$. This kinship is deeply hidden in the first and second laws of thermodynamics and can only be seen when these laws are expressed in statistical form as shown above. In a previous work [10], we extended this relationship to any single random variable $x_{i}$ with its probability distribution $p_{i} = f(x_{i})$ and defined varentropy $S_{V}$ in the variational form $\delta S_{V} = A \sum_{i = 1}^{W} x_{i}\,\delta p_{i}$. This extension is purely mathematical; $x_{i}$ can be any random variable (frequency of words, price, population, position, etc.). $S_{V}$ measures the uncertainty in $x_{i}$, and its functional form and properties (extensivity, additivity, concavity, etc.) are entirely determined by $x_{i}$ and $p_{i}$. $S_{V}$ is the thermodynamic entropy only when $x_{i}$ is the energy $E_{i}$.
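
The variational relation $\delta S = \frac{1}{T} \sum_{i} E_{i}\,\delta p_{i}$ can be checked numerically in a simple case. The sketch below assumes a hypothetical four-level system with fixed energies (so $\delta E_{i} = 0$ and no work is done) and Boltzmann probabilities $p_{i} \propto e^{-E_{i}/T}$; it compares the change of Equation (1) under a small temperature change with the variational expression. The energy levels, the temperature and all function names are assumptions made only for illustration.

```python
import math


def boltzmann(energies, temperature):
    """Normalized probabilities p_i proportional to exp(-E_i / T), with k_B = 1."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def shannon(p):
    """Equation (1): S = -sum p_i ln p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def entropy_change_pair(energies, temperature, d_temp=1e-5):
    """Return (finite-difference change of S, (1/T) * sum E_i * delta p_i)."""
    p0 = boltzmann(energies, temperature)
    p1 = boltzmann(energies, temperature + d_temp)
    direct = shannon(p1) - shannon(p0)
    variational = sum(e * (b - a) for e, a, b in zip(energies, p0, p1)) / temperature
    return direct, variational


print(entropy_change_pair([0.0, 1.0, 2.0, 3.0], 1.5))
```

The two numbers agree to first order in the temperature step, as expected for an exponential distribution whose varentropy is the BS entropy.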

It turns out that this variational definition of varentropy can generate many known entropies for discrete distributions. For example, $S_{V}$ has the form of the Boltzmann–Shannon entropy of Equation (1) when $p_{i}$ is an exponential distribution; it is the nonadditive Tsallis or additive Rényi entropy if $p_{i}$ is a q-exponential distribution; and it can take different forms for other distributions (power law, stretched exponential, Cauchy, Gauss, etc.), for example $S_{V} = - \frac{1 - \sum_{i} p_{i}^{1 - \frac{1}{\alpha}}}{1 - \frac{1}{\alpha}}$ (nonadditive) for a power law [16,17,18,19]. A remarkable property of varentropy is that each functional form of $S_{V}$ can be maximized using the variational calculus $\delta (S_{V} - A \bar{x}) = 0$ to generate its original distribution. For instance, maximizing Equation (1) generates the exponential distribution, maximizing the Tsallis entropy generates the q-exponential, and maximizing the power law varentropy $S_{V} = - \frac{1 - \sum_{i} p_{i}^{1 - \frac{1}{\alpha}}}{1 - \frac{1}{\alpha}}$ generates the power law $p_{i} \propto x_{i}^{-\alpha}$ ($\alpha$ positive) [16]. This property is the meaning of the statement “maximizable measure” and is intrinsic to varentropy, because it originates in the variational definition $\delta S = \frac{1}{T} \sum_{i = 1}^{W} E_{i}\,\delta p_{i}$, which is equivalent to writing $\delta S_{V} = A (\delta \bar{x} - \overline{\delta x})$ or $\delta (S_{V} - A \bar{x}) = - A\,\overline{\delta x}$. This means that the maximization of $S_{V}$ subject to the constraint of constant $\bar{x}$, i.e., $\delta (S_{V} - A \bar{x}) = 0$, implies $\overline{\delta x} = 0$. This expression can be better understood in the thermodynamic case where $x_{i}$ is the energy $E_{i}$ and $\overline{\delta x} = \overline{\delta E} = \sum_{i = 1}^{W} p_{i}\,\delta E_{i} = \delta W$, which is the work in the variational process, considered as the virtual work. In this way, $\delta W = 0$ can be understood as prescribed by the principle of virtual work, and it yields the maximum entropy calculus (Maxent principle) $\delta (S_{V} - A \bar{x}) = 0$. This kinship between two fundamental principles was studied in [20], with the conclusion that, in the case of thermodynamic entropy, Maxent with the calculus $\delta (S - A \bar{E}) = 0$ can be considered as a law of physics derived from the fundamental principle of virtual work.

In what follows, we extend this varentropy to continuous probability by replacing $p_{i} = f(x_{i})$ with $dp(x) = \rho(x)\,dx$. Considering $\bar{x} = \int x \rho(x)\,dx$ and $\overline{\delta x} = \int \delta x\,\rho(x)\,dx$, we arrive at

$$\delta S_{V} = A (\delta \bar{x} - \overline{\delta x}) = A \int x\,\delta \rho(x)\,dx, \tag{3}$$

where $A \in \mathbb{R}$ is a constant to be chosen according to the nature of $S_{V}$. For example, in a reversible thermodynamic process where $x = E$, $S_{V}$ is the thermodynamic entropy with $A = \frac{1}{T}$, as shown above; for the exponential decay distribution, we can choose $A = 1$ or $A = -1$, depending on the domain of $x$ in the considered distribution (see below). Note that, in the above discussion of the maximization of varentropy, $A$ also plays the role of the Lagrange multiplier used to generate the probability distribution. It is then natural to find different $A$ for different varentropies and distributions, reflecting the different physics or processes. It is worth stressing that an important role of $A$ is to guarantee that $S_{V}$ is positive, as required both for entropy and for any measure of statistical uncertainty, which is the main aim of this work.

Suppose $f$ is bijective on $[a, b]$, so that we can write $x = f^{-1}(\rho)$. Equation (3) becomes $\delta S_{V} = A \int_{a}^{b} f^{-1}[\rho(x)]\,\delta \rho(x)\,dx$. We then define the following function of the upper bound: $\forall x \in [a, b]$, $F[\rho(x)] = \int_{0}^{\rho(x)} f^{-1}(t)\,dt$. According to the second fundamental theorem of calculus, $F$ is differentiable with respect to $\rho(x)$, with $\frac{d}{d\rho(x)} F[\rho(x)] = f^{-1}[\rho(x)]$ for all $x \in [a, b]$, so that we can write

$$\delta S_{V} = A \int_{a}^{b} \left[ \frac{d}{d\rho(x)} \int_{0}^{\rho(x)} f^{-1}(t)\,dt \right] \delta \rho(x)\,dx = A \int_{a}^{b} \left[ \delta \int_{0}^{\rho(x)} f^{-1}(t)\,dt \right] dx = \delta \left\{ A \int_{a}^{b} \left[ \int_{0}^{\rho(x)} f^{-1}(t)\,dt \right] dx + C \right\},$$

which implies

$$S_{V} = A \int_{a}^{b} \left[ \int_{0}^{\rho(x)} f^{-1}(t)\,dt \right] dx + C, \tag{4}$$

where $C$ is a constant.

If $f$ is not bijective in $[a, b]$, we can cut the domain $[a, b]$ into $w$ sub-domains $[a_{k}, a_{k + 1}]$ with $k = 0, 1, 2, \dots, w - 1$ ($a_{0} = a$, $a_{w} = b$) in such a way that, in each sub-domain $k$, $f(x)$ is bijective and a local varentropy $S_{V}^{(k)}$ can be calculated using $S_{V}^{(k)} = A_{k} \int_{a_{k}}^{a_{k + 1}} \left[ \int_{0}^{\rho(x)} f^{-1}(t)\,dt \right] dx$. The total varentropy $S_{V}$ is then the sum $S_{V} = \sum_{k} S_{V}^{(k)}$ [19], or:

$$S_{V} = \sum_{k = 0}^{w - 1} A_{k} \int_{a_{k}}^{a_{k + 1}} \left[ \int_{0}^{\rho(x)} f^{-1}(t)\,dt \right] dx + C. \tag{5}$$

If $f(x)$ is not bijective in $[a, b]$ but is differentiable, one can use the relationships $\delta \rho(x) = f'(x)\,\delta x$ and $x f'(x)\,\delta x = \delta \int_{0}^{x} t f'(t)\,dt$ to rewrite the definition of varentropy as $\delta S_{V} = A \int x\,\delta \rho(x)\,dx = A \int_{a}^{b} x f'(x)\,\delta x\,dx = A \int_{a}^{b} \left[ \delta \int_{0}^{x} t f'(t)\,dt \right] dx$, which implies

$$S_{V} = A \int_{a}^{b} \left[ \int_{0}^{x} t f'(t)\,dt \right] dx + C, \tag{6}$$

which can be used for any differentiable distribution.

3. Examples of Varentropy

3.1. Uniform Distribution

The uniform distribution is given by

$$\rho(x) = f(x) = \begin{cases} \frac{1}{b - a}, & a \leq x \leq b \\ 0, & x < a \text{ or } x > b \end{cases}$$

which has the BS entropy $S = \ln(b - a)$. This entropy is zero when $b - a = 1$, becomes negative for $b - a < 1$ and tends to negative infinity when $b \to a$, instead of zero as expected for a deterministic case [13].

$S = \ln(b - a)$ has another issue related to the scale of $x$. Suppose $x$ is a coordinate distance (likewise in the following discussion). The logarithm of a dimensioned quantity is improper and leads to different values of entropy over the same distance measured in different units (for example, $S = 0$ over 1 m, but $S = \ln 100 \approx 4.61$ over 100 cm and $S < 0$ over 0.001 km). The origin of this paradox is the improper use of the logarithm $\ln \rho(x)$ in Equation (2), since $\rho(x)$ has the dimension $\mathrm{m}^{-1}$.
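
The unit dependence quoted above can be reproduced in a few lines. This is only an arithmetic illustration; the dictionary of unit conversions and the function name are introduced here for the example.

```python
import math


def uniform_entropy(length_in_unit):
    """ln(b - a), with the interval length expressed as a bare number in some unit."""
    return math.log(length_in_unit)


# The same physical distance of one metre, written in three different units.
lengths = {"m": 1.0, "cm": 100.0, "km": 0.001}
entropies = {unit: uniform_entropy(value) for unit, value in lengths.items()}

for unit, value in entropies.items():
    print(unit, value)
```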

To calculate the varentropy of this distribution, let us consider a linear distribution between $a$ and $b$,

$$\rho(x) = f(x) = \begin{cases} \frac{1}{Z} \left( s (x - a) + \frac{1}{b - a} \right), & a \leq x \leq b \\ 0, & x < a \text{ or } x > b \end{cases}$$

with a positive slope $s$, where the normalization constant is $Z = \int_{a}^{b} \left( s (x - a) + \frac{1}{b - a} \right) dx = \left[ \frac{s}{2} (x - a)^{2} + \frac{x}{b - a} \right]_{a}^{b} = \frac{s}{2} (b - a)^{2} + 1$. For simplicity, we use Equation (6) with $f'(x) = \frac{s}{Z}$, leading to

$$S_{V} = A \int_{a}^{b} \left[ \int_{0}^{x} t f'(t)\,dt \right] dx + C = A \int_{a}^{b} \left[ \frac{s}{Z} \int_{0}^{x} t\,dt \right] dx + C = \frac{s A}{6 Z} (b^{3} - a^{3}) + C.$$

This expression of $S_{V}$ does not have the dimensional problem, since the slope $s$ has the dimension $\mathrm{m}^{-2}$ and $A$ has the dimension $\mathrm{m}^{-1}$ by definition in Equation (3), yielding a dimensionless $S_{V}$. As $A$ is an arbitrary constant, let $A = 1/s$ and $C = 0$; we obtain $S_{V} = \frac{1}{6 Z} (b^{3} - a^{3})$, which becomes the varentropy of the uniform distribution when $s \to 0$ ($Z \to 1$):

$$S_{V} = \frac{1}{6} (b^{3} - a^{3}),$$

which is positive for $b > a$. $S_{V} \to 0$ when $b \to a$, which is natural for this non-probabilistic case (no uncertainty) where $\rho(x)$ is a Dirac delta function at $x = a$, i.e.,

$$\rho(x - a) = \begin{cases} \infty, & x = a \\ 0, & x \neq a \end{cases}$$

with the normalization $\int_{a}^{b} \rho(x - a)\,dx = 1$. This expression of $S_{V}$ can also be derived, with a somewhat longer computation, from Equation (4), which we will use for the following bijective distribution functions.
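
As a rough numerical check of the calculation above, the following sketch evaluates Equation (6) for the linear density with a midpoint rule, using $A = 1/s$ and $C = 0$, and compares the result with the closed form $(b^{3} - a^{3})/(6Z)$. The helper names and the grid size are assumptions made for illustration only.

```python
def midpoint(func, lo, hi, n=400):
    """Midpoint-rule estimate of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))


def normalization(a, b, s):
    """Z = s (b - a)^2 / 2 + 1 for the linear density on [a, b]."""
    return 0.5 * s * (b - a) ** 2 + 1.0


def varentropy_linear_numeric(a, b, s):
    """Equation (6) with A = 1/s and C = 0, evaluated numerically."""
    slope = s / normalization(a, b, s)  # f'(x) = s / Z

    def inner(x):
        # Inner integral of t f'(t) from 0 to x.
        return midpoint(lambda t: t * slope, 0.0, x)

    return (1.0 / s) * midpoint(inner, a, b)


def varentropy_linear_closed(a, b, s):
    """Closed form (b^3 - a^3) / (6 Z); tends to (b^3 - a^3) / 6 as s -> 0."""
    return (b ** 3 - a ** 3) / (6.0 * normalization(a, b, s))


print(varentropy_linear_numeric(1.0, 2.0, 0.5), varentropy_linear_closed(1.0, 2.0, 0.5))
```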

3.2. Exponential Distributions

The exponential distribution is given by $\rho(x) = \frac{1}{Z} e^{-\alpha x}$ for $0 < x < \infty$ and $\alpha > 0$, $Z = 1/\alpha$ being the normalization constant. Its BS entropy has been calculated and reads $S = 1 - \ln \alpha$ [6], which is negative whenever $\alpha > e$.

Now we calculate varentropy with Equation (4) using $f^{-1}(t) = \frac{\ln(Z t)}{-\alpha}$. It is straightforward to calculate $\int_{0}^{\rho(x)} \frac{\ln(Z t)}{-\alpha}\,dt = \frac{\rho(x) \ln(Z \rho(x)) - \rho(x)}{-\alpha}$, so $S_{V} = -\frac{A}{\alpha} \int_{0}^{\infty} \left[ \rho(x) \ln(Z \rho(x)) - \rho(x) \right] dx + C$. Let $C = -\frac{A}{\alpha} \int_{0}^{\infty} \rho(x)\,dx = -\frac{A}{\alpha}$ and $A = 1$, and we get

$$S_{V} = -\frac{1}{\alpha} \int_{0}^{\infty} \rho \ln(Z \rho)\,dx,$$

which is different from Equation (2). There is no problem of improper use of the logarithm because $Z \rho$ is dimensionless. The positive value of this expression can be seen by substituting $\rho(x) = \frac{1}{Z} e^{-\alpha x}$ into the integral to obtain $S_{V} = -\frac{1}{\alpha} \int_{0}^{\infty} \frac{1}{Z} e^{-\alpha x} (-\alpha x)\,dx = \frac{1}{\alpha} > 0$.
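
The result $S_{V} = 1/\alpha$ can be cross-checked by integrating $-\frac{1}{\alpha} \int_{0}^{\infty} \rho \ln(Z \rho)\,dx$ numerically. The sketch below uses a plain midpoint rule on a truncated domain; the truncation point, grid size and function names are assumptions of this example rather than part of the derivation.

```python
import math


def bs_entropy_exponential(alpha):
    """BS (differential) entropy 1 - ln(alpha) of the exponential density."""
    return 1.0 - math.log(alpha)


def varentropy_exponential(alpha, upper_multiple=60.0, n=20000):
    """Midpoint estimate of -(1/alpha) * integral of rho ln(Z rho) over [0, upper]."""
    z = 1.0 / alpha                      # normalization constant
    h = (upper_multiple / alpha) / n     # the tail beyond 60/alpha is negligible
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        rho = math.exp(-alpha * x) / z
        total += rho * math.log(z * rho)
    return -total * h / alpha


for alpha in (0.5, 2.0, 50.0):
    print(alpha, bs_entropy_exponential(alpha), varentropy_exponential(alpha))
```

For $\alpha = 50$ the BS entropy is negative while the numerical varentropy stays close to $1/\alpha = 0.02$.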

In order to show the role of the constant $A$ in guaranteeing a positive $S_{V}$, let us suppose an increasing exponential distribution $\rho(x) = \frac{1}{Z} e^{-\alpha x}$ for $-\infty < x \leq 0$ with $\alpha < 0$. After the same calculation as above, we reach $S_{V} = \frac{A}{\alpha}$. As $\alpha < 0$, we can choose $A = -1$ to have a positive entropy $S_{V} = -\frac{1}{\alpha}$.

3.3. Stretched Exponential Distribution

We consider the continuous stretched exponential distribution $\rho(x) = \frac{1}{Z} e^{-x^{\beta}}$ for $0 < x < \infty$ and $\beta > 0$, $Z$ being the normalization constant $Z = \int_{0}^{\infty} e^{-x^{\beta}}\,dx$. Let us first calculate its BS entropy $S$:

$$S = -\int_{0}^{\infty} \frac{1}{Z} e^{-x^{\beta}} \ln \left( \frac{1}{Z} e^{-x^{\beta}} \right) dx = \frac{\ln Z}{Z} \int_{0}^{\infty} e^{-x^{\beta}}\,dx + \frac{1}{Z} \int_{0}^{\infty} x^{\beta} e^{-x^{\beta}}\,dx.$$

Considering the change of variable $t = x^{\beta}$ and the definition of the Gamma function, we obtain $S = \frac{\ln Z}{Z \beta} \Gamma\left(\frac{1}{\beta}\right) + \frac{1}{Z \beta} \Gamma\left(\frac{1}{\beta} + 1\right)$, which becomes, with the identity $\Gamma(x + 1) = x \Gamma(x)$, $S = \frac{1}{Z \beta} \Gamma\left(\frac{1}{\beta}\right) \left[ \ln Z + \frac{1}{\beta} \right]$. Considering the normalization constant $Z = \frac{1}{\beta} \Gamma\left(\frac{1}{\beta}\right)$, we have:

$$S = \frac{1}{\beta} - \ln \beta + \ln \Gamma\left(\frac{1}{\beta}\right),$$

which is negative whenever $\ln \beta > \frac{1}{\beta} + \ln \Gamma\left(\frac{1}{\beta}\right)$.

Now let us turn to the varentropy of the continuous stretched exponential distribution, $\delta S_{V} = A \int_{0}^{\infty} x\,\delta \rho(x)\,dx$. Introducing the change of variable $x = t^{\frac{1}{\beta}}$, we have $\delta \rho(x) = \delta \left( \frac{1}{Z} e^{-t} \right) = -\frac{1}{Z} e^{-t}\,\delta t$ and

$$\delta S_{V} = -\frac{A}{Z} \int_{0}^{\infty} t^{\frac{1}{\beta}} e^{-t}\,\delta t\,dx = -\frac{A}{Z} \int_{0}^{\infty} \delta \left[ \int_{0}^{t} \tau^{\frac{1}{\beta}} e^{-\tau}\,d\tau + C \right] dx = -\frac{A}{Z}\,\delta \left[ \int_{0}^{\infty} \gamma\left(\frac{1}{\beta} + 1, t\right) dx + C \right],$$

where $C$ is a constant of integration and $\gamma\left(\frac{1}{\beta} + 1, t\right) = \int_{0}^{t} \tau^{\frac{1}{\beta}} e^{-\tau}\,d\tau$ is the lower incomplete gamma function. As $\rho(x)$ is bijective, $t = x^{\beta} = \ln \frac{1}{Z \rho(x)}$, and the varentropy reads

$$S_{V} = -\frac{A}{Z} \left[ \int_{0}^{\infty} \gamma\left(\frac{1}{\beta} + 1, \ln \frac{1}{Z \rho(x)}\right) dx + C \right].$$

Choosing $A = -1$ and $C = 0$, the stretched exponential varentropy reads

$$S_{V} = \frac{\beta}{\Gamma\left(\frac{1}{\beta}\right)} \int_{0}^{\infty} \gamma\left(\frac{1}{\beta} + 1, x^{\beta}\right) dx,$$

where we used $Z = \frac{1}{\beta} \Gamma\left(\frac{1}{\beta}\right)$, and $S_{V} \geq 0$ because $\beta > 0$, $\Gamma\left(\frac{1}{\beta}\right) > 0$, and $\gamma\left(\frac{1}{\beta} + 1, x^{\beta}\right) > 0$.

3.4. Continuous Normal Distribution

We consider the continuous normal distribution $\rho(x) = \frac{1}{Z} e^{-\frac{(x - \mu)^{2}}{2 \sigma^{2}}}$ for any $x \in \mathbb{R}$, where $Z = \sigma \sqrt{2 \pi}$ is the normalization constant, and $\mu$ is the mean. Its BS entropy has been calculated and reads $S = \ln(\sigma \sqrt{2 \pi e})$ [6], which is negative when $\sigma \sqrt{2 \pi e} < 1$.
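
The condition $\sigma \sqrt{2 \pi e} < 1$ corresponds to a threshold width of about $\sigma \approx 0.242$. A minimal sketch, with names chosen here for illustration, makes the sign change explicit:

```python
import math


def bs_entropy_normal(sigma):
    """BS (differential) entropy ln(sigma * sqrt(2 pi e)) of a normal density."""
    return math.log(sigma * math.sqrt(2.0 * math.pi * math.e))


# Width at which the entropy changes sign.
SIGMA_ZERO = 1.0 / math.sqrt(2.0 * math.pi * math.e)

for sigma in (0.1, SIGMA_ZERO, 1.0):
    print(sigma, bs_entropy_normal(sigma))
```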

Before calculating its varentropy, let us recall that the normal distribution is not a bijective function. According to Equation (5), we must separate the domain of $x$ into two parts, $-\infty < x - \mu \leq 0$ and $0 \leq x - \mu < \infty$, in each of which the normal distribution is bijective. Letting $y = x - \mu$ and $\rho(y) = \frac{1}{Z} e^{-\frac{y^{2}}{2 \sigma^{2}}}$, we have

$$\delta S_{V} = A \int_{-\infty}^{\infty} (y + \mu)\,\delta \rho(y)\,dy = A \int_{-\infty}^{0} y\,\delta \rho(y)\,dy + A \int_{0}^{\infty} y\,\delta \rho(y)\,dy + A \int_{-\infty}^{\infty} \mu\,\delta \rho(y)\,dy.$$

The third term is zero since $\int_{-\infty}^{\infty} \mu\,\delta \rho(y)\,dy = \mu\,\delta(1) = 0$ due to the normalization $\int_{-\infty}^{\infty} \rho(y)\,dy = 1$.

Now let us make the change of variable $t = \left( \frac{y}{\sqrt{2} \sigma} \right)^{2} \geq 0$, or $y = \pm \sqrt{2} \sigma t^{1/2}$, and write $\rho(t) = \frac{1}{Z} e^{-t}$. The first term of the varentropy is

$$A \int_{-\infty}^{0} y\,\delta \rho(y)\,dy = -A \int_{\infty}^{0} \sqrt{2} \sigma t^{\frac{1}{2}}\,\delta \left( \frac{1}{Z} e^{-t} \right) dy = -\frac{A}{Z}\,\delta \int_{0}^{\infty} \sqrt{2} \sigma\,\gamma\left(\frac{3}{2}, \ln \frac{1}{Z \rho(y)}\right) dy,$$

where we used $t = \ln \frac{1}{Z \rho(y)}$. The second term gives the same result as the first one, so $\delta S_{V} = -\frac{2 A}{\sqrt{\pi}}\,\delta \int_{0}^{\infty} \gamma\left(\frac{3}{2}, \ln \frac{1}{\sigma \sqrt{2 \pi} \rho(y)}\right) dy$, which implies $S_{V} = -\frac{2 A}{\sqrt{\pi}} \int_{0}^{\infty} \gamma\left(\frac{3}{2}, \ln \frac{1}{\sigma \sqrt{2 \pi} \rho(y)}\right) dy + C$. Letting $A = -1$ and $C = 0$, we get

$$S_{V} = \frac{2}{\sqrt{\pi}} \int_{0}^{\infty} \gamma\left(\frac{3}{2}, \frac{y^{2}}{2 \sigma^{2}}\right) dy,$$

which is always positive and independent of $\mu$.

3.5. Power Law Distribution

A typical example of a continuous power law is the Pareto law $\rho(x) = \frac{\beta}{x^{\beta + 1}}$ on $x \geq 1$, for $1 < \beta < \infty$. The continuous Boltzmann–Gibbs entropy $S_{BG}$, written $S$ here, is calculated as follows [6]:

$$S = -\int_{1}^{\infty} \rho(x) \ln \rho(x)\,dx = -\ln \beta + 1 + \frac{1}{\beta}.$$

This entropy is negative when $\ln \beta > 1 + \frac{1}{\beta}$, i.e., for $\beta > 3.59$.
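
The threshold $\beta \approx 3.59$ can be recovered by a simple bisection on the closed form $1 + \frac{1}{\beta} - \ln \beta$, which decreases monotonically in $\beta$. The bracketing interval, tolerance and function names below are choices made for this sketch.

```python
import math


def bs_entropy_pareto(beta):
    """Differential entropy 1 + 1/beta - ln(beta) of the Pareto density on x >= 1."""
    return 1.0 + 1.0 / beta - math.log(beta)


def zero_crossing(lo=1.0, hi=10.0, tol=1e-10):
    """Bisection for the beta at which the entropy changes sign (positive at lo, negative at hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_entropy_pareto(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(zero_crossing())
```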

Now let us calculate the varentropy for the power law distribution $\rho(x) = \frac{1}{Z} x^{-\frac{1}{b}}$. Since $x = (Z \rho)^{-b}$, the definition of varentropy gives

$$\delta S_{V} = A \int (Z \rho)^{-b}\,\delta \rho\,dx = A \int \delta \left( \frac{Z^{-b}}{1 - b} \rho^{1 - b} \right) dx = \delta \left\{ \frac{A}{1 - b} \int \left( Z^{-b} \rho^{1 - b} - \rho\,m \right) dx \right\}.$$

Letting $A = 1$, the continuous varentropy reads

$$S_{V} = \int \rho\,\frac{(Z \rho)^{-b} - m}{1 - b}\,dx,$$

where the function $m$ is such that $\int \rho\,m(x)\,dx = C$ is a constant of the variation, i.e., $\delta C = 0$.

For the Pareto PDF $\rho(x) = \frac{\beta}{x^{\beta + 1}}$ with $\beta = \frac{1}{b} - 1$, $x_{min} = 1$ and $x_{max} = \infty$, we obtain

$$S_{V} = \frac{\int_{1}^{\infty} \beta^{1 / (\beta + 1)} \rho^{\beta / (\beta + 1)}\,dx - C}{\beta / (\beta + 1)} = \frac{\beta + 1}{\beta} \left( \frac{\beta}{\beta - 1} - C \right).$$

Letting $C = 1$, we obtain

$$S_{V} = \frac{\beta + 1}{\beta (\beta - 1)},$$

which is always positive for $1 < \beta < \infty$, as plotted in Figure 1 as a function of $\beta$.
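
Since $\rho\,(Z \rho)^{-b} = x \rho(x)$ for this density, the integral in the expression above is the mean $\int_{1}^{\infty} x \rho(x)\,dx = \frac{\beta}{\beta - 1}$. The sketch below checks the closed form numerically under that reading, with $C = 1$. It substitutes $u = 1/x$ to map the domain onto $(0, 1]$ and uses a midpoint rule; this simple rule is only reliable for $\beta \geq 2$, and all names are illustrative assumptions.

```python
def mean_pareto(beta, n=100000):
    """Midpoint estimate of the mean of the Pareto density, via u = 1/x on (0, 1]."""
    h = 1.0 / n
    # After substitution, x * rho(x) dx becomes beta * u**(beta - 2) du.
    return h * sum(beta * ((k + 0.5) * h) ** (beta - 2.0) for k in range(n))


def varentropy_pareto_numeric(beta, c=1.0):
    """((beta + 1) / beta) * (mean - C), with the mean evaluated numerically."""
    return (beta + 1.0) / beta * (mean_pareto(beta) - c)


def varentropy_pareto_closed(beta):
    """Closed form (beta + 1) / (beta * (beta - 1)) for C = 1."""
    return (beta + 1.0) / (beta * (beta - 1.0))


for beta in (2.0, 3.0, 4.0):
    print(beta, varentropy_pareto_numeric(beta), varentropy_pareto_closed(beta))
```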

4. Discussion

To summarize, despite some undesirable features, the continuous entropies, as a heritage of the long history of statistical mechanics and information theory, continue to be considered as a measure of the uncertainty in continuous random variables, sometimes with the help of relative entropies to avoid negative values [11,13,14,15]. We have proposed here an alternative measure called varentropy to avoid several undesirable features of continuous entropies. Examples of varentropy for several well-known continuous probabilities have been calculated from its variational definition, showing the following features with respect to continuous entropies.

Varentropy $S_{V}$ is positive for the distributions studied in this work, whereas the continuous entropies of these distributions can be negative.
$S_{V}$ is zero for the deterministic case, as expected for a measure of probabilistic uncertainty, while continuous entropy goes to minus infinity.
$S_{V}$ avoids the improper use of the logarithm $\ln \rho$ of the probability density, which is a dimensioned quantity. Other generalized continuous entropies [10,12] containing the term $\int \rho^{q}(x)\,dx$ have the same undesirable feature, for example the loss of scale invariance.
$S_{V}$ has a sound physical background because it is defined with a variational equation, which is just the statistical form of the first law of thermodynamics. Unlike other entropies defined with given functional formulas, it has great flexibility to generate different functionals for different distributions.
As each given formula of varentropy is maximized for its distribution, it is the optimal measure of the uncertainty of that distribution, meaning that its value is always the largest one among all the possible measures (Shannon, Tsallis, etc.) for the same distribution. A case study of this feature was presented in [19,21].

Further investigation is necessary to confirm these features of varentropy for other continuous distributions.

The reader will have noticed that, in this work, for varentropy to be positive, a constant $A$ must be chosen according to the kind of distribution function. The existence of such a constant is quite natural in such a general definition of a quantity whose nature differs in different situations. Such a constant also appears in the Boltzmann–Shannon formula, written as $S = -A \sum_{i = 1}^{W} p_{i} \ln p_{i}$. For a binary system ($W = 2$), the choice of $A = 1/\ln 2$ expresses the information measure in bits. When the formula is used for an ideal gas to measure the probabilistic uncertainty of the distribution of internal energy, we must write $A = k_{B}$, the Boltzmann constant, for $S$ to be the thermodynamic entropy of the gas. But it is impossible to choose a single constant $A$ to make continuous entropies always positive because, as discussed above in Section 3, they can be positive or negative for a given distribution, and sometimes go to minus infinity.

Finally, we would like to mention that information measures for continuous probability distributions are needed in both classical and quantum physics. An example is the calculation of the path entropy [22,23] of random dynamics using the path integral method [24] together with the classical path probability [23] or the quantum propagator [24], both of which are continuous exponential functions of the classical action [25,26]; hence, using the BS formula (or its generalizations) risks giving negative values. The path entropy is an increasing function of the time of random motion [23,25,26] and an important ingredient in the study of irreversible processes in both the classical and quantum worlds [22].

Author Contributions

Conceptualization, Q.A.W.; Validation, F.X.M., J.C., R.J.W., A.E.K., Y.Z., M.L. and Q.A.W.; Formal analysis, R.J.W., A.E.K., Y.Z. and Q.A.W.; Investigation, F.X.M., R.J.W., A.E.K., Y.Z., M.L. and Q.A.W.; Resources, R.J.W. and Y.Z.; Writing—original draft, F.X.M., J.C., R.J.W. and Q.A.W.; Writing—review and editing, F.X.M. and Q.A.W.; Supervision, F.X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Ru Julie Wang was employed by the company JOLIBRAIN, 77 Rue Pargaminières, 31000 Toulouse, France. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Boltzmann, L. Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen. Sitzungsberichte Akad. Der Wiss. 1872, 66, 275–370.
2. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379.
3. Gibbs, J.W. Elementary Principles in Statistical Mechanics; Charles Scribner’s Sons: New York, NY, USA, 1902.
4. Jaynes, E.T. Gibbs vs Boltzmann Entropies. Am. J. Phys. 1965, 33, 391.
5. Jaynes, E.T. Information theory and statistical mechanics. In Statistical Physics; Brandeis University Summer Institute Lectures in Theoretical Physics; W. A. Benjamin, Inc.: New York, NY, USA, 1963; Volume 3, pp. 181–218.
6. Müller, I.; Müller, W.H. Fundamentals of Thermodynamics and Applications; Springer: Berlin/Heidelberg, Germany, 2009.
7. Khinchin, A.I. Mathematical Foundations of Information Theory; Dover: New York, NY, USA, 1957.
8. Differential entropy. Wikipedia. Available online: https://en.wikipedia.org/wiki/Differential_entropy (accessed on 30 January 2025).
9. Amigó, J.M.; Balogh, S.G.; Hernández, S. A Brief Review of Generalized Entropies. Entropy 2018, 20, 813.
10. Alomani, G.; Kayid, M. Further Properties of Tsallis Entropy and Its Application. Entropy 2023, 25, 199.
11. Alawady, M.A.; Barakat, H.M.; Al Luhayb, A.S.M.; Mansour, G.M. Some properties on dynamic cumulative Tsallis residual entropy measures based on Sarmanov family with applications to motor data. AIMS Math. 2026, 11, 8271–8307.
12. Esteban, M.D.; Morales, B. A Summary on Entropy Statistics. Kybernetika 1995, 31, 337.
13. Rioul, O. This is IT: A Primer on Shannon’s Entropy and Information. In L’Information, Séminaire Poincaré 2018; Birkhäuser: Basel, Switzerland, 2021; Volume XXIII, pp. 43–77.
14. Grassucci, E.; Comminiello, D.; Uncini, A. An Information-Theoretic Perspective on Proper Quaternion Variational Autoencoders. Entropy 2021, 23, 856.
15. Via, J.; Ramirez, D.; Santamaria, I. Properness and Widely Linear Processing of Quaternion Random Vectors. IEEE Trans. Inf. Theory 2010, 56, 3502–3515.
16. Wang, Q.A. Probability distribution and entropy as a measure of uncertainty. J. Phys. A Math. Theor. 2008, 41, 065004.
17. Ou, C.; El Kaabouchi, A.; Nivanen, L.; Chen, J.; Tsobnang, F.; Le Méhauté, A.; Wang, Q.A. Maximizable Informational Entropy as a Measure of Probabilistic Uncertainty. Int. J. Mod. Phys. B 2010, 24, 3461–3468.
18. Abe, S. Generalized entropy optimized by a given arbitrary distribution. J. Phys. A Math. Gen. 2003, 36, 8733.
19. Jiang, J.; Metz, F.; Beck, C.; Lefevre, S.; Chen, J.C.; Pezeril, M.; Wang, Q.A. Double power law degree distribution and informational entropy in urban road networks. Int. J. Mod. Phys. C 2011, 22, 33.
20. Wang, Q.A.; Ye, Q. Derivation of the Maximum Entropy Principle from the Virtual Work Principle. Eur. Phys. J. Plus 2025, 140, 1234.
21. El Kaabouchi, A.; Machu, F.X.; Cocks, J.; Wang, R.; Zhu, Y.Y.; Wang, Q.A. Study of a measure of efficiency as a tool for applying the principle of least effort to the derivation of the Zipf and Pareto laws. Adv. Complex Syst. 2021, 24, 2150013.
22. Davis, S.; Gonzales, S. Hamiltonian formalism and path entropy maximization. J. Phys. A Math. Theor. 2015, 48, 425003.
23. Lin, T.; Wang, R.; Bi, W.P.; El Kaabouchi, A.; Pujos, C.; Calvayrac, F.; Wang, Q.A. Path probability distribution of stochastic motion of non dissipative systems: A classical analog of Feynman factor of path integral. Chaos Solitons Fractals 2013, 57, 129.
24. Feynman, R.P.; Hibbs, A.R. Quantum Mechanics and Path Integrals; McGraw-Hill Publishing Company: New York, NY, USA, 1965.
25. Wang, Q.A.; El Kaabouchi, A. From Random Motion of Hamiltonian Systems to Boltzmann H Theorem and Second Law of Thermodynamics: A Pathway by Path Probability. Entropy 2014, 16, 885.
26. General, I. Principle of maximum caliber and quantum physics. Phys. Rev. E 2018, 98, 012110.

Figure 1. Evolution of $S_{V} = \frac{β + 1}{β (β - 1)}$, which decreases from infinity to zero with increasing $β$ in the interval $1 < β < \infty$.
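The behavior stated in the caption of Figure 1 can be checked numerically. The following is only a minimal sketch: the function name `varentropy_fig1` and the sample values of β are illustrative assumptions, and the only formula used is the expression for $S_V$ given in the caption.

```python
def varentropy_fig1(beta):
    """S_V = (beta + 1) / (beta * (beta - 1)), defined here for beta > 1."""
    if beta <= 1:
        raise ValueError("the expression is considered only for beta > 1")
    return (beta + 1) / (beta * (beta - 1))


# Illustrative sample points, from just above 1 to very large beta.
betas = [1.001, 1.5, 2.0, 5.0, 50.0, 1e6]
values = [varentropy_fig1(b) for b in betas]

for b, s in zip(betas, values):
    print(f"beta = {b:>10}  ->  S_V = {s:.6g}")
```

The printed values stay positive and shrink as β grows, consistent with a derivative $dS_V/dβ = -(β^2 + 2β - 1)/[β(β - 1)]^2$ that is negative for all $β > 1$.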


