Entropy | Article | Open Access | 1 July 2016

Cumulative Paired φ-Entropy

Department of Statistics and Econometrics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Lange Gasse, Nürnberg 90403, Germany
Author to whom correspondence should be addressed.
These authors contributed equally to this work.

Abstract

A new kind of entropy will be introduced which generalizes both the differential entropy and the cumulative (residual) entropy. The generalization is twofold. First, we simultaneously define the entropy for cumulative distribution functions (cdfs) and survivor functions (sfs), instead of defining it separately for densities, cdfs, or sfs. Secondly, we consider a general “entropy generating function” φ, in the same way as Burbea et al. (IEEE Trans. Inf. Theory 1982, 28, 489–495) and Liese et al. (Convex Statistical Distances; Teubner-Verlag, 1987) did in the context of φ-divergences. Combining the ideas of φ-entropy and cumulative entropy leads to the new “cumulative paired φ-entropy” ($CPE_\varphi$). This new entropy has already been discussed in at least four scientific disciplines, be it with certain modifications or simplifications. In fuzzy set theory, for example, cumulative paired φ-entropies were defined for membership functions, whereas in uncertainty and reliability theories some variations of $CPE_\varphi$ were recently considered as measures of information. With a single exception, the discussions in these disciplines appear to have been held independently of each other. We consider $CPE_\varphi$ for continuous cdfs and show that $CPE_\varphi$ is a measure of dispersion rather than a measure of information. This is demonstrated, first, by deriving an upper bound which is determined by the standard deviation and by solving the maximum entropy problem under the restriction of a fixed variance. Next, this paper specifically shows that $CPE_\varphi$ satisfies the axioms of a dispersion measure. The corresponding dispersion functional can easily be estimated by an L-estimator, with all the known asymptotic properties of this class of estimators. $CPE_\varphi$ is the basis for several related concepts like mutual φ-information, φ-correlation, and φ-regression, which generalize Gini correlation and Gini regression. In addition, linear rank tests for scale that are based on the new entropy have been developed. We show that almost all known linear rank tests are special cases, and we introduce certain new tests. Moreover, formulas for different distributions and entropy calculations are presented for $CPE_\varphi$ if the cdf is available in closed form.

1. Introduction

The φ-entropy
$$ E_\varphi(F) = \int_{\mathbb{R}} \varphi(f(x))\,dx, $$
where f is a probability density function and φ is a strictly concave function, was introduced by [1]. If we set $\varphi(u) = -u\ln u$, $u\in[0,1]$, we get Shannon’s differential entropy as the most prominent special case.
Shannon et al. [2] derived the “entropy power fraction” and showed that there is a close relationship between Shannon entropy and variance. In [3], it was demonstrated that Shannon’s differential entropy satisfies an ordering of scale and thus is a proper measure of scale (MOS). Recently, the discussion in [4] has shown that entropies can be interpreted as measures of dispersion. In the discrete case, minimal Shannon entropy means maximal certainty about the random outcome of an experiment. A degenerate distribution minimizes the Shannon entropy as well as the variance of a discrete quantitative random variable. For such a degenerate distribution, Shannon entropy and variance both take the value 0. However, there is an important difference between the differential entropy and the variance when discussing a continuous quantitative random variable with support $[a,b]$. The differential entropy is maximized by a uniform distribution over $[a,b]$, while the variance is maximal if both interval bounds a and b carry probability mass 0.5 (cf. [5]). A similar result holds for a discrete random variable with a finite number of realizations. Therefore, it is doubtful that Equation (1) is a true measure of dispersion.
We propose to define the φ-entropy for cumulative distribution functions (cdfs) F and survivor functions (sfs) $1-F$ instead of for density functions f. Throughout the paper, we define $\bar F := 1-F$. By applying this modification we get
$$ CPE_\varphi(F) = \int_{\mathbb{R}} \left[ \varphi(F(x)) + \varphi(\bar F(x)) \right] dx, $$
where the cdf F is absolutely continuous, $CPE$ means “cumulative paired entropy”, and φ is the “entropy generating function” defined on $[0,1]$ with $\varphi(0)=\varphi(1)=0$. We will assume that φ is concave on $[0,1]$ throughout most of this paper. In particular, we will show that Equation (2) satisfies a popular ordering of scale and attains its maximum if the domain is an interval $[a,b]$ and a, b occur with probability $1/2$ each. This means that Equation (2) behaves like a proper measure of dispersion.
In addition, we generalize results from the literature, focusing on the Shannon case with $\varphi(u) = -u\ln u$, $u\in[0,1]$ (cf. [6]), the cumulative residual entropy
$$ CRE(F) = -\int_{\mathbb{R}^+} \bar F(x) \ln \bar F(x)\,dx $$
(cf. [7]), and the cumulative entropy
$$ CE(F) = -\int_{\mathbb{R}} F(x) \ln F(x)\,dx $$
(cf. [8,9]). In the literature, this entropy is interpreted as a measure of information rather than dispersion without any clarification on what kind of information is considered.
A first general aim of this paper is to show that entropies can rather be interpreted as measures of dispersion than as measures of information. A second general aim is to demonstrate that the entropy generating function φ, the weight function J in L-estimation, the dispersion function d which serves as a criterion for minimization in robust rank regression, and the scores-generating function ϕ 1 are closely related.
Specific aims of this paper are:
  • To show that the cdf-based entropy Equation (2) originates in several distinct scientific areas.
  • To demonstrate the close relationship between Equation (2) and the standard deviation.
  • To derive maximum entropy (ME) distributions under simple and more complex restrictions and to show that commonly known as well as new distributions solve the ME principle.
  • To derive the entropy maximized by a given distribution under certain restrictions.
  • To formally prove that Equation (2) is a measure of dispersion.
  • To propose an L-estimator for Equation (2) and derive its asymptotic properties.
  • To use Equation (2) in order to obtain new related concepts measuring the dependence of random variables (such as mutual φ-information, φ-correlation, and φ-regression).
  • To apply Equation (2) to get new linear rank tests for the comparison of scale.
The paper is structured in the same order as these aims. After this introduction, in the second section we give a short review of the literature that is concerned with Equation (2) or related measures. The third section begins by summarizing reasons for defining entropies for cdfs and sfs instead of defining them for densities. Next, some equivalent characterizations of Equation (2) are given, provided the derivative of φ exists. In the fourth section, we use the Cauchy–Schwarz inequality to derive an upper bound for Equation (2), which provides sufficient conditions for the existence of $CPE_\varphi$. In addition, more stringent conditions for the existence are proven directly. In the fifth section, the Cauchy–Schwarz inequality allows us to derive ME distributions if the variance is fixed. For more complicated restrictions we obtain ME distributions by solving the Euler–Lagrange conditions. Following the generalized ME principle (cf. [10]), we change the perspective and ask which entropy is maximized if the variance and the population’s distribution are fixed. The sixth section is of key importance because the properties of Equation (2) as a measure of dispersion are analyzed in detail. We show that Equation (2) satisfies an often applied ordering of scale by [3], is invariant with respect to translations, and is equivariant with respect to scale transformations. Additionally, we provide certain results concerning the sum of independent random variables. In the seventh section, we propose an L-estimator for $CPE_\varphi$. Some basic properties of this estimator like its influence function, consistency, and asymptotic normality are shown. In the eighth section, we introduce several new statistical concepts based on $CPE_\varphi$, which generalize divergence, mutual information, Gini correlation, and Gini regression. Additionally, we show that new linear rank tests for dispersion can be based on $CPE_\varphi$. Known linear rank tests like the Mood or the Ansari–Bradley test are special cases of this general approach. However, in this paper we exclude most of the technical details, for they will be presented in several accompanying papers. In the last section we compute Equation (2) for certain generating functions φ and some selected families of distributions.

2. State of the Art—An Overview

Entropies are usually defined on the simplex of probability vectors, which sum to one (cf. [2,11]). Until now, it has been rather unusual to calculate the Shannon entropy not for vectors of probabilities or probability density functions f, but for distribution functions F. The corresponding Shannon entropy is given by
$$ CPE_S(F) = -\int_{\mathbb{R}} \left[ F(x) \ln F(x) + \bar F(x) \ln \bar F(x) \right] dx. $$
Nevertheless, we have identified five scientific disciplines directly or implicitly working with an entropy based on distribution functions or survivor functions:
  • Fuzzy set theory,
  • Generalized ME principle,
  • Theory of dispersion of ordered categorial variables,
  • Uncertainty theory,
  • Reliability theory.

2.1. Fuzzy Set Theory

To the best of our knowledge, Equation (5) was initially introduced by [12]. However, they did not consider the entropy for a cdf F. Instead, they were concerned with a so-called membership function $\mu_A$ that quantifies the degree to which a certain element x of a set Ω belongs to a subset $A \subseteq \Omega$. Membership functions were introduced by [13] within the framework of the “fuzzy set theory”.
It is important to note that if all elements of Ω are mapped to the value 1/2, maximum uncertainty about x belonging to a set A will be attained.
This main property is one of the axioms of membership functions. In the aftermath of [12] numerous modifications to the term “entropy” have been made and axiomatizations of the membership functions have been stated (see, e.g., the overview in [14]).
Finally, those modifications proceeded parallel to a long history of extensions and parametrizations of the term entropy for probability vectors and densities. It began with [15] up to [16,17], who provided a superstructure of those generalizations consisting of a very general form of the entropy, including the φ-entropy Equation (1) as a special case. Burbea et al. [1] introduced the term φ-entropy. If both φ ( x ) and φ ( 1 - x ) appeared in the entropy, as in the Fermi-Dirac entropy (cf. [18], p. 191), they used the term “paired” φ-entropy.

2.2. Generalized Maximum Entropy Principle

Regardless of the debate in the fuzzy set theory and the theory of measurement of dispersion, Kapur [10] showed that a growth model with a logistic growth rate is yielded as the solution of maximizing Equation (5) under two simple constraints. This provides an example for the “generalized maximum entropy principle” postulated by Kesavan et al. [19]. In addition to that, the simple ME principle introduced by [20,21] derives a distribution which maximizes an entropy given certain constraints. Furthermore, the generalization of [19] consists of determining the φ-entropy, which is maximized given a distribution and some constraints. Finally, they used a slightly modified formula Equation (5). The cdf had to be replaced by a monotonically increasing function with logistic shape.

2.3. Theory of Dispersion

Irrespective of the discussion on membership functions in the fuzzy set theory and the proposals for generalizing the Shannon entropy, Leik [22] discussed a measure of dispersion for ordered categorial variables with a finite number k of categories $x_1 < x_2 < \dots < x_k$. His measure is based on the distance between the $(k-1)$-dimensional vectors of cumulated frequencies $(F_1, F_2, \dots, F_{k-1})$ and $(1/2, 1/2, \dots, 1/2)$. Both vectors only coincide if the extreme categories $x_1$ and $x_k$ appear with the same frequency. This represents the case of maximal dispersion. Consider
$$ CPE_\varphi(F) = \sum_{i=1}^{k-1} \left[ \varphi(F_i) + \varphi(1 - F_i) \right] $$
as the discrete version of Equation (2). Setting $\varphi(u) = \min\{u, 1-u\}$, we get the measure of Leik as a special case of Equation (6) up to a change of sign. Vogel et al. [23] considered $\varphi(u) = -u\ln(u)$ and the Shannon variation of Equation (6) as a measure of dispersion for ordered categorial variables. Numerous modifications of Leik’s measure of dispersion have been published. In [24,25,26,27,28,29], the authors implicitly used $\varphi(u) = 1/4 - (u - 1/2)^2$ or, equivalently, $\varphi(u) = u(1-u)$. Most of the discussion was conducted in the journal “Perceptual and Motor Skills”. For a recent overview of measuring dispersion including ordered categorial variables see, e.g., [30]. Instead of dispersion, some articles are concerned with related concepts for ordered categorial variables, like bipolarization and inequality (cf. [31,32,33,34,35]). A class of measures of dispersion for ordered categorial variables with a finite number of categories that is similar to Equation (6) has been introduced by Klein [36] and Yager [37], independently of each other. They had obviously not been aware of the discussion in “Perceptual and Motor Skills”. Both authors gave axiomatizations to describe which functions φ are appropriate for measuring dispersion. However, at least Yager [37] recognized the close relationship between those measures and the general term “entropy” in fuzzy set theory. He introduced the term “dissonance” to more precisely characterize measures of dispersion for ordered categorial variables. In the language of information theory, maximum dissonance describes an extreme case in which there is still some information, but this information is extremely contradictory. As an example, we could ask in the field of product evaluation to what degree information which states that 50 percent of the recommendations are extremely good and, at the same time, 50 percent are extremely bad is useful for making a purchase decision. This is an important difference to the Shannon entropy, which is maximal if there is no information at all, i.e., all categories occur with the same probability.
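The following is a small numerical sketch of the discrete measure in Equation (6) for ordered categorical frequency vectors; it is not part of the original paper. It assumes NumPy, and the function and variable names (`cpe_discrete`, `p_polarized`, etc.) are illustrative only. It shows that a degenerate distribution yields the value 0, while putting all mass on the two extreme categories maximizes the measure.

```python
import numpy as np

def cpe_discrete(p, phi):
    """Discrete CPE of Equation (6): sum of phi(F_i) + phi(1 - F_i) over the
    k-1 cumulative relative frequencies of an ordered categorical variable."""
    F = np.cumsum(np.asarray(p, dtype=float))[:-1]   # F_1, ..., F_{k-1}
    return float(np.sum(phi(F) + phi(1.0 - F)))

def phi_leik(u):
    return np.minimum(u, 1.0 - u)

def phi_shannon(u):
    u = np.clip(u, 1e-300, 1.0)                      # avoids log(0); phi(0) = 0
    return -u * np.log(u)

p_degenerate = [0.0, 1.0, 0.0, 0.0, 0.0]   # all mass in one category: no dispersion
p_polarized  = [0.5, 0.0, 0.0, 0.0, 0.5]   # only extreme categories: maximal dispersion
for p in (p_degenerate, p_polarized):
    print(cpe_discrete(p, phi_leik), cpe_discrete(p, phi_shannon))
```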
Bowden [38] defines the location entropy function $h(x) = -\left[ F(x)\ln F(x) + \bar F(x)\ln\bar F(x) \right]$ for a given value of x. He emphasizes the possibility to construct measures of spread and symmetry based on this function. To the best of our knowledge, Bowden [38] is the only one to mention the application of the cumulative paired Shannon entropy to continuous distributions so far.

2.4. Uncertainty Theory

The author of [6] (first edition 2004) can be considered the founder of uncertainty theory. This theory is concerned with formalizing data consisting of expert opinions rather than formalizing data gathered by repeating a random experiment. Liu slightly modified the Kolmogoroff axioms of probability theory to obtain an uncertainty measure, following which he defined uncertain variables, uncertainty distribution functions, and moments of uncertain variables. Liu argued that “an event is the most uncertain if its uncertainty measure is 0.5, because the event and its complement may be regarded as ‘equally likely’” ([6], p. 14). Liu’s maximum uncertainty principle states: “For any event, if there are multiple reasonable values that an uncertain measure may take, the value as close to 0.5 as possible is assigned to the event” ([6], p. 14). Similar to fuzzy set theory, the distance between the uncertainty distribution and the value 0.5 can be measured by the Shannon-type entropy Equation (5). Apparently for the first time in the third edition of 2010, he explicitly calculated Equation (5) for several distributions (e.g., the logistic distribution) and derived upper bounds. He applied the ME principle to uncertainty distributions. The preferred constraint is to predetermine the values of mean and variance ([6], p. 83ff.). In this case, the logistic distribution maximizes Equation (5). In this context, the logistic distribution plays the same role in uncertainty theory as the Gaussian distribution in probability theory: the Gaussian distribution maximizes the differential entropy, given values for mean and variance. Therefore, in uncertainty theory the logistic distribution is called the “normal distribution”. The authors of [39] provided Equation (5) as a function of the quantile function. In addition to that, the authors of [40] chose $\varphi(u) = u(1-u)$, $u\in[0,1]$, as entropy generating function and derived the ME distribution as a discrete uniform distribution, which is concentrated on the endpoints of the compact domain $[a,b]$ if no further restrictions are assumed. Popoviciu [5] attained the same distribution by maximizing the variance. Chen et al. [41] introduced cross entropies and divergence measures based on general functions φ. Further literature on this topic is provided by [42,43,44].

2.5. Reliability Theory

Entropies also play a prominent role in reliability theory. They were initially introduced in the fields of hazard rates and residual lifetime distributions (cf. [45]). In addition, the authors of [46,47] introduced the cumulative residual entropy Equation (3), discussed its properties, and derived the exponential and the Weibull distribution by an ME principle, given the coefficient of variation. This work went into detail on the advantage of defining entropy via survivor functions instead of probability density functions. Rao et al. [46] refer to the extensive criticism on the differential entropy by [48]. Moreover, Zografos et al. [49] generalized the Shannon-type cumulative residual entropy to an entropy of the Rényi type. Furthermore, Drissi et al. [50] considered random variables with general support. They also presented solutions for the maximization of Equation (3), provided that more general restrictions are considered. Similar to [51], they identified the logistic distribution to be the ME distribution, given mean, variance, and a symmetric form of the distribution function.
Di Crescenzo et al. [9] analyzed Equation (4) for cdfs and discussed its stochastic properties. Sunoj et al. [52] plugged the quantile function into the Shannon-type entropy Equation (4) and presented expressions if the quantile function possesses a closed form, but not the cdf. In recent papers an empirical version of Equation (3) is used as goodness-of-fit test (cf. [53]).
Additionally, C R E and C E are applied to the distribution function of the residual lifetime ( X - t | X > t ) and the inactivity time ( t - X | X < t ) (cf. [54]). This can directly be generalized to the C P E framework.
Moreover, Psarrakos et al. [55] provides an interesting alternative generalization of the Shannon case. In this paper we focus on the class of concave functions φ. Special extensions to non-concave functions will be subject to future research.
This brief overview shows that different disciplines are approaching an entropy based on distribution functions. The contributions of fuzzy set theory, uncertainty theory, and reliability theory all have the exclusive consideration of continuous random variables in common. The discussions about entropy in reliability theory, on the one hand, and in fuzzy set theory and uncertainty theory, on the other hand, were conducted independently of each other, without even noticing the results of the other disciplines. However, Liu’s uncertainty theory benefits from the discussion in fuzzy set theory. In the theory of dispersion of ordered categorial variables the authors do not appear to be aware of their implicit use of a concept of entropy. Nevertheless, the situation is somewhat different from that of the other areas since only discrete variables were discussed. Kiesl’s dissertation [56] provides a theory of measures of the form of Equation (6) with numerous applications. However, an intensive discussion of Equation (2) is missing and will be provided here.

3. Cumulative Paired φ-Entropy for Continuous Variables

3.1. Definition

We focus on absolutely continuous cdfs F with density functions f. The set of all those distribution functions is denoted by $\mathcal{F}$. We call a function an “entropy generating function” if it is non-negative and concave on the domain $[0,1]$ with $\varphi(0)=\varphi(1)=0$. In this case, $\varphi(u) + \varphi(1-u)$ is a symmetric function with respect to $1/2$.
Definition 1. 
The functional $CPE_\varphi : \mathcal{F} \to \mathbb{R}_0^+$ with
$$ CPE_\varphi(F) = \int_{\mathbb{R}} \left[ \varphi(F(x)) + \varphi(\bar F(x)) \right] dx $$
is called the cumulative paired φ-entropy for $F \in \mathcal{F}$ with entropy generating function φ.
Up to now, we have assumed the existence of $CPE_\varphi$. In the following section we will discuss some sufficient criteria ensuring the existence of $CPE_\varphi$. If X is a random variable with cdf F, we occasionally use the notation $CPE_\varphi(X)$ instead.
Next, some examples of well established concave entropy generating functions φ and corresponding cumulative paired φ-entropies will be given.
  • Cumulative paired α-entropy $CPE_\alpha$: Following [57], let φ be given by
    $$ \varphi(u) = u\,\frac{u^{\alpha-1}-1}{1-\alpha}, \quad u\in[0,1], $$
    for $\alpha > 0$. The corresponding so-called cumulative paired α-entropy is
    $$ CPE_\alpha(F) = \int_{\mathbb{R}} \left[ F(x)\,\frac{F(x)^{\alpha-1}-1}{1-\alpha} + \bar F(x)\,\frac{\bar F(x)^{\alpha-1}-1}{1-\alpha} \right] dx. $$
  • Cumulative paired Gini entropy $CPE_G$: For $\alpha = 2$ we get
    $$ CPE_G(F) = 2\int_{\mathbb{R}} F(x)\,\bar F(x)\,dx $$
    as a special case of $CPE_\alpha$.
  • Cumulative paired Shannon entropy $CPE_S$: Set $\varphi(u) = -u\ln u$, $u\in[0,1]$. Thus,
    $$ CPE_S(F) = -\int_{\mathbb{R}} \left[ F(x)\ln F(x) + \bar F(x)\ln\bar F(x) \right] dx $$
    gives the entropy which was already mentioned in the introduction. It is a special case of $CPE_\alpha$ for $\alpha \to 1$.
  • Cumulative paired Leik entropy $CPE_L$: The function
    $$ \varphi(u) = \min\{u, 1-u\} = \frac{1}{2} - \left| u - \frac{1}{2} \right|, \quad u\in[0,1], $$
    represents the limiting case of a linear concave function φ. The measure of dispersion proposed by [22] implicitly makes use of this φ, such that we call
    $$ CPE_L(F) = 2\int_{\mathbb{R}} \min\{F(x), \bar F(x)\}\,dx $$
    the cumulative paired Leik entropy.
Figure 1 gives an impression of the previously mentioned generating functions φ .
Figure 1. Some entropy generating functions φ.
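As a quick illustration of these generating functions (not part of the original paper), the following sketch evaluates $CPE_\varphi$ for the uniform distribution on $[0,1]$, where the definition reduces to $2\int_0^1 \varphi(u)\,du$. It assumes NumPy and SciPy; the helper names are arbitrary.

```python
import numpy as np
from scipy import integrate

# Entropy generating functions from the examples above
def phi_alpha(u, alpha):
    if np.isclose(alpha, 1.0):                       # Shannon limit
        return -u * np.log(u) if u > 0 else 0.0
    return u * (u ** (alpha - 1) - 1) / (1 - alpha)

phi_gini = lambda u: u * (1.0 - u)                   # alpha = 2
phi_leik = lambda u: min(u, 1.0 - u)                 # limiting linear case

# For X ~ Uniform(0, 1): CPE_phi(F) = int_0^1 [phi(u) + phi(1 - u)] du = 2 int_0^1 phi(u) du
cpe_uniform = lambda phi: 2.0 * integrate.quad(phi, 0.0, 1.0)[0]

print("Shannon :", cpe_uniform(lambda u: phi_alpha(u, 1.0)))   # 1/2
print("alpha=3 :", cpe_uniform(lambda u: phi_alpha(u, 3.0)))   # 1/4
print("Gini    :", cpe_uniform(phi_gini))                      # 1/3
print("Leik    :", cpe_uniform(phi_leik))                      # 1/2
```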

3.2. Advantages of Entropies Based on Cdfs

The authors of [46,47] list several reasons why it is better to define an entropy for distribution functions rather than for density functions. The starting point is the well-known critique of Shannon’s differential entropy $-\int f(x)\ln f(x)\,dx$ that was expressed by several authors like [48,58] and ([59], p. 58f).
Transferred to cumulative paired entropies, the advantages of entropies based on distribution functions (cf. [46]) are as follows:
  • C P E φ is based on probabilities and has a consistent definition for both discrete and continuous random variables.
  • C P E φ is always non-negative.
  • C P E φ can easily be estimated by the empirical distribution function. This estimation is strongly consistent, due to the strong consistency of the empirical distribution function.
Problems of the differential entropy are occasionally discussed in the case of grouped data, in which the usual Shannon entropy is calculated from the group probabilities. With an increasing number of groups, this Shannon entropy not only fails to converge to the respective differential entropy, it even diverges (cf., e.g., [59] (p. 54), [60] (p. 239)). In the next section we will show that the discrete version of $CPE_\varphi$ converges to $CPE_\varphi$ as the number of groups approaches infinity.

3.3. C P E φ for Grouped Data

First, we introduce the notation for characterizing grouped data. The interval $[\tilde x_0, \tilde x_k]$ is divided into k subintervals with limits $\tilde x_0 < \tilde x_1 < \dots < \tilde x_{k-1} < \tilde x_k$. The range of each group is called $\Delta x_i = \tilde x_i - \tilde x_{i-1}$ for $i = 1, 2, \dots, k$. Let X be a random variable with absolutely continuous distribution function F, which is only known at the limits of each group. The probabilities of the groups are denoted by $p_i = F(\tilde x_i) - F(\tilde x_{i-1})$, $i = 1, 2, \dots, k$. $X^*$ is the random variable whose distribution function $F^*$ is obtained by linear interpolation of the values of F at the limits of successive groups. Finally, $X^*$ is the result of adding an independent, uniformly distributed random variable to X. It holds that
$$ F^*(x) = F(\tilde x_{i-1}) + \frac{p_i}{\Delta x_i}\,(x - \tilde x_{i-1}) \quad \text{if } \tilde x_{i-1} < x \le \tilde x_i $$
for $x\in\mathbb{R}$, $F^*(x) = 0$ for $x \le \tilde x_0$, and $F^*(x) = 1$ for $x > \tilde x_k$.
Let $X^*$ denote the respective random variable of $F^*$. The probability density function $f^*$ of $X^*$ is defined by $f^*(x) = p_i/\Delta x_i$ for $\tilde x_{i-1} < x \le \tilde x_i$, $i = 1, 2, \dots, k$.
Lemma 1. 
Let φ be an entropy generating function with antiderivative $S_\varphi$. The cumulative paired φ-entropy of the distribution function in Equation (12) is given as follows:
$$ CPE_\varphi(X^*) = \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ S_\varphi(F(\tilde x_i)) - S_\varphi(F(\tilde x_{i-1})) + S_\varphi(\bar F(\tilde x_{i-1})) - S_\varphi(\bar F(\tilde x_i)) \right]. $$
Proof. 
For $x \in (\tilde x_{i-1}, \tilde x_i]$, we have
$$ F^*(x) = a_i + b_i x \quad\text{with}\quad b_i = \frac{p_i}{\Delta x_i} \quad\text{and}\quad a_i = F(\tilde x_{i-1}) - b_i \tilde x_{i-1}, $$
so that $a_i + b_i\tilde x_{i-1} = F(\tilde x_{i-1})$, $a_i + b_i\tilde x_i = F(\tilde x_i)$, $1 - a_i - b_i\tilde x_{i-1} = \bar F(\tilde x_{i-1})$, and $1 - a_i - b_i\tilde x_i = \bar F(\tilde x_i)$, $i = 1, 2, \dots, k$. With $y = a_i + b_i x$ and $dx = (1/b_i)\,dy$ we have
$$ \begin{aligned} CPE_\varphi(X^*) &= \sum_{i=1}^{k} \int_{\tilde x_{i-1}}^{\tilde x_i} \left[ \varphi(a_i + b_i x) + \varphi(1 - a_i - b_i x) \right] dx = \sum_{i=1}^{k} \frac{1}{b_i} \int_{F(\tilde x_{i-1})}^{F(\tilde x_i)} \left[ \varphi(y) + \varphi(1-y) \right] dy \\ &= \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ \int_{F(\tilde x_{i-1})}^{F(\tilde x_i)} \varphi(y)\,dy - \int_{\bar F(\tilde x_{i-1})}^{\bar F(\tilde x_i)} \varphi(y)\,dy \right] = \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ \int_{F(\tilde x_{i-1})}^{F(\tilde x_i)} \varphi(y)\,dy + \int_{\bar F(\tilde x_i)}^{\bar F(\tilde x_{i-1})} \varphi(y)\,dy \right] \\ &= \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ S_\varphi\!\left(F(\tilde x_i)\right) - S_\varphi\!\left(F(\tilde x_{i-1})\right) + S_\varphi\!\left(\bar F(\tilde x_{i-1})\right) - S_\varphi\!\left(\bar F(\tilde x_i)\right) \right]. \end{aligned} $$
 ☐
Considering this result, we can easily prove the convergence property for C P E φ ( X * ) :
Theorem 1. 
Let φ be a generating function with antiderivative $S_\varphi$ and let F be a continuous distribution function of the random variable X with support $[a,b]$. $X^*$ is the corresponding random variable for grouped data with $\Delta x = (b-a)/k$, $k > 0$. Then the following holds:
$$ CPE_\varphi(X^*) \;\to\; \int_a^b \left[ \varphi(F(x)) + \varphi(\bar F(x)) \right] dx \quad \text{for } k \to \infty. $$
Proof. 
Consider equidistant classes with $\Delta x_i = \Delta x = (b-a)/k$, $i = 1, 2, \dots, k$. Subsequently, Equation (13) results in
$$ CPE_\varphi(X^*) = \sum_{i=1}^{k} \left[ \frac{S_\varphi\!\left(F(\tilde x_i)\right) - S_\varphi\!\left(F(\tilde x_{i-1})\right)}{F(\tilde x_i) - F(\tilde x_{i-1})} + \frac{S_\varphi\!\left(\bar F(\tilde x_{i-1})\right) - S_\varphi\!\left(\bar F(\tilde x_i)\right)}{F(\tilde x_i) - F(\tilde x_{i-1})} \right] \Delta x. $$
With $k \to \infty$ we have $\Delta x \to 0$, such that for F continuous we get $F(\tilde x_i) - F(\tilde x_{i-1}) \to 0$. The antiderivative $S_\varphi$ has the derivative φ almost everywhere, such that with $k \to \infty$
$$ \sum_{i=1}^{k} \frac{S_\varphi\!\left(F(\tilde x_i)\right) - S_\varphi\!\left(F(\tilde x_{i-1})\right)}{F(\tilde x_i) - F(\tilde x_{i-1})}\, \Delta x \;\to\; \int_a^b \varphi(F(x))\,dx. $$
An analogous argument holds for the second term of Equation (14). ☐
In addition to this theoretical result we get the following useful expressions for C P E φ for grouped data and a specific choice of φ as Corollary 1 shows:
Corollary 1. 
Let φ be such that
$$ \varphi(u) = \begin{cases} -u\ln u & \text{for } \alpha = 1, \\[1mm] u\,\dfrac{u^{\alpha-1}-1}{1-\alpha} & \text{for } \alpha \ne 1, \end{cases} $$
where $u\in[0,1]$. Then for $\alpha = 1$
$$ CPE_S(X^*) = -\frac{1}{2}\sum_{i=1}^{k}\frac{\Delta x_i}{p_i}\left[ F(\tilde x_i)^2\ln F(\tilde x_i) - F(\tilde x_{i-1})^2\ln F(\tilde x_{i-1}) \right] - \frac{1}{2}\sum_{i=1}^{k}\frac{\Delta x_i}{p_i}\left[ \bar F(\tilde x_{i-1})^2\ln\bar F(\tilde x_{i-1}) - \bar F(\tilde x_i)^2\ln\bar F(\tilde x_i) \right] + \frac{1}{2}(\tilde x_k - \tilde x_0) $$
and for $\alpha \ne 1$
$$ CPE_\alpha(X^*) = \frac{1}{1-\alpha}\left[ \sum_{i=1}^{k}\frac{\Delta x_i}{p_i}\,\frac{1}{\alpha+1}\left( F(\tilde x_i)^{\alpha+1} - F(\tilde x_{i-1})^{\alpha+1} + \bar F(\tilde x_{i-1})^{\alpha+1} - \bar F(\tilde x_i)^{\alpha+1} \right) - (\tilde x_k - \tilde x_0) \right]. $$
Proof. 
Using the antiderivatives
$$ S_\alpha(u) = \begin{cases} -\dfrac{1}{2}u^2\ln u + \dfrac{1}{4}u^2 & \text{for } \alpha = 1, \\[1mm] \dfrac{1}{1-\alpha}\left( \dfrac{1}{\alpha+1}u^{\alpha+1} - \dfrac{1}{2}u^2 \right) & \text{for } \alpha \ne 1, \end{cases} $$
and since $p_i = F(\tilde x_i) - F(\tilde x_{i-1})$, it holds that
$$ \frac{1}{p_i}\left[ F(\tilde x_i)^2 - F(\tilde x_{i-1})^2 + \bar F(\tilde x_{i-1})^2 - \bar F(\tilde x_i)^2 \right] = \frac{ \left[F(\tilde x_i) - F(\tilde x_{i-1})\right]\left[F(\tilde x_i) + F(\tilde x_{i-1})\right] + \left[\bar F(\tilde x_{i-1}) - \bar F(\tilde x_i)\right]\left[\bar F(\tilde x_{i-1}) + \bar F(\tilde x_i)\right] }{ F(\tilde x_i) - F(\tilde x_{i-1}) } = 2 $$
for $i = 1, 2, \dots, k$. The results follow immediately. ☐
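The following numerical sketch (not from the paper) evaluates the grouped-data formula of Lemma 1 for the Shannon case and a standard logistic F, and illustrates the convergence stated in Theorem 1; the limit for the standard logistic equals $\pi^2/3$ (see Section 5.1). NumPy is assumed, the support is truncated to $[-15, 15]$ (negligible truncation error), and the function names are illustrative.

```python
import numpy as np

def S_shannon(u):
    """Antiderivative S_phi of phi(u) = -u ln u, i.e. -u^2 ln(u)/2 + u^2/4."""
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    pos = u > 0
    out[pos] = -0.5 * u[pos] ** 2 * np.log(u[pos]) + 0.25 * u[pos] ** 2
    return out

def cpe_grouped(F, a, b, k):
    """Lemma 1: CPE_phi(X*) for k equidistant groups on [a, b]."""
    edges = np.linspace(a, b, k + 1)
    Fi = F(edges)
    p, dx = np.diff(Fi), np.diff(edges)              # group probabilities and widths
    term = (S_shannon(Fi[1:]) - S_shannon(Fi[:-1])
            + S_shannon(1.0 - Fi[:-1]) - S_shannon(1.0 - Fi[1:]))
    mask = p > 0                                     # skip (numerically) empty groups
    return float(np.sum(dx[mask] / p[mask] * term[mask]))

F_logistic = lambda x: 1.0 / (1.0 + np.exp(-x))
for k in (10, 100, 1000):
    print(k, cpe_grouped(F_logistic, -15.0, 15.0, k))
print("limit pi^2/3 =", np.pi ** 2 / 3)
```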

3.4. Alternative Representations of C P E φ

In case $\varphi(0) = \varphi(1) = 0$ holds and φ is differentiable, one can provide several alternative representations of $CPE_\varphi$ in addition to Equation (7). These alternative representations will be useful in the following to find conditions ensuring the existence of $CPE_\varphi$ and to find some simple estimators.
Proposition 1. 
Let φ be a non-negative and differentiable function on the domain $[0,1]$ with derivative $\varphi'$ and $\varphi(0) = \varphi(1) = 0$. In this case, for $F \in \mathcal{F}$ with quantile function $F^{-1}(u)$, density function f, and quantile density function $q(u) = 1/f(F^{-1}(u))$, $u\in[0,1]$, the following holds:
$$ CPE_\varphi(F) = \int_0^1 \left[ \varphi(u) + \varphi(1-u) \right] q(u)\,du, $$
$$ CPE_\varphi(F) = \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) F^{-1}(u)\,du, $$
$$ CPE_\varphi(F) = \int_{\mathbb{R}} x \left( \varphi'(\bar F(x)) - \varphi'(F(x)) \right) f(x)\,dx. $$
Proof. 
Apply the probability integral transform $U = F(X)$ and integration by parts. ☐
Due to $\varphi(0) = \varphi(1) = 0$ it holds that
$$ \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) du = 0. $$
This property supports the interpretation of $CPE_\varphi$ as a covariance, for which the Cauchy–Schwarz inequality gives an upper bound:
Corollary 2. 
Let φ be a non-negative and differentiable function on the domain $[0,1]$ with derivative $\varphi'$ and $\varphi(0) = \varphi(1) = 0$. Then, if U is uniformly distributed on $[0,1]$ and $X \sim F$:
$$ CPE_\varphi(F) = \mathrm{Cov}\!\left( \varphi'(1-U) - \varphi'(U),\, F^{-1}(U) \right), $$
$$ CPE_\varphi(F) = \mathrm{Cov}\!\left( \varphi'(\bar F(X)) - \varphi'(F(X)),\, X \right). $$
Proof. 
Let $\mu = E[X]$; then, since $E[\varphi'(1-U) - \varphi'(U)] = 0$,
$$ CPE_\varphi(F) = \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) F^{-1}(u)\,du = \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) \left( F^{-1}(u) - \mu \right) du. $$
 ☐
Depending on the context, we switch between these alternative representations of $CPE_\varphi$.
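A quick Monte Carlo check of the covariance representation in Corollary 2 (not part of the paper, assuming NumPy; the sample size and seed are arbitrary): for the Shannon case and a standard logistic F, both $\varphi'(1-U)-\varphi'(U)$ and $F^{-1}(U)$ equal $\ln\!\big(U/(1-U)\big)$, so the covariance should be close to $\pi^2/3$.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=2_000_000)

# Standard logistic quantile function F^{-1}(u) = ln(u / (1 - u))
Finv = np.log(U / (1.0 - U))

# Shannon case: phi'(u) = -ln(u) - 1, hence phi'(1-u) - phi'(u) = ln(u / (1 - u))
score = np.log(U / (1.0 - U))

print("Monte Carlo covariance:", np.cov(score, Finv)[0, 1])
print("exact value pi^2 / 3  :", np.pi ** 2 / 3)
```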

4. Sufficient Conditions for the Existence of CPE φ

4.1. Deriving an Upper Bound for C P E φ

The Cauchy–Schwarz inequality applied to Equations (18) and (19), respectively, provides an upper bound for $CPE_\varphi$ if the variance $\sigma^2 = E[(F^{-1}(U) - \mu)^2]$ exists and
$$ \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right)^2 du < \infty $$
holds. The existence of the upper bound simultaneously ensures the existence of $CPE_\varphi$.
Proposition 2. 
Let φ be a non-negative and differentiable function on the domain $[0,1]$ with derivative $\varphi'$ and $\varphi(0) = \varphi(1) = 0$. If Equation (20) holds, then for $X \sim F$ with $\mathrm{Var}(X) < \infty$ and quantile function $F^{-1}$, we have
$$ CPE_\varphi(F) \le \sqrt{ E\!\left[ \left( \varphi'(1-U) - \varphi'(U) \right)^2 \right] }\; \sqrt{ E\!\left[ \left( F^{-1}(U) - \mu \right)^2 \right] }, $$
$$ CPE_\varphi(F) \le \sqrt{ E\!\left[ \left( \varphi'(\bar F(X)) - \varphi'(F(X)) \right)^2 \right] }\; \sigma. $$
Proof. 
The statement follows from
$$ \left( E\!\left[ \left( \varphi'(1-U) - \varphi'(U) \right) \left( F^{-1}(U) - \mu \right) \right] \right)^2 \;\le\; \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right)^2 du \;\times\; E\!\left[ \left( F^{-1}(U) - \mu \right)^2 \right]. $$
 ☐
Next, we consider the upper bound for the cumulative paired α-entropy:
Corollary 3. 
Let X be a random variable having a finite variance. Then
$$ CPE_\alpha(X) \le \sigma\,\frac{\alpha}{|1-\alpha|}\,\sqrt{ 2\left( \frac{1}{2\alpha-1} - B(\alpha,\alpha) \right) } $$
for $\alpha > 1/2$, $\alpha \ne 1$, and
$$ CPE_S(X) \le \frac{\pi\sigma}{\sqrt{3}} $$
for α = 1 .
Proof. 
For $\varphi(u) = u(u^{\alpha-1}-1)/(1-\alpha)$ and $\varphi'(u) = (\alpha u^{\alpha-1}-1)/(1-\alpha)$, $u\in[0,1]$, we have
$$ \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right)^2 du = \frac{\alpha^2}{(1-\alpha)^2} \int_0^1 \left( u^{\alpha-1} - (1-u)^{\alpha-1} \right)^2 du = \frac{2\alpha^2}{(1-\alpha)^2} \left[ \int_0^1 u^{2(\alpha-1)}\,du - \int_0^1 u^{\alpha-1}(1-u)^{\alpha-1}\,du \right] = \frac{2\alpha^2}{(1-\alpha)^2} \left[ \frac{1}{2\alpha-1} - B(\alpha,\alpha) \right]. $$
Here, $\alpha > 1/2$ is required for the existence of $CPE_\alpha(X)$. For $\alpha = 1$ we have $\varphi(u) = -u\ln u$ and $\varphi'(u) = -\ln u - 1$, $u\in[0,1]$, such that
$$ \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right)^2 du = \int_0^1 \left( \ln\frac{1-u}{u} \right)^2 du = \frac{\pi^2}{3}. $$
 ☐
In the framework of uncertainty theory, the upper bound for the cumulative paired Shannon entropy was derived by [51] (see also [6], p. 83). For $\alpha = 2$ we get the upper bound for the cumulative paired Gini entropy
$$ CPE_G(X) \le \frac{2\sigma}{\sqrt{3}}. $$
This result has already been proven for non-negative uncertainty variables by [40]. Finally, we obtain the following upper bound for the cumulative paired Leik entropy:
Corollary 4. 
Let X be a random variable with existing variance. Then
$$ CPE_L(X) \le 2\sigma. $$
Proof. 
Use
$$ \int_0^1 \left( \operatorname{sign}(u - 1/2) - \operatorname{sign}(1/2 - u) \right)^2 du = 4 $$
to get the result. ☐
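The sketch below (not part of the paper) checks the Shannon bound $CPE_S(X) \le \pi\sigma/\sqrt{3}$ numerically for two distributions with $\sigma = 1$: the bound is attained by the logistic distribution (cf. Corollary 6 below) and not by the Gaussian. It assumes NumPy/SciPy; integration limits and helper names are ad hoc.

```python
import numpy as np
from scipy import integrate, stats

def cpe_shannon(cdf, lo=-40.0, hi=40.0):
    """CPE_S(F) = -int [F ln F + (1 - F) ln(1 - F)] dx, computed by quadrature."""
    def g(x):
        F = cdf(x)
        return -(F * np.log(F) + (1 - F) * np.log(1 - F)) if 0.0 < F < 1.0 else 0.0
    return integrate.quad(g, lo, hi, limit=200)[0]

sigma = 1.0
print("Gaussian :", cpe_shannon(stats.norm(scale=sigma).cdf))
print("logistic :", cpe_shannon(stats.logistic(scale=np.sqrt(3.0) * sigma / np.pi).cdf))
print("bound    :", np.pi * sigma / np.sqrt(3.0))
```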

4.2. Stricter Conditions for the Existence of C P E α

So far, we only considered sufficient conditions for an existing variance. Following the arguments in [46,50], which were used for the special case of cumulative residual and residual Shannon entropy, one can derive stricter sufficient conditions for the existence of C P E α .
Theorem 2. 
If $E(|X|^p) < \infty$ for $p > 1$, then $CPE_\alpha < \infty$ for $\alpha > 1/p$.
Proof. 
To prepare the proof we first note that
$$ u\,\frac{u^{\alpha-1}-1}{1-\alpha} \;\le\; -u\ln u \;\le\; u\,\frac{u^{\beta-1}-1}{1-\beta} \;\le\; 1-u $$
holds for $0 < \beta < 1 < \alpha$ and $0 \le u \le 1$.
The second fact required for the proof is that
$$ \int_0^\infty \bar F(x)\,dx < \infty \quad\text{and}\quad \int_{-\infty}^0 F(x)\,dx < \infty $$
if $E(X) < \infty$, because
$$ E(X) = \int_0^\infty \bar F(x)\,dx - \int_{-\infty}^0 F(x)\,dx. $$
Third, it holds that
$$ P(-X \ge y) \le P(|X| \ge y) \quad\text{for } y > 0, $$
because
$$ P(|X| \ge y) = 1 - P(|X| < y) = 1 - \left( P(X < y) - P(X \le -y) \right) = 1 - P(X < y) + P(X \le -y) = 1 - P(X < y) + P(-X \ge y) \;\ge\; P(-X \ge y). $$
$CPE_\alpha$ consists of four improper integrals:
$$ CPE_\alpha = \int_0^\infty F(x)\,\frac{F(x)^{\alpha-1}-1}{1-\alpha}\,dx + \int_{-\infty}^0 \bar F(x)\,\frac{\bar F(x)^{\alpha-1}-1}{1-\alpha}\,dx + \int_{-\infty}^0 F(x)\,\frac{F(x)^{\alpha-1}-1}{1-\alpha}\,dx + \int_0^\infty \bar F(x)\,\frac{\bar F(x)^{\alpha-1}-1}{1-\alpha}\,dx. $$
It must be shown separately that these integrals converge.
The convergence of the first two terms follows directly from the existence of $E(X)$. With Equations (27) and (28) we have, for $\alpha > 0$,
$$ \int_0^\infty F(x)\,\frac{F(x)^{\alpha-1}-1}{1-\alpha}\,dx \;\le\; \int_0^\infty \bar F(x)\,dx < \infty $$
and
$$ \int_{-\infty}^0 \bar F(x)\,\frac{\bar F(x)^{\alpha-1}-1}{1-\alpha}\,dx \;\le\; \int_{-\infty}^0 F(x)\,dx < \infty. $$
For the third term we have to demonstrate that
$$ \int_{-\infty}^0 F(x)\,\frac{F(x)^{\alpha-1}-1}{1-\alpha}\,dx < \infty $$
for $\alpha > 1/p$. If $p > 1$, there is a β with $1/p < \beta < 1$ and $\beta < \alpha$. With Equation (27) it holds for $-\infty < x \le 0$ that
$$ F(x)\,\frac{F(x)^{\alpha-1}-1}{1-\alpha} \;\le\; F(x)\,\frac{F(x)^{\beta-1}-1}{1-\beta} \;\le\; \frac{1}{1-\beta}\,F(x)^{\beta}, $$
because $1-\beta > 0$.
With $F(x) = P(X \le x) = P(-X \ge -x)$ we have
$$ \frac{1}{1-\beta}\,F(x)^{\beta} \;\begin{cases} \le \dfrac{1}{1-\beta} & \text{for } 0 \le -x \le 1, \\[1mm] = \dfrac{1}{1-\beta}\,P(-X \ge -x)^{\beta} \le \dfrac{1}{1-\beta}\,P(|X| \ge -x)^{\beta} & \text{for } 1 < -x < \infty. \end{cases} $$
For $p > 0$ the transformation $g(y) = y^p$ is monotonically increasing for $y > 1$. Using the Markov inequality we get
$$ P(|X| \ge y) \le \frac{E[|X|^p]}{y^p}. $$
Putting these results together, we attain
$$ \int_{-\infty}^0 F(x)\,\frac{F(x)^{\alpha-1}-1}{1-\alpha}\,dx \;\le\; \frac{1}{1-\beta} + \frac{1}{1-\beta}\int_1^\infty \frac{E[|X|^p]^{\beta}}{y^{p\beta}}\,dy < \infty $$
for $\beta > 1/p$ (and thus $p\beta > 1$), due to $\int_1^\infty y^{-q}\,dy < \infty$ for $q > 1$.
It remains to show the convergence of the fourth term:
$$ \int_0^\infty \bar F(x)\,\frac{\bar F(x)^{\alpha-1}-1}{1-\alpha}\,dx < \infty $$
for $\alpha > 1/p$. For $p > 1$, there is a β with $1/p < \beta < 1$ and $\beta < \alpha$. Due to Equation (27) and $1-\beta > 0$, for $0 \le x < \infty$ it is true that
$$ \bar F(x)\,\frac{\bar F(x)^{\alpha-1}-1}{1-\alpha} \;\le\; \bar F(x)\,\frac{\bar F(x)^{\beta-1}-1}{1-\beta} \;\le\; \frac{1}{1-\beta}\,\bar F(x)^{\beta}. $$
With $\bar F(x) = P(X > x)$ we have
$$ \frac{1}{1-\beta}\,\bar F(x)^{\beta} \;\begin{cases} \le \dfrac{1}{1-\beta} & \text{for } 0 \le x \le 1, \\[1mm] \le \dfrac{1}{1-\beta}\,P(X \ge x)^{\beta} \le \dfrac{1}{1-\beta}\,P(|X| \ge x)^{\beta} & \text{for } 1 < x < \infty. \end{cases} $$
Now, the Markov inequality gives
$$ P(|X| \ge y) \le \frac{E(|X|^p)}{y^p}. $$
In summary, we have
$$ \int_0^\infty \bar F(x)\,\frac{\bar F(x)^{\alpha-1}-1}{1-\alpha}\,dx \;\le\; \frac{1}{1-\beta} + \frac{1}{1-\beta}\int_1^\infty \frac{E[|X|^p]^{\beta}}{y^{p\beta}}\,dy < \infty $$
for $\beta > 1/p$ and due to $\int_1^\infty y^{-q}\,dy < \infty$ for $q > 1$. This completes the proof. ☐
Following Theorem 2, depending on the number of existing moments, specific conditions for α arise in order to ensure the existence of $CPE_\alpha$:
  • If the variance of X exists (i.e., $p = 2$), $CPE_\alpha(X)$ exists for $\alpha > 1/2$.
  • For $p > 1$, $E[|X|^p] < \infty$ is sufficient for the existence of $CPE_S$ (i.e., $\alpha = 1$).
  • For $p = 1$, $E[|X|^p] < \infty$ is sufficient for the existence of $CPE_G$ (i.e., $\alpha = 2$).

5. Maximum CPE φ Distributions

5.1. Maximum C P E φ Distributions for Given Mean and Variance

Equality in the Cauchy–Schwarz inequality gives a condition under which the upper bound is attained. This is the case if an affine linear relation between $F^{-1}(U)$ (respectively X) and $\varphi'(1-U) - \varphi'(U)$ (respectively $\varphi'(\bar F(X)) - \varphi'(F(X))$) exists with probability 1. Since the quantile function is monotonically increasing, such an affine linear function can only exist if $\varphi'(1-u) - \varphi'(u)$ is monotonic as well (decreasing or increasing). This implies that φ needs to be a concave function on $[0,1]$. In order to derive a maximum $CPE_\varphi$ distribution under the restriction that mean and variance are given, one may therefore only consider concave generating functions φ.
We summarize this obvious but important result in the following Theorem:
Theorem 3. 
Let φ be a non-negative and differentiable function on the domain $[0,1]$ with derivative $\varphi'$ and $\varphi(0) = \varphi(1) = 0$. Then F is the maximum $CPE_\varphi$ distribution with prespecified mean μ and variance $\sigma^2$ of $X \sim F$ iff
$$ P\!\left( F^{-1}(U) - \mu = \frac{\sigma}{\sqrt{E\!\left[ \left( \varphi'(1-U) - \varphi'(U) \right)^2 \right]}}\,\left( \varphi'(1-U) - \varphi'(U) \right) \right) = 1. $$
Proof. 
The upper bound of the Cauchy–Schwarz inequality is attained iff there are constants $a, b \in \mathbb{R}$ such that
$$ P\!\left( F^{-1}(U) = a + b\left( \varphi'(1-U) - \varphi'(U) \right) \right) = 1. $$
The property $\varphi(0) = \varphi(1) = 0$ leads to $E[\varphi'(1-U) - \varphi'(U)] = 0$, such that
$$ \mu = \int_0^1 F^{-1}(u)\,du = a + b\int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) du = a. $$
This means that there is a constant $b \in \mathbb{R}$ with
$$ P\!\left( F^{-1}(U) - \mu = b\left( \varphi'(1-U) - \varphi'(U) \right) \right) = 1. $$
The second restriction postulates that
$$ \sigma^2 = \int_0^1 \left( F^{-1}(u) - \mu \right)^2 du = b^2\, E\!\left[ \left( \varphi'(1-U) - \varphi'(U) \right)^2 \right]. $$
φ is concave on $[0,1]$ with
$$ -\varphi''(1-u) - \varphi''(u) \ge 0, \quad u\in[0,1]. $$
Therefore, $\varphi'(1-u) - \varphi'(u)$ is monotonically increasing. The quantile function is also monotonically increasing, such that b has to be positive. This gives
$$ b = \frac{\sigma}{\sqrt{E\!\left[ \left( \varphi'(1-U) - \varphi'(U) \right)^2 \right]}}. $$
 ☐
The quantile function of Tukey’s λ distribution is given by
$$ Q(u, \lambda) = \frac{1}{\lambda}\left( u^{\lambda} - (1-u)^{\lambda} \right), \quad u\in[0,1],\ \lambda \ne 0. $$
Its mean and variance are
$$ \mu = 0 \quad\text{and}\quad \sigma^2 = \frac{2}{\lambda^2}\left( \frac{1}{2\lambda+1} - B(\lambda+1, \lambda+1) \right). $$
The domain is given by $[-1/\lambda, 1/\lambda]$ for $\lambda > 0$.
By discussing the cumulative paired α-entropy, one can prove the new result that Tukey’s λ distribution is the maximum $CPE_\alpha$ distribution for prespecified mean and variance. Tukey’s λ distribution takes on the role of the Student-t distribution if one changes from the differential entropy to $CPE_\alpha$ (cf. [61]).
Corollary 5. 
The cdf F maximizes $CPE_\alpha$ for $\alpha > 1/2$ under the restrictions of a given mean μ and a given variance $\sigma^2$ iff F is the cdf of the Tukey λ distribution with $\lambda = \alpha - 1$.
Proof. 
For $\varphi(u) = u(u^{\alpha-1}-1)/(1-\alpha)$, $u\in[0,1]$, we have
$$ \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right)^2 du = \frac{\alpha^2}{(1-\alpha)^2}\int_0^1 \left( (1-u)^{\alpha-1} - u^{\alpha-1} \right)^2 du = \frac{2\alpha^2}{(1-\alpha)^2}\left( \frac{1}{2\alpha-1} - B(\alpha,\alpha) \right) $$
for $\alpha > 1/2$. As a consequence, the constant b is given by
$$ b = \frac{\sigma}{\sqrt 2}\,\frac{|1-\alpha|}{\alpha}\left( \frac{1}{2\alpha-1} - B(\alpha,\alpha) \right)^{-1/2}, $$
and the maximum $CPE_\alpha$ distribution results in
$$ F^{-1}(u) - \mu = \frac{\sigma}{\sqrt 2}\,\frac{|1-\alpha|}{\alpha}\left( \frac{1}{2\alpha-1} - B(\alpha,\alpha) \right)^{-1/2} \frac{\alpha}{1-\alpha}\left( (1-u)^{\alpha-1} - u^{\alpha-1} \right) = \frac{\sigma\,|\alpha-1|}{\sqrt 2}\left( \frac{1}{2\alpha-1} - B(\alpha,\alpha) \right)^{-1/2} \frac{u^{\alpha-1} - (1-u)^{\alpha-1}}{\alpha-1}. $$
$F^{-1}$ can easily be identified as the quantile function of a Tukey λ distribution with $\lambda = \alpha - 1$ and $\alpha > 1/2$. ☐
For the Gini case ($\alpha = 2$), one obtains the quantile function of a uniform distribution,
$$ F^{-1}(u) = \mu + \sigma\,\frac{1}{\sqrt 2}\,\sqrt{6}\,(2u - 1) = \mu + \sigma\sqrt{3}\,(2u-1), \quad u\in[0,1], $$
with domain $[\mu - \sqrt{3}\sigma,\, \mu + \sqrt{3}\sigma]$. This maximum $CPE_G$ distribution corresponds essentially to the distribution derived by Dai et al. [40].
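As a numerical illustration of Corollary 5 (not contained in the paper), the sketch below computes $CPE_\alpha$ of a Tukey λ distribution with $\lambda = \alpha - 1$ from the quantile representation of Proposition 1 and compares it with the upper bound of Corollary 3; the two values should coincide. It assumes NumPy/SciPy, and α = 1.5 is an arbitrary example value.

```python
import numpy as np
from scipy import integrate, special

def tukey_quantile(u, lam):
    """Quantile function of Tukey's lambda distribution, Q(u) = (u^lam - (1-u)^lam) / lam."""
    return (u ** lam - (1.0 - u) ** lam) / lam

def tukey_sigma(lam):
    """Standard deviation: sigma^2 = (2 / lam^2) * (1/(2 lam + 1) - B(lam+1, lam+1))."""
    return np.sqrt(2.0 / lam ** 2 * (1.0 / (2.0 * lam + 1.0) - special.beta(lam + 1.0, lam + 1.0)))

def cpe_alpha_bound(alpha, sigma):
    """Upper bound of Corollary 3."""
    return sigma * alpha / abs(1.0 - alpha) * np.sqrt(2.0 * (1.0 / (2.0 * alpha - 1.0) - special.beta(alpha, alpha)))

alpha = 1.5
lam = alpha - 1.0
# phi'(1-u) - phi'(u) for the alpha-entropy generating function
score = lambda u: alpha / (1.0 - alpha) * ((1.0 - u) ** (alpha - 1.0) - u ** (alpha - 1.0))
cpe = integrate.quad(lambda u: score(u) * tukey_quantile(u, lam), 0.0, 1.0)[0]
print("CPE_alpha of Tukey(lambda = 0.5):", cpe)
print("bound of Corollary 3            :", cpe_alpha_bound(alpha, tukey_sigma(lam)))
```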
The fact that the logistic distribution is the maximum C P E S distribution, provided mean and variance are given, was derived by Chen et al. [51] in the framework of uncertainty theory and by ([50], p. 4) in the framework of reliability theory. Both proved this result using Euler–Lagrange equations. In the interest of completeness, we provide an alternative proof via the upper bound of the Cauchy–Schwarz inequality.
Corollary 6. 
The cdf F maximizes C P E S under the restrictions of a known mean μ and a known variance σ 2 iff F is the cdf of a logistic distribution.
Proof. 
Since
$$ \int_0^1 \left( \ln\frac{1-u}{u} \right)^2 du = \frac{\pi^2}{3}, $$
one receives
$$ F^{-1}(u) - \mu = \frac{\sigma}{\pi/\sqrt{3}}\,\ln\frac{u}{1-u}, \quad u\in[0,1]. $$
Inverting gives the distribution function of the logistic distribution with mean μ and variance $\sigma^2$:
$$ F(x) = \frac{1}{1 + \exp\!\left( -\frac{\pi}{\sqrt{3}}\,\frac{x-\mu}{\sigma} \right)}, \quad x\in\mathbb{R}. $$
 ☐
As a last example we consider the cumulative paired Leik entropy C P E L .
Corollary 7. 
The cdf F maximizes $CPE_L$ under the restrictions of a known mean μ and a known variance $\sigma^2$ iff F satisfies
$$ F(x) = \begin{cases} 0 & \text{for } x < \mu - \sigma, \\ 1/2 & \text{for } \mu - \sigma \le x < \mu + \sigma, \\ 1 & \text{for } x \ge \mu + \sigma. \end{cases} $$
Proof. 
From $\varphi(u) = \min\{u, 1-u\}$ and $\varphi'(u) = \operatorname{sign}(1/2 - u)$, $u\in[0,1]$, it follows that
$$ F^{-1}(u) - \mu = \sigma\,\operatorname{sign}(u - 1/2), \quad u\in[0,1]. $$
 ☐
Therefore, the maximization of C P E L with given mean and variance leads to a distribution whose variance is maximal on the interval [ μ - σ , μ + σ ] .

5.2. Maximum C P E φ Distributions for General Moment Restrictions

Drissi et al. [50] discuss general moment restrictions of the form
$$ \int_{-\infty}^{\infty} c_i(x)\,f(x)\,dx = \int_0^1 c_i(F^{-1}(u))\,du = k_i, \quad i = 1, 2, \dots, k, $$
for which the existence of the moments is assumed. By using Euler–Lagrange equations they show that
$$ \bar F(x) = \frac{1}{1 + \exp\!\left( \sum_{i=1}^{r} \lambda_i c_i(x) \right)}, \quad x\in\mathbb{R}, $$
maximizes the cumulative residual entropy $-\int_{\mathbb{R}} \bar F(x)\ln\bar F(x)\,dx$ under the constraints of Equation (31). Moreover, they demonstrated that the solution needs to be symmetric with respect to μ. Here, $\lambda_i$, $i = 1, 2, \dots, k$, are the Lagrange parameters which are determined by the moment restrictions, provided a solution exists. Rao et al. [47] show that for distributions with support $\mathbb{R}^+$ the ME distribution is given by
$$ \bar F(x) = \exp\!\left( -\sum_{i=1}^{r} \lambda_i c_i(x) \right), \quad x > 0, $$
if the restrictions of Equation (31) are again imposed.
One can easily examine the shape of a distribution which maximizes the cumulative paired φ-entropy under the constraints of Equation (31). This maximum $CPE_\varphi$ distribution can no longer be derived from the upper bound of the Cauchy–Schwarz inequality if there are more than two restrictions. One has to solve the Euler–Lagrange equations for the objective function
$$ \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) F^{-1}(u)\,du - \sum_{i=1}^{k} \lambda_i \left( \int_0^1 c_i(F^{-1}(u))\,du - k_i \right) $$
with Lagrange parameters $\lambda_i$, $i = 1, 2, \dots, k$. The Euler–Lagrange equations lead to the optimality condition
$$ \sum_{i=1}^{k} \lambda_i\, c_i'(F^{-1}(u)) = \varphi'(1-u) - \varphi'(u), \quad u\in[0,1]. $$
Once again there is a close relation between the derivative of the generating function and the quantile function, provided a solution of the optimization problem Equation (32) exists.
The following example shows that the optimization problem Equation (32) leads to a well-known distribution if constraints are chosen carefully in case of a Shannon-type entropy.
Example 1. 
The power logistic distribution is defined by the distribution function
$$ F(x) = \frac{1}{1 + \exp\!\left( -\lambda\,\operatorname{sign}(x)\,|x|^{\gamma} \right)}, \quad x\in\mathbb{R}, $$
for $\gamma > 0$. The corresponding quantile function is
$$ F^{-1}(u) = \frac{1}{\lambda^{1/\gamma}}\,\operatorname{sign}(u - 1/2)\left| \ln\frac{1-u}{u} \right|^{1/\gamma}, \quad u\in[0,1]. $$
This quantile function is also the solution of Equation (33) for $\varphi(u) = -u\ln u$, $u\in[0,1]$, under the constraint $E\!\left[|X|^{\gamma+1}\right] = c$. The maximum of the cumulative paired Shannon entropy under this constraint is given by
$$ CPE_S(X) = \int_0^1 \ln\frac{u}{1-u}\;\frac{1}{\lambda^{1/\gamma}}\,\operatorname{sign}(u-1/2)\left| \ln\frac{1-u}{u} \right|^{1/\gamma} du = \frac{1}{\lambda^{1/\gamma}}\int_0^1 \left| \ln\frac{1-u}{u} \right|^{(\gamma+1)/\gamma} du = \lambda\,E\!\left( |X|^{\gamma+1} \right). $$
Setting $\gamma = 1$ leads to the familiar result for the upper bound of $CPE_S$ given the variance.

5.3. Generalized Principle of Maximum Entropy

Kesavan et al. [19] introduced the generalized principle of maximum entropy, which describes the interplay of entropy, constraints, and distributions. A variation of this principle is the aim of finding an entropy that is maximized by a given distribution under some moment restrictions.
This problem can easily be solved for $CPE_\varphi$ if mean and variance are given, due to the linear relationship between $\varphi'(1-u) - \varphi'(u)$ and the quantile function $F^{-1}(u)$ of the maximum $CPE_\varphi$ distribution provided by the Cauchy–Schwarz inequality. However, it is a precondition for $F^{-1}(u)$ that $\varphi'(1-u) - \varphi'(u)$ is strictly monotonic on $[0,1]$ in order to be a quantile function. Therefore, the concavity of $\varphi(u)$ and the condition $\varphi(0) = \varphi(1) = 0$ are of key importance.
We demonstrate the solution to the generalized principle of the maximum entropy problem for the Gaussian and the Student-t distribution.
Proposition 3. 
Let ϕ, Φ, and $\Phi^{-1}$ be the density, the cdf, and the quantile function of a standard Gaussian random variable (we write ϕ for the Gaussian density to distinguish it from the entropy generating function φ). The Gaussian distribution is the maximum $CPE_\varphi$ distribution for a given mean μ and variance $\sigma^2$ for $CPE_\varphi$ with entropy generating function
$$ \varphi(u) = \phi(\Phi^{-1}(u)), \quad u\in[0,1]. $$
Proof. 
With
$$ \varphi'(u) = \frac{\phi'(\Phi^{-1}(u))}{\phi(\Phi^{-1}(u))} = -\Phi^{-1}(u), \quad u\in[0,1], $$
the condition for the maximum $CPE_\varphi$ distribution with mean μ and variance $\sigma^2$ becomes
$$ F^{-1}(u) - \mu = \frac{\sigma}{\sqrt{\int_0^1 \left( 2\Phi^{-1}(u) \right)^2 du}}\;2\,\Phi^{-1}(u), \quad u\in[0,1]. $$
By substituting $\int_0^1 \left( 2\Phi^{-1}(u) \right)^2 du = 4$, it follows that
$$ F^{-1}(u) - \mu = \sigma\,\Phi^{-1}(u), \quad u\in[0,1], $$
such that $F^{-1}$ is the quantile function of a Gaussian distribution with mean μ and variance $\sigma^2$. ☐
An analogous result holds for the Student-t distribution with k degrees of freedom. In this case, the main difference to the Gaussian distribution is the fact that the entropy generating function possesses no closed form but is obtained by numerical integration of the quantile function.
Corollary 8. 
Let $t_k$ and $t_k^{-1}$ be the cdf and the quantile function of a Student-t distribution with $k > 2$ degrees of freedom. Then $\mu + \sigma\sqrt{\tfrac{k-2}{k}}\,t_k^{-1}$ is the maximum $CPE_\varphi$ quantile function for a given mean μ and variance $\sigma^2$ iff
$$ \varphi(u) = -\sqrt{\frac{k-2}{k}}\,\int_0^u t_k^{-1}(p)\,dp, \quad u\in[0,1]. $$
Proof. 
Starting with
$$ \varphi'(u) = -\sqrt{\frac{k-2}{k}}\;t_k^{-1}(u), \quad u\in[0,1], $$
and the symmetry of the $t_k$ distribution, we get the condition
$$ F^{-1}(u) - \mu = \frac{\sigma}{\sqrt{\int_0^1 \left( 2\,t_k^{-1}(u) \right)^2 \frac{k-2}{k}\,du}}\;2\,\sqrt{\frac{k-2}{k}}\;t_k^{-1}(u), \quad u\in[0,1]. $$
With $\int_0^1 \left( t_k^{-1}(u) \right)^2 du = k/(k-2)$ we get, up to the scale factor $\sigma\sqrt{(k-2)/k}$, the quantile function of the t distribution with k degrees of freedom and mean μ:
$$ F^{-1}(u) - \mu = \sigma\,\sqrt{\frac{k-2}{k}}\;t_k^{-1}(u), \quad u\in[0,1]. $$
 ☐
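A short Monte Carlo check of Proposition 3 (not part of the paper; NumPy/SciPy assumed, sample size and seed arbitrary): with the generating function $\varphi(u) = \phi(\Phi^{-1}(u))$ one has $\varphi'(1-u) - \varphi'(u) = 2\Phi^{-1}(u)$, and the standard normal should attain its Cauchy–Schwarz bound, both sides being approximately 2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
U = rng.uniform(size=2_000_000)

# phi(u) = pdf(Phi^{-1}(u))  =>  phi'(u) = -Phi^{-1}(u)  =>  phi'(1-u) - phi'(u) = 2 Phi^{-1}(u)
score = 2.0 * stats.norm.ppf(U)

cpe = np.mean(score * stats.norm.ppf(U))      # quantile representation of CPE_phi for N(0, 1)
bound = np.sqrt(np.mean(score ** 2)) * 1.0    # Cauchy-Schwarz bound with sigma = 1
print(cpe, bound)                             # both approximately 2: the bound is attained
```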
Figure 2 shows the shape of the entropy generating function φ for several distributions generated by the generalized ME principle.
Figure 2. Several entropy generating functions φ derived from the generalized maximum entropy (ME) principle.

6. CPE φ as a Measure of Scale

6.1. Basic Properties of C P E φ

The cumulative residual entropy (CRE) introduced by [46], the generalized cumulative residual entropy (GCRE) of [50], and the cumulative entropy (CE) discussed by [8,9] have always been interpreted as measures of information. However, all these approaches do not explain which kind of information is considered. In contrast to this interpretation as measures of information, Oja [3] proved that the differential entropy satisfies a special ordering of scale and has certain meaningful properties of measures of scale. In [4], the authors discussed the close relationship between differential entropy and variance. In the discrete case the Shannon entropy can be interpreted as a measure of diversity, which is a concept of dispersion if there is no ordering and no distance between the realizations of a random variable. In this section, we will clarify the important role which the variance plays for the existence of $CPE_\varphi$.
We therefore intend to provide a deeper insight into $CPE_\varphi$ as a proper MOS. We start by showing that $CPE_\varphi$ has typical properties of an MOS. In detail, a proper MOS should always be non-negative and attain its minimal value 0 for a degenerate distribution. If a finite interval $[a,b]$ is considered as support, an MOS should attain its maximum if a and b occur with probability 1/2 each. $CPE_\varphi$ possesses all these properties, as shown in the next proposition.
Proposition 4. 
Let $\varphi : [0,1] \to \mathbb{R}$ with $\varphi(u) > 0$ for $u\in(0,1)$ and $\varphi(0) = \varphi(1) = 0$. Let X be a random variable with support D for which $CPE_\varphi$ is assumed to exist. Then the following properties hold:
1. 
$CPE_\varphi(X) \ge 0$.
2. 
$CPE_\varphi(X) = 0$ iff there exists an $x^*$ with $P(X = x^*) = 1$.
3. 
$CPE_\varphi(X)$ attains its maximum iff there exist a, b with $-\infty < a < b < \infty$ such that $P(X = a) = P(X = b) = 1/2$.
Proof. 
  • Follows from the non-negativity of φ.
  • If there is an $x^* \in \mathbb{R}$ with $P(X = x^*) = 1$, then $F_X(x) \in \{0,1\}$ and $\bar F_X(x) \in \{0,1\}$ for all $x\in\mathbb{R}$. Due to $\varphi(0) = \varphi(1) = 0$ it follows that $\varphi(F_X(x)) = \varphi(\bar F_X(x)) = 0$ for all $x\in\mathbb{R}$.
    Conversely, set $CPE_\varphi(X) = 0$. Due to the non-negativity of the integrand of $CPE_\varphi$, $\varphi(F_X(x)) + \varphi(\bar F_X(x)) = 0$ must hold for all $x\in\mathbb{R}$. Since $\varphi(u) > 0$ for $0 < u < 1$, it follows that $F_X(x), \bar F_X(x) \in \{0,1\}$ for all $x\in\mathbb{R}$.
  • Let $CPE_\varphi(X)$ have a finite maximum. Since $\varphi(u) + \varphi(1-u)$ has a unique maximum at $u = 1/2$, the maximum of $CPE_\varphi(X)$ is
    $$ \int_D 2\varphi(1/2)\,dx = 2\varphi(1/2)\int_D dx. $$
    In order to attain the assumed finite maximum, the support D has to be a finite interval $[a,b]$. Here, $2\varphi(1/2)(b-a)$ is the maximum. Now, it is sufficient to construct a distribution with support $[a,b]$ that attains this maximum. Set
    $$ F(x) = \begin{cases} 0 & \text{for } x < a, \\ 1/2 & \text{for } a \le x < b, \\ 1 & \text{for } x \ge b, \end{cases} $$
    then $CPE_\varphi(F) = \int_a^b \left[ \varphi(F(x)) + \varphi(\bar F(x)) \right] dx = 2\varphi(1/2)(b-a)$. Therefore, F is $CPE_\varphi$-maximal.
    To prove the other direction of statement 3, we consider an arbitrary distribution G with survival function $\bar G$ and support $[a,b]$. Due to $\varphi(0) = \varphi(1) = 0$ and $\varphi(u) + \varphi(1-u) \le 2\varphi(1/2)$, it holds that
    $$ CPE_\varphi(G) = \int_a^b \left[ \varphi(G(x)) + \varphi(\bar G(x)) \right] dx \le 2\varphi(1/2)(b-a) = CPE_\varphi(F). $$
     ☐

6.2. C P E φ and Oja’s Axioms for Measures of Scale

Oja ([3] p. 159) defined a MOS as follows:
Definition 2. 
Let $\mathcal{F}$ be a set of continuous distribution functions and ⪯ an appropriate ordering of scale on $\mathcal{F}$. $T : \mathcal{F} \to \mathbb{R}$ is called an MOS if
1. 
$T(aX + b) = |a|\,T(X)$ for all $a, b \in \mathbb{R}$, $F \in \mathcal{F}$.
2. 
$T(X_1) \le T(X_2)$ for $X_1 \sim F_1$, $X_2 \sim F_2$, $F_1, F_2 \in \mathcal{F}$ with $F_1 \preceq F_2$.
Oja [3] discussed several orderings of scale. He showed in particular that Shannon entropy and variance satisfy a partial quantile-based ordering of scale, which has been discussed by [62]. Referring to [63], the author of [64] criticized that this ordering and the location-scale families of distributions focused on by Oja [3] were too restrictive. He discussed a more general nonparametric model of dispersion based on a more general ordering of scale (cf. [65,66]). In line with [4], we focus on the scale ordering proposed by [62].
Definition 3. 
Let $F_1, F_2$ be continuous cdfs with respective quantile functions $F_1^{-1}$ and $F_2^{-1}$. $F_2$ is said to be more spread out than $F_1$ ($F_1 \preceq_1 F_2$) if
$$ F_2^{-1}(v) - F_2^{-1}(u) \;\ge\; F_1^{-1}(v) - F_1^{-1}(u) \quad \text{for all } 0 < u < v < 1. $$
If $F_1, F_2$ are absolutely continuous with density functions $f_1, f_2$, then $\preceq_1$ can be characterized equivalently by the property that $F_2^{-1}(F_1(x)) - x$ is monotonically non-decreasing, or by
$$ f_1(F_1^{-1}(u)) \;\ge\; f_2(F_2^{-1}(u)), \quad u\in[0,1] $$
(cf. [3], p. 160).
Next, we show that $CPE_\varphi$ is an MOS in the sense of [3]. The following lemma examines the behavior of $CPE_\varphi$ with respect to affine linear transformations, referring to the first axiom of Definition 2:
Lemma 2. 
Let F be the cdf of the random variable X. Then
$$ CPE_\varphi(aX + b) = |a|\,CPE_\varphi(X). $$
Proof. 
For $Y = aX + b$, it follows that
$$ \int_{-\infty}^{\infty} \varphi(P(Y \le y))\,dy = \begin{cases} \displaystyle\int_{-\infty}^{\infty} \varphi\!\left( P\!\left( X \le \tfrac{y-b}{a} \right) \right) dy & \text{for } a > 0, \\[2mm] \displaystyle\int_{-\infty}^{\infty} \varphi\!\left( P\!\left( X > \tfrac{y-b}{a} \right) \right) dy & \text{for } a < 0. \end{cases} $$
Substitution of $x = (y-b)/a$ with $dy = a\,dx$ gives
$$ \int_{-\infty}^{\infty} \varphi(P(Y \le y))\,dy = \begin{cases} a\displaystyle\int_{-\infty}^{\infty} \varphi(P(X \le x))\,dx & \text{for } a > 0, \\[2mm] -a\displaystyle\int_{-\infty}^{\infty} \varphi(P(X > x))\,dx & \text{for } a < 0. \end{cases} $$
Likewise, we have
$$ \int_{-\infty}^{\infty} \varphi(P(Y > y))\,dy = \begin{cases} a\displaystyle\int_{-\infty}^{\infty} \varphi(P(X > x))\,dx & \text{for } a > 0, \\[2mm] -a\displaystyle\int_{-\infty}^{\infty} \varphi(P(X \le x))\,dx & \text{for } a < 0, \end{cases} $$
such that
$$ CPE_\varphi(aX + b) = |a|\,CPE_\varphi(X). $$
 ☐
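A brief numerical check of Lemma 2 (not in the paper; NumPy/SciPy assumed, the constants a and b are arbitrary), using the quantile representation of Proposition 1 for the Shannon case and a standard logistic X: the quantile function of $aX + b$ (for $a > 0$) is $a F^{-1}(u) + b$, and the additive constant drops out because $\varphi'(1-u) - \varphi'(u)$ integrates to zero.

```python
import numpy as np
from scipy import integrate

phi_prime = lambda u: -np.log(u) - 1.0                 # Shannon case
score = lambda u: phi_prime(1.0 - u) - phi_prime(u)    # = ln(u / (1 - u))
Q = lambda u: np.log(u / (1.0 - u))                    # standard logistic quantile

def cpe_from_quantile(quantile):
    """Quantile representation: CPE_phi(F) = int_0^1 (phi'(1-u) - phi'(u)) F^{-1}(u) du."""
    return integrate.quad(lambda u: score(u) * quantile(u), 0.0, 1.0, limit=200)[0]

a, b = 2.5, 7.0
print(cpe_from_quantile(lambda u: a * Q(u) + b))   # CPE_phi(aX + b)
print(abs(a) * cpe_from_quantile(Q))               # |a| * CPE_phi(X), equal per Lemma 2
```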
In order to satisfy the second axiom of Oja’s definition of a measure of scale, C P E φ has to satisfy the ordering of scale ⪯. This is shown by the following lemma:
Lemma 3. 
Let $F_1$ and $F_2$ be continuous cdfs of the random variables $X_1$ and $X_2$ with $F_1 \preceq_1 F_2$. Then the following holds:
$$ CPE_\varphi(X_1) \le CPE_\varphi(X_2). $$
Proof. 
One can show with $u = F_i(x)$ that
$$ CPE_\varphi(F_i) = \int_0^1 \varphi(u)\,\frac{1}{f_i(F_i^{-1}(u))}\,du + \int_0^1 \varphi(1-u)\,\frac{1}{f_i(F_i^{-1}(u))}\,du $$
for $i = 1, 2$. Therefore,
$$ CPE_\varphi(F_1) - CPE_\varphi(F_2) = \int_0^1 \varphi(u)\left[ \frac{1}{f_1(F_1^{-1}(u))} - \frac{1}{f_2(F_2^{-1}(u))} \right] du + \int_0^1 \varphi(1-u)\left[ \frac{1}{f_1(F_1^{-1}(u))} - \frac{1}{f_2(F_2^{-1}(u))} \right] du. $$
If $F_1 \preceq_1 F_2$ and hence $f_1(F_1^{-1}(u)) \ge f_2(F_2^{-1}(u))$ for $u\in[0,1]$, it follows that $CPE_\varphi(F_1) - CPE_\varphi(F_2) \le 0$. ☐
As a consequence of Lemma 2 and Lemma 3, C P E φ is an MOS in the sense of [3]. Thus, not only variance, differential entropy, and other statistical measures have the properties of measures of scale, but also C P E φ .

6.3. C P E φ and Transformations

Ebrahimi et al. ([4], p. 323) considered cdfs $F_1, F_2$ on domains $D_1, D_2$ with density functions $f_1, f_2$, which are connected via a differentiable transformation $g : D_1 \to D_2$, that is, $F_2(y) = F_1(g^{-1}(y))$ and $f_2(y) = f_1(g^{-1}(y))\,dg^{-1}(y)/dy$ for $y \in D_2$. They demonstrated for Shannon’s differential entropy H that the transformation only enters as an additive term:
$$ H(f_2) = H(f_1) - \int_{D_2} \ln\!\left( \frac{d g^{-1}(y)}{dy} \right) f_2(y)\,dy. $$
For $CPE_\varphi$, one gets a less explicit relationship between $CPE_\varphi(F_2)$ and $CPE_\varphi(F_1)$:
$$ CPE_\varphi(F_2) = \int_{D_1} \left[ \varphi(F_1(y)) + \varphi(\bar F_1(y)) \right] \frac{d g(y)}{dy}\,dy. $$
Transformations with $|g'(y)| \ge 1$ are of special interest, since these transformations do not diminish measures of scale. In Theorem 1 of [4], Ebrahimi et al. showed that $F_1 \preceq_1 F_2$ holds if $|g'(y)| \ge 1$. Hence, no MOS can be diminished by such a transformation, in particular neither the Shannon entropy nor $CPE_\varphi$.
Ebrahimi et al. [4] considered the special transformation $g(x) = ax + b$, $x \in D_1$. They showed that Shannon’s differential entropy is only shifted additively by this transformation, which is not what one expects from an MOS. The standard deviation, in contrast, is changed by the factor $|a|$, which is also true for $CPE_\varphi$, as shown in Lemma 2.

6.4. C P E φ for Sums of Independent Random Variables

As is generally known, variance and differential entropy behave additively for the sum of independent random variables X and Y. More general entropies such as the Rényi or the Havrda & Charvát entropy are only subadditive (cf. [18], p. 194).
Neither the property of additivity nor the property of subadditivity could be shown for cumulative paired φ-entropies. Instead, they possess the maximum property if φ is a concave function on [ 0 , 1 ] . This means that, for two independent variables X and Y, C P E φ ( X + Y ) is lower-bounded by the maximum of the two individual entropies C P E φ ( X ) and C P E φ ( Y ) . This result was shown by [46] for the cumulative residual Shannon entropy. The following Theorem generalizes this result, while the proof partially follows Theorem 2 of [46].
Theorem 4. 
Let X and Y be independent random variables and φ a concave function on the interval $[0,1]$ with $\varphi(0) = \varphi(1) = 0$. Then we have
$$ CPE_\varphi(X+Y) \;\ge\; \max\left\{ CPE_\varphi(X),\, CPE_\varphi(Y) \right\}. $$
Proof. 
Let X and Y be independent random variables with distribution functions $F_X, F_Y$ and densities $f_X, f_Y$. Using the convolution formula, we immediately get
$$ P(X+Y \le t) = \int_{-\infty}^{\infty} F_X(t-y)\,f_Y(y)\,dy = E_Y\!\left[ F_X(t-Y) \right], \quad t\in\mathbb{R}. $$
Applying Jensen’s inequality for a concave function φ to Equation (37) results in
$$ E_Y\!\left[ \varphi(F_X(t-Y)) \right] \le \varphi\!\left( E_Y\!\left[ F_X(t-Y) \right] \right) $$
and
$$ E_Y\!\left[ \varphi(\bar F_X(t-Y)) \right] \le \varphi\!\left( E_Y\!\left[ \bar F_X(t-Y) \right] \right). $$
The existence of the expectations is assumed. To prove the Theorem, we begin with
$$ CPE_\varphi(X+Y) = \int_{-\infty}^{\infty} \left[ \varphi\!\left( E_Y[F_X(t-Y)] \right) + \varphi\!\left( E_Y[\bar F_X(t-Y)] \right) \right] dt. $$
By using Equations (38) and (39), setting $z = t - y$, and exchanging the order of integration, one obtains
$$ \begin{aligned} CPE_\varphi(X+Y) &\ge \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} \left( \varphi(F_X(t-y)) + \varphi(\bar F_X(t-y)) \right) dt \right] f_Y(y)\,dy \\ &= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} \left( \varphi(F_X(z)) + \varphi(\bar F_X(z)) \right) dz \right] f_Y(y)\,dy = \int_{-\infty}^{\infty} \left( \varphi(F_X(z)) + \varphi(\bar F_X(z)) \right) dz = CPE_\varphi(X). \end{aligned} $$
 ☐
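Theorem 4 can be illustrated numerically, as in the following sketch (two independent standard uniform variables and the Gini generator are chosen only as an example; the triangular cdf of their sum is a standard fact, and the helper names are assumptions of this sketch):

```python
import numpy as np
from scipy.integrate import quad

phi = lambda u: u * (1.0 - u)                        # Gini generating function

def cpe(cdf, lower, upper):
    value, _ = quad(lambda x: phi(cdf(x)) + phi(1.0 - cdf(x)), lower, upper)
    return value

cdf_x = lambda x: min(max(x, 0.0), 1.0)              # cdf of U(0, 1)

def cdf_sum(t):
    # cdf of X + Y for independent U(0, 1) variables: triangular on [0, 2]
    t = min(max(t, 0.0), 2.0)
    return 0.5 * t**2 if t <= 1.0 else 1.0 - 0.5 * (2.0 - t)**2

cpe_x = cpe(cdf_x, 0.0, 1.0)                         # 1/3, cf. Section 9.1
cpe_s = cpe(cdf_sum, 0.0, 2.0)
print(cpe_x, cpe_s, cpe_s >= max(cpe_x, cpe_x))      # maximum property of Theorem 4
```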
In the context of uncertainty theory, Liu [6] considered a different definition of independence for uncertain variables leading to the simpler additivity property
C P E φ ( X + Y ) = C P E φ ( X ) + C P E φ ( Y )
for independent uncertain variables X and Y.

7. Estimation of CPE φ

Beirlant et al. [67] presented an overview of differential entropy estimators. Essentially, all proposals are based on the estimation of a density function f and therefore inherit the typical problems of nonparametric density estimation, among them bias, the choice of a kernel, and the optimal choice of the smoothing parameter (cf. [68], p. 215ff.). CPE_φ, however, is based on the cdf F, for which several natural estimators with desirable stochastic properties exist, following from the theorem of Glivenko and Cantelli (cf. [69], p. 61). For a simple random sample (X_1, ..., X_n) of independent and identically distributed random variables with distribution function F, the authors of [8,9] estimated F by the empirical distribution function F_n(x) = (1/n) ∑_{i=1}^{n} I(X_i ≤ x), x ∈ R. Moreover, they showed for the cumulative entropy CE(F) = -∫_R F(x) ln F(x) dx that the estimator CE(F_n) is consistent for CE(F) (cf. [8]). In particular, for F being the distribution function of a uniform distribution, they provided the expected value of the estimator and demonstrated that it is asymptotically normal. For F being the cdf of an exponential distribution, they additionally derived the variance of the estimator.
In the following, we generalize the estimation approach of [8] by embedding it into the well-established theory of L-estimators (cf. [70], p. 55ff.). If φ is differentiable, then CPE_φ can be represented as the covariance between the random variable X and φ′(F̄(X)) - φ′(F(X)); since the second factor has expectation zero, this covariance equals
CPE_φ(F) = E[ X ( φ′(F̄(X)) - φ′(F(X)) ) ].
A natural estimator of this covariance is
CPE_φ(F_n) = (1/n) ∑_{i=1}^{n} X_i ( φ′(1 - F_n(X_i)) - φ′(F_n(X_i)) ) = (1/n) ∑_{i=1}^{n} X_{n:i} ( φ′(1 - F_n(X_{n:i})) - φ′(F_n(X_{n:i})) ) = (1/n) ∑_{i=1}^{n} [ φ′(1 - i/(n+1)) - φ′(i/(n+1)) ] X_{n:i} = ∑_{i=1}^{n} c_{ni} X_{n:i},
where
c_{ni} = (1/n) [ φ′(1 - i/(n+1)) - φ′(i/(n+1)) ], i = 1, 2, ..., n.
This results in an L-estimator (1/n) ∑_{i=1}^{n} J(i/(n+1)) X_{n:i} with weight function J(u) = φ′(1 - u) - φ′(u), u ∈ (0, 1). By applying known results for the influence functions of L-estimators (cf. [70]), we get for the influence function of CPE_φ:
IF(x; CPE_φ, F) = ∫_0^1 [ u / f(F^{-1}(u)) ] ( φ′(1 - u) - φ′(u) ) du - ∫_{F(x)}^1 [ 1 / f(F^{-1}(u)) ] ( φ′(1 - u) - φ′(u) ) du.
In particular, the derivative is
d IF(x; CPE_φ, F) / dx = φ′(F̄(x)) - φ′(F(x)), x ∈ R.
This means that the influence function is determined, up to an additive constant, by an antiderivative of φ′(F̄(x)) - φ′(F(x)). The following examples demonstrate that the influence function of CPE_φ can easily be calculated if the underlying distribution F is logistic. We consider the Shannon, the Gini, and the α-entropy cases.
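Before turning to the examples, a direct implementation of the L-estimator above is sketched (in Python, assuming NumPy; the weight functions for the Shannon and Gini generators follow from their derivatives, the helper names are assumptions of this sketch, and the logistic sample serves only as an illustration):

```python
import numpy as np

def cpe_L_estimator(sample, J):
    # L-estimator of CPE_phi: (1/n) * sum_i J(i/(n+1)) * X_(n:i)
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    u = np.arange(1, n + 1) / (n + 1.0)
    return np.sum(J(u) * x) / n

# weight functions J(u) = phi'(1 - u) - phi'(u)
J_shannon = lambda u: np.log(u / (1.0 - u))     # phi(u) = -u * log(u)
J_gini    = lambda u: 2.0 * (2.0 * u - 1.0)     # phi(u) = u * (1 - u)

rng = np.random.default_rng(0)
sample = rng.logistic(size=10_000)              # standard logistic sample
print(cpe_L_estimator(sample, J_shannon))       # should be near pi^2/3 (cf. Section 9.5)
print(cpe_L_estimator(sample, J_gini))          # should be near 2 (cf. Section 9.5)
```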
Example 2. 
Beginning with the derivative
d IF(x; CPE_S, F) / dx = φ′(F̄(x)) - φ′(F(x)) = ln( F(x) / F̄(x) ) = x, x ∈ R,
we arrive at
IF(x; CPE_S, F) = (1/2) x² + C, x ∈ R.
The influence function is unbounded and proportional to the influence function of the variance, which implies that the variance and CPE_S have similar asymptotic and robustness behavior. The integration constant C has to be determined such that E[IF(X; CPE_S, F)] = 0:
C = -(1/2) E(X²) = -(1/2) · π²/3 = -π²/6.
Example 3. 
Using the Gini entropy C P E G and the logistic distribution function F we have
d IF(x; CPE_G, F) / dx = φ′(F̄(x)) - φ′(F(x)) = 2 (2 F(x) - 1) = 2 (e^x - 1)/(e^x + 1) = 2 tanh(x/2), x ∈ R.
Integration gives the influence function
IF(x; CPE_G, F) = 4 ln cosh(x/2) + C, x ∈ R.
By applying numerical integration we get C = -1.2274.
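This constant can be checked by a short numerical integration (a sketch; the standard logistic density is written out explicitly, and the helper name is an assumption):

```python
import numpy as np
from scipy.integrate import quad

logistic_pdf = lambda x: np.exp(-x) / (1.0 + np.exp(-x))**2

# C has to satisfy E[4*log(cosh(X/2))] + C = 0 for the standard logistic distribution
mean_if, _ = quad(lambda x: 4.0 * np.log(np.cosh(x / 2.0)) * logistic_pdf(x), -50.0, 50.0)
print(-mean_if)       # approximately -1.2274; a direct side calculation gives 4*ln(2) - 4
```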
Example 4. 
For φ ( u ) = u ( u α - 1 - 1 ) / ( 1 - α ) the derivative of the influence function is given by
d IF(x; CPE_α, F) / dx = φ′(F̄(x)) - φ′(F(x)) = [α/(1 - α)] · [1 - e^{(α-1)x}] / (1 + e^x)^{α-1} = [α/(1 - α)] · [ 1/(1 + e^x)^{α-1} - 1/(1 + e^{-x})^{α-1} ], x ∈ R.
Integration leads to the influence function
I F ( x ; C P E α , F ) = 2 F 1 ( α , α ; α + 1 ; - e - x ) e α x α 1 + e - x + 1 e x + 1 α + 1 α - 1 1 + e x + e ( α - 1 ) x ( e x + 1 ) α - 1 ,
where ₂F₁ denotes Gauss's hypergeometric function, ₂F₁(α, α; α + 1; -e^{-x}) = α ∫_0^1 t^{α-1} (1 + t e^{-x})^{-α} dt, and C is again an integration constant, x ∈ R.
Under certain conditions (cf. [71], p. 143) concerning J (equivalently φ) and F, L-estimators are consistent and asymptotically normal. Hence, the estimator of the cumulative paired φ-entropy satisfies
CPE_φ(F_n) ∼_asy N( CPE_φ(F), (1/n) A(F, CPE_φ) )
with asymptotic variance
A(F, CPE_φ) = Var( IF(X; CPE_φ(F), F) ) = ∫_{-∞}^{∞} IF(x; CPE_φ, F)² f(x) dx.
The following examples consider the Shannon and the Gini case, for which the sufficient condition guaranteeing asymptotic normality can easily be checked. We again consider the cdf F of the logistic distribution.
Example 5. 
For the cumulative paired Shannon entropy it holds that
C P E S ( F n ) a s y N C P E S ( F ) , 4 45 π 4
since
A(F, L) = Var( IF(X; CPE_φ(F), F) ) = (1/4) Var(X²) = (1/4) [ E(X⁴) - (E(X²))² ] = (4/45) π⁴.
Example 6. 
In the Gini case we get
C P E G ( F n ) a s y N C P E G ( F ) , 2.8405
since by numerical integration
A(F, L) = ∫_{-∞}^{∞} ( 4 ln cosh(x/2) - 1.2274 )² e^{-x} / (1 + e^{-x})² dx = 2.8405.
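The value of Example 6 can be reproduced directly from the influence function of Example 3 (a sketch; the logistic density is written out, and the constant 1.2274 is the one determined above):

```python
import numpy as np
from scipy.integrate import quad

logistic_pdf = lambda x: np.exp(-x) / (1.0 + np.exp(-x))**2
IF = lambda x: 4.0 * np.log(np.cosh(x / 2.0)) - 1.2274    # influence function from Example 3

asy_var, _ = quad(lambda x: IF(x)**2 * logistic_pdf(x), -60.0, 60.0)
print(asy_var)    # approximately 2.84; a direct side calculation gives 16 - 4*pi^2/3 ≈ 2.8405
```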
It is known that L-estimators can have a notable small-sample bias. Following [72], this bias can be reduced by applying the jackknife method. The asymptotic distributions can be used to construct approximate confidence intervals and to carry out hypothesis tests in the one- or two-sample case. Huber ([70], p. 116ff.) discussed asymptotically efficient L-estimators for a scale parameter θ. Klein and Mangold [73] examine how the entropy generating function φ is determined by the requirement that CPE_φ(F_n) be asymptotically efficient.
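A minimal sketch of the jackknife bias correction mentioned above (in Python; the helper names are illustrative assumptions, and the L-estimator from the earlier sketch is redefined here so that the snippet is self-contained):

```python
import numpy as np

# L-estimator of CPE_phi, as in the estimation sketch above
def cpe_L_estimator(sample, J):
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    u = np.arange(1, n + 1) / (n + 1.0)
    return np.sum(J(u) * x) / n

J_gini = lambda u: 2.0 * (2.0 * u - 1.0)       # Gini weight function

def jackknife_corrected(sample, estimator):
    # jackknife bias correction: n*T_n - (n-1) * mean of the leave-one-out estimates
    x = np.asarray(sample, dtype=float)
    n = x.size
    full = estimator(x)
    leave_one_out = np.array([estimator(np.delete(x, i)) for i in range(n)])
    return n * full - (n - 1) * leave_one_out.mean()

rng = np.random.default_rng(1)
sample = rng.logistic(size=50)                                             # small sample
print(cpe_L_estimator(sample, J_gini))                                     # plain estimate
print(jackknife_corrected(sample, lambda s: cpe_L_estimator(s, J_gini)))   # bias-corrected
```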

9. Some Cumulative Paired Entropies for Selected Distribution Functions

In the following, we derive closed form expressions for some cumulative paired φ-entropies. We mimic the procedure of ([4], p. 326) to some degree. Table 1 of their paper contains formulas of the differential entropy for the most popular statistical distributions, several of which will also be considered in the following. Since cumulative entropies depend on the distribution function, or equivalently on the quantile function, we focus on families of distributions for which these functions have a closed form expression. Furthermore, we only discuss standardized random variables, since the scale parameter only has a multiplicative effect on CPE_φ and the location parameter has no effect. For the standard Gaussian distribution we provide the value of CPE_S by numerical integration, rounded to three decimal places, since the distribution function has no closed form. For the Gumbel distribution, there is a closed form expression for the distribution function; nevertheless, we were unable to establish a closed form of CPE_S and CPE_G, so we applied numerical integration in this case as well. In the following, in addition to the Gamma function Γ(a) and the Beta function B(a, b), we use
  • the incomplete Gamma function
    Γ ( x ; a ) = 0 x y a - 1 e - y d y for   x > 0 , a > 0 ,
  • the incomplete Beta function
    B ( x ; a , b ) = 0 x u a - 1 ( 1 - u ) b - 1 d u for   0 < x < 1 , a , b > 0 ,
  • and the Digamma function
    ψ ( a ) = d d a ln Γ ( a ) , a > 0 .

9.1. Uniform Distribution

Let X have the standard uniform distribution. Then we have
CPE_S(X) = 1/2, CPE_G(X) = 1/3, CPE_L(X) = 1/2, CPE_α(X) = 1/(α + 1).
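These values can be checked by one-dimensional numerical integration (a sketch; the generating functions are written out as used in this paper, the min-generator for CPE_L being inferred from the reported values, and α = 3 is chosen only as an example):

```python
import numpy as np
from scipy.integrate import quad

alpha = 3.0
generators = {
    "CPE_S":     lambda u: -u * np.log(u) if u > 0.0 else 0.0,           # Shannon
    "CPE_G":     lambda u: u * (1.0 - u),                                # Gini
    "CPE_L":     lambda u: min(u, 1.0 - u),                              # min-generator (assumption)
    "CPE_alpha": lambda u: u * (u**(alpha - 1.0) - 1.0) / (1.0 - alpha)  # Havrda & Charvat type
}

for name, phi in generators.items():
    value, _ = quad(lambda x: phi(x) + phi(1.0 - x), 0.0, 1.0)   # F(x) = x on [0, 1]
    print(name, value)   # expected: 1/2, 1/3, 1/2, 1/(alpha + 1) = 1/4
```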

9.2. Power Distribution

Let X have the Beta distribution on [0, 1] with parameters a > 0 and b = 1, i.e., the power distribution with density function f_X(x) = a x^{a-1} for x ∈ [0, 1], then we have
C P E S ( X ) = a ( a + 1 ) 2 + ψ a + 1 a - a + 1 a ψ a + 2 a + 1 a ψ ( 1 ) , C P E G ( X ) = 2 a ( 1 + a ) ( 1 + 2 a ) , C P E L ( X ) = a a + 1 1 - 1 2 1 / a , C P E α ( X ) = 1 a ( 1 - α ) B 1 a , α + 1 - α a ( 1 - α ) ( 1 + α a ) .

9.3. Triangular Distribution with Parameter c

Let X have a triangular distribution with density function
f(x) = 2x/c for 0 < x < c and f(x) = 2(1 - x)/(1 - c) for c ≤ x < 1.
Then the following holds:
C P E S ( X ) = π 2 6 + ln 2 ( 1 - ln 2 ) , C P E G ( X ) = 2 3 c 2 + ( 1 - c ) 2 - 2 5 c 3 + ( 1 - c ) 3 , C P E L ( X ) = 1 3 ( 2 - c ) - 3 - 2 3 2 1 - c , C P E α ( X ) = 1 1 - α 2 2 α + 1 c α + 1 + ( 1 - c ) α + 1 + c B c ; 1 2 , α + 1 + 1 - c B 1 - c ; 1 2 , α + 1 - 2 .

9.4. Laplace Distribution

Let X follow the Laplace distribution with density function f X ( x ) = 1 / 2 exp ( - | x | ) for x R , then we have
C P E S ( X ) = π 2 6 + ln 2 ( 2 - ln 2 ) , C P E G ( X ) = 3 2 , C P E L ( X ) = 2 , C P E α ( X ) = 4 α - 1 1 2 α - 1 1 α - 1 - 1 2 α .

9.5. Logistic Distribution

Let X follow the logistic distribution with distribution function F X ( x ) = 1 / ( 1 + exp ( - x ) ) for x R , then we have
C P E S ( X ) = π 2 3 , C P E G ( X ) = 2 , C P E L ( X ) = 4 ln 2 , C P E α = 2 α - 1 ( ψ ( α ) - ψ ( 1 ) ) .

9.6. Tukey λ Distribution

Let X follow the Tukey λ distribution with quantile function F^{-1}(u) = (1/λ) [ u^λ - (1 - u)^λ ] for 0 ≤ u ≤ 1 and λ > -1. Then the following holds:
C P E S ( X ) = 2 ( λ + 1 ) 2 1 + 1 + 1 λ ( λ + 1 ) ψ ( λ + 1 ) - ψ ( λ + 2 ) - ψ ( 1 ) , C P E G ( X ) = 4 λ + 1 1 + 1 λ , C P E L ( X ) = 2 1 λ + 1 1 2 λ + 1 + B 1 2 ; 2 , λ , C P E α ( X ) = 2 1 1 - α λ 3 - λ α - 2 ( λ + α ) λ 2 ( λ + 1 ) ( λ + α ) + B ( α + 1 , λ ) .

9.7. Weibull Distribution

Let X follow the Weibull distribution with distribution function F X ( x ) = 1 - e - x c for x > 0 , c > 0 , then we have
C P E S ( X ) = 1 c Γ 1 c 1 + i = 1 1 i ! 1 i 1 / c - 1 i + 1 1 / c , C P E G ( X ) = 2 c Γ 1 c - 1 2 Γ 1 2 c , C P E L ( X ) = 2 ( ln 2 ) 1 / c + 1 c Γ 1 c - 2 Γ ln 2 ; 1 c , C P E α ( X ) = 1 c Γ 1 c 1 α 1 / c + i = 1 α i ( - 1 ) i i - 1 / c .

9.8. Pareto Distribution

Let X follow the Pareto distribution with distribution function F X ( x ) = 1 - x - c for x > 1 , c > 1 , then we have
C P E S ( X ) = 1 c - 1 ψ 2 - 1 c + ψ 1 - 1 c - c c - 1 ψ ( 1 ) + 4 c , C P E G ( X ) = 2 c ( c - 1 ) ( 2 c - 1 ) , C P E L ( X ) = 2 1 c - 1 , C P E α ( X ) = 1 1 - α c ( 1 - α ) ( c α - 1 ) ( c - 1 ) - 1 c B α , 1 - 1 c .

9.9. Gaussian Distribution

By means of numerical integration we calculated the following values for the standard Gaussian distribution:
C P E S ( X ) = 1.806 , C P E G ( X ) = 1.128 , C P E L ( X ) = 1.596 .
CPE_α for α ∈ [0.5, 3] and the standard Gaussian distribution can be seen in Figure 4.
Figure 4. CPE_α, α ∈ [0.5, 3], for the standard Gaussian and the Student-t distribution.
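The Gaussian values can be reproduced with the same kind of numerical integration (a sketch, assuming SciPy; the generating functions are the same as in the earlier sketches):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

phi_S = lambda u: -u * np.log(u) if 0.0 < u < 1.0 else 0.0
phi_G = lambda u: u * (1.0 - u)
phi_L = lambda u: min(u, 1.0 - u)

for name, phi in [("CPE_S", phi_S), ("CPE_G", phi_G), ("CPE_L", phi_L)]:
    value, _ = quad(lambda x: phi(norm.cdf(x)) + phi(norm.sf(x)), -40.0, 40.0)
    print(name, round(value, 3))   # reported values: 1.806, 1.128, 1.596
```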

9.10. Student-t Distribution

By means of numerical integration and for ν = 3 degrees of freedom we calculated the following values for the Student-t distribution
C P E S ( X ) = 2.947 , C P E G ( X ) = 3.308 , C P E L ( X ) = 2.205 .
As can be seen in Figure 4, the heavy tails of the Student-t distribution result in a higher value for C P E α as compared with the Gaussian distribution.

10. Conclusions

A new kind of entropy has been introduced that generalizes Shannon's differential entropy. The main difference to the previous discussion of entropies is that the new entropy is defined for distribution functions instead of density functions. This paper shows that this definition has a long tradition in several scientific disciplines like fuzzy set theory, reliability theory, and, more recently, uncertainty theory. With only one exception, the concepts had been discussed independently across these disciplines. In addition, the theory of dispersion measures for ordered categorical variables relies on measures based on distribution functions without realizing that, implicitly, some sort of entropy is applied. Using the Cauchy–Schwarz inequality, we were able to show the close relationship between the new kind of entropy, named cumulative paired φ-entropy, and the standard deviation. More precisely, the standard deviation yields an upper bound for the new entropy. Additionally, the Cauchy–Schwarz inequality can be used to derive maximum entropy distributions under constraints that fix mean and variance. Here, the logistic distribution plays the same key role for the cumulative paired Shannon entropy that the Gaussian distribution plays for the differential entropy. As a new result, we have demonstrated that Tukey's λ distribution is a maximum entropy distribution if the entropy generating function φ known from the Havrda and Charvát entropy is used. Moreover, some new distributions can be derived by considering more general constraints. A change of perspective allows one to determine the entropy that is maximized by a certain distribution if, e.g., mean and variance are known. In this context, the Gaussian distribution gives a simple solution. Since cumulative paired φ-entropy and variance are closely related, we have investigated whether the cumulative paired φ-entropy is a proper measure of scale and shown that it satisfies the axioms introduced by Oja for measures of scale. Several further properties, concerning the behavior under transformations and for sums of independent random variables, have been proven. We have also given first insights into how to estimate the new entropy. In addition, based on the cumulative paired φ-entropy, we have introduced new concepts like φ-divergence, mutual φ-information, and φ-correlation; φ-regression and linear rank tests for scale alternatives were considered as well. Finally, formulas have been derived for certain cumulative paired φ-entropies and some popular distributions whose cdf or quantile function is available in closed form.

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive criticism, which helped to improve the presentation of this paper significantly. Furthermore, we would like to thank Michael Grottke for helpful advice.

Author Contributions

Ingo Klein conceived the new entropy concept, investigated its properties and wrote an initial version of the manuscript. Benedikt Mangold cooperated especially by checking, correcting and improving the mathematical details including the proofs. He examined the entropy’s properties by simulation. Monika Doll contributed by mathematical and linguistic revision. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Burbea, J.; Rao, C.R. On the convexity of some divergence measures based on entropy functions. IEEE Trans. Inf. Theory 1982, 28, 489–495. [Google Scholar] [CrossRef]
  2. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  3. Oja, H. On location, scale, skewness and kurtosis of univariate distributions. Scand. J. Stat. 1981, 8, 154–168. [Google Scholar]
  4. Ebrahimi, N.; Massoumi, E.; Soofi, E.S. Ordering univariate distributions by entropy and variance. J. Econometr. 1999, 90, 317–336. [Google Scholar] [CrossRef]
  5. Popoviciu, T. Sur les équations algébraique ayant toutes leurs racines réelles. Mathematica 1935, 9, 129–145. (In French) [Google Scholar]
  6. Liu, B. Uncertainty Theory. Available online: http://orsc.edu.cn/liu/ut.pdf (accessed on 27 June 2016).
  7. Wang, F.; Vemuri, B.C.; Rao, M.; Chen, Y. A New & Robust Information Theoretic Measure and Its Application to Image Alignment: Information Processing in Medical Imaging; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2003; Volume 2732, pp. 388–400. [Google Scholar]
  8. Di Crescenzo, A.; Longobardi, M. On cumulative entropies and lifetime estimation. In Methods and Models in Artificial and Natural Computation; Mira, J.M., Ed.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 132–141. [Google Scholar]
  9. Di Crescenzo, A.; Longobardi, M. On cumulative entropies. J. Stat. Plan. Inference 2009, 139, 4072–4087. [Google Scholar] [CrossRef]
  10. Kapur, J.N. Derivation of logistic law of population growth from maximum entropy principle. Natl. Acad. Sci. Lett. 1983, 6, 429–433. [Google Scholar]
  11. Hartley, R. Transmission of information. Bell Syst. Tech. J. 1928, 7, 535–563. [Google Scholar] [CrossRef]
  12. De Luca, A.; Termini, S. A definition of a nonprobabilistic entropy in the setting of fuzzy set theory. Inf. Control 1972, 29, 301–312. [Google Scholar] [CrossRef]
  13. Zadeh, L. Probability measures of fuzzy events. J. Math. Anal. Appl. 1968, 23, 421–427. [Google Scholar] [CrossRef]
  14. Pal, N.R.; Bezdek, J.C. Measuring fuzzy uncertainty. IEEE Trans. Fuzzy Syst. 1994, 2, 107–118. [Google Scholar] [CrossRef]
  15. Rényi, A. On measures of entropy and information. In Fourth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Oakland, CA, USA, 1961; pp. 547–561. [Google Scholar]
  16. Esteban, M.D.; Morales, D. A summary on entropy statistics. Kybernetika 1995, 31, 337–346. [Google Scholar]
  17. Cichocki, A.; Amari, S. Families of alpha- beta- and gamma-divergences: Flexible and robust measures of similarities. Entropy 2010, 12, 1532–1568. [Google Scholar] [CrossRef]
  18. Arndt, C. Information Measures; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
  19. Kesavan, H.K.; Kapur, J.N. The generalized maximum entropy principle. IEEE Trans. Syst. Man Cyber. 1989, 19, 1042–1052. [Google Scholar] [CrossRef]
  20. Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620–630. [Google Scholar] [CrossRef]
  21. Jaynes, E.T. Information theory and statistical mechanics. II. Phys. Rev. 1957, 108, 171–190. [Google Scholar] [CrossRef]
  22. Leik, R.K. A measure of ordinal consensus. Pac. Sociol. Rev. 1966, 9, 85–90. [Google Scholar] [CrossRef]
  23. Vogel, H.; Dobbener, R. Ein Streuungsmaß für komparative Merkmale. Jahrbücher für Nationalökonomie und Statistik 1982, 197, 145–157. (In German) [Google Scholar]
  24. Kvålseth, T.O. Nominal versus ordinal variation. Percept. Mot. Skills 1989, 69. [Google Scholar] [CrossRef]
  25. Berry, K.J.; Mielke, P.W. Assessment of variation in ordinal data. Percept. Motor Skills 1992, 74, 63–66. [Google Scholar] [CrossRef]
  26. Berry, K.J.; Mielke, P.W. Indices of ordinal variation. Percept. Motor Skills 1992, 74, 576–578. [Google Scholar] [CrossRef]
  27. Berry, K.J.; Mielke, P.W. A test of significance for the index of ordinal variation. Percept. Motor Skills 1994, 79, 291–1295. [Google Scholar]
  28. Blair, J.; Lacy, M.G. Measures of variation for ordinal data. Percept. Motor Skills 1996, 82, 411–418. [Google Scholar] [CrossRef]
  29. Blair, J.; Lacy, M.G. Statistics of ordinal variation. Sociol. Methods Res. 2000, 28, 251–280. [Google Scholar] [CrossRef]
  30. Gadrich, T.; Bashkansky, E.; Zitikas, R. Assessing variation: A unifying approach for all scales of measurement. Qual. Quant. 2015, 49, 1145–1167. [Google Scholar] [CrossRef]
  31. Allison, R.A.; Foster, J.E. Measuring health inequality using qualitative data. J. Health Econ. 2004, 23, 505–524. [Google Scholar] [CrossRef] [PubMed]
  32. Zheng, B. Measuring inequality with ordinal data: A note. Res. Econ. Inequal. 2008, 16, 177–188. [Google Scholar]
  33. Abul Naga, R.H.; Yalcin, T. Inequality measurement for ordered response health data. J. Health Econ. 2008, 27, 1614–1625. [Google Scholar] [CrossRef] [PubMed]
  34. Zheng, B. A new approach to measure socioeconomic inequality in health. J. Econ. Inequal. 2011, 9, 555–577. [Google Scholar] [CrossRef]
  35. Apouey, B.; Silber, J. Inequality and bi-polarization in socioeconomic status and health: Ordinal approaches. Res. Econ. Inequal. 2013, 21, 77–109. [Google Scholar]
  36. Klein, I. Rangordnungsstatistiken als Verteilungsmaßzahlen für ordinalskalierte Merkmale: I. Streuungsmessung. In Diskussionspapiere des Lehrstuhls für Statistik und Ökonometrie; Universität Erlangen-Nürnberg: Nürnberg, Germany, 1999; Volume 27. (In German) [Google Scholar]
  37. Yager, R.R. Dissonance—A measure of variability for ordinal random variables. Int. J. Uncertain. Fuzzin. Knowl. Based Syst. 2001, 9, 39–53. [Google Scholar]
  38. Bowden, R.J. Information, measure shifts and distribution metrics. Statistics 2012, 46, 249–262. [Google Scholar] [CrossRef]
  39. Dai, W. Maximum entropy principle for quadratic entropy of uncertain variables. Available online: http://orsc.edu.cn/online/100314.pdf (accessed on 27 June 2016).
  40. Dai, W.; Chen, X. Entropy of function of uncertain variables. Math. Comput. Model. 2012, 55, 754–760. [Google Scholar] [CrossRef]
  41. Chen, X.; Kar, S.; Ralescu, D.A. Cross-entropy measure of uncertain variables. Inf. Sci. 2012, 201, 53–60. [Google Scholar] [CrossRef]
  42. Yao, K.; Gao, J.; Dai, W. Sine entropy for uncertain variables. Int. J. Uncertain. Fuzzin. Knowl. Based Syst. 2013, 21, 743–753. [Google Scholar] [CrossRef]
  43. Yao, K.; Ke, H. Entropy operator for membership function of uncertain set. Appl. Math. Comput. 2014, 242, 898–906. [Google Scholar] [CrossRef]
  44. Ning, Y.; Ke, H.; Fu, Z. Triangular entropy of uncertain variables with application to portfolio selection. Soft Comput. 2015, 19, 2203–2209. [Google Scholar] [CrossRef]
  45. Ebrahimi, N. How to measure uncertainty in the residual lifetime distribution. Sankhya Ser. A 1996, 58, 48–56. [Google Scholar]
  46. Rao, M.; Chen, Y.; Vemuri, B.C.; Wang, F. Cumulative residual entropy: A new measure of information. IEEE Trans. Inf. Theory 2004, 50, 1220–1228. [Google Scholar] [CrossRef]
  47. Rao, M. More on a new concept of entropy and information. J. Theor. Probabil. 2005, 18, 967–981. [Google Scholar] [CrossRef]
  48. Schroeder, M.J. An alternative to entropy in the measurement of information. Entropy 2004, 6, 388–412. [Google Scholar] [CrossRef]
  49. Zografos, K.; Nadarajah, S. Survival exponential entropies. IEEE Trans. Inf. Theory 2005, 51, 1239–1246. [Google Scholar] [CrossRef]
  50. Drissi, N.; Chonavel, T.; Boucher, J.M. Generalized cumulative residual entropy distributions with unrestricted supports. Res. Lett. Signal Process. 2008, 2008. [Google Scholar] [CrossRef]
  51. Chen, X.; Dai, W. Maximum entropy principle for uncertain variables. Int. J. Fuzzy Syst. 2011, 13, 232–236. [Google Scholar]
  52. Sunoj, S.M.; Sankaran, P.G. Quantile based entropy function. Stat. Probabil. Lett. 2012, 82, 1049–1053. [Google Scholar] [CrossRef]
  53. Zardasht, V.; Parsi, S.; Mousazadeh, M. On empirical cumulative residual entropy and a goodness-of-fit test for exponentiality. Stat. Pap. 2015, 56, 677–688. [Google Scholar] [CrossRef]
  54. Navarro, J.; del Aguila, Y.; Asadi, M. Some new results on the cumulative residual entropy. J. Stat. Plan. Inference 2010, 140, 310–322. [Google Scholar] [CrossRef]
  55. Psarrakos, G.; Navarro, J. Generalized cumulative residual entropy and record values. Metrika 2013, 76, 623–640. [Google Scholar] [CrossRef]
  56. Kiesl, H. Ordinale Streuungsmaße; JOSEF-EUL-Verlag: Köln, Germany, 2003. (In German) [Google Scholar]
  57. Havrda, J.; Charvát, F. Quantification method of classification processes. Concept of structural a-entropy. Kybernetika 1967, 3, 30–35. [Google Scholar]
  58. Jumarie, G. Relative Information: Theories and Applications; Springer: Berlin/Heidelberg, Germany, 1990. [Google Scholar]
  59. Kapur, J.N. Measures of Information and their Applications; New Age International Publishers: New Delhi, India, 1994. [Google Scholar]
  60. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 1991. [Google Scholar]
  61. Kapur, J.N. Generalized Cauchy and Students distributions as maximum entropy distributions. Proc. Natl. Acad. Sci. India 1988, 58, 235–246. [Google Scholar]
  62. Bickel, P.J.; Lehmann, E.L. Descriptive statistics for nonparametric models: III. Dispersion. Ann. Stat. 1976, 5, 1139–1158. [Google Scholar] [CrossRef]
  63. Behnen, K.; Neuhaus, G. Rank Tests with Estimated Scores and their Applications; Teubner-Verlag: Stuttgart, Germany, 1989. [Google Scholar]
  64. Burger, H.U. Dispersion orderings with applications to nonparametric tests. Stat. Probabil. Lett. 1993, 16. [Google Scholar] [CrossRef]
  65. Bickel, P.J.; Lehmann, E.L. Descriptive statistics for nonparametric models: IV. Spread. In Contributions to Statistics; Jurečková, J., Ed.; Academic Press: New York, NY, USA, 1979; pp. 33–40. [Google Scholar]
  66. Pfanzagl, J. Asymptotic Expansions for General Statistical Models; Springer: New York, NY, USA, 1985. [Google Scholar]
  67. Beirlant, J.; Dudewicz, E.J.; Györfi, L.; van der Meulen, E.C. Nonparametric entropy estimation: An overview. Int. J. Math. Stat. Sci. 1997, 6, 17–39. [Google Scholar]
  68. Büning, H.; Trenkler, G. Nichtparametrische Statistische Methoden; de Gruyter: Berlin, Germany, 1994. [Google Scholar]
  69. Serfling, R.J. Approximation Theorems in Mathematical Statistics; John Wiley & Sons: New York, NY, USA, 1980. [Google Scholar]
  70. Huber, P.J. Robust Statistics; John Wiley & Sons: New York, NY, USA, 1981. [Google Scholar]
  71. Jurečková, J.; Sen, P.K. Robust Statistical Procedures: Asymptotics and Interrelations; John Wiley & Sons: New York, NY, USA, 1996. [Google Scholar]
  72. Parr, W.C.; Schucany, W.R. Jackknifing L-statistics with smooth weight functions. J. Am. Stat. Assoc. 1982, 77, 629–638. [Google Scholar] [CrossRef]
  73. Klein, I.; Mangold, B. Cumulative paired φ-entropies—Estimation and Robustness. Unpublished work. 2016. [Google Scholar]
  74. Klein, I.; Mangold, B. Cumulative paired φ-entropies and two-sample linear rank tests for scale alternatives. Unpublished work. 2016. [Google Scholar]
  75. Klein, I.; Mangold, B. φ-correlation and φ-regression. Unpublished work. 2016. [Google Scholar]
  76. Pardo, L. Statistical Inferences based on Divergence Measures; Chapman & Hall: Boca Raton, FL, USA, 2006. [Google Scholar]
  77. Anderson, T.W.; Darling, D.A. Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Ann. Math. Stat. 1952, 23, 193–212. [Google Scholar] [CrossRef]
  78. Berk, R.H.; Jones, D.H. Goodness-of-fit statistics that dominate the Kolmogorov statistics. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 1979, 47, 47–59. [Google Scholar] [CrossRef]
  79. Donoho, D.; Jin, J. Higher criticism for detecting sparse heterogeneous mixtures. Ann. Stat. 2004, 32, 962–994. [Google Scholar]
  80. Park, S.; Rao, M.; Shin, D.W. On cumulative residual Kullback–Leibler information. Stat. Probabil. Lett. 2012, 82, 2025–2032. [Google Scholar] [CrossRef]
  81. Di Crescenzo, A.; Longobardi, M. Some properties and applications of cumulative Kullback–Leibler information. Appl. Stoch. Models Bus. Ind. 2015, 31, 875–891. [Google Scholar] [CrossRef]
  82. Liese, F.; Vajda, I. Convex Statistical Distances; Teubner-Verlag: Leipzig, Germany, 1987. [Google Scholar]
  83. Csiszár, I. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kutató Int. Közl. 1963, 8, 85–108. (In German) [Google Scholar]
  84. Ali, S.M.; Silvey, S.D. A general class of coefficients of divergence of one distribution from another. J. R. Stat. Soc. Ser. B 1966, 28, 131–142. [Google Scholar]
  85. Cressie, N.; Read, T. Multinomial goodness-of-fit tests. J. R. Stat. Soc. Ser. B 1984, 46, 440–464. [Google Scholar]
  86. Jager, L.; Wellner, J.A. Goodness-of-fit tests via phi-divergences. Ann. Stat. 2007, 35, 2018–2053. [Google Scholar] [CrossRef]
  87. Parr, W.C.; Schucany, W.R. Minimum distance and robust estimation. J. Am. Stat. Assoc. 1980, 75, 616–624. [Google Scholar] [CrossRef]
  88. Nelsen, R.B. An Introduction to Copulas; Springer: New York, NY, USA, 1999. [Google Scholar]
  89. Hall, P.; Wolff, R.C.; Yao, Q. Methods for estimating a conditional distribution function. J. Am. Stat. Assoc. 1999, 94, 154–163. [Google Scholar] [CrossRef]
  90. Schechtman, E.; Yitzhaki, S. A measure of association based on Gini’s mean difference. Commun. Stat. Theory Methods 1987, 16, 207–231. [Google Scholar] [CrossRef]
  91. Schechtman, E.; Yitzhaki, S. On the proper bounds of the Gini correlation. Econ. Lett. 1999, 63, 133–138. [Google Scholar] [CrossRef]
  92. Yitzhaki, S. Gini’s mean difference: A superior measure of variability for non-normal distributions. Metron 2003, 61, 285–316. [Google Scholar]
  93. Olkin, I.; Yitzhaki, S. Gini regression analysis. Int. Stat. Rev. 1992, 60, 185–196. [Google Scholar] [CrossRef]
  94. Hettmansperger, T.P. Statistical Inference Based on Ranks; John Wiley & Sons: New York, NY, USA, 1984. [Google Scholar]
  95. Jaeckel, L.A. Estimating regression coefficients by minimizing the dispersion of residuals. Ann. Math. Stat. 1972, 43, 1449–1458. [Google Scholar] [CrossRef]
  96. Jurečková, J. Nonparametric estimate of regression coefficients. Ann. Math. Stat. 1971, 42, 1328–1338. [Google Scholar] [CrossRef]
  97. Kloke, J.D.; McKean, J.W. Rfit: Rank-based estimation for linear models. R J. 2012, 4, 57–64. [Google Scholar]
  98. McKean, J.W.; Kloke, J.D. Efficient and adaptive rank-based fits for linear models with skew-normal errors. J. Stat. Distrib. Appl. 2014, 1. [Google Scholar] [CrossRef]
  99. Hettmansperger, T.P.; McKean, J.W. Robust Nonparametric Statistical Methods; Chapman & Hall: New York, NY, USA, 2011. [Google Scholar]
  100. Koul, H.L.; Sievers, G.L.; McKean, J. An estimator of the scale parameter for the rank analysis of linear models under general score functions. Scand. J. Stat. 1987, 14, 131–141. [Google Scholar]
  101. Ansari, A.R.; Bradley, R.A. Rank-sum tests for dispersion. Ann. Math. Stat. 1960, 31, 142–149. [Google Scholar] [CrossRef]
  102. Hájek, J.; Šidák, Z.; Sen, P.K. Theory of Rank Tests; Academic Press: San Diego, CA, USA, 1999. [Google Scholar]
  103. Mood, A.M. On the asymptotic efficiency of certain nonparametric two-sample tests. Ann. Math. Stat. 1954, 25, 514–522. [Google Scholar] [CrossRef]
  104. Klotz, J. Nonparametric tests for scale. Ann. Math. Stat. 1961, 33, 498–512. [Google Scholar] [CrossRef]
  105. Basu, A.P.; Woodworth, G. A note on nonparametric tests for scale. Ann. Math. Stat. 1967, 38, 274–277. [Google Scholar] [CrossRef]
  106. Shiraishi, T.A. The asymptotic power of rank tests under scale-alternatives including contaminated distributions. Ann. Math. Stat. 1986, 38, 513–522. [Google Scholar] [CrossRef]
  107. Sukhatme, B.V. On certain two-sample nonparametric tests for variances. Ann. Math. Stat. 1957, 28, 188–194. [Google Scholar] [CrossRef]
