Cumulative Paired φ-Entropy

Abstract: A new kind of entropy will be introduced which generalizes both the differential entropy and the cumulative (residual) entropy. The generalization is twofold. First, we simultaneously define the entropy for cumulative distribution functions (cdfs) and survivor functions (sfs), instead of defining it separately for densities, cdfs, or sfs. Secondly, we consider a general “entropy generating function” φ, the same way Burbea et al. (IEEE Trans. Inf. Theory 1982, 28, 489–495) and Liese et al. (Convex Statistical Distances; Teubner-Verlag, 1987) did in the context of φ-divergences. Combining the ideas of φ-entropy and cumulative entropy leads to the new “cumulative paired φ-entropy” (CPEφ). This new entropy has already been discussed in at least four scientific disciplines, be it with certain modifications or simplifications. In the fuzzy set theory, for example, cumulative paired φ-entropies were defined for membership functions, whereas in uncertainty and reliability theories some variations of CPEφ were recently considered as measures of information. With a single exception, the discussions in the scientific disciplines appear to be held independently of each other. We consider CPEφ for continuous cdfs and show that CPEφ is rather a measure of dispersion than a measure of information. In the first place, this will be demonstrated by deriving an upper bound which is determined by the standard deviation and by solving the maximum entropy problem under the restriction of a fixed variance. Next, this paper specifically shows that CPEφ satisfies the axioms of a dispersion measure. The corresponding dispersion functional can easily be estimated by an L-estimator, containing all its known asymptotic properties. CPEφ is the basis for several related concepts like mutual φ-information, φ-correlation, and φ-regression, which generalize Gini correlation and Gini regression. In addition, linear rank tests for scale that are based on the new entropy have been developed. 
We show that almost all known linear rank tests are special cases, and we introduce certain new tests. Moreover, formulas for different distributions and entropy calculations are presented for CPEφ if the cdf is available in a closed form.


Introduction
The φ-entropy

$$\int_{\mathbb{R}} \varphi(f(x))\,dx, \tag{1}$$

where f is a probability density function and φ is a strictly concave function, was introduced by [1].
If we set φ(u) = −u ln u, u ∈ [0, 1], we get Shannon's differential entropy as the most prominent special case. Shannon et al. [2] derived the "entropy power fraction" and showed that there is a close relationship between Shannon entropy and variance. In [3], it was demonstrated that Shannon's differential entropy satisfies an ordering of scale and thus is a proper measure of scale (MOS). Recently, the discussion in [4] has shown that entropies can be interpreted as a measure of dispersion. In the discrete case, minimal Shannon entropy means maximal certainty about the random outcome of an experiment. A degenerate distribution minimizes the Shannon entropy as well as the variance of a discrete quantitative random variable. For such a degenerate distribution, Shannon entropy and variance both take the value 0. However, there is an important difference between the differential entropy and the variance when discussing a quantitative random variable with support [a, b]. The differential entropy is maximized by a uniform distribution over [a, b], while the variance is maximal if both interval bounds a and b have the probability mass of 0.5 (cf. [5]). A similar result holds for a discrete random variable with a finite number of realizations. Therefore, it is doubtful that Equation (1) is a true measure of dispersion.
We propose to define the φ-entropy for cumulative distribution functions (cdfs) F and survivor functions (sfs) 1 − F instead of for density functions f. Throughout the paper, we define $\bar F := 1 - F$. By applying this modification we get

$$\mathrm{CPE}_\varphi(F) = \int_{\mathbb{R}} \left( \varphi(F(x)) + \varphi(\bar F(x)) \right) dx, \tag{2}$$

where the cdf F is absolutely continuous, CPE means "cumulative paired entropy", and φ is the "entropy generating function" defined on [0, 1] with φ(0) = φ(1) = 0. We will assume that φ is concave on [0, 1] throughout most of this paper. In particular, we will show that Equation (2) satisfies a popular ordering of scale and attains its maximum if the domain is an interval [a, b] and the endpoints a and b each occur with a probability of 1/2. This means that Equation (2) behaves like a proper measure of dispersion.
In addition, we generalize results from the literature, focusing on the Shannon case with φ(u) = −u ln u, u ∈ [0, 1] (cf. [6]), the cumulative residual entropy (cf. [7])

$$\mathrm{CRE}(F) = -\int_{\mathbb{R}^+} \bar F(x) \ln \bar F(x)\,dx, \tag{3}$$

and the cumulative entropy (cf. [8,9])

$$\mathrm{CE}(F) = -\int_{\mathbb{R}} F(x) \ln F(x)\,dx. \tag{4}$$

In the literature, this entropy is interpreted as a measure of information rather than dispersion, without any clarification on what kind of information is considered.
A first general aim of this paper is to show that entropies can rather be interpreted as measures of dispersion than as measures of information. A second general aim is to demonstrate that the entropy generating function φ, the weight function J in L-estimation, the dispersion function d which serves as a criterion for minimization in robust rank regression, and the scores-generating function $\phi_1$ are closely related.

The paper pursues several specific aims and is structured in the same order as these aims. After this introduction, in the second section we give a short review of the literature that is concerned with Equation (2) or related measures. The third section begins by summarizing reasons for defining entropies for cdfs and sfs instead of defining them for densities. Next, some equivalent characterizations of Equation (2) are given, provided the derivative of φ exists. In the fourth section, we use the Cauchy-Schwarz inequality to derive an upper bound for Equation (2), which provides sufficient conditions for the existence of CPE. In addition, more stringent conditions for the existence are directly proven. In the fifth section, the Cauchy-Schwarz inequality allows us to derive maximum entropy (ME) distributions if the variance is fixed. For more complicated restrictions we obtain ME distributions by solving the Euler-Lagrange conditions. Following the generalized ME principle (cf. [10]), we change the perspective and ask which entropy is maximized if the variance and the population's distribution are fixed. The sixth section is of key importance because the properties of Equation (2) as a measure of dispersion are analyzed in detail. We show that Equation (2) satisfies an often applied ordering of scale by [3], is invariant with respect to translations, and is equivariant with respect to scale transformations. Additionally, we provide certain results concerning the sum of independent random variables. In the seventh section, we propose an L-estimator for CPEφ. Some basic properties of this estimator like influence function, consistency, and asymptotic normality are shown. In the eighth section, we introduce several new statistical concepts based on CPEφ, which generalize divergence, mutual information, Gini correlation, and Gini regression. Additionally, we show that new linear rank tests for dispersion can be based on CPEφ. Known linear rank tests like the Mood or the Ansari-Bradley tests are special cases of this general approach. However, in this paper we exclude most of the technical details, for they will be presented in several accompanying papers. In the last section we compute Equation (2) for certain generating functions φ and some selected families of distributions.

State of the Art: An Overview
Entropies are usually defined on the simplex of probability vectors, which sum up to one (cf. [2,11]). Until now it has been rather unusual to calculate the Shannon entropy not for vectors of probabilities or probability density functions f, but for distribution functions F. The corresponding Shannon entropy is given by

$$\mathrm{CPE}_S(F) = -\int_{\mathbb{R}} \left( F(x) \ln F(x) + \bar F(x) \ln \bar F(x) \right) dx, \tag{5}$$

where $\bar F = 1 - F$. Nevertheless, we have identified five scientific disciplines directly or implicitly working with an entropy based on distribution functions or survivor functions:
1. Fuzzy set theory,
2. Generalized ME principle,
3. Theory of dispersion of ordered categorial variables,
4. Uncertainty theory,
5. Reliability theory.

Fuzzy Set Theory
To the best of our knowledge, Equation (5) was initially introduced by [12]. However, they did not consider the entropy for a cdf F. Instead, they were concerned with a so-called membership function $\mu_A$ that quantifies the degree to which a certain element x of a set Ω belongs to a subset A ⊆ Ω. Membership functions were introduced by [13] within the framework of the "fuzzy set theory".
It is important to note that if all elements of Ω are mapped to the value 1/2, maximum uncertainty about x belonging to a set A will be attained.
This main property is one of the axioms of membership functions. In the aftermath of [12], numerous modifications to the term "entropy" have been made and axiomatizations of the membership functions have been stated (see, e.g., the overview in [14]).
Finally, those modifications proceeded parallel to a long history of extensions and parametrizations of the term entropy for probability vectors and densities. This history ranges from [15] to [16,17], who provided a superstructure of those generalizations consisting of a very general form of the entropy, including the φ-entropy Equation (1) as a special case. Burbea et al. [1] introduced the term φ-entropy. If both φ(x) and φ(1 − x) appeared in the entropy, as in the Fermi-Dirac entropy (cf. [18], p. 191), they used the term "paired" φ-entropy.

Generalized Maximum Entropy Principle
Regardless of the debate in the fuzzy set theory and the theory of measurement of dispersion, Kapur [10] showed that a growth model with a logistic growth rate arises as the solution of maximizing Equation (5) under two simple constraints. This provides an example for the "generalized maximum entropy principle" postulated by Kesavan et al. [19]. The simple ME principle introduced by [20,21] derives a distribution which maximizes an entropy given certain constraints. In contrast, the generalization of [19] consists of determining the φ-entropy which is maximized given a distribution and some constraints. Finally, they used a slightly modified version of Equation (5): the cdf had to be replaced by a monotonically increasing function with logistic shape.

Theory of Dispersion
Irrespective of the discussion on membership functions in the fuzzy set theory and the proposals of generalizing the Shannon entropy, Leik [22] discussed a measure of dispersion for ordered categorial variables with a finite number k of categories $x_1 < x_2 < \dots < x_k$. His measure is based on the distance between the (k − 1)-dimensional vectors of cumulated frequencies $(F_1, F_2, \dots, F_{k-1})$ and (1/2, 1/2, ..., 1/2). Both vectors only coincide if the extreme categories $x_1$ and $x_k$ appear with the same frequency. This represents the case of maximal dispersion. Consider

$$\mathrm{CPE}_\varphi(F) = \sum_{i=1}^{k-1} \left( \varphi(F_i) + \varphi(1 - F_i) \right) \tag{6}$$

as discrete version of Equation (2). Setting φ(u) = min{u, 1 − u}, we get the measure of Leik as a special case of Equation (6) up to a change of sign. Vogel et al. [23] considered φ(u) = −u ln(u) and the Shannon variation of Equation (6) as measure of dispersion for ordered categorial variables. Numerous modifications of Leik's measure of dispersion have been published. In [24][25][26][27][28][29], the authors implicitly used φ(u) = 1/4 − (u − 1/2)² or equivalently φ(u) = u(1 − u). Most of the discussion was conducted in the journal "Perceptual and Motor Skills". For a recent overview of measuring dispersion including ordered categorial variables see, e.g., [30]. Instead of dispersion, some articles are concerned with related concepts for ordered categorial variables, like bipolarization and inequality (cf. [31][32][33][34][35]). A class of measures of dispersion for ordered categorial variables with a finite number of categories that is similar to Equation (6) has been introduced by Klein and Yager [36,37] independently of each other. They had obviously not been aware of the discussion in "Perceptual and Motor Skills". Both authors gave axiomatizations to describe which functions φ will be appropriate for measuring dispersion. However, at least Yager [37] recognized the close relationship between those measures and the general term "entropy" in the fuzzy set theory. He introduced the term "dissonance" to more precisely characterize measures of dispersion for ordered categorial variables. In the language of information theory, maximum dissonance describes an extreme case in which there is still some information, but this information is extremely contradictory. As an example, in the field of product evaluation we could ask how useful it is for a purchase decision to know that 50 percent of the recommendations are extremely good and, at the same time, 50 percent are extremely bad. This is an important difference to the Shannon entropy, which is maximal if there is no information at all, i.e., all categories occur with the same probability.
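As an illustration of the discrete measure discussed above, the following Python sketch sums φ(F_i) + φ(1 − F_i) over the first k − 1 cumulated relative frequencies. The function names and the three frequency vectors are hypothetical and serve only as an example; the sign convention of Leik's original measure is not reproduced here.

```python
from itertools import accumulate


def phi_leik(u):
    """Leik-type generating function min{u, 1 - u}."""
    return min(u, 1.0 - u)


def phi_gini(u):
    """Gini-type generating function u(1 - u)."""
    return u * (1.0 - u)


def discrete_cpe(freqs, phi):
    """Sum phi(F_i) + phi(1 - F_i) over the cumulated frequencies F_1..F_{k-1}."""
    total = float(sum(freqs))
    cumulative = list(accumulate(f / total for f in freqs))[:-1]
    return sum(phi(F) + phi(1.0 - F) for F in cumulative)


# Hypothetical rating counts on five ordered categories.
examples = {
    "polarized": [50, 0, 0, 0, 50],   # mass only on the extreme categories
    "uniform": [20, 20, 20, 20, 20],  # all categories equally frequent
    "consensus": [0, 0, 100, 0, 0],   # degenerate distribution
}
for name, freqs in examples.items():
    print(name, discrete_cpe(freqs, phi_leik), discrete_cpe(freqs, phi_gini))
```

In this sketch the polarized vector yields the largest value and the degenerate vector yields zero, whereas a Shannon entropy computed on the category probabilities would be largest for the uniform vector.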
Bowden [38] defines the location entropy function $h(x) = -\left( F(x) \ln F(x) + \bar F(x) \ln \bar F(x) \right)$ with $\bar F = 1 - F$, given a value of x. He emphasizes the possibility to construct measures of spread and symmetry based on this function. To the best of our knowledge, Bowden [38] is the only one to mention the application of cumulated paired Shannon entropy to continuous distributions so far.

Uncertainty Theory
Liu [6] (first edition 2004) can be considered the founder of uncertainty theory. This theory is concerned with formalizing data consisting of expert opinions rather than formalizing data gathered by repeating a random experiment. Liu slightly modified the Kolmogoroff axioms of probability theory to obtain an uncertainty measure, following which he defined uncertain variables, uncertainty distribution functions, and moments of uncertain variables. Liu argued that "an event is the most uncertain if its uncertainty measure is 0.5, because the event and its complement may be regarded as 'equally likely'" ([6], p. 14). Liu's maximum uncertainty principle states: "For any event, if there are multiple reasonable values that an uncertain measure may take, the value as close to 0.5 as possible is assigned to the event" ([6], p. 14). Similar to the fuzzy set theory, the distance between the uncertainty distribution and the value 0.5 can be measured by the Shannon-type entropy Equation (5). Apparently for the first time in the third edition of 2010, he explicitly calculated Equation (5) for several distributions (e.g., the logistic distribution) and derived upper bounds. He applied the ME principle to uncertainty distributions. The preferred constraint is to predetermine values of mean and variance ([6], p. 83ff.). In this case, the logistic distribution maximizes Equation (5). In this context, the logistic distribution plays the same role in uncertainty theory as the Gaussian distribution in probability theory. The Gaussian distribution maximizes the differential entropy, given values for mean and variance. Therefore, in uncertainty theory the logistic distribution is called "normal distribution". The authors of [39] provided Equation (5) as a function of the quantile function. In addition to that, the authors of [40] chose φ(u) = u(1 − u), u ∈ [0, 1], as entropy generating function and derived the ME distribution as a discrete uniform distribution, which is concentrated on the endpoints of the compact domain [a, b] if no further restrictions are assumed. Popoviciu [5] attained the same distribution by maximizing the variance. Chen et al. [41] introduced cross entropies and divergence measures based on general functions φ. Further literature on this topic is provided by [42][43][44].

Reliability Theory
Entropies also play a prominent role in reliability theory. They were initially introduced in the fields of hazard rates and residual lifetime distributions (cf. [45]). In addition, the authors of [46,47] introduced the cumulative residual entropy Equation (3), discussed its properties, and derived the exponential and the Weibull distribution by an ME principle, given the coefficient of variation. This work went into detail on the advantage of defining entropy via survivor functions instead of probability density functions. Rao et al. [46] refer to the extensive criticism on the differential entropy by [48]. Moreover, Zografos et al. [49] generalized the Shannon-type cumulative residual entropy to an entropy of the Rényi type. Furthermore, Drissi et al. [50] considered random variables with general support. They also presented solutions for the maximization of Equation (3), provided that more general restrictions are considered. Similar to [51], they identified the logistic distribution to be the ME distribution, given mean, variance, and a symmetric form of the distribution function.
Di Crescenzo et al. [9] analyzed Equation (4) for cdfs and discussed its stochastic properties. Sunoj et al. [52] plugged the quantile function into the Shannon-type entropy Equation (4) and presented expressions for the case in which the quantile function possesses a closed form, but the cdf does not. In recent papers an empirical version of Equation (3) is used as a goodness-of-fit test (cf. [53]).
Additionally, CRE and CE are applied to the distribution function of the residual lifetime (X − t | X > t) and the inactivity time (t − X | X < t) (cf. [54]). This can directly be generalized to the CPE framework. Moreover, Psarrakos et al. [55] provide an interesting alternative generalization of the Shannon case. In this paper we focus on the class of concave functions φ. Special extensions to non-concave functions will be subject to future research.
This brief overview shows that different disciplines are accessing an entropy based on distribution functions. The contributions of the fuzzy set theory, the uncertainty theory, and the reliability theory all have the exclusive consideration of continuous random variables in common. The discussions about entropy in reliability theory on the one hand, and in fuzzy set theory and uncertainty theory on the other hand, were conducted independently of each other without even noticing the results of the other disciplines. However, Liu's uncertainty theory benefits from the discussion in the fuzzy set theory. In the theory of dispersion of ordered categorial variables the authors do not appear to be aware of their implicit use of a concept of entropy. Nevertheless, the situation is somewhat different to that of the other areas since only discrete variables were discussed. Kiesl's dissertation [56] provides a theory of measures of the form Equation (6) with numerous applications. However, an intensive discussion of Equation (2) is missing and will be provided here.

Definition
We focus on absolutely continuous cdfs F with density functions f. The set of all those distribution functions is called $\mathcal{F}$. We call a function φ an "entropy generating function" if it is non-negative and concave on the domain [0, 1] with φ(0) = φ(1) = 0. In this case, φ(u) + φ(1 − u) is a symmetric function with respect to 1/2.

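Definition 1. The functional

$$\mathrm{CPE}_\varphi(F) = \int_{\mathbb{R}} \left( \varphi(F(x)) + \varphi(\bar F(x)) \right) dx, \qquad \bar F = 1 - F, \tag{7}$$

is called cumulative paired φ-entropy for $F \in \mathcal{F}$ with entropy generating function φ.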
Up to now, we assumed the existence of CPE φ .In the following section we will discuss some sufficient criteria ensuring the existence of CPE φ .If X is a random variable with cdf F, we occasionally use the notation CPE φ (X) instead.
Next, some examples of well established concave entropy generating functions φ and corresponding cumulative paired φ-entropies will be given.

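1. Cumulative paired α-entropy $\mathrm{CPE}_\alpha$: The generating function is $\varphi(u) = u\,(u^{\alpha-1} - 1)/(1 - \alpha)$, u ∈ [0, 1], for α > 0, α ≠ 1.
2. Cumulative paired Gini entropy $\mathrm{CPE}_G$: Setting α = 2 gives φ(u) = u(1 − u), u ∈ [0, 1].
3. Cumulative paired Shannon entropy $\mathrm{CPE}_S$: The function φ(u) = −u ln u, u ∈ [0, 1], gives the entropy which was already mentioned in the introduction. It is a special case of $\mathrm{CPE}_\alpha$ for α → 1.
4. Cumulative paired Leik entropy $\mathrm{CPE}_L$: The function φ(u) = min{u, 1 − u}, u ∈ [0, 1], represents the limiting case of a linear concave function φ. The measure of dispersion proposed by [22] implicitly makes use of this φ, such that we call $\mathrm{CPE}_L$ the cumulative paired Leik entropy.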
Figure 1 gives an impression of the previously mentioned generating functions φ .
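The generating functions above can be tried out numerically. The following Python sketch evaluates the integral of φ(F(x)) + φ(1 − F(x)) with a plain midpoint rule for the standard logistic cdf. The α-form `(u**alpha - u) / (1 - alpha)` is an assumption chosen so that α = 2 gives u(1 − u) and α → 1 gives −u ln u; the integration range, the step count, and all function names are likewise illustrative choices, not part of the paper.

```python
import math


def phi_shannon(u):
    """-u ln u, continuously extended by phi(0) = 0."""
    return -u * math.log(u) if u > 0.0 else 0.0


def phi_alpha(alpha):
    """Assumed alpha-type generating function (u**alpha - u) / (1 - alpha)."""
    return lambda u: (u ** alpha - u) / (1.0 - alpha)


def phi_leik(u):
    """min{u, 1 - u}."""
    return min(u, 1.0 - u)


def cpe(cdf, phi, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of phi(F) + phi(1 - F) on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        F = cdf(lo + (i + 0.5) * h)
        total += phi(F) + phi(1.0 - F)
    return total * h


def logistic_cdf(x):
    """Standard logistic cdf."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    for name, phi in [("Shannon", phi_shannon),
                      ("Gini (alpha = 2)", phi_alpha(2.0)),
                      ("Leik", phi_leik)]:
        # The tails beyond +-40 contribute negligibly for the logistic cdf.
        print(name, cpe(logistic_cdf, phi, -40.0, 40.0))
```

Replacing `logistic_cdf` by another absolutely continuous cdf only requires an integration range that covers the region where F differs noticeably from 0 and 1.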

Advantages of Entropies Based on Cdfs
The authors of [46,47] list several reasons for defining an entropy for distribution functions rather than for density functions. The starting point is the well-known critique of Shannon's differential entropy $-\int f(x) \ln f(x)\,dx$ that was expressed by several authors like [48,58] and [59] (p. 58f).
Transferred to cumulative paired entropies, the advantages of entropies based on distribution functions (cf. [46]) are as follows:
1. CPEφ is based on probabilities and has a consistent definition for both discrete and continuous random variables.
2. CPEφ is always non-negative.
3. CPEφ can easily be estimated by the empirical distribution function. This estimation is strongly consistent, due to the strong consistency of the empirical distribution function.
Problems of the differential entropy are occasionally discussed in the case of grouped data, for which the usual Shannon entropy is calculated from the group probabilities. With an increasing number of groups, the Shannon entropy not only fails to converge to the respective differential entropy, but even diverges (cf., e.g., [59] (p. 54) and [60] (p. 239)). In the next section we will show that the discrete version of CPEφ converges to CPEφ as the number of groups approaches infinity.

CPE φ for Grouped Data
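First, we introduce the notation for grouped data. The interval $[x_0, x_k]$ is divided into k subintervals with limits $x_0 < x_1 < \dots < x_{k-1} < x_k$. The width of each group is denoted by $\Delta x_i = x_i - x_{i-1}$ for i = 1, 2, ..., k. Let X be a random variable with absolutely continuous distribution function F, which is only known at the limits of each group. The probabilities of each group are denoted by $p_i = F(x_i) - F(x_{i-1})$, i = 1, 2, ..., k. X* is the random variable whose distribution function F* is yielded by linear interpolation of the values of F at the limits of successive groups. Finally, X* is the result of adding an independent, uniformly distributed random variable to X. It holds that

$$F^*(x) = F(x_{i-1}) + \frac{p_i}{\Delta x_i}\,(x - x_{i-1}) \quad \text{for } x \in (x_{i-1}, x_i],\ i = 1, 2, \dots, k, \tag{12}$$

with $F^*(x) = 0$ for $x \le x_0$ and $F^*(x) = 1$ for $x > x_k$. Let X* denote the respective random variable of F*. The probability density function of X* is $f^*(x) = p_i/\Delta x_i$ for $x \in (x_{i-1}, x_i]$.

Lemma 1. Let φ be an entropy generating function with antiderivative $S_\varphi$. The paired cumulative φ-entropy of the distribution function in Equation (12) is given as follows:

$$\mathrm{CPE}_\varphi(X^*) = \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ S_\varphi(F(x_i)) - S_\varphi(F(x_{i-1})) \right] + \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ S_\varphi(\bar F(x_{i-1})) - S_\varphi(\bar F(x_i)) \right]. \tag{13}$$

Proof. For $x \in (x_{i-1}, x_i]$, we have $F^*(x) = F(x_{i-1}) + (p_i/\Delta x_i)(x - x_{i-1})$. Substituting $u = F^*(x)$ with $du = (p_i/\Delta x_i)\,dx$ gives

$$\int_{x_{i-1}}^{x_i} \varphi(F^*(x))\,dx = \frac{\Delta x_i}{p_i} \int_{F(x_{i-1})}^{F(x_i)} \varphi(u)\,du = \frac{\Delta x_i}{p_i} \left[ S_\varphi(F(x_i)) - S_\varphi(F(x_{i-1})) \right].$$

The second sum follows in the same way from $\bar F^* = 1 - F^*$.

Considering this result, we can easily prove the convergence property for $\mathrm{CPE}_\varphi(X^*)$:

Theorem 1. Let φ be a generating function with antiderivative $S_\varphi$ and let F be a continuous distribution function of the random variable X with support [a, b]. X* is the corresponding random variable for grouped data with Δx = (b − a)/k, k > 0. Then the following holds:

$$\mathrm{CPE}_\varphi(X^*) \to \mathrm{CPE}_\varphi(X) \quad \text{for } k \to \infty.$$

Proof. Consider equidistant classes with $\Delta x_i = \Delta x = (b - a)/k$, i = 1, 2, ..., k. Subsequently, Equation (13) results in

$$\mathrm{CPE}_\varphi(X^*) = \sum_{i=1}^{k} \frac{S_\varphi(F(x_i)) - S_\varphi(F(x_{i-1}))}{F(x_i) - F(x_{i-1})}\,\Delta x + \sum_{i=1}^{k} \frac{S_\varphi(\bar F(x_{i-1})) - S_\varphi(\bar F(x_i))}{\bar F(x_{i-1}) - \bar F(x_i)}\,\Delta x. \tag{14}$$

With k → ∞ we have Δx → 0 such that for F continuous we get $F(x_i) - F(x_{i-1}) \to 0$. The antiderivative $S_\varphi$ has the derivative φ almost everywhere such that with k → ∞ the first term of Equation (14) converges to $\int_a^b \varphi(F(x))\,dx$. An analogous argument holds for the second term of Equation (14).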
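The convergence statement can be checked numerically. The Python sketch below evaluates the grouped-data expression built from the antiderivative of φ for equidistant groups and an increasing number of groups. The cdf F(x) = x² on [0, 1], the Gini-type generating function u(1 − u) with antiderivative u²/2 − u³/3, and the function names are assumptions made only for this example; the code further assumes that every group has positive probability.

```python
def cdf(x):
    """Example cdf F(x) = x**2 on [0, 1] (an illustrative assumption)."""
    return x * x


def S_gini(u):
    """Antiderivative of the Gini-type generating function u(1 - u)."""
    return u * u / 2.0 - u ** 3 / 3.0


def grouped_cpe(cdf, S, a, b, k):
    """CPE of the linearly interpolated cdf with k equidistant groups:
    sum over groups of dx / p_i * [S(F_i) - S(F_{i-1}) + S(1 - F_{i-1}) - S(1 - F_i)].
    Assumes p_i > 0 for every group."""
    dx = (b - a) / k
    total = 0.0
    for i in range(1, k + 1):
        F0 = cdf(a + (i - 1) * dx)
        F1 = cdf(a + i * dx)
        p = F1 - F0
        total += dx / p * (S(F1) - S(F0) + S(1.0 - F0) - S(1.0 - F1))
    return total


if __name__ == "__main__":
    # Direct integration of 2 * x**2 * (1 - x**2) over [0, 1] gives 4/15.
    for k in (1, 2, 10, 100, 1000):
        print(k, grouped_cpe(cdf, S_gini, 0.0, 1.0, k), 4.0 / 15.0)
```

With a single group the interpolated cdf is that of a uniform distribution on [0, 1]; as k grows, the values approach the integral computed from F itself.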
In addition to this theoretical result, we get useful expressions for CPEφ for grouped data and a specific choice of φ, as Corollary 1 shows.

Proof. Using the antiderivatives of the respective generating functions in Lemma 1, the results follow immediately.

Alternative Representations of CPE φ
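In case φ(0) = φ(1) = 0 holds and φ is differentiable, one can provide several alternative representations of CPEφ in addition to Equation (7). These alternative representations will be useful in the following to find conditions ensuring the existence of CPEφ and to find some simple estimators.

Proposition 1. Let φ be a non-negative and differentiable function on the domain [0, 1] with derivative φ′ and φ(0) = φ(1) = 0. In this case, for $F \in \mathcal{F}$ with quantile function $F^{-1}(u)$, density function f, and quantile density function $q(u) = 1/f(F^{-1}(u))$, for u ∈ [0, 1], the following holds:

$$\mathrm{CPE}_\varphi(F) = \int_0^1 \left( \varphi(u) + \varphi(1 - u) \right) q(u)\,du,$$

$$\mathrm{CPE}_\varphi(F) = \int_0^1 \left( \varphi'(1 - u) - \varphi'(u) \right) F^{-1}(u)\,du,$$

$$\mathrm{CPE}_\varphi(F) = \int_{\mathbb{R}} x \left( \varphi'(\bar F(x)) - \varphi'(F(x)) \right) f(x)\,dx.$$

Proof. Apply the probability integral transform U = F(X) and partial integration.

This property supports the understanding of CPEφ being a covariance for which the Cauchy-Schwarz inequality gives an upper bound:

Corollary 2. Let φ be a non-negative and differentiable function on the domain [0, 1] with derivative φ′ and φ(0) = φ(1) = 0. Then, with U = F(X) uniformly distributed on [0, 1],

$$\mathrm{CPE}_\varphi(F) = \mathrm{Cov}\left( F^{-1}(U),\ \varphi'(1 - U) - \varphi'(U) \right), \tag{18}$$

$$\mathrm{CPE}_\varphi(F) = \mathrm{Cov}\left( X,\ \varphi'(\bar F(X)) - \varphi'(F(X)) \right), \tag{19}$$

because φ(0) = φ(1) = 0 implies $E[\varphi'(1 - U) - \varphi'(U)] = 0$.

Depending on the context, we switch between these alternative representations of CPEφ.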
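To see that the cdf-based definition and the quantile-based representations describe the same quantity, the following Python sketch compares three integrals numerically. It assumes a standard exponential distribution and the Gini-type generating function φ(u) = u(1 − u) with φ′(u) = 1 − 2u; these choices, the midpoint rule, and all names are illustrative only.

```python
import math


def midpoint(g, lo, hi, n=200_000):
    """Plain midpoint rule; the nodes never touch the interval endpoints."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def phi(u):
    return u * (1.0 - u)          # Gini-type generating function


def dphi(u):
    return 1.0 - 2.0 * u          # its derivative


def cdf(x):
    return -math.expm1(-x)        # F(x) = 1 - exp(-x)


def quantile(u):
    return -math.log1p(-u)        # F^{-1}(u) = -ln(1 - u)


def quantile_density(u):
    return 1.0 / (1.0 - u)        # q(u) = 1 / f(F^{-1}(u))


def representations():
    """Return the cdf form, the quantile-density form, and the quantile form."""
    via_cdf = midpoint(lambda x: phi(cdf(x)) + phi(1.0 - cdf(x)), 0.0, 50.0)
    via_qdens = midpoint(lambda u: (phi(u) + phi(1.0 - u)) * quantile_density(u), 0.0, 1.0)
    via_quant = midpoint(lambda u: (dphi(1.0 - u) - dphi(u)) * quantile(u), 0.0, 1.0)
    return via_cdf, via_qdens, via_quant


if __name__ == "__main__":
    print(representations())
```

The quantile forms are convenient when the quantile function has a closed form but the cdf does not; the logarithmic singularity at u = 1 is integrable, so the midpoint rule still converges, only more slowly.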

Deriving an Upper Bound for CPE φ
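The Cauchy-Schwarz inequality for Equations (18) and (19), respectively, provides an upper bound for CPEφ if the variance exists. The existence of the upper bound simultaneously ensures the existence of CPEφ.

Proposition 2. Let φ be a non-negative and differentiable function on the domain [0, 1] with derivative φ′ and φ(0) = φ(1) = 0. If

$$E\left[ \left( \varphi'(1 - U) - \varphi'(U) \right)^2 \right] < \infty \tag{20}$$

holds, then for X ∼ F with Var(X) < ∞ and quantile function $F^{-1}$, we have

$$\mathrm{CPE}_\varphi(F) \le \sqrt{ E\left[ \left( \varphi'(1 - U) - \varphi'(U) \right)^2 \right] \mathrm{Var}(X) }.$$

Proof. The statement follows from applying the Cauchy-Schwarz inequality to the covariance representation $\mathrm{CPE}_\varphi(F) = \mathrm{Cov}(F^{-1}(U), \varphi'(1 - U) - \varphi'(U))$, where $\varphi'(1 - U) - \varphi'(U)$ has mean zero and $\mathrm{Var}(F^{-1}(U)) = \mathrm{Var}(X)$.

Next, we consider the upper bound for the cumulative paired α-entropy:

Corollary 3. Let X be a random variable having a finite variance σ². Then

$$\mathrm{CPE}_S(X) \le \frac{\pi \sigma}{\sqrt{3}} \quad \text{for } \alpha = 1$$

and

$$\mathrm{CPE}_\alpha(X) \le \sigma\, \frac{\alpha}{|1 - \alpha|} \sqrt{ 2 \left( \frac{1}{2\alpha - 1} - B(\alpha, \alpha) \right) } \quad \text{for } \alpha > 1/2,\ \alpha \ne 1,$$

where B denotes the beta function.

In the framework of uncertainty theory, the upper bound for the paired cumulative Shannon entropy was derived by [51] (see also [6], p. 83). For α = 2 we get the upper bound for the paired cumulative Gini entropy

$$\mathrm{CPE}_G(X) \le \frac{2 \sigma}{\sqrt{3}}.$$

This result has already been proven for non-negative uncertainty variables by [40]. Finally, one yields the following upper bound for the paired cumulative Leik entropy:

Corollary 4. Let X be a random variable with existing variance σ². Then

$$\mathrm{CPE}_L(X) \le 2 \sigma.$$

Proof. Use $\varphi'(1 - u) - \varphi'(u) = 2\,\mathrm{sign}(u - 1/2)$ for u ≠ 1/2, and hence $E[(\varphi'(1 - U) - \varphi'(U))^2] = 4$, to get the result.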
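A quick numerical sanity check of such variance-based bounds is sketched below in Python. It computes the Shannon-, Gini-, and Leik-type cumulative paired entropies of a standard exponential distribution (standard deviation 1) with a midpoint rule and compares them with the constants π/√3, 2/√3, and 2 obtained from the Cauchy-Schwarz argument. The choice of distribution, the truncation of the integral at x = 50, and the names used are assumptions for illustration.

```python
import math


def midpoint(g, lo, hi, n=100_000):
    """Plain midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


GENERATORS = {
    "Shannon": lambda u: -u * math.log(u) if u > 0.0 else 0.0,
    "Gini": lambda u: u * (1.0 - u),
    "Leik": lambda u: min(u, 1.0 - u),
}

# Bound constants c such that CPE <= c * sigma (Cauchy-Schwarz argument).
BOUND_CONSTANT = {
    "Shannon": math.pi / math.sqrt(3.0),
    "Gini": 2.0 / math.sqrt(3.0),
    "Leik": 2.0,
}


def exp_cdf(x):
    """Standard exponential cdf; its standard deviation equals 1."""
    return -math.expm1(-x)


def cpe_exponential(name):
    """CPE of the standard exponential distribution for the named generator."""
    phi = GENERATORS[name]
    return midpoint(lambda x: phi(exp_cdf(x)) + phi(1.0 - exp_cdf(x)), 0.0, 50.0)


if __name__ == "__main__":
    sigma = 1.0
    for name in GENERATORS:
        print(name, cpe_exponential(name), "<=", BOUND_CONSTANT[name] * sigma)
```

For the exponential distribution the bounds are not attained, which is consistent with the fact that equality in the Cauchy-Schwarz inequality requires a specific shape of the quantile function.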

Stricter Conditions for the Existence of CPE α
So far, we only considered sufficient conditions for an existing variance.Following the arguments in [46,50], which were used for the special case of cumulative residual and residual Shannon entropy, one can derive stricter sufficient conditions for the existence of CPE α .

Proof.
To prepare the proof we first note that The second fact required for the proof is that Third, it holds that because CPE α consists of four indefinite integrals: It must be shown separately that these integrals converge.The convergence of the first two terms follows directly from the existence of E(X).With Equations ( 27) and ( 28) we have for α > 0 For the third term we have to demonstrate that 0 For p > 0 the transformation g(y) = y p is monotonically increasing for y > 1.Using the Markov inequality we get Putting these results together, we attain 0 for β > 1/p (and thus for pβ > 1) and due to ∞ 1 1/y q dy < ∞ for q > 1.It remains to show the convergence of the fourth term: Now, the Markov inequality gives In summary, we have 0 This completes the proof.
Following Theorem 2, depending on the number of existing moments, specific conditions for α arise in order to ensure the existence of CPE α : 1.If the variance of X exists (i.e., p = 2), CPE α (X)

Maximum CPE φ Distributions for Given Mean and Variance
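Equality in the Cauchy-Schwarz inequality gives a condition under which the upper bound is attained. This is the case if an affine linear relation between $F^{-1}(U)$ (respectively X) and $\varphi'(1 - U) - \varphi'(U)$ (respectively $\varphi'(\bar F(X)) - \varphi'(F(X))$) exists with probability 1. Since the quantile function is monotonically increasing, such an affine linear function can only exist if $\varphi'(1 - u) - \varphi'(u)$ is monotonic as well (decreasing or increasing). This implies that φ needs to be a concave function on [0, 1]. In order to derive a maximum CPEφ distribution under the restriction that mean and variance are given, one may only consider concave generating functions φ.
We summarize this obvious but important result in the following theorem:

Theorem 3. Let φ be a non-negative and differentiable function on the domain [0, 1] with derivative φ′ and φ(0) = φ(1) = 0. Then F is the maximum CPEφ distribution with prespecified mean µ and variance σ² iff

$$F^{-1}(u) = \mu + \sigma\, \frac{ \varphi'(1 - u) - \varphi'(u) }{ \sqrt{ \int_0^1 \left( \varphi'(1 - v) - \varphi'(v) \right)^2 dv } }, \quad u \in [0, 1].$$

Proof. The upper bound of the Cauchy-Schwarz inequality will be attained if there are constants a, b ∈ ℝ such that

$$F^{-1}(u) = a + b \left( \varphi'(1 - u) - \varphi'(u) \right), \quad u \in [0, 1].$$

The first restriction equals

$$\mu = \int_0^1 F^{-1}(u)\,du = a + b \int_0^1 \left( \varphi'(1 - u) - \varphi'(u) \right) du.$$

The property φ(0) = φ(1) = 0 leads to $\int_0^1 (\varphi'(1 - u) - \varphi'(u))\,du = 0$, such that a = µ. This means that there is a constant b ∈ ℝ with

$$F^{-1}(u) = \mu + b \left( \varphi'(1 - u) - \varphi'(u) \right).$$

The second restriction postulates that

$$\sigma^2 = \int_0^1 \left( F^{-1}(u) - \mu \right)^2 du = b^2 \int_0^1 \left( \varphi'(1 - u) - \varphi'(u) \right)^2 du.$$

For concave φ the derivative φ′ is decreasing. Therefore, $\varphi'(1 - u) - \varphi'(u)$ is monotonically increasing. The quantile function is also monotonically increasing such that b has to be positive. This gives

$$b = \frac{\sigma}{ \sqrt{ \int_0^1 \left( \varphi'(1 - u) - \varphi'(u) \right)^2 du } }.$$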
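The quantile function of Tukey's λ distribution is given by

$$Q(u) = \frac{u^{\lambda} - (1 - u)^{\lambda}}{\lambda}, \quad u \in [0, 1],\ \lambda \ne 0.$$

Its mean and variance are

$$\mu = 0 \quad \text{and} \quad \sigma^2 = \frac{2}{\lambda^2} \left( \frac{1}{2\lambda + 1} - B(\lambda + 1, \lambda + 1) \right),$$

where the variance exists for λ > −1/2. The domain is given by [−1/λ, 1/λ] for λ > 0. By discussing the paired cumulative α-entropy, one can prove the new result that Tukey's λ distribution is the maximum CPEα distribution for prespecified mean and variance. Tukey's λ distribution takes on the role of the Student-t distribution if one changes from the differential entropy to CPEα (cf. [61]).
Corollary 5. The cdf F maximizes CPEα for α > 1/2 under the restrictions of a given mean µ and given variance σ² iff F is the cdf of the Tukey λ distribution with λ = α − 1.
Proof. For $\varphi(u) = u\,(u^{\alpha-1} - 1)/(1 - \alpha)$, we have $\varphi'(1 - u) - \varphi'(u) = \frac{\alpha}{\alpha - 1} \left( u^{\alpha-1} - (1 - u)^{\alpha-1} \right)$, and the maximum CPEα distribution results in

$$F^{-1}(u) = \mu + \sigma\, \frac{ \left( u^{\lambda} - (1 - u)^{\lambda} \right) / \lambda }{ \sqrt{ \frac{2}{\lambda^2} \left( \frac{1}{2\lambda + 1} - B(\lambda + 1, \lambda + 1) \right) } }, \quad \lambda = \alpha - 1.$$

$F^{-1}$ can easily be identified as the quantile function of a Tukey's λ distribution with λ = α − 1 and α > 1/2.
For the Gini case (α = 2), one obtains the quantile function of a uniform distribution, $F^{-1}(u) = \mu + \sqrt{3}\,\sigma\,(2u - 1)$, i.e., a uniform distribution on $[\mu - \sqrt{3}\sigma, \mu + \sqrt{3}\sigma]$. This maximum CPE_G distribution corresponds essentially to the distribution derived by Dai et al. [40].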
The fact that the logistic distribution is the maximum CPE_S distribution, provided mean and variance are given, was derived by Chen et al. [51] in the framework of uncertainty theory and by [50] (p. 4) in the framework of reliability theory. Both proved this result using Euler-Lagrange equations. In the interest of completeness, we provide an alternative proof via the upper bound of the Cauchy-Schwarz inequality.

Corollary 6. The cdf F maximizes CPE_S under the restrictions of a known mean µ and a known variance σ² iff F is the cdf of a logistic distribution.

Proof. With $\varphi'(1 - u) - \varphi'(u) = \ln\left( u/(1 - u) \right)$ and $\int_0^1 \ln^2\left( u/(1 - u) \right) du = \pi^2/3$, one receives

$$F^{-1}(u) = \mu + \frac{\sqrt{3}\,\sigma}{\pi} \ln\frac{u}{1 - u}, \quad u \in (0, 1).$$

Inverting gives the distribution function of the logistic distribution with mean µ and variance σ²:

$$F(x) = \frac{1}{1 + \exp\left( -\frac{\pi (x - \mu)}{\sqrt{3}\,\sigma} \right)}, \quad x \in \mathbb{R}.$$

As a last example we consider the cumulative paired Leik entropy CPE_L.

Corollary 7. The cdf F maximizes CPE_L under restrictions of a known mean µ and a known variance σ² iff for F holds

$$F(x) = \begin{cases} 0 & \text{for } x < \mu - \sigma, \\ 1/2 & \text{for } \mu - \sigma \le x < \mu + \sigma, \\ 1 & \text{for } x \ge \mu + \sigma. \end{cases}$$

Therefore, the maximization of CPE_L with given mean and variance leads to a distribution whose variance is maximal on the interval [µ − σ, µ + σ].
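The construction behind these corollaries can be sketched generically: given the function w(u) = φ′(1 − u) − φ′(u), the candidate quantile function is µ + σ·w(u) divided by the square root of the integral of w² over [0, 1]. The Python code below implements this under the assumption that w is supplied in closed form and that the normalizing integral may be approximated by a midpoint rule; the helper names and the three example functions are illustrative, not taken from the paper.

```python
import math


def midpoint(g, lo, hi, n=200_000):
    """Plain midpoint rule; nodes stay strictly inside (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def max_cpe_quantile(w, mu, sigma):
    """Return u -> mu + sigma * w(u) / sqrt(integral of w(v)**2 over [0, 1])."""
    norm = math.sqrt(midpoint(lambda v: w(v) ** 2, 0.0, 1.0))
    return lambda u: mu + sigma * w(u) / norm


def w_shannon(u):
    return math.log(u / (1.0 - u))          # phi(u) = -u ln u


def w_gini(u):
    return 4.0 * u - 2.0                    # phi(u) = u(1 - u)


def w_leik(u):
    return math.copysign(2.0, u - 0.5)      # phi(u) = min{u, 1 - u}


if __name__ == "__main__":
    mu, sigma = 1.0, 2.0
    q_logistic = max_cpe_quantile(w_shannon, mu, sigma)
    # Compare with the logistic quantile mu + sqrt(3) * sigma / pi * ln(u / (1 - u)).
    u = 0.75
    print(q_logistic(u), mu + math.sqrt(3.0) * sigma / math.pi * math.log(u / (1.0 - u)))
    q_two_point = max_cpe_quantile(w_leik, mu, sigma)
    print(q_two_point(0.1), q_two_point(0.9))  # close to mu - sigma and mu + sigma
```

Because the normalization is done numerically, the Shannon case matches the closed-form logistic quantile only up to the integration error caused by the logarithmic singularities at 0 and 1.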

Maximum CPE φ Distributions for General Moment Restrictions
Drissi et al. [50] discuss general moment restrictions of the form for which the existence of the moments is assumed.By using Euler-Lagrange equations they show that maximizes the residual cumulative entropy − R F(x) ln F(x)dx under constraints Equation (31).Moreover, they demonstrated that the solution needs to be symmetric with respect to µ.Here, λ i , i = 1, 2, ..., k, are the Lagrange parameters which are determined by the moment restrictions, provided a solution exists.Rao et al. [47] shows that for distributions with support R + the ME distribution is given by if the restrictions Equation ( 31) are again required.One can easily examine the shape of a distribution which maximizes the cumulative paired φ-entropy under the constraints Equation ( 31).This maximum CPE φ distribution can no longer be derived by the upper bound of the Cauchy-Schwarz inequality if i > 2. One has to solve the Euler-Lagrange equations for the objective function with Lagrange parameters λ i , i = 1, 2, . . ., k.The Euler-Lagrange equations lead to the optimization problem for i = 1, 2, ..., k.Once again there is a close relation between the derivative of the generating function and the quantile function, provided a solution of the optimization problem Equation ( 32) exists.
The following example shows that the optimization problem Equation (32) leads to a well-known distribution if constraints are chosen carefully in case of a Shannon-type entropy.
Example 1.The power logistic distribution is defined by the distribution function for γ > 0. The corresponding quantile function is This quantile function is also solution of Equation (33) given ).
Setting γ = 1 leads to the familiar result for the upper bound of CPE S given the variance.

Generalized Principle of Maximum Entropy
Kesavan et al. [19] introduced the generalized principle of an ME problem, which describes the interplay of entropy, constraints, and distributions. A variation of this principle is the aim of finding an entropy that is maximized by a given distribution and some moment restrictions.
This problem can easily be solved for $CPE_\varphi$ if mean and variance are given, due to the linear relationship between $\varphi'(1-u) - \varphi'(u)$ and the quantile function $F^{-1}(u)$ of the maximum $CPE_\varphi$ distribution provided by the Cauchy-Schwarz inequality. However, $\varphi'(1-u) - \varphi'(u)$ has to be strictly monotonic on $[0, 1]$ for $F^{-1}(u)$ to be a quantile function. Therefore, the concavity of $\varphi(u)$ and the condition $\varphi(0) = \varphi(1) = 0$ are of key importance.
We demonstrate the solution to the generalized principle of the maximum entropy problem for the Gaussian and the Student-t distribution.
Proposition 3. Let $\phi$, $\Phi$ and $\Phi^{-1}$ be the density, the cdf and the quantile function of a standard Gaussian random variable. The Gaussian distribution is the maximum $CPE_\varphi$ distribution for a given mean $\mu$ and variance $\sigma^2$ for $CPE_\varphi$ with entropy generating function $\varphi(u) = \phi(\Phi^{-1}(u))$, $u \in [0, 1]$.
Proof. Since $\varphi'(u) = -\Phi^{-1}(u)$ and hence $\varphi'(1-u) - \varphi'(u) = 2\Phi^{-1}(u)$, the condition for the maximum $CPE_\varphi$ distribution with mean $\mu$ and variance $\sigma^2$ becomes $F^{-1}(u) = \mu + \sigma \cdot 2\Phi^{-1}(u) / \sqrt{\int_0^1 (2\Phi^{-1}(u))^2\,du}$. By substituting $\int_0^1 (2\Phi^{-1}(u))^2\,du = 4$, it follows that $F^{-1}(u) = \mu + \sigma\Phi^{-1}(u)$, such that $F^{-1}$ is the quantile function of a Gaussian distribution with mean $\mu$ and variance $\sigma^2$.
An analogous result holds for the Student-t distribution with $k$ degrees of freedom. In this case, the main difference from the Gaussian distribution is that the entropy generating function possesses no closed form but is obtained by numerical integration of the quantile function.
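The normalization used in the Gaussian case can be checked numerically. The sketch below assumes that the maximum-entropy quantile function has the form µ + σ·J(u)/‖J‖ with J(u) = φ'(1 − u) − φ'(u) = 2Φ⁻¹(u); the helper names `j_gauss`, `squared_norm` and `max_cpe_quantile` are illustrative and not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian (mean 0, sd 1)


def j_gauss(u):
    """J(u) = phi'(1-u) - phi'(u); assumed here to equal 2 * Phi^{-1}(u)."""
    return 2.0 * STD_NORMAL.inv_cdf(u)


def squared_norm(n=100_000):
    """Midpoint-rule approximation of the integral of J(u)^2 over (0, 1)."""
    h = 1.0 / n
    return h * sum(j_gauss((k + 0.5) * h) ** 2 for k in range(n))


def max_cpe_quantile(u, mu, sigma, norm):
    """Candidate maximum-entropy quantile mu + sigma * J(u) / ||J||."""
    return mu + sigma * j_gauss(u) / norm


norm = sqrt(squared_norm())  # should be close to 2
```

With `norm` close to 2, `max_cpe_quantile(u, mu, sigma, norm)` agrees with `NormalDist(mu, sigma).inv_cdf(u)` up to the integration error.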

Corollary 8. Let $t_k$ and $t_k^{-1}$ be the cdf and the quantile function, respectively, of a Student-t distribution with $k$ degrees of freedom for $k > 2$. $\mu + \frac{\sigma}{\sqrt{k/(k-2)}}\, t_k^{-1}$ is the maximum $CPE_\varphi$ quantile function for a given mean $\mu$ and variance $\sigma^2$ iff and the symmetry of the $t_k$ distribution around $\mu$, we get the condition we get the quantile function of the $t$ distribution with $k$ degrees of freedom and mean $\mu$:
Figure 2 shows the shape of the entropy generating function $\varphi$ for several distributions generated by the generalized ME principle.

Basic Properties of CPE φ
The cumulative residual entropy (CRE) introduced by [46], the generalized cumulative residual entropy (GCRE) of [50], and the cumulative entropy (CE) discussed by [8,9] have always been interpreted as measures of information. However, none of these approaches explains which kind of information is considered. In contrast to this interpretation as measures of information, Oja [3] proved that the differential entropy satisfies a special ordering of scale and has certain meaningful properties of measures of scale. In [4], the authors discussed the close relationship between differential entropy and variance. In the discrete case the Shannon entropy can be interpreted as a measure of diversity, which is a concept of dispersion if there is no ordering and no distance between the realizations of a random variable. In this section, we will clarify the important role which the variance plays for the existence of $CPE_\varphi$.
Therefore, we intend to provide a deeper insight into $CPE_\varphi$ as a proper MOS. We start by showing that $CPE_\varphi$ has typical properties of an MOS. In detail, a proper MOS should always be non-negative and attain its minimal value 0 for a degenerate distribution. If a finite interval $[a, b]$ is considered as support, an MOS should attain its maximum if $a$ and $b$ each occur with probability 1/2. $CPE_\varphi$ possesses all these properties, as shown in the next proposition.
Proposition 4. Let $\varphi : [0, 1] \to \mathbb{R}$ with $\varphi(u) > 0$ for $u \in (0, 1)$ and $\varphi(0) = \varphi(1) = 0$. Let $X$ be a random variable with support $D$, and assume that $CPE_\varphi$ exists. Then the following properties hold:
1. $CPE_\varphi(X) \ge 0$.
2. $CPE_\varphi(X) = 0$ iff there exists an $x^*$ with $P(X = x^*) = 1$.
3. $CPE_\varphi(X)$ attains its maximum iff there exist $a, b$ with $-\infty < a < b < \infty$ such that $P(X = a) = P(X = b) = 1/2$.

Proof.
1. Follows from the non-negativity of φ.
2. If there is an x * ∈ R with P(X = x * ) = 1, then F X (x) = 0 and F X (x) ∈ {0, 1} for all x ∈ R. Due to In order to attain the assumed finite maximum, the support D has to be a finite interval that attains this maximum. Set
Oja [3] discussed several orderings of scale. He showed in particular that Shannon entropy and variance satisfy a partial quantile based ordering of scale, which has been discussed by [62]. Referring to [63,64] criticized that this ordering and the location-scale family of distributions focused by Oja [3] were too restrictive. He discussed a more general nonparametric model of dispersion based on a more general ordering of scale (cf. [65,66]). In line with [4], we focus on the scale ordering proposed by [62].
Definition 3. Let $F_1$, $F_2$ be continuous cdfs with respective quantile functions $F_1^{-1}$ and $F_2^{-1}$. $F_2$ is said to be more spread out than $F_1$ ($F_1 \preceq_1 F_2$) if
If $F_1$, $F_2$ are absolutely continuous with density functions $f_1$, $f_2$, $\preceq_1$ can be characterized equivalently by the property that (cf. [3], p. 160).
Next, we show that $CPE_\varphi$ is an MOS in the sense of [3]. The following lemma examines the behavior of $CPE_\varphi$ with respect to affine linear transformations, referring to the first axiom of Definition 2:
Lemma 2. Let $F$ be the cdf of the random variable $X$. Then $CPE_\varphi(aX + b) = |a|\, CPE_\varphi(X)$.
In order to satisfy the second axiom of Oja's definition of a measure of scale, $CPE_\varphi$ has to satisfy the ordering of scale $\preceq_1$. This is shown by the following lemma:
Lemma 3. Let $F_1$ and $F_2$ be continuous cdfs of the random variables $X_1$ and $X_2$ with $F_1 \preceq_1 F_2$. Then the following holds: $CPE_\varphi(X_1) \le CPE_\varphi(X_2)$.

Proof. One can show with
As a consequence of Lemma 2 and Lemma 3, $CPE_\varphi$ is an MOS in the sense of [3]. Thus, not only variance, differential entropy, and other statistical measures have the properties of measures of scale, but also $CPE_\varphi$.
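The two scale properties can be illustrated numerically for the Gini case. The following is a minimal sketch, assuming φ(u) = u(1 − u) so that the entropy equals 2∫F(x)(1 − F(x))dx, and using Gaussian distributions, for which aX + b is again Gaussian with standard deviation |a|. The function name `cpe_gini` and the integration settings are illustrative choices.

```python
from statistics import NormalDist


def cpe_gini(dist, half_width=12.0, n=4000):
    """Composite Simpson approximation of 2 * integral of F(x) * (1 - F(x)) dx.

    The integration range mean +/- half_width * sd is an assumption that is
    adequate for Gaussian tails; n must be even.
    """
    a = dist.mean - half_width * dist.stdev
    b = dist.mean + half_width * dist.stdev
    h = (b - a) / n

    def g(x):
        F = dist.cdf(x)
        return 2.0 * F * (1.0 - F)

    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0


base = cpe_gini(NormalDist(0.0, 1.0))    # X standard Gaussian
scaled = cpe_gini(NormalDist(5.0, 3.0))  # -3 * X + 5 has mean 5 and sd 3
wider = cpe_gini(NormalDist(0.0, 2.0))   # more spread out than X
```

Here `scaled` matches `3 * base` (affine behavior with factor |a|), and `wider` exceeds `base`, in line with the ordering of scale.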

CPE φ and Transformations
Ebrahimi et al. ([4], p. 323) considered cdfs $F_1$, $F_2$ on domains $D_1$, $D_2$ and density functions $f_1$, $f_2$, which are connected via $F_2(y) = F_1(g^{-1}(y))$ respectively $f_2(y) = f_1(g^{-1}(y))\, dg^{-1}(y)/dy$ for $y \in D_1$. Thus, they demonstrated for Shannon's differential entropy $H$ that the transformation only affects the difference: For $CPE_\varphi$, one gets a less explicit relationship between $CPE_\varphi(F_2)$ and $CPE_\varphi(F_1)$: Transformations with $|g'(y)| \ge 1$, $y \in D_2$, are of special interest since these transformations do not diminish measures of scale. In Theorem 1, Ebrahimi et al. [4] showed that $F_1 \preceq_1 F_2$ holds if $|g'(y)| \ge 1$ for $y \in D_2$. Hence, no MOS can be diminished by this specific transformation, in particular neither Shannon entropy nor $CPE_\varphi$.
Ebrahimi et al. [4] considered the special transformation $g(x) = ax + b$, $x \in D_1$. They showed that Shannon's differential entropy is shifted additively by this transformation, which is not expected for an MOS. Furthermore, the standard deviation is changed by the factor $|a|$, which is also true for $CPE_\varphi$ as shown in Lemma 2.

CPE φ for Sums of Independent Random Variables
As is generally known, variance and differential entropy behave additively for the sum of independent random variables $X$ and $Y$. More general entropies such as the Rényi or the Havrda & Charvát entropy are only subadditive (cf. [18], p. 194).
Neither the property of additivity nor the property of subadditivity could be shown for cumulative paired φ-entropies. Instead, they possess the maximum property if $\varphi$ is a concave function on $[0, 1]$. This means that, for two independent variables $X$ and $Y$, $CPE_\varphi(X + Y)$ is lower-bounded by the maximum of the two individual entropies $CPE_\varphi(X)$ and $CPE_\varphi(Y)$. This result was shown by [46] for the cumulative residual Shannon entropy. The following theorem generalizes this result, while the proof partially follows Theorem 2 of [46].
Theorem 4. Let $X$ and $Y$ be independent random variables and $\varphi$ a concave function on the interval $[0, 1]$ with $\varphi(0) = \varphi(1) = 0$. Then we have $CPE_\varphi(X + Y) \ge \max\{CPE_\varphi(X), CPE_\varphi(Y)\}$.
Proof. Let $X$ and $Y$ be independent random variables with distribution functions $F_X$, $F_Y$ and densities $f_X$, $f_Y$. Using the convolution formula, we immediately get Applying Jensen's inequality for a concave function $\varphi$ to Equation (37) results in and The existence of the expectation is assumed. To prove the theorem, we begin with By using Equations (38) and (39), setting $z = t - y$, and exchanging the order of integration, one obtains
In the context of uncertainty theory, Liu [6] considered a different definition of independence for uncertain variables leading to the simpler additivity property for independent uncertain variables $X$ and $Y$.
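The maximum property can be illustrated with two independent standard uniform variables, whose sum has the triangular distribution on [0, 2]. The sketch below assumes the definition of the entropy as the integral of φ(F(x)) + φ(1 − F(x)) and uses the Gini and Shannon generating functions; all function names are illustrative.

```python
from math import log


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def phi_gini(u):
    return u * (1.0 - u)


def phi_shannon(u):
    # -u ln u with the continuous extension 0 at u = 0
    return -u * log(u) if u > 0.0 else 0.0


def cpe(phi, cdf, a, b):
    """Integral of phi(F) + phi(1 - F) over the support [a, b]."""
    return simpson(lambda x: phi(cdf(x)) + phi(1.0 - cdf(x)), a, b)


def cdf_uniform(x):
    return min(max(x, 0.0), 1.0)


def cdf_sum_of_two_uniforms(x):
    """cdf of U1 + U2 for independent standard uniforms (triangular on [0, 2])."""
    if x <= 0.0:
        return 0.0
    if x <= 1.0:
        return x * x / 2.0
    if x <= 2.0:
        return 1.0 - (2.0 - x) ** 2 / 2.0
    return 1.0


gini_single = cpe(phi_gini, cdf_uniform, 0.0, 1.0)
gini_sum = cpe(phi_gini, cdf_sum_of_two_uniforms, 0.0, 2.0)
shannon_single = cpe(phi_shannon, cdf_uniform, 0.0, 1.0)
shannon_sum = cpe(phi_shannon, cdf_sum_of_two_uniforms, 0.0, 2.0)
```

In both cases the entropy of the sum is at least as large as the entropy of each summand, while it stays below the sum of the two individual entropies in the Gini case.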

Estimation of CPE φ
Beirlant et al. [67] presented an overview of differential entropy estimators. Essentially, all proposals are based on the estimation of a density function $f$, inheriting all typical problems of nonparametric density estimation. Among others, the problems are bias, choice of a kernel, and optimal choice of the smoothing parameter (cf. [68], p. 215ff.). However, $CPE_\varphi$ is based on the cdf $F$, for which several natural estimators with desirable stochastic properties, derived from the Theorem of Glivenko and Cantelli (cf. [69], p. 61), exist. For a simple random sample $(X_1, \ldots, X_n)$ of independent random variables with identical distribution function $F$, the authors of [8,9] estimated $F$ using the empirical distribution function $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)$ for $x \in \mathbb{R}$. Moreover, they showed for the cumulative entropy $CE(F) = -\int_{\mathbb{R}} F(x) \ln F(x)\,dx$ that the estimator $CE(F_n)$ is consistent for $CE(F)$ (cf. [8]). In particular, for $F$ being the distribution function of a uniform distribution, they provided the expected value of the estimator and demonstrated that the estimator is asymptotically normal. For $F$ being the cdf of an exponential distribution, they additionally derived the variance of the estimator.
In the following, we generalize the estimation approach of [8] by embedding it into the well-established theory of L-estimators (cf. [70], p. 55ff.). If $\varphi$ is differentiable, then $CPE_\varphi$ can be represented as the covariance between the random variable $X$ and $\varphi'(1 - F(X)) - \varphi'(F(X))$: An unbiased estimator for this covariance is where This results in an L-estimator $\sum_{i=1}^{n} J(i/(n+1))\, X_{n:i}$ with $J(u) = \varphi'(1-u) - \varphi'(u)$, $u \in (0, 1)$. By applying known results for the influence functions of L-estimators (cf. [70]), we get for the influence function of $CPE_\varphi$: In particular, the derivative is This means that the influence function will be completely determined by the antiderivative of $\varphi'(F(x))$.
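The L-estimator can be sketched in a few lines. The weights J(i/(n + 1)) follow the text; the normalization by 1/n is an assumption made here so that the statistic is on the scale of the covariance E[X·J(F(X))], and the function names are illustrative. The score functions are J(u) = 4u − 2 for the Gini case φ(u) = u(1 − u) and J(u) = ln(u/(1 − u)) for the Shannon case φ(u) = −u ln u.

```python
from math import log


def j_gini(u):
    """J(u) = phi'(1-u) - phi'(u) for phi(u) = u(1-u)."""
    return 4.0 * u - 2.0


def j_shannon(u):
    """J(u) = phi'(1-u) - phi'(u) for phi(u) = -u ln u."""
    return log(u / (1.0 - u))


def cpe_l_estimate(sample, J):
    """L-statistic (1/n) * sum_i J(i/(n+1)) * X_(i); the factor 1/n is assumed."""
    xs = sorted(sample)
    n = len(xs)
    return sum(J(i / (n + 1)) * x for i, x in enumerate(xs, start=1)) / n


# Deterministic stand-ins for samples: quantiles at the plotting positions.
n = 9999
positions = [i / (n + 1) for i in range(1, n + 1)]
uniform_like = positions                                    # uniform quantiles
logistic_like = [log(u / (1.0 - u)) for u in positions]     # logistic quantiles

gini_uniform = cpe_l_estimate(uniform_like, j_gini)
shannon_logistic = cpe_l_estimate(logistic_like, j_shannon)
```

For the uniform quantiles the Gini estimate is close to 1/3, and for the logistic quantiles the Shannon estimate is close to π²/3, slightly below it because the extreme tails are not represented in a finite sample.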
The following examples demonstrate that the influence function of CPE φ can easily be calculated if the underlying distribution F is logistic.We consider the Shannon, the Gini, and the α-entropy cases.
Example 2. Beginning with the derivative The influence function is not bounded and is proportional to the influence function of the variance, which implies that variance and $CPE_S$ have a similar asymptotic and robustness behavior. The integration constant $C$ has to be determined such that $E[IF(x; CPE_S, F)] = 0$:
Example 3. Using the Gini entropy $CPE_G$ and the logistic distribution function $F$ we have Integration gives the influence function By applying numerical integration we get $C = -1.2741$.
Integration leads to the influence function where Under certain conditions (cf. [71], p. 143) concerning $J$, or $\varphi$ and $F$, L-estimators are consistent and asymptotically normal. So, the cumulative paired φ-entropy is with asymptotic variance The following examples consider the Shannon and the Gini case for which the condition that is sufficient to guarantee asymptotic normality can easily be checked. We consider again the cdf $F$ of the logistic distribution.

Example 5. For the cumulative paired Shannon entropy it holds that
Example 6. In the Gini case we get It is known that L-estimators have a remarkable small-sample bias. Following [72], the bias can be reduced by applying the Jackknife method. It is well-known that asymptotic distributions can be used to construct approximate confidence intervals and that they can be applied for hypothesis tests in the one- or two-sample case. The authors of [70] (p. 116ff.) discussed asymptotically efficient L-estimators for a parameter of scale $\theta$. Klein et al. [73] examine how the entropy generating function $\varphi$ will be determined by the requirement that $CPE_\varphi(F_n)$ has to be asymptotically efficient.

Related Concepts
Several statistical concepts are closely related to cumulative paired φ-entropies. These concepts generalize some results which are known from the literature. We begin with the cumulative paired φ-divergence that was discussed for the first time by [41], who called it "generalized cross entropy". Their focus was on uncertain variables, whereas ours is on random variables. The second concept generalizes mutual information, which is defined for Shannon's differential entropy, to mutual φ-information. We consider two random variables $X$ and $Y$. The task is to decompose $CPE_\varphi(Y)$ into two kinds of variation such that the so-called external variation measures how much of $CPE_\varphi(Y)$ can be explained by $X$. This procedure mimics the well-known decomposition of variance and allows us to define directed measures of dependence for $X$ and $Y$. The third concept deals with dependence. More precisely, we introduce a new family of correlation coefficients that measure the strength of a monotonic relationship between $X$ and $Y$. Well-known coefficients like the Gini correlation can be embedded in this approach. The fourth concept treats the problem of linear regression. $CPE_\varphi$ can serve as a general measure of dispersion that has to be minimized to estimate the regression coefficients. This approach will be identified as a special case of rank-based regression or R regression. Here, the robustness properties of the rank-based estimator can directly be derived from the entropy generating function $\varphi$. Moreover, asymptotics can be derived from the theory of rank-based regression. The last concept we discuss applies $CPE_\varphi$ to linear rank tests for the difference of scale. Known results, especially concerning the asymptotics, can be transferred from the theory of linear rank tests to this new class of tests. In this paper, we only sketch the main results and focus on examples. For a detailed discussion including proofs we refer to a series of papers which are currently work in progress.
The cumulative paired φ-divergence for two random variables is defined as follows.
Definition 4. Let $X$ and $Y$ be two random variables with cdfs $F_X$ and $F_Y$. Then the cumulative paired φ-divergence of $X$ and $Y$ is given by
The following examples introduce cumulative paired φ-divergences for the Shannon, the α-entropy, the Gini, and the Leik cases:
Example 7.
1. Considering $\varphi(u) = -u \ln u$, $u \in [0, \infty)$, we obtain the cumulative paired Shannon divergence
3. For $\alpha = 2$ we obtain as a special case the cumulative paired Gini divergence
$CPD_S$ is equivalent to the Anderson-Darling functional (cf. [77]) and has been used by [78] for a goodness-of-fit test, where $F_X$ represents the empirical distribution. Likewise, $CPD_S$ serves as a goodness-of-fit test (cf. [79]).
Further work in this area with similar concepts was done by [80,81], using the notation cumulative residual Kullback-Leibler (CRKL) information and cumulative Kullback-Leibler (CKL) information.
Based on work from [82][83][84][85] a general function $\varphi_\alpha$ was discussed by [86]: Up to a multiplicative constant, $\varphi_\alpha$ includes all of the aforementioned examples. In addition, the Hellinger distance is a special case for $\alpha = 1/2$ that leads to the cumulative paired Hellinger divergence: For a strictly concave function $\varphi$, Chen et al. [41] proved that $CPD_\varphi(X, Y) \ge 0$ and $CPD_\varphi(X, Y) = 0$ iff $X$ and $Y$ have identical distributions. Thus, the cumulative paired φ-divergence can be interpreted as a kind of distance between distribution functions. As an application, Chen et al. [41] mentioned the "minimum cross-entropy principle". They proved that $X$ follows a logistic distribution if $CPD_S$ is minimized, given that $Y$ is exponentially distributed and the variance of $X$ is fixed. If $F_Y$ is an empirical distribution and $F_X$ has an unknown vector of parameters $\theta$, $CPD_\varphi$ can be minimized to attain a point estimator for $\theta$ (cf. [87]). The large class of goodness-of-fit tests based on $CPD_\varphi$, discussed by Jager et al. [86], has already been mentioned.

Mutual Cumulative φ-Information
Let $X$ and $Y$ again be random variables with cdfs $F_X$, $F_Y$, density functions $f_X$, $f_Y$, and the conditional distribution function $F_{Y|X}$. $D_X$ and $D_Y$ denote the supports of $X$ and $Y$. Then we have which is the variation of $Y$ given $X = x$. Averaging with respect to $x$ leads to the internal variation For a concave entropy generating function $\varphi$, this internal variation cannot be greater than the total variation $CPE_\varphi(Y)$. More precisely, it holds: We consider the non-negative difference This expression measures the part of the variation of $Y$ that can be explained by the variable $X$ (= external variation) and shall be named "mutual cumulative paired φ-information" $MCPI_\varphi$ (cf. Rao et al. [46] using the term "cross entropy", (p. 3) in [50]). $MCPI_\varphi$ is equivalent to the transinformation that is defined for Shannon's differential entropy (cf. [60], p. 20f.). In contrast to transinformation, $MCPI_\varphi$ is not symmetric, so Cumulative paired mutual φ-information is the starting point for two directed measures of strength of φ-dependence between $X$ and $Y$, namely "directed (measure) of cumulative paired φ-dependence", DCPD. The first one is and the second one is Both expressions measure the relative decrease in variation of $Y$ if $X$ is known. The domain is $[0, 1]$. The lower bound 0 is taken if $Y$ and $X$ are independent, while the upper bound 1 corresponds to $E_X(CPE_\varphi(Y|X)) = 0$. In this case, from $\varphi(u) > 0$ for $0 < u < 1$ and $\varphi(0) = \varphi(1) = 0$, we can conclude that the conditional distribution $F_{Y|X}(y|x)$ has to be degenerate. Thus, for every $x \in D_X$ there is exactly one $y^* \in D_Y$ with $P(Y = y^* | X = x) = 1$. Therefore, there is a perfect association between $X$ and $Y$. The next example illustrates these concepts and demonstrates the advantage of considering both types of measures of dependence.
Note that $X$ and $Y$ follow univariate standard Gaussian distributions, whereas $X + Y$ follows a univariate Gaussian distribution with mean 0 and variance $2(1 + \rho)$. Considering this, one can conclude that By plugging this quantile function into the defining equation of the cumulative paired φ-entropy one obtains For $\rho \to -1$, the cumulative paired φ-entropy behaves like the variance or the standard deviation. All measures approach 0 for $\rho \to -1$, such that $CPE_\varphi$ can be used as a measure of risk, since the risk can be completely eliminated in a portfolio with perfectly negatively correlated returns of assets. To be more precise, $CPE_\varphi$ behaves like the standard deviation rather than the variance. For $\rho = 0$, the variance of the sum equals the sum of the variances, but the standard deviation of the sum is equal to or smaller than the sum of the individual standard deviations. This is also true for $CPE_\varphi$.
In case of the bivariate standard Gaussian distribution, $Y|x$ is Gaussian as well with mean $\rho x$ and variance $1 - \rho^2$ for $x \in \mathbb{R}$ and $-1 < \rho < 1$. Therefore, the quantile function of $Y|x$ is $F_{Y|x}^{-1}(u) = \rho x + \sqrt{1 - \rho^2}\,\Phi^{-1}(u)$, $u \in (0, 1)$. Using this quantile function, the cumulative paired φ-entropy for the conditional random variable $Y|x$ is Just like the variance of $Y|x$, $CPE_\varphi$ does not depend on $x$ in case of a bivariate Gaussian distribution. This implies that the internal variation is $\sqrt{1 - \rho^2}\, CPE_\varphi(Y)$ as well. For $\rho \to 1$, the bivariate distribution becomes degenerate and the internal variation consequently approaches 0. The mutual cumulative paired φ-information is given by $MCPI_\varphi$ takes the value 0 if and only if $\rho^2 = 0$, in which case $X$ and $Y$ are independent. The two measures of directed cumulative φ-dependence for this example are $\rho$ completely determines the values for both measures of directed dependence. Provided the upper bound 1 is attained, there is a perfect linear relation between $Y$ and $X$.
As a second example we consider the dependence structure of the Farlie-Gumbel-Morgenstern copula (FGM copula). For the sake of brevity, we define a copula $C$ as a bivariate distribution function with uniform marginals for two random variables $U$ and $V$ with support $[0, 1]$. For details concerning copulas see, e.g., [88].
Let $C(u, v) = uv\,(1 + \theta(1-u)(1-v))$, $u, v \in [0, 1]$, $\theta \in [-1, 1]$, be the FGM copula (cf. [88], p. 68). With To get expressions in closed form we consider the Gini case with $\varphi(u) = u(1-u)$, $u \in [0, 1]$. After some simple calculations we have Averaging over the uniform distribution of $V$ leads to the internal variation With $CPE_G(U) = 1/3$, the mutual cumulative Gini information and the directed cumulative measure of Gini dependence are It is well-known that only a small range of dependence can be covered by the FGM copula (cf. [88], p. 129).
Hall et al. [89] discussed several methods for estimating a conditional distribution. The results can be used for estimating the mutual φ-information and the two directed measures of dependence. This will be the task of future research.
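The Gini case of the FGM example can be reproduced numerically. The sketch below assumes the standard FGM form C(u, v) = uv(1 + θ(1 − u)(1 − v)), from which the conditional cdf of U given V = v is u(1 + θ(1 − u)(1 − 2v)). The closed forms θ²/45 and θ²/15 used for comparison are derived here from this assumed form by direct integration, not quoted from the text, and the function names are illustrative.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def fgm_conditional_cdf(u, v, theta):
    """P(U <= u | V = v), the partial derivative of the FGM copula w.r.t. v."""
    return u * (1.0 + theta * (1.0 - u) * (1.0 - 2.0 * v))


def cpe_gini_conditional(v, theta):
    """Gini entropy 2 * integral of F(1-F) du for the conditional law of U | V = v."""
    def integrand(u):
        F = fgm_conditional_cdf(u, v, theta)
        return 2.0 * F * (1.0 - F)
    return simpson(integrand, 0.0, 1.0)


def internal_variation(theta):
    """Average of the conditional Gini entropy over the uniform law of V."""
    return simpson(lambda v: cpe_gini_conditional(v, theta), 0.0, 1.0)


theta = 0.8
total_variation = 1.0 / 3.0                      # CPE_G(U) for a standard uniform U
mcpi_gini = total_variation - internal_variation(theta)
dcpd_gini = mcpi_gini / total_variation
```

Even for θ close to 1 the relative reduction `dcpd_gini` stays small, which is consistent with the limited range of dependence that the FGM copula can represent.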

φ-Correlation
Schechtman et al. [90] introduced Gini correlations of two random variables $X$ and $Y$ with distribution functions $F_X$ and $F_Y$ as The numerator equals 1/4 of the Gini mean difference where the expectation is calculated for two independent random variables $X_1$ and $X_2$ that are identically distributed with $F_X$. Gini's mean difference coincides with the cumulative paired Gini entropy $CPE_G(X)$ in the following way: Therefore, in the same way that Gini's mean difference can be generalized to the Gini correlation, $CPE_\varphi$ can be generalized to the φ-correlation.
Let $X$, $Y$ be two random variables and let $CPE_\varphi(X)$, $CPE_\varphi(Y)$ be the corresponding cumulative paired φ-entropies, then and are called φ-correlations of $X$ and $Y$. Since $E(\varphi'(1 - F_Y(Y)) - \varphi'(F_Y(Y))) = 0$, the numerator is the covariance between $X$ and $\varphi'(1 - F_Y(Y)) - \varphi'(F_Y(Y))$.
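The link between Gini's mean difference and the cumulative paired Gini entropy can be made concrete for an empirical distribution. This is a small sketch under the assumption that the Gini entropy is 2∫F(x)(1 − F(x))dx and that the mean difference is computed over all ordered pairs including i = j (division by n²); both function names are illustrative.

```python
def cpe_gini_empirical(sample):
    """2 * integral of F_n (1 - F_n) dx for the empirical cdf F_n of the sample."""
    xs = sorted(sample)
    n = len(xs)
    # F_n equals i/n on the interval [x_(i), x_(i+1)), i = 1, ..., n-1
    return sum(2.0 * (i / n) * (1.0 - i / n) * (xs[i] - xs[i - 1]) for i in range(1, n))


def gini_mean_difference(sample):
    """Mean absolute difference over all n^2 ordered pairs of observations."""
    n = len(sample)
    return sum(abs(a - b) for a in sample for b in sample) / n ** 2


data = [3.0, -1.0, 4.5, 0.2, 2.2]
entropy_value = cpe_gini_empirical(data)
mean_difference = gini_mean_difference(data)
```

For the empirical cdf the two quantities agree up to rounding, because each gap between neighboring order statistics is counted once for every pair of observations it separates.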
The first example verifies that the Gini correlation is a proper special case of the φ-correlation.
The following basic properties of φ-correlations can easily be checked with the arguments applied by [90]: if there is a strictly increasing (decreasing) transformation $g$ such that
5. If $X$ and $Y$ are independent, then $\Gamma_\varphi(X, Y) = \Gamma_\varphi(Y, X) = 0$.
6. If $a + bX$ and $c + dY$ are exchangeable for some constants $a, b, c, d \in \mathbb{R}$ with $b, d > 0$, then
In the last subsection we have seen that two directed measures of φ-dependence do not rely on $\varphi$ if a bivariate Gaussian distribution is considered. The same holds for φ-correlations, as will be demonstrated in the following example.
Example 15. Let $(X, Y)$ be a bivariate standard Gaussian random variable with Pearson correlation coefficient $\rho$. Thus, all φ-correlations coincide with $\rho$ as the following consideration shows: Dividing this by $CPE_\varphi(X)$ yields the result.
Weighted sums of random variables appear for example in portfolio optimization.The diversification effect concerns negative correlations between the returns of assets.Thus, the risk of a portfolio can be significantly smaller than the sum of the individual risks.Now, we analyze whether cumulative paired φ-entropies can serve as a risk measure as well.Therefore, we have to examine the diversification effect for CPE φ .
First, we display the total risk $CPE_\varphi(Y)$ as a weighted sum of individual risks. Essentially, the weights need to be the φ-correlations of the individual returns with the portfolio return: For the diversification effect the total risk $CPE_\varphi(Y)$ has to be displayed as a function of the φ-correlations between $X_i$ and $X_j$, $i, j = 1, 2, \ldots, k$. A similar result was provided by [92] for the Gini correlation without proof. Let $Y = \sum_{i=1}^{k} a_i X_i$ and set for $i = 1, 2, \ldots, k$; then the following decomposition of the square of $CPE_\varphi(Y)$ holds: This is similar to the representation for the variance of $Y$, where $\Gamma_\varphi(X_i, X_j)$ takes the role of the Pearson correlation and $CPE_\varphi(X_i)$ the role of the standard deviation for $i, j = 1, 2, \ldots, k$.
Schechtman et al. [90] also introduced an estimator for the Gini correlation and derived its asymptotic distribution. For the proof it is useful to note that the numerator of the Gini correlation can be represented as a U-statistic. For the general case of the φ-correlation it is necessary to derive the influence function and to calculate its variance. This will be done in [75].

φ-Regression
Based on the Gini correlation Olkin et al. [93] considered the traditional ordinary least squares (OLS) approach in regression analysis where $Y$ is the dependent variable and $x$ is the independent variable. They modified it by minimizing the covariance between the error term $\varepsilon$ in a linear regression model and the ranks of $\varepsilon$ with respect to $\alpha$ and $\beta$. Ranks are the sample analogue of the theoretical distribution function $F_\varepsilon$, such that the Gini mean difference $\mathrm{Cov}(\varepsilon, F_\varepsilon)$ is the center of this new approach for regression analysis. Olkin et al. [93] noticed that this approach is already known as "rank based regression" or, for short, "R regression" in robust statistics. In robust regression analysis the more general optimization criterion $\mathrm{Cov}(\varepsilon, \phi(F_\varepsilon))$ has been considered, where $\phi$ denotes a strictly increasing score function (cf. [94], p. 233). The choice $\phi(u) = 1 - 2u$ leads to the Gini mean difference, which is the scores generating function of the Wilcoxon scores. The rank based regression approach with general scores generating function is equivalent to the generalization of the Gini regression to a so-called φ-regression based on the criterion function which has to be minimized to obtain $\alpha$ and $\beta$. Therefore, cumulative paired φ-entropies are special cases of the dispersion function that [95,96] proposed as optimization criterion for R regression. More precisely, R estimation proceeds in two steps. In the first step has to be minimized with respect to $\beta$. Let $\hat\beta_\varphi$ denote this estimator. In the second step $\alpha$ will be estimated separately by $\hat\alpha_\varphi = \mathrm{med}_i\,(y_i - x_i \hat\beta_\varphi)$.
The authors of [97,98] gave an overview of recent developments in rank based regression. We will apply their main results to φ-regression. In [99], the authors showed that the following property holds for the influence function of $\hat\beta_\varphi$: where $(x_0, y_0)$ represents an outlier. $\varphi$ determines the influence of an outlier in the dependent variable on the estimator $\hat\beta_\varphi$.
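The two-step procedure can be sketched for a single regressor. The following is a minimal illustration, not the interface of any package: it assumes the dispersion Σ J(R(eᵢ)/(n + 1))·eᵢ with residuals eᵢ = yᵢ − βxᵢ and the Gini score J(u) = 4u − 2, minimizes this convex function of β by ternary search on a user-supplied bracket, and then takes the median of the residuals as intercept. The names `dispersion` and `phi_regression` and the toy data are hypothetical.

```python
from statistics import median


def j_gini(u):
    """Score J(u) = phi'(1-u) - phi'(u) for phi(u) = u(1-u)."""
    return 4.0 * u - 2.0


def dispersion(beta, x, y, J=j_gini):
    """Rank-based dispersion of the residuals y_i - beta * x_i."""
    residuals = sorted(yi - beta * xi for xi, yi in zip(x, y))
    n = len(residuals)
    return sum(J(i / (n + 1)) * e for i, e in enumerate(residuals, start=1))


def phi_regression(x, y, lo, hi, J=j_gini, iters=200):
    """Step 1: minimize the convex dispersion over beta in [lo, hi] by ternary search.
    Step 2: estimate the intercept as the median of the residuals."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if dispersion(m1, x, y, J) <= dispersion(m2, x, y, J):
            hi = m2
        else:
            lo = m1
    beta = (lo + hi) / 2.0
    alpha = median(yi - beta * xi for xi, yi in zip(x, y))
    return alpha, beta


# Toy data on the line y = 1 + 2x with one gross outlier in the response.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[4] = 100.0
alpha_hat, beta_hat = phi_regression(x, y, lo=-100.0, hi=100.0)
```

On these toy data the fit recovers the underlying line despite the outlier. Other generating functions can be tried by passing a different score function `J`.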
The scale parameter τ φ is given by .
The influence function shows that $\hat\beta_\varphi$ is asymptotically normal: For $\varphi'(1-u) - \varphi'(u)$ bounded, Koul et al. [100] proposed a consistent estimator $\hat\tau_\varphi$ for the scale parameter $\tau_\varphi$. This asymptotic property can again be used to construct approximate confidence limits for the regression coefficients, to derive a Wald test for the general linear hypothesis, to derive a goodness-of-fit test, and to define a measure of determination (cf. [97]).
The R package "Rfit" has the option to include individual φ-functions into rank based regression (cf. [97]). Using this option and the dataset "telephone", which is available with several outliers in "Rfit", we compare the fit of the Shannon regression ($\alpha \to 1$), the Leik regression, and the α-regression (for several values of $\alpha$) with the OLS regression. Figure 3 shows on the left the original data, the OLS, and the Shannon regression, while on its right side outliers were excluded to get a more detailed impression of the differences between the φ-regressions. In comparison with the very sensitive OLS regression all rank based regression techniques behave similarly. In case of a known error distribution, McKean et al. [98] showed an asymptotically efficient estimator for $\tau_\varphi$. This procedure also determines the entropy generating function $\varphi$. In case of an unknown error distribution but some available information with respect to skewness and leptokurtosis, a data-driven (adaptive) procedure was proposed by them.

Two-Sample Rank Test on Dispersion
Based on $CPE_\varphi$ the linear rank statistics can be used as a test statistic for alternatives of scale, where $R_1, R_2, \ldots, R_n$ are the ranks of $X_1, X_2, \ldots, X_n$ in the pooled sample $X_1, X_2, \ldots, X_n, Y_1, Y_2, \ldots, Y_m$. All random variables are assumed to be independent. Some of the linear rank statistics which are well-known from the literature are special cases of Equation (56) as will be shown in the following examples: Ansari et al. [101] suggest the statistic as a two-sample test for alternatives of scale (cf. [102], p. 104). Apparently, we have which is identical to the test statistic suggested by [103] up to an affine linear relation (cf. [68], p. 149f.). This test statistic is given by $S_M = \sum_{i=1}^{n} (R_i - (n + m + 1)/2)^2$; thus, the resulting relation is given by In the following, the scores of the Mood test will be generated by the generating function of $CPE_G$.
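The affine relation to the Mood statistic can be checked on a small example. The sketch assumes that the linear rank statistic has the form Σᵢ [φ(Rᵢ/(N + 1)) + φ(1 − Rᵢ/(N + 1))] with N = n + m, which is an assumption about Equation (56); the Mood statistic is taken from the text. For the Gini case φ(u) = u(1 − u) simple algebra then gives the relation n/2 − 2·S_M/(N + 1)². Function names and data are illustrative, and ties are assumed to be absent.

```python
def ranks_in_pooled(x, y):
    """Ranks of the x-observations in the pooled sample (no ties assumed)."""
    pooled = sorted(x + y)
    return [pooled.index(v) + 1 for v in x]


def cpe_rank_statistic(x, y, phi):
    """Assumed form: sum over the x-ranks of phi(R/(N+1)) + phi(1 - R/(N+1))."""
    N = len(x) + len(y)
    return sum(phi(r / (N + 1)) + phi(1.0 - r / (N + 1)) for r in ranks_in_pooled(x, y))


def mood_statistic(x, y):
    """S_M = sum over the x-ranks of (R - (N+1)/2)^2."""
    N = len(x) + len(y)
    return sum((r - (N + 1) / 2.0) ** 2 for r in ranks_in_pooled(x, y))


def phi_gini(u):
    return u * (1.0 - u)


x = [0.1, 2.3, -1.7, 0.4]
y = [5.0, -6.2, 3.3, -4.1, 7.7]
n, N = len(x), len(x) + len(y)

gini_statistic = cpe_rank_statistic(x, y, phi_gini)
from_mood = n / 2.0 - 2.0 * mood_statistic(x, y) / (N + 1) ** 2
```

Large values of the Gini-based statistic correspond to small values of the Mood statistic, so both tests order samples in the same way up to the sign of the scale.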
Dropping the requirement of concavity of φ, one finds analogies to other well-known test statistics.
which is identical to the quantile test statistic for alternatives of scale up to an affine linear relation ( [102], p. 105).
The asymptotic distribution of linear rank tests based on $CPE_\varphi$ can be derived from the theory of linear rank tests, as discussed in [102]. The asymptotic distribution under the null hypothesis is needed to be able to make an approximate test decision given a significance level $\alpha$. The asymptotic distribution under the alternative hypothesis is needed for an approximate evaluation of the test power and the choice of the required sample size in order to ensure a given effect size, respectively.
We consider the centered linear rank statistic Under the null hypothesis of identical scale parameters and the assumption that Under the null hypothesis of identical scale, the centered linear rank statistic $CPE_S(R)$ is asymptotically normal with variance $\frac{nm}{n+m} \cdot \frac{63 - 6\pi^2}{108}$.
If the alternative hypothesis $H_1$ for a density function $f_0$ is given by for $\sigma > 0$ and $\sigma \ne 1$, then set and assume This result follows immediately from [102], p. 267, Theorem 1, together with the Remark on p. 268.
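The constant (63 − 6π²)/108 in the Shannon case can be recovered as the variance of the score function evaluated at a uniform random variable. The sketch assumes scores of the form h(u) = φ(u) + φ(1 − u) with φ(u) = −u ln u and uses numerical integration; `score_variance` and `asymptotic_variance` are illustrative names.

```python
from math import log, pi


def h_shannon(u):
    """phi(u) + phi(1-u) for phi(u) = -u ln u, extended by 0 at u = 0 and u = 1."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * log(u) - (1.0 - u) * log(1.0 - u)


def simpson(f, a, b, n=20_000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def score_variance():
    """Variance of h_shannon(U) for U uniform on (0, 1)."""
    mean = simpson(h_shannon, 0.0, 1.0)
    second_moment = simpson(lambda u: h_shannon(u) ** 2, 0.0, 1.0)
    return second_moment - mean ** 2


CLOSED_FORM = (63.0 - 6.0 * pi ** 2) / 108.0


def asymptotic_variance(n, m):
    """nm / (n + m) times the score variance, as stated for the Shannon case."""
    return n * m / (n + m) * CLOSED_FORM
```

The numerically integrated variance agrees with the closed-form constant, which supports reading the garbled expression as nm/(n + m) multiplied by (63 − 6π²)/108.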
This simplifies the variance of the asymptotic normal distribution.
Since the asymptotic normality of the test statistic of the Ansari-Bradley test and the Mood test under the alternative hypothesis has been examined intensely (cf., e.g., [103,104]), we focus in the following example on the new Shannon test:
Example 20. Set $\varphi(u) = -u \ln u$, $u \in [0, 1]$, and let $f_0$ be the density function of a standard Gaussian distribution, such that $\phi_1(u; f_0) = -1 + \Phi^{-1}(u)^2$ and $I_1(f_0) = 2$. As a consequence, we have where the integrals have been evaluated by numerical integration. Then under the alternative of Equation (58):
Hereafter, one can discuss the asymptotic efficiency of linear rank tests based on cumulative paired φ-entropy. If $f_0$ is the true density and gives the desired asymptotic efficiency (cf. [102], p. 317). The asymptotic efficiency of the Ansari-Bradley test (and the asymptotically equivalent Siegel-Tukey test, respectively) and the Mood test have been analyzed by [104][105][106]. The asymptotic relative efficiency (ARE) with respect to the traditional F-test for differences in scale for two Gaussian distributions has been discussed by [103]. This asymptotic relative efficiency between Mood test and F-test for differences in scale has been derived by [107]. Once more, we focus on the new Shannon test.
Example 21. The Klotz test is asymptotically efficient for the Gaussian distribution. With gives the asymptotic efficiency of the new Shannon test.
Using a distribution that ensures the asymptotic efficiency of the Ansari-Bradley test, we compare the asymptotic efficiency of the Shannon test to the one of the Ansari-Bradley test.
Example 22. The Ansari-Bradley test statistic $S_{AB}$ is asymptotically efficient for the double log-logistic distribution with density function $f_0$ (cf. [102], p. 104). The Fisher information is given by Furthermore, we have such that the asymptotic efficiency of the Shannon test for $f_0$ is These two examples show that the Shannon test has a rather good asymptotic efficiency, even if the underlying distribution has moderate tails similar to the Gaussian distribution or heavy tails like the double log-logistic distribution. Asymptotically efficient linear rank tests correspond to a distribution and a scores generating function $\phi_1$, from which we can derive an entropy generating function $\varphi$ and a cumulative paired φ-entropy. This relationship will be further examined in [74].

Some Cumulative Paired Entropies for Selected Distribution Functions
In the following, we derive closed form expressions for some cumulative paired φ-entropies. We mimic the procedure of ([4], p. 326) to some degree. Table 1 of their paper contains multiple formulas of the differential entropy for the most popular statistical distributions. Several of these distributions will also be considered in the following. Since cumulative entropies depend on the distribution function or, equivalently, on the quantile function, we focus on families of distributions for which these functions have a closed form expression. Furthermore, we only discuss standardized random variables, since the scale parameter only has a multiplicative effect on $\mathrm{CPE}_\varphi$ and the location parameter has no effect. For the standard Gaussian distribution, we provide the value of $\mathrm{CPE}_S$ by numerical integration, rounded to two decimal places, since the distribution function has no closed form. For the Gumbel distribution, however, there is a closed form expression for the distribution function. Nevertheless, we were unable to establish a closed form of $\mathrm{CPE}_S$ and $\mathrm{CPE}_G$. Therefore, we applied numerical integration in this case as well.
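The restriction to standardized random variables can be motivated by a short derivation. The following is only a sketch: it assumes the definition $\mathrm{CPE}_\varphi(X) = \int [\varphi(F(x)) + \varphi(1 - F(x))]\,dx$, which is not restated in this section, and a differentiable quantile function $Q = F^{-1}$. The shorthand $q = Q'$ is introduced here purely for illustration.

```latex
\begin{align*}
\mathrm{CPE}_\varphi(X)
  &= \int_{\mathbb{R}} \bigl[\varphi(F(x)) + \varphi(1 - F(x))\bigr]\,dx
   = \int_0^1 \bigl[\varphi(u) + \varphi(1 - u)\bigr]\, q(u)\,du,
   \qquad u = F(x),\\
Q_{\mu + \sigma X}(u) &= \mu + \sigma\, Q_X(u)
  \;\Longrightarrow\;
  q_{\mu + \sigma X}(u) = \sigma\, q_X(u), \qquad \sigma > 0,\\
\mathrm{CPE}_\varphi(\mu + \sigma X) &= \sigma\, \mathrm{CPE}_\varphi(X).
\end{align*}
```

Under these assumptions the location $\mu$ drops out when the quantile function is differentiated, while the scale $\sigma$ appears as a factor, so values for the standardized case carry over to any member of the location-scale family.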

Uniform Distribution
Let X have the standard uniform distribution. Then we have
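As a hedged illustration, the uniform case can be worked out by hand. The sketch assumes the definition $\mathrm{CPE}_\varphi(X) = \int [\varphi(F(x)) + \varphi(1 - F(x))]\,dx$ and the generating functions $\varphi_S(u) = -u \ln u$ (Shannon, as in Example 20), $\varphi_G(u) = u(1 - u)$ (Gini), and $\varphi_L(u) = \min(u, 1 - u)$ (Leik). The Gini and Leik forms are assumptions made here; they are consistent with the Gaussian values reported in this section.

```latex
% Standard uniform distribution: F(x) = x on [0, 1]
\begin{align*}
\mathrm{CPE}_S(X) &= -\int_0^1 \bigl[x \ln x + (1 - x)\ln(1 - x)\bigr]\,dx
                   = -2\int_0^1 x \ln x \,dx = 2 \cdot \tfrac{1}{4} = \tfrac{1}{2},\\
\mathrm{CPE}_G(X) &= 2\int_0^1 x(1 - x)\,dx = 2 \cdot \tfrac{1}{6} = \tfrac{1}{3},\\
\mathrm{CPE}_L(X) &= 2\int_0^1 \min(x, 1 - x)\,dx = 2 \cdot \tfrac{1}{4} = \tfrac{1}{2}.
\end{align*}
```

The first line uses the symmetry $x \mapsto 1 - x$ and $\int_0^1 x \ln x\,dx = -\tfrac{1}{4}$.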

Triangular Distribution with Parameter c
Let X have a triangular distribution with density function Then the following holds:

Laplace Distribution
Let X follow the Laplace distribution with density function $f_X(x) = \tfrac{1}{2}\exp(-|x|)$ for $x \in \mathbb{R}$, then we have

Weibull Distribution
Let X follow the Weibull distribution with distribution function $F_X(x) = 1 - e^{-x^c}$ for $x > 0$, $c > 0$, then we have

Pareto Distribution
Let X follow the Pareto distribution with distribution function $F_X(x) = 1 - x^{-c}$ for $x > 1$, $c > 1$, then we have

Gaussian Distribution
By means of numerical integration, we calculated the following values for the standard Gaussian distribution: $\mathrm{CPE}_S(X) = 1.806$, $\mathrm{CPE}_G(X) = 1.128$, $\mathrm{CPE}_L(X) = 1.596$. $\mathrm{CPE}_\alpha$ for $\alpha \in [0.5, 3]$ and the standard Gaussian distribution can be seen in Figure 4. As can be seen in Figure 4, the heavy tails of the Student-t distribution result in a higher value for $\mathrm{CPE}_\alpha$ as compared with the Gaussian distribution.
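The Gaussian values quoted above can be cross-checked with a few lines of numerical integration. The sketch below is not the authors' code. It assumes the definition $\mathrm{CPE}_\varphi(X) = \int [\varphi(F(x)) + \varphi(1 - F(x))]\,dx$ with $\varphi_S(u) = -u \ln u$, $\varphi_G(u) = u(1 - u)$, and $\varphi_L(u) = \min(u, 1 - u)$; the Gini and Leik forms, the helper names, the truncation to [−12, 12], and the choice of a composite Simpson rule are all illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard Gaussian cdf, written via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def phi_shannon(u):
    """Assumed Shannon generating function; the limit value at u = 0 is 0."""
    return -u * math.log(u) if u > 0.0 else 0.0


def phi_gini(u):
    """Assumed Gini generating function."""
    return u * (1.0 - u)


def phi_leik(u):
    """Assumed Leik generating function."""
    return min(u, 1.0 - u)


def cpe(phi, cdf, lower=-12.0, upper=12.0, n=20000):
    """Composite Simpson approximation of int [phi(F) + phi(1 - F)] dx.

    The integral is truncated to [lower, upper]; n must be even so that the
    midpoint 0 of a symmetric interval is a panel boundary.
    """
    if n % 2:
        n += 1
    h = (upper - lower) / n

    def integrand(x):
        f = cdf(x)
        return phi(f) + phi(1.0 - f)

    total = integrand(lower) + integrand(upper)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0
        total += weight * integrand(lower + i * h)
    return total * h / 3.0


cpe_s = cpe(phi_shannon, norm_cdf)
cpe_g = cpe(phi_gini, norm_cdf)
cpe_l = cpe(phi_leik, norm_cdf)
print(f"CPE_S = {cpe_s:.3f}, CPE_G = {cpe_g:.3f}, CPE_L = {cpe_l:.3f}")
```

Under these assumptions, the Gini and Leik results can also be compared with $2/\sqrt{\pi}$ and $4/\sqrt{2\pi}$, which follow from the Gini mean difference and the mean absolute deviation of the standard Gaussian distribution.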

Conclusions
A new kind of entropy has been introduced that generalizes Shannon's differential entropy. The main difference from previous discussions of entropies is that the new entropy is defined for distribution functions instead of density functions. This paper shows that this definition has a long tradition in several scientific disciplines such as fuzzy set theory, reliability theory, and, more recently, uncertainty theory. With only one exception, the concepts had been discussed independently within these disciplines. In addition, the theory of dispersion measures for ordered categorical variables refers to measures based on distribution functions, without recognizing that some sort of entropy is implicitly applied.

Using the Cauchy-Schwarz inequality, we were able to show the close relationship between the new kind of entropy, named cumulative paired φ-entropy, and the standard deviation. More precisely, the standard deviation yields an upper limit for the new entropy. Additionally, the Cauchy-Schwarz inequality can be used to derive maximum entropy distributions, provided that there are constraints specifying the values of mean and variance. Here, the logistic distribution takes on the same key role for the cumulative paired Shannon entropy that the Gaussian distribution takes in maximizing the differential entropy. As a new result, we have demonstrated that Tukey's λ distribution is a maximum entropy distribution when the entropy generating function φ known from the Havrda and Charvát entropy is used. Moreover, some new distributions can be derived by considering more general constraints. A change in perspective allows us to determine the entropy that is maximized by a given distribution if, e.g., mean and variance are known. In this context, the Gaussian distribution gives a simple solution.

Since cumulative paired φ-entropy and variance are closely related, we have investigated whether the cumulative paired φ-entropy is a proper measure of scale. We show that it satisfies the axioms which were introduced by Oja for measures of scale. Several further properties, concerning the behavior under transformations or the sum of independent random variables, have been proven. We have also given first insights into how to estimate the new entropy. In addition, based on cumulative paired φ-entropy, we have introduced new concepts like φ-divergence, mutual φ-information, and φ-correlation. φ-regression and linear rank tests for scale alternatives were considered as well. Furthermore, formulas have been derived for certain cumulative paired φ-entropies and some popular distributions with cdf or quantile function in closed form.

Figure 2. Several entropy generating functions φ derived from the generalized maximum entropy (ME) principle.

6.2. CPEφ and Oja's Axioms for Measures of Scale
Oja ([3], p. 159) defined a measure of scale (MOS) as follows:
Definition 2. Let F be a set of continuous distribution functions and an appropriate ordering of scale on F.

Figure 3. φ-regression fit for the number of calls in the "telephone" data set.

9.10. Student-t Distribution
By means of numerical integration and for ν = 3 degrees of freedom, we calculated the following values for the Student-t distribution: $\mathrm{CPE}_S(X) = 2.947$, $\mathrm{CPE}_G(X) = 3.308$, $\mathrm{CPE}_L(X) = 2.205$.