
Entropy 2016, 18(7), 248; https://doi.org/10.3390/e18070248

Article
Cumulative Paired φ-Entropy
Department of Statistics and Econometrics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Lange Gasse, Nürnberg 90403, Germany
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Academic Editor: Adom Giffin
Received: 13 April 2016 / Accepted: 24 June 2016 / Published: 1 July 2016

Abstract

A new kind of entropy will be introduced which generalizes both the differential entropy and the cumulative (residual) entropy. The generalization is twofold. First, we simultaneously define the entropy for cumulative distribution functions (cdfs) and survivor functions (sfs), instead of defining it separately for densities, cdfs, or sfs. Secondly, we consider a general “entropy generating function” φ, the same way Burbea et al. (IEEE Trans. Inf. Theory 1982, 28, 489–495) and Liese et al. (Convex Statistical Distances; Teubner-Verlag, 1987) did in the context of φ-divergences. Combining the ideas of φ-entropy and cumulative entropy leads to the new “cumulative paired φ-entropy” ($CPE_\varphi$). This new entropy has already been discussed in at least four scientific disciplines, be it with certain modifications or simplifications. In the fuzzy set theory, for example, cumulative paired φ-entropies were defined for membership functions, whereas in uncertainty and reliability theories some variations of $CPE_\varphi$ were recently considered as measures of information. With a single exception, the discussions in the scientific disciplines appear to be held independently of each other. We consider $CPE_\varphi$ for continuous cdfs and show that $CPE_\varphi$ is a measure of dispersion rather than a measure of information. In the first place, this will be demonstrated by deriving an upper bound which is determined by the standard deviation and by solving the maximum entropy problem under the restriction of a fixed variance. Next, this paper specifically shows that $CPE_\varphi$ satisfies the axioms of a dispersion measure. The corresponding dispersion functional can easily be estimated by an L-estimator, with all its known asymptotic properties. $CPE_\varphi$ is the basis for several related concepts like mutual φ-information, φ-correlation, and φ-regression, which generalize Gini correlation and Gini regression. In addition, linear rank tests for scale that are based on the new entropy have been developed. We show that almost all known linear rank tests are special cases, and we introduce certain new tests. Moreover, formulas for different distributions and entropy calculations are presented for $CPE_\varphi$ if the cdf is available in a closed form.
Keywords:
φ-entropy; absolute mean deviation; cumulative residual entropy; measure of dispersion; generalized maximum entropy principle; Tukey’s λ distribution; φ-regression; L-estimator; linear rank test

1. Introduction

The φ-entropy
$$E_\varphi(F) = \int_{\mathbb{R}} \varphi(f(x))\,dx, \tag{1}$$
where $f$ is a probability density function and $\varphi$ is a strictly concave function, was introduced by [1]. If we set $\varphi(u) = -u \ln u$, $u \in [0,1]$, we get Shannon’s differential entropy as the most prominent special case.
Shannon et al. [2] derived the “entropy power fraction” and showed that there is a close relationship between Shannon entropy and variance. In [3], it was demonstrated that Shannon’s differential entropy satisfies an ordering of scale and thus is a proper measure of scale (MOS). Recently, the discussion in [4] has shown that entropies can be interpreted as a measure of dispersion. In the discrete case, minimal Shannon entropy means maximal certainty about the random outcome of an experiment. A degenerate distribution minimizes the Shannon entropy as well as the variance of a discrete quantitative random variable. For such a degenerate distribution, Shannon entropy and variance both take the value 0. However, there is an important difference between the differential entropy and the variance when discussing a quantitative random variable with support $[a,b]$. The differential entropy is maximized by a uniform distribution over $[a,b]$, while the variance is maximal if both interval bounds $a$ and $b$ have the probability mass of $0.5$ (cf. [5]). A similar result holds for a discrete random variable with a finite number of realizations. Therefore, it is doubtful that Equation (1) is a true measure of dispersion.
We propose to define the φ-entropy for cumulative distribution functions (cdfs) $F$ and survivor functions (sfs) $1 - F$ instead of for density functions $f$. Throughout the paper, we define $\bar{F} := 1 - F$. By applying this modification we get
$$CPE_\varphi(F) = \int_{\mathbb{R}} \left[ \varphi(F(x)) + \varphi(\bar{F}(x)) \right] dx, \tag{2}$$
where the cdf $F$ is absolutely continuous, $CPE$ means “cumulative paired entropy”, and $\varphi$ is the “entropy generating function” defined on $[0,1]$ with $\varphi(0) = \varphi(1) = 0$. We will assume that $\varphi$ is concave on $[0,1]$ throughout most of this paper. In particular, we will show that Equation (2) satisfies a popular ordering of scale and attains its maximum if the domain is an interval $[a,b]$, while $a$, $b$ occur with a probability of $1/2$. This means that Equation (2) behaves like a proper measure of dispersion.
In addition, we generalize results from the literature, focusing on the Shannon case with $\varphi(u) = -u \ln u$, $u \in [0,1]$ (cf. [6]), the cumulative residual entropy
$$CRE(F) = -\int_{\mathbb{R}^+} \bar{F}(x) \ln \bar{F}(x)\,dx \tag{3}$$
(cf. [7]), and the cumulative entropy
$$CE(F) = -\int_{\mathbb{R}} F(x) \ln F(x)\,dx \tag{4}$$
(cf. [8,9]). In the literature, this entropy is interpreted as a measure of information rather than dispersion without any clarification on what kind of information is considered.
A first general aim of this paper is to show that entropies can be interpreted as measures of dispersion rather than as measures of information. A second general aim is to demonstrate that the entropy generating function $\varphi$, the weight function $J$ in L-estimation, the dispersion function $d$ which serves as a criterion for minimization in robust rank regression, and the scores-generating function $\phi_1$ are closely related.
Specific aims of this paper are:
  • To show that the cdf-based entropy Equation (2) originates in several distinct scientific areas.
  • To demonstrate the close relationship between Equation (2) and the standard deviation.
  • To derive maximum entropy (ME) distributions under simple and more complex restrictions and to show that commonly known as well as new distributions solve the ME principle.
  • To derive the entropy maximized by a given distribution under certain restrictions.
  • To formally prove that Equation (2) is a measure of dispersion.
  • To propose an L-estimator for Equation (2) and derive its asymptotic properties.
  • To use Equation (2) in order to obtain new related concepts measuring the dependence of random variables (such as mutual φ-information, φ-correlation, and φ-regression).
  • To apply Equation (2) to get new linear rank tests for the comparison of scale.
The paper is structured in the same order as these aims. After this introduction, in the second section we give a short review of the literature that is concerned with Equation (2) or related measures. The third section begins by summarizing reasons for defining entropies for cdfs and sfs instead of defining them for densities. Next, some equivalent characterizations of Equation (2) are given, provided the derivative of φ exists. In the fourth section, we use the Cauchy–Schwarz inequality to derive an upper bound for Equation (2), which provides sufficient conditions for the existence of $CPE_\varphi$. In addition, more stringent conditions for the existence are directly proven. In the fifth section, the Cauchy–Schwarz inequality allows us to derive ME distributions if the variance is fixed. For more complicated restrictions we obtain ME distributions by solving the Euler–Lagrange conditions. Following the generalized ME principle (cf. [10]), we change the perspective and ask which entropy is maximized if the variance and the population’s distribution are fixed. The sixth section is of key importance because the properties of Equation (2) as a measure of dispersion are analyzed in detail. We show that Equation (2) satisfies an often applied ordering of scale by [3], is invariant with respect to translations, and is equivariant with respect to scale transformations. Additionally, we provide certain results concerning the sum of independent random variables. In the seventh section, we propose an L-estimator for $CPE_\varphi$. Some basic properties of this estimator, such as its influence function, consistency, and asymptotic normality, are shown. In the eighth section, we introduce several new statistical concepts based on $CPE_\varphi$, which generalize divergence, mutual information, Gini correlation, and Gini regression. Additionally, we show that new linear rank tests for dispersion can be based on $CPE_\varphi$. Known linear rank tests like the Mood or the Ansari–Bradley tests are special cases of this general approach. However, in this paper we exclude most of the technical details, for they will be presented in several accompanying papers. In the last section we compute Equation (2) for certain generating functions φ and some selected families of distributions.

2. State of the Art—An Overview

Entropies are usually defined on the simplex of probability vectors, which sum to one (cf. [2,11]). Until now it has been rather unusual to calculate the Shannon entropy not for vectors of probabilities or probability density functions $f$, but for distribution functions $F$. The corresponding Shannon entropy is given by
$$CPE_S(F) = -\int_{\mathbb{R}} \left[ F(x) \ln F(x) + \bar{F}(x) \ln \bar{F}(x) \right] dx. \tag{5}$$
Nevertheless, we have identified five scientific disciplines directly or implicitly working with an entropy based on distribution functions or survivor functions:
  • Fuzzy set theory,
  • Generalized ME principle,
  • Theory of dispersion of ordered categorial variables,
  • Uncertainty theory,
  • Reliability theory.

2.1. Fuzzy Set Theory

To the best of our knowledge, Equation (5) was initially introduced by [12]. However, they did not consider the entropy for a cdf $F$. Instead, they were concerned with a so-called membership function $\mu_A$ that quantifies the degree to which a certain element $x$ of a set $\Omega$ belongs to a subset $A \subseteq \Omega$. Membership functions were introduced by [13] within the framework of the “fuzzy set theory”.
It is important to note that if all elements of $\Omega$ are mapped to the value $1/2$, maximum uncertainty about $x$ belonging to a set $A$ will be attained.
This main property is one of the axioms of membership functions. In the aftermath of [12] numerous modifications to the term “entropy” have been made and axiomatizations of the membership functions have been stated (see, e.g., the overview in [14]).
Those modifications proceeded parallel to a long history of extensions and parametrizations of the term entropy for probability vectors and densities. This history ranges from [15] to [16,17], who provided a superstructure of those generalizations consisting of a very general form of the entropy, including the φ-entropy Equation (1) as a special case. Burbea et al. [1] introduced the term φ-entropy. If both $\varphi(x)$ and $\varphi(1-x)$ appeared in the entropy, as in the Fermi-Dirac entropy (cf. [18], p. 191), they used the term “paired” φ-entropy.

2.2. Generalized Maximum Entropy Principle

Independently of the debate in the fuzzy set theory and the theory of measurement of dispersion, Kapur [10] showed that a growth model with a logistic growth rate arises as the solution of maximizing Equation (5) under two simple constraints. This provides an example of the “generalized maximum entropy principle” postulated by Kesavan et al. [19]. The simple ME principle introduced by [20,21] derives a distribution which maximizes an entropy given certain constraints. The generalization of [19], in contrast, consists of determining the φ-entropy which is maximized given a distribution and some constraints. They used a slightly modified version of Equation (5), in which the cdf had to be replaced by a monotonically increasing function with logistic shape.

2.3. Theory of Dispersion

Independently of the discussion on membership functions in the fuzzy set theory and the proposals of generalizing the Shannon entropy, Leik [22] discussed a measure of dispersion for ordered categorial variables with a finite number $k$ of categories $x_1 < x_2 < \dots < x_k$. His measure is based on the distance between the $(k-1)$-dimensional vectors of cumulated frequencies $(F_1, F_2, \dots, F_{k-1})$ and $(1/2, 1/2, \dots, 1/2)$. Both vectors coincide only if the extreme categories $x_1$ and $x_k$ each appear with relative frequency $1/2$. This represents the case of maximal dispersion. Consider
$$CPE_\varphi(F) = \sum_{i=1}^{k-1} \left[ \varphi(F_i) + \varphi(1 - F_i) \right] \tag{6}$$
as the discrete version of Equation (2). Setting $\varphi(u) = \min\{u, 1-u\}$, we get the measure of Leik as a special case of Equation (6) up to a change of sign. Vogel et al. [23] considered $\varphi(u) = -u \ln(u)$ and the Shannon variation of Equation (6) as a measure of dispersion for ordered categorial variables. Numerous modifications of Leik’s measure of dispersion have been published. In [24,25,26,27,28,29], the authors implicitly used $\varphi(u) = 1/4 - (u - 1/2)^2$ or equivalently $\varphi(u) = u(1-u)$. Most of the discussion was conducted in the journal “Perceptual and Motor Skills”. For a recent overview of measuring dispersion including ordered categorial variables see, e.g., [30]. Instead of dispersion, some articles are concerned with related concepts for ordered categorial variables, like bipolarization and inequality (cf. [31,32,33,34,35]). A class of measures of dispersion for ordered categorial variables with a finite number of categories that is similar to Equation (6) has been introduced by Klein and Yager [36,37] independently of each other. They had obviously not been aware of the discussion in “Perceptual and Motor Skills”. Both authors gave axiomatizations to describe which functions $\varphi$ will be appropriate for measuring dispersion. However, at least Yager [37] recognized the close relationship between those measures and the general term “entropy” in the fuzzy set theory. He introduced the term “dissonance” to more precisely characterize measures of dispersion for ordered categorial variables. In the language of information theory, maximum dissonance describes an extreme case in which there is still some information, but this information is extremely contradictory. As an example, we could ask in the field of product evaluation to what degree information, which states that 50 percent of the recommendations are extremely good and at the same time 50 percent are extremely bad, is useful to make a purchase decision. This is an important difference to the Shannon entropy, which is maximal if there is no information at all, i.e., all categories occur with the same probability.
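To make the discrete version in Equation (6) concrete, the following minimal Python sketch evaluates it for three frequency tables with five ordered categories. The function name `discrete_cpe` and the example frequencies are illustrative assumptions and are not taken from the cited literature; the generating functions are $\varphi(u) = u(1-u)$ and $\varphi(u) = \min\{u, 1-u\}$ from the text.

```python
from typing import Callable, Sequence


def discrete_cpe(freqs: Sequence[float], phi: Callable[[float], float]) -> float:
    """Discrete cumulative paired phi-entropy in the sense of Equation (6)."""
    total = float(sum(freqs))
    cum = 0.0
    value = 0.0
    # Sum over the first k-1 cumulated relative frequencies F_1, ..., F_{k-1}.
    for f in freqs[:-1]:
        cum += f / total
        value += phi(cum) + phi(1.0 - cum)
    return value


def phi_gini(u: float) -> float:
    return u * (1.0 - u)


def phi_leik(u: float) -> float:
    return min(u, 1.0 - u)


# Hypothetical frequency tables (illustration only).
one_point = discrete_cpe([0, 0, 10, 0, 0], phi_gini)   # all mass in one category
uniform = discrete_cpe([2, 2, 2, 2, 2], phi_gini)      # all categories equally frequent
polarized = discrete_cpe([5, 0, 0, 0, 5], phi_gini)    # half of the mass on each extreme

print(one_point, uniform, polarized)
```

Under these assumptions the polarized table yields the largest value, the one-point table yields zero, and the uniform table lies in between. This matches the notion of dissonance described above, whereas the ordinary Shannon entropy of the category probabilities would be largest for the uniform table.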
Bowden [38] defines the location entropy function $h(x) = -\left[ F(x) \ln F(x) + \bar{F}(x) \ln \bar{F}(x) \right]$, given a value of $x$. He emphasizes the possibility to construct measures of spread and symmetry based on this function. To the best of our knowledge, Bowden [38] is the only one to mention the application of cumulated paired Shannon entropy to continuous distributions so far.

2.4. Uncertainty Theory

Liu [6] (first edition 2004) can be considered the founder of the uncertainty theory. This theory is concerned with formalizing data consisting of expert opinions rather than formalizing data gathered by repeating a random experiment. Liu slightly modified the Kolmogorov axioms of probability theory to obtain an uncertainty measure, following which he defined uncertain variables, uncertainty distribution functions, and moments of uncertain variables. Liu argued that “an event is the most uncertain if its uncertainty measure is 0.5, because the event and its complement may be regarded as ‘equally likely’” ([6], p. 14). Liu’s maximum uncertainty principle states: “For any event, if there are multiple reasonable values that an uncertain measure may take, the value as close to 0.5 as possible is assigned to the event” ([6], p. 14). Similar to the fuzzy set theory, the distance between the uncertainty distribution and the value 0.5 can be measured by the Shannon-type entropy Equation (5). Apparently for the first time in the third edition of 2010, he explicitly calculated Equation (5) for several distributions (e.g., the logistic distribution) and derived upper bounds. He applied the ME principle to uncertainty distributions. The preferred constraint is to predetermine values of mean and variance ([6], p. 83ff.). In this case, the logistic distribution maximizes Equation (5). In this context, the logistic distribution plays the same role in uncertainty theory as the Gaussian distribution in probability theory. The Gaussian distribution maximizes the differential entropy, given values for mean and variance. Therefore, in uncertainty theory the logistic distribution is called “normal distribution”. The authors of [39] provided Equation (5) as a function of the quantile function. In addition, the authors of [40] chose $\varphi(u) = u(1-u)$, $u \in [0,1]$, as entropy generating function and derived the ME distribution as a discrete uniform distribution, which is concentrated on the endpoints of the compact domain $[a,b]$ if no further restrictions are assumed. Popoviciu [5] attained the same distribution by maximizing the variance. Chen et al. [41] introduced cross entropies and divergence measures based on general functions $\varphi$. Further literature on this topic is provided by [42,43,44].

2.5. Reliability Theory

Entropies also play a prominent role in reliability theory. They were initially introduced in the fields of hazard rates and residual lifetime distributions (cf. [45]). In addition, the authors of [46,47] introduced the cumulative residual entropy Equation (3), discussed its properties, and derived the exponential and the Weibull distribution by an ME principle, given the coefficient of variation. This work went into detail on the advantage of defining entropy via survivor functions instead of probability density functions. Rao et al. [46] refer to the extensive criticism on the differential entropy by [48]. Moreover, Zografos et al. [49] generalized the Shannon-type cumulative residual entropy to an entropy of the Rényi type. Furthermore, Drissi et al. [50] considered random variables with general support. They also presented solutions for the maximization of Equation (3), provided that more general restrictions are considered. Similar to [51], they identified the logistic distribution to be the ME distribution, given mean, variance, and a symmetric form of the distribution function.
Di Crescenzo et al. [9] analyzed Equation (4) for cdfs and discussed its stochastic properties. Sunoj et al. [52] plugged the quantile function into the Shannon-type entropy Equation (4) and presented expressions if the quantile function possesses a closed form, but not the cdf. In recent papers an empirical version of Equation (3) is used as goodness-of-fit test (cf. [53]).
Additionally, $CRE$ and $CE$ are applied to the distribution function of the residual lifetime $(X - t \mid X > t)$ and the inactivity time $(t - X \mid X < t)$ (cf. [54]). This can directly be generalized to the $CPE$ framework.
Moreover, Psarrakos et al. [55] provide an interesting alternative generalization of the Shannon case. In this paper we focus on the class of concave functions $\varphi$. Special extensions to non-concave functions will be subject to future research.
This brief overview shows that different disciplines are accessing an entropy based on distribution functions. The contributions of the fuzzy set theory, the uncertainty theory, and the reliability theory all have the exclusive consideration of continuous random variables in common. The discussions about entropy in reliability theory on the one hand and fuzzy set theory and uncertainty theory, respectively, on the other hand were conducted independently of each other without even noticing the results of the other disciplines. However, Liu’s uncertainty theory benefits from the discussion in the fuzzy set theory. In the theory of dispersion of ordered categorial variables the authors do not appear to be aware of their implicit use of a concept of entropy. Nevertheless, the situation is somewhat different from that of the other areas since only discrete variables were discussed. Kiesl’s dissertation [56] provides a theory of measures of the form Equation (6) with numerous applications. However, an intensive discussion of Equation (2) is missing and will be provided here.

3. Cumulative Paired φ-Entropy for Continuous Variables

3.1. Definition

We focus on absolutely continuous cdfs $F$ with density functions $f$. The set of all those distribution functions is called $\mathcal{F}$. We call a function $\varphi$ an “entropy generating function” if it is non-negative and concave on the domain $[0,1]$ with $\varphi(0) = \varphi(1) = 0$. In this case, $\varphi(u) + \varphi(1-u)$ is a symmetric function with respect to $1/2$.
Definition 1. 
The functional $CPE_\varphi : \mathcal{F} \to \mathbb{R}_0^+$ with
$$CPE_\varphi(F) = \int_{\mathbb{R}} \left[ \varphi(F(x)) + \varphi(\bar{F}(x)) \right] dx \tag{7}$$
is called cumulative paired φ-entropy for $F \in \mathcal{F}$ with entropy generating function $\varphi$.
Up to now, we assumed the existence of $CPE_\varphi$. In the following section we will discuss some sufficient criteria ensuring the existence of $CPE_\varphi$. If $X$ is a random variable with cdf $F$, we occasionally use the notation $CPE_\varphi(X)$ instead.
Next, some examples of well established concave entropy generating functions φ and corresponding cumulative paired φ-entropies will be given.
  • Cumulative paired α-entropy $CPE_\alpha$: Following [57], let $\varphi$ be given by
    $$\varphi(u) = u\,\frac{u^{\alpha-1} - 1}{1 - \alpha}, \quad u \in [0,1],$$
    for $\alpha > 0$. The corresponding so-called cumulative paired α-entropy is
    $$CPE_\alpha(F) = \int_{\mathbb{R}} \left[ F(x)\,\frac{F(x)^{\alpha-1} - 1}{1 - \alpha} + \bar{F}(x)\,\frac{\bar{F}(x)^{\alpha-1} - 1}{1 - \alpha} \right] dx.$$
  • Cumulative paired Gini entropy $CPE_G$: For $\alpha = 2$ we get
    $$CPE_G(F) = 2 \int_{\mathbb{R}} F(x)\,\bar{F}(x)\,dx$$
    as a special case of $CPE_\alpha$.
  • Cumulative paired Shannon entropy $CPE_S$: Set $\varphi(u) = -u \ln u$, $u \in [0,1]$. Thus,
    $$CPE_S(F) = -\int_{\mathbb{R}} \left[ F(x) \ln F(x) + \bar{F}(x) \ln \bar{F}(x) \right] dx$$
    gives the entropy which was already mentioned in the introduction. It is a special case of $CPE_\alpha$ for $\alpha \to 1$.
  • Cumulative paired Leik entropy $CPE_L$: The function
    $$\varphi(u) = \min\{u, 1-u\} = \frac{1}{2} - \left| u - \frac{1}{2} \right|, \quad u \in [0,1],$$
    represents the limiting case of a linear concave function $\varphi$. The measure of dispersion proposed by [22] implicitly makes use of $\varphi$ such that we call
    $$CPE_L(F) = 2 \int_{\mathbb{R}} \min\{F(x), \bar{F}(x)\}\,dx$$
    cumulative paired Leik entropy.
Figure 1 gives an impression of the previously mentioned generating functions $\varphi$.
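As a numerical illustration of these generating functions, the following Python sketch approximates the cumulative paired Gini, Shannon, and Leik entropies of a uniform distribution on $[0,1]$ by a simple midpoint rule. The helper names (`phi_alpha`, `phi_leik`, `cpe_numeric`, `uniform_cdf`) and the choice of the uniform distribution are assumptions made for this example only.

```python
import math
from typing import Callable


def phi_alpha(alpha: float) -> Callable[[float], float]:
    """Generating function of the cumulative paired alpha-entropy (alpha > 0)."""
    def phi(u: float) -> float:
        if u <= 0.0 or u >= 1.0:
            return 0.0                      # phi(0) = phi(1) = 0
        if alpha == 1.0:
            return -u * math.log(u)         # Shannon case, the limit alpha -> 1
        return u * (u ** (alpha - 1.0) - 1.0) / (1.0 - alpha)
    return phi


def phi_leik(u: float) -> float:
    return min(u, 1.0 - u)


def cpe_numeric(cdf: Callable[[float], float], phi: Callable[[float], float],
                lower: float, upper: float, n: int = 50_000) -> float:
    """Midpoint-rule approximation of the integral of phi(F) + phi(1 - F)."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n):
        F = cdf(lower + (i + 0.5) * h)
        total += phi(F) + phi(1.0 - F)
    return total * h


def uniform_cdf(x: float) -> float:
    return min(max(x, 0.0), 1.0)


cpe_gini = cpe_numeric(uniform_cdf, phi_alpha(2.0), 0.0, 1.0)
cpe_shannon = cpe_numeric(uniform_cdf, phi_alpha(1.0), 0.0, 1.0)
cpe_leik = cpe_numeric(uniform_cdf, phi_leik, 0.0, 1.0)
print(cpe_gini, cpe_shannon, cpe_leik)
```

For this distribution the integrals can also be evaluated by hand: $2\int_0^1 x(1-x)\,dx = 1/3$ for the Gini case, and $1/2$ for both the Shannon and the Leik case, which the numerical values should reproduce closely.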

3.2. Advantages of Entropies Based on Cdfs

The authors of [46,47] list several reasons for defining an entropy for distribution functions rather than for density functions. The starting point is the well-known critique of Shannon’s differential entropy $-\int f(x) \ln f(x)\,dx$ that was expressed by several authors like [48,58] and [59] (p. 58f).
Transferred to cumulative paired entropies, the advantages of entropies based on distribution functions (cf. [46]) are as follows:
  • $CPE_\varphi$ is based on probabilities and has a consistent definition for both discrete and continuous random variables.
  • $CPE_\varphi$ is always non-negative.
  • $CPE_\varphi$ can easily be estimated by the empirical distribution function. This estimation is strongly consistent, due to the strong consistency of the empirical distribution function.
Problems of the differential entropy are occasionally discussed in the case of grouped data, for which the usual Shannon entropy is calculated from the group probabilities. With an increasing number of groups, the Shannon entropy not only fails to converge to the respective differential entropy, but even diverges (cf., e.g., [59] (p. 54), [60] (p. 239)). In the next section we will show that the discrete version of $CPE_\varphi$ converges to $CPE_\varphi$ as the number of groups approaches infinity.
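The divergence is easy to see for a uniform distribution on $[0,1]$ that is split into $k$ groups of equal width: every group probability is $1/k$, so the discrete Shannon entropy equals $\ln k$, whereas the differential entropy of this distribution is 0. The short Python sketch below illustrates this; the helper name `grouped_shannon_entropy` and the chosen values of $k$ are assumptions for the example.

```python
import math
from typing import Sequence


def grouped_shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a vector of group probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Uniform distribution on [0, 1] split into k groups of equal width.
entropies = {k: grouped_shannon_entropy([1.0 / k] * k) for k in (10, 100, 1000)}

for k, value in entropies.items():
    # The value grows like ln(k) instead of approaching the differential entropy 0.
    print(k, value, math.log(k))
```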

3.3. $CPE_\varphi$ for Grouped Data

First, we introduce the notation for characterizing grouped data. The interval $[\tilde{x}_0, \tilde{x}_k]$ is divided into $k$ subintervals with limits $\tilde{x}_0 < \tilde{x}_1 < \dots < \tilde{x}_{k-1} < \tilde{x}_k$. The width of each group is denoted by $\Delta x_i = \tilde{x}_i - \tilde{x}_{i-1}$ for $i = 1, 2, \dots, k$. Let $X$ be a random variable with absolutely continuous distribution function $F$, which is only known at the limits of each group. The probabilities of each group are denoted by $p_i = F(\tilde{x}_i) - F(\tilde{x}_{i-1})$, $i = 1, 2, \dots, k$. $X^*$ is the random variable whose distribution function $F^*$ is obtained by linear interpolation of the values of $F$ at the limits of successive groups. Finally, $X^*$ is the result of adding an independent, uniformly distributed random variable to $X$. It holds that
$$F^*(x) = F(\tilde{x}_{i-1}) + \frac{p_i}{\Delta x_i}\,(x - \tilde{x}_{i-1}) \quad \text{if } \tilde{x}_{i-1} < x \le \tilde{x}_i \tag{12}$$
for $x \in \mathbb{R}$, $F^*(x) = 0$ for $x \le \tilde{x}_0$ and $F^*(x) = 1$ for $x > \tilde{x}_k$.
Let $X^*$ denote the respective random variable of $F^*$. The probability density function $f^*$ of $X^*$ is defined by $f^*(x) = p_i / \Delta x_i$ for $\tilde{x}_{i-1} < x \le \tilde{x}_i$, $i = 1, 2, \dots, k$.
Lemma 1. 
Let $\varphi$ be an entropy generating function with antiderivative $S_\varphi$. The paired cumulative φ-entropy of the distribution function in Equation (12) is given as follows:
$$CPE_\varphi(X^*) = \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ S_\varphi(F(\tilde{x}_i)) - S_\varphi(F(\tilde{x}_{i-1})) + S_\varphi(\bar{F}(\tilde{x}_{i-1})) - S_\varphi(\bar{F}(\tilde{x}_i)) \right]. \tag{13}$$
Proof. 
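For $x \in (\tilde{x}_{i-1}, \tilde{x}_i]$, we have
$$F^*(x) = a_i + b_i x \quad \text{with } b_i = \frac{p_i}{\Delta x_i} \text{ and } a_i = F(\tilde{x}_{i-1}) - b_i \tilde{x}_{i-1}$$
with $a_i + b_i \tilde{x}_{i-1} = F(\tilde{x}_{i-1})$, $a_i + b_i \tilde{x}_i = F(\tilde{x}_i)$, $1 - a_i - b_i \tilde{x}_{i-1} = \bar{F}(\tilde{x}_{i-1})$, and $1 - a_i - b_i \tilde{x}_i = \bar{F}(\tilde{x}_i)$, $i = 1, 2, \dots, k$. With $y = a_i + b_i x$ and $dx = (1/b_i)\,dy$ we have
$$\begin{aligned}
CPE_\varphi(X^*) &= \sum_{i=1}^{k} \int_{\tilde{x}_{i-1}}^{\tilde{x}_i} \left[ \varphi(a_i + b_i x) + \varphi(1 - a_i - b_i x) \right] dx \\
&= \sum_{i=1}^{k} \frac{1}{b_i} \int_{F(\tilde{x}_{i-1})}^{F(\tilde{x}_i)} \left[ \varphi(y) + \varphi(1 - y) \right] dy \\
&= \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ \int_{F(\tilde{x}_{i-1})}^{F(\tilde{x}_i)} \varphi(y)\,dy - \int_{\bar{F}(\tilde{x}_{i-1})}^{\bar{F}(\tilde{x}_i)} \varphi(y)\,dy \right] \\
&= \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ \int_{F(\tilde{x}_{i-1})}^{F(\tilde{x}_i)} \varphi(y)\,dy + \int_{\bar{F}(\tilde{x}_i)}^{\bar{F}(\tilde{x}_{i-1})} \varphi(y)\,dy \right] \\
&= \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ S_\varphi(F(\tilde{x}_i)) - S_\varphi(F(\tilde{x}_{i-1})) + S_\varphi(\bar{F}(\tilde{x}_{i-1})) - S_\varphi(\bar{F}(\tilde{x}_i)) \right].
\end{aligned}$$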
 ☐
Considering this result, we can easily prove the convergence property for $CPE_\varphi(X^*)$:
Theorem 1. 
Let $\varphi$ be a generating function with antiderivative $S_\varphi$ and let $F$ be a continuous distribution function of the random variable $X$ with support $[a,b]$. $X^*$ is the corresponding random variable for grouped data with $\Delta x = (b-a)/k$, $k > 0$. Then the following holds:
$$CPE_\varphi(X^*) \to \int_a^b \left[ \varphi(F(x)) + \varphi(\bar{F}(x)) \right] dx \quad \text{for } k \to \infty.$$
Proof. 
Consider equidistant classes with $\Delta x_i = \Delta x = (b-a)/k$, $i = 1, 2, \dots, k$. Subsequently, Equation (13) results in
$$CPE_\varphi(X^*) = \sum_{i=1}^{k} \left[ \frac{S_\varphi(F(\tilde{x}_i)) - S_\varphi(F(\tilde{x}_{i-1}))}{F(\tilde{x}_i) - F(\tilde{x}_{i-1})} + \frac{S_\varphi(\bar{F}(\tilde{x}_{i-1})) - S_\varphi(\bar{F}(\tilde{x}_i))}{F(\tilde{x}_i) - F(\tilde{x}_{i-1})} \right] \Delta x. \tag{14}$$
With $k \to \infty$ we have $\Delta x \to 0$ such that for $F$ continuous we get $F(\tilde{x}_i) - F(\tilde{x}_{i-1}) \to 0$. The antiderivative $S_\varphi$ has the derivative $\varphi$ almost everywhere such that with $k \to \infty$
$$\sum_{i=1}^{k} \frac{S_\varphi(F(\tilde{x}_i)) - S_\varphi(F(\tilde{x}_{i-1}))}{F(\tilde{x}_i) - F(\tilde{x}_{i-1})}\,\Delta x \to \int_a^b \varphi(F(x))\,dx.$$
An analogous argument holds for the second term of Equation (14). ☐
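The convergence stated in Theorem 1 can be observed numerically. The Python sketch below evaluates Equation (13) for the Gini case $\varphi(u) = u(1-u)$, whose antiderivative is $S_\varphi(u) = u^2/2 - u^3/3$, on equidistant groups. The cdf $F(x) = x^2$ on $[0,1]$, the function names, and the chosen numbers of groups are assumptions for this illustration; the sketch also assumes that every group has positive probability. For this $F$, the limit is $2\int_0^1 x^2(1-x^2)\,dx = 4/15$.

```python
from typing import Callable, Dict, List


def s_gini(u: float) -> float:
    """Antiderivative S_phi of phi(u) = u (1 - u)."""
    return u * u / 2.0 - u ** 3 / 3.0


def grouped_cpe(limits: List[float], cdf: Callable[[float], float],
                S: Callable[[float], float]) -> float:
    """Equation (13); assumes every group probability p_i is positive."""
    total = 0.0
    for lo, hi in zip(limits[:-1], limits[1:]):
        F_lo, F_hi = cdf(lo), cdf(hi)
        p = F_hi - F_lo
        total += (hi - lo) / p * (S(F_hi) - S(F_lo) + S(1.0 - F_lo) - S(1.0 - F_hi))
    return total


def cdf(x: float) -> float:
    return x * x                      # illustrative cdf on [0, 1]


EXACT = 4.0 / 15.0                    # 2 * integral of x^2 (1 - x^2) over [0, 1]

errors: Dict[int, float] = {}
for k in (2, 8, 32, 128):
    limits = [i / k for i in range(k + 1)]
    errors[k] = abs(grouped_cpe(limits, cdf, s_gini) - EXACT)
    print(k, errors[k])
```

In this example the absolute error shrinks steadily as the number of groups grows, in contrast to the grouped Shannon entropy of the group probabilities.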
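In addition to this theoretical result we get the following useful expressions for $CPE_\varphi$ for grouped data and a specific choice of $\varphi$ as Corollary 1 shows:
Corollary 1. 
Let $\varphi$ be s.t.
$$\varphi(u) = \begin{cases} -u \ln u & \text{for } \alpha = 1 \\[4pt] u\,\dfrac{u^{\alpha-1} - 1}{1 - \alpha} & \text{for } \alpha \ne 1, \end{cases}$$
where $u \in [0,1]$. Then for $\alpha = 1$
$$\begin{aligned}
CPE_S(X^*) = {} & -\frac{1}{2} \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ F(\tilde{x}_i)^2 \ln F(\tilde{x}_i) - F(\tilde{x}_{i-1})^2 \ln F(\tilde{x}_{i-1}) \right] \\
& -\frac{1}{2} \sum_{i=1}^{k} \frac{\Delta x_i}{p_i} \left[ \bar{F}(\tilde{x}_{i-1})^2 \ln \bar{F}(\tilde{x}_{i-1}) - \bar{F}(\tilde{x}_i)^2 \ln \bar{F}(\tilde{x}_i) \right] + \frac{1}{2} (\tilde{x}_k - \tilde{x}_0)
\end{aligned}$$
and for $\alpha \ne 1$
$$CPE_\alpha(X^*) = \frac{1}{1 - \alpha} \left[ \sum_{i=1}^{k} \frac{\Delta x_i}{p_i}\,\frac{1}{\alpha + 1} \left( F(\tilde{x}_i)^{\alpha+1} - F(\tilde{x}_{i-1})^{\alpha+1} + \bar{F}(\tilde{x}_{i-1})^{\alpha+1} - \bar{F}(\tilde{x}_i)^{\alpha+1} \right) - (\tilde{x}_k - \tilde{x}_0) \right].$$
Proof. 
Using the antiderivatives
$$S_\alpha(u) = \begin{cases} -\dfrac{1}{2} u^2 \ln u + \dfrac{1}{4} u^2 & \text{for } \alpha = 1 \\[6pt] \dfrac{1}{1 - \alpha} \left( \dfrac{1}{\alpha + 1} u^{\alpha+1} - \dfrac{1}{2} u^2 \right) & \text{for } \alpha \ne 1, \end{cases}$$
since $p_i = F(\tilde{x}_i) - F(\tilde{x}_{i-1})$, it holds that
$$\begin{aligned}
& \frac{1}{p_i} \left[ F(\tilde{x}_i)^2 - F(\tilde{x}_{i-1})^2 + \bar{F}(\tilde{x}_{i-1})^2 - \bar{F}(\tilde{x}_i)^2 \right] \\
&\quad = \frac{\left( F(\tilde{x}_i) - F(\tilde{x}_{i-1}) \right) \left( F(\tilde{x}_i) + F(\tilde{x}_{i-1}) \right)}{F(\tilde{x}_i) - F(\tilde{x}_{i-1})} + \frac{\left( \bar{F}(\tilde{x}_{i-1}) - \bar{F}(\tilde{x}_i) \right) \left( \bar{F}(\tilde{x}_{i-1}) + \bar{F}(\tilde{x}_i) \right)}{F(\tilde{x}_i) - F(\tilde{x}_{i-1})} = 2
\end{aligned}$$
for $i = 1, 2, \dots, k$. The results follow immediately. ☐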
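As a plausibility check of the Shannon case ($\alpha = 1$) of Corollary 1, the following Python sketch compares the closed-form expression with a brute-force midpoint integration of $-[F^* \ln F^* + \bar{F}^* \ln \bar{F}^*]$, where $F^*$ is the interpolated cdf of Equation (12). The cdf $F(x) = x^2$ on $[0,1]$, the four equidistant groups, and all function names are assumptions chosen for this example; every group is assumed to have positive probability.

```python
import bisect
import math
from typing import Callable, List


def u2_ln_u(u: float) -> float:
    """u^2 ln u, continuously extended by 0 at u = 0."""
    return 0.0 if u <= 0.0 else u * u * math.log(u)


def grouped_cpe_shannon(limits: List[float], cdf: Callable[[float], float]) -> float:
    """Closed form of Corollary 1 for alpha = 1; assumes every p_i > 0."""
    total = 0.5 * (limits[-1] - limits[0])
    for lo, hi in zip(limits[:-1], limits[1:]):
        F_lo, F_hi = cdf(lo), cdf(hi)
        weight = (hi - lo) / (F_hi - F_lo)           # Delta x_i / p_i
        total -= 0.5 * weight * (u2_ln_u(F_hi) - u2_ln_u(F_lo))
        total -= 0.5 * weight * (u2_ln_u(1.0 - F_lo) - u2_ln_u(1.0 - F_hi))
    return total


def brute_force(limits: List[float], cdf: Callable[[float], float],
                n: int = 40_000) -> float:
    """Midpoint rule applied to the interpolated cdf F* of Equation (12)."""
    values = [cdf(x) for x in limits]
    h = (limits[-1] - limits[0]) / n
    total = 0.0
    for j in range(n):
        x = limits[0] + (j + 0.5) * h
        i = bisect.bisect_left(limits, x)            # limits[i-1] < x <= limits[i]
        slope = (values[i] - values[i - 1]) / (limits[i] - limits[i - 1])
        F = values[i - 1] + slope * (x - limits[i - 1])
        total -= F * math.log(F) + (1.0 - F) * math.log(1.0 - F)
    return total * h


limits = [0.0, 0.25, 0.5, 0.75, 1.0]
closed_form = grouped_cpe_shannon(limits, lambda x: x * x)
numeric = brute_force(limits, lambda x: x * x)
print(closed_form, numeric)
```

Both numbers should agree up to the discretization error of the midpoint rule. With a single group and $F(x) = x$ the closed form reduces to $1/2$, the cumulative paired Shannon entropy of the uniform distribution on $[0,1]$.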

3.4. Alternative Representations of $CPE_\varphi$

In case $\varphi(0) = \varphi(1) = 0$ holds and $\varphi$ is differentiable, one can provide several alternative representations of $CPE_\varphi$ in addition to Equation (7). These alternative representations will be useful in the following to find conditions ensuring the existence of $CPE_\varphi$ and to find some simple estimators.
Proposition 1. 
Let $\varphi$ be a non-negative and differentiable function on the domain $[0,1]$ with derivative $\varphi'$ and $\varphi(0) = \varphi(1) = 0$. In this case, for $F \in \mathcal{F}$ with quantile function $F^{-1}(u)$, density function $f$, and quantile density function $q(u) = 1 / f(F^{-1}(u))$, for $u \in [0,1]$, the following holds:
$$CPE_\varphi(F) = \int_0^1 \left[ \varphi(u) + \varphi(1-u) \right] q(u)\,du,$$
$$CPE_\varphi(F) = \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) F^{-1}(u)\,du,$$
$$CPE_\varphi(F) = \int_{\mathbb{R}} x \left( \varphi'(\bar{F}(x)) - \varphi'(F(x)) \right) f(x)\,dx.$$
Proof. 
Apply the probability integral transform $U = F(X)$ and integration by parts. ☐
Due to $\varphi(0) = \varphi(1) = 0$ it holds that
$$\int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) du = 0.$$
This property supports the understanding of $CPE_\varphi$ being a covariance for which the Cauchy–Schwarz inequality gives an upper bound:
Corollary 2. 
Let $\varphi$ be a non-negative and differentiable function on the domain $[0,1]$ with derivative $\varphi'$ and $\varphi(0) = \varphi(1) = 0$. Then if $U$ is uniformly distributed on $[0,1]$ and $X \sim F$:
$$CPE_\varphi(F) = Cov\left( \varphi'(1-U) - \varphi'(U),\, F^{-1}(U) \right), \tag{18}$$
$$CPE_\varphi(F) = Cov\left( \varphi'(\bar{F}(X)) - \varphi'(F(X)),\, X \right). \tag{19}$$
Proof. 
Let $\mu = E[X]$, then since $E[\varphi'(1-U) - \varphi'(U)] = 0$,
$$CPE_\varphi(F) = \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) F^{-1}(u)\,du = \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) \left( F^{-1}(u) - \mu \right) du.$$
 ☐
Depending on the context, we switch between these alternative representations of $CPE_\varphi$.
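The equivalence of the representations in Proposition 1 can be checked numerically. The Python sketch below does this for the Gini case $\varphi(u) = u(1-u)$, $\varphi'(u) = 1 - 2u$, and a standard exponential distribution; the distribution, the truncation of the real line at 40, the midpoint rule, and all names are assumptions made for this illustration. For this example the integrals can be evaluated by hand and equal 1.

```python
import math
from typing import Callable


def midpoint(f: Callable[[float], float], a: float, b: float, n: int = 100_000) -> float:
    """Composite midpoint rule; it never evaluates f at the end points."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def phi(u: float) -> float:
    return u * (1.0 - u)              # Gini generating function


def dphi(u: float) -> float:
    return 1.0 - 2.0 * u              # its derivative


def cdf(x: float) -> float:
    return 1.0 - math.exp(-x)         # standard exponential distribution


def pdf(x: float) -> float:
    return math.exp(-x)


def quantile(u: float) -> float:
    return -math.log(1.0 - u)


def quantile_density(u: float) -> float:
    return 1.0 / (1.0 - u)            # q(u) = 1 / f(F^{-1}(u))


UPPER = 40.0                          # truncation point; the remaining tail is negligible

by_definition = midpoint(lambda x: phi(cdf(x)) + phi(1.0 - cdf(x)), 0.0, UPPER)
by_quantile_density = midpoint(lambda u: (phi(u) + phi(1.0 - u)) * quantile_density(u), 0.0, 1.0)
by_quantile = midpoint(lambda u: (dphi(1.0 - u) - dphi(u)) * quantile(u), 0.0, 1.0)
by_density = midpoint(lambda x: x * (dphi(1.0 - cdf(x)) - dphi(cdf(x))) * pdf(x), 0.0, UPPER)

print(by_definition, by_quantile_density, by_quantile, by_density)
```

The quantile-based forms are convenient when only $F^{-1}$ is available in closed form, while the definition itself requires the cdf.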

4. Sufficient Conditions for the Existence of $CPE_\varphi$

4.1. Deriving an Upper Bound for $CPE_\varphi$

The Cauchy–Schwarz inequality for Equations (18) and (19), respectively, provides an upper bound for $CPE_\varphi$ if the variance $\sigma^2 = E[(F^{-1}(U) - \mu)^2]$ exists and
$$\int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right)^2 du < \infty \tag{20}$$
holds. The existence of the upper bound simultaneously ensures the existence of $CPE_\varphi$.
Proposition 2. 
Let $\varphi$ be a non-negative and differentiable function on the domain $[0,1]$ with derivative $\varphi'$ and $\varphi(0) = \varphi(1) = 0$. If Equation (20) holds, then for $X \sim F$ with $Var(X) < \infty$ and quantile function $F^{-1}$, we have
$$CPE_\varphi(F) \le \sqrt{E\left[ \left( \varphi'(1-U) - \varphi'(U) \right)^2 \right] E\left[ \left( F^{-1}(U) - \mu \right)^2 \right]},$$
$$CPE_\varphi(F) \le \sqrt{E\left[ \left( \varphi'(\bar{F}(X)) - \varphi'(F(X)) \right)^2 \right] \sigma^2}.$$
Proof. 
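The statement follows from
$$\left( E\left[ \left( \varphi'(1-U) - \varphi'(U) \right) \left( F^{-1}(U) - \mu \right) \right] \right)^2 \le \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right)^2 du \times E\left[ \left( F^{-1}(U) - \mu \right)^2 \right].$$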
 ☐
Next, we consider the upper bound for the cumulative paired α-entropy:
Corollary 3. 
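Let $X$ be a random variable having a finite variance. Then
$$CPE_\alpha(X) \le \sigma\,\frac{\alpha}{|1 - \alpha|} \sqrt{2 \left( \frac{1}{2\alpha - 1} - B(\alpha, \alpha) \right)}$$
for $\alpha > 1/2$, $\alpha \ne 1$ with
$$CPE_S(X) \le \frac{\pi \sigma}{\sqrt{3}}$$
for $\alpha = 1$.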
Proof. 
For φ ( u ) = u ( u α - 1 - 1 ) / ( 1 - α ) and φ ( u ) = ( α u α - 1 - 1 ) / ( 1 - α ) , u [ 0 , 1 ] , we have
0 1 ( φ ( 1 - u ) - φ ( u ) ) 2 d u = α 1 - α 2 0 1 u α - 1 - ( 1 - u ) α - 1 2 d u = 2 α 1 - α 2 0 1 u 2 ( α - 1 ) d u - 2 0 1 u α - 1 ( 1 - u ) α - 1 d u = 2 α 1 - α 2 1 2 α - 1 - B ( α , α ) .
α > 1 / 2 is required for the existence of C P E α ( X ) . For α = 1 we have φ ( u ) = - u ln u and φ ( u ) = - ln u - 1 , u [ 0 , 1 ] , such that
0 1 ( φ ( 1 - u ) - φ ( u ) ) 2 d u = 0 1 ln 1 - u u 2 d u = π 2 3 .
 ☐
In the framework of uncertainty theory, the upper bound for the paired cumulative Shannon entropy was derived by [51] (see also [6], p. 83). For $\alpha = 2$ we get the upper bound for the paired cumulative Gini entropy
$$CPE_G(X) \le \frac{2\sigma}{\sqrt{3}}.$$
This result has already been proven for non-negative uncertainty variables by [40]. Finally, one obtains the following upper bound for the paired cumulative Leik entropy:
Corollary 4. 
Let $X$ be a random variable with existing variance. Then
$$CPE_L(X) \le 2\sigma.$$
Proof. 
Use
$$\int_0^1 \left(\mathrm{sign}(u - 1/2) - \mathrm{sign}(1/2 - u)\right)^2 du = 4$$
to get the result. ☐
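The constants in Corollaries 3 and 4 can be checked numerically. The following Python sketch is an illustration under our own assumptions (midpoint quadrature, hypothetical function names `l2_norm_of_score` and `alpha_bound_constant`); it evaluates $\sqrt{\int_0^1 (\varphi'(1-u) - \varphi'(u))^2 du}$, the factor that multiplies $\sigma$ in the upper bound, for the Shannon, Gini, and Leik cases and compares it with the closed form for general $\alpha$.

```python
import math

def l2_norm_of_score(dphi, n=200_000):
    """sqrt of the integral of (dphi(1-u) - dphi(u))^2 over (0, 1), midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += (dphi(1.0 - u) - dphi(u)) ** 2
    return math.sqrt(total * h)

def alpha_bound_constant(alpha):
    """Closed-form factor of sigma in Corollary 3 (alpha > 1/2, alpha != 1)."""
    beta = math.gamma(alpha) ** 2 / math.gamma(2.0 * alpha)   # B(alpha, alpha)
    return alpha / abs(1.0 - alpha) * math.sqrt(2.0 * (1.0 / (2.0 * alpha - 1.0) - beta))

shannon = l2_norm_of_score(lambda u: -math.log(u) - 1.0)     # expected: pi / sqrt(3)
gini = l2_norm_of_score(lambda u: 1.0 - 2.0 * u)             # expected: 2 / sqrt(3)
leik = l2_norm_of_score(lambda u: 1.0 if u < 0.5 else -1.0)  # expected: 2

print(shannon, gini, leik, alpha_bound_constant(2.0))
```

Multiplying each constant by $\sigma$ gives the corresponding upper bound; the Gini value coincides with `alpha_bound_constant(2.0)`.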

4.2. Stricter Conditions for the Existence of $CPE_\alpha$

So far, we have only considered sufficient conditions that assume an existing variance. Following the arguments in [46,50], which were used for the special case of cumulative residual and residual Shannon entropy, one can derive stricter sufficient conditions for the existence of $CPE_\alpha$.
Theorem 2. 
If $E(|X|^p) < \infty$ for $p > 1$, then $CPE_\alpha < \infty$ for $\alpha > 1/p$.
Proof. 
To prepare the proof we first note that
$$u\,\frac{u^{\alpha-1} - 1}{1-\alpha} \le -u \ln u \le u\,\frac{u^{\beta-1} - 1}{1-\beta} \le 1 - u \tag{27}$$
holds for $0 < \beta < 1 < \alpha$ and $0 \le u \le 1$.
The second fact required for the proof is that
$$\int_0^\infty \bar{F}(x)\, dx < \infty \quad \text{and} \quad \int_{-\infty}^0 F(x)\, dx < \infty$$
if $E(|X|) < \infty$, because
$$E(|X|) = \int_0^\infty \bar{F}(x)\, dx + \int_{-\infty}^0 F(x)\, dx.$$
Third, it holds that
$$P(-X \ge y) \le P(|X| \ge y) \quad \text{for } y > 0,$$
because
$$P(|X| \ge y) = 1 - P(|X| < y) = 1 - \left(P(X < y) - P(X \le -y)\right) = 1 - P(X < y) + P(X \le -y) = 1 - P(X < y) + P(-X \ge y).$$
$CPE_\alpha$ consists of four improper integrals:
$$CPE_\alpha = \int_0^\infty F(x)\,\frac{F(x)^{\alpha-1} - 1}{1-\alpha}\, dx + \int_{-\infty}^0 \bar{F}(x)\,\frac{\bar{F}(x)^{\alpha-1} - 1}{1-\alpha}\, dx + \int_{-\infty}^0 F(x)\,\frac{F(x)^{\alpha-1} - 1}{1-\alpha}\, dx + \int_0^\infty \bar{F}(x)\,\frac{\bar{F}(x)^{\alpha-1} - 1}{1-\alpha}\, dx.$$
It must be shown separately that these integrals converge.
The convergence of the first two terms follows directly from the existence of $E(X)$. With Equations (27) and (28) we have for $\alpha > 0$
$$\int_0^\infty F(x)\,\frac{F(x)^{\alpha-1} - 1}{1-\alpha}\, dx \le \int_0^\infty \bar{F}(x)\, dx < \infty$$
and
$$\int_{-\infty}^0 \bar{F}(x)\,\frac{\bar{F}(x)^{\alpha-1} - 1}{1-\alpha}\, dx \le \int_{-\infty}^0 F(x)\, dx < \infty.$$
For the third term we have to demonstrate that
$$\int_{-\infty}^0 F(x)\,\frac{F(x)^{\alpha-1} - 1}{1-\alpha}\, dx < \infty$$
for $\alpha > 1/p$. If $p > 1$, there is a $\beta$ with $1/p < \beta < 1$ and $\beta < \alpha$. With Equation (27) we have for $-\infty < x \le 0$ that
$$F(x)\,\frac{F(x)^{\alpha-1} - 1}{1-\alpha} \le F(x)\,\frac{F(x)^{\beta-1} - 1}{1-\beta} \le \frac{1}{1-\beta}\, F(x)^{\beta}$$
because $1 - \beta > 0$.
With $F(x) = P(X \le x) = P(-X \ge -x)$ we obtain
$$\frac{1}{1-\beta}\, F(x)^{\beta} \begin{cases} \le \dfrac{1}{1-\beta} & \text{for } 0 \le -x \le 1, \\[1ex] = \dfrac{1}{1-\beta}\, P(-X \ge -x)^{\beta} \le \dfrac{1}{1-\beta}\, P(|X| \ge -x)^{\beta} & \text{for } 1 < -x < \infty. \end{cases}$$
For $p > 0$ the transformation $g(y) = y^p$ is monotonically increasing for $y > 1$. Using the Markov inequality we get
$$P(|X| \ge y) \le \frac{E[|X|^p]}{y^p}.$$
Putting these results together, we obtain
$$\int_{-\infty}^0 F(x)\,\frac{F(x)^{\alpha-1} - 1}{1-\alpha}\, dx \le \frac{1}{1-\beta} + \frac{1}{1-\beta} \int_1^\infty \frac{E[|X|^p]^{\beta}}{y^{p\beta}}\, dy < \infty$$
for $\beta > 1/p$ (and thus for $p\beta > 1$), since $\int_1^\infty 1/y^q\, dy < \infty$ for $q > 1$.
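It remains to show the convergence of the fourth term:
$$\int_0^\infty \bar{F}(x)\,\frac{\bar{F}(x)^{\alpha-1} - 1}{1-\alpha}\, dx < \infty$$
for $\alpha > 1/p$. For $p > 1$, there is a $\beta$ with $1/p < \beta < 1$ and $\beta < \alpha$. Due to Equation (27) and $1 - \beta > 0$, for $0 \le x < \infty$ it is true that
$$\bar{F}(x)\,\frac{\bar{F}(x)^{\alpha-1} - 1}{1-\alpha} \le \bar{F}(x)\,\frac{\bar{F}(x)^{\beta-1} - 1}{1-\beta} \le \frac{1}{1-\beta}\, \bar{F}(x)^{\beta}.$$
With $\bar{F}(x) = P(X > x)$ we have
$$\frac{1}{1-\beta}\, \bar{F}(x)^{\beta} \begin{cases} \le \dfrac{1}{1-\beta} & \text{for } 0 \le x \le 1, \\[1ex] = \dfrac{1}{1-\beta}\, P(X > x)^{\beta} \le \dfrac{1}{1-\beta}\, P(|X| \ge x)^{\beta} & \text{for } 1 < x < \infty. \end{cases}$$
Now, the Markov inequality gives
$$P(|X| \ge y) \le \frac{E(|X|^p)}{y^p}.$$
In summary, we have
$$\int_0^\infty \bar{F}(x)\,\frac{\bar{F}(x)^{\alpha-1} - 1}{1-\alpha}\, dx \le \frac{1}{1-\beta} + \frac{1}{1-\beta} \int_1^\infty \frac{E[|X|^p]^{\beta}}{y^{p\beta}}\, dy < \infty$$
for $\beta > 1/p$, since $\int_1^\infty 1/y^q\, dy < \infty$ for $q > 1$. This completes the proof. ☐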
Following Theorem 2, depending on the number of existing moments, specific conditions for $\alpha$ arise in order to ensure the existence of $CPE_\alpha$:
  • If the variance of $X$ exists (i.e., $p = 2$), $CPE_\alpha(X)$ exists for $\alpha > 1/2$.
  • For $p > 1$, $E[|X|^p] < \infty$ is sufficient for the existence of $CPE_S$ (i.e., $\alpha = 1$).
  • For $p = 1$, $E[|X|^p] < \infty$ is sufficient for the existence of $CPE_G$ (i.e., $\alpha = 2$).

5. Maximum $CPE_\varphi$ Distributions

5.1. Maximum $CPE_\varphi$ Distributions for Given Mean and Variance

Equality in the Cauchy–Schwarz inequality gives a condition under which the upper bound is attained. This is the case if an affine linear relation between $F^{-1}(U)$ and $\varphi'(1-U) - \varphi'(U)$ (or, equivalently, between $X$ and $\varphi'(\bar{F}(X)) - \varphi'(F(X))$) exists with probability 1. Since the quantile function is monotonically increasing, such an affine linear relation can only exist if $\varphi'(1-u) - \varphi'(u)$ is monotonic as well (decreasing or increasing). This implies that $\varphi$ needs to be a concave function on $[0,1]$. In order to derive a maximum $CPE_\varphi$ distribution under the restriction that mean and variance are given, one may therefore only consider concave generating functions $\varphi$.
We summarize this obvious but important result in the following theorem:
Theorem 3. 
Let $\varphi$ be a non-negative and differentiable function on the domain $[0,1]$ with derivative $\varphi'$ and $\varphi(0) = \varphi(1) = 0$. Then $F$ is the maximum $CPE_\varphi$ distribution with prespecified mean $\mu$ and variance $\sigma^2$ of $X \sim F$ iff
$$P\left(F^{-1}(U) - \mu = \frac{\sigma}{\sqrt{E\left[\left(\varphi'(1-U) - \varphi'(U)\right)^2\right]}} \left(\varphi'(1-U) - \varphi'(U)\right)\right) = 1.$$
Proof. 
The upper bound of the Cauchy–Schwarz inequality is attained if there are constants $a, b \in \mathbb{R}$ such that
$$P\left(F^{-1}(U) = a + b\left(\varphi'(1-U) - \varphi'(U)\right)\right) = 1.$$
The property $\varphi(0) = \varphi(1) = 0$ leads to $E[\varphi'(1-U) - \varphi'(U)] = 0$, such that the first restriction (the given mean) yields
$$\mu = \int_0^1 F^{-1}(u)\, du = a + b \int_0^1 \left(\varphi'(1-u) - \varphi'(u)\right) du = a.$$
This means that there is a constant $b \in \mathbb{R}$ with
$$P\left(F^{-1}(U) - \mu = b\left(\varphi'(1-U) - \varphi'(U)\right)\right) = 1.$$
The second restriction postulates that
$$\sigma^2 = \int_0^1 \left(F^{-1}(u) - \mu\right)^2 du = b^2\, E\left[\left(\varphi'(1-U) - \varphi'(U)\right)^2\right].$$
$\varphi$ is concave on $[0,1]$ with
$$-\varphi''(1-u) - \varphi''(u) \ge 0, \quad u \in [0,1].$$
Therefore, $\varphi'(1-u) - \varphi'(u)$ is monotonically increasing. The quantile function is also monotonically increasing, such that $b$ has to be positive. This gives
$$b = \frac{\sigma}{\sqrt{E\left[\left(\varphi'(1-U) - \varphi'(U)\right)^2\right]}}.$$
 ☐
The quantile function of Tukey’s λ distribution is given by
$$Q(u, \lambda) = \frac{1}{\lambda}\left(u^{\lambda} - (1-u)^{\lambda}\right), \quad u \in [0,1],\ \lambda \ne 0.$$
Its mean and variance are
$$\mu = 0 \quad \text{and} \quad \sigma^2 = \frac{2}{\lambda^2}\left(\frac{1}{2\lambda+1} - B(\lambda+1, \lambda+1)\right).$$
The domain is given by $[-1/\lambda, 1/\lambda]$ for $\lambda > 0$.
By discussing the paired cumulative α-entropy, one can prove the new result that Tukey’s λ distribution is the maximum $CPE_\alpha$ distribution for prespecified mean and variance. Tukey’s λ distribution takes on the role of the Student-t distribution if one changes from the differential entropy to $CPE_\alpha$ (cf. [61]).
Corollary 5. 
The cdf $F$ maximizes $CPE_\alpha$ for $\alpha > 1/2$ under the restrictions of a given mean $\mu$ and a given variance $\sigma^2$ iff $F$ is the cdf of the Tukey λ distribution with $\lambda = \alpha - 1$.
Proof. 
For $\varphi(u) = u(u^{\alpha-1} - 1)/(1-\alpha)$, $u \in [0,1]$, we have
$$\int_0^1 \left(\varphi'(1-u) - \varphi'(u)\right)^2 du = \left(\frac{\alpha}{1-\alpha}\right)^2 \int_0^1 \left((1-u)^{\alpha-1} - u^{\alpha-1}\right)^2 du = 2\left(\frac{\alpha}{1-\alpha}\right)^2 \left(\frac{1}{2\alpha-1} - B(\alpha,\alpha)\right)$$
for $\alpha > 1/2$. As a consequence, the constant $b$ is given by
$$b = \frac{1}{\sqrt{2}}\, \sigma\, \frac{|1-\alpha|}{\alpha} \left(\frac{1}{2\alpha-1} - B(\alpha,\alpha)\right)^{-1/2},$$
and the maximum $CPE_\alpha$ distribution results in
$$F^{-1}(u) - \mu = \frac{\sigma}{\sqrt{2}}\, \frac{|1-\alpha|}{\alpha} \left(\frac{1}{2\alpha-1} - B(\alpha,\alpha)\right)^{-1/2} \frac{\alpha}{1-\alpha}\left((1-u)^{\alpha-1} - u^{\alpha-1}\right) = \frac{\sigma\, |\alpha-1|}{\sqrt{2}} \left(\frac{1}{2\alpha-1} - B(\alpha,\alpha)\right)^{-1/2} \frac{u^{\alpha-1} - (1-u)^{\alpha-1}}{\alpha-1}.$$
$F^{-1}$ can easily be identified as the quantile function of a Tukey’s λ distribution with $\lambda = \alpha - 1$ and $\alpha > 1/2$. ☐
For the Gini case ($\alpha = 2$), one obtains the quantile function of a uniform distribution
$$F^{-1}(u) = \mu + \sigma\, \frac{1}{\sqrt{2}}\, \sqrt{6}\, (2u - 1) = \mu + \sigma \sqrt{3}\, (2u - 1), \quad u \in [0,1],$$
with domain $[\mu - \sqrt{3}\sigma, \mu + \sqrt{3}\sigma]$. This maximum $CPE_G$ distribution corresponds essentially to the distribution derived by Dai et al. [40].
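To illustrate Corollary 5, the following Python sketch checks numerically that a Tukey λ distribution with $\lambda = \alpha - 1$ attains the upper bound of Corollary 3. The value $\alpha = 1.5$, the midpoint quadrature, and the helper names are assumptions made for this example only.

```python
import math

def midpoint(f, n=200_000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

alpha = 1.5            # assumed example value (alpha > 1/2, alpha != 1)
lam = alpha - 1.0      # Tukey lambda parameter from Corollary 5

def tukey_quantile(u):
    return (u ** lam - (1.0 - u) ** lam) / lam

def score(u):
    # phi'(1-u) - phi'(u) for phi(u) = u (u^(alpha-1) - 1) / (1 - alpha)
    return alpha / (1.0 - alpha) * ((1.0 - u) ** (alpha - 1.0) - u ** (alpha - 1.0))

# The Tukey lambda distribution has mean 0, so the variance is the second moment.
sigma = math.sqrt(midpoint(lambda u: tukey_quantile(u) ** 2))
cpe = midpoint(lambda u: score(u) * tukey_quantile(u))

beta = math.gamma(alpha) ** 2 / math.gamma(2.0 * alpha)      # B(alpha, alpha)
bound = sigma * alpha / abs(1.0 - alpha) * math.sqrt(2.0 * (1.0 / (2.0 * alpha - 1.0) - beta))

print(cpe, bound)  # the two values agree up to quadrature error
```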
The fact that the logistic distribution is the maximum $CPE_S$ distribution, provided mean and variance are given, was derived by Chen et al. [51] in the framework of uncertainty theory and by [50] (p. 4) in the framework of reliability theory. Both proved this result using Euler–Lagrange equations. In the interest of completeness, we provide an alternative proof via the upper bound of the Cauchy–Schwarz inequality.
Corollary 6. 
The cdf $F$ maximizes $CPE_S$ under the restrictions of a known mean $\mu$ and a known variance $\sigma^2$ iff $F$ is the cdf of a logistic distribution.
Proof. 
Since
$$\int_0^1 \left(\ln\frac{1-u}{u}\right)^2 du = \frac{\pi^2}{3},$$
one obtains
$$F^{-1}(u) - \mu = \frac{\sigma}{\pi/\sqrt{3}}\, \ln\frac{u}{1-u}, \quad u \in [0,1].$$
Inverting gives the distribution function of the logistic distribution with mean $\mu$ and variance $\sigma^2$:
$$F(x) = \frac{1}{1 + \exp\left(-\dfrac{\pi}{\sqrt{3}}\, \dfrac{x - \mu}{\sigma}\right)}, \quad x \in \mathbb{R}.$$
 ☐
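A small numerical comparison may help to see Corollary 6 at work. The Python sketch below is our own illustration: it evaluates the cumulative paired Shannon entropy through the quantile representation $\int_0^1 \ln\frac{u}{1-u}\, F^{-1}(u)\, du$ for three distributions standardised to variance 1. The function name `cpe_shannon_from_quantile` and the choice of comparison distributions are assumptions.

```python
import math
from statistics import NormalDist

def cpe_shannon_from_quantile(quantile, n=100_000):
    """Midpoint approximation of the integral of ln(u / (1 - u)) * quantile(u) over (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += math.log(u / (1.0 - u)) * quantile(u)
    return total * h

# Three quantile functions, each with mean 0 and variance 1.
logistic = lambda u: math.sqrt(3.0) / math.pi * math.log(u / (1.0 - u))
gaussian = NormalDist().inv_cdf
uniform = lambda u: math.sqrt(3.0) * (2.0 * u - 1.0)

bound = math.pi / math.sqrt(3.0)   # upper bound for sigma = 1
results = {name: cpe_shannon_from_quantile(q)
           for name, q in [("logistic", logistic), ("gaussian", gaussian), ("uniform", uniform)]}

print(bound, results)
```

Only the logistic distribution reaches the bound (up to quadrature error); the Gaussian and the uniform distribution stay below it.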
As a last example we consider the cumulative paired Leik entropy $CPE_L$.
Corollary 7. 
The cdf $F$ maximizes $CPE_L$ under the restrictions of a known mean $\mu$ and a known variance $\sigma^2$ iff $F$ satisfies
$$F(x) = \begin{cases} 0 & \text{for } x < \mu - \sigma, \\ 1/2 & \text{for } \mu - \sigma \le x < \mu + \sigma, \\ 1 & \text{for } x \ge \mu + \sigma. \end{cases}$$
Proof. 
From $\varphi(u) = \min\{u, 1-u\}$ and $\varphi'(u) = \mathrm{sign}(1/2 - u)$, $u \in [0,1]$, it follows that
$$F^{-1}(u) - \mu = \sigma\, \mathrm{sign}(u - 1/2), \quad u \in [0,1].$$
 ☐
Therefore, the maximization of $CPE_L$ with given mean and variance leads to a distribution whose variance is maximal on the interval $[\mu - \sigma, \mu + \sigma]$.

5.2. Maximum $CPE_\varphi$ Distributions for General Moment Restrictions

Drissi et al. [50] discuss general moment restrictions of the form
$$\int_{-\infty}^{\infty} c_i(x) f(x)\, dx = \int_0^1 c_i(F^{-1}(u))\, du = k_i, \quad i = 1, 2, \ldots, k, \tag{31}$$
for which the existence of the moments is assumed. By using Euler–Lagrange equations they show that
$$\bar{F}(x) = \frac{1}{1 + \exp\left(\sum_{i=1}^{r} \lambda_i c_i(x)\right)}, \quad x \in \mathbb{R},$$
maximizes the residual cumulative entropy $-\int_{\mathbb{R}} \bar{F}(x) \ln \bar{F}(x)\, dx$ under the constraints in Equation (31). Moreover, they demonstrated that the solution needs to be symmetric with respect to $\mu$. Here, $\lambda_i$, $i = 1, 2, \ldots, k$, are the Lagrange parameters, which are determined by the moment restrictions, provided a solution exists. Rao et al. [47] show that for distributions with support $\mathbb{R}^+$ the ME distribution is given by
$$\bar{F}(x) = \exp\left(-\sum_{i=1}^{r} \lambda_i c_i(x)\right), \quad x > 0,$$
if the restrictions in Equation (31) are again required.
One can easily examine the shape of a distribution which maximizes the cumulative paired φ-entropy under the constraints in Equation (31). This maximum $CPE_\varphi$ distribution can no longer be derived from the upper bound of the Cauchy–Schwarz inequality if $k > 2$. One has to solve the Euler–Lagrange equations for the objective function
$$\int_0^1 \left(\varphi'(1-u) - \varphi'(u)\right) F^{-1}(u)\, du - \sum_{i=1}^{k} \lambda_i \left(\int_0^1 c_i(F^{-1}(u))\, du - k_i\right)$$
with Lagrange parameters $\lambda_i$, $i = 1, 2, \ldots, k$. The Euler–Lagrange equations lead to the optimization problem
$$\sum_{i=1}^{k} \lambda_i c_i'(F^{-1}(u)) = \varphi'(1-u) - \varphi'(u), \quad u \in [0,1].$$
Once again there is a close relation between the derivative of the generating function and the quantile function, provided a solution of the optimization problem Equation (32) exists.
The following example shows that the optimization problem Equation (32) leads to a well-known distribution if the constraints are chosen carefully in the case of a Shannon-type entropy.
Example 1. 
The power logistic distribution is defined by the distribution function
$$F(x) = \frac{1}{1 + \exp\left(-\lambda\, \mathrm{sign}(x)\, |x|^{\gamma}\right)}, \quad x \in \mathbb{R},$$
for $\gamma > 0$. The corresponding quantile function is
$$F^{-1}(u) = \frac{1}{\lambda^{1/\gamma}}\, \mathrm{sign}(u - 1/2) \left|\ln\frac{1-u}{u}\right|^{1/\gamma}, \quad u \in [0,1].$$
This quantile function is also a solution of Equation (33) given $\varphi(u) = -u \ln u$, $u \in [0,1]$, under the constraint $E|X|^{\gamma+1} = c$. The maximum of the cumulative paired Shannon entropy under the constraint $E|X|^{\gamma+1} = c$ is given by
$$CPE_S(X) = \int_0^1 \ln\frac{u}{1-u} \cdot \frac{1}{\lambda^{1/\gamma}}\, \mathrm{sign}(u - 1/2) \left|\ln\frac{1-u}{u}\right|^{1/\gamma} du = \frac{1}{\lambda^{1/\gamma}} \int_0^1 \left|\ln\frac{1-u}{u}\right|^{(\gamma+1)/\gamma} du = \lambda\, E(|X|^{\gamma+1}).$$
Setting $\gamma = 1$ leads to the familiar result for the upper bound of $CPE_S$ given the variance.

5.3. Generalized Principle of Maximum Entropy

Kesavan et al. [19] introduced the generalized principle of an ME problem which describes the interplay of entropy, constraints, and distributions. A variation of this principle is the aim of finding an entropy that is maximized by a given distribution and some moment restrictions.
This problem can easily be solved for $CPE_\varphi$ if mean and variance are given, due to the linear relationship between $\varphi'(1-u) - \varphi'(u)$ and the quantile function $F^{-1}(u)$ of the maximum $CPE_\varphi$ distribution provided by the Cauchy–Schwarz inequality. However, $\varphi'(1-u) - \varphi'(u)$ must be strictly monotonic on $[0,1]$ for $F^{-1}(u)$ to be a quantile function. Therefore, the concavity of $\varphi(u)$ and the condition $\varphi(0) = \varphi(1) = 0$ are of key importance.
We demonstrate the solution to the generalized principle of the maximum entropy problem for the Gaussian and the Student-t distribution.
Proposition 3. 
Let $\phi$, $\Phi$ and $\Phi^{-1}$ be the density, the cdf and the quantile function of a standard Gaussian random variable. The Gaussian distribution is the maximum $CPE_\varphi$ distribution for a given mean $\mu$ and variance $\sigma^2$ for $CPE_\varphi$ with entropy generating function
$$\varphi(u) = \phi(\Phi^{-1}(u)), \quad u \in [0,1].$$
Proof. 
With
$$\varphi'(u) = \frac{\phi'(\Phi^{-1}(u))}{\phi(\Phi^{-1}(u))} = -\Phi^{-1}(u), \quad u \in [0,1],$$
the condition for the maximum $CPE_\varphi$ distribution with mean $\mu$ and variance $\sigma^2$ becomes
$$F^{-1}(u) - \mu = \frac{\sigma}{\sqrt{\int_0^1 \left(2\Phi^{-1}(u)\right)^2 du}}\; 2\Phi^{-1}(u), \quad u \in [0,1].$$
By substituting $\int_0^1 (2\Phi^{-1}(u))^2\, du = 4$, it follows that
$$F^{-1}(u) - \mu = \sigma\, \Phi^{-1}(u), \quad u \in [0,1],$$
such that $F^{-1}$ is the quantile function of a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. ☐
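The two facts used in the proof of Proposition 3 can be checked numerically with the Python standard library. The sketch below is illustrative only; the function names `phi_gen` and `phi_gen_derivative`, the finite-difference step, and the evaluation points are our assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()

def phi_gen(u):
    """Entropy generating function of Proposition 3: Gaussian density at the u-quantile."""
    return std_normal.pdf(std_normal.inv_cdf(u))

def phi_gen_derivative(u, eps=1e-6):
    """Central finite difference, used only to check the closed form numerically."""
    return (phi_gen(u + eps) - phi_gen(u - eps)) / (2.0 * eps)

# Compare the numerical derivative with the closed form -Phi^{-1}(u).
checks = [(u, phi_gen_derivative(u), -std_normal.inv_cdf(u)) for u in (0.1, 0.3, 0.5, 0.8)]

# Integral of (2 * Phi^{-1}(u))^2 over (0, 1); it equals 4 because Var(Z) = 1.
n = 100_000
norm_sq = sum((2.0 * std_normal.inv_cdf((i + 0.5) / n)) ** 2 for i in range(n)) / n

print(checks, norm_sq)
```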
An analogous result holds for the Student-t distribution with $k$ degrees of freedom. In this case, the main difference from the Gaussian distribution is that the entropy generating function has no closed form but is obtained by numerical integration of the quantile function.
Corollary 8. 
Let $t_k$ and $t_k^{-1}$ be the cdf and the quantile function, respectively, of a Student-t distribution with $k$ degrees of freedom for $k > 2$. $\mu + \sigma\sqrt{\frac{k-2}{k}}\, t_k^{-1}$ is the maximum $CPE_\varphi$ quantile function for a given mean $\mu$ and variance $\sigma^2$ iff
$$\varphi(u) = -\sqrt{\frac{k-2}{k}} \int_0^u t_k^{-1}(p)\, dp, \quad u \in [0,1].$$
Proof. 
Starting with
$$\varphi'(u) = -\sqrt{\frac{k-2}{k}}\, t_k^{-1}(u), \quad u \in [0,1],$$
and the symmetry of the $t_k$ distribution, we get the condition
$$F^{-1}(u) - \mu = \frac{\sigma}{\sqrt{\int_0^1 \left(2\sqrt{\frac{k-2}{k}}\, t_k^{-1}(u)\right)^2 du}}\; 2\sqrt{\frac{k-2}{k}}\, t_k^{-1}(u), \quad u \in [0,1].$$
With $\int_0^1 (t_k^{-1}(u))^2\, du = k/(k-2)$ we get the quantile function of the t distribution with $k$ degrees of freedom and mean $\mu$:
$$F^{-1}(u) - \mu = \sigma \sqrt{\frac{k-2}{k}}\, t_k^{-1}(u), \quad u \in [0,1],$$
which equals $t_k^{-1}(u)$ for $\sigma^2 = k/(k-2)$.
 ☐
Figure 2 shows the shape of the entropy generating function φ for several distributions generated by the generalized ME principle.

6. $CPE_\varphi$ as a Measure of Scale

6.1. Basic Properties of $CPE_\varphi$

The cumulative residual entropy ($CRE$) introduced by [46], the generalized cumulative residual entropy (GCRE) of [50], and the cumulative entropy ($CE$) discussed by [8,9] have always been interpreted as measures of information. However, none of these approaches explains which kind of information is considered. In contrast to this interpretation as measures of information, Oja [3] proved that the differential entropy satisfies a special ordering of scale and has certain meaningful properties of measures of scale. In [4], the authors discussed the close relationship between differential entropy and variance. In the discrete case, the Shannon entropy can be interpreted as a measure of diversity, which is a concept of dispersion if there is no ordering and no distance between the realizations of a random variable. In this section, we will clarify the important role which the variance plays for the existence of $CPE_\varphi$.
Therefore, we intend to provide deeper insight into $CPE_\varphi$ as a proper measure of scale (MOS). We start by showing that $CPE_\varphi$ has typical properties of an MOS. In detail, a proper MOS should always be non-negative and attain its minimal value 0 for a degenerate distribution. If a finite interval $[a, b]$ is considered as support, an MOS should attain its maximum if $a$ and $b$ each occur with probability $1/2$. $CPE_\varphi$ possesses all these properties, as shown in the next proposition.
Proposition 4. 
Let $\varphi: [0,1] \to \mathbb{R}$ with $\varphi(u) > 0$ for $u \in (0,1)$ and $\varphi(0) = \varphi(1) = 0$. Let $X$ be a random variable with support $D$, and assume that $CPE_\varphi$ exists. Then the following properties hold:
1. 
$CPE_\varphi(X) \ge 0$.
2. 
$CPE_\varphi(X) = 0$ iff there exists an $x^*$ with $P(X = x^*) = 1$.
3. 
$CPE_\varphi(X)$ attains its maximum iff there exist $a, b$ with $-\infty < a < b < \infty$ such that $P(X = a) = P(X = b) = 1/2$.
Proof. 
  • Follows from the non-negativity of $\varphi$.
  • If there is an $x^* \in \mathbb{R}$ with $P(X = x^*) = 1$, then $F_X(x), \bar{F}_X(x) \in \{0, 1\}$ for all $x \in \mathbb{R}$. Due to $\varphi(0) = \varphi(1) = 0$, it follows that $\varphi(F_X(x)) = \varphi(\bar{F}_X(x)) = 0$ for all $x \in \mathbb{R}$.
    Conversely, let $CPE_\varphi(X) = 0$. Due to the non-negativity of the integrand of $CPE_\varphi$, $\varphi(F_X(x)) + \varphi(\bar{F}_X(x)) = 0$ must hold for $x \in \mathbb{R}$. Since $\varphi(u) > 0$ for $0 < u < 1$, it follows that $F_X(x), \bar{F}_X(x) \in \{0, 1\}$ for $x \in \mathbb{R}$.
  • Let $CPE_\varphi(X)$ have a finite maximum. Since $\varphi(u) + \varphi(1-u)$ has a unique maximum at $u = 1/2$, the maximum of $CPE_\varphi(X)$ is
    $$2 \int_D \varphi(1/2)\, dx = 2 \varphi(1/2) \int_D dx.$$
    In order to attain the assumed finite maximum, the support $D$ has to be a finite interval $[a, b]$. Here, $2\varphi(1/2)(b - a)$ is the maximum. Now, it is sufficient to construct a distribution with support $[a, b]$ that attains this maximum. Set
    $$F(x) = \begin{cases} 0 & \text{for } x < a, \\ 1/2 & \text{for } a \le x < b, \\ 1 & \text{for } x \ge b, \end{cases}$$
    then $CPE_\varphi(F) = \int_a^b \left(\varphi(F(x)) + \varphi(\bar{F}(x))\right) dx = 2\varphi(1/2)(b - a)$. Therefore, $F$ is $CPE_\varphi$-maximal.
    To prove the other direction of statement 3 we consider an arbitrary distribution $G$ with survival function $\bar{G}$ and support $[a, b]$. Due to $\varphi(0) = \varphi(1) = 0$ and $\varphi(u) + \varphi(1-u) \le 2\varphi(1/2)$, it holds that
    $$CPE_\varphi(G) = \int_a^b \left(\varphi(G(x)) + \varphi(\bar{G}(x))\right) dx \le 2\varphi(1/2)(b - a) = CPE_\varphi(F).$$
     ☐
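The three properties of Proposition 4 can be illustrated for discrete distributions, where the cdf is a step function and the integral reduces to a finite sum. The following Python sketch is a minimal illustration; the function name `cpe_discrete`, the Gini generating function, and the example distributions on $[0, 1]$ are assumptions.

```python
def cpe_discrete(points, probs, phi):
    """CPE_phi of a discrete distribution: integrate phi(F) + phi(1 - F) over the step cdf."""
    pairs = sorted(zip(points, probs))
    total, cum = 0.0, 0.0
    for (x, p), (x_next, _) in zip(pairs, pairs[1:]):
        cum += p                                   # value of F on [x, x_next)
        total += (x_next - x) * (phi(cum) + phi(1.0 - cum))
    return total

gini = lambda u: u * (1.0 - u)                     # assumed example generating function

degenerate = cpe_discrete([0.3], [1.0], gini)                        # one-point distribution
two_point = cpe_discrete([0.0, 1.0], [0.5, 0.5], gini)               # mass 1/2 at each endpoint
three_point = cpe_discrete([0.0, 0.5, 1.0], [1/3, 1/3, 1/3], gini)   # a competitor on [0, 1]

print(degenerate, two_point, three_point)
```

The degenerate distribution yields 0, and the two-point distribution yields $2\varphi(1/2)(b - a) = 0.5$, which exceeds the value of the three-point competitor.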

6.2. $CPE_\varphi$ and Oja’s Axioms for Measures of Scale

Oja ([3], p. 159) defined an MOS as follows:
Definition 2. 
Let $\mathcal{F}$ be a set of continuous distribution functions and $\preceq$ an appropriate ordering of scale on $\mathcal{F}$. $T: \mathcal{F} \to \mathbb{R}$ is called an MOS if
1. 
$T(aX + b) = |a|\, T(X)$ for all $a, b \in \mathbb{R}$, $F \in \mathcal{F}$.
2. 
$T(X_1) \le T(X_2)$ for $X_1 \sim F_1$, $X_2 \sim F_2$, $F_1, F_2 \in \mathcal{F}$ with $F_1 \preceq F_2$.
Oja [3] discussed several orderings of scale. He showed in particular that Shannon entropy and variance satisfy a partial quantile-based ordering of scale, which has been discussed by [62]. In [63,64], it was criticized that this ordering and the location-scale family of distributions on which Oja [3] focused were too restrictive, and a more general nonparametric model of dispersion based on a more general ordering of scale was discussed (cf. [65,66]). In line with [4], we focus on the scale ordering proposed by [62].
Definition 3. 
Let $F_1, F_2$ be continuous cdfs with respective quantile functions $F_1^{-1}$ and $F_2^{-1}$. $F_2$ is said to be more spread out than $F_1$ ($F_1 \preceq_1 F_2$) if
$$F_2^{-1}(u) - F_2^{-1}(v) \le F_1^{-1}(u) - F_1^{-1}(v) \quad \text{for all } 0 < u < v < 1.$$
If $F_1, F_2$ are absolutely continuous with density functions $f_1, f_2$, $\preceq_1$ can be characterized equivalently by the property that $F_2^{-1}(F_1(x)) - x$ is monotonically non-decreasing or
$$f_1(F_1^{-1}(u)) \ge f_2(F_2^{-1}(u)), \quad u \in [0,1]$$
(cf. [3], p. 160).
Next, we show that $CPE_\varphi$ is an MOS in the sense of [3]. The following lemma examines the behavior of $CPE_\varphi$ with respect to affine linear transformations, referring to the first axiom of Definition 2:
Lemma 2. 
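Let $F$ be the cdf of the random variable $X$. Then
$$CPE_\varphi(aX + b) = |a|\, CPE_\varphi(X).$$
Proof. 
For $Y = aX + b$, it follows that
$$\int_{-\infty}^{\infty} \varphi(P(Y \le y))\, dy = \begin{cases} \displaystyle\int_{-\infty}^{\infty} \varphi\left(P\left(X \le \frac{y - b}{a}\right)\right) dy & \text{for } a > 0, \\[2ex] \displaystyle\int_{-\infty}^{\infty} \varphi\left(P\left(X > \frac{y - b}{a}\right)\right) dy & \text{for } a < 0. \end{cases}$$
Substitution of $x = (y - b)/a$ with $dy = a\, dx$ gives
$$\int_{-\infty}^{\infty} \varphi(P(Y \le y))\, dy = \begin{cases} a \displaystyle\int_{-\infty}^{\infty} \varphi(P(X \le x))\, dx & \text{for } a > 0, \\[2ex] -a \displaystyle\int_{-\infty}^{\infty} \varphi(P(X > x))\, dx & \text{for } a < 0. \end{cases}$$
Likewise, we have that
$$\int_{-\infty}^{\infty} \varphi(P(Y > y))\, dy = \begin{cases} a \displaystyle\int_{-\infty}^{\infty} \varphi(P(X > x))\, dx & \text{for } a > 0, \\[2ex] -a \displaystyle\int_{-\infty}^{\infty} \varphi(P(X \le x))\, dx & \text{for } a < 0, \end{cases}$$
such that
$$CPE_\varphi(aX + b) = |a|\, CPE_\varphi(X).$$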
 ☐
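Lemma 2 can be observed directly on a sample if $CPE_\varphi$ is evaluated for the empirical cdf. The Python sketch below is an illustration under our own assumptions (function name `cpe_empirical`, Gini generating function, simulated Gaussian data, arbitrary values of $a$ and $b$).

```python
import random

def cpe_empirical(sample, phi):
    """CPE_phi of the empirical cdf: sum over the gaps between order statistics."""
    xs = sorted(sample)
    n = len(xs)
    # On the gap between the j-th and (j+1)-th order statistic the empirical cdf equals j / n.
    return sum((xs[j] - xs[j - 1]) * (phi(j / n) + phi(1.0 - j / n)) for j in range(1, n))

phi = lambda u: u * (1.0 - u)          # Gini case, assumed for the example
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(500)]

a, b = -2.5, 4.0                       # arbitrary affine transformation with a < 0
lhs = cpe_empirical([a * xi + b for xi in x], phi)
rhs = abs(a) * cpe_empirical(x, phi)

print(lhs, rhs)  # equal up to floating-point rounding
```

The shift $b$ cancels in the gaps, and a negative $a$ merely reverses the order of the gaps, whose weights $\varphi(j/n) + \varphi(1 - j/n)$ are symmetric.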
In order to satisfy the second axiom of Oja’s definition of a measure of scale, $CPE_\varphi$ has to satisfy the ordering of scale $\preceq$. This is shown by the following lemma:
Lemma 3. 
Let $F_1$ and $F_2$ be continuous cdfs of the random variables $X_1$ and $X_2$ with $F_1 \preceq_1 F_2$. Then the following holds:
$$CPE_\varphi(X_1) \le CPE_\varphi(X_2).$$
Proof. 
One can show with $u = F_i(x)$ that
$$CPE_\varphi(F_i) = \int_0^1 \varphi(u)\, \frac{1}{f_i(F_i^{-1}(u))}\, du + \int_0^1 \varphi(1-u)\, \frac{1}{f_i(F_i^{-1}(u))}\, du$$
for $i = 1, 2$. Therefore,
$$CPE_\varphi(F_1) - CPE_\varphi(F_2) = \int_0^1 \varphi(u) \left(\frac{1}{f_1(F_1^{-1}(u))} - \frac{1}{f_2(F_2^{-1}(u))}\right) du + \int_0^1 \varphi(1-u) \left(\frac{1}{f_1(F_1^{-1}(u))} - \frac{1}{f_2(F_2^{-1}(u))}\right) du.$$
If $F_1 \preceq_1 F_2$ and hence $f_1(F_1^{-1}(u)) \ge f_2(F_2^{-1}(u))$ for $u \in [0,1]$, it follows that $CPE_\varphi(F_1) - CPE_\varphi(F_2) \le 0$. ☐
As a consequence of Lemma 2 and Lemma 3, $CPE_\varphi$ is an MOS in the sense of [3]. Thus, not only variance, differential entropy, and other statistical measures have the properties of measures of scale, but also $CPE_\varphi$.

6.3. $CPE_\varphi$ and Transformations

Ebrahimi et al. ([4], p. 323) considered cdfs $F_1, F_2$ on domains $D_1, D_2$ with density functions $f_1, f_2$, which are connected via a differentiable transformation $g: D_1 \to D_2$, that is, $F_2(y) = F_1(g^{-1}(y))$ and $f_2(y) = f_1(g^{-1}(y))\, dg^{-1}(y)/dy$ for $y \in D_2$. They demonstrated for Shannon’s differential entropy $H$ that the transformation only affects the difference:
$$H(f_2) = H(f_1) - \int_{D_2} \ln\left(\frac{dg^{-1}(y)}{dy}\right) f_2(y)\, dy.$$
For $CPE_\varphi$, one gets a less explicit relationship between $CPE_\varphi(F_2)$ and $CPE_\varphi(F_1)$:
$$CPE_\varphi(F_2) = \int_{D_1} \left(\varphi(F_1(y)) + \varphi(\bar{F}_1(y))\right) \frac{dg(y)}{dy}\, dy.$$
Transformations with $|g'(y)| \ge 1$, $y \in D_1$, are of special interest since these transformations do not diminish measures of scale. In Theorem 1, Ebrahimi et al. [4] showed that $F_1 \preceq_1 F_2$ holds if $|g'(y)| \ge 1$ for $y \in D_1$. Hence, no MOS can be diminished by this specific transformation, in particular neither Shannon entropy nor $CPE_\varphi$.
Ebrahimi et al. [4] considered the special transformation $g(x) = ax + b$, $x \in D_1$. They showed that Shannon’s differential entropy is shifted additively by this transformation, which is not expected for an MOS. Furthermore, the standard deviation is changed by the factor $|a|$, which is also true for $CPE_\varphi$, as shown in Lemma 2.

6.4. $CPE_\varphi$ for Sums of Independent Random Variables

As is generally known, variance and differential entropy behave additively for the sum of independent random variables $X$ and $Y$. More general entropies such as the Rényi or the Havrda & Charvát entropy are only subadditive (cf. [18], p. 194).
Neither the property of additivity nor the property of subadditivity could be shown for cumulative paired φ-entropies. Instead, they possess the maximum property if $\varphi$ is a concave function on $[0,1]$. This means that, for two independent variables $X$ and $Y$, $CPE_\varphi(X + Y)$ is bounded from below by the maximum of the two individual entropies $CPE_\varphi(X)$ and $CPE_\varphi(Y)$. This result was shown by [46] for the cumulative residual Shannon entropy. The following theorem generalizes this result, while the proof partially follows Theorem 2 of [46].
Theorem 4. 
Let $X$ and $Y$ be independent random variables and $\varphi$ a concave function on the interval $[0,1]$ with $\varphi(0) = \varphi(1) = 0$. Then we have
$$CPE_\varphi(X + Y) \ge \max\left\{CPE_\varphi(X),\, CPE_\varphi(Y)\right\}.$$
Proof. 
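Let $X$ and $Y$ be independent random variables with distribution functions $F_X$, $F_Y$ and densities $f_X$, $f_Y$. Using the convolution formula, we immediately get
$$P(X + Y \le t) = \int_{-\infty}^{\infty} F_X(t - y) f_Y(y)\, dy = E_Y[F_X(t - Y)], \quad t \in \mathbb{R}. \tag{37}$$
Applying Jensen’s inequality for a concave function $\varphi$ to Equation (37) results in
$$E_Y\left[\varphi(F_X(t - Y))\right] \le \varphi\left(E_Y[F_X(t - Y)]\right) \tag{38}$$
and
$$E_Y\left[\varphi(\bar{F}_X(t - Y))\right] \le \varphi\left(E_Y[\bar{F}_X(t - Y)]\right). \tag{39}$$
The existence of the expectations is assumed. To prove the theorem, we begin with
$$CPE_\varphi(X + Y) = \int_{-\infty}^{\infty} \left(\varphi\left(E_Y[F_X(t - Y)]\right) + \varphi\left(E_Y[\bar{F}_X(t - Y)]\right)\right) dt.$$
By using Equations (38) and (39), setting $z = t - y$, and exchanging the order of integration, one obtains
$$CPE_\varphi(X + Y) \ge \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left(\varphi(F_X(t - y)) + \varphi(\bar{F}_X(t - y))\right) dt\, f_Y(y)\, dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left(\varphi(F_X(z)) + \varphi(\bar{F}_X(z))\right) dz\, f_Y(y)\, dy = \int_{-\infty}^{\infty} \left(\varphi(F_X(z)) + \varphi(\bar{F}_X(z))\right) dz = CPE_\varphi(X).$$
Exchanging the roles of $X$ and $Y$ gives the same bound with $CPE_\varphi(Y)$.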
 ☐
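As a concrete instance of Theorem 4, consider two independent uniform variables on $(0, 1)$, whose sum has a triangular distribution on $(0, 2)$. The Python sketch below is illustrative; the Gini generating function, the quadrature helper `midpoint`, and this choice of distributions are assumptions made for the example.

```python
def midpoint(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

phi = lambda u: u * (1.0 - u)                      # concave, phi(0) = phi(1) = 0

cdf_uniform = lambda x: min(max(x, 0.0), 1.0)      # X, Y ~ U(0, 1)

def cdf_sum(t):
    """cdf of X + Y for independent U(0, 1) variables (triangular distribution)."""
    if t <= 0.0:
        return 0.0
    if t <= 1.0:
        return 0.5 * t * t
    if t <= 2.0:
        return 1.0 - 0.5 * (2.0 - t) ** 2
    return 1.0

cpe = lambda cdf, a, b: midpoint(lambda x: phi(cdf(x)) + phi(1.0 - cdf(x)), a, b)

cpe_x = cpe(cdf_uniform, 0.0, 1.0)     # the same value holds for Y
cpe_sum = cpe(cdf_sum, 0.0, 2.0)

print(cpe_x, cpe_sum)  # 1/3 and 7/15 up to quadrature error
```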
In the context of uncertainty theory, Liu [6] considered a different definition of independence for uncertain variables, leading to the simpler additivity property
$$CPE_\varphi(X + Y) = CPE_\varphi(X) + CPE_\varphi(Y)$$
for independent uncertain variables $X$ and $Y$.

7. Estimation of $CPE_\varphi$

Beirlant et al. [67] presented an overview of differential entropy estimators. Essentially, all proposals are based on the estimation of a density function $f$ and thus inherit all typical problems of nonparametric density estimation. Among others, the problems are bias, the choice of a kernel, and the optimal choice of the smoothing parameter (cf. [68], p. 215ff.). However, $CPE_\varphi$ is based on the cdf $F$, for which several natural estimators with desirable stochastic properties, derived from the theorem of Glivenko and Cantelli (cf. [69], p. 61), exist. For a simple random sample $(X_1, \ldots, X_n)$ of independent random variables with identical distribution function $F$, the authors of [8,9] estimated $F$ using the empirical distribution function $F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x)$ for $x \in \mathbb{R}$. Moreover, they showed for the cumulative entropy $CE(F) = -\int_{\mathbb{R}} F(x) \ln F(x)\, dx$ that the estimator $CE(F_n)$ is consistent for $CE(F)$ (cf. [8]). In particular, for $F$ being the distribution function of a uniform distribution, they provided the expected value of the estimator and demonstrated that the estimator is asymptotically normal. For $F$ being the cdf of an exponential distribution, they additionally derived the variance of the estimator.
In the following, we generalize the estimation approach of [8] by embedding it into the well-established theory of L-estimators (cf. [70], p. 55ff.). If $\varphi$ is differentiable, then $CPE_\varphi$ can be represented as the covariance between the random variable $X$ and $\varphi'(\bar{F}(X)) - \varphi'(F(X))$:
$$CPE_\varphi(F) = E\left[X \left(\varphi'(\bar{F}(X)) - \varphi'(F(X))\right)\right].$$
A natural estimator for this covariance is
$$CPE_\varphi(F_n) = \frac{1}{n} \sum_{i=1}^{n} X_i \left(\varphi'(1 - F_n(X_i)) - \varphi'(F_n(X_i))\right) = \frac{1}{n} \sum_{i=1}^{n} X_{n:i} \left(\varphi'(1 - F_n(X_{n:i})) - \varphi'(F_n(X_{n:i}))\right) = \frac{1}{n} \sum_{i=1}^{n} \left(\varphi'\left(1 - \frac{i}{n+1}\right) - \varphi'\left(\frac{i}{n+1}\right)\right) X_{n:i} = \sum_{i=1}^{n} c_{ni} X_{n:i},$$
where
$$c_{ni} = \frac{1}{n} \left(\varphi'\left(1 - \frac{i}{n+1}\right) - \varphi'\left(\frac{i}{n+1}\right)\right), \quad i = 1, 2, \ldots, n.$$
This results in an L-estimator $\frac{1}{n} \sum_{i=1}^{n} J(i/(n+1))\, X_{n:i}$ with $J(u) = \varphi'(1-u) - \varphi'(u)$, $u \in (0,1)$. By applying known results for the influence functions of L-estimators (cf. [70]), we get for the influence function of $CPE_\varphi$:
$$IF(x; CPE_\varphi, F) = \int_0^1 \frac{u}{f(F^{-1}(u))} \left(\varphi'(1-u) - \varphi'(u)\right) du - \int_{F(x)}^{1} \frac{1}{f(F^{-1}(u))} \left(\varphi'(1-u) - \varphi'(u)\right) du.$$
In particular, the derivative is
$$\frac{d\, IF(x; CPE_\varphi, F)}{dx} = \varphi'(\bar{F}(x)) - \varphi'(F(x)), \quad x \in \mathbb{R}.$$
This means that the influence function is completely determined by the antiderivative of $\varphi'(\bar{F}(x)) - \varphi'(F(x))$. The following examples demonstrate that the influence function of $CPE_\varphi$ can easily be calculated if the underlying distribution $F$ is logistic. We consider the Shannon, the Gini, and the α-entropy cases.
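The L-estimator with weights $c_{ni}$ described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function name `cpe_l_estimator` and the small data set are assumptions, and the derivative $\varphi'$ is passed in as a callable.

```python
import math

def cpe_l_estimator(sample, dphi):
    """L-estimator: sum of c_ni * X_(i) with c_ni = (dphi(1 - i/(n+1)) - dphi(i/(n+1))) / n."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        u = i / (n + 1)
        total += (dphi(1.0 - u) - dphi(u)) * x
    return total / n

# Derivatives of two generating functions (Shannon and Gini cases).
dphi_shannon = lambda u: -math.log(u) - 1.0
dphi_gini = lambda u: 1.0 - 2.0 * u

data = [1.0, 2.0, 3.0]
gini_estimate = cpe_l_estimator(data, dphi_gini)        # weights are -1/3, 0, 1/3
shannon_estimate = cpe_l_estimator(data, dphi_shannon)

print(gini_estimate, shannon_estimate)
```

Because the weights $c_{ni}$ are antisymmetric and sum to zero, the estimate does not change when a constant is added to every observation.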
Example 2. 
Beginning with the derivative
$$\frac{d\, IF(x; CPE_S, F)}{dx} = \varphi'(\bar{F}(x)) - \varphi'(F(x)) = \ln\frac{F(x)}{\bar{F}(x)} = x, \quad x \in \mathbb{R},$$
we arrive at
$$IF(x; CPE_S, F) = \frac{1}{2} x^2 + C, \quad x \in \mathbb{R}.$$
The influence function is unbounded and proportional to the influence function of the variance, which implies that the variance and $CPE_S$ have a similar asymptotic and robustness behavior. The integration constant $C$ has to be determined such that $E[IF(X; CPE_S, F)] = 0$:
$$C = -\frac{1}{2} E(X^2) = -\frac{1}{2} \cdot \frac{\pi^2}{3} = -\frac{\pi^2}{6}.$$
Example 3. 
Using the Gini entropy $CPE_G$ and the logistic distribution function $F$ we have
$$\frac{d\, IF(x; CPE_G, F)}{dx} = \varphi'(\bar{F}(x)) - \varphi'(F(x)) = 2(2F(x) - 1) = 2\, \frac{e^x - 1}{e^x + 1} = 2 \tanh\frac{x}{2}, \quad x \in \mathbb{R}.$$
Integration gives the influence function
$$IF(x; CPE_G, F) = 4 \ln\cosh\frac{x}{2} + C, \quad x \in \mathbb{R}.$$
By applying numerical integration we get $C = -1.2274$.
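The integration constant of Example 3 can be reproduced with a few lines of Python. The sketch below is our own illustration: it centres $4 \ln\cosh(X/2)$ under the standard logistic distribution by midpoint quadrature and compares the result with the closed form $-4(1 - \ln 2) \approx -1.2274$, which follows from $\ln\cosh(x/2) = -\tfrac{1}{2}\ln\left(F(x)\bar{F}(x)\right) - \ln 2$ and agrees with the constant used in Example 6. The helper names and the integration range are assumptions.

```python
import math

def logistic_pdf(x):
    """Density of the standard logistic distribution, written to avoid overflow."""
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2

def expectation(g, lo=-60.0, hi=60.0, n=200_000):
    """Midpoint approximation of E[g(X)] for a standard logistic X."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += g(x) * logistic_pdf(x)
    return total * h

# The constant C centres the influence function: E[4 ln cosh(X/2) + C] = 0.
C_numeric = -expectation(lambda x: 4.0 * math.log(math.cosh(x / 2.0)))
C_closed_form = -4.0 * (1.0 - math.log(2.0))

print(C_numeric, C_closed_form)
```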
Example 4. 
For $\varphi(u) = u(u^{\alpha-1} - 1)/(1-\alpha)$ the derivative of the influence function is given by
$$\frac{d\, IF(x; CPE_\alpha, F)}{dx} = \varphi'(\bar{F}(x)) - \varphi'(F(x)) = \frac{\alpha}{1-\alpha}\, \frac{1 - e^{(\alpha-1)x}}{(1 + e^x)^{\alpha-1}} = \frac{\alpha}{1-\alpha} \left(\frac{1}{(1 + e^x)^{\alpha-1}} - \frac{1}{(1 + e^{-x})^{\alpha-1}}\right), \quad x \in \mathbb{R}.$$
Integration leads to the influence function
I F ( x ; C P E α , F ) = 2 F 1 ( α , α ; α + 1 ; - e - x ) e α x α 1 + e - x + 1 e x + 1 α + 1 α - 1 1 + e x + e ( α - 1 ) x ( e x + 1 ) α - 1 ,
where
2 F 1 ( α , α ; α + 1 ; - e - x ) = α 0 1 t α - 1 1 + t e - x - α d t + C , x R .
Under certain conditions (cf. [71], p. 143) concerning $J$, or $\varphi$ and $F$, L-estimators are consistent and asymptotically normal. Thus, the estimator of the cumulative paired φ-entropy satisfies
$$CPE_\varphi(F_n) \overset{asy}{\sim} N\left(CPE_\varphi(F),\; \frac{1}{n} A(F, CPE_\varphi)\right)$$
with asymptotic variance
$$A(F, CPE_\varphi) = Var\left(IF(X; CPE_\varphi(F), F)\right) = \int_{-\infty}^{\infty} \left(\int_{F(x)}^{1} \frac{\varphi'(1-u) - \varphi'(u)}{f(F^{-1}(u))}\, du\right)^2 f(x)\, dx.$$
The following examples consider the Shannon and the Gini case, for which the condition that is sufficient to guarantee asymptotic normality can easily be checked. We consider again the cdf $F$ of the logistic distribution.
Example 5. 
For the cumulative paired Shannon entropy it holds that
$$CPE_S(F_n) \overset{asy}{\sim} N\left(CPE_S(F),\; \frac{1}{n} \cdot \frac{4}{45} \pi^4\right)$$
since
$$A(F, L) = Var\left(IF(X; CPE_\varphi(F), F)\right) = \frac{1}{4} Var(X^2) = \frac{1}{4} \left(E(X^4) - \left(E(X^2)\right)^2\right) = \frac{4}{45} \pi^4.$$
Example 6. 
In the Gini case we get
$$CPE_G(F_n) \overset{asy}{\sim} N\left(CPE_G(F),\; \frac{1}{n} \cdot 2.8405\right)$$
since by numerical integration
$$A(F, L) = \int_{-\infty}^{\infty} \left(4 \ln\cosh\frac{x}{2} - 1.2274\right)^2 \frac{e^{-x}}{(1 + e^{-x})^2}\, dx = 2.8405.$$
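Both asymptotic variances of Examples 5 and 6 can be reproduced numerically. The following Python sketch is illustrative only; it integrates the squared influence functions against the standard logistic density by midpoint quadrature. The helper names, the integration range, and the closed-form centring constant $4(1 - \ln 2)$ in the Gini case are assumptions of this example.

```python
import math

def logistic_pdf(x):
    """Density of the standard logistic distribution, written to avoid overflow."""
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2

def expectation(g, lo=-60.0, hi=60.0, n=200_000):
    """Midpoint approximation of E[g(X)] for a standard logistic X."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += g(x) * logistic_pdf(x)
    return total * h

# Shannon case: IF(x) = x^2 / 2 - pi^2 / 6.
var_shannon = expectation(lambda x: (0.5 * x * x - math.pi ** 2 / 6.0) ** 2)

# Gini case: IF(x) = 4 ln cosh(x / 2) - 4 (1 - ln 2).
c = 4.0 * (1.0 - math.log(2.0))
var_gini = expectation(lambda x: (4.0 * math.log(math.cosh(x / 2.0)) - c) ** 2)

print(var_shannon, 4.0 * math.pi ** 4 / 45.0, var_gini)
```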
It is known that L-estimators can have a considerable small-sample bias. Following [72], the bias can be reduced by applying the jackknife method. It is well known that asymptotic distributions can be used to construct approximate confidence intervals and that they can be applied for hypothesis tests in the one- or two-sample case. In [70] (p. 116ff.), asymptotically efficient L-estimators for a parameter of scale $\theta$ are discussed. Klein et al. [73] examine how the entropy generating function $\varphi$ is determined by the requirement that $CPE_\varphi(F_n)$ has to be asymptotically efficient.
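The jackknife bias correction mentioned above can be sketched generically. The Python code below is a minimal illustration under our own assumptions: it uses the standard leave-one-out jackknife, which is not necessarily the exact variant of [72], and applies it to a Gini-type L-estimator. The function names and the data are hypothetical.

```python
def jackknife_bias_corrected(sample, estimator):
    """Jackknife correction: n * theta_hat - (n - 1) * mean of the leave-one-out estimates."""
    n = len(sample)
    full = estimator(sample)
    leave_one_out = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    return n * full - (n - 1) * sum(leave_one_out) / n

def cpe_gini_l_estimator(sample):
    """L-estimator of the Gini-type CPE with weights (4 i / (n + 1) - 2) / n."""
    xs = sorted(sample)
    n = len(xs)
    return sum((4.0 * i / (n + 1) - 2.0) * x for i, x in enumerate(xs, start=1)) / n

data = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7]       # arbitrary example data
raw = cpe_gini_l_estimator(data)
corrected = jackknife_bias_corrected(data, cpe_gini_l_estimator)

print(raw, corrected)
```

The same wrapper works for any estimator that accepts a list. As a sanity check, applying it to the plug-in variance (divisor $n$) returns the usual unbiased sample variance (divisor $n - 1$).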

8. Related Concepts

Several statistical concepts are closely related to cumulative paired φ-entropies. These concepts generalize some results which are known from the literature. We begin with the cumulative paired φ-divergence that was discussed for the first time by [41], who called it “generalized cross entropy”. Their focus was on uncertain variables, whereas ours is on random variables. The second concept generalizes mutual information, which is defined for Shannon’s differential entropy, to mutual φ-information. We consider two random variables $X$ and $Y$. The task is to decompose $CPE_\varphi(Y)$ into two kinds of variation such that the so-called external variation measures how much of $CPE_\varphi(Y)$ can be explained by $X$. This procedure mimics the well-known decomposition of variance and allows one to define directed measures of dependence for $X$ and $Y$. The third concept deals with dependence. More precisely, we introduce a new family of correlation coefficients that measure the strength of a monotonic relationship between $X$ and $Y$. Well-known coefficients like the Gini correlation can be embedded in this approach. The fourth concept treats the problem of linear regression. $CPE_\varphi$ can serve as a general measure of dispersion that has to be minimized to estimate the regression coefficients. This approach will be identified as a special case of rank-based regression or R regression. Here, the robustness properties of the rank-based estimator can directly be derived from the entropy generating function $\varphi$. Moreover, asymptotics can be derived from the theory of rank-based regression. The last concept we discuss applies $CPE_\varphi$ to linear rank tests for the difference of scale. Known results, especially concerning the asymptotics, can be transferred from the theory of linear rank tests to this new class of tests. In this paper, we only sketch the main results and focus on examples.
For a detailed discussion including proofs we refer to a series of papers by Klein and Mangold ([73,74,75]), which are currently work in progress.

8.1. Cumulative Paired φ-Divergence

Let φ be a concave function defined on $[0, \infty)$ with $\varphi(0) = \varphi(1) = 0$. Additionally, we need the convention $0\,\varphi(0/0) = 0$. In the literature, φ-divergences are defined for convex functions φ (cf., e.g., [76], p. 5). Consequently, we consider $-\varphi$ with φ concave.
The cumulative paired φ-divergence for two random variables is defined as follows.
Definition 4. 
Let X and Y be two random variables with cdfs $F_X$ and $F_Y$. Then the cumulative paired φ-divergence of X and Y is given by
$$CPD_\varphi(X,Y) = -\int_{-\infty}^{\infty} \left[ F_Y(x)\,\varphi\!\left(\frac{F_X(x)}{F_Y(x)}\right) + \bar F_Y(x)\,\varphi\!\left(\frac{\bar F_X(x)}{\bar F_Y(x)}\right) \right] dx .$$
The following examples introduce cumulative paired φ-divergences for the Shannon, the α-entropy, the Gini, and the Leik cases:
Example 7. 
1. 
Considering $\varphi(u) = -u \ln u$, $u \in [0, \infty)$, we obtain the cumulative paired Shannon divergence
$$CPD_S(X,Y) = \int_{-\infty}^{\infty} \left[ F_X(x) \ln\frac{F_X(x)}{F_Y(x)} + \bar F_X(x) \ln\frac{\bar F_X(x)}{\bar F_Y(x)} \right] dx .$$
2. 
Setting $\varphi(u) = u(u^{\alpha-1} - 1)/(1-\alpha)$, $u \in [0, \infty)$, leads to the cumulative paired α-divergence
$$CPD_\alpha(X,Y) = \frac{1}{\alpha-1} \int_{-\infty}^{\infty} \left[ F_X(x)^{\alpha} F_Y(x)^{1-\alpha} + \bar F_X(x)^{\alpha} \bar F_Y(x)^{1-\alpha} - 1 \right] dx .$$
3. 
For $\alpha = 2$ we obtain as a special case the cumulative paired Gini divergence
$$CPD_G(X,Y) = \int_{-\infty}^{\infty} \left[ \frac{F_X(x)^2}{F_Y(x)} + \frac{\bar F_X(x)^2}{\bar F_Y(x)} - 1 \right] dx = \int_{-\infty}^{\infty} \frac{(F_X(x) - F_Y(x))^2}{F_Y(x)\,\bar F_Y(x)}\, dx .$$
4. 
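The choice $\varphi(u) = 1/2 - |u - 1/2|$, $u \in [0, 1]$, leads to the cumulative paired Leik divergence
$$CPD_L(X,Y) = -\int_{-\infty}^{\infty} \left[ F_Y(x)\left( \frac12 - \left| \frac{F_X(x)}{F_Y(x)} - \frac12 \right| \right) + \bar F_Y(x)\left( \frac12 - \left| \frac{\bar F_X(x)}{\bar F_Y(x)} - \frac12 \right| \right) \right] dx$$
$$= \int_{-\infty}^{\infty} \left[ -\frac12 + \left| F_X(x) - \frac12 F_Y(x) \right| + \left| \frac12 + \frac12 F_Y(x) - F_X(x) \right| \right] dx = \int_{-\infty}^{\infty} \left[ -\frac12 + \left| F_X(x) - \frac12 F_Y(x) \right| + \left| F_X(x) - \frac12 (1 + F_Y(x)) \right| \right] dx .$$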
$CPD_G$ is equivalent to the Anderson-Darling functional (cf. [77]) and has been used by [78] for a goodness-of-fit test, where $F_X$ represents the empirical distribution. Likewise, $CPD_S$ serves as a goodness-of-fit test statistic (cf. [79]).
Further work in this area with similar concepts was done by [80,81] under the names cumulative residual Kullback-Leibler (CRKL) information and cumulative Kullback-Leibler (CKL) information.
Based on work from [82,83,84,85] a general function φ α was discussed by [86]:
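$$\varphi_\alpha(u) = \begin{cases} (\alpha - 1 - \alpha u + u^{\alpha}) / (\alpha(1-\alpha)) & \text{for } \alpha \neq 0, 1, \\ -u(\ln u - 1) - 1 & \text{for } \alpha = 1, \\ \ln u - u + 1 & \text{for } \alpha = 0 . \end{cases}$$
Up to a multiplicative constant, $\varphi_\alpha$ includes all of the aforementioned examples. In addition, the Hellinger distance is a special case for $\alpha = 1/2$ that leads to the cumulative paired Hellinger divergence:
$$CPD_H(X,Y) = 2 \int_{-\infty}^{\infty} \left[ \left( \sqrt{F_X(x)} - \sqrt{F_Y(x)} \right)^2 + \left( \sqrt{\bar F_X(x)} - \sqrt{\bar F_Y(x)} \right)^2 \right] dx .$$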
For a strictly concave function φ, Chen et al. [41] proved that $CPD_\varphi(X,Y) \geq 0$ and that $CPD_\varphi(X,Y) = 0$ iff X and Y have identical distributions. Thus, the cumulative paired φ-divergence can be interpreted as a kind of distance between distribution functions. As an application, Chen et al. [41] mentioned the “minimum cross-entropy principle”. They proved that X follows a logistic distribution if $CPD_S$ is minimized, given that Y is exponentially distributed and the variance of X is fixed. If $F_Y$ is an empirical distribution and $F_X$ has an unknown vector of parameters θ, $CPD_\varphi$ can be minimized to obtain a point estimator for θ (cf. [87]). The large class of goodness-of-fit tests based on $CPD_\varphi$, discussed by Jager et al. [86], has already been mentioned.
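The definition and the examples above can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes two logistic distributions that differ only in location, and the function names (`cpd`, `cpd_gini_closed`, and so on) as well as the midpoint rule on $[-40, 40]$ are illustrative choices. It evaluates the general formula for $CPD_\varphi$ and compares the Gini case with the second closed form given in Example 7.

```python
import math

def logistic_cdf(x, loc):
    return 1.0 / (1.0 + math.exp(-(x - loc)))

def logistic_sf(x, loc):
    # Survivor function written out directly; 1 - cdf would lose precision in the tail.
    return 1.0 / (1.0 + math.exp(x - loc))

def phi_shannon(u):
    return -u * math.log(u) if u > 0.0 else 0.0

def phi_gini(u):
    # u (u^(alpha-1) - 1) / (1 - alpha) with alpha = 2
    return u * (1.0 - u)

def cpd(phi, loc_x, loc_y, lo=-40.0, hi=40.0, n=20000):
    """Midpoint approximation of
    -int [F_Y phi(F_X/F_Y) + Fbar_Y phi(Fbar_X/Fbar_Y)] dx."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        fx, fy = logistic_cdf(x, loc_x), logistic_cdf(x, loc_y)
        sx, sy = logistic_sf(x, loc_x), logistic_sf(x, loc_y)
        total -= fy * phi(fx / fy) + sy * phi(sx / sy)
    return total * h

def cpd_gini_closed(loc_x, loc_y, lo=-40.0, hi=40.0, n=20000):
    """Midpoint approximation of int (F_X - F_Y)^2 / (F_Y Fbar_Y) dx."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        fx, fy = logistic_cdf(x, loc_x), logistic_cdf(x, loc_y)
        total += (fx - fy) ** 2 / (fy * logistic_sf(x, loc_y))
    return total * h

print(cpd(phi_shannon, 0.0, 0.0))   # identical distributions: divergence 0
print(cpd(phi_shannon, 1.0, 0.0))   # shifted location: strictly positive
print(cpd(phi_gini, 1.0, 0.0), cpd_gini_closed(1.0, 0.0))  # two forms agree
```

The survivor functions are passed through explicitly rather than computed as $1 - F$, which keeps the ratios $\bar F_X / \bar F_Y$ stable in the far tails.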

8.2. Mutual Cumulative φ-Information

Let X and Y again be random variables with cdfs $F_X$, $F_Y$, density functions $f_X$, $f_Y$, and the conditional distribution function $F_{Y|X}$. $D_X$ and $D_Y$ denote the supports of X and Y. Then we have
$$CPE_\varphi(Y|x) = \int_{-\infty}^{\infty} \varphi\left(F_{Y|X}(y|x)\right) dy + \int_{-\infty}^{\infty} \varphi\left(1 - F_{Y|X}(y|x)\right) dy ,$$
which is the variation of Y given $X = x$. Averaging with respect to x leads to the internal variation
$$E_X(CPE_\varphi(Y|X)) = \int_{-\infty}^{\infty} CPE_\varphi(Y|x)\, f_X(x)\, dx .$$
For a concave entropy generating function φ, this internal variation cannot be greater than the total variation $CPE_\varphi(Y)$. More precisely, it holds:
  • $E_X(CPE_\varphi(Y|X)) \leq CPE_\varphi(Y)$.
  • $E_X(CPE_\varphi(Y|X)) = CPE_\varphi(Y)$ if X and Y are independent.
  • If φ is strictly concave and $E_X(CPE_\varphi(Y|X)) = CPE_\varphi(Y)$, X and Y are independent random variables.
We consider the non-negative difference
$$MCPI_\varphi(X,Y) := CPE_\varphi(Y) - E_X(CPE_\varphi(Y|X)) .$$
This expression measures the part of the variation of Y that can be explained by the variable X (= external variation) and shall be named “mutual cumulative paired φ-information” $MCPI_\varphi$ (cf. Rao et al. [46], who use the term “cross entropy”, and [50], p. 3). $MCPI_\varphi$ is the counterpart of the transinformation that is defined for Shannon’s differential entropy (cf. [60], p. 20f.). In contrast to transinformation, $MCPI_\varphi$ is not symmetric, so $MCPI_\varphi(X,Y) = MCPI_\varphi(Y,X)$ does not hold in general.
Mutual cumulative paired φ-information is the starting point for two directed measures of the strength of φ-dependence between X and Y, namely the “directed (measures) of cumulative paired φ-dependence”, $DCPD$. The first one is
$$DCPD_\varphi^1(X \to Y) = \frac{MCPI_\varphi(X,Y)}{CPE_\varphi(Y)}$$
and the second one is
$$DCPD_\varphi^2(X \to Y) = \frac{CPE_\varphi(Y)^2 - E_X\left(CPE_\varphi(Y|X)^2\right)}{CPE_\varphi(Y)^2} .$$
Both expressions measure the relative decrease in variation of Y if X is known. The domain is $[0, 1]$. The lower bound 0 is attained if Y and X are independent, while the upper bound 1 corresponds to $E_X(CPE_\varphi(Y|X)) = 0$. In this case, from $\varphi(u) > 0$ for $0 < u < 1$ and $\varphi(0) = \varphi(1) = 0$, we can conclude that the conditional distribution $F_{Y|X}(y|x)$ has to be degenerate. Thus, for every $x \in D_X$ there is exactly one $y^* \in D_Y$ with $P(Y = y^* | X = x) = 1$. Therefore, there is a perfect association between X and Y. The next example illustrates these concepts and demonstrates the advantage of considering both types of measures of dependence.
Example 8. 
Let $(X, Y)$ follow a bivariate standard Gaussian distribution with $E(X) = E(Y) = 0$, $Var(X) = Var(Y) = 1$, and $Cov(X,Y) = \rho$, $-1 < \rho < 1$. Note that X and Y follow univariate standard Gaussian distributions, whereas $X + Y$ follows a univariate Gaussian distribution with mean 0 and variance $2(1 + \rho)$. Considering this, one can conclude that
$$F_X^{-1}(u) = F_Y^{-1}(u) = \Phi^{-1}(u), \quad F_{X+Y}^{-1}(u) = \sqrt{2(1+\rho)}\, \Phi^{-1}(u), \quad u \in [0, 1] .$$
By plugging this quantile function into the defining equation of the cumulative paired φ-entropy one obtains
$$CPE_\varphi(X + Y) = \sqrt{2(1+\rho)}\, CPE_\varphi(X) \leq CPE_\varphi(X) + CPE_\varphi(Y) .$$
For $\rho \to -1$, the cumulative paired φ-entropy behaves like the variance or the standard deviation: all three measures approach 0. Hence $CPE_\varphi$ can be used as a measure of risk, since the risk can be completely eliminated in a portfolio with perfectly negatively correlated asset returns. More precisely, $CPE_\varphi$ behaves like the standard deviation rather than like the variance.
For $\rho = 0$, the variance of the sum equals the sum of the variances, but the standard deviation of the sum is smaller than the sum of the individual standard deviations. This is also true for $CPE_\varphi$.
In case of the bivariate standard Gaussian distribution, $Y | x$ is Gaussian as well with mean $\rho x$ and variance $1 - \rho^2$ for $x \in \mathbb{R}$ and $-1 < \rho < 1$. Therefore, the quantile function of $Y | x$ is
$$F_{Y|x}^{-1}(u) = \rho x + \sqrt{1 - \rho^2}\, \Phi^{-1}(u), \quad u \in [0, 1] .$$
Using this quantile function, the cumulative paired φ-entropy for the conditional random variable $Y | x$ is
$$CPE_\varphi(Y|x) = \sqrt{1 - \rho^2} \int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) \Phi^{-1}(u)\, du = \sqrt{1 - \rho^2}\, CPE_\varphi(Y) .$$
Just like the variance of $Y | x$, $CPE_\varphi(Y|x)$ does not depend on x in case of a bivariate Gaussian distribution. This implies that the internal variation is $\sqrt{1 - \rho^2}\, CPE_\varphi(Y)$ as well.
For $\rho \to 1$, the bivariate distribution becomes degenerate and the internal variation consequently approaches 0. The mutual cumulative paired φ-information is given by
$$MCPI_\varphi(X,Y) = CPE_\varphi(Y) - E_X(CPE_\varphi(Y|X)) = \left( 1 - \sqrt{1 - \rho^2} \right) CPE_\varphi(Y) .$$
$MCPI_\varphi$ takes the value 0 if and only if $\rho = 0$, in which case X and Y are independent.
The two measures of directed cumulative φ-dependence for this example are
$$DCPD_\varphi^1(X \to Y) = \frac{MCPI_\varphi(X,Y)}{CPE_\varphi(Y)} = 1 - \sqrt{1 - \rho^2}$$
and
$$DCPD_\varphi^2(X \to Y) = \frac{CPE_\varphi(Y)^2 - E_X\left(CPE_\varphi(Y|X)^2\right)}{CPE_\varphi(Y)^2} = \rho^2 .$$
ρ completely determines the values of both measures of directed dependence. If the upper bound 1 is attained, there is a perfect linear relation between Y and X.
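The two closed-form results of Example 8 can be reproduced numerically. The sketch below is illustrative only: it fixes the Gini case $\varphi(u) = u(1-u)$, picks $\rho = 0.6$ arbitrarily, and approximates $CPE_G = \int 2F(1-F)\,dx$ by a midpoint rule; the helper name `cpe_gini_normal` is hypothetical. Because $CPE_\varphi(Y|x)$ does not depend on x in the Gaussian case, averaging over X is not needed.

```python
import math
from statistics import NormalDist

def cpe_gini_normal(sigma, n=20000):
    """Midpoint approximation of CPE_G = int 2 F (1 - F) dx for N(0, sigma^2)."""
    dist = NormalDist(0.0, sigma)
    lo, hi = -12.0 * sigma, 12.0 * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        F = dist.cdf(lo + (i + 0.5) * h)
        total += 2.0 * F * (1.0 - F)
    return total * h

rho = 0.6
total_variation = cpe_gini_normal(1.0)                           # CPE_G(Y)
internal_variation = cpe_gini_normal(math.sqrt(1.0 - rho ** 2))  # CPE_G(Y | x), same for all x

dcpd1 = (total_variation - internal_variation) / total_variation
dcpd2 = (total_variation ** 2 - internal_variation ** 2) / total_variation ** 2

print(total_variation)   # close to 2 / sqrt(pi)
print(dcpd1, 1.0 - math.sqrt(1.0 - rho ** 2))
print(dcpd2, rho ** 2)
```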
As a second example we consider the dependence structure of the Farlie-Gumbel-Morgenstern copula (FGM copula). For the sake of brevity, we define a copula C as a bivariate distribution function with uniform marginals for two random variables U and V with support $[0, 1]$. For details concerning copulas see, e.g., [88].
Example 9. 
Let
$$C_{U,V}(u,v) = uv + \theta\, u(1-u)\, v(1-v), \quad u, v \in [0, 1], \ \theta \in [-1, 1],$$
be the FGM copula (cf. [88], p. 68). With
$$C_{U|V}(u|v) = \frac{\partial C(u,v)}{\partial v} = u + \theta\, u(1-u)(1-2v)$$
it holds for the conditional cumulative φ-entropy of U given $V = v$ that
$$CPE_\varphi(C_{U|V}) = \int_0^1 \left[ \varphi\left(1 - u - \theta u(1-u)(1-2v)\right) + \varphi\left(u + \theta u(1-u)(1-2v)\right) \right] du .$$
To get expressions in closed form we consider the Gini case with $\varphi(u) = u(1-u)$, $u \in [0, 1]$. After some simple calculations we have
$$CPE_G(C_{U|V}) = \frac13 - \frac{\theta^2}{15} (1 - 2v)^2, \quad v \in [0, 1] .$$
Averaging over the uniform distribution of V leads to the internal variation
$$E(CPE_G(C_{U|V})) = \frac13 - \frac{\theta^2}{45} .$$
With $CPE_G(U) = 1/3$, the mutual cumulative Gini information and the directed cumulative measure of Gini dependence are
$$MCPI_G(V \to U) = \frac{\theta^2}{45} \quad \text{and} \quad DCPD_G^1(V \to U) = \frac{\theta^2}{15} .$$
It is well-known that only a small range of dependence can be covered by the FGM copula (cf. [88], p. 129).
Hall et al. [89] discussed several methods for estimating a conditional distribution. The results can be used for estimating the mutual φ-information and the two directed measures of dependence. This will be the task of future research.
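The “simple calculations” of Example 9 can be verified by numerical integration. The following sketch is an illustration under stated assumptions: the values $\theta = 0.8$ and $v = 0.2$, the grid sizes, and the function names are arbitrary choices and do not come from the paper.

```python
def cond_cpe_gini_fgm(theta, v, n=400):
    """Midpoint approximation of int_0^1 [phi(1 - w) + phi(w)] du with
    w = C_{U|V}(u|v) = u + theta u (1 - u)(1 - 2v) and phi(u) = u (1 - u)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        w = u + theta * u * (1.0 - u) * (1.0 - 2.0 * v)
        total += 2.0 * w * (1.0 - w)   # phi(w) + phi(1 - w) in the Gini case
    return total * h

def internal_variation_fgm(theta, n_v=400):
    """Average of the conditional entropy over V ~ U(0, 1)."""
    h = 1.0 / n_v
    return sum(cond_cpe_gini_fgm(theta, (j + 0.5) * h) for j in range(n_v)) * h

theta, v = 0.8, 0.2
print(cond_cpe_gini_fgm(theta, v), 1 / 3 - theta ** 2 / 15 * (1 - 2 * v) ** 2)
print(internal_variation_fgm(theta), 1 / 3 - theta ** 2 / 45)

mcpi = 1 / 3 - internal_variation_fgm(theta)   # close to theta^2 / 45
dcpd1 = mcpi / (1 / 3)                         # close to theta^2 / 15
print(mcpi, dcpd1)
```

Even for $|\theta| = 1$ the directed measure is only $1/15$, which matches the remark that the FGM copula covers a small range of dependence.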

8.3. φ-Correlation

Schechtman et al. [90] introduced Gini correlations of two random variables X and Y with distribution functions $F_X$ and $F_Y$ as
$$\Gamma_G(X,Y) = \frac{Cov(X, F_Y(Y))}{Cov(X, F_X(X))} \quad \text{and} \quad \Gamma_G(Y,X) = \frac{Cov(Y, F_X(X))}{Cov(Y, F_Y(Y))} .$$
The denominator $Cov(X, F_X(X))$ equals $1/4$ of the Gini mean difference
$$\Delta_X = E_{X_1} E_{X_2} [ |X_1 - X_2| ] ,$$
where the expectation is taken over two independent random variables $X_1$ and $X_2$ that both have distribution function $F_X$.
Gini’s mean difference coincides with the cumulative paired Gini entropy $CPE_G(X)$ in the following way:
$$4\, Cov(X, F_X(X)) = CPE_G(X) = \int_{-\infty}^{\infty} x \left( \varphi'(\bar F_X(x)) - \varphi'(F_X(x)) \right) f_X(x)\, dx .$$
Therefore, in the same way that Gini’s mean difference can be generalized to the Gini correlation, C P E φ can be generalized to the φ-correlation.
Let X, Y be two random variables and let $CPE_\varphi(X)$, $CPE_\varphi(Y)$ be the corresponding cumulative paired φ-entropies, then
$$\Gamma_\varphi(X,Y) = \frac{E\left( X \left( \varphi'(\bar F_Y(Y)) - \varphi'(F_Y(Y)) \right) \right)}{CPE_\varphi(X)}$$
and
$$\Gamma_\varphi(Y,X) = \frac{E\left( Y \left( \varphi'(\bar F_X(X)) - \varphi'(F_X(X)) \right) \right)}{CPE_\varphi(Y)}$$
are called φ-correlations of X and Y. Since $E\left( \varphi'(\bar F_Y(Y)) - \varphi'(F_Y(Y)) \right) = 0$, the numerator is the covariance between X and $\varphi'(\bar F_Y(Y)) - \varphi'(F_Y(Y))$.
The first example verifies that the Gini correlation is a proper special case of the φ-correlation.
Example 10. 
The setting $\varphi(u) = u(1-u)$, $u \in [0, 1]$, leads to the Gini correlation, because
$$E\left( X \left( \varphi'(\bar F_Y(Y)) - \varphi'(F_Y(Y)) \right) \right) = 2 E( X (2 F_Y(Y) - 1) ) = 4 E( X (F_Y(Y) - 1/2) ) = 4 E( (X - E(X)) (F_Y(Y) - 1/2) ) = 4\, Cov(X, F_Y(Y))$$
and
$$E\left( X \left( \varphi'(\bar F_X(X)) - \varphi'(F_X(X)) \right) \right) = 4\, Cov(X, F_X(X)) .$$
The second example considers the new Shannon correlation.
Example 11. 
Set $\varphi(u) = -u \ln u$, $u \in [0, 1]$, then we get the Shannon correlation
$$\Gamma_S(X,Y) = \frac{E\left( X \ln\left( F_Y(Y) / (1 - F_Y(Y)) \right) \right)}{CPE_S(X)} .$$
If Y follows a logistic distribution with $F_Y(y) = 1/(1 + e^{-y})$, $y \in \mathbb{R}$, then $\ln( F_Y(y) / \bar F_Y(y) ) = y$. Considering this, we get
$$\Gamma_S(X,Y) = \frac{E(XY)}{CPE_S(X)} .$$
From Equation (30) we know that $CPE_S(X) = \pi / \sqrt{3}$ if X is logistically distributed with unit variance. In this specific case we get
$$\Gamma_S(X,Y) = \frac{\sqrt{3}\, E(XY)}{\pi} .$$
In the following example we introduce the α-correlation.
Example 12. 
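For $\varphi(u) = u(u^{\alpha-1} - 1)/(1-\alpha)$, $u \in [0, 1]$, we have $\varphi'(1-u) - \varphi'(u) = \frac{\alpha}{\alpha-1}\left( u^{\alpha-1} - (1-u)^{\alpha-1} \right)$ and get the α-correlation
$$\Gamma_\alpha(X,Y) = \frac{E\left( X\, \frac{\alpha}{\alpha-1} \left( F_Y(Y)^{\alpha-1} - \bar F_Y(Y)^{\alpha-1} \right) \right)}{CPE_\alpha(X)} .$$
For $F_Y(y) = 1/(1 + e^{-y})$, $y \in \mathbb{R}$, we get
$$\Gamma_\alpha(X,Y) = \frac{\alpha}{(\alpha-1)\, CPE_\alpha(X)}\, E\left( X \left[ \left( \frac{1}{1 + e^{-Y}} \right)^{\alpha-1} - \left( \frac{1}{1 + e^{Y}} \right)^{\alpha-1} \right] \right) .$$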
The authors of [90,91,92] proved that Gini correlations possess many desirable properties. In the following we give an overview of all properties which can be transferred to φ-correlations. For proofs and further details we refer to [75].
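We start with the fact that φ-correlations also have a copula representation, since the covariance in the numerator satisfies
$$Cov\left( X, \varphi'(\bar F_Y(Y)) - \varphi'(F_Y(Y)) \right) = -\int_0^1 \int_0^1 \left( C(u,v) - uv \right) \frac{1}{f_X(F_X^{-1}(u))} \left( \varphi''(1-v) + \varphi''(v) \right) du\, dv ,$$
where C denotes the copula of $(X, Y)$.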
The following examples demonstrate the copula representation for the Gini and the Shannon correlation.
Example 13. 
In the Gini case it is $\varphi''(u) + \varphi''(1-u) = -4$. This leads to
$$4\, Cov(X, F_Y(Y)) = 4 \int_0^1 \int_0^1 \left( C_{X,Y}(u,v) - uv \right) \frac{1}{f_X(F_X^{-1}(u))}\, du\, dv .$$
Example 14. 
In the Shannon case, $\varphi''(v) + \varphi''(1-v) = -1/(v(1-v))$ such that
$$Cov\left( X, \ln\frac{F_Y(Y)}{\bar F_Y(Y)} \right) = \int_0^1 \int_0^1 \frac{C_{X,Y}(u,v) - uv}{v(1-v)}\, \frac{1}{f_X(F_X^{-1}(u))}\, du\, dv .$$
The following basic properties of φ-correlations can easily be checked with the arguments applied by [90]:
  • $\Gamma_\varphi(X,Y) \in [-1, 1]$.
  • $\Gamma_\varphi(X,Y) = 1$ $(-1)$ if there is a strictly increasing (decreasing) transformation g such that $X = g(Y)$.
  • If g is strictly increasing, then $\Gamma_\varphi(X,Y) = \Gamma_\varphi(X, g(Y))$.
  • If g is affine-linear and increasing, then $\Gamma_\varphi(X,Y) = \Gamma_\varphi(g(X), Y)$.
  • If X and Y are independent, then $\Gamma_\varphi(X,Y) = \Gamma_\varphi(Y,X) = 0$.
  • If $a + bX$ and $c + dY$ are exchangeable for some constants $a, b, c, d \in \mathbb{R}$ with $b, d > 0$, then $\Gamma_\varphi(X,Y) = \Gamma_\varphi(Y,X)$.
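Several of the properties listed above can be observed on a small sample in the Gini case. The sketch below is illustrative and not the estimator studied in [90]: it simply replaces the distribution functions by ranks, assumes data without ties, and uses made-up numbers; all function names are hypothetical.

```python
import math

def ranks(values):
    """Ranks 1..n of the entries (the sketch assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def cov(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n

def gini_correlation(x, y):
    """Sample analogue of Cov(X, F_Y(Y)) / Cov(X, F_X(X)), ranks in place of cdfs."""
    return cov(x, ranks(y)) / cov(x, ranks(x))

x = [0.3, 1.2, -0.7, 2.5, 0.9, -1.4]
z = [1.0, 0.2, 0.5, 3.0, -2.0, 0.1]

print(gini_correlation(x, [math.exp(xi) for xi in x]))   # increasing g: +1
print(gini_correlation(x, [-xi ** 3 for xi in x]))       # decreasing g: -1
print(gini_correlation(x, z), gini_correlation(x, [math.exp(zi) for zi in z]))  # invariance
print(gini_correlation(x, z), gini_correlation(z, x))    # not symmetric in general
```

Only the ranks of the second argument enter the numerator, which is why a strictly increasing transformation of Y leaves the coefficient unchanged.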
In the last subsection we have seen that two directed measures of φ-dependence do not rely on φ if a bivariate Gaussian distribution is considered. The same holds for φ-correlations as will be demonstrated in the following example.
Example 15. 
Let $(X, Y)$ be a bivariate standard Gaussian random variable with Pearson correlation coefficient ρ. Then all φ-correlations coincide with ρ, as the following consideration shows:
With $E(X|y) = \rho y$ it is
$$Cov\left( X, \varphi'(\bar F_Y(Y)) - \varphi'(F_Y(Y)) \right) = E_Y\left( E_{X|Y}(X|Y) \left( \varphi'(\bar F_Y(Y)) - \varphi'(F_Y(Y)) \right) \right) = \rho\, E_Y\left( Y \left( \varphi'(\bar F_Y(Y)) - \varphi'(F_Y(Y)) \right) \right) = \rho\, CPE_\varphi(Y) = \rho\, CPE_\varphi(X) .$$
Dividing this by $CPE_\varphi(X)$ yields the result.
Weighted sums of random variables appear for example in portfolio optimization. The diversification effect concerns negative correlations between the returns of assets. Thus, the risk of a portfolio can be significantly smaller than the sum of the individual risks. Now, we analyze whether cumulative paired φ-entropies can serve as a risk measure as well. Therefore, we have to examine the diversification effect for C P E φ .
First, we display the total risk $CPE_\varphi(Y)$ as a weighted sum of individual risks. Essentially, the weights need to be the φ-correlations of the individual returns with the portfolio return: Let $Y = \sum_{i=1}^k a_i X_i$, then it holds that
$$CPE_\varphi(Y) = \sum_{i=1}^k a_i\, \Gamma_\varphi(X_i, Y)\, CPE_\varphi(X_i) .$$
For the diversification effect the total risk $CPE_\varphi(Y)$ has to be displayed as a function of the φ-correlations between $X_i$ and $X_j$, $i, j = 1, 2, \ldots, k$. A similar result was provided by [92] for the Gini correlation without proof. Let $Y = \sum_{i=1}^k a_i X_i$ and set $D_{iy} = \Gamma_\varphi(X_i, Y) - \Gamma_\varphi(Y, X_i)$, $i = 1, 2, \ldots, k$, then the following decomposition of the square of $CPE_\varphi(Y)$ holds:
$$CPE_\varphi(Y)^2 - CPE_\varphi(Y) \sum_{i=1}^k a_i D_{iy}\, CPE_\varphi(X_i) = \sum_{i=1}^k a_i^2\, CPE_\varphi(X_i)^2 + \sum_{i=1}^k \sum_{j \neq i} a_i a_j\, CPE_\varphi(X_i)\, CPE_\varphi(X_j)\, \Gamma_\varphi(X_i, X_j) .$$
This is similar to the representation for the variance of Y, where $\Gamma_\varphi(X_i, X_j)$ takes the role of the Pearson correlation and $CPE_\varphi(X_i)$ the role of the standard deviation for $i, j = 1, 2, \ldots, k$.
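The first decomposition, $CPE_\varphi(Y) = \sum_i a_i \Gamma_\varphi(X_i, Y) CPE_\varphi(X_i)$, has an exact sample analogue in the Gini case, because the sample covariance is linear in its first argument. The sketch below illustrates this with two made-up return series and hypothetical weights; ranks divided by n stand in for the distribution functions, and no ties are assumed.

```python
def ranks(values):
    """Ranks 1..n of the entries (the sketch assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def cov(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n

def gini_entropy(x):
    """Sample analogue of CPE_G(X) = 4 Cov(X, F_X(X))."""
    n = len(x)
    return 4.0 * cov(x, [r / n for r in ranks(x)])

def gini_correlation(x, y):
    return cov(x, ranks(y)) / cov(x, ranks(x))

x1 = [0.3, 1.2, -0.7, 2.5, 0.9, -1.4]
x2 = [1.0, 0.2, 0.5, 3.0, -2.0, 0.1]
weights = [0.4, 0.6]
portfolio = [weights[0] * u + weights[1] * v for u, v in zip(x1, x2)]

total_risk = gini_entropy(portfolio)
weighted_sum = sum(a * gini_correlation(x, portfolio) * gini_entropy(x)
                   for a, x in zip(weights, [x1, x2]))
undiversified = sum(a * gini_entropy(x) for a, x in zip(weights, [x1, x2]))

print(total_risk, weighted_sum)      # equal up to rounding
print(total_risk <= undiversified)   # diversification effect, since each correlation is at most 1
```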
Schechtman et al. [90] also introduced an estimator for the Gini correlation and derived its asymptotic distribution. For the proof it is useful to note that the numerator of the Gini correlation can be represented as a U-statistic. For the general case of the φ-correlation it is necessary to derive the influence function and to calculate its variance. This will be done in [75].

8.4. φ-Regression

Based on the Gini correlation, Olkin et al. [93] considered the traditional ordinary least squares (OLS) approach in regression analysis
$$Y_i = \alpha + x_i \beta + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$
where Y is the dependent variable and x is the independent variable. They modified it by minimizing, with respect to α and β, the covariance between the error term ε of the linear regression model and the ranks of ε. Ranks are the sample analogue of the theoretical distribution function $F_\varepsilon$, such that the Gini mean difference $Cov(\varepsilon, F_\varepsilon(\varepsilon))$ is at the center of this new approach to regression analysis. Olkin et al. [93] noticed that this approach is already known as “rank based regression”, or “R regression” for short, in robust statistics. In robust regression analysis the more general optimization criterion $Cov(\varepsilon, \phi(F_\varepsilon(\varepsilon)))$ has been considered, where $\phi$ denotes a strictly increasing score function (cf. [94], p. 233). The choice $\varphi'(u) = 1 - 2u$ leads to the Gini mean difference and corresponds to the scores generating function of the Wilcoxon scores. The rank based regression approach with general scores generating function $\phi(u) = \varphi'(1-u) - \varphi'(u)$, $u \in [0, 1]$, is equivalent to the generalization of the Gini regression to a so-called φ-regression based on the criterion function
$$CPE_\varphi(\varepsilon) = Cov\left( \varepsilon, \varphi'(1 - F_\varepsilon(\varepsilon)) - \varphi'(F_\varepsilon(\varepsilon)) \right) ,$$
which has to be minimized to obtain α and β. Therefore, cumulative paired φ-entropies are special cases of the dispersion function that [95,96] proposed as optimization criterion for R regression. More precisely, R estimation proceeds in two steps. In the first step
$$d_\varphi(\beta) = CPE_\varphi(y - X\beta)$$
has to be minimized with respect to β. Let $\hat\beta_\varphi$ denote this estimator. In the second step α will be estimated separately by
$$\hat\alpha_\varphi = \operatorname{med}_i \left( y_i - x_i' \hat\beta_\varphi \right) .$$
The authors of [97,98] gave an overview of recent developments in rank based regression. We will apply their main results to φ-regression. In [99], the authors showed that the following property holds for the influence function of $\hat\beta_\varphi$:
$$IF(x_0, y_0; \hat\beta_\varphi, F_{Y,X}) = \tau_\varphi \left( (X'X)/n \right)^{-1} \left( \varphi'(\bar F_\varepsilon(y_0)) - \varphi'(F_\varepsilon(y_0)) \right) x_0 ,$$
where $(x_0, y_0)$ represents an outlier. φ determines the influence of an outlier in the dependent variable on the estimator $\hat\beta_\varphi$.
The scale parameter $\tau_\varphi$ is given by
$$\tau_\varphi = \left( -\int_0^1 \left( \varphi'(1-u) - \varphi'(u) \right) \frac{f_\varepsilon'(F_\varepsilon^{-1}(u))}{f_\varepsilon(F_\varepsilon^{-1}(u))}\, du \right)^{-1} .$$
The influence function shows that $\hat\beta_\varphi$ is asymptotically normal:
$$\hat\beta_\varphi \overset{asy}{\sim} N\left( \beta, \tau_\varphi^2 (X'X)^{-1} \right) .$$
For $\varphi'(1-u) - \varphi'(u)$ bounded, Koul et al. [100] proposed a consistent estimator $\hat\tau_\varphi$ for the scale parameter $\tau_\varphi$. This asymptotic property can again be used to construct approximate confidence limits for the regression coefficients, to derive a Wald test for the general linear hypothesis, to derive a goodness-of-fit test, and to define a measure of determination (cf. [97]).
Gini regression corresponds to $CPE_G(\varepsilon, F_\varepsilon(\varepsilon))$. In the same way we can derive from $CPE_S(\varepsilon, F_\varepsilon(\varepsilon))$ the new Shannon regression, from $CPE_\alpha(\varepsilon, F_\varepsilon(\varepsilon))$ the α-regression, and from $CPE_L(\varepsilon, F_\varepsilon(\varepsilon))$ the Leik regression.
The R package “Rfit” has the option to include individual φ-functions into rank based regression (cf. [97]). Using this option and the dataset “telephone”, which is available with several outliers in “Rfit”, we compare the fit of the Shannon regression ($\alpha \to 1$), the Leik regression, and the α-regression (for several values of α) with the OLS regression. Figure 3 shows on the left the original data, the OLS, and the Shannon regression, while on its right side outliers were excluded to get a more detailed impression of the differences between the φ-regressions.
In comparison with the very sensitive OLS regression, all rank based regression techniques behave similarly. In case of a known error distribution, McKean et al. [98] derived an asymptotically efficient estimator for $\tau_\varphi$. This procedure also determines the entropy generating function φ. In case of an unknown error distribution but some available information with respect to skewness and leptokurtosis, they proposed a data-driven (adaptive) procedure.
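The two-step R estimation described above can be sketched for a single regressor in a few lines. This is an illustration only and not the algorithm used by “Rfit”: it assumes the Gini score $\varphi'(1-u) - \varphi'(u) = 4u - 2$, uses made-up data with one gross outlier in the response, and finds the slope by evaluating the dispersion at all pairwise slopes, which works because the dispersion is convex and piecewise linear in the slope with kinks exactly at those values. All function names are hypothetical.

```python
from statistics import median

def score(u):
    """Gini score phi'(1 - u) - phi'(u) = 4u - 2 for phi(u) = u (1 - u)."""
    return 4.0 * u - 2.0

def dispersion(slope, x, y):
    """Rank-based dispersion sum_k a(k / (n + 1)) e_(k) of the ordered residuals."""
    resid = sorted(yi - slope * xi for xi, yi in zip(x, y))
    n = len(resid)
    return sum(score((k + 1) / (n + 1)) * e for k, e in enumerate(resid))

def r_fit(x, y):
    """Step 1: slope minimising the dispersion over all pairwise slopes.
    Step 2: intercept as the median of the residuals."""
    candidates = [(y[j] - y[i]) / (x[j] - x[i])
                  for i in range(len(x)) for j in range(i + 1, len(x))
                  if x[j] != x[i]]
    slope = min(candidates, key=lambda b: dispersion(b, x, y))
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope

def ols_fit(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

x = [float(i) for i in range(1, 11)]
y = [1.0 + 2.0 * xi for xi in x]
y[4] = 100.0                      # one gross outlier in the dependent variable

print(r_fit(x, y))                # recovers intercept 1 and slope 2
print(ols_fit(x, y))              # pulled away from the true line
```

The exhaustive search over pairwise slopes costs roughly $n^3 \log n$ operations and is only meant to make the criterion concrete; for real data a dedicated implementation is the appropriate tool.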

8.5. Two-Sample Rank Test on Dispersion

Based on $CPE_\varphi$ the linear rank statistic
$$CPE_\varphi(R) = \sum_{i=1}^n \left[ \varphi\left( \frac{R_i}{n+m+1} \right) + \varphi\left( 1 - \frac{R_i}{n+m+1} \right) \right]$$
can be used as a test statistic for alternatives of scale, where $R_1, R_2, \ldots, R_n$ are the ranks of $X_1, X_2, \ldots, X_n$ in the pooled sample $X_1, X_2, \ldots, X_n, Y_1, Y_2, \ldots, Y_m$. All random variables are assumed to be independent.
Some of the linear rank statistics which are well-known from the literature are special cases of Equation (56), as will be shown in the following examples:
Example 16. 
Let $\varphi(u) = 1/2 - |u - 1/2|$, $u \in [0, 1]$, then we have
$$CPE_L(R) = 2 \sum_{i=1}^n \left( \frac12 - \left| \frac{R_i}{n+m+1} - \frac12 \right| \right) .$$
Ansari et al. [101] suggest the statistic
$$S_{AB} = \sum_{i=1}^n \left( \frac12 (n+m+1) - \left| R_i - \frac12 (n+m+1) \right| \right)$$
as a two-sample test for alternatives of scale (cf. [102], p. 104). Evidently, we have $S_{AB} = \frac12 (n+m+1)\, CPE_L(R)$.
Example 17. 
Let $\varphi(u) = 1/4 - (u - 1/2)^2$, $u \in [0, 1]$. Consequently, we have
$$CPE_G(R) = \frac{n}{2} - 2 \sum_{i=1}^n \left( \frac{R_i}{n+m+1} - \frac12 \right)^2 ,$$
which is identical to the test statistic suggested by [103] up to an affine linear relation (cf. [68], p. 149f.). This test statistic is given by $S_M = \sum_{i=1}^n (R_i - (n+m+1)/2)^2$; thus, the resulting relation is given by
$$CPE_G(R) = \frac{n}{2} - \frac{2}{(n+m+1)^2}\, S_M .$$
Hence, the scores of the Mood test are generated by the generating function of $CPE_G$.
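The relations between $CPE_L(R)$ and the Ansari-Bradley statistic and between $CPE_G(R)$ and the Mood statistic can be checked on a small pooled sample. The sketch below uses made-up data without ties; the helper names are hypothetical.

```python
def ranks_in_pooled_sample(x, y):
    """Ranks of the x-values within the pooled sample (assumes no ties)."""
    pooled = sorted(x + y)
    return [pooled.index(xi) + 1 for xi in x]

def cpe_rank_statistic(phi, r, n_total):
    """CPE_phi(R) = sum_i [phi(R_i / (N + 1)) + phi(1 - R_i / (N + 1))], N = n + m."""
    return sum(phi(ri / (n_total + 1)) + phi(1.0 - ri / (n_total + 1)) for ri in r)

def phi_leik(u):
    return 0.5 - abs(u - 0.5)

def phi_gini(u):
    return 0.25 - (u - 0.5) ** 2

x = [0.1, -2.3, 3.1, -0.4, 1.9]          # first sample, n = 5
y = [0.5, -0.2, 0.3, 0.8, -0.6, 0.0]     # second sample, m = 6
n, N = len(x), len(x) + len(y)
r = ranks_in_pooled_sample(x, y)

s_ab = sum(0.5 * (N + 1) - abs(ri - 0.5 * (N + 1)) for ri in r)   # Ansari-Bradley
s_m = sum((ri - 0.5 * (N + 1)) ** 2 for ri in r)                  # Mood

cpe_l = cpe_rank_statistic(phi_leik, r, N)
cpe_g = cpe_rank_statistic(phi_gini, r, N)

print(r)
print(s_ab, 0.5 * (N + 1) * cpe_l)
print(cpe_g, n / 2 - 2.0 * s_m / (N + 1) ** 2)
```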
Dropping the requirement of concavity of φ, one finds analogies to other well-known test statistics.
Example 18. 
Let $\varphi(u) = 1/2 - 1/2\, (\operatorname{sign}(|u - 1/2| - 1/4) + 1)$, $u \in [0, 1]$, which is not concave on the interval $[0, 1]$. Then we have
$$CPE_\varphi(R) = n - \sum_{i=1}^n \left[ \operatorname{sign}\left( \left| \frac{R_i}{n+m+1} - \frac12 \right| - \frac14 \right) + 1 \right] ,$$
which is identical to the quantile test statistic for alternatives of scale up to an affine linear relation ([102], p. 105).
The asymptotic distribution of linear rank tests based on $CPE_\varphi$ can be derived from the theory of linear rank tests, as discussed in [102]. The asymptotic distribution under the null hypothesis is needed to make an approximate test decision at a given significance level α. The asymptotic distribution under the alternative hypothesis is needed for an approximate evaluation of the power of the test and for choosing the sample size required to detect a given effect size.
We consider the centered linear rank statistic
$$\overline{CPE}_\varphi(R) = CPE_\varphi(R) - \frac{2n}{n+m} \sum_{i=1}^{n+m} \varphi\left( \frac{i}{n+m+1} \right) .$$
Under the null hypothesis of identical scale parameters and the assumption that
$$\int_0^1 \left[ (\varphi(u) - \bar\varphi)^2 + (\varphi(u) - \bar\varphi)(\varphi(1-u) - \bar\varphi) \right] du > 0 ,$$
where $\bar\varphi = \int_0^1 \varphi(u)\, du$, the asymptotic distribution of $\overline{CPE}_\varphi(R)$ is given by
$$\overline{CPE}_\varphi(R) \overset{asy}{\sim} N\left( 0, \frac{2nm}{n+m} \int_0^1 \left[ (\varphi(u) - \bar\varphi)^2 + (\varphi(u) - \bar\varphi)(\varphi(1-u) - \bar\varphi) \right] du \right)$$
(cf. [102], p. 194, Theorem 1 and p. 195, Lemma 1).
The property of asymptotic normality of the Ansari-Bradley test and the Mood test is well-known. Therefore, we provide a new linear rank test based on cumulative paired Shannon entropy C P E S (so-called “Shannon”-test) in the following example:
Example 19. 
With $\varphi(u) = -u \ln u$, $u \in [0, 1]$, and $\bar\varphi = 1/4$ we have
$$\int_0^1 (\varphi(u) - \bar\varphi)^2 du = \int_0^1 \varphi(u)^2 du - \frac{1}{16} = \int_0^1 u^2 (\ln u)^2 du - \frac{1}{16} = \frac{2}{27} - \frac{1}{16} = \frac{5}{432}$$
and
$$\int_0^1 (\varphi(u) - \bar\varphi)(\varphi(1-u) - \bar\varphi)\, du = \int_0^1 \varphi(u) \varphi(1-u)\, du - \frac{1}{16} = \int_0^1 u(1-u) \ln u \ln(1-u)\, du - \frac{1}{16} = \frac{37 - 3\pi^2}{108} - \frac{1}{16} = \frac{121 - 12\pi^2}{432} .$$
Under the null hypothesis of identical scale, the centered linear rank statistic $\overline{CPE}_S(R)$ is asymptotically normal with variance
$$\frac{nm}{n+m} \cdot \frac{63 - 6\pi^2}{108} .$$
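The constants of Example 19 can be confirmed by numerical integration. The following sketch is illustrative only; the midpoint rule and the number of grid points are arbitrary choices.

```python
import math

def phi(u):
    """Shannon entropy generating function, with phi(0) = 0 by continuity."""
    return -u * math.log(u) if u > 0.0 else 0.0

n = 100000
h = 1.0 / n
phi_bar = 0.25
squared = 0.0
cross = 0.0
for i in range(n):
    u = (i + 0.5) * h
    a = phi(u) - phi_bar
    b = phi(1.0 - u) - phi_bar
    squared += a * a
    cross += a * b
squared *= h
cross *= h

variance_constant = 2.0 * (squared + cross)   # factor of nm / (n + m) in the variance

print(squared, 5.0 / 432.0)
print(cross, (121.0 - 12.0 * math.pi ** 2) / 432.0)
print(variance_constant, (63.0 - 6.0 * math.pi ** 2) / 108.0)
```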
If the alternative hypothesis $H_1$ for a density function $f_0$ is given by
$$f(x_1, \ldots, x_{n+m}; \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma} f_0\left( \frac{x_i}{\sigma} \right) \prod_{i=n+1}^{n+m} f_0(x_i)$$
for $\sigma > 0$ and $\sigma \neq 1$, then set
$$\varphi_1(u; f_0) = -1 - F_0^{-1}(u)\, \frac{f_0'(F_0^{-1}(u))}{f_0(F_0^{-1}(u))}$$
and assume $I(f_0) = \int_0^1 \varphi_1(u; f_0)^2 du > 0$. If $\min(n, m) \to \infty$ and $\ln \sigma\, I(f_0)\, mn/(n+m) \to b^2$ with $0 < b^2 < \infty$, $\overline{CPE}_\varphi(R)$ is asymptotically normally distributed with mean
$$-\frac{n}{n+m}\, \ln \sigma\, \frac{mn}{n+m} \int_0^1 \left[ \varphi(u) \varphi_1(u; f_0) + \varphi(1-u) \varphi_1(u; f_0) \right] du$$
and variance
$$\frac{2nm}{n+m} \int_0^1 \left[ (\varphi(u) - \bar\varphi)^2 + (\varphi(u) - \bar\varphi)(\varphi(1-u) - \bar\varphi) \right] du .$$
This result follows immediately from [102], p. 267, Theorem 1, together with the Remark on p. 268.
If $f_0$ is symmetric, $\varphi_1(u; f_0) = \varphi_1(1-u; f_0)$, $u \in [0, 1]$, holds such that
$$\int_0^1 \left( 2\bar\varphi - \varphi(u) - \varphi(1-u) \right) \varphi_1(u; f_0)\, du = -2 \int_0^1 \varphi(u) \varphi_1(u; f_0)\, du .$$
This simplifies the mean of the asymptotic normal distribution.
Since the asymptotic normality of the test statistics of the Ansari-Bradley test and the Mood test under the alternative hypothesis has been examined intensely (cf., e.g., [103,104]), we focus in the following example on the new Shannon test:
Example 20. 
Set $\varphi(u) = -u \ln u$, $u \in [0, 1]$, and let $f_0$ be the density function of a standard Gaussian distribution, such that $\varphi_1(u; f_0) = -1 + \Phi^{-1}(u)^2$ and $I(f_0) = 2$. As a consequence, we have
$$-2 \int_0^1 (-u \ln u)\left( \Phi^{-1}(u)^2 - 1 \right) du = 0.240$$
and
$$\int_0^1 \left( 1/2 + u \ln u + (1-u) \ln(1-u) \right)^2 du = \frac{63 - 6\pi^2}{108} ,$$
where the integrals have been evaluated by numerical integration. Then under the alternative Equation (58):
$$\overline{CPE}_S(R) \overset{asy}{\sim} N\left( 0.240\, \frac{n}{n+m}\, \ln \sigma\, \frac{mn}{n+m},\ \frac{63 - 6\pi^2}{108} \cdot \frac{nm}{n+m} \right) .$$
Hereafter, one can discuss the asymptotic efficiency of linear rank tests based on cumulative paired φ-entropy. If $f_0$ is the true density and
$$\rho_1 = \frac{\int_0^1 \left[ \varphi(u) \varphi_1(u; f_0) + \varphi(1-u) \varphi_1(u; f_0) \right] du}{\sqrt{ \int_0^1 \varphi_1(u; f_0)^2 du \cdot 2 \int_0^1 \left[ (\varphi(u) - \bar\varphi)^2 + (\varphi(u) - \bar\varphi)(\varphi(1-u) - \bar\varphi) \right] du }} ,$$
then $\rho_1^2$ gives the desired asymptotic efficiency (cf. [102], p. 317).
The asymptotic efficiency of the Ansari-Bradley test (and of the asymptotically equivalent Siegel-Tukey test) and of the Mood test has been analyzed by [104,105,106]. The asymptotic relative efficiency (ARE) with respect to the traditional F-test for differences in scale for two Gaussian distributions has been discussed by [103]. The asymptotic relative efficiency between the Mood test and the F-test for differences in scale has been derived by [107]. Once more, we focus on the new Shannon test.
Example 21. 
The Klotz test is asymptotically efficient for the Gaussian distribution. With $\int_0^1 \left( \Phi^{-1}(u)^2 - 1 \right)^2 du = 2$,
$$\rho_1^2 = \frac{0.24^2}{(63 - 6\pi^2)/108 \times 2} = 0.823$$
gives the asymptotic efficiency of the new Shannon test.
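The two numbers entering this efficiency can be recomputed with the Python standard library. The sketch below is an illustration under stated assumptions: it uses `statistics.NormalDist().inv_cdf` for $\Phi^{-1}$ and a midpoint rule with an arbitrary grid size, and the variable names are hypothetical.

```python
import math
from statistics import NormalDist

inv_cdf = NormalDist().inv_cdf      # standard Gaussian quantile function

n = 100000
h = 1.0 / n
mean_constant = 0.0
for i in range(n):
    u = (i + 0.5) * h
    phi1 = inv_cdf(u) ** 2 - 1.0                          # scale score of the Gaussian
    mean_constant += -2.0 * (-u * math.log(u)) * phi1
mean_constant *= h                                        # close to 0.240

fisher_information = 2.0                                  # int (Phi^{-1}(u)^2 - 1)^2 du
score_variance = (63.0 - 6.0 * math.pi ** 2) / 108.0
efficiency = mean_constant ** 2 / (fisher_information * score_variance)

print(mean_constant)
print(efficiency)                                         # close to 0.823
```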
Using a distribution that ensures the asymptotic efficiency of the Ansari-Bradley test, we compare the asymptotic efficiency of the Shannon test to the one of the Ansari-Bradley test.
Example 22. 
The Ansari-Bradley test statistic $S_{AB}$ is asymptotically efficient for the double log-logistic distribution with density function $f_0$ (cf. [102], p. 104). The Fisher information is given by
$$\int_0^1 \varphi_1(u; f_0)^2 du = \int_0^1 (2|2u - 1| - 1)^2 du = 4 \int_0^1 (2u - 1)^2 du - 1 = \frac13 .$$
Furthermore, we have
$$\int_0^1 \varphi_1(u; f_0)\left( 2\bar\varphi - \varphi(u) - \varphi(1-u) \right) du = \int_0^1 \varphi_1(u; f_0) \left( \frac12 + u \ln u + (1-u) \ln(1-u) \right) du = 2 \int_0^1 |2u - 1| \left( u \ln u + (1-u) \ln(1-u) \right) du + \frac12 = 0.102 ,$$
such that the asymptotic efficiency of the Shannon test for $f_0$ is
$$\rho_1^2 = \frac{0.102^2}{1/3 \times (63 - 6\pi^2)/108} = 0.892 .$$
These two examples show that the Shannon test has a rather good asymptotic efficiency, whether the underlying distribution has moderate tails similar to the Gaussian distribution or heavy tails like the double log-logistic distribution. Asymptotically efficient linear rank tests correspond to a distribution and a scores generating function $\phi_1$, from which we can derive an entropy generating function φ and a cumulative paired φ-entropy. This relationship will be further examined in [74].

9. Some Cumulative Paired Entropies for Selected Distribution Functions

In the following, we derive closed-form expressions for some cumulative paired φ-entropies. We mimic the procedure of ([4], p. 326) to some degree. Table 1 of their paper contains formulas of the differential entropy for the most popular statistical distributions. Several of these distributions will also be considered in the following. Since cumulative entropies depend on the distribution function or, equivalently, on the quantile function, we focus on families of distributions for which these functions have a closed-form expression. Furthermore, we only discuss standardized random variables, since the scale parameter only has a multiplicative effect on $CPE_\varphi$ and the location parameter has no effect. For the standard Gaussian distribution we provide the value of $CPE_S$ by numerical integration, since the distribution function has no closed-form expression. For the Gumbel distribution, however, there is a closed-form expression for the distribution function; nevertheless, we were unable to establish a closed form of $CPE_S$ and $CPE_G$. Therefore, we applied numerical integration in this case as well. In the following, next to the Gamma function $\Gamma(a)$ and the Beta function $B(a, b)$, we use
  • the incomplete Gamma function
    $\Gamma(x; a) = \int_0^x y^{a-1} e^{-y}\, dy$ for $x > 0$, $a > 0$,
  • the incomplete Beta function
    $B(x; a, b) = \int_0^x u^{a-1} (1-u)^{b-1}\, du$ for $0 < x < 1$, $a, b > 0$,
  • and the Digamma function
    $\psi(a) = \frac{d}{da} \ln \Gamma(a)$, $a > 0$.

9.1. Uniform Distribution

Let X have the standard uniform distribution. Then we have
$$CPE_S(X) = \frac12, \quad CPE_G(X) = \frac13, \quad CPE_L(X) = \frac12, \quad CPE_\alpha(X) = \frac{1}{\alpha + 1} .$$

9.2. Power Distribution

Let X have the Beta distribution on $[0, 1]$ with parameters $a > 0$ and $b = 1$, i.e., density function $f_X(x) = a x^{a-1}$ for $x \in [0, 1]$, then we have
C P E S ( X ) = a ( a + 1 ) 2 + ψ a + 1 a - a + 1 a ψ a + 2 a + 1 a ψ ( 1 ) , C P E G ( X ) = 2 a ( 1 + a ) ( 1 + 2 a ) , C P E L ( X ) = a a + 1 1 - 1 2 1 / a , C P E α ( X ) = 1 a ( 1 - α ) B 1 a , α + 1 - α a ( 1 - α ) ( 1 + α a ) .

9.3. Triangular Distribution with Parameter c

Let X have a triangular distribution with density function
$$f(x) = \begin{cases} 2x / c & \text{for } 0 < x < c, \\ 2(1-x)/(1-c) & \text{for } c \leq x < 1 . \end{cases}$$
Then the following holds:
C P E S ( X ) = π 2 6 + ln 2 ( 1 - ln 2 ) , C P E G ( X ) = 2 3 c 2 + ( 1 - c ) 2 - 2 5 c 3 + ( 1 - c ) 3 , C P E L ( X ) = 1 3 ( 2 - c ) - 3 - 2 3 2 1 - c , C P E α ( X ) = 1 1 - α 2 2 α + 1 c α + 1 + ( 1 - c ) α + 1 + c B c ; 1 2 , α + 1 + 1 - c B 1 - c ; 1 2 , α + 1 - 2 .

9.4. Laplace Distribution

Let X follow the Laplace distribution with density function $f_X(x) = \frac{1}{2} \exp(-|x|)$ for $x \in \mathbb{R}$, then we have
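$$
\begin{aligned}
CPE_S(X) &= \frac{\pi^2}{6} + \ln 2 \, (2 - \ln 2), \\
CPE_G(X) &= \frac{3}{2}, \\
CPE_L(X) &= 2, \\
CPE_\alpha(X) &= \frac{2}{1-\alpha} \left[ \frac{1}{\alpha 2^{\alpha}} + \sum_{i=1}^{\infty} \binom{\alpha}{i} \frac{(-1)^i}{i \, 2^i} \right].
\end{aligned}
$$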

9.5. Logistic Distribution

Let X follow the logistic distribution with distribution function $F_X(x) = 1 / (1 + \exp(-x))$ for $x \in \mathbb{R}$, then we have
$$CPE_S(X) = \frac{\pi^2}{3}, \quad CPE_G(X) = 2, \quad CPE_L(X) = 4 \ln 2, \quad CPE_\alpha(X) = \frac{2}{\alpha - 1} \left( \psi(\alpha) - \psi(1) \right).$$
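
The Digamma expression for $CPE_\alpha$ can be cross-checked numerically. For the logistic distribution the derivative of the quantile function is $1/(u(1-u))$, so the entropy equals $\int_0^1 [\varphi(u) + \varphi(1-u)] / (u(1-u)) \, du$. The sketch below compares a midpoint-rule value of this integral with the closed form for a few $\alpha > 1$. It assumes the generating function $\varphi(u) = u(u^{\alpha-1} - 1)/(1-\alpha)$ and approximates $\psi$ by a finite difference of `math.lgamma`; both function names are hypothetical.

```python
import math

def digamma(a, eps=1e-5):
    """Finite-difference approximation of psi(a); an illustrative shortcut."""
    return (math.lgamma(a + eps) - math.lgamma(a - eps)) / (2.0 * eps)

def cpe_alpha_logistic_numeric(alpha, n=200000):
    """Midpoint rule for int_0^1 [phi(u) + phi(1-u)] / (u (1-u)) du."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        # phi(u) + phi(1-u) simplifies to (u^alpha + (1-u)^alpha - 1) / (1 - alpha)
        phi_sum = (u ** alpha + (1.0 - u) ** alpha - 1.0) / (1.0 - alpha)
        total += phi_sum / (u * (1.0 - u))
    return total * h

def cpe_alpha_logistic_closed(alpha):
    """Closed form 2 / (alpha - 1) * (psi(alpha) - psi(1))."""
    return 2.0 / (alpha - 1.0) * (digamma(alpha) - digamma(1.0))

for alpha in (2.0, 2.5, 3.0):
    print(alpha, cpe_alpha_logistic_numeric(alpha), cpe_alpha_logistic_closed(alpha))
```

For $\alpha = 2$ the integrand is constant and both columns reproduce $CPE_G$; values of $\alpha$ below 1 would need a finer grid or a substitution, because the integrand is then unbounded at the endpoints.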

9.6. Tukey λ Distribution

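Let X follow the Tukey λ distribution with quantile function $F^{-1}(u) = \frac{1}{\lambda} \left[ u^{\lambda} - (1-u)^{\lambda} \right]$ for $0 < u < 1$ and $\lambda > -1$. Then the following holds:
$$
\begin{aligned}
CPE_S(X) &= \frac{2}{\lambda(\lambda+1)} \left[ \psi(\lambda+1) - \psi(1) \right], \\
CPE_G(X) &= \frac{4}{(\lambda+1)(\lambda+2)}, \\
CPE_L(X) &= 4 \left[ \frac{1}{\lambda+1} \, \frac{1}{2^{\lambda+1}} + B\!\left( \tfrac{1}{2}; 2, \lambda \right) \right], \\
CPE_\alpha(X) &= \frac{2}{1-\alpha} \left[ B(\alpha+1, \lambda) - \frac{\alpha}{\lambda(\lambda+\alpha)} \right].
\end{aligned}
$$
For $\lambda \to 0$, these expressions tend to the values of the logistic distribution; for $\lambda = 2$, they coincide with the values of the standard uniform distribution.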
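
Two special cases give a simple numerical check of the Tukey λ results. Assuming the usual form of the quantile function, $F^{-1}(u) = [u^{\lambda} - (1-u)^{\lambda}]/\lambda$, its derivative is $u^{\lambda-1} + (1-u)^{\lambda-1}$; for $\lambda = 2$ this equals 1, so the values must agree with the standard uniform distribution, and for $\lambda \to 0$ it tends to $1/(u(1-u))$, the logistic case. The sketch below evaluates $CPE_S$ and $CPE_G$ by a midpoint rule; the function names and the grid size are illustrative assumptions.

```python
import math

def phi_shannon(u):
    return -u * math.log(u) if u > 0.0 else 0.0

def phi_gini(u):
    return u * (1.0 - u)

def cpe_tukey(phi, lam, n=200000):
    """Midpoint rule for int_0^1 [phi(u) + phi(1-u)] * (u^(lam-1) + (1-u)^(lam-1)) du."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        q = u ** (lam - 1.0) + (1.0 - u) ** (lam - 1.0)  # derivative of the quantile function
        total += (phi(u) + phi(1.0 - u)) * q
    return total * h

print(cpe_tukey(phi_gini, 2.0), cpe_tukey(phi_shannon, 2.0))    # uniform case
print(cpe_tukey(phi_gini, 1e-6), cpe_tukey(phi_shannon, 1e-6))  # close to the logistic case
```

For $\lambda = 1$ the distribution is uniform on $[-1, 1]$, so every value is twice the corresponding value of the standard uniform distribution, which illustrates the multiplicative effect of the scale parameter.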

9.7. Weibull Distribution

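Let X follow the Weibull distribution with distribution function $F_X(x) = 1 - e^{-x^c}$ for $x > 0$, $c > 0$, then we have
$$
\begin{aligned}
CPE_S(X) &= \frac{1}{c} \Gamma\!\left( \frac{1}{c} \right) \left[ \frac{1}{c} + \sum_{i=1}^{\infty} \frac{1}{i} \left( \frac{1}{i^{1/c}} - \frac{1}{(i+1)^{1/c}} \right) \right], \\
CPE_G(X) &= \frac{2}{c} \Gamma\!\left( \frac{1}{c} \right) \left( 1 - \frac{1}{2^{1/c}} \right), \\
CPE_L(X) &= 2 \left[ (\ln 2)^{1/c} + \frac{1}{c} \left( \Gamma\!\left( \frac{1}{c} \right) - 2 \, \Gamma\!\left( \ln 2; \frac{1}{c} \right) \right) \right], \\
CPE_\alpha(X) &= \frac{1}{c(1-\alpha)} \Gamma\!\left( \frac{1}{c} \right) \left[ \frac{1}{\alpha^{1/c}} + \sum_{i=1}^{\infty} \binom{\alpha}{i} (-1)^i \, i^{-1/c} \right].
\end{aligned}
$$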

9.8. Pareto Distribution

Let X follow the Pareto distribution with distribution function $F_X(x) = 1 - x^{-c}$ for $x > 1$, $c > 1$, then we have
$$
\begin{aligned}
CPE_S(X) &= \frac{c}{c-1} \left[ \psi(1) - \psi\!\left( 1 - \frac{1}{c} \right) \right], \\
CPE_G(X) &= \frac{2c}{(c-1)(2c-1)}, \\
CPE_L(X) &= \frac{2c}{c-1} \left( 2^{1/c} - 1 \right), \\
CPE_\alpha(X) &= \frac{1}{1-\alpha} \left[ \frac{c\alpha}{c\alpha - 1} - \alpha \, B\!\left( \alpha, 1 - \frac{1}{c} \right) \right] \quad \text{for } c\alpha > 1.
\end{aligned}
$$

9.9. Gaussian Distribution

By means of numerical integration we calculated the following values for the standard Gaussian distribution:
$$CPE_S(X) = 1.806, \quad CPE_G(X) = 1.128, \quad CPE_L(X) = 1.596.$$
$CPE_\alpha$ for $\alpha \in [0.5, 3]$ and the standard Gaussian distribution can be seen in Figure 4.
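
The Gaussian values can be reproduced with a few lines of standard-library Python. The sketch below integrates $\varphi(F(x)) + \varphi(1 - F(x))$ over $[-10, 10]$ with a midpoint rule, using `math.erfc` for both tails. The truncation bound, the grid size and the function names are assumptions chosen for illustration. Two of the three values also have elementary reference values, namely Gini's mean difference $2/\sqrt{\pi}$ for $CPE_G$ and $2\sqrt{2/\pi}$ (twice the mean absolute deviation) for $CPE_L$.

```python
import math

def normal_cdf_pair(x):
    """Return (F(x), 1 - F(x)) for the standard Gaussian; erfc keeps both tails accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0)), 0.5 * math.erfc(x / math.sqrt(2.0))

def xlogx(p):
    """-p ln p with the convention 0 ln 0 = 0."""
    return -p * math.log(p) if p > 0.0 else 0.0

def gaussian_cpe(n=40000, bound=10.0):
    """Midpoint-rule approximations of (CPE_S, CPE_G, CPE_L) on [-bound, bound]."""
    h = 2.0 * bound / n
    s = g = l = 0.0
    for i in range(n):
        x = -bound + (i + 0.5) * h
        f, sf = normal_cdf_pair(x)
        s += xlogx(f) + xlogx(sf)   # Shannon: -F ln F - (1-F) ln(1-F)
        g += 2.0 * f * sf           # Gini: phi(u) = u (1-u), applied to F and 1-F
        l += 2.0 * min(f, sf)       # Leik: phi(u) = min(u, 1-u), applied to F and 1-F
    return s * h, g * h, l * h

cpe_s, cpe_g, cpe_l = gaussian_cpe()
print(cpe_s, cpe_g, cpe_l)
```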

9.10. Student-t Distribution

By means of numerical integration and for $\nu = 3$ degrees of freedom, we calculated the following values for the Student-t distribution:
$$CPE_S(X) = 2.947, \quad CPE_G(X) = 1.654, \quad CPE_L(X) = 2.205.$$
As can be seen in Figure 4, the heavy tails of the Student-t distribution result in a higher value for $CPE_\alpha$ as compared with the Gaussian distribution.
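
For $\nu = 3$ the distribution function of the Student-t distribution is elementary, which makes a numerical cross-check straightforward. With $x = \sqrt{3} / \tan e$ for $0 < e \le \pi/2$, the upper tail probability is $(e - \tfrac{1}{2} \sin 2e)/\pi$ and $|dx| = \sqrt{3} / \sin^2 e \; de$, so the heavy tail is mapped to a finite interval. The sketch below uses this substitution together with the symmetry of the distribution; the substitution, the grid size and the function names are choices made for this illustration. Gini's mean difference $3\sqrt{3}/\pi$ and $2 E|X| = 4\sqrt{3}/\pi$ serve as reference values for $CPE_G$ and $CPE_L$.

```python
import math

def xlogx(p):
    """-p ln p with the convention 0 ln 0 = 0."""
    return -p * math.log(p) if p > 0.0 else 0.0

def student_t3_cpe(n=100000):
    """Midpoint-rule approximations of (CPE_S, CPE_G, CPE_L) for 3 degrees of freedom."""
    h = (math.pi / 2.0) / n
    s = g = l = 0.0
    for i in range(n):
        e = (i + 0.5) * h
        p = (e - 0.5 * math.sin(2.0 * e)) / math.pi  # P(X > x), at most 1/2 on x >= 0
        w = math.sqrt(3.0) / math.sin(e) ** 2         # Jacobian |dx/de|
        s += (xlogx(p) - (1.0 - p) * math.log1p(-p)) * w
        g += 2.0 * p * (1.0 - p) * w
        l += 2.0 * p * w                              # min(F, 1-F) = p because p <= 1/2
    # The integrand is symmetric in x, so the full integral is twice the half-line integral.
    return 2.0 * s * h, 2.0 * g * h, 2.0 * l * h

t_s, t_g, t_l = student_t3_cpe()
print(t_s, t_g, t_l)
```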

10. Conclusions

A new kind of entropy has been introduced that generalizes Shannon’s differential entropy. The main difference from previous discussions of entropies is that the new entropy is defined for distribution functions instead of density functions. This paper shows that this definition has a long tradition in several scientific disciplines, such as fuzzy set theory, reliability theory, and, more recently, uncertainty theory. With only one exception, the concepts had been discussed independently within these disciplines. In addition, the theory of dispersion measures for ordered categorical variables refers to measures based on distribution functions without recognizing that some sort of entropy is implicitly applied. Using the Cauchy–Schwarz inequality, we were able to show the close relationship between the new kind of entropy, named cumulative paired φ-entropy, and the standard deviation. More precisely, the standard deviation yields an upper bound for the new entropy. Additionally, the Cauchy–Schwarz inequality can be used to derive maximum entropy distributions, provided that there are constraints specifying the values of the mean and the variance. Here, the logistic distribution plays the same key role for the cumulative paired Shannon entropy that the Gaussian distribution plays in maximizing the differential entropy. As a new result, we have demonstrated that Tukey’s λ distribution is a maximum entropy distribution when the entropy generating function φ known from the Havrda and Charvát entropy is used. Moreover, some new distributions can be derived by considering more general constraints. A change in perspective allows one to determine the entropy that is maximized by a given distribution if, e.g., the mean and the variance are known. In this context, the Gaussian distribution gives a simple solution. Since cumulative paired φ-entropy and variance are closely related, we have investigated whether the cumulative paired φ-entropy is a proper measure of scale. We have shown that it satisfies the axioms that Oja introduced for measures of scale. Several further properties, concerning the behavior under transformations or the sum of independent random variables, have been proven. We have also given first insights into how to estimate the new entropy. In addition, based on the cumulative paired φ-entropy, we have introduced new concepts such as φ-divergence, mutual φ-information, and φ-correlation; φ-regression and linear rank tests for scale alternatives were considered as well. Furthermore, formulas for certain cumulative paired φ-entropies have been derived for some popular distributions whose cdf or quantile function has a closed form.

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive criticism, which helped to improve the presentation of this paper significantly. Furthermore, we would like to thank Michael Grottke for helpful advice.

Author Contributions

Ingo Klein conceived the new entropy concept, investigated its properties and wrote an initial version of the manuscript. Benedikt Mangold cooperated especially by checking, correcting and improving the mathematical details including the proofs. He examined the entropy’s properties by simulation. Monika Doll contributed by mathematical and linguistic revision. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Burbea, J.; Rao, C.R. On the convexity of some divergence measures based on entropy functions. IEEE Trans. Inf. Theory 1982, 28, 489–495.
  2. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
  3. Oja, H. On location, scale, skewness and kurtosis of univariate distributions. Scand. J. Stat. 1981, 8, 154–168.
  4. Ebrahimi, N.; Maasoumi, E.; Soofi, E.S. Ordering univariate distributions by entropy and variance. J. Econometr. 1999, 90, 317–336.
  5. Popoviciu, T. Sur les équations algébraique ayant toutes leurs racines réelles. Mathematica 1935, 9, 129–145. (In French)
  6. Liu, B. Uncertainty Theory. Available online: http://orsc.edu.cn/liu/ut.pdf (accessed on 27 June 2016).
  7. Wang, F.; Vemuri, B.C.; Rao, M.; Chen, Y. A New & Robust Information Theoretic Measure and Its Application to Image Alignment. In Information Processing in Medical Imaging; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2003; Volume 2732, pp. 388–400.
  8. Di Crescenzo, A.; Longobardi, M. On cumulative entropies and lifetime estimation. In Methods and Models in Artificial and Natural Computation; Mira, J.M., Ed.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 132–141.
  9. Di Crescenzo, A.; Longobardi, M. On cumulative entropies. J. Stat. Plan. Inference 2009, 139, 4072–4087.
  10. Kapur, J.N. Derivation of logistic law of population growth from maximum entropy principle. Natl. Acad. Sci. Lett. 1983, 6, 429–433.
  11. Hartley, R. Transmission of information. Bell Syst. Tech. J. 1928, 7, 535–563.
  12. De Luca, A.; Termini, S. A definition of a nonprobabilistic entropy in the setting of fuzzy set theory. Inf. Control 1972, 20, 301–312.
  13. Zadeh, L. Probability measures of fuzzy events. J. Math. Anal. Appl. 1968, 23, 421–427.
  14. Pal, N.R.; Bezdek, J.C. Measuring fuzzy uncertainty. IEEE Trans. Fuzzy Syst. 1994, 2, 107–118.
  15. Rényi, A. On measures of entropy and information. In Fourth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Oakland, CA, USA, 1961; pp. 547–561.
  16. Esteban, M.D.; Morales, D. A summary on entropy statistics. Kybernetika 1995, 31, 337–346.
  17. Cichocki, A.; Amari, S. Families of alpha- beta- and gamma-divergences: Flexible and robust measures of similarities. Entropy 2010, 12, 1532–1568.
  18. Arndt, C. Information Measures; Springer: Berlin/Heidelberg, Germany, 2004.
  19. Kesavan, H.K.; Kapur, J.N. The generalized maximum entropy principle. IEEE Trans. Syst. Man Cyber. 1989, 19, 1042–1052.
  20. Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620–630.
  21. Jaynes, E.T. Information theory and statistical mechanics. II. Phys. Rev. 1957, 108, 171–190.
  22. Leik, R.K. A measure of ordinal consensus. Pac. Sociol. Rev. 1966, 9, 85–90.
  23. Vogel, H.; Dobbener, R. Ein Streuungsmaß für komparative Merkmale. Jahrbücher für Nationalökonomie und Statistik 1982, 197, 145–157. (In German)
  24. Kvålseth, T.O. Nominal versus ordinal variation. Percept. Mot. Skills 1989, 69.
  25. Berry, K.J.; Mielke, P.W. Assessment of variation in ordinal data. Percept. Mot. Skills 1992, 74, 63–66.
  26. Berry, K.J.; Mielke, P.W. Indices of ordinal variation. Percept. Mot. Skills 1992, 74, 576–578.
  27. Berry, K.J.; Mielke, P.W. A test of significance for the index of ordinal variation. Percept. Mot. Skills 1994, 79, 1291–1295.
  28. Blair, J.; Lacy, M.G. Measures of variation for ordinal data. Percept. Mot. Skills 1996, 82, 411–418.
  29. Blair, J.; Lacy, M.G. Statistics of ordinal variation. Sociol. Methods Res. 2000, 28, 251–280.
  30. Gadrich, T.; Bashkansky, E.; Zitikas, R. Assessing variation: A unifying approach for all scales of measurement. Qual. Quant. 2015, 49, 1145–1167.
  31. Allison, R.A.; Foster, J.E. Measuring health inequality using qualitative data. J. Health Econ. 2004, 23, 505–524.
  32. Zheng, B. Measuring inequality with ordinal data: A note. Res. Econ. Inequal. 2008, 16, 177–188.
  33. Abul Naga, R.H.; Yalcin, T. Inequality measurement for ordered response health data. J. Health Econ. 2008, 27, 1614–1625.
  34. Zheng, B. A new approach to measure socioeconomic inequality in health. J. Econ. Inequal. 2011, 9, 555–577.
  35. Apouey, B.; Silber, J. Inequality and bi-polarization in socioeconomic status and health: Ordinal approaches. Res. Econ. Inequal. 2013, 21, 77–109.
  36. Klein, I. Rangordnungsstatistiken als Verteilungsmaßzahlen für ordinalskalierte Merkmale: I. Streuungsmessung. In Diskussionspapiere des Lehrstuhls für Statistik und Ökonometrie der Universität Erlangen-Nürnberg; Erlangen-Nürnberg, Germany, 1999; Volume 27. (In German)
  37. Yager, R.R. Dissonance—A measure of variability for ordinal random variables. Int. J. Uncertain. Fuzzin. Knowl. Based Syst. 2001, 9, 39–53.
  38. Bowden, R.J. Information, measure shifts and distribution metrics. Statistics 2012, 46, 249–262.
  39. Dai, W. Maximum entropy principle for quadratic entropy of uncertain variables. Available online: http://orsc.edu.cn/online/100314.pdf (accessed on 27 June 2016).
  40. Dai, W.; Chen, X. Entropy of function of uncertain variables. Math. Comput. Model. 2012, 55, 754–760.
  41. Chen, X.; Kar, S.; Ralescu, D.A. Cross-entropy measure of uncertain variables. Inf. Sci. 2012, 201, 53–60.
  42. Yao, K.; Gao, J.; Dai, W. Sine entropy for uncertain variables. Int. J. Uncertain. Fuzzin. Knowl. Based Syst. 2013, 21, 743–753.
  43. Yao, K.; Ke, H. Entropy operator for membership function of uncertain set. Appl. Math. Comput. 2014, 242, 898–906.
  44. Ning, Y.; Ke, H.; Fu, Z. Triangular entropy of uncertain variables with application to portfolio selection. Soft Comput. 2015, 19, 2203–2209.
  45. Ebrahimi, N. How to measure uncertainty in the residual lifetime distribution. Sankhya Ser. A 1996, 58, 48–56.
  46. Rao, M.; Chen, Y.; Vemuri, B.C.; Wang, F. Cumulative residual entropy: A new measure of information. IEEE Trans. Inf. Theory 2004, 50, 1220–1228.
  47. Rao, M. More on a new concept of entropy and information. J. Theor. Probabil. 2005, 18, 967–981.
  48. Schroeder, M.J. An alternative to entropy in the measurement of information. Entropy 2004, 6, 388–412.
  49. Zografos, K.; Nadarajah, S. Survival exponential entropies. IEEE Trans. Inf. Theory 2005, 51, 1239–1246.
  50. Drissi, N.; Chonavel, T.; Boucher, J.M. Generalized cumulative residual entropy distributions with unrestricted supports. Res. Lett. Signal Process. 2008, 2008.
  51. Chen, X.; Dai, W. Maximum entropy principle for uncertain variables. Int. J. Fuzzy Syst. 2011, 13, 232–236.
  52. Sunoj, S.M.; Sankaran, P.G. Quantile based entropy function. Stat. Probabil. Lett. 2012, 82, 1049–1053.
  53. Zardasht, V.; Parsi, S.; Mousazadeh, M. On empirical cumulative residual entropy and a goodness-of-fit test for exponentiality. Stat. Pap. 2015, 56, 677–688.
  54. Navarro, J.; del Aguila, Y.; Asadi, M. Some new results on the cumulative residual entropy. J. Stat. Plan. Inference 2010, 140, 310–322.
  55. Psarrakos, G.; Navarro, J. Generalized cumulative residual entropy and record values. Metrika 2013, 76, 623–640.
  56. Kiesl, H. Ordinale Streuungsmaße; JOSEF-EUL-Verlag: Köln, Germany, 2003. (In German)
  57. Havrda, J.; Charvát, F. Quantification method of classification processes. Concept of structural a-entropy. Kybernetika 1967, 3, 30–35.
  58. Jumarie, G. Relative Information: Theories and Applications; Springer: Berlin/Heidelberg, Germany, 1990.
  59. Kapur, J.N. Measures of Information and their Applications; New Age International Publishers: New Delhi, India, 1994.
  60. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 1991.
  61. Kapur, J.N. Generalized Cauchy and Students distributions as maximum entropy distributions. Proc. Natl. Acad. Sci. India 1988, 58, 235–246.
  62. Bickel, P.J.; Lehmann, E.L. Descriptive statistics for nonparametric models: III. Dispersion. Ann. Stat. 1976, 4, 1139–1158.
  63. Behnen, K.; Neuhaus, G. Rank Tests with Estimated Scores and their Applications; Teubner-Verlag: Stuttgart, Germany, 1989.
  64. Burger, H.U. Dispersion orderings with applications to nonparametric tests. Stat. Probabil. Lett. 1993, 16.
  65. Bickel, P.J.; Lehmann, E.L. Descriptive statistics for nonparametric models: IV. Spread. In Contributions to Statistics; Jurečková, J., Ed.; Academic Press: New York, NY, USA, 1979; pp. 33–40.
  66. Pfanzagl, J. Asymptotic Expansions for General Statistical Models; Springer: New York, NY, USA, 1985.
  67. Beirlant, J.; Dudewicz, E.J.; Györfi, L.; van der Meulen, E.C. Nonparametric entropy estimation: An overview. Int. J. Math. Stat. Sci. 1997, 6, 17–39.
  68. Büning, H.; Trenkler, G. Nichtparametrische Statistische Methoden; de Gruyter: Berlin, Germany, 1994.
  69. Serfling, R.J. Approximation Theorems in Mathematical Statistics; John Wiley & Sons: New York, NY, USA, 1980.
  70. Huber, P.J. Robust Statistics; John Wiley & Sons: New York, NY, USA, 1981.
  71. Jurečková, J.; Sen, P.K. Robust Statistical Procedures: Asymptotics and Interrelations; John Wiley & Sons: New York, NY, USA, 1996.
  72. Parr, W.C.; Schucany, W.R. Jackknifing L-statistics with smooth weight functions. J. Am. Stat. Assoc. 1982, 77, 629–638.
  73. Klein, I.; Mangold, B. Cumulative paired φ-entropies—Estimation and Robustness. Unpublished work. 2016.
  74. Klein, I.; Mangold, B. Cumulative paired φ-entropies and two sample linear rank tests for scale alternatives. Unpublished work. 2016.
  75. Klein, I.; Mangold, B. φ-correlation and φ-regression. Unpublished work. 2016.
  76. Pardo, L. Statistical Inferences based on Divergence Measures; Chapman & Hall: Boca Raton, FL, USA, 2006.
  77. Anderson, T.W.; Darling, D.A. Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Ann. Math. Stat. 1952, 23, 193–212.
  78. Berk, R.H.; Jones, D.H. Goodness-of-fit statistics that dominate the Kolmogorov statistics. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 1979, 47, 47–59.
  79. Donoho, D.; Jin, J. Higher criticism for detecting sparse heterogeneous mixtures. Ann. Stat. 2004, 32, 962–994.
  80. Park, S.; Rao, M.; Shin, D.W. On cumulative residual Kullback–Leibler information. Stat. Probabil. Lett. 2012, 82, 2025–2032.
  81. Di Crescenzo, A.; Longobardi, M. Some properties and applications of cumulative Kullback–Leibler information. Appl. Stoch. Models Bus. Ind. 2015, 31, 875–891.
  82. Liese, F.; Vajda, I. Convex Statistical Distances; Teubner-Verlag: Leipzig, Germany, 1987.
  83. Csiszár, I. Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akad. Mat. Kutató Int. Közl. 1963, 8, 85–108. (In German)
  84. Ali, S.M.; Silvey, S.D. A general class of coefficients of divergence of one distribution from another. J. R. Stat. Soc. Ser. B 1966, 28, 131–142.
  85. Cressie, N.; Read, T. Multinomial goodness-of-fit tests. J. R. Stat. Soc. Ser. B 1984, 46, 440–464.
  86. Jager, L.; Wellner, J.A. Goodness-of-fit tests via phi-divergences. Ann. Stat. 2007, 35, 2018–2053.
  87. Parr, W.C.; Schucany, W.R. Minimum distance and robust estimation. J. Am. Stat. Assoc. 1980, 75, 616–624.
  88. Nelsen, R.B. An Introduction to Copulas; Springer: New York, NY, USA, 1999.
  89. Hall, P.; Wolff, R.C.; Yao, Q. Methods for estimating a conditional distribution function. J. Am. Stat. Assoc. 1999, 94, 154–163.
  90. Schechtman, E.; Yitzhaki, S. A measure of association based on Gini’s mean difference. Commun. Stat. Theory Methods 1987, 16, 207–231.
  91. Schechtman, E.; Yitzhaki, S. On the proper bounds of the Gini correlation. Econ. Lett. 1999, 63, 133–138.
  92. Yitzhaki, S. Gini’s mean difference: A superior measure of variability for non-normal distributions. Metron 2003, 61, 285–316.
  93. Olkin, I.; Yitzhaki, S. Gini regression analysis. Int. Stat. Rev. 1992, 60, 185–196.
  94. Hettmansperger, T.P. Statistical Inference Based on Ranks; John Wiley & Sons: New York, NY, USA, 1984.
  95. Jaeckel, L.A. Estimating regression coefficients by minimizing the dispersion of residuals. Ann. Math. Stat. 1972, 43, 1449–1458.
  96. Jurečková, J. Nonparametric estimate of regression coefficients. Ann. Math. Stat. 1971, 42, 1328–1338.
  97. Kloke, J.D.; McKean, J.W. Rfit: Rank-based estimation for linear models. R J. 2012, 4, 57–64.
  98. McKean, J.W.; Kloke, J.D. Efficient and adaptive rank-based fits for linear models with skew-normal errors. J. Stat. Distrib. Appl. 2014, 1.
  99. Hettmansperger, T.P.; McKean, J.W. Robust Nonparametric Statistical Methods; Chapman & Hall: New York, NY, USA, 2011.
  100. Koul, H.L.; Sievers, G.L.; McKean, J. An estimator of the scale parameter for the rank analysis of linear models under general score functions. Scand. J. Stat. 1987, 14, 131–141.
  101. Ansari, A.R.; Bradley, R.A. Rank-sum tests for dispersion. Ann. Math. Stat. 1960, 31, 142–149.
  102. Hájek, J.; Šidák, Z.; Sen, P.K. Theory of Rank Tests; Academic Press: San Diego, CA, USA, 1999.
  103. Mood, A.M. On the asymptotic efficiency of certain nonparametric two-sample tests. Ann. Math. Stat. 1954, 25, 514–522.
  104. Klotz, J. Nonparametric tests for scale. Ann. Math. Stat. 1962, 33, 498–512.
  105. Basu, A.P.; Woodworth, G. A note on nonparametric tests for scale. Ann. Math. Stat. 1967, 38, 274–277.
  106. Shiraishi, T.A. The asymptotic power of rank tests under scale-alternatives including contaminated distributions. Ann. Inst. Stat. Math. 1986, 38, 513–522.
  107. Sukhatme, B.V. On certain two-sample nonparametric tests for variances. Ann. Math. Stat. 1957, 28, 188–194.
Figure 1. Some entropy generating functions φ.
Figure 2. Several entropy generating functions φ derived from the generalized maximum entropy (ME) principle.
Figure 3. φ-regression fit for the number of calls in the “telephone” data set.
Figure 4. $CPE_\alpha$, $\alpha \in [0.5, 3]$, for the standard Gaussian and the Student-t distribution.