A Dual Measure of Uncertainty: The Deng Extropy

The extropy has recently been introduced as the dual concept of entropy. Moreover, in the context of the Dempster–Shafer evidence theory, Deng studied a new measure of discrimination, named the Deng entropy. In this paper, we define the Deng extropy and study its relation with Deng entropy, and examples are proposed in order to compare them. The behaviour of Deng extropy is studied under changes of focal elements. A characterization result is given for the maximum Deng extropy and, finally, a numerical example in pattern recognition is discussed in order to highlight the relevance of the new measure.


Introduction
Let X be a discrete random variable with support {x_1, . . . , x_n} and with probability mass function vector p = (p_1, . . . , p_n). The Shannon entropy of X is defined as

H(X) = -\sum_{i=1}^{n} p_i \log p_i,    (1)

where log is the natural logarithm; see [1]. It is a measure of information and discrimination about the uncertainty related to the random variable X. Lad et al. [2] proposed the extropy as the dual of entropy. This is a measure of uncertainty related to the outside and is defined as

J(X) = -\sum_{i=1}^{n} (1 - p_i) \log(1 - p_i).    (2)

Lad et al. proved the following property related to the sum of the entropy and the extropy:

H(X) + J(X) = \sum_{i=1}^{n} H(p_i, 1 - p_i) = \sum_{i=1}^{n} J(p_i, 1 - p_i),    (3)

where H(p_i, 1 − p_i) = J(p_i, 1 − p_i) = −p_i log p_i − (1 − p_i) log(1 − p_i) are the entropy and the extropy of a discrete random variable whose support has cardinality two and whose probability mass function vector is (p_i, 1 − p_i).
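For concreteness, the entropy, the extropy and the sum property of Equation (3) can be checked numerically. The following Python sketch (the function names and the example vector are our own illustrative choices) uses the natural logarithm and the convention 0 log 0 = 0.

```python
import math

def entropy(p):
    """Shannon entropy H(p) with natural logarithm; terms with p_i = 0 vanish."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def extropy(p):
    """Extropy J(p) of Lad et al.: the dual measure, built from 1 - p_i."""
    return -sum((1 - pi) * math.log(1 - pi) for pi in p if pi < 1)

p = [0.5, 0.3, 0.2]
H, J = entropy(p), extropy(p)

# Identity (3): H + J equals the sum of the binary entropies H(p_i, 1 - p_i).
binary_sum = sum(entropy([pi, 1 - pi]) for pi in p)
```

The identity holds exactly, since grouping the addends of H and J by index i reproduces each binary entropy term.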
Dempster [3] and Shafer [4] introduced a method to study uncertainty. Their theory of evidence is a generalization of the classical probability theory. In D-S theory, an uncertain event with a finite number of alternatives is considered, and a mass function over the power set of the alternatives (i.e., a degree of confidence to all of its subsets) is defined. If we give positive mass only to singletons, a probability mass function is obtained. D-S theory allows us to describe more general situations in which there is less specific information.
Here we describe an example studied in [5] to explain how D-S theory extends the classical probability theory. Consider two boxes, A and B, such that A contains only red balls and B contains only green balls, the number of balls in each box being unknown. A ball is picked randomly from one of these two boxes. Box A is selected with probability p_A = 0.6 and box B with probability p_B = 0.4. Hence, the probability of picking a red ball is P(R) = 0.6, whereas the probability of picking a green ball is P(G) = 0.4. Suppose now that box B contains both green and red balls in unknown proportions. Box A is still selected with probability p_A = 0.6 and box B with probability p_B = 0.4. In this case, we cannot compute the probability of picking a red ball. To analyze this problem, we can use D-S theory to express the uncertainty: in particular, we choose a mass function m such that m({R}) = 0.6 and m({R, G}) = 0.4.
Dempster-Shafer theory of evidence has several applications due to its advantages in dealing with uncertainty; for example, it is used in decision making [6,7], in risk evaluation [8,9], in reliability analysis [10,11] and so on. In the following, we recall the basic notions of this theory.
Let X be a frame of discernment, i.e., a set of mutually exclusive and collectively exhaustive events, indicated by X = {θ_1, θ_2, . . . , θ_{|X|}}. The power set of X is indicated by 2^X and has cardinality 2^{|X|}. A function m : 2^X → [0, 1] is called a mass function or a basic probability assignment (BPA) if

m(\emptyset) = 0 \quad \text{and} \quad \sum_{A \subseteq X} m(A) = 1.

If m(A) > 0 implies |A| = 1, then m is also a probability mass function, i.e., BPAs generalize discrete random variables. Moreover, the elements A such that m(A) > 0 are called focal elements. Given a BPA, we can evaluate for each event the pignistic probability transformation (PPT). Let us recall that the pignistic probability is the probability that a thinking being would assign to that event. It represents a point estimate of belief and can be determined as [12]

PPT(A) = \sum_{B \subseteq X,\, B \neq \emptyset} m(B) \frac{|A \cap B|}{|B|}, \quad A \subseteq X.    (4)

If we have a weight, or a reliability of evidence, represented by a coefficient α ∈ [0, 1], we can use it to generate another BPA m_α in the following way (see [4]):

m_\alpha(A) = \alpha\, m(A), \ A \subsetneq X; \qquad m_\alpha(X) = \alpha\, m(X) + 1 - \alpha.    (5)

If we have two BPAs m_1, m_2 for a frame of discernment X, we can introduce another BPA m* for X using the Dempster rule of combination; see [3]. We define m*(A), for ∅ ≠ A ⊆ X, in the following way:

m^*(A) = \frac{1}{1 - K} \sum_{B, C \subseteq X:\, B \cap C = A} m_1(B)\, m_2(C), \qquad m^*(\emptyset) = 0,    (6)

where K = ∑_{B,C⊆X: B∩C=∅} m_1(B) m_2(C) measures the conflict between the two BPAs. We remark that, if K = 1 (total conflict), we cannot apply the Dempster rule of combination.
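The notions recalled above can be sketched in code. The following Python functions (our own illustrative implementation, with BPAs represented as dictionaries from frozensets to masses) compute the pignistic probabilities of the singletons, the discounted BPA of Equation (5), and the Dempster rule of combination of Equation (6), applied to the two-box example.

```python
def pignistic(m, frame):
    """Pignistic probability of each singleton x: sum of m(B)/|B| over B containing x."""
    return {x: sum(v / len(B) for B, v in m.items() if x in B) for x in frame}

def discount(m, frame, alpha):
    """Discounting (5): scale each mass by alpha and move the mass 1 - alpha to X."""
    X = frozenset(frame)
    out = {A: alpha * v for A, v in m.items() if A != X}
    out[X] = alpha * m.get(X, 0.0) + 1 - alpha
    return out

def dempster(m1, m2):
    """Dempster rule of combination (6); fails when the conflict K equals 1."""
    raw, K = {}, 0.0
    for B, vB in m1.items():
        for C, vC in m2.items():
            A = B & C
            if A:
                raw[A] = raw.get(A, 0.0) + vB * vC
            else:
                K += vB * vC
    if K >= 1.0:
        raise ValueError("total conflict: rule not applicable")
    return {A: v / (1 - K) for A, v in raw.items()}

# The two-box example: m({R}) = 0.6, m({R, G}) = 0.4.
m = {frozenset({'R'}): 0.6, frozenset({'R', 'G'}): 0.4}
betP = pignistic(m, {'R', 'G'})   # BetP(R) = 0.6 + 0.4/2 = 0.8, BetP(G) = 0.2
```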
Recently, several measures of discrimination and uncertainty have been proposed in the literature (see, for instance, [13][14][15][16][17][18][19]). In particular, in the context of the Dempster-Shafer evidence theory, there are interesting measures of discrimination, such as the Deng entropy; it is this latter concept that suggested to us the introduction of a dual definition.
The Deng entropy was introduced in [5] for a BPA m as

E_d(m) = -\sum_{A \subseteq X:\, m(A) > 0} m(A) \log_2 \frac{m(A)}{2^{|A|} - 1}.    (7)

This entropy is similar to Shannon entropy, and they coincide if the BPA is also a probability mass function. The term 2^{|A|} − 1 represents the potential number of states in A. For a fixed value of m(A), as the cardinality of A increases, 2^{|A|} − 1 increases and then Deng entropy does as well.
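As an illustrative check, the following Python sketch computes the Deng entropy and shows the behaviour just described: moving the same total mass to a larger focal element increases the entropy (the two BPAs are our own toy examples).

```python
import math

def deng_entropy(m):
    """Deng entropy (7): -sum over focal elements A of m(A) * log2(m(A) / (2^|A| - 1))."""
    return -sum(v * math.log2(v / (2 ** len(A) - 1)) for A, v in m.items() if v > 0)

# When the BPA is a probability mass function, this is the Shannon entropy in base 2.
m_prob = {frozenset({'a'}): 0.5, frozenset({'b'}): 0.5}    # E_d = 1 bit
m_vague = {frozenset({'a', 'b'}): 1.0}                     # E_d = log2(3), larger
```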
In the literature, several properties of Deng entropy have been studied (see for instance [20]) and other measures of uncertainty based on Deng entropy have been introduced (see [21,22]). Other relevant measures of uncertainty and information known in the Dempster-Shafer theory of evidence are, for example, Hohle's confusion measure [23], Yager's dissonance measure [24] and Klir and Ramer's discord measure [25].
The aim of this paper is to dualize the Deng entropy by defining a corresponding extropy. We present some examples comparing the Deng entropy and the new extropy, together with their monotonicity properties. Then we investigate the relations between these measures and the behaviour of the Deng extropy under changes of focal elements. Moreover, a characterization result is given for the maximum Deng extropy. Finally, an application to pattern recognition is given, in which the Deng extropy makes the right recognition. In the conclusions, the results contained in the paper are summarized.

The Deng Extropy
In order to obtain an analogue of Equation (3), we choose the following definition for the Deng extropy:

EX_d(m) = -\sum_{A \subsetneq X:\, m(A) > 0} (1 - m(A)) \log_2 \frac{1 - m(A)}{2^{|A^c|} - 1},    (8)

where A^c is the complement of A in X and |A^c| = |X| − |A|. Our purpose is to apply the Deng extropy in order to measure the uncertainty related to the outside in the context of Dempster-Shafer evidence theory. For this reason, X is not involved in the determination of the Deng extropy, even when m(X) > 0. The term 2^{|A^c|} − 1 represents the potential number of states outside of A. For a fixed value of m(A), as the cardinality of A increases, 2^{|A^c|} − 1 decreases, and so does the Deng extropy.
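As a sketch (assuming the definition above, with BPAs represented as dictionaries from frozensets to masses), the following Python code computes the Deng extropy and illustrates the monotonicity just described: keeping m(A) = 0.5 fixed, with the remaining mass on X, and letting the focal element A grow, the extropy decreases.

```python
import math

def deng_extropy(m, frame):
    """Deng extropy (8): sum over focal elements A != X of
    -(1 - m(A)) * log2((1 - m(A)) / (2^|A^c| - 1))."""
    X = frozenset(frame)
    total = 0.0
    for A, v in m.items():
        if 0 < v < 1 and A != X:    # the convention 0*log 0 = 0 drops v == 1 terms
            c = len(X) - len(A)     # cardinality of the complement A^c
            total -= (1 - v) * math.log2((1 - v) / (2 ** c - 1))
    return total

frame = {1, 2, 3, 4}
vals = []
for k in (1, 2, 3):
    A = frozenset(range(1, k + 1))          # focal element of growing cardinality
    m = {A: 0.5, frozenset(frame): 0.5}     # fixed mass on A, rest on X
    vals.append(deng_extropy(m, frame))
```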

Proposition 1.
Let m be a BPA for a frame of discernment X. Then

E_d(m) + EX_d(m) = \sum_{A \subsetneq X:\, m(A) > 0} E_d(m^*_A) + m(X) \log_2 \frac{2^{|X|} - 1}{m(X)},    (9)

with the convention 0 log 0 = 0, where m^*_A is a BPA on X defined as

m^*_A(A) = m(A), \qquad m^*_A(A^c) = 1 - m(A).    (10)

Indeed, for every focal element A ≠ X,

E_d(m^*_A) = -m(A) \log_2 \frac{m(A)}{2^{|A|} - 1} - (1 - m(A)) \log_2 \frac{1 - m(A)}{2^{|A^c|} - 1},

i.e., each E_d(m^*_A) is equal to the sum of the corresponding addends of E_d(m) and EX_d(m). The only exception is given by X, which could give a contribution in the left-hand side of Equation (9) if m(X) > 0, and for this reason we have the extra term in the right-hand side of Equation (9).
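Proposition 1 can be verified numerically. The following Python sketch (the example BPA and the helper functions, which implement the Deng entropy and extropy as defined above, are our own illustrative choices) checks the identity on a frame of three elements with m(X) > 0.

```python
import math

def deng_entropy(m):
    """Deng entropy over the focal elements of m."""
    return -sum(v * math.log2(v / (2 ** len(A) - 1)) for A, v in m.items() if v > 0)

def deng_extropy(m, frame):
    """Deng extropy over the focal elements of m different from X."""
    X = frozenset(frame)
    return -sum((1 - v) * math.log2((1 - v) / (2 ** (len(X) - len(A)) - 1))
                for A, v in m.items() if 0 < v < 1 and A != X)

frame = {'a', 'b', 'c'}
X = frozenset(frame)
m = {frozenset('a'): 0.3, frozenset('ab'): 0.5, X: 0.2}

lhs = deng_entropy(m) + deng_extropy(m, frame)

# Right-hand side: Deng entropies of the two-focal-element BPAs m*_A
# plus the extra term coming from m(X) > 0.
rhs = sum(deng_entropy({A: v, X - A: 1 - v}) for A, v in m.items() if A != X)
rhs += m[X] * math.log2((2 ** len(X) - 1) / m[X])
```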
Next, we give some examples of the evaluation of the Deng extropy and entropy in different situations.

Example 1. Given a frame of discernment X, a ∈ X and a BPA m such that m({a}) = 1, we have

E_d(m) = -1 \cdot \log_2 \frac{1}{2^1 - 1} = 0, \qquad EX_d(m) = -(1 - 1) \log_2 \frac{1 - 1}{2^{|X| - 1} - 1} = 0,

the latter by the convention 0 log 0 = 0. So, in this case, the Deng entropy coincides with the Deng extropy, and they are both equal to 0.

Example 2. Given a frame of discernment X and a BPA m whose focal element A varies in cardinality, the results given in Table 1 are obtained. As was pointed out before, the results show that the extropy of m decreases monotonically with the rise of the size of the subset A, while the entropy increases.

The Maximum Deng Extropy
Kang and Deng [26] studied the problem of the maximum Deng entropy. They found out that the maximum Deng entropy on a frame of discernment X with cardinality |X| is attained if and only if the BPA m is defined as

m(A_i) = \frac{2^{|A_i|} - 1}{\sum_{j=1}^{2^{|X|} - 1} (2^{|A_j|} - 1)},

where A_i, i = 1, . . . , 2^{|X|} − 1, are all the non-empty elements of 2^X. Hence, the value of the maximum Deng entropy is given by

E_d^{max} = \log_2 \sum_{i=1}^{2^{|X|} - 1} (2^{|A_i|} - 1).

In this section, we provide conditions to obtain the maximum Deng extropy for a fixed number of focal elements and with a fixed value for m(X).
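The maximum Deng entropy of Kang and Deng can be checked numerically. The sketch below (with our own helper functions) builds the maximizing BPA on a two-element frame, where the normalizing constant is 1 + 1 + 3 = 5, and compares its entropy with the claimed value log_2 5.

```python
import math

def deng_entropy(m):
    """Deng entropy over the focal elements of m."""
    return -sum(v * math.log2(v / (2 ** len(A) - 1)) for A, v in m.items() if v > 0)

def powerset_nonempty(frame):
    """All non-empty subsets of the frame, as frozensets."""
    xs = sorted(frame)
    return [frozenset(x for i, x in enumerate(xs) if s >> i & 1)
            for s in range(1, 2 ** len(xs))]

frame = {'a', 'b'}
subsets = powerset_nonempty(frame)
total = sum(2 ** len(A) - 1 for A in subsets)             # = 5 for |X| = 2
m_star = {A: (2 ** len(A) - 1) / total for A in subsets}  # the maximizing BPA
max_value = math.log2(total)                              # claimed maximum
```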
Theorem 1. Let m be a BPA on a frame of discernment X, and let N be its set of focal elements different from X, with m(X) fixed. Then the maximum Deng extropy is attained if and only if the BPA is defined as

m(A) = 1 - \frac{(|N| - 1 + m(X)) (2^{|A^c|} - 1)}{\sum_{B \in N} (2^{|B^c|} - 1)}, \quad A \in N.    (11)

In this case, the value of the maximum Deng extropy is

EX_d^{max} = (|N| - 1 + m(X)) \log_2 \frac{\sum_{B \in N} (2^{|B^c|} - 1)}{|N| - 1 + m(X)}.

Proof. Suppose m(X) = 0. We will prove that in this case the maximum Deng extropy is

(|N| - 1) \log_2 \frac{\sum_{B \in N} (2^{|B^c|} - 1)}{|N| - 1}

and that it is attained if and only if the BPA is defined by

m(A) = 1 - \frac{(|N| - 1)(2^{|A^c|} - 1)}{\sum_{B \in N} (2^{|B^c|} - 1)}, \quad A \in N.

We have to maximize

EX_d(m) = -\sum_{A \in N} (1 - m(A)) \log_2 \frac{1 - m(A)}{2^{|A^c|} - 1}

subject to the constraint \sum_{A \in N} m(A) = 1. Then, the Lagrange function can be defined as

L(m, \lambda) = -\sum_{A \in N} (1 - m(A)) \log_2 \frac{1 - m(A)}{2^{|A^c|} - 1} + \lambda \left( \sum_{A \in N} m(A) - 1 \right).

Thus the gradient can be computed, and for A ∈ N we have

\frac{\partial L}{\partial m(A)} = \log_2 \frac{1 - m(A)}{2^{|A^c|} - 1} + \log_2 e + \lambda,

where log_2 e + λ does not depend on m(A). By vanishing all the partial derivatives, we obtain

\frac{1 - m(A)}{2^{|A^c|} - 1} = K,    (12)

where K = 2^{-\lambda - \log_2 e} is a constant. It follows that 1 − m(A) = K (2^{|A^c|} − 1). By summing over A ∈ N, we get

|N| - 1 = K \sum_{B \in N} (2^{|B^c|} - 1)

and then

K = \frac{|N| - 1}{\sum_{B \in N} (2^{|B^c|} - 1)}.

Therefore, from Equation (12) we obtain

m(A) = 1 - \frac{(|N| - 1)(2^{|A^c|} - 1)}{\sum_{B \in N} (2^{|B^c|} - 1)},

i.e., Equation (11) with m(X) = 0. Finally, for the Deng extropy related to this BPA, we get

EX_d(m) = -\sum_{A \in N} K (2^{|A^c|} - 1) \log_2 K = (|N| - 1) \log_2 \frac{\sum_{B \in N} (2^{|B^c|} - 1)}{|N| - 1},

and the proof is completed; the case of fixed m(X) > 0 is analogous, with the constraint \sum_{A \in N} m(A) = 1 − m(X).
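The characterization of the maximum Deng extropy can also be checked numerically. The following Python sketch (with two focal elements on a three-element frame, m(X) = 0, and our own helper function for the Deng extropy) builds the BPA of Equation (11) and compares its extropy against a grid of alternative BPAs on the same focal elements.

```python
import math

def deng_extropy(m, frame):
    """Deng extropy over the focal elements of m different from X."""
    X = frozenset(frame)
    return -sum((1 - v) * math.log2((1 - v) / (2 ** (len(X) - len(A)) - 1))
                for A, v in m.items() if 0 < v < 1 and A != X)

frame = {'a', 'b', 'c'}
N = [frozenset({'a'}), frozenset({'a', 'b'})]          # two focal elements, m(X) = 0
S = sum(2 ** (len(frame) - len(B)) - 1 for B in N)     # sum of 2^|B^c| - 1, here 3 + 1 = 4
n = len(N)

# The maximizing BPA of Equation (11) and the claimed maximum value.
m_opt = {A: 1 - (n - 1) * (2 ** (len(frame) - len(A)) - 1) / S for A in N}
claimed_max = (n - 1) * math.log2(S / (n - 1))

# Extropies of alternative BPAs on the same focal elements.
others = [deng_extropy({N[0]: t, N[1]: 1 - t}, frame)
          for t in (i / 20 for i in range(1, 20))]
```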

Application to Pattern Recognition
In this section, we investigate an application of the Deng extropy in pattern recognition by using the Iris dataset given in [27]. This example was already studied in [28] to analyze the applications of another belief entropy defined in the Dempster-Shafer evidence theory. We compare a method proposed by Kang [29] with a method based on the use of the Deng extropy. The Iris dataset is useful to introduce the application of the generation of BPAs based on the Deng extropy in the classification of the kinds of flowers. The dataset is composed of 150 samples. For each one, we have the sepal length in cm (SL), the sepal width in cm (SW), the petal length in cm (PL), the petal width in cm (PW) and the class, which is exactly one of Iris Setosa (Se), Iris Versicolour (Ve) and Iris Virginica (Vi). The samples are equally distributed among the classes. We select 40 samples for each kind of Iris and then we use the max-min values of the samples to generate a model of interval numbers, as shown in Table 2. Each element of the dataset can be regarded as an unknown test sample. Suppose the selected sample is [6.1, 3.0, 4.9, 1.8, Iris Virginica]. The similarity of two intervals A and B is defined as

S(A, B) = \frac{1}{1 + \alpha D(A, B)},

where α > 0 is the coefficient of support (we choose α = 5) and D(A, B) is the distance of the intervals A = [a_1, a_2] and B = [b_1, b_2] defined in [30] as

D(A, B) = \sqrt{ \left( \frac{a_1 + a_2}{2} - \frac{b_1 + b_2}{2} \right)^2 + \frac{1}{3} \left[ \left( \frac{a_2 - a_1}{2} \right)^2 + \left( \frac{b_2 - b_1}{2} \right)^2 \right] }.

In order to generate BPAs, the intervals given in Table 2 are used as interval A, and as interval B we use the degenerate intervals given by the selected sample. For each of the four attributes, we get seven values of similarity, and then we get a BPA by normalizing them (see Table 3). Hence, we evaluate the Deng extropy of these BPAs, as shown in the bottom row of Table 3. We obtain a combined BPA by using the Dempster rule of combination (6). The type of the unknown sample is determined by the combined BPA: from Equation (4), the class with the maximum value of the PPT is selected. Hence, Kang's method assigns to the sample the type Iris Versicolour, and it does not make the right recognition. Next, we use the Deng extropies given in Table 3 to generate other BPAs.
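Before turning to the extropy-based weighting, the interval distance, the similarity and the normalization step can be sketched as follows (the interval endpoints below are hypothetical values for a single attribute and three singleton classes, not the entries of Table 2, which covers all seven subsets).

```python
import math

def interval_distance(a, b):
    """Distance between intervals a = [a1, a2] and b = [b1, b2], as recalled above."""
    (a1, a2), (b1, b2) = a, b
    mid = ((a1 + a2) / 2 - (b1 + b2) / 2) ** 2
    spread = ((a2 - a1) / 2) ** 2 + ((b2 - b1) / 2) ** 2
    return math.sqrt(mid + spread / 3)

def similarity(a, b, alpha=5.0):
    """Similarity S(A, B) = 1 / (1 + alpha * D(A, B)) with coefficient of support alpha."""
    return 1.0 / (1.0 + alpha * interval_distance(a, b))

# Hypothetical interval model for one attribute; the sample value is a degenerate interval.
intervals = {'Se': (4.3, 5.8), 'Ve': (4.9, 7.0), 'Vi': (4.9, 7.9)}
x = (6.1, 6.1)
s = {k: similarity(v, x) for k, v in intervals.items()}
total = sum(s.values())
bpa = {k: v / total for k, v in s.items()}   # normalized similarities form a BPA
```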
We refer to these extropies as EX_d(SL), EX_d(SW), EX_d(PL), EX_d(PW). We use these values as the significance of each sample attribute in order to evaluate the weight of each attribute. For the attribute sepal length, we have

w(SL) = \frac{EX_d(SL)}{EX_d(SL) + EX_d(SW) + EX_d(PL) + EX_d(PW)},

and similarly for the other attributes. We divide each weight by the maximum of the weights and use these values as discounting coefficients to generate new BPAs, as shown in Equation (5); see Table 4. Again, a combined BPA is obtained by using the Dempster rule of combination, and the type of the unknown sample is determined by the combined BPA. Hence, the method based on the Deng extropy makes the right recognition. We tested all 150 samples: the global recognition rate of Kang's method is 93.33%, whereas that of the method based on the Deng extropy is 94%. The results are shown in Table 5.
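The weighting step just described can be sketched as follows (the per-attribute extropy values below are hypothetical placeholders, not those of Table 3, and the weight formula is the normalization described above).

```python
def weights_from_extropy(ex):
    """Turn per-attribute Deng extropies into discounting coefficients:
    normalize to weights, then divide each weight by the maximum weight."""
    total = sum(ex.values())
    w = {k: v / total for k, v in ex.items()}
    wmax = max(w.values())
    return {k: v / wmax for k, v in w.items()}

# Hypothetical extropy values for the four attributes.
ex = {'SL': 1.9, 'SW': 2.1, 'PL': 1.2, 'PW': 1.0}
alphas = weights_from_extropy(ex)   # coefficients in (0, 1]; 1 for the largest extropy
```

Each coefficient is then used in the discounting of Equation (5) before the final combination by the Dempster rule.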

Conclusions
In this paper, the Deng extropy has been defined as the dual measure of the Deng entropy. Its relation with the analogous entropy has been analyzed, and the two measures have been compared in order to understand in which cases one is preferable to the other; moreover, some examples have been proposed. The behaviour of the Deng extropy has been studied under changes of focal elements. We have given a characterization result for the maximum Deng extropy and, finally, a numerical example has been discussed in order to highlight the relevance of this dual measure in pattern recognition.