An Improved Total Uncertainty Measure in the Evidence Theory and Its Application in Decision Making

Dempster–Shafer evidence theory (DS theory) has some superiorities in uncertain information processing for a large variety of applications. However, the problem of how to quantify the uncertainty of a basic probability assignment (BPA) in the DS theory framework remains unresolved. The goal of this paper is to define a new belief entropy with desirable properties for measuring the uncertainty of a BPA. The new entropy can be helpful for uncertainty management in practical applications such as decision making. The proposed uncertainty measure has two components. The first component is an improved version of Dubois–Prade entropy, which aims to capture the non-specificity portion of uncertainty while taking into account the number of elements in the frame of discernment (FOD). The second component is adopted from Nguyen entropy, which captures the conflict portion of uncertainty in a BPA. We prove that the proposed entropy satisfies some desired properties proposed in the literature. In addition, the proposed entropy reduces to Shannon entropy when the BPA is a probability distribution. Numerical examples are presented to show the efficiency and superiority of the proposed measure, as well as an application in decision making.


Introduction
Uncertain information processing is a hot topic in both the theory and applications of intelligent information processing [1][2][3]. Many theories have been proposed to address the problem, such as fuzzy set theory [4], probability theory [5], Dempster-Shafer evidence theory (DS theory) [6,7], etc. First proposed by Arthur Dempster and Glenn Shafer, Dempster-Shafer theory has been widely used in many fields such as risk evaluation [8][9][10], sensor data fusion [11], decision making [12,13], image recognition [14], classification [15,16], clustering [17,18] and so on [19,20]. However, some open issues remain unresolved. First, information fused with the Dempster combination rule may yield counterintuitive results if the bodies of evidence are highly conflicting [21][22][23]. Second, many properties have been proposed to evaluate different uncertainty measures [24,25], but the rationality of some of these properties is still questionable. Third, during the calculation of entropy, some characteristics of the basic probability assignment are lost. Fourth, evidence under the open-world assumption needs further consideration [26,27]. Uncertainty measures, including belief entropy, can be a promising tool for uncertain information modeling, processing, and conflict management in DS theory [28,29].
Preliminaries
Some basic preliminaries are introduced in this section.
Let X be a set of mutually exclusive and exhaustive events, named the frame of discernment (FOD):

X = {θ1, θ2, θ3, ..., θ|X|}.

The power set of X, denoted 2^X, is defined as:

2^X = {∅, {θ1}, ..., {θ|X|}, {θ1, θ2}, ..., X}.

For a FOD X, the mass function is a mapping m from 2^X to [0, 1] that satisfies the following conditions:

m(∅) = 0 and ∑_{A⊆X} m(A) = 1.

In DS theory, the mass function is also referred to as a basic probability assignment (BPA). If a subset A ∈ 2^X satisfies m(A) > 0, then A is called a focal element of m. If a BPA m assigns belief 1 to the whole FOD, i.e., m(X) = 1, then m is a vacuous BPA.
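The definitions above can be sketched in code. A minimal illustration, assuming a BPA is represented as a Python dict keyed by frozenset focal elements (the helper name `is_valid_bpa` is illustrative, not from the paper):

```python
def is_valid_bpa(m):
    """Check the BPA conditions: m(emptyset) = 0 and the masses sum to 1."""
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    return abs(sum(m.values()) - 1.0) < 1e-9

# A BPA on the FOD X = {a, b, c}; its focal elements are the keys
m = {frozenset({"a"}): 0.5,
     frozenset({"a", "b"}): 0.3,
     frozenset({"a", "b", "c"}): 0.2}
print(is_valid_bpa(m))  # True

# A vacuous BPA assigns belief 1 to the whole frame X
vacuous = {frozenset({"a", "b", "c"}): 1.0}
print(is_valid_bpa(vacuous))  # True
```

Using frozenset keys makes set intersections (needed later for combination) direct and hashable.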

Dempster's Rule of Combination
Assume there are two independent BPAs m1 and m2 on the FOD X. Dempster's rule of combination, denoted m = m1 ⊕ m2, is defined as follows [6]:

m(A) = (1/K) ∑_{B∩C=A} m1(B) m2(C) for A ≠ ∅, and m(∅) = 0,

where K is a normalization constant defined as follows:

K = ∑_{B∩C≠∅} m1(B) m2(C).

The normalization constant K is assumed to be non-zero. If K = 0, then m1 and m2 are in total conflict and cannot be combined using Dempster's rule. If K = 1, then m1 and m2 are non-interactive with each other, i.e., non-conflicting.
Dempster's rule of combination merges two BPAs so that the new BPA represents a consensus of the contributing pieces of evidence. It also concentrates belief on smaller sets, decreasing the uncertainty in the system, which is useful in decision-making processes.
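A minimal sketch of this rule, assuming the dict-of-frozenset BPA representation (the function name `combine` is illustrative):

```python
def combine(m1, m2):
    """Dempster's rule: m(A) = (1/K) * sum_{B∩C=A} m1(B)*m2(C),
    with normalization constant K = sum_{B∩C≠∅} m1(B)*m2(C)."""
    K = 0.0
    raw = {}
    for B, mb in m1.items():
        for C, mc in m2.items():
            inter = B & C
            if inter:  # only non-empty intersections contribute
                raw[inter] = raw.get(inter, 0.0) + mb * mc
                K += mb * mc
    if K == 0.0:
        raise ValueError("K = 0: total conflict, BPAs cannot be combined")
    return {A: v / K for A, v in raw.items()}

m1 = {frozenset({"a"}): 0.6, frozenset({"a", "b"}): 0.4}
m2 = {frozenset({"b"}): 0.5, frozenset({"a", "b"}): 0.5}
fused = combine(m1, m2)
# the conflicting mass m1({a})*m2({b}) = 0.3 is redistributed, so K = 0.7
print({tuple(sorted(A)): round(v, 4) for A, v in fused.items()})
```

Note how the combined BPA shifts belief toward the singletons, illustrating the uncertainty-decreasing behavior described above.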

Belief Entropy in Dempster-Shafer Evidence Theory
Several methods have been proposed to solve the problem of uncertainty measure in DS theory. Some previous methods are briefly summarized in Table 1.

The New Measure
To address the uncertainty in the FOD, an improved total uncertainty measure is proposed in this paper. The improved total uncertainty measure, denoted Q(m), is defined as follows:

Q(m) = ∑_{A⊆X} (|A|/|X|) m(A) log2(|A|) + ∑_{A⊆X} m(A) log2(1/m(A)),

where |A| denotes the cardinality of the focal element A, |X| is the number of elements in X, and m is the BPA on the FOD X. The first component, ∑_{A⊆X} (|A|/|X|) m(A) log2(|A|), is designed as a weighted measure of the total non-specificity among the focal elements. The second component, ∑_{A⊆X} m(A) log2(1/m(A)), can be interpreted as the portion capturing uncertainty in the form of conflict. The improved total uncertainty measure also guarantees that the measure reduces to Shannon entropy in the Bayesian (probability) case. Notice that the coefficient |A|/|X| is added to take the size of the FOD into account; as shown in Table 1, most existing measures do not consider the number of elements in the FOD.
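The definition of Q(m) above can be implemented directly. A sketch, again assuming BPAs as dicts keyed by frozenset (the function name `Q` mirrors the paper's notation):

```python
from math import log2

def Q(m, frame_size):
    """Q(m) = sum (|A|/|X|) m(A) log2|A|  +  sum m(A) log2(1/m(A))."""
    nonspec = sum((len(A) / frame_size) * v * log2(len(A))
                  for A, v in m.items() if v > 0)
    conflict = sum(v * log2(1.0 / v) for v in m.values() if v > 0)
    return nonspec + conflict

# Bayesian BPA: log2|A| = 0 for singletons, so Q reduces to Shannon entropy
bayes = {frozenset({"a"}): 0.5, frozenset({"b"}): 0.25, frozenset({"c"}): 0.25}
print(Q(bayes, 3))  # 1.5, the Shannon entropy of (0.5, 0.25, 0.25)

# Vacuous BPA on a 4-element FOD: Q = (4/4)*1*log2(4) + 0 = 2.0
vacuous = {frozenset({"a", "b", "c", "d"}): 1.0}
print(Q(vacuous, 4))  # 2.0
```

The two prints illustrate the Shannon reduction and the vacuous case, both discussed in the properties below.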

Desired Properties of Belief Entropy
In this paper, we consider some properties of entropy H(m) (m is a BPA) proposed in [40,51]. Let X and Y denote random variables with state spaces Ω X and Ω Y , respectively. Let m X and m Y denote distinct BPAs for X and Y, respectively. Let γ X and γ Y denote the vacuous BPAs for X and Y, respectively.
(1) Consistency with DS theory semantics. If a definition of entropy of m, or a portion of a definition, is based on a transform of the BPA m to a probability mass function (PMF) P_m, then the transform must satisfy the condition P_{m1⊕m2} = P_{m1} ⊗ P_{m2}.
(2) Non-negativity. H(m_X) ≥ 0, with equality if and only if there is an x ∈ Ω_X such that m_X({x}) = 1; this is similar to the probabilistic case.
(4) Probability consistency. If m_X is a Bayesian BPA for X, then H(m_X) = ∑_{x∈Ω_X} m_X({x}) log2(1/m_X({x})). In other words, for a Bayesian BPA, the entropy reduces to Shannon entropy [52].
(5) Maximum entropy. H(m_X) ≤ H(γ_X), with equality if and only if m_X = γ_X. The entropy in [40] holds that it is rational for the vacuous BPA γ_X to have the highest uncertainty among all BPAs, since in that case γ_X(Ω_X) = 1: one hundred percent of the belief is assigned to the whole frame Ω_X, which provides no information to support decision making because the belief in each proposition of the FOD is completely unknown.
(6) Additivity. Given two distinct BPAs m_X and m_Y for X and Y, we can combine them using Dempster's rule, denoted m_X ⊕ m_Y. Then, H(m_X ⊕ m_Y) must satisfy the following equation: H(m_X ⊕ m_Y) = H(m_X) + H(m_Y). It is argued in [40] that, if a definition of entropy of m, or a portion of a definition, is based on a transform of the BPA to a PMF, then the transform must satisfy the condition P_{m1⊕m2} = P_{m1} ⊗ P_{m2}, where ⊗ is the combination rule in probability theory and ⊕, as mentioned in Section 2, is Dempster's combination rule. Notice that the transform, only if one is used, must be consistent with Dempster's rule. Since none of the methods in Table 1 except the Jirousek-Shenoy entropy uses a transform of the BPA to a PMF, we do not discuss the Consistency with DS theory semantics property further in this article.
The Set consistency property requires that H(γ_X) = log2|Ω_X|. The Probability consistency property would require that, for the Bayesian uniform BPA m_u, H(m_u) = log2|Ω_X| as well. Together, these two requirements entail that H(γ_X) = H(m_u). In contrast, the Maximum entropy property indicates that the entropy of the vacuous BPA should be strictly maximal, H(γ_X) > H(m_u). Before further analysis of these two properties, first consider the following example.

Example 1.
Suppose there is a race with three bikes. Two experts make the following statements:
• Expert 1: "The three bikes and riders are similar."
• Expert 2: "I do not have information about the characteristics of each bike and rider."

If we represent this information as BPAs, the opinion of Expert 1 produces a uniform distribution (1/3, 1/3, 1/3), and that of Expert 2 a vacuous BPA. The question is: if we must bet on the bike that will win this race, on which should we place our bet, given the information of these experts? In either case, we have nothing that allows us to favor one bike, so our final decision will be made randomly. Hence, the information, or uncertainty, should be the same: in both cases it must reach the maximum uncertainty value. Therefore, in this paper, we modify the Maximum entropy property instead of adopting the original one: both the uniform Bayesian BPA and the vacuous BPA should attain the maximum entropy value.

The Range property requires that, for all BPAs, the value of entropy be bounded by log2|Ω_X|; we disagree. In Shannon's information theory, the maximum number of bits required to represent the uncertainty of a system with n states is log2(n), which is reasonable since the system is fully characterized by its n states. In DS theory, however, a BPA can assign belief to as many as 2^n − 1 subsets, each of which can have a non-empty intersection with the others. Moreover, because of the non-specificity part of total uncertainty, one cannot even say that the total uncertainty should be bounded by log2(2^n − 1). Based on this analysis, we do not adopt the Range property.
Based on the analysis above, we consider seven properties that an uncertainty measure in DS theory may satisfy.

Property of the New Measure
The analysis of the property for the proposed measure is presented as follows.
(1) Non-negativity. The first component ∑_{A⊆X} (|A|/|X|) m(A) log2(|A|) is non-negative since m(A) ≥ 0 and |A| ≥ 1 for every focal element A; the second component ∑_{A⊆X} m(A) log2(1/m(A)) is non-negative since 0 < m(A) ≤ 1 for every focal element. Hence, Q(m) ≥ 0.
(2) Monotonicity. For a vacuous BPA γ_X, Q(γ_X) = log2|X|. Since Q(γ_X) is monotonically increasing in |X|, Q(m) satisfies the Monotonicity property.
(4) Maximum entropy. Suppose a Bayesian uniform BPA m_u with n = |X|; then, for each focal element, m({x}) = 1/n. Suppose also a vacuous BPA γ_X on the same FOD. After calculation, the entropies of both cases reach the same maximum value log2(n), thus Q(m) satisfies the (modified) Maximum entropy property.
Additivity. Consider BPAs m_X for X and m_Y for Y, and let m denote their combination on the product space. Computing the values of uncertainty via Q shows that, in general, Q(m) ≠ Q(m_X) + Q(m_Y); hence, Q(m) does not meet the requirements of the Additivity property. Note that the second component ∑_{A⊆X} m(A) log2(1/m(A)) by itself does satisfy additivity, since the log of a product is the sum of the logs.
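The failure of additivity can be checked numerically. A hypothetical illustration (the BPAs below are my own, not the paper's omitted example), combining two non-interactive BPAs on the product space via m(A×B) = m_X(A)·m_Y(B):

```python
from math import log2

def Q(m, frame_size):
    """The proposed measure (sketch; BPAs as dicts keyed by frozenset)."""
    nonspec = sum((len(A) / frame_size) * v * log2(len(A))
                  for A, v in m.items() if v > 0)
    conflict = sum(v * log2(1.0 / v) for v in m.values() if v > 0)
    return nonspec + conflict

def product_bpa(mX, mY):
    """Non-interactive combination on the product space: m(A×B) = mX(A)·mY(B)."""
    return {frozenset((a, b) for a in A for b in B): va * vb
            for A, va in mX.items() for B, vb in mY.items()}

mX = {frozenset({"x1"}): 0.5, frozenset({"x1", "x2"}): 0.5}
mY = {frozenset({"y1"}): 0.5, frozenset({"y1", "y2"}): 0.5}
m = product_bpa(mX, mY)  # product frame has |X|*|Y| = 4 elements

print(Q(mX, 2) + Q(mY, 2))  # 3.0
print(Q(m, 4))              # 2.75, so additivity fails
```

The conflict components are additive (1.0 + 1.0 = 2.0 on both sides); the gap comes entirely from the |A|/|X|-weighted non-specificity term.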
Let R be the product space of X and Y.

Numerical Examples
In this section, some typical numerical examples are presented to show the effectiveness of the proposed measure.
Example 2 (Adopted from [28]). Given a frame of discernment X with 15 elements, Table 2 and Figure 1 list the values of the proposed entropy Q(m) as the focal element A grows. Figure 2 shows that the uncertainty degree measured by the proposed measure increases along with the growing size of A.
This is rational, since more information volume becomes unknown as the size of A rises. Figure 2 also shows the performance of other uncertainty measures, including Dubois-Prade's entropy [24], Höhle's entropy [49], Yager's entropy [48], Klir-Ramer's entropy [46], Klir-Parviz's entropy [49], and Pal et al.'s entropy [50]. According to Figure 2, the proposed measure increases monotonically with the size of A; in contrast, the other measures either decrease irregularly or fluctuate as the size of A increases. This is because the other methods do not consider the sizes of A and X simultaneously in their definitions, and using more available information means less uncertainty. According to the definitions in Table 1, the uncertainty of m1 and m2 can be calculated with the different uncertainty measures; the results are shown in Table 3. The results calculated by Deng entropy, Pal et al.'s entropy, and Dubois-Prade entropy fail to show the difference in uncertainty degree between the two bodies of evidence. The FOD of m1 consists of four elements, denoted a, b, c, and d, while the FOD of m2 has only three elements, denoted a, b, and c; the uncertainties of these two BPAs are therefore expected to differ. Deng entropy, Pal et al.'s entropy, and Dubois-Prade entropy fail to measure this difference, while the proposed method captures it effectively by considering the size of the FOD. The final result seems rational: although there are fewer elements in the second BPA m2, the intersecting focal elements in m2 contribute more uncertainty, while all the elements in m1 are independent of each other. Therefore, the proposed method appears to be a reasonable way to measure the uncertainty of evidence under such circumstances.
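The FOD-size sensitivity can be demonstrated with a simpler hypothetical case than Table 3 (the BPA below is illustrative, not the paper's data): the same mass assignment evaluated against FODs of different size yields different values of Q, while Dubois-Prade's non-specificity, ∑ m(A) log2|A|, is blind to |X|:

```python
from math import log2

def dubois_prade(m):
    """Dubois-Prade non-specificity: sum m(A) * log2|A| (no |X| term)."""
    return sum(v * log2(len(A)) for A, v in m.items() if v > 0)

def Q(m, frame_size):
    """The proposed measure, which weights non-specificity by |A|/|X|."""
    nonspec = sum((len(A) / frame_size) * v * log2(len(A))
                  for A, v in m.items() if v > 0)
    conflict = sum(v * log2(1.0 / v) for v in m.values() if v > 0)
    return nonspec + conflict

# The same masses, read against a 2-element and a 3-element FOD
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.5}
print(dubois_prade(m))   # 0.5 regardless of the FOD size
print(Q(m, 2), Q(m, 3))  # 1.5 vs ~1.3333: Q reflects the FOD size
```

A larger FOD dilutes the relative weight |A|/|X| of each focal element, so Q distinguishes the two situations that Dubois-Prade entropy conflates.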

Application in Conflict Data Fusion
In this section, the proposed measure is applied to a case study on conflict data-based decision making. The dataset is adopted from [53,54].

Problem Statement
Suppose the FOD is Θ = {F1, F2, F3}, which consists of three fault types of the machines. The diagnosis sensors are denoted S = {S1, S2, S3, S4, S5}; the five sensors are positioned in different places to collect diagnosis data. The results, represented as BPAs, are shown in Table 4.

Decision Making Procedure
The process for decision making based on the improved belief entropy is shown in Figure 3. Six steps are designed as follows.
Step 1: Model the sensor data as BPAs. As shown in Table 4, each piece of evidence is modeled as a BPA.
Step 2: Measure the uncertainty degree using the improved total uncertainty measure in Equation (7). Generally, the more dispersed the mass values over the power set, the larger the entropy of the BPA. The entropy of each BPA is calculated as follows: Q(m1) = 1.5664, Q(m2) = 0.4690, Q(m3) = 1.4878, Q(m4) = 1.5700, and Q(m5) = 1.4955. Notice that Q(m2) is much smaller than the others because m2 assigns a belief of 90% to F2, while the other BPAs are more dispersed.
Step 3: Calculate the relative weight based on the uncertainty degree of each piece of evidence. It is commonly accepted that the bigger the entropy, the higher the uncertainty degree. The relative weight of each BPA is defined according to the new belief entropy: for the ith of n BPAs, the corresponding relative weight is defined by Equation (11). The relative weight of each BPA in Table 4 can then be calculated with Equation (11).
Step 4: Modify the original BPAs using the proposed measure. Using the relative weight of each BPA, we unify the BPAs given by all sensors into one weighted BPA, which is used in the final data fusion. For a proposition A, the modified BPA is derived according to Equation (12); the BPAs in Table 4 are modified accordingly.
Step 5: Fuse the evidence using Dempster's rule of combination in Equations (5) and (6), applied (n − 1) times to the modified BPA.
Step 6: Make the decision based on the data fusion results. From the original data in Table 4, the report of the second sensor is highly conflicting with the other sensors on F1 and F2. Based on all the data in Table 4, F1 should intuitively be recognized as the potential fault type. The fusion result with five sensors assigns a belief of over 98% to the potential fault type {F1}.
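The six steps above can be sketched end-to-end. Since Equations (11) and (12) are not reproduced here, the sketch assumes the common entropy-proportional weighting w_i = Q(m_i)/∑_j Q(m_j) and the weighted-average BPA ∑_i w_i m_i(A); the four sensor BPAs below are hypothetical stand-ins, not the Table 4 data:

```python
from math import log2

def Q(m, frame_size):
    """Improved total uncertainty measure (Equation (7) in the text)."""
    nonspec = sum((len(A) / frame_size) * v * log2(len(A))
                  for A, v in m.items() if v > 0)
    conflict = sum(v * log2(1.0 / v) for v in m.values() if v > 0)
    return nonspec + conflict

def combine(m1, m2):
    """Dempster's rule of combination (Equations (5) and (6))."""
    K, raw = 0.0, {}
    for B, mb in m1.items():
        for C, mc in m2.items():
            inter = B & C
            if inter:
                raw[inter] = raw.get(inter, 0.0) + mb * mc
                K += mb * mc
    return {A: v / K for A, v in raw.items()}

def fuse(bpas, frame):
    # Steps 2-3: entropies and relative weights (weight formula assumed)
    q = [Q(m, len(frame)) for m in bpas]
    w = [qi / sum(q) for qi in q]
    # Step 4: unify into one weighted-average BPA
    keys = {A for m in bpas for A in m}
    weighted = {A: sum(wi * m.get(A, 0.0) for wi, m in zip(w, bpas))
                for A in keys}
    # Step 5: combine the weighted BPA with itself (n - 1) times
    result = weighted
    for _ in range(len(bpas) - 1):
        result = combine(result, weighted)
    return result

# Hypothetical sensor reports on Theta = {F1, F2, F3}; the second sensor
# conflicts with the others, mimicking the m2 situation described above
F1, F2 = frozenset({"F1"}), frozenset({"F2"})
bpas = [
    {F1: 0.6, F2: 0.1, frozenset({"F1", "F3"}): 0.3},
    {F2: 0.9, F1: 0.1},  # low-entropy, conflicting report
    {F1: 0.5, F2: 0.2, frozenset({"F1", "F3"}): 0.3},
    {F1: 0.55, F2: 0.15, frozenset({"F1", "F3"}): 0.3},
]
fused = fuse(bpas, {"F1", "F2", "F3"})
best = max(fused, key=fused.get)
print(best, round(fused[best], 4))  # the fused belief concentrates on F1
```

The low entropy of the conflicting report gives it a small relative weight, so the weighted BPA, and hence the fused result, favors F1 despite the bad sensor.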

Discussion
The results with the different information fusion methods, namely Dempster's rule [7], Yager's rule [55], and the proposed method, are shown in Table 5. The fusion results for fault type identification based on the proposed method with two, three, four, and five sensors are shown in Table 6. Figure 4 shows the performance of the different combination methods with two sensor reports. One cannot make a decision based on Yager's combination rule since, in this case, the universal set m({X}) receives the highest belief among all propositions. The other methods assign a belief of more than 80% to F2, although, according to the prior knowledge, the report m2 may come from a bad sensor and F2 cannot be the potential fault; the proposed method assigns the lowest belief to F2 among the compared methods. Figure 5 shows the fusion results with three sensor reports. The result of Dempster's rule seems counterintuitive, assigning the highest belief to F2, and the other methods likewise assign their highest belief to F2; the proposed method is the only one with a belief of over 90% on the right fault F1, while none of the other methods assigns more than 60% to F1. In this case, the proposed method successfully identifies the right fault with only three sensor reports and with the highest belief degree. As shown in Figure 6, the fusion result of Dempster's combination rule with four BPAs still leads to the wrong fault type F2 because of the conflicting report given by m2, while the proposed method has a higher belief degree on the right fault type F1 than the other methods. The fusion results with all five sensors are shown in Figure 7; the proposed method has the highest belief of 98.49% on the right fault F1. According to the data in Table 4, F1 should be identified as the target.
For Yager's rule, because of the belief assigned to m({X}), the fusion result of five BPAs shows that only a belief of 77.32% is assigned to F 1 , while other methods except for Dempster's all have much better performance, with a belief of over 90% on F 1 .
The fault type identification based on the proposed method with two, three, four, and five sensors is indicated in Table 6. As shown in the table and Figure 5, the proposed method is the only one with a belief of over 90% on the right fault F1 when only three sensor reports are given. Suppose the data in Table 4 consisted only of m1, m2, and m3; based on our intuition and the provided information, F1 should be the target, and the result shows that the proposed method indeed identifies F1. The later fusion results with four and five BPAs also indicate that F1 is the right fault. From this point of view, decisions made with the proposed method from a limited number of reports have a certain degree of validity.
According to the fusion results, the proposed method has several superiorities in comparison with the other methods. First, as shown in Section 4, the proposed method can effectively measure the uncertainty degree of two different BPAs even if the same belief values are assigned on different FODs, while both Deng entropy and Dubois-Prade entropy fail. Second, it is shown in Section 3 that Q(m) satisfies six of the seven properties, which supports the reasonableness of the fusion results in Table 5 and their consistency with our intuition. Even in special cases such as the vacuous BPA or probability distributions, Q(m) still gives a rational result. In addition, as shown in Figure 5, the proposed method recognizes the right fault from a limited number of reports, while the other methods cannot make the right decision until there are four reports. The proposed method reduces the interference of conflicting evidence more efficiently than the other methods because it considers not only the uncertainty in the mass function but also the size of the FOD and the size of each proposition. As shown in Table 1, only the proposed method considers the size of the FOD. Finally, the proposed method is based on the information volume measured by the modified belief entropy, so its physical meaning is clear.
There are several reasons that contribute to the performance of the decision-making procedure. First, all sensor data are preprocessed before the decision-making procedure. With the corresponding BPAs, our process successfully identifies the fault and eliminates the conflicting evidence by using the proposed method. The effectiveness and superiority proved in Sections 3 and 4 also guarantee the efficiency of our decision-making approach. Finally, the relative weight is calculated using the proposed measure and the final result is based on the Dempster's combination rule, which combines the merits of Dempster's combination rule and the effectiveness of the proposed method.

Conclusions
An improved total uncertainty measure based on total non-specificity and the uncertainty in the form of conflict is proposed in this paper. The rationality of some previous properties is discussed, and seven desired properties are listed to define a meaningful measure. The new measure satisfies six of the seven desired properties of belief entropy. The proposed entropy not only captures the non-specificity and conflict portions of uncertainty, but also considers the size of the FOD and the size of each proposition with respect to the FOD. Numerical examples show that the proposed entropy can quantify the uncertainty degree of a BPA more accurately than the other uncertainty measures when the same belief values are assigned on different FODs. In the cases of the vacuous BPA and the uniform distribution, the proposed method improves on the performance of the other measures and provides a result that is consistent with our intuition.
A decision-making approach based on the proposed measure was applied to a case study. The fusion results show the superiority and effectiveness of the improved method in comparison with the other methods. In future studies, the proposed method will be applied to more real-world applications such as image compression, image recognition, etc.