A New Belief Entropy in Dempster–Shafer Theory Based on Basic Probability Assignment and the Frame of Discernment

Dempster–Shafer (D-S) theory has been widely used in many applications, especially in the measurement of information uncertainty. However, under D-S theory, how to use belief entropy to measure uncertainty is still an open issue. In this paper, we list some significant properties of such measures. The main contribution of this paper is a new entropy, for which several properties are discussed. Our new model has two components. The first is Nguyen entropy. The second is the product of the cardinality of the frame of discernment (FOD) and Dubois entropy. In addition, under certain conditions, the new belief entropy reduces to Shannon entropy. Compared with other entropies, the new entropy considers the impact of the FOD. Through numerical examples and simulation, the proposed belief entropy is shown to measure uncertainty accurately.


Introduction
How to measure uncertainty is an important open question, and our work addresses this issue. First of all, we need to know what uncertainty is.
In 1967, Dempster [3] proposed upper and lower probabilities to solve the multivalued mapping problem. In 1976, Shafer [4] completed the theory proposed by Dempster and formed evidence theory, also called D-S theory. After years of development, D-S theory has become a very effective tool for modeling and processing information uncertainty. In 1948, Shannon [23] used concepts from thermodynamics to define information entropy. Under probability theory, Shannon entropy is very good at measuring the degree of information uncertainty. However, D-S theory requires weaker prior data than probability theory, and it has the advantage of fusing data. Thus, we introduce D-S theory to replace probability theory in uncertainty measurement.
D-S theory uses basic probability assignment (BPA), which is under the frame of discernment (FOD), to represent the degree of support for a focal element. Different FODs may have different BPAs.

Preliminaries
The focus of this paper is based on D-S theory and information entropy. We divide this section into two parts. In the D-S theory section, some basic concepts will briefly be introduced. In the information entropy section, we will introduce two typical representatives, the Hartley measure and Shannon entropy.
The idea of D-S theory is based on the frame of discernment X = {x1, x2, ..., xn}; the set of all subsets of X is called the power set 2^X, which contains 2^|X| elements. |X| denotes the cardinality of X, that is, the number of elements in X. Under this frame, Dempster and Shafer defined some basic concepts as follows.

Basic Belief Assignment
Based on the above power set 2^X, a function m: 2^X → [0, 1] is a basic probability assignment (BPA), or mass function, if it satisfies:

m(∅) = 0 and ∑_{a∈2^X} m(a) = 1.

If m(a) > 0, then a is a focal element. m(a) is the value of trust that the object belongs to a; the larger m(a) is, the higher the trust value.
Some special BPAs are as follows. The vacuous BPA, with m(X) = 1, means the true result is entirely unknown. In contrast, from a Bayesian BPA, whose focal elements are all singletons, we can know to which category the target should belong.
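To make the definition concrete, a BPA can be sketched in Python as a mapping from subsets (frozensets) to masses; this dictionary representation is our own illustrative choice, not part of the theory:

```python
from math import isclose

def is_valid_bpa(m):
    """Check the two BPA conditions: m(empty set) = 0 and the masses sum to 1."""
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    return isclose(sum(m.values()), 1.0)

# A BPA on the FOD X = {x1, x2}: a focal element is any subset with m(a) > 0.
m = {frozenset({"x1"}): 0.4,
     frozenset({"x2"}): 0.3,
     frozenset({"x1", "x2"}): 0.3}

print(is_valid_bpa(m))  # True
```

The vacuous BPA would be `{frozenset({"x1", "x2"}): 1.0}`, while a Bayesian BPA assigns mass only to singletons.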

Belief Function
The belief function is the sum of the basic probability assignments over all subsets of a:

Bel(a) = ∑_{b⊆a} m(b).

It is the lower limit of support for a.

Plausibility Function
The plausibility function is the sum of the basic probability assignments over all subsets that intersect a:

Pl(a) = ∑_{b∩a≠∅} m(b).

It is the upper limit of support for a. The interval between the belief function and the plausibility function, [Bel(a), Pl(a)], represents the degree of uncertainty of the evidence.
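Both functions can be computed directly from a BPA; a minimal sketch (using the same dict-of-frozensets representation as before, an assumption of ours):

```python
def bel(m, a):
    """Belief: sum of the masses of all subsets of a (lower bound of support)."""
    return sum(v for b, v in m.items() if b <= a)

def pl(m, a):
    """Plausibility: sum of the masses of all sets intersecting a (upper bound)."""
    return sum(v for b, v in m.items() if b & a)

m = {frozenset({"x1"}): 0.4,
     frozenset({"x2"}): 0.3,
     frozenset({"x1", "x2"}): 0.3}
a = frozenset({"x1"})
print(bel(m, a), pl(m, a))  # 0.4 0.7 -> uncertainty interval [0.4, 0.7]
```

Note that `b <= a` is Python's subset test for frozensets, and `b & a` is the intersection, so the two comprehensions mirror the two sums above.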

Dempster's Combination Rule
Dempster's combination rule is the most commonly used method in evidence fusion. This rule takes into account the degree of conflict between pieces of evidence and defines a conflict coefficient k to measure the degree of conflict among different pieces of evidence.
Suppose m1 and m2 are independent BPAs from different evidence sources. The fusion of m1 and m2 under Dempster's combination rule is:

m(a) = (1 / (1 − k)) ∑_{b∩c=a} m1(b) m2(c) for a ≠ ∅, with m(∅) = 0,

where k is the conflict coefficient, defined by:

k = ∑_{b∩c=∅} m1(b) m2(c).

Notice that Dempster's combination rule is invalid if the two bodies of evidence completely conflict (k = 1); in that case, the rule cannot be applied to fuse the two BPAs.
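The rule above can be sketched as follows; the two example BPAs are our own illustrative numbers, and the function raises an error in the degenerate k = 1 case:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Fuse two independent BPAs with Dempster's combination rule."""
    k = 0.0      # conflict coefficient: total mass on empty intersections
    fused = {}
    for (b, v1), (c, v2) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            fused[inter] = fused.get(inter, 0.0) + v1 * v2
        else:
            k += v1 * v2
    if k >= 1.0:
        raise ValueError("complete conflict (k = 1): rule not applicable")
    # Normalize by 1 - k so the fused masses again sum to 1.
    return {a: v / (1.0 - k) for a, v in fused.items()}

m1 = {frozenset({"x1"}): 0.6, frozenset({"x1", "x2"}): 0.4}
m2 = {frozenset({"x1"}): 0.5, frozenset({"x2"}): 0.5}
fused = dempster_combine(m1, m2)
print(fused[frozenset({"x1"})])  # 0.5 / 0.7, about 0.714
```

Here k = 0.6 × 0.5 = 0.3 (the mass where {x1} meets {x2}), and the remaining products are renormalized by 1 − k = 0.7.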

Origin of Information Entropy
Different authors have measured information uncertainty in a variety of ways, and Hartley and Shannon laid the foundation for it. Information entropy and its extended models have been applied to many fields [59]. Next, we will briefly introduce the Hartley measure and Shannon entropy.

Hartley Measure
Suppose X is an FOD and a is a subset of X. Then, the Hartley measure [24] is defined as:

H(a) = log2 |a|,

where |a| is the cardinality of a.
Obviously, the measure grows with the cardinality of a. When a is a singleton of X, H(a) = 0, meaning there is no uncertainty. Unfortunately, the Hartley measure does not reflect the effect of the probability distribution on the degree of uncertainty.

Shannon Entropy
In 1948, Shannon [23] proposed information entropy, namely Shannon entropy. His model uses the concept of entropy from thermodynamics:

H_S = −∑_{x∈X} P(x) log2 P(x),

where P(x) is the probability of x and P(x) satisfies ∑_{x∈X} P(x) = 1.
As he said in his thesis, the role of information is used to eliminate the uncertainty. Shannon entropy is an excellent way to measure and eliminate uncertainty. It played a crucial role in solving the probability problem. We can conclude from his definition that it is based on the probability distribution. With the emergence of D-S theory, the information entropy was given a new meaning. The format of our new model is also derived from the Shannon entropy.
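Both classical measures are one-liners; a minimal sketch (the example sets and distributions are our own):

```python
from math import log2

def hartley(a):
    """Hartley measure: log2 of the cardinality of the set a."""
    return log2(len(a))

def shannon(p):
    """Shannon entropy of a probability distribution given as a list."""
    return -sum(px * log2(px) for px in p if px > 0)

print(hartley({"x1", "x2", "x3", "x4"}))  # 2.0
print(shannon([0.25] * 4))                # 2.0 (uniform distribution)
print(shannon([1.0]))                     # 0.0 (certainty, no uncertainty)
```

The two agree on uniform distributions: Hartley over n elements equals Shannon entropy of the uniform distribution on those elements, log2 n.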

Properties of the Uncertainty Measure in D-S Theory
According to Klir and Wierman [60] and Klir and Folger [61], we introduce some important properties of entropy for D-S theory, including non-negativity, maximum, monotonicity, probability consistency, additivity, sub-additivity, and range. These properties for a measure that captures both discord and non-specificity are defined as follows.

Non-Negativity
Suppose m is a BPA on FOD X; the entropy H(m) must satisfy:

H(m) ≥ 0,

with equality if and only if m({x}) = 1 for some x ∈ X.
Only when an entropy satisfies the non-negativity property does it provide a standard for measuring uncertainty.

Maximum Entropy
It makes sense that the uncertainty of the vacuous BPA m_v is larger than that of any other normal BPA m_n. Thus, the maximum entropy property is defined as:

H(m_v) ≥ H(m_n).

Monotonicity
As the number of elements in the FOD increases, so should the degree of uncertainty. The monotonicity property is defined as:

H(m_X) > H(m_Y),

where m_X and m_Y are the vacuous BPAs for FOD X and FOD Y, respectively, and |X| > |Y|.

Probability Consistency
Let m_B be a Bayesian BPA; then, the entropy should coincide with Shannon entropy. Therefore, the probability consistency property follows:

H(m_B) = H_S(P_X),

where H_S is the Shannon entropy and P_X is the probability distribution on X corresponding to m_B.

Additivity
Let m_X and m_Y be independent BPAs on FOD X and FOD Y, respectively, and let ⊕ denote Dempster's combination rule. The additivity property is defined as:

H(m_X ⊕ m_Y) = H(m_X) + H(m_Y),

where the left-hand side is the entropy of m_X and m_Y combined by Dempster's combination rule.

Sub-Additivity
Let m be a BPA on FOD X × Y, and let m↓X and m↓Y be its marginal BPAs on FOD X and FOD Y. Then, the sub-additivity property is defined as:

H(m) ≤ H(m↓X) + H(m↓Y).

The Development of Entropy Based on D-S Theory
In this section, some belief entropies of BPAs in D-S theory proposed by others are reviewed. We also discuss whether or not these models satisfy the properties we list.
Yager [62] defined a belief entropy using the conflict between focal elements, simplified as follows:

H_Y(m) = −∑_{a∈2^X} m(a) log2 Pl(a),

where Pl(a) is the plausibility function associated with a under m. Yager's entropy only measures the degree of conflict between evidence, and H_Y(m) only satisfies the additivity property. Dubois [27] used a new information measurement method to obtain another entropy:

H_D(m) = ∑_{a∈2^X} m(a) log2 |a|.
From the definition of Dubois, this entropy only addresses the non-specific part of the uncertainty. If m is a Bayesian BPA, then H_D(m) = 0. It is noticeable that H_D(m) is simply a weighted Hartley [24] measure. H_D(m) satisfies the maximum entropy and monotonicity properties.
Nguyen [26] defined a new entropy in the manner of Shannon entropy:

H_N(m) = ∑_{a∈2^X} m(a) log2 (1/m(a)).
From its form, it only uses the BPA values to capture the conflict part of the uncertainty, which makes it inaccurate as a total uncertainty measure. It only satisfies the probability consistency and additivity properties.
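The two entropies just discussed are the components reused later in this paper; a minimal sketch of both (dict-of-frozensets BPA representation as before, our own choice):

```python
from math import log2

def nguyen_entropy(m):
    """H_N: Shannon-style entropy over the masses (discord only)."""
    return sum(v * log2(1 / v) for v in m.values() if v > 0)

def dubois_entropy(m):
    """H_D: weighted Hartley measure (non-specificity only)."""
    return sum(v * log2(len(a)) for a, v in m.items() if v > 0)

# Bayesian BPA: every focal element is a singleton, so H_D = 0.
bayes = {frozenset({"x1"}): 0.5, frozenset({"x2"}): 0.5}
print(nguyen_entropy(bayes), dubois_entropy(bayes))  # 1.0 0.0

# Vacuous BPA on {x1, x2}: no discord, pure non-specificity.
vac = {frozenset({"x1", "x2"}): 1.0}
print(nguyen_entropy(vac), dubois_entropy(vac))      # 0.0 1.0
```

The two examples show the complementary blind spots: Nguyen's entropy vanishes on the vacuous BPA, while Dubois' entropy vanishes on any Bayesian BPA.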
Lamata and Moral [63] combined the entropies proposed by Yager and Dubois:

H_{L&M}(m) = H_Y(m) + H_D(m).
It thus has two components: one measures the innate contradiction, while the other measures the imprecision of the information. This definition does not satisfy the maximum entropy and sub-additivity properties.
Jiroušek and Shenoy's [29] entropy is a combination of the Shannon and Dubois definitions:

H_{J&S}(m) = ∑_{x∈X} Pl_P_m(x) log2 (1/Pl_P_m(x)) + ∑_{a∈2^X} m(a) log2 |a|,

where Pl_P_m is the normalized plausibility function Pl_m. The first part measures conflict based on Shannon entropy, and the second part measures the non-specificity portion of uncertainty. H_{J&S}(m) satisfies non-negativity, maximum entropy, monotonicity, probability consistency, and additivity. Klir and Ramer [28] defined:

H_{K&R}(m) = −∑_{a∈2^X} m(a) log2 [ ∑_{b∈2^X} m(b) (|a ∩ b| / |b|) ].

Because Yager's entropy did not take a broad enough view of conflict (it only considered the conflict situation B ∩ A = ∅), Klir and Ramer proposed this method to solve the problem. It is easy to see that this entropy measures, in bits, the conflict among evidential claims within each body of evidence. However, under certain conditions, it is difficult for H_{K&R}(m) to express every aspect of uncertainty; it fails only the maximum entropy property.
Nikhil R. Pal [30,31] focused on non-specificity and randomness in a total uncertainty environment:

H_Pal(m) = ∑_{a∈2^X} m(a) log2 (|a| / m(a)).
They summarized the methods proposed by Lamata and Moral and by Klir and Ramer, and pointed out that those methods can contradict common sense in certain situations. The first part of their measure is, in some sense, analogous to Yager's entropy, and the second part measures the conflict of the body of evidence. It does not satisfy the maximum entropy property.
He finally showed that the entropy becomes more sensitive as the evidence changes. Deng [32] defined an entropy:

H_Deng(m) = −∑_{a∈2^X} m(a) log2 [ m(a) / (2^{|a|} − 1) ].

As proven by Joaquín Abellán [66], Deng entropy does not satisfy the monotonicity, additivity, and sub-additivity properties.
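Deng entropy replaces |a| in the non-specificity term with 2^{|a|} − 1, the number of non-empty subsets of a; a minimal sketch (example BPAs are our own):

```python
from math import log2

def deng_entropy(m):
    """Deng entropy: -sum of m(a) * log2( m(a) / (2^|a| - 1) )."""
    return -sum(v * log2(v / (2 ** len(a) - 1)) for a, v in m.items() if v > 0)

# For a Bayesian BPA, 2^|a| - 1 = 1, so Deng entropy reduces to Shannon entropy.
bayes = {frozenset({"x1"}): 0.5, frozenset({"x2"}): 0.5}
print(deng_entropy(bayes))  # 1.0

# For the vacuous BPA on {x1, x2}, it gives log2(3), about 1.585.
vac = {frozenset({"x1", "x2"}): 1.0}
print(deng_entropy(vac))
```

Note that the value for the vacuous BPA depends only on |a|, not on the FOD it lives in, which is the shortcoming the present paper targets.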
Pan and Deng [33] developed Deng entropy further and defined:

H_Bel(m) = −∑_{a∈2^X} [ (Bel(a) + Pl(a)) / 2 ] log2 [ ((Bel(a) + Pl(a)) / 2) / (2^{|a|} − 1) ],

where Bel(a) and Pl(a) are the belief and plausibility functions, respectively. H_Bel(m) uses the interval probability to measure the discord and non-specificity uncertainty of a BPA. It does not satisfy the maximum entropy, additivity, sub-additivity, and range properties. W entropy [34] is another modified model based on Deng entropy, which augments it with a term governed by a non-negative constant parameter and a function f(|X|) of the cardinality of X; the parameter can take different values to represent different entropies. However, as the parameter changes, it has little effect on the value of W entropy [34].

A New Belief Entropy Based on Evidence Theory
As introduced at the start of the first chapter of Shafer's book [4], D-S theory is a theory of evidence. That means using the mathematical form to express the degree of support for evidence.
Based on the entropies proposed by previous scholars, several aspects of the frame of discernment remain relatively unexplored in the measurement of information uncertainty. In D-S theory, if we have BPAs with the same focal-element cardinalities but different FODs, the measured uncertainty should change. However, most of the definitions listed above focus only on the value of the BPA or the cardinality of each focal element, and the effect of the FOD is totally ignored. Thus, these definitions cannot distinguish the degree of uncertainty under different FODs. To remedy this deficiency, we argue that the FOD is also important for the measurement of uncertainty, and we introduce the scale of the FOD into our new entropy. The new belief entropy based on D-S theory, namely B&F entropy, is defined as follows:

H_{B&F}(m) = ∑_{a∈2^X} m(a) log2 (|a|^{|X|} / m(a)),

where |a| denotes the cardinality of the focal element a and |X| equals the number of elements in the FOD. Like some of the definitions we mentioned, the new definition can be represented as a combination of other entropies. Thus, the new entropy can also be expressed as:

H_{B&F}(m) = ∑_{a∈2^X} m(a) log2 (1/m(a)) + |X| ∑_{a∈2^X} m(a) log2 |a| = H_N(m) + |X| · H_D(m), (26)

where H_N(m) is Nguyen's entropy and H_D(m) is Dubois' entropy. Obviously, the new entropy is the sum of H_N(m) and |X| times H_D(m). As in most belief entropies, the first component, ∑_{a∈2^X} m(a) log2 (1/m(a)), is designed to measure the discord uncertainty of the BPA. The second component, |X| ∑_{a∈2^X} m(a) log2 |a|, measures the non-specificity of the mass function among the various focal elements [27,32,61]; in addition, it captures information about the size of the FOD. When m is a Bayesian BPA or the cardinality of the FOD equals one, the new entropy degenerates to Pal's definition.
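To make the definition concrete, a minimal Python sketch follows; the dict-of-frozensets BPA representation, the explicit |X| argument (the FOD size is not recoverable from the focal elements alone), and the example BPA are our own choices:

```python
from math import log2

def bf_entropy(m, n):
    """B&F entropy: sum over focal elements of m(a) * log2(|a|^n / m(a)),
    where n = |X| is the cardinality of the FOD."""
    return sum(v * log2(len(a) ** n / v) for a, v in m.items() if v > 0)

def nguyen_entropy(m):
    return sum(v * log2(1 / v) for v in m.values() if v > 0)

def dubois_entropy(m):
    return sum(v * log2(len(a)) for a, v in m.items() if v > 0)

m = {frozenset({"x1"}): 0.4, frozenset({"x1", "x2"}): 0.6}
n = 2  # |X| for X = {x1, x2}

# Decomposition of Equation (26): H_B&F = H_N + |X| * H_D
assert abs(bf_entropy(m, n) - (nguyen_entropy(m) + n * dubois_entropy(m))) < 1e-12
print(bf_entropy(m, n))
```

The assertion checks the decomposition numerically for this BPA; algebraically it holds because log2(|a|^n / m(a)) = log2(1/m(a)) + n·log2 |a|.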
The most important information about the FOD is the number of its elements, namely |X|. If |X| is replaced by some other function of the FOD, the accuracy of the uncertainty measurement is affected. Here, we use an example to show that |X| is the best way to represent the information of the FOD.
As shown in Figure 1, it is obvious that log2 |X| and 2^|X| cannot reflect the effect of the FOD on entropy very well. When the cardinality of the FOD is greater than 10, log2 |X| is almost constant, whereas 2^|X| is very large. Thus, |X| itself best captures the information of the FOD size. Now, let c = a × b ∈ 2^{X×Y}, where a, b, and c are focal elements and X and Y are FODs; meanwhile, a ∈ 2^X and b ∈ 2^Y. According to the definitions of the above properties, m(c) = m↓X(a) × m↓Y(b), where m↓X is the marginal BPA for X and m↓Y is the marginal BPA for Y.
We can see from the above proof that the new entropy satisfies the additivity property if and only if |X| = |Y| = 1. Otherwise, the new belief entropy satisfies neither additivity nor sub-additivity.
To be more intuitive, we consider the following example. Let Z be the product of FOD X = {x1, x2} and FOD Y = {y1, y2, y3}. The BPA on Z is m, and the marginal BPAs on X and Y are m↓X and m↓Y. We suppose the case on Z is shown as follows: Thus, the BPAs on X and Y are: The calculation results are as follows: From the above results, the new belief entropy satisfies the non-negativity, monotonicity, and probability consistency properties, and does not satisfy the maximum entropy, additivity, sub-additivity, and range properties.

Numerical Example and Simulation
In the first part of this section, some examples are given to illustrate the effectiveness of the new belief entropy. The influence of different BPAs on B&F entropy is shown in the second part.
Example 2

Suppose the Bayesian BPAs m1, m2, m3 on these FODs assign equal mass to each singleton. Their BPAs are as follows: The new belief entropy is calculated as follows: It is obvious that uncertainty increases as the number of elements in the FOD increases, which is reasonable.

Example 3
Using the FODs of Example 2 and the corresponding vacuous BPAs, and comparing Example 2 with Example 3, it is easy to see that the results for the vacuous BPAs are larger than those for the Bayesian BPAs.
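This comparison has a simple closed form under the new entropy: for the uniform Bayesian BPA on an FOD of n elements, H_{B&F} = log2 n, while for the vacuous BPA, H_{B&F} = n · log2 n, so the vacuous case dominates for n ≥ 2. A quick numerical check (restating the `bf_entropy` sketch from above; the particular n values are our own):

```python
from math import log2

def bf_entropy(m, n):
    return sum(v * log2(len(a) ** n / v) for a, v in m.items() if v > 0)

for n in (2, 3, 4):
    elems = [f"x{i}" for i in range(1, n + 1)]
    bayes = {frozenset({e}): 1.0 / n for e in elems}  # uniform Bayesian BPA
    vac = {frozenset(elems): 1.0}                     # vacuous BPA
    # Bayesian gives log2(n); vacuous gives n * log2(n).
    print(n, bf_entropy(bayes, n), bf_entropy(vac, n))
    assert bf_entropy(vac, n) > bf_entropy(bayes, n)
```

For instance, at n = 2 the values are 1.0 (Bayesian) versus 2.0 (vacuous).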

Example 4
In this example, we compare Pal entropy and B&F entropy. Let FOD X1 = {x1, x2, x3, x4, x5} and X2 = {x1, x2, x3, x4}. Meanwhile, suppose the following two situations exist: Thus, the Pal entropy and B&F entropy results are calculated and compared; for instance:

H_{B&F} = 0.3 × log2 (3^5 / 0.3) + 0.7 × log2 (2^5 / 0.7) = 2.8985 + 3.8602 = 6.7587.

We can draw the following conclusions: by comparison, the result of B&F is more reasonable. Because C2 has fewer focal elements and the two BPAs share the same element x3, the uncertainty of C1 should be bigger than the uncertainty of C2.
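The displayed arithmetic corresponds to a BPA assigning 0.3 to a three-element focal set and 0.7 to a two-element focal set on X1 (|X1| = 5); the particular subsets chosen below are our own reconstruction inferred from the worked numbers:

```python
from math import log2

def pal_entropy(m):
    """Pal entropy: sum of m(a) * log2(|a| / m(a)); no dependence on |X|."""
    return sum(v * log2(len(a) / v) for a, v in m.items() if v > 0)

def bf_entropy(m, n):
    return sum(v * log2(len(a) ** n / v) for a, v in m.items() if v > 0)

# Masses inferred from the worked numbers: 0.3 on a 3-element set,
# 0.7 on a 2-element set, over an FOD of cardinality 5.
m = {frozenset({"x1", "x2", "x3"}): 0.3, frozenset({"x3", "x4"}): 0.7}

print(round(bf_entropy(m, 5), 4))  # 6.7587, matching 2.8985 + 3.8602
print(round(pal_entropy(m), 4))    # 2.0568: Pal ignores |X| entirely
```

The second line illustrates the point of the example: Pal's value would be identical on X2, while B&F changes with the FOD cardinality.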
From an overall view, as long as the cardinalities of the focal elements of two BPAs are equal, the results of Pal entropy remain constant, even if the FODs have different numbers of elements. This is unreasonable. The new belief entropy, however, reflects the impact of the FOD's cardinality on information uncertainty; obviously, the degree of information uncertainty grows with the size of the FOD. Thus, the new definition proposed in this paper is more reasonable, as the example in Section 6.1.4 shows.

Example 5
In this example, we suppose an FOD with ten elements, X = {x1, x2, ..., x10}. We chose ten subsets of 2^X as the focal element B and used Dubois entropy, Deng entropy, Pan-Deng entropy, and the new belief entropy for comparison. In Section 4, we already listed these definitions of entropy. As B_i changes, their values can be calculated in MATLAB. The calculation results of these definitions are shown in Table 1. The P&D entropy in Figure 2 is the H_Bel(m) we listed in Section 4. Although all four entropy values in Figure 2 increased, their slopes were different. Deng entropy and Pan-Deng entropy increased linearly, while the slopes of Dubois entropy and the new entropy decreased as the cardinality of B increased. We believe the latter growth trend is more reasonable, because the scale of B is an important indicator of the change in information uncertainty and should change with the size of the cardinality. With the same cardinality of B_i, our new belief entropy was larger than the Dubois entropy, so it could better reflect the degree of uncertainty. Therefore, through comprehensive analysis, we consider the new belief entropy to be more accurate.
Yager entropy, Pal entropy, Klir and Ramer entropy, and Jiroušek and Shenoy entropy are plotted in Figure 3.
From Figure 3, it can be seen that these definitions kept small values. The degrees of uncertainty measured by Klir and Ramer and by Yager decreased visibly as the number of elements in B increased.
This was understandable. The uncertainty measures proposed by Pal and by Jiroušek and Shenoy were nearly linear in the cardinality of B; they had the same growth trend as Deng entropy. The J&S entropy in Figure 3 is the H_{J&S}(m) we listed in Section 4.

Example 6
In recent years, many modifications based on Deng entropy have been proposed [33,34,37]. In this example, we compare W entropy with our new model.
Although W entropy takes into account the scale of the FOD, the effect of the FOD's scale on W entropy is very limited [34]. As Equation (26) shows, the value of our new model changes exponentially with the scale of the FOD. As shown in their examples, when the parameter increased from zero to 10, the trend of W entropy was almost the same as that of Deng entropy. However, as we demonstrated in Section 6.1.5, the growth trend of B&F entropy differs from Deng entropy. Therefore, we can see the effectiveness and superiority of the proposed entropy.

Example Summary
Based on the examples proposed above, we list some typical cases that may affect the new belief entropy and compare it with other entropies. From Sections 6.1.2 and 6.1.3, we could see that the new entropy was more sensitive to the vacuous BPA. Section 6.1.4 showed the limitations of the general entropies, and the new entropy could solve the problem caused by FODs of different sizes. Section 6.1.5 reflected the change of the new entropy and other entropies as the number of elements increased. In Section 6.1.6, we made a simple comparison between W entropy and B&F entropy.

Simulation
Here, we use MATLAB to complete the test. This test shows more intuitively how the new belief entropy changes with different BPAs.
We supposed an FOD X = {x1, x2} and defined a BPA on it with m({x1}) = p1, m({x2}) = p2, and m({x1, x2}) = 1 − p1 − p2, where p1 and p2 can take any value from zero to one. However, according to the D-S theory in Section 2, we limited these values so that m({x1}) + m({x2}) = p1 + p2 ≤ 1. Obviously, m({x1, x2}) exists only when m({x1}) + m({x2}) < 1. The simulation results are shown in Figures 4 and 5, where the x-axis is m({x1}), the y-axis is m({x2}), and the z-axis is the value of the new entropy. Analysis: these simulation results suggested that the main trend of the new entropy changes with different BPAs. They also indicated that the new entropy increased as the mass of the vacuous element increased when p1 + p2 < 1, which was reasonable. Therefore, the new entropy could reflect well the degree of information uncertainty.
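The surface in Figures 4 and 5 can be reproduced without MATLAB; a Python sketch over a coarse grid (the grid step of 0.05 is our own choice):

```python
from math import log2

def bf_entropy(m, n):
    return sum(v * log2(len(a) ** n / v) for a, v in m.items() if v > 0)

X1, X2, X12 = frozenset({"x1"}), frozenset({"x2"}), frozenset({"x1", "x2"})

def entropy_at(p1, p2):
    """B&F entropy for m({x1}) = p1, m({x2}) = p2, m({x1,x2}) = 1 - p1 - p2."""
    m = {X1: p1, X2: p2, X12: 1.0 - p1 - p2}
    return bf_entropy(m, 2)

steps = [i / 20 for i in range(21)]
surface = {(p1, p2): entropy_at(p1, p2)
           for p1 in steps for p2 in steps if p1 + p2 <= 1.0}

print(entropy_at(1.0, 0.0))          # 0.0: certainty gives zero entropy
print(entropy_at(0.0, 0.0))          # 2.0: vacuous BPA, |X| * log2 |X|
print(min(surface.values()) >= 0.0)  # True: non-negativity holds on the grid
```

The corner values bracket the surface: the degenerate BPAs m({x1}) = 1 and m({x2}) = 1 give zero entropy, while the fully vacuous corner gives |X| · log2 |X| = 2.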

Conclusions and Discussion
First of all, we reviewed some earlier definitions proposed by Hartley, Shannon, Yager, Nguyen, Lamata and Moral, Jiroušek and Shenoy, Klir and Ramer, Dubois, Nikhil R. Pal, Jousselme, Deng, and Pan-Deng. However, none of them reflected the effect of the FOD's cardinality on uncertainty.
We discussed an open issue, which was how to measure information uncertainty. Our principle was to include as much known information as possible under D-S theory. Thus, in this paper, we considered the cardinality of FOD and defined a new model to measure uncertainty. Meanwhile, some properties of the new entropy were discussed. The result of the examples and simulation proved that the new entropy could be more effective and accurate when compared with other entropies.
When the target belonged to a set of clusters and the total number of targets could not be determined, our method could obtain the information uncertainty of the target accurately. Compared with traditional methods, the new entropy was easy to calculate, which meant that in the same amount of time, it could process more data. In future work, we will apply it to solve practical problems and improve it in real applications.