Paradox Elimination in Dempster–Shafer Combination Rule with Novel Entropy Function: Application in Decision-Level Multi-Sensor Fusion

Multi-sensor data fusion technology is an important tool in building decision-making applications. Modified Dempster–Shafer (DS) evidence theory can handle conflicting sensor inputs and can be applied without any prior information. As a result, DS-based information fusion is very popular in decision-making applications, but original DS theory produces counterintuitive results when combining highly conflicting evidences from multiple sensors. An effective algorithm offering fusion of highly conflicting information in spatial domain is not widely reported in the literature. In this paper, a successful fusion algorithm is proposed which addresses these limitations of the original Dempster–Shafer (DS) framework. A novel entropy function is proposed based on Shannon entropy, which is better at capturing uncertainties compared to Shannon and Deng entropy. An 8-step algorithm has been developed which can eliminate the inherent paradoxes of classical DS theory. Multiple examples are presented to show that the proposed method is effective in handling conflicting information in spatial domain. Simulation results showed that the proposed algorithm has competitive convergence rate and accuracy compared to other methods presented in the literature.


Introduction
Multi-sensor fusion means the combination of information from multiple sensors (homogeneous or heterogeneous) in a meaningful way so that we can overcome any limitations inherent to a single sensor or information source. Based on the identified strengths and weaknesses of previous work, a principled definition of information fusion is proposed in Reference [1]: "Information fusion is the study of efficient methods for automatically or semi-automatically transforming information from different sources and different points in time into a representation that provides effective support for human or automated decision making." A multi-sensor system has two distinct advantages over a single sensor system when used with a proper fusion algorithm:
• A single sensor may provide faulty, erroneous results, and there is no way to modify that other than by changing the sensor. A multi-sensor system provides results with diverse accuracy. With the help of a proper fusion algorithm, faulty sensors can be easily detected.
• A multi-sensor system receives information with wide variety and characteristics. Thus, it helps to create a more robust system with less interference.
Now, to combine inputs from different sensors at the decision level to achieve correct object classification, we need a robust decision-level sensor-fusion algorithm. As shown in Figure 1 [2], sensor fusion can be represented at three different levels. Signal-level fusion can be illustrated by combining raw pixels from multiple cameras. Feature-level sensor fusion can be explained as follows. Hand-coded features (area or moment of a certain object) are extracted from images. Then, the features are fused using some clustering algorithm. The final decision is made using the output of the clustering algorithm. At the decision level, a decision is already made by using a supervised/unsupervised algorithm. Then, decisions from multiple sensors are fused using Bayes or another information fusion algorithm. At the decision level, a crucial issue in multi-source information fusion is how to represent and determine the imprecise, fuzzy, ambiguous, inconsistent, and even incomplete information [3]. As a tool to manipulate an uncertain environment, Dempster-Shafer (DS) evidence theory is an established system for uncertainty management [4,5]. The limitations of the original DS combination rule and works to eliminate them are discussed in Section 4.

Frame of Discernment (FOD)
The frame of discernment contains M mutually exclusive and exhaustive events (also represented by X in this research):
$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_M\} \quad (1)$$
The representation of uncertainties in the DS theory is similar to that in conventional probability theory and involves assigning probabilities to the space $\Theta$. However, the DS theory has one significant new feature: it allows the probability to be assigned to subsets of $\Theta$ as well as to the individual elements $\theta_i$. Accordingly, we can derive the power set $2^\Theta$ of DS theory:
$$2^\Theta = \{\varphi, \{\theta_1\}, \ldots, \{\theta_M\}, \{\theta_1, \theta_2\}, \ldots, \{\theta_1, \theta_2, \ldots, \theta_i\}, \ldots, \Theta\} \quad (2)$$
where $\varphi$ is an empty set. It is clearly seen in Equation (2) that the power set $2^\Theta$ has $2^M$ propositions. Any non-singleton subset denotes the union of its elements. For example, $\{\theta_1, \theta_2, \theta_3\} \equiv \{\theta_1 \cup \theta_2 \cup \theta_3\}$. A complete probability assignment to a power set is called a basic probability assignment (BPA).

Basic Probability Assignment (BPA)/Mass Function
Evidences in DS theory are acquired by multi-sensor information. A mass function (mass) is a function $m : 2^\Theta \to [0, 1]$ that satisfies Equations (3) and (4):
$$m(\varphi) = 0 \quad (3)$$
$$\sum_{A \subseteq \Theta} m(A) = 1 \quad (4)$$
m is called the basic probability assignment. Elements of the power set having $m(A) > 0$ are called focal elements. This can be explained with the help of a simple example in which three objects are to be detected: the four subsets that receive non-zero mass are called the focal elements.
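The definitions of the power set and of a BPA can be checked mechanically. The following is a minimal Python sketch, not the authors' implementation: subsets are represented as `frozenset` objects, the helper names (`power_set`, `is_valid_bpa`, `focal_elements`) are hypothetical, and the mass values are invented for illustration because the example values are not reproduced in this text.

```python
from itertools import chain, combinations


def power_set(frame):
    """Return all 2**M subsets of the frame, including the empty set."""
    items = sorted(frame)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def is_valid_bpa(mass, frame, tol=1e-9):
    """Check m(empty) = 0 and that the masses over subsets of the frame sum to 1."""
    frame = frozenset(frame)
    if any(not focal <= frame for focal in mass):
        return False  # mass assigned outside the frame of discernment
    if mass.get(frozenset(), 0.0) != 0.0:
        return False  # the empty set must carry no mass
    if any(v < 0.0 or v > 1.0 for v in mass.values()):
        return False
    return abs(sum(mass.values()) - 1.0) <= tol


def focal_elements(mass):
    """Subsets that carry strictly positive mass."""
    return [a for a, v in mass.items() if v > 0.0]


frame = {"A", "B", "C"}
# Hypothetical BPA with four focal elements (illustrative values only).
m = {
    frozenset("A"): 0.5,
    frozenset("B"): 0.2,
    frozenset("C"): 0.1,
    frozenset("AC"): 0.2,
}

print(len(power_set(frame)))    # 8 propositions for M = 3
print(is_valid_bpa(m, frame))   # True
print(len(focal_elements(m)))   # 4
```

Using `frozenset` keys keeps set operations such as intersection and subset tests cheap, which is convenient for the combination rule and for the belief and plausibility functions.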

Dempster-Shafer Rule of Combination
The purpose of data fusion is to summarize and simplify information rationally, obtained from independent and multiple sources. The DS combination rule emphasizes the agreement between multiple sources and ignores all the conflicting evidences through normalization. Any two mass functions $m_1$ and $m_2$ over the same FOD, with focal elements B and C, respectively, and with at least one focal element in common, can be combined into a new mass function using the DS combination rule. The combination of two mass functions is also called the orthogonal sum $\oplus$. The DS combination rule for combining two evidences $m_1$ and $m_2$ is defined as follows:
$$m(A) = (m_1 \oplus m_2)(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_1(B)\, m_2(C) \quad (5)$$
when $A \neq \varphi$, and $m(\varphi) = 0$, with
$$K = \sum_{B \cap C = \varphi} m_1(B)\, m_2(C) \quad (6)$$
where K is the degree of conflict between the two sources of evidence. K is calculated by adding up the products of the BPAs of all pairs of sets whose intersection is null. The denominator (1 − K) is a normalization factor, which helps aggregation by completely ignoring the conflicting evidence. The DS combination rule in Equation (5) conforms to both the commutative law and the associative law.
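The combination rule described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' code: mass functions are dictionaries keyed by `frozenset`, the function name `dempster_combine` is hypothetical, and the two BPAs are invented values.

```python
def dempster_combine(m1, m2, eps=1e-12):
    """Combine two mass functions with Dempster's rule.

    Returns the fused mass function and the conflict factor K, where
    K is the total product mass falling on empty intersections.
    """
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    if 1.0 - conflict <= eps:
        raise ValueError("K = 1: the evidences are completely conflicting")
    # Normalize by (1 - K), which discards the conflicting mass.
    fused = {a: v / (1.0 - conflict) for a, v in combined.items()}
    return fused, conflict


# Hypothetical BPAs over the frame {A, B, C} (illustrative values only).
m1 = {frozenset("A"): 0.6, frozenset("B"): 0.1, frozenset("AC"): 0.3}
m2 = {frozenset("A"): 0.5, frozenset("B"): 0.2, frozenset("C"): 0.3}

fused, k = dempster_combine(m1, m2)
print(round(k, 4))  # 0.44
for focal, value in fused.items():
    print(sorted(focal), round(value, 4))
```

Swapping the arguments gives the same fused masses, which reflects the commutative property noted above.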

Belief and Plausibility Function
Given a basic assignment m, we can define a belief function $Bel : 2^\Theta \to [0, 1]$, such that for any $A \subseteq \Theta$:
$$Bel(A) = \sum_{B \subseteq A} m(B) \quad (7)$$
Bel(A) measures the belief that the element is a member of A. m(A) measures the amount of belief that one commits exactly to A alone; Bel(A) measures the total belief that the special element is in A. Based on the same premise, we have the following:
$$Pl(A) = \sum_{B \cap A \neq \varphi} m(B) = 1 - Bel(\bar{A}) \quad (8)$$
Pl(A) measures the degree to which one fails to doubt A. Pl(A) measures the total belief mass that can move into A, whereas Bel(A) measures the total belief mass that is constrained to A. Similarly, Bel and Pl values can be calculated for all the BPAs, as shown in Table 1.
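As a small illustration of the two functions, the sketch below computes Bel and Pl from a mass function stored as a dictionary keyed by `frozenset`. The function names and the BPA values are assumptions made for the example. Bel sums the mass of every non-empty subset of A, and Pl sums the mass of every focal element that intersects A.

```python
def belief(mass, a):
    """Bel(A): total mass of the non-empty focal elements contained in A."""
    return sum(v for b, v in mass.items() if b and b <= a)


def plausibility(mass, a):
    """Pl(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in mass.items() if b & a)


# Hypothetical BPA over the frame {A, B, C} (illustrative values only).
m = {
    frozenset("A"): 0.5,
    frozenset("B"): 0.2,
    frozenset("C"): 0.1,
    frozenset("AC"): 0.2,
}

a = frozenset("A")
print(round(belief(m, a), 4), round(plausibility(m, a), 4))    # 0.5 0.7

ac = frozenset("AC")
print(round(belief(m, ac), 4), round(plausibility(m, ac), 4))  # 0.8 0.8
```

For the singleton {A}, the interval [Bel, Pl] = [0.5, 0.7] shows how the mass on {A, C} can move into A without being committed to it.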

Paradoxes (Source of Conflicts) in DS Combination Rule
Dempster-Shafer theory, introduced and developed by Dempster and Shafer [6][7][8], has many merits in contrast to Bayesian probability theory [9]. However, when applied in a multi-sensor system, this theory also has its shortcomings [10], and to use the DS sensor fusion algorithm for robust applications, we have to overcome the fusion paradoxes. The different levels of performance of sensors, cluster, and interference of a complex environment may lead to conflicts among evidences. When evidences are highly conflicting, the fusing results obtained by the DS combination method are normally contrary to common sense. When the conflicting factor K is close to 1, this rule cannot obtain reasonable fusing results as the denominator is approximately 0. These counterintuitive phenomena of the DS theory are called paradoxes. According to Reference [11], there are mainly three types of paradoxes.

Completely Conflicting Paradox:
In this situation, there are two sensors and one sensor output completely contradicts the other sensor output. The following example depicts the situation:
Example 2. In the multi-sensor system, assume that there are four evidences in the frame, that Θ = {A, B, C}, and that proposition A is true. Here, the two sensors completely conflict with each other. The conflicting factor in Equation (6) is K = 1, which indicates that the evidences from sensor 1 and sensor 2 are completely conflicting. Under such circumstances, the DS combination rule cannot be applied because the normalization factor (1 − K) is zero.

"One Ballot Veto" Paradox
For a multi-sensor system (more than two sensors), one sensor completely contradicts all other sensor outputs. The following example depicts the situation:

"Total Trust" Paradox
Here, one sensor highly contradicts the other sensor but both of them have a common focal element with low evidence. The following example depicts the situation: Applying the DS combination rule, we get $m_{12}(A) = 0$, $m_{12}(B) = 1$, $m_{12}(C) = 0$, and K = 0.99. Here, common sense suggests that either m(A) or m(C) is correct, but the wrong proposition B is identified to be true with total confidence even though sensors 1 and 2 both nearly negate this idea.
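The "total trust" behavior can be reproduced with a short Python sketch. The BPAs below are an assumption: the example's input values are not reproduced in this text, so Zadeh-style values were chosen such that the combination yields the reported K = 0.99 and m12(B) = 1. The function name `dempster_combine` is hypothetical.

```python
def dempster_combine(m1, m2, eps=1e-12):
    """Dempster's rule: returns the fused mass function and the conflict K."""
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    if 1.0 - conflict <= eps:
        raise ValueError("K = 1: the evidences are completely conflicting")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}, conflict


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")

# Assumed inputs: sensor 1 strongly supports A, sensor 2 strongly supports C,
# and both give a small mass to the common focal element B.
m1 = {A: 0.9, B: 0.1}
m2 = {B: 0.1, C: 0.9}

fused, k = dempster_combine(m1, m2)
print(round(k, 4))                   # 0.99
print(round(fused.get(A, 0.0), 4))   # 0.0
print(round(fused.get(B, 0.0), 4))   # 1.0
print(round(fused.get(C, 0.0), 4))   # 0.0
```

Only the pair (B, B) has a non-empty intersection, so all surviving mass is assigned to B after normalization, even though each sensor gave B a mass of only 0.1.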

Eliminating the Paradoxes of DS Combination Rule
Existing modified methods are divided mainly into three categories:

Modification of DS Combination Rule
Smets' rule [12] is essentially the Dempster rule applied in Smets' Transferable Belief Model. Smets believed that conflict is caused by incompleteness of the frame of discernment Θ and moved the mass of conflict directly to φ as an unknown proposition. This model is a slightly different formulation of DS theory, but the ideas are essentially the same. In Yager's rule [13], the mass associated with conflict is directly given to the universal set Θ. Yager's rule provides the same results as the DS rule when conflict is zero. Although these two methods solve the conflict situation theoretically, the uncertainty of the system still exists. Bicheng et al. [14] modified Yager's rule so that the conflicting probability of the evidences is distributed to every proposition based on average support. Inagaki [15] defined a continuous parameter class of combination operations, which subsumes both the DS and Yager's rules. Depending on the conflict of information, his combination rule changes between the DS and Yager combination rules. However, if an engineer applies an experience-based weighting factor to the credibility of one of the sensors, this rule cannot be applied. Zhang [16] pointed out that the DS rule fails to take into account the focal element intersection. He presented the "two frame" representation of DS theory, where he measures focal element intersections based on cardinality. Li [17] used the interaction between focal elements and proposed two weighted redistribution methods, which consider the associative relationship among the evidences collected from multiple sources. His argument was that, if a body of evidence is greatly supported by others, this piece of evidence should be more important and have a great effect on the final combination results. On the contrary, if a body of evidence is highly conflicting with others, this piece of evidence should be less important and have little effect on the final combination results.
However, all these methods sometimes violate the theoretical properties of DS combination rule like commutativity and associativity.
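To make the contrast with Dempster's normalization concrete, the following Python sketch implements Yager's rule as described above: products falling on empty intersections are not normalized away but are added to the mass of the universal set Θ. The function name and the input BPAs are assumptions for illustration.

```python
def yager_combine(m1, m2, frame):
    """Yager's rule: the conflicting mass is transferred to the universal set."""
    theta = frozenset(frame)
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    # No normalization: the conflict becomes ignorance assigned to Theta.
    combined[theta] = combined.get(theta, 0.0) + conflict
    return combined


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
theta = frozenset("ABC")

# Assumed highly conflicting inputs (illustrative values only).
m1 = {A: 0.9, B: 0.1}
m2 = {B: 0.1, C: 0.9}

fused = yager_combine(m1, m2, "ABC")
print(round(fused[B], 4))      # 0.01
print(round(fused[theta], 4))  # 0.99
```

The wrong proposition B no longer receives total confidence, but 99% of the mass ends up as ignorance on Θ, which illustrates the remark that the uncertainty of the system still exists.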

Revision of Original Evidence before Combination
Commutative and associative properties of the DS rule are important for multi-sensor information fusion, and they may be lost when the original rule is tampered with. As a result, the propositions are modified so that conflict among the evidences is resolved before applying them in the DS combination rule. Chen et al. [18] used triangular functions to set a fuzzy model for each sensor. Assuming each sensor output is Gaussian, BPA was determined from the sensor outputs using the fuzzy model. Then, the raw BPA was weighted using the credibility of each BPA before fusing. Sun [19] also used a fuzzy membership function to convert sensor values to fuzzy values. Support degree was calculated using an error distance function. If sensor output is not Gaussian, then fuzzy set methods cannot be applied. Instead of a distance function, an entropy function (Deng entropy [9]) was used to calculate the credibility of evidence in Reference [20]. This was inspired by Murphy's method [21], which used an average of BPAs. Murphy's method had a fast convergence rate but failed to consider the relation between focal elements. Jiang [22] used an entropy function to measure the weight of the evidences to modify them before applying the DS rule. Xiao [23] used almost the same procedure as Jiang but with a different distance function to measure the credibility. Murphy's method is the simplest to implement, and most of the methods within this type are inspired by his method.

Hybrid Technique Combining Both Modification of DS Rule and Original Evidence
Through the comparison between the two kinds of conflict resolution, it is easy to see the underlying logic of the two methods. Method 1 cancels the normalization step in DS theory and redistributes the conflict with a different measure. Method 2 considers the essential differences between propositions of each sensor in multi-sensor systems and solves the conflict by modifying the original evidence. If methods 1 and 2 are combined, then the inherent paradoxes of the DS rule are solved. Building on this idea, Lin et al. [24] and Ye Fang et al. [11] published several new improvements of the original DS combination rule. They improved the fusion results, but the resulting methods were often too complicated and overengineered to apply for real-time use. These methods also lose the commutative and associative properties of the DS rule.
How to accurately measure the conflicting evidences under the DS framework is still an open issue. Keeping the commutative and associative properties of the original DS combination rule and eliminating the paradoxes are critical for multi-sensor fusion. There is still room for improvement to properly measure the conflicts between evidences and to obtain appropriate weights for each evidence. Based on this, an improved combination method is proposed which follows the "revision of original evidence before combination" method. A novel entropy function is proposed which can better capture the conflicts between evidences. Reward and penalty are imposed on evidences based on how they agree or disagree with each other. The amount of reward or penalty is determined by the entropy function. Then, the modified weight value (reward or penalty) is applied in adjusting the body of the evidences before using Dempster's combination rule (n − 1) times, when there are n evidences (sensors). The simulation experiments illustrate that the proposed method is reasonable and efficient in coping with the conflicting evidences.

Entropy in Information Theory under DS Framework
Information is a measure of the compactness of a distribution; logically, if a probability distribution is spread evenly across many states, then its information content is low, and conversely, if a probability distribution is highly peaked on a few states, then its information content is high [25]. Information is a function of distribution. Entropy measures the compactness of a distribution of information. Entropy is zero when BPA is assigned to a single element, thus creating the most informative distribution. When BPA is uniformly distributed, entropy is at maximum and agrees with the idea of least informative distribution.
In information theory, Shannon entropy [26] is often used to measure the "amount of information" in a variable:
$$E_{Sh} = -\sum_{i=1}^{n} p_i \log_2 p_i \quad (9)$$
where n is the number of basic states in a state space and $p_i$ is the probability of state i. It is clear that the quantity of entropy is always associated with the number of states in a system. In the framework of DS evidence theory, the uncertain information is represented by both mass functions and the FOD. Deng entropy [9] considers both:
$$E_d(m) = -\sum_{A \subseteq \Theta} m(A) \log_2 \frac{m(A)}{2^{|A|} - 1} \quad (10)$$
where |A| denotes the cardinality of the focal element A. Other works related to entropy under the DS framework can be found in the literature [27]. Based on Shannon and Deng entropy, we propose a new belief entropy, which considers Bel and Pl of the mass function, the cardinality of focal elements, and the number of elements in the FOD. The goal of the proposed entropy is to capture the uncertainty of information under the DS framework that is omitted by Shannon and Deng entropy:
$$E_P(m) = -\sum_{A \subseteq X} \frac{Bel(A) + Pl(A)}{2} \log_2 \left( \frac{Bel(A) + Pl(A)}{2\,(2^{|A|} - 1)} \exp\left(\frac{|A| - 1}{|X|}\right) \right) \quad (11)$$
where |X| denotes the cardinality of X, which represents the number of elements in the FOD.
The exponential factor $\exp\left(\frac{|A| - 1}{|X|}\right)$ in the new belief entropy represents the uncertain information in the number of elements of the FOD that has been ignored by Deng entropy. The probability interval [Bel(A), Pl(A)] considers the lower and upper bounds of evidence, which are Bel and Pl, respectively, and its central value (Bel(A) + Pl(A))/2 is used in place of the mass value. The new belief entropy, which considers Deng entropy and the interval probability, can better measure the uncertainty of a BPA.
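The three entropies can be compared numerically with the Python sketch below. Several points are assumptions rather than the authors' implementation: Shannon entropy is applied directly to the mass values; the proposed entropy is coded in the form read from the description above (central value (Bel + Pl)/2, the 2^|A| − 1 cardinality term of Deng entropy, and the factor exp((|A| − 1)/|X|)); the sum runs over focal elements only; and the test BPA, a uniform mass of 1/7 on every non-empty subset of a three-element frame, is a hypothetical input.

```python
import math
from itertools import combinations


def shannon_entropy(mass):
    """Shannon entropy of the mass values, ignoring set cardinality."""
    return -sum(v * math.log2(v) for v in mass.values() if v > 0.0)


def deng_entropy(mass):
    """Deng entropy: each mass is spread over the 2**|A| - 1 subsets of A."""
    return -sum(
        v * math.log2(v / (2 ** len(a) - 1))
        for a, v in mass.items() if v > 0.0
    )


def proposed_entropy(mass, frame):
    """Assumed form of the proposed belief entropy (see lead-in)."""
    size = len(frame)
    total = 0.0
    for a, v in mass.items():
        if v <= 0.0:
            continue
        bel = sum(w for b, w in mass.items() if b and b <= a)
        pl = sum(w for b, w in mass.items() if b & a)
        center = (bel + pl) / 2.0  # central value of [Bel, Pl]
        scale = math.exp((len(a) - 1) / size) / (2 ** len(a) - 1)
        total -= center * math.log2(center * scale)
    return total


frame = "ABC"
# Hypothetical BPA: uniform mass on all 7 non-empty subsets of the frame.
subsets = [frozenset(c) for r in (1, 2, 3) for c in combinations(frame, r)]
uniform = {s: 1.0 / 7.0 for s in subsets}

print(round(shannon_entropy(uniform), 4))          # 2.8074
print(round(deng_entropy(uniform), 4))             # 3.8877
print(round(proposed_entropy(uniform, frame), 4))  # 6.7957

# For a Bayesian BPA (singletons only) the three measures coincide.
bayes = {frozenset("A"): 0.5, frozenset("B"): 0.3, frozenset("C"): 0.2}
print(round(shannon_entropy(bayes), 4) == round(proposed_entropy(bayes, frame), 4))  # True
```

Under these assumptions the uniform BPA gives a proposed-entropy value close to the 6.79 quoted in the text, and the ordering Shannon < Deng < proposed matches the discussion of the example.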

Properties of Proposed Entropy Function
Property 1. Mathematically, the value range of the new belief entropy is (0, +∞). According to DS evidence theory, a focal element A consists of at least one element and the limit of its element number is the scale of the FOD. The FOD consists of at least one element, and there is no maximum limit; thus, the ranges of |A| and |X| are the same, denoted as [1, +∞).

The following example shows the properties of the proposed entropy and how it is better at capturing uncertainties compared to Shannon and Deng entropy.

This showed that the result of the proposed entropy is identical to Shannon entropy and Deng entropy when the belief is only assigned on single elements (or Bayesian).

Table 2.

$$E_{Sh} = -\left( \frac{1}{7} \log_2 \ldots \right)$$
$$E_P = -\left( \ldots + \frac{9}{2 \cdot 7} \log_2 \left( \frac{9}{2 \cdot 3 \cdot 7} \exp(1/3) \right) + \ldots + \log_2 \left( \frac{1}{7} \exp(2/3) \right) \right) = 6.79$$

Shannon entropy only considers the mass function value and has the lowest entropy. Deng entropy considers both the mass function value and the cardinality of focal elements. It calculates higher entropy than Shannon. The proposed entropy considers the mass function value (central value of the probability interval) and the cardinality of both the focal elements and the FOD. It results in the highest entropy value compared to Shannon and Deng. If a FOD consists of 7 elements compared to, say, 3 elements, intuitively it can be said that the 7-element FOD should have higher entropy because it is less compact. Also, because the proposed entropy considers the central value of the probability interval, it captures more uncertainty than the mass function alone. As a result, the proposed entropy function abides by the DS framework and is superior in capturing uncertainty compared to Shannon and Deng entropy.

Proposed Steps to Eliminate Paradoxes
With the increasing use of sensor applications in real-time decision making, we need an algorithm which can fuse sensor outputs both in the space domain and the time domain. The goal of the proposed method is to eliminate the paradoxes of the original DS combination rule and work as a decision-level sensor fusion algorithm in both the space and time domains. We are adopting "revision of original evidence before combination" because we do not want to lose the associative and commutative properties of the original DS rule. The proposed method is a distance-based method. It calculates the relative distances between the sensor evidences (classification output). Then, based on the average distance, it classifies which sensor output is credible and which sensor output is incredible. Then, it penalizes the incredible sensor output using the novel entropy function so that the incredible sensor has less effect on the fused output. It also rewards the credible sensor input so that the credible sensor carries more weight towards the fused output. At the end, the modified evidence is fused using the original DS sensor fusion equation. The following example is used to showcase the steps and to compare the final fused results with works from the open literature.

Example 7.
In a multi-sensor-based target recognition system, assume there are three types of targets to be recognized: {A, B, C}. Suppose there are five sensors. They could be any type of sensors. After data acquisition at a specific moment by the five sensors, data are processed and classification IDs are generated. Generated IDs from the five sensors are listed as BPAs: This is a classic example of the "one ballot veto" paradox. Bel and Pl values can be calculated for all the BPAs, as shown in Table 3.
Step 2: Measure the relative distance between evidences. Several distance functions can be used to measure the relative distance. They all have their own advantages and disadvantages regarding runtime and accuracy. We have used Jousselme's distance function [28]. Jousselme's distance function uses cardinality in measuring distance, which is an important metric when multiple elements are present in one BPA under the DS framework. The effect of different distance functions (Euclidean, Jousselme, Minkowski, Manhattan, Jeffreys, and Canberra distance functions) on simulation time and information fusion can be found in the literature [29]. Assuming that there are two mass functions indicated by $m_i$ and $m_j$ on the discriminant frame Θ, the Jousselme distance between $m_i$ and $m_j$ is defined as follows:
$$d(m_i, m_j) = \sqrt{\frac{1}{2} (\vec{m}_i - \vec{m}_j)^T D\, (\vec{m}_i - \vec{m}_j)}$$
where $D(A, B) = \frac{|A \cap B|}{|A \cup B|}$ and |.| represents cardinality.
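Step 2 can be illustrated with a small Python sketch of Jousselme's distance, d = sqrt(0.5 (m_i − m_j)^T D (m_i − m_j)) with D(A, B) = |A ∩ B| / |A ∪ B|. The function name and the test BPAs are assumptions for the example, and the sketch assumes that no mass is assigned to the empty set.

```python
import math


def jousselme_distance(m1, m2):
    """Jousselme distance between two mass functions keyed by frozenset."""
    # Only subsets that are focal in at least one BPA contribute to the sum.
    focal = [a for a in set(m1) | set(m2) if a]
    diff = [m1.get(a, 0.0) - m2.get(a, 0.0) for a in focal]
    total = 0.0
    for i, a in enumerate(focal):
        for j, b in enumerate(focal):
            similarity = len(a & b) / len(a | b)  # Jaccard index D(A, B)
            total += diff[i] * similarity * diff[j]
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.5 * total, 0.0))


A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")

print(jousselme_distance({A: 1.0}, {A: 1.0}))             # 0.0
print(jousselme_distance({A: 1.0}, {B: 1.0}))             # 1.0
print(round(jousselme_distance({A: 1.0}, {AB: 1.0}), 4))  # 0.7071
```

The third case shows the role of cardinality: {A} and {A, B} overlap, so their distance is smaller than that of the disjoint pair {A} and {B}.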
Step 3: Calculate sum of evidence distance for each sensor.
Step 4: Calculate global average of evidence distance.
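Steps 3 and 4 can be sketched as follows in Python. The exact equations are not reproduced in this text, so the sketch assumes that the per-sensor value is the plain sum of its distances to all other sensors and that the global value is the mean of those sums. Dividing every sum by the same constant, for example (n − 1), would not change which sensors lie above or below the average. The distance matrix is hypothetical.

```python
def distance_sums(dist):
    """Step 3 (assumed form): sum of distances from each sensor to the others."""
    n = len(dist)
    return [sum(dist[i][j] for j in range(n) if j != i) for i in range(n)]


def global_average(sums):
    """Step 4 (assumed form): mean of the per-sensor distance sums."""
    return sum(sums) / len(sums)


# Hypothetical symmetric distance matrix for 5 sensors: sensor index 1
# disagrees with everyone (distance 0.7); the others are close (0.1).
near, far, n, outlier = 0.1, 0.7, 5, 1
dist = [
    [0.0 if i == j else (far if outlier in (i, j) else near) for j in range(n)]
    for i in range(n)
]

sums = distance_sums(dist)
average = global_average(sums)

# Sensors whose distance sum does not exceed the average agree with the rest.
below_average = [s <= average for s in sums]
print([round(s, 2) for s in sums])  # [1.0, 2.8, 1.0, 1.0, 1.0]
print(round(average, 2))            # 1.36
print(below_average)                # [True, False, True, True, True]
```

The comparison in the last lines anticipates the credible/incredible split made later in the algorithm.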
Step 5: Calculate belief entropy for each sensor by using Equation (11), and normalize.
Step 6: The evidence set is divided into two parts, the credible evidence and the incredible evidence, based on Equations (14) and (15). The intuition is that, if an evidence has a higher distance than the average distance (which is calculated using all the evidences), then that evidence is probably faulty and should be penalized (incredible evidence). If an evidence distance is lower than average, then that evidence is in harmony with the other evidence and should be rewarded (credible evidence). Lower entropy means lower uncertainty, and that evidence should be rewarded more for credible evidence. The opposite is true for incredible evidence. Therefore, we needed a function which has a large slope as it goes near to zero. The natural log function fits the bill. As a result, the following reward and penalty functions are proposed:
For credible evidence, Reward function = $-\ln(E_P(m))$
For incredible evidence, Penalty function = $-\ln(1 - E_P(m))$ (18)
Using Equations (16)-(18), calculate the reward and penalty value for each evidence. Reward 1 = 1.4103, Penalty 2 = 0.0759, Reward 3 = 1.5326, Reward 4 = 1.445, Reward 5 = 1.4647.
Normalize the reward and penalty values to get the evidence weights: $w_1$ = 0.2379, $w_2$ = 0.0128, $w_3$ = 0.2585, $w_4$ = 0.2437, $w_5$ = 0.2471. There is clearly a high conflict between the evidence $m_2$ and the other evidences. Therefore, $m_2$ is defined as an incredible evidence and has a very low weight. The other evidences are supported by each other, so their weights are higher than that of $m_2$.
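The reward and penalty functions and the normalization into weights can be sketched in Python as below. The function names are hypothetical. The five scores are the reward and penalty values quoted above; dividing each by their sum reproduces the weights listed in the text.

```python
import math


def reward(entropy):
    """Reward for credible evidence: grows as the normalized entropy nears 0."""
    return -math.log(entropy)


def penalty(entropy):
    """Penalty for incredible evidence: stays small when the entropy is small."""
    return -math.log(1.0 - entropy)


def normalize(values):
    """Scale the values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


# Reward / penalty values quoted in the text for the five sensors
# (sensor 2 is the incredible one and carries a penalty value).
scores = [1.4103, 0.0759, 1.5326, 1.445, 1.4647]
weights = normalize(scores)
print([round(w, 4) for w in weights])
# [0.2379, 0.0128, 0.2585, 0.2437, 0.2471]

# Shape of the two functions for a hypothetical normalized entropy of 0.2.
print(round(reward(0.2), 4), round(penalty(0.2), 4))
```

Because −ln(E) is large for small E while −ln(1 − E) is small, a credible low-entropy sensor receives a large weight and an incredible one receives a weight close to zero after normalization.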
Step 7: Modify the original evidences.
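The equation for Step 7 is not reproduced in this text. The Python sketch below therefore assumes a weighted average of the original BPAs, m(A) = Σ w_i m_i(A), in the spirit of the Murphy-style averaging methods discussed earlier. The five BPAs are hypothetical stand-ins for the missing example data, and the weights are the values quoted in Step 6; the weights were not recomputed from these BPAs.

```python
def weighted_average_evidence(bpas, weights):
    """Assumed Step 7: weighted average of the BPAs, focal element by focal element."""
    modified = {}
    for mass, weight in zip(bpas, weights):
        for focal, value in mass.items():
            modified[focal] = modified.get(focal, 0.0) + weight * value
    return modified


A, B, C, AC = frozenset("A"), frozenset("B"), frozenset("C"), frozenset("AC")

# Hypothetical BPAs for five sensors; sensor 2 contradicts the others.
bpas = [
    {A: 0.41, B: 0.29, C: 0.30},
    {A: 0.00, B: 0.90, C: 0.10},
    {A: 0.58, B: 0.07, AC: 0.35},
    {A: 0.55, B: 0.10, AC: 0.35},
    {A: 0.60, B: 0.10, AC: 0.30},
]
# Weights as quoted in the text.
weights = [0.2379, 0.0128, 0.2585, 0.2437, 0.2471]

modified = weighted_average_evidence(bpas, weights)
for focal, value in modified.items():
    print(sorted(focal), round(value, 4))
```

Since the weights sum to 1 and each BPA sums to 1, the modified evidence is again a valid BPA, and the low weight of sensor 2 keeps its strong support for B from dominating.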
Step 8: Combine the modified evidence (n − 1) times (for this example, 4 times) with the DS combination rule by using Equations (5) and (6). Care is needed when chaining the fusions: to get $m_{123}$, the $m_{12}$ values should be fused with the original modified evidence from step 7; doing otherwise would be wrong. It is also evident that, for single elements, if an element has the higher value after step 7, it will have the highest value after fusing (n − 1) times. The higher the value after step 7, the higher the value after fusion.

As seen from Table 4, when evidences are in high conflict, the classical Dempster's combination rule produces counterintuitive results that are not correct. With increases in the number of sensors, Murphy's simple averaging, Deng's weighted averaging, Han's novel weight averaging, Wang's weighted evidence, and Jiang's uncertainty measure give reasonable results, although their final combination results are slightly inferior to the outcomes of our proposed approach. Wang et al. [32] showed in their paper that the modified evidences before the fusion steps are m(A) = 0.5048, m(B) = 0.184, m(C) = 0.068, and m(AC) = 0.243. The modified evidence for m(A) is therefore lower than that of our proposed method as stated in step 7. Also, as explained in step 8, it is unlikely that, after fusing these evidences (n − 1) times (4 for this example) using the original DS combination rule, the fused $m_{12345}(A)$ will be higher than that of our proposed method. Using the evidences presented in Wang's work, the recalculated fused evidences are presented in Table 4. The proposed method also has the highest convergence rate (the rate at which m(A) approaches 1) after sensor 3. It is reasonable to say that the proposed method overcomes the paradoxes of the classical DS rule and produces competitive fusion results compared to those of the combination rules available in the open literature.

Figure 2 shows how the fused evidence of m(A) changes with the addition of new sensors and compares multiple methods from the literature.
As m(A) is the correct evidence, how it is changing with the inclusion of new sensor evidence is important for justification of the fused result. The proposed method penalizes m(A) when only two sensors are used. As a result, m(A) starts with lower evidence for the proposed method compared to other methods (number of sensors = 2). However, with the inclusion of correct evidences from sensors 3 and 4, m(A) converges towards 1 quickly for the proposed method compared to other methods. As m(A) evidence converges towards 1, the convergence rate becomes slow for all the methods. A zoomed-in view shows that the proposed method has higher m(A) evidence after fusing 5 sensor evidences compared to other methods from the literature. It can be seen from Example 7 that this method is applicable for any multi-sensor system because fusion occurs after classification ID is created from sensor output. As an example, let us assume multiple cameras are used for object classification. Camera output (video/image) will go through a classifier (example: neural network) for object ID classification. After classification, the output may have similar syntax to Example 7. Then, the proposed method can be applied to find out which sensor is providing erroneous data and to fuse them accordingly.
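The chaining described in Step 8 can be sketched in Python as follows: the running result is always fused with the same modified evidence from Step 7, so m_12 = m ⊕ m, m_123 = m_12 ⊕ m, and so on. The function names are hypothetical, and the modified evidence below uses invented values rather than the paper's numbers.

```python
def dempster_combine(m1, m2, eps=1e-12):
    """Dempster's rule: returns the fused mass function and the conflict K."""
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    if 1.0 - conflict <= eps:
        raise ValueError("K = 1: the evidences are completely conflicting")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}, conflict


def fuse_repeatedly(modified, n_sensors):
    """Fuse the modified evidence with itself (n - 1) times.

    Returns the list of intermediate results [m_12, m_123, ...].
    """
    history = []
    fused = dict(modified)
    for _ in range(n_sensors - 1):
        # Always fuse the running result with the ORIGINAL modified evidence.
        fused, _conflict = dempster_combine(fused, modified)
        history.append(fused)
    return history


A, B, C, AC = frozenset("A"), frozenset("B"), frozenset("C"), frozenset("AC")

# Hypothetical modified evidence after Step 7 (illustrative values only).
modified = {A: 0.53, B: 0.15, C: 0.07, AC: 0.25}

history = fuse_repeatedly(modified, n_sensors=5)
for step, result in enumerate(history, start=2):
    print(step, round(result[A], 4))  # m(A) after fusing `step` evidences
```

With these assumed inputs, m(A) increases at every fusion step and ends above 0.98 after the fourth fusion, which mirrors the convergence behavior discussed for Figure 2.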

Conclusions
In this paper, an eight-step algorithm under DS framework is introduced as an innovative methodology that can be used to better capture uncertainties related to decision-level multi-sensor fusion. A novel entropy function is proposed based on Shannon entropy which takes into account the central value of probability interval and cardinality of both focal elements and FOD. As a result, it is better at capturing uncertainties under DS framework compared to Shannon and Deng entropy. The proposed algorithm calculates distances between multiple evidences (sensors). Based on evidence distance, it rewards the evidences which agree with one another and penalizes the evidences which disagree. The proposed entropy function is used to calculate the weights of the evidences. Conflicting evidences are modified before using them for spatial domain fusion. Classical DS combination rule is used for decision-level sensor fusion; as a result, associative and commutative properties are kept. The proposed method is able to suppress the paradoxes of classical DS combination rule. Detailed examples showed that the proposed method produces competitive convergence rate and fusion accuracy in terms of combining the conflicting evidences in the spatial domain compared to other methods available in the literature.

Funding: This research received no external funding.

Acknowledgments: The authors greatly appreciate the reviews, the suggestions from reviewers, and the editor's encouragement.

Conflicts of Interest:
The authors declare no conflict of interest.