An Improved Multi-Source Data Fusion Method Based on the Belief Entropy and Divergence Measure

Dempster–Shafer (DS) evidence theory is widely applied in multi-source data fusion. However, the classical DS combination rule produces counter-intuitive results when pieces of evidence are highly conflicting. To address this problem, a novel multi-source data fusion method is proposed in this paper. The main steps of the proposed method are as follows. First, the credibility weight of each piece of evidence is obtained by transforming the belief Jensen–Shannon divergence into belief similarities. Next, the belief entropy of each piece of evidence is calculated and the information volume weights of the evidence are generated. Then, the credibility weights and information volume weights are unified into a final weight for each piece of evidence, from which the weighted average evidence is calculated. Finally, the classical DS combination rule is applied multiple times to the modified evidence to generate the fusion result. A numerical example compares the fusion result of the proposed method with those of other existing combination rules. Further, a practical application in fault diagnosis illustrates the plausibility and efficiency of the proposed method. The experimental results show that the proposed method recognizes the target fault type more accurately than the other combination rules.


Introduction
Multi-source data fusion is the process of combining data obtained from different sources to make a robust and complete evaluation of a given system. A single data source generally cannot provide sufficient information to characterize a complex system fully. By contrast, multi-source data fusion produces comprehensive and credible results by integrating groups of data that reflect various features of the system [1,2]. Therefore, multi-source data fusion technology is widely applied in many real applications, such as energy management strategy [3,4], health prognosis [5][6][7], supplier selection [8][9][10][11], decision making [12][13][14], evaluation [15][16][17], etc. However, since a single data source is easily disturbed by environmental factors, the data collected from different sources are inevitably inconsistent, irrelevant or even conflicting at times [18]. How to fuse data from different sources correctly has received much attention but remains an open problem [19,20]. Thus far, many theories and methods have been proposed to handle uncertainty, extended from Z-numbers [21,22], D-numbers [23][24][25][26][27], R-numbers [28], fuzzy sets [29][30][31][32], rough sets [33], evidence theory [34], entropy [35,36], quantum theory [37], and so on.
Dempster-Shafer evidence theory (DS evidence theory), first proposed by Dempster [38] and later developed by Shafer [39], is a general framework for reasoning under uncertainty. As a generalization of Bayesian theory, DS evidence theory can express uncertain and imprecise data more explicitly through the mass function, which can assign probability to unions of single elements [40,41]. Besides, DS evidence theory provides a combination rule to fuse pieces of evidence.
Owing to its flexibility and effectiveness in handling uncertainty, DS evidence theory is widely applied in information fusion [42][43][44]. However, the combination rule in DS evidence theory produces counter-intuitive results when evidence is highly conflicting [45,46]. To address this problem, many modified combination methods have been proposed, following two basic strategies. The first is to modify the classical DS combination formula. Following this strategy, Yager [47] argued that little can be learned from conflicting evidence and reassigned the conflict constant to the unknown space, that is, to the whole frame of discernment. In [48], Smets attributed the conflict to the incompleteness of the frame of discernment and proposed the unnormalized combination rule. In [49], Lefevre et al. proposed a general framework that unifies these combination rules. In [50], Sun et al. argued that the validity of conflicting evidence is related to its credibility. In [51], Li et al. reallocated the conflict constant to each piece of evidence based on its weighted average support degree. However, the main shortcoming of these methods is the loss of associativity, which greatly increases the computational complexity, especially when thousands of pieces of evidence are fused simultaneously. The second strategy is to pre-process the original evidence and apply the classical DS combination rule to the adjusted evidence multiple times. Many combination methods have been proposed on the basis of this strategy. In [52], Murphy generated the modified evidence by simply averaging the original evidence. In [53], Deng et al. took the distance between pieces of evidence into consideration and reallocated the weights of the evidence. In [54], Jiang et al. proposed a new combination rule based on the information volume calculated by belief entropy. In [55], Zhang et al. applied the cosine theorem to calculate the support degree of evidence. In [56], Lin et al. generated a similarity vector by measuring the Euclidean distances between pieces of evidence before generating the weighted average evidence. Although these combination methods produce reasonable fusion results, there is still room for improvement. Therefore, in this paper, a novel multi-source data combination method is proposed to handle the fusion of highly conflicting evidence.
In particular, by using belief entropy to quantify the information volume of the evidence and belief divergence to measure the differences among multi-source data, the credibility and the information volume, two important factors of evidence, are integrated to allocate weights to the original evidence. In this way, the weight of untrustworthy evidence is reduced, so that the influence of conflicting evidence is controlled more strictly. The main steps of the proposed procedure are summarized as follows. First, the credibility of each piece of evidence is calculated from its similarity to the average evidence. Second, belief entropy is applied to calculate the information volume of each piece of evidence. Third, weights are allocated to the evidence based on its credibility and information volume, and the weighted average evidence is generated. Finally, the classical DS combination rule is used to fuse the modified evidence multiple times to obtain the final result. A practical application in fault diagnosis is given to demonstrate the efficiency of the proposed method.
The remainder of this paper is organized as follows. In Section 2, some basic concepts and definitions of DS evidence theory, belief entropy and the Belief Jensen-Shannon divergence are briefly introduced. In Section 3, a novel multi-source data fusion method is presented. In Section 4, a numerical example is used to compare the fusion results with those of other existing combination rules. In Section 5, the proposed combination method is applied to a practical example of fault diagnosis. Finally, Section 6 concludes the paper.

Preliminaries
In this section, several preliminary theories are briefly introduced, including DS evidence theory, belief entropy and the Belief Jensen-Shannon divergence.

Dempster-Shafer Evidence Theory
Dempster-Shafer evidence theory is an extension of Bayesian probability theory. Compared with probability theory, DS evidence theory can assign probability not only to single elements but also to subsets of the universal set [57,58]. Besides, DS evidence theory can handle uncertainty and imprecision without requiring prior probabilities [59][60][61][62]. When belief is allocated only to single elements, DS evidence theory degenerates into Bayesian probability theory [63,64]. Some basic concepts of Dempster-Shafer theory are introduced as follows.
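Definition 1 (Frame of discernment). Assume Θ is an exhaustive and finite set of all possible, independent and mutually exclusive values of a variable x, indicated by:
$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\},$$
where Θ is called a frame of discernment. The power set of Θ, denoted $2^\Theta$, is
$$2^\Theta = \{\emptyset, \{\theta_1\}, \ldots, \{\theta_N\}, \{\theta_1, \theta_2\}, \ldots, \Theta\}.$$
If $A \in 2^\Theta$, then A is called a proposition [65][66][67][68].
Definition 2 (Mass function). If a function $m: 2^\Theta \to [0, 1]$ satisfies
$$m(\emptyset) = 0, \qquad \sum_{A \in 2^\Theta} m(A) = 1,$$
then m is called a mass function or basic probability assignment (BPA). A subset A with $m(A) > 0$ is called a focal element of m.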
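Definitions 1 and 2 can be checked mechanically. The sketch below is a minimal illustration, assuming a BPA is stored as a Python dictionary that maps frozensets of hypothesis labels to masses; the helper names `power_set` and `is_valid_bpa` and the example masses are hypothetical.

```python
from itertools import combinations

def power_set(frame):
    """Return every subset of the frame of discernment as a frozenset."""
    elems = sorted(frame)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

def is_valid_bpa(m, frame, tol=1e-9):
    """Check Definition 2: m(empty set) = 0, every focal element is a subset
    of the frame with mass in [0, 1], and the masses sum to 1 (up to rounding)."""
    frame = frozenset(frame)
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    for A, v in m.items():
        if not A <= frame or not 0.0 <= v <= 1.0:
            return False
    return abs(sum(m.values()) - 1.0) <= tol

frame = {"F1", "F2", "F3"}
m = {frozenset({"F1"}): 0.6, frozenset({"F1", "F3"}): 0.3, frozenset(frame): 0.1}
print(len(power_set(frame)))   # 8: includes the empty set and the frame itself
print(is_valid_bpa(m, frame))  # True
```

The tolerance absorbs floating-point rounding in the sum of masses.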

Definition 3 (Belief function). The belief function is a mapping Bel: $2^\Theta \to [0, 1]$, defined as:
$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B),$$
which represents the total belief in A [74,75]. The belief function is the lower bound of the support for A.
Definition 4 (Plausibility function). The plausibility function is a mapping Pl: $2^\Theta \to [0, 1]$, defined as:
$$\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B) = 1 - \mathrm{Bel}(\bar{A}),$$
which represents the degree to which A is not refuted [76,77]. The plausibility function is the upper bound of the support for A.
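A minimal sketch of Definitions 3 and 4, assuming the same dictionary-of-frozensets representation of a BPA; `bel`, `pl` and the example BPA are hypothetical. Dyadic masses are used so that the printed sums are exact.

```python
def bel(m, A):
    """Belief: total mass of the non-empty focal elements contained in A."""
    A = frozenset(A)
    return sum(v for B, v in m.items() if B and B <= A)

def pl(m, A):
    """Plausibility: total mass of the focal elements that intersect A."""
    A = frozenset(A)
    return sum(v for B, v in m.items() if B & A)

# Hypothetical BPA on the frame {a, b, c}.
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.25, frozenset({"b", "c"}): 0.25}
print(bel(m, {"a"}), pl(m, {"a"}))            # 0.5 0.75
print(bel(m, {"a", "b"}), pl(m, {"a", "b"}))  # 0.75 1.0
```

The interval [Bel(A), Pl(A)] brackets the support for A, and Pl({c}) equals 1 − Bel({a, b}) here, as the complement identity requires.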
Definition 5 (DS combination rule). Assume $m_1$ and $m_2$ are two independent BPAs on $2^\Theta$. In DS evidence theory, their orthogonal sum $m = m_1 \oplus m_2$ is defined as:
$$m(A) = \begin{cases} \dfrac{1}{1-k} \sum_{B \cap C = A} m_1(B)\, m_2(C), & A \neq \emptyset, \\ 0, & A = \emptyset, \end{cases} \tag{5}$$
where
$$k = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C).$$
The orthogonal sum in Equation (5) can be extended to the case where n BPAs are fused simultaneously, since it satisfies the commutative and associative laws. In Equation (5), the constant k measures the degree of conflict between the BPAs [78][79][80]. The higher k is, the more serious the conflict between the BPAs [81][82][83]; the rule is undefined when k = 1.
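Equation (5) can be sketched in a few lines of Python, again assuming BPAs are dictionaries keyed by frozensets; the function names are hypothetical. The demonstration uses Zadeh's well-known example, in which two sources that both consider B almost impossible are fused into full belief in B, the kind of counter-intuitive result discussed in the Introduction.

```python
from functools import reduce

def conjunctive(m1, m2):
    """Unnormalized products m1(B) * m2(C) grouped by B ∩ C, plus the conflict k."""
    joint, k = {}, 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            A = B & C
            if A:
                joint[A] = joint.get(A, 0.0) + v1 * v2
            else:
                k += v1 * v2  # mass that falls on the empty set
    return joint, k

def dempster(m1, m2):
    """Orthogonal sum m1 ⊕ m2: conjunctive combination normalized by 1 - k."""
    joint, k = conjunctive(m1, m2)
    if not joint or k >= 1.0:
        raise ValueError("k = 1: totally conflicting evidence cannot be combined")
    return {A: v / (1.0 - k) for A, v in joint.items()}

def dempster_all(bpas):
    """Fuse several BPAs; the order is irrelevant since ⊕ is commutative and associative."""
    return reduce(dempster, bpas)

A, B, C = frozenset({"A"}), frozenset({"B"}), frozenset({"C"})
m1 = {A: 0.99, B: 0.01}
m2 = {B: 0.01, C: 0.99}
print(conjunctive(m1, m2)[1])  # conflict k close to 0.9999
print(dempster(m1, m2))        # B receives essentially all of the mass
```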

Belief Entropy
In [84], Deng proposed belief entropy. As a generalization of Shannon entropy [85], Deng's belief entropy can be used to measure the information volume of BPAs. Many applications have demonstrated the efficiency of Deng entropy [86]. Deng's belief entropy is defined as follows:
$$E_d(m) = -\sum_{A \subseteq \Theta} m(A) \log_2 \frac{m(A)}{2^{|A|} - 1},$$
where m is a BPA defined on the frame of discernment Θ, A is a focal element of m, and |A| denotes the cardinality of A. When belief is assigned only to single elements of Θ, $2^{|A|} - 1 = 1$ for every focal element, and Deng's belief entropy degenerates into Shannon entropy:
$$H(m) = -\sum_{\theta \in \Theta} m(\{\theta\}) \log_2 m(\{\theta\}).$$
However, Deng's belief entropy has limitations when propositions intersect. To address this shortcoming, Cui et al. [87] improved Deng's belief entropy by removing the redundant volume created by intersections. Cui et al.'s belief entropy is defined as follows:
$$E_d^{*}(m) = -\sum_{A \subseteq \Theta} m(A) \log_2 \left( \frac{m(A)}{2^{|A|} - 1} \, e^{\sum_{B \subseteq \Theta,\, B \neq A} \frac{|A \cap B|}{2^{|\Theta|} - 1}} \right),$$
where |A| denotes the cardinality of proposition A, |Θ| that of the frame of discernment, and B ranges over the other focal elements of m. Here, a numerical example from [87] is used to demonstrate that Cui et al.'s belief entropy measures the information volume of evidence containing intersecting propositions more effectively.
Example 1. The values of the two BPAs are presented as follows: According to the data above, $m_1$ and $m_2$ have focal elements of the same number and sizes, with the same mass values. However, the propositions of $m_2$ intersect and together contain only three elements, a, b and c. Therefore, $m_1$ has a greater information volume than $m_2$. Next, Deng's belief entropy and Cui et al.'s belief entropy of the two pieces of evidence are calculated as follows.
Deng's belief entropy: Cui et al.'s belief entropy:
As shown in Example 1, Deng's belief entropy ignores the influence of intersections and assigns the same degree of uncertainty to both pieces of evidence. In comparison, Cui et al.'s belief entropy removes the redundant space of intersections from the information volume. Therefore, if the evidence has many intersecting propositions, Cui et al.'s belief entropy measures its information volume more accurately. Besides, if a piece of evidence has greater belief entropy, it contains more information and there are fewer conflicts between this subset and the frame of discernment. Therefore, evidence with greater belief entropy should be assigned a larger weight in the fusion procedure.
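The two entropies can be compared with a short Python sketch. It assumes BPAs are dictionaries keyed by frozensets, and for Cui et al.'s entropy it assumes that the inner sum runs over the other focal elements of m, which is what makes the measure sensitive to intersecting propositions. The BPAs below are hypothetical and are not the data of Example 1.

```python
import math

def deng_entropy(m):
    """Deng's belief entropy: -sum m(A) log2(m(A) / (2^|A| - 1))."""
    return -sum(v * math.log2(v / (2 ** len(A) - 1)) for A, v in m.items() if v > 0)

def cui_entropy(m, frame):
    """Cui et al.'s improved Deng entropy [87]: the factor
    exp(sum_B |A ∩ B| / (2^|Θ| - 1)) inside the logarithm lowers the entropy
    of evidence whose focal elements intersect.
    ASSUMPTION: B ranges over the other focal elements of m."""
    denom = 2 ** len(frame) - 1
    total = 0.0
    for A, v in m.items():
        if v <= 0:
            continue
        overlap = sum(len(A & B) for B in m if B != A and m[B] > 0) / denom
        total -= v * math.log2(v / (2 ** len(A) - 1) * math.exp(overlap))
    return total

frame = {"a", "b", "c", "d"}
m_disjoint = {frozenset({"a", "b"}): 0.5, frozenset({"c", "d"}): 0.5}
m_overlap = {frozenset({"a", "b"}): 0.5, frozenset({"b", "c"}): 0.5}
print(deng_entropy(m_disjoint), deng_entropy(m_overlap))  # equal: log2(6) for both
print(cui_entropy(m_disjoint, frame), cui_entropy(m_overlap, frame))  # the second is smaller
```

With disjoint focal elements the exponential factor equals 1, so Cui et al.'s entropy coincides with Deng's; with intersecting focal elements it is strictly smaller.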

Belief Jensen-Shannon Divergence
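In [88], Xiao proposed the Belief Jensen-Shannon (BJS) divergence by integrating DS evidence theory and the Jensen-Shannon divergence. Suppose $m_1$ and $m_2$ are two BPAs on the same frame of discernment Θ, which contains n elements; the BJS divergence between $m_1$ and $m_2$ is defined as:
$$\mathrm{BJS}(m_1, m_2) = \frac{1}{2}\left[ S\!\left(m_1, \frac{m_1 + m_2}{2}\right) + S\!\left(m_2, \frac{m_1 + m_2}{2}\right) \right], \tag{9}$$
where
$$S(m_i, m_j) = \sum_{A \subseteq \Theta} m_i(A) \log \frac{m_i(A)}{m_j(A)}, \qquad \sum_{A \subseteq \Theta} m_i(A) = 1 \quad (i = 1, 2),$$
and terms with $m_i(A) = 0$ contribute zero. The main contribution of the BJS divergence is that it replaces the probability distributions in the JS divergence with BPAs, so that it can be applied in DS evidence theory to measure the difference between BPAs.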
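Equation (9) can be sketched directly, treating each focal element as one outcome of a probability distribution. This is a minimal illustration assuming dictionary-of-frozensets BPAs and a base-2 logarithm (an assumption here), under which the divergence lies in [0, 1]; the BPAs are hypothetical.

```python
import math

def bjs_divergence(m1, m2):
    """Belief Jensen-Shannon divergence: the average of the divergences of m1 and m2
    from their midpoint (m1 + m2) / 2. Base-2 logarithm assumed."""
    total = 0.0
    for A in set(m1) | set(m2):
        p, q = m1.get(A, 0.0), m2.get(A, 0.0)
        mid = (p + q) / 2
        if p > 0:
            total += p * math.log2(p / mid)
        if q > 0:
            total += q * math.log2(q / mid)
    return total / 2

A, B, AB = frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B"})
m1 = {A: 0.6, B: 0.3, AB: 0.1}
m2 = {A: 0.1, B: 0.8, AB: 0.1}
print(bjs_divergence(m1, m1))                                    # 0.0 for identical BPAs
print(abs(bjs_divergence(m1, m2) - bjs_divergence(m2, m1)) < 1e-12)  # True: symmetric
print(bjs_divergence({A: 1.0}, {B: 1.0}))                        # 1.0: no common focal element
```

Unlike the plain KL divergence, the midpoint in the denominator is never zero where the numerator is positive, so the measure stays finite even when the two BPAs have different focal elements.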

The Proposed Method
To address the problem of fusing highly conflicting evidence, a new combination method is proposed in this section. To allocate weights to the evidence more properly, not only the credibility of the evidence but also its information volume is taken into consideration. The procedure is divided into three parts. First, the credibility weight of the evidence is obtained by transforming the BJS divergence into similarities. Second, the information volume weight of the evidence is obtained by calculating the belief entropy. Third, the weighted average BPA is generated and then fused by the DS combination rule. The flowchart of the method is shown in Figure 1. Each step of the method is described in detail as follows.

Figure 1. Flowchart of the proposed method: data collected from different sources are modeled into BPAs; credibility weights are generated from the BJS divergence between each piece of evidence and the average BPA and the resulting similarities (Steps 1-1 to 1-4); information volume weights are generated from the adjusted belief entropy (Steps 2-1 to 2-3); and the final weights are used to generate the modified evidence, which is fused by the DS combination rule (Steps 3-1 to 3-4).

Calculate the Credibility Weight of Evidences
Step 1-1. Suppose $M = \{m_1, m_2, \ldots, m_n\}$ is a set of n independent BPAs on the same frame of discernment containing N elements, $\Theta = \{F_1, F_2, F_3, \ldots, F_N\}$. The arithmetic average BPA $m_a$ is defined as:
$$m_a(A) = \frac{1}{n} \sum_{i=1}^{n} m_i(A), \qquad A \subseteq \Theta.$$
Step 1-2. Calculate the BJS divergence $\mathrm{BJS}(m_i, m_a)$ between $m_i$ and $m_a$ (i = 1, 2, 3, ..., n) according to Equation (9).
Step 1-3. Similarity is negatively correlated with divergence: the higher the divergence between two pieces of evidence, the lower their similarity. The divergence between $m_i$ and $m_a$ is converted into their similarity, denoted $\mathrm{Sim}(m_i, m_a)$, as follows:
Step 1-4. If a piece of evidence is highly similar to the average BPA, it is supported by most of the other pieces of evidence and is therefore more reliable, so it receives high credibility. Thus, the credibility weights are determined by normalizing the similarities with the arithmetic average BPA. The credibility weight ($W_c$) of each piece of evidence is defined as follows:
$$W_c(m_i) = \frac{\mathrm{Sim}(m_i, m_a)}{\sum_{j=1}^{n} \mathrm{Sim}(m_j, m_a)}.$$
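Steps 1-1 to 1-4 can be sketched as follows. The exact mapping from divergence to similarity in Step 1-3 is not reproduced in this text, so the sketch uses the hypothetical choice Sim = 1 − BJS (valid because the base-2 BJS lies in [0, 1]); any decreasing mapping could be substituted. The function names and the three BPAs are likewise hypothetical.

```python
import math

def average_bpa(bpas):
    """Step 1-1: arithmetic average BPA m_a(A) = (1/n) * sum_i m_i(A)."""
    n = len(bpas)
    m_a = {}
    for m in bpas:
        for A, v in m.items():
            m_a[A] = m_a.get(A, 0.0) + v / n
    return m_a

def bjs_divergence(m1, m2):
    """Belief Jensen-Shannon divergence with base-2 logarithms (values in [0, 1])."""
    total = 0.0
    for A in set(m1) | set(m2):
        p, q = m1.get(A, 0.0), m2.get(A, 0.0)
        mid = (p + q) / 2
        if p > 0:
            total += p * math.log2(p / mid)
        if q > 0:
            total += q * math.log2(q / mid)
    return total / 2

def credibility_weights(bpas):
    m_a = average_bpa(bpas)
    div = [bjs_divergence(m, m_a) for m in bpas]  # Step 1-2
    sim = [1.0 - d for d in div]                   # Step 1-3: HYPOTHETICAL Sim = 1 - BJS
    total = sum(sim)
    return [s / total for s in sim]                # Step 1-4: normalize into W_c

A, B = frozenset({"A"}), frozenset({"B"})
bpas = [{A: 0.8, B: 0.2}, {A: 0.1, B: 0.9}, {A: 0.7, B: 0.3}]  # the 2nd source conflicts
w_c = credibility_weights(bpas)
print([round(w, 3) for w in w_c])  # the conflicting 2nd source gets the smallest weight
```

Only n divergences are needed, one per piece of evidence against the average BPA, rather than all pairwise distances.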
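Calculate the Information Volume Weight of Evidences
Step 2-1. Calculate the belief entropy $E_d^{*}(m_i)$ of each piece of evidence with Cui et al.'s belief entropy, which measures its information volume.
Step 2-2. To avoid assigning zero weight to evidence whose belief entropy is zero, the information volume in Step 2-1 is modified as follows, where $E_d^{*}(m_i)$ represents the belief entropy of $m_i$:
Step 2-3. Calculate the information volume weight ($W_{iv}$) of each piece of evidence by normalizing IV:
$$W_{iv}(m_i) = \frac{IV(m_i)}{\sum_{j=1}^{n} IV(m_j)}.$$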
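Steps 2-1 to 2-3 can be sketched as below. The exact modification in Step 2-2 is not reproduced in this text, so the sketch assumes the hypothetical choice IV = exp(E*_d), which is strictly positive and therefore never gives zero weight to zero-entropy evidence. The inner sum of Cui et al.'s entropy is again assumed to run over the other focal elements; all names and data are hypothetical.

```python
import math

def cui_entropy(m, frame):
    """Cui et al.'s improved Deng entropy (B assumed to range over the other focal elements)."""
    denom = 2 ** len(frame) - 1
    total = 0.0
    for A, v in m.items():
        if v > 0:
            overlap = sum(len(A & B) for B in m if B != A and m[B] > 0) / denom
            total -= v * math.log2(v / (2 ** len(A) - 1) * math.exp(overlap))
    return total

def information_volume_weights(bpas, frame):
    # Steps 2-1 and 2-2: HYPOTHETICAL IV = exp(E*_d), strictly positive for any entropy,
    # so evidence with zero entropy still receives a non-zero weight.
    iv = [math.exp(cui_entropy(m, frame)) for m in bpas]
    total = sum(iv)
    return [x / total for x in iv]  # Step 2-3: normalize into W_iv

frame = {"A", "B"}
A, B = frozenset({"A"}), frozenset({"B"})
w_iv = information_volume_weights([{A: 1.0}, {A: 0.5, B: 0.5}], frame)
print(w_iv)  # the zero-entropy BPA still gets 1 / (1 + e), about 0.269
```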

Generate the Modified Evidence and Fuse
Step 3-1. Based on the credibility weight and information volume weight of the evidence, the adjusted weight $W'(m_i)$ of each piece of evidence is obtained as follows:
Step 3-2. Normalize the adjusted weights:
$$W(m_i) = \frac{W'(m_i)}{\sum_{j=1}^{n} W'(m_j)}.$$
Step 3-3. Generate the modified evidence $m_w$ by calculating the weighted average of the BPAs:
$$m_w(A) = \sum_{i=1}^{n} W(m_i)\, m_i(A), \qquad A \subseteq \Theta.$$
Step 3-4. Following [52], apply the DS combination rule n − 1 times to the modified evidence to obtain the final combination result:
$$m = \underbrace{m_w \oplus m_w \oplus \cdots \oplus m_w}_{n \text{ copies}}.$$
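Steps 3-1 to 3-4 can be sketched as a single function. The combination formula of Step 3-1 is not reproduced in this text, so the sketch multiplies the two weights as a hypothetical choice; Steps 3-2 to 3-4 follow the normalization, weighted average and n − 1 self-combinations described above. The weights and BPAs in the usage lines are hypothetical.

```python
from functools import reduce

def dempster(m1, m2):
    """Classical DS orthogonal sum m1 ⊕ m2."""
    joint, k = {}, 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            A = B & C
            if A:
                joint[A] = joint.get(A, 0.0) + v1 * v2
            else:
                k += v1 * v2
    if not joint:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - k) for A, v in joint.items()}

def fuse(bpas, w_c, w_iv):
    raw = [c * v for c, v in zip(w_c, w_iv)]  # Step 3-1: HYPOTHETICAL product of weights
    total = sum(raw)
    w = [x / total for x in raw]              # Step 3-2: normalize
    m_w = {}                                  # Step 3-3: weighted average evidence
    for wi, m in zip(w, bpas):
        for A, v in m.items():
            m_w[A] = m_w.get(A, 0.0) + wi * v
    # Step 3-4: combine m_w with itself n - 1 times.
    return reduce(dempster, [m_w] * (len(bpas) - 1), m_w)

A, B = frozenset({"A"}), frozenset({"B"})
bpas = [{A: 0.8, B: 0.2}, {A: 0.1, B: 0.9}, {A: 0.7, B: 0.3}]
fused = fuse(bpas, w_c=[0.4, 0.2, 0.4], w_iv=[1 / 3, 1 / 3, 1 / 3])
print(fused)  # A about 0.81, B about 0.19
```

Because the weighted average is computed once and then self-combined, the rule keeps the associativity of the classical DS rule while damping the conflicting source.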

Numerical Example
In this section, the proposed method is compared with other existing combination rules using a numerical example from [55] to illustrate its effectiveness.

Example Presentation
Assume the frame of discernment is Θ = {A, B, C}. There are five pieces of evidence, denoted $m_1$, $m_2$, $m_3$, $m_4$ and $m_5$, whose mass functions are listed in Table 1. Here, evidence $m_2$ is less credible than the other pieces of evidence because the sensor from which it is modeled is disturbed by unknown environmental factors.

Combination by the Proposed Method
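Step 1-1. Calculate the arithmetic average BPA.
Step 1-2. Calculate the BJS divergence between each piece of evidence and the average BPA.
Step 1-3. Calculate the similarity between each piece of evidence and the average BPA.
Step 1-4. Calculate the weight of credibility.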
Step 2-1. Calculate the belief entropy of each piece of evidence.
Step 2-2. Adjust the information volume of each piece of evidence.
Step 2-3. Calculate the weight of information volume.
Step 3-1. Adjust the weight of each piece of evidence.
Step 3-2. Normalize the adjusted weights.
Step 3-3. Generate the modified evidence.
Step 3-4. Use the classical DS combination rule to fuse the modified evidence four times (n − 1 = 4); the result is shown in Table 2.

Analysis
According to Table 1, $m_2$ strongly conflicts with the other four pieces of evidence: it assigns most of its belief to B, while the remaining four pieces of evidence all support A. In this case, most of the belief after fusion should be allocated to A, since $m_2$ is modeled from an abnormal sensor. The fusion results of the proposed method and other existing combination rules are presented in Table 2 and Figure 2.
As shown in Table 2, the classical DS combination rule is disturbed by the abnormal evidence and wrongly assigns most of its belief to C. The remaining combination rules all produce reasonable results that mainly support A. The unreliable evidence $m_2$ enters in the first fusion and misleads the fusion process toward C, but as the later credible pieces of evidence join, these combination methods all shift their belief mainly to A. In real applications, even a slight increase in accuracy can significantly improve the performance of the system [54][55][56]. In comparison, the proposed method achieves the highest accuracy, 0.9874, for identifying A among these combination rules. The proposed method is therefore relatively effective because it takes two important factors of evidence, credibility and information volume, into consideration, so that the weight of the unreliable evidence is controlled more strictly.
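Why the classical rule ends on the wrong hypothesis can be seen in a small Python sketch. It uses hypothetical BPAs (not the values of Table 1) in which the second source gives A no mass. Under sequential DS combination, A then stays at zero forever, whereas simple averaging followed by n − 1 self-combinations (Murphy's approach [52]) lets the majority recover.

```python
from functools import reduce

def dempster(m1, m2):
    """Classical DS orthogonal sum m1 ⊕ m2."""
    joint, k = {}, 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            A = B & C
            if A:
                joint[A] = joint.get(A, 0.0) + v1 * v2
            else:
                k += v1 * v2
    if not joint:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - k) for A, v in joint.items()}

def show(m):
    """Print singleton masses by label, rounded for readability."""
    print({sorted(key)[0]: round(v, 3) for key, v in m.items()})

A, B, C = frozenset({"A"}), frozenset({"B"}), frozenset({"C"})
# Hypothetical evidence: four sources favour A, the second one ignores A.
bpas = [
    {A: 0.6, B: 0.2, C: 0.2},
    {B: 0.8, C: 0.2},
    {A: 0.6, B: 0.1, C: 0.3},
    {A: 0.6, B: 0.1, C: 0.3},
    {A: 0.6, B: 0.1, C: 0.3},
]

# Classical DS: once m_2 assigns A zero mass, A can never recover.
fused = bpas[0]
for m in bpas[1:]:
    fused = dempster(fused, m)
    show(fused)

# Simple averaging followed by n - 1 self-combinations.
n = len(bpas)
m_avg = {}
for m in bpas:
    for key, v in m.items():
        m_avg[key] = m_avg.get(key, 0.0) + v / n
murphy = reduce(dempster, [m_avg] * (n - 1), m_avg)
show(murphy)  # A now dominates
```

Weighted variants such as the proposed method replace the equal averaging weights with credibility and information volume weights, so the conflicting source is damped further.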

Problem Statement
An automobile system suffered from a shortage of power after excessive use. According to the records in the database, three possible faults could lead to this problem: low oil pressure, air leakage in the intake system and a stuck solenoid valve, denoted $F_1$, $F_2$ and $F_3$, respectively. Five sensors, denoted $S_1$, $S_2$, $S_3$, $S_4$ and $S_5$, were placed at different positions to measure parameters of the power system including the air displacement, maximum power, maximum torque, compression ratio and maximum rotation speed. After all the sensors finished measuring, the central control system modeled the parameters detected by the sensors into BPAs, which are presented in Table 3, where $\Theta = \{F_1, F_2, F_3\}$ is the frame of discernment and $m_i$ is the evidence modeled from $S_i$ (i = 1, 2, 3, 4, 5). In this application, $S_5$ was broken because the engine speed accidentally exceeded its upper limit, so it could not function normally like the four other sensors. The objective is to determine which type of fault has occurred in the automobile system from these pieces of evidence.
Table 3. BPAs after modeling from sensors.

Fuse Evidences by the Proposed Method
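Step 1-1. Calculate the arithmetic average BPA.
Step 1-2. Calculate the BJS divergence between each piece of evidence and the average BPA.
Step 1-3. Calculate the similarity between each piece of evidence and the average BPA.
Step 1-4. Calculate the weight of credibility.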
Step 2-1. Calculate the belief entropy of each piece of evidence.
Step 2-2. Adjust the information volume of each piece of evidence.
Step 2-3. Calculate the weight of information volume.
Step 3-1. Adjust the weight of each piece of evidence.
Step 3-2. Normalize the adjusted weights.
Step 3-3. Generate the modified evidence.
Step 3-4. Use the classical DS combination rule to fuse the modified evidence four times; the result is shown in Table 4. Table 4 and Figure 3 show the results of fusing the five BPAs with the proposed method and with other existing combination rules. Following [54], a threshold of 0.70 is set: after fusion, if $m(F_i) \geq 0.70$, the method is considered to recognize fault $F_i$ successfully; otherwise, the combination rule cannot identify the fault, and "unknown" is marked in Table 4.
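The threshold rule can be sketched as a small decision function, assuming the fused result is a dictionary keyed by frozensets; `diagnose` and the two fused results below are hypothetical. Because the masses sum to 1, a threshold above 0.5 means at most one fault can qualify.

```python
def diagnose(fused, faults, threshold=0.70):
    """Return the fault F_i with m({F_i}) >= threshold, or "unknown" if none reaches it."""
    for f in faults:
        if fused.get(frozenset({f}), 0.0) >= threshold:
            return f
    return "unknown"

faults = ["F1", "F2", "F3"]
theta = frozenset(faults)
print(diagnose({frozenset({"F1"}): 0.95, frozenset({"F3"}): 0.05}, faults))  # F1
print(diagnose({frozenset({"F1"}): 0.40, theta: 0.60}, faults))              # unknown
```

The second call mimics a rule that moves most of the conflict to Θ: no singleton reaches 0.70, so no fault is reported.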

Discussion
In this application, $S_1$, $S_2$, $S_3$ and $S_4$ all work well, and the pieces of evidence modeled from their data are highly consistent, all assigning most of their belief to $F_1$ according to Table 3. However, $S_5$ is broken, and the evidence modeled from its data strongly conflicts with the other four pieces of evidence, wrongly assigning most of its belief to $F_3$. Based on these considerations, $F_1$ is judged to be the fault in the automobile system. Ideally, a combination method should recognize fault $F_1$ accurately while ignoring the disturbing effect of $S_5$.
As shown in Table 4, the classical DS combination rule successfully recognizes fault $F_1$ after fusing the first four pieces of evidence, but when the unreliable evidence $m_5$ joins, it drastically and wrongly reassigns most of the belief to fault $F_3$. Therefore, the classical DS combination rule fails to fuse highly conflicting evidence. Yager's method reallocates the conflict to the unknown space and cannot identify the type of fault. Sun … However, in Deng et al.'s method, the distance matrix and the similarity matrix are both of size n × n, whereas the divergence vector and the information volume vector in the proposed method are both of size 1 × n. Therefore, the proposed method identifies the target fault type more accurately, and with lower computational complexity, than Deng et al.'s method. It is worth noting that, in real applications, even a slight increase in accuracy can significantly improve the performance of the system [54][55][56]. When the conflicting evidence $m_5$ joins the fusion, the belief assigned to $F_1$ by Lin et al.'s method decreases slightly from 98.99% to 98.94%, that of Jiang et al.'s method rises from 98.95% to 99.14%, and that of the proposed method increases from 98.91% to 99.34%. This suggests that the proposed method better overcomes the influence of unreliable evidence and keeps increasing its belief in the correct fault as more pieces of evidence join. Overall, the proposed method diagnoses the target fault type more accurately than the other combination rules, as it assigns the highest belief, 0.9934, to fault $F_1$. The reason is that the proposed method not only uses the BJS divergence to measure the credibility of the evidence, but also considers its information volume through belief entropy before allocating the final weight to each piece of evidence.
Since the conflicting evidence $m_5$ has both low credibility and a small information volume, its weight is reduced, so that the influence of the faulty evidence is controlled more strictly than in the other combination rules.

Conclusions
Dempster-Shafer evidence theory is widely used in information fusion owing to its efficiency in handling uncertainty and imprecision. However, counter-intuitive results occur when the evidence is highly conflicting. To address this shortcoming, a novel multi-source combination method based on the BJS divergence measure and the improved Deng entropy is proposed in this paper. Both the credibility and the information volume of the evidence are taken into consideration to allocate weights to the original evidence before the modified evidence is fused, so that the influence of unreliable evidence is controlled more strictly and more weight is given to credible evidence. Next, the proposed method is compared with other existing combination rules in a numerical example. The result shows that the proposed method achieves the highest accuracy of 99.74% among these combination rules. Furthermore, the proposed method is applied to fault diagnosis to identify the type of fault in an automobile system. Among the combination rules that successfully recognized the correct fault type, the proposed method shows the highest accuracy of 99.34%, with lower computational complexity since its two weight vectors are of size 1 × n rather than n × n. In summary, this study provides a promising way to deal with multi-source data fusion problems. In future work, how to generate BPAs more properly from different sources will be further considered to make the proposed method more applicable in real environments.
Author Contributions: Z.W. designed and performed the experiments and wrote the paper. F.X. developed the method, conceived the experiments and revised the paper.