A Novel Conflict Management Method Based on Uncertainty of Evidence and Reinforcement Learning for Multi-Sensor Information Fusion

Dempster–Shafer theory (DST), which is widely used in information fusion, can process uncertain information without prior information; however, when the evidence to combine is highly conflicting, it may lead to counter-intuitive results. Moreover, existing methods are not well suited to processing conflicting evidence online and in real time. To solve these problems, a novel information fusion method is proposed in this paper. The proposed method combines the uncertainty of evidence and reinforcement learning (RL). Specifically, we consider two uncertainty degrees: the uncertainty of the original basic probability assignment (BPA) and the uncertainty of its negation, both measured by Deng entropy. The two uncertainty degrees serve as the conditions for measuring information quality. Adaptive conflict processing is then performed by combining RL with the two uncertainty degrees. Next, Dempster's combination rule (DCR) is applied to achieve multi-sensor information fusion. Finally, a decision scheme based on the correlation coefficient is used to make the decision. The proposed method not only realizes adaptive management of conflicting evidence, but also improves the accuracy of multi-sensor information fusion and reduces information loss. Numerical examples verify the effectiveness of the proposed method.


Introduction
Multi-sensor information fusion (MSIF) is an important information processing technology that combines and optimizes multi-level, multi-source information [1,2]. A single sensor provides limited information and is easily affected by environmental interference and measurement error. As a result, the obtained information may contain mistakes, which makes it difficult to make accurate decisions [3]. In contrast, fusing multi-sensor information can improve the performance of a system and make the results more reliable [4,5]. Owing to these advantages, multi-sensor fusion has in recent years been widely used in fault diagnosis, target positioning, and UAV system control [6–10]. Practical experience shows that, compared with a single-sensor system, multi-sensor systems can significantly enhance performance in detection, identification, and fault diagnosis [11,12]; however, due to various uncertainties in the real world, the information obtained by multiple sensors is affected. In addition, due to the influence of the sensors themselves, the information obtained by multi-sensor systems may be inaccurate, uncertain, or even faulty [13–15]. How to correctly process multi-sensor information and establish a fusion model is a problem that has attracted widespread attention. To address this issue, many theories and methods have been proposed, for example Z-numbers [16,17], D-numbers [18,19], fuzzy sets [20–22], rough sets [23,24], R-numbers [25], entropy-based methods [26,27], and Dempster-Shafer theory (DST) [28,29].
DST is an uncertainty reasoning theory and an extension of probability theory that can process uncertain information without prior probabilities [29]. Owing to these characteristics, DST has been widely used in military and civil fields. In addition, DST provides a classic combination rule for fusing multi-source information, namely Dempster's combination rule (DCR); however, DCR has some problems in application. When the evidence to combine is highly conflicting, it may produce counter-intuitive results, for example, the Zadeh paradox [30]. Faced with these challenges, many methods have been proposed in past years. Yager [31] considered that the conflict cannot provide useful information and proposed a combination rule that redistributes the conflict to the frame of discernment (FOD). Dubois and Prade [32] proposed that the conflict should be assigned to the intersection or union of the associated focal elements. Later, Murphy [33] proposed that the original evidence should be modified by weighting to obtain new evidence, which was then fused based on the DCR. Lefevre et al. [34] proposed a general framework that unifies several classical combination rules. Smets [35] argued that the conflict should be allocated to the empty set. Dezert and Smarandache [36] proposed a new framework, Dezert-Smarandache Theory (DSmT), which is an extension of DST. Further, in [36], a series of combination rules is provided, namely PCR1-PCR6, which can handle conflicting evidence. Based on interval-valued belief structures, Song et al. [37] presented an uncertainty measurement method and applied it to MSIF. Aiming at fusion decision making without prior knowledge, Wang et al. [38] designed a method based on interval-valued belief structures and DCR. Yuan et al. [39] proposed a fusion method based on Deng entropy [40] and evidence distance [41]. Jiang et al. [42] proposed a weighted average method based on the credibility of evidence to deal with highly conflicting evidence. Ni et al. [43] presented an improved conflict evidence fusion method, in which the degree of uncertainty of the evidence was used to design the weight coefficient of each piece of evidence.
The above methods mainly focus on the original basic probability assignment (BPA); however, the negation of evidence is also a feasible way to express information. Through the negation, further aspects of the information can be viewed. Smets proposed a calculation method for determining the negation of probabilistic events [44]. Based on that, many scholars have carried out research on the negation of BPA and proposed a series of approaches [45–48]. In addition, researchers have adopted different methods to measure the uncertainty of a BPA and have modified the original BPA based on this uncertainty before combining the evidence.
So far, the above-mentioned methods cannot process conflict in real time, and their calculation is complicated when the amount of data is large. This paper proposes a new information fusion method that combines the uncertainty of evidence and RL. In the proposed method, the negation of evidence is calculated. Then, Deng entropy is used to measure the uncertainty of evidence. Moreover, in order to avoid the irrationality caused by conflicting information, RL is used to realize adaptive conflict resolution of evidence. Finally, DCR and the correlation coefficient are used for multi-sensor information fusion and decision making. The proposed method considers both the original BPA and its negation for the following reason. The positive information of the evidence can be obtained from the original BPA, and the negative information can be obtained from the negative BPA. Using both makes the information obtained more comprehensive.
The main contributions are summarized as follows:
• The negation of evidence is introduced into RL to achieve information quality assessment. The uncertainty of the original evidence and of its negation is obtained by using Deng entropy. Then, the obtained uncertainty degrees are used to distinguish the information quality of evidence, which helps to make use of the information.
• In order to achieve adaptive online information fusion, RL is combined with the uncertainty degrees to process the conflicting evidence. In this process, a Markov decision process (MDP) model is built and solved through the Q-learning algorithm to implement the fusion of evidence.
The rest of this paper is organized as follows. In Section 2, the preliminaries, including DST, the negation of BPA, Deng entropy, and RL, are introduced. In Section 3, the proposed information fusion decision method is presented. In Section 4, the effectiveness of the proposed method is verified by numerical examples. Finally, in Section 5, the conclusion is given.

Dempster-Shafer Theory (DST)
DST is an effective method to deal with uncertain information, which satisfies weaker conditions than Bayesian probability [29]. Some basic concepts in DST are given below.
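Assume Θ is a finite set consisting of N mutually exclusive elements, indicated by
$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\},$$
then Θ is called an FOD. The power set of Θ is indicated by
$$2^{\Theta} = \{\emptyset, \{\theta_1\}, \ldots, \{\theta_N\}, \{\theta_1, \theta_2\}, \ldots, \Theta\}.$$
If a function $m : 2^{\Theta} \to [0, 1]$ satisfies the following conditions, it is a BPA or mass function,
$$m(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1,$$
where A is called a focal element if $m(A) > 0$, and $m(A)$ represents the mass assigned to A. DST provides Dempster's combination rule (DCR) [28,29] to fuse multiple pieces of evidence, which is defined as
$$m(A) = \begin{cases} \dfrac{1}{1 - K} \sum\limits_{B \cap C = A} m_1(B)\, m_2(C), & A \neq \emptyset, \\ 0, & A = \emptyset, \end{cases} \tag{4}$$
where $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ represents the conflict among BPAs.
Yager's combination rule [31] is an alternative for the combination of evidence, which is defined as
$$m(\emptyset) = 0, \qquad m(A) = \sum_{B \cap C = A} m_1(B)\, m_2(C) \quad (A \neq \emptyset,\ A \neq \Theta), \qquad m(\Theta) = \sum_{B \cap C = \Theta} m_1(B)\, m_2(C) + k,$$
where $k = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$.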
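To make the difference between the two rules concrete, the following is a minimal Python sketch of DCR and Yager's rule for two BPAs. The representation of a BPA as a dictionary from `frozenset` to mass, the function names, and the Zadeh-style input values are illustrative assumptions and are not taken from the paper.

```python
from itertools import product


def _conjunctive(m1, m2):
    """Return (masses of non-empty intersections, total conflict mass)."""
    masses, conflict = {}, 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            masses[inter] = masses.get(inter, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    return masses, conflict


def dempster_combine(m1, m2):
    """Dempster's rule: discard the conflict and renormalize by 1 - K."""
    masses, conflict = _conjunctive(m1, m2)
    if 1.0 - conflict < 1e-12:
        raise ValueError("totally conflicting evidence cannot be normalized")
    return {a: v / (1.0 - conflict) for a, v in masses.items()}


def yager_combine(m1, m2, frame):
    """Yager's rule: move the conflict mass to the whole frame."""
    masses, conflict = _conjunctive(m1, m2)
    masses[frame] = masses.get(frame, 0.0) + conflict
    return masses


# Zadeh-style example (illustrative values): two highly conflicting sources.
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
THETA = A | B | C
m1 = {A: 0.99, B: 0.01}
m2 = {B: 0.01, C: 0.99}

dcr_result = dempster_combine(m1, m2)
yager_result = yager_combine(m1, m2, THETA)
print("DCR:  ", dcr_result)
print("Yager:", yager_result)
```

In this example DCR assigns essentially all mass to B, although both sources consider B very unlikely, whereas Yager's rule moves the conflicting mass to Θ and leaves the result almost entirely uncommitted.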

Negation of Evidence
The negation is an important way to express information. Recently, Deng and Jiang [45] proposed a BPA negation calculation method based on maximum uncertainty allocation.
Given an FOD Θ, for each focal element $A_i$, assuming $m(A_i) = \alpha_i$, the negation of m is denoted as $\bar{m}$. For a piece of evidence m, the negation assigns a mass $\bar{m}(B)$ to each $B \subseteq \Theta$, following the maximum uncertainty allocation of [45].

Deng Entropy
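Deng entropy [40] is a method to calculate the uncertainty of evidence, and it is an extension of Shannon entropy [49]. Deng entropy is defined as
$$E(m) = -\sum_{A_i \subseteq \Theta} m(A_i) \log \frac{m(A_i)}{2^{|A_i|} - 1}, \tag{7}$$
where $|A_i|$ is the cardinality of $A_i$.
When dealing with a Bayesian BPA, in which every focal element is a singleton, Deng entropy degenerates to Shannon entropy, which is
$$E(m) = -\sum_{A_i \subseteq \Theta} m(A_i) \log m(A_i).$$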
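The definition can be checked numerically with a short Python sketch. The dictionary representation of a BPA, the function name `deng_entropy`, the base-2 logarithm, and the two example BPAs are assumptions made for illustration.

```python
import math


def deng_entropy(m):
    """Deng entropy of a BPA given as {frozenset: mass}, using log base 2."""
    return -sum(
        mass * math.log2(mass / (2 ** len(focal) - 1))
        for focal, mass in m.items()
        if mass > 0  # zero-mass sets contribute nothing
    )


# Bayesian BPA: all focal elements are singletons, so the result equals
# the Shannon entropy of the distribution (0.5, 0.5), i.e. 1 bit.
bayesian = {frozenset("a"): 0.5, frozenset("b"): 0.5}

# Vacuous BPA on a three-element frame: all mass on the whole frame.
vacuous = {frozenset("abc"): 1.0}

e_bayesian = deng_entropy(bayesian)
e_vacuous = deng_entropy(vacuous)
print(e_bayesian, e_vacuous)
```

The vacuous BPA yields log2(7), since a three-element set has 2^3 − 1 = 7 non-empty subsets; this is the nonspecificity contribution that Shannon entropy does not capture.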

Correlation Coefficient
For an FOD with N elements, assuming that there are two BPAs $m_1$ and $m_2$, the correlation coefficient between $m_1$ and $m_2$ is defined as follows [50]:
$$r_{BPA}(m_1, m_2) = \frac{c(m_1, m_2)}{\sqrt{c(m_1, m_1)\, c(m_2, m_2)}},$$
where $c(m_1, m_2)$ is defined as
$$c(m_1, m_2) = \sum_{i=1}^{2^N} \sum_{j=1}^{2^N} m_1(A_i)\, m_2(A_j)\, \frac{|A_i \cap A_j|}{|A_i \cup A_j|},$$
where $|\cdot|$ is the cardinality of a set. The correlation coefficient $r_{BPA}(m_1, m_2)$ indicates the correlation between $m_1$ and $m_2$. The larger the correlation coefficient, the higher the degree of correlation between $m_1$ and $m_2$.
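As a quick illustration, the correlation coefficient can be computed as in the sketch below. It assumes BPAs stored as dictionaries from `frozenset` to mass and sums only over focal elements, since zero-mass sets do not contribute; the helper names and the example BPAs are hypothetical.

```python
import math


def c(m1, m2):
    """Weighted overlap between two BPAs (Jaccard-weighted mass products)."""
    return sum(
        v1 * v2 * len(a & b) / len(a | b)
        for a, v1 in m1.items()
        for b, v2 in m2.items()
    )


def r_bpa(m1, m2):
    """Correlation coefficient between two BPAs."""
    return c(m1, m2) / math.sqrt(c(m1, m1) * c(m2, m2))


m_a = {frozenset("A"): 1.0}
m_b = {frozenset("B"): 1.0}
m_mix = {frozenset("A"): 0.6, frozenset("AB"): 0.4}

r_same = r_bpa(m_mix, m_mix)      # identical BPAs
r_disjoint = r_bpa(m_a, m_b)      # no shared elements
r_partial = r_bpa(m_mix, m_a)     # partial agreement
print(r_same, r_disjoint, r_partial)
```

Identical BPAs give a coefficient of 1, BPAs on disjoint sets give 0, and partially agreeing BPAs fall in between.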

Reinforcement Learning (RL)
RL does not require any data to be given in advance; it obtains rewards through the continuous interaction between an agent and its environment. By employing RL, a system dynamically adjusts its parameters to maximize the accumulated reward [51,52]. In RL, the return function is usually defined as the discounted sum of all rewards observed by the agent after a certain state, i.e.,
$$G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},$$
where γ is the discount factor (γ ∈ [0, 1)), which represents the weight of future rewards relative to the immediate reward, and R is the immediate reward.
In RL, the value function is used to evaluate the expected return in a certain state. It does not consider the action taken at this time, only the current system state, and is defined as
$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right].$$
The Bellman equation of the value function is given as follows:
$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[ R_{t+1} + \gamma V^{\pi}(S_{t+1}) \mid S_t = s \right].$$
$V^{*}(s)$ is the optimal value function, i.e.,
$$V^{*}(s) = \max_{\pi} V^{\pi}(s).$$
Since $V(s)$ cannot evaluate the impact of a certain action on the system, a state-action value function (Q value function) is introduced. The Q value function is used to evaluate the expected return under a certain policy. The policy is a mapping $\pi : S \to A$, defined as $\pi(a \mid s) = P(A_t = a \mid S_t = s)$. In other words, the Q value function is the expectation of the cumulative reward obtained when the agent in state s adopts action a, which is defined as
$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s, A_t = a \right].$$
The Bellman equation of the Q value function is given as follows:
$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ R_{t+1} + \gamma Q^{\pi}(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a \right].$$
$Q^{*}(s, a)$ is the optimal Q value function, i.e.,
$$Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a).$$
We can obtain the optimal policy from $V^{*}(s)$ and $Q^{*}(s, a)$.
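The role of the discount factor in the return can be seen in a small Python sketch. The reward sequence is hypothetical, and the finite-horizon sum is an assumption made so that the return can be computed exactly.

```python
def discounted_return(rewards, gamma):
    """Discounted return of a finite reward sequence R_{t+1}, R_{t+2}, ...

    Computed backwards with the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    """
    g = 0.0
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g


# Hypothetical rewards for three consecutive steps, with gamma = 0.9.
rewards = [1.0, 0.0, -1.0]
g0 = discounted_return(rewards, gamma=0.9)
print(g0)  # 1 + 0.9 * 0 + 0.81 * (-1)
```

The backward recursion used here is the sample-based counterpart of the Bellman equation: the return from a state equals the immediate reward plus the discounted return from the next state.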

The Proposed Method
In this section, a novel evidence combination method is proposed for handling conflict adaptively and making fusion decisions based on the uncertainty of evidence and RL. The method formulates information fusion as an RL task and builds a fusion model using RL and the uncertainty of evidence. First, considering that the negation of a BPA is also an important way to express information, the uncertainty of the original BPA and that of its negation are both calculated using Deng entropy. If both the negation of the BPA and the original BPA are adopted as judgment conditions, the conditions are diversified, which helps to process the information of different sensors correctly and to realize effective conflict management. If only the original BPA is adopted, there is a single judgment condition, which may lead to inaccurate processing of the sensor information. Thus, the two uncertainty degrees are used as judgment conditions to distinguish the information quality of evidence, so that consistent evidence can be selected through RL. Next, DCR is used to implement information fusion. Finally, the decision result is obtained through a decision-making scheme based on correlation coefficients. The overall information fusion and decision process of the proposed method is shown in Figure 1.

Markov Decision Process (MDP)
In the fusion decision system, the next state is obtained by selecting an action under the current system state. An MDP is built for the multi-sensor information fusion decision system.

Action Set
Due to the impact of the actual environment, the evidence in the multi-sensor information fusion decision system may be highly conflicting; therefore, a reasonable action policy is needed to process conflicting data effectively. In our proposed method, the action set A is defined as
$$A = \{a_1, a_2, a_3\}.$$
A piece of evidence can be retained through action $a_1$, so that its information can be fused later. Highly conflicting evidence can be deleted through action $a_2$, which avoids the adverse impact of conflicting evidence on the fusion results. Evidence with a low degree of conflict or with a small amount of information can be temporarily retained through action $a_3$, i.e., "waiting to process". Evidence marked as "waiting to process" is handled in the subsequent steps. After the first round of screening of all the evidence, the "waiting to process" evidence is processed again. Specifically, all the evidence retained in the first round is fused and denoted as $F_U$. Then the "waiting to process" evidence is reconsidered until the uncertainty condition on the evidence obtained by combination is satisfied.

State Set
In RL, when an action is taken, the system moves to another state. In the fusion system, when the system action changes, the fusion result changes. Thus, we define the current fused result as the system state, which is determined by $m_t$, $D_{t+1}$, and $a_{t+1}$, where $m_t$ represents the fusion result at time t, $D_{t+1}$ is the sensor evidence at time t + 1, and $a_{t+1}$ represents the action taken at time t + 1.
Based on the above analysis, the system state set can be defined as

Reward
The reward is a feedback value given by the environment for a certain state s and a certain action a. In this paper, the environment mainly consists of the sensor information and the fusion result at each time. The system uses the reward value to determine the optimal action at each time. There are two cases. Case 1: The evidence is not in conflict, so the fusion of evidence generates consistent results. Case 2: The evidence is in conflict, so the quality of the fusion result is not guaranteed. We use Deng entropy to evaluate the quality of fusion results and to set the reward function, for the following reason.
According to Equation (7), Deng entropy uses the term $m(A) \log(2^{|A|} - 1)$ to represent nonspecificity; this term accounts not only for the focal elements themselves but also for the number of subsets of each focal element. Deng entropy is therefore sensitive to changes in the focal elements: when a focal element changes, the uncertainty of the BPA also changes strongly. In RL, we use the uncertainty of the BPA to set the policy for sensor information. The stronger the change in uncertainty, the stronger the feedback signal for RL, and the easier it is for RL to arrive at an accurate policy.
The uncertainty of the original BPA is defined as $E(m)$. At the same time, Deng entropy is also adopted to calculate the uncertainty of the negation of m, defined as $E(\bar{m})$. These two uncertainty degrees (Equation (22)) are jointly used to judge the quality of information. Specifically, the two uncertainty degrees of the new state $s_{t+1}$ are compared with those of the current state $s_t$, and three situations are distinguished.
First, if both uncertainty degrees decrease, the new state $s_{t+1}$ has less uncertainty from both the positive and the negative view of the information. It should be given a positive reward, since adding the new evidence leads to a more certain fusion result.
Second, if both uncertainty degrees increase, the new state $s_{t+1}$ has larger uncertainty from both the positive and the negative view of the information. It should be given a penalty, since adding the new evidence leads to a more uncertain fusion result.
Third, in all other situations, the effect of the new state $s_{t+1}$ cannot be determined, and the action is neither rewarded nor penalized. The evidence in this situation is waiting to be processed. By distinguishing these three situations, we can adopt different policies for the sensors (i.e., delete, retain, or waiting to process), so as to delete the highly conflicting evidence and retain the valid evidence.
Given the above analysis the reward function in this paper is defined as
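The three situations described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the reward magnitudes +1, −1, and 0, the function name `reward`, and the entropy values in the usage lines are hypothetical and are not taken from the paper's reward function.

```python
def reward(e_prev, e_new, e_neg_prev, e_neg_new):
    """Reward from the change in the two uncertainty degrees.

    e_prev, e_new:         Deng entropy of the fused BPA before/after the action.
    e_neg_prev, e_neg_new: Deng entropy of its negation before/after the action.
    The values +1.0 / -1.0 / 0.0 are assumed for illustration.
    """
    if e_new < e_prev and e_neg_new < e_neg_prev:
        return 1.0   # both views became more certain: positive reward
    if e_new > e_prev and e_neg_new > e_neg_prev:
        return -1.0  # both views became more uncertain: penalty
    return 0.0       # mixed outcome: evidence is "waiting to process"


# Hypothetical entropy values for the three situations.
r_positive = reward(0.90, 0.70, 1.50, 1.40)
r_penalty = reward(0.90, 1.00, 1.50, 1.60)
r_neutral = reward(0.90, 1.00, 1.50, 1.40)
print(r_positive, r_penalty, r_neutral)
```

The third call shows why the negation matters: judged by the original BPA alone the evidence would be penalized, but the disagreement between the two views defers the decision instead.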

Q-Learning Algorithm Solution
After modeling the MDP, we adopt the model-free Q-learning algorithm to obtain the optimal policy [53]. The main reasons are as follows.
Reason 1: The system in this paper is a discrete system, and Q-learning is suitable for a discrete system. Reason 2: The state-action space is small in this system. Hence the system does not require a neural network to store state-action.
Reason 3: The state transition probability of the system is unknown, so a model-free algorithm is needed.
Q-learning is used to find high-quality evidence by deleting conflicting BPAs, which is the main idea behind obtaining the optimal fusion result. Specifically, at time t, the system receives BPAs from different sensors and then uses the action selection policy to select an action $a_t$. Herein, an ε-greedy policy is used to select the action: it explores a new action with probability ε and selects the action currently considered optimal with probability 1 − ε. The ε-greedy policy ensures a balance between the exploration and exploitation of the algorithm. In its definition, m represents all optional actions, and Q(s, a) represents the Q value in state s under action a.
Then, the fusion system performs action $a_t$ and obtains a new fusion result (i.e., a new BPA). At time t, the uncertainty of the original BPA and of the negative BPA is measured by Deng entropy and compared with the uncertainty at time t − 1. A reward value at time t is obtained according to the reward function. Equation (25), in which γ is the discount factor, is used to calculate the current Q value, and the Q value is stored in the Q table. The fusion system selects actions according to the Q value function, and the system state then transfers to the next state $s_{t+1}$. As Q-learning continues to explore, Equation (26) is used to update the Q value function, where α ∈ (0, 1] is the learning rate. Subsequently, the optimal action can be obtained through Equation (27). The system randomly selects an action with a certain probability to ensure that the algorithm retains a degree of exploration. Finally, the optimal policy is obtained.
According to the above process, the fusion system obtains the optimal action by repeatedly calculating and updating the Q value. As a result, the BPAs in conflict are deleted, consistent BPAs are retained, which can realize the adaptive online information processing. After processing all the evidence, in this paper, the DCR is used to achieve MSIF. The proposed method is outlined in Algorithm 1.
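The action selection and Q-table update described above can be sketched in Python with the textbook tabular Q-learning rule. This is a generic sketch, not the paper's exact Equations (25)-(27): the state labels, the action names standing in for a1, a2, a3, and the helper names are assumptions, while α = 0.1 and γ = 0.9 follow the simulation parameters reported later.

```python
import random
from collections import defaultdict

# Hypothetical names for the three actions a1, a2, a3.
ACTIONS = ("retain", "delete", "wait")


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update (textbook form)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


q_table = defaultdict(float)   # unseen (state, action) pairs start at 0
rng = random.Random(0)

# One illustrative transition: retaining evidence in state "s0" earns reward 1.
q_update(q_table, "s0", "retain", 1.0, "s1")
greedy_action = epsilon_greedy(q_table, "s0", epsilon=0.0, rng=rng)
print(q_table[("s0", "retain")], greedy_action)
```

A dictionary-backed Q table is sufficient here because, as noted above, the state-action space of the fusion system is small.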

Algorithm 1
The proposed evidence combination algorithm.
4: for t = 1 to m do
5: Initialize state S;
6: Observe current state $s_t$, and choose an action $a_t$ (use ε-greedy policy);
7: Take action $a_t$, calculate the negation of BPA, calculate the uncertainty degrees of original BPA and its negation according to Equation (22), obtain the reward value $R_t$ according to Equation (23), then the system transfers to next state $s_{t+1}$;
8: Utilize Equation (26) to update Q function;
9: Calculate fusion results according to Equation (4);
10: S ← $s_{t+1}$ is the final state.
11: end for
12: end for
13: Output: Multi-sensor information fusion result.

Decision Making Based on Correlation Coefficient
In this paper, a decision-making scheme based on the correlation coefficient is proposed as follows.
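A BPA $\tilde{m}$ whose mass is fully assigned to a single element of the FOD is called a baseline BPA, i.e., $\tilde{m}(A) = 1$ for some A ∈ Θ. Then, we calculate the correlation coefficient between each baseline BPA and the BPA obtained by combination. The proposition corresponding to the maximum correlation coefficient is the decision result:
$$\hat{X} = \arg\max_{A \in \Theta} r_{BPA}(m, \tilde{m}_A),$$
where $\hat{X}$ is the final decision result, m is the BPA obtained by combination, $\tilde{m}_A$ is the baseline BPA with $\tilde{m}_A(A) = 1$, and $r_{BPA}(\cdot)$ is the correlation coefficient.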
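The decision scheme can be sketched in Python as follows. The function names and the dictionary representation of BPAs are assumptions; the input masses are the fusion result reported for the method of Yuan et al. [39] in the comparison of fusion results later in the paper, used here only as sample data.

```python
import math


def c(m1, m2):
    """Jaccard-weighted sum of mass products between two BPAs."""
    return sum(
        v1 * v2 * len(a & b) / len(a | b)
        for a, v1 in m1.items()
        for b, v2 in m2.items()
    )


def r_bpa(m1, m2):
    """Correlation coefficient between two BPAs."""
    return c(m1, m2) / math.sqrt(c(m1, m1) * c(m2, m2))


def decide(m, frame):
    """Return the element whose baseline BPA correlates best with m."""
    scores = {x: r_bpa(m, {frozenset([x]): 1.0}) for x in frame}
    return max(scores, key=scores.get), scores


# Sample fused BPA on the frame {A, B, C}.
fused = {
    frozenset("A"): 0.9886,
    frozenset("B"): 0.0002,
    frozenset("C"): 0.0072,
    frozenset("AC"): 0.0039,
}
decision, scores = decide(fused, "ABC")
print(decision, scores)
```

Because the baseline BPAs are singletons, mass on a compound set such as {A, C} contributes to the scores of both A and C in proportion to the overlap, rather than being ignored.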

Simulation Analysis and Application
To evaluate the effectiveness of the proposed multi-sensor information fusion decision-making method, numerical examples are provided. The first example is adapted from [39]. In this example, five sensors simultaneously detect a target. Assume the FOD is Θ = {A, B, C}, which indicates that the target is one of A, B, and C. The BPAs obtained from the five sensors are $m_1$, $m_2$, $m_3$, $m_4$, and $m_5$, respectively, as shown in Table 1. The proposed method is used to perform multi-sensor information fusion for the BPAs in Table 1. The detailed simulation parameters are summarized in Table 2. The evidence processing results are shown in Table 3. From the table, by using the proposed method, BPAs $m_1$, $m_3$, $m_4$, and $m_5$ are retained, while $m_2$ is deleted because it is highly conflicting with the other BPAs. During the process, we can obtain the values of the negation of each BPA, which are summarized in Table 4. From Table 4, we can obtain the uncertainty of the other side of the evidence, which effectively enhances the expression of the uncertainty of the evidence.

Table 2. Simulation parameters for the numerical example 1.

| Parameter | Value |
| --- | --- |
| Discount factor (γ) | 0.9 |
| Learning rate (α) | 0.1 |
| Episode number (M) | 100 |

Table 3. Results of online processing of BPAs for the numerical example 1.
Further, we compare the proposed method with four existing methods, namely those of Yager [31], Yuan et al. [39], Jiang et al. [42], and Ni et al. [43]. The fusion results are shown in Table 5, which are also graphically shown in Figure 2.

| Method | m(A) | m(B) | m(C) | m(A, B) | m(A, C) | m(B, C) | m(A, B, C) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Yager [31] | 0.7732 | 0.0167 | 0.0011 | 0 | 0.0938 | 0 | 0.1152 |
| Yuan et al. [39] | 0.9886 | 0.0002 | 0.0072 | 0 | 0.0039 | 0 | 0 |
| Jiang et al. [42] | 0.9867 | 0.0008 | 0 | 0 | 0.0036 | 0 | 0 |
| Ni et al. [43] | 0 … | | | | | | |

Then, we calculate the correlation coefficient between the BPA m obtained by the combination and each baseline BPA, $\tilde{m}_A(A) = 1$, $\tilde{m}_B(B) = 1$, $\tilde{m}_C(C) = 1$. The proposition with the largest correlation coefficient is A, so the final decision result is A. Similarly, the decision results from the other combination methods can be obtained, as shown in Table 6. According to Tables 5 and 6, by comparing these methods, it is found that the proposed method has the largest belief value on m(A), which is the most favorable for decision making. Moreover, in order to fully demonstrate the importance of the negative BPA in conflict management and multi-sensor information fusion, a second numerical example is used. The evidence for this example is shown in Table 7.

Table 7. Columns: BPA, m(a), m(b), m(c), m(a, b), m(b, c).
The evidence in Table 7 is used to explain in detail that the negation of BPA contributes to conflict management. Specifically, it can be divided into two cases. Case 1: only uses the uncertainty of the original BPA for conflict management. Case 2: uses the uncertainty of the original BPA and the uncertainty of the negative BPA for conflict management. By comparing the fusion result in the two cases, the importance of the negative BPA for conflict management and fusion results can be proved.
The detailed negation of each BPA is obtained by calculation and summarized in Table 8. Further, we can obtain the uncertainty degrees in the calculation process.

Table 8. Columns: the negation of BPA, m(b, c), m(a, c), m(a, b), m(a, b, c).
It can be seen that if only the uncertainty of the original BPA is considered, $m_{s2}$ is deleted, because $E(m_{s1} \otimes m_{s2}) = 1.0020 > E(m_{s1}) = 0.8831$. Since $E(m_{s1} \otimes m_{s3}) = 0.6781 < E(m_{s1}) = 0.8813$, $m_{s3}$ is retained. In this case, $m_{s1}$ and $m_{s3}$ are retained and $m_{s2}$ is deleted; therefore, the fusion result is m(a) = 0.8209, m(b) = 0.1791.
If we consider not only the uncertainty of the original BPA but also the uncertainty of the negative BPA, $m_{s2}$ is marked as waiting to process in the first round, and $m_{s3}$ is retained. When all the sensor information has been processed, $m_{s2}$ is processed in a second round, in which it is retained; therefore, the fusion result in this case is m(a) = 0.8425, m(b) = 0.1575. From the above, we can see that if only the uncertainty of the original BPA is used for conflict management, there is a single judgment condition. When one piece of evidence conflicts with the other evidence (as in Table 7), it is deleted directly, which results in the loss of part of the information. When the uncertainty of the negative BPA is also considered, the judgment conditions are richer and the loss of information is reduced. The above discussion demonstrates the effectiveness and reliability of the negative BPA for conflict management. In addition, the fusion result obtained with the negation of BPA assigns a higher belief value to m(a), i.e., it is more accurate. This also demonstrates the effectiveness of the proposed method.

Application to Fault Diagnosis
An application to fault diagnosis from [54] is examined herein. Assume that a motor rotor can have three different fault types, denoted $F_1$, $F_2$, and $F_3$. The fault information is obtained through three sensors under three different features, as shown in Table 9a-c. In Table 9, $m_{S1}$, $m_{S2}$, and $m_{S3}$ represent the evidence collected by the three sensors. In this application, the true fault type of the motor rotor is $F_2$. By using the proposed method with the parameter settings in Table 10, the evidence processing results are obtained, as shown in Table 11. During the process, we can obtain the values of the negation of BPA, which are shown in Table 12.
Table 9. BPAs for the application.
(a) BPAs for the application under feature 1.

Table 10.

| Parameter | Value |
| --- | --- |
| Discount factor (γ) | 0.9 |
| Learning rate (α) | 0.1 |
| Episode number (M) | 80 |

From Tables 11 and 12, for the BPAs under feature 1, sensor 3 is marked as waiting to process in the first round and is deleted in the final round. The simulation results show that the accuracy of the fusion result improves when the evidence of sensor 3 is deleted, which indicates that the negation of BPA can improve the accuracy of the fusion result. The BPAs under feature 2 and feature 3 provide a larger amount of information, and the conflict between them is small; hence sensor 2 and sensor 3 are retained.

(a) The negation of the BPAs for the application under feature 1.
(b) The negation of the BPAs for the application under feature 2.
(c) The negation of the BPAs for the application under feature 3.
For the sake of comparison, results obtained with other methods are also provided, as shown in Tables 13 and 14 and Figures 3-5. It can be seen from Tables 13 and 14 that the proposed method assigns the highest mass or belief to the true fault type $F_2$ under each of the three features. This is because the proposed method can delete the conflicting evidence adaptively through RL, the uncertainty degree of BPAs, and the negation of BPA, so as to avoid the impact of the conflicting evidence on the overall fusion accuracy. In addition, the proposed method can make full use of the sensor information to obtain the fusion results. By contrast, in the fusion result of Yager's method, $m(F_3)$ is the largest under feature 3, which is inconsistent with the true fault type. As for Ni et al.'s method, the decision result is $F_1$ under feature 2, which is inconsistent with the true fault type. The other methods can identify the true fault type, but the mass or belief of the result is lower than that of the proposed method.
In this paper, the uncertainty of BPA and RL are combined to achieve multi-sensor information fusion. Thus, the simulation results are further analyzed from the perspective of uncertainty. Deng entropy and the entropy of Pal et al. [55,56] are each used to measure the uncertainty of BPA, so as to judge the influence on the fusion result. The fusion results under the two entropies are consistent; however, the algorithm converges faster with Deng entropy than with the entropy of Pal et al.: it converges after 55 and 58 episodes, respectively. Because the amount of information in this paper is small, there is little difference in convergence speed; nevertheless, this result supports the use of Deng entropy to calculate BPA uncertainty.
(a) Fusion results of different methods for the application under feature 1.
(c) The correlation value under feature 3.

Robustness Analysis
Since the fusion results of the application alone cannot fully reflect the robustness of the proposed method, we analyze the robustness further in this section. Specifically, in order to reflect the robustness of the method as conflict increases, we adjust the evidence in the fault diagnosis application and calculate the fusion result of the proposed method for each adjustment. For the evidence in Table 9a, we first reassign the belief value of $m(F_2)$ in sensor 2 to $m(F_1)$ at 0.05 intervals. Then, we reassign the belief value of $m(F_1, F_2, F_3)$ in sensor 3 to $m(F_3)$ at 0.05 intervals. The evidence of sensor 1 remains unchanged. For the evidence in Table 9b, we first reassign the belief value of $m(F_2)$ in sensor 1 to $m(F_1)$ at 0.05 intervals. Then, we reassign the belief value of $m(F_2)$ in sensor 2 to $m(F_3)$ at 0.05 intervals. The evidence of sensor 3 remains unchanged. For the evidence in Table 9c, we first reassign the belief value of $m(F_1, F_2)$ in sensor 2 to $m(F_1)$ at 0.03 intervals. Then, we reassign the belief value of $m(F_1, F_2)$ in sensor 3 to $m(F_3)$ at 0.03 intervals. The evidence of sensor 1 remains unchanged. The adjusted BPAs are shown in Tables 15-17. In Tables 15-17, we adopt the conflict calculation method based on the correlation coefficient proposed by Jiang [50] to calculate the degree of conflict. The degree of conflict $C_{ij}$ between $m_i$ and $m_j$, the evidence of the i-th and j-th sensors, is computed from $c(m_i, m_j)$, as defined for the correlation coefficient. From Table 15, it can be seen that for the adjusted evidence under feature 1, the conflict between sensor 1 and sensor 2 keeps increasing. The conflict degree between sensors 1 and 3 first decreases and then increases. The conflict degree between sensors 2 and 3 also first decreases and then increases; overall, however, the degree of conflict between the adjusted evidence gradually increases. From Table 16, it can be seen that for the adjusted evidence under feature 2, the degree of conflict among sensor 1, sensor 2, and sensor 3 increases markedly. In the evidence under feature 3, the belief value of each single subset is relatively small, and the distribution of belief values is relatively uniform. For these reasons, we adjust this evidence to a relatively small extent. Nevertheless, Table 17 shows that the conflict between the evidence also changes significantly.
According to the evidence in Tables 15-17, the fusion results under different cases can be obtained by using the proposed method, as shown in Table 18. From these results, we can see that, as the conflict between evidence increases, the proposed method can still obtain accurate fusion results; however, the belief value on $m(F_2)$ decreases as the conflict increases. In the evidence under feature 1, the belief value on $m(F_2)$ is reduced from 0.9587 to 0.9129. In the evidence under feature 2, the belief value on $m(F_2)$ is reduced from 0.9708 to 0.8875. In the evidence under feature 3, the belief value on $m(F_2)$ is reduced from 0.6863 to 0.6108.
As can be seen from the simulation results, the proposed method can obtain effective fusion results; however, there are still some limitations. Specifically, the belief values are particularly concentrated, mainly on $m(F_2)$ and $m(F_1, F_2, F_3)$. In this case, if the BPAs fluctuate greatly, the conflict between evidence will increase, and the fusion results produced by the proposed method will also fluctuate greatly. Nevertheless, the simulation results show that the proposed method still obtains effective fusion results when conflict is increasing, which verifies its robustness.

Conclusions
In this paper, we have investigated the multi-sensor online fusion problem and proposed a novel method based on the uncertainty of BPA and RL. Specifically, the proposed method measures the uncertainty degrees of the original BPA and of its negation by using Deng entropy. Then, the two uncertainty degrees and RL are combined to achieve online conflict management. This process has the advantages of making full use of the information and reducing the loss of information. On the basis of the selected BPAs, DCR is used for evidence combination. Finally, a decision scheme based on the correlation coefficient is adopted to obtain the decision-making result. Simulation results of the numerical examples and the application have demonstrated the effectiveness of the proposed method. In a future study, the application of the proposed method will be further investigated.
In addition to the problems discussed above, many research issues remain for further investigation. In this paper, we focus on the multi-sensor fusion decision-making problem with a small amount of information, and do not address how to obtain the fusion result quickly and accurately when the amount of sensor information is large. Nevertheless, the proposed method provides an idea for the application of artificial intelligence in multi-sensor fusion. As future work, we plan to use neural networks and RL, and combine them with our proposed algorithm for an actual fusion decision-making system.
Author Contributions: F.H.: conceptualization, methodology, software, validation, writing-original draft preparation, writing-review and editing, and data curation. Y.Z.: software, validation, writing-review and editing, and data curation. Z.W.: validation, writing-review and editing. X.D.: conceptualization, methodology, and supervision. All authors have read and agreed to the published version of the manuscript.