A Decision Probability Transformation Method Based on the Neural Network

When the Dempster–Shafer evidence theory is applied to the field of information fusion, reasonably transforming the basic probability assignment (BPA) into a probability to improve decision-making efficiency has been a key challenge. To address this challenge, this paper proposes an efficient probability transformation method based on a neural network to achieve the transformation from the BPA to a probabilistic decision. First, a neural network is constructed based on the BPA of the propositions in the mass function. Next, the average information content and the interval information content are used to quantify the information contained in each proposition subset and are combined to construct a weighting function with the parameter r. Then, the BPA of the input layer and the bias units is allocated to the proposition subsets in each hidden layer according to the weight factors until the probability of each single-element proposition, containing the variable, is output. Finally, the parameter r and the optimal transformation results are obtained under the premise of maximizing the probabilistic information content. The proposed method satisfies the consistency of the upper and lower boundaries of each proposition. Extensive examples and a practical application show that, compared with other methods, the proposed method not only has higher applicability but also lower uncertainty regarding the transformation result information.

Sudano [31] mapped the BPA of multi-element propositions to the probability of each single-element proposition according to a certain ratio using the belief and plausibility functions. Smets [32] distributed the BPA of multi-element propositions equally to each single-element proposition according to cardinality; however, this method is too conservative and does not fully use the known information. Pan [33] assigned the BPA of multi-element propositions according to the ordered weighted average (OWA) operator and then determined the final probability of each single-element proposition using the minimum entropy difference as a constraint. When a single-element proposition was not contained in any multi-element proposition, the allocation could still affect that single-element proposition, leading to anti-intuitive results. Deng [34] proposed a probability transformation method based on the belief interval, which first obtains the preference degree of each single-element proposition from the possibility degree, then quantifies the data information about the belief interval of each singleton based on the continuous interval argument ordered weighted average (C-OWA) operator, and finally calculates the support degree of the singleton based on the quantified data information to reasonably allocate the BPA of multi-subset focal elements. However, this method can produce extreme cases in which the preference degree of a single-element proposition is zero, which in turn leads to a support degree of zero, meaning that the single-subset proposition receives no BPA from any multi-element proposition. Huang [35] proposed a probability transformation method based on the Shapley value, which determines the degree to which each single-element proposition contributes to the multi-element propositions and then transforms the BPA according to the contribution degree.
Since the marginal probability of a single-element proposition may be zero, which can mean that the proposition is not considered in the assignment of multi-element propositions, this may lead to inaccurate transformation results. Li [36] proposed a probability transformation method based on the ordered visibility graph (OVG). The OVG network was constructed according to the BPA order, and then the weight of each single-element proposition was obtained according to the proposition edges and cardinality. According to the weight, the BPA of multi-element propositions was transformed into the probability of each single-element proposition. This method only uses the out-degree and in-degree of the nodes to determine the weight and does not make full use of the influence of nodes with larger shadows. Chen [37] improved Li's method as follows. First, the OVG network was constructed by each proposition order of information volume, and the weighted adjacency matrix was constructed with proposition edges or belief entropy. Then, the weight of each single-element proposition was obtained from the matrix and cardinality. Finally, the probability of each single-element proposition was obtained by assigning the BPA of multi-element propositions according to weight.
The key to the above methods is obtaining the distribution weight of the multi-element proposition. Inspired by these methods, under the premise of obtaining reasonable transformation results and minimizing the uncertainty of information, an efficient probability transformation method based on a neural network is proposed. A neural network is a computational model [38,39] designed to simulate the neural network of the human brain. Similar to human brain neurons, it consists of multiple nodes (neurons), which are interconnected to model the complex relationships between data. The connections between different nodes are given different weights, representing the influence of one node on another. Each node represents a specific function, which is calculated by combining the inputs and weights of other nodes. The result of the calculation is passed into the activation function, which, in turn, provides the final output of the neuron and is passed on to the next neuron. The main question is how to construct the neural network and its changing weights, assigning the BPA of a proposition subset layer by layer according to the weights until the probability of each single-element proposition, containing the parameter, is output. Then, the probabilistic information content (PIC) [31] is determined (the PIC being the dual form of Shannon entropy [40], which was first proposed by Sudano [31] and is widely used to measure the uncertainty of a probability distribution) and, finally, the optimal probability transformation results are obtained under the premise of PIC maximization. The main contributions of this study can be summarized as follows: (1). The BPA of the multi-element proposition with the largest cardinality is used as the input layer of a neural network, and the BPAs of the remaining propositions are used as bias units of the neural network.
The hidden network layers are constructed according to decreasing order of cardinality for the proposition subset of the input layer and bias units. Finally, the probability of each single-element proposition is output by the output layer. (2). The interval information content (IIC) and average information content (AIC) are introduced to quantify the information contained in each proposition subset and combined to construct a weighting function containing the parameter. The weighting function reaches its extreme value at a certain point, in preparation for the use of the constraints below. After obtaining the PIC of the change based on the probability of each single-element proposition, the maximum PIC value is taken as a constraint to obtain the optimal transformation results.
The remainder of this paper is organized as follows. Section 2 briefly introduces the DS evidence theory and decision probability transformation methods. Section 3 describes the properties of belief entropy and the PIC. Section 4 proposes a probability transformation method based on a neural network and proves its properties. Section 5 presents numerical simulation results to demonstrate the rationality and superiority of the proposed method compared to the existing methods. A practical application of the proposed method is described in Section 6. Section 7 concludes the paper.

Preliminaries
In this section, preliminaries such as the DS evidence theory [24] and some existing probability transformation methods are briefly introduced.

DS Evidence Theory
Assume a set Θ is composed of k mutually exclusive elements, which can be expressed as Θ = {θ1, θ2, ..., θk}, where Θ is a frame of discernment; these elements are randomly combined to form the power set 2^Θ = {∅, {θ1}, ..., {θk}, {θ1, θ2}, ..., Θ}. A mass function (BPA) is a mapping m: 2^Θ → [0, 1] satisfying m(∅) = 0 and Σ_{A⊆Θ} m(A) = 1. If Bel: 2^Θ → [0, 1] holds for A ∈ 2^Θ, then Bel(A) = Σ_{B⊆A} m(B), where Bel(A) denotes the belief function, which represents the degree of trust that proposition A is true. If Pl: 2^Θ → [0, 1] holds for A ∈ 2^Θ, then Pl(A) = Σ_{B∩A≠∅} m(B), where Pl(A) denotes the plausibility function, which represents the degree of trust that proposition A is non-false. Since the belief function Bel represents the degree of agreement with a proposition, and the plausibility function Pl bounds the degree to which the proposition cannot be refuted, the interval formed by the belief and plausibility functions represents the uncertainty of the proposition. The intervals formed by Bel and Pl are shown in Figure 1.
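As a concrete illustration, Bel and Pl can be computed directly from a mass function following the definitions above; the sketch below uses a hypothetical three-element frame Θ = {a, b, c}.

```python
def bel(mass, A):
    """Belief: total mass of all focal elements contained in A."""
    A = frozenset(A)
    return sum(v for B, v in mass.items() if B <= A)

def pl(mass, A):
    """Plausibility: total mass of all focal elements intersecting A."""
    A = frozenset(A)
    return sum(v for B, v in mass.items() if B & A)

# Hypothetical mass function: m({a}) = 0.3, m({a,b}) = 0.4, m({a,b,c}) = 0.3
m = {frozenset('a'): 0.3, frozenset('ab'): 0.4, frozenset('abc'): 0.3}
```

For any proposition A, bel(m, A) ≤ pl(m, A), and the gap between the two is the uncertainty interval of Figure 1.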

Probability Transformation Methods
In the frame of discernment Θ = {θ1, θ2, ..., θk}, where θi ∈ Θ, when a multi-element proposition D in the evidence has a large BPA, the information assignment is too scattered, with large uncertainty, so it is difficult to make decisions directly based on the BPA. To facilitate the decision-making process and improve its accuracy, the following transformations have been introduced in recent studies. Sudano [31] used Bel and Pl to obtain the probability of a single-element proposition by PraPl(θi) = Bel(θi) + ε · Pl(θi), with ε = (1 − Σj Bel(θj)) / Σj Pl(θj). When only Pl is used, the probability of a single-element proposition is calculated by PrPl(θi) = Pl(θi) / Σj Pl(θj); in contrast, when only Bel is used, it is obtained by PrBel(θi) = Bel(θi) / Σj Bel(θj). Smets [32] provided an in-depth analysis of the reasonability of probability transformation in the decision-making domain and proposed the Pignistic probability transformation method, BetP(θi) = Σ_{D: θi∈D} m(D)/|D|, where |D| is the cardinality of a multi-element proposition D. Pan [33] proposed a probability transformation method based on the OWA operator and the entropy difference: the BPAs of the propositions containing each θi are aggregated with an averaging function, normalized, and converted by the OWA operator into the probability of each single-element proposition. Since the probability of a single-element proposition contains an unknown variable r, min{|Ed − H|} is used to determine the variable r, where Ed is the Deng entropy of the BPA, and H is the Shannon entropy of the probability.
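Smets' Pignistic transformation, which splits each focal element's mass equally among the singletons it contains, can be sketched as follows; the mass values are hypothetical.

```python
def betp(mass, frame):
    """Pignistic transformation (BetP): the BPA of each focal element
    is divided equally among the singletons it contains."""
    p = {x: 0.0 for x in frame}
    for B, v in mass.items():
        for x in B:
            p[x] += v / len(B)
    return p

# Hypothetical BPA: m({a}) = 0.3, m({b,c}) = 0.6, m({a,b,c}) = 0.1
m = {frozenset('a'): 0.3, frozenset('bc'): 0.6, frozenset('abc'): 0.1}
p = betp(m, 'abc')
```

Here each singleton receives 1/3 of the total mass, illustrating how BetP ignores any relationship between propositions beyond cardinality.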
Deng [34] defined the belief interval [Bel(θi), Pl(θi)] and the preference degree of each single-element proposition θi (i = 1, 2, ..., k) based on the belief and plausibility functions. The belief-interval data are quantized according to the C-OWA operator, and the preference degree is used to modify the quantized data to obtain the support degree of the single-element proposition, from which its probability is computed. Huang [35] proposed a probability transformation method based on the Shapley value, where the marginal probability of a proposition θi for a proposition D, with θi ∈ D and D ⊆ Θ, is defined via D\{θi}, the subset of D excluding θi; the average marginal probability contribution of θi in D is then computed, and the probability of each single-element proposition is obtained from these contributions. After reordering each proposition according to the BPA from the largest to the smallest, Li [36] obtained the set of edges of each proposition based on the ordered visibility graph and derived the weight of each single-element proposition from the proposition edges and the cardinality; the probability of each single-element proposition follows from these weights. Chen [37] ordered the propositions according to the magnitude of the Deng entropy, Ed(m) = −Σ_{A⊆Θ} m(A) log2(m(A) / (2^|A| − 1)). After ordering, which is denoted as (1, IVA1), (2, IVA2), ..., (s, IVAs), where A1 is the first ordered proposition, the network is constructed by the OVG. Then, the weighted adjacency matrix is obtained based on the set of edges, with internal elements b_ij: when there is a connection between Ai and Aj, b_ij = 1; otherwise, b_ij = 0.
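Deng entropy, used by Chen [37] for ordering, is conventionally written with a base-2 logarithm, each focal element's mass being discounted by its number of non-empty subsets, 2^|A| − 1; a minimal sketch:

```python
import math

def deng_entropy(mass):
    """Deng entropy: -sum m(A) * log2( m(A) / (2**|A| - 1) ).
    For mass concentrated on singletons it reduces to Shannon entropy."""
    return -sum(v * math.log2(v / (2 ** len(A) - 1))
                for A, v in mass.items() if v > 0)
```

On singleton-only BPAs it coincides with base-2 Shannon entropy, while any mass on multi-element propositions strictly increases it.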
The two edge weights are obtained using the node distance and the belief entropy, respectively. Then, the degree of a focal element is calculated based on the edge weights, the weight of each single-element proposition is derived from the degrees, and, finally, the probability of each single-element proposition is obtained by assigning the BPA of multi-element propositions according to these weights.

Shannon Entropy and Probabilistic Information Content
This section describes how to evaluate the performance of a transformation method after the probabilistic transformation is completed.

Shannon Entropy
Shannon [40] first introduced the concept of entropy into the field of information theory, defining the entropy of discrete finite sets. Assume a finite discrete set F = {f1, f2, ..., fn} with probability distribution P = {p(f1), p(f2), ..., p(fn)}. Then, the Shannon entropy of this set is H(P) = −Σ_{i=1}^{n} p(fi) log_a p(fi), where Σ_{i=1}^{n} p(fi) = 1; in this study, a = e, where e is Euler's number. In a discrete set, when the probability of each element is equally distributed, i.e., p(f1) = p(f2) = ... = p(fn) = 1/n, the Shannon entropy is maximal and equals ln n. The Shannon entropy is used to measure information uncertainty: the greater the information uncertainty, the greater the entropy, and vice versa.
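A minimal sketch of the Shannon entropy computation (with a = e by default, as in the text):

```python
import math

def shannon_entropy(p, base=math.e):
    """H = -sum p_i * log_base(p_i); zero-probability terms contribute nothing."""
    return -sum(q * math.log(q, base) for q in p if q > 0)
```

The uniform distribution over n elements attains the maximum ln n, and a degenerate distribution attains the minimum 0.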

Probabilistic Information Content
Sudano [31] developed the PIC method to evaluate transformation results. For a probability distribution P = {p(θ1), p(θ2), ..., p(θk)}, the PIC is defined as PIC(P) = 1 + (1/H_max) Σ_{i=1}^{k} p(θi) ln p(θi), where H_max = ln k. The PIC is thus the dual of the normalized Shannon entropy and varies between zero and one. The smaller the PIC value, the greater the information uncertainty, and vice versa. When PIC = 1, there is no interference of uncertain information in decision-making; when PIC = 0, it is impossible to make a decision based on the information. The PIC has often been used to evaluate the performance of probability transformation methods.
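The PIC can then be computed as one minus the normalized Shannon entropy; a sketch assuming natural logarithms, consistent with a = e above:

```python
import math

def pic(p):
    """Probabilistic information content: 1 - H(p) / ln(k)."""
    k = len(p)
    h = -sum(q * math.log(q) for q in p if q > 0)
    return 1.0 - h / math.log(k)
```

A uniform distribution gives PIC = 0 (no decision possible), while a one-hot distribution gives PIC = 1 (no residual uncertainty).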

Probability Transformation Based on the Neural Network
The existing methods assign the BPA of multi-element propositions to single-element propositions according to certain weights. However, these methods do not fully consider the relationships between propositions or fully use the existing information, which leads to inaccurate weight factors in some special cases and to counter-intuitive probability transformation results. To overcome these shortcomings, this study proposes a probability transformation method based on a neural network. In this section, the neural network construction and the weighting function are introduced, and the calculation of the optimal transformation result is described in detail.

Neural Network Construction
A neural network has one input layer and one output layer, and the number of hidden layers and bias nodes depends on the actual situation of the evidence. In this paper, the neural network uses the ReLU function [41] as the activation function.
If the frame of discernment Θ = {θ1, θ2, ..., θk} (k ≥ n) exists, the BPA m(θ1, θ2, ..., θn) of the proposition {θ1, θ2, ..., θn} with the maximum cardinality is used as the input network layer. The BPAs of the remaining propositions are denoted as bias units; the propositions in the hidden layers are subsets of the input-layer and bias-unit propositions, and their cardinality decreases with the layer number. From the combination of elements in probability statistics, the number of proposition subsets in the first hidden layer is C_n^(n−1) = n, and the BPAs of these subsets are expressed as Nm(θ1, θ2, ..., θ(n−1)), Nm(θ1, θ2, ..., θ(n−2), θn), ..., Nm(θ2, θ3, ..., θn), where each proposition subset has cardinality (n − 1). The number of proposition subsets in the second hidden layer is C_n^(n−2) = n(n − 1)/2, and the number of proposition subsets in the jth hidden layer is C_n^(n−j). There are (n − 2) hidden layers; the output layer gives the probability values of all single-element propositions in the frame of discernment: PNm(θ1), PNm(θ2), ..., PNm(θk). The neural network structure is presented in Figure 2.
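The layer structure above can be sketched by enumerating, for each layer, all subsets of the appropriate cardinality; the element names are hypothetical and the last generated layer corresponds to the singleton output layer.

```python
from itertools import combinations
from math import comb

def hidden_layers(elements):
    """Layer j holds all C(n, n-j) subsets of cardinality n - j,
    down to the singletons of the output layer."""
    n = len(elements)
    layers = []
    for j in range(1, n):  # cardinalities n-1, n-2, ..., 1
        layer = [frozenset(c) for c in combinations(elements, n - j)]
        assert len(layer) == comb(n, n - j)  # matches C_n^(n-j)
        layers.append(layer)
    return layers
```

For a four-element input proposition, the layer sizes are 4, 6, and 4, matching C(4,3), C(4,2), and C(4,1).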
Each neuron in the hidden layer is composed of multiple parts, and the initial value of a neuron is obtained based on the accumulation of weights and bias units and then activated using the activation function before obtaining the BPA of each focal proposition, as shown in Figure 3.

AIC and IIC Values
Assume the frame of discernment Θ = {θ1, θ2, ..., θk} and a proposition Ai ⊆ Θ. In order to accurately quantify the information content of the proposition Ai, the average information content (AIC) and the interval information content (IIC), based on the belief and plausibility functions, respectively, are defined in Equations (30) and (31).

Weighting Function with Variable
This paper proposes a weighting function containing the variable parameter r, which obtains the weight of each proposition by combining the AIC and IIC. The weighting function is defined in Equation (32), where 0 ≤ r ≤ 1.

Theorem 1.
The weight varies with the variable r; the function curve is neither monotonically increasing nor monotonically decreasing but has an extreme value at a certain point.
Proof. Write the weighting function as f(r) = r·X^(1−r) + (1 − r)·Y^r, where 0 < r < 1, X ≥ 1, and Y ≥ 1. When r = 0, f(0) = 1; when r = 1, f(1) = 1. The derivative of the weighting function is f′(r) = X^(1−r)(1 − r ln X) + Y^r((1 − r) ln Y − 1). Since f′(0) = X + ln Y − 1 ≥ 0 and f′(1) = 1 − ln X − Y ≤ 0, regardless of the values of X and Y, f′(0) and f′(1) always have opposite signs. Hence, there exists Z ∈ [0, 1] such that f′(Z) = 0, and, by the first sufficient condition for determining an extreme value, this point represents an extreme value of the function.
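Theorem 1 can be checked numerically. The sketch below assumes the weighting function has the form f(r) = r·X^(1−r) + (1 − r)·Y^r, which is consistent with the boundary derivative values X + ln Y − 1 and 1 − ln X − Y stated in the proof; the values X = 2, Y = 3 are arbitrary test inputs, not from the paper.

```python
import math

def f(r, X, Y):
    """Assumed weighting function f(r) = r*X**(1-r) + (1-r)*Y**r."""
    return r * X ** (1 - r) + (1 - r) * Y ** r

def df(r, X, Y):
    """Its derivative: X**(1-r)*(1 - r*ln X) + Y**r*((1-r)*ln Y - 1)."""
    return (X ** (1 - r) * (1 - r * math.log(X))
            + Y ** r * ((1 - r) * math.log(Y) - 1))

X, Y = 2.0, 3.0
# f'(0) = X + ln Y - 1 > 0 and f'(1) = 1 - ln X - Y < 0,
# so f has an interior extreme point on (0, 1).
```

Since f(0) = f(1) = 1 and the derivative changes sign, the extreme point is an interior maximum for these values.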
If Ai is a proposition of the pth hidden layer, its BPA is assigned to a subset proposition ai of the (p + 1)th layer as Nm(ai) = max(0, Nm′(ai)), the ReLU activation, where Nm(Ai) is the BPA of the proposition Ai; m(ai) is the bias unit; and Nm′(ai) is the accumulation of the weighted contributions from the propositions of the pth layer that contain ai, plus the bias unit when one exists. If Ai is a proposition subset of the last hidden layer, its BPA is assigned to the single-element propositions θi in the same manner. At this point, the output probability contains the variable parameter r.
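The layer-by-layer allocation with the ReLU activation can be sketched as follows; the proposition names and weight values are hypothetical, and the paper's weighting function (Equation (32)) is abstracted into a `weights` argument.

```python
def allocate(parent_mass, weights, bias=None):
    """Distribute a parent proposition's BPA among its child subsets in
    proportion to their weights, add any bias-unit mass, and apply the
    ReLU activation, mirroring Nm(a_i) = max(0, Nm'(a_i))."""
    total = sum(weights.values())
    out = {}
    for child, w in weights.items():
        share = parent_mass * w / total if total > 0 else 0.0
        if bias:
            share += bias.get(child, 0.0)  # bias unit, when one exists
        out[child] = max(0.0, share)       # ReLU activation
    return out
```

For example, a parent mass of 0.6 split with weights 2:1 yields child masses 0.4 and 0.2; adding a bias unit of 0.1 to the first child raises it to 0.5.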

Theorem 2.
According to the related literature [34], this transformation method can be justified by verifying the consistency of the upper and lower bounds. The probability PNm(θi) of a single-element proposition is jointly assigned by the BPA and bias nodes of the previous subsets of propositions, so Pl(θi) ≥ PNm(θi); since the mass directly committed to θi is never reduced during the assignment, Bel(θi) ≤ PNm(θi); finally, the inequality Bel(θi) ≤ PNm(θi) ≤ Pl(θi) holds.
Similarly, W(θ1), W(θ2), and W(θ3) are denoted by q1(r), q2(r), and q3(r), respectively, and the bias units are combined to obtain the probability of each single-element proposition. The changing trends of the probability and the PIC value of the single-element propositions with r are shown in Figure 4.

Optimal Probability Calculation of Single-Element Propositions
The larger the PIC value, the lower the information uncertainty in the decision-making, the better the transformation results, and the more favorable the decision-making; conversely, the higher the information uncertainty, the worse the transformation results and the less favorable the decision-making. According to [33], to obtain the optimal transformation results, it is necessary to determine the parameter r by maximizing the PIC as follows: arg max_r (PIC(PNm(Θ))), where H_max(PNm(Θ)) = −Σ_{i=1}^{k} (1/k) ln(1/k) = ln k. For Example 1, max(PIC(PNm(Θ))) = 0.2230 at r = 0.91, so PNm(θ1) = 0.2755, PNm(θ2) = 0.6380, and PNm(θ3) = 0.0865. This transformation process is illustrated in Figure 5.
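The maximization over r can be approximated by a simple grid search on [0, 1]; the single-parameter family `toy` below is purely illustrative (it is not the distribution of Example 1).

```python
import math

def pic(p):
    """PIC = 1 - H(p) / ln(k), natural logarithms."""
    k = len(p)
    h = -sum(q * math.log(q) for q in p if q > 0)
    return 1.0 - h / math.log(k)

def best_r(prob_of_r, steps=100):
    """Grid-search r in [0, 1], keeping the value that maximizes the PIC,
    mirroring arg max_r PIC(PNm(Theta))."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: pic(prob_of_r(r)))

def toy(r):
    # Hypothetical r-parametrized distribution, for illustration only.
    return [(1 + r) / 3, (1 - r) / 3, 1 / 3]
```

For this toy family the distribution grows more concentrated as r increases, so the grid search selects r = 1; a realistic PNm(Θ) would instead peak at an interior r, as in Figure 5.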
Next, a brief overview of the steps of the proposed method is given. The exact content of each step varies with the actual situation, but the main operations are as follows:
Step 1: Different propositions are combined to construct the neural network model.
Step 2: The AIC and IIC of each proposition are obtained by Equations (30) and (31), the weights combining the AIC and IIC are initialized by Equation (32), and the BPA of each proposition is assigned according to the weights until the probability of each single-element proposition, containing the variable parameter r, is output by Equations (34)–(38).
Step 3: The parameter r is determined according to the constraint, and the optimal transformation results are obtained by Equation (39).

Analysis and Numerical Examples
In this section, a few examples are given to compare the transformation results of the proposed method with those of the other methods [31-37] to verify its rationality and accuracy. The performance of each method is evaluated based on the PIC value. The probability transformation results of the different methods for Example 2 are shown in Table 1.
In the probability transformation, since there is no information on the single-element proposition A in the multi-element proposition, the assignment should be independent of the proposition A, and it is impossible to discern a difference between the propositions B and C based on the known conditions. Intuitively, the BPA of the multi-element proposition should be assigned equally to the single-element propositions B and C. However, the PrBel transformation method cannot be used to obtain the transformation results, and the OWA method is influenced by the single-element proposition A when assigning the BPA of the multi-element proposition, which leads to counter-intuitive transformation results. The other probability transformation methods distribute the BPA of multi-element propositions equally according to the elemental cardinality to obtain the single-element proposition probabilities, which is consistent with the intuitive results.
Step 1: According to the cardinality of the propositions, m(A, B) is used as the input layer, m(A) denotes the bias unit, and the probability of each single-element proposition is obtained from the output layer.
Step 2: The AIC and IIC of the single-element propositions A and B are obtained by Equations (30) and (31). The changing trends of the probability and the PIC value with r are shown in Figure 6.
Step 3: Obtain the optimal probability by Equation (39). Since the BPA of the single-subset proposition A, but not of the single-subset proposition B, exists directly in the evidence, intuitively, the proposition A should have a larger weight when assigning multi-element propositions; however, in this test, the OWA method assigned a larger weight to the proposition B. The OVG, PraPl, PrPl, and BetP methods assigned the BPA of the multi-element proposition equally to the propositions A and B. The OVGWP method and the proposed method considered the prior information of m(A) = 0.5, thus accounting for the connection between the multi-element propositions and each single-element proposition, and assigned a larger weight to the single-subset proposition A. The transformation results are reasonable, and the PIC is relatively larger, which is more favorable for decision-making. The weights of each single-element proposition are obtained by Equation (32), and the probabilities of each proposition are calculated by Equations (37) and (38). When the value of the parameter r varies in the range [0, 1], the probability and PIC curves change as shown in Figure 8.
Step 3: Determine the optimal probability by Equation (39). The transformation results of the different methods are shown in Table 3 and Figure 9. The PrBel method is limited and cannot obtain correct transformation results. The PraPl, ITP, and OWA methods assign a larger weight to proposition A than to proposition B due to the direct presence of the BPA of the single-element proposition A in the evidence; however, they do not consider that the multi-element propositions contain the single-element proposition B and that Pl(B) > Pl(A). Intuitively, the single-element proposition B should have a larger weight. The transformation results of the PrPl, ITP, MPSV, OVGP, and OVGPWP methods and the proposed method are reasonable. Compared with the other methods, the proposed method has the largest PIC value of 0.0596 and lower information uncertainty after the transformation, which is more beneficial to the decision-making process.

Practical Application
In this section, the newly proposed method is applied to the practical problem of target recognition to further verify its effectiveness; the evidence used is shown in Table 4. The specific steps can be described as follows:
Step 1: The degree of credibility between the bodies of evidence is measured according to the literature [42].
Step 2: The credibility degree of each body of evidence is modified.
Step 3: The weighted average evidence is obtained by taking into account the relationship between the bodies of evidence and the relative importance of the collected evidence.
Step 4: According to the Dempster combination rule, the combination result is obtained by combining the weighted average evidence three times.
Step 5: Due to the large information uncertainty in the combination result, the combination result is transformed into probability distribution by the proposed method.
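Step 4 relies on Dempster's combination rule, which conjunctively combines two bodies of evidence and normalizes by the non-conflicting mass; a minimal sketch (the evidence values below are hypothetical, not those of Table 4).

```python
def dempster_combine(m1, m2):
    """Dempster's rule: m12(A) = sum over B∩C=A of m1(B)m2(C) / (1 - K),
    where K is the total mass assigned to conflicting (disjoint) pairs."""
    combined = {}
    conflict = 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            inter = B & C
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2
    if conflict >= 1.0:
        raise ValueError("total conflict; Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

# Hypothetical bodies of evidence over Θ = {a, b}
m1 = {frozenset('a'): 0.6, frozenset('ab'): 0.4}
m2 = {frozenset('b'): 0.5, frozenset('ab'): 0.5}
m12 = dempster_combine(m1, m2)
```

Combining the weighted average evidence three times, as in Step 4, amounts to applying this rule repeatedly, since Dempster's rule is associative.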
For the target recognition problem in the Iris dataset, the proposed method is compared with some existing methods [35,42-44], as shown in Table 5. Table 5. Results of the target recognition problem in the Iris dataset.

Method m(Se) m(Ve) m(Vi) Target
The recognition results of the five methods are the same, and the category of the Iris is identified as Ve. It can be seen from Table 5 that the proposed method outperformed the comparison methods: its belief in Ve is as high as 92.01%, whereas the belief degrees of Xiao's method [43], Jiang's method [44], MSDF [42], and MPSV [35] for the category Ve are 73.90%, 87.98%, 91.63%, and 91.86%, respectively.

Conclusions
How to reasonably transform the BPA under the DS evidence theory into a probability before decision-making has become a major research hotspot. In this paper, a probability transformation method is proposed by combining the BPA with a neural network. The BPA of the multi-element proposition with the maximum cardinality is used as the input network layer, and the BPAs of the remaining propositions are used as bias nodes; these are assigned to the proposition subsets in each hidden layer of the neural network according to the weights, and the probability of each single-element proposition is obtained as the network output. The AIC and IIC values of each proposition subset are determined using the belief and plausibility functions, respectively, and then combined to obtain the weight factors containing the variable, so that the output probability contains the variable. Finally, the PIC is maximized as a constraint to determine the variable and obtain the optimal probability transformation result. The proposed method is verified by numerical examples and compared with the other methods. The results indicate that the proposed method is more reasonable and has better generalizability and lower information uncertainty regarding the transformed results than the other methods, which makes it more beneficial for decision-making. However, the proposed method may require a large computational effort when the cardinality of a multi-element proposition is too large. In the future, a more comprehensive evaluation index for probability transformation results could be explored to further verify the rationality and superiority of this method and to apply it in more practical scenarios.