Genetic Algorithm Based on a New Similarity for Probabilistic Transformation of Belief Functions

Recent studies of alternative probabilistic transformations (PTs) in Dempster–Shafer (DS) theory have mainly focused on schemes for assigning the mass of compound focal elements to each singleton in order to obtain a Bayesian belief function for decision-making problems. In such a transformation, precisely evaluating the closeness between the original basic belief assignments (BBAs) and the transformed ones is important. In this paper, a new aggregation measure is proposed that comprehensively considers both the interval distance between BBAs and the ordering of the masses inside the BBAs. Relying on this new measure, we propose a novel multi-objective evolutionary-based probabilistic transformation (MOEPT) that exploits the global optimization capabilities of a genetic algorithm (GA). From the perspective of mathematical theory, a convergence analysis of the MOEPT is given to prove the rationality of the GA used here. Finally, various scenarios in evidence reasoning are presented to evaluate the robustness of the MOEPT.


Background and Research Motivation
Since the pioneering work of Dempster and Shafer [1,2], known as Dempster-Shafer evidence theory (DST), belief functions have been widely used in information fusion for decision making [3,4]. However, the computational complexity of reasoning with DST is one of the major points of criticism this formalism has to face. To overcome this difficulty, various approximation methods have been suggested that aim at reducing the number of focal elements in the frame of discernment (FoD) in order to keep the computation tractable. One common strategy is to simplify the FoD by removing or aggregating focal elements to approximate the original belief function [5]. Among these methods, probabilistic transformations (PTs) are particularly desirable for reducing such computational complexity, since they assign the mass of non-singleton elements to some singletons of the FoD [6,7]. Research on this probabilistic measure has received a lot of attention [8], and many efficient PTs have been proposed in recent years. Among them, a classical transformation, denoted BetP [6], is usually adopted because it offers a compromise between the maximum of credibility (Bel) and the maximum of plausibility (Pl) for decision making. Unfortunately, BetP does not provide the highest probabilistic information content (PIC) [9], and Shenoy argued against BetP in his publication [10]. Sudano [11] also proposed a series of alternatives similar to BetP, together with principles for them, called PrPl, PrBel and PrHyb. CuzzP [12], proposed by Cuzzolin in the framework of DST in 2009, showed its probabilistic transformation ability. Another novel transformation, DSmP [9], was proposed by Dezert and Smarandache. The main contributions of this paper are as follows:
• The 2D criteria: the drawbacks of the PIC and Jousselme's distance have been pointed out in many references [17,24]. Thus, an efficient and different aggregation measure is proposed. Its novelty lies in addressing a drawback of past descriptions of the distance between evidence. In other words, up to now, most distances were defined according to the corresponding focal elements between two sources of evidence, while the ordering of the assignments of the focal elements themselves was not considered. This ordering might also lead to dissimilarity, which is referred to as "self-conflict or self-contradiction" [25];
• More specific steps of the evolutionary-based algorithm are given in detail. Aside from that, a convergence analysis of the MOEPT is presented to prove the rationality of using GAs. Moreover, some bugs that arise when using the MOEPT with traditional constraints are detected and fixed;
• A specific application problem, target type tracking (TTT), is efficiently solved and discussed based on the proposed method with a novel, simple constraint.
Compared with traditional PTs, a global search replaces the design of various assignment operators in classical PTs, and the evaluation criteria are embedded into the MOEPT to provide important guidance for the search procedure. Specifically, masses of singletons are randomly generated in an evolutionary-based framework and must satisfy the basic constraints on probability distributions in evidence reasoning. Additionally, an assessment factor is presented to assess the best individual in all populations by a special objective function (the desired evaluation criteria). The simulation results on 4D FoD test cases show that the proposed MOEPT outperforms other PTs from the perspective of the 2D criteria. Moreover, we propose a simple constraint-handling strategy within the MOEPT that is well suited for two-target type tracking problems, which to some extent encourages the application of MOEPTs to more complex, real-world decision-making problems.
The remainder of this paper is structured as follows. In Section 2, we briefly summarize the basis of DST. The new aggregation measure is proposed in Section 3. In Section 4, multi-objective evolutionary algorithms (EAs) based on a two-dimensional objective function are proposed. In Section 5, several examples and comprehensive comparisons are carried out; a simple pattern recognition problem and a target type tracking problem are presented and solved in detail at the end of this section. The conclusions are drawn in Section 6.

Basis of Belief Functions
In this section, we introduce the belief function terminology of DST and the notations used in the remainder of this paper.

DST Basis
In DST [2], the elements θ_i (i = 1, . . . , N) of the frame of discernment (FoD) Θ ≜ {θ_1, . . . , θ_N} must be mutually exhaustive and exclusive. The power set of the FoD is denoted 2^Θ, and a basic belief assignment (BBA), also called a mass function, is defined by the mapping m: 2^Θ → [0, 1], which satisfies m(∅) = 0 and ∑_{A⊆Θ} m(A) = 1. The belief (Bel) and plausibility (Pl) functions are defined by Bel(A) = ∑_{B⊆A} m(B) and Pl(A) = ∑_{B∩A≠∅} m(B) = 1 − Bel(Ā), where Ā ≜ Θ \ A is the complement of A in Θ. The belief interval [Bel(A), Pl(A)] represents the uncertainty committed to A, and the bounds of this interval are usually interpreted as lower and upper bounds of the unknown (possibly subjective) probability of A.
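For concreteness, Bel and Pl can be computed from a BBA represented as a mapping from subsets (frozensets) to masses; this small Python sketch uses our own encoding, not one prescribed by the paper:

```python
def bel(m, A):
    """Belief of A: total mass of focal elements contained in A."""
    return sum(mass for B, mass in m.items() if B <= A)

def pl(m, A):
    """Plausibility of A: total mass of focal elements intersecting A."""
    return sum(mass for B, mass in m.items() if B & A)

# Illustrative BBA on Theta = {t1, t2, t3} (values are ours, not from the paper)
m = {frozenset({'t1'}): 0.4,
     frozenset({'t2'}): 0.2,
     frozenset({'t1', 't2', 't3'}): 0.4}

A = frozenset({'t1'})
# Bel(A) counts only m({t1}); Pl(A) also counts the mass of the full FoD
```

The pair (bel, pl) recovers the belief interval [Bel(A), Pl(A)] used throughout the paper.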
In order to fuse n bodies of evidence (BOEs), Dempster's rule of combination is usually used in the DST framework. The combination of n distinct BOEs is achieved, for all A ≠ ∅, by
m(A) = (1/(1 − K)) · ∑_{A_1∩...∩A_n = A} ∏_{i=1}^{n} m_i(A_i),
where K = ∑_{A_1∩...∩A_n = ∅} ∏_{i=1}^{n} m_i(A_i) is the degree of conflict among the sources.
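For two sources, Dempster's rule can be sketched as follows (the dictionary encoding and example masses are illustrative, not taken from the paper):

```python
def dempster_combine(m1, m2):
    """Combine two BBAs with Dempster's rule (normalized conjunctive rule)."""
    combined = {}
    conflict = 0.0
    for A, mA in m1.items():
        for B, mB in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mA * mB
            else:
                conflict += mA * mB  # mass committed to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # normalize by 1 - K
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

# Two illustrative BBAs over {t1, t2}
m1 = {frozenset({'t1'}): 0.6, frozenset({'t1', 't2'}): 0.4}
m2 = {frozenset({'t2'}): 0.3, frozenset({'t1', 't2'}): 0.7}
m12 = dempster_combine(m1, m2)
```

Combining more than two sources reduces to repeated pairwise combination, since Dempster's rule is associative and commutative.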

Classical Probabilistic Transformations
The efficiency of a probabilistic transformation (PT) in the field of decision making was analyzed in depth by Smets [6]. Various PTs have been proposed in the open literature, such as BetP [6,26], CuzzP [12], DSmP [9], PrBP1 and PrBP2 [27], as well as Cobb and Shenoy's normalization of plausibility [10]. The simple and classical transformation BetP is briefly recalled here: for each singleton θ_i ∈ Θ, BetP(θ_i) = ∑_{A⊆Θ: θ_i∈A} m(A)/|A|.
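The pignistic transformation BetP can be sketched in a few lines (our encoding; the example BBA is illustrative):

```python
def betp(m):
    """Pignistic transformation: each focal element's mass is split equally
    among the singletons it contains (assumes m(empty set) = 0)."""
    singletons = set().union(*m.keys())
    return {t: sum(mass / len(A) for A, mass in m.items() if t in A)
            for t in singletons}

# Illustrative BBA on {t1, t2}
m = {frozenset({'t1'}): 0.5, frozenset({'t1', 't2'}): 0.5}
p = betp(m)  # the compound mass 0.5 is split evenly between t1 and t2
```

The result is a Bayesian BBA (a probability distribution over the singletons), which is the common target form of all the PTs compared in this paper.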

The Belief Interval Distance d_BI^E Proposed by Han and Dezert
Jousselme's distance, widely denoted D_J [15], has been applied in many recent works [19,28], but D_J is not a very good choice because it is known to behave badly in some cases, as clearly explained in [17,24]. Assume that two independent BBAs m_1(·) and m_2(·) are defined on Θ = {θ_1, θ_2, . . . , θ_n}. For each element A_i ∈ 2^Θ \ {∅} (i = 1, 2, . . . , 2^n − 1), the belief intervals of A_i for m_1(·) and m_2(·) can be calculated, denoted [Bel_1(A_i), Pl_1(A_i)] and [Bel_2(A_i), Pl_2(A_i)], respectively. The strict distance between the interval numbers [a, b] and [c, d] is defined by [29]
d_I([a, b], [c, d]) = sqrt( [(a + b)/2 − (c + d)/2]^2 + (1/3)·[(b − a)/2 − (d − c)/2]^2 ).
Therefore, we can calculate the distance between BI_1(A_i) ≜ [Bel_1(A_i), Pl_1(A_i)] and BI_2(A_i) ≜ [Bel_2(A_i), Pl_2(A_i)] according to Equation (7), obtaining a total of 2^n − 1 belief interval distance values for all A_i. The Euclidean-family belief interval-based distance d_BI^E can then be written as
d_BI^E(m_1, m_2) = sqrt( N_c · ∑_{i=1}^{2^n−1} [d_I(BI_1(A_i), BI_2(A_i))]^2 ),
where N_c = 1/2^{n−1} is the normalization factor. In this paper, we regard d_BI^E as one criterion for evaluating the degree of similarity between the original BBAs and the transformed ones (similarity representing the degree of difference between the original BBAs and the transformed ones in [30]).
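Assuming the belief intervals have already been computed (e.g., with the Bel/Pl definitions of Section 2), d_BI^E can be sketched as follows; the interval lists and the FoD size n are supplied by the caller, under our own encoding:

```python
import math

def interval_dist(i1, i2):
    """Strict distance between two interval numbers [a, b] and [c, d]:
    midpoint term plus (1/3)-weighted half-width term."""
    (a, b), (c, d) = i1, i2
    mid = (a + b) / 2 - (c + d) / 2
    half = (b - a) / 2 - (d - c) / 2
    return math.sqrt(mid ** 2 + half ** 2 / 3)

def d_bi(intervals1, intervals2, n):
    """Euclidean belief-interval distance over the 2^n - 1 nonempty subsets,
    normalized by N_c = 1 / 2^(n-1)."""
    nc = 1.0 / 2 ** (n - 1)
    s = sum(interval_dist(i1, i2) ** 2
            for i1, i2 in zip(intervals1, intervals2))
    return math.sqrt(nc * s)
```

Identical interval lists give a distance of 0, and the farthest-apart point intervals [0, 0] and [1, 1] are at strict distance 1, matching the intended [0, 1] range per subset.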

New Evidence for Similarity Characterization
As mentioned in the previous section, distances (e.g., Jousselme's distance [28]) and other metrics such as the PIC [31] or entropy [27] have been widely applied to measure the degree of "similarity or dissimilarity" between BBAs. However, only the corresponding focal elements (or the relevant focal element set) between two sources of evidence are described or characterized. This one-sided view does not consider the ordering by size of the assignments of the focal elements in the evidence, which might lead to "self-conflict or self-contradiction". To account for such "information" produced by the evidence itself, a new similarity measure between two evidential sources is defined here according to the ordering of the assignments by size. To this end, we first define the order correlation coefficient between two sets of data:
Definition 1 ([32]). Given two sets of data {x_1, x_2, . . . , x_n} and {y_1, y_2, . . . , y_n}, sort each set in ascending order to obtain x_{p_1}, x_{p_2}, . . . , x_{p_n} and y_{q_1}, y_{q_2}, . . . , y_{q_n} with x_{p_1} ≤ x_{p_2} ≤ . . . ≤ x_{p_n} and y_{q_1} ≤ y_{q_2} ≤ . . . ≤ y_{q_n}. For each p_i, find its position among q_1, q_2, . . . , q_n, say q_j = p_i, and write j = f(i). The order correlation coefficient µ is then defined from the index map f(i) and satisfies 0 ≤ µ ≤ 1. When µ = 0, the agreement between the orderings of the two sets of data is the largest; when µ = 1, the opposite holds.
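The precise coefficient of [32] relies on the index map f(i), whose closed form is not reproduced above; as a rough stand-in (NOT the paper's formula), a Spearman-style normalized squared-displacement statistic over the two rankings behaves similarly at the extremes:

```python
def rank_positions(values):
    """Indices of the elements when sorted ascending (an argsort)."""
    return sorted(range(len(values)), key=lambda i: values[i])

def order_disagreement(xs, ys):
    """Illustrative order statistic in [0, 1]: 0 when the two vectors are
    sorted identically, 1 when their orders are exactly reversed.
    This is a Spearman-style stand-in, NOT the coefficient of [32]."""
    px, py = rank_positions(xs), rank_positions(ys)
    n = len(xs)
    if n < 2:
        return 0.0
    disp = sum((px[k] - py[k]) ** 2 for k in range(n))
    # maximal displacement is attained by the exactly reversed ordering
    max_disp = sum((k - (n - 1 - k)) ** 2 for k in range(n))
    return disp / max_disp
```

The endpoints match Definition 1: identical orderings give 0 (largest agreement), reversed orderings give 1.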

The Consistency of Focal Elements between Two BOEs
Definition 2 ([32]). For any two sources of evidence S_1 and S_2, let m_1(·) and m_2(·) be the basic belief assignments over the discernment framework Θ of size n. The number of focal elements, and the focal elements themselves, of m_1(·) and m_2(·) can be different. We denote X_i and Y_i as the indexes of the focal elements whose masses are sorted in increasing order. The similarity function of the evidence, which characterizes the ordering by size of the assignments over the subsets, is then defined as in Equation (10). As is well known, any similarity function Sim(m_i, m_j) that characterizes the closeness between two evidence sources must satisfy four basic conditions, among them the triangle inequality: Sim(X, Y) + Sim(Y, Z) ≥ Sim(X, Z).

• When n − 1 = 2k, i.e., k = (n − 1)/2, we have 1 + (k − 1)^2 + (k − 3)^2 + . . . = (1/6)·k(k + 1)(k + 2), and thus the normalization constant follows.
Definition 3 ([32]). For any two sources of evidence S_1 and S_2, let m_1(·) and m_2(·) be the basic belief assignments over n focal elements in the discernment framework Θ (note that the BBAs of different subpropositions might be equal). Assume that s_1 subpropositions' BBAs are equal in m_1(·) and s_2 subpropositions' BBAs are equal in m_2(·). Herein, X_i and Y_i are the serial numbers according to the ordering by size of the subpropositions' BBAs, where the subscript i indicates the ith subproposition. Because the BBAs of some subpropositions are equal, there might be s_1 possible sorts for S_1 and s_2 possible sorts for S_2, and therefore s_1 × s_2 possible sorts for the pair (S_1, S_2). The similarity measure function is redefined accordingly for this case. Similarly, it is easy to prove that Sim_seq(m_X, m_Y) is still a similarity measure function.
Thus, we can calculate the similarity measure based on Equation (10). According to Sim_seq(m_1, m_2), we find that m_1 and m_2 are completely different and lack similarity.

The Inconsistency of the Focal Elements between Two BOEs
How do we calculate Sim_seq when the focal elements in the BBAs are different? Here we put forward a different way compared with that in [32]. Borrowing ideas from Dezert's d_BI^E [24], for each element A_i ∈ 2^Θ \ {∅} (i = 1, 2, . . . , 2^n − 1), the belief intervals of A_i for m_1(·) and m_2(·) can be calculated, denoted [Bel_1(A_i), Pl_1(A_i)] and [Bel_2(A_i), Pl_2(A_i)], respectively. According to the theory of evidence, the width of an interval such as [Bel_1(A_i), Pl_1(A_i)] represents the degree of uncertainty of the corresponding element A_i. Therefore, the X_i and Y_i in Equation (10) are taken as the indexes of the interval widths of the elements, sorted in increasing order. The steps of this mechanism are as follows:
• Step 1: Calculate the belief intervals [Bel_1(A_i), Pl_1(A_i)] and [Bel_2(A_i), Pl_2(A_i)] for each element A_i;
• Step 2: Compute the parameter ς, the width of the belief interval, ς(A_i) = Pl(A_i) − Bel(A_i);
• Step 3: Obtain X_1 and Y_1, the indexes of the elements whose ς values are sorted in increasing order; in the example considered, X_1 = {1, 2, 3, 3} and Y_1 = {1, 2, 3, 2};
• Step 4: Calculate Sim_seq based on Equation (10).
To take into account the influence of the distance between the evidence, based on d_BI^E in Equation (8), we propose a new similarity measure Cim(m_i, m_j) that combines d_BI^E and Sim_seq with the weights w_1 = w_2 = 0.5, where φ(·) is a decreasing function within the interval [0, 1]; in this paper, φ(x) = 1 − x^2. That aside, it is easy to prove that the improved measure Cim(m_i, m_j) is still a similarity measure function, because the combination of two similarity measure functions still meets the definition of a similarity measure function. Additionally, after normalization, Equation (12) can be rewritten as the normalized criterion C_norm in Equation (13).
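A hedged sketch of a weighted aggregation in the spirit of Equations (12) and (13) follows; the exact combination in the paper may differ, so only the weights and the decreasing map φ(x) = 1 − x^2 are taken from the text:

```python
def phi(x):
    """Decreasing function on [0, 1] named in the text: phi(x) = 1 - x^2."""
    return 1.0 - x ** 2

def combined_criterion(d_bi_value, sim_seq_value, w1=0.5, w2=0.5):
    """Illustrative weighted aggregation of the interval distance d_BI^E
    (mapped through phi so that a small distance scores high) and the
    sequence similarity Sim_seq. NOT the verbatim Equation (12)."""
    return w1 * phi(d_bi_value) + w2 * sim_seq_value
```

With w1 = w2 = 0.5 both aspects contribute equally; setting w1 = 0 or w2 = 0 reduces the criterion to a single aspect, mirroring the weight discussion in the simulation section.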

Multi-Objective Evolutionary Algorithm Based on Two-Dimensional Criteria
In this section, we regard a PT as a general multi-objective problem consisting of two objectives subject to a number of inequality and equality constraints. A corresponding optimization model is then proposed for selecting the best Bayesian BBA from the set of candidates.

Multiple-Objective Evolutionary-Based Probabilistic Transformation
The idea of approximating any BBA by a Bayesian BBA (i.e., a subjective probability measure) through minimization of the Shannon entropy under compatibility constraints was proposed recently by Han et al. [13,19] using off-the-shelf optimization techniques. In this paper, we present in detail a new optimization method that achieves this PT with a random evolutionary algorithm minimizing the new aggregation criterion, and this comprehensive criterion represents different aspects of the information in BBAs. For example, the conflict coefficient represents the degree of conflict between the transformed BBAs and the original BBAs (in other words, the more conflict exists between two BBAs, the less similar they are). In addition, d_BI^E represents the interval distance between the original BBAs and the transformed ones.
Let us assume that the FoD of the original BBA m(·) to be approximated by a Bayesian BBA is Θ ≜ {θ_1, θ_2, . . . , θ_N}. The MOEPT method consists of the following steps, which are derived from GAs:
• Step 0 (setting parameters): Assume t_max is the maximum number of iterations, n_max is the population size in each iteration, P_s is the selection probability, P_c is the crossover probability, and P_m is the mutation probability.
• Step 1 (population generation and encoding mechanism): A set P_t of random probability vectors P_t^j = {P^j(θ_1), . . . , P^j(θ_N)}, j = 1, 2, . . . , n_max, is generated such that the constraints in Equations (14)-(16) are satisfied, in order to make each random probability vector P_t^j compatible with the original (target) BBA m(·) to approximate. (The lower (Bel) and upper (Pl) limits of each focal element are calculated from m(·) using Equations (2) and (3).) In other words, we have Bel(θ_i) ≤ P^j(θ_i) ≤ Pl(θ_i) for i = 1, . . . , N and ∑_{i=1}^{N} P^j(θ_i) = 1.
• Step 2 (fitness assignment): For each probability vector P_t^j (j = 1, 2, . . . , n_max), we compute its fitness value F(P_t^j) based on Equation (13).
• Step 3 (best approximation of m(·)): The probability vector P_t^{j_best} with the minimum fitness value is sought, and it and its index j_best are stored in Best Individual and Index of Best Individual.

• Step 4 (selection, crossover and mutation): The tournament selection, crossover and mutation operators drawn from the evolutionary theory framework [33] are implemented to create the offspring population from the parent population P_t. If the new best individual is not better than the stored one, then Best Individual remains unchanged; otherwise, Best Individual = P_t^{j_best}.

- Crossover operator: The crossover operator is one of the most important operators in the genetic algorithm. The crossover operation is conducted on the selected pairs of individuals. The feasibility condition of each individual is as follows: the value of each subsegment must be between 0 and 1, and the subsegments must sum to 1. Although the initial population is formed so that all individuals are feasible, using the standard crossover operators leads to defective subsegments, and a normalization procedure is then needed. Consider the following two individuals as parents: X = (0.1, 0.2, 0.3, |0.4) and Y = (0.2, 0.2, 0.1, |0.5), where the vertical bar marks the crossover point. With the single-point classic crossover operator, the following offspring are produced: X′ = (0.1, 0.2, 0.3, 0.5) and Y′ = (0.2, 0.2, 0.1, 0.4), where ∑_{j=1}^{4} X′_j = 1.1 > 1 and ∑_{j=1}^{4} Y′_j = 0.9 < 1. Therefore, X′ and Y′ have defective values and a normalization factor is needed, which leads to X″ = (0.1/1.1, 0.2/1.1, 0.3/1.1, 0.5/1.1) and Y″ = (0.2/0.9, 0.2/0.9, 0.1/0.9, 0.4/0.9).
- Mutation operator: The mutation operator randomly alters the value of a subsegment. After applying the mutation operator, normalization of the changed individuals is required, performed in the same way as for the crossover operator.

• Step 5 (stopping the MOEPT): Steps 1-4 constitute the tth iteration of the MOEPT method. If t ≥ t_max, then the MOEPT method is complete; otherwise, another iteration is performed by setting t + 1 → t and going back to Step 1.
The scheme of the MOEPT method is shown in Figure 1, and its pseudo-code is given in Algorithm 1.
Algorithm 1 (excerpt):
1: Define the stopping criterion (t ≤ t_max); the population size n_max for each iteration; the crossover probability P_c, the mutation probability P_m and the selection probability P_s
2: Generate an initial random population P_t of probabilities P_t^j consistent with m(·)
…
14: Best-Individual remains unchanged
15: else
16: Best-Individual = P_t^{j_best}
17: If t ≥ t_max then stop; otherwise t + 1 → t and go back to line 7

Convergence Analysis
In order to mathematically prove the feasibility of the MOEPT, a convergence analysis of our algorithm is given. First, we give a simplified description of the algorithm and its symbolic representation:

• Encoding mechanism: The size of the population is n_max, the length of an individual (chromosome) is N, and the initial population is P_1;
• Retain the best individual directly for the next generation;
• Randomly select the other non-optimal individuals in P_t to cross over so as to form the intermediate population Y_t;
• The population Y_t is mutated to form a population V_t;
• The better individuals in the population V_t are selected as the new generation population P_{t+1}.
Specifically, the three operators (crossover, mutation and selection) can be described by their transition probabilities as follows:
• Crossover operator: For a single-point crossover, a new individual k is produced from its parents, individuals i and j, where |k| is the number of individuals k, 0 ≤ p_c ≤ 1 is the crossover probability and a is the minimum transition probability for the individuals |k|;
• Mutation operator: 0 ≤ p_m ≤ 1 is the mutation probability, d(i, j) is the Hamming distance between i and j, and b is the minimum transition probability;
• Selection operator: The MOEPT uses the strategy of retaining the elite, and the best individual is retained for the next generation without participating in the competition. Assume that m individuals are selected, with j ∈ P_t and n = 1, 2, . . ., where σ_n represents an increasing scale function. That aside, the probability of selecting the first individual in the next generation's population depends on |P_t|, the number of individuals in P_t, and on B(P_t), the cardinality of the optimal set of P_t.
In order to facilitate the convergence analysis, the evolution of the fitness value F(P_t^j) is regarded as a Markov chain. If the MOEPT obtains the best individual P_t^{j_best} in generation t, we denote this as {F(P_t)} = P_t^{j_best}. Then, the populations at generation t + 1 will also reach at least this best fitness value due to the elite strategy [34]. Therefore, the Markov chain {F(P_t)} constitutes a lower martingale. According to the properties of the lower martingale and its convergence theorem [35], the convergence analysis of the MOEPT is converted into the convergence of {F(P_t)}. The following three theorems are given: Theorem 1 proves that {F(P_t)} satisfies the conditions of the martingale theorem, Theorem 2 proves the global convergence of the MOEPT, and Theorem 3 constructs three conditions for the convergence of the lower martingale so that the optimal solution is obtained almost everywhere.
Theorem 1. The process describing the values of the fitness functions in the MOEPT is a lower martingale.
Proof. Because the algorithm retains the best fitness value of the previous generation for the next generation, and this individual does not participate in the genetic operations, the best-individual mode is not destroyed; hence the best fitness value of the next generation's population will not be worse than that of the previous generation.
Theorem 2. The MOEPT converges in probability to the global optimal solution.
Proof. When the population is updated to generation t, the minimum (best) fitness is recorded as F(P_t^{j_best}), and the global optimal solution is denoted F*. Assume that the MOEPT converges to a global optimal solution at generation t. Based on Theorem 1, the conditional expectation E{F(P_{t+1}) | P_t} can be evaluated: when k ∉ B(P_t), P_S^t(v, k) = 0, and when k ∈ B(P_t), F(k) = F*, so E{F(P_{t+1}) | P_t} can be rewritten accordingly. Therefore, we obtain a·b^m·F* ≤ F*.
Based on the above derivation, the MOEPT converges to the global optimal solution.
Theorem 3. When, for all n ≥ 1, three further conditions are satisfied, the random sequence F(P_t) converges to F* almost surely (F(P_t) → F* a.s.).
Proof. By taking the mathematical expectation on both sides of condition (2), and then applying conditions (1) and (3), the claim follows.

Simulation Results
According to the first step of the MOEPT, we initially set the related parameters as follows: t max = 50, n max = 1000, P s = 0.3, P c = 0.5 and P m = 0.1.

Simple Examples
Based on the respective classical PTs, the original BBAs are transformed into their corresponding probabilities as illustrated in Table 1. Their corresponding C_norm values, calculated using Equation (13), are also listed in Table 1. Several interesting characteristics in Table 1 are worth mentioning: (1) MOEPT_{d_BI^E + Sim_seq} had the minimum value from the perspective of the C_norm criterion, which considers both d_BI^E and Sim_seq rather than concentrating on a single aspect, and (2) compared with the other PTs, especially MOEPT_{D_J + Sim_seq} (here, to show the property of d_BI^E, we replaced d_BI^E with D_J in the MOEPT for comparison), our method performed better. However, in practice, the suitability of various PTs depends on a number of factors, including the designer's choices; that is, from the perspective of sequence similarity, Sim_seq plays the important role in C_norm, but from the view of the whole distance, the principal role shifts to d_BI^E or D_J. How does one quantify this role? Here, we rely on the parameters w_1 and w_2 in C_norm to distinguish our ideas from Han's in [19], which initially set w_1 and w_2 to 0.5. We discuss three different situations: (1) w_2 is set to 0.8 so as to pay more attention to the similarity of the sequence; (2) w_1 is set to 0.8 so as to focus more on the distance; and (3) considering both sequence similarity and distance, w_1 = w_2 = 0.5, the same value used in [19]. This phenomenon, to some degree, reminds us of the importance of a proper selection of weights in the various applications of the MOEPT. That aside, it is worth noting that C_norm reduces to Sim_seq when w_1 = 0 and to d_BI^E when w_2 = 0.
In actuality, Example 4 is an extension of the case studied by Han in [13], which assumes a special scenario in which no difference exists between m(θ_1), m(θ_2), m(θ_3) and m(θ_4); there, the traditional PTs become invalid and give unreasonable results, as can be seen in Table 2. The property of the original BBA that no difference exists between m(θ_1), m(θ_2), m(θ_3) and m(θ_4) was almost lost when classical PTs were applied. When the "sequence" is not considered in an MOEPT, denoted MOEPT_Distance, the feature of equal masses in the original BBAs was also lost, as with the other classical PTs. Fortunately, when the "sequence" information was added into the objective function, the MOEPT performed better in keeping the original information, as expected.
To investigate the robustness of the MOEPT from a statistical point of view, in this example, we randomly generated BBAs and compared the MOEPT with classical PTs (BetP [6,26], CuzzP [12], DSmP [9], PrBP1 and PrBP2 [27]). The original BBAs for approximation were generated according to Algorithm 2 of [36]. In our test, we set the cardinality of the FoD to 4 and fixed the number of focal elements to l = N_max = 15. We randomly generated L = 100 BBAs. Six PT methods were tested, and C_norm was used to evaluate the quality of their results, shown in Figure 2. As naturally expected, the MOEPT significantly outperformed the other methods in terms of the minimum C_norm criterion, which is unsurprising because the method was developed for this aim.

Example of Pattern Classification Using the MOEPT
In this example, we used the evaluation of decision making under the evidence theory framework to indirectly evaluate the MOEPT. We considered seven classes of aircraft, illustrated in Figure 3, and the classifier used in this example was the probabilistic neural network (PNN). For each test sample, the output of the classifier was represented by a BBA, generated according to Li's previous work [37]. First, the image was preprocessed with binarization, and multiple features were extracted, such as Hu moments, the normalized moment of inertia, affine invariant moments, discrete outline parameters and singular values. Second, five BBAs were assigned to the evidence sources, one for each PNN. (Specifically, the transfer functions in the five PNNs were set to a Gaussian function, the weighting function was the Euclidean distance, the input function was netprod, and the output function was compet.) Third, all five BBAs were fused by PCR6 [7] to form a single BBA m(·).
There were 100 samples for each class, for a total of 700 samples. For each class, 50 samples were randomly selected for training the PNNs, and the remaining samples were used for testing. For the MOEPT, the decision result is the class t_final whose transformed probability is the largest. As we can see from Figure 4, the MOEPT performed well in this pattern classification task.

Example of Target Type Tracking Using the MOEPT
To further discuss the practicality of the proposed MOEPT, a target type tracking (TTT) problem in the area of decision making was used, which is briefly described below [39].

Target Type Tracking Problem (TTT)
1. Consider ζ = 1, 2, . . . , ζ_max as the time index, and let there be N possible target types Tar_ζ ∈ Θ = {θ_1, θ_2, . . . , θ_N} in the surveillance area. For instance, in normal air target surveillance systems, the FoD could be Θ = {Fighter, Cargo}; that is, Tar_1 = θ_1 ≜ Fighter and Tar_2 = θ_2 ≜ Cargo. Similarly, the FoD in a ground target surveillance system could be Θ_ground = {Tank, Truck, Car, Bus}. In this paper, we consider only air target surveillance systems to prove the practicability of the MOEPT.
2. At every time ζ, the true type of the target Tar(ζ) ∈ Θ was immediately observed by an attribute-sensor (here, we assumed a possible target probability).
3. A defined classifier was applied to process the attribute measurement of the sensor, providing the probability Tar_d(ζ) of the type of the observed target at each instant ζ.
4. The sensor was, in general, not totally reliable and was characterized by an N × N confusion matrix M = [M_ij], where M_ij is the probability of declaring type θ_j when the true type is θ_i, with 1 ≤ i ≤ N and 1 ≤ j ≤ N.
Here, we briefly summarize the main steps of the TTT using the MOEPT:
1. Initialization: Determine the target type frame Θ = {θ_1, θ_2, . . . , θ_N} and set the initial BBA m_initial(θ_1 ∪ θ_2 ∪ . . . ∪ θ_N) = 1, since there is no information about the first target type that will be observed;
2. Updating the BBA: An observed BBA m_obs(·) on the types of unknown observed targets is defined from the current target type declaration and the confusion matrix M;
3. Combination: We combine the current BBA m_obs(·) with the initial BBA m_initial(·) according to the PCR6 combination rule [7]: m_PCR6(·) = m_obs(·) ⊕ m_initial(·);
4. Approximation: Use the MOEPT to approximate m_PCR6(·) by a Bayesian BBA;
5. Decision making: Make a final decision about the type of the target at the current observation time based on the obtained Bayesian BBA;
6. Updating the BBA: Set m_initial(·) = m_PCR6(·) and increase the time index ζ = ζ + 1 before going back to step 2.
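The combination step can be sketched for two sources (PCR6 coincides with PCR5 when only two sources are combined); the set encoding and the numeric masses below are illustrative, not taken from the paper:

```python
def pcr5_combine(m1, m2):
    """Two-source PCR5 (= PCR6 for two sources): conjunctive combination,
    then proportional redistribution of each partial conflict back to the
    two elements that produced it."""
    keys = set(m1) | set(m2)
    out = {}
    for X in keys:
        for Y in keys:
            a, b = m1.get(X, 0.0), m2.get(Y, 0.0)
            if a == 0.0 or b == 0.0:
                continue
            inter = X & Y
            if inter:
                out[inter] = out.get(inter, 0.0) + a * b
            else:
                # split the partial conflict a*b proportionally to a and b
                out[X] = out.get(X, 0.0) + a * a * b / (a + b)
                out[Y] = out.get(Y, 0.0) + b * b * a / (a + b)
    return out

# Illustrative step of the TTT loop on the 2D FoD {Fighter, Cargo}
f, c = frozenset({'Fighter'}), frozenset({'Cargo'})
m_obs = {f: 0.6, f | c: 0.4}
m_prior = {c: 0.3, f | c: 0.7}
m_fused = pcr5_combine(m_obs, m_prior)
```

Because each partial conflict a·b is redistributed in full, the result sums to 1 without any global normalization, which is the main difference from Dempster's rule.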

Raw Dataset of TTT
We tested our MOEPT-based TTT on a very simple scenario for a 2D TTT problem, namely Θ = {Fighter, Cargo}, for two types of classifiers. The matrix M_1 corresponds to the confusion matrix of the good classifier, and M_2 corresponds to the confusion matrix of the poor classifier. In our scenario, a true target type sequence over 120 scans was generated according to Figure 5. We can observe clearly from Figure 5 that Cargo (denoted Type 2) appeared first in the sequence, and then the observed target type switched three times to the Fighter type (Type 1) for different durations (namely 20 s, 10 s and 5 s).
A pathological case for TTT: Our analysis showed that the MOEPT can nevertheless be troublesome for tracking two target types, as shown in this particularly simple example (when 0 ≤ m(θ_1 ∪ θ_2) ≤ 0.1). Let us consider the BBA m(θ_1) = 0, m(θ_2) = 1, m(θ_1 ∪ θ_2) = 0. According to the compatibility constraints in Equations (14)-(16), the population P_t is obtained through a selection procedure. From the resulting inequalities, one can see that only one probability measure, P_t^S = [m(θ_1), m(θ_2)] = [0, 1] (where the superscript S means single), satisfies the constraints m(θ_1) ∈ [Bel(θ_1), Pl(θ_1)] = [0, 0] and m(θ_2) ∈ [Bel(θ_2), Pl(θ_2)] = [1, 1]. However, because of the mechanism of the MOEPT in Equations (14)-(16), the P_t^j in population P_t, randomly generated in the intervals [Bel(θ_i), Pl(θ_i)], i = 1, 2, . . . , N, would be unable to provide enough distinct candidates for the evolutionary computation. (A sufficient number of candidates is a prerequisite for ensuring the global optimization performance of evolutionary algorithms.) That is why the MOEPT becomes inefficient in this case, which occurs with a probability of 1/n_max, where n_max denotes the size of the population P_t. (In our simulation, we had n_max = 1000.)
Unfortunately, in TTT decision-making problems, such a case cannot be avoided because it does occur in practice.
To circumvent this problem and make the MOEPT approach work in most circumstances, we needed to modify the MOEPT method slightly so that enough individuals are generated to make the selection steps efficient when the bounds of the belief interval [Bel, Pl] take their extreme values (e.g., for the BBAs [0.9, 0.05, 0.05] and [0.05, 0.9, 0.05], respectively). To achieve this, we proposed enlarging this particular interval through a parameter λ while maintaining, to some degree, the property of the original interval. More precisely, the modified belief interval, denoted as [Bel′, Pl′], was heuristically computed by a simple thresholding technique as follows.
First, we assume that the original BBA considered here for the FoD Θ = {θ1, θ2} is m(·) = [m(θ1), m(θ2), m(θ1 ∪ θ2)] = [a, b, c], with a + b + c = 1 and 0 ≤ c ≤ 0.1:

Step 1: Compute the total singleton mass a + b; if a + b ≥ 0.9, let m′(θ1 ∪ θ2) = c + λ;
Step 2: If a > b, then m′(θ1) = a − λ and m′(θ2) = b;
Step 3: If a ≤ b, then m′(θ1) = a and m′(θ2) = b − λ.

Therefore, the values of [Bel′(θ1), Pl′(θ1)] and [Bel′(θ2), Pl′(θ2)] can be calculated based on Equations (37) and (38), which are presented as follows. When a > b, we have [Bel′(θ1), Pl′(θ1)] = [a − λ, a + c] and [Bel′(θ2), Pl′(θ2)] = [b, b + c + λ]. When a ≤ b, we have [Bel′(θ1), Pl′(θ1)] = [a, a + c + λ] and [Bel′(θ2), Pl′(θ2)] = [b − λ, b + c].

Explanation: Through step 1, one computes the total singleton mass in the entire BBA, and the threshold value of 0.9 allows one to evaluate whether the percentage of singleton mass is big enough or not. Here, we consider not only the unique extreme case m_target(·) = [θ1, θ2, θ1 ∪ θ2] = [0, 1, 0] but also other possible cases, such as m_target(·) = [θ1, θ2, θ1 ∪ θ2] = [0.0001, 0.9998, 0.0001]. Why do we consider this percentage? Actually, the higher the percentage of singleton mass, the smaller the interval for P_t^j; in other words, the higher the value of m(θ1 ∪ θ2), the bigger the interval for P_t^j, which can be seen in Equation (36). Here, the parameter λ is set to 0.2. Then, any Bayesian BBA P_t^j = [m′(θ1), m′(θ2)] must be generated according to the (modified) compatibility constraints. In order to evaluate the influence of the parameter λ, we reexamined all the pathological cases. The modifications based on the parameter λ above aim at guaranteeing a sufficient number of individuals P_t^j in Pt in the implementation of MOEPT. Another point worth mentioning is that the number of P_t^j in Pt was not influenced by the weight. (Here, the weight equals w2 in Equation (13), and thus w1 = 1 − Weight, which to some degree guarantees the implementation of MOEPT.)
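Steps 1-3 can be collected into one small helper. The sketch below is our hedged reading of the thresholding rule (the function name and the a + b ≥ 0.9 trigger are inferred from the surrounding text, with λ = 0.2 as stated): λ is moved from the dominant singleton onto θ1 ∪ θ2, which widens both belief intervals and restores a usable sampling range.

```python
def enlarge_intervals(a, b, c, lam=0.2):
    """Modified belief intervals [Bel', Pl'] for m = [a, b, c] on {t1, t2},
    with a + b + c = 1. The lambda-enlargement is applied only when the
    singleton mass a + b reaches the 0.9 threshold (i.e. 0 <= c <= 0.1)."""
    if a + b < 0.9:                    # singleton mass not dominant:
        return (a, a + c), (b, b + c)  # keep the original intervals
    c2 = c + lam                       # step 1: m'(t1 U t2) = c + lambda
    if a > b:                          # step 2: take lambda from m(t1)
        a2, b2 = a - lam, b
    else:                              # step 3: take lambda from m(t2)
        a2, b2 = a, b - lam
    # [Bel'(ti), Pl'(ti)] = [m'(ti), m'(ti) + m'(t1 U t2)]
    return (a2, a2 + c2), (b2, b2 + c2)

# Pathological BBA [0, 1, 0]: the degenerate intervals [0, 0] and [1, 1]
# become [0, 0.2] and [0.8, 1.0], so distinct candidates can be sampled again.
iv1, iv2 = enlarge_intervals(0.0, 1.0, 0.0)
```

Note that when a + b ≥ 0.9 and a > b, we necessarily have a ≥ 0.45 > λ, so the subtracted mass a − λ never goes negative; the symmetric argument holds for b in step 3.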

Conclusions
A multi-objective evolutionary-based algorithm for probabilistic transformation (MOEPT) was proposed in this paper. It uses a genetic algorithm to obtain a Bayesian belief function and comprehensively accounts for the closeness between the original BBA and its Bayesian approximation. In addition, a new aggregation measure was proposed in this paper and combined into a more accurate "distance closeness" measure for MOEPT. More importantly, a convergence analysis of MOEPT was given to prove the rationality of our proposed method. The effectiveness of MOEPT was compared with several probabilistic transformations proposed in the literature. Furthermore, the shortcomings of the original MOEPT version were clearly identified in two-target-type tracking problems, and they were solved by modifying the belief interval constraints. As for future works, we would like to establish an adaptive scheme for the selection of weights in MOEPT and further compare the performance of this MOEPT approach with other recently proposed evolutionary algorithms. We would also like to conduct more investigations to extend MOEPT to DSmT using the DSm cardinality of elements. Finally, the current work mainly verifies the effectiveness of the algorithm through simulation examples from a theoretical perspective; the feasibility of the proposed evolutionary-based PT will be verified on practical, real-world problems in our future work.

Conflicts of Interest:
The authors declare no conflict of interest.