Evidential Decision Tree Based on Belief Entropy

Decision trees are widely applied in many areas, such as classification and recognition. Traditional information entropy and Pearson's correlation coefficient are often used as splitting measures to find the best splitting attribute. However, these measures cannot handle uncertainty, since they cannot capture the relation between attributes or the degree of disorder of attributes when the data are uncertain. Deng entropy, by contrast, can measure the degree of uncertainty of a Basic Belief Assignment (BBA). In this paper, Deng entropy is used as the splitting measure to construct an evidential decision tree for fuzzy dataset classification. Unlike traditional approaches that merge BBAs with combination rules, the evidential decision tree can be applied to classification directly, which efficiently reduces the complexity of the algorithm. In addition, experiments are conducted on the iris dataset to build an evidential decision tree that achieves more accurate classification.


Introduction
Decision trees are an efficient technique widely used in areas such as machine learning, image processing, and pattern recognition. They are attractive because the classification rules they extract from feature-based samples are easy to comprehend [1][2][3]. In addition, decision trees have not only proven efficient in many fields [4] but also have fewer parameters [5]. Two main rules are considered when building a decision tree [6]. One is the stopping criterion, which determines when to stop growing the tree and generate leaf nodes [7]. The other is how to assign class labels to leaf nodes [8]. The first rule means that the growth of the tree should end if all samples belong to the same class [9]. The second rule emphasizes the importance of setting a threshold [10]. Many decision tree methods exist, such as ID3 [7], C4.5 [11,12], and CART [10].
Some works combining evidence theory with decision trees have been presented [56][57][58][59][60]. Motivated by the idea of building a decision tree based on Pearson's correlation coefficient [61], and by the use of Deng entropy in place of information entropy [62][63][64][65][66], this paper proposes an evidential decision tree for classifying fuzzy data sets. The tree is constructed directly from BBAs rather than by applying combination rules, which not only reduces the complexity of the algorithm but also avoids designing combination rules, a task that is always complicated. Moreover, the analysis of experiments on the iris and wine data sets illustrates that the proposed evidential decision tree is much more efficient than traditional decision tree methods.
The rest of this paper is organized as follows. Section 2 introduces the preliminaries. Section 3 presents the construction of the evidential decision tree. Section 4 reports the experiments. Section 5 concludes the paper.

Preliminaries
In this section, D-S evidence theory [17,18], Deng entropy [55], and the decision tree based on Pearson's correlation coefficient (PCC-Tree) [61] are briefly introduced. D-S evidence theory provides the definitions needed for uncertain problems. Deng entropy is then introduced to calculate the degree of uncertainty of BBAs. Finally, the PCC-Tree is introduced because the proposed method follows it, replacing Pearson's correlation coefficient with Deng entropy to build an evidential decision tree.

D-S Evidence Theory
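Handling uncertainty is an open issue, and many methods have been developed [67][68][69]. In D-S evidence theory [17,18], the frame of discernment is a finite set of mutually exclusive and exhaustive elements:

$$\Theta = \{\theta_1, \theta_2, \cdots, \theta_N\} \tag{1}$$

where each $\theta_i$ represents the identification of one element in the framework.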
The Basic Belief Assignment (BBA), also called a mass function, is one of the most important definitions of D-S evidence theory, and many operations are defined on it, such as negation [70,71], divergence measure [72], and correlation [73]. A BBA is a mapping $m: 2^{\Theta} \rightarrow [0, 1]$ with two properties: $m(\emptyset) = 0$ and $\sum_{A \subseteq \Theta} m(A) = 1$. It should be mentioned that the BBA of the empty set in classical evidence theory is zero [74].
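The two BBA properties can be checked mechanically. The following is a minimal sketch, assuming a BBA is stored as a Python dictionary from `frozenset` focal elements to masses; the helper names `powerset` and `is_valid_bba` and the example masses are hypothetical and not part of the original work.

```python
from itertools import combinations
import math


def powerset(frame):
    """All subsets of the frame of discernment, including the empty set."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_valid_bba(m, frame, tol=1e-9):
    """Check m(empty) = 0, every focal element is a subset of the frame,
    every mass lies in [0, 1], and the masses sum to 1."""
    frame = frozenset(frame)
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    if any(not a <= frame or not 0.0 <= v <= 1.0 for a, v in m.items()):
        return False
    return math.isclose(sum(m.values()), 1.0, abs_tol=tol)


# Hypothetical example on a three-element frame.
theta = {"a", "b", "c"}
m = {frozenset({"a"}): 0.6, frozenset({"a", "b"}): 0.3, frozenset(theta): 0.1}
print(len(powerset(theta)))    # number of subsets of the frame
print(is_valid_bba(m, theta))  # whether m satisfies both properties
```

Storing only the focal elements (subsets with non-zero mass) keeps the representation small even though the power set grows as $2^{|\Theta|}$.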
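For the same evidence, different BBAs will be obtained from different independent evidence sources. Assume that the frame of discernment is $\Theta$ and that $m_1, m_2, \cdots, m_n$ are $n$ independent BBAs. According to Dempster's combination rule, the combined result for $A \neq \emptyset$ is:

$$m(A) = (m_1 \oplus m_2 \oplus \cdots \oplus m_n)(A) = \frac{1}{K} \sum_{\cap_{i=1}^{n} A_i = A} \; \prod_{i=1}^{n} m_i(A_i) \tag{2}$$

where $K$ is the normalization factor, defined as:

$$K = \sum_{\cap_{i=1}^{n} A_i \neq \emptyset} \; \prod_{i=1}^{n} m_i(A_i) = 1 - \sum_{\cap_{i=1}^{n} A_i = \emptyset} \; \prod_{i=1}^{n} m_i(A_i) \tag{3}$$

A reliability factor $\alpha$ ($\alpha \in [0, 1]$) can be given to construct the discounted mass function ${}^{\alpha}m$, where $m$ is one of the BBAs on the frame $\Theta$:

$${}^{\alpha}m(A) = \begin{cases} \alpha \, m(A), & A \subset \Theta, \; A \neq \Theta \\ 1 - \alpha + \alpha \, m(\Theta), & A = \Theta \end{cases} \tag{4}$$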
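As an illustration of the combination and discounting operations, the sketch below combines BBAs pairwise, normalizing by the total mass that falls on non-empty intersections, and applies the classical discounting with a reliability factor. It assumes BBAs are dictionaries keyed by `frozenset`; the function names and the example masses are hypothetical. Because Dempster's rule is associative, folding the pairwise rule over a list gives the n-source result.

```python
from functools import reduce


def combine_two(m1, m2):
    """Dempster's rule for two BBAs given as {frozenset: mass} dicts."""
    joint = {}
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b
            if inter:  # mass on empty intersections is conflict and is dropped
                joint[inter] = joint.get(inter, 0.0) + va * vb
    k = sum(joint.values())  # normalization factor: total non-conflicting mass
    if k == 0.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {a: v / k for a, v in joint.items()}


def combine(bbas):
    """Combine a non-empty list of independent BBAs by folding the pairwise rule."""
    return reduce(combine_two, bbas)


def discount(m, alpha, frame):
    """Discount m by reliability alpha; the removed mass is moved to the frame."""
    frame = frozenset(frame)
    out = {a: alpha * v for a, v in m.items() if a != frame}
    out[frame] = 1.0 - alpha + alpha * m.get(frame, 0.0)
    return out


# Hypothetical two-source example on the frame {a, b}.
theta = frozenset({"a", "b"})
m1 = {frozenset({"a"}): 0.6, theta: 0.4}
m2 = {frozenset({"b"}): 0.5, theta: 0.5}
print(combine([m1, m2]))
print(discount(m1, 0.9, theta))
```

The nested loop over focal elements is what makes each pairwise combination quadratic in the number of focal elements, which is the cost the evidential decision tree avoids.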

Deng Entropy
Inspired by Shannon entropy, a new uncertainty measure called Deng entropy was proposed [55]:

$$E_d(m) = -\sum_{A \subseteq \Theta} m(A) \log \frac{m(A)}{2^{|A|} - 1} \tag{5}$$

As shown in the above definition, unlike the classical Shannon entropy, the belief for each focal element $A$ is divided by $2^{|A|} - 1$, which is the potential number of states in $A$. Applying the quotient rule of logarithms shows that Deng entropy is actually a composite measure:

$$E_d(m) = \sum_{A \subseteq \Theta} m(A) \log \left(2^{|A|} - 1\right) - \sum_{A \subseteq \Theta} m(A) \log m(A) \tag{6}$$

where the first term can be interpreted as a measure of the total nonspecificity in the mass function $m$, and the second term as a measure of the discord of the mass function among distinct focal elements.
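The definition and its decomposition into nonspecificity and discord can be verified numerically. The sketch below assumes base-2 logarithms and BBAs stored as dictionaries keyed by `frozenset`; both choices, the function names, and the example BBAs are illustrative assumptions.

```python
import math


def deng_entropy(m):
    """Deng entropy of a BBA {frozenset: mass}, using base-2 logarithms (assumed)."""
    return -sum(v * math.log2(v / (2 ** len(a) - 1))
                for a, v in m.items() if v > 0)


def nonspecificity(m):
    """First term of the decomposition: sum of m(A) * log2(2^|A| - 1)."""
    return sum(v * math.log2(2 ** len(a) - 1) for a, v in m.items() if v > 0)


def discord(m):
    """Second term of the decomposition: -sum of m(A) * log2 m(A)."""
    return -sum(v * math.log2(v) for v in m.values() if v > 0)


# Hypothetical BBAs on the frame {a, b}.
certain = {frozenset({"a"}): 1.0}
bayesian = {frozenset({"a"}): 0.5, frozenset({"b"}): 0.5}
vacuous = {frozenset({"a", "b"}): 1.0}

for name, m in [("certain", certain), ("bayesian", bayesian), ("vacuous", vacuous)]:
    print(name, deng_entropy(m), nonspecificity(m) + discord(m))
```

When every focal element is a singleton, the nonspecificity term vanishes and Deng entropy reduces to Shannon entropy; mass placed on larger sets increases the value.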

PCC-Tree
When building decision trees, Pearson's correlation coefficient can be used to select the optimal splitting point; the resulting tree is called the PCC-Tree [61].
Following the idea of building the traditional decision tree, this new type of decision tree is constructed with Pearson's correlation coefficient in a top-down recursive way. The detailed construction process is given in Algorithm 1.

Algorithm 1 Constructing a PCC-Tree
Require: A training set $X = \{x_i\}$, where $x_i$ is the $i$th instance with $n$ condition attributes $\{A_k\}_{k=1}^{n}$ and one decision attribute $D$; the stopping criterion $\varepsilon$.
Ensure: A PCC-Tree.
  if the samples in X belong to the same class then
    Mark X as a leaf node and assign the class as its label.
    return
  end if
  for each attribute $A_k$, $k = 1, 2, \cdots, n$ in X do
    for each candidate splitting point $c_j$ of $A_k$ do
      Compute $P_{c_j}(A_k)$, where P denotes Pearson's correlation coefficient and V denotes one vector.
    end for
    $c^*_{jk} = \arg\max_{c_j} P_{c_j}(A_k)$.
  end for
  Get the best attribute $A_{k^*}$ and the splitting point $c_{k^*}$, where $k^* = \arg\max_{k} P_{c^*_{jk}}(A_k)$.
  Suppose p(X) is the proportion of samples covered by X.
  if p(X) < $\varepsilon$ then
    Mark X as a leaf node.
    Assign the maximum class of samples in X to this leaf node.
    return
  else
    Split X into two subsets $X_1$ and $X_2$, based on $A_{k^*}$ and $c_{k^*}$.
    if p($X_1$) == 0 or p($X_2$) == 0 then
      Mark X as a leaf node.
      Assign the maximum class of samples in X to this leaf node.
      return
    end if
    Recursively search the new tree nodes from $X_1$ and $X_2$ by Algorithm 1, respectively.
  end if
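The split search at the heart of Algorithm 1 can be sketched as follows. This is not the reference implementation of [61]: it assumes a binary decision attribute encoded as 0/1, takes midpoints between consecutive distinct values as candidate splitting points, and scores a split by the absolute Pearson correlation between the indicator vector of `x <= cut` and the label vector. All function names and the toy data are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson's correlation coefficient of two equal-length numeric lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)


def best_split(column, labels):
    """Return (cut, score) maximizing |Pearson| between 1[x <= cut] and labels."""
    values = sorted(set(column))
    cuts = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]  # midpoints
    if not cuts:  # a constant attribute offers no split
        return None, 0.0
    scored = [(c, abs(pearson([1.0 if x <= c else 0.0 for x in column], labels)))
              for c in cuts]
    return max(scored, key=lambda t: t[1])


def best_attribute(columns, labels):
    """Pick the attribute whose best cut has the highest score."""
    results = {name: best_split(col, labels) for name, col in columns.items()}
    name = max(results, key=lambda k: results[k][1])
    return name, results[name][0]


# Toy data: A1 separates the two classes perfectly at 2.5, A2 does not.
columns = {"A1": [1.0, 2.0, 3.0, 4.0], "A2": [1.0, 3.0, 2.0, 4.0]}
labels = [0, 0, 1, 1]
print(best_attribute(columns, labels))
```

A full tree would apply this search recursively to the two subsets produced by the chosen cut, stopping under the leaf conditions listed in Algorithm 1.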

Proposed Method
The evidential decision tree is introduced in this section. Motivated by the idea of building a decision tree based on Pearson's correlation coefficient, Deng entropy is used as the measure in the splitting rule when constructing the decision tree. The difference is that Pearson's correlation coefficient can measure the relation between the probability distributions of the attributes and the probability distribution of the decision attribute, but it cannot be applied to BBAs under uncertainty. Thus, Deng entropy is adopted in this paper as the splitting measure, so that the decision tree can be built under uncertainty.

BBA Determination
Determining the BBAs of attributes is an open issue. In this paper, one existing method [75] is chosen to determine the BBAs. The procedure is introduced in detail as follows.

Step 1: A normality test is carried out for each attribute column from each class of the training set. Consider a case where there are $N$ samples in each class $i$ ($i = 1, 2, \cdots, n$) in the training set. The column of attribute $j$ ($j = 1, 2, \cdots, k$), of length $N$, is tested for normality to obtain a Normality Index for attribute $j$ of class $i$, denoted $NI_{ij}$ (a binary value). If $NI_{ij} = 0$, the selected attribute obeys the experimental assumption, that is, it follows a normal distribution. Otherwise, $NI_{ij} = 1$ indicates that the attribute does not follow a normal distribution. The original data are transformed to an equivalent normal space when the condition $\sum_{i=1}^{n} NI_{ij} \geq \frac{n}{2}$ holds.

Step 2: Calculate the mean and the sample standard deviation of the samples for each class and each attribute:

$$\mu_{ij} = \frac{1}{N} \sum_{l=1}^{N} x_{ijl} \tag{7}$$

$$s_{ij} = \sqrt{\frac{1}{N - 1} \sum_{l=1}^{N} \left(x_{ijl} - \mu_{ij}\right)^2} \tag{8}$$

where $x_{ijl}$ is the value of the $j$th attribute from the $l$th sample in class $i$. The corresponding normal distribution function is then obtained:

$$f(x; \mu_{ij}, s_{ij}^2) = \frac{1}{\sqrt{2 \pi s_{ij}^2}} \exp\left(-\frac{(x - \mu_{ij})^2}{2 s_{ij}^2}\right) \tag{9}$$

For each attribute, $n$ normal distribution functions (or curves) are obtained as models of the different classes for that attribute.

Step 3: Determine the relationship between the test sample and the normal distribution models. Choose a sample from the test set. The $n$ intersections for the selected sample are obtained by evaluating each of the $n$ normal distribution functions $f(x; \mu_{ij}, s_{ij}^2)$ at the sample's attribute value $x_j$ ($j = 1, 2, \cdots, k$).
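Steps 2 and 3 can be sketched for a single attribute as follows. The normality test of Step 1 is omitted, and the function names and training values are hypothetical; the sketch only fits one normal model per class and reads off the height of each class curve at a test value.

```python
import math
import statistics


def fit_normal_models(train):
    """train: {class_label: [values of one attribute]}.
    Returns {class_label: (mean, sample standard deviation)}."""
    return {c: (statistics.mean(xs), statistics.stdev(xs))
            for c, xs in train.items()}


def normal_pdf(x, mu, s):
    """Density of the normal distribution with mean mu and standard deviation s."""
    return math.exp(-((x - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))


def intersections(x, models):
    """Height of each class curve at the test value x."""
    return {c: normal_pdf(x, mu, s) for c, (mu, s) in models.items()}


# Hypothetical training values of one attribute for two classes.
train = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}
models = fit_normal_models(train)
print(models)
print(intersections(2.0, models))
```

`statistics.stdev` divides by N - 1, which matches the sample standard deviation named in Step 2. A larger intersection value for a class indicates that the test value is more typical of that class.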

Deng Entropy Calculation
In this part, Deng Entropy is used to measure the degree of uncertainty of BBAs in each attribute. Deng Entropy will then be used as the measure of splitting rules. According to Equations (5) and (9), the Deng Entropy can be calculated as follows:

Evidential Decision Tree Construction
Based on the above equations, the decision tree based on Deng entropy can be constructed in a top-down recursive way, following the traditional process of building decision trees. Firstly, Algorithm 2 finds the best attribute for the splitting rule.

Algorithm 2 Finding the best splitting attribute
Require: A root node X.
  if the samples in X belong to the same class then
    Mark X as a leaf node and assign the class as its label.
    return
  end if
  for each attribute $A_k$, $k = 1, 2, \cdots, n$ in X do
    Calculate the Deng entropy of $A_k$. The smaller the entropy value, the better the subsequent division.
  end for
  Get the best attribute $A_{k^*}$ and the splitting point $c_{k^*}$.

Secondly, Algorithm 3 classifies samples by the maximum and minimum values of the training set and finds the child nodes of the decision tree. In this section, the implementation of the algorithm is illustrated for the case of only three classes. When the number of classes increases, only additional branches need to be added to Algorithm 3.
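The attribute selection of Algorithm 2 can be sketched as below. How the Deng entropy of a whole attribute is aggregated from the BBAs of its instances is an assumption here (a plain sum over instances), as are the base-2 logarithm, the function names, and the toy BBAs; the only property taken from the text is that the attribute with the smallest entropy is preferred.

```python
import math


def deng_entropy(m):
    """Deng entropy of one BBA {frozenset: mass}, base-2 logarithm (assumed)."""
    return -sum(v * math.log2(v / (2 ** len(a) - 1))
                for a, v in m.items() if v > 0)


def attribute_entropy(bbas):
    """Assumed aggregation: sum of the Deng entropies of all instance BBAs."""
    return sum(deng_entropy(m) for m in bbas)


def rank_attributes(bba_table):
    """bba_table: {attribute: [BBA per instance]}.
    Returns attribute names ordered from lowest to highest entropy."""
    return sorted(bba_table, key=lambda k: attribute_entropy(bba_table[k]))


# Toy BBAs for two attributes and two instances on the frame {a, b}.
A, B, AB = frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})
bba_table = {
    "A1": [{A: 1.0}, {B: 0.9, AB: 0.1}],
    "A2": [{A: 0.5, B: 0.5}, {AB: 1.0}],
}
print(rank_attributes(bba_table))  # first entry is the preferred splitting attribute
```

Under this reading, the order returned by `rank_attributes` gives the order in which attributes appear as father nodes from the root downward.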

Algorithm 3 Construct an Evidential Decision Tree
Require: Set attributes as Features. Set classes as A,B,C, etc. Ensure: An Evidential Decision Tree.
for All samples do
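The text describes Algorithm 3 as classifying samples by the maximum and minimum values of the training set. A minimal sketch of that idea is given below under explicit assumptions: each class is summarized per attribute by its training range [min, max], a sample is assigned a class at a node when its value falls inside exactly one class range, and otherwise it is passed to the next attribute in the tree. The function names, attribute names, and numbers are hypothetical and are not taken from the tables of this paper.

```python
def class_ranges(train):
    """train: {class_label: [attribute values]} -> {class_label: (min, max)}."""
    return {c: (min(xs), max(xs)) for c, xs in train.items()}


def candidate_classes(x, ranges):
    """Classes whose training range [min, max] contains x."""
    return sorted(c for c, (lo, hi) in ranges.items() if lo <= x <= hi)


def classify(sample, ordered_attributes, ranges_by_attribute):
    """Visit attributes in tree order; stop at the first unambiguous node."""
    for attr in ordered_attributes:
        cands = candidate_classes(sample[attr], ranges_by_attribute[attr])
        if len(cands) == 1:
            return cands[0]
    return None  # ambiguous or out of range at every node: left unclassified


# Hypothetical training values for two attributes and three classes A, B, C.
train = {
    "x1": {"A": [0.1, 0.6], "B": [1.0, 1.8], "C": [1.4, 2.5]},
    "x2": {"A": [1.0, 1.9], "B": [3.0, 4.9], "C": [5.0, 6.9]},
}
ranges = {attr: class_ranges(per_class) for attr, per_class in train.items()}
order = ["x1", "x2"]  # e.g. attributes sorted by increasing Deng entropy

print(classify({"x1": 0.3, "x2": 1.5}, order, ranges))
print(classify({"x1": 1.6, "x2": 5.5}, order, ranges))
print(classify({"x1": 1.6, "x2": 4.95}, order, ranges))
```

With three classes, each node has one branch per class plus branches for the overlapping ranges; adding a class adds ranges and therefore branches, without changing the procedure.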

An Illustration For Evidential Decision Tree Construction
Assume that there is a set of training instances $S = \{e_1, e_2, \cdots, e_N\}$ and that $\lambda = \{A_1, A_2, \cdots, A_n\}$ is a set of evidential test attributes, where each attribute $A_k$ is represented by a belief function on the set of possible terms. Let $D$ be the decision attribute, whose members compose the frame of discernment $\Theta$.
To better illustrate how the algorithm builds a decision tree based on Deng entropy, a numerical example is given in Table 1 to illustrate the meaning of each notation. In this example, there are two test attributes and one decision attribute. Following the steps of the proposed approach, the Deng entropy is calculated under these circumstances.
In the implementation of Algorithm 2, $A_k$ denotes each attribute, and $c^k_i$ represents the value of each focal element in the identification framework for the instance $e_i$ of attribute $A_k$. In other words, $c^k_i$ is the term $m_j(w)$ in Equation (10).
For the two attributes of Table 1, this notation is instantiated for each instance and each attribute. By comparing the Deng entropy calculated for each attribute, Algorithm 2 finds the father nodes of the decision tree, and Algorithm 3 is used to find the child nodes of the decision tree.

The Application of the Proposed Method
The iris data set contains three classes, and each class has 50 samples. The BBAs generated from the data are used to build the evidential decision tree directly instead of being combined for classification.
First, the iris data set is used to generate the BBAs shown in Table 2, with the classes denoted A, B, C, etc.; for example, one row of Table 2 reads 0.8978, 0.1022, 0. Second, the samples are classified simply by the maximum and minimum values of the iris dataset, as used in Algorithm 3 and shown in Table 3. The corresponding classification for the wine data set is shown in Table 4. Then, according to the BBAs in Table 2, the Deng entropy is calculated, as shown in Table 5, and used as the splitting measure to find the best splitting attribute. The Deng entropy values for the wine data set are shown in Table 6. Finally, Algorithm 2 is used to find the father nodes of the decision tree and Algorithm 3 is used to find the child nodes. The resulting evidential decision tree is shown in Figure 1 for iris and in Figure 2 for wine.

Analysis of the Experiments
The evidential decision tree of the iris dataset is constructed as shown in Figure 1. According to the Deng entropy values of SL, SW, PL, and PW, the father nodes are PW, SW, PL, and SL. Then, according to Algorithm 3, the child nodes are added to build the complete decision tree. A total of 150 samples are classified using the entropy-based decision. As a result, 147 samples are classified definitively and 3 samples cannot be classified, so 98% of the samples are classified under uncertainty. In the process of building the evidential decision tree, the attribute with the lowest Deng entropy (PW) is used first as the best splitting attribute, which is enough to classify almost 3/4 of all samples into definite decision attributes. The reason is that Deng entropy measures the degree of uncertainty of BBAs: the lower the Deng entropy, the more accurately the attribute classifies samples.

The wine dataset is also used to generate an evidential decision tree for classification. Moreover, further experiments were conducted for comparison. The average accuracy is 95%, which is much higher than that of traditional decision tree methods such as Exhaustive CHAID, CART, CHAID, and QUEST. The same applies to wine, as shown in Table 7 and Figure 3.

Compared with evidence fusion for fuzzy data classification, the time complexity of the evidential decision tree is almost $O(n)$ during the construction of the tree, while the task of fuzzy data classification is still completed. By contrast, the traditional evidence combination methods require at least $O(n^2)$, since the orthogonal sum (Equation (2)) is used in the evidence combination. The time complexity is reduced because the Deng entropy is used directly as the indicator of information gain before building the tree.

Conclusions
Existing methods rely on Pearson's correlation coefficient or information entropy to find the best splitting attribute when building a decision tree. However, none of them can handle the classification of uncertain data, since Pearson's correlation coefficient and traditional information entropy can only be used in probabilistic problems. For uncertain problems, the BBA of D-S evidence theory plays the role that probability plays in probabilistic problems. Motivated by Deng entropy, which can measure the degree of uncertainty of BBAs, the evidential decision tree is proposed in this paper. The Deng entropy values of the attributes' BBAs are used as the measure for choosing the best splitting attribute. The lower the Deng entropy, the more accurately the attribute classifies samples. Without using BBA combination rules, 98% of the iris samples and 95% of the wine samples can be classified into definite decision attributes. In other words, the evidential decision tree based on belief entropy efficiently reduces the complexity of algorithms for fuzzy data classification.
Author Contributions: M.L., H.X., and Y.D. conceived the idea of the study, performed the research, analyzed data, and wrote the manuscript.

Funding:
The work is partially supported by the National Natural Science Foundation of China (Grant Nos. 61573290, 61973332).