Research on a Streamlined Causal Tree Algorithm Based on Factor Space Theory

Abstract: Decision rule extraction is an important tool for artificial intelligence and data mining, but decision rule redundancy reduces the generalization ability of causal trees. To reduce the size of causal trees and improve classification accuracy, this paper builds on factor space theory: an extended determining degree criterion is used to eliminate noise and special samples from the dataset, the conditional factor with the optimal extended determining degree is chosen as the branch node of the tree, objects in abnormal states are removed from that factor's classes, and the procedure is applied recursively, yielding the streamlined causal tree algorithm. Comparison with other classification algorithms shows that the streamlined causal tree algorithm produces the smallest causal tree, the fewest redundant rules, and the best classification accuracy.


Introduction
In 1982, Wang Peizhuang [1] proposed the idea of factor space from the origin of object cognition and, based on it, established a mathematical theory of knowledge representation, factor space theory, which is among the earliest basic theories of artificial intelligence in international intelligence mathematics. In 2014, Wang Peizhuang et al. [2] started from the logical nature of reasoning to achieve rapid extraction of causal rules and proposed the factor analysis method, which is one of the core algorithms in factor space and provides an important tool for artificial intelligence and data mining. Bao Yanke et al. [3] proposed subtraction and rotation calculations to improve how the factor analysis method utilizes the information in the training set samples. Liu Haitao et al. [4] provided a reasoning model for the factor analysis method, which solved the object recognition problem caused by incomplete training set samples and improved the accuracy of the factor analysis method. Wang Huadong [5] adopted a column-by-column advancement method when selecting factors for superposition division to improve the accuracy and running speed of the factor analysis method.
However, current literature studies have not significantly reduced the size of the causal tree in the factor analysis method. The main method to reduce the size of the causal tree is pruning [6]. Current literature research shows that pruning can reduce the size of causal trees to a certain extent, but the resulting causal trees are not streamlined. The size of the causal tree reflects the generalization ability of the tree to a certain extent. The more complex the rules extracted from the dataset, the larger the size of the tree. Rule redundancy will lead to overfitting and weaken the generalization ability. It is particularly important to minimize the size of the causal tree without affecting classification accuracy. Therefore, this paper proposes a streamlined causal tree algorithm where, by using a self-defined threshold, the noise samples in the training set are filtered out and the optimized causal tree is trained in the same step, thereby greatly reducing the size of the causal tree and improving its classification performance. In addition, the deletion of the determining region is a key factor in reducing the computational complexity of the algorithm and achieving fast convergence. The streamlined causal tree algorithm can find a larger determining region, enabling the algorithm to converge faster under the optimal threshold.

Basic Knowledge
A factor is a key to describing things and can be understood as a generalized gene. From a mathematical perspective, a factor is a special mapping that maps objects onto their states (phases). The basic notions of factor space [7] are as follows: factors influence each other, restrict each other, and cause and affect each other. In the factor analysis method, the factor of interest g is called the result factor, and the factors { f 1 , f 2 , · · · , f n } that influence it are called conditional factors.
The causal analysis table takes the object as the row and the conditional and result factors as the columns, as shown in Table 1. The i-th row and j-th column elements in Table 1 represent the state of the i-th object under the j-th factor.

Each row of the causal analysis table is the coordinate of an object in the factor space. A finite number of objects constitute a domain U = {u 1 , u 2 , · · · , u m }. The conditional factors are F = { f 1 , f 2 , · · · , f n }. The state space of conditional factor f j is I( f j ) = {a j1 , a j2 , · · · , a jk } (j = 1, 2, · · · , n). The result factor is g. The state space of the result factor is I(g) = {g 1 , g 2 , · · · , g s }.

Definition 1.
Given a conditional factor f j and a state a t taken by that factor, write [a t ] = {u ∈ U | f j (u) = a t }. If all objects in [a t ] take the same state of the result factor g, then [a t ] is said to be a determining class of factor f j . The union of all determining classes of factor f j is called its determining region for the result factor. The ratio of the number of rows h in the determining region of the factor f j to the number of rows in the table (i.e., the number of all objects) m is called its determining degree on the result factor g, denoted as d( f j ) = h/m.
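As an illustration of Definition 1, the sketch below computes the determining degree on a toy causal analysis table; the table contents and the function name are illustrative assumptions, not taken from the paper.

```python
# toy causal analysis table: each row is an object u_i, the keys are the
# conditional factors f1, f2 and the result factor g (illustrative data)
table = [
    {"f1": "a", "f2": "x", "g": "g1"},
    {"f1": "a", "f2": "y", "g": "g1"},
    {"f1": "b", "f2": "x", "g": "g2"},
    {"f1": "b", "f2": "y", "g": "g1"},
]

def determining_degree(table, factor, result="g"):
    """d(f_j) = h/m: the fraction of objects lying in determining classes,
    i.e. classes [a_t] whose objects all share a single result state."""
    states = {}
    for row in table:                      # group result states by class [a_t]
        states.setdefault(row[factor], []).append(row[result])
    h = sum(len(res) for res in states.values() if len(set(res)) == 1)
    return h / len(table)

print(determining_degree(table, "f1"))     # class [a] is determining -> 0.5
```

Here only the class [a] of f1 (both objects give g1) is determining, so h = 2 and d(f1) = 2/4.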

Definition 2.
If the class [a t ] of the conditional factor f j is a determining class, then all objects in [a t ] have a unique and definite result g l , which yields the sentence "if f j is a t , then the result g is g l ". This sentence is an inference sentence determined by the conditional factor f j , denoted as f j = a t → g = g l .
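Definition 2 can likewise be sketched: the hypothetical `extract_rules` helper below emits one inference sentence per determining class, again on assumed toy data.

```python
# toy table with one non-determining class ([b] mixes g1 and g2)
table = [
    {"f1": "a", "g": "g1"},
    {"f1": "a", "g": "g1"},
    {"f1": "b", "g": "g2"},
    {"f1": "b", "g": "g1"},
    {"f1": "c", "g": "g1"},
]

def extract_rules(table, factor, result="g"):
    """One inference sentence f_j = a_t -> g = g_l per determining class."""
    states = {}
    for row in table:
        states.setdefault(row[factor], set()).add(row[result])
    # classes with a single result state are determining and yield a rule
    return {a_t: res.pop() for a_t, res in states.items() if len(res) == 1}

print(extract_rules(table, "f1"))  # {'a': 'g1', 'c': 'g1'}
```

The mixed class [b] produces no rule, which is exactly the absoluteness of the determining degree that the extended criterion later relaxes.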

The Streamlined Causal Tree Algorithm
The factor analysis method in factor space can quickly and concisely analyze the causal relationships contained in a dataset, establish causal rules, and obtain a causal tree. However, when the dataset has many conditional factors, or the conditional factors have many states, the trained causal tree contains redundant rules and predicts poorly. Because the calculation principle of the determining degree is absolute, noisy object data and special object data arising from input errors, measurement equipment failures, and other causes have a significant negative impact on training with the factor analysis method. The method cannot cope with noisy data, its robustness is poor, and the decision effectiveness of factors cannot be fully exploited, which limits the application of the algorithm. Even if pre- and post-pruning are applied to the trained causal tree, this negative impact is unavoidable.
In order to solve the interference of noisy data, improve the robustness of causal tree algorithms, reduce the size of causal trees, and improve the accuracy of classification prediction, a streamlined causal tree algorithm is proposed.

Algorithm Principle
The purpose of factor analysis in factor space is to transform a table into a set of inference sentences (decision rules). Since each determining class is contained in a result class, an inference sentence is formed from the determining class to the result class containing it, and finally a causal tree of rules from the conditional factors to the result factor is obtained.
Definition 3.
For a state a jt of the conditional factor f j , let q(a jt ) be the number of objects taking that state and q(a jt , g l ) the number of those objects whose result factor state is g l (l = 1, 2, · · · , s), over all states {g 1 , g 2 , · · · , g s } of the result factor g. Given an α threshold (α ∈ (0.5, 1)), if q(a jt , g l )/q(a jt ) > α, then the objects in state a jt whose result is g l form an extended determining class of factor f j . The union of all extended determining classes of factor f j is called the extended determining region of the result factor g, and the ratio of the number of objects in the extended determining region to the number of all objects m is the extended determining degree.
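A minimal sketch of the extended determining class, region, and degree, with the q(a jt , g l ) counts taken directly from a toy table (the data and all names are illustrative assumptions):

```python
from collections import Counter

def extended_determining_degree(table, factor, alpha, result="g"):
    """For each state a_jt of the factor, if some result g_l covers a
    fraction r = q(a_jt, g_l) / q(a_jt) greater than alpha, the matching
    objects form an extended determining class.  Returns the extended
    determining degree and the minority (abnormal) objects."""
    groups = {}
    for row in table:
        groups.setdefault(row[factor], []).append(row)
    in_region, abnormal = 0, []
    for rows in groups.values():
        g_l, q = Counter(obj[result] for obj in rows).most_common(1)[0]
        if q / len(rows) > alpha:      # r > alpha: extended determining class
            in_region += q
            abnormal += [obj for obj in rows if obj[result] != g_l]
    return in_region / len(table), abnormal

# toy table: state "a" of f1 has one noisy g2 object among three g1 objects
table = [{"f1": "a", "g": "g1"}] * 3 + [{"f1": "a", "g": "g2"}] \
      + [{"f1": "b", "g": "g2"}] * 2
d, abnormal = extended_determining_degree(table, "f1", alpha=0.7)
# d = 5/6: five of six objects fall in the extended determining region,
# and the single noisy object is flagged as abnormal
```

With the strict Definition 1 the noisy class [a] would contribute nothing; with α = 0.7 it still yields a class of three objects and isolates the noise.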

The Extended Determining Degree Criterion
The extended determining degree criterion takes the extended determining degree, rooted in the set logic of reasoning, as the selection criterion: it selects the optimal conditional factor at each node and achieves fast convergence of the algorithm by expanding the determining region.

Setting of the α Threshold
If the α threshold is too low, the condition is too easy to satisfy during training: too many non-noise and special objects are deleted, which easily leads to underfitting, a single decision tree rule, and loss of decision value.
If the α threshold is too high, the condition is hard to satisfy during training: noisy objects and special objects are not deleted, so the training set is not optimized and the size of the causal tree is not reduced.
Experiments show that a suitable α threshold generally lies between 0.8 and 0.95.
Step 1. Divide the dataset into Train_data and Test_data and set the initial α threshold.
Step 2. For each state a jt of each conditional factor f j in Train_data, count the number of objects q(a jt ) and, for each result state g l , the number q(a jt , g l ).
Step 3. Calculate the ratio r of q(a jt , g l ) to q(a jt ).
Step 4. Determine the extended determining classes. For each state a jt of the conditional factor f j , compare the ratios r(a jt , g 1 ), r(a jt , g 2 ), · · · , r(a jt , g s ) with α. If r(a jt , g l ) > α, then the objects in state a jt of the conditional factor f j whose result factor state is g l form an extended determining class.
Step 5. Determine the extended determining region. Take the union of the extended determining classes of each conditional factor to obtain its extended determining region.
Step 6. Calculate the extended determining degree. Calculate the number of objects in the extended determining region of each conditional factor and obtain the extended determining degree d = {d 1 , d 2 , · · · , d n }. Calculate the maximum extended determining degree d max = max{d 1 , d 2 , · · · , d n }.
Step 7. Update the training set. Suppose the conditional factor corresponding to d max is f j . If a state a jt of f j contains an extended determining class of q(a jt , g l ) objects, label all objects in that class as normal. The remaining Q = q(a jt , g 1 ) + q(a jt , g 2 ) + · · · + q(a jt , g l−1 ) + q(a jt , g l+1 ) + · · · + q(a jt , g s ) objects in that state do not belong to the extended determining class and are marked as abnormal objects, namely the noise objects and special objects to be deleted. Delete the objects marked abnormal from the training set Train_data to obtain a new training set Train_data1.
Step 8. Extract rules. On Train_data1, use the conditional factor corresponding to d max to extract decision rules and divide the dataset into sub-datasets.
Step 9. Build the causal tree. Repeat steps 2 to 8 on each sub-dataset to construct the causal tree under the α threshold. Each node of the causal tree satisfies the α condition, and each branch is grown on the training set updated under that condition.
Step 10. Select the optimal α threshold. Given a step size step = 0.01, vary α and repeat steps 2 to 9. Analyze the relationship between the α threshold and the prediction accuracy of the causal tree, and select the optimal α threshold on the training set.
Output: The causal tree under the optimal α threshold.
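The steps above can be condensed into a Python sketch. The data layout, function names, and the omission of the Step 10 threshold search are simplifying assumptions, not the authors' implementation:

```python
from collections import Counter

def best_factor(table, factors, alpha, result="g"):
    """Steps 3-7: compute the extended determining degree of each
    conditional factor; return the factor with d_max together with
    its degree and the abnormal (minority) objects to delete."""
    best = (None, -1.0, [])
    for f in factors:
        groups = {}
        for row in table:
            groups.setdefault(row[f], []).append(row)
        covered, abnormal = 0, []
        for rows in groups.values():
            g_l, q = Counter(obj[result] for obj in rows).most_common(1)[0]
            if q / len(rows) > alpha:          # extended determining class
                covered += q
                abnormal += [obj for obj in rows if obj[result] != g_l]
        d = covered / len(table)
        if d > best[1]:
            best = (f, d, abnormal)
    return best

def build_tree(table, factors, alpha, result="g"):
    """Steps 2-9: recursively branch on the factor with the largest
    extended determining degree, deleting abnormal objects first."""
    results = {row[result] for row in table}
    if len(results) == 1 or not factors:       # leaf: pure or exhausted
        return results.pop() if len(results) == 1 else Counter(
            row[result] for row in table).most_common(1)[0][0]
    f, d, abnormal = best_factor(table, factors, alpha, result)
    # Train_data1: drop objects marked abnormal (equality-based removal)
    table = [row for row in table if row not in abnormal]
    tree = {}
    for state in {row[f] for row in table}:
        subset = [row for row in table if row[f] == state]
        tree[state] = build_tree(subset, [x for x in factors if x != f],
                                 alpha, result)
    return {f: tree}
```

For Step 10 one would wrap `build_tree` in a loop over α from 0.8 to 0.95 with step 0.01 and keep the tree with the best training accuracy.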

Instance Analysis
Five classification datasets in the UCI database were analyzed using the streamlined causal tree algorithm, the factor analysis method, the ID3 algorithm, and the C4.5 algorithm. Tenfold cross-validation was used to obtain the number of decision rules, accuracy, precision, recall, F1-measure, and running time. The running time of the streamlined causal tree algorithm is the causal tree training time under the optimal threshold. The experimental results are shown in Table 2.

Conclusions
The causal tree trained by the streamlined causal tree algorithm has the smallest size, the best classification accuracy and F1-measure, and the least redundant rules. Therefore, the streamlined causal tree algorithm can not only reduce rule redundancy and significantly reduce the size of the causal tree but also improve the classification performance of the causal tree to a certain extent, expanding the theory and application of factor space.
Author Contributions: There are five authors of this paper. K.L. designed the algorithm, implemented the software for verification and analysis, and wrote the paper; F.Z. provided guidance for writing and for preparing the first draft; X.L., K.Z. and Y.W. reviewed and edited the paper. All authors have read and agreed to the published version of the manuscript.