Decision Rules Derived from Optimal Decision Trees with Hypotheses

Conventional decision trees use queries each of which is based on one attribute. In this study, we also examine decision trees that use additional queries based on hypotheses. This kind of query is similar to the equivalence queries considered in exact learning. Earlier, we designed dynamic programming algorithms for the computation of the minimum depth and the minimum number of internal nodes in decision trees with hypotheses. The modification of these algorithms considered in the present paper permits us to build decision trees with hypotheses that are optimal relative to the depth or relative to the number of internal nodes. We compare the length and coverage of decision rules extracted from optimal decision trees with hypotheses and from optimal conventional decision trees to determine which are preferable as a tool for the representation of information. To this end, we conduct computer experiments on various decision tables from the UCI Machine Learning Repository. In addition, we consider decision tables for randomly generated Boolean functions. The collected results show that the decision rules derived from decision trees with hypotheses are in many cases better than the rules extracted from conventional decision trees.


Introduction
Decision trees are commonly used as classifiers, as an algorithmic tool for solving various problems, and as a means of representing information [1][2][3]. They form a part of statistical learning, which refers to a vast set of tools for understanding data [4]. Conventional decision trees studied in test theory [5], rough set theory [6][7][8], and many other areas of computer science exploit queries based on a single attribute. In [9][10][11][12], we considered decision trees that also exploit queries based on hypotheses. Such decision trees are analogous to the tools that have been analyzed in exact learning [13][14][15], where both membership and equivalence queries are used.
In the present paper, we analyze decision trees with hypotheses as a means for the representation of information. We design dynamic programming algorithms that optimize such trees relative to two cost functions. For various decision tables, we build optimal decision trees and analyze the length and coverage of decision rules extracted from the constructed trees to study which kinds of decision trees are more suitable for the representation of information.
Let us have a decision table T that contains n attributes. We can use two types of queries in the decision trees for this table. We can ask about the value of an attribute; the answer is the value of this attribute on the considered row. We can also ask about a hypothesis over T of the form "the values of the attributes f_1, . . . , f_n are δ_1, . . . , δ_n, respectively"; the answer is either a confirmation of the hypothesis or a counterexample. Such a hypothesis is called proper if (δ_1, . . . , δ_n) is a row of T. We study decision trees of the following five types:
1. Using attributes only.
2. Using hypotheses only.
3. Using both attributes and hypotheses.
4. Using proper hypotheses only.
5. Using attributes as well as proper hypotheses.
We analyzed four different cost functions for the decision trees: the depth, the number of realizable nodes, the number of realizable leaf nodes, and the number of internal nodes. We define a node as realizable relative to a given decision table if a computation can pass through this node for at least one row of the considered decision table.
Previously, in [12], we proposed a dynamic programming algorithm for each of these four cost functions. Given a decision table and a type of decision tree, this algorithm returns the minimum cost of a decision tree of the given type for the given table.
The results of the computer experiments show that decision trees with hypotheses can have lower complexity than conventional decision trees. This means that they can be used as a means for the representation of information.
The present paper has two aims. The first aim is to construct optimal decision trees with hypotheses. We know that such trees can be used for the representation of information (especially decision trees of type 3). However, the algorithms from [12] were designed only to find the complexity of optimal trees. The algorithms for the two cost functions (the depth and the number of internal nodes) can be modified to build optimal decision trees. Unfortunately, we cannot use a similar approach to build optimal decision trees of types 2 and 3 relative to the number of realizable nodes and optimal decision trees of type 2 relative to the number of realizable leaf nodes.
The second aim is to study the length and coverage of decision rules extracted from the optimal decision trees. Decision rules can be considered one of the simplest and most understandable models for the representation of information. Deriving decision rules from decision trees is a well-known approach. We want to confirm that the decision rules derived from decision trees with hypotheses can be better than the rules derived from conventional decision trees.
For computer experiments, we chose eight decision tables from the UCI ML Repository [16] as well as 100 randomly generated Boolean functions with n variables (n = 3, . . . , 6). We constructed optimal (relative to the depth or to the number of internal nodes) decision trees of five types for these tables. Then we analyzed the length and coverage of decision rules extracted from these trees. For a decision tree with hypotheses, for some rows of the considered decision table, there can be more than one derived decision rule that covers the row. In this case, for each row we chose the best rule. The results of the computer experiments show that the decision rules derived from the decision trees with hypotheses, in many cases, are better than the ones derived from conventional decision trees.
The novelty of the paper is directly related to its two main contributions: (i) the modification of dynamic programming algorithms described in [12] such that the modified algorithms can now construct optimal decision trees of five types relative to two cost functions and (ii) the experimental confirmation that the decision rules derived from the decision trees with hypotheses can be more suitable for the representation of information than the decision rules derived from conventional decision trees.
To make the paper more understandable, we add to it slightly modified definitions and one algorithm from [12].
We present the remaining parts of the paper as follows: important notions in Sections 2 and 3, the decision tree optimization based on dynamic programming algorithms in Sections 4-6, experimental results in Section 7, and short conclusions in Section 8.


Decision Tables

We can define a decision table T as follows:
• It is a rectangular table with n ≥ 1 columns filled with numbers (values of attributes).
• The columns of the table are labeled with conditional attributes f_1, . . . , f_n.
• The rows of the table are pairwise different, and each row is labeled with a number that is interpreted as a decision.
Furthermore, we consider the following notation for T:
• F(T) = {f_1, . . . , f_n} is the set of conditional attributes of T.
• D(T) is the set of decisions that are attached to rows of T.
• E(T, f_i) is the set of values of the attribute f_i in the table T, and E(T) is the set of attributes f_i ∈ F(T) such that |E(T, f_i)| ≥ 2. For a subtable Θ of T, the sets E(Θ, f_i) and E(Θ) are defined in the same way.
• The table T is called degenerate if it is empty or all its rows are labeled with the same decision.
Let S = {f_{i_1} = δ_1, . . . , f_{i_m} = δ_m} be a system of equations such that f_{i_1}, . . . , f_{i_m} ∈ F(T) and δ_j ∈ E(T, f_{i_j}) for j = 1, . . . , m (this system is empty when m = 0). We denote by TS the subtable of T consisting of all rows of T that have values δ_1, . . . , δ_m at the intersection with the columns f_{i_1}, . . . , f_{i_m}. Such subtables are called separable subtables of T.
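To make these notions concrete, here is a minimal Python sketch (illustrative only; all names are ours, not from the paper's implementation) that represents a decision table as a list of rows with attached decisions and extracts the separable subtable TS for a given equation system S:

```python
from typing import Dict, List, Tuple

Row = Tuple[int, ...]  # values of the attributes f_1, ..., f_n in one row

def subtable(rows: List[Row], decisions: List[int],
             s: Dict[int, int]) -> Tuple[List[Row], List[int]]:
    """Return the separable subtable TS for the equation system S.

    `s` maps a 0-based attribute index i (standing for f_{i+1}) to the
    required value delta, encoding the system {f_{i+1} = delta, ...};
    the empty dict encodes the empty system, for which TS = T.
    """
    kept = [(r, d) for r, d in zip(rows, decisions)
            if all(r[i] == delta for i, delta in s.items())]
    return [r for r, _ in kept], [d for _, d in kept]

def is_degenerate(decisions: List[int]) -> bool:
    """A table is degenerate if it is empty or all rows share one decision."""
    return len(set(decisions)) <= 1
```

For example, for a table with the rows (0, 1), (1, 0), (1, 1), the call subtable(rows, decisions, {0: 1}) returns the subtable T{f_1 = 1} consisting of the last two rows.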

Decision Trees and Rules
In this section, we define the notions of decision trees and rules related to a given nonempty decision table T that contains n conditional attributes f_1, . . . , f_n. We consider decision trees in connection with two types of queries. A query of the first type asks for the value of an attribute f_i ∈ F(T) = {f_1, . . . , f_n}. The answer to this query belongs to the set A(f_i) = {{f_i = δ} : δ ∈ E(T, f_i)}. A query of the second type asks about a hypothesis over T of the form H = {f_1 = δ_1, . . . , f_n = δ_n}, where δ_1 ∈ E(T, f_1), . . . , δ_n ∈ E(T, f_n). The answer to this query belongs to the set A(H) = {H, {f_1 = σ_1}, . . . , {f_n = σ_n} : σ_1 ∈ E(T, f_1) \ {δ_1}, . . . , σ_n ∈ E(T, f_n) \ {δ_n}}. If the answer is H, then the hypothesis is true; the other answers are counterexamples. Note that H is a proper hypothesis for T if (δ_1, . . . , δ_n) is a row of the table T.
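The following small sketch (a hypothetical helper, continuing the representation introduced above) enumerates the answer set A(H) for a hypothesis query — either the confirmation of H or a counterexample {f_i = σ}:

```python
from typing import Iterator, List, Sequence, Set, Tuple

def hypothesis_answers(h: Sequence[int],
                       values: List[Set[int]]) -> Iterator[Tuple]:
    """Enumerate the answer set A(H) for H = {f_1 = h[0], ..., f_n = h[n-1]}.

    `values[i]` is E(T, f_{i+1}), the set of values of f_{i+1} in T.
    """
    yield ("true", tuple(h))                 # the answer: the hypothesis holds
    for i, vals in enumerate(values):
        for sigma in sorted(vals - {h[i]}):  # counterexample {f_{i+1} = sigma}
            yield ("counterexample", i, sigma)
```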
A decision tree over T is a tagged finite directed rooted tree in which the following hold:
• Each leaf node is labeled with a number from the set D(T) ∪ {0}.
• Each internal node is labeled with an attribute from the set F(T) or a hypothesis over T. In both cases, there is exactly one edge leaving this node for each answer, either from the set A(f_i) in the case of an attribute query or from the set A(H) in the case of a hypothesis query, and no other edges leave this node.
Let us consider a decision tree Γ over T. For a node v of Γ, we define an equation system S(Γ, v) over T corresponding to v. Let ξ denote the directed path from the root of Γ to the node v. If ξ has no internal nodes, then S(Γ, v) is the empty system. Otherwise, S(Γ, v) is the union of the equation systems attached to the edges of ξ.
A decision tree Γ over T is a decision tree for T if, for any node v of Γ, the following hold:
• The node v is a leaf node if and only if the subtable TS(Γ, v) is degenerate.
• If v is a leaf node and the subtable TS(Γ, v) is empty, then v is labeled with the decision 0.
• If v is a leaf node and the subtable TS(Γ, v) is nonempty, then v is labeled with the decision attached to all rows of TS(Γ, v).
An arbitrary directed path ξ from the root to a leaf node v in Γ is called a complete path in Γ. Denote T(ξ) = TS(Γ, v). The depth of a decision tree Γ, denoted h(Γ), is the maximum number of internal nodes in a complete path in the tree; it is analogous to the tree's time complexity. Similarly, the number of internal nodes in a decision tree Γ, denoted L_w(Γ), is analogous to its space complexity.
Let Γ be a decision tree for T, let ξ be a complete path in Γ such that T(ξ) is a nonempty table, and let the leaf node of the path ξ be tagged with the decision d. We now define a system of equations S(ξ). If ξ contains no internal nodes, then S(ξ) is the empty system. Let us now assume that ξ contains at least one internal node. We transform the systems of equations attached to the edges leaving the internal nodes of ξ. If an edge is tagged with an equation system containing exactly one equation, then we do not change this system. Let an edge e leaving an internal node v be tagged with an equation system containing more than one equation. Then v is tagged with a hypothesis H and e is tagged with the equation system H. (Note that if such a node exists, then it is the last internal node in the complete path ξ.) In this case, we remove from the equation system H attached to e all equations of the kind f_j = σ such that f_j ∉ E(TS(Γ, v)). Then S(ξ) is the union of the new equation systems corresponding to the edges of the path ξ. One can show that T(ξ) = TS(ξ).
We associate with the complete path ξ the decision rule

(f_{i_1} = δ_1) ∧ . . . ∧ (f_{i_m} = δ_m) → d,

where S(ξ) = {f_{i_1} = δ_1, . . . , f_{i_m} = δ_m}. We denote this rule by rule(ξ). The number m of equations in the equation system S(ξ) is called the length of the rule rule(ξ) and is denoted l(rule(ξ)). The number of rows in the subtable TS(ξ) is called the coverage of the rule rule(ξ) and is denoted c(rule(ξ)).
Denote by Ξ(T, Γ) the set of complete paths ξ in Γ such that the table T(ξ) is nonempty, and by Rows(T) the set of rows of the decision table T. For a row r ∈ Rows(T), we denote by l(r, T, Γ) the minimum length of a rule rule(ξ) such that ξ ∈ Ξ(T, Γ) and r is a row of the subtable TS(ξ), and we denote by c(r, T, Γ) the maximum coverage of a rule rule(ξ) such that ξ ∈ Ξ(T, Γ) and r is a row of the subtable TS(ξ). We use the following notation:

l(T, Γ) = (1/|Rows(T)|) ∑_{r ∈ Rows(T)} l(r, T, Γ),   c(T, Γ) = (1/|Rows(T)|) ∑_{r ∈ Rows(T)} c(r, T, Γ).
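Assuming, as the min / Avg / max format of the experimental tables and the per-row choice of the best rule suggest, that l(T, Γ) and c(T, Γ) are the per-row averages defined above, the following illustrative sketch (the path representation is ours) computes both quantities from an enumeration of the complete paths:

```python
from typing import List, Set, Tuple

def rule_quality(paths: List[Tuple[Set[int], int]], n_rows: int):
    """Compute l(T, Gamma) and c(T, Gamma) as per-row averages.

    Each element of `paths` describes one complete path xi with nonempty
    T(xi) as a pair (indices of the rows covered by rule(xi), l(rule(xi)));
    the coverage c(rule(xi)) is the size of the covered-row set.  Every row
    of T lies on at least one computation path, so both optima exist.
    """
    avg_l = sum(min(ln for rows, ln in paths if r in rows)        # l(r, T, Gamma)
                for r in range(n_rows)) / n_rows
    avg_c = sum(max(len(rows) for rows, _ in paths if r in rows)  # c(r, T, Gamma)
                for r in range(n_rows)) / n_rows
    return avg_l, avg_c
```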

Construction of Directed Acyclic Graph ∆(T)
Let us consider a nonempty decision table T that has n conditional attributes f_1, . . . , f_n. Algorithm 1 A_DAG constructs a directed acyclic graph (DAG) ∆(T). This DAG is subsequently used for the construction of optimal decision trees. The nodes of this DAG are some separable subtables of the table T. We process one node during each iteration of the algorithm. We begin with the graph consisting of a single unprocessed node T and end when all nodes of the graph are processed. This algorithm was described and used in [9,10,12]. It is a special version of the more general algorithm considered in [17].

1. Build the graph consisting of one node T that is not tagged as processed.
2. Check whether the processing of all nodes of the graph is completed. If yes, then the algorithm halts and returns the resulting graph as ∆(T). Otherwise, select a node (table) Θ that is not yet processed.
3. Check whether the node Θ is degenerate.
(a) If yes, then tag the node Θ as processed and move to step 2.
(b) If no, then, for each attribute f_i ∈ E(Θ), draw a bundle of edges from the node Θ as follows. Let E(Θ, f_i) = {a_1, . . . , a_k}. Draw k edges from Θ and attach to these edges the systems of equations {f_i = a_1}, . . . , {f_i = a_k}, respectively. These edges enter the nodes Θ{f_i = a_1}, . . . , Θ{f_i = a_k}. If some of the nodes Θ{f_i = a_1}, . . . , Θ{f_i = a_k} are not yet in the graph, then add these nodes to the graph. Tag the node Θ as processed and move to step 2.
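A compact Python sketch of Algorithm 1 (our illustrative rendering, reusing Row and is_degenerate from the sketch in Section 2; nodes are keyed by frozensets of row indices):

```python
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[int, int, FrozenSet[int]]  # (attribute index, value, child key)

def build_dag(rows: List[Row],
              decisions: List[int]) -> Dict[FrozenSet[int], List[Edge]]:
    """Construct the DAG Delta(T); returns the adjacency lists of its nodes."""
    n = len(rows[0])
    dag: Dict[FrozenSet[int], List[Edge]] = {}
    pending = [frozenset(range(len(rows)))]   # step 1: the unprocessed node T
    while pending:                            # step 2: pick an unprocessed node
        key = pending.pop()
        if key in dag:
            continue                          # this node was already processed
        dag[key] = []
        if is_degenerate([decisions[j] for j in key]):
            continue                          # step 3(a): degenerate node, a leaf
        for i in range(n):                    # step 3(b): bundle of edges for
            vals = {rows[j][i] for j in key}  # each attribute f_{i+1} in E(Theta)
            if len(vals) < 2:
                continue
            for a in sorted(vals):            # edge with the system {f_{i+1} = a}
                child = frozenset(j for j in key if rows[j][i] == a)
                dag[key].append((i, a, child))
                pending.append(child)
    return dag
```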

Construction of Decision Trees with Minimum Depth
Let us consider a nonempty decision table T that contains n conditional attributes f_1, . . . , f_n, and let k ∈ {1, . . . , 5}. We can use the DAG ∆(T) to construct a decision tree Γ^(k)(T) of the type k with the minimum depth for the decision table T. For this purpose, we construct, for each node Θ of ∆(T), a decision tree Γ^(k)(Θ) of the type k with the minimum depth for the table Θ. It is necessary to consider not only the subtables corresponding to the nodes of ∆(T) but also the empty subtable Λ of T as well as the subtables T_r containing only one row r of T, which are not nodes of ∆(T). The idea is to start with these special subtables as well as the leaf nodes of ∆(T), which are degenerate separable subtables of T. In this way, we move stepwise, in a bottom-up fashion, to the table T.
Let us consider the case when Θ is a leaf node of ∆(T), Θ = T_r for a row r of the table T, or Θ = Λ. If Θ is nonempty, then Γ^(k)(Θ) has only one node, which is tagged with the decision attached to all rows of Θ. Otherwise, this node is tagged with 0.
Let us consider the other case, when Θ is an internal node of ∆(T) and the construction of the decision tree Γ^(k)(Θ') is already completed for each child Θ' of Θ. Based on these trees, a decision tree with the minimum depth for Θ can be constructed that uses decision trees of the type k for the subtables corresponding to the children of the root. In this tree, the root can be tagged as follows:
• By an attribute from F(T) (such a decision tree is designated Γ_a^(k)(Θ)).
• By a hypothesis over T (such a decision tree is designated Γ_h^(k)(Θ)).
• By a proper hypothesis over T (such a decision tree is designated Γ_p^(k)(Θ)).
We now concentrate on a decision tree Γ(f_i) for the node Θ, where the root is tagged with an attribute f_i ∈ E(Θ). For each δ ∈ E(T, f_i), there is an edge that exits the root and enters the root of the decision tree Γ^(k)(Θ{f_i = δ}). We tag this edge with the equation system {f_i = δ}. It is obvious that

h(Γ(f_i)) = 1 + max{h(Γ^(k)(Θ{f_i = δ})) : δ ∈ E(T, f_i)}. (1)

One can easily prove using (1) that Γ(f_i) is a decision tree with the minimum depth for Θ such that the root of this tree is tagged with the attribute f_i and it uses decision trees of the type k for the subtables corresponding to the children of the root.
Note that we need not consider attributes f_i ∈ F(T) \ E(Θ). The reason is that for such f_i, we can find δ ∈ E(T, f_i) with Θ{f_i = δ} = Θ. Therefore, we cannot construct an optimal tree for Θ based on such f_i.

Construction of the tree Γ_a^(k)(Θ). We build the set E(Θ). For each f_i ∈ E(Θ), we construct the decision tree Γ(f_i) and choose among these trees a tree with the minimum depth. We return this tree as Γ_a^(k)(Θ).
Let us consider a hypothesis H = {f_1 = δ_1, . . . , f_n = δ_n} over T. We call this hypothesis admissible for Θ and an attribute f_i ∈ F(T) if |E(Θ, f_i)| ≥ 2 or δ_i ∈ E(Θ, f_i). That is, H is not admissible for Θ and an attribute f_i ∈ F(T) if and only if |E(Θ, f_i)| = 1 and δ_i ∉ E(Θ, f_i). We call H admissible for Θ if H is admissible for Θ and every attribute f_i ∈ F(T).
We now describe a decision tree Γ(H) for Θ whose root is tagged with an admissible hypothesis H = {f_1 = δ_1, . . . , f_n = δ_n} for Θ. For each equation system S ∈ A(H), there is an edge that exits the root of Γ(H) and enters the root of the tree Γ^(k)(ΘS). This edge is tagged with the equation system S.
It is obvious that

h(Γ(H)) = 1 + max{h(Γ^(k)(ΘS)) : S ∈ A(H)}. (2)

One can easily prove using (2) that Γ(H) is a decision tree with the minimum depth for Θ such that the root of this tree is tagged with the hypothesis H and it uses decision trees of the type k for the subtables corresponding to the children of the root.
We need not consider hypotheses H that are not admissible for Θ. The reason is that for such H, we can find an equation system S ∈ A(H) with ΘS = Θ. Therefore, we cannot construct an optimal decision tree for Θ based on such H.

Construction of the tree Γ_h^(k)(Θ). For each admissible hypothesis H for Θ, we construct the decision tree Γ(H). We choose among these trees a tree with the minimum depth and return it as Γ_h^(k)(Θ). Using (2), one can prove the correctness of this procedure.

Construction of the tree Γ_p^(k)(Θ). For each row r = (δ_1, . . . , δ_n) of Θ, we consider the proper hypothesis H_r = {f_1 = δ_1, . . . , f_n = δ_n}. We inspect whether H_r is admissible for Θ. If yes, then we construct the decision tree Γ(H_r). We choose among the constructed trees a tree with the minimum depth and return it as Γ_p^(k)(Θ). Using (2), one can prove the correctness of this procedure.

Given as input a decision table T and k ∈ {1, . . . , 5}, the following Algorithm 2 C_h builds, for each node Θ of the DAG ∆(T), a decision tree Γ^(k)(Θ) of the type k for the table Θ having the minimum depth.

Algorithm 2 C_h (construction of the tree Γ^(k)(T)). Input: T (a nonempty decision table), ∆(T) (the directed acyclic graph for T), and k (a natural number between 1 and 5). Output: a decision tree Γ^(k)(T).

1. Check whether there is a decision tree attached to each node of the DAG ∆(T). If yes, then return the tree attached to the node T as Γ^(k)(T) and halt the algorithm. If not, select a node Θ of the graph ∆(T) that does not have an attached tree; it is either a leaf node of ∆(T) or an internal node of ∆(T) all of whose children have attached trees.
2. If Θ is a leaf node, then attach to it the decision tree Γ^(k)(Θ) that has only a single node. This node is tagged with the decision attached to all rows of Θ. Move to step 1.
3. If Θ is not a leaf node, then do the following according to the value of k:
• If k = 1, construct the tree Γ_a^(1)(Θ) and attach it to Θ as the tree Γ^(1)(Θ).
• If k = 2, construct the tree Γ_h^(2)(Θ) and attach it to Θ as the tree Γ^(2)(Θ).
• If k = 3, construct the trees Γ_a^(3)(Θ) and Γ_h^(3)(Θ), choose among them a tree with the minimum depth, and attach it to Θ as the tree Γ^(3)(Θ).
• If k = 4, construct the tree Γ_p^(4)(Θ) and attach it to Θ as the tree Γ^(4)(Θ).
• If k = 5, construct the trees Γ_a^(5)(Θ) and Γ_p^(5)(Θ), choose among them a tree with the minimum depth, and attach it to Θ as the tree Γ^(5)(Θ).
Move to step 1.
Let T be a decision table and k ∈ {1, . . . , 5}. We use the following notation: l_h^(k)(T) = l(T, Γ^(k)(T)) and c_h^(k)(T) = c(T, Γ^(k)(T)).
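The recursions (1) and (2) translate directly into a memoized dynamic program. The sketch below is our illustrative code, not the authors' implementation: subtables are keyed by tuples of row indices, which implicitly reproduces the DAG ∆(T) extended with the single-row and empty subtables, and the enumeration of hypotheses is exponential in n, which is acceptable only for small tables such as those used in the experiments.

```python
from functools import lru_cache
from itertools import product

def min_depth(rows, decisions, use_attrs=True, use_hyps=False, proper=False):
    """Minimum depth of a decision tree for T via Equations (1) and (2).

    Type 1: (True, False, -); type 2: (False, True, False); type 3:
    (True, True, False); type 4: (False, True, True); type 5: (True, True, True).
    """
    n = len(rows[0])
    full_vals = [sorted({r[i] for r in rows}) for i in range(n)]  # E(T, f_i)

    @lru_cache(maxsize=None)
    def h(key):  # key: tuple of indices of the rows of the subtable Theta
        if len({decisions[j] for j in key}) <= 1:
            return 0                                  # degenerate -> leaf
        sub_vals = [{rows[j][i] for j in key} for i in range(n)]  # E(Theta, f_i)
        best = float("inf")
        if use_attrs:                                 # root tagged by f_i
            for i in range(n):
                if len(sub_vals[i]) < 2:
                    continue                          # f_i not in E(Theta)
                best = min(best, 1 + max(             # Equation (1)
                    h(tuple(j for j in key if rows[j][i] == d))
                    for d in sub_vals[i]))
        if use_hyps:                                  # root tagged by H
            cands = [rows[j] for j in key] if proper else product(*full_vals)
            for hyp in cands:
                if any(len(sub_vals[i]) == 1 and hyp[i] not in sub_vals[i]
                       for i in range(n)):
                    continue                          # H not admissible for Theta
                worst = 0   # the answer "H is true" leaves <= 1 row: depth 0
                for i in range(n):
                    for sigma in full_vals[i]:
                        if sigma != hyp[i]:           # counterexample {f_i = sigma}
                            worst = max(worst, h(tuple(
                                j for j in key if rows[j][i] == sigma)))
                best = min(best, 1 + worst)           # Equation (2)
        return best

    return h(tuple(range(len(rows))))
```

Turning the returned costs into an explicit tree Γ^(k)(T), as Algorithm 2 does, only requires additionally remembering, for each subtable, which query attained the minimum.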

Construction of Decision Trees Containing Minimum Number of Internal Nodes
Let us consider a nonempty decision table T that contains n conditional attributes f_1, . . . , f_n, and let k ∈ {1, . . . , 5}. We can use the DAG ∆(T) to construct a decision tree G^(k)(T) of the type k with the minimum number of internal nodes for the decision table T. To construct the tree G^(k)(T), for each node Θ of the DAG ∆(T), we construct a decision tree G^(k)(Θ) of the type k with the minimum number of internal nodes for the table Θ. It is necessary to consider not only the subtables corresponding to the nodes of ∆(T) but also the empty subtable Λ of T as well as the subtables T_r containing only one row r of T, which are not nodes of ∆(T). The idea is to start with these special subtables as well as the leaf nodes of ∆(T), which are degenerate separable subtables of T. In this way, we move stepwise, in a bottom-up fashion, to the table T.
Let us consider the case when Θ is a leaf node of ∆(T), Θ = T_r for a row r of the table T, or Θ = Λ. If Θ is nonempty, then the decision tree G^(k)(Θ) has only one node, which is tagged with the decision attached to all rows of Θ. Otherwise, this node is tagged with 0.
Let us consider another case, when Θ is an internal node of ∆(T) such that the construction of the decision tree G^(k)(Θ') is already completed for each child Θ' of Θ. Based on these trees, a decision tree containing the minimum number of internal nodes for Θ can be constructed that uses decision trees of the type k for the subtables corresponding to the children of the root. In this tree, the root can be tagged as follows:
• By an attribute from F(T) (such a decision tree is designated G_a^(k)(Θ)).
• By a hypothesis over T (such a decision tree is designated G_h^(k)(Θ)).
• By a proper hypothesis over T (such a decision tree is designated G_p^(k)(Θ)).
We now concentrate on a decision tree G(f_i) for the node Θ, where the root is tagged with an attribute f_i ∈ E(Θ). For each δ ∈ E(T, f_i), there is an edge that exits the root and enters the root of the decision tree G^(k)(Θ{f_i = δ}). We tag this edge with the equation system {f_i = δ}. It is obvious that

L_w(G(f_i)) = 1 + ∑_{δ ∈ E(T, f_i)} L_w(G^(k)(Θ{f_i = δ})). (3)

One can easily prove using (3) that G(f_i) is a decision tree with the minimum number of internal nodes for Θ such that the root of the tree is tagged with the attribute f_i and it uses decision trees of the type k for the subtables corresponding to the children of the root.
Note that we need not consider attributes f_i ∈ F(T) \ E(Θ). The reason is that for such f_i, we can find δ ∈ E(T, f_i) with Θ{f_i = δ} = Θ. Therefore, we cannot construct an optimal tree for Θ based on such f_i.

Construction of the tree G_a^(k)(Θ). We build the set of attributes E(Θ). For each f_i ∈ E(Θ), we construct the decision tree G(f_i) and choose among these trees a tree with the minimum number of internal nodes. We return this tree as G_a^(k)(Θ).
We now describe a decision tree G(H) for Θ whose root is tagged with an admissible hypothesis H = {f_1 = δ_1, . . . , f_n = δ_n} for Θ. For each equation system S ∈ A(H), there is an edge that exits the root of G(H) and enters the root of the tree G^(k)(ΘS). This edge is tagged with the equation system S. It is obvious that

L_w(G(H)) = 1 + ∑_{S ∈ A(H)} L_w(G^(k)(ΘS)). (4)

One can easily prove using (4) that G(H) is a decision tree with the minimum number of internal nodes for Θ such that the root of the tree is tagged with the hypothesis H and it uses decision trees of the type k for the subtables corresponding to the children of the root.
We need not consider hypotheses H that are not admissible for Θ. The reason is that for such H, we can find an equation system S ∈ A(H) with ΘS = Θ. Therefore, we cannot construct an optimal decision tree for Θ based on such H.

Construction of the tree G_h^(k)(Θ). For each admissible hypothesis H for Θ, we construct the decision tree G(H). We choose among these trees a tree with the minimum number of internal nodes and return it as G_h^(k)(Θ). Using (4), one can prove the correctness of this procedure.

Construction of the tree G_p^(k)(Θ). For each row r = (δ_1, . . . , δ_n) of Θ, we consider the proper hypothesis H_r = {f_1 = δ_1, . . . , f_n = δ_n}. We inspect whether H_r is admissible for Θ. If yes, then we construct the decision tree G(H_r). We choose among the constructed trees a tree with the minimum number of internal nodes and return it as G_p^(k)(Θ). Using (4), one can prove the correctness of this procedure.

Given as input a decision table T and k ∈ {1, . . . , 5}, the following Algorithm 3 C_{L_w} builds, for each node Θ of the DAG ∆(T), a decision tree G^(k)(Θ) of the type k for the table Θ having the minimum number of internal nodes.

Algorithm 3 C_{L_w} (construction of the tree G^(k)(T)). Input: T (a nonempty decision table), ∆(T) (the directed acyclic graph for T), and k (a natural number between 1 and 5). Output: a decision tree G^(k)(T).

1. Check whether there is a decision tree attached to each node of the DAG ∆(T). If yes, then return the tree attached to the node T as G^(k)(T) and halt the algorithm. If not, select a node Θ of the graph ∆(T) that does not have an attached tree; it is either a leaf node of ∆(T) or an internal node of ∆(T) all of whose children have attached trees.
2. If Θ is a leaf node, then attach to it the decision tree G^(k)(Θ) that has only a single node. This node is tagged with the decision attached to all rows of Θ. Move to step 1.
3. If Θ is not a leaf node, then do the following according to the value of k:
• If k = 1, construct the tree G_a^(1)(Θ) and attach it to Θ as the tree G^(1)(Θ).
• If k = 2, construct the tree G_h^(2)(Θ) and attach it to Θ as the tree G^(2)(Θ).
• If k = 3, construct the trees G_a^(3)(Θ) and G_h^(3)(Θ), choose among them a tree with the minimum number of internal nodes, and attach it to Θ as the tree G^(3)(Θ).
• If k = 4, construct the tree G_p^(4)(Θ) and attach it to Θ as the tree G^(4)(Θ).
• If k = 5, construct the trees G_a^(5)(Θ) and G_p^(5)(Θ), choose among them a tree with the minimum number of internal nodes, and attach it to Θ as the tree G^(5)(Θ).
Move to step 1.
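For completeness, here is a parallel sketch of the cost recursion for L_w: compared with min_depth above, the maximum over the children is replaced by a sum, exactly as Equations (3) and (4) prescribe (again our illustrative code, under the same assumptions):

```python
from functools import lru_cache
from itertools import product

def min_internal_nodes(rows, decisions, use_attrs=True, use_hyps=False,
                       proper=False):
    """Minimum number of internal nodes via Equations (3) and (4)."""
    n = len(rows[0])
    full_vals = [sorted({r[i] for r in rows}) for i in range(n)]  # E(T, f_i)

    @lru_cache(maxsize=None)
    def lw(key):
        if len({decisions[j] for j in key}) <= 1:
            return 0                                  # leaf: no internal nodes
        sub_vals = [{rows[j][i] for j in key} for i in range(n)]
        best = float("inf")
        if use_attrs:
            for i in range(n):
                if len(sub_vals[i]) < 2:
                    continue
                # Equation (3); children for values outside E(Theta, f_i)
                # are empty subtables and contribute 0, so the sum over
                # E(Theta, f_i) suffices.
                best = min(best, 1 + sum(
                    lw(tuple(j for j in key if rows[j][i] == d))
                    for d in sub_vals[i]))
        if use_hyps:
            cands = [rows[j] for j in key] if proper else product(*full_vals)
            for hyp in cands:
                if any(len(sub_vals[i]) == 1 and hyp[i] not in sub_vals[i]
                       for i in range(n)):
                    continue                          # H not admissible
                # Equation (4); the child for the answer "H is true" is
                # degenerate and contributes 0 internal nodes.
                best = min(best, 1 + sum(
                    lw(tuple(j for j in key if rows[j][i] == sigma))
                    for i in range(n)
                    for sigma in full_vals[i] if sigma != hyp[i]))
        return best

    return lw(tuple(range(len(rows))))
```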

Experimental Results and Discussion
In this section, we describe the results of the experiments. First, we conducted experiments on eight decision tables from the UCI ML Repository [16]. We describe these tables in Table 1, where we show the name of the table (Name), the number of rows (#Rows), and the number of attributes (#Attrs). The decision tables in Table 1 are arranged by the number of rows. For each of these tables, we built an optimal decision tree of each of the five possible types for each of the two possible cost functions. From these trees, we derived decision rules and studied their coverage and length. Next, we experimented with 100 randomly generated Boolean functions with n variables (n = 3, . . . , 6). Let f be such a Boolean function with n variables x_1, . . . , x_n. We can map it to a decision table T_f with n attributes x_1, . . . , x_n. This table has 2^n rows corresponding to all possible n-tuples of variable values. We label each row with the decision that is the value of the function f on the considered row. The decision trees for the table T_f are interpreted as decision trees that compute the function f.
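A sketch of this mapping (illustrative; min_depth is the function from the sketch in Section 5):

```python
import random
from itertools import product

def boolean_function_table(n, f):
    """Build the decision table T_f: 2**n pairwise different rows, each
    labeled with the value of the Boolean function f on that row."""
    rows = list(product((0, 1), repeat=n))
    decisions = [f(bits) for bits in rows]
    return rows, decisions

# A randomly generated Boolean function with n = 4 variables:
n = 4
truth = {bits: random.randint(0, 1) for bits in product((0, 1), repeat=n)}
rows, decs = boolean_function_table(n, truth.__getitem__)
print(min_depth(rows, decs, use_attrs=True, use_hyps=True))  # h for type 3
```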
For each of the tables representing the generated Boolean functions, we built an optimal decision tree of each of the five possible types for each of the two possible cost functions. From these trees, we derived decision rules and studied their coverage and length.

Decision Trees with Minimum Depth
The results of experiments based on eight decision tables from [16] and decision trees optimal relative to the depth are represented in Tables 2 and 3. The first column of Table 2 contains the name of the considered decision table T. The last five columns contain the values l_h^(1)(T), . . . , l_h^(5)(T) (minimum values for each decision table are in bold).

Table 2. Results for decision tables from [16]: length of decision rules derived from decision trees with minimum depth.

The first column of Table 3 contains the name of the considered decision table T. The last five columns contain the values c_h^(1)(T), . . . , c_h^(5)(T) (maximum values for each decision table are in bold).

Table 3. Results for decision tables from [16]: coverage of decision rules derived from decision trees with minimum depth.

The results of experiments based on Boolean functions and decision trees optimal relative to the depth are represented in Tables 4 and 5. The first column of Table 4 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values l_h^(1), . . . , l_h^(5) in the format min / Avg / max (minimum values of Avg for each n are in bold).

Table 4. Results for Boolean functions: length of decision rules derived from decision trees with minimum depth.

The first column of Table 5 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values c_h^(1), . . . , c_h^(5) in the format min / Avg / max (maximum values of Avg for each n are in bold).

Table 5. Results for Boolean functions: coverage of decision rules derived from decision trees with minimum depth.

Decision Trees Containing Minimum Number of Internal Nodes
We present the results based on the decision tables from [16] and decision trees optimal relative to the number of internal nodes in Tables 6 and 7. The first column of Table 6 contains the name of the considered decision table T. The last five columns contain the values l_{L_w}^(1)(T), . . . , l_{L_w}^(5)(T) (minimum values for each decision table are in bold).

Table 6. Results for decision tables from [16]: length of decision rules derived from decision trees with minimum number of internal nodes.

The first column of Table 7 contains the name of the considered decision table T. The last five columns contain the values c_{L_w}^(1)(T), . . . , c_{L_w}^(5)(T) (maximum values for each decision table are in bold).

Table 7. Results for decision tables from [16]: coverage of decision rules derived from decision trees with minimum number of internal nodes.

The results of experiments based on Boolean functions and decision trees optimal relative to the number of internal nodes are represented in Tables 8 and 9. The first column of Table 8 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values l_{L_w}^(1), . . . , l_{L_w}^(5) in the format min / Avg / max (minimum values of Avg for each n are in bold).

Table 8. Results for Boolean functions: length of decision rules derived from decision trees with minimum number of internal nodes.

The first column of Table 9 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values c_{L_w}^(1), . . . , c_{L_w}^(5) in the format min / Avg / max (maximum values of Avg for each n are in bold).

Table 9. Results for Boolean functions: coverage of decision rules derived from decision trees with minimum number of internal nodes.

Analysis of Experimental Results
The experimental results show that the decision rules derived from decision trees with hypotheses are in many cases better than the ones derived from conventional decision trees. In particular, in the case of decision trees with the minimum depth, for each row in Tables 2-5, the results for type 2 decision trees are better than the results for type 1 decision trees. In the case of decision trees with the minimum number of internal nodes, for each row of Tables 6-9 (with the exception of the row ZOO-DATA in Table 7), there is a number k ∈ {2, . . . , 5} such that the results for type k decision trees are better than the results for type 1 decision trees.
Note that for the decision trees with the minimum depth, for each decision table from [16] considered in this paper, the best results related to the length and the coverage among decision trees of types {2, . . . , 5} are close to the optimal ones obtained in [18] with the help of dynamic programming algorithms for the construction of optimal decision rules. Results for the decision trees of the type 1 are, generally, far from the optimal ones.
From the obtained experimental results, it follows that the decision rules derived from optimal decision trees with hypotheses are preferable, as a tool for the representation of information, to the decision rules derived from optimal conventional decision trees.

Conclusions
In this paper, we studied modified decision trees that use two types of queries. We constructed optimal trees relative to two cost functions for a number of known datasets from the UCI Machine Learning Repository and for randomly generated Boolean functions, and we compared the length and coverage of decision rules extracted from the constructed decision trees. The experimental results confirmed that the decision rules derived from decision trees with hypotheses are in many cases better than the ones derived from conventional decision trees.

Data Availability Statement: Data are available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analyzed in this study. These data can be found here: http://archive.ics.uci.edu/ml (accessed on 12 April 2017).