1. Introduction
Decision trees are used in many areas of computer science as a means for knowledge representation, as classifiers, and as algorithms to solve different problems of combinatorial optimization, computational geometry, etc. [1,2,3]. They are studied, in particular, in test theory initiated by Chegis and Yablonskii [4], rough set theory initiated by Pawlak [5,6,7], and exact learning initiated by Angluin [8,9]. These theories are closely related: attributes from rough set theory and test theory correspond to membership queries from exact learning. Exact learning additionally studies the so-called equivalence queries. The notion of a "minimally adequate teacher" that allows both membership and equivalence queries was discussed by Angluin in Reference [10]. Relations between exact learning and PAC learning proposed by Valiant [11] are discussed in Reference [8].
In this paper, which is an extension of two conference papers [12,13], we add the notion of a hypothesis to the model that has been considered in rough set theory, as well as in test theory. This model allows us to use an analog of equivalence queries. Our goal is to check whether it is possible to reduce the time and space complexity of decision trees if we additionally use hypotheses. Decision trees with lower complexity are more understandable and more suitable as a means for knowledge representation. Note that, to improve the understandability, we should try to minimize not only the number of nodes in a decision tree but also its depth, which is the unimprovable upper bound on the number of conditions describing objects accepted by a path from the root to a terminal node of the tree. In this paper, we concentrate only on the complexity of decision trees and do not study many recent problems considered in machine learning [14,15,16,17].
Let T be a decision table with n conditional attributes ${f}_{1},\dots ,{f}_{n}$ having values from the set $\omega =\{0,1,2,\dots \}$ in which rows are pairwise different, and each row is labeled with a decision from $\omega $. For a given row of T, we should recognize the decision attached to this row. To this end, we can use decision trees based on two types of queries. We can ask about the value of an attribute ${f}_{i}\in \{{f}_{1},\dots ,{f}_{n}\}$ on the given row. We will obtain an answer of the kind ${f}_{i}=\delta $, where $\delta $ is the number in the intersection of the given row and the column ${f}_{i}$. We can also ask if a hypothesis ${f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}$ is true, where ${\delta}_{1},\dots ,{\delta}_{n}$ are numbers from the columns ${f}_{1},\dots ,{f}_{n}$, respectively. Either this hypothesis will be confirmed or we obtain a counterexample in the form ${f}_{i}=\sigma $, where ${f}_{i}\in \{{f}_{1},\dots ,{f}_{n}\}$, and $\sigma $ is a number from the column ${f}_{i}$ different from ${\delta}_{i}$. The considered hypothesis is called proper if $({\delta}_{1},\dots ,{\delta}_{n})$ is a row of the table T.
In this paper, we study four cost functions that characterize the complexity of decision trees: the depth, the number of realizable nodes relative to T, the number of realizable terminal nodes relative to T, and the number of working nodes. We consider the depth of a decision tree as its time complexity, which is equal to the maximum number of queries in a path from the root to a terminal node of the tree. The remaining three cost functions characterize the space complexity of decision trees. A node is called realizable relative to T if, for a row of T and some choice of counterexamples, the computation in the tree will pass through this node. Note that, in the considered trees, all working nodes are realizable.
Decision trees using hypotheses can be essentially more efficient than the decision trees using only attributes. Let us consider an example, the problem of computation of the conjunction ${x}_{1}\wedge \cdots \wedge {x}_{n}$. The minimum depth of a decision tree solving this problem using the attributes ${x}_{1},\dots ,{x}_{n}$ is equal to n. The minimum number of realizable nodes in such decision trees is equal to $2n+1$, the minimum number of working nodes is equal to n, and the minimum number of realizable terminal nodes is equal to $n+1$. However, the minimum depth of a decision tree solving this problem using proper hypotheses is equal to 1: it is enough to ask only about the hypothesis ${x}_{1}=1,\dots ,{x}_{n}=1$. If it is true, then the considered conjunction is equal to 1. Otherwise, it is equal to 0. The obtained decision tree contains one working node and $n+1$ realizable terminal nodes, altogether $n+2$ realizable nodes.
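This contrast can be checked directly. The sketch below (our illustration with hypothetical helper names, not code from the paper) simulates both kinds of trees on the conjunction: querying attributes one by one versus asking the single proper hypothesis ${x}_{1}=1,\dots ,{x}_{n}=1$.

```python
def solve_with_attributes(row):
    """Ordinary decision tree for x1 AND ... AND xn: query attributes one by one."""
    queries = 0
    for value in row:          # each iteration is one attribute query
        queries += 1
        if value == 0:
            return 0, queries  # conjunction is 0 as soon as a 0 is seen
    return 1, queries          # all n queries were needed

def solve_with_hypothesis(row):
    """Single query: ask whether the proper hypothesis x1=1,...,xn=1 holds.
    The oracle either confirms it or returns a counterexample xi=0."""
    queries = 1
    if all(v == 1 for v in row):
        return 1, queries      # hypothesis confirmed: conjunction is 1
    return 0, queries          # any counterexample xi=0 forces the value 0

row = (1, 1, 0, 1)
print(solve_with_attributes(row))   # → (0, 3): stopped at the first 0
print(solve_with_hypothesis(row))   # → (0, 1): one hypothesis query suffices
```

On all-ones rows the attribute-based tree needs all n queries, while the hypothesis-based one still needs only one, matching the depths n and 1 above.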
We study the following five types of decision trees:
1. Decision trees that use only attributes.
2. Decision trees that use only hypotheses.
3. Decision trees that use both attributes and hypotheses.
4. Decision trees that use only proper hypotheses.
5. Decision trees that use both attributes and proper hypotheses.
For each cost function, we propose a dynamic programming algorithm that, for a given decision table and a given type of decision trees, finds the minimum cost of a decision tree of the considered type for this table. Note that dynamic programming algorithms for the optimization of decision trees of the type 1 were studied in Reference [18] for decision tables with one-valued decisions and in Reference [19] for decision tables with many-valued decisions. Dynamic programming algorithms for the optimization of decision trees of all five types were studied in References [12,13] for the depth and for the number of realizable nodes.
It is interesting to consider not only specially chosen examples such as the conjunction of n variables. For each cost function, we compute the minimum cost of a decision tree of each of the considered five types for eight decision tables from the UCI ML Repository [20]. We do the same for randomly generated Boolean functions with n variables, where $n=3,\dots ,6$.
From the obtained experimental results, it follows that, generally, the decision trees of the types 3 and 5 have lower complexity than the decision trees of the type 1. Therefore, such decision trees can be useful as a means for knowledge representation. Decision trees of the types 2 and 4 generally have too many nodes.
Based on the experimental results, we formulate and prove the following hypothesis: for any decision table, we can construct a decision tree with the minimum number of realizable terminal nodes using only attributes.
The motivation for this work is related to the use of decision trees to represent knowledge: we try to reduce the complexity of decision trees (and improve their understandability) by using hypotheses. The main achievements of this work are the following: (i) we have proposed dynamic programming algorithms for optimizing five types of decision trees relative to four cost functions, and (ii) we have shown cases when the use of hypotheses leads to a decrease in the complexity of decision trees.
2. Decision Tables
A decision table is a table T with $n\ge 1$ columns filled with numbers from the set $\omega =\{0,1,2,\dots \}$. Columns of this table are labeled with conditional attributes ${f}_{1},\dots ,{f}_{n}$. Rows of the table are pairwise different. Each row is labeled with a number from $\omega $ that is interpreted as a decision. Rows of the table are interpreted as tuples of values of the conditional attributes.
Each decision table can be represented by a word (sequence) over an alphabet consisting of the symbols 0 and 1, the symbol ";", and a row separator: numbers from $\omega $ are written in binary representation, the symbol ";" separates two numbers from $\omega $, and the row separator separates two rows (for each row, we add the corresponding decision as the last number in the row). The length of this word is called the size of the decision table.
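As an illustration, the size of a table in this encoding can be computed directly; the sketch below is ours (hypothetical names), counting one ";" between consecutive numbers in a row and one row separator between consecutive rows.

```python
def binary_length(x):
    """Number of binary digits of x (numbers from omega, so 0 -> '0')."""
    return len(bin(x)) - 2  # strip the '0b' prefix

def table_size(rows, decisions):
    """Size of a decision table: length of its word encoding, where numbers
    are written in binary, ';' separates two numbers in a row, and a row
    separator stands between two consecutive rows (the decision of each
    row is appended as its last number)."""
    size = 0
    for index, r in enumerate(rows):
        numbers = list(r) + [decisions[r]]        # decision appended last
        size += sum(binary_length(x) for x in numbers)
        size += len(numbers) - 1                  # ';' between numbers
        if index > 0:
            size += 1                             # row separator
    return size

# Two rows over one attribute: rows (0) and (1) with decisions 1 and 0.
rows = ((0,), (1,))
decisions = {(0,): 1, (1,): 0}
print(table_size(rows, decisions))  # → 7: the word "0;1" + separator + "1;0"
```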
A decision table T is called empty if it has no rows. The table T is called degenerate if it is empty or all rows of T are labeled with the same decision.
We denote $F\left(T\right)=\{{f}_{1},\dots ,{f}_{n}\}$ and denote by $D\left(T\right)$ the set of decisions attached to the rows of T. For any conditional attribute ${f}_{i}\in F\left(T\right)$, we denote by $E(T,{f}_{i})$ the set of values of the attribute ${f}_{i}$ in the table T. We denote by $E\left(T\right)$ the set of conditional attributes of T for which $|E(T,{f}_{i})|\ge 2$.
A system of equations over T is an arbitrary equation system of the kind
$$\{{f}_{{i}_{1}}={\delta}_{1},\dots ,{f}_{{i}_{m}}={\delta}_{m}\},$$
where $m\in \omega $, ${f}_{{i}_{1}},\dots ,{f}_{{i}_{m}}\in F\left(T\right)$, and ${\delta}_{1}\in E(T,{f}_{{i}_{1}}),\dots ,{\delta}_{m}\in E(T,{f}_{{i}_{m}})$ (if $m=0$, then the considered equation system is empty).
Let T be a nonempty table. A subtable of T is a table obtained from T by the removal of some rows. To each equation system S over T, we correspond a subtable $TS$ of the table T. If the system S is empty, then $TS=T$. Let S be nonempty and $S=\{{f}_{{i}_{1}}={\delta}_{1},\dots ,{f}_{{i}_{m}}={\delta}_{m}\}$. Then, $TS$ is the subtable of the table T containing the rows from T that, in the intersection with the columns ${f}_{{i}_{1}},\dots ,{f}_{{i}_{m}}$, have the numbers ${\delta}_{1},\dots ,{\delta}_{m}$, respectively. Such nonempty subtables, including the table T, are called separable subtables of T. We denote by $SEP\left(T\right)$ the set of separable subtables of the table T.
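These definitions translate directly into code. The following sketch (our naming; a table is a tuple of attribute-value rows) computes the subtable $TS$ and collects $SEP\left(T\right)$ by repeatedly refining with single equations, which suffices because applying the equations of a system one at a time passes only through supersets of the final nonempty subtable.

```python
def apply_system(rows, system):
    """Subtable T S: keep the rows that satisfy every equation f_i = delta
    in the system (attributes are addressed by column index i)."""
    return tuple(r for r in rows
                 if all(r[i] == delta for i, delta in system))

def separable_subtables(rows):
    """SEP(T): all nonempty subtables reachable via equation systems,
    collected by a worklist search over single-equation refinements."""
    n = len(rows[0])
    seen, stack = {rows}, [rows]
    while stack:
        sub = stack.pop()
        for i in range(n):
            for delta in {r[i] for r in sub}:
                child = apply_system(sub, [(i, delta)])
                if child and child not in seen:
                    seen.add(child)
                    stack.append(child)
    return seen

# Rows of a table with two Boolean attributes f1, f2 (decisions kept aside).
table = ((0, 0), (0, 1), (1, 0), (1, 1))
print(len(separable_subtables(table)))  # → 9: T, four half-tables, four rows
```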
3. Decision Trees
Let T be a nonempty decision table with n conditional attributes ${f}_{1},\dots ,{f}_{n}$. We consider the decision trees with two types of queries. We can choose an attribute ${f}_{i}\in F\left(T\right)=\{{f}_{1},\dots ,{f}_{n}\}$ and ask about its value. This query has the set of answers $A\left({f}_{i}\right)=\{\{{f}_{i}=\delta \}:\delta \in E(T,{f}_{i})\}$. We can formulate a hypothesis over T in the form of $H=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$, where ${\delta}_{1}\in E(T,{f}_{1}),\dots ,{\delta}_{n}\in E(T,{f}_{n})$, and ask about this hypothesis. This query has the set of answers $A\left(H\right)=\{H,\{{f}_{1}={\sigma}_{1}\},\dots ,\{{f}_{n}={\sigma}_{n}\}:{\sigma}_{1}\in E(T,{f}_{1})\setminus \left\{{\delta}_{1}\right\},\cdots ,{\sigma}_{n}\in E(T,{f}_{n})\setminus \left\{{\delta}_{n}\right\}\}$. The answer H means that the hypothesis is true. Other answers are counterexamples. The hypothesis H is called proper for T if $({\delta}_{1},\dots ,{\delta}_{n})$ is a row of the table T.
A decision tree over T is a marked finite directed tree with the root in which:
Each terminal node is labeled with a number from the set $D\left(T\right)\cup \left\{0\right\}$.
Each node, which is not terminal (such nodes are called working), is labeled with an attribute from the set $F\left(T\right)$ or with a hypothesis over T.
If a working node is labeled with an attribute ${f}_{i}$ from $F\left(T\right)$, then, for each answer from the set $A\left({f}_{i}\right)$, there is exactly one edge leaving this node that is labeled with this answer, and there are no other edges leaving this node.
If a working node is labeled with a hypothesis $H=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$ over T, then, for each answer from the set $A\left(H\right)$, there is exactly one edge leaving this node that is labeled with this answer, and there are no other edges leaving this node.
Let $\Gamma $ be a decision tree over T and v be a node of $\Gamma $. We now define an equation system $S(\Gamma ,v)$ over T associated with the node v. We denote by $\xi $ the directed path from the root of $\Gamma $ to the node v. If there are no working nodes in $\xi $, then $S(\Gamma ,v)$ is the empty system. Otherwise, $S(\Gamma ,v)$ is the union of equation systems attached to the edges of the path $\xi $.
A decision tree $\Gamma $ over T is called a decision tree for T if, for any node v of $\Gamma $,
The node v is terminal if and only if the subtable $TS(\Gamma ,v)$ is degenerate.
If v is a terminal node and the subtable $TS(\Gamma ,v)$ is empty, then the node v is labeled with the decision 0.
If v is a terminal node and the subtable $TS(\Gamma ,v)$ is nonempty, then the node v is labeled with the decision attached to all rows of $TS(\Gamma ,v)$.
A complete path in $\Gamma $ is an arbitrary directed path from the root to a terminal node in $\Gamma $. As the time complexity of a decision tree, we consider its depth, which is the maximum number of working nodes in a complete path in the tree or, which is the same, the maximum length of a complete path in the tree. We denote by $h(\Gamma )$ the depth of a decision tree $\Gamma $.
As the space complexity of the decision tree $\Gamma $, we consider the number of its nodes that are realizable relative to T. A node v of $\Gamma $ is called realizable relative to T if and only if the subtable $TS(\Gamma ,v)$ is nonempty. We denote by $L(T,\Gamma )$ the number of nodes in $\Gamma $ that are realizable relative to T. We also consider two more cost functions related to the space complexity: ${L}_{t}(T,\Gamma )$, the number of terminal nodes in $\Gamma $ that are realizable relative to T, and ${L}_{w}(T,\Gamma )$, the number of working nodes in $\Gamma $. Note that all working nodes of $\Gamma $ are realizable relative to T.
We will use the following notation:
For $k=1,\dots ,5$, ${h}^{\left(k\right)}\left(T\right)$ is the minimum depth of a decision tree of the type k for T.
For $k=1,\dots ,5$, ${L}^{\left(k\right)}\left(T\right)$ is the minimum number of nodes realizable relative to T in a decision tree of the type k for T.
For $k=1,\dots ,5$, ${L}_{t}^{\left(k\right)}\left(T\right)$ is the minimum number of terminal nodes realizable relative to T in a decision tree of the type k for T.
For $k=1,\dots ,5$, ${L}_{w}^{\left(k\right)}\left(T\right)$ is the minimum number of working nodes in a decision tree of the type k for T.
4. Construction of Directed Acyclic Graph $\Delta \left(T\right)$
Let T be a nonempty decision table with n conditional attributes ${f}_{1},\dots ,{f}_{n}$. We now describe an algorithm ${\mathcal{A}}_{DAG}$ for the construction of a directed acyclic graph (DAG) $\Delta \left(T\right)$ that will be used for the study of decision trees. The nodes of this graph are separable subtables of the table T. During each iteration, we process one node. We start with the graph that consists of one node T, which is not processed, and finish when all nodes of the graph are processed. This algorithm can be considered as a special case of the algorithm for DAG construction considered in Reference [18].
Algorithm ${\mathcal{A}}_{DAG}$ (construction of the DAG $\Delta \left(T\right)$). 
Input: A nonempty decision table T with n conditional attributes ${f}_{1},\dots ,{f}_{n}$. 
Output: Directed acyclic graph $\Delta \left(T\right)$. 
1. Construct the graph that consists of one node T, which is not marked as processed.
2. If all nodes of the graph are processed, then the algorithm halts and returns the resulting graph as $\Delta \left(T\right)$. Otherwise, choose a node (table) $\Theta $ that has not been processed yet.
3. If $\Theta $ is degenerate, then mark the node $\Theta $ as processed and proceed to step 2. If $\Theta $ is not degenerate, then, for each ${f}_{i}\in E(\Theta )$, draw a bundle of edges from the node $\Theta $. Let $E(\Theta ,{f}_{i})=\{{a}_{1},\dots ,{a}_{k}\}$. Then, draw k edges from $\Theta $ and label these edges with the systems of equations $\{{f}_{i}={a}_{1}\},\dots ,\{{f}_{i}={a}_{k}\}$. These edges enter the nodes $\Theta \{{f}_{i}={a}_{1}\},\dots ,\Theta \{{f}_{i}={a}_{k}\}$, respectively. If some of the nodes $\Theta \{{f}_{i}={a}_{1}\},\dots ,\Theta \{{f}_{i}={a}_{k}\}$ are not present in the graph, then add these nodes to the graph. Mark the node $\Theta $ as processed and return to step 2.
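A direct transcription of ${\mathcal{A}}_{DAG}$ might look as follows (our sketch: a subtable is a tuple of rows of attribute values, decisions are kept in a separate map, and all names are ours, not the paper's):

```python
def is_degenerate(rows, decisions):
    """A subtable is degenerate if it is empty or all rows share one decision."""
    return len({decisions[r] for r in rows}) <= 1

def build_dag(table, decisions):
    """Algorithm A_DAG: nodes are separable subtables; a nondegenerate node
    Theta gets, for each attribute f_i in E(Theta), a bundle of edges
    labeled {f_i = a_1}, ..., {f_i = a_k}."""
    n = len(table[0])
    dag = {table: []}        # node -> list of (attribute index, value, child)
    unprocessed = [table]
    while unprocessed:
        theta = unprocessed.pop()
        if is_degenerate(theta, decisions):
            continue         # degenerate nodes are terminal in the DAG
        for i in range(n):
            values = sorted({r[i] for r in theta})
            if len(values) < 2:
                continue     # only attributes from E(Theta) give bundles
            for a in values:
                child = tuple(r for r in theta if r[i] == a)
                dag[theta].append((i, a, child))
                if child not in dag:
                    dag[child] = []
                    unprocessed.append(child)
    return dag

# Conjunction of two variables: the DAG has 8 nodes; the singleton (0, 0)
# never appears, since both of its potential parents are already degenerate.
table = ((0, 0), (0, 1), (1, 0), (1, 1))
decisions = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
print(len(build_dag(table, decisions)))  # → 8
```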

The following statement about the time complexity of the algorithm ${\mathcal{A}}_{DAG}$ follows immediately from Proposition 3.3 of [18].
Proposition 1. The time complexity of the Algorithm ${\mathcal{A}}_{DAG}$ is bounded from above by a polynomial on the size of the input table T and the number $|SEP\left(T\right)|$ of different separable subtables of T.
Generally, the time complexity of the algorithm ${\mathcal{A}}_{DAG}$ is exponential in the size of the input decision table. Note that, in Section 3.4 of the book [18], classes of decision tables are described for each of which the number of separable subtables of decision tables from the class is bounded from above by a polynomial on the number of columns in the tables. For each of these classes, the time complexity of the algorithm ${\mathcal{A}}_{DAG}$ is polynomial in the size of the input decision tables.
Note that similar results can be obtained for the space complexity of the considered algorithm.
5. Minimizing the Depth
In this section, we consider some results obtained in Reference [12]. Let T be a nonempty decision table with n conditional attributes ${f}_{1},\dots ,{f}_{n}$. We can use the DAG $\Delta \left(T\right)$ to compute the values ${h}^{\left(1\right)}\left(T\right),\dots ,{h}^{\left(5\right)}\left(T\right)$. Let $k\in \{1,\dots ,5\}$. To find the value ${h}^{\left(k\right)}\left(T\right)$, for each node $\Theta $ of the DAG $\Delta \left(T\right)$, we compute the value ${h}^{\left(k\right)}(\Theta )$. It will be convenient for us to consider not only subtables that are nodes of $\Delta \left(T\right)$ but also the empty subtable $\Lambda $ of T and the subtables ${T}_{r}$ that contain only one row r of T and are not nodes of $\Delta \left(T\right)$. We begin with these special subtables and the terminal nodes of $\Delta \left(T\right)$ (nodes without leaving edges), which are degenerate separable subtables of T, and step by step move to the table T.
Let $\Theta $ be a terminal node of $\Delta \left(T\right)$ or $\Theta ={T}_{r}$ for some row r of T. Then, ${h}^{\left(k\right)}(\Theta )=0$: the decision tree that contains only one node labeled with the decision attached to all rows of $\Theta $ is a decision tree for $\Theta $. If $\Theta =\Lambda $, then ${h}^{\left(k\right)}(\Theta )=0$: the decision tree that contains only one node labeled with 0 will be considered as a decision tree for $\Lambda $.
Let $\Theta $ be a nonterminal node of $\Delta \left(T\right)$ such that, for each child ${\Theta}^{\prime}$ of $\Theta $, we already know the value ${h}^{\left(k\right)}\left({\Theta}^{\prime}\right)$. Based on this information, we can find the minimum depth of a decision tree for $\Theta $ that uses, for the subtables corresponding to the children of the root, decision trees of the type k and in which the root is labeled:
With an attribute from $F\left(T\right)$ (we denote by ${h}_{a}^{\left(k\right)}(\Theta )$ the minimum depth of such a decision tree).
With a hypothesis over T (we denote by ${h}_{h}^{\left(k\right)}(\Theta )$ the minimum depth of such a decision tree).
With a proper hypothesis over T (we denote by ${h}_{p}^{\left(k\right)}(\Theta )$ the minimum depth of such a decision tree).
Since $\Theta $ is nondegenerate, the set $E(\Theta )$ is nonempty. We now describe three procedures for computing the values ${h}_{a}^{\left(k\right)}(\Theta )$, ${h}_{h}^{\left(k\right)}(\Theta )$, and ${h}_{p}^{\left(k\right)}(\Theta )$, respectively.
Let us consider a decision tree $\Gamma \left({f}_{i}\right)$ for $\Theta $ in which the root is labeled with an attribute ${f}_{i}\in E(\Theta )$. For each $\delta \in E(T,{f}_{i})$, there is an edge that leaves the root and enters a node $v\left(\delta \right)$. This edge is labeled with the equation system $\{{f}_{i}=\delta \}$. The node $v\left(\delta \right)$ is the root of a decision tree of the type k for $\Theta \{{f}_{i}=\delta \}$ whose depth is equal to ${h}^{\left(k\right)}(\Theta \{{f}_{i}=\delta \})$. It is clear that
$$h(\Gamma \left({f}_{i}\right))=1+max\{{h}^{\left(k\right)}(\Theta \{{f}_{i}=\delta \}):\delta \in E(T,{f}_{i})\}.$$
Since ${h}^{\left(k\right)}(\Theta \{{f}_{i}=\delta \})={h}^{\left(k\right)}(\Lambda )=0$ for any $\delta \in E(T,{f}_{i})\setminus E(\Theta ,{f}_{i})$,
$$h(\Gamma \left({f}_{i}\right))=1+max\{{h}^{\left(k\right)}(\Theta \{{f}_{i}=\delta \}):\delta \in E(\Theta ,{f}_{i})\}.\qquad (1)$$
Evidently, for any $\delta \in E(\Theta ,{f}_{i})$, the subtable $\Theta \{{f}_{i}=\delta \}$ is a child of $\Theta $ in the DAG $\Delta \left(T\right)$, i.e., we know the value ${h}^{\left(k\right)}(\Theta \{{f}_{i}=\delta \})$. One can show that $h(\Gamma \left({f}_{i}\right))$ is the minimum depth of a decision tree for $\Theta $ in which the root is labeled with the attribute ${f}_{i}$ and which uses, for the subtables corresponding to the children of the root, decision trees of the type k.
We should not consider attributes ${f}_{i}\in F\left(T\right)\setminus E(\Theta )$ since, for each such attribute, there is $\delta \in E(T,{f}_{i})$ with $\Theta \{{f}_{i}=\delta \}=\Theta $, i.e., based on this attribute, we cannot construct an optimal decision tree for $\Theta $. As a result, we have
$${h}_{a}^{\left(k\right)}(\Theta )=min\{h(\Gamma \left({f}_{i}\right)):{f}_{i}\in E(\Theta )\}.\qquad (2)$$
Computation of ${h}_{a}^{\left(k\right)}(\Theta )$. Construct the set of attributes $E(\Theta )$. For each attribute ${f}_{i}\in E(\Theta )$, compute the value $h(\Gamma \left({f}_{i}\right))$ using (1). Compute the value ${h}_{a}^{\left(k\right)}(\Theta )$ using (2).
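For type 1 trees (attributes only), the recursion given by (1) and (2) together with the base cases above can be sketched as a memoized function; this is our illustration, not the paper's implementation, and it recomputes subtables on the fly instead of walking a prebuilt DAG.

```python
from functools import lru_cache

def min_depth_type1(rows, decisions):
    """h^(1)(T): minimum depth over attribute-only decision trees, computed
    by h(Gamma(f_i)) = 1 + max over children (formula (1)) and
    h_a = min over f_i in E(Theta) (formula (2))."""
    n = len(rows[0])

    @lru_cache(maxsize=None)
    def h(theta):
        if len({decisions[r] for r in theta}) <= 1:
            return 0                      # degenerate subtable: one terminal node
        best = None
        for i in range(n):
            values = {r[i] for r in theta}
            if len(values) < 2:
                continue                  # f_i not in E(Theta)
            depth = 1 + max(h(tuple(r for r in theta if r[i] == a))
                            for a in values)
            best = depth if best is None else min(best, depth)
        return best

    return h(rows)

# Conjunction of three variables: the minimum depth is n = 3.
rows = tuple((a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
decisions = {r: int(all(r)) for r in rows}
print(min_depth_type1(rows, decisions))  # → 3
```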
Remark 2. Let Θ be a nonterminal node of the DAG $\Delta \left(T\right)$ such that, for each child ${\Theta}^{\prime}$ of Θ, we already know the value ${h}^{\left(k\right)}\left({\Theta}^{\prime}\right)$. Then, the procedure of computation of the value ${h}_{a}^{\left(k\right)}(\Theta )$ has time complexity that is polynomial in the size of the decision table T.
A hypothesis $H=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$ over T is called admissible for $\Theta $ and an attribute ${f}_{i}\in F\left(T\right)=\{{f}_{1},\dots ,{f}_{n}\}$ if, for any $\sigma \in E(T,{f}_{i})\setminus \left\{{\delta}_{i}\right\}$, $\Theta \{{f}_{i}=\sigma \}\ne \Theta $. The hypothesis H is not admissible for $\Theta $ and an attribute ${f}_{i}\in F\left(T\right)$ if and only if $|E(\Theta ,{f}_{i})|=1$ and ${\delta}_{i}\notin E(\Theta ,{f}_{i})$. The hypothesis H is called admissible for $\Theta $ if it is admissible for $\Theta $ and any attribute ${f}_{i}\in F\left(T\right)$.
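The admissibility criterion is easy to check attribute by attribute; a sketch with our naming:

```python
def is_admissible(theta, deltas):
    """A hypothesis f_1 = deltas[0], ..., f_n = deltas[n-1] is admissible
    for Theta iff there is no attribute f_i such that E(Theta, f_i) has
    exactly one value and that value differs from delta_i."""
    for i, delta in enumerate(deltas):
        values = {r[i] for r in theta}
        if len(values) == 1 and delta not in values:
            return False
    return True

# Subtable Theta = T{f1 = 1} of the two-variable conjunction table.
theta = ((1, 0), (1, 1))
print(is_admissible(theta, (1, 1)))  # → True
print(is_admissible(theta, (0, 1)))  # → False: E(Theta, f1) = {1}, delta_1 = 0
```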
Let us consider a decision tree $\Gamma \left(H\right)$ for $\Theta $ in which the root is labeled with a hypothesis $H=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$ that is admissible for $\Theta $. The set of answers for the query corresponding to the hypothesis H is equal to $A\left(H\right)=\{H,\{{f}_{1}={\sigma}_{1}\},\dots ,\{{f}_{n}={\sigma}_{n}\}:{\sigma}_{1}\in E(T,{f}_{1})\setminus \left\{{\delta}_{1}\right\},\dots ,{\sigma}_{n}\in E(T,{f}_{n})\setminus \left\{{\delta}_{n}\right\}\}$. For each $S\in A\left(H\right)$, there is an edge that leaves the root of $\Gamma \left(H\right)$ and enters a node $v\left(S\right)$. This edge is labeled with the equation system S. The node $v\left(S\right)$ is the root of a decision tree of the type k for $\Theta S$ whose depth is equal to ${h}^{\left(k\right)}(\Theta S)$. It is clear that
$$h(\Gamma \left(H\right))=1+max\{{h}^{\left(k\right)}(\Theta S):S\in A\left(H\right)\}.$$
We have $\Theta H=\Lambda $ or $\Theta H={T}_{r}$ for some row r of T. Therefore, ${h}^{\left(k\right)}(\Theta H)=0$. Since H is admissible for $\Theta $, $E(\Theta ,{f}_{i})\setminus \left\{{\delta}_{i}\right\}=\varnothing $ for any attribute ${f}_{i}\in F\left(T\right)\setminus E(\Theta )$. It is clear that $\Theta \{{f}_{i}=\sigma \}=\Lambda $ and ${h}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \})=0$ for any attribute ${f}_{i}\in E(\Theta )$ and any $\sigma \in E(T,{f}_{i})\setminus \left\{{\delta}_{i}\right\}$ such that $\sigma \notin E(\Theta ,{f}_{i})$. Therefore,
$$h(\Gamma \left(H\right))=1+max(\{0\}\cup \{{h}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}):{f}_{i}\in E(\Theta ),\sigma \in E(\Theta ,{f}_{i})\setminus \left\{{\delta}_{i}\right\}\}).\qquad (3)$$
It is clear that, for any ${f}_{i}\in E(\Theta )$ and any $\sigma \in E(\Theta ,{f}_{i})\setminus \left\{{\delta}_{i}\right\}$, the subtable $\Theta \{{f}_{i}=\sigma \}$ is a child of $\Theta $ in the DAG $\Delta \left(T\right)$, i.e., we know the value ${h}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \})$.
One can show that $h(\Gamma \left(H\right))$ is the minimum depth of a decision tree for $\Theta $ in which the root is labeled with the hypothesis H and which uses, for the subtables corresponding to the children of the root, decision trees of the type k.
We should not consider hypotheses that are not admissible for $\Theta $ since, for each such hypothesis H, there is an answer $S\in A\left(H\right)$ to the corresponding query with $\Theta S=\Theta $, i.e., based on this hypothesis, we cannot construct an optimal decision tree for $\Theta $.
Computation of ${h}_{h}^{\left(k\right)}(\Theta )$. First, we construct a hypothesis
$${H}_{\Theta}=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$$
for $\Theta $. Let ${f}_{i}\in F\left(T\right)\setminus E(\Theta )$. Then, ${\delta}_{i}$ is equal to the only number in the set $E(\Theta ,{f}_{i})$. Let ${f}_{i}\in E(\Theta )$. Then, ${\delta}_{i}$ is the minimum number from $E(\Theta ,{f}_{i})$ for which ${h}^{\left(k\right)}(\Theta \{{f}_{i}={\delta}_{i}\})=max\{{h}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}):\sigma \in E(\Theta ,{f}_{i})\}$. It is clear that ${H}_{\Theta}$ is admissible for $\Theta $. Compute the value $h(\Gamma \left({H}_{\Theta}\right))$ using (3). A simple analysis of (3) shows that $h(\Gamma \left({H}_{\Theta}\right))={h}_{h}^{\left(k\right)}(\Theta )$.
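Formula (3) and the choice of ${H}_{\Theta}$ can be sketched as follows (our illustration; `h_child` stands for the already-computed values ${h}^{\left(k\right)}$ on the children, supplied below by a hypothetical stand-in):

```python
def h_gamma_hypothesis(theta, n, h_child):
    """Depth 1 + max over counterexample answers (formula (3)) for the
    hypothesis H_Theta whose delta_i is the smallest value of f_i on Theta
    that maximizes h^(k) (so the worst child is excluded from the answers).
    h_child(sub) returns the already-computed value for a subtable."""
    worst = 0  # the answer "H is true" contributes depth 0
    for i in range(n):
        values = sorted({r[i] for r in theta})
        if len(values) < 2:
            continue  # f_i not in E(Theta): counterexamples lead to Lambda
        by_value = {a: h_child(tuple(r for r in theta if r[i] == a))
                    for a in values}
        delta_i = max(values, key=lambda a: by_value[a])  # smallest maximizer
        for a in values:
            if a != delta_i:
                worst = max(worst, by_value[a])
    return 1 + worst

# Conjunction of two variables: with the hypothesis f1 = 1, f2 = 1, every
# counterexample leads to a degenerate subtable, so the depth is 1.
rows = ((0, 0), (0, 1), (1, 0), (1, 1))
decisions = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
def h_child(sub):  # demo stand-in: 0 for degenerate subtables, 1 otherwise
    return 0 if len({decisions[r] for r in sub}) <= 1 else 1
print(h_gamma_hypothesis(rows, 2, h_child))  # → 1
```

With ties broken toward smaller values, Python's `max` over the ascending list returns the minimum maximizer, matching the choice of ${\delta}_{i}$ above.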
Remark 3. Let Θ be a nonterminal node of the DAG $\Delta \left(T\right)$ such that, for each child ${\Theta}^{\prime}$ of Θ, we already know the value ${h}^{\left(k\right)}\left({\Theta}^{\prime}\right)$. Then, the procedure of computation of the value ${h}_{h}^{\left(k\right)}(\Theta )$ has time complexity that is polynomial in the size of the decision table T.
Computation of ${h}_{p}^{\left(k\right)}(\Theta )$. For each row $r=({\delta}_{1},\dots ,{\delta}_{n})$ of the decision table T, we check if the corresponding proper hypothesis ${H}_{r}=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$ is admissible for $\Theta $. For each proper hypothesis ${H}_{r}=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$ that is admissible for $\Theta $, we compute the value $h(\Gamma \left({H}_{r}\right))$ using (3). One can show that the minimum among the obtained numbers is equal to ${h}_{p}^{\left(k\right)}(\Theta )$.
Remark 4. Let Θ be a nonterminal node of the DAG $\Delta \left(T\right)$ such that, for each child ${\Theta}^{\prime}$ of Θ, we already know the value ${h}^{\left(k\right)}\left({\Theta}^{\prime}\right)$. Then, the procedure of computation of the value ${h}_{p}^{\left(k\right)}(\Theta )$ has time complexity that is polynomial in the size of the decision table T.
We describe an Algorithm ${\mathcal{A}}_{h}$ that, for a given nonempty decision table T and given $k\in \{1,\dots ,5\}$, calculates the value ${h}^{\left(k\right)}\left(T\right)$, which is the minimum depth of a decision tree of the type k for the table T. During the work of this algorithm, we find for each node $\Theta $ of the DAG $\Delta \left(T\right)$ the value ${h}^{\left(k\right)}(\Theta )$.
Algorithm ${\mathcal{A}}_{h}$ (computation of ${h}^{\left(k\right)}\left(T\right)$). 
Input: A nonempty decision table T, the directed acyclic graph $\Delta \left(T\right)$, and number $k\in \{1,\dots ,5\}$. 
Output: The value ${h}^{\left(k\right)}\left(T\right)$. 
1. If a number is attached to each node of the DAG $\Delta \left(T\right)$, then return the number attached to the node T as ${h}^{\left(k\right)}\left(T\right)$ and halt the algorithm. Otherwise, choose a node $\Theta $ of the graph $\Delta \left(T\right)$ without an attached number that is either a terminal node of $\Delta \left(T\right)$ or a nonterminal node of $\Delta \left(T\right)$ for which all children have attached numbers.
2. If $\Theta $ is a terminal node, then attach to it the number ${h}^{\left(k\right)}(\Theta )=0$ and proceed to step 1. If $\Theta $ is not a terminal node, then, depending on the value k, do the following:
In the case $k=1$, compute the value ${h}_{a}^{\left(1\right)}(\Theta )$ and attach to $\Theta $ the value ${h}^{\left(1\right)}(\Theta )={h}_{a}^{\left(1\right)}(\Theta )$.
In the case $k=2$, compute the value ${h}_{h}^{\left(2\right)}(\Theta )$ and attach to $\Theta $ the value ${h}^{\left(2\right)}(\Theta )={h}_{h}^{\left(2\right)}(\Theta )$.
In the case $k=3$, compute the values ${h}_{a}^{\left(3\right)}(\Theta )$ and ${h}_{h}^{\left(3\right)}(\Theta )$, and attach to $\Theta $ the value ${h}^{\left(3\right)}(\Theta )=min\{{h}_{a}^{\left(3\right)}(\Theta ),{h}_{h}^{\left(3\right)}(\Theta )\}$.
In the case $k=4$, compute the value ${h}_{p}^{\left(4\right)}(\Theta )$ and attach to $\Theta $ the value ${h}^{\left(4\right)}(\Theta )={h}_{p}^{\left(4\right)}(\Theta )$.
In the case $k=5$, compute the values ${h}_{a}^{\left(5\right)}(\Theta )$ and ${h}_{p}^{\left(5\right)}(\Theta )$, and attach to $\Theta $ the value ${h}^{\left(5\right)}(\Theta )=min\{{h}_{a}^{\left(5\right)}(\Theta ),{h}_{p}^{\left(5\right)}(\Theta )\}$.
Proceed to step 1.
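The case analysis in step 2 can be summarized in one dispatch function (our sketch; the arguments are the values ${h}_{a}^{\left(k\right)}(\Theta )$, ${h}_{h}^{\left(k\right)}(\Theta )$, and ${h}_{p}^{\left(k\right)}(\Theta )$ already computed for the node):

```python
def attach_value(k, h_a, h_h, h_p):
    """Value h^(k)(Theta) attached by A_h to a nonterminal node Theta:
    type 1 uses attributes, 2 hypotheses, 3 both, 4 proper hypotheses,
    and 5 attributes together with proper hypotheses."""
    if k == 1:
        return h_a
    if k == 2:
        return h_h
    if k == 3:
        return min(h_a, h_h)
    if k == 4:
        return h_p
    if k == 5:
        return min(h_a, h_p)
    raise ValueError("k must be in {1, ..., 5}")

print(attach_value(3, h_a=4, h_h=2, h_p=3))  # → 2
print(attach_value(5, h_a=4, h_h=2, h_p=3))  # → 3
```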

Using Remarks 2–4, one can prove the following statement.
Proposition 5. The time complexity of the Algorithm ${\mathcal{A}}_{h}$ is bounded from above by a polynomial on the size of the input table T and the number $|SEP\left(T\right)|$ of different separable subtables of T.
A similar bound can be obtained for the space complexity of the considered algorithm.
6. Minimizing the Number of Realizable Nodes
In this section, we consider some results obtained in Reference [13]. Let T be a nonempty decision table with n conditional attributes ${f}_{1},\dots ,{f}_{n}$. We can use the DAG $\Delta \left(T\right)$ to compute the values ${L}^{\left(1\right)}\left(T\right),\dots ,{L}^{\left(5\right)}\left(T\right)$. Let $k\in \{1,\dots ,5\}$. To find the value ${L}^{\left(k\right)}\left(T\right)$, we compute the value ${L}^{\left(k\right)}(\Theta )$ for each node $\Theta $ of the DAG $\Delta \left(T\right)$. We will consider not only subtables that are nodes of $\Delta \left(T\right)$ but also the empty subtable $\Lambda $ of T and the subtables ${T}_{r}$ that contain only one row r of T and are not nodes of $\Delta \left(T\right)$. We begin with these special subtables and the terminal nodes of $\Delta \left(T\right)$ (nodes without leaving edges), which are degenerate separable subtables of T, and step by step move to the table T.
Let $\Theta $ be a terminal node of $\Delta \left(T\right)$ or $\Theta ={T}_{r}$ for some row r of T. Then, ${L}^{\left(k\right)}(\Theta )=1$: the decision tree that contains only one node labeled with the decision attached to all rows of $\Theta $ is a decision tree for $\Theta $. The only node of this tree is realizable relative to $\Theta $. If $\Theta =\Lambda $, then ${L}^{\left(k\right)}(\Theta )=0$: the decision tree that contains only one node labeled with 0 will be considered as a decision tree for $\Lambda $. The only node of this tree is not realizable relative to $\Lambda $.
Let $\Theta $ be a nonterminal node of $\Delta \left(T\right)$ such that, for each child ${\Theta}^{\prime}$ of $\Theta $, we already know the value ${L}^{\left(k\right)}\left({\Theta}^{\prime}\right)$. Based on this information, we can find the minimum number of nodes realizable relative to $\Theta $ in a decision tree for $\Theta $ that uses, for the subtables corresponding to the children of the root, decision trees of the type k and in which the root is labeled:
With an attribute from $F\left(T\right)$ (we denote by ${L}_{a}^{\left(k\right)}(\Theta )$ the minimum number of realizable relative to $\Theta $ nodes in such a decision tree).
With a hypothesis over T (we denote by ${L}_{h}^{\left(k\right)}(\Theta )$ the minimum number of realizable relative to $\Theta $ nodes in such a decision tree).
With a proper hypothesis over T (we denote by ${L}_{p}^{\left(k\right)}(\Theta )$ the minimum number of realizable relative to $\Theta $ nodes in such a decision tree).
We now describe three procedures for computing the values ${L}_{a}^{\left(k\right)}(\Theta )$, ${L}_{h}^{\left(k\right)}(\Theta )$, and ${L}_{p}^{\left(k\right)}(\Theta )$, respectively. Since $\Theta $ is nondegenerate, the set $E(\Theta )$ is nonempty.
Let us consider a decision tree $\Gamma \left({f}_{i}\right)$ for $\Theta $ in which the root is labeled with an attribute ${f}_{i}\in E(\Theta )$. For each $\delta \in E(T,{f}_{i})$, there is an edge that leaves the root and enters a node $v\left(\delta \right)$. This edge is labeled with the equation system $\{{f}_{i}=\delta \}$. The node $v\left(\delta \right)$ is the root of a decision tree of the type k for $\Theta \{{f}_{i}=\delta \}$ for which the number of nodes realizable relative to $\Theta \{{f}_{i}=\delta \}$ is equal to ${L}^{\left(k\right)}(\Theta \{{f}_{i}=\delta \})$. It is clear that $L(\Theta ,\Gamma \left({f}_{i}\right))=1+{\sum}_{\delta \in E(T,{f}_{i})}{L}^{\left(k\right)}(\Theta \{{f}_{i}=\delta \})$. Since ${L}^{\left(k\right)}(\Theta \{{f}_{i}=\delta \})={L}^{\left(k\right)}(\Lambda )=0$ for any $\delta \in E(T,{f}_{i})\setminus E(\Theta ,{f}_{i})$,
$$L(\Theta ,\Gamma \left({f}_{i}\right))=1+{\sum}_{\delta \in E(\Theta ,{f}_{i})}{L}^{\left(k\right)}(\Theta \{{f}_{i}=\delta \}).\qquad (4)$$
Evidently, for any $\delta \in E(\Theta ,{f}_{i})$, the subtable $\Theta \{{f}_{i}=\delta \}$ is a child of $\Theta $ in the DAG $\Delta \left(T\right)$, i.e., we know the value ${L}^{\left(k\right)}(\Theta \{{f}_{i}=\delta \})$. One can show that $L(\Theta ,\Gamma \left({f}_{i}\right))$ is the minimum number of nodes realizable relative to $\Theta $ in a decision tree for $\Theta $ that uses, for the subtables corresponding to the children of the root, decision trees of the type k and in which the root is labeled with the attribute ${f}_{i}$.
We should not consider attributes ${f}_{i}\in F\left(T\right)\setminus E(\Theta )$ since, for each such attribute, there is $\delta \in E(T,{f}_{i})$ with $\Theta \{{f}_{i}=\delta \}=\Theta $, i.e., based on this attribute, we cannot construct an optimal decision tree for $\Theta $. As a result, we have
${L}_{a}^{\left(k\right)}(\Theta )=min\{L(\Theta ,\Gamma \left({f}_{i}\right)):{f}_{i}\in E(\Theta )\}.$ (5)
Computation of ${L}_{a}^{\left(k\right)}(\Theta )$. Construct the set of attributes $E(\Theta )$. For each attribute ${f}_{i}\in E(\Theta )$, compute the value $L(\Theta ,\Gamma \left({f}_{i}\right))$ using (4). Compute the value ${L}_{a}^{\left(k\right)}(\Theta )$ using (5).
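As an illustration, the computation of ${L}_{a}^{\left(k\right)}(\Theta )$ from already-known child values can be sketched as follows. This is a minimal sketch, assuming subtables are represented as tuples of rows of attribute values and the memoized child values ${L}^{\left(k\right)}(\Theta \{{f}_{i}=\delta \})$ are stored in a dictionary `L`; all function names are illustrative, not from the paper.

```python
def values(theta, i):
    """E(Θ, f_i): the set of values of attribute f_i on the rows of Θ."""
    return {row[i] for row in theta}

def E(theta, n):
    """E(Θ): attributes taking at least two distinct values on Θ."""
    return [i for i in range(n) if len(values(theta, i)) >= 2]

def restrict(theta, i, delta):
    """Θ{f_i = δ}: the subtable of rows of Θ with value δ in column f_i."""
    return tuple(row for row in theta if row[i] == delta)

def L_a(theta, n, L):
    """min over f_i in E(Θ) of 1 + Σ_{δ in E(Θ,f_i)} L^(k)(Θ{f_i=δ}), cf. (4), (5)."""
    return min(
        1 + sum(L[restrict(theta, i, d)] for d in values(theta, i))
        for i in E(theta, n)
    )
```

The sum runs over $E(\Theta ,{f}_{i})$ only, since the subtables for the remaining values of ${f}_{i}$ are empty and contribute 0.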
Let us consider a decision tree $\Gamma \left(H\right)$ for $\Theta $ in which the root is labeled with an admissible for $\Theta $ hypothesis $H=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$. For each $S\in A\left(H\right)$, there is an edge that leaves the root of $\Gamma \left(H\right)$ and enters a node $v\left(S\right)$. This edge is labeled with the equation system S. The node $v\left(S\right)$ is the root of a decision tree of the type k for $\Theta S$, for which the number of realizable relative to $\Theta S$ nodes is equal to ${L}^{\left(k\right)}(\Theta S)$. It is clear that $L(\Theta ,\Gamma \left(H\right))=1+{\sum}_{S\in A\left(H\right)}{L}^{\left(k\right)}(\Theta S)$.
Denote $r=({\delta}_{1},\dots ,{\delta}_{n})$. It is easy to show that $\Theta H=\Lambda $ if r is not a row of $\Theta $ and $\Theta H={T}_{r}$ if r is a row of $\Theta $. Therefore,
${L}^{\left(k\right)}(\Theta H)=0$ if r is not a row of $\Theta $, and ${L}^{\left(k\right)}(\Theta H)=1$ if r is a row of $\Theta $. (6)
Since H is admissible for $\Theta $, $E(\Theta ,{f}_{i})\setminus \left\{{\delta}_{i}\right\}=\varnothing $ for any attribute ${f}_{i}\in F\left(T\right)\setminus E(\Theta )$. It is clear that $\Theta \{{f}_{i}=\sigma \}=\Lambda $ and ${L}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \})=0$ for any attribute ${f}_{i}\in E(\Theta )$ and any $\sigma \in E(T,{f}_{i})\setminus \left\{{\delta}_{i}\right\}$ such that $\sigma \notin E(\Theta ,{f}_{i})$. Therefore,
$L(\Theta ,\Gamma \left(H\right))={L}^{\left(k\right)}(\Theta H)+K(\Theta ,H),$ (7)
where
$K(\Theta ,H)=1+{\sum}_{{f}_{i}\in E(\Theta )}{\sum}_{\sigma \in E(\Theta ,{f}_{i})\setminus \left\{{\delta}_{i}\right\}}{L}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}).$ (8)
Evidently, for any ${f}_{i}\in E(\Theta )$ and any $\sigma \in E(\Theta ,{f}_{i})\setminus \left\{{\delta}_{i}\right\}$, the subtable $\Theta \{{f}_{i}=\sigma \}$ is a child of $\Theta $ in the DAG $\Delta \left(T\right)$, i.e., we know the value ${L}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \})$. It is easy to show that $L(\Theta ,\Gamma \left(H\right))$ is the minimum number of realizable relative to $\Theta $ nodes in a decision tree for $\Theta $, which uses for the subtables corresponding to the children of the root decision trees of the type k and in which the root is labeled with the admissible for $\Theta $ hypothesis H.
We should not consider hypotheses that are not admissible for $\Theta $ since, for each such hypothesis H, there is an answer $S\in A\left(H\right)$ to the corresponding query with $\Theta S=\Theta $, i.e., based on this hypothesis, we cannot construct an optimal decision tree for $\Theta $. As a result, we have
${L}_{h}^{\left(k\right)}(\Theta )=min\{L(\Theta ,\Gamma \left(H\right)):H\in Adm(\Theta )\},$ (9)
where $Adm(\Theta )$ is the set of admissible hypotheses for $\Theta $.
For each ${f}_{i}\in \{{f}_{1},\dots ,{f}_{n}\}$, denote ${a}_{i}(\Theta )=max\{{L}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}):\sigma \in E(\Theta ,{f}_{i})\}$ and $C(\Theta ,{f}_{i})=\{\sigma \in E(\Theta ,{f}_{i}):{L}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \})={a}_{i}(\Theta )\}$. Set $C(\Theta )=C(\Theta ,{f}_{1})\times \cdots \times C(\Theta ,{f}_{n})$. It is clear that, for each $\overline{\delta}=({\delta}_{1},\dots ,{\delta}_{n})\in C(\Theta )$, the hypothesis ${H}_{\overline{\delta}}=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$ is admissible for $\Theta $. Simple analysis of (8) shows that the set $\{{H}_{\overline{\delta}}:\overline{\delta}\in C(\Theta )\}$ coincides with the set of admissible for $\Theta $ hypotheses H that minimize the value $K(\Theta ,H)$. Denote ${K}_{min}=K(\Theta ,{H}_{\overline{\delta}})$, where $\overline{\delta}\in C(\Theta )$.
Let there be a tuple $\overline{\delta}\in C(\Theta )$ that is not a row of $\Theta $. Then, ${L}^{\left(k\right)}(\Theta {H}_{\overline{\delta}})=0$ and ${L}_{h}^{\left(k\right)}(\Theta )={K}_{min}$. Let all tuples from $C(\Theta )$ be rows of $\Theta $. We now show that ${L}_{h}^{\left(k\right)}(\Theta )=1+{K}_{min}$. For any $\overline{\delta}\in C(\Theta )$, we have $L(\Theta ,\Gamma \left({H}_{\overline{\delta}}\right))=1+{K}_{min}$. Therefore, ${L}_{h}^{\left(k\right)}(\Theta )\le 1+{K}_{min}$. Let us assume that ${L}_{h}^{\left(k\right)}(\Theta )<1+{K}_{min}$. Then, by (9), there exists an admissible for $\Theta $ hypothesis $H=\{{f}_{1}={\sigma}_{1},\dots ,{f}_{n}={\sigma}_{n}\}$ for which $({\sigma}_{1},\dots ,{\sigma}_{n})\notin C(\Theta )$ and $L(\Theta ,\Gamma \left(H\right))<1+{K}_{min}$, but this is impossible since, according to (7), $L(\Theta ,\Gamma \left(H\right))\ge K(\Theta ,H)\ge {K}_{min}+1$.
As a result, we have ${L}_{h}^{\left(k\right)}(\Theta )={K}_{min}$ if not all tuples from $C(\Theta )$ are rows of $\Theta $, and ${L}_{h}^{\left(k\right)}(\Theta )=1+{K}_{min}$ if all tuples from $C(\Theta )$ are rows of $\Theta $.
Computation of ${L}_{h}^{\left(k\right)}(\Theta )$. For each ${f}_{i}\in \{{f}_{1},\dots ,{f}_{n}\}$, we compute the value ${a}_{i}(\Theta )=max\{{L}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}):\sigma \in E(\Theta ,{f}_{i})\}$ and construct the set $C(\Theta ,{f}_{i})=\{\sigma \in E(\Theta ,{f}_{i}):{L}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \})={a}_{i}(\Theta )\}$. For a tuple $\overline{\delta}\in C(\Theta )=C(\Theta ,{f}_{1})\times \cdots \times C(\Theta ,{f}_{n})$, using (8), we compute the value ${K}_{min}=K(\Theta ,{H}_{\overline{\delta}})$. Then, we count the number N of rows from $\Theta $ that belong to the set $C(\Theta )$ and compute the cardinality $|C(\Theta )|$ of the set $C(\Theta )$, which is equal to $|C(\Theta ,{f}_{1})|\cdot \ldots \cdot |C(\Theta ,{f}_{n})|$. As a result, we have ${L}_{h}^{\left(k\right)}(\Theta )={K}_{min}$ if $N<|C(\Theta )|$ and ${L}_{h}^{\left(k\right)}(\Theta )=1+{K}_{min}$ if $N=|C(\Theta )|$.
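The procedure above can be sketched in code as follows. This is a minimal sketch, assuming rows are tuples of attribute values, decisions are stored separately, and the child values ${L}^{\left(k\right)}$ are memoized in a dictionary `L`; all names are illustrative.

```python
from itertools import product

def restrict(theta, i, delta):
    """Θ{f_i = δ}: rows of Θ with value δ in column f_i."""
    return tuple(row for row in theta if row[i] == delta)

def L_h(theta, n, L):
    C = []       # C(Θ, f_1), ..., C(Θ, f_n)
    K_min = 1    # the root of Γ(H) is always realizable, cf. (8)
    for i in range(n):
        vals = sorted({row[i] for row in theta})
        if len(vals) == 1:
            # f_i is constant on Θ: δ_i is forced and yields no counterexamples
            C.append(vals)
            continue
        sub = {d: L[restrict(theta, i, d)] for d in vals}
        a_i = max(sub.values())                       # a_i(Θ)
        C.append([d for d in vals if sub[d] == a_i])  # C(Θ, f_i)
        # counterexample answers contribute every value of f_i except δ_i
        K_min += sum(sub.values()) - a_i
    C_theta = set(product(*C))            # C(Θ) = C(Θ,f_1) × ... × C(Θ,f_n)
    N = sum(1 for row in theta if row in C_theta)
    return K_min if N < len(C_theta) else 1 + K_min
```

Choosing ${\delta}_{i}\in C(\Theta ,{f}_{i})$ removes the largest summand from each attribute's contribution, which is exactly why these tuples minimize $K(\Theta ,H)$.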
Computation of ${L}_{p}^{\left(k\right)}(\Theta )$. For each row $r=({\delta}_{1},\dots ,{\delta}_{n})$ of the decision table T, we check if the corresponding proper hypothesis ${H}_{r}=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$ is admissible for $\Theta $. For each admissible for $\Theta $ proper hypothesis ${H}_{r}$, we compute the value $L(\Theta ,\Gamma \left({H}_{r}\right))$ using (6), (7), and (8). One can show that the minimum among the obtained numbers is equal to ${L}_{p}^{\left(k\right)}(\Theta )$.
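A sketch of this enumeration over proper hypotheses, under the same illustrative representation (rows as tuples of attribute values, memoized child values in `L`; names are not from the paper), might look like:

```python
def restrict(theta, i, delta):
    return tuple(row for row in theta if row[i] == delta)

def admissible(delta, theta, n):
    """H = {f_1=δ_1, ..., f_n=δ_n} is admissible for Θ iff, for every attribute
    that is constant on Θ, δ_i equals that constant value (otherwise some
    counterexample answer leaves Θ unchanged)."""
    for i in range(n):
        vals = {row[i] for row in theta}
        if len(vals) == 1 and delta[i] not in vals:
            return False
    return True

def L_p(T_rows, theta, n, L):
    best = None
    for r in T_rows:                       # one proper hypothesis H_r per row of T
        if not admissible(r, theta, n):
            continue
        cost = 1                           # the root, cf. (8)
        for i in range(n):
            vals = {row[i] for row in theta}
            if len(vals) < 2:
                continue
            cost += sum(L[restrict(theta, i, s)] for s in vals if s != r[i])
        if r in theta:                     # L^(k)(ΘH_r) = 1 iff r is a row of Θ, cf. (6)
            cost += 1
        best = cost if best is None else min(best, cost)
    return best
```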
We now consider an algorithm ${\mathcal{A}}_{L}$ that, for a given nonempty decision table T and number $k\in \{1,\dots ,5\}$, calculates the value ${L}^{\left(k\right)}\left(T\right)$, which is the minimum number of nodes realizable relative to T in a decision tree of the type k for the table T. During the work of this algorithm, we find for each node $\Theta $ of the DAG $\Delta \left(T\right)$ the value ${L}^{\left(k\right)}(\Theta )$.
The description of the algorithm ${\mathcal{A}}_{L}$ is similar to the description of the Algorithm ${\mathcal{A}}_{h}$. Instead of ${h}^{\left(k\right)}$, we should use ${L}^{\left(k\right)}$. For each $b\in \{a,h,p\}$, instead of ${h}_{b}^{\left(k\right)}$, we should use ${L}_{b}^{\left(k\right)}$. In particular, for each terminal node $\Theta $, ${L}^{\left(k\right)}(\Theta )=1$.
One can show that the procedures of computation of the values ${L}_{a}^{\left(k\right)}(\Theta )$, ${L}_{h}^{\left(k\right)}(\Theta )$, and ${L}_{p}^{\left(k\right)}(\Theta )$ have polynomial time complexity depending on the size of the decision table T. Using this fact, one can prove the following statement.
Proposition 6. The time complexity of the algorithm ${\mathcal{A}}_{L}$ is bounded from above by a polynomial on the size of the input table T and the number $|SEP\left(T\right)|$ of different separable subtables of T.
A similar bound can be obtained for the space complexity of the considered algorithm.
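To make the overall scheme concrete, here is a self-contained sketch of the whole bottom-up computation for decision trees of the type 1 (attribute queries only), in which memoization over separable subtables plays the role of the DAG $\Delta \left(T\right)$. This is an illustration of the idea only, not the algorithm ${\mathcal{A}}_{L}$ itself; names and the row representation are assumptions.

```python
from functools import lru_cache

def min_realizable_nodes(rows, decisions):
    """L^(1)(T): minimum number of realizable nodes in a type-1 decision tree,
    computed over separable subtables (rows are tuples of attribute values)."""
    n = len(rows[0])
    dec = dict(zip(rows, decisions))

    @lru_cache(maxsize=None)
    def L(theta):
        if not theta:
            return 0                      # Λ: its single node is not realizable
        if len({dec[r] for r in theta}) == 1:
            return 1                      # degenerate subtable: one terminal node
        best = None
        for i in range(n):
            vals = {row[i] for row in theta}
            if len(vals) < 2:             # skip attributes constant on Θ
                continue
            cost = 1 + sum(L(tuple(r for r in theta if r[i] == d)) for d in vals)
            best = cost if best is None else min(best, cost)
        return best

    return L(tuple(rows))
```

For instance, on the table of the conjunction ${x}_{1}\wedge {x}_{2}$, this sketch returns 5: a root query, two children for ${x}_{1}$, and two further children after the second query.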
7. Minimizing the Number of Realizable Terminal Nodes
The procedure considered in this section is similar to the procedure of the minimization of the number of realizable nodes. The main difference is that, in decision trees with the minimum number of realizable terminal nodes, it is possible to meet constant attributes and hypotheses that are not admissible. Fortunately, for any decision table and any type of decision trees, there is a decision tree of this type with the minimum number of realizable terminal nodes for the considered table that does not use such attributes and hypotheses. We will omit many details and describe the main steps only.
Let T be a nonempty decision table with n conditional attributes ${f}_{1},\dots ,{f}_{n}$ and $k\in \{1,\dots ,5\}$. To find the value ${L}_{t}^{\left(k\right)}\left(T\right)$, we compute the value ${L}_{t}^{\left(k\right)}(\Theta )$ for each node $\Theta $ of the DAG $\Delta \left(T\right)$. We begin with terminal nodes of $\Delta \left(T\right)$ that are degenerate separable subtables of T and step-by-step move to the table T.
Let $\Theta $ be a terminal node of $\Delta \left(T\right)$. Then, ${L}_{t}^{\left(k\right)}(\Theta )=1$: the decision tree that contains only one node labeled with the decision attached to all rows of $\Theta $ is a decision tree for $\Theta $. The only node of this tree is a terminal node realizable relative to $\Theta $.
Let $\Theta $ be a nonterminal node of $\Delta \left(T\right)$ such that, for each child ${\Theta}^{\prime}$ of $\Theta $, we already know the value ${L}_{t}^{\left(k\right)}\left({\Theta}^{\prime}\right)$. Based on this information, we can find the minimum number of realizable relative to $\Theta $ terminal nodes in a decision tree for $\Theta $, which uses for the subtables corresponding to children of the root decision trees of the type k and in which the root is labeled
With an attribute from $F\left(T\right)$ (we denote by ${L}_{t,a}^{\left(k\right)}(\Theta )$ the minimum number of realizable relative to $\Theta $ terminal nodes in such a decision tree).
With a hypothesis over T (we denote by ${L}_{t,h}^{\left(k\right)}(\Theta )$ the minimum number of realizable relative to $\Theta $ terminal nodes in such a decision tree).
With a proper hypothesis over T (we denote by ${L}_{t,p}^{\left(k\right)}(\Theta )$ the minimum number of realizable relative to $\Theta $ terminal nodes in such a decision tree).
We now describe three procedures for computing the values ${L}_{t,a}^{\left(k\right)}(\Theta )$, ${L}_{t,h}^{\left(k\right)}(\Theta )$, and ${L}_{t,p}^{\left(k\right)}(\Theta )$, respectively. Since $\Theta $ is nondegenerate, the set $E(\Theta )$ is nonempty.
Computation of ${L}_{t,a}^{\left(k\right)}(\Theta )$. Construct the set of attributes $E(\Theta )$. For each attribute ${f}_{i}\in E(\Theta )$, compute the value
${L}_{t}(\Theta ,\Gamma \left({f}_{i}\right))={\sum}_{\sigma \in E(\Theta ,{f}_{i})}{L}_{t}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \})$
(the root is a working node and contributes no terminal nodes). Then ${L}_{t,a}^{\left(k\right)}(\Theta )=min\{{L}_{t}(\Theta ,\Gamma \left({f}_{i}\right)):{f}_{i}\in E(\Theta )\}$.
Computation of ${L}_{t,h}^{\left(k\right)}(\Theta )$. For each ${f}_{i}\in \{{f}_{1},\dots ,{f}_{n}\}$, we compute the value ${a}_{i}(\Theta )=max\{{L}_{t}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}):\sigma \in E(\Theta ,{f}_{i})\}$ and construct the set $C(\Theta ,{f}_{i})=\{\sigma \in E(\Theta ,{f}_{i}):{L}_{t}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \})={a}_{i}(\Theta )\}$. For a tuple $({\delta}_{1},\dots ,{\delta}_{n})\in C(\Theta )=C(\Theta ,{f}_{1})\times \cdots \times C(\Theta ,{f}_{n})$, we compute the value
${K}_{min}={\sum}_{{f}_{i}\in E(\Theta )}{\sum}_{\sigma \in E(\Theta ,{f}_{i})\setminus \left\{{\delta}_{i}\right\}}{L}_{t}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}).$
Then, we count the number N of rows from $\Theta $ that belong to the set $C(\Theta )$ and compute the cardinality $|C(\Theta )|$ of the set $C(\Theta )$, which is equal to $|C(\Theta ,{f}_{1})|\cdot \ldots \cdot |C(\Theta ,{f}_{n})|$. As a result, we have ${L}_{t,h}^{\left(k\right)}(\Theta )={K}_{min}$ if $N<|C(\Theta )|$ and ${L}_{t,h}^{\left(k\right)}(\Theta )=1+{K}_{min}$ if $N=|C(\Theta )|$.
Computation of ${L}_{t,p}^{\left(k\right)}(\Theta )$. For each row $r=({\delta}_{1},\dots ,{\delta}_{n})$ of the decision table T, we check if the corresponding proper hypothesis ${H}_{r}=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$ is admissible for $\Theta $. For each admissible for $\Theta $ proper hypothesis ${H}_{r}$, we compute the value
${L}_{t}(\Theta ,\Gamma \left({H}_{r}\right))={L}_{t}^{\left(k\right)}(\Theta {H}_{r})+{\sum}_{{f}_{i}\in E(\Theta )}{\sum}_{\sigma \in E(\Theta ,{f}_{i})\setminus \left\{{\delta}_{i}\right\}}{L}_{t}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}).$
One can show that the minimum among the obtained numbers is equal to ${L}_{t,p}^{\left(k\right)}(\Theta )$.
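For terminal-node counts, the only change relative to the previous section is that the root of the tree is a working node and therefore contributes nothing to the count. For an attribute root, a minimal sketch under the same illustrative representation (memoized values in `Lt`; names are assumptions):

```python
def restrict(theta, i, delta):
    return tuple(row for row in theta if row[i] == delta)

def L_t_a(theta, n, Lt):
    """min over f_i in E(Θ) of Σ_{σ in E(Θ,f_i)} L_t^(k)(Θ{f_i=σ}):
    no '1 +' term, because the root is a working node, not a terminal one."""
    best = None
    for i in range(n):
        vals = {row[i] for row in theta}
        if len(vals) < 2:
            continue
        cost = sum(Lt[restrict(theta, i, d)] for d in vals)
        best = cost if best is None else min(best, cost)
    return best
```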
We now consider an algorithm ${\mathcal{A}}_{{L}_{t}}$ that, for a given nonempty decision table T and number $k\in \{1,\dots ,5\}$, calculates the value ${L}_{t}^{\left(k\right)}\left(T\right)$, which is the minimum number of terminal nodes realizable relative to T in a decision tree of the type k for the table T. During the work of this algorithm, we find for each node $\Theta $ of the DAG $\Delta \left(T\right)$ the value ${L}_{t}^{\left(k\right)}(\Theta )$.
The description of the algorithm ${\mathcal{A}}_{{L}_{t}}$ is similar to the description of the Algorithm ${\mathcal{A}}_{h}$. Instead of ${h}^{\left(k\right)}$, we should use ${L}_{t}^{\left(k\right)}$. For each $b\in \{a,h,p\}$, instead of ${h}_{b}^{\left(k\right)}$, we should use ${L}_{t,b}^{\left(k\right)}$. In particular, for each terminal node $\Theta $, ${L}_{t}^{\left(k\right)}(\Theta )=1$.
One can show that the procedures of computation of the values ${L}_{t,a}^{\left(k\right)}(\Theta )$, ${L}_{t,h}^{\left(k\right)}(\Theta )$, and ${L}_{t,p}^{\left(k\right)}(\Theta )$ have polynomial time complexity depending on the size of the decision table T. Using this fact, one can prove the following statement.
Proposition 7. The time complexity of the algorithm ${\mathcal{A}}_{{L}_{t}}$ is bounded from above by a polynomial on the size of the input table T and the number $|SEP\left(T\right)|$ of different separable subtables of T.
A similar bound can be obtained for the space complexity of the considered algorithm.
8. Minimizing the Number of Working Nodes
The procedure considered in this section is similar to the procedure of the minimization of the depth. We will omit many details and describe the main steps only.
Let T be a nonempty decision table with n conditional attributes ${f}_{1},\dots ,{f}_{n}$ and $k\in \{1,\dots ,5\}$. To find the value ${L}_{w}^{\left(k\right)}\left(T\right)$, we compute the value ${L}_{w}^{\left(k\right)}(\Theta )$ for each node $\Theta $ of the DAG $\Delta \left(T\right)$. We begin with terminal nodes of $\Delta \left(T\right)$ that are degenerate separable subtables of T and step-by-step move to the table T.
Let $\Theta $ be a terminal node of $\Delta \left(T\right)$. Then, ${L}_{w}^{\left(k\right)}(\Theta )=0$: the decision tree that contains only one node labeled with the decision attached to all rows of $\Theta $ is a decision tree for $\Theta $. This tree has no working nodes.
Let $\Theta $ be a nonterminal node of $\Delta \left(T\right)$ such that, for each child ${\Theta}^{\prime}$ of $\Theta $, we already know the value ${L}_{w}^{\left(k\right)}\left({\Theta}^{\prime}\right)$. Based on this information, we can find the minimum number of working nodes in a decision tree for $\Theta $, which uses for the subtables corresponding to children of the root decision trees of the type k and in which the root is labeled
With an attribute from $F\left(T\right)$ (we denote by ${L}_{w,a}^{\left(k\right)}(\Theta )$ the minimum number of working nodes in such a decision tree).
With a hypothesis over T (we denote by ${L}_{w,h}^{\left(k\right)}(\Theta )$ the minimum number of working nodes in such a decision tree).
With a proper hypothesis over T (we denote by ${L}_{w,p}^{\left(k\right)}(\Theta )$ the minimum number of working nodes in such a decision tree).
We now describe three procedures for computing the values ${L}_{w,a}^{\left(k\right)}(\Theta )$, ${L}_{w,h}^{\left(k\right)}(\Theta )$, and ${L}_{w,p}^{\left(k\right)}(\Theta )$, respectively. Since $\Theta $ is nondegenerate, the set $E(\Theta )$ is nonempty.
Computation of ${L}_{w,a}^{\left(k\right)}(\Theta )$. Construct the set of attributes $E(\Theta )$. For each attribute ${f}_{i}\in E(\Theta )$, compute the value
${L}_{w}(\Theta ,\Gamma \left({f}_{i}\right))=1+{\sum}_{\sigma \in E(\Theta ,{f}_{i})}{L}_{w}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}).$
Then ${L}_{w,a}^{\left(k\right)}(\Theta )=min\{{L}_{w}(\Theta ,\Gamma \left({f}_{i}\right)):{f}_{i}\in E(\Theta )\}$.
Computation of ${L}_{w,h}^{\left(k\right)}(\Theta )$. First, we construct a hypothesis
$H=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$
for $\Theta $. Let ${f}_{i}\in F\left(T\right)\setminus E(\Theta )$. Then, ${\delta}_{i}$ is equal to the only number in the set $E(\Theta ,{f}_{i})$. Let ${f}_{i}\in E(\Theta )$. Then, ${\delta}_{i}$ is the minimum number from $E(\Theta ,{f}_{i})$ for which ${L}_{w}^{\left(k\right)}(\Theta \{{f}_{i}={\delta}_{i}\})=max\{{L}_{w}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}):\sigma \in E(\Theta ,{f}_{i})\}$. Then
${L}_{w,h}^{\left(k\right)}(\Theta )=1+{\sum}_{{f}_{i}\in E(\Theta )}{\sum}_{\sigma \in E(\Theta ,{f}_{i})\setminus \left\{{\delta}_{i}\right\}}{L}_{w}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}).$
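The choice of ${\delta}_{i}$ above simply removes the most expensive subtree from each attribute's sum, which can be sketched as follows (illustrative names, memoized values in a dictionary `Lw`; not the paper's pseudocode):

```python
def restrict(theta, i, delta):
    return tuple(row for row in theta if row[i] == delta)

def L_w_h(theta, n, Lw):
    """1 for the hypothesis root plus the working nodes for every counterexample
    answer; picking δ_i with maximum L_w^(k)(Θ{f_i=δ_i}) drops the largest
    summand for each attribute f_i in E(Θ)."""
    total = 1
    for i in range(n):
        vals = {row[i] for row in theta}
        if len(vals) < 2:
            continue
        costs = [Lw[restrict(theta, i, d)] for d in vals]
        total += sum(costs) - max(costs)
    return total
```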
Computation of ${L}_{w,p}^{\left(k\right)}(\Theta )$. For each row $r=({\delta}_{1},\dots ,{\delta}_{n})$ of the decision table T, we check if the corresponding proper hypothesis ${H}_{r}=\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$ is admissible for $\Theta $. For each admissible for $\Theta $ proper hypothesis ${H}_{r}$, we compute the value
${L}_{w}(\Theta ,\Gamma \left({H}_{r}\right))=1+{\sum}_{{f}_{i}\in E(\Theta )}{\sum}_{\sigma \in E(\Theta ,{f}_{i})\setminus \left\{{\delta}_{i}\right\}}{L}_{w}^{\left(k\right)}(\Theta \{{f}_{i}=\sigma \}).$
One can show that the minimum among the obtained numbers is equal to ${L}_{w,p}^{\left(k\right)}(\Theta )$.
We now consider an algorithm ${\mathcal{A}}_{{L}_{w}}$ that, for a given nonempty decision table T and $k\in \{1,\dots ,5\}$, calculates the value ${L}_{w}^{\left(k\right)}\left(T\right)$, which is the minimum number of working nodes in a decision tree of the type k for the table T. During the work of this algorithm, we find for each node $\Theta $ of the DAG $\Delta \left(T\right)$ the value ${L}_{w}^{\left(k\right)}(\Theta )$.
The description of the algorithm ${\mathcal{A}}_{{L}_{w}}$ is similar to the description of the Algorithm ${\mathcal{A}}_{h}$. Instead of ${h}^{\left(k\right)}$, we should use ${L}_{w}^{\left(k\right)}$. For each $b\in \{a,h,p\}$, instead of ${h}_{b}^{\left(k\right)}$, we should use ${L}_{w,b}^{\left(k\right)}$. In particular, for each terminal node $\Theta $, ${L}_{w}^{\left(k\right)}(\Theta )=0$.
One can show that the procedures of computation of the values ${L}_{w,a}^{\left(k\right)}(\Theta )$, ${L}_{w,h}^{\left(k\right)}(\Theta )$, and ${L}_{w,p}^{\left(k\right)}(\Theta )$ have polynomial time complexity depending on the size of the decision table T. Using this fact, one can prove the following statement.
Proposition 8. The time complexity of the algorithm ${\mathcal{A}}_{{L}_{w}}$ is bounded from above by a polynomial on the size of the input table T and the number $|SEP\left(T\right)|$ of different separable subtables of T.
A similar bound can be obtained for the space complexity of the considered algorithm.
9. On Number of Realizable Terminal Nodes
Based on the results of experiments, we formulated the following hypothesis: ${L}_{t}^{\left(1\right)}\left(T\right)={L}_{t}^{\left(3\right)}\left(T\right)={L}_{t}^{\left(5\right)}\left(T\right)$ for any decision table T. In this section, we prove it. First, we consider a simple lemma.
Lemma 9. Let $T$ be a decision table and ${T}^{\prime}$ be a subtable of the table T. Then, ${L}_{t}^{\left(3\right)}\left({T}^{\prime}\right)\le {L}_{t}^{\left(3\right)}\left(T\right)$.
Proof. It is easy to prove the considered inequality if ${T}^{\prime}$ is degenerate. Let ${T}^{\prime}$ be nondegenerate and $\Gamma $ be a decision tree of the type 3 for T with the minimum number of realizable relative to T terminal nodes. Then, the root r of $\Gamma $ is a working node. It is clear that the table ${T}^{\prime}S(\Gamma ,r)$ is nondegenerate. For each working node v of $\Gamma $ such that the table ${T}^{\prime}S(\Gamma ,v)$ is degenerate and the table ${T}^{\prime}S(\Gamma ,{v}^{\prime})$ is nondegenerate, where ${v}^{\prime}$ is the parent of v, we do the following. We remove all nodes and edges of the subtree of $\Gamma $ with the root v with the exception of the node v. If ${T}^{\prime}S(\Gamma ,v)=\Lambda $, then we label the node v with the number 0. If the subtable ${T}^{\prime}S(\Gamma ,v)$ is nonempty, then we label the node v with the decision attached to each row of this subtable. We denote by ${\Gamma}^{\prime}$ the obtained decision tree. One can show that ${\Gamma}^{\prime}$ is a decision tree of the type 3 for the table ${T}^{\prime}$ and ${L}_{t}^{\left(3\right)}({T}^{\prime},{\Gamma}^{\prime})\le {L}_{t}^{\left(3\right)}(T,\Gamma )$. Therefore, ${L}_{t}^{\left(3\right)}\left({T}^{\prime}\right)\le {L}_{t}^{\left(3\right)}\left(T\right)$. □
Proposition 10. For any decision table T, the following equalities hold: ${L}_{t}^{\left(1\right)}\left(T\right)={L}_{t}^{\left(3\right)}\left(T\right)={L}_{t}^{\left(5\right)}\left(T\right)$. Proof. It is clear that ${L}_{t}^{\left(3\right)}\left(T\right)\le {L}_{t}^{\left(5\right)}\left(T\right)\le {L}_{t}^{\left(1\right)}\left(T\right)$ for any decision table T. To prove the considered statement, it is enough to show that ${L}_{t}^{\left(1\right)}\left(T\right)\le {L}_{t}^{\left(3\right)}\left(T\right)$ for any decision table T. We will prove this inequality by induction on the number of attributes in the set $E\left(T\right)$.
We now show that ${L}_{t}^{\left(1\right)}\left(T\right)\le {L}_{t}^{\left(3\right)}\left(T\right)$ for any decision table T with $|E\left(T\right)|=0$. If $|E\left(T\right)|=0$, then either the table T is empty or the table T contains one row. Let $T$ be empty. In this case, the decision tree that contains only one node labeled with 0 is considered as a decision tree for T. The only node of this tree is not realizable relative to T. Therefore, ${L}_{t}^{\left(1\right)}\left(T\right)={L}_{t}^{\left(3\right)}\left(T\right)=0$. Let T contain one row. In this case, the decision tree that contains only one node labeled with the decision attached to the row of T is a decision tree for T. The only node of this tree is realizable relative to T. Therefore, ${L}_{t}^{\left(1\right)}\left(T\right)={L}_{t}^{\left(3\right)}\left(T\right)=1$.
Let $n\ge 1$ and, for any decision table T with $|E\left(T\right)|\le n-1$, let the inequality ${L}_{t}^{\left(1\right)}\left(T\right)\le {L}_{t}^{\left(3\right)}\left(T\right)$ hold. Let T be a decision table with $|E\left(T\right)|=n$ and let T have $m\ge n$ columns labeled with the attributes ${f}_{1},\dots ,{f}_{m}$. Let, for definiteness, $E\left(T\right)=\{{f}_{1},\dots ,{f}_{n}\}$. If T is a degenerate table, then, as it is easy to show, ${L}_{t}^{\left(1\right)}\left(T\right)={L}_{t}^{\left(3\right)}\left(T\right)=1$. Let T be nondegenerate.
We denote by $\Gamma $ a decision tree of the type 3 for the table T for which ${L}_{t}(T,\Gamma )={L}_{t}^{\left(3\right)}\left(T\right)$ and $\Gamma $ has the minimum number of nodes among such decision trees. One can show that the root of $\Gamma $ is either labeled with an attribute from $E\left(T\right)$ or with a hypothesis over T that is admissible for T. We now prove that the tree $\Gamma $ can be transformed into a decision tree ${\Gamma}^{*}$ of the type 1 for the table T such that ${L}_{t}(T,{\Gamma}^{*})\le {L}_{t}^{\left(3\right)}\left(T\right)$.
Let the root of $\Gamma $ be labeled with an attribute ${f}_{i}\in E\left(T\right)$. Then, for each $\sigma \in E(T,{f}_{i})$, the root of $\Gamma $ has a child ${v}_{\sigma}$ such that $TS(\Gamma ,{v}_{\sigma})=T\{{f}_{i}=\sigma \}$ and the root of $\Gamma $ has no other children. Since ${f}_{i}\in E\left(T\right)$, $|E(T\{{f}_{i}=\sigma \})|\le n-1$. Using the inductive hypothesis, we obtain that there is a decision tree ${\Gamma}_{\sigma}$ of the type 1 for the table $T\{{f}_{i}=\sigma \}$ such that ${L}_{t}(T\{{f}_{i}=\sigma \},{\Gamma}_{\sigma})\le {L}_{t}^{\left(3\right)}\left(T\{{f}_{i}=\sigma \}\right)$. For each child ${v}_{\sigma}$ of the root of $\Gamma $, we replace the subtree of $\Gamma $ with the root ${v}_{\sigma}$ with the tree ${\Gamma}_{\sigma}$. As a result, we obtain a decision tree ${\Gamma}^{*}$ of the type 1 for the table T such that ${L}_{t}(T,{\Gamma}^{*})\le {L}_{t}(T,\Gamma )={L}_{t}^{\left(3\right)}\left(T\right)$.
Let the root of $\Gamma $ be labeled with a hypothesis $H=\{{f}_{1}={\delta}_{1},\dots ,{f}_{m}={\delta}_{m}\}$ over T that is admissible for T; see Figure 1, which depicts a prefix of the tree $\Gamma $. The root of $\Gamma $ has a child ${v}_{0}$ such that $TS(\Gamma ,{v}_{0})=TH=T\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}$. For each ${f}_{i}\in E\left(T\right)$ and each ${\sigma}_{i}\in E(T,{f}_{i})\setminus \left\{{\delta}_{i}\right\}$, the root of $\Gamma $ has a child ${v}_{i,{\sigma}_{i}}$ such that $TS(\Gamma ,{v}_{i,{\sigma}_{i}})=T\{{f}_{i}={\sigma}_{i}\}$. The root of $\Gamma $ has no other children.
We transform the tree $\Gamma $ into a decision tree ${\Gamma}^{*}$ of the type 1 for the table T; see Figure 2, which depicts a prefix of the tree ${\Gamma}^{*}$. For the node ${u}_{0}$ of the considered prefix, $TS({\Gamma}^{*},{u}_{0})=T\{{f}_{1}={\delta}_{1},\dots ,{f}_{n}={\delta}_{n}\}=TH$. For each ${f}_{i}\in E\left(T\right)$ and each ${\sigma}_{i}\in E(T,{f}_{i})\setminus \left\{{\delta}_{i}\right\}$, the node of this prefix labeled with the attribute ${f}_{i}$ has a child ${u}_{i,{\sigma}_{i}}$ such that $TS({\Gamma}^{*},{u}_{i,{\sigma}_{i}})=T\{{f}_{1}={\delta}_{1},\dots ,{f}_{i-1}={\delta}_{i-1},{f}_{i}={\sigma}_{i}\}$. It is clear that $TS({\Gamma}^{*},{u}_{i,{\sigma}_{i}})$ is a subtable of $TS(\Gamma ,{v}_{i,{\sigma}_{i}})$. By Lemma 9, ${L}_{t}^{\left(3\right)}\left(TS({\Gamma}^{*},{u}_{i,{\sigma}_{i}})\right)\le {L}_{t}^{\left(3\right)}\left(TS(\Gamma ,{v}_{i,{\sigma}_{i}})\right)$. It is also clear that $|E(TS({\Gamma}^{*},{u}_{i,{\sigma}_{i}}))|\le n-1$. Using the inductive hypothesis, we obtain that there is a decision tree ${\Gamma}_{i,{\sigma}_{i}}$ of the type 1 for the table $TS({\Gamma}^{*},{u}_{i,{\sigma}_{i}})$ such that ${L}_{t}(TS({\Gamma}^{*},{u}_{i,{\sigma}_{i}}),{\Gamma}_{i,{\sigma}_{i}})\le {L}_{t}^{\left(3\right)}\left(TS({\Gamma}^{*},{u}_{i,{\sigma}_{i}})\right)\le {L}_{t}^{\left(3\right)}\left(TS(\Gamma ,{v}_{i,{\sigma}_{i}})\right)$.
We now transform the prefix of the decision tree ${\Gamma}^{*}$ depicted in Figure 2 into a decision tree ${\Gamma}^{*}$ of the type 1 for the table T. First, we transform the node ${u}_{0}$ into a terminal node labeled with the number 0 if $({\delta}_{1},\dots ,{\delta}_{n})$ is not a row of T and labeled with the decision attached to $({\delta}_{1},\dots ,{\delta}_{n})$ if this tuple is a row of T. Next, for each ${f}_{i}\in E\left(T\right)$ and each ${\sigma}_{i}\in E(T,{f}_{i})\setminus \left\{{\delta}_{i}\right\}$, we replace the node ${u}_{i,{\sigma}_{i}}$ with the tree ${\Gamma}_{i,{\sigma}_{i}}$. It is clear that the obtained tree ${\Gamma}^{*}$ is a decision tree of the type 1 for the decision table T and ${L}_{t}(T,{\Gamma}^{*})\le {L}_{t}(T,\Gamma )={L}_{t}^{\left(3\right)}\left(T\right)$.
We proved that, for any decision table T, ${L}_{t}^{\left(1\right)}\left(T\right)\le {L}_{t}^{\left(3\right)}\left(T\right)$; hence, ${L}_{t}^{\left(1\right)}\left(T\right)={L}_{t}^{\left(3\right)}\left(T\right)={L}_{t}^{\left(5\right)}\left(T\right)$. □
10. Results of Experiments
We conducted experiments with eight decision tables from the UCI ML Repository [20].
Table 1 contains information about each of these decision tables: its name, the number of rows, and the number of attributes. For each of the considered four cost functions, each of the considered five types of decision trees, and each of the considered eight decision tables, we find the minimum cost of a decision tree of the given type for the given table.
For $n=3,\dots ,6$, we randomly generate 100 Boolean functions with n variables. We represent each Boolean function f with n variables ${x}_{1},\dots ,{x}_{n}$ as a decision table ${T}_{f}$ with n columns labeled with variables ${x}_{1},\dots ,{x}_{n}$ considered as attributes and with ${2}^{n}$ rows that are all possible ntuples of values of the variables. Each row is labeled with the decision that is the value of the function f on the corresponding ntuple. We consider decision trees for the table ${T}_{f}$ as decision trees computing the function f.
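The representation of a Boolean function as a decision table can be sketched as follows (the majority function is used purely as an illustrative example; the paper's experiments use randomly generated functions):

```python
from itertools import product

def table_of(f, n):
    """T_f: one row per n-tuple of variable values, labeled with f's value."""
    return [(bits, f(*bits)) for bits in product((0, 1), repeat=n)]

# illustrative example: the majority function of three variables
maj = lambda x1, x2, x3: int(x1 + x2 + x3 >= 2)
T_maj = table_of(maj, 3)
```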
For each of the considered four cost functions, each of the considered five types of decision trees, and each of the generated Boolean functions, using its decision table representation, we find the minimum cost of a decision tree of the given type computing this function.
The following remarks clarify some experimental results considered later.
From Proposition 10, it follows that ${L}_{t}^{\left(1\right)}\left(T\right)={L}_{t}^{\left(3\right)}\left(T\right)={L}_{t}^{\left(5\right)}\left(T\right)$ for any decision table T.
Let f be a Boolean function with $n\ge 1$ variables. Since each hypothesis over the decision table ${T}_{f}$ is proper, the following equalities hold: ${h}^{\left(2\right)}\left({T}_{f}\right)={h}^{\left(4\right)}\left({T}_{f}\right)$ and ${h}^{\left(3\right)}\left({T}_{f}\right)={h}^{\left(5\right)}\left({T}_{f}\right)$, and the analogous equalities hold for the cost functions ${L}$, ${L}_{t}$, and ${L}_{w}$.
10.1. Depth
In this section, we consider some results obtained in Reference [12]. Results of experiments with eight decision tables from Reference [20] and the depth are represented in Table 2. The first column contains the name of the considered decision table T. The last five columns contain values ${h}^{\left(1\right)}\left(T\right),\dots ,{h}^{\left(5\right)}\left(T\right)$ (minimum values for each decision table are in bold).
Decision trees with the minimum depth using attributes (type 1) are optimal for 5 decision tables; using hypotheses (type 2), for 4 tables; using attributes and hypotheses (type 3), for 8 tables; using proper hypotheses (type 4), for 3 tables; and using attributes and proper hypotheses (type 5), for 7 tables.
For the decision table soybean-small, we must use attributes to construct an optimal decision tree. For this table, it is enough to use only attributes. For the decision tables breast-cancer and nursery, we must use both attributes and hypotheses to construct optimal decision trees. For these tables, it is enough to use attributes and proper hypotheses. For the decision table tic-tac-toe, we must use both attributes and hypotheses to construct optimal decision trees. For this table, it is not enough to use attributes and proper hypotheses.
Results of experiments with Boolean functions and the depth are represented in Table 3. The first column contains the number of variables in the considered Boolean functions. The last five columns contain information about the values ${h}^{\left(1\right)},\dots ,{h}^{\left(5\right)}$ in the format ${}_{min}Avg_{max}$.
From the obtained results, it follows that, generally, the decision trees of the types 2 and 4 are better than the decision trees of the type 1, and the decision trees of the types 3 and 5 are better than the decision trees of the types 2 and 4.
10.2. Number of Realizable Nodes
In this section, we consider some results obtained in Reference [13]. Results of experiments with eight decision tables from Reference [20] and the number of realizable nodes are represented in Table 4. The first column contains the name of the considered decision table T. The last five columns contain values ${L}^{\left(1\right)}\left(T\right),\dots ,{L}^{\left(5\right)}\left(T\right)$ (minimum values for each decision table are in bold).
Decision trees with the minimum number of realizable nodes using attributes (type 1) are optimal for 4 decision tables, using hypotheses (type 2) are optimal for 0 tables, using attributes and hypotheses (type 3) are optimal for 8 tables, using proper hypotheses (type 4) are optimal for 0 tables, and using attributes and proper hypotheses (type 5) are optimal for 8 tables.
Decision trees of the types 3 and 5 can be slightly better than the decision trees of the type 1. Decision trees of the types 2 and 4 are far from optimal.
For the decision tables hayes-roth-data, soybean-small, tic-tac-toe, and zoo-data, we must use attributes to construct optimal decision trees; for these tables, attributes alone are enough. For the rest of the considered decision tables, we must use both attributes and hypotheses to construct optimal decision trees; for these tables, it is enough to use attributes and proper hypotheses.
Results of experiments with Boolean functions and the number of realizable nodes are represented in
Table 5. The first column contains the number of variables in the considered Boolean functions. The last five columns contain information about values
${L}^{\left(1\right)},\dots ,{L}^{\left(5\right)}$ in the format
${}_{\mathit{min}}{\mathit{Avg}}_{\mathit{max}}$.
From the obtained results, it follows that, generally, the decision trees of the types 3 and 5 are slightly better than the decision trees of the type 1, and the decision trees of the types 2 and 4 are far from optimal.
10.3. Number of Realizable Terminal Nodes
Results of experiments with eight decision tables from Reference [
20] and the number of realizable terminal nodes are represented in
Table 6. The first column contains the name of the considered decision table
T. The last five columns contain values
${L}_{t}^{\left(1\right)}\left(T\right),\dots ,{L}_{t}^{\left(5\right)}\left(T\right)$ (minimum values for each decision table are in bold).
Decision trees of the types 1, 3, and 5 are optimal for each of the considered tables. Decision trees of the types 2 and 4 are far from optimal.
Results of experiments with Boolean functions and the number of realizable terminal nodes are represented in
Table 7. The first column contains the number of variables in the considered Boolean functions. The last five columns contain information about values
${L}_{t}^{\left(1\right)},\dots ,{L}_{t}^{\left(5\right)}$ in the format
${}_{\mathit{min}}{\mathit{Avg}}_{\mathit{max}}$.
From the obtained results, it follows that, generally, the decision trees of the types 1, 3, and 5 are optimal, and the decision trees of the types 2 and 4 are far from optimal.
10.4. Number of Working Nodes
Results of experiments with eight decision tables from Reference [
20] and the number of working nodes are represented in
Table 8. The first column contains the name of the considered decision table
T. The last five columns contain values
${L}_{w}^{\left(1\right)}\left(T\right),\dots ,{L}_{w}^{\left(5\right)}\left(T\right)$ (minimum values for each decision table are in bold).
Decision trees with the minimum number of working nodes using attributes (type 1) are optimal for 2 decision tables, those using hypotheses (type 2) are optimal for 0 tables, those using attributes and hypotheses (type 3) are optimal for 8 tables, those using proper hypotheses (type 4) are optimal for 0 tables, and those using attributes and proper hypotheses (type 5) are optimal for 7 tables.
Decision trees of the types 3 and 5 can be slightly better than the decision trees of the type 1. Decision trees of the types 2 and 4 are far from optimal.
For all decision tables with the exception of soybean-small and zoo-data, we must use both attributes and hypotheses to construct optimal decision trees. Moreover, for tic-tac-toe, it is not enough to use attributes and proper hypotheses. For soybean-small and zoo-data, it is enough to use only attributes to construct optimal decision trees.
Results of experiments with Boolean functions and the number of working nodes are represented in
Table 9. The first column contains the number of variables in the considered Boolean functions. The last five columns contain information about values
${L}_{w}^{\left(1\right)},\dots ,{L}_{w}^{\left(5\right)}$ in the format
${}_{\mathit{min}}{\mathit{Avg}}_{\mathit{max}}$.
From the obtained results, it follows that, generally, the decision trees of the types 3 and 5 are better than the decision trees of the type 1, and the decision trees of the types 2 and 4 are far from optimal.
We can now sum up the results of the experiments. Generally, the decision trees of the types 3 and 5 are slightly better than the decision trees of the type 1, while the decision trees of the types 2 and 4 generally have too many nodes.
11. Conclusions
In this paper, we studied modified decision trees that use both queries based on a single attribute each and queries based on hypotheses about the values of all attributes. We designed dynamic programming algorithms for the minimization of four cost functions for such decision trees and discussed the results of computer experiments. The main result of the paper is that the use of hypotheses can decrease the complexity of decision trees and make them more suitable for knowledge representation. In the future, we are planning to compare the length and coverage of decision rules derived from different types of decision trees constructed by the dynamic programming algorithms. Unfortunately, the considered algorithms cannot be combined to optimize more than one cost function. In the future, we are also planning to consider two extensions of these algorithms: (i) sequential optimization relative to a number of cost functions and (ii) bicriteria optimization that allows us to construct, for some pairs of cost functions, the corresponding Pareto front.