Article

On the Depth of Decision Trees with Hypotheses

Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
Entropy 2022, 24(1), 116; https://doi.org/10.3390/e24010116
Submission received: 20 October 2021 / Revised: 6 January 2022 / Accepted: 10 January 2022 / Published: 12 January 2022
(This article belongs to the Special Issue Rough Set Theory and Entropy in Information Science)

Abstract

In this paper, based on the results of rough set theory, test theory, and exact learning, we investigate decision trees over infinite sets of binary attributes represented as infinite binary information systems. We define the notion of a problem over an information system and study three functions of the Shannon type, which characterize how, in the worst case, the minimum depth of a decision tree solving a problem depends on the number of attributes in the problem description. The considered three functions correspond to (i) decision trees using attributes, (ii) decision trees using hypotheses (an analog of equivalence queries from exact learning), and (iii) decision trees using both attributes and hypotheses. The first function has two possible types of behavior: logarithmic and linear (this result follows from more general results published by the author earlier). The second and the third functions have three possible types of behavior: constant, logarithmic, and linear (these results were published by the author earlier without the proofs that are given in the present paper). Based on the obtained results, we divide the set of all infinite binary information systems into four complexity classes. In each class, the type of behavior for each of the considered three functions does not change.

1. Introduction

Decision trees are studied in different areas of computer science, in particular in exact learning [1], rough set theory [2,3,4], and test theory [5]. In some sense, these theories deal with dual objects: for example, membership queries from exact learning correspond to attributes from test theory and rough set theory. In contrast to test theory and rough set theory, in exact learning, besides membership queries, equivalence queries are also considered.
We extend the model considered in test theory and rough set theory by adding the notion of a hypothesis, which is an analog of an equivalence query. Papers [6,7,8,9,10] are related mainly to the experimental study of decision trees with hypotheses. The present paper contains a theoretical study of the depth of decision trees with hypotheses.
An infinite binary information system is a pair U = (A, F), where A is an infinite set of elements and F is an infinite set of functions (attributes) from A to {0, 1}. A problem over U is given by a finite number of attributes f_1, …, f_n from F: for a ∈ A, we should find the tuple (f_1(a), …, f_n(a)). To solve this problem, we can use decision trees with two types of queries. We can ask about the value of an attribute f_i ∈ {f_1, …, f_n}; as a result, we obtain an answer of the kind f_i(x) = δ, where δ ∈ {0, 1}. We can also ask if a hypothesis f_1(x) = δ_1, …, f_n(x) = δ_n is true, where δ_1, …, δ_n ∈ {0, 1}; either we obtain a confirmation or a counterexample of the form f_i(x) = ¬δ_i.
The depth of decision trees with hypotheses can be essentially less than the depth of decision trees using only attributes. As an example, we consider the problem of the computation of the disjunction x_1 ∨ … ∨ x_n. The minimum depth of a decision tree solving this problem using only the attributes x_1, …, x_n is equal to n. However, the minimum depth of a decision tree with hypotheses solving this problem is equal to one: it is enough to ask only about the hypothesis x_1 = 0, …, x_n = 0. If it is true, then the considered disjunction is equal to zero. Otherwise, it is equal to one.
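To make this contrast concrete, the following minimal sketch (our illustration, not part of the paper; the oracle interface is an assumption introduced here) simulates both strategies for the disjunction problem: attribute queries reveal one bit each, while a hypothesis query either confirms all n equations or returns the index of one counterexample.

```python
# Illustration only: worst case n attribute queries vs. one hypothesis query
# for computing the disjunction x_1 v ... v x_n of a hidden element a.

def solve_with_attributes(attribute_query, n):
    """Ask for bits until a 1 is seen; in the worst case, n queries."""
    for i in range(n):
        if attribute_query(i) == 1:
            return 1
    return 0

def solve_with_one_hypothesis(hypothesis_query, n):
    """Ask whether x_1 = 0, ..., x_n = 0; a single query always suffices."""
    answer = hypothesis_query([0] * n)  # True, or a counterexample index
    return 0 if answer is True else 1

# A hidden element and the two oracles it induces.
a = [0, 0, 1, 0]
attribute_query = lambda i: a[i]
hypothesis_query = lambda delta: True if a == delta else next(
    i for i in range(len(a)) if a[i] != delta[i])

assert solve_with_attributes(attribute_query, 4) == 1
assert solve_with_one_hypothesis(hypothesis_query, 4) == 1
```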
Based on the results of exact learning, rough set theory, and test theory [1,11,12,13,14,15,16], we study, for an arbitrary infinite binary information system, three functions of the Shannon type that characterize the growth in the worst case of the minimum depth of a decision tree solving a problem with the growth of the number of attributes in the problem description. The considered three functions correspond to the following three cases:
(i)
Only attributes are used in decision trees;
(ii)
Only hypotheses are used in decision trees;
(iii)
Both attributes and hypotheses are used in decision trees.
We show that the first function has two possible types of behavior: logarithmic and linear. The second and third functions have three possible types of behavior: constant, logarithmic, and linear. Bounds for case (i) can be derived from more general results obtained in [15,16]. Results related to cases (ii) and (iii) were presented in the conference paper [17] without proofs. In the present paper, we give complete proofs for cases (ii) and (iii). We also investigate the joint behavior of these three functions and describe four complexity classes of infinite binary information systems; these results are completely new.
The obtained results allow us to understand the difference in time complexity between conventional decision trees, which use only queries based on a single attribute each, and decision trees with hypotheses. Moreover, we now know which combinations of types of behavior of the three Shannon-type functions are possible for an arbitrary infinite binary information system, and we know the criteria for each combination.
This paper consists of six sections. In Section 2 and Section 3, we consider the basic notions and main results. Section 4 and Section 5 contain proofs of the main results, and Section 6 gives a short conclusion.

2. Basic Notions

Let A be a set of elements and F be a set of functions from A to {0, 1}. Functions from F are called attributes, and the pair U = (A, F) is called a binary information system (this notion is close to the notion of an information system proposed by Pawlak [18]). If A and F are infinite sets, then the pair U = (A, F) is called an infinite binary information system.
A problem over U is an arbitrary n-tuple z = (f_1, …, f_n), where n ∈ N, N is the set of natural numbers {1, 2, …}, and f_1, …, f_n ∈ F. The problem z may be interpreted as a problem of searching for the tuple z(a) = (f_1(a), …, f_n(a)) for an arbitrary a ∈ A. The number dim z = n is called the dimension of the problem z. Denote F(z) = {f_1, …, f_n}. We denote by P(U) the set of problems over U.
A system of equations over U is an arbitrary equation system of the kind

{g_1(x) = δ_1, …, g_m(x) = δ_m},

where m ∈ N ∪ {0}, g_1, …, g_m ∈ F, and δ_1, …, δ_m ∈ {0, 1} (if m = 0, then the considered equation system is empty). This equation system is called a system of equations over z if g_1, …, g_m ∈ F(z). The considered equation system is called consistent (on A) if its set of solutions on A is nonempty. The set of solutions of the empty equation system coincides with A.
As algorithms for solving the problem z, we consider decision trees with two types of queries. We can choose an attribute f_i ∈ F(z) and ask about its value. This query has two possible answers: {f_i(x) = 0} and {f_i(x) = 1}. We can also formulate a hypothesis over z in the form H = {f_1(x) = δ_1, …, f_n(x) = δ_n}, where δ_1, …, δ_n ∈ {0, 1}, and ask about this hypothesis. This query has n + 1 possible answers: H, {f_1(x) = ¬δ_1}, …, {f_n(x) = ¬δ_n}, where ¬1 = 0 and ¬0 = 1. The first answer means that the hypothesis is true. The other answers are counterexamples.
A decision tree over z is a marked finite directed tree with a root in which:
  • Each terminal node is labeled with an n-tuple from the set {0, 1}^n;
  • Each node that is not terminal (such nodes are called working) is labeled with an attribute from the set F(z) or with a hypothesis over z;
  • If a working node is labeled with an attribute f_i from F(z), then two edges leave this node, and they are labeled with the systems of equations {f_i(x) = 0} and {f_i(x) = 1}, respectively;
  • If a working node is labeled with a hypothesis

    H = {f_1(x) = δ_1, …, f_n(x) = δ_n}

    over z, then n + 1 edges leave this node, and they are labeled with the systems of equations H, {f_1(x) = ¬δ_1}, …, {f_n(x) = ¬δ_n}, respectively.
Let Γ be a decision tree over z. A complete path in Γ is an arbitrary directed path from the root to a terminal node in Γ. We now define an equation system S(ξ) over U associated with the complete path ξ. If there are no working nodes in ξ, then S(ξ) is the empty system. Otherwise, S(ξ) is the union of the equation systems assigned to the edges of the path ξ. We denote by A(ξ) the set of solutions on A of the system of equations S(ξ) (if this system is empty, then its solution set is equal to A).
We say that a decision tree Γ over z solves the problem z relative to U if, for each element a ∈ A and for each complete path ξ in Γ such that a ∈ A(ξ), the terminal node of the path ξ is labeled with the tuple z(a).
We now consider an equivalent definition of a decision tree solving a problem. Denote by Δ_U(z) the set of tuples (δ_1, …, δ_n) ∈ {0, 1}^n such that the system of equations {f_1(x) = δ_1, …, f_n(x) = δ_n} is consistent. The set Δ_U(z) is the set of all possible solutions to the problem z. Let Δ ⊆ Δ_U(z), f_{i_1}, …, f_{i_m} ∈ {f_1, …, f_n}, and σ_1, …, σ_m ∈ {0, 1}. Denote by

Δ(f_{i_1}, σ_1) … (f_{i_m}, σ_m)

the set of all n-tuples (δ_1, …, δ_n) ∈ Δ for which δ_{i_1} = σ_1, …, δ_{i_m} = σ_m.
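The following small sketch (our illustration; the toy universe and attributes are assumptions chosen for the example) computes Δ_U(z) and the filtering operation Δ(f_{i_1}, σ_1) … (f_{i_m}, σ_m) by brute force.

```python
# Illustration only: Delta_U(z) and its filtering on a toy universe.

def delta_U(attributes, universe):
    """All n-tuples (f_1(a), ..., f_n(a)) realized by some a in A."""
    return {tuple(f(a) for f in attributes) for a in universe}

def restrict(tuples, constraints):
    """Keep the tuples whose i-th entry equals sigma for each (i, sigma)."""
    return {t for t in tuples if all(t[i] == s for i, s in constraints)}

A = range(8)
z = [lambda a: a % 2, lambda a: (a >> 1) % 2, lambda a: int(a >= 6)]
D = delta_U(z, A)                     # the set of all possible solutions
print(restrict(D, [(0, 1), (2, 0)]))  # {(1, 0, 0), (1, 1, 0)}: f_1 = 1, f_3 = 0
```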
Let Γ be a decision tree over the problem z. We associate with each complete path ξ in the tree Γ a word π(ξ) in the alphabet {(f_i, δ) : f_i ∈ F(z), δ ∈ {0, 1}}. If the equation system S(ξ) is empty, then π(ξ) is the empty word. If S(ξ) = {f_{i_1}(x) = σ_1, …, f_{i_m}(x) = σ_m}, then π(ξ) = (f_{i_1}, σ_1) … (f_{i_m}, σ_m). The decision tree Γ over z solves the problem z relative to U if, for each complete path ξ in Γ, the set Δ_U(z)π(ξ) contains at most one tuple, and if this set contains exactly one tuple, then this tuple is assigned to the terminal node of the path ξ.
As the time complexity of a decision tree Γ, we consider its depth h(Γ), that is, the maximum number of working nodes in a complete path in the tree Γ.
Let z ∈ P(U). We denote by h_U^(1)(z) the minimum depth of a decision tree over z that solves z relative to U and uses only attributes from F(z). We denote by h_U^(2)(z) the minimum depth of a decision tree over z that solves z relative to U and uses only hypotheses over z. We denote by h_U^(3)(z) the minimum depth of a decision tree over z that solves z relative to U and uses both attributes from F(z) and hypotheses over z.
For i = 1, 2, 3, we define a function of the Shannon type h_U^(i)(n) that characterizes the dependence of h_U^(i)(z) on dim z in the worst case. Let i ∈ {1, 2, 3} and n ∈ N. Then:

h_U^(i)(n) = max{h_U^(i)(z) : z ∈ P(U), dim z ≤ n}.

3. Main Results

Let U = (A, F) be an infinite binary information system and r ∈ N. The information system U is called r-reduced if, for each system of equations over U that is consistent on A, there exists a subsystem of this system that has the same set of solutions and contains at most r equations. We denote by R the set of infinite binary information systems each of which is r-reduced for some r ∈ N.
The next theorem follows from the results obtained in [15], where we considered closed classes of test tables (decision tables). It also follows from the results obtained in [16], where we considered the weighted depth of decision trees.
Theorem 1.
Let U be an infinite binary information system. Then, the following statements hold:
(a) 
If U ∈ R, then h_U^(1)(n) = Θ(log n);
(b) 
If U ∉ R, then h_U^(1)(n) = n for any n ∈ N.
A subset {f_1, …, f_m} of F is called independent if, for any δ_1, …, δ_m ∈ {0, 1}, the system of equations {f_1(x) = δ_1, …, f_m(x) = δ_m} is consistent on the set A. The empty set of attributes is independent by definition. We now define the independence dimension, or I-dimension, I(U) of the information system U (this notion is similar to the notion of the independence number of a family of sets considered by Naiman and Wynn in [19]). If, for each m ∈ N, the set F contains an independent subset of cardinality m, then I(U) = ∞. Otherwise, I(U) is the maximum cardinality of an independent subset of the set F. We denote by D the set of infinite binary information systems with finite independence dimension.
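For finite toy systems, independence can be checked by brute force: a set of m attributes is independent exactly when all 2^m value patterns are realized. The sketch below is our illustration; the example attributes are assumptions, not taken from the paper.

```python
# Illustration only: check independence of a set of attributes over a
# finite universe by enumerating the realized value patterns.

def is_independent(attrs, universe):
    realized = {tuple(f(a) for f in attrs) for a in universe}
    return len(realized) == 2 ** len(attrs)

A = range(8)
bits = [lambda a, i=i: (a >> i) & 1 for i in range(3)]    # bits of a counter
thresholds = [lambda a, i=i: int(a > i) for i in (2, 4)]  # l_2 and l_4
print(is_independent(bits, A))        # True: the I-dimension is at least 3
print(is_independent(thresholds, A))  # False: l_2(x) = 0 forces l_4(x) = 0
```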
Let U = (A, F) be a binary information system, which is not necessarily infinite, f ∈ F, and δ ∈ {0, 1}. Denote

A(f, δ) = {a : a ∈ A, f(a) = δ}.
We now define inductively the notion of a k-information system, k ∈ N ∪ {0}. The binary information system U is called a 0-information system if all attributes from F are constant on the set A. Let, for some k ∈ N ∪ {0}, the notion of an m-information system be defined for m = 0, …, k. The binary information system U is called a (k + 1)-information system if it is not an m-information system for m = 0, …, k and, for any f ∈ F, there exist numbers δ ∈ {0, 1} and m ∈ {0, …, k} such that the information system (A(f, δ), F) is an m-information system. It is easy to show by induction on k that if U = (A, F) is a k-information system, then any system U′ = (A′, F) with A′ ⊆ A is an l-information system for some l ≤ k. We denote by C the set of infinite binary information systems for each of which there exists k ∈ N such that the considered system is a k-information system. The following theorem was presented in [17] without proof.
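For a finite toy system, the least k for which (A, F) is a k-information system can be computed directly from the inductive definition. The recursive sketch below is our illustration (exponential time, suitable for small examples only); the indicator attributes in the demo are assumptions chosen for the example.

```python
# Illustration only: the least k such that (A, F) is a k-information system.

def rank(A, F):
    """0 if every attribute is constant on A; otherwise 1 plus the maximum,
    over attributes f, of the smaller rank of the restrictions A(f, delta)."""
    A = frozenset(A)
    if all(len({f(a) for a in A}) <= 1 for f in F):
        return 0
    worst = 0
    for f in F:
        branches = [frozenset(a for a in A if f(a) == d) for d in (0, 1)]
        # A branch equal to A itself cannot witness a smaller rank; skip it.
        worst = max(worst, min(rank(B, F) for B in branches if B != A))
    return 1 + worst

A = range(1, 6)
p = [lambda a, i=i: int(a == i) for i in A]  # indicator attributes
print(rank(A, p))  # 1: each answer p_i(x) = 1 leaves a one-element set
```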
Theorem 2.
Let U be an infinite binary information system. Then, the following statements hold:
(a) 
If U ∈ C, then h_U^(2)(n) = O(1) and h_U^(3)(n) = O(1);
(b) 
If U ∈ D \ C, then h_U^(2)(n) = Θ(log n), h_U^(3)(n) = Ω(log n / log log n), and h_U^(3)(n) = O(log n);
(c) 
If U ∉ D, then h_U^(2)(n) = n and h_U^(3)(n) = n for any n ∈ N.
Let U be an infinite binary information system. We now consider the joint behavior of the functions h_U^(1)(n), h_U^(2)(n), and h_U^(3)(n). It depends on whether the information system U belongs to the sets R, D, and C. We associate with the information system U its indicator vector ind(U) = (c_1, c_2, c_3) ∈ {0, 1}^3, in which c_1 = 1 if and only if U ∈ R, c_2 = 1 if and only if U ∈ D, and c_3 = 1 if and only if U ∈ C.
Theorem 3.
For any infinite binary information system, its indicator vector coincides with one of the rows of Table 1. Each row of Table 1 is the indicator vector of some infinite binary information system.
For i = 1, 2, 3, 4, we denote by V_i the class of all infinite binary information systems whose indicator vector coincides with the ith row of Table 1. Table 2 summarizes Theorems 1–3. The first column contains the name of the complexity class V_i. The next three columns describe the indicator vector of the information systems from this class. The last three columns contain information about the behavior of the functions h_U^(1)(n), h_U^(2)(n), and h_U^(3)(n) for information systems from the class V_i.

4. Proof of Theorem 2

We precede the proof of Theorem 2 with two lemmas.
Let d ∈ N. A d-complete tree over the information system U = (A, F) is a marked finite directed tree with a root in which:
  • Each terminal node is not labeled;
  • Each nonterminal node is labeled with an attribute f ∈ F; two edges leave this node, and they are labeled with the systems of equations {f(x) = 0} and {f(x) = 1}, respectively;
  • The length of each complete path (a path from the root to a terminal node) is equal to d;
  • For each complete path ξ, the equation system S(ξ), which is the union of the equation systems assigned to the edges of the path ξ, is consistent.
Let G be a d-complete tree over U and F(G) be the set of all attributes attached to the nonterminal nodes of the tree G. The number of nonterminal nodes in G is equal to 2^0 + 2^1 + … + 2^{d−1} = 2^d − 1. Therefore, |F(G)| ≤ 2^d.
The results mentioned in the following lemma are obtained by methods similar to those used by Littlestone [12], Maass and Turán [13], and Angluin [11].
Lemma 1.
Let U = (A, F) be a binary information system, d ∈ N, G be a d-complete tree over U, and z be a problem over U such that F(G) ⊆ F(z). Then:
(a) 
h_U^(2)(z) ≥ d;
(b) 
h_U^(3)(z) ≥ d / log_2(2d).
Proof. 
(a) We prove the inequality h_U^(2)(z) ≥ d by induction on d. Let d = 1. Then, the tree G has only one nonterminal node, which is labeled with an attribute f that is not constant on A. Therefore, |Δ_U(z)| ≥ 2 and h_U^(2)(z) ≥ 1. Let, for t ∈ N and for any natural d, 1 ≤ d ≤ t, the considered statement hold. Assume now that d = t + 1, G is a d-complete tree over U, z is a problem over U such that F(G) ⊆ F(z), and Γ is a decision tree over z of the minimum depth that solves the problem z and uses only hypotheses. Let f be the attribute attached to the root of the tree G and H be the hypothesis attached to the root of the decision tree Γ. Then, there is an edge that leaves the root of Γ and is labeled with the equation system {f(x) = δ}, where the equation f(x) = ¬δ belongs to the hypothesis H. This edge enters the root of a subtree of Γ, which we denote by Γ_f. There is an edge that leaves the root of G and is labeled with the equation system {f(x) = δ}. This edge enters the root of a subtree of G, which we denote by G_δ. One can show that the decision tree Γ_f solves the problem z relative to the information system U′ = (A(f, δ), F) and that G_δ is a t-complete tree over U′. It is clear that F(G_δ) ⊆ F(z). Using the inductive hypothesis, we obtain h(Γ_f) ≥ t. Therefore, h(Γ) ≥ t + 1 = d and h_U^(2)(z) ≥ d.
(b) We now prove the inequality h_U^(3)(z) ≥ d / log_2(2d). Let z = (f_1, …, f_n) and Γ be a decision tree over z of the minimum depth that solves the problem z and uses both attributes and hypotheses. The d-complete tree G has 2^d complete paths ξ_1, …, ξ_{2^d}. For i = 1, …, 2^d, we denote by a_i a solution of the equation system S(ξ_i). Denote B = {a_1, …, a_{2^d}}. We now show that the decision tree Γ contains a complete path whose length is at least d / log_2(2d). We describe the process of the construction of this path beginning with the root of Γ.
Let the root of Γ be labeled with an attribute f_{i_0}. For δ ∈ {0, 1}, we denote by B_δ the set of solutions on B of the equation system {f_{i_0}(x) = δ} and choose σ ∈ {0, 1} for which |B_σ| = max{|B_0|, |B_1|}. It is clear that |B_σ| ≥ |B|/2 ≥ |B|/(2d). In the considered case, the beginning of the constructed path in Γ is the root of Γ, the edge that leaves the root and is labeled with the equation system {f_{i_0}(x) = σ}, and the node that this edge enters.
Let us assume now that the root of Γ is labeled with a hypothesis H = {f_1(x) = δ_1, …, f_n(x) = δ_n}. We denote by ξ_H the complete path in G for which the system of equations S(ξ_H) is a subsystem of H. Let the nonterminal nodes of the complete path ξ_H be labeled with the attributes f_{i_1}, …, f_{i_d}. For j = 1, …, d, we denote by B_j the set of solutions on B of the equation system {f_{i_j}(x) = ¬δ_{i_j}}. It is clear that |B_1| + … + |B_d| ≥ |B| − 1. Therefore, there exists l ∈ {1, …, d} such that |B_l| ≥ (|B| − 1)/d ≥ |B|/(2d). In the considered case, the beginning of the constructed path in Γ is the root of Γ, the edge that leaves the root and is labeled with the equation system {f_{i_l}(x) = ¬δ_{i_l}}, and the node that this edge enters.
We continue the construction of the complete path in Γ in the same way, so that after the tth query at least |B|/(2d)^t elements from B remain. The construction cannot stop while |B|/(2d)^t > 1, i.e., while t log_2(2d) < log_2 |B|. Since |B| = 2^d, we have h(Γ) ≥ t ≥ d / log_2(2d) and h_U^(3)(z) ≥ d / log_2(2d). □
Lemma 2.
Let U = (A, F) be a binary information system, k ∈ N ∪ {0}, and let U not be an m-information system for m = 0, …, k. Then, there exists a (k + 1)-complete tree over U.
Proof. 
We prove the considered statement by induction on k. Let k = 0. In this case, U is not a 0-information system. Then, there exists an attribute f ∈ F that is not constant on A. Using this attribute, it is easy to construct a 1-complete tree over U.
Let the considered statement hold for some k ≥ 0. We now show that it also holds for k + 1. Let U = (A, F) be a binary information system that is not an m-information system for m = 0, …, k + 1. Then, there exists an attribute f ∈ F such that, for any δ ∈ {0, 1}, the information system U_δ = (A(f, δ), F) is not an m-information system for m = 0, …, k. Using the inductive hypothesis, we conclude that, for any δ ∈ {0, 1}, there exists a (k + 1)-complete tree G_δ over U_δ. Denote by G a directed tree with a root in which the root is labeled with the attribute f and, for any δ ∈ {0, 1}, there is an edge that leaves the root, is labeled with the equation system {f(x) = δ}, and enters the root of the tree G_δ. One can show that the tree G is a (k + 2)-complete tree over U. □
Proof of Theorem 2.
It is clear that h_U^(3)(z) ≤ h_U^(2)(z) for any problem z over U. Therefore, h_U^(3)(n) ≤ h_U^(2)(n) for any n ∈ N.
(a) Let k ∈ N ∪ {0}. We now show by induction on k that, for each binary k-information system U (not necessarily infinite) and for each problem z over U, the inequality h_U^(2)(z) ≤ k holds. Let U = (A, F) be a binary 0-information system and z be a problem over U. Since all attributes from F(z) are constant on A, the set Δ_U(z) contains only one tuple. Therefore, the decision tree consisting of a single node labeled with this tuple solves the problem z relative to U, and h_U^(2)(z) = 0.
Let k ∈ N ∪ {0} and, for each m, 0 ≤ m ≤ k, the considered statement hold. Let us show that it holds for k + 1. Let U = (A, F) be a binary (k + 1)-information system and z = (f_1, …, f_n) be a problem over U. For i = 1, …, n, choose a number δ_i ∈ {0, 1} such that the information system (A(f_i, ¬δ_i), F) is an m_i-information system, where 0 ≤ m_i ≤ k. Using the inductive hypothesis, we conclude that, for i = 1, …, n, there is a decision tree Γ_i over z that uses only hypotheses, solves the problem z over (A(f_i, ¬δ_i), F), and has depth at most m_i. We denote by Γ a decision tree in which the root is labeled with the hypothesis H = {f_1(x) = δ_1, …, f_n(x) = δ_n}, the edge leaving the root and labeled with H enters a terminal node labeled with the tuple (δ_1, …, δ_n), and, for i = 1, …, n, the edge leaving the root and labeled with {f_i(x) = ¬δ_i} enters the root of the tree Γ_i. One can show that Γ solves the problem z relative to U and h(Γ) ≤ k + 1. Therefore, h_U^(2)(z) ≤ k + 1 for any problem z over U.
Let U ∈ C. Then, U is a k-information system for some natural k, and, for each problem z over U, we have h_U^(3)(z) ≤ h_U^(2)(z) ≤ k. Therefore, h_U^(2)(n) = O(1) and h_U^(3)(n) = O(1).
(b) Let U = (A, F) ∈ D \ C. First, we show that h_U^(2)(n) = O(log n). Let z = (f_1, …, f_n) be an arbitrary problem over U. From Lemma 5.1 of [16], it follows that |Δ_U(z)| ≤ (4n)^{I(U)}. The proof of this lemma is based on results similar to the ones obtained by Sauer [20] and Shelah [21]. We consider a decision tree Γ over z that solves z relative to U and uses only hypotheses. This tree is constructed by the halving algorithm [1,12]. We describe the work of this tree for an arbitrary element a from A. Set Δ = Δ_U(z). If |Δ| = 1, then the only n-tuple in Δ is the solution z(a) of the problem z for the element a. Let |Δ| ≥ 2. For i = 1, …, n, we denote by δ_i a number from {0, 1} such that |Δ(f_i, δ_i)| ≥ |Δ(f_i, ¬δ_i)|. The root of Γ is labeled with the hypothesis H = {f_1(x) = δ_1, …, f_n(x) = δ_n}. After this query, either the problem z is solved (if the answer is H) or we at least halve the number of tuples in the set Δ (if the answer is a counterexample {f_i(x) = ¬δ_i}). In the latter case, set Δ = Δ_U(z)(f_i, ¬δ_i). The decision tree Γ continues to work with the element a and the set of n-tuples Δ in the same way. Let, during the work with the element a, the considered decision tree make q queries. After the (q − 1)th query, the number of remaining n-tuples in the set Δ is at least two and at most (4n)^{I(U)}/2^{q−1}. Therefore, 2^q ≤ (4n)^{I(U)} and q ≤ I(U) log_2(4n). Thus, during the processing of the element a, the decision tree Γ makes at most I(U) log_2(4n) queries. Since a is an arbitrary element of A, the depth of Γ is at most I(U) log_2(4n). Since z is an arbitrary problem over U, we obtain h_U^(2)(n) = O(log n). Therefore, h_U^(3)(n) = O(log n).
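The following sketch (our illustration; the oracle returning either True or a counterexample index is the same assumed interface as in the earlier sketches) implements the halving strategy on an explicit set of candidate tuples: each hypothesis takes the majority value in every coordinate, so any counterexample removes at least half of the candidates.

```python
# Illustration only: the halving strategy with hypothesis queries.

def solve_by_halving(candidates, hypothesis_query):
    """candidates: the set Delta_U(z) of possible solution tuples.
    hypothesis_query(delta) returns True or a counterexample index i."""
    cands = set(candidates)
    while len(cands) > 1:
        n = len(next(iter(cands)))
        # Majority value in each coordinate (ties broken toward 1).
        delta = tuple(int(2 * sum(t[i] for t in cands) >= len(cands))
                      for i in range(n))
        answer = hypothesis_query(delta)
        if answer is True:
            return delta
        i = answer  # counterexample f_i(x) = (not delta_i)
        cands = {t for t in cands if t[i] != delta[i]}  # the minority half
    return next(iter(cands))

hidden = (1, 0, 1)
cands = {(0, 0, 0), (1, 0, 1), (1, 1, 1), (0, 1, 0)}
oracle = lambda d: True if d == hidden else next(
    i for i in range(3) if hidden[i] != d[i])
print(solve_by_halving(cands, oracle))  # (1, 0, 1)
```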
Using Lemma 2 and the relation U ∉ C, we obtain that, for any d ∈ N, there exists a d-complete tree G_d over U. Let F(G_d) = {f_1, …, f_{n_d}}. We know that n_d ≤ 2^d. Denote z_d = (f_1, …, f_{n_d}). From Lemma 1, it follows that h_U^(2)(z_d) ≥ d and h_U^(3)(z_d) ≥ d / log_2(2d). As a result, we have h_U^(2)(2^d) ≥ d and h_U^(3)(2^d) ≥ d / log_2(2d). Let n ∈ N and n ≥ 8. Then, there exists d ∈ N such that 2^d ≤ n < 2^{d+1}. We have d > log_2 n − 1, h_U^(2)(n) ≥ log_2 n − 1, h_U^(2)(n) = Ω(log n), and h_U^(2)(n) = Θ(log n). It is easy to show that the function x / log_2(2x) is nondecreasing for x ≥ 2. Therefore, h_U^(3)(n) ≥ (log_2 n − 1) / log_2(2(log_2 n − 1)) and h_U^(3)(n) = Ω(log n / log log n).
(c) Let U = (A, F) ∉ D. We now consider an arbitrary problem z = (f_1, …, f_n) over U and a decision tree over z that uses only hypotheses and solves the problem z over U in the following way. For a given element a ∈ A, the first query is about the hypothesis H_1 = {f_1(x) = 1, …, f_n(x) = 1}. If the answer is H_1, then the problem z is solved for the element a. If, for some i ∈ {1, …, n}, the answer is {f_i(x) = 0}, then the second query is about the hypothesis H_2 obtained from H_1 by replacing the equality f_i(x) = 1 with the equality f_i(x) = 0, and so on. It is clear that, after at most n queries, the problem z will be solved for the element a. Thus, h_U^(2)(z) ≤ n and h_U^(3)(z) ≤ n. Since z is an arbitrary problem over U, we have h_U^(2)(n) ≤ n and h_U^(3)(n) ≤ n for any n ∈ N.
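A sketch of this upper-bound strategy (our illustration, with the same assumed oracle interface as above): start from the all-ones hypothesis and set to zero exactly the coordinate returned as a counterexample; each coordinate is corrected at most once.

```python
# Illustration only: at most n hypothesis queries suffice.

def solve_by_repair(n, hypothesis_query):
    delta = [1] * n
    fixed = set()
    while len(fixed) < n:            # once every bit is known, stop asking
        answer = hypothesis_query(tuple(delta))
        if answer is True:
            return tuple(delta)
        delta[answer] = 0            # counterexample f_i(x) = 0: bit i is known
        fixed.add(answer)
    return tuple(delta)

hidden = (0, 1, 0, 0)
oracle = lambda d: True if d == hidden else next(
    i for i in range(4) if hidden[i] != d[i])
print(solve_by_repair(4, oracle))    # (0, 1, 0, 0)
```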
Let n ∈ N. Since U ∉ D, there exist attributes f_1, …, f_n ∈ F such that, for any (δ_1, …, δ_n) ∈ {0, 1}^n, the equation system {f_1(x) = δ_1, …, f_n(x) = δ_n} is consistent on A. We now consider the problem z = (f_1, …, f_n) and an arbitrary decision tree Γ over z that solves the problem z over U and uses both attributes and hypotheses. Let us show that h(Γ) ≥ n. If n = 1, then the considered inequality holds since |Δ_U(z)| ≥ 2. Let n ≥ 2. It is easy to show that an equation system over z is inconsistent if and only if it contains the equations f_i(x) = 0 and f_i(x) = 1 for some i ∈ {1, …, n}. For each node v of the decision tree Γ, we denote by S_v the union of the systems of equations attached to the edges in the path from the root of Γ to v. A node v of Γ is called consistent if the equation system S_v is consistent.
We now construct a complete path ξ in the decision tree Γ all of whose nodes are consistent. We start from the root, which is a consistent node. Let the path reach a consistent node v of Γ. If v is a terminal node, then the path ξ is constructed. Let v be a working node labeled with an attribute f_i ∈ F(z). Then, there exists δ ∈ {0, 1} for which the system of equations S_v ∪ {f_i(x) = δ} is consistent, and the path ξ passes through the edge that leaves v and is labeled with the system of equations {f_i(x) = δ}. Let v be labeled with a hypothesis H = {f_1(x) = δ_1, …, f_n(x) = δ_n}. If there exists i ∈ {1, …, n} such that the system of equations S_v ∪ {f_i(x) = ¬δ_i} is consistent, then the path ξ passes through the edge that leaves v and is labeled with the system of equations {f_i(x) = ¬δ_i}. Otherwise, S_v = H, and the path ξ passes through the edge that leaves v and is labeled with the system of equations H.
Suppose first that all edges in the path ξ are labeled with systems of equations containing one equation each. Since all nodes of ξ are consistent, the equation system S(ξ) is consistent. We now show that S(ξ) contains at least n equations. Assume that this system contains fewer than n equations. Then, the set Δ_U(z)π(ξ) contains more than one n-tuple, which is impossible. Therefore, the length of the path ξ is at least n. Suppose now that there are edges in ξ labeled with hypotheses, and let the first such edge leave a node v labeled with a hypothesis H. Then, S_v = H, and the length of ξ is at least n. Therefore, h(Γ) ≥ n, h_U^(3)(z) ≥ n, and h_U^(2)(z) ≥ n. As a result, we obtain h_U^(3)(n) ≥ n and h_U^(2)(n) ≥ n. Thus, h_U^(2)(n) = n and h_U^(3)(n) = n for any n ∈ N. □

5. Proof of Theorem 3

First, we prove several auxiliary statements.
Proposition 1.
R ⊆ D.
Proof. 
Let U ∈ R. By Theorem 1, h_U^(1)(n) = Θ(log n). Let us assume that U ∉ D. Then, for any n ∈ N, there exists a problem z = (f_1, …, f_n) over U such that |Δ_U(z)| = 2^n. Let Γ be a decision tree over z that solves the problem z relative to U and uses only attributes. Then, Γ must have at least 2^n terminal nodes. One can show that the number of terminal nodes in the tree Γ is at most 2^{h(Γ)}. Then, 2^n ≤ 2^{h(Γ)}, h(Γ) ≥ n, and h_U^(1)(z) ≥ n. Therefore, h_U^(1)(n) ≥ n for any n ∈ N, which is impossible. Thus, R ⊆ D. □
Proposition 2.
C ⊆ D.
Proof. 
Let U ∈ C. By Theorem 2, h_U^(2)(n) = O(1). Let us assume that U ∉ D. Then, by Theorem 2, h_U^(2)(n) = n for any n ∈ N, which is impossible. Therefore, C ⊆ D. □
Proposition 3.
R ∩ C = ∅.
Proof. 
Assume the contrary: R ∩ C ≠ ∅. Let U = (A, F) ∈ R ∩ C and r, k ∈ N be such that U is an r-reduced information system and a k-information system. We now consider an arbitrary problem z = (f_1, …, f_n) over U and describe a decision tree Γ over z that uses only attributes, solves the problem z over U, and has depth at most kr.
For i = 1, …, n, let δ_i be a number from {0, 1} such that (A(f_i, ¬δ_i), F) is an m_i-information system with 0 ≤ m_i < k. Let t be the maximum number from the set {1, …, n} such that the system of equations S = {f_1(x) = δ_1, …, f_t(x) = δ_t} is consistent. Then, there exists a subsystem {f_{i_1}(x) = δ_{i_1}, …, f_{i_p}(x) = δ_{i_p}} of the system S that has the same set of solutions as S and for which p ≤ r. For a given a ∈ A, the decision tree Γ sequentially computes the values f_{i_1}(a), …, f_{i_p}(a).
If, for some q ∈ {1, …, p}, f_{i_1}(a) = δ_{i_1}, …, f_{i_{q−1}}(a) = δ_{i_{q−1}}, and f_{i_q}(a) = ¬δ_{i_q}, then the decision tree Γ continues to work with the problem z and the information system U′ = (A′, F), where A′ is the set of solutions on A of the equation system {f_{i_1}(x) = δ_{i_1}, …, f_{i_{q−1}}(x) = δ_{i_{q−1}}, f_{i_q}(x) = ¬δ_{i_q}}. We have that U′ is an l′-information system for some l′ ≤ m_{i_q} < k.
Let f_{i_1}(a) = δ_{i_1}, …, f_{i_p}(a) = δ_{i_p}. If t = n, then (δ_1, …, δ_n) is the solution of the problem z for the considered element a. Let t < n. Then, the decision tree Γ continues to work with the problem z and the information system U′ = (A′, F), where A′ is the set of solutions on A of the equation system {f_{i_1}(x) = δ_{i_1}, …, f_{i_p}(x) = δ_{i_p}}. We know that the equation system {f_1(x) = δ_1, …, f_t(x) = δ_t, f_{t+1}(x) = δ_{t+1}} is inconsistent. Therefore, the system {f_{i_1}(x) = δ_{i_1}, …, f_{i_p}(x) = δ_{i_p}, f_{t+1}(x) = δ_{t+1}} is inconsistent. Hence, A′ ⊆ A(f_{t+1}, ¬δ_{t+1}) and U′ is an l′-information system for some l′ ≤ m_{t+1} < k.
As a result, after the computation of the values of at most r attributes, we either solve the problem z or reduce the consideration of the problem z over the k-information system U to the consideration of the problem z over some l-information system with l < k. After the computation of the values of at most rk attributes, we solve the problem z, since each problem over a 0-information system has exactly one possible solution. Therefore, h_U^(1)(z) ≤ rk and h_U^(1)(n) = O(1). However, by Theorem 1, h_U^(1)(n) = Θ(log n). The obtained contradiction shows that R ∩ C = ∅. □
Proposition 4.
For any infinite binary information system, its indicator vector coincides with one of the rows of Table 1.
Proof. 
Table 3 contains as rows all 3-tuples from the set {0, 1}^3. We now show that the rows with Numbers 5–8 cannot be indicator vectors of infinite binary information systems. Assume the contrary: there is i ∈ {5, 6, 7, 8} such that the row with Number i is the indicator vector of an infinite binary information system U. If i = 5, then U ∈ R and U ∉ D, but this is impossible since, by Proposition 1, R ⊆ D. If i = 6, then U ∈ C and U ∉ D, but this is impossible since, by Proposition 2, C ⊆ D. If i = 7, then U ∈ R and U ∉ D, but this is impossible since, by Proposition 1, R ⊆ D. If i = 8, then U ∈ R and U ∈ C, but this is impossible since, by Proposition 3, R ∩ C = ∅. Therefore, for any infinite binary information system, its indicator vector coincides with one of the rows of Table 3 with Numbers 1–4. Thus, it coincides with one of the rows of Table 1. □
Define an infinite binary information system U_1 = (A_1, F_1) as follows: A_1 = N and F_1 is the set of all functions from N to {0, 1}.
Lemma 3.
The information system U_1 belongs to the class V_1.
Proof. 
It is easy to show that the information system U_1 has infinite I-dimension. Therefore, U_1 ∉ D. Using Proposition 4, we obtain ind(U_1) = (0, 0, 0), i.e., U_1 ∈ V_1. □
For any i ∈ N, we define two functions p_i : N → {0, 1} and l_i : N → {0, 1}. Let j ∈ N. Then, p_i(j) = 1 if and only if j = i, and l_i(j) = 1 if and only if j > i.
Define an infinite binary information system U_2 = (A_2, F_2) as follows: A_2 = N and F_2 = {p_i : i ∈ N} ∪ {l_i : i ∈ N}.
Lemma 4.
The information system U_2 belongs to the class V_2.
Proof. 
For n ∈ N, denote S_n = {p_1(x) = 0, …, p_n(x) = 0}. One can show that the equation system S_n is consistent and that each proper subsystem of S_n has a set of solutions different from the set of solutions of S_n. Therefore, U_2 ∉ R. Using attributes from the set {l_i : i ∈ N}, we can construct a d-complete tree over U_2 for each d ∈ N. By Lemma 1 and Theorem 2, U_2 ∉ C. One can show that I(U_2) = 1. Therefore, U_2 ∈ D. Thus, ind(U_2) = (0, 1, 0), i.e., U_2 ∈ V_2. □
Define an infinite binary information system U_3 = (A_3, F_3) as follows: A_3 = N and F_3 = {p_i : i ∈ N}.
Lemma 5.
The information system U_3 belongs to the class V_3.
Proof. 
It is easy to show that U_3 is a 1-information system. Therefore, U_3 ∈ C. Using Proposition 4, we obtain ind(U_3) = (0, 1, 1), i.e., U_3 ∈ V_3. □
Define an infinite binary information system U_4 = (A_4, F_4) as follows: A_4 = N and F_4 = {l_i : i ∈ N}.
Lemma 6.
The information system U_4 belongs to the class V_4.
Proof. 
Let us consider an arbitrary consistent system of equations S over U_4. We now show that there is a subsystem of S that has at most two equations and the same set of solutions as S. Let S contain both equations of the kind l_i(x) = 1 and equations of the kind l_j(x) = 0. Denote i_0 = max{i : l_i(x) = 1 ∈ S} and j_0 = min{j : l_j(x) = 0 ∈ S}. One can show that the system of equations S′ = {l_{i_0}(x) = 1, l_{j_0}(x) = 0} has the same set of solutions as S. The case when S contains, for some δ ∈ {0, 1}, only equations of the kind l_p(x) = δ can be considered in a similar way; in this case, the equivalent subsystem contains only one equation. Therefore, the information system U_4 is 2-reduced and U_4 ∈ R. Using Proposition 4, we obtain ind(U_4) = (1, 1, 0), i.e., U_4 ∈ V_4. □
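The reduction used in this proof is easy to make concrete: it keeps the strongest lower bound and the strongest upper bound among the threshold equations. The sketch below is our illustration; the encoding of equations as (i, δ) pairs is an assumption introduced here.

```python
# Illustration only: reduce a consistent system over the attributes l_i
# (where l_i(x) = 1 iff x > i) to at most two equations.

def reduce_threshold_system(equations):
    """equations: pairs (i, delta) encoding l_i(x) = delta."""
    lower = [i for i, d in equations if d == 1]   # each says x > i
    upper = [i for i, d in equations if d == 0]   # each says x <= i
    reduced = []
    if lower:
        reduced.append((max(lower), 1))           # strongest lower bound
    if upper:
        reduced.append((min(upper), 0))           # strongest upper bound
    return reduced

print(reduce_threshold_system([(2, 1), (5, 1), (9, 0), (7, 0)]))
# [(5, 1), (7, 0)], i.e., 5 < x <= 7
```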
Proof of Theorem 3.
From Proposition 4, it follows that, for any infinite binary information system, its indicator vector coincides with one of the rows of Table 1. Using Lemmas 3–6, we conclude that each row of Table 1 is the indicator vector of some infinite binary information system. □

6. Conclusions

Based on the results of exact learning, test theory, and rough set theory, for an arbitrary infinite binary information system, we studied three functions of the Shannon type, which characterize how, in the worst case, the minimum depth of a decision tree solving a problem depends on the number of attributes in the problem description. These three functions correspond to (i) decision trees using attributes, (ii) decision trees using hypotheses, and (iii) decision trees using both attributes and hypotheses. We described the possible types of behavior for each of these three functions. We also studied the joint behavior of these functions and distinguished four corresponding complexity classes of infinite binary information systems. In the future, we plan to translate the obtained results into the language of exact learning.
The problems studied in this paper allow us to confine ourselves to considering only the crisp (conventional) sets that are completely defined by attributes. However, in the future, when we investigate approximately defined problems or approximate decision trees, it will be necessary to work with rough sets given by their lower and upper approximations. This will require a wider range of rough set theory techniques than those used in the present paper.

Funding

Research funded by King Abdullah University of Science and Technology.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST). The author is greatly indebted to the anonymous reviewers for their useful comments and suggestions.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Angluin, D. Queries and concept learning. Mach. Learn. 1988, 2, 319–342. [Google Scholar] [CrossRef] [Green Version]
  2. Pawlak, Z. Rough sets. Int. J. Parallel Program. 1982, 11, 341–356. [Google Scholar] [CrossRef]
  3. Pawlak, Z. Rough Sets—Theoretical Aspects of Reasoning about Data; Theory and Decision Library: Series D; Kluwer: Dordrecht, The Netherlands, 1991; Volume 9. [Google Scholar]
  4. Pawlak, Z.; Skowron, A. Rudiments of rough sets. Inf. Sci. 2007, 177, 3–27. [Google Scholar] [CrossRef]
  5. Chegis, I.A.; Yablonskii, S.V. Logical methods of control of work of electric schemes. Trudy Mat. Inst. Steklov 1958, 51, 270–360. (In Russian) [Google Scholar]
  6. Azad, M.; Chikalov, I.; Hussain, S.; Moshkov, M. Minimizing depth of decision trees with hypotheses. In Rough Sets–International Joint Conference, Proceedings of the IJCRS 2021, Bratislava, Slovakia, 19–24 September 2021; Ramanna, S., Cornelis, C., Ciucci, D., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021; Volume 12872, pp. 123–133. [Google Scholar]
  7. Azad, M.; Chikalov, I.; Hussain, S.; Moshkov, M. Minimizing number of nodes in decision trees with hypotheses. In Proceedings of the 25th International Conference on Knowledge—Based and Intelligent Information & Engineering Systems (KES 2021), Szczecin, Poland, 8–10 September 2021; Watrobski, J., Salabun, W., Toro, C., Zanni-Merk, C., Howlett, R.J., Jain, L.C., Eds.; Elsevier: Amsterdam, The Netherlands, 2021; Volume 192, pp. 232–240. [Google Scholar]
  8. Azad, M.; Chikalov, I.; Hussain, S.; Moshkov, M. Sorting by decision trees with hypotheses (extended abstract). In Proceedings of the 29th International Workshop on Concurrency, Specification and Programming, CS&P 2021, Berlin, Germany, 27–28 September 2021; CEUR Workshop Proceedings. Schlingloff, H., Vogel, T., Eds.; CEUR-WS.org: Aachen, Germany, 2021; Volume 2951, pp. 126–130. [Google Scholar]
  9. Azad, M.; Chikalov, I.; Hussain, S.; Moshkov, M. Optimization of decision trees with hypotheses for knowledge representation. Electronics 2021, 10, 1580. [Google Scholar] [CrossRef]
  10. Azad, M.; Chikalov, I.; Hussain, S.; Moshkov, M. Entropy-based greedy algorithm for decision trees using hypotheses. Entropy 2021, 23, 808. [Google Scholar] [CrossRef] [PubMed]
  11. Angluin, D. Queries revisited. Theor. Comput. Sci. 2004, 313, 175–194. [Google Scholar] [CrossRef] [Green Version]
  12. Littlestone, N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Mach. Learn. 1988, 2, 285–318. [Google Scholar] [CrossRef]
  13. Maass, W.; Turán, G. Lower bound methods and separation results for on-line learning models. Mach. Learn. 1992, 9, 107–145. [Google Scholar] [CrossRef] [Green Version]
  14. Moshkov, M. Conditional tests. In Problemy Kibernetiki; Yablonskii, S.V., Ed.; Nauka Publishers: Moscow, Russia, 1983; Volume 40, pp. 131–170. (In Russian) [Google Scholar]
  15. Moshkov, M. On depth of conditional tests for tables from closed classes. In Combinatorial-Algebraic and Probabilistic Methods of Discrete Analysis; Markov, A.A., Ed.; Gorky University Press: Gorky, Russia, 1989; pp. 78–86. (In Russian) [Google Scholar]
  16. Moshkov, M. Time complexity of decision trees. In Transactions on Rough Sets III; Peters, J.F., Skowron, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3400, pp. 244–459. [Google Scholar]
  17. Moshkov, M. Test theory and problems of machine learning. In Proceedings of the International School-Seminar on Discrete Mathematics and Mathematical Cybernetics, Ratmino, Russia, 31 May–3 June 2001; MAX Press: Moscow, Russia, 2001; pp. 6–10. [Google Scholar]
  18. Pawlak, Z. Information systems theoretical foundations. Inf. Syst. 1981, 6, 205–218. [Google Scholar] [CrossRef] [Green Version]
  19. Naiman, D.Q.; Wynn, H.P. Independence number and the complexity of families of sets. Discr. Math. 1996, 154, 203–216. [Google Scholar] [CrossRef] [Green Version]
  20. Sauer, N. On the density of families of sets. J. Comb. Theory A 1972, 13, 145–147. [Google Scholar] [CrossRef] [Green Version]
  21. Shelah, S. A combinatorial problem; stability and order for models and theories in infinitary languages. Pac. J. Math. 1972, 41, 241–261. [Google Scholar] [CrossRef]
Table 1. Possible indicator vectors of infinite binary information systems.

      R   D   C
  1   0   0   0
  2   0   1   0
  3   0   1   1
  4   1   1   0
Table 2. Summary of Theorems 1–3.

        R   D   C   h_U^(1)(n)   h_U^(2)(n)   h_U^(3)(n)
  V_1   0   0   0   n            n            n
  V_2   0   1   0   n            Θ(log n)     Ω(log n / log log n), O(log n)
  V_3   0   1   1   n            O(1)         O(1)
  V_4   1   1   0   Θ(log n)     Θ(log n)     Ω(log n / log log n), O(log n)
Table 3. All 3-tuples from the set {0, 1}^3.

      R   D   C
  1   0   0   0
  2   0   1   0
  3   0   1   1
  4   1   1   0
  5   1   0   0
  6   0   0   1
  7   1   0   1
  8   1   1   1