On the Depth of Decision Trees with Hypotheses

In this paper, based on the results of rough set theory, test theory, and exact learning, we investigate decision trees over infinite sets of binary attributes represented as infinite binary information systems. We define the notion of a problem over an information system and study three functions of the Shannon type, which characterize the dependence in the worst case of the minimum depth of a decision tree solving a problem on the number of attributes in the problem description. The three functions correspond to (i) decision trees using attributes, (ii) decision trees using hypotheses (an analog of equivalence queries from exact learning), and (iii) decision trees using both attributes and hypotheses. The first function has two possible types of behavior: logarithmic and linear (this result follows from more general results published by the author earlier). The second and the third functions have three possible types of behavior: constant, logarithmic, and linear (these results were published by the author earlier without proofs; the proofs are given in the present paper). Based on the obtained results, we divide the set of all infinite binary information systems into four complexity classes. In each class, the type of behavior of each of the three functions does not change.


Introduction
Decision trees are studied in different areas of computer science, in particular in exact learning [1], rough set theory [2][3][4], and test theory [5]. In some sense, these theories deal with dual objects: for example, membership queries from exact learning correspond to attributes from test theory and rough set theory. In contrast to test theory and rough set theory, in exact learning, besides membership queries, equivalence queries are also considered. We extend the model considered in test theory and rough set theory by adding the notion of a hypothesis that is an analog of equivalence query. Papers [6][7][8][9][10] are related mainly to the experimental study of decision trees with hypotheses. The present paper contains a theoretical study of the depth of decision trees with hypotheses.
An infinite binary information system is a pair U = (A, F) where A is an infinite set of elements and F is an infinite set of functions (attributes) from A to {0, 1}. A problem over U is given by a finite number of attributes f 1 , . . . , f n from F: for a ∈ A, we should find the tuple ( f 1 (a), . . . , f n (a)). To solve this problem, we can use decision trees with two types of queries. We can ask about the value of an attribute f i ∈ { f 1 , . . . , f n }. As a result, we obtain an answer of the kind f i (x) = δ where δ ∈ {0, 1}. We also can ask if a hypothesis f 1 (x) = δ 1 , . . . , f n (x) = δ n is true where δ 1 , . . . , δ n ∈ {0, 1}. Either we obtain the confirmation or a counterexample in the form f i (x) = ¬δ i . The depth of decision trees with hypotheses can be essentially less than the depth of decision trees using only attributes. As an example, we consider the problem of the computation of the disjunction x 1 ∨ · · · ∨ x n . The minimum depth of a decision tree solving this problem using only attributes x 1 , . . . , x n is equal to n. However, the minimum depth of a decision tree with hypotheses solving this problem is equal to one: it is enough to ask only about the hypothesis x 1 = 0, . . . , x n = 0. If it is true, then the considered disjunction is equal to zero. Otherwise, it is equal to one.
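The disjunction example can be sketched in a few lines of Python. This is only an illustration: the names `ask_hypothesis` and `disjunction_with_one_hypothesis` are hypothetical, and the oracle below returns the first disagreeing coordinate as a counterexample, whereas in the model any disagreeing coordinate may be returned.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def ask_hypothesis(x: Sequence[int], deltas: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Hypothesis query: return None if x_i = deltas[i] for all i,
    otherwise a counterexample (i, x_i) for the first disagreeing index."""
    for i, (xi, di) in enumerate(zip(x, deltas)):
        if xi != di:
            return (i, xi)
    return None


def disjunction_with_one_hypothesis(x: Sequence[int]) -> int:
    """Compute x_1 OR ... OR x_n using a single hypothesis query."""
    answer = ask_hypothesis(x, [0] * len(x))
    # Confirmation means all variables are 0; any counterexample means some x_i = 1.
    return 0 if answer is None else 1


n = 4
all_correct = all(
    disjunction_with_one_hypothesis(x) == int(any(x))
    for x in product((0, 1), repeat=n)
)
print(all_correct)
```

The check runs over all 2^n inputs for n = 4 and confirms that one hypothesis query suffices in each case.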
Based on the results of exact learning, rough set theory, and test theory [1,11-16], we study for an arbitrary infinite binary information system three functions of the Shannon type that characterize the growth in the worst case of the minimum depth of a decision tree solving a problem with the growth of the number of attributes in the problem description. The considered three functions correspond to the following three cases: (i) Only attributes are used in decision trees; (ii) Only hypotheses are used in decision trees; (iii) Both attributes and hypotheses are used in decision trees.
We show that the first function has two possible types of behavior: logarithmic and linear. The second and third functions have three possible types of behavior: constant, logarithmic, and linear. Bounds for the case (i) can be derived from more general results obtained in [15,16]. Results related to the cases (ii) and (iii) were presented in the conference paper [17] without proofs. In the present paper, we give complete proofs for the cases (ii) and (iii). We also investigate the joint behavior of these three functions and describe four complexity classes of infinite binary information systems; these results are completely new.
The obtained results allow us to understand the difference in time complexity between conventional decision trees, in which each query is based on a single attribute, and decision trees with hypotheses. Moreover, we now know which combinations of types of behavior of the three Shannon-type functions are possible for an arbitrary infinite binary information system, and we know the criteria for each combination.
This paper consists of six sections. In Sections 2 and 3, we consider the basic notions and main results. Sections 4 and 5 contain proofs of the main results, and Section 6 gives a short conclusion.

Basic Notions
Let A be a set of elements and F be a set of functions from A to {0, 1}. Functions from F are called attributes, and the pair U = (A, F) is called a binary information system (this notion is close to the notion of information systems proposed by Pawlak [18]). If A and F are infinite sets, then the pair U = (A, F) is called an infinite binary information system.
A problem over U is an arbitrary n-tuple z = ( f 1 , . . . , f n ) where n ∈ N, N is the set of natural numbers {1, 2, . . .}, and f 1 , . . . , f n ∈ F. The problem z may be interpreted as a problem of searching for the tuple z(a) = ( f 1 (a), . . . , f n (a)) for an arbitrary a ∈ A. The number dim z = n is called the dimension of the problem z. Denote F(z) = { f 1 , . . . , f n }. We denote by P(U) the set of problems over U.
A system of equations over U is an arbitrary equation system of the kind {g_1(x) = δ_1, ..., g_m(x) = δ_m}, where m ∈ N ∪ {0}, g_1, ..., g_m ∈ F, and δ_1, ..., δ_m ∈ {0, 1} (if m = 0, then the considered equation system is empty). This equation system is called a system of equations over z if g_1, ..., g_m ∈ F(z). The considered equation system is called consistent (on A) if its set of solutions on A is nonempty. The set of solutions of the empty equation system coincides with A.
As algorithms for problem z solving, we consider decision trees with two types of queries. We can choose an attribute f i ∈ F(z) and ask about its value. This query has two possible answers: { f i (x) = 0} and { f i (x) = 1}. We can formulate a hypothesis over z in the form H = { f 1 (x) = δ 1 , . . . , f n (x) = δ n } where δ 1 , . . . , δ n ∈ {0, 1} and ask about this hypothesis. This query has n + 1 possible answers: H, { f 1 (x) = ¬δ 1 }, . . . , { f n (x) = ¬δ n } where ¬1 = 0 and ¬0 = 1. The first answer means that the hypothesis is true. Other answers are counterexamples.
A decision tree over z is a marked finite directed tree with the root in which:
• Each terminal node is labeled with an n-tuple from the set {0, 1}^n;
• Each node, which is not terminal (such nodes are called working), is labeled with an attribute from the set F(z) or with a hypothesis over z;
• If a working node is labeled with an attribute f_i from F(z), then there are two edges, which leave this node and are labeled with the systems of equations {f_i(x) = 0} and {f_i(x) = 1}, respectively;
• If a working node is labeled with a hypothesis H = {f_1(x) = δ_1, ..., f_n(x) = δ_n} over z, then there are n + 1 edges, which leave this node and are labeled with the systems of equations H, {f_1(x) = ¬δ_1}, ..., {f_n(x) = ¬δ_n}, respectively.
Let Γ be a decision tree over z. A complete path in Γ is an arbitrary directed path from the root to a terminal node in Γ. We now define an equation system S(ξ) over U associated with the complete path ξ. If there are no working nodes in ξ, then S(ξ) is the empty system. Otherwise, S(ξ) is the union of equation systems assigned to the edges of the path ξ. We denote by A(ξ) the set of solutions on A of the system of equations S(ξ) (if this system is empty, then its solution set is equal to A).
We say that a decision tree Γ over z solves the problem z relative to U if, for each element a ∈ A and for each complete path ξ in Γ such that a ∈ A(ξ), the terminal node of the path ξ is labeled with the tuple z(a).
We now consider an equivalent definition of a decision tree solving a problem. Let z = (f_1, ..., f_n). Denote by ∆_U(z) the set of tuples (δ_1, ..., δ_n) ∈ {0, 1}^n such that the system of equations {f_1(x) = δ_1, ..., f_n(x) = δ_n} is consistent. The set ∆_U(z) is the set of all possible solutions of the problem z. Let ∆ ⊆ ∆_U(z), f_{i_1}, ..., f_{i_m} ∈ {f_1, ..., f_n}, and σ_1, ..., σ_m ∈ {0, 1}. Denote by ∆(f_{i_1}, σ_1) ··· (f_{i_m}, σ_m) the set of all n-tuples (δ_1, ..., δ_n) ∈ ∆ for which δ_{i_1} = σ_1, ..., δ_{i_m} = σ_m. Let Γ be a decision tree over the problem z. We correspond to each complete path ξ in the tree Γ a word π(ξ) in the alphabet {(f_i, δ) : f_i ∈ F(z), δ ∈ {0, 1}}. If the equation system S(ξ) is empty, then π(ξ) is the empty word. If S(ξ) = {f_{i_1}(x) = σ_1, ..., f_{i_m}(x) = σ_m}, then π(ξ) = (f_{i_1}, σ_1) ··· (f_{i_m}, σ_m). The decision tree Γ over z solves the problem z relative to U if, for each complete path ξ in Γ, the set ∆_U(z)π(ξ) contains at most one tuple, and if this set contains exactly one tuple, then the considered tuple is assigned to the terminal node of the path ξ.
As the time complexity of a decision tree Γ, we consider its depth h(Γ), that is the maximum number of working nodes in a complete path in the tree Γ.
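The definitions of a decision tree with hypotheses and of its depth can be made concrete with a small data structure. The following is a minimal sketch under assumptions of our own: the class names `Terminal`, `AttributeNode`, and `HypothesisNode` and the function `depth` are hypothetical and only mirror the two kinds of working nodes described above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Terminal:
    label: Tuple[int, ...]          # an n-tuple from {0, 1}^n


@dataclass
class AttributeNode:
    index: int                      # position i of the attribute f_i in z
    children: Dict[int, "Node"]     # answer 0 or 1 -> subtree


@dataclass
class HypothesisNode:
    deltas: Tuple[int, ...]         # hypothesis f_1 = d_1, ..., f_n = d_n
    confirmed: "Node"               # subtree for the answer H
    counter: Dict[int, "Node"]      # i -> subtree for the counterexample f_i = not d_i


Node = Union[Terminal, AttributeNode, HypothesisNode]


def depth(node: Node) -> int:
    """Maximum number of working nodes on a path from this node to a terminal node."""
    if isinstance(node, Terminal):
        return 0
    if isinstance(node, AttributeNode):
        subtrees = list(node.children.values())
    else:
        subtrees = [node.confirmed, *node.counter.values()]
    return 1 + max(depth(t) for t in subtrees)


# A tree for z = (f_1, f_2) when all four value tuples are possible:
# ask the hypothesis (0, 0); after a counterexample, query the other attribute.
tree = HypothesisNode(
    deltas=(0, 0),
    confirmed=Terminal((0, 0)),
    counter={
        0: AttributeNode(1, {0: Terminal((1, 0)), 1: Terminal((1, 1))}),
        1: AttributeNode(0, {0: Terminal((0, 1)), 1: Terminal((1, 1))}),
    },
)
print(depth(tree))
```

The example tree has one hypothesis node followed by at most one attribute node on each path, so its depth is 2.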
Let z ∈ P(U). We denote by h^{(1)}_U(z) the minimum depth of a decision tree over z, which solves z relative to U and uses only attributes from F(z). We denote by h^{(2)}_U(z) the minimum depth of a decision tree over z, which solves z relative to U and uses only hypotheses over z. We denote by h^{(3)}_U(z) the minimum depth of a decision tree over z, which solves z relative to U and uses both attributes from F(z) and hypotheses over z.
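For i = 1, 2, 3, we define a function of the Shannon type h^{(i)}_U(n), which characterizes the dependence of h^{(i)}_U(z) on dim z in the worst case: for n ∈ N, h^{(i)}_U(n) = max{h^{(i)}_U(z) : z ∈ P(U), dim z ≤ n}.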

Main Results
Let U = (A, F) be an infinite binary information system and r ∈ N. The information system U is called r-reduced if, for each consistent on A system of equations over U, there exists a subsystem of this system that has the same set of solutions and contains at most r equations. We denote by R the set of infinite binary information systems each of which is r-reduced for some r ∈ N.
The next theorem follows from the results obtained in [15], where we considered closed classes of test tables (decision tables). It also follows from the results obtained in [16], where we considered the weighted depth of decision trees.
Theorem 1. Let U be an infinite binary information system. Then, the following statements hold:
(a) If U ∈ R, then h^{(1)}_U(n) = Θ(log n);
(b) If U ∉ R, then h^{(1)}_U(n) = n for any n ∈ N.
Let U = (A, F) be an infinite binary information system. A subset {f_1, ..., f_m} of the set F is called independent if, for any δ_1, ..., δ_m ∈ {0, 1}, the system of equations {f_1(x) = δ_1, ..., f_m(x) = δ_m} is consistent on A. The empty set of attributes is independent by definition. We now define the independence dimension or I-dimension I(U) of the information system U (this notion is similar to the notion of the independence number of the family of sets considered by Naiman and Wynn in [19]). If, for each m ∈ N, the set F contains an independent subset of cardinality m, then I(U) = ∞. Otherwise, I(U) is the maximum cardinality of an independent subset of the set F. We denote by D the set of infinite binary information systems with a finite independence dimension.
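For a finite collection of attributes on a finite set of elements, the I-dimension can be computed by brute force. The sketch below is only illustrative: it treats a set of attributes as independent when every 0/1 pattern of values is realized by some element, and the names `is_independent`, `independence_dimension`, `p`, `l`, and `bit` are hypothetical. The attributes p_i (equal to 1 exactly at i) and l_i (equal to 1 exactly above i) are finite truncations of the attributes used later in the paper.

```python
from itertools import combinations
from typing import Callable, Iterable, Sequence

Attribute = Callable[[int], int]


def is_independent(attrs: Sequence[Attribute], elements: Sequence[int]) -> bool:
    """True if every 0/1 value pattern over attrs is realized by some element."""
    realized = {tuple(f(a) for f in attrs) for a in elements}
    return len(realized) == 2 ** len(attrs)


def independence_dimension(attrs: Sequence[Attribute], universe: Iterable[int]) -> int:
    """Largest size of an independent subset of attrs (brute force)."""
    elements = list(universe)
    best = 0
    for m in range(1, len(attrs) + 1):
        if any(is_independent(c, elements) for c in combinations(attrs, m)):
            best = m
        else:
            break  # subsets of independent sets are independent, so we can stop
    return best


def p(i: int) -> Attribute:
    return lambda j: int(j == i)


def l(i: int) -> Attribute:
    return lambda j: int(j > i)


def bit(k: int) -> Attribute:
    return lambda j: (j >> k) & 1


pl_attrs = [p(i) for i in range(1, 5)] + [l(i) for i in range(1, 5)]
dim_pl = independence_dimension(pl_attrs, range(1, 7))
dim_bits = independence_dimension([bit(k) for k in range(3)], range(8))
print(dim_pl, dim_bits)
```

In this sketch no pair among the p_i and l_i is independent, so the first value is 1, while the three bit attributes on {0, ..., 7} realize all eight patterns and give 3.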
Let U = (A, F) be a binary information system, which is not necessarily infinite, f ∈ F, and δ ∈ {0, 1}. Denote A(f, δ) = {a ∈ A : f(a) = δ}. We now define inductively the notion of a k-information system, k ∈ N ∪ {0}. The binary information system U is called a 0-information system if all attributes from F are constant on the set A. Let, for some k ∈ N ∪ {0}, the notion of an m-information system be defined for m = 0, ..., k. The binary information system U is called a (k + 1)-information system if it is not an m-information system for m = 0, ..., k and, for any f ∈ F, there exist numbers δ ∈ {0, 1} and m ∈ {0, ..., k} such that the information system (A(f, δ), F) is an m-information system. It is easy to show by induction on k that if U = (A, F) is a k-information system, then U' = (A', F), A' ⊆ A, is an l-information system for some l ≤ k.
We denote by C the set of infinite binary information systems for each of which there exists k ∈ N such that the considered system is a k-information system. The following theorem was presented in [17] without proof.
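The inductive definition of a k-information system can be evaluated directly on a finite binary information system. The sketch below is an illustration under our own assumptions: the function name `information_level` is hypothetical, and the examples use finite truncations of the attributes p_i (equal to 1 exactly at i) and l_i (equal to 1 exactly above i) that appear later in the paper.

```python
from functools import lru_cache
from typing import Callable, FrozenSet, Iterable, Sequence

Attribute = Callable[[int], int]


def information_level(elements: Iterable[int], attrs: Sequence[Attribute]) -> int:
    """Return k such that the finite system (elements, attrs) is a k-information system."""
    attrs = tuple(attrs)

    @lru_cache(maxsize=None)
    def level(A: FrozenSet[int]) -> int:
        # 0-information system: every attribute is constant on A.
        if all(len({f(a) for a in A}) <= 1 for f in attrs):
            return 0
        worst = 0
        for f in attrs:
            zeros = frozenset(a for a in A if f(a) == 0)
            ones = A - zeros
            if not zeros or not ones:
                # f is constant on A: one side is empty, and the system on
                # the empty set is a 0-information system.
                continue
            # For this f we may choose the better of the two restrictions.
            worst = max(worst, min(level(zeros), level(ones)))
        return worst + 1

    return level(frozenset(elements))


def p(i: int) -> Attribute:
    return lambda j: int(j == i)


def l(i: int) -> Attribute:
    return lambda j: int(j > i)


level_p = information_level(range(1, 9), [p(i) for i in range(1, 9)])
level_l = information_level(range(1, 9), [l(i) for i in range(1, 8)])
print(level_p, level_l)
```

For the attributes p_i, the restriction to p_i = 1 leaves a single element, so the level is 1 regardless of the size of the truncation. For the threshold attributes l_i on eight elements the level is 3, and it grows with the size of the truncation.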

Theorem 2.
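Let U be an infinite binary information system. Then, the following statements hold:
(a) If U ∈ C, then h^{(2)}_U(n) = O(1) and h^{(3)}_U(n) = O(1);
(b) If U ∈ D \ C, then h^{(2)}_U(n) = Θ(log n), h^{(3)}_U(n) = Ω(log n / log log n), and h^{(3)}_U(n) = O(log n);
(c) If U ∉ D, then h^{(2)}_U(n) = n and h^{(3)}_U(n) = n for any n ∈ N.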
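Let U be an infinite binary information system. We now consider the joint behavior of the functions h^{(1)}_U(n), h^{(2)}_U(n), and h^{(3)}_U(n). It depends on the belonging of the information system U to the sets R, D, and C. We correspond to the information system U its indicator vector ind(U) = (c_1, c_2, c_3) ∈ {0, 1}^3, in which c_1 = 1 if and only if U ∈ R, c_2 = 1 if and only if U ∈ D, and c_3 = 1 if and only if U ∈ C.
Theorem 3. For any infinite binary information system, its indicator vector coincides with one of the rows of Table 1. Each row of Table 1 is the indicator vector of some infinite binary information system.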
For i = 1, 2, 3, 4, we denote by V_i the class of all infinite binary information systems, for which the indicator vector coincides with the ith row of Table 1. Table 2 summarizes Theorems 1-3. The first column contains the name of the complexity class V_i. The next three columns describe the indicator vector of information systems from this class. The last three columns describe the behavior of the functions h^{(1)}_U(n), h^{(2)}_U(n), and h^{(3)}_U(n) for information systems from the class V_i.
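The admissible indicator vectors can be enumerated mechanically from the three relations proved later in the paper: R ⊆ D, C ⊆ D, and R ∩ C = ∅. The sketch below assumes that the vector lists membership in R, D, and C in that order, which agrees with ind(U_4) = (1, 1, 0) for U_4 ∈ R and ind(U_2) = (0, 1, 0) in the proofs of Section 5; the helper name `admissible` is hypothetical.

```python
from itertools import product


def admissible(r: int, d: int, c: int) -> bool:
    """Check the constraints R ⊆ D, C ⊆ D, and R ∩ C = ∅ on an indicator vector."""
    r_in_d = (not r) or bool(d)
    c_in_d = (not c) or bool(d)
    disjoint = not (r and c)
    return r_in_d and c_in_d and disjoint


rows = [v for v in product((0, 1), repeat=3) if admissible(*v)]
print(rows)
```

Exactly four of the eight vectors pass the filter. The constraints only show that no other vector can occur; that each of the four vectors is actually realized is the content of Lemmas 3-6.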

Proof of Theorem 2
We precede the proof of Theorem 2 with two lemmas. Let d ∈ N. A d-complete tree over the information system U = (A, F) is a marked finite directed tree with the root in which:
• Each terminal node is not labeled;
• Each nonterminal node is labeled with an attribute f ∈ F. There are two edges leaving this node that are labeled with the systems of equations {f(x) = 0} and {f(x) = 1}, respectively;
• The length of each complete path (the path from the root to a terminal node) is equal to d;
• For each complete path ξ, the equation system S(ξ), which is the union of equation systems assigned to the edges of the path ξ, is consistent.
Let G be a d-complete tree over U and F(G) be the set of all attributes attached to the nonterminal nodes of the tree G. The number of nonterminal nodes in G is equal to 2^0 + 2^1 + ··· + 2^{d−1} = 2^d − 1. Therefore, |F(G)| ≤ 2^d − 1.
The results mentioned in the following lemma are obtained by methods similar to those used by Littlestone [12], Maass and Turán [13], and Angluin [11].

Lemma 1. Let U = (A, F) be a binary information system, d ∈ N, G be a d-complete tree over U, and z be a problem over U such that F(G) ⊆ F(z). Then:
(a) h^{(2)}_U(z) ≥ d;
(b) h^{(3)}_U(z) ≥ d / log_2(2d).

Proof. (a) We prove the inequality h^{(2)}_U(z) ≥ d by induction on d. Let d = 1. Then, the attribute attached to the root of G is not constant on A. Therefore, |∆_U(z)| ≥ 2 and h^{(2)}_U(z) ≥ 1. Let, for t ∈ N and for any natural d, 1 ≤ d ≤ t, the considered statement hold. Assume now that d = t + 1, G is a d-complete tree over U, z is a problem over U such that F(G) ⊆ F(z), and Γ is a decision tree over z with the minimum depth, which solves the problem z and uses only hypotheses. Let f be the attribute attached to the root of the tree G and H be the hypothesis attached to the root of the decision tree Γ. Then, there is an edge that leaves the root of Γ and is labeled with the equation system {f(x) = δ} where the equation f(x) = ¬δ belongs to the hypothesis H. This edge enters the root of the subtree of Γ, which is denoted by Γ_f. There is an edge that leaves the root of G and is labeled with the equation system {f(x) = δ}. This edge enters the root of the subtree of G, which is denoted by G_δ. One can show that the decision tree Γ_f solves the problem z relative to the information system U' = (A(f, δ), F) and G_δ is a t-complete tree over U'. It is clear that F(G_δ) ⊆ F(z). Using the inductive hypothesis, we obtain h(Γ_f) ≥ t. Therefore, h(Γ) ≥ t + 1 = d and h^{(2)}_U(z) ≥ d.

(b) We now prove the inequality h^{(3)}_U(z) ≥ d / log_2(2d). Let z = (f_1, ..., f_n) and Γ be a decision tree over z with the minimum depth, which solves the problem z and uses both attributes and hypotheses. The d-complete tree G has 2^d complete paths ξ_1, ..., ξ_{2^d}. For i = 1, ..., 2^d, we denote by a_i a solution of the equation system S(ξ_i). Denote B = {a_1, ..., a_{2^d}}.
We now show that the decision tree Γ contains a complete path, the length of which is at least d / log_2(2d). We describe the process of this path construction beginning with the root of Γ.
Let the root of Γ be labeled with an attribute f_{i_0}. For δ ∈ {0, 1}, we denote by B_δ the set of solutions on B of the equation system {f_{i_0}(x) = δ} and choose σ ∈ {0, 1} for which |B_σ| = max{|B_0|, |B_1|}. It is clear that |B_σ| ≥ |B|/2 ≥ |B|/(2d). In the considered case, the beginning of the constructed path in Γ is the root of Γ, the edge that leaves the root and is labeled with the equation system {f_{i_0}(x) = σ}, and the node which this edge enters.
Let us assume now that the root of Γ is labeled with a hypothesis H = {f_1(x) = δ_1, ..., f_n(x) = δ_n}. We denote by ξ_H the complete path in G for which the system of equations S(ξ_H) is a subsystem of H. Let the nonterminal nodes of the complete path ξ_H be labeled with the attributes f_{i_1}, ..., f_{i_d}. For j = 1, ..., d, we denote by B_j the set of solutions on B of the equation system {f_{i_j}(x) = ¬δ_{i_j}}. Each element of B, except possibly for the element corresponding to the path ξ_H, belongs to at least one of the sets B_1, ..., B_d. Therefore, if |B| ≥ 2, there exists l ∈ {1, ..., d} such that |B_l| ≥ (|B| − 1)/d ≥ |B|/(2d). In the considered case, the beginning of the constructed path in Γ is the root of Γ, the edge that leaves the root and is labeled with the equation system {f_{i_l}(x) = ¬δ_{i_l}}, and the node which this edge enters.
We continue the construction of the complete path in Γ in the same way such that after the tth query, we have at least |B|/(2d)^t elements from B. The process of path construction continues at least until |B|/(2d)^t ≤ 1, i.e., at least until log_2 |B| ≤ t log_2(2d). Since |B| = 2^d, we have h(Γ) ≥ t ≥ d / log_2(2d) and h^{(3)}_U(z) ≥ d / log_2(2d).

Lemma 2.
Let U = (A, F) be a binary information system, k ∈ N ∪ {0}, and U not be an m-information system for m = 0, . . . , k. Then, there exists a (k + 1)-complete tree over U.
Proof. We prove the considered statement by induction on k. Let k = 0. In this case, U is not a 0-information system. Then, there exists an attribute f ∈ F, which is not constant on A. Using this attribute, it is easy to construct a 1-complete tree over U.
Let the considered statement hold for some k, k ≥ 0. We now show that it also holds for k + 1. Let U = (A, F) be a binary information system, which is not an m-information system for m = 0, ..., k + 1. Then, there exists an attribute f ∈ F such that, for any δ ∈ {0, 1}, the information system U_δ = (A(f, δ), F) is not an m-information system for m = 0, ..., k. Using the inductive hypothesis, we conclude that, for any δ ∈ {0, 1}, there exists a (k + 1)-complete tree G_δ over U_δ. Denote by G a directed tree with root in which the root is labeled with the attribute f, and for any δ ∈ {0, 1}, there is an edge that leaves the root, is labeled with the equation system {f(x) = δ}, and enters the root of the tree G_δ. One can show that the tree G is a (k + 2)-complete tree over U.

Proof of Theorem 2. It is clear that h^{(3)}_U(z) ≤ h^{(2)}_U(z) for any problem z over U. Therefore, h^{(3)}_U(n) ≤ h^{(2)}_U(n) for any n ∈ N.
(a) Let k ∈ N ∪ {0}. We now show by induction on k that, for each binary k-information system U (not necessarily infinite) and for each problem z over U, the inequality h^{(2)}_U(z) ≤ k holds. Let U = (A, F) be a binary 0-information system and z be a problem over U. Since all attributes from F(z) are constant on A, the set ∆_U(z) contains only one tuple. Therefore, the decision tree containing only one node labeled with this tuple solves the problem z relative to U, and h^{(2)}_U(z) = 0. Let k ∈ N ∪ {0} and, for each m, 0 ≤ m ≤ k, the considered statement hold. Let us show that it holds for k + 1. Let U = (A, F) be a binary (k + 1)-information system and z = (f_1, ..., f_n) be a problem over U. For i = 1, ..., n, choose a number δ_i ∈ {0, 1} such that the information system (A(f_i, ¬δ_i), F) is an m_i-information system where 0 ≤ m_i ≤ k. Using the inductive hypothesis, we conclude that, for i = 1, ..., n, there is a decision tree Γ_i over z, which uses only hypotheses, solves the problem z over (A(f_i, ¬δ_i), F), and has depth at most m_i. We denote by Γ a decision tree in which the root is labeled with the hypothesis H = {f_1(x) = δ_1, ..., f_n(x) = δ_n}, the edge leaving the root and labeled with H enters the terminal node labeled with the tuple (δ_1, ..., δ_n), and for i = 1, ..., n, the edge leaving the root and labeled with {f_i(x) = ¬δ_i} enters the root of the tree Γ_i. One can show that Γ solves the problem z relative to U and h(Γ) ≤ k + 1. Therefore, h^{(2)}_U(z) ≤ k + 1 for any problem z over U.
Let U ∈ C. Then, U is a k-information system for some natural k, and for each problem z over U, we have h^{(3)}_U(z) ≤ h^{(2)}_U(z) ≤ k. Therefore, h^{(2)}_U(n) = O(1) and h^{(3)}_U(n) = O(1).
(b) Let U ∈ D \ C and z = (f_1, ..., f_n) be an arbitrary problem over U. From Lemma 5.1 [16], it follows that |∆_U(z)| ≤ (4n)^{I(U)}. The proof of this lemma is based on results similar to the ones obtained by Sauer [20] and Shelah [21]. We consider a decision tree Γ over z, which solves z relative to U and uses only hypotheses. This tree is constructed by the halving algorithm [1,12]. We describe the work of this tree for an arbitrary element a from A. Set ∆ = ∆_U(z). If |∆| = 1, then the only n-tuple from ∆ is the solution z(a) of the problem z for the element a. Let |∆| ≥ 2. For i = 1, ..., n, we denote by δ_i a number from {0, 1} such that |∆(f_i, δ_i)| ≥ |∆(f_i, ¬δ_i)|. The next query is about the hypothesis H = {f_1(x) = δ_1, ..., f_n(x) = δ_n}. After this query, either the problem z is solved (if the answer is H) or we at least halve the number of tuples in the set ∆ (if the answer is a counterexample {f_i(x) = ¬δ_i}). In the latter case, set ∆ = ∆(f_i, ¬δ_i). The decision tree Γ continues to work with the element a and the set of n-tuples ∆ in the same way. Let, during the work with the element a, the considered decision tree make q queries. After the (q − 1)th query, the number of remaining n-tuples in the set ∆ is at least two and at most (4n)^{I(U)}/2^{q−1}. Therefore, 2^q ≤ (4n)^{I(U)} and q ≤ I(U) log_2(4n). Therefore, during the processing of the element a, the decision tree Γ makes at most I(U) log_2(4n) queries. Since a is an arbitrary element from A, the depth of Γ is at most I(U) log_2(4n). Since z is an arbitrary problem over U, we obtain h^{(2)}_U(n) = O(log n) and h^{(3)}_U(n) = O(log n).
Using Lemma 2 and the relation U ∉ C, we obtain that, for any d ∈ N, there exists a d-complete tree G_d over U. Let n ∈ N, n ≥ 3, and d be the maximum natural number such that 2^d − 1 ≤ n. Since |F(G_d)| ≤ 2^d − 1, there exists a problem z over U such that dim z = n and F(G_d) ⊆ F(z). By Lemma 1, h^{(2)}_U(z) ≥ d and h^{(3)}_U(z) ≥ d / log_2(2d), where d > log_2(n + 1) − 1. As a result, we have h^{(2)}_U(n) = Ω(log n), and h^{(2)}_U(n) = Θ(log n). It is easy to show that the function x / log_2(2x) is nondecreasing for x ≥ 2. Therefore, h^{(3)}_U(n) = Ω(log n / log log n).
(c) Let U ∉ D. We consider an arbitrary problem z = (f_1, ..., f_n) over U and a decision tree over z, which uses only hypotheses and solves the problem z over U in the following way. For a given element a ∈ A, the first query is about the hypothesis H_1 = {f_1(x) = 1, ..., f_n(x) = 1}. If the answer is H_1, then the problem z is solved for the element a. If, for some i ∈ {1, ..., n}, the answer is {f_i(x) = 0}, then the second query is about the hypothesis H_2 obtained from H_1 by replacing the equality f_i(x) = 1 with the equality f_i(x) = 0, etc. It is clear that after at most n queries, the problem z for the element a will be solved. Thus, h^{(2)}_U(n) ≤ n for any n ∈ N.
Let n ∈ N. Since U ∉ D, there exist attributes f_1, ..., f_n ∈ F such that, for any (δ_1, ..., δ_n) ∈ {0, 1}^n, the equation system {f_1(x) = δ_1, ..., f_n(x) = δ_n} is consistent on A. We now consider the problem z = (f_1, ..., f_n) and an arbitrary decision tree Γ over z, which solves the problem z over U and uses both attributes and hypotheses. Let us show that h(Γ) ≥ n. If n = 1, then the considered inequality holds since |∆_U(z)| ≥ 2. Let n ≥ 2. It is easy to show that an equation system over z is inconsistent if and only if it contains equations f_i(x) = 0 and f_i(x) = 1 for some i ∈ {1, ..., n}. For each node v of the decision tree Γ, we denote by S_v the union of systems of equations attached to edges in the path from the root of Γ to v. A node v of Γ will be called consistent if the equation system S_v is consistent.
We now construct a complete path ξ in the decision tree Γ, for which the nodes are consistent. We start from the root that is a consistent node. Let the path reach a consistent node v of Γ. If v is a terminal node, then the path ξ is constructed. Let v be a working node labeled with an attribute f_i ∈ F(z). Then, there exists δ ∈ {0, 1} for which the system of equations S_v ∪ {f_i(x) = δ} is consistent. Then, the path ξ will pass through the edge leaving v and labeled with the system of equations {f_i(x) = δ}. Let v be labeled with a hypothesis H = {f_1(x) = δ_1, ..., f_n(x) = δ_n}. If there exists i ∈ {1, ..., n} such that the system of equations S_v ∪ {f_i(x) = ¬δ_i} is consistent, then the path ξ will pass through the edge leaving v and labeled with the system of equations {f_i(x) = ¬δ_i}. Otherwise, S_v = H, and the path ξ will pass through the edge leaving v and labeled with the system of equations H.
Let all edges in the path ξ be labeled with systems of equations containing one equation each. Since all nodes of ξ are consistent, the equation system S(ξ) is consistent. We now show that S(ξ) contains at least n equations. Let us assume that this system contains fewer than n equations. Then, the set ∆_U(z)π(ξ) contains more than one n-tuple, which is impossible. Therefore, the length of the path ξ is at least n. Let there be edges in ξ, which are labeled with hypotheses, and the first edge in ξ labeled with a hypothesis H leave the node v. Then, S_v = H, and the length of ξ is at least n. Therefore, h(Γ) ≥ n, h^{(3)}_U(z) ≥ n, and h^{(3)}_U(n) ≥ n. Since h^{(3)}_U(n) ≤ h^{(2)}_U(n) ≤ n, we obtain h^{(2)}_U(n) = h^{(3)}_U(n) = n for any n ∈ N.
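The halving strategy used for the upper bound in the proof of Theorem 2 can be sketched for a finite set of candidate tuples. This is a minimal illustration, not the construction of the decision tree itself: the function name `halving_identify` is hypothetical, the candidate set plays the role of ∆_U(z), and the simulated oracle returns the first disagreeing coordinate, whereas any disagreeing coordinate could be returned.

```python
from typing import List, Sequence, Tuple


def halving_identify(
    candidates: Sequence[Tuple[int, ...]], target: Tuple[int, ...]
) -> Tuple[Tuple[int, ...], int]:
    """Identify target (assumed to be among candidates) using hypothesis queries.

    Each hypothesis takes the majority value in every coordinate, so any
    counterexample removes at least half of the remaining candidates.
    Returns the identified tuple and the number of queries made.
    """
    delta: List[Tuple[int, ...]] = list(candidates)
    n = len(target)
    queries = 0
    while len(delta) > 1:
        hyp = tuple(int(2 * sum(t[i] for t in delta) >= len(delta)) for i in range(n))
        queries += 1
        wrong = [i for i in range(n) if target[i] != hyp[i]]
        if not wrong:
            return hyp, queries          # the hypothesis is confirmed
        i = wrong[0]                     # counterexample f_i(x) = not hyp[i]
        delta = [t for t in delta if t[i] != hyp[i]]
    return delta[0], queries


# Candidate tuples for threshold attributes l_1, ..., l_7 on the elements 1, ..., 8,
# where l_i(a) = 1 if and only if a > i.
cands = [tuple(int(a > i) for i in range(1, 8)) for a in range(1, 9)]
results = [halving_identify(cands, t) for t in cands]
max_queries = max(q for _, q in results)
print(max_queries)
```

With eight candidate tuples, every target is identified after at most log_2 8 = 3 queries, in line with the bound q ≤ log_2 |∆_U(z)| that follows from the halving argument.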

Proof of Theorem 3
First, we prove several auxiliary statements.

Proposition 1. R ⊆ D.
Proof. Let U ∈ R. By Theorem 1, h^{(1)}_U(n) = Θ(log n). Let us assume that U ∉ D. Then, for any n ∈ N, there exists a problem z = (f_1, ..., f_n) over U such that |∆_U(z)| = 2^n. Let Γ be a decision tree over z, which solves the problem z relative to U and uses only attributes. Then, Γ should have at least 2^n terminal nodes. One can show that the number of terminal nodes in the tree Γ is at most 2^{h(Γ)}. Then, 2^n ≤ 2^{h(Γ)}, h(Γ) ≥ n, and h^{(1)}_U(z) ≥ n. Therefore, h^{(1)}_U(n) ≥ n for any n ∈ N, which is impossible. Thus, R ⊆ D.
Proposition 2. C ⊆ D.
Proof. Let U ∈ C. By Theorem 2, h^{(2)}_U(n) = O(1). Let us assume that U ∉ D. Then, by Theorem 2, h^{(2)}_U(n) = n for any n ∈ N, which is impossible. Therefore, C ⊆ D.

Proposition 3. R ∩ C = ∅.
Proof. Assume the contrary: R ∩ C ≠ ∅ and U = (A, F) ∈ R ∩ C. Let r, k ∈ N, U be an r-reduced information system, and U be a k-information system. We now consider an arbitrary problem z = (f_1, ..., f_n) over U and describe a decision tree Γ over z, which uses only attributes, solves the problem z over U, and has depth at most kr. For i = 1, ..., n, let δ_i be a number from {0, 1} such that (A(f_i, ¬δ_i), F) is an m_i-information system with 0 ≤ m_i < k. Let t be the maximum number from the set {1, ..., n} such that the system of equations S = {f_1(x) = δ_1, ..., f_t(x) = δ_t} is consistent. Since U is r-reduced, there exists a subsystem {f_{i_1}(x) = δ_{i_1}, ..., f_{i_p}(x) = δ_{i_p}} of the system S, which has the same set of solutions as S and for which p ≤ r. For a given a ∈ A, the decision tree Γ computes sequentially values f_{i_1}(a), ..., f_{i_p}(a).
If, for some q ∈ {1, ..., p}, f_{i_1}(a) = δ_{i_1}, ..., f_{i_{q−1}}(a) = δ_{i_{q−1}}, and f_{i_q}(a) = ¬δ_{i_q}, then the decision tree Γ continues to work with the problem z and the information system (A(f_{i_q}, ¬δ_{i_q}), F), which is an m_{i_q}-information system with m_{i_q} < k. Let f_{i_1}(a) = δ_{i_1}, ..., f_{i_p}(a) = δ_{i_p}. If t = n, then (δ_1, ..., δ_n) is the solution of the problem z for the considered element a. Let t < n. Then, the decision tree Γ continues to work with the problem z and the information system U' = (A', F), where A' is the set of solutions on A of the system S. Since the system S ∪ {f_{t+1}(x) = δ_{t+1}} is inconsistent, A' ⊆ A(f_{t+1}, ¬δ_{t+1}), and U' is an l-information system for some l ≤ m_{t+1} < k.
As a result, after the computation of the values of at most r attributes, we either solve the problem z or reduce the consideration of the problem z over the k-information system U to the consideration of the problem z over some l-information system where l < k. After the computation of the values of at most rk attributes, we solve the problem z since each problem over a 0-information system has exactly one possible solution. Therefore, h^{(1)}_U(n) ≤ rk for any n ∈ N. By Theorem 1, h^{(1)}_U(n) = Θ(log n), which is impossible. Thus, R ∩ C = ∅.

Proposition 4. For any infinite binary information system, its indicator vector coincides with one of the rows of Table 1.
Proof. Table 3 contains as rows all three-tuples from the set {0, 1}^3. We now show that the rows with the numbers 5-8 cannot be indicator vectors of infinite binary information systems. Assume the contrary: there is i ∈ {5, 6, 7, 8} such that the row with the number i is the indicator vector of an infinite binary information system U. If i = 5, then U ∈ R and U ∉ D, but this is impossible, since, by Proposition 1, R ⊆ D. If i = 6, then U ∈ C and U ∉ D, but this is impossible, since, by Proposition 2, C ⊆ D. If i = 7, then U ∈ R and U ∉ D, but this is impossible, since, by Proposition 1, R ⊆ D. If i = 8, then U ∈ R and U ∈ C, but this is impossible, since, by Proposition 3, R ∩ C = ∅. Therefore, for any infinite binary information system, its indicator vector coincides with one of the rows of Table 3 with the numbers 1-4. Thus, it coincides with one of the rows of Table 1.

Define an infinite binary information system U_1 = (A_1, F_1) as follows: A_1 = N and F_1 is the set of all functions from N to {0, 1}.
Lemma 3. The information system U_1 belongs to the class V_1.
For any i ∈ N, we define two functions p i : N → {0, 1} and l i : N → {0, 1}. Let j ∈ N. Then, p i (j) = 1 if and only if j = i and l i (j) = 1 if and only if j > i.
Define an infinite binary information system U_2 = (A_2, F_2) as follows: A_2 = N and F_2 = {p_i : i ∈ N} ∪ {l_i : i ∈ N}.
Lemma 4. The information system U_2 belongs to the class V_2.
Proof. For n ∈ N, denote S_n = {p_1(x) = 0, ..., p_n(x) = 0}. One can show that the equation system S_n is consistent and each proper subsystem of S_n has a set of solutions different from the set of solutions of S_n. Therefore, U_2 ∉ R. Using attributes from the set {l_i : i ∈ N}, we can construct a d-complete tree over U_2 for each d ∈ N. By Lemma 1 and Theorem 2, U_2 ∉ C. One can show that I(U_2) = 1. Therefore, U_2 ∈ D. Thus, ind(U_2) = (0, 1, 0), i.e., U_2 ∈ V_2.
Define an infinite binary information system U 3 = (A 3 , F 3 ) as follows: A 3 = N and F 3 = {p i : i ∈ N}. Lemma 5. The information system U 3 belongs to the class V 3 .
Define an infinite binary information system U 4 = (A 4 , F 4 ) as follows: A 4 = N and F 4 = {l i : i ∈ N}. Lemma 6. The information system U 4 belongs to the class V 4 .
Proof. Let us consider an arbitrary consistent system of equations S over U_4. We now show that there is a subsystem of S, which has at most two equations and the same set of solutions as S. Let S contain both equations of the kind l_i(x) = 1 and l_j(x) = 0. Denote i_0 = max{i : l_i(x) = 1 ∈ S} and j_0 = min{j : l_j(x) = 0 ∈ S}. One can show that the system of equations S' = {l_{i_0}(x) = 1, l_{j_0}(x) = 0} has the same set of solutions as S. The case when S contains for some δ ∈ {0, 1} only equations of the kind l_p(x) = δ can be considered in a similar way. In this case, the equation system S' contains only one equation. Therefore, the information system U_4 is 2-reduced and U_4 ∈ R. Using Proposition 4, we obtain ind(U_4) = (1, 1, 0), i.e., U_4 ∈ V_4.
Proof of Theorem 3. From Proposition 4, it follows that, for any infinite binary information system, its indicator vector coincides with one of the rows of Table 1. Using Lemmas 3-6, we conclude that each row of Table 1 is the indicator vector of some infinite binary information system.
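The reduction used in the proof of Lemma 6 can be checked on a finite range of elements. In the sketch below, an equation l_i(x) = δ is encoded as a pair (i, δ), with l_i(x) = 1 if and only if x > i; the function names `reduce_threshold_system` and `solutions` and the chosen range are assumptions made for the illustration.

```python
from typing import Iterable, List, Set, Tuple

Equation = Tuple[int, int]  # (i, delta) encodes the equation l_i(x) = delta


def reduce_threshold_system(system: List[Equation]) -> List[Equation]:
    """Keep only the tightest lower bound (max i with delta = 1)
    and the tightest upper bound (min j with delta = 0)."""
    ones = [i for i, d in system if d == 1]
    zeros = [j for j, d in system if d == 0]
    reduced: List[Equation] = []
    if ones:
        reduced.append((max(ones), 1))
    if zeros:
        reduced.append((min(zeros), 0))
    return reduced


def solutions(system: List[Equation], universe: Iterable[int]) -> Set[int]:
    """Elements x of the universe with l_i(x) = delta for every equation."""
    return {x for x in universe if all(int(x > i) == d for i, d in system)}


system = [(2, 1), (5, 1), (9, 0), (7, 0)]
reduced = reduce_threshold_system(system)
same = solutions(system, range(1, 20)) == solutions(reduced, range(1, 20))
print(reduced, same)
```

Here the four equations reduce to l_5(x) = 1 and l_7(x) = 0, and both systems have the solution set {6, 7} on the chosen range.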

Conclusions
Based on the results of exact learning, test theory, and rough set theory, for an arbitrary infinite binary information system, we studied three functions of the Shannon type, which characterize the dependence in the worst case of the minimum depth of a decision tree solving a problem on the number of attributes in the problem description. These three functions correspond to (i) decision trees using attributes, (ii) decision trees using hypotheses, and (iii) decision trees using both attributes and hypotheses. We described possible types of behavior for each of these three functions. We also studied the joint behavior of these functions and distinguished four corresponding complexity classes of infinite binary information systems. In the future, we plan to translate the obtained results into the language of exact learning.
The problems studied in this paper allow us to confine ourselves to considering only the crisp (conventional) sets that are completely defined by attributes. However, in the future, when we investigate approximately defined problems or approximate decision trees, it will be necessary to work with rough sets given by their lower and upper approximations. This will require a wider range of rough set theory techniques than those used in the present paper.