Abstract
In this paper, we study arbitrary subword-closed languages over the alphabet $E = \{0, 1\}$ (binary subword-closed languages). For the set $L(n)$ of words of the length $n$ belonging to a binary subword-closed language $L$, we investigate the depth of the decision trees solving the recognition and the membership problems deterministically and nondeterministically. In the case of the recognition problem, for a given word from $L(n)$, we should recognize it using queries, each of which, for some $i \in \{1, \ldots, n\}$, returns the $i$th letter of the word. In the case of the membership problem, for a given word over the alphabet $E$ of the length $n$, we should recognize if it belongs to the set $L(n)$ using the same queries. With the growth of $n$, the minimum depth of the decision trees solving the problem of recognition deterministically is either bounded from above by a constant, or grows as a logarithm, or grows linearly. For the other types of trees and problems (decision trees solving the problem of recognition nondeterministically and decision trees solving the membership problem deterministically and nondeterministically), with the growth of $n$, the minimum depth of the decision trees is either bounded from above by a constant or grows linearly. We study the joint behavior of the minimum depths of the considered four types of decision trees and describe five complexity classes of binary subword-closed languages.
1. Introduction
In this paper, we study arbitrary binary languages (languages over the alphabet $E = \{0, 1\}$) that are subword closed: if a word $a_1 \cdots a_n$ belongs to a language, then each word $a_{i_1} \cdots a_{i_m}$ with $1 \le i_1 < \cdots < i_m \le n$ belongs to this language. Subword-closed languages have attracted the attention of researchers in the field of formal languages for many years [1,2,3,4,5].
For the set $L(n)$ of words of the length $n$ belonging to a binary subword-closed language $L$, we investigate the depth of the decision trees solving the recognition and the membership problems deterministically and nondeterministically. In the case of the recognition problem, for a given word from $L(n)$, we should recognize it using queries, each of which, for some $i \in \{1, \ldots, n\}$, returns the $i$th letter of the word. In the case of the membership problem, for a given word over the alphabet $E$ of the length $n$, we should recognize if it belongs to $L(n)$ using the same queries.
For an arbitrary binary subword-closed language, with the growth of n, the minimum depth of the decision trees solving the problem of recognition deterministically is either bounded from above by a constant or grows as a logarithm, or linearly. For other types of trees and problems (decision trees solving the problem of recognition nondeterministically and decision trees solving the membership problem deterministically and nondeterministically), with the growth of n, the minimum depth of decision trees is either bounded from above by a constant or grows linearly. We study the joint behavior of the minimum depths of the considered four types of decision trees and describe five complexity classes of binary subword-closed languages.
In [6], the following results were announced without proof. For an arbitrary regular language, with the growth of n, (i) the minimum depth of the decision trees solving the problem of recognition deterministically is either bounded from above by a constant or grows as a logarithm, or linearly, and (ii) the minimum depth of the decision trees solving the problem of recognition nondeterministically is either bounded from above by a constant or grows linearly. Proofs for the case of decision trees solving the problem of recognition deterministically can be found in [7,8]. To apply the considered results to a given regular language, it is necessary to know a deterministic finite automaton (DFA) accepting this language.
Each subword-closed language over a finite alphabet is a regular language [3]. In this paper, we do not assume that binary subword-closed languages are given by DFAs. So, we cannot use the results from [6,7,8]. Instead of this, for binary subword-closed languages, we describe simple criteria for the behavior of the minimum depths of decision trees solving the problems of recognition and membership deterministically and nondeterministically.
This paper is a theoretical work related to the field of formal languages. It has no direct applications. In the theory of formal languages, various parameters of languages are studied, in particular the growth of the number of words of the language with the growth of the length of words and, for regular languages, the minimum number of states of the automaton accepting the language. For many years, the author has been introducing new parameters of languages into scientific use: the minimum depth of deterministic and nondeterministic decision trees for the recognition and membership problems related to the language [6,7,8,9]. The present paper continues this line of research.
There is now an extensive collection of methods for constructing decision trees. It includes (i) a variety of greedy heuristics based on measures of uncertainty, such as entropy and the Gini index [10,11,12], (ii) exact optimization algorithms based on dynamic programming, branch-and-bound search, SAT-based methods, etc., [13,14,15,16], and (iii) approximate optimization algorithms with bounds of accuracy that are applicable to obtain theoretical results about the complexity of decision trees [8,17].
In this paper, we found simple combinatorial parameters of binary subword-closed languages, which made it possible to obtain bounds on the depth of the decision trees without using the effective but rather complicated methods developed in the monographs [8,17].
2. Main Notions
Let $\omega$ be the set of nonnegative integers and $E = \{0, 1\}$. By $E^*$, we denote the set of all finite words over the alphabet $E$, including the empty word $\lambda$. Any subset $L$ of the set $E^*$ is called a binary language. This language is called subword closed if, for any word $a_1 \cdots a_n$ belonging to $L$, the word $a_{i_1} \cdots a_{i_m}$ belongs to $L$, where $n, m \in \omega \setminus \{0\}$, $a_1, \ldots, a_n \in E$, $m \le n$, and $1 \le i_1 < \cdots < i_m \le n$. For any natural $n$, we denote by $L(n)$ the set of words from $L$ whose length is equal to $n$. We consider two problems related to the set $L(n)$. The problem of recognition: for a given word from $L(n)$, we should recognize it using attributes (queries) $l_1^n, \ldots, l_n^n$, where $l_i^n$, $i \in \{1, \ldots, n\}$, is a function from $E^n$ to $E$ such that $l_i^n(a_1 \cdots a_n) = a_i$ for any word $a_1 \cdots a_n \in E^n$. The problem of membership: for a given word from $E^n$, we should recognize if this word belongs to the set $L(n)$ using the same attributes. To solve these problems, we use decision trees over $L(n)$.
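To make these definitions concrete, here is a small Python sketch of the query model; it is only an illustration, and the function names are illustrative, not from the paper. It implements the subword relation, the subword closure of a finite set of words, a brute-force test of subword closedness, and the attributes $l_i^n$.

```python
# Illustration of the notions above (helper names are illustrative).

def is_subword(u: str, w: str) -> bool:
    # u is a subword of w if u is obtained from w by removing some letters
    it = iter(w)
    return all(letter in it for letter in u)

def subword_closure(words: set[str]) -> set[str]:
    # close a finite set of words under removal of single letters
    closure, stack = set(words), list(words)
    while stack:
        w = stack.pop()
        for j in range(len(w)):
            u = w[:j] + w[j + 1:]
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure

def is_subword_closed(language: set[str]) -> bool:
    return all(w[:j] + w[j + 1:] in language
               for w in language for j in range(len(w)))

def attribute(i: int):
    # the attribute l_i^n: maps a word a_1 ... a_n to its ith letter
    return lambda word: word[i - 1]

L = subword_closure({"0011"})
assert is_subword_closed(L) and is_subword("01", "0011")
assert attribute(3)("0011") == "1"
```

A decision tree for the recognition problem is then any strategy that identifies a word of $L(n)$ by evaluating such attributes one at a time.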
A decision tree over $L(n)$ is a marked finite directed tree with a root that has the following properties:
- The root and the edges leaving the root are not labeled.
- Each node that is not the root or a terminal node is labeled with an attribute from the set $\{l_1^n, \ldots, l_n^n\}$.
- Each edge leaving a node that is not the root is labeled with a number from $E$.
A decision tree over $L(n)$ is called deterministic if it satisfies the following conditions:
- Exactly one edge leaves the root.
- For any node, which is not the root nor terminal node, the edges leaving this node are labeled with pairwise different numbers.
Let $\Gamma$ be a decision tree over $L(n)$. A complete path in $\Gamma$ is any sequence $\xi = v_1, d_1, \ldots, v_m, d_m, v_{m+1}$ of nodes and edges of $\Gamma$ such that $v_1$ is the root, $v_{m+1}$ is a terminal node, and, for $i = 1, \ldots, m$, the node $v_i$ is the initial and the node $v_{i+1}$ is the terminal node of the edge $d_i$. We define a subset $E^n(\xi)$ of the set $E^n$ in the following way: if $m = 1$, then $E^n(\xi) = E^n$. Let $m \ge 2$, the attribute $l_{i_j}^n$ be assigned to the node $v_j$, and $\sigma_j$ be the number assigned to the edge $d_j$, $j = 2, \ldots, m$. Then,
$$E^n(\xi) = \{a_1 \cdots a_n \in E^n : a_{i_j} = \sigma_j, \ j = 2, \ldots, m\}.$$
Let $\Gamma$ be a decision tree over $L(n)$. We say that $\Gamma$ solves the problem of recognition for $L(n)$ nondeterministically if the following conditions are satisfied:
- Each terminal node of $\Gamma$ is labeled with a word from $L(n)$.
- For any word $w \in L(n)$, there exists a complete path $\xi$ in the tree $\Gamma$ such that $w \in E^n(\xi)$.
- For any word $w \in L(n)$ and for any complete path $\xi$ in the tree $\Gamma$ such that $w \in E^n(\xi)$, the terminal node of the path $\xi$ is labeled with the word $w$.
We say that a decision tree $\Gamma$ over $L(n)$ solves the problem of recognition for $L(n)$ deterministically if $\Gamma$ is a deterministic decision tree that solves the problem of recognition for $L(n)$ nondeterministically.
We say that a decision tree $\Gamma$ over $L(n)$ solves the problem of membership for $L(n)$ nondeterministically if $\Gamma$ satisfies the following conditions:
- Each terminal node of $\Gamma$ is labeled with a number from $E$.
- For any word $w \in E^n$, there exists a complete path $\xi$ in the tree $\Gamma$ such that $w \in E^n(\xi)$.
- For any word $w \in E^n$ and for any complete path $\xi$ in the tree $\Gamma$ such that $w \in E^n(\xi)$, the terminal node of the path $\xi$ is labeled with the number 1 if $w \in L(n)$ and with the number 0 otherwise.
We say that a decision tree $\Gamma$ over $L(n)$ solves the problem of membership for $L(n)$ deterministically if $\Gamma$ is a deterministic decision tree that solves the problem of membership for $L(n)$ nondeterministically.
Let $\Gamma$ be a decision tree over $L(n)$. We denote by $h(\Gamma)$ the maximum number of nodes in a complete path in $\Gamma$ that are neither the root nor a terminal node. The value $h(\Gamma)$ is called the depth of the decision tree $\Gamma$.
We denote by $h_L^{ra}(n)$ ($h_L^{rd}(n)$) the minimum depth of a decision tree that solves the problem of recognition for $L(n)$ nondeterministically (deterministically). If $L(n) = \emptyset$, then $h_L^{ra}(n) = h_L^{rd}(n) = 0$.
We denote by $h_L^{ma}(n)$ ($h_L^{md}(n)$) the minimum depth of a decision tree that solves the problem of membership for $L(n)$ nondeterministically (deterministically). If $L(n) = \emptyset$, then $h_L^{ma}(n) = h_L^{md}(n) = 0$.
3. Main Results
Let $L$ be a binary subword-closed language. For any $\delta \in E$ and $i \in \omega$, we denote by $\delta^i$ the word $\delta \cdots \delta$ of the length $i$ (if $i = 0$, then $\delta^i = \lambda$). For any $\delta \in E$, let $\bar{\delta} = 0$ if $\delta = 1$ and $\bar{\delta} = 1$ if $\delta = 0$.
We define the parameter $D_{hm}(L)$ of the language $L$, which is called the homogeneity dimension of the language $L$. If, for each natural number $m$, there exists $\delta \in E$ such that the word $\delta^m \bar{\delta} \delta^m$ belongs to $L$, then $D_{hm}(L) = \infty$. Otherwise, $D_{hm}(L)$ is the maximum number $m$ such that there exists $\delta \in E$ for which the word $\delta^m \bar{\delta} \delta^m$ belongs to $L$. If there is no such number $m$, then $D_{hm}(L) = 0$.
We now define the parameter $D_{ht}(L)$ of the language $L$, which is called the heterogeneity dimension of the language $L$. If, for each natural number $m$, there exists $\delta \in E$ such that the word $\delta^m \bar{\delta}^m$ belongs to $L$, then $D_{ht}(L) = \infty$. Otherwise, $D_{ht}(L)$ is the maximum number $m$ such that there exists $\delta \in E$ for which the word $\delta^m \bar{\delta}^m$ belongs to $L$. If there is no such number $m$, then $D_{ht}(L) = 0$.
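Both dimensions can be estimated by brute force on a truncation of a language; the following sketch assumes the definitions above (the helper names and the cutoff are illustrative). For a subword-closed language, the search can stop at the first value of $m$ for which no $\delta$ works, since the word for $m + 1$ contains the word for $m$ as a subword.

```python
# Estimating the homogeneity and heterogeneity dimensions on a finite
# truncation of a language (illustrative sketch).

def hom_word(delta: str, m: int) -> str:
    bar = "1" if delta == "0" else "0"
    return delta * m + bar + delta * m   # the word delta^m bar-delta delta^m

def het_word(delta: str, m: int) -> str:
    bar = "1" if delta == "0" else "0"
    return delta * m + bar * m           # the word delta^m bar-delta^m

def dimension(language: set[str], word_of, cutoff: int = 20):
    dim = 0
    for m in range(1, cutoff + 1):
        if any(word_of(delta, m) in language for delta in "01"):
            dim = m
        else:
            return dim       # subword closedness: no larger m can succeed
    return float("inf")      # grows with the cutoff: treated as infinite

# Truncation of L0 = {0^i 1^j}, the language of Example 1 below:
L0 = {"0" * i + "1" * j for i in range(12) for j in range(12)}
print(dimension(L0, hom_word), dimension(L0, het_word))   # prints: 0 11
```

On this truncation, the heterogeneity dimension comes out as 11 and keeps growing as the truncation grows, which is the signature of $D_{ht}(L_0) = \infty$; the homogeneity dimension is exactly 0.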
Theorem 1.
Let L be a binary subword-closed language.
- (a)
- If $D_{hm}(L) = \infty$, then $h_L^{rd}(n) = n$ and $h_L^{ra}(n) = n$ for any natural number $n$.
- (b)
- If $D_{hm}(L) < \infty$ and $D_{ht}(L) = \infty$, then $h_L^{rd}(n) = \Theta(\log n)$ and $h_L^{ra}(n) = O(1)$.
- (c)
- If $D_{hm}(L) < \infty$ and $D_{ht}(L) < \infty$, then $h_L^{rd}(n) = O(1)$ and $h_L^{ra}(n) = O(1)$.
Example 1.
Let us consider the binary subword-closed language $L_0 = \{0^i 1^j : i, j \in \omega\}$. One can show that $D_{hm}(L_0) = 0$ and $D_{ht}(L_0) = \infty$. By Theorem 1, $h_{L_0}^{rd}(n) = \Theta(\log n)$ and $h_{L_0}^{ra}(n) = O(1)$.
For a binary subword-closed language $L$, we denote by $L^c$ its complementary language $E^* \setminus L$. The notation $|L| = \infty$ means that $L$ is an infinite language, and the notation $|L| < \infty$ means that $L$ is a finite language.
Theorem 2.
Let L be a binary subword-closed language.
- (a)
- If $|L| = \infty$ and $|L^c| = \infty$, then $h_L^{md}(n) = \Theta(n)$ and $h_L^{ma}(n) = \Theta(n)$.
- (b)
- If $|L| < \infty$ or $|L^c| < \infty$, then $h_L^{md}(n) = O(1)$ and $h_L^{ma}(n) = O(1)$.
Example 2.
One can show that, for the binary subword-closed language $L_0 = \{0^i 1^j : i, j \in \omega\}$ considered in Example 1, $|L_0| = \infty$ and $|L_0^c| = \infty$. By Theorem 2, $h_{L_0}^{md}(n) = \Theta(n)$ and $h_{L_0}^{ma}(n) = \Theta(n)$.
To study all the possible types of joint behavior of the functions $h_L^{rd}(n)$, $h_L^{ra}(n)$, $h_L^{md}(n)$, and $h_L^{ma}(n)$ for binary subword-closed languages $L$, we consider five classes of languages $W_1, \ldots, W_5$ described in the columns 2–5 of Table 1. In particular, $W_3$ consists of all binary subword-closed languages $L$ with $D_{hm}(L) < \infty$ and $D_{ht}(L) = \infty$. It is easy to show that the complexity classes $W_1, \ldots, W_5$ are pairwise disjoint and that each binary subword-closed language belongs to one of these classes. The behavior of the functions $h_L^{rd}(n)$, $h_L^{ra}(n)$, $h_L^{md}(n)$, and $h_L^{ma}(n)$ for languages from these classes is described in the last four columns of Table 1. For each class, the results considered in Table 1 follow from Theorems 1 and 2 and the following three remarks: (i) from the condition $D_{hm}(L) = \infty$, it follows that $|L| = \infty$; (ii) from the condition $D_{ht}(L) = \infty$, it follows that $|L| = \infty$; and (iii) from the condition $|L^c| < \infty$, it follows that $D_{hm}(L) = \infty$ and $D_{ht}(L) = \infty$.
Table 1. Joint behavior of the functions $h_L^{rd}(n)$, $h_L^{ra}(n)$, $h_L^{md}(n)$, and $h_L^{ma}(n)$ for binary subword-closed languages.

| Class | $D_{hm}(L)$ | $D_{ht}(L)$ | $|L|$ | $|L^c|$ | $h_L^{rd}(n)$ | $h_L^{ra}(n)$ | $h_L^{md}(n)$ | $h_L^{ma}(n)$ |
|-------|-------------|-------------|-------|---------|---------------|---------------|---------------|---------------|
| $W_1$ | $\infty$ | $\infty$ | $\infty$ | $<\infty$ | $n$ | $n$ | $O(1)$ | $O(1)$ |
| $W_2$ | $\infty$ | any | $\infty$ | $\infty$ | $n$ | $n$ | $\Theta(n)$ | $\Theta(n)$ |
| $W_3$ | $<\infty$ | $\infty$ | $\infty$ | $\infty$ | $\Theta(\log n)$ | $O(1)$ | $\Theta(n)$ | $\Theta(n)$ |
| $W_4$ | $<\infty$ | $<\infty$ | $\infty$ | $\infty$ | $O(1)$ | $O(1)$ | $\Theta(n)$ | $\Theta(n)$ |
| $W_5$ | $<\infty$ | $<\infty$ | $<\infty$ | $\infty$ | $O(1)$ | $O(1)$ | $O(1)$ | $O(1)$ |
We now show that the classes $W_1, \ldots, W_5$ are nonempty. To this end, we consider the following five binary subword-closed languages:
$$L_1 = E^*, \quad L_2 = \{0^i 1 0^j : i, j \in \omega\} \cup \{0^i : i \in \omega\}, \quad L_3 = \{0^i 1^j : i, j \in \omega\},$$
$$L_4 = \{0^i : i \in \omega\}, \quad L_5 = \{\lambda\}.$$
It is easy to see that $L_i \in W_i$ for $i = 1, \ldots, 5$.
4. Proofs of Theorems 1 and 2
In this section, we prove Theorems 1 and 2. First, we consider two auxiliary statements. For a word $w \in E^*$, we denote by $|w|$ its length.
Lemma 1.
Let $L$ be a binary subword-closed language for which $D_{hm}(L) = m < \infty$. Then, any word $w$ from $L$ can be represented in the form
$$w = v_1 \delta^{k_1} v_2 \bar{\delta}^{k_2} v_3, \qquad (1)$$
where $\delta \in E$, $k_1, k_2 \in \omega$, and $v_1$, $v_2$, $v_3$ are words from $E^*$ with length at most $2m + 2$ each.
Proof.
Denote $m = D_{hm}(L)$. Then, the words $0^{m+1} 1 0^{m+1}$ and $1^{m+1} 0 1^{m+1}$ do not belong to $L$. Let $w$ be a word from $L$. Then, for any $\delta \in E$, any entry of the letter $\delta$ in $w$ has at most $m$ letters $\bar{\delta}$ to the left of this entry (we call it an l-entry of $\delta$) or at most $m$ letters $\bar{\delta}$ to the right of this entry (we call it an r-entry of $\delta$): otherwise, the word $\bar{\delta}^{m+1} \delta \bar{\delta}^{m+1}$, which does not belong to $L$, would be a subword of $w$. Let $\delta \in E$. We say that $w$ is (i) a $\delta$-l-word if any entry of $\delta$ in $w$ is an l-entry; (ii) a $\delta$-r-word if any entry of $\delta$ in $w$ is an r-entry; and (iii) a $\delta$-b-word if $w$ is not a $\delta$-l-word and is not a $\delta$-r-word. Let $c, d \in \{l, r, b\}$. We say that $w$ is a $(c, d)$-word if $w$ is a 0-$c$-word and a 1-$d$-word. There are nine possible pairs $(c, d)$. We divide them into four groups: (a) $(l, l)$ and $(r, r)$; (b) $(l, r)$ and $(r, l)$; (c) $(l, b)$, $(b, l)$, $(r, b)$, and $(b, r)$; and (d) $(b, b)$, and consider them separately. Let $w = a_1 \cdots a_n$.
We assume that w contains both 0s and 1s. Otherwise, w can be represented in the form (1).
(a) Let $w$ be an $(l, l)$-word. Let $a_n = 0$ and $\alpha$ be the rightmost entry of 1 in $w$. Because $w$ is an $(l, l)$-word, there are at most $m$ 1s to the left of $\alpha$ (the entry of 0 at the position $n$ is an l-entry, so the whole word $w$ contains at most $m$ 1s) and at most $m$ 0s to the left of $\alpha$. Denote $w_1 = a_1 \cdots a_\alpha$. Then, $w_1$ contains at most $m$ 0s and at most $m$ 1s, i.e., the length of $w_1$ is at most $2m$. Moreover, to the right of $\alpha$, there are only 0s. Thus, $w = w_1 0^k$, where $k = n - \alpha$, i.e., $w$ can be represented in the form (1).
Let $a_n = 1$ and $\alpha$ be the rightmost entry of 0 in $w$. Denote $w_1 = a_1 \cdots a_\alpha$. Then, $w_1$ contains at most $m$ 0s and at most $m$ 1s, i.e., $|w_1| \le 2m$. Moreover, to the right of $\alpha$, there are only 1s. Thus, $w = w_1 1^k$, where $k = n - \alpha$, i.e., $w$ can be represented in the form (1).
One can prove in a similar way that any $(r, r)$-word can be represented in the form (1).
(b) Let $w$ be an $(l, r)$-word, $\alpha$ be the rightmost entry of 0, and $\beta$ be the leftmost entry of 1. Then, either $\alpha < \beta$ or $\beta < \alpha$. Let $\alpha < \beta$. Then, $w = 0^{k_1} 1^{k_2}$, i.e., $w$ can be represented in the form (1). Let now $\beta < \alpha$. Denote $w_1 = a_\beta \cdots a_\alpha$. The word $w$ has at most $m$ 0s to the right of $\beta$ and at most $m$ 1s to the left of $\alpha$. Therefore, $|w_1| \le 2m$ and $w = 0^{k_1} w_1 1^{k_2}$, i.e., $w$ can be represented in the form (1).
One can prove in a similar way that any $(r, l)$-word can be represented in the form (1).
(c) Let $w$ be an $(l, b)$-word; $\alpha$ be the rightmost entry of 1 such that to the left of this entry, we have at most $m$ 0s; and $\beta$ be the next after $\alpha$ entry of 1. It is clear that to the right of $\beta$, there are at most $m$ 0s, that $\alpha + 1 < \beta$, and that all letters between $\alpha$ and $\beta$ are equal to 0. Let $\gamma$ be the rightmost entry of 0. Then, to the left of $\gamma$, there are at most $m$ 1s. It is clear that either $\gamma < \beta$ or $\beta < \gamma$. Denote $w_1 = a_1 \cdots a_\alpha$. Then, $|w_1| \le 2m + 1$. Let $\gamma < \beta$. In this case, $w = w_1 0^{k_1} 1^{k_2}$, where $k_1 = \beta - \alpha - 1$ and $k_2 = n - \beta + 1$, i.e., $w$ can be represented in the form (1). Let $\beta < \gamma$. Denote $w_2 = a_\beta \cdots a_\gamma$. Then, $|w_2| \le 2m + 1$. We have $w = w_1 0^{k_1} w_2 1^{k_2}$, where $k_1 = \beta - \alpha - 1$ and $k_2 = n - \gamma$, i.e., $w$ can be represented in the form (1).
One can prove in a similar way that any $(b, l)$-, $(r, b)$-, or $(b, r)$-word can be represented in the form (1).
(d) Let $w$ be a $(b, b)$-word, $\alpha$ be the rightmost entry of 0 such that there are at most $m$ 1s to the left of this entry, and $\beta$ be the next after $\alpha$ entry of 0. Then, there are at most $m$ 1s to the right of $\beta$, $\alpha + 1 < \beta$, and all letters between $\alpha$ and $\beta$ are equal to 1. Denote $B = \{\alpha + 1, \ldots, \beta - 1\}$. Let $\alpha'$ be the rightmost entry of 1 such that there are at most $m$ 0s to the left of this entry and $\beta'$ be the next after $\alpha'$ entry of 1. Then, there are at most $m$ 0s to the right of $\beta'$, $\alpha' + 1 < \beta'$, and all letters between $\alpha'$ and $\beta'$ are equal to 0.
There are four possible types of location of the entries $\alpha$, $\beta$, $\alpha'$, and $\beta'$: (i) $\alpha' < \alpha$ and $\beta' < \alpha$, (ii) $\alpha' < \alpha$ and $\alpha < \beta' < \beta$ (the combination $\alpha < \alpha'$ and $\beta' < \beta$ is impossible because all letters with indices from $B$ are 1s, but all letters between $\alpha'$ and $\beta'$ are 0s), (iii) $\alpha < \alpha' < \beta$ and $\beta < \beta'$ (the combination $\alpha' < \alpha$ and $\beta < \beta'$ is impossible because all letters between $\alpha'$ and $\beta'$ are 0s, but all letters with indices from $B$ are 1s), and (iv) $\beta < \alpha'$ and $\beta < \beta'$. We now consider cases (i)–(iv) in detail.
(i) Let $\alpha' < \alpha$ and $\beta' < \alpha$. Then, $w = w_1 0^{k_1} w_2 1^{k_2} w_3$, where $k_1 = \beta' - \alpha' - 1$ and $k_2 = \beta - \alpha - 1$. Denote $w_1 = a_1 \cdots a_{\alpha'}$, $w_2 = a_{\beta'} \cdots a_\alpha$, and $w_3 = a_\beta \cdots a_n$. The length of $w_1$ is at most $2m + 1$ because to the left of $\alpha'$, there are at most $m$ 0s, and to the left of $\alpha$, there are at most $m$ 1s. We can prove in a similar way that $|w_2| \le 2m + 1$ and $|w_3| \le 2m + 1$. Therefore, $w$ can be represented in the form (1).
(ii) Let $\alpha' < \alpha$ and $\alpha < \beta' < \beta$. Then, $\beta' = \alpha + 1$ and
$$w = w_1 0^{k_1} 1^{k_2} w_3,$$
where $k_1 = \alpha - \alpha'$ and $k_2 = \beta - \alpha - 1$. Denote $w_1 = a_1 \cdots a_{\alpha'}$ and $w_3 = a_\beta \cdots a_n$. It is easy to show that $|w_1| \le 2m + 1$ and $|w_3| \le 2m + 1$. Therefore, $w$ can be represented in the form (1).
(iii) Let $\alpha < \alpha' < \beta$ and $\beta < \beta'$. Then, $\beta = \alpha' + 1$ and
$$w = w_1 1^{k_1} 0^{k_2} w_3,$$
where $k_1 = \alpha' - \alpha$ and $k_2 = \beta' - \beta$. Denote $w_1 = a_1 \cdots a_\alpha$ and $w_3 = a_{\beta'} \cdots a_n$. It is easy to show that $|w_1| \le 2m + 1$ and $|w_3| \le 2m + 1$. Therefore, $w$ can be represented in the form (1).
(iv) Let $\beta < \alpha'$ and $\beta < \beta'$. Then, $w = w_1 1^{k_1} w_2 0^{k_2} w_3$, where $k_1 = \beta - \alpha - 1$ and $k_2 = \beta' - \alpha' - 1$. Denote $w_1 = a_1 \cdots a_\alpha$, $w_2 = a_\beta \cdots a_{\alpha'}$, and $w_3 = a_{\beta'} \cdots a_n$. It is easy to show that $|w_1| \le 2m + 1$, $|w_2| \le 2m + 1$, and $|w_3| \le 2m + 1$. Therefore, $w$ can be represented in the form (1). □
Lemma 2.
Let $L$ be a binary subword-closed language for which $D_{hm}(L) < \infty$ and $D_{ht}(L) < \infty$. Then, there exists a natural number $p$ such that $|L(n)| \le p$ for any natural $n$.
Proof.
Denote $t = D_{ht}(L)$ and $m = D_{hm}(L)$. Then, the words $0^{t+1} 1^{t+1}$ and $1^{t+1} 0^{t+1}$ do not belong to $L$. Using Lemma 1, we obtain that each word $w$ from $L$ can be represented in the form $w = v_1 \delta^{k_1} v_2 \bar{\delta}^{k_2} v_3$, where $\delta \in E$, the length of $v_i$ is at most $2m + 2$ for $i = 1, 2, 3$, $k_1, k_2 \in \omega$, and $k_1 \le t$ or $k_2 \le t$ (if $k_1 > t$ and $k_2 > t$, then the word $\delta^{t+1} \bar{\delta}^{t+1}$, which does not belong to $L$, is a subword of $w$). We now evaluate the number of such words, for which length is equal to $n$. Denote $s = 2m + 2$. Then, the number of different words from $E^*$ of the length at most $s$ is at most $2^{s+1}$. Let us assume that the words $v_1$, $v_2$, and $v_3$ and the letter $\delta$ are fixed. Then, the number of different words $v_1 \delta^{k_1} v_2 \bar{\delta}^{k_2} v_3$ of the length $n$ is at most $2(t + 1)$ because $k_1 \le t$ or $k_2 \le t$, and the value of one of the numbers $k_1$, $k_2$ determines the other one. Thus, the number of words in $L(n)$ is at most $p = 2 \cdot 2(t + 1) \cdot 2^{3(s+1)}$. □
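The bound of Lemma 2 is easy to check by brute force on small cases. The following sketch uses a toy language of our choice, $L = \{0^i : i \in \omega\} \cup \{0^i 1 : i \in \omega\}$, for which $D_{hm}(L) = 0$ and $D_{ht}(L) = 1$; here $|L(n)| = 2$ for every natural $n$, in agreement with the existence of the constant $p$.

```python
# Brute-force check of the constant bound on |L(n)| for a toy language
# with both dimensions finite (illustrative; the language is our choice).
from itertools import product

def member(w: str) -> bool:
    # w is a run of 0s, optionally followed by a single 1
    return w.count("1") <= 1 and "1" not in w[:-1]

for n in range(1, 12):
    size = sum(member("".join(p)) for p in product("01", repeat=n))
    assert size == 2        # the words 0^n and 0^(n-1) 1
print("|L(n)| = 2 for n = 1, ..., 11")
```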
Proof of Theorem 1.
It is clear that $h_L^{ra}(n) \le h_L^{rd}(n) \le n$ for any natural $n$.
(a) Let $D_{hm}(L) = \infty$ and $n$ be a natural number. Then, there exists $\delta \in E$ such that the word $\delta^n \bar{\delta} \delta^n$ belongs to $L$. Therefore, $\delta^n \in L(n)$ and $\delta^{i-1} \bar{\delta} \delta^{n-i} \in L(n)$ for $i = 1, \ldots, n$. Let $\Gamma$ be a decision tree over $L(n)$, which solves the problem of recognition for $L(n)$ nondeterministically and has the minimum depth $h_L^{ra}(n)$, and $\xi$ be a complete path in $\Gamma$ such that $\delta^n \in E^n(\xi)$. Let us assume that there is $i \in \{1, \ldots, n\}$ such that the attribute $l_i^n$ is not attached to any node of $\xi$, which is not the root nor the terminal node. Then, $\delta^{i-1} \bar{\delta} \delta^{n-i} \in E^n(\xi)$, which is impossible because the terminal node of $\xi$ is labeled with the word $\delta^n$. Therefore, all $n$ attributes are attached to nodes of $\xi$ and $h_L^{ra}(n) \ge n$. It is easy to show that $h_L^{rd}(n) \le n$. Thus, $h_L^{ra}(n) = h_L^{rd}(n) = n$ for any natural $n$.
(b) Let $D_{hm}(L) = m < \infty$ and $D_{ht}(L) = \infty$. By Lemma 1, each word from $L$ can be represented in the form $v_1 \delta^{k_1} v_2 \bar{\delta}^{k_2} v_3$, where $\delta \in E$, the length of $v_i$ is at most $t = 2m + 2$ for $i = 1, 2, 3$, and $k_1, k_2 \in \omega$. Note that either $\delta^{k_1}$ or $\bar{\delta}^{k_2}$ is a word of the kind $\sigma^k$, $\sigma \in E$, with $k \ge (n - 3t)/2$, where $n$ is the length of the word, i.e., at least one of the two homogeneous parts is long.
Let $n$ be a natural number such that $n > 5t$. We now describe the work of a decision tree that solves the problem of recognition for $L(n)$ deterministically. Let $w \in L(n)$. We represent this word as follows: $w = w^1 w^2 A w^3 w^4$, where the length of each of the words $w^1$, $w^2$, $w^3$, $w^4$ is equal to $t$, and the length of the word $A$ is equal to $n - 4t$. Note that $v_1$ lies within $w^1$ and $v_3$ lies within $w^4$, so $v_2$ is the only fragment of the considered representation that can make the words $w^2$, $A$, $w^3$ heterogeneous. First, we recognize all letters in the words $w^1$, $w^2$, $w^3$, $w^4$ using queries (attributes). We now consider four cases.
(i) Let $w^2 = \delta^t$ and $w^3 = \delta^t$ for some $\delta \in E$. Then, $A = \delta^{n-4t}$ (otherwise, the word $\delta^{m+1} \bar{\delta} \delta^{m+1}$, which does not belong to $L$, would be a subword of $w$), and the word $w$ is recognized.
(ii) Let $w^2 = \delta^t$ for some $\delta \in E$, and let $w^3$ contain both 0 and 1. Then, the word $v_2$ has an intersection with the word $w^3 w^4$. It is clear that $v_2$ has no intersection with the prefix of the word $A$ of the length $n - 5t$ and that this prefix coincides with the word $\delta^{n-5t}$. We recognize all letters of the suffix of the word $A$ of the length $t$. As a result, the word $w$ will be recognized.
(iii) Let $w^3 = \delta^t$ for some $\delta \in E$, and let $w^2$ contain both 0 and 1. Then, the word $v_2$ has an intersection with the word $w^1 w^2$. It is clear that $v_2$ has no intersection with the suffix of the word $A$ of the length $n - 5t$ and that this suffix coincides with the word $\delta^{n-5t}$. We recognize all letters of the prefix of the word $A$ of the length $t$. As a result, the word $w$ will be recognized.
(iv) Let $w^2 = \delta^t$ and $w^3 = \bar{\delta}^t$ for some $\delta \in E$. Then, we need to recognize the position of the word $v_2$ and the word $v_2$ itself. Beginning with the left, we divide $A$ and, probably, a prefix of $w^3$ into blocks of the length $t$. As a result, we have $r = \lceil (n - 4t)/t \rceil$ blocks. We recognize all letters in the block with the number $\lceil r/2 \rceil$. If all letters in this block are equal to $\bar{\delta}$, then we apply the same procedure to the blocks with numbers $1, \ldots, \lceil r/2 \rceil - 1$. If all letters in this block are equal to $\delta$, then we apply the same procedure to the blocks with numbers $\lceil r/2 \rceil + 1, \ldots, r$. If the considered block contains both 0 and 1, then we recognize $t$ letters before this block and $t$ letters after this block and, as a result, recognize both the word $v_2$ and its position. After each iteration, the number of blocks under consideration is at most one-half of the previous number of blocks. Let $q$ be the whole number of iterations. Then, after the iteration $q - 1$, we have at least one unchecked block. Therefore, $2^{q-1} \le r$ and $q \le \log_2 r + 1$.
In case (i), to recognize the word $w$, we make $4t$ queries. In cases (ii) and (iii), we make $5t$ queries. In case (iv), we make at most $6t + t(\log_2 r + 1) \le 7t + t \log_2 n$ queries. As a result, we have $h_L^{rd}(n) = O(\log n)$.
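The halving procedure of case (iv) is, in essence, a binary search for the position of the boundary fragment $v_2$. The following sketch renders it for the simplest situation, the language $L_0 = \{0^i 1^j\}$ from Example 1, where $v_2$ is empty and blocks of the length 1 suffice; the code is illustrative only.

```python
# Binary search for the 0-to-1 boundary in a word of L0(n) = {0^a 1^(n-a)}:
# a degenerate instance of the halving procedure from case (iv).

def recognize(n: int, query):
    lo, hi, used = 0, n, 0       # invariant: the number of 0s lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        used += 1
        if query(mid) == "0":    # the letter at position mid (1-based) is 0,
            lo = mid             # so there are at least mid 0s
        else:
            hi = mid - 1         # the letter is 1: fewer than mid 0s
    return "0" * lo + "1" * (n - lo), used

w = "0000011111"
word, used = recognize(len(w), lambda i: w[i - 1])
assert word == w and used <= len(w).bit_length()    # O(log n) queries
```

The general case differs only in that blocks of the length $t$ are inspected and, once a mixed block is found, its neighborhood of $2t$ further letters is queried.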
Because $D_{ht}(L) = \infty$, for any natural $n$, the set $L(n)$ contains, for some $\delta \in E$, the words $\delta^i \bar{\delta}^{n-i}$ for $i = 0, 1, \ldots, n$. Then, $|L(n)| \ge n + 1$, and each decision tree over $L(n)$ solving the problem of recognition for $L(n)$ deterministically has at least $n + 1$ terminal nodes. One can show that the number of terminal nodes in a deterministic decision tree of the depth $h$ is at most $2^h$. Therefore, $2^{h_L^{rd}(n)} \ge n + 1$. Thus, $h_L^{rd}(n) \ge \log_2(n + 1)$ and $h_L^{rd}(n) = \Theta(\log n)$.
We now prove that $h_L^{ra}(n) = O(1)$. To this end, it is enough to show that there is a natural number $c$ such that, for each natural $n$ and for each word $w \in L(n)$, there exists a subset $B(w)$ of the set of attributes $\{l_1^n, \ldots, l_n^n\}$ such that $|B(w)| \le c$ and, for any word $w' \in L(n)$ different from $w$, there exists an attribute $l_i^n \in B(w)$ for which $l_i^n(w') \ne l_i^n(w)$. We now show that as $c$, we can use the number $7t$ (for $n \le 5t$, we take all $n$ attributes). In case (i), in the capacity of the set $B(w)$, we can choose all attributes corresponding to letters from the subwords $w^1$, $w^2$, $w^3$, and $w^4$. In case (ii), we can choose all attributes corresponding to letters from the subwords $w^1$, $w^2$, $w^3$, $w^4$, and the suffix of the word $A$ of the length $t$. In case (iii), we can choose all attributes corresponding to letters from the subwords $w^1$, $w^2$, $w^3$, $w^4$, and the prefix of the word $A$ of the length $t$. In case (iv), in the capacity of the set $B(w)$, we can choose all attributes corresponding to letters from the subwords $w^1$, $w^2$, $w^3$, and $w^4$, and letters from the block containing both 0 and 1 and from the blocks that are its left and right neighbors.
(c) Let $D_{hm}(L) < \infty$ and $D_{ht}(L) < \infty$. By Lemma 2, there exists a natural number $p$ such that $|L(n)| \le p$ for any natural $n$. Let $n$ be a natural number. Then, the set $L(n)$ contains at most $p$ words, and there exists a subset $B$ of the set of attributes $\{l_1^n, \ldots, l_n^n\}$ such that $|B| \le p(p-1)/2$ and, for any two different words $w_1, w_2 \in L(n)$, there exists an attribute $l_i^n \in B$ for which $l_i^n(w_1) \ne l_i^n(w_2)$. It is easy to construct a decision tree over $L(n)$ that solves the problem of recognition for $L(n)$ deterministically by sequentially computing attributes from $B$. The depth of this tree is at most $p(p-1)/2$. Therefore, $h_L^{ra}(n) \le h_L^{rd}(n) \le p(p-1)/2$, and $h_L^{rd}(n) = O(1)$ and $h_L^{ra}(n) = O(1)$. □
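The construction used in part (c) admits a direct rendering: for a set of at most $p$ words, one position per pair of words is enough to tell them apart, and querying all the selected positions identifies the unknown word. The sketch below is illustrative and assumes the words of $L(n)$ are given explicitly.

```python
# Recognition of a word from a small set L(n) via a pairwise-distinguishing
# set B of positions (|B| <= p(p-1)/2), as in part (c) of the proof.

def distinguishing_set(words: list[str]) -> set[int]:
    B = set()
    for a in range(len(words)):
        for b in range(a + 1, len(words)):
            i = next(i for i, (x, y)
                     in enumerate(zip(words[a], words[b]), 1) if x != y)
            B.add(i)                 # a position where the two words differ
    return B

def recognize(words: list[str], query) -> str:
    answers = {i: query(i) for i in distinguishing_set(words)}
    return next(w for w in words
                if all(w[i - 1] == a for i, a in answers.items()))

Ln = ["0000", "0001"]                # L(4) for L = {0^i} U {0^i 1}, p = 2
assert recognize(Ln, lambda i: "0001"[i - 1]) == "0001"
```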
Proof of Theorem 2.
It is clear that $h_L^{ma}(n) \le h_L^{md}(n) \le n$ for any natural $n$.
(a) Let $|L| = \infty$, $|L^c| = \infty$, and $w_0$ be a word with the minimum length from $L^c$. Because $L$ is an infinite subword-closed language, $L(n) \ne \emptyset$ for any natural $n$. Let $n$ be a natural number such that $n \ge |w_0|$ and $\Gamma$ be a decision tree over $L(n)$ that solves the problem of membership for $L(n)$ nondeterministically and has the minimum depth. Let $w \in L(n)$ and $\xi$ be a complete path in $\Gamma$ such that $w \in E^n(\xi)$. Then, the terminal node of $\xi$ is labeled with the number 1. Let us assume that the number of nodes labeled with attributes in $\xi$ is at most $n - |w_0|$. Then, at least $|w_0|$ positions of $w$ are not queried along $\xi$, and we can change at most $|w_0|$ letters in the word $w$ such that the obtained word $w'$ will satisfy the following conditions: $w_0$ is a subword of $w'$ and $w' \in E^n(\xi)$. However, it is impossible because in this case $w' \notin L(n)$ (since $L$ is subword closed and $w_0 \notin L$, each word with the subword $w_0$ does not belong to $L$) and $w' \in E^n(\xi)$, but the terminal node of $\xi$ is labeled with the number 1. Therefore, the depth of $\Gamma$ is greater than $n - |w_0|$. Thus, $h_L^{ma}(n) > n - |w_0|$. It is easy to construct a decision tree over $L(n)$ that solves the problem of membership for $L(n)$ deterministically and has a depth equal to $n$. Therefore, $h_L^{md}(n) \le n$. Thus, $h_L^{ma}(n) = \Theta(n)$ and $h_L^{md}(n) = \Theta(n)$.
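The implant step in this argument can be made explicit. The sketch below (our toy instance, not the paper's notation) fixes the queried positions of a path, writes the letters of $w_0$ into unqueried positions, and so produces a word that gives the same answers to all queries of the path but lies outside $L$.

```python
# The implant step from the lower bound: w0 is placed into unqueried positions.

def implant(w: str, queried: set[int], w0: str) -> str:
    free = [i for i in range(len(w)) if i + 1 not in queried]  # 1-based queries
    assert len(free) >= len(w0), "the path queries too many positions"
    letters = list(w)
    for pos, letter in zip(free, w0):
        letters[pos] = letter
    return "".join(letters)

# For L0 = {0^i 1^j}, a word of the minimum length in the complement is w0 = "10".
w = "0001111111"                     # a word of L0(10)
queried = {1, 2, 9, 10}              # positions fixed along a too-short path
bad = implant(w, queried, "10")
assert all(bad[i - 1] == w[i - 1] for i in queried)   # same query answers
assert "10" in bad                   # contains w0, hence bad lies outside L0
```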
(b) Let $|L| < \infty$. Then, there exists a natural number $m$ such that $L(n) = \emptyset$ for any natural $n \ge m$. Therefore, for each natural $n \ge m$, $h_L^{ma}(n) = 0$ and $h_L^{md}(n) = 0$.
Let $|L^c| < \infty$. Then, $L^c = \emptyset$ and $L = E^*$: if $L^c$ contained a word $u$, it would contain all words having $u$ as a subword, and there are infinitely many such words. Let $n$ be a natural number and $\Gamma$ be a decision tree over $L(n)$ that consists of the root, a terminal node labeled with the number 1, and an edge that leaves the root and enters the terminal node. One can show that $\Gamma$ solves the problem of membership for $L(n)$ deterministically and has a depth equal to 0. Therefore, $h_L^{ma}(n) = 0$ and $h_L^{md}(n) = 0$ for any natural $n$. □
5. Conclusions
In this paper, we studied arbitrary binary subword-closed languages. For the set $L(n)$ of words of the length $n$ belonging to a binary subword-closed language $L$, we investigated the depth of the decision trees solving the recognition and the membership problems deterministically and nondeterministically. We proved that, with the growth of $n$, the minimum depth of the decision trees solving the problem of recognition deterministically is either bounded from above by a constant, or grows as a logarithm, or grows linearly. For the other types of trees and problems, with the growth of $n$, the minimum depth of the decision trees is either bounded from above by a constant or grows linearly. We also studied the joint behavior of the minimum depths of the considered four types of decision trees and described five complexity classes of binary subword-closed languages.
In this paper, we did not assume that a binary subword-closed language is given by a deterministic finite automaton accepting this language. So, we could not use the parameters of the automaton for the study of decision tree complexity as it was done in [6,7,8,9]. Instead of this, for binary subword-closed languages, we described simple combinatorial criteria for the behavior of the minimum depths of the decision trees solving the problems of recognition and membership deterministically and nondeterministically.
In the future, we are planning to generalize this approach to some other classes of formal languages.
Funding
Research funded by the King Abdullah University of Science and Technology.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
Research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST).
Conflicts of Interest
The author declares no conflict of interest.
References
- Atminas, A.; Lozin, V.V. Deciding Atomicity of Subword-Closed Languages. In Developments in Language Theory, 26th International Conference, DLT 2022, Tampa, FL, USA, 9–13 May 2022, Proceedings; Diekert, V., Volkov, M.V., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2022; Volume 13257, pp. 69–77.
- Brzozowski, J.A.; Jirásková, G.; Zou, C. Quotient Complexity of Closed Languages. Theory Comput. Syst. 2014, 54, 277–292.
- Haines, L.H. On Free Monoids Partially Ordered by Embedding. J. Comb. Theory 1969, 6, 94–98.
- Hospodár, M. Power, Positive Closure, and Quotients on Convex Languages. Theor. Comput. Sci. 2021, 870, 53–74.
- Okhotin, A. On the State Complexity of Scattered Substrings and Superstrings. Fundam. Inform. 2010, 99, 325–338.
- Moshkov, M. Complexity of Deterministic and Nondeterministic Decision Trees for Regular Language Word Recognition. In Proceedings of the 3rd International Conference Developments in Language Theory, DLT 1997, Thessaloniki, Greece, 20–23 July 1997; Bozapalidis, S., Ed.; Aristotle University of Thessaloniki: Thessaloniki, Greece, 1997; pp. 343–349.
- Moshkov, M. Decision Trees for Regular Language Word Recognition. Fundam. Inform. 2000, 41, 449–461.
- Moshkov, M. Time Complexity of Decision Trees. In Transactions on Rough Sets III; Peters, J.F., Skowron, A., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3400, pp. 244–459.
- Moshkov, M. Decision Trees for Regular Factorial Languages. Array 2022, 15, 100203.
- Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Chapman and Hall/CRC: Boca Raton, FL, USA, 1984.
- Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann: Burlington, MA, USA, 1993.
- Rokach, L.; Maimon, O. Data Mining with Decision Trees: Theory and Applications; Series in Machine Perception and Artificial Intelligence; World Scientific: Singapore, 2007; Volume 69.
- AbouEisha, H.; Amin, T.; Chikalov, I.; Hussain, S.; Moshkov, M. Extensions of Dynamic Programming for Combinatorial Optimization and Data Mining; Intelligent Systems Reference Library; Springer: Berlin/Heidelberg, Germany, 2019; Volume 146.
- Aglin, G.; Nijssen, S.; Schaus, P. Learning Optimal Decision Trees Using Caching Branch-and-Bound Search. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, New York, NY, USA, 7–12 February 2020; pp. 3146–3153.
- Narodytska, N.; Ignatiev, A.; Pereira, F.; Marques-Silva, J. Learning Optimal Decision Trees with SAT. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, 13–19 July 2018; pp. 1362–1368.
- Verwer, S.; Zhang, Y. Learning Optimal Classification Trees Using a Binary Linear Program Formulation. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, HI, USA, 27 January–1 February 2019; pp. 1625–1632.
- Moshkov, M. Comparative Analysis of Deterministic and Nondeterministic Decision Trees; Intelligent Systems Reference Library; Springer: Berlin/Heidelberg, Germany, 2020; Volume 179.