Article

Syntactic Learning over Tree Tiers

Department of Linguistics, Stony Brook University, 100 Nicolls Road, Stony Brook, NY 11794, USA
Submission received: 1 July 2025 / Revised: 30 March 2026 / Accepted: 14 April 2026 / Published: 6 May 2026
(This article belongs to the Special Issue Logic, Language, and Information)

Abstract

The class of tier-based strictly 2-local (TSL2) languages has been shown to be useful in modeling patterns across different linguistic domains. This paper discusses the learnability of the intersection closure of the TSL2 languages, multi-TSL2 (MTSL2). I present two learning algorithms, one that learns a relevant subclass of MTSL in polynomial time, and one that learns MTSL proper but requires potentially exponential time. Both algorithms generalize across tree-based and string-based data representations. I show that each algorithm correctly learns its target class from a limited sample of positive data, and discuss the tradeoffs between the two. The success of these algorithms delivers a key learning result for subregular linguistics, and demonstrates the utility of subregular language classes in developing a unified learning theory that spans different linguistic domains.

1. Introduction

Linguistics is driven by two central questions: What is the character of linguistic knowledge, and how is it acquired from limited data? The program of subregular linguistics (see [1,2] and references therein) addresses these questions by identifying formal classes of languages which are sufficiently expressive to account for phenomena across linguistic domains but still restrictive enough that they do not extend to “pathological” patterns which are universally absent in language. The goal of identifying such classes is twofold. First, it sets an upper bound on the complexity of human language, lending insight into the computational nature of language in the brain. Second, relevant formal classes serve as possible hypothesis spaces for language learning. Without a well-defined hypothesis space (i.e., limitations on what concepts a learner will consider), learning is impossible. The hypothesis space is what guides when and how a learner can generalize from the data it has seen. In subregular linguistics, the goal is to find classes which can both express the kinds of patterns found in natural language and also be learned under an acquisition-like paradigm.
The subregular class of tier-based strictly local (TSL) languages [3] has recently emerged as particularly relevant for linguists. String-based TSL languages are largely able to capture the typology of phonotactic patterns seen in natural language, including local dependencies, long-distance harmony, and blocking [4]. Meanwhile, the parallel class of TSL languages over trees has recently been shown to capture a variety of patterns in syntax, including verb agreement [5], case assignment [6], and movement patterns [7]. Intuitively, TSL languages can capture dependencies that are immediately local, once a certain set of irrelevant elements is ignored.
Over string-based data representations, the TSL languages are known to be efficiently learnable from positive data [8,9]. Human languages, however, typically involve many such TSL patterns operating at once, and these may interact with each other. To capture multiple TSL patterns which are active at once, the more complex class of multi-TSL (MTSL) is needed. An MTSL language is simply the intersection of one or more TSL languages, i.e., several TSL patterns applying at once. This paper presents two novel algorithms for learning MTSL languages, generalized across both string-based and tree-based data representations. First and foremost, each learning algorithm serves as an existence proof for the learnability of the language class it learns (MTSL and a relevant subclass). Together they show that each class can be learned, how much data and time are needed for learning, and what learning strategy is effective. Although the behavior of formal algorithms may offer insight into the learning strategies that underlie human language learning, this paper delivers a formal result, rather than an explicit theory of acquisition. These formal results are crucial for the advancement of the subregular linguistic program; a concept class must be learnable in order to be a good candidate for modeling human language.
The approach to learning taken in this paper comes out of the grammatical inference tradition [10,11,12], which is concerned with both theoretical and empirical results. These algorithms offer theoretical guarantees about their behavior, are explicitly interpretable, and learn formal grammars, which in this case have a clear linguistic interpretation. This is in contrast to the modern natural language processing tradition, which is largely dominated by neural networks and empirical results [13]. Connecting neural approaches with formal language theory is an area of active research [14,15,16], and an interesting direction of future work could be to compare empirical performance between neural networks and the algorithms presented here on linguistically relevant data. This paper, however, focuses solely on theoretical results.
In the domain of syntax, various approaches have been taken to understand the formal aspects of learning. Bod [17] discusses learning phrase structure grammars using probability and analogy. Additionally, there is a substantial body of work dedicated to the distributional learning of various subclasses of the context-free and mildly context-sensitive languages, including substitutable languages [18,19], congruential languages [20], and finite-kernel languages [21]. These approaches all learn directly from strings of symbols, with no structural information, and offer insight into the kinds of properties that can make context-free languages learnable under certain learning paradigms.
By contrast, the formal subregular approach taken in this paper is deterministic and generalizable across string-based and tree-based data representations. The learners I present here take tree structures directly as input and learn a grammar of constraints on these tree structures. Although this learning setup is thus abstracted from models of child language learning in which this tree structure is completely hidden from the learner, it is still relevant to study for two reasons: First, existing work on (M)TSL syntax largely uses the formalism of Minimalist grammars [22], in which structures can be represented as dependency trees, with immediate dominance encoding argument structure relationships. This makes them essentially a representation of an utterance’s semantics, something that could conceivably be available to the learner. Secondly, if we are to take (M)TSL seriously as a possible hypothesis space for syntax, we must start by showing its learnability before building up to a more end-to-end learning approach.
Within the subregular literature, work on learning MTSL is limited to an algorithm sketched by McMullin et al. [23] which suggests an approach for learning MTSL over strings. Section 3 of this paper builds on McMullin et al.’s [23] sketch and generalizes it to cover both string-based and tree-based data representations. The resulting algorithm, the Constraint-Unique Tier Inference Algorithm (CUTIA), is efficient in terms of both time and data: it computes a grammar in polynomial time with respect to the size of the input sample, and the representative sample required to identify a given language is polynomial in the size of the grammar for that language. However, although CUTIA is efficient and useful for many types of multi-tiered patterns, it imposes some additional restrictions on the types of MTSL languages that it can successfully learn. For this reason, Section 4 presents a second learner, the Multi Tier-based Strictly Local Bottom-Up Factor Inference Algorithm (MTSL-BUFIA), which is able to learn the full MTSL class. The tradeoff for this added power is that MTSL-BUFIA loses the polynomial time bound offered by CUTIA.
These two learners are not redundant but complementary: CUTIA offers speed at the expense of power, and MTSL-BUFIA offers power at the expense of (worst-case) speed. These algorithms offer formal learning results and draw connections across linguistic domains, linking tree-based syntactic patterns and string-based phonological patterns with one unified learning strategy.
The rest of this paper is organized as follows: Section 2 outlines the relational structures that will be used to model strings and trees, as well as the relevant relations and substructures that will be needed (Section 2.1) and introduces the formal definitions of the language classes being examined and the learning paradigm being used (Section 2.2). Section 3 introduces a learning algorithm for a subclass of MTSL and demonstrates its effectiveness on a linguistically-inspired example (Section 3.1) as well as proving the learning guarantees it offers (Section 3.2) and discussing its limitations (Section 3.3). Section 4 introduces a second algorithm which learns the class of MTSL proper (Section 4.1, Section 4.2 and Section 4.3) and proves that it exactly identifies the class in the limit from polynomial data (Section 4.4). Section 5 compares the two algorithms, discusses tradeoffs between them, and reflects on the insights they offer into the nature of the MTSL class and its learnability. Finally, Section 6 offers ideas for future directions and concludes.

2. Preliminaries

2.1. Trees, Strings, and Substructures

The algorithms presented in this paper generalize across both string-based and tree-based data representations. Trees are modeled here using Gorn domains [24], with string models being a special case of tree models.
Definition 1
(Trees). A tree t over an alphabet (set of symbols) $\Sigma$ consists of a Gorn domain D of elements and a labeling function $l : D \to \Sigma$ which maps each domain element to a label drawn from $\Sigma$. Elements in D are strings of natural numbers drawn from $\mathbb{N}^*$, and D must additionally meet the following criteria:
1. $\varepsilon \in D$ (unique root);
2. $\forall u \in \mathbb{N}^* [\forall j \in \mathbb{N} [uj \in D \rightarrow u \in D]]$ (mother-of closure);
3. $\forall u \in \mathbb{N}^* [\forall i < j \in \mathbb{N} [uj \in D \rightarrow ui \in D]]$ (left-sibling closure).
The set of all well-formed trees over an alphabet $\Sigma$ is denoted $T_\Sigma$.
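As an illustration of Definition 1, the following minimal Python sketch checks the three domain conditions. The encoding is an assumption made for this example and is not taken from the paper: a tree is a dictionary from Gorn addresses (tuples of non-negative integers, with the empty tuple as the root) to labels, and the name `is_well_formed` is hypothetical.

```python
from typing import Dict, Tuple

Address = Tuple[int, ...]   # Gorn address; () is the root (assumed encoding)
Tree = Dict[Address, str]   # address -> label


def is_well_formed(tree: Tree) -> bool:
    """Return True iff the addresses of `tree` form a Gorn domain."""
    domain = set(tree)
    if () not in domain:                       # 1. unique root
        return False
    for addr in domain:
        if not addr:
            continue                           # the root has no parent or siblings
        parent, j = addr[:-1], addr[-1]
        if j < 0 or parent not in domain:      # 2. mother-of closure
            return False
        # 3. left-sibling closure: checking only the immediate left sibling is
        # enough, because every address in the domain goes through this test.
        if j > 0 and parent + (j - 1,) not in domain:
            return False
    return True


# Hypothetical example: root a with children b and a; b has one child a.
example: Tree = {(): "a", (0,): "b", (1,): "a", (0, 0): "a"}
print(is_well_formed(example))
```

A dictionary that skips a left sibling, such as `{(): "a", (1,): "b"}`, fails the third condition and is rejected.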
Definition 2
(Tree relations). Let D be the domain of a well-formed tree t. Given any $u, v \in D$:
  • $u \triangleleft v := \exists i \in \mathbb{N} [ui = v]$ (immediate dominance);
  • $u \triangleleft^+ v := \exists q \in \mathbb{N}^+ [uq = v]$ (proper dominance);
  • $u \mathrel{Y} v := \exists i \in \mathbb{N}, w \in \mathbb{N}^* [u = wi \wedge v = w(i+1)]$ (immediate left sibling);
  • $u \prec v := \exists i < j \in \mathbb{N}, w, q, p \in \mathbb{N}^* [u = wiq \wedge v = wjp]$ (precedence).
These domain elements may also be referred to as Gorn addresses, and they essentially represent the path through the tree required to reach each element from the root. If u immediately dominates v ($u \triangleleft v$), we say u is the parent of v, and v the child of u. A node is an address paired with its corresponding label. The Gorn address of the root is always $\varepsilon$, the empty string. The address of any other node is equal to the address of its parent node, concatenated with the number of left siblings it has. Figure 1 illustrates how Gorn addresses encode tree structures.
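Because Gorn addresses encode paths from the root, the four relations of Definition 2 reduce to simple prefix and position comparisons. The sketch below illustrates this under the assumption that addresses are Python tuples of integers; the function names are hypothetical.

```python
from typing import Tuple

Address = Tuple[int, ...]   # Gorn address; () is the root (assumed encoding)


def imm_dominates(u: Address, v: Address) -> bool:
    """u is the parent of v: v equals u extended by exactly one index."""
    return len(v) == len(u) + 1 and v[:len(u)] == u


def properly_dominates(u: Address, v: Address) -> bool:
    """u is a proper prefix of v."""
    return len(v) > len(u) and v[:len(u)] == u


def imm_left_sibling(u: Address, v: Address) -> bool:
    """u = w i and v = w (i + 1) for some shared parent address w."""
    return len(u) == len(v) >= 1 and u[:-1] == v[:-1] and v[-1] == u[-1] + 1


def precedes(u: Address, v: Address) -> bool:
    """u = w i q and v = w j p with i < j: compare the first differing index."""
    for a, b in zip(u, v):
        if a != b:
            return a < b
    return False   # one address is a prefix of the other, so they are not ordered


print(precedes((0, 0), (1,)), precedes((0,), (0, 0)))
```

Note that precedence, unlike immediate siblinghood, also holds between nodes at different depths, as long as they sit in distinct subtrees ordered left to right.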
In this framework, strings can be modeled as a special case of trees. Namely, they are unary-branching trees in which each node may immediately dominate at most one other node. The immediate dominance relation then corresponds to the notion of string successor, with proper dominance corresponding to string precedence. Alternately, strings could be modeled as “shallow” trees in which nodes other than the root node may not have any children. In this case, immediate left siblinghood will correspond to string successor, and precedence to string precedence. Either approach ensures that all strings are also trees, meaning that any results obtained for tree models carry over trivially to string models.
Definition 3
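(Tier projection). Let $T \subseteq \Sigma$ be a tier alphabet. The T-projection of a tree with domain D and labeling function l consists of a set of licensed tier elements $D_T$ and the tier-relativized relations $\triangleleft_T$ (immediate tier-dominance) and $Y_T$ (immediate tier-siblinghood). These are defined as follows:
  • $D_T := \{a \in D \mid l(a) \in T\}$;
  • $u \triangleleft_T v := (u, v \in D_T) \wedge (u \triangleleft^+ v) \wedge (\neg\exists w \in D_T [u \triangleleft^+ w \wedge w \triangleleft^+ v])$;
  • $u \mathrel{Y_T} v := (u, v \in D_T) \wedge (u \prec v) \wedge (\exists w \in D_T [w \triangleleft_T u \wedge w \triangleleft_T v \wedge \neg\exists q \in D_T [w \triangleleft_T q \wedge u \prec q \wedge q \prec v]])$.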
Notably, the relations between any two tier elements (i.e., nodes whose labels are both members of T) are preserved in the tier-relativized counterparts of those relations. This means that, when $T = \Sigma$, $\triangleleft_T$ reduces to $\triangleleft$ and $Y_T$ reduces to $Y$. Figure 2 replicates the example from Figure 1 projected onto the $\{b, c\}$ tier.
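For mathematical convenience, we additionally stipulate that there are distinguished root and leaf markers, $\rtimes, \ltimes \in \Sigma$, such that $\forall v \in D_T [\rtimes \triangleleft_T v \leftrightarrow \neg\exists u \in D_T [u \triangleleft_T v]]$ and $\forall v \in D_T [v \triangleleft_T \ltimes \leftrightarrow \neg\exists u \in D_T [v \triangleleft_T u]]$. These make it easier to talk about the edges of the tree.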
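The tier projection of Definition 3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the paper's implementation: trees are dictionaries from Gorn addresses (tuples) to labels, the names `project`, `dominates`, and `precedes` are hypothetical, and the root and leaf markers are left out for brevity.

```python
from typing import Dict, Set, Tuple

Address = Tuple[int, ...]
Tree = Dict[Address, str]
Pairs = Set[Tuple[Address, Address]]


def dominates(u: Address, v: Address) -> bool:
    """Proper dominance: u is a proper prefix of v."""
    return len(v) > len(u) and v[:len(u)] == u


def precedes(u: Address, v: Address) -> bool:
    """Precedence: at the first differing index, u's index is smaller."""
    for a, b in zip(u, v):
        if a != b:
            return a < b
    return False


def project(tree: Tree, tier: Set[str]) -> Tuple[Set[Address], Pairs, Pairs]:
    """Return (D_T, tier-dominance pairs, tier-sibling pairs), without edge markers."""
    d_t = {a for a, label in tree.items() if label in tier}
    # u tier-dominates v iff no other tier element lies strictly between them.
    dom = {(u, v) for u in d_t for v in d_t
           if dominates(u, v)
           and not any(dominates(u, w) and dominates(w, v) for w in d_t)}
    sib: Pairs = set()
    for w, u in dom:
        for w2, v in dom:
            # u and v share the tier-mother w, and no tier-daughter of w separates them.
            if w == w2 and precedes(u, v) and not any(
                    (w, q) in dom and precedes(u, q) and precedes(q, v) for q in d_t):
                sib.add((u, v))
    return d_t, dom, sib


# Hypothetical tree: root b with children a and c; a has children c and b.
tree: Tree = {(): "b", (0,): "a", (1,): "c", (0, 0): "c", (0, 1): "b"}
d_t, dom, sib = project(tree, {"b", "c"})
print(sorted(dom), sorted(sib))
```

On the `{b, c}` tier the `a` node is ignored, so its two children become tier-daughters of the root and tier-siblings of the root's own `c` child.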
Definition 4
(Two-Factor). Given a tree t with domain D and labeling function l and a relation $R \in \{\triangleleft, Y\}$, $x \mathrel{R} y$ is an R 2-factor of t iff there are nodes $u, v \in D$ such that $l(u) = x$, $l(v) = y$, and $u \mathrel{R} v$. Additionally, given a tier $T \subseteq \Sigma$, $x \mathrel{R_T} y$ is an R 2-factor of t on tier T iff there are nodes $u, v \in D$ such that $l(u) = x$, $l(v) = y$, and $u \mathrel{R_T} v$ (note that tier relations are defined such that this can only hold if x and y are both tier symbols).
Two-factors, then, are simply size 2 substructures of a given tree. For any tree t, $\mathit{2fac}(t)$ is defined as the set of all 2-factors present in t, and $\mathit{2fac}_T(t)$ is the set of all 2-factors of t on tier T.
Definition 5
(2-paths). Given a tree t with domain D and labeling function l:
  • For every pair of elements $u, v \in D$ such that $u \triangleleft^+ v$, there is a dominance 2-path $\langle f, V \rangle$ in t where f is a dominance 2-factor $l(u) \triangleleft l(v)$ and V is the set of symbols which prevent u and v from forming this 2-factor: $\{l(w) \mid w \in D \wedge u \triangleleft^+ w \wedge w \triangleleft^+ v\}$.
  • For every pair of elements $u, v \in D$ such that $u \prec v$, there is a sibling 2-path $\langle f, V \rangle$ in t where f is a sibling 2-factor $l(u) \mathrel{Y} l(v)$ and V is the set of symbols which prevent u and v from forming this 2-factor: $\{l(w) \mid w \in D \wedge ((u \prec w \wedge w \prec v) \vee (w \triangleleft^+ u \wedge \neg(w \triangleleft^+ v)) \vee (w \triangleleft^+ v \wedge \neg(w \triangleleft^+ u)))\}$.
We denote the set of all these 2-paths (both dominance and sibling) as $\mathit{2paths}(t)$.
Two-paths were initially introduced by Jardine and Heinz [25] and represent the idea of “intervention” in potential 2-factors. A 2-path consists of a (possible) 2-factor in a given structure paired with the set of symbols that prevent that 2-factor from being present. In the case where a given 2-factor is present, there will be a corresponding 2-path with an empty intervener set.
Figure 3 demonstrates finding 2-factors and 2-paths in an example tree. A 2-path $\langle f, V \rangle$ in a tree t can also be referred to as a 2-path of or for f in t. Notably, the same (potential) 2-factor can appear in multiple 2-paths, since multiple pairs of elements with the same labels may stand in the same relation (dominance or precedence). For example, the tree shown in Figure 3 contains two 2-paths for the 2-factor $a \triangleleft a$: one with the empty intervener set (corresponding to the root a node and its rightmost child) and one with the intervener set $\{b\}$ (corresponding to the root a node and the leftmost foot node a). Additionally, the 2-paths for a given language L, denoted $\mathit{2paths}(L)$, consist of all 2-paths in all trees in L: $\bigcup_{t \in L} \mathit{2paths}(t)$.
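To make the notion of 2-paths concrete, the following Python sketch enumerates the 2-paths of a single tree according to Definition 5. The encoding is assumed for illustration: trees are dictionaries from Gorn addresses to labels, 2-factors are triples `(x, relation, y)` with relation `"dom"` or `"sib"`, and the example tree is a hypothetical one chosen so that the dominance 2-factor over two a-labeled nodes has both an empty intervener set and the intervener set {b}.

```python
from typing import Dict, FrozenSet, Set, Tuple

Address = Tuple[int, ...]
Tree = Dict[Address, str]
Factor = Tuple[str, str, str]                 # (x, "dom" | "sib", y), assumed encoding
Path = Tuple[Factor, FrozenSet[str]]          # (2-factor, intervener set)


def dominates(u: Address, v: Address) -> bool:
    """Proper dominance: u is a proper prefix of v."""
    return len(v) > len(u) and v[:len(u)] == u


def precedes(u: Address, v: Address) -> bool:
    """Precedence: at the first differing index, u's index is smaller."""
    for a, b in zip(u, v):
        if a != b:
            return a < b
    return False


def two_paths(tree: Tree) -> Set[Path]:
    """Collect every dominance and sibling 2-path of `tree`."""
    paths: Set[Path] = set()
    for u in tree:
        for v in tree:
            if dominates(u, v):
                # Interveners are the labels of nodes strictly between u and v.
                V = frozenset(tree[w] for w in tree
                              if dominates(u, w) and dominates(w, v))
                paths.add(((tree[u], "dom", tree[v]), V))
            elif precedes(u, v):
                # Interveners lie between u and v, or dominate exactly one of them.
                V = frozenset(tree[w] for w in tree
                              if (precedes(u, w) and precedes(w, v))
                              or (dominates(w, u) and not dominates(w, v))
                              or (dominates(w, v) and not dominates(w, u)))
                paths.add(((tree[u], "sib", tree[v]), V))
    return paths


# Hypothetical tree: root a with children b and a; b has one child a.
tree: Tree = {(): "a", (0,): "b", (1,): "a", (0, 0): "a"}
paths = two_paths(tree)
factors = {f for f, V in paths if not V}      # attested 2-factors: empty intervener set
print(sorted(factors))
```

The attested 2-factors fall out as the 2-paths whose intervener set is empty, which is the observation used later in the proof of Lemma 2.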

2.2. Languages, Grammars, and Learning

A tree language L is a subset of $T_\Sigma$. A class of tree languages, then, is some set of tree languages $\mathcal{L} \subseteq \mathcal{P}(T_\Sigma)$. A grammar G is a finite representation of a (possibly infinite) tree language L, which must be able to classify any tree $t \in T_\Sigma$ as a member or nonmember of L. In this case, we say that G generates L, and that L is the language of G.
The task of learning is about discovering the correct grammar G for the target language L. Both of the algorithms presented in this paper learn specific classes of tree languages under Gold’s [26] exact identification in the limit paradigm. Under this paradigm, an algorithm is considered able to learn a class $\mathcal{L}$ of languages iff for any language $L \in \mathcal{L}$ and any positive presentation of that language, the algorithm converges within a finite amount of input data on a grammar G which generates L. A positive presentation is simply an ordered sequence of elements from L such that every element appears at some point in the sequence, which is fed to the algorithm element by element so that after n units of time the algorithm has access to the first n items in the sequence.
Notably, this framework assumes that the learning data consists of positive examples only. The learner receives no counterexamples and cannot ask questions. These qualities are desirable for learning algorithms for natural-language patterns, since overwhelming evidence from the literature on child language acquisition suggests that children learn from positive input only, discarding and ignoring non-examples, corrections, and explicit explanation.
In addition to meeting the requirements for Gold’s [26] exact identification in the limit, the algorithms in this paper also offer guarantees about the amount of time and/or data they will need. This is assessed using De La Higuera’s [27] extension of Gold’s paradigm: exact identification in polynomial time and data. Exact identification in polynomial time means that the algorithm must return a grammar which generates the target language in a number of steps which is polynomial in the size of the input sample. The polynomial data requirement means that the information which distinguishes a target language from other potential targets, the characteristic sample, must be present in the input data, and that this sample must be polynomial in the size of the target grammar. Both of these requirements make the learning paradigm more similar to human language acquisition, which happens in just a few years and with a relatively small amount of data.
The algorithms in this paper are aimed at learning the multi-tier-based strictly local (MTSL) class of tree languages, which has been shown to have linguistic relevance in multiple domains. The following definitions formalize MTSL and some related subclasses.
Definition 6
(TSL2). A language L over an alphabet $\Sigma$ is tier-based strictly 2-local (TSL2) if there is some tier $T \subseteq \Sigma$ and some set of 2-factors $B \subseteq \mathit{2fac}(T_\Sigma)$ such that $L = \{t \in T_\Sigma \mid \mathit{2fac}_T(t) \cap B = \emptyset\}$. The pair $\langle T, B \rangle$ forms a grammar G for L.
The TSL(2) languages were introduced by Heinz et al. [28] in order to describe the types of constraints found in phonology, and have since been used to model patterns across linguistic domains. The definition of TSL2 used in this paper parallels this string-based version. It is worth noting that this definition is different from the one commonly used in the subregular syntax literature, introduced by Graf and Kostyszyn [29]. Graf and Kostyszyn’s version of tree-TSL involves a constraint function for each symbol which restricts the set of strings produced by concatenating the labels of the tier-children of each node bearing the relevant symbol. This constraint function is symbol-specific and can be arbitrarily complex, allowing for a high degree of flexibility. In fact, the constraint function TSL2 languages are a proper superset of the TSL2 languages as formulated here. However, it is worth noting that many of the analyses given in the literature over the former also hold over the latter. Additionally, a TSL2 grammar can be thought of as a basis for a set of constraint function TSL2 grammars: it provides a foundational set of tier projections and constraints, over which the constraint functions for each symbol can be learned using an appropriate string-based algorithm. In this paper, TSL2 will be used solely to refer to the language class given in Definition 6, and the constraint function-based version will not be discussed further.
Since many TSL patterns may be present simultaneously in human language, it is relevant to discuss the intersection closure of the TSL languages, the MTSL languages.
Definition 7
(MTSL2). A language L over an alphabet $\Sigma$ is multi tier-based strictly 2-local (MTSL2) if there is some set G of ⟨2-factor, tier⟩ pairs $G = \{\langle f_1, T_1 \rangle, \langle f_2, T_2 \rangle, \dots, \langle f_n, T_n \rangle\}$, with each $f_i \in \mathit{2fac}(T_\Sigma)$ and each $T_i \subseteq \Sigma$, such that: $L = \{t \in T_\Sigma \mid \forall \langle f_i, T_i \rangle \in G\ [f_i \notin \mathit{2fac}_{T_i}(t)]\}$.
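Definition 7 amounts to a simple membership test: a tree is accepted iff none of the banned 2-factors shows up on its associated tier. The sketch below illustrates this in Python under assumptions that are not part of the paper: trees are dictionaries from Gorn addresses to labels, 2-factors are triples `(x, "dom" | "sib", y)`, a grammar is a set of `(factor, tier)` pairs, edge markers are omitted, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Address = Tuple[int, ...]
Tree = Dict[Address, str]
Factor = Tuple[str, str, str]
Grammar = Set[Tuple[Factor, FrozenSet[str]]]


def dominates(u: Address, v: Address) -> bool:
    return len(v) > len(u) and v[:len(u)] == u


def precedes(u: Address, v: Address) -> bool:
    for a, b in zip(u, v):
        if a != b:
            return a < b
    return False


def tier_factors(tree: Tree, tier: FrozenSet[str]) -> Set[Factor]:
    """All 2-factors of `tree` on `tier` (root and leaf markers not modeled)."""
    d_t = [a for a in tree if tree[a] in tier]
    dom = {(u, v) for u in d_t for v in d_t
           if dominates(u, v)
           and not any(dominates(u, w) and dominates(w, v) for w in d_t)}
    facs = {(tree[u], "dom", tree[v]) for u, v in dom}
    for w, u in dom:
        for w2, v in dom:
            if w == w2 and precedes(u, v) and not any(
                    (w, q) in dom and precedes(u, q) and precedes(q, v) for q in d_t):
                facs.add((tree[u], "sib", tree[v]))
    return facs


def accepts(grammar: Grammar, tree: Tree) -> bool:
    """True iff no banned 2-factor occurs on the tier it is paired with."""
    return all(f not in tier_factors(tree, tier) for f, tier in grammar)


# Toy constraint: a mover m may not tier-dominate another m on the {l, m} tier.
grammar: Grammar = {(("m", "dom", "m"), frozenset({"l", "m"}))}
good: Tree = {(): "l", (0,): "x", (0, 0): "m"}
bad: Tree = {(): "l", (0,): "m", (0, 0): "x", (0, 0, 0): "m"}
print(accepts(grammar, good), accepts(grammar, bad))
```

The choice of tier matters: if the same 2-factor were banned on the full alphabet instead, the intervening `x` in `bad` would separate the two movers and the tree would be accepted.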
Definition 8
(CUTI-MTSL2). A language L over an alphabet $\Sigma$ is constraint-unique multi tier-based strictly 2-local (CUTI-MTSL2) if L is an MTSL language with grammar $G = \{\langle f_1, T_1 \rangle, \langle f_2, T_2 \rangle, \dots, \langle f_n, T_n \rangle\}$ such that:
  • For any 2-factor $f_i \in (\mathit{2fac}(T_\Sigma) \setminus \mathit{2fac}(L))$, there is exactly one tier $T_i$ such that $\langle f_i, T_i \rangle \in G$ (constraint uniqueness).
  • For each pair $\langle f_i, T_i \rangle \in G$ it must be the case that (i) each symbol $\sigma \in T_i$ which is not part of $f_i$ is attested in an intervener set for $f_i$ in $\mathit{2paths}(L)$, and (ii) each smallest intervener set (for $f_i$ in $\mathit{2paths}(L)$) in which $\sigma$ appears contains only other elements in $T_i$, which do not themselves appear in any smaller intervener set (for $f_i$ in $\mathit{2paths}(L)$) (tier-element independence).
CUTI-MTSL is a proper subclass of MTSL, and is the class learned by the Constraint-Unique Tier Inference Algorithm (CUTIA) discussed in Section 3. The two additional restrictions introduced by this class are constraint uniqueness, meaning that a given 2-factor can be banned on at most one tier, and tier-element independence, meaning that all tier elements for each tier on which there is a constraint must be able to freely occur either with or without all non-tier elements. In principle, these two restrictions could be independently varied, creating a paradigm of four possible MTSL variants rather than two. However, the work in this paper is focused on learning only MTSL proper and CUTI-MTSL.

3. CUTIA

The first key result of this paper is a learning algorithm for the class of CUTI-MTSL tree languages. The Constraint-Unique Tier Inference Algorithm (CUTIA), defined in Algorithm 1, builds on prior work by McMullin et al. [23] on learning restrictions on multiple tiers over strings. CUTIA, following McMullin et al. [23], leverages the idea that any 2-factor which is absent from the input sample must be missing for a principled reason: namely, it must be banned on some tier. The notion of 2-paths is used to systematically construct the most general such tier consistent with the data.
Algorithm 1 CUTIA
Data: Positive sample I
Result: Grammar G, a set of 〈2-factor, forbidding tier〉 pairs.

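$G := \{\}$
$B := \mathit{2fac}(T_\Sigma) \setminus \mathit{2fac}(I)$
foreach $f := \rho_1 \mathrel{R} \rho_2 \in B$:
      $T := \{\rho_1, \rho_2\}$
      $S := \{V$ for $\langle x, V \rangle \in \mathit{2paths}(I)$ where $x = f\}$
      for cardinality c in $\{1, \dots, |\Sigma|\}$:
          $V_c := \{s \in S$ where $|s| = c\}$
          $N := \{\}$
          foreach $V \in V_c$:
              if: $\forall \sigma [\sigma \in V \rightarrow \sigma \notin T]$
              then: $N := N \cup V$
          $T := T \cup N$
      $G := G \cup \{\langle \rho_1 \mathrel{R} \rho_2, T \rangle\}$
return G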
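The pseudocode of Algorithm 1 can be sketched in Python as follows. This is an illustrative rendering under stated assumptions, not the author's implementation: trees are dictionaries from Gorn addresses (tuples) to labels, 2-factors are triples `(x, "dom" | "sib", y)`, the alphabet is passed in explicitly, the root and leaf markers are omitted, and the helper `chain` (which encodes a string as a unary-branching tree) is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Address = Tuple[int, ...]
Tree = Dict[Address, str]
Factor = Tuple[str, str, str]


def dominates(u: Address, v: Address) -> bool:
    return len(v) > len(u) and v[:len(u)] == u


def precedes(u: Address, v: Address) -> bool:
    for a, b in zip(u, v):
        if a != b:
            return a < b
    return False


def two_paths(tree: Tree) -> Set[Tuple[Factor, FrozenSet[str]]]:
    """All (2-factor, intervener set) pairs of a tree, as in Definition 5."""
    paths = set()
    for u in tree:
        for v in tree:
            if dominates(u, v):
                V = frozenset(tree[w] for w in tree
                              if dominates(u, w) and dominates(w, v))
                paths.add(((tree[u], "dom", tree[v]), V))
            elif precedes(u, v):
                V = frozenset(tree[w] for w in tree
                              if (precedes(u, w) and precedes(w, v))
                              or (dominates(w, u) and not dominates(w, v))
                              or (dominates(w, v) and not dominates(w, u)))
                paths.add(((tree[u], "sib", tree[v]), V))
    return paths


def cutia(sample: Iterable[Tree], alphabet: Set[str]) -> Dict[Factor, FrozenSet[str]]:
    """Map each unattested 2-factor to the tier inferred for it."""
    paths = set().union(*(two_paths(t) for t in sample))
    attested = {f for f, V in paths if not V}
    every = {(x, r, y) for x in alphabet for r in ("dom", "sib") for y in alphabet}
    grammar: Dict[Factor, FrozenSet[str]] = {}
    for f in every - attested:
        tier = {f[0], f[2]}
        S = {V for g, V in paths if g == f}
        for c in range(1, len(alphabet) + 1):       # batches by ascending cardinality
            new: Set[str] = set()
            for V in S:
                if len(V) == c and not (V & tier):  # no current tier element in V
                    new |= V
            tier |= new                              # update only after the whole batch
        grammar[f] = frozenset(tier)
    return grammar


def chain(s: str) -> Tree:
    """Encode a string as a unary-branching tree (hypothetical helper)."""
    return {tuple([0] * i): ch for i, ch in enumerate(s)}


g1 = cutia([chain("aba"), chain("bb")], {"a", "b"})
g2 = cutia([chain("ab"), chain("ba"), chain("bb")], {"a", "b"})
print(sorted(g1[("a", "dom", "a")]), sorted(g2[("a", "dom", "a")]))
```

In the first sample, two a nodes are attested with an intervening b, so b is placed on the tier for the banned 2-factor; in the second, no such evidence exists and the tier stays at `{a}`, the most general tier consistent with the data.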

3.1. Algorithm

CUTIA proceeds by iterating through all possible 2-factors which are absent in the input data. For each of these, it initializes a tier with only the two symbols that make up the 2-factor itself (or one symbol, if both positions carry the same label). Then, the intervener sets from all of the 2-paths for the 2-factor are considered in batches by ascending cardinality. In each batch, all elements of each set which does not contain any tier elements are added (simultaneously) to the tier. Once all the intervener sets have been considered, the 〈2-factor, tier〉 pair is added to the grammar.
To understand this algorithm’s operation, let us look at how it learns a simple example language. This example language will be the intersection of two linguistically motivated TSL2 languages. Let Σ : = { l , m , c , x } . Graf [1] illustrates how the syntactic operation of movement can be modeled using a TSL tree language. Importantly, there must be a 1-to-1 correspondence between moving elements and the landing sites they will move to, and landing sites must come above movers. This can be modeled formally on the movement tier T 1 : = { l , m } : each mover m must be dominated by a landing site l, and each landing site must dominate one mover. This can be handled with the following constraints over T 1 : { T 1 m , m T 1 m , l T 1 , m Y T 1 m , l T 1 } (An astute reader might notice that this does not quite enforce the notion that a landing site must have exactly one mover tier-child. This is because the complete syntactic pattern requires Graf and Kostyszyn [29]’s more complex definition of tree TSL. Since this example is primarily for illustrative purposes, we will take the grammar given here to be a close-enough approximation and not dwell further on this difference.). As mentioned in Section 2.2, the algorithms in this paper could be combined with an appropriate string-language learner to precisely compute the requisite constraint functions to achieve the “exactly one” pattern.
The second TSL pattern is inspired by the linguistic phenomenon of extraction morphology, analyzed as TSL by Graf [30]. This phenomenon involves certain elements (e.g., complementizers) that appear with a special form only in the case where they fall along a movement path, i.e., between a mover and its landing site.
In Irish, for example, there are two different forms of the complementizer element which introduce a subordinate clause, which alternate depending on whether a wh-question element (like who, what, or why) has moved out of that clause (and into a higher position above the complementizer). McCloskey [31] illustrates this with the sentences given in Example (1). Here, the wh-element meaning who has moved out of the embedded clause in order to form a question, passing the sites of the two complementizers along the way. In this circumstance, the standard Irish complementizer go is ungrammatical, and the special form a must be used instead.
(1)
Cé  a/*go   dúradh léithi  a/*go    cheannódh é?
Who C-wh/*C was-said with-her C-wh/*C would-buy it
‘Who was she told would buy it?’
In our toy alphabet, c represents the special complementizer form which can only occur on a movement path. This pattern can be enforced with constraints on a different tier, $T_2 := \{l, m, c\}$. The necessary constraints over $T_2$ are: $\{\rtimes \triangleleft_{T_2} c,\ m \triangleleft_{T_2} c,\ c \triangleleft_{T_2} \ltimes,\ c \mathrel{Y_{T_2}} c,\ m \mathrel{Y_{T_2}} c,\ c \mathrel{Y_{T_2}} m\}$.
The target language for CUTIA will be the intersection of these two TSL languages. Figure 4 illustrates a sample of trees which are in this language, and will constitute the input data for CUTIA. The algorithm proceeds by first determining which 2-factors are absent in the input, as shown in Table 1. Then, it will calculate the tier for each unattested 2-factor $\sigma_1 \mathrel{R} \sigma_2$, beginning with an initial tier of $T = \{\sigma_1, \sigma_2\}$. Table 2 shows the evaluation of all intervener sets for each 2-factor, ordered by size. All elements of intervener sets which do not contain any existing tier elements are inserted (in batches by intervener set size, which will be essential to establishing polynomial sample size in Section 3.2) into T. At each step, the constraint and the tier will be added to G. The final grammar consists of all 2-factors from Table 2 paired with their corresponding final tier.
In the final grammar, the $\{m, l\}$ constraints ($\rtimes \triangleleft m$, $m \triangleleft m$, and $m \mathrel{Y} m$) ensure that each mover must be dominated by a landing site, and two movers cannot be consecutive tier-children of the same landing site (two movers for a single target). In addition, the constraints on the $\{m, l, c\}$ tier ensure that the special complementizer form c can only appear below a landing site, but never above one unless a mover intervenes. The sibling constraints broaden the restrictions against multiple movers for a single target by preventing any adjacent tier-sibling movement path elements (m or c).
In fact, these are exactly the constraints we expected to find on $T_1$ and $T_2$, with just one small difference: a few constraints appear on the $\{c, l\}$ tier ($\rtimes \triangleleft c$, $c \mathrel{Y} c$) or $\{c, m\}$ tier ($c \triangleleft \ltimes$), rather than the $\{l, m, c\}$ tier ($T_2$). This is because of the algorithm’s preference for finding the most general possible tier over which to enforce constraints. Any 2-factor that is banned on the $\{c, l\}$ (or $\{c, m\}$) tier will necessarily also be absent from any superset tier, and because of the active constraints on the movement tier, each of these 2-factors is indeed impossible on the smaller tier it is banned on. So the grammar output here does in fact generate exactly the target language.

3.2. Identification in Polynomial Time and Data

Here, I establish that CUTIA constitutes a conclusive learnability result by proving that it identifies the class of tree CUTI-MTSL in the limit from positive data in the sense of Gold [26], with polynomial bounds on time and data in the sense of De La Higuera [27]. The proof relies on establishing a representative sample and demonstrating that it characterizes the language with respect to CUTIA.
Definition 9
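(Representative Sample). For a CUTI-MTSL language L over alphabet $\Sigma$ whose grammar is $G = \{\langle f_0, T_0 \rangle, \langle f_1, T_1 \rangle, \dots, \langle f_n, T_n \rangle\}$, a set D of trees is a representative sample iff all of the following hold:
1. $D \subseteq L$;
2. $\forall x \in \mathit{2fac}(T_\Sigma)\ [x \notin \{f : \langle f, T \rangle \in G\} \rightarrow x \in \mathit{2fac}(D)]$;
3. $\forall \langle f, T \rangle \in G\ [\forall \sigma \in T \setminus \mathit{symbols}(f)\ [\exists \langle f, V \rangle \in \mathit{2paths}(D)\ [\sigma \in V \wedge \neg\exists \langle f, V' \rangle \in \mathit{2paths}(L)\ [\sigma \in V' \wedge |V'| < |V|]]]]$.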
This essentially states that a representative sample must contain all 2-factors which are not banned on any tier, and that for each banned 2-factor, each symbol of the tier on which it is banned (except those already present in the 2-factor) must be attested as an intervener for that 2-factor as part of a smallest intervener set that it could be part of.
Lemma 1.
For any CUTI-MTSL language L, there is a representative sample D for L which is polynomial in the size of G for any grammar G of L.
Proof. 
Condition 2 of Definition 9 requires that any 2-factor which is not banned on some tier is present in D. There are at most $2 \cdot |\Sigma|^2$ such 2-factors, each of which can be constructed with at most 3 nodes (inserting a shared parent node for sibling 2-factors), giving a total size of at most $6 \cdot |\Sigma|^2$.
Condition 3 requires that for each banned 2-factor, each tier symbol from the tier on which it is banned must be attested as an intervener in the smallest possible intervener set it can be contained in. Since there are at most $2 \cdot |\Sigma|^2$ banned 2-factors and $|\Sigma|$ symbols on any tier, this condition imposes an additional $6 \cdot |\Sigma|^3$ space requirement.
Taken together, the space complexity of D is $O(|\Sigma|^3)$, and therefore polynomial in both the size of the alphabet and the size of the grammar (assuming all alphabet symbols are used in the grammar).    □
Lemma 2.
Given an input sample I of size n, CUTIA runs in time polynomial in n.
Proof. 
The algorithm will first need to compute 2-factors and 2-paths for I. As discussed in Section 2.1, 2-paths can be computed using Gorn addresses. Gorn addresses can be mapped to nodes in a single tree traversal (i.e., linear time in the number of nodes). Each pair of nodes must be considered both as possible dominance 2-factors and possible sibling 2-factors, and all other nodes must be considered as possible interveners. Membership of each node in the intervener set can be decided in linear time at worst, by comparing each symbol in the Gorn addresses of the relevant nodes (the maximum length of a Gorn address is the size of the tree). Thus, the time complexity of computing 2-paths is $O(n^4)$.
Once the 2-paths have been computed, finding the attested 2-factors is trivial, since these are simply the 2-factors from any 2-path where the intervener set is the empty set.
Then, the algorithm will execute the outer for-loop. This loop will run at most $2 \cdot |\Sigma|^2$ times. $|\Sigma|$ itself is bounded by the size of the input (in the worst case, each node will have a different label), and so we will just use $n$ going forward. The two inner loops together iterate through all intervener sets exactly once (the middle loop only buckets them by size), of which there are at most $n^2$ (one for each pair of nodes). Thus, the for-loop runs in $O(n^4)$ time.
In total, the algorithm runs in $O(n^4 + n^4) = O(n^4)$ time.    □
Lemma 3.
Given any finite input sample $I$ drawn from a CUTI-MTSL language $L$ with grammar $G = \{\langle f_0, T_0\rangle, \langle f_1, T_1\rangle, \ldots, \langle f_i, T_i\rangle\}$ which itself contains a representative sample $D$ for $L$ (i.e., $D \subseteq I \subseteq L$), CUTIA will return a grammar $G' = G$.
Proof. 
Consider the set $B$ of all the 2-factors that are banned on any tier in $G$. Since $I$ contains only valid trees in $L$, these 2-factors must all be absent from $I$. By definition of a representative sample, all other possible 2-factors over $\Sigma$ must be present in $D$, and therefore also in $I$. Thus, the set that will be iterated over in the outer loop is $B' = B$. Each 2-factor $f_i \in B'$ will be associated with some tier $T_i$. Since $B' = B$, $f_i$ must also be part of some pair $\langle f_i, T_j\rangle \in G$. $T_i$ will be determined by finding all intervener sets for $f_i$, and considering them in groups from the smallest cardinality to the largest. $T_i$ will be the union of all the intervener sets for which no element of that set shows up in a smaller intervener set. By definition of a representative sample, each element of $T_j$ must show up in the smallest possible intervener set. Thus, each element of $T_j$ will be added to $T_i$. Furthermore, no matter what data is in $I$, no symbol that is not part of $T_j$ can be attested in an intervener set without a member of $T_j$ (otherwise this would be an illicit construction). Since all elements in $T_j$ are guaranteed to show up in intervener sets without any non-tier elements (which are consequently smaller), they will have been added to $T_i$ already by the time any intervener set containing a non-tier element is considered. Therefore, $T_i$ will contain all and only the elements of $T_j$, i.e., $T_i = T_j$. Since $B' = B$, $G' = \{\langle f_0, T_0\rangle, \langle f_1, T_1\rangle, \ldots, \langle f_i, T_i\rangle\} = G$.    □
Theorem 1.
For any CUTI-MTSL tree language L, CUTIA identifies a grammar for L in polynomial time and data.
Proof. 
From Lemmas 1, 2, and 3.    □

3.3. Limitations

I have now introduced CUTIA and shown that it learns the class of CUTI-MTSL efficiently with respect to both time and data. It is not, however, able to learn the full expression of MTSL proper, which leaves the question of what kinds of languages are in the gap between these classes, i.e., what languages belong to the MTSL class but not the class of CUTI-MTSL?
A core challenge of learning MTSL is capturing not just multiple TSL patterns, but also their interactions with each other. CUTI-MTSL dodges this issue by restricting the ways that tiers can interact. By requiring that all tier elements (for each tier) be independent of all non-tier elements (the tier-element independence condition), restrictions from one tier are prevented from creating relationships between elements that interfere with other tiers.
To illustrate this, let us consider an example of one such language that exists in this gap, and which, consequently, CUTIA cannot learn. The example language we will use is string-based, but recall that under the modeling framework introduced in Section 2.1, strings are just unary branching trees. Thus, a string written a b c corresponds to a tree with a as the root, b as its only child, and c as the only child of (the node labeled) b. Its 2-factors are a b and b c (for convenience, 2-factors may also be written as length 2 strings: a b and b c ).
The string language L that we will consider is defined by the regular expression in Example 1.
Example 1.
$((ab^{+}c) \mid (eb^{+}d))^{*}$
This is the set of all strings consisting of any number of sequences of one a followed by one or more bs followed by one c, interspersed with any number of sequences of one e followed by one or more bs followed by one d. So $abbbcebd$, $ebbbbbd$, and $abcebdebdabbc$ are all valid strings in the language, but $abd$ and $edac$ are not. This involves local restrictions (for example, a and e can only be followed by b), but also non-local dependencies that interact: b can be followed by c only in the case where the first b in its sequence was preceded by a, and similarly with d and e. This violates the “tier-element independence” requirement of CUTI-MTSL, which stipulates that each element on each tier must be independent from (i.e., not bound to always occur next to) each non-tier element. To enforce the constraint “a cannot be followed by d unless there is an e in between them”, a, d, and e must all be tier elements. However, b must be off the tier, since its presence as an intervener between a and d does not rescue the structure (i.e., $abd$ is still banned). But none of these other tier elements (a, d, and e) are independent from b: each is required to either precede or follow a b. As we will see, this makes it impossible for CUTIA to correctly learn $L$.
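The membership claims above can be checked mechanically. The following is a minimal sketch, assuming the Kleene-starred reading of Example 1 that the prose describes ("any number of sequences"); the names `PATTERN` and `in_language` are illustrative and not part of the original presentation.

```python
import re

# Assumed reading of Example 1: zero or more blocks, each either a b+ c or e b+ d.
PATTERN = re.compile(r"(?:ab+c|eb+d)*")

def in_language(s: str) -> bool:
    """Return True if the whole string s matches the pattern."""
    return PATTERN.fullmatch(s) is not None

# The strings discussed in the text.
for s in ["abbbcebd", "ebbbbbd", "abcebdebdabbc", "abd", "edac"]:
    print(s, in_language(s))
```

Under this reading the first three strings are accepted and the last two are rejected, matching the description in the text.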
Suppose CUTIA is given the sample $S$ presented in Example 2. This turns out to contain all possible intervener sets for 2-factors which are not attested directly in the sample, making it “complete” as far as 2-paths are concerned: no further data points drawn from $L$ can add any 2-paths that CUTIA will consider.
Example 2.
$S := \{abbbc,\ ebd,\ abcebdebdebd,\ ebdebdabcabcabbcebd,\ abcebdabcebdabcebd\}$
Once again, CUTIA will proceed by computing the possible 2-factors that are missing from the sample. As before, intervener sets are collected for each missing 2-factor and used to compute the corresponding tier. Table 3 illustrates the missing 2-factors, intervener sets, and the final tier learned by CUTIA as it attempts to learn this pattern.
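The two preparatory steps (finding missing 2-factors and collecting intervener sets) can be sketched for the string case as follows. This is an illustration under stated assumptions rather than the paper's implementation: strings are treated as unary-branching trees, so only dominance 2-factors arise, and the intervener set of two positions is assumed to be the set of symbols strictly between them. The function names are hypothetical.

```python
from itertools import product

def two_paths(sample):
    """Map each ordered symbol pair (x, y) to the intervener sets seen in the sample.

    Assumption: for positions i < j in a string, the interveners are the
    symbols strictly between them.
    """
    paths = {}
    for s in sample:
        for i in range(len(s)):
            for j in range(i + 1, len(s)):
                paths.setdefault((s[i], s[j]), set()).add(frozenset(s[i + 1:j]))
    return paths

def missing_two_factors(sample, alphabet):
    """2-factors over the alphabet that never occur with an empty intervener set."""
    paths = two_paths(sample)
    attested = {f for f, vs in paths.items() if frozenset() in vs}
    return set(product(sorted(alphabet), repeat=2)) - attested

# The sample from Example 2.
S = ["abbbc", "ebd", "abcebdebdebd", "ebdebdabcabcabbcebd", "abcebdabcebdabcebd"]
missing = missing_two_factors(S, "abcde")
smallest = min(two_paths(S)[("a", "d")], key=len)
print(len(missing), sorted(smallest))
```

On this sample the pair (a, d) is missing as a 2-factor, and its smallest intervener set contains b, c, and e together, which is the entanglement discussed next.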
The grammar learned by CUTIA does not successfully generate the target language. For example, the string $abd$, which is not in the target language, is accepted by the grammar that CUTIA generates: $ab$ and $bd$ sequences are both permitted, and $ad$ is banned only over the tier containing the entire alphabet. CUTIA is unable to disentangle whether the important intervener in the $ad$ 2-factor is a b or an e because b (a non-tier element for this 2-factor in any successful grammar for $L$) is not independent of the other segments involved. Constraints from the segmental tier ($\Sigma$) interfere with discovering patterns on other tiers.
In natural language, CUTIA’s restriction to the class of CUTI-MTSL poses a problem for some patterns in the domain of syntax. Many theories of syntactic structure predict that syntactic elements are not independent of each other in this way. In particular, most work in the Minimalist tradition [32] assumes a universal spine (i.e., hierarchy of projections) in which certain functional elements always appear in a specific order. This inherently introduces the kind of element interdependence that cannot be represented by CUTI-MTSL, and consequently cannot be learned by CUTIA. For example, this universal spine makes it impossible for CUTIA to successfully learn the pattern for a well-known syntactic phenomenon, the English that-trace effect, which can be represented with a fairly simple TSL2 analysis [7]. The that-trace effect refers to the robustly attested fact that, for many speakers of English, it is illicit to introduce an embedded clause with an overtly pronounced that complementizer in the case where the subject of that embedded clause moves out from under the complementizer. This accounts for the grammaticality contrast between the sentences in Example 3.
Example 3.
1. *Who did you say that likes Sally?
2. Who did you say that John likes?
(Sentences marked with an * are ungrammatical.)
The TSL analysis for the that-trace effect relies on forbidding complementizers (functional elements which introduce a new clause) with an overtly pronounced that from dominating elements which undergo both subject and wh-movement on a tier with all complementizers (of any pronunciation) and all nominal (noun-like) moving elements. Under standard syntactic analyses, however, all complementizers dominate some kind of tense element, which themselves dominate some kind of verb element. These other mandatory elements are not relevant for the that-trace phenomenon, and need to be omitted from the tier enforcing the that-trace effect in order for the analysis to succeed. However, they are directly interdependent with tier elements: complementizers, tense nodes, and verbs always show up in sequence. This makes the overall combination of basic syntactic clause structure and the that-trace effect a pattern which is not CUTI-MTSL, i.e., a pattern which CUTIA cannot learn. To account for the full breadth of syntactic patterns, an algorithm is needed that can learn the full class of MTSL.

4. MTSL-BUFIA

This section introduces the Multi Tier-based Strictly Local Bottom-Up Factor Inference Algorithm (MTSL-BUFIA), which learns the class of MTSL proper. MTSL-BUFIA combines key insights from CUTIA and from the Bottom-Up Factor Inference Algorithm (BUFIA) introduced by Chandlee et al. [33]. In comparison to CUTIA, it offers a tradeoff: MTSL-BUFIA can learn any MTSL language, not just CUTI-MTSL ones, but it requires (in the worst case) exponential time to do so, rather than the polynomial time required by CUTIA.
Similar to CUTIA, MTSL-BUFIA leverages the idea that any 2-factor that is not present in the input data must be absent because it is banned on at least one tier. Additionally, 2-paths are used to inform how these forbidding tiers should be constructed: each set of interveners for a given 2-factor must contain at least one element from each tier on which that 2-factor is banned. Unlike CUTIA, however, this algorithm also uses the fact that the space of possible forbidding tiers is partially ordered, and can therefore be traversed using a bottom-up breadth-first search, where the search space is pruned as constraints are located. This search strategy is inspired by the BUFIA of Chandlee et al. [33]. Using this approach, the most general tiers on which 2-factors do not appear can be exhaustively located.

4.1. Canonical Form

It is possible for multiple distinct MTSL grammars to be extensionally equivalent, i.e., generate the same language. For this reason, I provide a canonical form for MTSL grammars.
Definition 10
(MTSL Canonical Grammar). The Canonical Grammar for an MTSL language is a set of ⟨2-factor, tier⟩ pairs $G = \{\langle f_0, T_0\rangle, \langle f_1, T_1\rangle, \ldots, \langle f_n, T_n\rangle\}$ for which the following hold:
1. $\forall \langle f_i, T_i\rangle, \langle f_j, T_j\rangle \in G\ [f_i = f_j \rightarrow T_i \not\subset T_j \wedge T_j \not\subset T_i]$ (incomparable tier constraints);
2. $\forall \langle f_i, T_i\rangle\ [L(G) \subseteq L(\{\langle f_i, T_i\rangle\}) \rightarrow \exists T_i'\ [T_i' \subseteq T_i \wedge \langle f_i, T_i'\rangle \in G]]$ (exhaustivity).
This says that in order for an MTSL grammar to be canonical, (1) all forbidding tiers for the same factor must be incomparable, and (2) if the language generated by $G$ obeys some constraint against a 2-factor $f_i$ on some tier $T_i$, then there must be a constraint in $G$ which bans $f_i$ on $T_i$ or some subset tier. The essential idea is that all surface-true constraints must be represented in the grammar in their most general form (i.e., on the smallest possible tier).
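The incomparability requirement can be illustrated with a small sketch. It assumes a grammar is represented as a set of (2-factor, tier) pairs with tiers as frozensets, and it only enforces requirement 1 by discarding any tier that properly contains another tier for the same 2-factor. Exhaustivity (requirement 2) depends on the language itself and is not addressed here. The function name is hypothetical.

```python
def drop_comparable_tiers(grammar):
    """Keep, for each 2-factor, only tiers with no proper subset tier for that factor."""
    return {
        (f, t)
        for (f, t) in grammar
        if not any(f2 == f and t2 < t for (f2, t2) in grammar)
    }

# A toy grammar with a redundant superset tier for the 2-factor "ab".
G = {
    ("ab", frozenset("abd")),
    ("ab", frozenset("abcd")),  # entailed by the constraint on {a, b, d}
    ("cd", frozenset("cd")),
}
reduced = drop_comparable_tiers(G)
print(sorted((f, "".join(sorted(t))) for f, t in reduced))
```

Removing the superset tier does not change the language, since a constraint on a tier entails the same constraint on every superset of that tier.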
Lemma 4.
Any MTSL grammar is extensionally equivalent to a unique canonical MTSL grammar.
Proof. 
Consider any MTSL grammar $G$. Suppose $L(G)$ obeys some constraint $\langle f_i, T_i\rangle \notin G$. We can then construct a grammar $G' = G \cup \{\langle f_i, T_i\rangle\}$. Since $\langle f_i, T_i\rangle$ is true over $L(G)$ and the two grammars differ only in the presence of this constraint, $L(G') = L(G)$. This process can be repeated to yield a $G'$ for which $\forall \langle f_i, T_i\rangle\ [\langle f_i, T_i\rangle \in G' \leftrightarrow L(G) \subseteq L(\{\langle f_i, T_i\rangle\})]$ and $L(G') = L(G)$. $G'$ then meets requirement 2 of a canonical grammar, since every surface-true constraint of $L(G)$ is present in $G'$.
Then, if $\exists \langle f_i, T_i\rangle, \langle f_i, T_j\rangle \in G'\ [T_i \subset T_j]$, we can construct $G'' = G' \setminus \{\langle f_i, T_j\rangle\}$. Since any form which contains the 2-factor $f_i$ on the $T_j$ tier will necessarily also contain $f_i$ on the $T_i$ tier, any language which obeys $\langle f_i, T_i\rangle$ must also obey $\langle f_i, T_j\rangle$. Therefore, $L(G'') = L(G')$. Additionally, $G''$ still meets requirement 2 since the subset tier is always preserved. Once again, we can repeat this process until we reach a $G''$ for which $\nexists \langle f_i, T_i\rangle, \langle f_i, T_j\rangle \in G''\ [T_i \subset T_j]$ (i.e., requirement 1 holds) and $L(G'') = L(G') = L(G)$. $G''$ thus meets both requirements for a canonical MTSL grammar and is equivalent to $G$, meaning any MTSL grammar is equivalent to some canonical MTSL grammar.
Next, consider any canonical MTSL grammar $G$, and consider some other canonical MTSL grammar $G'$ such that $L(G') = L(G)$. If $G' \neq G$, it must be the case that either $G'$ contains some constraint which is not in $G$, or that $G$ contains some constraint which is not in $G'$. Suppose $G'$ contains a constraint $\langle f_i, T_i\rangle$ which is not in $G$. Since $L(G') = L(G)$, $L(G)$ must obey this constraint. Since $G$ is canonical, it must then (by requirement 2) contain some constraint $\langle f_i, T_j\rangle$ such that $T_j \subseteq T_i$. But then, since $G'$ is also canonical and must obey $\langle f_i, T_j\rangle$, $G'$ must contain some constraint $\langle f_i, T_k\rangle$ such that $T_k \subseteq T_j$. But then by transitivity, $T_k \subseteq T_i$, meaning that $G'$ violates requirement 1 and cannot be canonical. Thus, $G'$ cannot contain any constraints that are not in $G$ and $G' \subseteq G$. Suppose $G$ contains some constraint $\langle f_i, T_i\rangle$ not in $G'$. By the same logic, $G'$ must then contain some $\langle f_i, T_j\rangle$, and $G$ must contain some $\langle f_i, T_k\rangle$ such that $T_k \subseteq T_j \subseteq T_i$. This means $G$ violates requirement 1 and cannot be canonical. Thus, $G \subseteq G'$ and $G = G'$.    □

4.2. BUFIA

To find the forbidding tier(s) for each 2-factor, MTSL-BUFIA uses a bottom-up, breadth-first search of the partially ordered set of possible tiers. This approach is inspired by BUFIA [33,34], an algorithm designed to find the most general constraints over this type of partially ordered search space.
It is easy enough to see that the set of possible tiers (i.e., the powerset of $\Sigma$) is partially ordered under the subset relation. Furthermore, the set of constraints introduced by these possible tiers (for the same 2-factor) obeys that same partial ordering. For example, consider a toy language with alphabet $\Sigma = \{a, b, c, d\}$ with just one constraint: $\langle ab, \{a, b, d\}\rangle$. So strings like $adbcca$ and $bcaccdb$ (with tier projections $adba$ and $badb$) are in the language, but $daccbda$ is out, since its projection ($dabda$) includes the substring $ab$. It is immediately clear that no string in this language can contain $ab$ on the superset tier of $\{a, b, c, d\}$ ($= \Sigma$), since this would necessarily mean $ab$ is present on the $\{a, b, d\}$ tier as well. More generally, a factor $f_i$ is absent from a tier $T_i$ if and only if every “possible” occurrence of $f_i$ is precluded by the intervention of some element from $T_i$. If $T_i \subseteq T_j$, then any factor absent from $T_i$ will also be absent from $T_j$, since the intervention of an element from $T_i$ entails the intervention of an element from $T_j$ (all elements of $T_i$ are in $T_j$). Therefore, any language that obeys the constraint $\langle f_i, T_i\rangle$ obeys all other constraints $\langle f_i, T_j\rangle$ where $T_i \subseteq T_j$. Figure 5 visualizes this property on the set of possible forbidding tiers for this toy language rooted at the $\{a, b\}$ tier.
These entailment relationships between possible constraints are exactly what BUFIA uses for learning. Starting from the “bottom” (most general constraint), BUFIA proceeds upwards layer by layer, looking for constraints which are surface-true. When it finds them, it adds them to the grammar and prunes away the section of the search space above that new constraint. Chandlee et al. [33] also outline several useful learning guarantees for this algorithm. Crucially, BUFIA is guaranteed to find all and only the incomparable constraints which are consistent with the input data, and it is guaranteed to find the most general such constraints: exactly the properties required to construct canonical MTSL grammars.
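The toy example can be reproduced with a short sketch of tier projection for strings. The helper names `project` and `obeys` are illustrative; 2-factors are written as length-2 strings, and the final check simply enumerates all strings up to length 5 to confirm the entailment from the $\{a, b, d\}$ tier to its superset $\Sigma$ on this small case.

```python
from itertools import product

def project(s, tier):
    """Erase every symbol of s that is not on the tier."""
    return "".join(ch for ch in s if ch in tier)

def obeys(s, factor, tier):
    """True if the 2-factor does not occur in the tier projection of s."""
    return factor not in project(s, tier)

small, big = set("abd"), set("abcd")
print(project("adbcca", small), project("bcaccdb", small), project("daccbda", small))

# Brute-force check of the entailment on all strings over {a, b, c, d} up to length 5.
strings = ["".join(p) for n in range(6) for p in product("abcd", repeat=n)]
entailed = all(obeys(s, "ab", big) for s in strings if obeys(s, "ab", small))
print(entailed)
```

The brute-force check is only a sanity test on short strings; the general argument is the one given in the text.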

4.3. Algorithm

MTSL-BUFIA, defined in Algorithm 2, operates as follows: First, it finds all possible 2-factors which are absent from the input sample. This absence is an indicator that each of these 2-factors is forbidden on some tier(s). The algorithm then iterates through these missing 2-factors and computes the intervener sets for each. These first two steps are identical to the operation of CUTIA.
Algorithm 2 MTSL-BUFIA
Data: Positive sample I
Result: Grammar G, a set of 〈2-factor, forbidding tier〉 pairs.

G := {}
B := 2fac(T_Σ) ∖ 2fac(I)
for each f := ρ1 R ρ2 ∈ B:
    S := { V  for  ⟨x, V⟩ ∈ 2paths(I)  where  x = f }
    Q := [{ρ1, ρ2}]
    while Q ≠ []:
        T = Q.pop()
        if ∃⟨f, T′⟩ ∈ G [T′ ⊆ T]:
            continue
        if ∃V ∈ S [T ∩ V = ∅]:
            Q.append(NextSupersets(T))
        else:
            G = G ∪ {⟨f, T⟩}
return G
Once the intervener sets are collected, the algorithm begins its bottom-up search for the smallest tier(s) on which each 2-factor is missing. It begins by looking at the tier consisting only of the symbols present in the 2-factor itself, and proceeds breadth-first to increasingly larger possible tiers. These larger tiers are generated using the $\mathit{NextSupersets}()$ function. For some tier $T$ and alphabet $\Sigma$, the next supersets of $T$ are defined by $\mathit{NextSupersets}(T) := \{T' \mid T \subset T' \wedge (|T'| = |T| + 1)\}$. For a 2-factor to be absent from a tier projection $T$, there must be no intervener set $V$ for that 2-factor such that $V \cap T$ is empty. In other words, each intervener set must contain some tier element, otherwise the 2-factor in question is present on that tier.
Any time the search encounters a tier where the 2-factor is not present in the data over that tier, the ⟨2-factor, tier⟩ pair is added to the grammar, and all supersets of that tier are removed from the search space. In this way, the search space of possible tiers is pruned as the search proceeds.
Once this bottom-up search has been conducted for every 2-factor, the final grammar is returned.
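The procedure described above can be sketched in Python for the string case. This is a hedged illustration, not the author's implementation: it assumes strings (so only dominance 2-factors), takes the intervener set of two positions to be the symbols strictly between them, dequeues from the front of the queue to obtain breadth-first order, and adds a `seen` set (not in Algorithm 2) to avoid enqueuing the same tier twice. All function names are hypothetical.

```python
from collections import deque
from itertools import product

def two_paths(sample):
    """Map each ordered symbol pair to the intervener sets attested in the sample."""
    paths = {}
    for s in sample:
        for i in range(len(s)):
            for j in range(i + 1, len(s)):
                paths.setdefault((s[i], s[j]), set()).add(frozenset(s[i + 1:j]))
    return paths

def next_supersets(tier, alphabet):
    """All tiers obtained by adding exactly one alphabet symbol to the tier."""
    return [tier | {sym} for sym in sorted(alphabet - tier)]

def mtsl_bufia(sample, alphabet):
    alphabet = frozenset(alphabet)
    paths = two_paths(sample)
    attested = {f for f, vs in paths.items() if frozenset() in vs}
    grammar = set()
    for f in sorted(set(product(sorted(alphabet), repeat=2)) - attested):
        interveners = paths.get(f, set())
        queue = deque([frozenset(f)])  # start from the tier of the factor's own symbols
        seen = set(queue)              # assumption: deduplicate enqueued tiers
        while queue:
            tier = queue.popleft()
            # Prune: a subset tier already forbids f, so this tier is entailed.
            if any(g == f and t <= tier for g, t in grammar):
                continue
            # Some intervener set misses the tier entirely: f occurs on this tier.
            if any(not (tier & v) for v in interveners):
                for sup in next_supersets(tier, alphabet):
                    if sup not in seen:
                        seen.add(sup)
                        queue.append(sup)
            else:
                grammar.add((f, tier))
    return grammar

def accepts(grammar, s):
    """True if no constraint's 2-factor is adjacent on that constraint's tier projection."""
    for (x, y), tier in grammar:
        proj = [ch for ch in s if ch in tier]
        if any(a == x and b == y for a, b in zip(proj, proj[1:])):
            return False
    return True

S = ["abbbc", "ebd", "abcebdebdebd", "ebdebdabcabcabbcebd", "abcebdabcebdabcebd"]
G = mtsl_bufia(S, "abcde")
tiers = lambda f: {"".join(sorted(t)) for g, t in G if g == f}
print(tiers(("a", "a")), tiers(("d", "b")), accepts(G, "abd"))
```

Run on the sample from Example 2, this sketch finds the $\{a, b\}$ and $\{a, c\}$ tiers for the 2-factor $aa$ and the $\{a, b, d, e\}$ tier for $db$, and the resulting grammar rejects $abd$ while accepting every string in the sample.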
To see MTSL-BUFIA in action, let us revisit the language from Example 1, which CUTIA failed to learn. Recall that this is the language $L$ defined by the regular expression $((ab^{+}c) \mid (eb^{+}d))^{*}$. Suppose MTSL-BUFIA is provided with the same data sample (given in Example 2) containing all possible 2-paths in $L$. Its first two steps are identical to CUTIA’s: compute the missing 2-factors and find the intervener sets for these missing factors (columns 1 and 2 of Table 4).
Next, MTSL-BUFIA will conduct a bottom-up search of the possible tiers. The search procedure is diagrammed in Figure 6 and Figure 7, for the 2-factors $aa$ and $db$ respectively. If each intervener set contains at least one tier element, the factor is missing on that tier, and the tier is added as a constraint (marked as such in the figures). For example, in Figure 6, the $\{a, b\}$ tier is added as a forbidding tier for the $aa$ 2-factor because each intervener set contains a b. This is exactly as expected: $aa$ cannot be present on the $\{a, b\}$ tier because every a must be followed by at least one b. Similarly, each a must be followed by a c before another a can be present, and so this 2-factor is also banned on the $\{a, c\}$ tier.
For the $db$ 2-factor, meanwhile, there must be either an a or an e occurring between the two symbols. Projecting just one of these to the tier would predict that that symbol is always required, which is clearly not the case from the attested 2-paths. For example, if $db$ were banned on the $\{a, b, d\}$ tier, the (licit) sequence $ebdebd$ would be banned, since its tier projection would be $bdbd$, which contains the $db$ 2-factor. Therefore, MTSL-BUFIA continues its search for possible tiers until it encounters the $\{a, b, d, e\}$ tier, over which $db$ is indeed always absent.
If a tier is not a forbidding tier (also marked in the figures), then the search continues upwards. Notice, however, that supersets of previously added forbidding tiers are not searched, so the final set $\{a, b, c, d, e\}$ (i.e., $\Sigma$) is never reached in Figure 7, and the search in Figure 6 can get no higher than $\{a, d, e\}$, since any other tiers that could be searched would be supersets of one of the existing forbidding tiers, $\{a, b\}$ or $\{a, c\}$.
The final forbidding tier(s) for all 2-factors are given in column three of Table 4.
MTSL-BUFIA is able to do what CUTIA could not: the grammar it learns exactly generates the target language. In particular, the string $abd$, which CUTIA’s grammar was unable to rule out, is forbidden in MTSL-BUFIA’s grammar by the constraints $\langle ad, \{a, c, d\}\rangle$ and $\langle ad, \{a, d, e\}\rangle$. The dependence between a and b poses no issue because MTSL-BUFIA finds all of the (smallest) constraints which are surface-true, including constraints against the same 2-factor on multiple tiers.
In addition, MTSL-BUFIA can learn all the same patterns that CUTIA can. Table 5 shows MTSL-BUFIA learning the grammar for the example language given in Section 3.1, modeling both movement and extraction morphology patterns. The grammar that MTSL-BUFIA finds is identical to the one found by CUTIA.

4.4. Identification in Polynomial Data

In this section, I establish that MTSL-BUFIA identifies the class of MTSL2 in the limit from positive data in the sense of Gold [26], with a polynomial bound on the size of the required data sample as per De La Higuera [27]. Once again, this proof operates by establishing a representative sample and demonstrating that it characterizes the language with respect to MTSL-BUFIA.
Definition 11
(Grammar Size). For any MTSL-k grammar G, its size is defined by:
$|G| := \sum_{\langle f, T\rangle \in G} (k + |T|)$
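As a small worked instance of this definition, the sketch below computes the size of a two-constraint MTSL-2 grammar. The representation (pairs of a factor and a frozenset tier) and the function name are assumptions made for illustration.

```python
def grammar_size(grammar, k=2):
    """Sum k + |T| over all (factor, tier) pairs, following Definition 11."""
    return sum(k + len(tier) for _factor, tier in grammar)

# Two constraints against the 2-factor "aa", on the tiers {a, b} and {a, c}.
G = {("aa", frozenset("ab")), ("aa", frozenset("ac"))}
print(grammar_size(G))
```

Each constraint contributes 2 for the factor plus 2 for its tier, so the total here is 8.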
Definition 12
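(Representative Sample). For an MTSL language $L$ over alphabet $\Sigma$ whose grammar is $G = \{\langle f_0, T_0\rangle, \langle f_1, T_1\rangle, \ldots, \langle f_i, T_i\rangle\}$, a set $D$ of structures is a representative sample iff all of the following hold:
1. $D \subseteq L$;
2. $\forall x \in \mathrm{2fac}(\Sigma)\ [x \notin \{f : \langle f, T\rangle \in G\} \rightarrow x \in \mathrm{2fac}(D)]$;
3. $\forall \langle f, T\rangle \in G\ [\forall \sigma \in T \setminus \mathrm{symbols}(f)\ [\exists \langle f, V\rangle \in \mathrm{2paths}(D)\ [\sigma \in V \wedge \neg\exists \sigma' \in V\ [\sigma' \in T \wedge \sigma' \neq \sigma]]]]$;
4. $\forall x \in \{f : \langle f, T\rangle \in G\}\ [\neg\exists T'\ [(\forall \langle x, V\rangle \in \mathrm{2paths}(D)\ \ V \cap T' \neq \emptyset) \wedge (\neg\exists \langle x, T\rangle \in G\ [T \subseteq T'])]]$.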
Beyond requiring that the sample contain only structures from $L$, this essentially states that (1) a representative sample must contain all 2-factors which are not banned on any tier, (2) for each banned 2-factor and each tier on which it is banned, every tier element that is not part of the 2-factor itself must be present in an intervener set (for that same 2-factor) which contains no other tier elements, and (3) for each banned 2-factor there can be no set of elements which is “represented” (i.e., at least one element of this set is present) in every intervener set (for that 2-factor) but which is not itself a superset of a tier on which that 2-factor is banned.
Lemma 5.
For any MTSL language L, there is a representative sample D for L which is polynomial in the size of G for any grammar G of L.
Proof. 
The first condition requires that all 2-factors that are not banned on any tier are present in the sample. This mirrors the requirement for representative samples for CUTIA in Section 3.2, and imposes a total size requirement of $6 \cdot |\Sigma|^2$ ($2 \cdot |\Sigma|^2$ possible 2-factors each constructed with at most 3 nodes).
The second condition stipulates that each tier element of each restricting tier for each banned 2-factor must appear in some intervener set for that 2-factor without any other tier elements. Ensuring these intervener sets requires structures containing the two symbols present in the 2-factor, plus at most one additional parent node to connect siblings, giving a total size of 4 for each such structure. Since one such structure is needed for each tier symbol, the amount of data required to satisfy condition two is linear in the size of the grammar (since the size of the grammar is just the number of total tier symbols plus twice the number of 2-factors).
The third condition is about making sure that there are “small enough” intervener sets represented in the sample (i.e., those uncluttered by many non-tier elements). Constructing the smallest possible sample that fulfills conditions 1 and 2 ensures this condition without the need for additional data.
Therefore, the space complexity of the minimal representative sample is $O(|\Sigma|^2 + |G|)$ (where $|G|$ is the number of total symbols present in $G$). Assuming all alphabet symbols are used in the grammar, this is polynomial in the size of the grammar.     □
Lemma 6.
Given any finite input sample $I$ drawn from an MTSL language $L$ with canonical grammar $G = \{\langle f_0, T_0\rangle, \langle f_1, T_1\rangle, \ldots, \langle f_i, T_i\rangle\}$ which itself contains a representative sample $D$ for $L$ (i.e., $D \subseteq I \subseteq L$), MTSL-BUFIA will return a grammar $G' = G$.
Proof. 
Consider the set $B$ of all the 2-factors that are banned on any tier in $G$. Since $I$ contains only valid structures in $L$, these 2-factors must all be absent from $I$. By definition of a representative sample, all other possible 2-factors over $\Sigma$ must be present in $D$, and therefore also in $I$. $B$ is therefore equivalent to the set that will be iterated over in the outer loop. Each $f_i \in B$ must therefore be associated with one or more pairs $\langle f_i, T_1\rangle, \langle f_i, T_2\rangle, \ldots, \langle f_i, T_k\rangle \in G$. First, we show that every constraint in $G'$ is in $G$. Suppose $G'$ contains a forbidding tier for some $f_i$, $\langle f_i, T_i\rangle$, which is not one of those in $G$. It must then be the case that each intervener set for $f_i$ contains at least one element from $T_i$, otherwise this pair would not get added to the grammar. Because $D \subseteq I$, the set of intervener sets for $f_i$, $S$, must contain all intervener sets for $f_i$ which are present in $D$. By requirement 3 for representative samples, it must then be the case that there is some constraint $\langle f_i, T_j\rangle \in G$ such that $T_j \subseteq T_i$. Since the queue ($Q$) grows breadth-first, the algorithm will consider $T_j$ as a forbidding tier for $f_i$ before it considers $T_i$. Since $\langle f_i, T_j\rangle$ holds on all the data, each intervener set in $I$ for $f_i$ must contain at least one element of $T_j$. Therefore, the algorithm will add $\langle f_i, T_j\rangle$ to the grammar, and when it considers $T_i$ this tier will be discounted since $T_j \subseteq T_i$ is already present as a forbidding tier for $f_i$. Therefore, $G'$ cannot contain any forbidding tiers for any $f_i$ which are not contained in $G$.
Next, we show that every constraint in $G$ is in $G'$. Suppose $G$ contains a forbidding tier for $f_i$, $\langle f_i, T_i\rangle$, which is not contained in $G'$. In order for $T_i$ to not be added to $G'$ in the inner loop, it must be the case that either a forbidding tier $\langle f_i, T_j\rangle$ is already in $G'$, where $T_j \subseteq T_i$, or there is some intervener set $V$ which contains no elements of $T_i$. As established above, the only constraints that will be added to $G'$ are those that are also present in $G$. Therefore, $\langle f_i, T_j\rangle \in G' \rightarrow \langle f_i, T_j\rangle \in G$. However, if $G$ were to contain $\langle f_i, T_j\rangle$ and $\langle f_i, T_i\rangle$, where $T_j \subseteq T_i$, it would not be a canonical grammar by the definition in Section 4.1. Since $\langle f_i, T_i\rangle \in G$, and $I$ consists of only positive data generated by $G$, there can be no such intervener set $V$ such that $V \cap T_i = \emptyset$, since this would mean $f_i$ was in fact present on the tier $T_i$, thereby directly violating the constraint. Therefore, $T_i$ will be added to $G'$ during the inner loop, and $G$ cannot contain any constraints not contained in $G'$.
Since $G' \subseteq G$ and $G \subseteq G'$, the two are equal: $G' = G$.     □
Theorem 2.
For any MTSL language L, MTSL-BUFIA identifies the canonical grammar for L in the limit using polynomial data.
Proof. 
From Lemmas 5 and 6.     □

5. Discussion

In the worst case, MTSL-BUFIA runs in exponential time. To see why, consider the time required for each step: As established in Section 3.2, the time complexity of computing 2-paths is $O(n^4)$. Then, for each missing 2-factor, the relevant tier(s) must be found. There are at most $2 \cdot |\Sigma|^2$ 2-factors to consider. Any missing 2-factor could be banned only on the segmental tier (i.e., $T = \Sigma$), in which case the algorithm would have to traverse all possible tiers to discover this, with a time complexity of $2^{|\Sigma|}$ for each factor. This yields a time complexity of $O(|\Sigma|^2 \cdot 2^{|\Sigma|})$. It is the final step of traversing the powerset of the alphabet (the BUFIA component of MTSL-BUFIA) which introduces its exponential complexity, a disadvantage against CUTIA, which is guaranteed to run in polynomial time.
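As a rough illustration of this bound (a worked instance added here, not a result from the analysis), take the five-symbol alphabet $\Sigma = \{a, b, c, d, e\}$ of Example 1 and count one tier check per subset of $\Sigma$ per possible 2-factor:

```latex
2\,|\Sigma|^{2} \cdot 2^{|\Sigma|} \;=\; 2 \cdot 5^{2} \cdot 2^{5} \;=\; 2 \cdot 25 \cdot 32 \;=\; 1600
```

This counts every subset of $\Sigma$ for every possible 2-factor, so it overstates what the search visits once attested 2-factors are excluded and pruning applies. The exponential factor $2^{|\Sigma|}$ is what dominates as the alphabet grows.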
The utility of BUFIA, however, does not come from its worst-case performance but rather from its ability to turn sparsity in the input data to its advantage. When the input data obeys highly general constraints, BUFIA is able to prune away large chunks of the search space along the way, enabling it to tractably search very large spaces under the right conditions. Given that the nature of natural language is highly marked by sparsity, MTSL-BUFIA’s exponential worst-case bound may not indicate intractability on human language datasets. Indeed, an optimized implementation of BUFIA has been successfully used to model the phonotactics of Bolivian Quechua [35]. In the case of subregular syntax, many of the phenomena which have been analyzed as TSL require tiers with three or fewer symbols (i.e., highly restrictive), which would play into BUFIA’s ability to prune the search space.
Another contrast between the two algorithms is the concept classes each one learns. The class of MTSL proper is defined in terms of properties of the grammar, whereas the formulation of CUTI-MTSL presented here is defined in terms of properties of the language itself. This is not necessarily a problem for the learning result, and it is possible that such a grammatical description could be found for CUTI-MTSL. However, if such a description does not exist or is completely byzantine, it may suggest something about the “naturalness” of this class as compared to other formal classes which do have logical, algebraic, or automata characterizations.
CUTIA and MTSL-BUFIA offer different things in terms of the classes they learn and the time complexity they guarantee. Together, however, they provide insight into the nature of the MTSL class and how it can be learned. Firstly, both algorithms use the 2-paths present in the input sample as the sole determiner of how to construct their grammar. This shows that both MTSL and CUTI-MTSL are completely characterized by 2-paths: two (CUTI-)MTSL languages which have the same set of 2-paths are the same language.
Additionally, the interaction between restrictions on multiple tiers is what makes MTSL a difficult class to learn. Element interactions from one tier can obscure relationships from another tier, making it harder for a learner to construct an accurate grammar. MTSL is a powerful class and not an easy one to learn. However, these algorithms show that it is learnable under the right conditions.

6. Conclusions

This paper introduces two algorithms for MTSL learning, CUTIA and MTSL-BUFIA, proves their effectiveness, and discusses the tradeoffs between them. CUTIA offers a stronger guarantee on time complexity but learns a restricted concept class, while MTSL-BUFIA learns the full MTSL class but sacrifices the time efficiency guarantee.
The added complexity offered by MTSL-BUFIA is needed for some syntactic phenomena, but it is possible that the more efficient CUTIA would be sufficient for learning phonological patterns. These algorithms can be used to explore the degree of tier interaction in different domains of human language and to discover where the full power of MTSL is needed. Future work in using these algorithms on natural language data can also explore the average-case time complexity of MTSL-BUFIA and whether the requisite representative sample for learning target patterns is typically present in the language input a human child receives.
Additionally, MTSL-BUFIA suggests several extensions that could be fruitful to explore. Lambert [9] provides a method for online learning of TSLk languages, and it might be possible to adapt MTSL-BUFIA along those same lines. While MTSL-BUFIA is limited to the MTSL2 languages, the concept of intervener sets can be extended to larger k-factors, allowing for the possibility of learning MTSLk. The set of possible (k-factor, tier) pairs is itself partially ordered and can be traversed using BUFIA to yield a grammar with mixed sizes of k-factors. Another useful aspect of BUFIA is that it is well-suited to feature-based representations, so this approach could also be extended to operate over features rather than segments.
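For the 2-factor case over strings, the ordering that such a traversal would exploit can be sketched in Python. This is a hedged illustration rather than part of MTSL-BUFIA: it ignores word edges, all function names are hypothetical, and it only checks the property stated in the caption of Figure 5, namely that a 2-factor present on a tier is also present on every subset tier that still contains both of its symbols.

```python
from itertools import combinations


def project(word, tier):
    """Erase every symbol of the word that is not on the tier."""
    return [s for s in word if s in tier]


def attested_on_tier(sample, factor, tier):
    """True if the 2-factor is adjacent in some tier projection of the sample."""
    first, second = factor
    for word in sample:
        projected = project(word, tier)
        if (first, second) in zip(projected, projected[1:]):
            return True
    return False


def tiers_containing(factor, alphabet):
    """Yield every tier over the alphabet that contains both factor symbols."""
    base = frozenset(factor)
    others = sorted(set(alphabet) - base)
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            yield base | frozenset(extra)


def is_monotone(sample, factor, alphabet):
    """Check that presence on a tier implies presence on all its subset tiers."""
    tiers = list(tiers_containing(factor, alphabet))
    return all(
        attested_on_tier(sample, factor, small)
        for small in tiers
        for big in tiers
        if small <= big and attested_on_tier(sample, factor, big)
    )


if __name__ == "__main__":
    sample = ["acb", "adcb"]
    for tier in tiers_containing(("a", "b"), "abcd"):
        print(sorted(tier), attested_on_tier(sample, ("a", "b"), tier))
    print(is_monotone(sample, ("a", "b"), "abcd"))
```

Because of this ordering, a bottom-up search can stop exploring supersets of a tier as soon as the 2-factor is found to be absent there.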
Finally, another area for future exploration is the space of concept classes situated between CUTI-MTSL and MTSL. For example, what would it mean to enforce only constraint-uniqueness or only tier-element independence? Are there different restrictions on MTSL that are more appropriate for natural language? This paper offers a foundation on which to continue probing these open questions.

Funding

This research was partially funded by National Science Foundation grant number BCS-1845344, and additionally supported by the Institute for Advanced Computational Science at Stony Brook University.

Data Availability Statement

No data was created for this project.

Acknowledgments

I am grateful to the anonymous reviewers for the ESSLLI 2024 student session for their constructive comments, as well as to Thomas Graf and Jeffrey Heinz for helpful feedback and guidance.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
TSL2: Tier-based Strictly 2-Local
MTSL2: Multi-Tier-based Strictly 2-Local
CUTI-MTSL: Constraint Unique Multi Tier-based Strictly Local
BUFIA: Bottom-Up Factor Inference Algorithm
MTSL-BUFIA: Multi-Tier-based Strictly 2-Local Bottom-Up Factor Inference Algorithm
CUTIA: Constraint Unique Tier Inference Algorithm

References

  1. Graf, T. Subregular linguistics: Bridging theoretical linguistics and formal grammar. Theor. Linguist. 2022, 48, 145–184.
  2. Heinz, J. The computational nature of phonological generalizations. In Phonological Typology, Phonetics and Phonology; Walter de Gruyter GmbH & Co KG: Berlin, Germany, 2018; pp. 126–195.
  3. Lambert, D. Relativized Adjacency. J. Log. Lang. Inf. 2023, 32, 707–731.
  4. McMullin, K.; Hansson, G.Ó. Long-distance phonotactics as tier-based strictly 2-local languages. In Proceedings of the Annual Meeting on Phonology, Cambridge, MA, USA, 19–21 September 2014.
  5. Hanson, K. Tier-based strict locality and the typology of agreement. J. Lang. Model. 2025, 13, 43–97.
  6. Hanson, K. A TSL Analysis of Japanese Case; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; Volume 6, pp. 15–24.
  7. Graf, T. Typological implications of tier-based strictly local movement. In Proceedings of the Society for Computation in Linguistics, Online, 7–9 February 2022; pp. 184–193.
  8. Jardine, A.; McMullin, K. Efficient learning of tier-based strictly k-local languages. In Proceedings of the International Conference on Language and Automata Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2017; pp. 64–76.
  9. Lambert, D. Grammar interpretations and learning TSL online. In Proceedings of the International Conference on Grammatical Inference; PMLR: London, UK, 2021; pp. 81–91.
  10. de la Higuera, C. Grammatical Inference: Learning Automata and Grammars; Cambridge University Press: Cambridge, UK, 2010.
  11. Heinz, J.; de la Higuera, C.; van Zaanen, M. Grammatical Inference for Computational Linguistics; Synthesis Lectures on Human Language Technologies; Morgan and Claypool: San Rafael, CA, USA, 2015.
  12. Wieczorek, W. Grammatical Inference: Algorithms, Routines and Applications; Springer: Berlin/Heidelberg, Germany, 2017.
  13. Jurafsky, D.; Martin, J. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd ed.; Online manuscript released January 6, 2026; Available online: https://web.stanford.edu/~jurafsky/slp3 (accessed on 13 April 2026).
  14. Strobl, L.; Merrill, W.; Weiss, G.; Chiang, D.; Angluin, D. What Formal Languages Can Transformers Express? A Survey. Trans. Assoc. Comput. Linguist. 2024, 12, 543–561.
  15. Li, T.; Precup, D.; Rabusseau, G. Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning. Mach. Learn. 2024, 113, 2619–2653.
  16. Van Der Poel, S.; Lambert, D.; Kostyszyn, K.; Gao, T.; Verma, R.; Andersen, D.; Chau, J.; Peterson, E.; Clair, C.S.; Fodor, P.; et al. Mlregtest: A benchmark for the machine learning of regular languages. J. Mach. Learn. Res. 2024, 25, 1–45.
  17. Bod, R. From exemplar to grammar: Integrating analogy and probability in language learning. Cogn. Sci. 2008, 33, 752–793.
  18. Clark, A.; Eyraud, R. Polynomial Identification in the Limit of Substitutable Context-free Languages. J. Mach. Learn. Res. 2007, 8, 1725–1745.
  19. Clark, A. Learning trees from strings: A strong learning algorithm for some context-free grammars. J. Mach. Learn. Res. 2013, 14, 3537–3559.
  20. Clark, A. Distributional learning of some context-free languages with a minimally adequate teacher. In Proceedings of the International Colloquium on Grammatical Inference; Springer: Berlin/Heidelberg, Germany, 2010; pp. 24–37.
  21. Yoshinaka, R. Towards dual approaches for learning context-free grammars based on syntactic concept lattices. In Proceedings of the International Conference on Developments in Language Theory; Springer: Berlin/Heidelberg, Germany, 2011; pp. 429–440.
  22. Stabler, E. Derivational minimalism. In Proceedings of the International Conference on Logical Aspects of Computational Linguistics; Springer: Berlin/Heidelberg, Germany, 1996; pp. 68–95.
  23. McMullin, K.; Aksënova, A.; De Santo, A. Learning phonotactic restrictions on multiple tiers. Soc. Comput. Linguist. 2019, 2, 377–378.
  24. Gorn, S. Explicit definitions and linguistic dominoes. In Systems and Computer Science; University of Toronto Press: Toronto, ON, Canada, 1967.
  25. Jardine, A.; Heinz, J. Learning Tier-based Strictly 2-Local Languages. Trans. Assoc. Comput. Linguist. 2016, 4, 87–98.
  26. Gold, E.M. Language identification in the limit. Inf. Control 1967, 10, 447–474.
  27. De La Higuera, C. Characteristic sets for polynomial grammatical inference. Mach. Learn. 1997, 27, 125–138.
  28. Heinz, J.; Rawal, C.; Tanner, H.G. Tier-based strictly local constraints for phonology. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 58–64.
  29. Graf, T.; Kostyszyn, K. Multiple wh-movement is not special: The subregular complexity of persistent features in Minimalist grammars. Soc. Comput. Linguist. 2021, 4, 275–285.
  30. Graf, T. Diving deeper into subregular syntax. Theor. Linguist. 2022, 48, 245–278.
  31. McCloskey, J. The morphosyntax of WH-extraction in Irish. J. Linguist. 2001, 37, 67–100.
  32. Chomsky, N. The Minimalist Program; MIT Press: Cambridge, MA, USA, 1995.
  33. Chandlee, J.; Eyraud, R.; Heinz, J.; Jardine, A.; Rawski, J. Learning with partially ordered representations. arXiv 2019, arXiv:1906.07886.
  34. Rawski, J. Structure and Learning in Natural Language. Ph.D. Thesis, Stony Brook University, Stony Brook, NY, USA, 2021.
  35. Swanson, L.; Heinz, J.; Rawski, J. Phonotactic Learning with Structure, Not Statistics; Linguistic Inquiry Remarks & Replies; Cambridge, MA, USA, 2025; To Appear.
Figure 1. Example tree showing Gorn addresses (left) and labels (right). Solid arrows indicate the dominance relation, while dotted ones indicate the left-sibling relation.
Figure 2. Example of tree structure (left) and its corresponding projection onto the {b, c} tier (right). In the representation of the tier projection, solid arrows indicate tier dominance, and dashed ones indicate tier siblinghood.
Figure 3. Example of finding 2-factors and 2-paths for a tree. Each 2-factor corresponds to a 2-path with an empty intervener set.
Figure 4. Input sample for learning the MTSL2 language, enforcing both correspondence between movers and landing sites and specific extraction morphology along movement paths only.
Figure 5. Hierarchy of possible forbidding tiers for a b over Σ = {a, b, c, d}. If a b is absent from a particular tier (denoted by ), it must be absent from all superset tiers. If it is present on a tier (denoted ), it is necessarily present on all subset tiers.
Figure 6. Calculation of tiers for the 2-factor a a.
Figure 7. Calculation of tiers for the 2-factor b e.
Table 1. Set of all possible immediate dominance and immediate left sibling 2-factors, with unattested ones in parentheses.
l | (m) | (c) | x
l l | l m | l c | l x | l Y l | l Y m | l Y c | l Y x
m l | (m m) | (m c) | m x | m Y l | (m Y m) | (m Y c) | m Y x
(c l) | c m | c c | c x | c Y l | (c Y m) | (c Y c) | c Y x
x l | x m | x c | x x | x Y l | x Y m | x Y c | x Y x
Table 2. Tier calculation for each missing 2-factor. Grayed-out elements are those which are already on the tier by the time that intervener set is considered. Sets containing such elements (crossed out) are not added to the tier.
2-Factor | Interveners | Tier
m | {l}, {l, x}, {l, x, c}, {l, m, x, c}, {l, c}, {l, m, x}, {l, m, c} | {m} {m, l}
c | {l}, {l, x}, {l, c, m}, {l, c} | {c} {c, l}
m m | {l}, {l, c} | {m} {m, l}
m c | {l} | {m, c} {m, c, l}
c l | {m} | {c, l} {m, c, l}
c | {m}, {m, x}, {m, c, x}, {m, l, c, x} | {c} {m, c}
l | {m}, {m, x}, {m, c, x}, {m, l, c, x}, {m, c}, {m, l, x} | {c} {m, l}
m Y m | {l}, {l, c}, {l, c, x}, {l, m, c, x} | {m} {m, l}
m Y c | {l}, {l, x}, {l, m, x}, {l, m, c, x} | {m, c} {m, c, l}
c Y m | {l}, {l, x}, {l, m, x}, {l, m, c, x} | {m, c} {m, c, l}
c Y c | {l} | {c} {c, l}
Table 3. CUTIA attempting to learn the grammar for the target language given in Example 1.
2-Factor | Interveners | Tier
a a | {b, c}, {b, c, a}, {b, c, e, d}, {a, b, c, d, e} | {a} {a, b, c}
a c | {b}, {a, b, c}, {a, b, c, d, e} | {a, c} {a, b, c}
a d | {b, c, e}, {b, c, e, a}, {a, b, c, e, a}, {b, c, e, d} | {a, d} {a, b, c, d, e}
a e | {b, c}, {b, c, a}, {b, c, a, e, d} | {a, e} {a, b, c, e}
b a | {c}, {b, c}, {a, b, c}, {b, c, d, e}, {a, b, c, d, e}, {d}, {b, d}, {b, d, e}, {a, b, c, d} | {a, b} {a, b, c, d}
b e | {c}, {b, c}, {b, d, e}, {a, b, c, d}, {a, b, c, d, e}, {d}, {b, d}, {a, b, c}, {b, c, d, e} | {b, e} {b, c, d, e}
c b | {a}, {a, b}, {a, b, c}, {a, b, c, d}, {a, b, c, d, e}, {e}, {b, e}, {b, d, e}, {a, b, d, e} | {b, c} {a, b, c, e}
c c | {a, b}, {a, b, c}, {a, b, d, e}, {a, b, c, d, e} | {c} {a, b, c}
c d | {e, b}, {b, d, e}, {a, b, c, e}, {a, b, c, d, e} | {c, d} {b, c, d, e}
d b | {a}, {a, b}, {a, b, c}, {a, b, c, e}, {a, b, c, d, e}, {e}, {b, e}, {b, d, e}, {a, b, d, e} | {b, d} {a, b, d, e}
d c | {a, b}, {a, b, c}, {a, b, d, e}, {a, b, c, d, e} | {c, d} {a, b, c, d}
d d | {e, b}, {b, d, e}, {a, b, c, e}, {a, b, c, d, e} | {d} {b, d, e}
e a | {b, d}, {b, d, e}, {a, b, c, d, e} | {a, e} {a, b, d, e}
e c | {b, d, a}, {a, b, d, e}, {a, b, c, d, e}, {a, b, c, d} | {c, e} {a, b, c, d, e}
e d | {b}, {b, d, e}, {a, b, c, d, e} | {d, e} {b, d, e}
e e | {b, d}, {b, d, e}, {a, b, c, d}, {a, b, c, d, e} | {e} {b, d, e}
Table 4. MTSL-BUFIA learning the grammar for the target language given in Example 1.
2-Factor | Interveners | Tier(s)
a a | {b, c}, {b, c, a}, {b, c, e, d}, {b, c, e, d, a} | {a, b}, {a, c}
a c | {b}, {a, b, c}, {b, c, e, d, a} | {a, c, b}
a d | {b, c, e}, {b, c, e, a}, {b, c, e, d}, {a, b, c, d, e} | {a, d, b}, {a, d, c}, {a, d, e}
a e | {b, c}, {a, b, c}, {b, c, a, e, d} | {a, e, c}, {a, e, b}
b a | {c}, {d}, {b, c}, {b, d}, {c, a, b}, {d, e, b}, {c, e, b, d}, {d, a, b, c}, {a, b, c, d, e} | {a, b, c, d}
b e | {c}, {d}, {b, c}, {b, d}, {b, d, e}, {a, b, c}, {a, b, c, d}, {b, c, d, e}, {a, b, c, d, e} | {b, c, d, e}
c b | {a}, {e}, {a, b}, {b, e}, {a, b, c}, {b, d, e}, {a, b, c, d}, {e, b, d, a}, {a, b, c, d, e} | {a, b, c, e}
c c | {a, b}, {a, b, c}, {a, b, e, d}, {a, b, c, d, e} | {c, a}, {c, b}
c d | {b, e}, {b, d, e}, {a, b, c, e}, {a, b, c, d, e} | {c, d, b}, {c, d, e}
d b | {a}, {e}, {a, b}, {b, e}, {a, b, c}, {b, d, e}, {a, b, c, e}, {a, b, d, e}, {a, b, c, d, e} | {a, b, d, e}
d c | {a, b}, {a, b, c}, {a, b, d, e}, {a, b, c, d, e} | {d, c, a}, {d, c, b}
d d | {b, e}, {b, d, e}, {e, b, a, c}, {a, b, c, d, e} | {d, b}, {d, e}
e a | {b, d}, {b, d, e}, {a, b, c, d, e} | {a, b, e}, {a, d, e}
e c | {b, d, a}, {b, d, a, e}, {b, d, a, c}, {a, b, c, d, e} | {e, c, a}, {e, c, b}, {e, c, d}
e d | {b}, {b, d, e}, {a, b, c, d, e} | {e, d, b}
e e | {b, d}, {b, d, e}, {b, d, a, c}, {a, b, c, d, e} | {b, e}, {e, d}
Table 5. MTSL-BUFIA computing the grammar for the tree language given in Section 3.1.
2-Factor | Interveners | Tier(s)
m | {l}, {l, x}, {l, x, c}, {l, m, x, c}, {l, c}, {l, m, x}, {l, m, c} | {m, l}
c | {l}, {l, x}, {l, c, m}, {l, c} | {c, l}
m m | {l}, {l, c} | {m, l}
m c | {l} | {m, c, l}
c l | {m} | {m, c, l}
c | {m}, {m, x}, {m, c, x}, {m, l, c, x} | {m, c}
l | {m}, {m, x}, {m, c, x}, {m, l, c, x}, {m, c}, {m, l, x} | {m, l}
m Y m | {l}, {l, c}, {l, c, x}, {l, m, c, x} | {m, l}
m Y c | {l}, {l, x}, {l, m, x}, {l, m, c, x} | {m, c, l}
c Y m | {l}, {l, x}, {l, m, x}, {l, m, c, x} | {m, c, l}
c Y c | {l} | {c, l}
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Swanson, L. Syntactic Learning over Tree Tiers. Logics 2026, 4, 5. https://doi.org/10.3390/logics4020005
