2.1. Syntactic Recursion and Recursive Spell-Outs
To understand the theoretical and conceptual consequences of the syntax–prosody mapping and how the mapping of recursive syntactic structures can lead to prosodic recursion, it is important to define what syntacticians understand a recursive structure to be.
Karlsson (2010) makes a clear distinction between syntactic recursion due to embedding and syntactic iteration: “recursion builds structure by increasing embedding depth, whereas iteration yields flat output structures, repetitive sequences on the same depth level as the first instance” (p. 45). We illustrate this distinction with the following English sentences:
(2) | a. | Sydney thinks [cp1 that Robin believes [cp2 that Charlie won the competition]]. |
| b. | Robin saw the essay [cp1 that Sydney wrote for the department [cp2 which hired Charlie]]. |
| | |
(3) | a. | Charlie [[[went] vp1 quickly] vp2 by train] vp3. |
| b. | That was [dp a [np nice [np1 beautiful [np2 sunny [np3 day]]]]]. |
The sentences in (2) are clear examples of recursion, where the recursive CP yields extra (self-)embedding. (2a) is a typical case of embedding, with verbs that select for complement clauses. (2b) involves noun phrases with relative clauses, with the first relative clause (CP1) containing another noun phrase with a relative clause (CP2). These are also cases of nested structures. (3a,b), on the other hand, are edge additions, which, depending on one's structural assumptions, can be considered to be iterations rather than recursive structures.¹ Another case of potential recursion concerns the so-called VP-shell structures (Larson 1988), which have often been used to analyze double complement sentences or sentences involving a PP-goal, such as those in (4a,b).
(4) | a. | Nora gave a sweater [pp to Sydney]. |
| | |
| b. | Nora gave Sydney a sweater. |
The structure in (5a), below, is the Larsonian structure for dative PPs, as in (4a), and (5b) is a structure that is often used post-Larson (1988) for double object complements, as in (4b).² In both (5a) and (5b), the higher V selects for another VP, leading to a typical recursive structure, one that involves “self-embedding” (see Nevins et al. 2009).
(5) | |
It should be noted that even work from the pre-Minimalist Program era does not consider there to be recursion between the PP and the VPs or between the DP Sydney and the PP in (5a). In both cases, we have clear-cut domination of phrase markers. However, these involve different categories, and thus there is no recursive embedding involved.³ Indeed, if syntactic domination between XPs were considered a form of recursion, then there would never have been any controversy concerning whether Pirahã constitutes an exception to the claim that syntactic recursion is a defining property of human language (see Nevins et al. 2009).
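The difference between self-embedding and mere domination across categories can be illustrated with a small sketch. The code below counts how many nodes of one category are stacked on a single path through a tree. The `Node` class, the function name, and both toy trees are illustrative assumptions: the trees are simplified stand-ins for (2a) and for a Larsonian VP shell, not the exact structures in (5).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A constituent: a category label plus its daughter constituents."""
    label: str
    children: list[Node] = field(default_factory=list)


def self_embedding_depth(node: Node, category: str) -> int:
    """Largest number of `category` nodes found on one root-to-leaf path."""
    here = 1 if node.label == category else 0
    below = max(
        (self_embedding_depth(child, category) for child in node.children),
        default=0,
    )
    return here + below


# Assumed, simplified clausal skeleton for (2a): matrix CP > CP1 > CP2.
clausal = Node("CP", [Node("VP", [Node("CP", [Node("VP", [Node("CP", [Node("VP")])])])])])

# Assumed, simplified Larsonian shell: [VP V [VP DP [V' V [PP P DP]]]].
larsonian = Node("VP", [
    Node("V"),
    Node("VP", [
        Node("DP"),
        Node("V'", [Node("V"), Node("PP", [Node("P"), Node("DP")])]),
    ]),
])

print(self_embedding_depth(clausal, "CP"))    # 3: CP inside CP inside CP
print(self_embedding_depth(larsonian, "VP"))  # 2: VP selects another VP
print(self_embedding_depth(larsonian, "PP"))  # 1: dominated by VP, but not self-embedded
```

A depth greater than 1 corresponds to self-embedding of that category; the PP and the DPs in the shell are dominated by other phrases but never by a phrase of their own category, so their depth stays at 1.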
Since the Minimalist Program (Chomsky 1995), the Larsonian view of double object structures has been called into question (see, e.g., Beck and Johnson 2004, as well as the overview article by Citko et al. 2017). More updated structures are given in (6a,b). The structure in (6a) reflects the possessive relation between the two objects, represented by a small clause-like structure (i.e., the PP is a small clause, see Harley 2002). (6b) retains roughly the idea that the two objects are in one XP (small clause), with a subject–predicate relation.
(6) | |
Note that neither (6a) nor (6b) is a recursive structure (not even a form of “self-recursion”), as there are no multiple layers of VP. Furthermore, it is important to note that the node that dominates the verb and its objects, vP, is not considered to be a lexical projection (see Chomsky 1995 and subsequent work).⁴ In short, in contrast to the Larsonian representations in (5), in current syntactic theory, VP-recursion does not characterize double object cases like those represented in (6a,b).
Chomsky (2001) makes it very clear that syntactic derivation is cyclic and that the phonological cycle is not an independent cycle. In formal terms, derivations proceed in phases. When the derivation reaches a phase, it is handed over (spelled out) to the phonological component (and the semantic component).⁵ Chomsky (2001) considers both vP and CP to be phases (and DP is a potential phase). In other words, during the derivation of a sentence involving an embedded clause like (2a) (structure in (7a)), there is a recursive process of phasal spell-out. In the literature, the recursive process of phasal spell-out is called “multiple spell-out” (see Epstein et al. 1998, Uriagereka 1999, and Chomsky 2000). In (7a), the vP-phase of the embedded clause will first go through phasal spell-out, where it is transferred to the phonological component. The next spell-out cycle is the embedded CP phase, and so on. The structure in (7b) indicates the structure of a noun phrase with a relative clause (like the sentence in (2b)), under a Kaynian analysis of relative clauses (Kayne 1994). As we can see in (7b), this yields phases embedded inside the DP (with two additional phases inside the DP the department which hired Charlie) and thus recursive phasal spell-out.
(7) | |
The structure in (8) gives the complete representation of a sentence including both a direct and an indirect object. As we have seen in (6), the verb and both objects are within the vP, belonging to the vP-phase, and the whole sentence is another phase, CP. Given the structure in (8), a sentence involving a direct object and an indirect object has two phasal spell-outs.
(8) | |
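The order of phasal spell-outs described above can be sketched as a bottom-up walk over a tree that records each phase once everything inside it has been processed. This is a deliberately simplified illustration: the tuple encoding `(label, name, daughters)`, the node names, and the toy trees are assumptions, only vP and CP are treated as phases, and each phase is recorded as a whole, as in the prose description.

```python
PHASES = {"vP", "CP"}  # DP is left out here: the text treats it only as a potential phase


def spell_out_order(tree):
    """Return phase names in the order they are spelled out (innermost first)."""
    label, name, daughters = tree
    order = []
    for daughter in daughters:
        order += spell_out_order(daughter)
    if label in PHASES:
        order.append(name)
    return order


# Assumed skeleton for a simple ditransitive clause as in (8): CP > TP > vP.
simple_clause = ("CP", "CP", [("TP", "TP", [("vP", "vP", [])])])

# Assumed skeleton for (2a): each clause contributes a vP phase and a CP phase.
embedded = (
    "CP", "matrix CP", [
        ("vP", "matrix vP", [
            ("CP", "CP1", [
                ("vP", "vP1", [
                    ("CP", "CP2", [("vP", "vP2", [])]),
                ]),
            ]),
        ]),
    ],
)

print(spell_out_order(simple_clause))
# ['vP', 'CP']
print(spell_out_order(embedded))
# ['vP2', 'CP2', 'vP1', 'CP1', 'matrix vP', 'matrix CP']
```

The simple clause yields two spell-outs, while each additional embedded clause adds a further vP/CP pair that is spelled out before the phases containing it.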
Kratzer and Selkirk (2007) were among the first to adopt a phasal spell-out approach to syntax–prosody mapping within the SPMH framework (see also Ishihara 2007). If phases indeed play a role in syntax–prosody mapping, we need to consider the implications phasal syntax has for prosodic recursion. We have seen that in narrow syntax, we have a process of phasal spell-out, which is by definition recursive. In other words, within the minimalist framework, the phonological component is built up cyclically based on the phasal spell-outs. It thus follows that recursive prosodic structure building is motivated by recursive phasal spell-outs in an interface theory like SPMH (1), where prosodic structure mirrors syntactic structure. We demonstrate this formally in the next section.
2.2. Mapping Prosodic Structure with Syntactic Structure
We follow work such as Bennett and Elfner (2019), Féry (2016), Hamlaoui and Szendrői (2015), Ito and Mester (2012, 2013), Nespor and Vogel (1986), Selkirk (2009, 2011), and Truckenbrodt (1995, 1999, 2007) in assuming the following universal syntax–prosody correspondences:
(9) | Syntactic category | Prosodic category |
| Clause (CP) | Intonation Phrase (ι) |
| lexical XP | Phonological Phrase (φ) |
As Ito and Mester (2012, 2013) and Selkirk (2011) make especially clear, it is of central theoretical importance to explicitly define the syntactic constituents that correspond to each prosodic category, as otherwise meaningful cross-linguistic comparisons of the phonological and phonetic properties of the Phonological Phrase and Intonation Phrase are not possible. We refine the definitions of the relevant syntactic categories in (9) in two ways. First, we adopt Cheng and Downing’s (2007, 2009, 2016) proposal that phase-based domains (e.g., not only CP but also vP) correspond to the prosodic category of the Intonation Phrase (ι). This proposal finds motivation in Chomsky’s (2004, p. 124) suggestion that, semantically, it is natural to think that both vP and CP are phases because they are both “propositional constructions.” Propositional constructions are the constituents that most naturally correspond to the Intonation Phrase. As we saw in the preceding section, CP-internal phases stand in a syntactic embedding and recursion relationship with CP, and this provides an additional motivation for mapping them to the same prosodic constituent as CP. Second, we adopt Féry’s (2011, p. 1909) proposal that the Phonological Phrase maps XP arguments. We assume that the distinction between lexical XPs vs. functional XPs defines a crucial distinction between the types of constituents that the Phonological Phrase vs. Intonation Phrase can correspond to, following work such as Truckenbrodt (2007) and Selkirk (2011, p. 453). Since vP is not a lexical category, this provides yet another motivation for mapping it not to the Phonological Phrase but rather to the Intonation Phrase. Finally, like most of the work cited in this paper, we assume that the syntax–prosody correspondence in (9) is formally expressed using mutual AlignL/R or Match constraints, which evaluate the mapping between specific prosodic constituents and specific syntactic ones and, by default, are high-ranked.
Syntax is not the only factor defining the parse into prosodic constituents. From the earliest literature on the phonology–syntax interface (e.g., Nespor and Vogel 1986, Selkirk 1986), it has been demonstrated that the mapping of prosodic structure with syntactic structure is not perfect, i.e., there are mismatches and adjustments, motivated by prosodic principles, which are characterized by some (e.g., van der Hulst 2010, p. 304) to involve “flattening” of the structure. Based on an analysis of prosodic phrasing in Zulu (Bantu S.42, South Africa), we examine next what type of recursive prosodic structures we expect if we have relatively close mapping between phasal spell-outs and prosodic constituents, also drawing attention to mismatches.
We begin with examples from Zulu that illustrate a partial mapping between phases and ι. The consistent prosodic correlate of ι in Zulu is phrase-penultimate vowel lengthening. As Cheng and Downing (2007, 2009, 2016) show, only the right edge of the phase defines a right ι boundary, as this is the locus of phonological cues to prosodic structure. This is shown in (10a,b), where curly brackets indicate ι boundaries and an acute accent indicates a High tone. In (10a,b), we have not indicated the vP-phase boundary, as it coincides with the CP-phase boundary at the right edge:
(10) | a. | { [úm-fúndísi | ú-fúndel-ê: | ábá-zal’ | ín-cwa:di]cp }ι | | |
| | 1-teacher | 1-read.to-perf | 2-parent | 9-letter | | |
| | ‘The teacher read to the parents a letter.’ | | |
| | | | | | | |
| b. | { [ú-Síph’ | ú-fún’ | [ úkúth’ | ú-Thándi | á-théng’ | í-bhayiséki:li]cp }ι ]cp }ι |
| | 1-Sipho | 1-want | that | 1-Thandi | 1-buy | 5-bicycle |
| | ‘Sipho wants Thandi to buy a bicycle.’ | | |
The prosodic phrasing in (10b) and (11), which parses the entire sentence into a single ι, further shows that the left edge of each syntactic phase does not correlate with a prosodic boundary in Zulu. A separate constraint, together with an assumption of exhaustive parsing, is responsible for the left-edge alignment of ι only with the outermost CP (i.e., root clause or illocutionary clause in Selkirk’s (2005, 2011) terminology), leading to a mismatch between syntactic phases and prosodic constituents. This is shown schematically in (11), which includes the vP-phase boundaries.
(11) | { [cp subject | [vP verb | [cp that | subject | [vP verb | object ]} ]} ]} ]} |
That is, even though each vP and each CP are syntactic phases, only the right edge of each of these phases is correlated with the prosodic cues, which condition an ι boundary in this analysis (see Cheng and Downing 2016 and Bonet et al. 2019 for detailed discussion of this kind of edge asymmetry in Zulu and other Bantu languages).
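The edge asymmetry in (11) can be sketched procedurally: a left ι boundary is inserted only at the left edge of the root clause, and a right ι boundary is inserted at the right edge of every phase. The function below is an illustrative assumption rather than the constraint-based analysis itself; it takes the root of the tree to be the root CP, and it encodes trees as `(label, daughters)` pairs with plain strings as words.

```python
PHASES = {"vP", "CP"}


def iota_parse(tree, is_root=True):
    """Return the words of `tree` interleaved with ι boundaries ('{' and '}')."""
    label, daughters = tree
    tokens = ["{"] if is_root else []   # left edge: root clause only
    for daughter in daughters:
        if isinstance(daughter, str):
            tokens.append(daughter)     # a terminal (word)
        else:
            tokens.extend(iota_parse(daughter, is_root=False))
    if label in PHASES:
        tokens.append("}")              # right edge: every phase
    return tokens


# The schematic structure in (11).
schema_11 = (
    "CP", ["subject",
           ("vP", ["verb",
                   ("CP", ["that", "subject",
                           ("vP", ["verb", "object"])])])],
)

print(" ".join(iota_parse(schema_11)))
# { subject verb that subject verb object } } } }
```

The output has one left boundary and four right boundaries, matching the bracketing in (11): the four phases all end at the same point, so their right edges stack up at the end of the sentence.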
The syntax–prosody mapping algorithm that we are motivating here builds upon the notion of phasal spell-outs. Regardless of whether the mapping is done cyclically, with the prosodic cycle embedded in the syntactic cycle, it is clear from the Zulu data that the phase is the constituent that is mapped to ι’s. Since phasal spell-out is a recursive process, as explained in the preceding section, the prosodic structure building necessarily leads to recursive ι’s if the prosody–syntax mapping constraints outrank NoRecursion.⁶ What is further evident from (10) is that the mapping has an edge bias, which is further supported by sentences with a relative clause in the subject noun phrase like (12a), where the phrasing contrasts with that of sentences with a relative clause in the object noun phrase like (12b). Note that penult vowel lengthening is found at the right edge of each phase while there continue to be no prosodic cues to the left edge of each phase. This is particularly evident in the case of object relative clauses, as in (12b). That is, these data show that the recursive ι’s that parse the phases (CP and vP) can have recursive prosody at the right edge but not at the left:
(12) | a. | {[úm-fúndísi | [ó-thól-ê: | ín-dánda:tho]cp }ι | ú-zo-thóla | úm-klóme:lo] cp }ι |
| | 1-teacher | rel.1-find-Perf | 9-ring | 1-fut-get | 3-rewards |
| | ‘The teacher who found the ring will get a reward.’ |
|
| b. | {[si-phul’ | [ím-baz’ | é-théngw-é | námhlâ:nje]cp }ι ]cp }ι | |
| | we-break | 9-axe | rel.9-be.bought-perf | today | |
| | ‘We broke the axe that has been bought today.’ | |
Assuming the raising analysis of relative clauses (e.g., Kayne 1994), Cheng and Downing (2007) show that (a) the head noun of a relative clause phrases prosodically with the relative clause; (b) only the right edge of the CP phase conditions an ι boundary; and (c) in Zulu (as in many other languages), the DP does not have a phasal status.⁷ This is confirmed by the lack of any prosodic cues motivating a corresponding ι boundary associated with DPs (also in (10a,b)). Note again that the relative clauses and the matrix clauses in (12a,b) have recursive ι boundaries (mirroring recursive CP-phasal spell-outs as well as recursive vP-phasal spell-outs).
The constraints and tableau below, adapted from Cheng and Downing (2016), formalize the analysis. The relevant syntactic category to capture the generalizations for Zulu is the phase. Following the universal syntax–prosody correspondence principles discussed for (9), the phase maps to the Intonation Phrase. The edge asymmetry is captured by asymmetric alignment constraints: the right edge of every phase maps to an ι, to satisfy (13a,b), while only the left edge of the root clause maps to ι, to satisfy (13c,d).⁸
(13) | a. | AlignR[Phase, IntPh] (AlignR-Phase): Align the right edge of every phase (vP/CP) with the right edge of an Intonation Phrase (IntPh). |
| b. | AlignR[IntPh, Phase] (AlignR-IntPh): Align the right edge of every Intonation Phrase (IntPh) with the right edge of a phase (vP/CP). |
| c. | AlignL[IntPh, Root CP] (AlignL-IntPh): Align the left edge of every Intonation Phrase with the left edge of a Root CP. |
| d. | AlignL[CP, IntPh] (AlignL-Root CP): Align the left edge of every root CP with the left edge of an Intonation Phrase. |
The tableau in (14) exemplifies the analysis of a sentence like (12a); only crucial constraints are included for clarity of exposition. Note that the syntax–prosody mapping constraints are, by default, equally high-ranked:
(14) | [úm-fúndísi [ó-thól-ê: ín-dánda:tho]cp ú-zo-thóla úm-klóme:lo] cp | AlignL-IntPh | AlignR-Phase | AlignR-IntPh | NoRecursion |
a. | [úm-fúndísi [ó-thól-ê: ín-dánda:tho]cp ú-zo-thóla úm-klóme:lo] cp { } } | | | | * |
b. | [úm-fúndísi [ó-thól-ê: ín-dánda:tho]cp ú-zo-thóla úm-klóme:lo] cp { } | | *! | | |
c. | [úm-fúndísi [ó-thól-ê: ín-dánda:tho]cp ú-zo-thóla úm-klóme:lo] cp { } { } | *! | | | |
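The evaluation behind tableau (14) can be sketched as a comparison of violation profiles. The violation marks below are copied from the tableau; everything else is an illustrative assumption. In particular, the three equally ranked mapping constraints are treated as one stratum whose violations are pooled by summation, with NoRecursion as a lower stratum. This is only one possible way of modeling equal ranking.

```python
# Strata from highest to lowest; constraints inside one stratum are unranked.
STRATA = [
    ["AlignL-IntPh", "AlignR-Phase", "AlignR-IntPh"],
    ["NoRecursion"],
]

# Violation marks as given in tableau (14); unlisted constraints have 0 marks.
TABLEAU_14 = {
    "a": {"NoRecursion": 1},
    "b": {"AlignR-Phase": 1},
    "c": {"AlignL-IntPh": 1},
}


def profile(marks):
    """Violations per stratum, pooled by summation (a modeling assumption)."""
    return tuple(sum(marks.get(c, 0) for c in stratum) for stratum in STRATA)


def optimal(tableau):
    """Candidate whose profile is smallest, comparing higher strata first."""
    return min(tableau, key=lambda name: profile(tableau[name]))


for name, marks in TABLEAU_14.items():
    print(name, profile(marks))
# a (0, 1)
# b (1, 0)
# c (1, 0)
print(optimal(TABLEAU_14))
# a
```

Because tuples are compared position by position, a single violation in the higher stratum is fatal no matter how well a candidate does on NoRecursion, which is why the recursive parse in (14a) wins.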
Candidate (14a) is optimal, as it only violates low-ranked NoRecursion, the constraint penalizing recursive prosodic structure (Selkirk 1995; Truckenbrodt 1995, 1999). As we can see in this candidate, ranking AlignR above NoRecursion optimizes recursive prosodic structure to mirror recursive syntactic structure. While Candidates (14b,c) satisfy NoRecursion, Candidate (14b) is non-optimal, as the right phase edge following the relative clause does not map to the right edge of ι. Candidate (14c) is non-optimal, as it contains a left ι constituent edge that does not map to the left edge of a root/illocutionary CP. Cheng and Downing (2016) show that this approach straightforwardly accounts for prosodic phrasing of relative clauses, in particular, not only in Zulu but also in other languages, leading to analyses where prosodic recursion is justified by syntactic recursion as defined in Section 2.1.⁹
All of the examples from Zulu that illustrate nested, embedded recursion in the syntax lead to recursive prosodic structure at the level of the Intonation Phrase (ι). We turn now to analyses appealing to recursion in prosodic units smaller than ι, namely, the Phonological Phrase (φ). A striking case that appears to motivate recursive φ structure is the assignment of final High tone in Chimwiini (Bantu G.412, Somalia), as demonstrated in Kisseberth (2005, 2010a, 2010b, 2017). Consider first the examples in (15a,b) from Kisseberth (2010a, pp. 225–27), where we see that by default all Phonological Phrases in Chimwiini, defined as in (9), are realized with a High tone on the phrase-penult syllable.¹⁰
(15) | a. | (mw-aalímu)φ | (wa-somelele | w-áana) φ | (chúwo) φ |
| | 1-teacher | 2om-read to | 2-children | 7.book |
| | ‘The teacher read a book to the children.’ |
| | |
| b. | (m-pholeze | cháayi) φ | (ka | chi-jámu) φ | |
| | s/he-cooled.down | 7.tea | with | 7-saucer | |
| | ‘S/he cooled down the tea with a saucer.’ |
Sentences in (16a–d), below, from Kisseberth (2005, pp. 142–43) illustrate cases with first and second person subject prefixes on the verb, rather than third person subject prefixes, as in the above examples. As we can see, these subject prefixes trigger a final High tone rather than a penult High tone: compare the sentences in (16a,b). These examples also show that the High tone motivated by the morphology of the verb is realized on the final word of the Phonological Phrase containing the verb, not on the verb itself. Note in (16a) that a verb with a third-person subject prefix triggers the default penult High tone realization in the Phonological Phrase, just as the default cases we see in (15a,b). Strikingly, in both (16c) and (16d), we see that each verbal complement is realized with a final High tone. Kisseberth accounts for this, as shown in (16c), by proposing that the indirect object and the direct object are recursively parsed into a φ that includes the triggering verb, and in (16d), the instrumental phrase is also recursively parsed into a φ that includes preceding verbal complements as well as the triggering verb:
(16)¹¹ | a. | (n-jilee | namá)φ | Cf. | (wa-jilee náma)φ |
| | I-ate | meat | | they-ate meat |
| | ‘I ate meat.’ | | ‘they ate meat’ |
| | | | | |
| b. | (sí)φ | (chi-lele | ma-sku | ma-zimá)φ |
| | we | we-slept | night | whole |
| | ‘We slept the whole night.’ |
| | |
| c. | ((ni-m-lisile | mweenziwá)φ | deení)φ | |
| | I-1obj-pay.to | 1.friend | debt | |
| | ‘I paid my friend the debt.’ |
| | |
| d. | (((ni-m-tindilile | mwaaná)φ | namá)φ | kaa | chi-sú)φ |
| | I-1obj-cut.for | 1.child | meat | with | knife |
| | ‘I cut for the child meat with a knife.’ |
In Kisseberth’s (2005, 2010a, 2010b, 2017) analysis, the domain of final High tone assignment in Chimwiini is the recursive φ that contains the triggering verb. Kisseberth accounts for the repeated assignment of the final High tone as follows. As shown in (16), recursive φ structure allows the triggering verb to be contained in all of the φ’s that realize the final High tone. Repeated occurrences of the final High tone are accounted for by assigning a High tone to the final syllable of every recursion of the φ that contains the triggering verb. This might be considered a form of High tone agreement.
An alternative analysis, which does not rely on prosodic recursion of φ, is provided if we consider the simplified syntactic structure of (16d) in (17). We assume that verbs in Chimwiini, like verbs in other Bantu languages, raise to a position higher than vP, but lower than TP (see Julien 2002, among others). In (17), the verb has moved via v⁰ and the head of the Applicative Phrase (Appl⁰) to a higher head below T⁰, indicated as γ⁰, which hosts the final vowel, expressing aspectual or mood information.
(17) | |
|
As indicated in (17), each DP argument maps to a φ. The apparently recursive nature of final High tone assignment in this analysis continues to follow from requiring each φ within the domain of the triggering verb to be realized with a High tone on the final syllable. However, in this alternative analysis, the domain of the final High tone assignment does not follow from syntactic or prosodic recursion, as each DP argument is parsed in an independent prosodic domain. Our alternative analysis is based on the observation that, in structures like (17), the verb has moved up to γ⁰ via the phasal head v⁰, and the domain of the vP-phase is therefore extended. More specifically, the phrasal projection of γ⁰, i.e., γP, can be considered to be the extended phase (see Den Dikken 2007 and Gallego 2010, among others, or the more flexible definition of phases in Bošković 2014).¹² In other words, in (17), the γP is the phase that is the relevant syntactic domain for High tone “agreement” with the verb.
Following Cheng and Downing’s (2016) approach in which phases are prosodically mapped to ι, the Chimwiini pattern can be accounted for by aligning a High tone to the right edge of every φ within the domain of the ι that maps to the phasal γP. This is comparable to the recent proposal by Sande et al. (2020) that long-distance morphologically conditioned phonological effects are typically bound by phases. The ι that corresponds to the phasal γP is defined formally by adapting Cheng and Downing’s (2009, 2016) analysis of prosodic domains in Zulu, sketched above. As shown in (14), above, each phasal domain is right-aligned with ι. The CP that defines the root clause is left- and right-aligned with ι.
To account for the fact that the phasal γP alone defines a domain for the final High tone assignment, we need a motivation for also left-aligning an ι boundary with that phasal domain. We observe that Chimwiini shares another property with Zulu, namely, that subjects and other preverbal DPs are phrased separately from what follows. In Zulu, this phrasing is optional for subjects; however, in Chimwiini, it is obligatory, as Kisseberth’s (2017) detailed study shows. Note that most Bantu languages are pro-drop languages: a subject marker is obligatorily realized on a main clause verb, but an overt co-referential subject DP is optional. As work since, at least, Bresnan and Mchombo (1987) observes, subject markers therefore ambiguously have both grammatical and anaphoric agreement properties when an overt subject DP occurs. This ambiguity paves the way for subject DPs to be analyzed either as a clause-external topic or as a clause-internal subject. In Chimwiini, we propose, following Cheng and Downing’s (2009, 2016) analysis of similar phrasing patterns in other Bantu languages, that the subject is actually a clause-external topic, adjoined to the root clause CP on a separate plane and therefore phrased separately from it, just as other left-dislocated constituents are:¹³,¹⁴
(18) | |
Formally, this phrasing is achieved by left-aligning the phase beginning the root clause with ι. The subject/topic is parsed into its own ι to satisfy the constraint StrongStart (19), which penalizes unparsed or recursively parsed material at the left edge of a prosodic domain (see Selkirk 2011, p. 470, for a similar formal analysis of preposed DPs in XiTsonga).
(19) | StrongStart (Selkirk 2011; Figure [38]) |
| A prosodic constituent (π) optimally begins with a leftmost daughter constituent that is not lower in the prosodic hierarchy than the constituent that immediately follows: * ( πₙ πₙ₊₁ … |
The analysis of the Chimwiini example (16b), which incorporates StrongStart into the constraint set motivated for Zulu, is exemplified in the tableau in (20); NoRecursion is omitted from this tableau as it is too low-ranked to determine the optimal output. Note that StrongStart must outrank the syntax–prosody correspondence constraints AlignR(Phase) and AlignR(IntPh) in order to optimize a prosodic parse that does not mirror syntactic constituency:
(20) | Parse into Intonation Phrase (ι); curly brackets indicate ι boundaries |
| [CP (sí)φ [CP [γP chi-lele (masku mazimá)φ ]]] | AlignL (CP) | Strong Start | AlignR-(Phase) | AlignR- (IntPh) |
a. | {[CP (sí)φ {[CP [γP chi-lele (masku mazimá)φ ]]}} | | *! | | |
b. | {{[CP (sí)φ }{[CP [γP chi-lele (masku mazimá)φ ]]}} | | | | * |
c. | {[CP (sí)φ [CP [γP chi-lele (masku mazimá)φ ]]}} | *! | | | |
As shown, Candidate (20b), which parses the subject DP into both a Phonological Phrase and an Intonation Phrase, is optimal because this parse satisfies StrongStart. In this parse, both the first prosodic constituent (within the outermost Intonation Phrase) and the one that follows are Intonation Phrases. As a result, the first prosodic constituent is on the same level of the Prosodic Hierarchy as the second one. This contrasts with (20a), which violates StrongStart because the first prosodic daughter of the sentence is a Phonological Phrase, which is lower in the Prosodic Hierarchy than the second prosodic daughter, which is an Intonation Phrase. Candidate (20c) satisfies StrongStart: the initial Phonological Phrase is followed by unparsed material, which is necessarily lower in the Prosodic Hierarchy. However, this parse is non-optimal as it violates higher-ranked AlignL (CP).
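The StrongStart evaluations just described for (20a–c) can be sketched as a comparison of the first two daughters of the outermost prosodic constituent. The numeric levels, the category names, and the function name are illustrative assumptions; treating unparsed material as the lowest level follows the description of (20c) above.

```python
# Assumed numeric encoding of the relevant part of the Prosodic Hierarchy.
LEVEL = {"unparsed": 0, "phi": 1, "iota": 2}


def violates_strong_start(daughters):
    """True if the leftmost daughter is lower in the hierarchy than the next one."""
    if len(daughters) < 2:
        return False  # nothing follows, so there is nothing to compare
    first, second = daughters[0], daughters[1]
    return LEVEL[first] < LEVEL[second]


# First two daughters of the outermost ι in each candidate of tableau (20).
candidates = {
    "20a": ["phi", "iota"],       # subject φ followed by an ι
    "20b": ["iota", "iota"],      # subject in its own ι, followed by an ι
    "20c": ["phi", "unparsed"],   # subject φ followed by unparsed material
}

for name, daughters in candidates.items():
    print(name, violates_strong_start(daughters))
# 20a True
# 20b False
# 20c False
```

Only (20a) incurs a StrongStart violation; (20c) passes this check and is excluded instead by the higher-ranked AlignL (CP), as described above.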
To sum up, under this analysis, there is no need to appeal to recursive prosodic structure to account for the apparently iterative final High tone assignment in Chimwiini. This is a welcome result since there is no recursive syntactic structure to motivate a parse into recursive φ.