Lexical Category and Downstep in Japanese

Abstract: In pursuing the mapping between syntax and phonology/prosody, little attention has been paid to the kinds of syntactic information that can affect prosody. In this paper, we explore Japanese downstep, a process in phrasal phonology. What syntactic information affects downstep and what does not? Specifically, do lexical categories affect downstep? We investigate the effects of nouns, adjectives, and verbs in different syntactic settings (e.g., [X1 [X2 N]], [[X1 X2] N], predicative X) through production experiments. We found that adjectives in [X1 [X2 N]] may block downstep, whereas adjectives in other structures as well as nouns and verbs generally do not block it. We analyze this phonological patterning as being derivative of an interaction between syntactic structures and lexical categories.


Introduction
In the literature on the syntax-phonology interface, syntactic information is often considered visible in phonology (Nespor and Vogel 1986; Selkirk 1984; Truckenbrodt 1995, et seq.). This is particularly true with respect to sentence-level syntax, as reflected in prosody. For example, the contrast between a statement and a question is often made using different intonation patterns. Since Japanese downstep is a process in phrasal phonology, it offers an excellent case for testing the hypothesis that syntactic information is mapped onto prosody. In Japanese downstep, an accented phrase causes the phrase that follows to be rendered in a lower pitch register (e.g., Kubozono 1989; Pierrehumbert and Beckman 1988; Poser 1984), as in (1a). Figure 1a depicts the pitch contour. The accented word aóku 'blue' causes the word that follows it, nagái 'long', to be rendered in a lower pitch (the acute accent mark indicates a vowel in the accented syllable), and nagái in turn causes the next word, négi 'leek', to be rendered in an even lower pitch register. In contrast, in (1b), as shown in Figure 1b, the word amaku 'sweet' does not trigger downstep because it is unaccented; thus, the pitch peak in the word that follows it, nagái 'long', is not as low as the pitch peak of the word in the same position, nagái, in (1a).
The major phrase (MaP) is the domain of downstep (see Igarashi 2015; Ishihara 2015 for a review). For example, in (1a) and Figure 1, focusing on the noun phrase aóku nagái négi 'green and long leek', the whole phrase constitutes a single MaP, with each word forming a minor phrase (MiP), a phrase that allows at most one accent: ((aóku)MiP (nagái)MiP (négi)MiP)MaP. 1 Moreover, as discussed extensively in Section 2, downstep is reportedly sensitive to certain syntactic information, including whether a given constituent is a maximal projection (i.e., an XP) (Selkirk and Tateishi 1991), branching structures (e.g., Kubozono 1989, 1992; Ito and Mester 2013), and the part of speech of a given word (Hirayama and Hwang 2016, 2019; Hwang and Hirayama 2021; Selkirk and Tateishi 1991). The left edges of relevant syntactic elements are presumably mapped onto the left edges of MaPs, which then block downstep.
The general question regarding the syntax-prosody interface we are concerned with in this paper is what kinds of syntactic information can affect phrasal phonology. An exploration of the literature on downstep shows that different syntactic structures and different kinds of syntactic boundaries have been discussed. What affects the process and what does not? In what follows, we address one specific aspect of this question on which the literature offers different views, namely whether and how parts of speech affect downstep. The main goal of this paper is to shed new light on this issue.
1. a. ane-wa aóku nagái négi to itta
      big sister-TOP blue long leek COMP say.PAST
      'My big sister said, "Green and long leek"'.
   b. ane-wa amaku nagái négi to itta
      big sister-TOP sweet long leek COMP say.PAST
      'My big sister said, "Sweet and long leek"'.
Languages 2022, 7, x FOR PEER REVIEW

This paper is organized as follows. Section 2 reviews the literature on downstep and factors that reportedly affect it. Section 3 presents the methodology of the production experiment that we conducted to test whether downstep realization is sensitive to parts of speech, or more specifically, whether adjectives, nouns, and verbs have different effects on the process. The results are reported in Section 4 and discussed in Section 5. Section 6 presents the study's conclusions.

Figure 1. Pitch curves of (1) [A female speaker]: (a) ane-wa aóku nagái négi to itta; (b) ane-wa amaku nagái négi to itta.

Japanese Downstep and Syntax
This section reviews the literature on the interaction between Japanese downstep and syntax. The review shows that different kinds of syntactic information may block downstep, while not all syntactic boundaries affect it. We discuss the effects of maximal projection boundaries (Selkirk and Tateishi 1991), the boundary between the subject noun phrase (NP) and predicate verb phrase (VP) (Ishihara 2016), relative clause boundaries, and parts of speech (Selkirk and Tateishi 1991; Hirayama and Hwang 2016, 2019; Hwang and Hirayama 2021) on downstep.

Selkirk and Tateishi (1991) argue that the left edges of maximal projections, or XPs, block downstep. In other words, the left edges of XPs are mapped onto the left edges of MaPs, affecting the realization of downstep by blocking it. Ishihara (2019) reports the cumulative effects of XPs on metrical boost with downstep. Metrical boost is a rise of pitch at the beginning of a right-branching structure (Kubozono 1988, 1989, 1993). For example, downstep occurs in both the left-branching phrase [[náma-no 'raw-GEN' áyu-no 'ayu-GEN'] niói 'smell'] 'smell of raw ayu (fish)' and the right-branching phrase [kowái 'terrible' [mé-no 'eye-GEN' yámai 'disease']] 'terrible eye disease' (examples from Kubozono 1989, pp. 33-34). However, metrical boost occurs on mé-no in the right-branching phrase, but not on áyu-no. Ishihara (2019) finds that the effect is larger when there are multiple left edges of XPs, although there is interspeaker variation. Ishihara (2016) reports that the left edge of a predicate VP variably blocks downstep, whereas  (and this study) do not find that this particular type of boundary affects downstep in this way.
Clause boundaries have not been intensively studied in relation to downstep, although they have been argued to support the presence of the Intonational Phrase in Japanese (e.g., Kawahara and Shinya 2008; Selkirk 2009; Ishihara 2019). Furthermore,  tested whether relative clauses would affect downstep, finding that the left edge of the relative clause does not block downstep.
Lexical categories have been reported to affect downstep, but the literature does not agree on how they do this. Selkirk and Tateishi (1991)  

The empirical results of Hirayama and Hwang (2016), Hwang and Hirayama (2021), and Kubozono (1992) are opposite to those reported in Selkirk and Tateishi (1991). Kubozono (1992) reports that downstep occurs in [N1 [N2 N3]]. Hirayama and Hwang (2016) and Hwang and Hirayama (2021), like Selkirk and Tateishi, compared [A1 [A2 N]] and [N1 [N2 N3]], and investigated whether downstep occurred. However, they increased the number of speakers in their experiment and adopted the more traditional definition of downstep found in the Japanese literature (e.g., Kubozono 1988, 1989, 1993; Pierrehumbert and Beckman 1988; Poser 1984). 2 The results suggest that downstep occurred in [N1 [N2 N3]], in particular at N2, the target of downstep, as in (2), but it was blocked at A2 in [A1 [A2 N]]. The target words/phrases of the process are underlined. 3 Downstepped targets are indicated with ! in (2) to (5) below, although the head nouns are not marked, as that is not the focus of investigation here.

  tested adjectives and verbs in their past forms, but retained the right-branching structure, as in (3). They also tested these parts of speech in the predicative position in non-relative clauses, as in (4): N-ga A, N-ga V.
b. An example of N-ga V
   magó-ga !nirámu
   grandchild-NOM stare
   '(Someone's) grandchild stares (at someone) disfavourably'.

 report that downstep occurred in all conditions (i.e., (3) and (4)) but note that in (3), the pattern was much more robust in the verb condition (3b) than in the adjective condition (3a), in that all speakers showed downstep in the former while there was interspeaker variation in the latter.  extended the investigation of downstep to remaining possibilities researched in Hirayama and Hwang (2016, 2019) and Hwang and Hirayama (2021), and tested verbs in their nonpast forms in relative clauses, nouns accompanied by the past tense form of a copula in relative clauses, and nouns in the predicative position in non-relative clauses, as in (5).

c. An example of N-ga N
   magó-ga !námi
   grandchild-NOM Nami
   '(My) grandchild is called Nami'.

Table 1 recapitulates the results obtained in Hirayama and Hwang (2016, 2019), , and Hwang and Hirayama (2021) with respect to the presence and absence of downstep and particular parts of speech (nouns, adjectives, and verbs). It can be observed that adjectives may block downstep when they are in attributive use, modifying the head noun, whereas adjectives in predicative use do not block downstep, and nouns and verbs do not block downstep regardless of the type of use (attributive or predicative).

  (2b) [shiroi [nagai mame]] 'white, long beans', does not sound quite natural in Japanese, although it is not ungrammatical. They propose that because of this unnaturalness, (some) speakers inserted a phonological phrase (i.e., MaP) boundary between the two adjectives, which resulted in the downstep being blocked. The same can apply to the other NPs that have an adjective in the past tense form, as in (3a). They point out that when two adjectives are used to modify a noun, the structure where the first one appears in the -te (or gerundive) form (i.e., [[A-te A] N]) sounds more natural (e.g., [[shiroku-te nagai] mame] 'white and long beans'). 5

To summarize past findings on possible syntactic effects on downstep in Japanese, clause boundaries such as the relative clause boundary do not appear to affect it, whereas phrase-level information (the left edges of XPs) may affect it. Another line of investigation is concerned with parts of speech, although there is debate as to which categories block the process.

Speech Materials
Of the different types of syntactic information, we focus on the relation between lexical categories and downstep. Specifically, we test the effect of adjectives, nouns, and verbs on downstep by paying careful attention to the structures of the test sentences. Recall from Section 2 that  point out that a particular structure (i.e., [X [X N]]) might have yielded unnaturalness, causing a MaP boundary to be inserted to block downstep, especially when adjectives are involved. In this experiment, we use structures in which the combination of two modifying items forms a constituent ([[X X] N]), rather than individually modifying the head noun, so that the sentences are more natural for the speaker. As shown in the following, the structure [[X X] N] involves a left-branching constituent [X X], but it is not recursive in that the first X does not modify the second X. We return to this structural aspect in Section 5.
We prepared two structures each for nouns and verbs and one structure for adjectives. In all these structures, the target of downstep is X2 in [[X1 X2] N]. First, for nouns, we prepared a structure in which X1 and X2 are nouns, and X1 is accompanied by the particle -to 'and', as in (6a), as well as a structure in which X1 and X2 are nouns, and X1 is accompanied by -de, the continuative form (or renyookei) of the copula -da, as in (6b) ('hot pot that is tasteless and has lamb').

For verbs, we prepared a structure in which X1 and X2 are verbs, and X1 is in the continuative form, as in (7a), and a structure in which X1 and X2 are verbs, and X1 is in the -te form, as in (7b).

Finally, in the structure we prepared for adjectives, X1 and X2 are adjectives, and X1 is in the continuative form, as in (8). There are no adjective test phrases in which A1 in [[A1 A2] N] is in the -te form, because adjectives are always accented in this form; hence, we cannot test whether (paradigmatic) downstep has occurred without a comparable phrase with an unaccented item in X1.
To test whether downstep occurs, we follow the traditional (i.e., paradigmatic) understanding of Japanese downstep (e.g., Kubozono 1989; Pierrehumbert and Beckman 1988; Poser 1984; see Note 2) and compare these phrases, as in (6)-(8), with others containing an unaccented item in the position that precedes the target. Downstep is judged to be present if the pitch peak of the target position in a phrase with an accented trigger, as in (9a), is lower than the pitch peak of the target position in a phrase with an unaccented trigger, as in (9b). 8 Recall that downstep is triggered by an accented word. Therefore, a phrase with an accented trigger, as in (9a), constitutes a downstep environment, whereas downstep would not occur if the trigger position is occupied by an unaccented item, as in (9b).

9. a. Accented trigger (= (8))

For each of the five structures seen above (two for nouns, two for verbs, and one for adjectives), there are two accented vs. unaccented pairs, totalling 20 (5 × 2 × 2) phrases (see Appendix A for a full list of these phrases). Eight filler phrases are not discussed in this paper because they are intended for other studies. All 28 phrases are inserted into the carrier phrase ane-wa ____ to itta '(my) sister said ____'. The sentences are pseudo-randomized five times, yielding five lists for participants to read aloud.
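The paradigmatic criterion described above can be illustrated with a minimal sketch. It compares mean peak f0 values at the target position after an accented versus an unaccented trigger. The function name and the f0 values are hypothetical and serve only as an illustration; the study itself evaluates the difference with mixed-effects models rather than a raw comparison of means.

```python
from statistics import mean


def downstep_present(accented_peaks, unaccented_peaks):
    """Paradigmatic check (illustrative only).

    Returns True when the mean target peak f0 after an accented trigger
    is lower than the mean target peak f0 after an unaccented trigger.
    """
    return mean(accented_peaks) < mean(unaccented_peaks)


# Made-up peak f0 values in Hz; these are NOT data from the experiment.
after_accented = [210.0, 205.0, 215.0]
after_unaccented = [240.0, 238.0, 245.0]

print(downstep_present(after_accented, after_unaccented))  # True
```

A comparison of this kind only reproduces the logic of the definition; it says nothing about statistical reliability.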

Recording and Speakers
Recording was done in a studio with sound-attenuated walls. We used a Marantz digital recorder (PMD661) with a 44.1 kHz sampling rate and 24-bit quantization. The microphone was a unidirectional dynamic headset microphone (SHURE SM10A, frequency response: 50-15,000 Hz). There was a practice session at the beginning, in which sentences in the structures discussed in Section 3.1 were used, although they included lexical items different from the test items.
Twelve Japanese speakers from Tokyo and nearby areas participated in the study. They were all university students (1 male and 11 female speakers, mean: 19.2 years old, range: 18;3-21;0). Their dialect (i.e., Tokyo dialect) is comparable to the one discussed in previous studies on Japanese downstep. One speaker was removed from the analysis because her utterances were generally creaky, particularly during the test phrases. Tokens read with unexpected accentuation that went unnoticed during the recording were also removed from the analysis (n = 14). One speaker's recording was clipped for one list, such that there are four repetitions rather than five for that speaker. This left us with 1066 sentences for analysis.
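The reported total can be retraced with simple arithmetic. The sketch below assumes that the count covers only the 20 test phrases (not the eight fillers), that the clipped list belongs to a speaker retained in the analysis, and that none of the 14 excluded tokens fall in the clipped list. These assumptions are not stated in the text, but they reproduce the figure of 1066.

```python
# All variable names are illustrative; the numbers come from the text.
speakers_recruited = 12
speakers_excluded = 1            # removed because of creaky voice
test_phrases = 5 * 2 * 2         # structures x pairs x accent conditions = 20
lists_per_speaker = 5
clipped_lists = 1                # one retained speaker lost one list (assumption)
misaccented_tokens = 14          # tokens with unexpected accentuation

speakers = speakers_recruited - speakers_excluded
planned = speakers * test_phrases * lists_per_speaker   # 11 * 20 * 5 = 1100
lost_to_clipping = clipped_lists * test_phrases          # 20
analyzed = planned - lost_to_clipping - misaccented_tokens

print(analyzed)  # 1066
```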

Acoustic Analysis and Statistics
The peak f0 in each phrase, including the carrier's subject/topic phrase ane-wa '(my) sister-TOP', was measured, as shown in (10), where the phrase boundaries are marked with a vertical bar (|). Figure 1 also illustrates the boundaries. The f0 measurements were performed in Praat (Boersma and Weenink 2020) by running the ProsodyPro script (Xu 2013) on manually created phrase intervals. To examine whether the target phrases were downstepped, linear mixed-effects analyses were conducted on the relationship between the peak f0s (Hz) of the target and trigger accentuation (accented vs. unaccented) using R (ver. 3.6.2, R Core Team 2019) and the lmerTest package. Speakers were included as random effects (random intercepts) in the model. Items were not included as random effects because, when models with and without items as random effects were compared using anova, the results did not show any significant differences (p > 0.05) in any of the conditions reported below.

Results
Figure 2 shows the representative f0 contours of [[V1 V2] N] (7a), where V1 is in the continuative form. The peak f0 of the target in the accented condition (solid line) is considerably lower than that in the unaccented condition (dotted line) (see more contours in Figure 1 for (9)). Table 2 provides the results.
As seen, accentedness has a significant effect on the peak f0 values in all conditions examined, with the f0 peaks in the target position being higher when the trigger is unaccented than when it is accented. 9 These results can be interpreted as showing that downstep is present in all the conditions examined in the study. Figures 3-5 show the mean f0 peak values for the adjective, verb, and noun tokens, respectively. The topic phrase is the subject ane-wa '(my) sister-TOP' in the carrier phrase. The figures indicate that in mean terms, the peak pitch of the target position is always higher when the preceding word (i.e., the one in the trigger position) is unaccented (dashed lines) than when it is accented (solid lines), indicating that there is downstep in all cases. When we examined the individual graphs for each speaker (not given here), we found, unlike in Hirayama and Hwang (2019), no apparent interspeaker variation in the adjective patterns (or in the patterns of the other categories); downstep was robustly found.
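The model described in the acoustic analysis, with trigger accentuation as the fixed effect and by-speaker random intercepts, can be written out as below. This is a sketch under assumptions: the treatment coding (accented trigger as the reference level) and all symbol names are illustrative and are not taken from the original analysis. Here f0 with subscripts i and j stands for the target peak f0 of token i produced by speaker j.

```latex
\[
  \mathrm{f0}_{ij} = \beta_0 + \beta_1\,\mathrm{Unaccented}_{ij} + u_j + \varepsilon_{ij},
  \qquad
  u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

Under this coding, a reliably positive estimate of the slope corresponds to the pattern reported above, in which target peaks are higher after an unaccented trigger than after an accented one.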

Discussion
Together with the results of relevant previous studies, the results of this research reveal that adjectives in a particular structure, i.e., [X [A N]], show either the absence of downstep or variable patterns of downstep (see Table 3). In other words, these results cumulatively suggest that first, the structure [X [X N]] (where two items independently modify the head noun) is important since downstep may be blocked in this structure, while in others ([[X X] N] and N-ga X), downstep occurs irrespective of the parts of speech. Second, adjectives are different from nouns and verbs in that when put in [X [X N]], they may block downstep. How can these patterns be accounted for? Below, we first discuss the data in relation to the proposals put forward in the literature on Japanese downstep in terms of syntax-prosody mapping. The discussion shows that the mapping from the syntax can account for the contrasting pattern between the left-branching and right-branching structures and the contrast between the patterns with adjectives and nouns. However, it still does not explain the difference between adjectives and verbs as seen in Table 3. We then discuss other areas, i.e., semantics and pragmatics, to explain the data.

Table 3. Presence (yes) and absence (no) of downstep in Hirayama and Hwang (2016, 2019), , Hwang and Hirayama (2021), and this study. Columns: Noun, Adjective, Verb.

Before testing the syntax-prosody mapping proposals against the patterns in Table 3, we present the prosodic phrasing suggested by the data in Table 3 (11). When X2 is an adjective, a MaP boundary can be inserted before it, or the whole phrase makes a single MaP (11a). 10 When X2 is a noun or verb, the whole phrase makes a MaP (11b).

11. MaP phrasing for [X1 [X2 N]]
a. X2 = A: Variation between ((X1)MiP

As reviewed in Section 2, Selkirk (2009) and Selkirk and Tateishi (1991) argue, in line with Align Theory, that the left edge of maximal projections (XPs) is aligned with the left edge of a MaP, blocking downstep. This is not readily tenable given the presence of downstep in our data, for example, at N2 in [N1-no [N2-no N3]] (see Table 3). A standard syntax would project a maximal projection NP for N2. Thus, the left edge of this XP would be mapped onto the left edge of a MaP and downstep would be blocked. However, downstep was robustly found there.

Kubozono (e.g., 1989, p. 59; 1992, p (11a), to explain the downstep blockage when the second X is A. (Kubozono uses the branching structure in the explanation of the phenomenon called the metrical boost, which he analyzes as occurring in addition to downstep at the beginning of an intermediate constituent in the right-branching structure (Section 2).)
The idea of recognizing syntactic recursivity as reflected in recursive prosody in Japanese (and other languages) is discussed elsewhere as well (e.g., Ito and Mester 2012, 2013). In particular, Ito and Mester (2013, p. 34ff.) analyze the prosodic phrasing of left-branching NPs [[N-no N-no] N] (containing recursive NPs) and right-branching NPs [N-no [A-i N]] as an interaction of the syntax-prosody mapping that uses Match Theory (e.g., Selkirk 2009), where syntactic XPs are mapped onto recursive Phonological Phrases (ϕ), with phonological constraints (such as a binarity requirement for the Phonological Phrase and a prohibition of recursivity). In their model, there is a possibility of prosodic phrasing that can assume additional bracketing for the right-branching structure. 11 In fact, in their syntactic bracketing (p. 34ff.), the NP with the right-branching syntax [[X1] [[X2] N]] is more complex than the NP with the left-branching one [[[X1] X2] N] in that it has more structure (note the additional bracket pair for the former). With this structural difference in mind, their syntax-prosody mapping would yield additional prosodic bracketing for the constituent [[X2] N] in the right-branching NP: ((X1)ϕmin ((X2)ϕmin (N)ϕmin)ϕ)ϕmax. Recursive Phonological Phrases like this may explain the contrast between the left-branching and right-branching structures with respect to downstep, as seen in Table 3. Here, the intermediate Phonological Phrase ((X2)ϕ (N)ϕ)ϕ, in particular the left boundary of this prosodic constituent, may block downstep. The variable nature of downstep blocking in Table 3 can be accounted for if the level of this constituent (intermediate ϕ) is acknowledged and the claim that prosodic effects become cumulatively stronger as more boundaries coincide at higher levels (Fougeron 2001, et seq.; Ishihara 2019) is adopted. The boundary here is not as strong as the boundary at the maximal Phonological Phrase (ϕmax), resulting in variable blocking of downstep.
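The structural point made above, that the target X2 sits at the left edge of an intermediate constituent only in the right-branching structure, can be illustrated with a toy sketch. The nested-tuple encoding and the function name are hypothetical. The sketch maps syntactic constituents directly onto prosodic ones and deliberately ignores the phonological constraints (binarity, non-recursivity) that form part of Ito and Mester's model, so it is not an implementation of their analysis.

```python
def left_edge_words(tree):
    """Return the words that begin an intermediate constituent.

    `tree` is a nested tuple of strings (hypothetical encoding): a str is
    a word, a tuple is a constituent. "Intermediate" means any tuple that
    is strictly inside the top-level one.
    """
    found = []

    def first_word(node):
        # Leftmost word dominated by this node.
        return node if isinstance(node, str) else first_word(node[0])

    def visit(node, is_top):
        if isinstance(node, str):
            return
        if not is_top:
            found.append(first_word(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


right_branching = ("X1", ("X2", "N"))   # [X1 [X2 N]]
left_branching = (("X1", "X2"), "N")    # [[X1 X2] N]

print(left_edge_words(right_branching))  # ['X2']
print(left_edge_words(left_branching))   # ['X1']
```

In the left-branching case the only intermediate left edge falls on X1, which is also the beginning of the whole phrase, so no additional boundary precedes the target X2.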
Note, however, that in Table 3 downstep is not always (variably) blocked at the beginning of an intermediate constituent in a right-branching structure. It is blocked only when A is involved. This can be explained if we explore syntax-prosody mapping and adopt the architecture of prosodic hierarchy where a lower-level constituent is exhaustively contained in an immediately higher-level constituent as in the Strict Layer Hypothesis (e.g., Selkirk 1984; Nespor and Vogel 1986). In Table 3, the right-branching structure with A may actually involve an embedded clause, here a relative clause: [[X]RC [[A]RC N]]. 12 A in the past tense form, as in (3a), projects a relative clause. A in the nonpast tense form (with the suffix -i), as in (2b), may also do so (e.g., Kuno 1973; Yamakido 2005; see Yamakido 2000 for other references). If embedded clauses are mapped onto Intonational Phrases or PClauses (Ishihara 2019 and references therein), which are a level higher than the MaP/Phonological Phrase, the left edge of the embedded clause that houses the A is mapped onto the left edge of an Intonational Phrase/PClause. Assuming that the edges of a higher-level prosodic category coincide with those in an immediately lower-level prosodic category, the left edge of the Intonational Phrase/PClause is at the left edge of a MaP, which would block downstep. This is also compatible with the fact that the A in the predicate position [N-ga]Subj [A]Pred does not block downstep: the left edge of the predicate AP does not coincide with a clause boundary to the exclusion of the subject NP (if we assume a definition of syntactic clause as containing both the subject NP and the predicate); thus, there is no Intonational Phrase/PClause boundary there. This line of account is also compatible with other patterns, especially with [N1-no [N2-no N3]], since there is no clause boundary to the left of N2, and thus no PClause boundary there.
The above accounts are all based on the syntax-prosody mapping hypotheses. Different syntactic branching structures are mapped onto different recursive Phonological Phrase structures. A syntactic clause is mapped onto an Intonational Phrase/PClause, and the syntactic difference between A and N in terms of their behaviour in clause projection provides different PClause phrasing. However, the pattern with V in Table 3 still cannot be explained, since V, having inflection regarding tense, would project a clause just like A if we assume a standard syntax. The abovementioned accounts would then predict that downstep is blocked when V is involved in the right-branching structure [X [V N]]. This is not the case in the data: downstep is robust there.
How are the differences in parts of speech, in particular the anomaly of A in the downstep data in Table 3, explained? The above discussion reveals that syntax may not be enough. That itself is not surprising because there are analyses in the literature on prosody that do not rely on syntax. One example is information structure: focus is often said to affect prosodic phrasing. Other pragmatic cues, such as illocutionary force, have also been argued to be reflected in prosody (e.g., Selkirk 2009). 13 Below, we explore accounts in terms of semantics (another field that deals with meaning), pragmatics, and an interaction with the baseline condition (i.e., phrases with an unaccented trigger).
N, A, and V differ in terms of their denotation. Placed in the right-branching structure [X1 [X2 N]], in which the two Xs individually modify the head noun and thus do not form a constituent, the semantic properties of the categories in X2 may cause a conflict with X1 in parsing from X1 to X2, in which case a MaP is created beginning with X2, blocking downstep. If the Xs are verbs, X1 and X2 are interpreted to have a certain semantic relation. For example, in [mayóu [nayámu magó]] (5a), although mayóu 'get lost' and nayámu 'worry' individually modify magó '(one's) grandchild', one can easily identify a causal relation between the actions the verbs denote, i.e., the grandchild gets lost and as a result, gets worried. We can generalize the semantic relation in question as being temporal, as the verbs typically denote actions/events: the actions/events conveyed by V1 and V2 are interpreted in such a way that one of them temporally precedes the other. This temporally ordered relation ensures a natural flow between the two verbs in [V1 [V2 N]], which results in a single phonological phrase (MaP). 14 In contrast, when X2 in [X1 [X2 N]] is an adjective (i.e., [A1 [A2 N]], [V.past [A.past N]]), since adjectives denote a state (or a property) and not an event, X1 and X2 cannot forge a temporal relation of the kind observed with verbs. For example, [shirói [nagái mamé]] 'white, long beans' (2a), an example of [A1 [A2 N]], is difficult (if not impossible) to interpret in such a way that the two states described by the adjectives are temporally ordered. Rather, since adjectives convey a state, when the two Xs are adjectives (i.e., [A1 [A2 N]]), the states of A1 and A2 temporally coexist without being related in terms of any temporal precedence. 15 In order to refer to such a situation (e.g., beans that are white and long), native speakers would probably disfavour the right-branching structure [A1 [A2 N]] (2a).
They would rather use another construction, for example, the one in which the first adjective is in the continuative form, [[shiró-ku nagái] mamé], as in (8) ] 'a grandchild who stared disfavourably and was tired' (3a)), since verbs are involved, a temporal relation is expected between the verb action/event and adjective state. However, [V.past [A.past N]] fails to be interpreted in such a way that the state of A begins after the event of V (begins and) ends or that the event of V begins after the state of A (begins and) ends. Crucially, however, it can be interpreted to mean that the event of V begins after the state of A has begun. For example, in (3a), the event of the grandchild staring disfavourably began after the start of their state of being tired. In this interpretation, there is no temporal precedence relation between the event of V and state of A in a strict sense, but it is still possible to claim that there is a partial temporal precedence relation between the two. The existence of this relation can be the source of the interspeaker variation in the presence of downstep in [V.past [A.past N]] (see Table 3): some speakers needed a strict temporal precedence relation between the two Xs in [X1 [X2 N]] in order to make a single MaP over them. As such, they inserted a MaP boundary at the left edge of A in [V [A N]].

When a noun appears in X2 in [X1 [X2 N]] (i.e., [N1-no [N2-no N]], [V.past [N datta N]]) and a verb is involved, i.e., in [V.past [N datta N]], a semantic relation similar to the one found with verbs exists between the two Xs. When accompanied by datta, the past tense form of a copula, a noun can denote an event (e.g., dame-datta can mean 'failed'), and thus it is verb-like, which creates a certain temporal precedence relation between the V and N. In [N1-no [N2-no N]], the semantically underspecified nature of the particle -no and frequency may play a role.
It has been argued (e.g., in den Dikken and Singhapreecha 2004) that Japanese -no, like, for example, English of and Spanish de, is a linker that encodes a variety of semantic relations between the nouns it links. Because of this, and presumably helped by the high frequency of the N-no N construction in Japanese, speakers can smoothly parse from N 1 to N 2 in [N 1 -no [N 2 -no N]], creating a single downstep domain.
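The semantic account above can be restated as a small decision procedure. The following Python sketch is only an illustrative summary of the reasoning in this section, not an implementation used in the study; the function name `predict_downstep`, the category codes, the structure labels, and the three outcome labels are assumptions introduced here for exposition.

```python
def predict_downstep(structure: str, x1: str, x2: str) -> str:
    """Summarize the semantic account of downstep blocking.

    structure: "right-branching" for [X1 [X2 N]]; any other label
               (e.g., "left-branching" for [[X1 X2] N]) is treated alike.
    x1, x2:    lexical category codes "N", "A", or "V" (hypothetical labels).
    Returns one of "downstep", "variable", or "may block".
    """
    # Outside the right-branching structure, no category was found
    # to block downstep.
    if structure != "right-branching":
        return "downstep"

    if x2 == "A":
        if x1 == "V":
            # Event of V and state of A: only a partial temporal
            # precedence relation, hence interspeaker variation.
            return "variable"
        # Two coexisting states cannot be temporally ordered, so a MaP
        # boundary may be inserted at the left edge of X2.
        return "may block"

    # X2 is V (temporal precedence), N + datta (verb-like), or
    # N-no (relation supplied by the linker): a single MaP is formed.
    return "downstep"


for case in [("right-branching", "A", "A"),
             ("right-branching", "V", "A"),
             ("right-branching", "V", "V"),
             ("left-branching", "A", "A")]:
    print(case, "->", predict_downstep(*case))
```

The three-way outcome is a deliberate simplification: "may block" and "variable" both stand for tendencies across speakers rather than categorical predictions.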
Another possible factor behind the MaP boundary to the left of A in [X [A N]] is focus. 17 One possible interpretation of the blocking of downstep is that the speakers somehow emphasized the A, which resulted in focus being placed on that element. (The source of the focus placement could be the unnaturalness created by the structure and category A, as discussed above.) Since focus can be realized even when the item is downstepped (e.g., Ishihara 2016), a further study that carefully controls for focus in examining downstep in [X [A N]] is necessary to test this account.
Another possible line of explanation for the absence of downstep in [X [A N]] is based on variable phrasings in the baseline condition, i.e., phrases with an unaccented trigger. In Figure 3, the f0 peaks from the Trigger to the Target show a downtrend in the baseline condition ([[AA] N] U), which is not observed, at least not to the same degree, in Figure 4 (verb) and Figure 5 (noun). This may suggest that there is a MiP boundary between X 1 and X 2 in [[X 1 X 2 ] N] when X 1 is unaccented and adjectives are involved, whereas the two Xs would usually form a single MiP when X 1 is unaccented. If there is indeed a MiP boundary with adjectives but not with verbs and nouns, this may partly explain why adjectives pattern differently from nouns and verbs. This should also be investigated further.

Conclusions
We investigated questions concerning the syntax-phonology interface, with particular focus on downstep in Japanese. What kinds of syntactic information can affect phrasal phonology such as downstep? Do particular parts of speech affect downstep? If so, what does that mean linguistically? The results of the production experiment in this study, together with past research results, suggest that adjectives may block downstep if they are in a particular syntactic position in the right-branching structure [X 1 [X 2 N]], that is, with the two Xs individually modifying the head noun. However, adjectives did not affect downstep in other structures such as [[X 1 X 2 ] N], where the two Xs form a left-branching constituent to the exclusion of N, or in the predicate position of non-relative clauses (N-ga X). Nouns and verbs did not affect downstep in any of these positions.
We explored accounts in terms of the syntax-prosody mapping hypotheses discussed in the literature and found that the potential site for downstep blocking, i.e., the beginning of an intermediate constituent in the right-branching structure, can be accounted for with recursive PhPhrases: an intermediate-level PhPhrase may block downstep. The contrast between N and A can be explained if the clause projection that A makes is mapped onto the PClause, whereas N makes no such projection. However, syntax-prosody mapping cannot explain the difference between A and V. We then explored several possible factors in terms of semantics and pragmatics. Different parts of speech have different kinds of denotations, and a failure to establish a semantic relation between X 1 and X 2 results in a prosodic phrasing in which a MaP boundary is inserted between the two Xs. Thus, the interaction between the syntactic structure and parts of speech explains the patterning observed in the phrasal phonology.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Seikei University Research Ethics Committee (protocol code SREC 17-17, approval date 6 October 2017).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author with the permission of the participants. The data are not publicly available due to the permission status of the data.