English-Learning Infants’ Developing Sensitivity to Intonation Contours

Megha Sundara; Sónia Frota

doi:10.3390/languages10070148

¹ Department of Linguistics, University of California, Los Angeles, CA 90095, USA

² Center of Linguistics, Universidade de Lisboa, 1600-214 Lisbon, Portugal

* Author to whom correspondence should be addressed.

Languages 2025, 10(7), 148; https://doi.org/10.3390/languages10070148

This article belongs to the Special Issue Advances in the Acquisition of Prosody

Abstract

In four experiments, we investigated when and how English-learning infants perceive intonation contours that signal prosodic units. Using visual habituation, we probed infants’ ability to discriminate disyllabic sequences with a fall versus a rise in pitch on the final syllable, a salient cue used to distinguish statements from questions. First, we showed that at 8 months, English-learning infants can distinguish statement falls from question rises, as has been reported previously for their European Portuguese-learning peers who have extensive experience with minimal pairs that differ just in pitch rises and falls. Next, we conducted three experiments involving 4-month-olds to determine the developmental roots of how English-learning infants begin to tune into these intonation contours. In Experiment 2, we showed that unlike 8-month-olds, monolingual English-learning 4-month-olds are unable to distinguish statement and question intonation when they are presented with segmentally varied disyllabic sequences. Monolingual English-learning 4-month-olds only partially succeeded even when tested without segmental variability and a sensitive testing procedure (Experiment 3). When tested with stimuli that had been resynthesized to remove correlated duration cues as well, 4-month-olds demonstrated only partial success (Experiment 4). We discuss our results in the context of extant developmental research on how infants tune into linguistically relevant pitch cues in their first year of life.

Keywords: pitch; development; prosody; statement; question; suprasegmental; attunement

1. Introduction

In recent years, there has been increasing interest in how infants tune into linguistically relevant pitch differences, which are essential for signaling native-language prosody. Across languages, pitch can signal differences in word meaning (e.g., tone differences in Mandarin or pitch accent differences in Japanese) or mark edges of phrasal units, usually to convey syntax, pragmatics, and discourse meanings (e.g., English or Portuguese).

We know from existing research that how monolingual infants’ initial abilities and language experience interact to influence perception of pitch depends on the linguistic function of pitch. For example, French-learning newborns can detect differences between groups of disyllabic Japanese words with a High–Low compared to Low–High pitch accent (Nazzi et al., 1998), as can Japanese-learning 5- and 10-month-olds (Sato et al., 2009). Similarly, infants with and without experience with Limburgian pitch accents can successfully distinguish them at 6, 9, and 12 months of age (Ramachers et al., 2018). Thus, infants can detect pitch accents from birth, and this ability is not affected by language experience.

In contrast, language experience plays a key role in infants’ developing ability to distinguish tones, as has been demonstrated for consonants and vowels. Infants under 6 months, regardless of whether they are learning a tone language, can typically distinguish minimal pairs differing in tone (Mattock & Burnham, 2006; Mattock et al., 2008; Yeung et al., 2013; Liu & Kager, 2014; Shi et al., 2017; but see also Chen & Kager, 2016). In the absence of experience with a tone language, in the second half of their first year, infants’ ability to distinguish tone contrasts may (a) reduce (e.g., Mattock & Burnham, 2006; Mattock et al., 2008; Shi et al., 2017; Yeung et al., 2013), (b) be maintained (e.g., Liu & Kager, 2014; Shi et al., 2017), or (c) even be facilitated (e.g., Chen & Kager, 2016; Singh et al., 2018; Tsao, 2017). However, the decline in discrimination of some tone contrasts among non-tone-language learners is temporary, and infants recover the ability to distinguish tone contrasts by their second year (e.g., Liu & Kager, 2014). No such decrease in discrimination has been observed for infants tested on native-language tone contrasts (e.g., Tsao, 2017; Yeung et al., 2013). Interestingly, infants’ initial ability to distinguish non-native tone contrasts early in infancy seems to be limited to cases where they are tested with minimal pairs, that is, without segmental variability. For instance, when tested with more-varied segments (Frota et al., 2016), European Portuguese-learning infants fail to distinguish groups of disyllabic sequences that differ in Mandarin Chinese tones. Thus, although infants display some initial ability to distinguish tone contrasts, it has been proposed that age, language experience, and the acoustic salience of the tone contrast influence tone discrimination in infancy (see Kalashnikova et al., 2024, for an overview and Liu et al., 2024, for a multi-lab report).

In this paper, we investigate infants’ discrimination of pitch differences used to delineate phrasal units, specifically statements from questions. The use of pitch differences at the phrasal level, referred to as intonation, is quite common cross-linguistically, although the details of how pitch is used to signal phrasal structure differ across languages (Jun, 2005, 2014; Ladd, 2008; Frota & Prieto, 2015). Pitch is used in many languages to distinguish statements, exclamations, or questions. Research on European Portuguese-learning infants shows that 5- to 6- as well as 8- to 9-month-olds can distinguish groups of two-syllable sequences in their native language with the characteristic falling pattern for statements from the rising pattern for questions (Frota et al., 2014). Based on these results, Frota et al. argue that European Portuguese-learning infants tune into language-specific phrasal pitch marking as early as 5 months of age.

These results from European Portuguese contrast with the findings obtained from infants learning English. English statements and yes/no questions are also distinguished by a falling versus rising pitch (e.g., Geffen & Mintz, 2017). However, unlike in European Portuguese, English statements and questions additionally differ in lexical items and word order. English-learning infants ranging in age from 4.5 to 24 months have been reported to fail to distinguish English statements from yes/no questions when stimuli differ in pitch but have an identical word order (Soderstrom et al., 2011). Only when word-order cues are available in addition to intonation do English-learning 7-month-olds reliably distinguish statements from questions (Geffen, 2014; see also Best et al., 1991). By 12 months, word order alone is enough to allow English-learning infants to distinguish statements from questions (Geffen & Mintz, 2015). What remains unclear, then, is whether English-learning infants can distinguish yes/no questions from statements based on pitch differences alone.

In Experiment 1, we tested whether English-learning 8-month-olds could distinguish pitch differences used to distinguish questions from statements, as in European Portuguese. In Experiments 2–4, we tested 4-month-olds to determine the developmental trajectory of infants’ emerging sensitivity to intonation. Together, these results have implications for our understanding of how initial perceptual sensitivities to pitch contours are reorganized by language experience and thus bear on theories of perceptual development.

2. Experiment 1: Can English-Learning 8-Month-Olds Distinguish Statements from Questions Based on Pitch Differences Alone?

In Experiment 1, we used the European Portuguese disyllabic sequences used previously by Frota et al. (2014) to test English-learning 8-month-olds. We did so for a number of reasons. In European Portuguese, questions and statements are distinguished minimally. Questions differ from statements in that they have a longer final syllable, which carries a pitch rise; in contrast, statements have a final pitch fall on a syllable that is roughly the same duration as the penultimate syllable.

Using European Portuguese stimuli allowed us to systematically vary the intonation contour, controlling for segmental content. This was important to us because in previous experiments wherein English-learning infants failed to discriminate statements from questions, there was substantial variability in the English intonation contours, particularly within statements (Soderstrom et al., 2011; see also Geffen, 2014). There was also substantial variability in segmental content and word order across trials because the stimuli in both sets of experiments consisted of longer sentences. So, it is unclear whether English-learning infants in these experiments were challenged by segmental variability, within-category pitch variation, or their inability to integrate the two.

In fact, we know from previous research that English-learning infants are not insensitive to pitch used for phrasal marking. English-learning infants can use intonation to distinguish closely related languages like English and German between 5 and 7 months of age (Chong et al., 2018). They are also able to use pitch cues to detect clause boundaries (Seidl & Cristia, 2008), and, in fact, English-learning 6-month-olds rely on pitch cues to detect clause boundaries (Seidl, 2007). In light of these findings, the failure of English-learning infants to distinguish statements from questions is puzzling.

In Experiment 1, we tested English-learning 8-month-olds on minimally contrastive pitch contours using the stimuli used by Frota et al. (2014). Using the segmentally varied European Portuguese stimuli (and the exact same procedure) allowed us to compare our results directly with those of Frota et al. (2014). Such a cross-linguistic comparison using the exact same stimuli is critical in order to identify whether patterns of development are language-specific as well as whether infants’ performance can be attributed to the specific properties of the stimulus used (see also Houston et al., 2000; Van Ommen et al., 2020). Recall that European Portuguese-learning infants are able to discriminate intonation differences in signaling questions and statements at 5 and 8 months of age. If English-learning 8-month-olds are able to distinguish the statements and questions in natural European Portuguese disyllabic sequences, we can conclude that segmental variability does not pose a challenge for distinguishing pitch categories.

To confirm that English-learning 8-month-olds were relying solely on pitch differences, we also tested them using duration-neutralized resynthesized stimuli. As mentioned above, natural European Portuguese disyllabic questions and statements differ in pitch, specifically a pitch rise vs. a pitch fall. Additionally, they also differ in the duration of the rise- or fall-carrying syllable. The syllables with rises are longer than the ones with a fall. We were motivated to resynthesize the stimuli given that differences in the duration of syllables in English signal stress (Crystal & House, 1990) and English-learning 8-month-olds can distinguish segmentally varied disyllables that vary in stress alone (Skoruppa et al., 2011). If English-learning 8-month-olds can distinguish between statements and questions in European Portuguese using pitch alone, we expected them to succeed with duration-neutralized resynthesized stimuli in addition to the natural stimuli previously used by Frota et al. (2014).

2.1. Methods

2.1.1. Participants

The final sample included 48 monolingual English-learning 8-month-olds (with an average age of 250 days, with a range of 224:293, 23 of whom were female). Half were tested on the natural stimuli, and the other half were tested on duration-neutralized stimuli. Infants were only included if they had at least 90% exposure to English, as assessed using a detailed parental-language questionnaire (Sundara & Scutellaro, 2011), and no exposure to a tone language. Average exposure to English was 98% (90:100). All infants were full-term, had no history of speech, language, or hearing disorders, and were healthy on the day of testing, as confirmed by the parents. An additional 10 infants were tested, but they were excluded from analysis because they became fussy (4) or did not habituate in the maximum number of trials (5) or because of equipment malfunction (1).

2.1.2. Stimuli

The stimuli are described in detail in Frota et al. (2014) and available online in the supplementary material for that paper. They consisted of 16 disyllabic sequences with sonorants, produced with initial stress as statements or yes/no questions. As is typical in European Portuguese, statements were produced with a falling intonation, whereas yes/no questions were produced with a falling–rising pattern. All sequences were produced in an infant-directed register by a female native speaker of European Portuguese. The significant pitch differences between the statements and questions were restricted to the second syllable. The f0 declined by about 25 Hz on the second syllable of statements, whereas it increased by about 192 Hz in questions, resulting in differences in the final f0 (163 Hz vs. 380 Hz). Additionally, the second syllable of questions (392 ms) was significantly longer than that of statements (232 ms).

Next, to eliminate the differences in duration between statements and questions, we re-synthesized the stimuli using PSOLA in PRAAT (Boersma, 2001; method previously utilized by Chong et al., 2018). The script is available on the project OSF page. First, the original pitch contours were extracted from syllables produced with either a statement or question intonation. Next, these contours replaced the original pitch contour on disyllabic sequences produced with a question intonation. The resulting statement and question stimuli consisted of disyllables produced with identical duration profiles but with either an f0 fall characteristic of statements or the falling–rising intonation typical of questions. The duration profile of the question was selected as the base because the second syllable is longer in the question, and this meant that there was a longer duration for the pitch difference to be realized. A perceptual assessment made by 3 phonetically trained listeners indicated that it was also more salient. The resynthesized stimuli are available on the project OSF page.
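Moving a contour from a shorter syllable onto a longer one requires some form of time scaling. The sketch below illustrates only that idea, using plain linear interpolation over equally spaced f0 samples. It is an assumption made for illustration, not the authors' Praat/PSOLA script (the authoritative version is the one on the project OSF page), and the function name `stretch_contour` is hypothetical.

```python
from typing import List, Sequence


def stretch_contour(f0: Sequence[float], n_out: int) -> List[float]:
    """Linearly resample equally spaced f0 samples (Hz) to n_out samples.

    Illustrative only: real resynthesis (e.g. PSOLA in Praat) also has to
    re-align pitch periods in the waveform, which is not modelled here.
    """
    if len(f0) < 2 or n_out < 2:
        raise ValueError("need at least two input and two output samples")
    scale = (len(f0) - 1) / (n_out - 1)  # input steps per output step
    out = []
    for i in range(n_out):
        pos = i * scale
        lo = min(int(pos), len(f0) - 2)  # left neighbour, clamped at the end
        frac = pos - lo
        out.append(f0[lo] * (1 - frac) + f0[lo + 1] * frac)
    return out


# Hypothetical two-point statement fall of about 25 Hz ending at 163 Hz,
# stretched onto three samples of a longer syllable.
stretched = stretch_contour([188.0, 163.0], 3)
```

The start and end values of the contour are preserved; only its time course is lengthened, so the size of the fall or rise stays the same while the syllable duration is held constant across categories.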

2.1.3. Procedure and Design

Infants were tested using the visual fixation procedure. They were seated on their parent’s lap 3.5 feet away from a 46-inch TV screen in a dimly lit room. The monitor was used to present visual displays. A Canon HD camera was placed below the monitor to record the infant’s gaze. Stimulus presentation was controlled through Habit X (Cohen et al., 2004) from a computer in an adjacent room. The experimenter and the parent wore Peltor headphones over which masking audio was played so they could not influence the infant’s behavior.

All testing was done as described in Frota et al. (2014). At the beginning of each trial, a looming circle appeared on the TV to attract the infant’s attention to the display. Once the infant looked at the screen, a black and white checkerboard was presented, concomitant with audio stimuli. The presentation of the audio stimuli was completely contingent on the infant’s gaze. If the infant looked away for more than 2 s, a new trial was signaled by the looming stimuli. Maximum trial duration was 16 s, and a trial was repeated if the infant did not look at the screen in the first 5 s. Infant looking time to trials was the dependent variable.

Testing was carried out in 2 phases: a habituation phase and a test phase. Half the infants were habituated to statements, and the other half were habituated to questions. As in Frota et al.’s (2014) study, the sliding average of 4 consecutive trials was monitored during habituation. When the looking time to the last 4 consecutive trials was less than 60% of the looking time to the first 4 trials, the habituation phase ended. Data obtained from infants who did not habituate within 25 trials were excluded from the final analysis.
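The habituation rule can be sketched as a small function. This is a minimal illustration, not the Habit X implementation: the function name and parameters are hypothetical, and the code assumes that the sliding window may not overlap the first four (baseline) trials, which the text does not specify.

```python
from typing import Optional, Sequence


def habituation_trial(looking_times: Sequence[float],
                      criterion: float = 0.60,
                      window: int = 4,
                      max_trials: int = 25) -> Optional[int]:
    """Return the 1-based trial on which habituation is reached, else None.

    Habituation is reached when the summed looking time over the most recent
    `window` trials is below `criterion` times the sum over the first
    `window` trials (equivalent to comparing the two averages).
    """
    if len(looking_times) < window:
        return None
    baseline = sum(looking_times[:window])
    last = min(len(looking_times), max_trials)
    # Assumption: the comparison window starts after the baseline window,
    # so the earliest possible end point is trial 2 * window.
    for end in range(2 * window, last + 1):
        recent = sum(looking_times[end - window:end])
        if recent < criterion * baseline:
            return end
    return None  # did not habituate within max_trials


# Made-up looking times in seconds, for illustration only.
times = [10, 10, 10, 10, 9, 8, 7, 6, 5, 5, 5, 5]
reached_60 = habituation_trial(times)                  # 60% criterion
reached_50 = habituation_trial(times, criterion=0.50)  # stricter criterion
```

With these made-up values the 60% criterion is met on trial 10, whereas a 50% criterion is not met within the 12 trials shown, which illustrates why a lower percentage corresponds to more extensive habituation.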

Once infants were habituated, they were presented with a control trial, where they heard items from the same category to which they were habituated, and a test trial, where they heard items from the other category. The order of presentation of the control and test trial was counterbalanced. We made one change to Frota et al.’s protocol. After the test phase, infants were presented with a post-test trial with repeated presentations of the item ‘pok’ in an animated voice. An infant’s data were excluded from the analysis if looking time to the post-test trial was not higher than the average of the last 4 habituation trials. We did this to ensure that infants who did not dishabituate to the test trials were not simply disengaged from the task (Sundara et al., 2018).
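The post-test inclusion rule amounts to a single comparison. The following sketch is illustrative only; the function name is hypothetical and the looking times are made up.

```python
from typing import Sequence


def passes_post_test(habituation_times: Sequence[float],
                     post_test_time: float,
                     window: int = 4) -> bool:
    """True if post-test looking exceeds the mean of the last `window`
    habituation trials (i.e. the infant re-engaged with a novel item)."""
    recent = habituation_times[-window:]
    if not recent:
        raise ValueError("no habituation trials supplied")
    return post_test_time > sum(recent) / len(recent)


# Hypothetical infant: last four habituation trials average 4 s.
keep = passes_post_test([10, 8, 4, 4, 4, 4], post_test_time=9.0)
drop = passes_post_test([10, 8, 4, 4, 4, 4], post_test_time=4.0)
```

Because the rule requires the post-test looking time to be strictly higher, an infant whose post-test looking merely equals the late-habituation average would be excluded.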

2.1.4. Analysis

Infant looking-time data were analyzed using a linear mixed-effects model in R version 4.2 (R Core Team, 2021) using lme4. Because listening times are usually not normally distributed (as we confirmed with the Shapiro–Wilk test), we log-transformed them (Csibra et al., 2016), although we present the raw listening time in figures to allow comparison with published research; the pattern of results is the same even without the log transformation. The fixed effects included the between-subjects variables Habituation Stimuli (statement or question) and Stimulus type (full cue or duration-neutralized) and the within-subjects variable Trial type (control or test) and all interactions. All variables were dummy-coded for ease of interpretation, with the reference level identified when describing the finding. Additionally, the model included a random intercept for subjects to allow for differences in baseline listening times. This was the highest-level random-effects structure that converged (Barr et al., 2013). When necessary, planned comparisons were performed using the emmeans package in R (Lenth, 2025). De-identified raw data, model specifications, and complete model outputs are available on the project OSF site.
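To make the role of the log transformation and the within-subjects Trial type contrast concrete, the sketch below computes a paired t statistic on log-transformed looking times for one group of infants. This is a deliberately simplified stand-in, not the mixed-effects model fitted with lme4: it ignores the between-subjects factors and pools nothing across groups, so its degrees of freedom will not match those reported here. The function name and all numbers are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_on_log(control: Sequence[float],
                    test: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic (test minus control) on log looking times.

    Each position in the two sequences is one infant. Returns (t, df).
    """
    if len(control) != len(test) or len(control) < 2:
        raise ValueError("need matched control/test times for >= 2 infants")
    # Log-transform first, then take the within-infant difference.
    diffs = [math.log(t) - math.log(c) for c, t in zip(control, test)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t_stat = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


# Made-up looking times in seconds for four infants.
control_times = [4.0, 5.0, 6.0, 3.0]
test_times = [6.0, 6.0, 9.0, 6.0]
t_stat, df = paired_t_on_log(control_times, test_times)
```

A positive t indicates longer looking on test than on control trials, which is the direction expected if infants detect the change in intonation category.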

2.2. Results and Discussion

The raw looking-time data from Experiment 1 are presented in Figure 1. Only the main effect of Trial type was significant [F(1, 44) = 15.27, p = 0.0003]. Follow-up planned comparisons confirmed that the effect of Trial type was significant for the 8-month-olds tested on the natural, unedited European Portuguese stimuli [t(44) = 2.8, p = 0.008] as well as the re-synthesized stimuli with the duration differences neutralized [t(44) = 2.7, p = 0.009]. That is, in both conditions, 8-month-olds listened significantly longer to the test trials in comparison to the control trials.

Figure 1. Raw looking times (means, SE) for the control and test trials for 8-month-olds tested with 60% habituation criteria on variable-content disyllabic sequences that were fully unedited or resynthesized in order to neutralize duration differences. Individual subject data are superimposed on group averages.

These results confirm that English-learning 8-month-olds are able to distinguish pitch rises from falls when presented with segmentally varied disyllabic sequences. They successfully distinguished European Portuguese disyllables, even when the correlated duration difference was neutralized through re-synthesis. That is, monolingual English-learning 8-month-olds were able to distinguish statements from questions based on pitch differences alone. Furthermore, they behaved just like European Portuguese-learning 8-month-olds. Therefore, the previously reported difficulties in distinguishing statements from questions by English-learning infants cannot be attributed to limitations in detecting pitch differences used for phrase marking.

3. Experiment 2: Are English-Learning 4-Month-Olds like Their European Portuguese-Learning Peers?

In Experiment 2, we tested the developmental trajectory of English-learning infants’ sensitivity to pitch differences that signal phrasal prosody. For this, 4-month-olds were tested using the natural, unedited stimuli and the same methods used by Frota et al. (2014) and in Experiment 1. Recall that European Portuguese-learning 5-month-olds behave like 8-month-olds (learning European Portuguese or English) and can successfully distinguish between disyllables with question rises and those with statement falls. If language experience with minimally contrastive pairs is necessary, we would not expect English-learning 4-month-olds to succeed given that English has few such minimally contrastive pairs.

3.1. Methods

3.1.1. Participants

The final sample included 24 4-month-olds (with an average age of 126 days, with a range of 114:147, 14 of whom were female). The average exposure to English was 98% (91:100) for this group. An additional four 4-month-olds were tested but subsequently excluded from the final sample because they became fussy (3) or failed to habituate in the maximum number of trials (1). All the other inclusion criteria were identical to those in Experiment 1.

3.1.2. Stimuli, Procedure and Design, and Analysis

In Experiment 2, the infants were only tested using the full, unedited European Portuguese disyllabic stimuli. The procedure, design, and analysis were identical to those in Experiment 1. The fixed effects included the between-subjects variable Habituation Stimuli (questions or statements) and the within-subjects variable Trial type (control or test). As in Experiment 1, the model included a random intercept for subject, and log-transformed looking time served as the dependent variable.

3.2. Results and Discussion

The raw looking-time data from Experiment 2 are presented in Figure 2. Neither the main effect of Trial type [F(1, 22) = 2.3, p = 0.14] nor its interaction with Habituation Stimuli [F(1, 22) = 1.8, p = 0.2] was significant. Thus, we found no evidence that English-learning 4-month-olds can distinguish question rises from statement falls, even when using European Portuguese stimuli that were strictly controlled for within-category intonation variability. Because the young English-learning infants tested failed to distinguish question rises from statement falls, unlike their monolingual European Portuguese-learning peers, we can infer that the European Portuguese-learning 5-month-olds already displayed language-specific attunement.

Figure 2. Raw looking times (means, SE) for the control and test trials for 4-month-olds tested with 60% habituation criteria on full, unedited, variable disyllabic sequences. Individual subject data are superimposed on group averages.

4. Experiment 3: Can English-Learning 4-Month-Olds Succeed with Reduced Segmental Variability?

In Experiment 3, we reduced the segmental variability in the European Portuguese stimuli. We presented English-learning 4-month-olds with only the sequence /lamu/ produced with either question or statement intonation. We were motivated to do so by the parallels between the results from Experiments 1 and 2 and by previous findings on infants’ perception of lexical stress contrasts.

Lexical stress refers to the relative difference in the articulatory effort used to produce syllables in a word. As a result, compared to unstressed syllables, stressed syllables are louder, longer, and/or additionally marked by pitch. We know from previous research that introducing variability, specifically segmental variability, can increase the difficulty of distinguishing lexical stress contrasts. Infants under 6 months, whether they are learning Italian (Sansavini et al., 1997), English (Spring & Dale, 1977), German (Höhle et al., 2009), Spanish (Skoruppa et al., 2013), or French (Skoruppa et al., 2009), are sensitive to lexical stress when stimuli are tightly controlled for segmental content. Older infants learning a language like French that does not have lexical stress, however, have difficulty detecting lexical stress even when segmental content is controlled (Höhle et al., 2009; Bijeljac-Babic et al., 2012). Between 8 and 12 months of age, infants learning a language that has lexical stress, however, are able to distinguish lexical stress contrasts even when presented with segmentally varied items (Skoruppa et al., 2009, 2011).

We reasoned that if English-learning infants’ ability to discriminate pitch to mark phrasal prosody mirrors the developmental trajectory of lexical stress perception, 4-month-olds might succeed in distinguishing pitch differences if segmental variability is reduced. This would also be consistent with findings on tone perception. When tested with minimal pairs differing in tone, English-learning 4-month-olds have been shown to be able to successfully distinguish them (e.g., Mattock et al., 2008; see Liu & Kager, 2014, for similar findings regarding Dutch-learning infants), but European Portuguese-learning 5- to 6-month-olds fail to distinguish Mandarin tone contrasts when presented with segmentally varied sequences (Frota et al., 2016). So, when tested on a minimal pair, English-learning 4-month-olds could succeed by treating the pitch rise that is limited to the final syllable as tone.

To give English-learning infants every opportunity to succeed, we also tested them in an additional, more sensitive procedure. In the study by Frota et al. (2014), infants were habituated till the looking time for the last four trials fell below 60% of the looking time for the first four trials. As in Experiment 1, we habituated one group of 4-month-olds to this 60% criterion; we also added another group of infants who were habituated to a greater extent. In the latter group, the infants were habituated till the looking time for the last four trials fell below 50% of the looking time for the first four trials. The more stringent habituation criterion resulted in a more sensitive paradigm (Sundara et al., 2018; see also Bijeljac-Babic et al., 2012).

4.1. Methods

4.1.1. Participants

The final sample included 44 4-month-olds (with an average age of 127 days, with a range of 110:153, and 21 females). The subject inclusion criteria were identical to those applied in Experiment 1. On average, the infants had 99% exposure to English (range 90:100). An additional two infants were tested, but they were excluded from the final sample because they became too fussy (1) or did not habituate within the maximum number of trials (1).

4.1.2. Stimuli

In Experiment 3, infants were only tested on one disyllable, /lamu/, to limit segmental variability.

4.1.3. Procedure and Design

The procedure and design were identical to those applied in Experiment 1. Half the infants were habituated till their average looking time in the last four trials declined to less than 60% of that in the first four trials, whereas the other half were habituated till it declined to less than 50%. The latter group was thus habituated to a greater extent, making this the more sensitive condition.

4.1.4. Analysis

The analysis was identical to that in Experiment 1 except for our inclusion of Extent of Habituation (60%, 50%) as a between-subjects variable, in addition to Habituation Stimuli (statement or question). As in Experiments 1 and 2, the within-subjects variable was Trial type (control or test), and the dependent variable was log looking time.

4.2. Results and Discussion

The raw looking-time data from Experiment 3 are presented in Figure 3. Only the main effect of Trial type [F(1, 40) = 14.2, p = 0.0005] and the interaction of Habituation Stimuli and Trial type [F(1, 40) = 8.7, p = 0.005] were significant. Specifically, the effect of Trial type was significant for statements [t(40) = 4.8, p < 0.0001] but not questions [t(40) = 0.6, p = 0.6]. That is, the 4-month-olds were able to distinguish between statements and questions only when they had been habituated to the statements. The Extent of Habituation did not have a significant main effect, nor did it interact with any other variable (see OSF for full model output).

Figure 3. Raw looking times (means, SE) for the control and test trials for monolingual-English 4-month-olds tested with 60% and 50% habituation criteria on one disyllable /lamu/, broken down by Habituation Condition. Individual subject data are superimposed on group averages.

Our results are different from the results reported by Soderstrom et al. (2011), who habituated infants between 4 and 24 months of age to sentences with a question or statement intonation till the infants demonstrated a 65% decline in looking time. They found that infants listened longer to the question stimuli with a rising intonation, regardless of habituation condition.

In Experiment 3, unlike in Soderstrom et al.’s (2011) study, the 4-month-olds did not show an overall preference for question trials; we can see this in Figure 3, where the time the infants spent listening to statements (in the test trials) after having been habituated to the questions is numerically greater than the time spent listening to the questions themselves (in the control trials). Recall that our habituation criteria were overall more stringent than those used by Soderstrom et al. (2011) (60% and 50% compared to 65%). We think that the 4-month-olds also successfully habituated to the questions because of our more stringent habituation criteria, although the greater extent of habituation was not sufficient for them to switch their attention to the statements.

5. Experiment 4: Does Eliminating Correlated Duration Cues Enable 4-Month-Olds to Distinguish Question Rises from Statement Falls?

As is typical in European Portuguese, disyllables with question intonation have a rise on the final syllable that is significantly longer than the penultimate one. Thus, the question and statement /lamu/ differed in pitch movement as well as the duration of the constituent syllables. Note that in English, question stimuli are also usually longer than statements, as exemplified by the stimuli in Soderstrom et al. (2011). Thus, infants could succeed in distinguishing statements from questions by listening to the pitch differences, the duration differences, or both.

However, recent research indicates that infants’ ability to group sequences that vary in duration, as opposed to pitch, improves with age. Young infants, like rats, either fail to group sequences that vary solely in duration (Bion et al., 2011; de la Mora et al., 2013) or are only successful at grouping them at older ages (Hay & Saffran, 2012; Yoshida et al., 2010). Even when they are able to group sequences by both pitch and duration, it has been claimed that it is easier to group sequences according to the former than the latter (Abboub et al., 2016). So, it has been argued that infants learn to tune into duration differences only as a result of language experience.

Also consistent with this idea, the ability to integrate pitch with duration information to detect prosodic boundaries emerges in infants between 6 and 8 months of age, but only in infants learning German, not French (Wellmann et al., 2012; Van Ommen et al., 2020) or Dutch (Johnson & Seidl, 2008).

There is some evidence that English-learning infants might tune into duration cues only in the second half of their first year. Specifically, we know based on word segmentation research that English-learning infants only gradually become sensitive to duration differences, at least those that accompany differences in vowel quality that distinguish stressed from unstressed syllables (e.g., Beckman, 1986). As a result, 9- but not 7-month-olds use stressed syllables that are longer, and have a more peripheral vowel, to find words (Thiessen & Saffran, 2003). This is also consistent with findings showing that older infants, namely, English-learning 8- and 12-month-olds, can successfully discriminate two-syllable words with an initial and final stress, even when the stimuli are segmentally varied (Skoruppa et al., 2011).

It is then possible that English-learning 4-month-olds are still in the process of tuning into duration differences and thus cannot use them. However, the duration differences are salient enough that the infants are unable to ignore them completely, making the task of distinguishing between European Portuguese statements and questions harder. In Experiment 4, we eliminated the duration differences in the final syllable by re-synthesizing the European Portuguese stimuli, as in Experiment 1. We tested 4-month-olds with the sensitive 50% habituation criterion. We reasoned that if infants are able to distinguish pitch but are confounded by the additional duration difference, they should succeed when tested with the re-synthesized, duration-neutralized stimuli.

5.1. Methods

5.1.1. Participants

The final sample included 22 4-month-olds (mean age: 131 days; range: 120–138 days; 10 females). The subject inclusion criteria were identical to those in Experiment 1. The average exposure to English for the 4-month-olds was 99% (range: 91–100%). An additional two infants were tested, but they were excluded from the final sample because they became too fussy (1) or did not habituate within the maximum number of trials (1).

5.1.2. Stimuli

The segmental variability was reduced in this experiment, as in Experiment 3, by using only one disyllabic sequence, /lamu/, that was re-synthesized, as described in Experiment 1.

5.1.3. Procedure and Design

The procedure and design were identical to those in the previous experiments. All infants were habituated until their looking time declined by 50%.
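As an illustration only, a looking-time habituation criterion of this kind can be sketched as a sliding-window check. The window size of three trials, the cap of 24 trials, the function name, and the example looking times below are all hypothetical and are not taken from this section. The sketch also assumes that an X% criterion means that mean looking time has fallen to X% of the baseline window, which matches "declined by 50%" for the 50% criterion.

```python
from statistics import mean


def trials_to_habituate(looking_times, criterion=0.50, window=3, max_trials=24):
    """Return the 1-based trial at which habituation is reached, or None.

    looking_times: per-trial looking times (seconds) in presentation order.
    criterion:     proportion of baseline looking time that counts as habituated
                   (assumption: 0.50 means a decline to 50% of baseline).
    window:        number of trials averaged per block (hypothetical value).
    max_trials:    cap on habituation trials (hypothetical value).
    """
    if len(looking_times) < window:
        return None
    # Baseline is the mean looking time over the first block of trials.
    baseline = mean(looking_times[:window])
    threshold = baseline * criterion
    last_trial = min(len(looking_times), max_trials)
    # Slide a block of `window` trials that does not overlap the baseline block.
    for end in range(2 * window, last_trial + 1):
        if mean(looking_times[end - window:end]) <= threshold:
            return end
    return None  # infant did not habituate within the available trials


# Hypothetical looking times (seconds) for one infant.
example = [12.0, 10.0, 11.0, 9.0, 7.0, 6.0, 5.0, 4.0]
print(trials_to_habituate(example, criterion=0.50))  # 8
print(trials_to_habituate(example, criterion=0.60))  # 7
```

Under these assumptions, the lower 50% criterion requires more trials before the test phase begins than the 60% criterion does for the same infant, which is the sense in which it is the more stringent of the two.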

5.1.4. Analysis

The analysis was similar to that in Experiment 3. Habituation condition (statement or question) was the between-subjects variable, with Trial type (control or test) as the within-subjects variable, and the dependent variable was log looking time, with a random intercept for subject.
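One way to write out a model of this form is given below. This is a sketch in assumed notation, not a specification taken from the original: $i$ indexes infants, $j$ indexes Trial type, $H_i$ and $T_j$ are coded predictors for Habituation condition and Trial type (the coding scheme is not stated here), and $u_i$ is the by-subject random intercept.

```latex
\log(\mathrm{LT}_{ij}) = \beta_0 + \beta_1 H_i + \beta_2 T_j + \beta_3 H_i T_j + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this notation, $\beta_2$ corresponds to the main effect of Trial type and $\beta_3$ to the interaction between Habituation condition and Trial type.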

5.2. Results and Discussion

The raw looking time data from Experiment 4 are presented in Figure 4. Again, the main effect of Trial type [F(1, 20) = 6.06, p = 0.02] and the interaction between Habituation condition and Trial type [F(1, 20) = 4.8, p = 0.04] were significant. Just as in Experiment 3, the effect of Trial type was significant for statements [t(20) = 3.3, p = 0.004] but not questions [t(20) = 0.20, p = 0.8]. Thus, in Experiment 4 as well, despite the neutralized duration difference, the 4-month-olds were only able to distinguish between statements and questions when they had been habituated to the statements.

Figure 4. Raw looking times (means, SE) for the control and test trials for monolingual-English 4-month-olds tested with a 50% habituation criterion on one disyllable, /lamu/, neutralized for duration differences; results are presented separately for the two Habituation conditions. Individual subject data are superimposed on group averages.

6. General Discussion

In four experiments, we tested whether English-learning infants are able to distinguish question rises from statement falls based on pitch differences alone (the results are summarized in Table 1). To ensure that the stimuli systematically varied in intonation contours while segmental content was controlled for, we used European Portuguese disyllabic sequences. In European Portuguese, questions and statements are minimally contrastive, with a difference in pitch (and duration) on the final syllable. The stimuli we used had opposite directions of pitch change, with a difference of over 200 Hz towards the end of the disyllable. That is, there was a substantial difference in pitch between the two intonation contours. We used this cross-linguistic approach in order to isolate language-specific developmental changes from effects that are restricted to the specific stimulus properties.

Table 1. Summary of findings from Experiments 1–4.

In Experiments 2–4, we found that monolingual English-learning 4-month-olds’ ability to detect even this large pitch difference signaling phrasal prosody is limited. We found no evidence that 4-month-olds could distinguish questions from statements when they had to group segmentally varied disyllables (Experiment 2). The 4-month-old infants only partially succeeded when tested with a single disyllabic sequence /lamu/ (Experiments 3 and 4), distinguishing between the two only when they were habituated to statements but not questions. An analysis combining the data from all three sub-experiments using the single disyllable /lamu/ (n = 66) also confirmed that 4-month-olds show asymmetry in discrimination. Thus, our failure to find evidence of discrimination by monolingual English-learning 4-month-olds cannot be attributed to a lack of power due to a small sample size.

It is also clear that the monolingual English-learning 4-month-olds tested did not simply treat this difference in pitch on the second syllable as tone. We know this because if that were the case, they should have succeeded in distinguishing statements from questions. To date, regardless of whether infants are learning a tone language, they have been reported to distinguish minimal pairs (with no segmental variability) that differ only in tone before 6 months of age (see Kalashnikova et al., 2024, for a summary).

One way in which the stimuli in our experiments differed from those used to test tone perception is that we used a disyllabic sequence, even under our reduced segmental variability conditions, whereas the published research on tone perception focuses on monosyllabic words. The English-learning 4-month-olds’ failure to distinguish falling and rising pitch on the second syllable of a two-syllable word reported here is inconsistent with their reported success in distinguishing a falling from a rising tone in monosyllables (Mattock & Burnham, 2006; Mattock et al., 2008). This discrepancy provides additional support for the idea that the developmental timeline of attunement to pitch differences is modulated by the linguistic function of pitch.

So, why did the 4-month-olds fail to distinguish between questions and statements? Soderstrom et al. (2011) argue, and we agree, that infants in general might prefer questions with a final rising pitch. This preference is likely rooted in infants’ long-documented preference for a higher, more variable pitch (e.g., Papoušek et al., 1990; Trehub et al., 1984), possibly because of its association with positive affect and its role in drawing infants’ attention (e.g., Broesch & Bryant, 2015; Fernald & Kuhl, 1987; Trainor et al., 2000). That would explain why in Soderstrom et al.’s (2011) experiment with a 65% habituation criterion, infants displayed an overall preference for questions. We used more stringent habituation criteria (60% and 50%) precisely to ensure that infants would be habituated even to the question rises. As a result, in our experiments, the infants did not demonstrate an overall preference for questions. Nonetheless, it is possible that the English-learning 4-month-olds failed to switch their attention to the statements, even after they were habituated to the question intonation, because of a latent preference for rising intonation (see also Oakes, 2010). If this is true, then we might expect English-learning 4-month-olds to succeed in a preference experiment, where they are presented with questions versus statements, without habituation, indicating that they are able to distinguish between the two. We leave this for the future.

Given that, cross-linguistically, infants demonstrate a preference for higher, more variable pitch, the fact that European Portuguese-learning 5-month-olds succeeded where English-learning 4-month-olds failed must clearly be attributed to language experience. They distinguished question rises and statement falls even when tested with stimuli that were segmentally varied and in the presence of correlated duration cues. It is likely then that early experience with languages with a robust correspondence between pitch distinctions and meaning, even phrasal, is necessary to overcome an initial preference for a rising pitch at the ends of phrases.

In the absence of such a robust correspondence, like in English (Bolinger, 1989; Frota, 2014; Pierrehumbert & Hirschberg, 1990), young infants’ ability to discriminate pitch differences early in infancy is limited, particularly when accompanied by segmental variability. Whether tuning into smaller differences in pitch or even those that involve changes in pitch timing (Butler et al., 2016) is possible in the absence of language experience remains to be determined.

By 8 months, however, English-learning infants’ ability to distinguish between questions with a rising pitch and statements with pitch falls improves. English-learning 8-month-olds succeeded when tested with varied segmental content, with or without correlated duration cues (Experiment 1). Thus, at 8 months of age, infants’ ability to distinguish large pitch differences signaling phrasal prosody is robust to the presence of correlated duration cues, just as has been reported for European Portuguese-learning 8-month-olds. Even in the absence of experience with a robust correspondence between pitch and meaning, English-learning 8-month-olds were able to distinguish pitch differences signaling phrasal prosody.

The English-learning 8-month-olds’ success is particularly surprising given that we tested them on non-native stimuli. Recall that the original stimuli were recorded by a native Portuguese speaker. The stimuli consisted of sequences of sonorants, which are largely similar in the two languages, except for /r/. The vowels used were also similar to those in English but not identical in quality. Given the substantial literature indicating that infants tune into the phonetic categories of their native language within the second half of their first year, it is quite likely that, at least, the English-learning 8-month-olds detected the unfamiliarity of the stimuli. And yet they were able to detect the pitch difference when given the unfamiliar speech stimuli, attesting to the robustness of their ability.

The mismatch in the phonetic instantiation of the stimuli is also not likely to account for the limited success of the English-learning 4-month-olds. Phonetic perception has not been reported to be language-specific at this age. But, more importantly, the 4-month-olds had limited success even when tested just on /lamu/, which was selected because it was the most phonetically similar in English and Portuguese. Moreover, such phonetic differences are inevitable in any cross-linguistic comparison (e.g., Houston et al., 2000; Van Ommen et al., 2020), and we think that a cross-linguistic design provides many benefits overall.

The results from 8-month-olds reported here are different from those reported previously for English-learning infants. In Soderstrom et al.’s (2011) study, infants ranging in age from 4.5 to 24 months showed a preference for question rises but failed to discriminate question rises from statement falls. English-learning infants have only been reported to successfully distinguish questions and statements in previous research when the stimuli differ in word order in addition to intonation (Geffen, 2014; Best et al., 1991). Our findings are thus the first to demonstrate that English-learning infants are able to distinguish questions from statements in the first year of life using pitch alone.

There are two aspects of our design that, unlike previous experiments, allowed us to isolate infants’ sensitivity to intonation. First, our stimuli controlled for several extraneous variables that are correlated with questions and statements. In previous reports, the segmental content was variable, the question stimuli were longer, and there was significant variability in intonation, typically within the category of statements. In contrast, we were able to orthogonally manipulate the segmental content, the extent of habituation, and the presence of extraneous acoustic cues correlated with the question–statement distinction by using native stimuli produced by a European Portuguese speaker. Second, by focusing our experiments on specific ages, we were able to uncover infants’ changing sensitivity to pitch cues.

By systematically manipulating segmental content, the extent of habituation, correlated duration cues, and the age of the infants, we showed that English-learning infants’ initial sensitivity to pitch differences signalling phrase boundaries is limited (observed only with /lamu/), and only later do they succeed when presented with variable segmental content. Such a developmental trajectory, wherein infants require experience to facilitate their ability to distinguish the pitch marking of phrasal structure abstracting away from segmental content, parallels what is observed for lexical stress. Only older infants, between 8 and 12 months old, are able to abstract away from variable segmental content to detect lexical stress, and they can only do so if they are learning a lexical stress language like English or Spanish (Skoruppa et al., 2009, 2011). Whether the ability to abstract away from segmental content to detect tone also improves with age remains to be determined.

In sum, we tested whether English-learning 4- and 8-month-olds could distinguish statements from questions based on pitch differences alone. We found that 8- but not 4-month-olds are able to use pitch differences to distinguish questions from statements, even when they vary in segmental content. These results confirm that the ability to perceive pitch differences that mark phrasal structure is affected by language experience early, by 5 months, in infants learning a language like European Portuguese with a tight correspondence between pitch and meaning. In the absence of specific language experience as well, the ability to perceive pitch differences marking phrasal structure is facilitated, but only by 8 months.

Author Contributions

Conceptualization, M.S.; methodology, M.S. and S.F.; formal analysis, M.S.; investigation, M.S.; resources, M.S.; data curation, M.S.; writing—original draft preparation, M.S.; writing—review and editing, M.S. and S.F.; visualization, M.S.; project administration, M.S.; funding acquisition, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a UCLA COR Faculty Research Grant to M.S. and UID/00214: Centro de Linguística da Universidade de Lisboa, https://doi.org/10.54499/UIDB/00214/2020 to S.F.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of University of California, Los Angeles (#10-001562; 10 December 2010—to date).

Informed Consent Statement

Informed consent was obtained from the caregivers of each infant participant involved in the study.

Data Availability Statement

Deidentified data are available from the OSF page for this project: https://osf.io/6dur3/?view_only=293abd1bfe96422684cccaf7434efa42, accessed on 8 February 2025.

Acknowledgments

We would like to thank Anya Mancillas, Victoria Mateu and Rosie Mejia for help with recruiting and testing infants. This article is a revised and much expanded version of Sundara et al. (2015), titled “The perception of boundary tones in infancy”, which was presented at the 18th International Congress of Phonetic Sciences, Glasgow, UK.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Abboub, N., Boll-Avetisyan, N., Bhatara, A., Höhle, B., & Nazzi, T. (2016). An exploration of rhythmic grouping of speech sequences by French- and German-learning infants. Frontiers in Human Neuroscience, 10, 292.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
Beckman, M. E. (1986). Stress and non-stress accent. Foris.
Best, C. T., Levitt, A., & McRoberts, G. W. (1991, August 19–24). Examination of language-specific influences in infants’ discrimination of prosodic categories. The XIIth International Congress of Phonetic Sciences (pp. 162–165), Aix-en-Provence, France.
Bijeljac-Babic, R., Serres, J., Höhle, B., & Nazzi, T. (2012). Effect of bilingualism on lexical stress pattern discrimination in French-learning infants. PLoS ONE, 7, e30843.
Bion, R. A. H., Benavides-Varela, S., & Nespor, M. (2011). Acoustic markers of prominence influence infants’ and adults’ segmentation of speech sequences. Language and Speech, 54(1), 123–140.
Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5(9/10), 341–345.
Bolinger, D. (1989). Intonation and its uses: Melody in grammar and discourse. Stanford University Press.
Broesch, T., & Bryant, G. (2015). Prosody in infant-directed speech is similar across western and traditional cultures. Journal of Cognition and Development, 16, 31–43.
Butler, J., Vigário, M., & Frota, S. (2016). Infants’ perception of broad and narrow focus. Language Learning and Development, 12(1), 1–13.
Chen, A., & Kager, R. (2016). Discrimination of lexical tones in the first year of life. Infant and Child Development, 25(5), 426–439.
Chong, A. J., Vicenik, C., & Sundara, M. (2018). Intonation plays a role in language discrimination by infants. Infancy, 23(6), 795–819.
Cohen, L. B., Atkinson, D. J., & Chaput, H. H. (2004). Habit X: A new program for obtaining and organizing data in infant perception and cognition studies (Version 1.0). University of Texas.
Crystal, T. H., & House, A. S. (1990). Articulation rate and the duration of syllables and stress groups in connected speech. The Journal of the Acoustical Society of America, 88(1), 101–112.
Csibra, G., Hernik, M., Mascaro, O., Tatone, D., & Lengyel, M. (2016). Statistical treatment of looking-time data. Developmental Psychology, 52(4), 521–536.
de la Mora, D. M., Nespor, M., & Toro, J. M. (2013). Do humans and nonhuman animals share the grouping principles of the iambic-trochaic law? Attention, Perception & Psychophysics, 75(1), 92–100.
Fernald, A., & Kuhl, P. (1987). Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development, 10, 279–293.
Frota, S. (2014). The intonational phonology of European Portuguese. In S.-A. Jun (Ed.), Prosodic typology II: The phonology of intonation and phrasing (pp. 6–42). Oxford University Press.
Frota, S., Butler, J., Lu, S., & Vigário, M. (2016, May 31–June 3). Infants’ perception of native and non-native pitch contrasts. The Speech Prosody Conference (pp. 692–696), Boston, MA, USA.
Frota, S., Butler, J., & Vigário, M. (2014). Infants’ perception of intonation: Is it a statement or a question? Infancy, 19(2), 194–213.
Frota, S., & Prieto, P. (2015). Intonation in romance. Oxford University Press.
Geffen, S. (2014). When and how infants discriminate between declaratives and interrogatives. University of Southern California.
Geffen, S., & Mintz, T. (2015). Seven-month-olds discrimination of statements and questions. In Proceedings of the 36th Boston university conference on language development. Cascadilla Press.
Geffen, S., & Mintz, T. (2017). Prosodic differences between declaratives and interrogatives in infant-directed speech. Journal of Child Language, 44(4), 968–994.
Hay, J. F., & Saffran, J. R. (2012). Rhythmic grouping biases constrain infant statistical learning. Infancy, 17(6), 610–641.
Houston, D. M., Jusczyk, P. W., Kuijpers, C., Coolen, R., & Cutler, A. (2000). Cross-language word segmentation by 9-month-olds. Psychonomic Bulletin & Review, 7(3), 504–509.
Höhle, B., Bijeljac-Babic, R., Herold, B., Weissenborn, J., & Nazzi, T. (2009). The development of language specific prosodic preferences during the first half year of life: Evidence from German and French. Infant Behavior & Development, 2, 262–274.
Johnson, E., & Seidl, A. (2008). Clause segmentation by 6-month-old infants: A cross-linguistic perspective. Infancy, 15, 440–455.
Jun, S.-A. (Ed.). (2005). Prosodic typology: The phonology of intonation and phrasing. Oxford University Press.
Jun, S.-A. (Ed.). (2014). Prosodic typology II. Oxford University Press.
Kalashnikova, M., Singh, L., Tsui, A., Altunas, E., Burnham, D., Cannistraci, R., Chin, N. B., Feng, Y., Fernández-Merino, L., Götz, A., Gustavsson, L., Hay, J., Höhle, B., Kager, R., Lai, R., Liu, L., Marklund, E., Nazzi, T., Oliveira, D. S., … Woo, P. J. (2024). The development of tone discrimination in infancy: Evidence from a cross-linguistic, multi-lab report. Developmental Science, 27(3), e13459.
Ladd, R. D. (2008). Intonational phonology. Cambridge University Press.
Lenth, R. V. (2025). Emmeans: Estimated marginal means, aka least-squares means. R package version 1.11.1-00001. Available online: https://rvlenth.github.io/emmeans/ (accessed on 8 February 2025).
Liu, L., & Kager, R. (2014). Perception of tones by infants learning a non-tone language. Cognition, 133(2), 385–394.
Liu, L., Olstad, A. M. H., Gustavsson, L., Marklund, E., & Schwarz, I.-C. (2024). Developmental trajectories of non-native tone perception differ between monolingual and bilingual infants learning a pitch accent language. Infant Behavior & Development, 77, 102003.
Mattock, K., & Burnham, D. (2006). Chinese and English infants’ tone perception: Evidence for perceptual reorganization. Infancy, 10, 241–265.
Mattock, K., Molnar, M., Polka, L., & Burnham, D. (2008). The developmental course of lexical tone perception in the first year of life. Cognition, 106(3), 1367–1381.
Nazzi, T., Floccia, C., & Bertoncini, J. (1998). Discrimination of pitch contours by neonates. Infant Behavior & Development, 21(4), 779–784.
Oakes, L. M. (2010). Using habituation of looking time to assess mental processes in infancy. Journal of Cognition and Development, 11(3), 255–268.
Papoušek, M., Bornstein, M. H., Nuzzo, C., Papoušek, H., & Symmes, D. (1990). Infant responses to prototypical melodic contours in parental speech. Infant Behavior and Development, 13, 539–545.
Pierrehumbert, J., & Hirschberg, J. (1990). The Meaning of Intonational contours in the interpretation of discourse. In P. R. Cohen, J. Morgan, & M. E. Pollack (Eds.), Intentions in communication (pp. 271–311). MIT Press.
Ramachers, S., Brouwer, S., & Fikkert, P. (2018). No perceptual reorganization for Limburgian tones? A cross-linguistic investigation with 6- to 12-month-old infants. Journal of Child Language, 45(2), 290–318.
R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Sansavini, A., Bertoncini, J., & Giovanelli, G. (1997). Newborns discriminate the rhythm of multisyllabic stressed words. Developmental Psychology, 33, 3–11.
Sato, Y., Sogabe, Y., & Mazuka, R. (2009). Development of hemispheric specialization for lexical pitch-accent in Japanese infants. Journal of Cognitive Neuroscience, 22(11), 2503–2513.
Seidl, A. (2007). Infants’ use and weighting of prosodic cues in clause segmentation. Journal of Memory & Language, 57, 24–48.
Seidl, A., & Cristia, A. (2008). Developmental weighting in the weighting of prosodic cues. Developmental Science, 11, 596–606.
Shi, R., Santos, E., Gao, J., & Li, A. (2017). Perception of similar and dissimilar lexical tones by non-tone-learning infants. Infancy, 22, 790–800.
Singh, L., Fu, C. S. L., Seet, X. H., Tong, A. P. Y., Wang, J. L., & Best, C. T. (2018). Developmental change in tone perception in Mandarin monolingual, English monolingual, and Mandarin-English bilingual infants: Divergences between monolingual and bilingual learners. Journal of Experimental Child Psychology, 173, 59–77.
Skoruppa, K., Cristià, A., Peperkamp, S., & Seidl, A. (2011). English-learning infants’ perception of word stress patterns. The Journal of the Acoustical Society of America, 130, EL50–EL55.
Skoruppa, K., Pons, F., Bosch, L., Christophe, A., Cabrol, D., & Peperkamp, S. (2013). The development of word stress processing in French and Spanish infants. Language Learning and Development, 9, 88–104.
Skoruppa, K., Pons, F., Christophe, A., Bosch, L., Dupoux, E., Sebastián-Gallés, N., Limissuri, R. A., & Peperkamp, S. (2009). Language-specific stress perception by 9-month-old French and Spanish infants. Developmental Science, 12, 914–919.
Soderstrom, M., Ko, E.-S., & Nevzorova, U. (2011). It’s a question? Infants attend differently to yes/no questions and declaratives. Infant Behavior & Development, 34, 107–110.
Spring, D. R., & Dale, P. S. (1977). Discrimination of linguistic stress in early infancy. Journal of Speech Hearing Research, 20, 224–232.
Sundara, M., Molnar, M., & Frota, S. (2015). The perception of boundary tones in infancy. In H. Little (Ed.), Proceedings of the 18th international congress of phonetic sciences. International Congress of Phonetic Sciences.
Sundara, M., Ngon, C., Skoruppa, K., Feldman, N. H., Onario, G. M., Morgan, J. L., & Peperkamp, S. (2018). Young infants’ discrimination of subtle phonetic contrasts. Cognition, 178, 57–66.
Sundara, M., & Scutellaro, A. (2011). Rhythmic distance between languages affects the development of speech perception in bilingual infants. Journal of Phonetics, 39(4), 505–513.
Thiessen, E. D., & Saffran, J. R. (2003). When cues collide: Use of statistical and stress cues to word boundaries by 7- and 9-month-old infants. Developmental Psychology, 39, 706–716.
Trainor, L. J., Austen, C. M., & Desjardins, R. N. (2000). Is infant-directed speech prosody a result of the vocal expression of emotion? Psychological Sciences, 11(3), 188–195.
Trehub, S., Bull, D., & Thorpe, L. A. (1984). Infants’ perception of melodies: The role of melodic contour. Child Development, 55, 821–830.
Tsao, F.-M. (2017). Perceptual improvement of lexical tones in infants: Effects of tone language experience. Frontiers in Psychology, 8, 558.
Van Ommen, S., Boll-Avetisyan, N., Larraza, S., Wellmann, C., Bijeljac-Babic, R., Höhle, B., & Nazzi, T. (2020). Language-specific prosodic acquisition: A comparison of phrase boundary perception by French- and German-learning infants. Journal of Memory and Language, 112, 104108.
Wellmann, C., Holzgrefe, J., Truckenbrodt, H., Wartenburger, I., & Höhle, B. (2012). How each prosodic boundary cue matters: Evidence from German infants. Frontiers in Psychology, 3, 580.
Yeung, H. H., Chen, K. H., & Werker, J. F. (2013). When does native language input affect phonetic perception? The precocious case of lexical tone. Journal of Memory and Language, 68(2), 123–139.
Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010). The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic study. Cognition, 115, 356–361.


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Stimuli	Age	Habituation Criteria	N	Discrimination?
Variable Disyllables; Full cue	8 months	60%	24	Yes
Variable Disyllables; Dur. Neutralized	8 months	60%	24	Yes
Variable Disyllables; Full cue	4 months	60%	24	No
Only /lamu/	4 months	60%	22	Statement: Yes; Question: No
Only /lamu/	4 months	50%	22	Statement: Yes; Question: No
Only /lamu/; Dur. Neutralized	4 months	50%	22	Statement: Yes; Question: No