1. Introduction
In recent years, there has been increasing interest in how infants tune into linguistically relevant pitch differences, which are essential for signaling native-language prosody. Across languages, pitch can signal differences in word meaning (e.g., tone differences in Mandarin or pitch accent differences in Japanese) or mark edges of phrasal units, usually to convey syntax, pragmatics, and discourse meanings (e.g., English or Portuguese).
We know from existing research that how monolingual infants’ initial abilities and language experience interact to influence perception of pitch depends on the linguistic function of pitch. For example, French-learning newborns can detect differences between groups of disyllabic Japanese words with a High–Low compared to Low–High pitch accent (
Nazzi et al., 1998), as can Japanese-learning 5- and 10-month-olds (
Sato et al., 2009). Similarly, infants with and without experience with Limburgian pitch accents can successfully distinguish them at 6, 9, and 12 months of age (
Ramachers et al., 2018). Thus, infants can detect pitch accents from birth, and this ability is not affected by language experience.
In contrast, language experience plays a key role in infants’ developing ability to distinguish tones, as has been demonstrated for consonants and vowels. Infants under 6 months, regardless of whether they are learning a tone language, can typically distinguish minimal pairs differing in tone (
Mattock & Burnham, 2006;
Mattock et al., 2008;
Yeung et al., 2013;
Liu & Kager, 2014;
Shi et al., 2017; but see also
Chen & Kager, 2016). In the absence of experience with a tone language, infants’ ability to distinguish tone contrasts in the second half of their first year may (a) decline (e.g.,
Mattock & Burnham, 2006;
Mattock et al., 2008;
Shi et al., 2017;
Yeung et al., 2013), (b) be maintained (e.g.,
Liu & Kager, 2014;
Shi et al., 2017), or (c) even be facilitated (e.g.,
Chen & Kager, 2016;
Singh et al., 2018;
Tsao, 2017). However, the decline in discrimination of some tone contrasts among non-tone-language learners is temporary, and infants recover the ability to distinguish tone contrasts by their second year (e.g.,
Liu & Kager, 2014). No such decrease in discrimination has been observed for infants tested on native-language tone contrasts (e.g.,
Tsao, 2017;
Yeung et al., 2013). Interestingly, infants’ early ability to distinguish non-native tone contrasts seems to be limited to cases where they are tested with minimal pairs, that is, without segmental variability. For instance, when tested with more-varied segments (
Frota et al., 2016), European Portuguese-learning infants fail to distinguish groups of disyllabic sequences that differ in Mandarin Chinese tones. Thus, although infants display some initial ability to distinguish tone contrasts, it has been proposed that age, language experience, and the acoustic salience of the tone contrast influence tone discrimination in infancy (see
Kalashnikova et al., 2024, for an overview and
Liu et al., 2024, for a multi-lab report).
In this paper, we investigate infants’ discrimination of pitch differences used to mark phrasal units, specifically to distinguish statements from questions. The use of pitch differences at the phrasal level, referred to as intonation, is quite common cross-linguistically, although the details of how pitch is used to signal phrasal structure differ across languages (
Jun, 2005,
2014;
Ladd, 2008;
Frota & Prieto, 2015). Pitch is used in many languages to distinguish statements, exclamations, or questions. Research on European Portuguese-learning infants shows that 5- to 6- as well as 8- to 9-month-olds can distinguish groups of two-syllable sequences in their native language with the characteristic falling pattern for statements from the rising pattern for questions (
Frota et al., 2014). Based on these results, Frota et al. argue that European Portuguese-learning infants tune into language-specific phrasal pitch marking as early as 5 months of age.
These results from European Portuguese contrast with the findings obtained from infants learning English. English statements and yes/no questions are also distinguished by a falling versus rising pitch (e.g.,
Geffen & Mintz, 2017). However, unlike in European Portuguese, English statements and questions additionally differ in lexical items and word order. English-learning infants ranging in age from 4.5 to 24 months have been reported to fail to distinguish English statements from yes/no questions when stimuli differ in pitch but have an identical word order (
Soderstrom et al., 2011). Only when word-order cues are available in addition to intonation do English-learning 7-month-olds reliably distinguish statements from questions (
Geffen, 2014; see also
Best et al., 1991). By 12 months, word order alone is enough to allow English-learning infants to distinguish statements from questions (
Geffen & Mintz, 2015). What remains unclear, then, is whether English-learning infants can distinguish yes/no questions from statements based on pitch differences alone.
In Experiment 1, we tested whether English-learning 8-month-olds could distinguish pitch differences used to distinguish questions from statements, as in European Portuguese. In Experiments 2–4, we tested 4-month-olds to determine the developmental trajectory of infants’ emerging sensitivity to intonation. Together, these results have implications for our understanding of how initial perceptual sensitivities to pitch contours are reorganized by language experience and thus bear on theories of perceptual development.
2. Experiment 1: Can English-Learning 8-Month-Olds Distinguish Statements from Questions Based on Pitch Differences Alone?
In Experiment 1, we used the European Portuguese disyllabic sequences used previously by
Frota et al. (
2014) to test English-learning 8-month-olds. We did so for a number of reasons. In European Portuguese, questions and statements are distinguished minimally. Questions differ from statements in that they have a longer final syllable, which carries a pitch rise; in contrast, statements have a final pitch fall on a syllable that is roughly the same duration as the penultimate syllable.
Using European Portuguese stimuli allowed us to systematically vary the intonation contour, controlling for segmental content. This was important to us because in previous experiments wherein English-learning infants failed to discriminate statements from questions, there was substantial variability in the English intonation contours, particularly within statements (
Soderstrom et al., 2011; see also
Geffen, 2014). There was also substantial variability in segmental content and word order across trials because the stimuli in both sets of experiments consisted of longer sentences. So, it is unclear whether English-learning infants in these experiments were challenged by segmental variability, within-category pitch variation, or their inability to integrate the two.
In fact, we know from previous research that English-learning infants are not insensitive to pitch used for phrasal marking. English-learning infants can use intonation to distinguish closely related languages like English and German between 5 and 7 months of age (
Chong et al., 2018). They are also able to use pitch cues to detect clause boundaries (
Seidl & Cristia, 2008), and, in fact, English-learning 6-month-olds rely on pitch cues to detect clause boundaries (
Seidl, 2007). In light of these findings, the failure of English-learning infants to distinguish statements from questions is puzzling.
In Experiment 1, we tested English-learning 8-month-olds on minimally contrastive pitch contours using the stimuli used by
Frota et al. (
2014). Using the segmentally varied European Portuguese stimuli (and the exact same procedure) allowed us to compare our results directly with those of
Frota et al. (
2014). Such a cross-linguistic comparison using the exact same stimuli is critical in order to identify whether patterns of development are language-specific as well as whether infants’ performance can be attributed to the specific properties of the stimulus used (see also
Houston et al., 2000;
Van Ommen et al., 2020). Recall that European Portuguese-learning infants are able to discriminate intonation differences in signaling questions and statements at 5 and 8 months of age. If English-learning 8-month-olds are able to distinguish the statements and questions in natural European Portuguese disyllabic sequences, we can conclude that segmental variability does not pose a challenge for distinguishing pitch categories.
To confirm that English-learning 8-month-olds were relying solely on pitch differences, we also tested them using duration-neutralized resynthesized stimuli. As mentioned above, natural European Portuguese disyllabic questions and statements differ in pitch, specifically a pitch rise vs. a pitch fall. They also differ in the duration of the rise- or fall-carrying syllable: syllables with rises are longer than those with falls. We were motivated to resynthesize the stimuli given that differences in the duration of syllables in English signal stress (
Crystal & House, 1990) and English-learning 8-month-olds can distinguish segmentally varied disyllables that vary in stress alone (
Skoruppa et al., 2011). If English-learning 8-month-olds can distinguish between statements and questions in European Portuguese using pitch alone, we expected them to succeed with duration-neutralized resynthesized stimuli in addition to the natural stimuli previously used by
Frota et al. (
2014).
2.1. Methods
2.1.1. Participants
The final sample included 48 monolingual English-learning 8-month-olds (with an average age of 250 days, with a range of 224:293, 23 of whom were female). Half were tested on the natural stimuli, and the other half were tested on duration-neutralized stimuli. Infants were only included if they had at least 90% exposure to English, as assessed using a detailed parental-language questionnaire (
Sundara & Scutellaro, 2011), and no exposure to a tone language. Average exposure to English was 98% (90:100). All infants were full-term, had no history of speech, language, or hearing disorders, and were healthy on the day of testing, as confirmed by the parents. An additional 10 infants were tested but excluded from analysis because they became fussy (4), did not habituate in the maximum number of trials (5), or because of equipment malfunction (1).
2.1.2. Stimuli
The stimuli are described in detail in
Frota et al. (
2014) and available online in the supplementary material for that paper. They consisted of 16 disyllabic sequences with sonorants, produced with initial stress as statements or yes/no questions. As is typical in European Portuguese, statements were produced with a falling intonation, whereas yes/no questions were produced with a falling–rising pattern. All sequences were produced in an infant-directed register by a female native speaker of European Portuguese. The significant pitch differences between the statements and questions were restricted to the second syllable. The f0 declined by about 25 Hz on the second syllable of statements, whereas it increased by about 192 Hz in questions, resulting in differences in the final f0 (163 Hz vs. 380 Hz). Additionally, the second syllable of questions (392 ms) was significantly longer than that of statements (232 ms).
Next, to eliminate the differences in duration between statements and questions, we re-synthesized the stimuli using PSOLA in PRAAT (
Boersma, 2001; method previously utilized by
Chong et al., 2018). The script is available on the project OSF page. First, the original pitch contours were extracted from syllables produced with either a statement or question intonation. Next, these contours replaced the original pitch contour on disyllabic sequences produced with a question intonation. The resulting statement and question stimuli consisted of disyllables produced with identical duration profiles but with either an f0 fall characteristic of statements or the falling–rising intonation typical of questions. The duration profile of the question was selected as the base because the second syllable is longer in the question, and this meant that there was a longer duration for the pitch difference to be realized. A perceptual assessment made by 3 phonetically trained listeners indicated that it was also more salient. The resynthesized stimuli are available on the project OSF page.
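To make the contour-transplant step concrete, the following is a minimal sketch of how a pitch contour extracted from one token could be time-aligned to the duration profile of another before resynthesis. It is not the authors' PRAAT script (which is on the OSF page) and it does not perform PSOLA; it only illustrates the time mapping. The function name `warp_contour`, the syllable-by-syllable linear alignment, the 250 ms first-syllable duration, and the first-syllable f0 value are all assumptions made for illustration. The second-syllable durations (232 ms and 392 ms) and the statement's f0 at the start and end of the second syllable (about 188 Hz falling to 163 Hz) follow the values reported above.

```python
from bisect import bisect_right

def warp_contour(points, src_bounds, tgt_bounds):
    """Piecewise-linearly remap pitch-point times from a source duration
    profile to a target one, syllable by syllable. f0 values are unchanged.

    points     : list of (time_ms, f0_hz) pairs from the source token
    src_bounds : syllable boundaries of the source token, in ms
    tgt_bounds : syllable boundaries of the target token, in ms
    """
    if len(src_bounds) != len(tgt_bounds):
        raise ValueError("source and target need the same number of boundaries")
    warped = []
    for t, f0 in points:
        # Index of the syllable containing t, clamped to the last syllable.
        k = min(max(bisect_right(src_bounds, t) - 1, 0), len(src_bounds) - 2)
        s0, s1 = src_bounds[k], src_bounds[k + 1]
        t0, t1 = tgt_bounds[k], tgt_bounds[k + 1]
        warped.append((t0 + (t - s0) * (t1 - t0) / (s1 - s0), f0))
    return warped

# Hypothetical first-syllable duration (250 ms); second-syllable durations
# from the text: 232 ms for statements, 392 ms for questions.
statement_bounds = [0, 250, 482]
question_bounds = [0, 250, 642]
# Hypothetical statement contour: falls from about 188 Hz to 163 Hz on syllable 2.
statement_f0 = [(0, 200), (250, 188), (366, 176), (482, 163)]

print(warp_contour(statement_f0, statement_bounds, question_bounds))
```

Under these assumptions the statement fall is stretched over the longer final syllable of the question base, so the two resynthesized categories share one duration profile and differ only in f0.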
2.1.3. Procedure and Design
Infants were tested using the visual fixation procedure. They were seated on their parent’s lap 3.5 feet away from a 46-inch TV screen in a dimly lit room. The monitor was used to present visual displays. A Canon HD camera was placed below the monitor to record the infant’s gaze. Stimulus presentation was controlled through Habit X (
Cohen et al., 2004) from a computer in an adjacent room. The experimenter and the parent wore Peltor headphones over which masking audio was played so they could not influence the infant’s behavior.
All testing was done as described in
Frota et al. (
2014). At the beginning of each trial, a looming circle appeared on the TV to attract the infant’s attention to the display. Once the infant looked at the screen, a black and white checkerboard was presented, concomitant with audio stimuli. The presentation of the audio stimuli was completely contingent on the infant’s gaze. If the infant looked away for more than 2 s, a new trial was signaled by the looming stimuli. Maximum trial duration was 16 s, and a trial was repeated if the infant did not look at the screen in the first 5 s. Infant looking time to trials was the dependent variable.
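The trial rules above can be sketched as a small scoring function. This is an illustrative reconstruction, not the logic implemented in Habit X: the function name `score_trial`, the representation of gaze as (onset, offset) intervals measured from stimulus onset, and the assumption that the 2 s look-away rule applies only after the first look are ours.

```python
def score_trial(looks, max_dur=16.0, lookaway_limit=2.0, first_look_deadline=5.0):
    """Score one trial from on-screen look intervals.

    looks: time-ordered list of (onset_s, offset_s) looks at the screen,
           measured from stimulus onset (assumed representation).
    Returns ("repeat", 0.0) if there is no look within the first 5 s,
    otherwise ("complete", accumulated looking time in seconds).
    """
    if not looks or looks[0][0] > first_look_deadline:
        return ("repeat", 0.0)
    total = 0.0
    prev_offset = None
    for onset, offset in looks:
        if onset >= max_dur:
            break  # the trial has already reached its maximum duration
        if prev_offset is not None and onset - prev_offset > lookaway_limit:
            break  # a look-away longer than 2 s ended the trial
        total += min(offset, max_dur) - onset  # clip at the 16 s ceiling
        prev_offset = offset
    return ("complete", total)

print(score_trial([(0.0, 3.0), (4.0, 6.0), (9.0, 12.0)]))
```

In the example call, the 3 s gap before the third look exceeds the look-away limit, so only the first two looks count toward looking time.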
Testing was carried out in 2 phases—a habituation phase and a test phase. Half the infants were habituated to statements, and the other half were habituated to questions. As in
Frota et al.’s (
2014) study, the sliding average of 4 consecutive trials was monitored during habituation. When the looking time to the last 4 consecutive trials was less than 60% of the looking time to the first 4 trials, the habituation phase ended. Data obtained from infants who did not habituate within 25 trials were excluded from the final analysis.
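As an illustration of the habituation rule just described, here is a minimal sketch of a sliding-window criterion check. The function name `habituation_trial` and the `allow_overlap` option are hypothetical; in particular, whether the sliding window may overlap the four baseline trials is not stated in the text, so the default (no overlap, meaning the earliest possible habituation is at trial 8) is an assumption.

```python
def habituation_trial(looking_times, window=4, criterion=0.60,
                      max_trials=25, allow_overlap=False):
    """Return the 1-based trial at which the habituation criterion is met,
    or None if it is not met within max_trials.

    The criterion is met when summed looking time over the most recent
    `window` trials is less than `criterion` times the summed looking time
    over the first `window` trials (equivalent to comparing the means).
    """
    baseline = sum(looking_times[:window])
    # Assumption: by default the window must not overlap the baseline trials.
    first_check = window + 1 if allow_overlap else 2 * window
    last_trial = min(len(looking_times), max_trials)
    for n in range(first_check, last_trial + 1):
        recent = sum(looking_times[n - window:n])
        if recent < criterion * baseline:
            return n
    return None

lts = [10, 10, 10, 10, 9, 8, 7, 6, 5, 5, 5, 5]  # made-up looking times (s)
print(habituation_trial(lts, criterion=0.60))
print(habituation_trial(lts, criterion=0.50))
```

With these made-up looking times, the same infant reaches the 60% criterion but not the 50% criterion within 12 trials, which shows why the lower threshold demands more habituation.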
Once infants were habituated, they were presented with a control trial, where they heard items from the same category to which they were habituated, and a test trial, where they heard items from the other category. The order of presentation of the control and test trials was counterbalanced. We made one change to Frota et al.’s protocol. After the test phase, infants were presented with a post-test trial with repeated presentations of the item ‘pok’ in an animated voice. An infant’s data were excluded from the analysis if looking to the post-test trial was not higher than the average of the last 4 habituation trials. We did this to ensure that infants who did not dishabituate to the test trials were not simply disengaged from the task (
Sundara et al., 2018).
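The post-test inclusion rule can be written as a one-line check. The sketch below is illustrative only; the function name `passes_posttest` and the example looking times are hypothetical.

```python
from statistics import mean

def passes_posttest(habituation_lts, posttest_lt, window=4):
    """True if post-test looking time exceeds the mean looking time of the
    last `window` habituation trials, i.e., the infant re-engaged."""
    return posttest_lt > mean(habituation_lts[-window:])

hab = [12, 10, 8, 5, 4, 4, 3]      # made-up habituation looking times (s)
print(passes_posttest(hab, 9.5))   # infant recovered attention: data retained
print(passes_posttest(hab, 4.0))   # no recovery: data excluded
```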
2.1.4. Analysis
Infant looking-time data were analyzed using a linear mixed-effects model in R version 4.2 (
R Core Team, 2021) using
lme4. Because listening times are usually not normally distributed (as we confirmed with the Shapiro–Wilk test), we log-transformed them (
Csibra et al., 2016), although we present the raw listening time in figures to allow comparison with published research; the pattern of results is the same even without the log transformation. The fixed effects included the between-subjects variables Habituation Stimuli (statement or question) and Stimulus type (full cue or duration-neutralized) and the within-subjects variable Trial type (control or test) and all interactions. All variables were dummy-coded for ease of interpretation, with the reference level identified when describing the finding. Additionally, the model included a random intercept for subjects to allow for differences in baseline listening times. This was the highest-level random-effects structure that converged (
Barr et al., 2013). When necessary, planned comparisons were performed using the
emmeans package in R (
Lenth, 2025). De-identified raw data, model specifications, and complete model outputs are available on the project OSF site.
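For readers who prefer to see the model written out, the description above corresponds to a specification along the following lines. The symbols are ours, not the authors': $\mathrm{LT}_{ij}$ is the looking time of infant $i$ on trial $j$, and $T$, $H$, and $S$ are the dummy-coded Trial type, Habituation Stimuli, and Stimulus type. The exact specification is the one posted on the OSF site; this is only a sketch of its general form.

```latex
\begin{aligned}
\log(\mathrm{LT}_{ij}) ={} & \beta_0 + \beta_1 T_{ij} + \beta_2 H_i + \beta_3 S_i
  + \beta_4 T_{ij} H_i + \beta_5 T_{ij} S_i + \beta_6 H_i S_i \\
  & + \beta_7 T_{ij} H_i S_i + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

Here $u_i$ is the by-subject random intercept. In lme4 syntax, a model of this form would be written roughly as `log(LT) ~ Trial * Habituation * StimulusType + (1 | Subject)`, where the variable names are hypothetical.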
2.2. Results and Discussion
The raw looking-time data from Experiment 1 are presented in
Figure 1. Only the main effect of Trial type was significant [
F(1, 44) = 15.27,
p = 0.0003]. Follow-up planned comparisons confirmed that the effect of Trial type was significant for the 8-month-olds tested on the natural, unedited European Portuguese stimuli [
t(44) = 2.8,
p = 0.008] as well as the re-synthesized stimuli with the duration differences neutralized [
t(44) = 2.7,
p = 0.009]. That is, in both conditions, 8-month-olds listened significantly longer to the test trials in comparison to the control trials.
These results confirm that English-learning 8-month-olds are able to distinguish pitch rises from falls when presented with segmentally varied disyllabic sequences. They successfully distinguished European Portuguese disyllables, even when the correlated duration difference was neutralized through re-synthesis. That is, monolingual English-learning 8-month-olds were able to distinguish statements from questions based on pitch differences alone. Furthermore, they behaved just like European Portuguese-learning 8-month-olds. Therefore, the previously reported difficulties in distinguishing statements from questions by English-learning infants cannot be attributed to limitations in detecting pitch differences used for phrase marking.
4. Experiment 3: Can English-Learning 4-Month-Olds Succeed with Reduced Segmental Variability?
In Experiment 3, we reduced the segmental variability in the European Portuguese stimuli. We presented English-learning 4-month-olds with only the sequence /lamu/ produced with either question or statement intonation. We were motivated to do so by the parallels between the results from Experiments 1 and 2 and by previous findings on infants’ perception of lexical stress contrasts.
Lexical stress refers to the relative difference in the articulatory effort used to produce syllables in a word. As a result, compared to unstressed syllables, stressed syllables are louder, longer, and/or additionally marked by pitch. We know from previous research that introducing variability, specifically segmental variability, can increase the difficulty of distinguishing lexical stress contrasts. Infants under 6 months, whether they are learning Italian (
Sansavini et al., 1997), English (
Spring & Dale, 1977), German (
Höhle et al., 2009), Spanish (
Skoruppa et al., 2013), or French (
Skoruppa et al., 2009), are sensitive to lexical stress when stimuli are tightly controlled for segmental content. Older infants learning a language like French that does not have lexical stress, however, have difficulty detecting lexical stress even when segmental content is controlled (
Höhle et al., 2009;
Bijeljac-Babic et al., 2012). Between 8 and 12 months of age, infants learning a language that has lexical stress, however, are able to distinguish lexical stress contrasts even when presented with segmentally varied items (
Skoruppa et al., 2009,
2011).
We reasoned that if English-learning infants’ ability to discriminate pitch to mark phrasal prosody mirrors the developmental trajectory of lexical stress perception, 4-month-olds might succeed in distinguishing pitch differences if segmental variability is reduced. This would also be consistent with findings on tone perception. When tested with minimal pairs differing in tone, English-learning 4-month-olds have been shown to distinguish them successfully (e.g.,
Mattock et al., 2008; see
Liu & Kager, 2014, for similar findings regarding Dutch-learning infants), but European Portuguese-learning 5- to 6-month-olds fail to distinguish Mandarin tone contrasts when presented with segmentally varied sequences (
Frota et al., 2016). So, when tested on a minimal pair, English-learning 4-month-olds could succeed by treating the pitch rise that is limited to the final syllable as tone.
To give English-learning infants every opportunity to succeed, we also tested them in an additional, more sensitive procedure. In the study by
Frota et al. (
2014), infants were habituated until the looking time for the last four trials was less than 60% of the looking time for the first four trials. As in Experiment 1, we habituated one group of 4-month-olds to this 60% criterion; we also added another group of infants who were habituated to a greater extent. In the latter group, the infants were habituated until the looking time for the last four trials was less than 50% of the looking time for the first four trials. A more stringent habituation criterion results in a more sensitive paradigm (
Sundara et al., 2018; see also
Bijeljac-Babic et al., 2012).
4.1. Methods
4.1.1. Participants
The final sample included 44 4-month-olds (with an average age of 127 days, with a range of 110:153, and 21 females). The subject inclusion criteria were identical to those applied in Experiment 1. On average, the infants had 99% exposure to English (range 90:100). An additional two infants were tested, but they were excluded from the final sample because they became too fussy (1) or did not habituate within the maximum number of trials (1).
4.1.2. Stimuli
In Experiment 3, infants were only tested on one disyllable, /lamu/, to limit segmental variability.
4.1.3. Procedure and Design
The procedure and design were identical to those applied in Experiment 1. Half the infants were habituated until their looking time in the last four trials fell below 60% of that in the first four trials, whereas the other half were habituated until it fell below 50%. The latter group was thus habituated to a greater extent, making this the more sensitive condition.
4.1.4. Analysis
The analysis was identical to that in Experiment 1 except for our inclusion of Extent of Habituation (60%, 50%) as a between-subjects variable, in addition to Habituation Stimuli (statement or question). As in Experiments 1 and 2, the within-subjects variable was Trial type (control or test), and the dependent variable was log looking time.
4.2. Results and Discussion
The raw looking time data from Experiment 3 are presented in
Figure 3. Only the main effect of Trial type [
F(1, 40) = 14.2,
p = 0.0005] and the interaction of Habituation Stimuli and Trial Type [
F(1, 40) = 8.7,
p = 0.005] were significant. Specifically, the effect of Trial type was significant for statements [
t(40) = 4.8,
p < 0.0001] but not questions [
t(44) = 0.6,
p = 0.6]. That is, the 4-month-olds were able to distinguish between statements and questions only when they had been habituated to the statements. The Extent of Habituation did not have a significant main effect, nor did it interact with any other variable (see OSF for full model output).
Our results are different from the results reported by
Soderstrom et al. (
2011), who habituated infants between 4 and 24 months of age to sentences with a question or statement intonation till the infants demonstrated a 65% decline in looking time. They found that infants listened longer to the question stimuli with a rising intonation, regardless of habituation condition.
In Experiment 3, unlike in
Soderstrom et al.’s (
2011) study, the 4-month-olds did not show an overall preference for question trials; we can see this in
Figure 3, where the time the infants spent listening to statements (in the test trials) after having been habituated to the questions is numerically greater than the time spent listening to the questions themselves (in the control trials). Recall that our habituation criteria were overall more stringent than those used by
Soderstrom et al. (
2011) (60% and 50% compared to 65%). We think that the 4-month-olds also successfully habituated to the questions because of our more stringent habituation criteria, although the greater extent of habituation was not sufficient for them to switch their attention to the statements.
5. Experiment 4: Does Eliminating Correlated Duration Cues Enable 4-Month-Olds to Distinguish Question Rises from Statement Falls?
As is typical in European Portuguese, disyllables with question intonation have a rise on the final syllable that is significantly longer than the penultimate one. Thus, the question and statement /lamu/ differed in pitch movement as well as the duration of the constituent syllables. Note that in English, question stimuli are also usually longer than statements, as exemplified by the stimuli in
Soderstrom et al. (
2011). Thus, infants could succeed in distinguishing statements from questions by listening to the pitch differences, the duration differences, or both.
However, recent research indicates that infants’ ability to group sequences that vary in duration, as opposed to pitch, improves with age. Young infants either fail to group sequences that vary solely in duration, like rats (
Bion et al., 2011;
de la Mora et al., 2013), or are only successful at grouping them at older ages (
Hay & Saffran, 2012;
Yoshida et al., 2010). Even when they are able to group sequences by both pitch and duration, it has been claimed that it is easier to group sequences according to the former than the latter (
Abboub et al., 2016). So, it has been argued that infants learn to tune into duration differences only as a result of language experience.
There is some evidence that English-learning infants might tune into duration cues only in the second half of their first year. Specifically, we know based on word segmentation research that English-learning infants only gradually become sensitive to duration differences, at least those that accompany differences in vowel quality that distinguish stressed from unstressed syllables (e.g.,
Beckman, 1986). As a result, 9- but not 7-month-olds use stressed syllables that are longer, and have a more peripheral vowel, to find words (
Thiessen & Saffran, 2003). This is also consistent with findings showing that older infants, namely, English-learning 8- and 12-month-olds, can successfully discriminate two-syllable words with an initial and final stress, even when the stimuli are segmentally varied (
Skoruppa et al., 2011).
It is then possible that English-learning 4-month-olds are still in the process of tuning into duration differences and thus cannot use them. However, the duration differences are salient enough that the infants are unable to ignore them completely, making the task of distinguishing between European Portuguese statements and questions harder. In Experiment 4, we eliminated the duration differences in the final syllable by re-synthesizing the European Portuguese stimuli, as in Experiment 1. We tested 4-month-olds with the more sensitive 50% habituation criterion. We reasoned that if infants are able to distinguish pitch but are confounded by the additional duration difference, they should succeed when tested with the re-synthesized, duration-neutralized stimuli.
5.1. Methods
5.1.1. Participants
The final sample included 22 4-month-olds (with an average age of 131 days, with a range of 120:138, and 10 females). The subject inclusion criteria were identical to those in Experiment 1. The average exposure to English for the 4-month-olds was 99% (91:100). An additional two infants were tested, but they were excluded from the final sample because they became too fussy (1) or did not habituate within the maximum number of trials (1).
5.1.2. Stimuli
The segmental variability was reduced in this experiment, as in Experiment 3, by using only one disyllabic sequence, /lamu/, that was re-synthesized, as described in Experiment 1.
5.1.3. Procedure and Design
The procedure and design were identical to those in previous experiments. All infants were habituated until their looking time in the last four trials fell below 50% of that in the first four trials.
5.1.4. Analysis
The analysis was similar to that in Experiment 3. Habituation condition (statement or question) was the between-subjects variable, with Trial type (control or test) as the within-subjects variable, and the dependent variable was log looking time, with a random intercept for subject.
5.2. Results and Discussion
The raw looking time data from Experiment 4 are presented in
Figure 4. Again, the main effect of Trial type [
F(1, 20) = 6.06,
p = 0.02] and the interaction between Habituation Stimuli and Trial Type [
F(1, 20) = 4.8,
p = 0.04] were significant. Just as in Experiment 3, the effect of Trial type was significant for statements [
t(20) = 3.3,
p = 0.004] but not questions [
t(20) = 0.20,
p = 0.8]. In Experiment 4 as well, despite neutralizing the duration difference, the 4-month-olds were only able to distinguish between statements and questions when they had been habituated to the statements.
6. General Discussion
In four experiments, we tested whether English-learning infants are able to distinguish question rises from statement falls based on pitch differences alone (the results are summarized in
Table 1). To ensure that the stimuli systematically varied in intonation contours while segmental content was controlled for, we used European Portuguese disyllabic sequences. In European Portuguese, questions and statements are minimally contrastive, with a difference in pitch (and duration) on the final syllable. The stimuli we used had opposite directions of pitch change, with a difference of over 200 Hz towards the end of the disyllable. That is, there was a substantial difference in pitch between the two intonation contours. We used this cross-linguistic approach in order to isolate language-specific developmental changes from effects that are restricted to the specific stimulus properties.
In Experiments 2–4, we found that monolingual English-learning 4-month-olds’ ability to detect even this large pitch difference signaling phrasal prosody is limited. We found no evidence that 4-month-olds could distinguish questions from statements when they had to group segmentally varied disyllables (Experiment 2). The 4-month-old infants only partially succeeded when tested with a single disyllabic sequence /lamu/ (Experiments 3 and 4), distinguishing between the two only when they were habituated to statements but not questions. An analysis combining the data from all three sub-experiments using the single disyllable /lamu/ (n = 66) also confirmed that 4-month-olds show asymmetry in discrimination. Thus, our failure to find evidence of discrimination by monolingual English-learning 4-month-olds cannot be attributed to a lack of power due to a small sample size.
It is also clear that the monolingual English-learning 4-month-olds tested did not simply treat this difference in pitch on the second syllable as tone. If that were the case, they should have succeeded in distinguishing statements from questions. To date, regardless of whether infants are learning a tone language, they have been reported to distinguish minimal pairs (with no segmental variability) that differ only in tone before 6 months of age (see
Kalashnikova et al., 2024, for a summary).
One way in which the stimuli in our experiments differed from those used to test tone perception is that we used a disyllabic sequence, even under our reduced segmental variability conditions, whereas the published research on tone perception focuses on monosyllabic words. The English-learning 4-month-olds’ failure to distinguish falling and rising pitch on the second syllable of a two-syllable word reported here is inconsistent with their reported success in distinguishing a falling from a rising tone in monosyllables (
Mattock & Burnham, 2006;
Mattock et al., 2008). This discrepancy provides additional support for the idea that the developmental timeline of attunement to pitch differences is modulated by the linguistic function of pitch.
So, why did the 4-month-olds fail to distinguish between questions and statements?
Soderstrom et al. (
2011) argue, and we agree, that infants in general might prefer questions with a final rising pitch. This preference is likely rooted in infants’ long-documented preference for a higher, more variable pitch (e.g.,
Papoušek et al., 1990;
Trehub et al., 1984), possibly because of its association with positive affect and its role in drawing infants’ attention (e.g.,
Broesch & Bryant, 2015;
Fernald & Kuhl, 1987;
Trainor et al., 2000). That would explain why in
Soderstrom et al.’s (
2011) experiment with a 65% habituation criterion, infants displayed an overall preference for questions. We used more stringent habituation criteria (60% and 50%) precisely to ensure that infants would be habituated even to the question rises. As a result, in our experiments, the infants did not demonstrate an overall preference for questions. Nonetheless, it is possible that the English-learning 4-month-olds failed to switch their attention to the statements, even after they were habituated to the question intonation, because of a latent preference for rising intonation (see also
Oakes, 2010). If this is true, then we might expect English-learning 4-month-olds to succeed in a preference experiment, where they are presented with questions versus statements without habituation, indicating that they are able to distinguish between the two. We leave this for future research.
Given that, cross-linguistically, infants demonstrate a preference for higher, more variable pitch, the fact that European Portuguese-learning 5-month-olds succeeded where English-learning 4-month-olds failed must clearly be attributed to language experience. They distinguished question rises and statement falls even when tested with stimuli that were segmentally varied and in the presence of correlated duration cues. It is likely then that early experience with languages with a robust correspondence between pitch distinctions and meaning, even phrasal, is necessary to overcome an initial preference for a rising pitch at the ends of phrases.
In the absence of such a robust correspondence, like in English (
Bolinger, 1989;
Frota, 2014;
Pierrehumbert & Hirschberg, 1990), infants’ ability to discriminate pitch differences early in infancy is limited, particularly when accompanied by segmental variability. Whether tuning into smaller differences in pitch or even those that involve changes in pitch timing (
Butler et al., 2016) is possible in the absence of language experience remains to be determined.
By 8 months, however, English-learning infants’ ability to distinguish between questions with a rising pitch and statements with pitch falls improves. English-learning 8-month-olds succeeded when tested with varied segmental content, with or without correlated duration cues (Experiment 1). Thus, at 8 months of age, infants’ ability to distinguish large pitch differences signaling phrasal prosody is robust to the presence of correlated duration cues, just as has been reported for European Portuguese-learning 8-month-olds. Even in the absence of experience with a robust correspondence between pitch and meaning, English-learning 8-month-olds were able to distinguish pitch differences signaling phrasal prosody.
The English-learning 8-month-olds’ success is particularly surprising given that we tested them on non-native stimuli. Recall that the original stimuli were recorded by a native Portuguese speaker. The stimuli consisted of sequences of sonorants, which are largely similar in the two languages, except for /r/. The vowels used were also similar to those in English but not identical in quality. Given the substantial literature indicating that infants tune into the phonetic categories of their native language within the second half of their first year, it is quite likely that the English-learning 8-month-olds, at least, detected the unfamiliarity of the stimuli. And yet they were able to detect the pitch difference in these unfamiliar speech stimuli, attesting to the robustness of their ability.
The mismatch in the phonetic instantiation of the stimuli is also not likely to account for the limited success of the English-learning 4-month-olds. Phonetic perception has not been reported to be language-specific at this age. More importantly, the 4-month-olds had limited success even when tested just on /lamu/, which was selected because it was the most phonetically similar in English and Portuguese. Moreover, such phonetic differences are inevitable in any cross-linguistic comparison (e.g.,
Houston et al., 2000;
Van Ommen et al., 2020), and we think that a cross-linguistic design provides many benefits overall.
The results from 8-month-olds reported here are different from those reported previously for English-learning infants. In
Soderstrom et al.’s (
2011) study, infants ranging in age from 4.5 to 24 months showed a preference for question rises but failed to discriminate question rises from statement falls. English-learning infants have only been reported to successfully distinguish questions and statements in previous research when the stimuli differ in word order in addition to intonation (
Geffen, 2014;
Best et al., 1991). Our findings are thus the first to demonstrate that English-learning infants are able to distinguish questions from statements in the first year of life using pitch alone.
There are two aspects of our design that, unlike previous experiments, allowed us to isolate infants’ sensitivity to intonation. First, our stimuli controlled for several extraneous variables that are correlated with questions and statements. In previous reports, the segmental content was variable, the question stimuli were longer, and there was significant variability in intonation, typically within the category of statements. In contrast, we were able to orthogonally manipulate the segmental content, the extent of habituation, and the presence of extraneous acoustic cues correlated with the question–statement distinction by using native stimuli produced by a European Portuguese speaker. Second, by focusing our experiments on specific ages, we were able to uncover infants’ changing sensitivity to pitch cues.
By systematically manipulating segmental content, the extent of habituation, correlated duration cues, and the age of the infants, we showed that English-learning infants’ initial sensitivity to pitch differences signaling phrase boundaries is limited (observed only with /lamu/), and only later do they succeed when presented with variable segmental content. Such a developmental trajectory, wherein infants require experience to facilitate their ability to distinguish the pitch marking of phrasal structure while abstracting away from segmental content, parallels what is observed for lexical stress. Only older infants, between 8 and 12 months old, are able to abstract away from variable segmental content to detect lexical stress, and they can only do so if they are learning a lexical stress language like English or Spanish (
Skoruppa et al., 2009,
2011). Whether the ability to abstract away from segmental content to detect tone also improves with age remains to be determined.
In sum, we tested whether English-learning 4- and 8-month-olds could distinguish statements from questions based on pitch differences alone. We found that 8- but not 4-month-olds are able to use pitch differences to distinguish questions from statements, even when they vary in segmental content. These results confirm that the ability to perceive pitch differences that mark phrasal structure is affected by language experience early, by 5 months, in infants learning a language like European Portuguese with a tight correspondence between pitch and meaning. Even in the absence of such language experience, the ability to perceive pitch differences marking phrasal structure is facilitated, but only by 8 months.