1. Introduction
In spoken languages, variation in speech sounds typically takes place at different phonological levels. At a segmental level, phonemes and smaller phonological units are relevant, while larger units such as syllables play a role at a suprasegmental level. Languages such as Dutch, German, English, Italian, or Spanish, for example, distinguish stressed syllables and unstressed syllables, with stressed syllables being longer, louder, and higher in pitch than unstressed syllables (
Cutler & Jesse, 2021). Native adult listeners of these languages use these suprasegmental cues immediately to facilitate online recognition of spoken words (
Connell et al., 2018;
Cutler & van Donselaar, 2001;
Friedrich et al., 2004;
Jesse et al., 2017;
Reinisch et al., 2010;
Schild et al., 2014;
Soto-Faraco et al., 2001;
Sulpizio & McQueen, 2012;
van Donselaar et al., 2005; and
Zahner et al., 2019). For example, when hearing the stressed initial syllable of the word admiral (AD.mi.ral; capital letters denote the stressed syllable, dots the syllable boundaries), English adults immediately fixate the written version of admiral even in the presence of a written version of the segmental competitor, admiration (ad.mi.RA.tion;
Jesse et al., 2017). Here, we test whether German-learning infants and toddlers already use suprasegmental cues immediately for word recognition.
Typical syllable stress patterns give languages such as English, Dutch, Spanish, or German a suprasegmental regularity that infants already recognize in their first months of life. In English, for example, 90% of all words follow a trochaic pattern, meaning that they are disyllabic, with the stress being placed on the first syllable (
Cutler & Carter, 1987). A strong tendency towards trochees can also be seen in German, Spanish, and Dutch (
Cutler & Jesse, 2021;
Domahs et al., 2008;
Frota et al., 2020). In German, for example, 73 percent of disyllabic words are trochaic (
Domahs et al., 2008). Moreover, in speech directed to infants learning German, as many as 97 percent of words carry stress on their initial syllable (
Stärk et al., 2022). Syllable stress thus provides a reliable cue to speech segmentation (
Cutler & Carter, 1987), which infants indeed use early in development (e.g.,
Höhle et al., 2001;
Houston et al., 2000). When listening to spoken materials, infants extract a predominant stress pattern in their target language already within the first few months of life (
Becker et al., 2018;
Friederici et al., 2007; and
Höhle et al., 2014). Later, during their first year, infants develop a preference for the typical stress pattern and appear to use it to find word boundaries (i.e., for speech segmentation,
Jusczyk, 1999; for children learning English, see
Jusczyk et al., 1999; for children learning Dutch, see
Junge et al., 2012;
Kuijpers et al., 1998; and for children learning German see
Höhle et al., 2009;
Marimon et al., 2024).
In this study, we investigate whether young children learning German also use syllable stress when they assign meaning to spoken words. So far, infants’ ability to encode and use syllable stress for word-meaning mapping has been tested with the offline habituation-switch technique applied in word learning studies (
Curtin, 2009,
2010,
2011). Across those experiments, 12- to 14-month-olds were habituated to novel word–novel object pairings. For example, a meaningless label like “BE.do.ka” was paired with novel object A, and another meaningless label like “be.DO.ka” was paired with another novel object B. These novel word–novel object pairings were repeatedly presented until the children’s looking times fell below a certain threshold. Dishabituation was then tested either with switch trials in which the pairing was reversed (“BE.do.ka” paired with object B and “be.DO.ka” paired with object A,
Curtin, 2009) or with switch trials in which the position of the stressed syllable varied (e.g., “do.BE.ka” paired with object A;
Curtin, 2011). Across all experiments, infants dishabituated during the switch trials. That is, they looked at the screen again for longer than during the last habituation trial. Similar results were found even when the infants did not have to store the stress of two novel word–novel object pairings during the habituation trials. Thus, when they were trained with only one novel word–novel object pair, e.g., “BE.do.ka” paired with a novel object A, they dishabituated when the stress patterns differed in the test trial, e.g., “be.DO.ka” paired with the same object A (
Curtin, 2011).
In the present study, we test whether infants use syllable stress during the time course of matching a presumably known word when it is correctly vs. incorrectly stressed with its respective referent. In contrast to previous studies using the habituation-switch paradigm, we measure online word processing as reflected in children’s eye movements. Crucially, previous studies measuring eye movements in adult native listeners of a language with variable stress have already shown that this measure reflects the use of syllable stress as soon as it is available in the acoustic signal (for native English adults, see
Jesse et al., 2017;
Reinisch et al., 2010; for native Italian adults, see
Sulpizio & McQueen, 2012; and for native German adults, see
Zahner et al., 2019).
To apply eye tracking with our young participants, we use a paradigm that is well established to test young children’s word-meaning mapping. This paradigm is referred to as either the Looking-While-Listening (LWL) paradigm (
Fernald et al., 2008;
Golinkoff et al., 2013, also used in this study), language-guided-looking paradigm (
Bergelson & Swingley, 2012,
2015,
2018) or intermodal/cross-modal preferential looking paradigm (
Kartushina & Mayor, 2019). In the LWL paradigm, young participants see two images on a screen (e.g., a baby and a car). At the same time, they hear an utterance or a spoken word naming one of the two pictures (e.g., “Look at the baby”). Participants’ fixations to the named target picture are used to test whether they have assigned meaning to the spoken input. Some studies using this paradigm suggested that infants as young as 6 months old show robust comprehension of many words (
Bergelson & Aslin, 2017a;
Bergelson & Swingley, 2012,
2015,
2018;
Rocha et al., 2024;
Rosslund et al., 2023; and
Tincoff & Jusczyk, 1999,
2012). However, results of other studies indicated a somewhat later onset of the first significant word comprehension at around nine months or even later (
Beech & Swingley, 2023;
Bergelson, 2020;
Kartushina & Mayor, 2019;
Steil et al., 2021; and
Syrnyk & Meints, 2017).
Fixations in the LWL paradigm have already proved to be a useful tool for investigating incremental word recognition in toddlers. The time course of fixations indicates that even very young children do not wait until they have heard a whole word, but instead immediately consider referents of words that match even fragmentary speech input. For example, 18-month-olds looked to the correct referent, such as a baby, while hearing only the initial part of the referent’s name, such as “BA” taken from “BA.by” (
Fernald et al., 2001). In addition, children also consider word candidates that only partially match the input. Twenty-four-month-olds took longer to fixate the target picture (e.g., a dog) when the target’s spoken form overlapped in initial phonemes with the label of a distractor picture (e.g., a doll) than when there was no phonological overlap between the names of both pictures (e.g., dog and tree,
Swingley et al., 1999). In the present study, we want to find out whether very young children also use stress information incrementally.
Since the seminal study by
Swingley and Aslin (
2000), the LWL paradigm has been frequently exploited to test infants’ and toddlers’ sensitivities to the mispronunciation of segments. In those studies, children hear spoken labels that vary in a single segment from the canonical pronunciation of the target’s name, such as “VA.by” instead of “BA.by”. A recent meta-analysis by
Von Holzen and Bergmann (
2021) found that children under 31 months of age typically look less at the target picture when the target’s name is mispronounced (note that the 32 included papers also comprised a study that tested 19-month-old German-speaking children,
Höhle et al., 2006). Nevertheless, across studies, these very young children looked at the target more than at the distractor when the word was mispronounced. With increasing numbers of mispronounced features (ranging from one to three feature changes), the impairment of target fixations increased. Sensitivity to mispronunciation was modulated by its position within the word, with the largest effects for onset mispronunciations and the smallest for coda mispronunciations. The meta-analysis also revealed that age did not modulate target identification. However, only 1 of the 32 included papers tested infants younger than 12 months.
Toddlers learning English appear to handle misplaced stress comparably to segmental mispronunciation. In a recent study by
Campbell et al. (
2019), 17-month-olds heard either a canonically (i.e., correctly) stressed version of a word (e.g., “BA.by”) or an incorrectly stressed version of that word (e.g., “ba.BY”) while viewing two pictures on the screen (e.g., a baby and a chicken). The average number of fixations from word onset to two seconds thereafter showed that the toddlers recognized the target (as indicated by target fixation above chance) only when the target’s name was correctly stressed, but not when it was incorrectly stressed. Nevertheless, the growth curve analysis of Campbell et al. suggested that participants’ fixations were attracted towards the target picture by both the correctly and the incorrectly stressed version of the target word’s name. However, the estimated value for the correctly stressed word was double the size and developed faster than that of the incorrectly stressed word. Together, these results show that English-learning children at the end of their second year of life use stress as an important cue for incremental word processing.
In the present study, we investigate whether even very young German-learning infants use stress cues (incrementally) for word-meaning mapping. To this end, we test German-learning infants between 4 and 15 months old who are just learning their first words. That is, we do not assume that all infants already know all the words used in the experiment, but that they are in their initial steps of building a vocabulary (e.g.,
Bergelson & Swingley, 2012,
2015,
2018). To obtain a rough impression of the tested infants’ understanding of the words, we ask the infants’ parents to indicate their infants’ knowledge of the tested words. As a control group, we test 2- to 4-year-old toddlers who should already know all the words used in our experiment. As in the study by
Campbell et al. (
2019), children see displays of two pictures (e.g., a car and a baby) while hearing a spoken noun referring to one of the two objects (the target, e.g., “Baby”). The disyllabic name of the target is either correctly stressed, i.e., with stress on the first syllable (“BA.by”), or it is incorrectly stressed, i.e., with stress on the second syllable (“ba.BY”).
Apart from the different target languages, the present study differs from the study by
Campbell et al. (
2019) in three methodological aspects. First, we keep the distractor pictures constant during the presentation of the correctly and incorrectly stressed target names. In the study by Campbell et al., the target image was presented with a different distractor in each stress condition. For example, when the children heard “BA.by”, they saw a picture of a baby and a picture of a dinosaur, whereas when they heard “ba.BY”, they saw a picture of a baby and a picture of a chicken. This procedure cannot control for different picture preferences across distractors. Here, we present the same target–distractor pairs in the trials to be compared. Second, whereas Campbell et al. calculated the proportion of fixation for each single trial (fixation to the target divided by fixation to the whole display), we base our analysis on proportion indices (see
Bergelson & Swingley, 2012). That is, we include both trials for the same picture pair in the calculation. Beyond the use of identical displays, the proportion indices further correct for possible picture preferences. Third, we include more objects (28 pictures) than Campbell et al. (six objects), with the pictures serving as both targets and distractors. This results in 56 trials in our study compared to 12 trials in the former one.
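The contrast between the two dependent measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact computation is given in the Data Analysis section, which is not reproduced here, so the difference-score form of `proportion_index` is an assumed reading of the description above. The function names and all looking times are hypothetical.

```python
def single_trial_proportion(target_ms: float, distractor_ms: float) -> float:
    """T / (T + D): share of on-picture looking time spent on one picture."""
    total = target_ms + distractor_ms
    if total == 0:
        raise ValueError("no looking time on either picture")
    return target_ms / total


def proportion_index(prop_when_named: float, prop_when_not_named: float) -> float:
    """Assumed paired measure for one picture of a yoked pair.

    Looking to picture X when X is named, minus looking to X when the
    other picture of the same display is named. A general preference for
    X contributes equally to both terms and therefore cancels out.
    """
    return prop_when_named - prop_when_not_named


# Hypothetical looking times (ms) for the display baby + car.
baby_named = single_trial_proportion(1200, 800)    # "Baby" is the spoken word
car_named = single_trial_proportion(1000, 1000)    # looks to the baby when "Auto" is spoken
print(round(proportion_index(baby_named, car_named), 3))  # 0.1

# A child who simply prefers the baby picture looks at it 60% of the time
# in both trials; the single-trial proportion is 0.6, but the index is 0.
print(proportion_index(single_trial_proportion(1200, 800),
                       single_trial_proportion(1200, 800)))  # 0.0
```

In this sketch, a value above zero means that naming the picture increased looks to it beyond whatever baseline attraction the picture has on its own.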
With our study, we intend to investigate the developmental trajectory of the use of stress cues in incremental word-meaning mapping across infancy and toddlerhood of children learning German. Therefore, our study also differs from the earlier study by
Campbell et al. (
2019) in the age of the children tested. Whereas previously a group of 17-month-old children was tested, here we target (i) a younger and (ii) an older age group of young children. (i) With the younger age group (4 to 15 months old), we aim to investigate when infants start using stress cues for comprehending their first words (even if they may not yet understand all the words used in the experiment). (ii) With the older age group (two to four years old), we test toddlers older than those tested by Campbell et al. (17-month-olds) to ensure that the toddlers know all the tested words. With this group, we intend to substantiate the former finding that toddlers use word stress and to replicate this for another language, namely German.
Our assumptions are as follows: first, word comprehension increases with age. This should be reflected in a main effect of age for the looking proportions to the named target pictures in our LWL paradigm. Second, very young children use syllable stress for incremental word-meaning mapping. This should be reflected in a main effect of stress, with incorrectly stressed target names (compared to correctly stressed ones) being recognized less or not at all (i.e., looking proportions at chance level). If word comprehension and the use of syllable stress become more stable with increasing age, this should be reflected in an interaction of stress and age in the looking proportions. In other words, with increasing age, the difference in looking times when hearing the correctly vs. the incorrectly stressed words should increase. From that result, we would conclude that the use of syllable stress follows first word comprehension. Finally, we use a growth curve analysis to explore whether German-learning infants and toddlers immediately use syllable stress cues while hearing the spoken word.
4. Discussion
We investigated whether 4-to-14-month-old German-learning infants and 2-to-4-year-old German-learning toddlers use syllable stress for incremental word recognition. Our first hypothesis stated that word comprehension, in general, increases with age. Indeed, while the target recognition effects in infants were rather weak, they increased with age in toddlers. Our finding for infants fits with the mixed LWL results for very young children. While results of some studies pointed to robust word comprehension in infants as young as 6 months old (
Bergelson & Aslin, 2017a;
Bergelson & Swingley, 2012,
2015,
2018;
Rocha et al., 2024;
Rosslund et al., 2023; and
Tincoff & Jusczyk, 1999,
2012), results of other studies indicated a somewhat later onset of the first significant word comprehension at around 9 months or even later (
Beech & Swingley, 2023;
Bergelson, 2020;
Kartushina & Mayor, 2019;
Steil et al., 2021; and
Syrnyk & Meints, 2017). Our results rather align with these latter studies. Thus, we might have tapped the very first attempts of German-learning infants to link language input with common objects.
In addition to infants’ limited vocabulary, two specific factors of our design may have made it difficult to detect robust word recognition in infants, namely the use of (i) disyllabic words and (ii) trials with picture pairs from the same category. (i) To be able to manipulate syllable stress within words, we had to rely on disyllabic words, which might be acquired a little later than the monosyllabic words mainly used in other studies (for English-learning infants, see the following:
Bergelson & Swingley, 2012,
2015,
2018;
Syrnyk & Meints, 2017; and
Tincoff & Jusczyk, 1999,
2012; for Norwegian-learning infants, see the following:
Kartushina & Mayor, 2019). Possibly, the German-learning infants would have shown more robust word comprehension for (some) earlier acquired monosyllabic words like “Ball” (ball), “Hund” (dog), “Bett” (bed), or “Keks” (cookie). However, disyllabic nouns are very typical in German, and we are not aware of any study directly comparing the acquisition of mono- vs. disyllabic words. (ii) Five of our fourteen yoked word pairs could be assigned to a shared category (living beings: baby–bird, rabbit–beetle; bath items: brush–diaper; body items: finger–hair; and kitchen items: fork–soup). It has been shown that young infants (but not older toddlers) have difficulties in fixating the named picture when the distractor picture belongs to the same category (e.g., food items;
Bergelson & Aslin, 2017b). In fact, visual inspection of our data indicates that for at least two pairs, “brush–diaper” and “finger–hair”, infants, but not toddlers, showed the lowest word comprehension (see
Figure 5). Both factors of our design may have contributed to the age effect on robust word comprehension.
Without question, various characteristics of the design and the material possibly contribute to the success of young children in word recognition studies (for further discussion, see
Kartushina & Mayor, 2019;
Steil et al., 2021). Future work needs to systematically test such factors. However, since the main aim of our study was not to show at what age infants show robust word comprehension, we believe that the possibility that they did not understand all words has less of an impact on our main results regarding stress.
Our second hypothesis stated that infants as well as toddlers use syllable stress for incremental word-meaning mapping. On a general level, our results support this assumption. The mixed effect model showed a main effect of stress (and no interaction of age and stress). However, post hoc analyses revealed that the stress effect was carried mainly by the toddlers. Only toddlers showed a stress effect in a post hoc separate mixed model. Although the stress effect for infants was numerically in the expected direction, it was not significant in the mixed effect model applied to infants only. However, at least the analysis by items showed that infants might use syllable stress in their first successful attempts to match objects to their names. Infants’ fixations on the targets were only above chance level when the target’s name was correctly stressed but at chance level when it was not correctly stressed. Finally, growth curve analyses revealed an overall steeper increase in looking times at the target picture when children heard the correctly stressed name of the target compared to its incorrectly stressed name for both groups, infants and toddlers. In sum, as we found some (albeit fragile) hints for a stress effect in infants, we cautiously conclude that our results suggest that the first word form representations in German-learning children encode syllable stress, and that even very young children use corresponding suprasegmental information for incremental spoken word recognition.
The rather fragile stress effects that we obtained for the infant group could originate from two sources: first, infants might not yet integrate syllable stress into their incremental word-meaning mapping. However, the small but systematic stress effects that infants show point towards some sensitivity of this group to syllable stress at the word level. Alternatively, infants’ restricted vocabulary might have diminished the number of trials that could potentially show the stress effect. Infants vary widely in the number of words they already understand. Crucially, infants might only know a few of the words presented in the experiment (see also the parents’ rating in
Table 1, second column, although the parents also might not be correct in judging the vocabulary of their children). Thus, most likely, infants were tested on words that they already knew as well as on words that they did not yet know. For the few words they know, infants might show stress processing (driving the effects in the item analysis and in the GCA). We have to conclude that stress effects are difficult to obtain if not all infants understand all the words tested. This also seems to be in line with a recent mispronunciation study with very young infants, in which 6-to-8-month-olds showed no recognition of words, and thus, mispronunciation processing was difficult to test (
Beech & Swingley, 2023). One solution could be to analyze only the words that parents indicated their infants might know. However, when we consider the parents’ overall ratings of which words they thought their children already understood, we find no differences in PIs. That is, on trials with words rated lower on average by parents (e.g., beg, pillow, and soup, see
Table 1), children did not show lower PIs than on other trials (see
Figure 5). This observation confirms that parents’ ratings do not appear to reliably discriminate between words that very young children know and words that they do not know (see also the results of
Frank et al., 2021). We therefore refrained from analyzing the stress effects for both groups of words separately, which would further reduce the power of the analyses.
Although target recognition effects in infants were rather weak overall, they were stronger for correctly stressed target names compared to incorrectly stressed ones. In line with previous work with somewhat older children (
Campbell et al., 2019), we obtained hints for enhanced success in target recognition for correctly stressed words compared to their incorrectly stressed versions. At a descriptive level, this is also illustrated in
Figure 5. Here, we plotted the mean PIs per display for correctly stressed and incorrectly stressed trials and for infants and toddlers, separately. For nearly all item pairs, the PI was higher when the target’s name was correctly stressed in comparison to when it was incorrectly stressed (except for the pair fork–soup for both groups and the pairs bottle–hat and cucumber–pants for infants).
For toddlers, we found significant proportions of fixations to the target picture for both correctly and incorrectly stressed target names. This finding somewhat contrasts with the proportion of fixation results of the study by
Campbell et al. (
2019), in which 17-month-old English-learning toddlers did not show robust target recognition for incorrectly stressed target names. Apart from the different target languages of participating children in both studies, the toddlers of our study were at least 6 months older than the formerly tested toddlers. Furthermore, we applied a somewhat different analysis strategy than Campbell et al. They calculated the proportion of fixation for each single trial (T/(T + D)), while we included two trials for the same picture pair in the calculation of the proportion of fixations (see Data Analysis). In this way, we considered fixations to the target picture when it was the target and when it was the distractor. We did so in accordance with the procedure of
Bergelson and Swingley (
2012) to correct for picture preference. Thus, toddlers’ diverging fixation proportions of the target picture when hearing its incorrectly stressed name in the study by Campbell et al. and our study could be due to (i) the different target languages, (ii) our older sample, or (iii) the different dependent measure used to calculate the fixation proportion. Interestingly, the growth curve analysis by Campbell et al. indicated that, in their study as well, the toddlers recognized the target when hearing its incorrectly stressed name. From this we might conclude that the calculation of fixation proportions (PIs) applied here is more sensitive in detecting subtle differences between the two conditions than the single-trial proportion applied by Campbell et al., at least for the toddlers. Nevertheless, both studies showed that toddlers reliably distinguish between correctly and incorrectly stressed words.
As in the study by
Campbell et al. (
2019), the results of our mixed effect models and the GCA differed. In our study, the mixed models revealed a stress effect across infants and toddlers, while the GCA for the single groups revealed a stress effect on the intercept only for toddlers but not for infants. Our divergent findings might be due to the different dependent measures included in the mixed effect models and the GCA. The PI entering the mixed effect models combined the data of two trials. It takes the picture preference into account by relating the looking behavior across two trials with the same display, once when the word named the target and once when the word named the distractor (see
Section 2.4). In contrast, single trials were included in the GCA. The different results might mean that the infants’ data are not that robust. However, we interpret the data as follows: although the infants show fragile word comprehension, it may be that for the (few) words they understand, they use the syllable stress information immediately. That is, we argue that the few words that infants understand determine the tiny stress effects in the item analysis and in the GCA. This would be in line with the assumption that infants’ word representations already contain very detailed information (see, for example,
Swingley, 2003, for evidence that young children encode phonetic details in words).
Turning to the different time terms, the infants and toddlers tested here appeared to use syllable stress incrementally in real-time word comprehension like adults do (
Connell et al., 2018;
Jesse et al., 2017;
Reinisch et al., 2010; and
Zahner et al., 2019). This was indicated by a linear increase in the stress effect in the GCA for both infants and toddlers. This finding is in line with
Campbell et al.’s (
2019) results for 17-month-olds; these authors also reported interactions between word stress and time terms. They interpreted a positive interaction of word stress and the linear time term as indicating that the increase in looking behavior was steeper for correctly compared to incorrectly stressed words. Similarly, they interpreted the negative interaction of word stress and the quadratic time term as indicating less bend in the fixation curve for correctly compared to incorrectly stressed words. From that, the authors concluded that the looking behavior was faster and increased through the whole trial for correctly compared to incorrectly stressed words. The tested infants and toddlers of our study showed the same pattern of results, except that toddlers, but not infants, additionally showed an effect of stress on the intercept.
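To make the roles of the intercept, linear, and quadratic time terms concrete, the following Python sketch builds orthogonal polynomial time terms and projects two fixation curves onto them. It is a simplified illustration, not the analysis reported here: an actual growth curve analysis fits a mixed-effects model with random effects for participants and items, which this sketch omits. The function names are hypothetical, and the two curves are invented values chosen only to show the pattern.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonal_time_terms(n_bins):
    """Return unit-length linear and quadratic time terms.

    Both terms sum to zero and are orthogonal to each other, so each
    coefficient can be read independently of the others.
    """
    if n_bins < 3:
        raise ValueError("need at least three time bins for a quadratic term")
    t = [float(i) for i in range(n_bins)]
    mean_t = sum(t) / n_bins
    linear = [x - mean_t for x in t]
    squared = [x * x for x in linear]
    mean_sq = sum(squared) / n_bins
    quad = [s - mean_sq for s in squared]
    # Remove any remaining overlap with the linear term (Gram-Schmidt step).
    shared = dot(quad, linear) / dot(linear, linear)
    quad = [q - shared * l for q, l in zip(quad, linear)]

    def unit(v):
        norm = math.sqrt(dot(v, v))
        return [x / norm for x in v]

    return unit(linear), unit(quad)


def fit_curve(curve, linear, quad):
    """Least-squares intercept, linear, and quadratic coefficients.

    Because the time terms are centered and orthonormal, the intercept is
    the curve's mean and each slope is a simple projection.
    """
    intercept = sum(curve) / len(curve)
    return intercept, dot(curve, linear), dot(curve, quad)


# Invented target-fixation proportions over five time bins (illustration only).
correct = [0.50, 0.55, 0.62, 0.70, 0.76]
incorrect = [0.50, 0.52, 0.55, 0.57, 0.60]

linear, quad = orthogonal_time_terms(len(correct))
_, slope_correct, _ = fit_curve(correct, linear, quad)
_, slope_incorrect, _ = fit_curve(incorrect, linear, quad)
print(slope_correct > slope_incorrect)  # True
```

In these terms, a stress effect on the intercept corresponds to a difference in the overall height of the two curves, and a positive stress-by-linear interaction corresponds to a larger linear coefficient for the correctly stressed condition.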
Our finding that German-learning infants and toddlers use syllable stress for word comprehension is interesting, because they do not necessarily have to do so given their linguistic environment. To our knowledge, there is no minimal stress pair differentiating the meaning of early acquired words in German (as is the case for words acquired later, like “AU.gust”, a male given name, and “au.GUST” for the month August). Thus, stress in infancy and toddlerhood is not (yet) lexically contrastive, and for early learned words, it would be sufficient to rely on phonemes without taking stress into account. Moreover, early acquired words in German (at least disyllabic nouns) have a bias towards carrying stress on the first syllable, which may further lead children to neglect stress information. This bias is also reflected in the production errors of older children. As soon as toddlers begin to produce their first words, they often omit the first syllable of words that begin with an unstressed syllable and just start these words with the stressed syllable (e.g., “del.FIN” to “FIN”; “ele.FANT” to “FANT”; and “ba.NA.ne” to “NA.ne”). That infants and toddlers use syllable stress nonetheless may show that children take further characteristics of their language into account. For example, since German is a language in which stress can vary between syllables, this fact could lead children to process the different stress of the syllables, even if this is not necessary for the first disyllabic nouns they learn.
In German, as in other languages with variable stress, syllable stress is transmitted through syllable length, intensity, and pitch (
Cutler & Jesse, 2021). Here, we used materials produced by a professional human speaker, whom we asked either to stress the target word correctly or to produce a version stressed on the second syllable rather than on the first. We considered naturally spoken material to be more suitable for testing natural speech processing in very young children than computerized speech. In this way, however, we were unable to control or ensure the variation in all acoustic parameters. As a result, the speaker realized our intended stress manipulation not via different pitch but via different intensity values of the initial syllables of the correctly stressed words and their incorrectly stressed versions. In addition, timing of maximal pitch and maximal intensity differed (see
Section 2.2.2). Thus, our results indicate that children can rely on only one or two cues (here, intensity and timing) to process different stress patterns of syllables. This could also mean that we have underestimated the stress effect, as it would have been stronger if the length of the first syllable and the pitch values had been different between correctly and incorrectly stressed words.
A weakness of our material is the dominance of words whose nucleus of the second syllable is typically reduced to schwa in running speech (see
Section 2.2.2). Strictly speaking, it is not possible to stress such syllables without changing the vowel quality of the second syllable to a full vowel (typically “e” in German). We instructed the speaker to produce the incorrectly stressed words as naturally as possible (see OSF link at the beginning of
Section 2). Nevertheless, the stress effect could in part be influenced by the violation of the vowel quality of the second syllable. However, this applies only to the average fixations to the target object. The initial temporal dynamics of target fixations, on the other hand, basically reflect the processing of the first syllables, which only carried full vowels. These early effects, which were evident in infants and toddlers, should therefore not be influenced by a possible variation in vowel quality.
Finally, we have to consider the function of syllable stress beyond word recognition. Even in languages allowing variation in the position of the stressed syllable, like German, English, or Dutch, a specific syllable position within words might receive stress more often than other syllable positions (
Cutler & Carter, 1987;
Domahs et al., 2008). Speech directed at children learning German even contains 97 percent of words that carry stress on their initial syllable (
Stärk et al., 2022). Infants extract a predominant stress pattern in their target language already within the first few months of life (
Friederici et al., 2007;
Höhle et al., 2014), develop a preference for this typical stress pattern, and appear to use it to find word boundaries (i.e., for speech segmentation;
Jusczyk, 1999). Consequently, infants learning German prefer trochaic over iambic words (
Höhle et al., 2009) and expect words to start with a stressed syllable (
Marimon et al., 2024). Hence, our results might be modulated by infants’ preference for trochaic patterns and their respective segmentation attempts.
Here we argue that the present data might not simply be interpreted in terms of the children’s preference for the canonical stress pattern or their hampered segmentation attempts due to the incorrectly stressed first syllables of the target words. To rule out that the children in our study generally looked more at the screen when they heard a (trochaic) correctly stressed word compared to an (iambic) incorrectly stressed word, we analyzed looking towards the whole display. There were no significant differences in the looking behavior to the whole display when children heard a correctly stressed compared to an incorrectly stressed word, neither for infants (Wilcoxon signed rank test, V = 985,
p = 0.184) nor for toddlers (V = 260,
p = 0.202). Furthermore, to rule out that children rely on syllable stress for speech segmentation, we inserted a pause of approximately 200 milliseconds between the offset of the sentence and the onset of the target word. Infants weigh pauses heavily as segmentation cues (
Seidl & Johnson, 2006). Thus, the pause may have prevented the participants from being distracted by the unstressed syllables at the beginning of the word (in the case of the incorrectly stressed words) during speech segmentation. Nonetheless, future research needs to investigate the extent to which delayed target fixation times for incorrectly stressed words reflect mismatch effects with the typical prosodic template in a given language or segmentation difficulties, at least in infants.
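For readers unfamiliar with the V statistic reported above, the following Python sketch shows how such a value can be computed for paired data. It assumes the convention used by R's `wilcox.test`, where V is the sum of the ranks of the positive differences; the text does not state which software produced the reported values, so this is an assumption. The function name and the example numbers are hypothetical, and the sketch computes only the statistic, not the p-value.

```python
def wilcoxon_v(x, y):
    """Sum of ranks of positive paired differences (zero differences dropped)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Order the indices of the differences by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


# Hypothetical whole-display looking times per child (arbitrary units).
correct_stress = [10, 12, 9, 15]
incorrect_stress = [8, 13, 9, 11]
print(wilcoxon_v(correct_stress, incorrect_stress))  # 5.0
```

In the example, the non-zero differences are 2, -1, and 4, which receive ranks 2, 1, and 3; the positive differences therefore contribute 2 + 3 = 5.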
In sum, we conclude that word representations in early childhood code for syllable stress, and that German-learning infants and toddlers consider syllable stress information immediately while processing familiar words. However, it should not be overlooked that our study showed a limited reproducibility of word comprehension in infants, which leaves open alternative interpretations of our results, especially for this age group.