Article

Very Young Children Learning German Notice the Incorrect Syllable Stress of Words

by Ulrike Schild * and Claudia Katrin Friedrich
Department of Psychology, University of Tübingen, D-72076 Tübingen, Germany
* Author to whom correspondence should be addressed.
Languages 2025, 10(8), 197; https://doi.org/10.3390/languages10080197
Submission received: 20 December 2024 / Revised: 5 June 2025 / Accepted: 7 August 2025 / Published: 18 August 2025
(This article belongs to the Special Issue Advances in the Acquisition of Prosody)

Abstract

Syllable stress can help to quickly identify words in a language with variable stress placement like German. Here, we asked at what age incorrect syllable stress impairs language learners’ attempts to assign meaning to familiar words. We recorded the looking times of young children learning German aged from 4 to 15 months (infants, N = 69) and 2 to 4 years (toddlers, N = 28). Participants saw displays of two pictures (e.g., a car and a baby); one of the two objects (the target) was named. The disyllabic name of the target was either correctly stressed on the first syllable (“BA.by”) or incorrectly stressed on the second syllable (“ba.BY”). On average, all children looked more at the target when they heard its correctly stressed name (compared to the incorrectly stressed name). Furthermore, the analyses of growth curves for all children showed a steeper increase in looking time at the target picture when children heard the correctly stressed target’s name compared to the incorrectly stressed name. These results thus suggest that even very young German-learning children use syllable stress for incremental word-meaning mapping. However, separate post hoc analyses revealed robust differences in overall target fixations only in toddlers but not in infants. The stronger effects in toddlers compared to infants could be related either to the growing vocabulary or to the increasing sensitivity to word stress with increasing age.

1. Introduction

In spoken languages, variation in speech sounds typically takes place at different phonological levels. At a segmental level, phonemes and smaller phonological units are relevant, while larger units such as syllables play a role at a suprasegmental level. Languages such as Dutch, German, English, Italian, or Spanish, for example, distinguish stressed syllables and unstressed syllables, with stressed syllables being longer, louder, and higher in pitch than unstressed syllables (Cutler & Jesse, 2021). Native adult listeners of these languages use these suprasegmental cues immediately to facilitate online recognition of spoken words (Connell et al., 2018; Cutler & van Donselaar, 2001; Friedrich et al., 2004; Jesse et al., 2017; Reinisch et al., 2010; Schild et al., 2014; Soto-Faraco et al., 2001; Sulpizio & McQueen, 2012; van Donselaar et al., 2005; Zahner et al., 2019). For example, when hearing the stressed initial syllable of the word admiral (AD.mi.ral; capital letters denote the stressed syllable, dots the syllable boundaries), English adults immediately fixate the written version of admiral even in the presence of a written version of the segmental competitor, admiration (ad.mi.RA.tion; Jesse et al., 2017). Here, we test whether German-learning infants and toddlers already use suprasegmental cues immediately for word recognition.
Typical syllable stress patterns give languages such as English, Dutch, Spanish, or German a suprasegmental regularity that infants already recognize in their first months of life. In English, for example, 90% of all words follow a trochaic pattern, meaning that they are disyllabic, with the stress being placed on the first syllable (Cutler & Carter, 1987). A strong tendency towards trochees can also be seen in German, Spanish, and Dutch (Cutler & Jesse, 2021; Domahs et al., 2008; Frota et al., 2020). In German, for example, 73 percent of disyllabic words are trochaic (Domahs et al., 2008). Moreover, in speech directed to infants learning German, as many as 97 percent of words carry stress on their initial syllable (Stärk et al., 2022). Syllable stress thus provides a reliable cue to speech segmentation (Cutler & Carter, 1987), which infants indeed use early in development (e.g., Höhle et al., 2001; Houston et al., 2000). When listening to spoken materials, infants extract the predominant stress pattern of their target language within the first few months of life (Becker et al., 2018; Friederici et al., 2007; Höhle et al., 2014). Later, during their first year, infants develop a preference for the typical stress pattern and appear to use it to find word boundaries (i.e., for speech segmentation, Jusczyk, 1999; for children learning English, see Jusczyk et al., 1999; for children learning Dutch, see Junge et al., 2012; Kuijpers et al., 1998; and for children learning German, see Höhle et al., 2009; Marimon et al., 2024).
In this study, we investigate whether young children learning German also use syllable stress when they assign meaning to spoken words. So far, infants’ ability to use and encode syllable stress for word-meaning mapping was tested with the offline habituation-switch technique applied in word learning studies (Curtin, 2009, 2010, 2011). Across those experiments, 12- to 14-month-olds were habituated to novel word–novel object pairings. For example, a meaningless label like “BE.do.ka” was paired with novel object A, and another meaningless label like “be.DO.ka” was paired with another novel object B. These novel word–novel object pairings were repeatedly presented until the children’s looking times fell below a certain threshold. Dishabituation was then tested either with switch trials in which the pairing was reversed (“BE.do.ka” paired with object B and “be.DO.ka” paired with object A, Curtin, 2009) or with switch trials in which the position of the stressed syllable varied (e.g., “do.BE.ka” paired with object A; Curtin, 2011). Across all experiments, infants dishabituated during the switch trials. That is, they looked at the screen again for longer than during the last habituation trial. Similar results were found even when the infants did not have to store the stress of two novel word–novel object pairings during the habituation trials. Thus, when they were trained with only one novel word–novel object pair, e.g., “BE.do.ka” paired with a novel object A, they dishabituated when the stress patterns differed in the test trial, e.g., “be.DO.ka” paired with the same object A (Curtin, 2011).
In the present study, we test whether infants use syllable stress during the time course of matching a presumably known word when it is correctly vs. incorrectly stressed with its respective referent. In contrast to previous studies using the habituation-switch paradigm, we measure online word processing as reflected in children’s eye movements. Crucially, previous studies measuring eye movements in adult native listeners of a language with variable stress have already shown that this measure reflects the use of syllable stress as soon as it is available in the acoustic signal (for native English adults, see Jesse et al., 2017; Reinisch et al., 2010; for native Italian adults, see Sulpizio & McQueen, 2012; and for native German adults, see Zahner et al., 2019).
To apply eye tracking with our young participants, we use a paradigm that is well established to test young children’s word-meaning mapping. This paradigm is referred to as either the Looking-While-Listening (LWL) paradigm (Fernald et al., 2008; Golinkoff et al., 2013, also used in this study), language-guided-looking paradigm (Bergelson & Swingley, 2012, 2015, 2018) or intermodal/cross-modal preferential looking paradigm (Kartushina & Mayor, 2019). In the LWL paradigm, young participants see two images on a screen (e.g., a baby and a car). At the same time, they hear an utterance or a spoken word naming one of both pictures (e.g., “Look at the baby”). Participants’ fixations to the named target picture are used to test whether they have assigned meaning to the spoken input. Results of some researchers using this paradigm suggested that infants as young as 6 months old show a robust comprehension for many words (Bergelson & Aslin, 2017a; Bergelson & Swingley, 2012, 2015, 2018; Rocha et al., 2024; Rosslund et al., 2023; and Tincoff & Jusczyk, 1999, 2012). However, results of other studies indicated a somewhat later onset of the first significant word comprehension at around nine months or even later (Beech & Swingley, 2023; Bergelson, 2020; Kartushina & Mayor, 2019; Steil et al., 2021; and Syrnyk & Meints, 2017).
The fixations in the LWL paradigm already proved to be a useful tool to investigate incremental word recognition in toddlers. This means that the time course of fixations indicates that even very young children do not wait until they have heard a whole word, but instead immediately consider referents of words that match even fragmentary speech input. For example, 18-month-olds looked to the correct referent like a baby while hearing only the initial part of the referent’s name, like “BA” taken from “BA.by” (Fernald et al., 2001). In addition, children also consider word candidates that only partially match the input. Twenty-four-month-olds took longer to fixate the target picture (e.g., a dog) when the target’s spoken form overlapped in initial phonemes with the label of a distractor picture (e.g., a doll) than when there was no phonological overlap between the names of both pictures (e.g., dog and tree, Swingley et al., 1999). In the present study, we want to find out whether very young children also use stress information incrementally.
Since the seminal study by Swingley and Aslin (2000), the LWL paradigm has been frequently exploited to test infants’ and toddlers’ sensitivities to the mispronunciation of segments. In those studies, children hear spoken labels that vary in a single segment from the canonical pronunciation of the target’s name, such as “VA.by” instead of “BA.by”. A recent meta-analysis by Von Holzen and Bergmann (2021) found that children under 31-months-of-age typically look less at the target picture when the target’s name is mispronounced (note that among the 32 included papers was also a study that tested 19-month-old German-speaking children, Höhle et al., 2006). Nevertheless, across studies, the very young children looked at the target more than at the distractor when the word was mispronounced. With increasing numbers of mispronounced features (ranging from one to three feature changes) the effect of impaired target fixations increased. Sensitivity to mispronunciation was modulated by its position within the word, with the largest effects for onset mispronunciations and the smallest for coda mispronunciations. The meta-analysis also revealed that age did not modulate target identification. However, only 1 out of 32 included papers included infants younger than 12 months.
Toddlers learning English appear to handle misplaced stress comparable to segmental mispronunciation. In a recent study by Campbell et al. (2019), 17-month-olds heard either a canonically (i.e., correctly) stressed version of a word (e.g., “BA.by”) or an incorrectly stressed version of that word (e.g., “ba.BY”) while viewing two pictures on the screen (e.g., a baby and a chicken). The average number of fixations from word onset to two seconds thereafter showed that the toddlers only recognized the target when the target’s name was correctly stressed, but not when it was incorrectly stressed (as indicated by target fixation above chance). Nevertheless, the growth curve analysis of Campbell et al. suggested that participants’ fixations were attracted towards the target picture by the correctly and the incorrectly stressed version of the target word’s name. However, the estimated value for the correctly stressed word was double the size and developed faster than that of the incorrectly stressed word. Together, these results show that English-learning children at the end of their second year of life use stress as an important cue for incremental word processing.
In the present study, we investigate whether even very young German-learning infants use stress cues (incrementally) for word-meaning mapping. To this end, we test German-learning infants between 4 and 15 months old who are just learning their first words. That is, we do not assume that all infants already know all the words used in the experiment, but that they are in their initial steps of building a vocabulary (e.g., Bergelson & Swingley, 2012, 2015, 2018). To obtain a rough impression of the tested infants’ understanding of the words, we will ask the infants’ parents to indicate their infants’ knowledge of the tested words. As a control group, we test 2- to 4-year-old toddlers who already should know all the words used in our experiment. Like in the study of Campbell et al. (2019), children see displays of two pictures (e.g., a car and a baby) while hearing a spoken noun referring to one of both objects (the target, e.g., “Baby”). The disyllabic name of the target is either correctly stressed, i.e., with stress on the first syllable (“BA.by”), or it is incorrectly stressed, i.e., with stress on the second syllable (“ba.BY”).
Next to the different target languages, the present study differs from the study by Campbell et al. (2019) in three methodological aspects. First, we keep the distractor pictures constant during the presentation of the correctly and incorrectly stressed target names. In the study by Campbell et al., the target image was presented along with a different distractor, respectively. For example, when the children heard “BA.by”, they saw a picture of a baby and a picture of a dinosaur; meanwhile, when they heard “ba.BY”, they saw a picture of a baby and a picture of a chicken. This procedure cannot control different picture preferences for different distractors. Here, we follow the approach of presenting the same target–distractor pairs in the trials to be compared. Second, whereas Campbell et al. calculated the proportion of fixation for each single trial (fixation to the target divided by fixation to the whole display), we base our analysis on proportion indices (see Bergelson & Swingley, 2012). That is, we include both trials for the same picture pair into the calculation. In addition to the same displays, the proportion indices further correct for possible picture preferences. Third, we include more objects (28 pictures) as compared to Campbell et al. (six objects), being targets and distractors, respectively. This results in 56 trials in our study compared to 12 trials in the former one.
With our study, we intend to investigate the developmental trajectory of the use of stress cues in incremental word-meaning mapping across infancy and toddlerhood of children learning German. Therefore, our study also differs from the earlier study by Campbell et al. (2019) in the age of the children tested. Whereas previously a group of 17-month-old children was tested, here we target (i) a younger and (ii) an older age group of young children. (i) With the younger age group (4 to 15 months old), we aim to investigate when infants start using stress cues for comprehending their first words (even if they may not yet understand all the words used in the experiment). (ii) We tested toddlers older than those in Campbell et al. (2019; 17-month-olds) to be sure that the toddlers knew all the tested words. With the older age group (two to four years old), we intend to substantiate the former finding that toddlers use word stress and to replicate this for another language, namely German.
Our assumptions are as follows: first, word comprehension increases with age. This should be reflected in the main effect of age for the looking proportions to the named target pictures in our LWL paradigm. Second, very young children use syllable stress for incremental word-meaning mapping. This should be reflected in the main effect of stress, with incorrectly stressed target names (compared to correctly stressed ones) being recognized less or not at all (i.e., looking proportions at random level). If word comprehension and the use of syllable stress are becoming more stable with increasing age, this should be reflected in the interaction of stress and age reflected in the looking proportions. In other words, with increasing age, the difference in looking times when hearing the correctly vs. the incorrectly stressed words should increase. From that result, we would conclude that the use of syllable stress follows first word comprehension. Finally, we explore whether German-learning infants and toddlers immediately use syllable stress cues while hearing the spoken word by a growth curve analysis.

2. Materials and Methods

Audio files, pre-processed data, and R scripts are available on the Open Science Framework (OSF): https://osf.io/azjqe/ (accessed on 1 August 2025).

2.1. Participants

All children were raised in monolingual contexts with more than 90% German spoken at home (Byers-Heinlein, 2015). Children born after the 37th week of pregnancy were included in the study (pre-specified participation criterion). The study was approved by the Ethics Committee for Psychological Research at the Faculty of Science at the University of Tübingen (Friedrich_2018_1025_139, Bauch_2021_0726_234).
Recording for this study started in 2018 and 2019, when we tested 47 infants with a laptop and a portable 60 Hz eye tracker (Tobii X2-60 Compact®, Tobii Technology AB, Stockholm, Sweden). A total of 34 of these data sets entered analysis. Thirteen infants that contributed no proportion index (PI) for both conditions were excluded. We stopped recordings from 2020 to 2022 due to the coronavirus pandemic and corresponding laboratory closures due to contact restrictions. In 2023, we tested 54 infants with a stationary 300 Hz eye tracker (Tobii TX 300®, Tobii Technology AB, Stockholm, Sweden), of whom 35 entered analysis. Four infants were accidentally tested twice; for them, only their first data set entered the analysis. Fifteen infants contributed no PI for both conditions. The final sample included 69 infants (39 girls) with a mean age of 297 days (range 132–461, SD 85).
In 2022 and 2023, we tested 42 toddlers with the 300 Hz eye tracker, of whom 28 entered analysis (11 female; age range 23–46 months, mean 35.5, SD 6.2; see Figure 1 for the age distribution). One toddler was tested twice; for her, we only included the first recording. Finally, thirteen toddlers contributed no PI for both conditions. Overall, the attrition rate in this study was 33%. This lies within the very wide range reported for infant studies: in a systematic review, Slaughter and Suddendorf (2007) reported an average attrition rate of 13.7% in infant studies using visual paradigms, with rates for single studies varying between 0 and 62%.

2.2. Materials

2.2.1. Questionnaires

The parents answered a demographic questionnaire with questions about their gender, age, and language; the gender, age, and language of their child; premature vs. full-term birth of their child; and other demographic data (e.g., family income and partner’s education). In addition, the parents of the infants (but not parents of the toddlers) completed a vocabulary questionnaire in which they were asked to estimate their child’s understanding of words. That is, parents were asked to indicate whether they think their child (i) understands (see Table 1, second column) (ii) and speaks a certain word and (iii) how often their child heard that certain word (Likert-scale, 1 = rarely to 5 = several times a day). The vocabulary questionnaire comprised 38 disyllabic German nouns, among them were the 28 target words used in the experiment.

2.2.2. Stimuli

Twenty-eight disyllabic, monomorphemic words were used as stimuli (see Table 1, first column). All words had their canonical stress on the first syllable. Words were spoken by a professional monolingual German speaker, once with the canonical stress pattern (“BA.by”) and once with an incorrect stress pattern (“ba.BY”). We instructed the speaker to produce the words in a child-directed way. In addition, we instructed the speaker to produce the correct and incorrect pronunciation as naturally as possible and to realize the same vowel quality for correctly and incorrectly stressed words. This is particularly important, as we mostly included words for which the second syllable might be reduced (see Table 1, all words except for BA.by and AU.to). Most of the presented words could thus potentially entail a schwa in their second syllable. Our somewhat biased selection is caused by the fact that there are only a few early-learned words in German that do not contain an “e” in their unstressed syllable and for which it is thus not possible to reduce the vowel of the unstressed syllable. We think that the vowel quality of the nucleus of the second syllable for words that might potentially end with a schwa was not heavily changed from the correctly to the incorrectly stressed versions. To give interested readers an idea of whether the words sound strange, we make our materials available online (see the OSF link at the beginning of Section 2).
We analyzed some acoustic parameters to get an idea of how the speaker implemented incorrect stress placement. The length of correctly stressed and incorrectly stressed words differed significantly, t(27) = −10.96, p < 0.001. Correctly stressed words (M 830 ms, SD 25 ms, range 601–117 ms) were shorter in duration than incorrectly stressed words (M 1109 ms, SD 19 ms, range 952–1310 ms). For correctly stressed words, the duration of the first syllable (M 486 ms, range 315–644 ms) and the second syllable (M 344 ms, range 205–589 ms) differed, t(27) = 6.73, p < 0.001. However, the duration of the first syllable (M 590 ms, range 380–884 ms) and the second syllable (M 519 ms, range 251–847 ms) was not significantly different for incorrectly stressed words, t(27) = 1.61, p = 0.12. Both the first and the second syllables were longer for incorrectly stressed words than for correctly stressed words, both t(27) < −4.55, both p < 0.001. This indicates that the speaker lengthened the whole incorrectly stressed word and, in particular, the second syllable, which should receive the stress (contrary to the canonical form). Figure 2 displays the mean intensity and mean pitch of initially stressed and initially unstressed words per syllable. All intensity measures (first, maximal, and last) differed when both syllables of the correctly and incorrectly stressed words were compared (all p < 0.002, except for the last intensity measure for the second syllable). The pitch measures did not differ significantly between initially stressed and initially unstressed words for either the first or the second syllable. The timing differences in the maximal values of the first and second syllable, respectively, were significant for intensity, both p < 0.032, and for pitch, both p < 0.012. The maximum was reached later in the first syllables of the correctly stressed words (compared to the first syllables of the incorrectly stressed versions). Conversely, the maximum was reached later in the second syllables of the incorrectly stressed versions (compared to the second syllables of the correctly stressed words). Thus, intensity and fundamental frequency each peaked later in the stressed syllables.
In one display, two of the 28 pictures were presented (yoked pairs: baby–bird*, bottle–hat, brush–diaper, car–plate, cheese–key, cucumber–pants, cup–glasses, mup–tractor, doll–cat, finger–hair, flower–nose, fork–soup, rabbit–beetle, and spoon–pillow; for (different) item difficulties, see Table 1). Most of the items were taken from a parental questionnaire for the word comprehension of one-year-olds (ELFRA, Grimm & Doil, 2006, see Table 1, third column). We also wanted to confirm our word choices via Wordbank (Frank et al., 2017). However, the German sample only starts at 18 months and only has production ratings. To have a better age match to our tested 4–15-month-old infants, we therefore added the (American) English sample at the age of 8 months (see Table 1, 4th column) and the (British) English sample at the age of 12 months (see Table 1, 5th column), both with understanding ratings. Our choices resulted in 14 yoked picture pairs. Care was taken to ensure that there were no phonological overlaps between the word beginnings of the names of both pictures in a pair. In addition, we tried to combine pictures from different categories (bath, kitchen, body, and animal) within a pair. However, out of the 14 pairs, 4 fell into the same category (brush–diaper, finger–hair, fork–soup, and rabbit–beetle). Picture pairs were presented four times so that each picture was the target (T) twice (once paired with its correctly stressed name and once paired with its incorrectly stressed name) and the distractor (D) twice. In total, the experiment contained 56 trials (14 picture pairs × 4 times). Half of the children always saw a given picture on the same side (left or right); the other half saw the pictures on the opposite sides. We pseudorandomized the stimuli. The experiment consisted of 4 blocks. In each block, all 14 different displays were presented, so that no display was repeated within a block.
We created three further lists: in the second list, trials were presented in the reversed order of the first list. In the third list, the first half of the stimuli was presented in the second half and vice versa. Finally, in the fourth list, trials were presented in the reversed order of the third list.
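The construction of the four presentation lists can be sketched as follows. This is a minimal illustration in Python, not the authors' script (their materials are R scripts on OSF); the function name `build_lists` and the use of integer trial labels are assumptions made for the example.

```python
def build_lists(base_order):
    """Derive four presentation lists from one pseudorandomized trial order.

    base_order: sequence of trial identifiers (hypothetical labels).
    List 2 reverses list 1, list 3 swaps the two halves of list 1,
    and list 4 reverses list 3, as described in the text.
    """
    list1 = list(base_order)
    half = len(list1) // 2
    list2 = list1[::-1]                    # reversed order of list 1
    list3 = list1[half:] + list1[:half]    # second half first, then first half
    list4 = list3[::-1]                    # reversed order of list 3
    return [list1, list2, list3, list4]


# Example with 56 placeholder trial numbers (14 picture pairs x 4 presentations)
lists = build_lists(range(56))
```

Each derived list contains the same 56 trials, so the lists differ only in the position at which a given trial occurs.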

2.2.3. Looking-While-Listening (LWL) Experiment

The experiment was implemented with Presentation® (Neurobehavioral Systems Inc., Albany, CA, USA). It was presented either on an ASUS notebook (17.3 inch, screen resolution of 1920 × 1080 pixels), when eye movements were recorded with the Tobii X2–60 Hz, or on a TFT monitor (23 inch, 1920 × 1080 pixels), when eye movements were recorded with the Tobii TX 300 Hz (for details, see description of the sample). The trial scheme of the LWL paradigm is displayed in Figure 3. Each trial started with a flashing dot lasting 1 s (1920 × 1080 pixel). Then two pictures appeared simultaneously on the screen, followed by the carrier sentence starting at 1100 ms after picture onset (for masculine nouns, either “Schau zum …” (“Look at …”) or “Schau mal ein …” (“Take a look, a …”); for feminine nouns, either “Schau zur …” or “Schau mal eine …”; and for the pair finger–hair, “Schau …” (“Look …”)). Sentences were, on average, 1755 ms long. Note that there was a pause of approximately 200 ms between the offset of the carrier sentence and the start of the target word. The target word started 3 s after picture onset. After the target word onset, the picture pair remained on the screen for another 4 s. After every fourth trial, one of four attention catchers (as provided by Tobii, e.g., video of a bee) was presented for 1 s.

2.3. Procedure

Some of the children tested in 2018 and 2019 were recorded at home. All other infants and toddlers were tested in our lab. First, the parents were informed about the study and gave written informed consent. Depending on the child’s mood, the parents completed the questionnaire either at the beginning, during a pause, or at the end of the visit. During the experiment, the infants were seated on their parent’s lap, approximately 60 cm away from the screen. Parents were told to close their eyes. The toddlers were seated in a highchair, and the parents sat on one side next to their child, outside of the eye tracker zone. The experimenter sat on the other side of the child. We asked the parents not to point at the screen or say anything about the pictures and words presented. Before the start of the experiment, the five-point infant calibration, which is implemented in Tobii Studio, was performed. After the LWL experiment, the parents changed their child’s diaper, and the auditory exchange during this diaper-changing interaction was recorded for another research question covered elsewhere (work in progress). Before the farewell, families received a present for their participation (a book or a voucher for a book shop).

2.4. Data Analysis

We analyzed all data with R software (version 4.3.2, R Development Core Team, 2021). Considering the saccadic reaction time, we analyzed all eye movements within a 3.5 s time window starting 366 ms after word onset. All eye movements towards the left side of the screen and all eye movements towards the right side of the screen entered analysis. Trials with fewer than 12.5% recorded sample points within the 3.5 s analysis window were excluded. To be included in the analysis, participants had to contribute at least one proportion index (PI; its calculation is described below). To prevent the influence of picture preferences, one PI was calculated for each display (containing pictures $i$ and $k$; Bergelson & Swingley, 2012). The PI relates the looking times to both pictures in a pair over the two trials in which either one or the other picture was named. Four looking times thus entered the calculation: the looking times to each picture when it was the named target ($T_i$ and $T_k$) and the looking times to each picture when it was not named, that is, when it was the distractor ($D_i$ and $D_k$). The PI is a difference of two proportions. The minuend is the looking time to picture $i$ when it was the target ($T_i$), divided by the summed looking time to both pictures in that trial ($T_i + D_k$). The subtrahend is the looking time to picture $i$ when it was the distractor ($D_i$), divided by the summed looking time to both pictures in that trial ($T_k + D_i$). This resulted in the following equation:
$$\mathrm{PI}_{T_i} = \frac{T_i}{T_i + D_k} - \frac{D_i}{T_k + D_i}$$
PIs were calculated separately for trials in which the correctly stressed names of the targets were played and for trials in which the incorrectly stressed names of the targets were played. The PI could range from −1 to +1. Values above zero indicate that children looked more at a picture when they heard its label than when they heard the label of the other picture, thereby reflecting word comprehension.
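As a worked illustration of the proportion index, the following minimal Python sketch computes the PI for one picture of a yoked pair. It is not the authors' R code; the function name, the argument names, and the choice to return `None` when a trial has no looking data are assumptions made for the example.

```python
def proportion_index(t_i, d_k, d_i, t_k):
    """Proportion index for picture i in a yoked pair (i, k).

    t_i: looking time to picture i in the trial where i was named (target)
    d_k: looking time to picture k in that same trial (distractor)
    d_i: looking time to picture i in the trial where k was named (distractor)
    t_k: looking time to picture k in that same trial (target)
    """
    named_total = t_i + d_k      # both pictures, trial in which i was named
    unnamed_total = t_k + d_i    # both pictures, trial in which k was named
    if named_total == 0 or unnamed_total == 0:
        return None  # assumption: no PI without looking data in both trials
    return t_i / named_total - d_i / unnamed_total


# Hypothetical looking times in ms: 1500/(1500+500) - 800/(1200+800) = 0.35
pi_value = proportion_index(t_i=1500, d_k=500, d_i=800, t_k=1200)
```

Because the same picture pair appears in both trials, a general preference for one picture raises both proportions and largely cancels out in the difference.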
First, to test for word comprehension, we conducted non-parametric Wilcoxon tests (one-sided, against chance or between conditions) for the PI-by-participants, calculated over all stimulus pairs, and the PI-by-items (i.e., stimulus pair), calculated over all participants.
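To make the test statistic concrete, the sketch below computes the signed-rank statistic V (the sum of the ranks of positive differences from the reference value) and an exact one-sided p-value for a small sample. It is an illustration in plain Python under stated assumptions, not the authors' analysis: the function name is hypothetical, zeros are dropped, ties receive average ranks, and the p-value is exact conditional on the observed ranks. A statistics package may instead use a normal approximation, for example with ties or larger samples, so its p-values need not match this sketch.

```python
def signed_rank_v(values, mu=0.0):
    """Return (V, p) for a one-sided signed-rank test of location > mu."""
    diffs = [v - mu for v in values if v != mu]   # differences of zero are dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda j: abs(diffs[j]))
    ranks2 = [0] * n   # doubled ranks, so that tied (average) ranks stay integers
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        doubled = (pos + 1) + (end + 1)           # 2 x average of the 1-based ranks
        for j in range(pos, end + 1):
            ranks2[order[j]] = doubled
        pos = end + 1
    v2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    # Count sign assignments by their doubled positive-rank sum (dynamic programming)
    total = sum(ranks2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    p = sum(counts[v2:]) / 2 ** n                 # P(V >= observed V) under H0
    return v2 / 2, p


# Hypothetical PIs for four items: three positive, one small negative value
v, p = signed_rank_v([0.1, 0.2, 0.3, -0.05])
```

With these four hypothetical values, only two of the sixteen possible sign patterns yield a rank sum at least as large as the observed one.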
Second, to test for age and stress effects on the dependent measure (PI), we set up linear mixed effect models (estimated using ML and nloptwrap optimizer) using the lme4 package (version 1.1.37) in R (Bates et al., 2015). Note that for the age we entered the months for the toddlers and calculated months for the infants by days/30. Our null model included a fixed intercept and the random effect of participants with varying intercepts (a model with slope did not converge). Age and stress, as well as the interaction of both, were gradually included as fixed effects. We compared these models and only report the significant model. Confidence intervals (CIs) of 95% and p-values were computed using a Wald t-distribution approximation.
Finally, to analyze the time course of the eye movement data and to check whether results mirror that of the mixed effect models, we also performed a growth curve analysis (GCA). Therefore, we divided the eye-tracking data into 50 ms time bins (down-sampling). As fixation proportions are aggregated binary outcomes, and there was only a small number of trials, we used a weighted empirical-logit growth curve analysis (empirical-logit: log((looking to target + 0.5)/(looking to distractor + 0.5)), weights: (1/(looking to target + 0.5)) + (1/(looking to distractor + 0.5)), Mirman, 2014) to analyze the target gaze data from 0 to 4 s after target word onset (without subtracting the saccadic reaction time). The overall time course of target fixations was modeled with a fourth-order (quartic) orthogonal polynomial and the fixed effect of stress (correctly vs. incorrectly stressed target word; within participants) on these time terms. The model included participant random effects on all time terms. The parameter estimate degrees of freedom and corresponding p-values were estimated using Satterthwaite’s method. For all analyses, the statistical significance was based on the α-level of 0.05.
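The empirical-logit transformation and the accompanying weight given in the text can be written out as a short sketch. The function names and the bin helper are hypothetical, and the counts are assumed to be the numbers of samples on the target and on the distractor within one 50 ms bin.

```python
import math


def empirical_logit(target, distractor):
    """log((target + 0.5) / (distractor + 0.5)), as given in the text."""
    return math.log((target + 0.5) / (distractor + 0.5))


def elogit_weight(target, distractor):
    """1/(target + 0.5) + 1/(distractor + 0.5), as given in the text."""
    return 1.0 / (target + 0.5) + 1.0 / (distractor + 0.5)


def bin_index(t_ms, bin_ms=50):
    """Index of the time bin (0-based) for a sample t_ms after word onset."""
    return int(t_ms // bin_ms)


# Hypothetical counts for one bin: 10 samples on the target, none on the distractor.
example_elogit = empirical_logit(10, 0)
example_weight = elogit_weight(10, 0)
```

Adding 0.5 to both counts keeps the logit finite when one of the pictures received no looks in a bin, and equal counts give a value of zero. The weight expression approximates the sampling variance of the empirical logit; whether this quantity or its reciprocal is passed to the model-fitting routine is not stated in the text. With 50 ms bins, the window from 0 to 4 s after word onset comprises 80 bins.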

3. Results

3.1. Mean PIs for Infants and Toddlers

First, we looked at the mean PIs of the two different age groups separately to see if the infants showed word understanding at all and to obtain a hint of the size of a possible effect on toddlers.

3.1.1. Infants’ PIs

The PI-by-participants did not differ significantly from chance, neither for correctly stressed words (Mdn = 0.03, V = 1415, CI [−0.01, Inf], p = 0.11) nor for incorrectly stressed words (Mdn = 0.02, V = 1352, CI [−0.01, Inf], p = 0.19). The two conditions did not differ (Mdn = 0.04, V = 1394, CI [−0.019, Inf], p = 0.13). However, the PI-by-items differed significantly from chance for correctly stressed words (Mdn = 0.04, V = 81, CI [0.003, Inf], p = 0.04) but not for incorrectly stressed words (Mdn = 0.02, V = 81, CI [−0.011, Inf], p = 0.13). Again, the two conditions did not differ (Mdn = 0.03, V = 71, CI [−0.012, Inf], p = 0.13). These data suggest that infants showed no (or, at most, very weak) overall comprehension of the words. However, in the analysis by items, their word comprehension was above chance only for correctly stressed words, although the direct comparison between the two conditions was not significant.

3.1.2. Toddlers’ PIs

The PI-by-participants differed from chance in both conditions (correctly stressed words: Mdn = 0.36, V = 406, CI [0.310, Inf], p < 0.001; incorrectly stressed words: Mdn = 0.30, V = 406, CI [0.250, Inf]). The two conditions differed significantly (Mdn = 0.06, V = 286, CI [0.011, Inf], p = 0.03). Similarly, the PI-by-items differed from chance in both conditions (correctly stressed words: Mdn = 0.38, V = 105, CI [0.36, Inf], p < 0.001; incorrectly stressed words: Mdn = 0.31, V = 105, CI [0.28, Inf], p < 0.001). The two conditions differed significantly (Mdn = 0.06, V = 102, CI [0.041, Inf], p < 0.001).
We also calculated the correlation between the PI of each participant and a difference score of the PI for correctly and incorrectly stressed words for each participant. The correlation was not significant: t = 0.878, df = 95, p = 0.38, CI [−0.112, 0.284]. Thus, we did not find a relation between the size of the PI and the size of the stress effect.
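The text reports the test statistic and the confidence interval of this correlation but not the coefficient itself. Assuming a Pearson correlation (the type of correlation is not stated), the coefficient can be recovered from t and df, and the interval can be approximated with the Fisher z-transformation. The sketch below is a plausibility check under these assumptions, not a reproduction of the authors' analysis; the function names are hypothetical.

```python
import math


def r_from_t(t, df):
    """Correlation coefficient implied by a t statistic with df = n - 2."""
    return t / math.sqrt(t * t + df)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for r via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


r = r_from_t(0.878, 95)
low, high = fisher_ci(r, n=95 + 2)  # df = n - 2, so n = 97
```

Under these assumptions, the implied coefficient is about r = 0.09, and the approximate interval agrees with the reported CI [−0.112, 0.284] to three decimals. The sample size of 97 implied by df = 95 matches the sum of the two groups (69 infants and 28 toddlers).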

3.2. Mixed Effect Model

Each infant contributed on average six PIs (out of fourteen; range 1–13, SD 3.7) in the condition with correctly stressed labels, and seven PIs (range 2–14, SD 3.6) in the condition with incorrectly stressed labels. Each toddler contributed on average 10.7 PIs (out of 14; range 2–14, SD 2.9) in the condition with correctly stressed labels, and 11.1 PIs (range 3–14, SD 2.5) in the condition with incorrectly stressed labels.
The model with the best fit (χ²(1) = 94.24, p < 0.001) predicted the PI with stress and age as fixed factors and participants as a random effect: PI ~ Stress + Age + (1|Participant). The model's total explanatory power was moderate (conditional R² = 0.17), and the part related to the fixed effects alone (marginal R²) was 0.15. The model's intercept was at −0.06, 95% CI [−0.10, −0.01], t(1495) = −2.57, p = 0.010. The effect of stress was significant and negative, beta = −0.04, 95% CI [−0.08, −0.005], t(1495) = −2.23, p = 0.026; Std.beta = −0.11, 95% CI [−0.20, −0.01], indicating that the PI was greater for correctly than for incorrectly stressed words. The effect of age was significant and positive, beta = 0.01, 95% CI [9.85 × 10⁻³, 0.01], t(1495) = 13.40, p < 0.001; Std.beta = 0.38, 95% CI [0.32, 0.43], indicating that the PI increased with age. The interaction between stress and age was not significant; that is, the model including the interaction was not significantly better than the model including only the main effects, χ²(1) < 1, p = 0.37 (see Note 1).
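The model comparisons reported here are likelihood-ratio tests with one degree of freedom. For one degree of freedom, the upper-tail probability of a χ² statistic has a closed form based on the complementary error function, which allows a quick check of the reported p-values. The function name is hypothetical; the sketch only illustrates the relation between the statistic and the p-value.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail p-value of a chi-square statistic x with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


p_best_model = chi2_sf_df1(94.24)  # statistic reported for the best-fitting model
```

A statistic of 94.24 gives a p-value far below 0.001, in line with the text. For the interaction, only χ²(1) < 1 is reported; by the same formula, a statistic of roughly 0.8 would correspond to the reported p = 0.37, although the exact value is not given in the text.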

3.3. Growth Curve Analysis (GCA)

3.3.1. Infants’ GCA

As the time courses of eye movements differed markedly between infants and toddlers (see Figure 4), the GCA was conducted for each group separately. For infants, the effect of stress on the linear term was significant, β = −0.073, SE = 0.036, p = 0.040, indicating a steeper slope for correctly stressed than for incorrectly stressed words. The effects of stress on the cubic term, β = −0.207, SE = 0.036, p < 0.001, and on the quartic term, β = −0.165, SE = 0.036, p < 0.001, were also significant. Figure 4a illustrates enhanced early fixations of the target pictures when the targets' names were correctly stressed compared to when they were incorrectly stressed. Neither the effect of stress on the intercept nor the effect on the quadratic time term was significant; see Figure 4, upper panel (see the OSF link at the beginning of Section 2 for data of individual infants). Overall, the time course of eye movements was flatter in infants than in toddlers, probably because infants understood fewer of the words used in the experiment than toddlers did.

3.3.2. Toddlers’ GCA

In contrast to the less clear time course of the infants' eye movement data, toddlers showed a more pronounced time course of looking behavior. For toddlers, there was a significant effect of stress on the intercept term, β = −0.053, SE = 0.005, p < 0.001, indicating higher overall fixation proportions to the target picture when the target's name was correctly stressed than when it was incorrectly stressed. There was also a significant effect of stress on the linear term, β = −0.088, SE = 0.045, p = 0.049, indicating a steeper slope for correctly stressed than for incorrectly stressed words. The effects of stress on the quadratic term, β = −0.093, SE = 0.045, p = 0.038, and on the cubic term, β = 0.187, SE = 0.045, p < 0.001, were significant, whereas the effect on the quartic term was not; see Figure 4, lower panel (see the OSF link at the beginning of Section 2 for data of individual toddlers).
In sum, the GCA suggests a greater increase in target looking when hearing the correctly stressed name of the target compared to the incorrectly stressed version in both infants and toddlers (effect of stress on the linear term). However, the stress effect seemed to be more robust in toddlers than in infants, as toddlers also showed an effect of stress on the intercept, that is, a main effect of stress.
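The time terms referred to in the GCA (linear, quadratic, cubic, and quartic) are orthogonal polynomials of the time bin. The following sketch shows one way to construct such terms by Gram-Schmidt orthogonalization. It is an illustration in plain Python and not the routine used for the reported analyses; the function name is hypothetical, and the signs and scaling of the terms may differ from those produced by statistical software.

```python
import math


def orthogonal_time_terms(n_bins, degree=4):
    """Orthonormal polynomial time terms (linear up to `degree`) over n_bins bins.

    Requires n_bins > degree. Each term is orthogonal to a constant and to all
    lower-order terms, so the terms are uncorrelated with one another.
    """
    # Rescale bin indices to [-1, 1] to keep the powers numerically well behaved.
    x = [2.0 * i / (n_bins - 1) - 1.0 for i in range(n_bins)]
    basis = [[1.0] * n_bins]  # constant term, used only for orthogonalization
    terms = []
    for d in range(1, degree + 1):
        v = [xi ** d for xi in x]
        for b in basis:
            # Remove the component of v that lies along the basis vector b.
            coef = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
        basis.append(v)
        terms.append(v)
    return terms


# 80 bins of 50 ms cover the 0 to 4 s analysis window described in the Methods.
time_terms = orthogonal_time_terms(80)
```

Because the terms are mutually orthogonal, the effect of stress on each term can be interpreted separately: the intercept reflects the overall height of the fixation curve, the linear term its overall slope, and the higher-order terms its curvature.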

4. Discussion

We investigated whether 4-to-14-month-old German-learning infants and 2-to-4-year-old German-learning toddlers use syllable stress for incremental word recognition. Our first hypothesis stated that word comprehension, in general, increases with age. Indeed, target recognition effects were rather weak in infants and became stronger with increasing age in toddlers. Our finding for infants fits into the mixed LWL results for very young children. While the results of some studies pointed to robust word comprehension in infants as young as 6 months (Bergelson & Aslin, 2017a; Bergelson & Swingley, 2012, 2015, 2018; Rocha et al., 2024; Rosslund et al., 2023; Tincoff & Jusczyk, 1999, 2012), the results of other studies indicated a somewhat later onset of the first significant word comprehension at around 9 months or even later (Beech & Swingley, 2023; Bergelson, 2020; Kartushina & Mayor, 2019; Steil et al., 2021; Syrnyk & Meints, 2017). Our results align more closely with these latter studies. Thus, we might have tapped the very first attempts of German-learning infants to link language input with common objects.
In addition to infants’ limited vocabulary, two specific factors of our design may have made it difficult to detect robust word recognition in infants, namely the use of (i) disyllabic words and (ii) trials with picture pairs from the same category. (i) To be able to manipulate syllable stress within words, we had to rely on disyllabic words, which might be acquired a little later than the monosyllabic words mainly used in other studies (for English-learning infants, see Bergelson & Swingley, 2012, 2015, 2018; Syrnyk & Meints, 2017; Tincoff & Jusczyk, 1999, 2012; for Norwegian-learning infants, see Kartushina & Mayor, 2019). Possibly, the German-learning infants would have shown more robust word comprehension for (some) monosyllabic, earlier acquired words like “Ball” (ball), “Hund” (dog), “Bett” (bed), or “Keks” (cookie). However, disyllabic nouns are very typical in German, and we are not aware of any study directly comparing the acquisition of mono- vs. disyllabic words. (ii) Five out of our fourteen yoked word pairs could be assigned to a common category (living beings: baby–bird, rabbit–beetle; bath items: brush–diaper; body items: finger–hair; and kitchen items: fork–soup). It has been shown that young infants (but not older toddlers) have difficulties in fixating the named picture when the distractor picture belongs to the same category (e.g., food items; Bergelson & Aslin, 2017b). In fact, visual inspection of our data indicates that for at least two pairs, “brush–diaper” and “finger–hair”, infants, but not toddlers, showed the lowest word comprehension (see Figure 5). Both factors of our design may have contributed to the age effect on robust word comprehension.
Without question, various characteristics of the design and the material possibly contribute to the success of young children in word recognition studies (for further discussion, see Kartushina & Mayor, 2019; Steil et al., 2021). Future work needs to systematically test such factors. However, since the main aim of our study was not to show at what age infants show robust word comprehension, we believe that the possibility that they did not understand all words has less of an impact on our main results regarding stress.
Our second hypothesis stated that infants as well as toddlers use syllable stress for incremental word-meaning mapping. On a general level, our results support this assumption. The mixed effect model showed a main effect of stress (and no interaction of age and stress). However, post hoc analyses revealed that the stress effect was carried mainly by the toddlers. Only toddlers showed a stress effect in a post hoc separate mixed model. Although the stress effect for infants was numerically in the expected direction, it was not significant in the mixed effect model applied to infants only. However, at least the analysis by items showed that infants might use syllable stress in their first successful attempts to match objects to their names. Infants’ fixations on the targets were only above chance level when the target’s name was correctly stressed but at chance level when it was not correctly stressed. Finally, growth curve analyses revealed an overall steeper increase in looking times at the target picture when children heard the correctly stressed name of the target compared to its incorrectly stressed name for both groups, infants and toddlers. In sum, as we found some (albeit fragile) hints for a stress effect in infants, we cautiously conclude that our results suggest that the first word form representations in German-learning children encode syllable stress, and that even very young children use corresponding suprasegmental information for incremental spoken word recognition.
The rather fragile stress effects that we obtained for the infant group could have two explanations. First, infants might not yet integrate syllable stress into their incremental word-meaning mapping. However, the small but systematic stress effects that infants showed point towards some sensitivity of this group to syllable stress at the word level. Alternatively, infants’ restricted vocabulary might have reduced the number of trials that could potentially show the stress effect. Infants vary widely in the number of words they already understand. Crucially, infants might know only a few of the words presented in the experiment (see also the parents’ ratings in Table 1, second column, although parents might not judge their children’s vocabulary correctly). Thus, most likely, infants were tested on words that they already knew as well as on words that they did not yet know. For the few words they knew, infants might show stress processing (driving the effects in the item analysis and in the GCA). We have to conclude that stress effects are difficult to detect if not all infants understand all the words tested. This also seems to be in line with a recent mispronunciation study with very young infants, in which 6-to-8-month-olds showed no recognition of words, and thus, mispronunciation processing was difficult to test (Beech & Swingley, 2023). One solution could be to analyze only the words that parents indicated their infants might know. However, when we consider the parents’ overall ratings of which words they thought their children already understood, we find no differences in PIs. That is, on trials with words rated on average lower by parents (e.g., beg, pillow, and soup, see Table 1), children did not show lower PIs than on other trials (see Figure 5).
This observation confirms that parents’ ratings do not appear to reliably discriminate between words that very young children know and words that they do not know (see also the results of Frank et al., 2021). We therefore refrained from analyzing the stress effects for both groups of words separately, which would further reduce the power of the analyses.
Although target recognition effects in infants were rather weak overall, they were stronger for correctly stressed target names compared to incorrectly stressed ones. In line with previous work with somewhat older children (Campbell et al., 2019), we obtained hints for enhanced success in target recognition for correctly stressed words compared to their incorrectly stressed versions. At a descriptive level, this is also illustrated in Figure 5. Here, we plotted the mean PIs per display for correctly stressed and incorrectly stressed trials and for infants and toddlers, separately. For nearly all item pairs, the PI was higher when the target’s name was correctly stressed in comparison to when it was incorrectly stressed (except for the pair fork–soup for both groups and the pairs bottle–hat and cucumber–pants for infants).
For toddlers, we found significant proportions of fixations to the target picture for both correctly and incorrectly stressed target names. This finding contrasts somewhat with the proportion of fixation results of the study by Campbell et al. (2019), in which 17-month-old English-learning toddlers did not show robust target recognition for incorrectly stressed target names. Apart from the different target languages of the children in the two studies, the toddlers in our study were at least 6 months older than the previously tested toddlers. Furthermore, we applied a somewhat different analysis strategy than Campbell et al. They calculated the proportion of fixation for each single trial (T/(T + D)), while we included two trials for the same picture pair in the calculation of the proportion of fixations (see Data Analysis). We thereby considered fixations to a picture both when it was the target and when it was the distractor. We did so in accordance with the procedure of Bergelson and Swingley (2012) to correct for picture preference. Thus, the diverging fixation proportions of toddlers hearing the incorrectly stressed name of the target in the study by Campbell et al. and in our study could be due to (i) the different target languages, (ii) our older sample, or (iii) the different dependent measure used to calculate the fixation proportion. Interestingly, the growth curve analysis by Campbell et al. indicated that, in their study as well, the toddlers recognized the target when hearing its incorrectly stressed name. From this we might conclude that the calculation of fixation proportions (PIs) applied here is more sensitive to subtle differences between the two conditions than the proportion index for single trials applied by Campbell et al., at least for the toddlers. Nevertheless, both studies showed that toddlers reliably distinguish between correctly and incorrectly stressed words.
As in the study by Campbell et al. (2019), the results of our mixed effect models and of the GCA differed. In our study, the mixed models revealed a stress effect across infants and toddlers, while the GCA for the single groups revealed a stress effect on the intercept only for toddlers but not for infants (see Note 2). These divergent findings might be due to the different dependent measures entering the mixed effect models and the GCA. The PI entering the mixed effect models combined the data of two trials. It takes picture preference into account by relating the looking behavior for a display across two trials, one in which the word named the target and one in which the word named the distractor (see Section 2.4). In contrast, single trials were included in the GCA. The different results might mean that the infants’ data are not very robust. However, we interpret the data as follows: although the infants showed fragile word comprehension, it may be that for the (few) words they understand, they use the syllable stress information immediately. That is, we argue that the few words that infants understand drive the small stress effects in the item analysis and in the GCA. This would be in line with the assumption that infants’ word representations already contain very detailed information (see, for example, Swingley, 2003, for evidence that young children encode phonetic details in words).
Turning to the different time terms, the infants and toddlers tested here appeared to use syllable stress incrementally in real-time word comprehension, as adults do (Connell et al., 2018; Jesse et al., 2017; Reinisch et al., 2010; Zahner et al., 2019). This was indicated by the effect of stress on the linear term of the GCA in both infants and toddlers. This finding is in line with the results of Campbell et al. (2019), who also reported interactions between word stress and time terms in 17-month-olds. They interpreted a positive interaction of word stress and the linear time term as indicating that the increase in looking was steeper for correctly than for incorrectly stressed words. Similarly, they interpreted the negative interaction of word stress and the quadratic time term as indicating less bend in the fixation curve for correctly than for incorrectly stressed words. From that, the authors concluded that looks to the target rose faster and increased throughout the whole trial for correctly compared to incorrectly stressed words. The infants and toddlers in our study showed the same pattern of results, except that toddlers, but not infants, additionally showed an effect of stress on the intercept.
Our finding that German-learning infants and toddlers use syllable stress for word comprehension is interesting because they do not necessarily have to do so given their linguistic environment. To our knowledge, there is no minimal stress pair differentiating the meaning of early acquired words in German (as is the case for words acquired later, like “AU.gust” for the month August and “au.GUST”, a male given name). Thus, stress in infancy and toddlerhood is not (yet) lexically contrastive, and for early learned words, it would be sufficient to rely on phonemes without taking stress into account. Moreover, early acquired words in German (at least disyllabic nouns) have a bias towards carrying stress on the first syllable, which may further lead children to neglect stress information. This bias is also reflected in the production errors of older children. As soon as toddlers begin to produce their first words, they often omit the first syllable of words that begin with an unstressed syllable and start these words with the stressed syllable (e.g., “del.FIN” to “FIN”; “ele.FANT” to “FANT”; and “ba.NA.ne” to “NA.ne”). That infants and toddlers use syllable stress nonetheless may show that children take further characteristics of their language into account. For example, the fact that stress can vary between syllables in German could lead children to process the stress pattern of syllables, even if this is not necessary for the first disyllabic nouns they learn.
In German, as in other languages with variable stress, syllable stress is conveyed through syllable length, intensity, and pitch (Cutler & Jesse, 2021). Here, we used materials produced by a professional human speaker, whom we asked to stress each target word either correctly or on the second syllable rather than on the first. We considered naturally spoken material to be more suitable for testing natural speech processing in very young children than computerized speech. As a consequence, however, we were unable to control or ensure the variation in all acoustic parameters. The speaker realized our intended stress manipulation not via different pitch but via different intensity values of the initial syllables of the correctly stressed words and their incorrectly stressed versions. In addition, the timing of maximal pitch and maximal intensity differed (see Section 2.2.2). Thus, our results indicate that children can rely on only one or two cues (here, intensity and timing) to process different stress patterns of syllables. This could also mean that we have underestimated the stress effect, as it might have been stronger if the length of the first syllable and the pitch values had also differed between correctly and incorrectly stressed words.
A weakness of our material is the dominance of words whose nucleus of the second syllable is typically reduced to schwa in running speech (see Section 2.2.2). Strictly speaking, it is not possible to stress such syllables without changing the vowel quality of the second syllable to a full vowel (typically “e” in German). We have instructed the speaker to produce the words with the wrong intonation as naturally as possible (see OSF link at the beginning of Section 2). Nevertheless, the stress effect could in part be influenced by the violation of the vowel quality of the second syllable. However, this applies only to the average fixations to the target object. The initial temporal dynamics of target fixations, on the other hand, basically reflect the processing of the first syllables, which only carried full vowels. These early effects, which were evident in infants and toddlers, should therefore not be influenced by a possible variation in vowel quality.
Finally, we have to consider the function of syllable stress beyond word recognition. Even in languages allowing variation in the position of the stressed syllable, like German, English, or Dutch, a specific syllable position within words might receive stress more often than other syllable positions (Cutler & Carter, 1987; Domahs et al., 2008). Speech directed at children learning German even contains 97 percent of words that carry stress on their initial syllable (Stärk et al., 2022). Infants extract a predominant stress pattern in their target language already within the first few months of life (Friederici et al., 2007; Höhle et al., 2014), develop a preference for this typical stress pattern, and appear to use it to find word boundaries (i.e., for speech segmentation; Jusczyk, 1999). Consequently, infants learning German prefer trochaic over iambic words (Höhle et al., 2009) and expect words to start with a stressed syllable (Marimon et al., 2024). Hence, our results might be modulated by infants’ preference for trochaic patterns and their respective segmentation attempts.
Here we argue that the present data cannot simply be interpreted in terms of the children’s preference for the canonical stress pattern or in terms of segmentation attempts that were hampered by the incorrectly stressed first syllables of the target words. To rule out that the children in our study generally looked more at the screen when they heard a (trochaic) correctly stressed word compared to an (iambic) incorrectly stressed word, we analyzed looking towards the whole display. There were no significant differences in the looking behavior to the whole display when children heard a correctly stressed compared to an incorrectly stressed word, neither for infants (Wilcoxon signed rank test, V = 985, p = 0.184) nor for toddlers (V = 260, p = 0.202). Furthermore, to reduce the need to rely on syllable stress for speech segmentation, we inserted a pause of approximately 200 milliseconds between the offset of the sentence and the onset of the target word. Infants treat pauses as strong segmentation cues (Seidl & Johnson, 2006). Thus, the pause may have prevented the participants from being distracted by the unstressed syllables at the beginning of the word (in the case of the incorrectly stressed words) during speech segmentation. Nonetheless, future research needs to investigate the extent to which delayed target fixation times for incorrectly stressed words reflect mismatch effects with the typical prosodic template in a given language or segmentation difficulties, at least in infants.
In sum, we conclude that word representations in early childhood code for syllable stress, and that German-learning infants and toddlers consider syllable stress information immediately while processing familiar words. However, it should not be overlooked that our study showed a limited reproducibility of word comprehension in infants, which leaves open alternative interpretations of our results, especially for this age group.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/languages10080197/s1, Table S1: Mixed model (PI ~ stress + age_in_months + (1|nb)) for infants; Table S2: Mixed model (PI ~ stress + age_in_months + (1|nb)) for toddlers.

Author Contributions

Conceptualization, U.S. and C.K.F.; methodology, U.S. and C.K.F.; software, U.S.; validation, U.S. and C.K.F.; formal analysis, U.S.; investigation, U.S.; resources, U.S. and C.K.F.; data curation, U.S.; writing—original draft preparation, U.S. and C.K.F.; writing—review and editing, U.S. and C.K.F.; visualization, U.S.; supervision, U.S. and C.K.F.; project administration, U.S.; funding acquisition, U.S. and C.K.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee for Psychological Research at the Faculty of Science at the University of Tübingen (Friedrich_2018_1025_139, Bauch_2021_0726_234, 14 November 2018).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Pre-processed data and R scripts are available on the Open Science Framework (OSF): https://osf.io/azjqe/?view_only=f315db1785e3447a8260c612a219701a (accessed on 1 August 2025).

Acknowledgments

We are thankful to all participating families and children.

Conflicts of Interest

The authors declare no conflicts of interest.

Notes

1
Note, however, that when the mixed effect models were run separately for both groups (infants and toddlers), only toddlers (see Table S2, Supplementary Materials) but not infants (see Table S1, Supplementary Materials) showed a significant stress effect. This could mean that the effect of stress in the combined analysis was carried only by the toddlers.
2
However, when both groups were analyzed separately, only toddlers, but not infants, showed a stress effect in the mixed effect model, which fits the stress effect on the intercept of the GCA in toddlers but not in infants.

References

  1. Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
  2. Becker, A., Schild, U., & Friedrich, C. K. (2018). Tracking independence and merging of prosodic and phonemic processing across infancy. Developmental Science, 21, e12525.
  3. Beech, C., & Swingley, D. (2023). Very young infants’ sensitivity to consonant mispronunciations in word recognition. Cogsci, 45, 792–798.
  4. Bergelson, E. (2020). The comprehension boost in early word learning: Older infants are better learners. Child Development Perspectives, 14(3), 142–149.
  5. Bergelson, E., & Aslin, R. (2017a). Nature and origins of the lexicon in 6-mo-olds. Proceedings of the National Academy of Sciences of the United States of America, 114(49), 12916–12921.
  6. Bergelson, E., & Aslin, R. (2017b). Semantic specificity in one-year-olds’ word comprehension. Language Learning and Development, 13(4), 481–501.
  7. Bergelson, E., & Swingley, D. (2012). At 6–9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences of the United States of America, 109(9), 3253–3258.
  8. Bergelson, E., & Swingley, D. (2015). Early word comprehension in infants: Replication and extension. Language Learning and Development, 11(4), 369–380.
  9. Bergelson, E., & Swingley, D. (2018). Young infants’ word comprehension given an unfamiliar talker or altered pronunciations. Child Development, 89(5), 1567–1576.
  10. Byers-Heinlein, K. (2015). Methods for studying infant bilingualism. In Cambridge University Press eBooks (pp. 133–154). Cambridge University Press.
  11. Campbell, J., Graham, S., & Curtin, S. (2019). Word level stress and lexical processing in 17-month-old infants. Infancy, 24(1), 5–23.
  12. Connell, K., Hüls, S., Martínez-García, M. T., Qin, Z., Shin, S., Yan, H., & Tremblay, A. (2018). English learners’ use of segmental and suprasegmental cues to stress in lexical access: An eye-tracking study. Language Learning, 68(3), 635–668.
  13. Curtin, S. (2009). Twelve-month-olds learn novel word-object pairings differing only in stress pattern. Journal of Child Language, 36(5), 1157–1165.
  14. Curtin, S. (2010). Young infants encode lexical stress in newly encountered words. Journal of Experimental Child Psychology, 105(4), 376–385.
  15. Curtin, S. (2011). Do newly formed word representations encode non-criterial information? Journal of Child Language, 38(4), 904–917.
  16. Cutler, A., & Carter, D. M. (1987). The predominance of strong initial syllables in the English vocabulary. Computer Speech & Language, 2(3–4), 133–142.
  17. Cutler, A., & Jesse, A. (2021). Word stress in speech perception. In J. S. Pardo, L. C. Nygaard, R. E. Remez, & D. B. Pisoni (Eds.), The handbook of speech perception (pp. 239–265). Wiley-Blackwell.
  18. Cutler, A., & van Donselaar, W. (2001). Voornaam is not (really) a homophone: Lexical prosody and lexical access in Dutch. Language and Speech, 44(2), 171–195.
  19. Domahs, U., Wiese, R., Bornkessel-Schlesewsky, I., & Schlesewsky, M. (2008). The processing of German word stress: Evidence for the prosodic hierarchy. Phonology, 25(1), 1–36.
  20. Fernald, A., Swingley, D., & Pinto, J. P. (2001). When half a word is enough: Infants can recognize spoken words using partial phonetic information. Child Development, 72(4), 1003–1015.
  21. Fernald, A., Zangl, R., Portillo, A. L., & Marchman, V. A. (2008). Looking while listening: Using eye movements to monitor spoken language comprehension by infants and young children. In I. A. Sekerina, E. M. Fernández, & H. Clahsen (Eds.), Developmental psycholinguistics: On-line methods in children’s language processing (pp. 97–135). John Benjamins.
  22. Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (2017). Wordbank: An open repository for developmental vocabulary data. Journal of Child Language, 44(3), 677–694.
  23. Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (2021). Measurement properties of the CDI. In M. C. Frank, M. Braginsky, D. Yurovsky, & V. A. Marchman (Eds.), Variability and consistency in early language learning: The Wordbank project (pp. 45–64). The MIT Press.
  24. Friederici, A. D., Friedrich, M., & Christophe, A. (2007). Brain responses in 4-month-old infants are already language specific. Current Biology, 17(14), 1208–1211.
  25. Friedrich, C. K., Kotz, S. A., Friederici, A. D., & Alter, K. (2004). Pitch modulates lexical identification in spoken word recognition: ERP and behavioral evidence. Cognitive Brain Research, 20(2), 300–308.
  26. Frota, S., Butler, J., Uysal, E., Severino, C., & Vigário, M. (2020). European Portuguese-learning infants look longer at iambic stress: New data on language specificity in early stress perception. Frontiers in Psychology, 11, 1890. [Google Scholar] [CrossRef]
  27. Golinkoff, R. M., Ma, W., Song, L., & Hirsh-Pasek, K. (2013). Twenty-five years using the intermodal preferential looking paradigm to study language acquisition: What have we learned? Perspectives on Psychological Science, 8(3), 316–339. [Google Scholar] [CrossRef] [PubMed]
  28. Grimm, H., & Doil, H. (2006). ELFRA. Elternfragebogen für die Früherkennung von Risikokindern. Hogrefe. [Google Scholar]
  29. Houston, D. M., Jusczyk, P. W., Kuijpers, C., Coolen, R., & Cutler, A. (2000). Cross-language word segmentation by 9-month-olds. Psychonomic Bulletin & Review, 7(3), 504–509. [Google Scholar] [CrossRef]
  30. Höhle, B., Bijeljac-Babic, R., Herold, B., Weissenborn, J., & Nazzi, T. (2009). Language specific prosodic preferences during the first half year of life: Evidence from German and French infants. Infant Behavior and Development, 32(3), 262–274. [Google Scholar] [CrossRef] [PubMed]
  31. Höhle, B., Giesecke, D., & Jusczyk, P. W. (2001). Word segmentation in a foreign language: Further evidence for crosslinguistic strategies. Journal of the Acoustical Society of America, 110(5), 2687–2687. [Google Scholar] [CrossRef]
  32. Höhle, B., Pauen, S., Hesse, V., & Weissenborn, J. (2014). Discrimination of rhytmic pattern at 4 months and language performance at 5 years: A longitudinal analysis of data from German-learning children. Language Learning, 64(s2), 141–164. [Google Scholar] [CrossRef]
  33. Höhle, B., van de Vijver, R., & Weissenborn, J. (2006). Word processing at 19 months and its relation to language performance at 30 months: A retrospective analysis of data from German learning children. Advances in Speech Language Pathology, 8(4), 356–363. [Google Scholar] [CrossRef]
  34. Jesse, A., Poellmann, K., & Kong, Y.-Y. (2017). English listeners use suprasegmental cues to lexical stress early during spoken-word recognition. Journal of Speech, Language, and Hearing Research, 60(1), 190–198. [Google Scholar] [CrossRef]
  35. Junge, C., Kooijman, V., Hagoort, P., & Cutler, A. (2012). Rapid recognition at 10 months as a predictor of language development. Developmental Science, 15(4), 463–473. [Google Scholar] [CrossRef]
  36. Jusczyk, P. W. (1999). How infants begin to extract words from speech. Trends in Cognitive Sciences, 3(9), 323–328. [Google Scholar] [CrossRef] [PubMed]
  37. Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39(3–4), 159–207. [Google Scholar] [CrossRef]
  38. Kartushina, N., & Mayor, J. (2019). Word knowledge in six-to nine-month-old Norwegian infants? Not without additional frequency cues. Royal Society Open Science, 6(9), 180711. [Google Scholar] [CrossRef]
  39. Kuijpers, C., Coolen, R., Houston, D., & Cutler, A. (1998). Using the head-turning technique to explore crosslinguistic performance differences. In C. Rovee-Collier, L. Lipsitt, & H. Hayne (Eds.), Advances in infancy research (pp. 205–220). Ablex. [Google Scholar]
  40. Marimon, M., Langus, A., & Höhle, B. (2024). Prosody outweighs statistics in 6-month-old German-learning infants’ speech segmentation. Infancy, 29(5), 750–770. [Google Scholar] [CrossRef] [PubMed]
  41. Mirman, D. (2014). Growth curve analysis and visualization using R (1st ed.). Chapman and Hall/CRC. [Google Scholar] [CrossRef]
  42. R Development Core Team. (2021). R: A language and environment for statistical computing. R foundation for statistical computing. Available online: https://www.R-project.org/ (accessed on 1 August 2025).
  43. Reinisch, E., Jesse, A., & McQueen, J. M. (2010). Early use of phonetic information in spoken word recognition: Lexical stress drives eye movements immediately. Quarterly Journal of Experimental Psychology, 63(4), 772–783. [Google Scholar] [CrossRef]
  44. Rocha, S., Ní Choisdealbha, Á., Attaheri, A., Mead, N., Olawole-Scott, H., Grey, C., Williams, I., Gibbon, S., Boutris, P., & Brusini, P. (2024). Language acquisition in the longitudinal Cambridge UK BabyRhythm cohort. Collabra: Psychology, 10(1), 92998. [Google Scholar] [CrossRef]
  45. Rosslund, A., Hagelund, S., Mayor, J., & Kartushina, N. (2023). Mothers’ and fathers’ infant-directed speech have similar acoustic properties, but these are not associated with direct or indirect measures of word comprehension in 8-month-old infants. Journal of Child Language, 51(6), 1424–1449. [Google Scholar] [CrossRef]
  46. Schild, U., Becker, A. B. C., & Friedrich, C. K. (2014). Phoneme-free prosodic representations are involved in pre-lexical and lexical neurobiological mechanisms underlying spoken word processing. Brain and Language, 136, 31–43. [Google Scholar] [CrossRef]
  47. Seidl, A., & Johnson, E. K. (2006). Infant word segmentation revisited: Edge alignment facilitates target extraction. Developmental Science, 9(6), 565–573. [Google Scholar] [CrossRef]
  48. Slaughter, V., & Suddendorf, T. (2007). Participant loss due to “fussiness” in infant visual paradigms: A review of the last 20 years. Infant Behavior and Development, 30(3), 505–514. [Google Scholar] [CrossRef] [PubMed]
  49. Soto-Faraco, S., Sebastián-Gallés, N., & Cutler, A. (2001). Segmental and suprasegmental mismatch in lexical access. Journal of Memory and Language, 45(3), 412–432. [Google Scholar] [CrossRef]
  50. Stärk, K., Kidd, E., & Frost, R. L. (2022). Word segmentation cues in German child-directed speech: A corpus analysis. Language and Speech, 65(1), 3–27. [Google Scholar] [CrossRef]
  51. Steil, J. N., Friedrich, C. K., & Schild, U. (2021). No evidence of robust noun-referent associations in German-Learning 6- to 14-Month-Olds. Frontiers in Psychology, 12(4410), 718742. [Google Scholar] [CrossRef] [PubMed]
  52. Sulpizio, S., & McQueen, J. M. (2012). Italians use abstract knowledge about lexical stress during spoken-word recognition. Journal of Memory and Language, 66(1), 177–193. [Google Scholar] [CrossRef]
  53. Swingley, D. (2003). Phonetic detail in the developing lexicon. Language and Speech, 46(2–3), 265–294. [Google Scholar] [CrossRef]
  54. Swingley, D., & Aslin, R. N. (2000). Spoken word recognition and lexical representation in very young children. Cognition: International Journal of Cognitive Science, 76(2), 147–166. [Google Scholar] [CrossRef]
  55. Swingley, D., Pinto, J. P., & Fernald, A. (1999). Continuous processing in word recognition at 24 months. Cognition, 71(2), 73–108. [Google Scholar] [CrossRef]
  56. Syrnyk, C., & Meints, K. (2017). Bye-bye mummy—Word comprehension in 9-month-old infants. British Journal of Developmental Psychology, 35(2), 202–217. [Google Scholar] [CrossRef] [PubMed]
  57. Tincoff, R., & Jusczyk, P. W. (1999). Some beginnings of word comprehension in 6-month-olds. Psychological Science, 10(2), 172–175. [Google Scholar] [CrossRef]
  58. Tincoff, R., & Jusczyk, P. W. (2012). Six-month-olds comprehend words that refer to parts of the body. Infancy, 17(4), 432–444. [Google Scholar] [CrossRef]
  59. van Donselaar, W., Koster, M., & Cutler, A. (2005). Exploring the role of lexical stress in lexical recognition. Quarterly Journal of Experimental Psychology Section A, 58(2), 251–273. [Google Scholar] [CrossRef]
  60. Von Holzen, K., & Bergmann, C. (2021). The development of infants’ responses to mispronunciations: A meta-analysis. Developmental Psychology, 57(1), 1–18. [Google Scholar] [CrossRef] [PubMed]
  61. Zahner, K., Kutscheid, S., & Braun, B. (2019). Alignment of f0 peak in different pitch accent types affects perception of metrical stress. Journal of Phonetics, 74, 75–95. [Google Scholar] [CrossRef]
Figure 1. Distribution of age in months.
Figure 2. Examples of the words “Auto” (Engl. car, upper panel, left) and “Puppe” (Engl. doll, upper panel, right), each shown once initially stressed (above) and once initially unstressed (below), together with (a) mean intensity and (b) pitch per syllable of initially stressed and initially unstressed words. The first pair of vertical lines marks the mean length of the first syllable: solid for the initially stressed words, dashed for the initially unstressed (i.e., incorrectly stressed) words. The second pair of vertical lines marks the mean length of the whole word: solid for the initially stressed words, dashed for the initially unstressed words.
Figure 3. Trial scheme.
Figure 4. Time course data of (a) infants and (b) toddlers. Solid lines show the model fits; the points represent the data of each of the 50 ms time bins.
Figure 5. PI for word pairs for infants (unfilled dots) and toddlers (filled dots). Values above zero (horizontal line) indicate word comprehension. The word pairs brush–diaper, finger–hair, fork–soup, and rabbit–beetle belong to the same category (bath items, body parts, kitchen, and animal).
Table 1. Item difficulty of target words used in the experiment or proportion of children who understand the words according to their parents’ assessment.
Yoked word pairs in displays | Parental questionnaire: parents’ report of infants’ understanding of the particular word (63 parents; 6 questionnaires were missing) | Item difficulty for 12-month-olds’ receptive vocabulary (taken from ELFRA, Grimm & Doil, 2006) | Wordbank, English (American); Form: WG; Measure: Understands; 8-month-olds (Frank et al., 2017) | Wordbank, English (British); Form: Oxford CDI; Measure: Understands; 12-month-olds (Frank et al., 2017)
Baby (baby)/Vogel (bird) | 0.51/0.27 | 50.7/47.1 | 0.26/0.13 | 0.37/0.33
Flasche (bottle)/Mütze (hat) | 0.46/0.55 | 35.0/32.9 | 0.37/0.09 | 0.41/0.33
Bürste (brush)/Windel (diaper/nappy) | 0.17/0.63 | 15.7/- | 0.08/0.29 | 0.26/0.56
Auto (car)/Teller (plate) | */0.38 | 61.4/- | 0.25/0.04 | 0.56/0
Käse (cheese)/Schlüssel (key) | 0.21/0.29 | 10.0/26.4 | 0.06/0.09 | 0.26/0.15
Gurke (cucumber)/Hose (pants/trousers) | 0.41/0.37 | -/12.9 | -/0.07 | -/0
Tasse (cup)/Brille (glasses) | 0.32/0.37 | 19.3/23.6 | 0.12/0.07 | 0.11/0.11
Becher (mug)/Traktor (tractor) | 0.41/0.24 | -/6.4 | -/- | -/-
Puppe (doll)/Katze (cat) | 0.25/0.40 | 38.6/41.4 | 0.10/0.22 | 0.22/0.48
Finger (finger)/Haare (hair) | 0.48/* | -/24.3 | 0.11/0.11 | 0.22/0.22
Blume (flower)/Nase (nose) | 0.30/* | 24.3/31.4 | 0.07/0.14 | 0.15/0.37
Gabel (fork)/Suppe (soup) | 0.27/0.02 | 7.9/0.7 | 0.06/- | 0/-
Hase (rabbit/bunny)/Käfer (beetle) | 0.27/0.14 | 8.6/- | 0.10/- | 0.19/-
Löffel (spoon)/Kissen (pillow) | 0.56/0.11 | 37.1/- | 0.08/0.07 | 0.11/0
* Due to a copy-and-paste error, these three items were missing from the parental questionnaire. Note that the German sample in Wordbank only starts at 18 months. Therefore, to have comparably aged infants, we added item difficulties of 8-month-old English American and 12-month-old English British infants. A ‘-’ indicates that this word was not available in ELFRA or Wordbank.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Schild, U.; Friedrich, C.K. Very Young Children Learning German Notice the Incorrect Syllable Stress of Words. Languages 2025, 10, 197. https://doi.org/10.3390/languages10080197