The Production-Comprehension Relationship in the Acquisition of Prosodic Focus Marking: The Role of Age and Individual Differences

Chen, Aoju; van den Bergh, Huub

doi:10.3390/languages10090234

Institute for Language Sciences, Utrecht University, 3512 JK Utrecht, The Netherlands

* Author to whom correspondence should be addressed: Aoju Chen.

Languages 2025, 10(9), 234; https://doi.org/10.3390/languages10090234

Submission received: 19 December 2024 / Revised: 8 July 2025 / Accepted: 27 August 2025 / Published: 16 September 2025

(This article belongs to the Special Issue Advances in the Acquisition of Prosody)

Abstract

Central to the debate on the production–comprehension relationship in prosodic development is the acquisition of the focus-to-prosody mapping in West Germanic languages. Past research primarily examined the production–comprehension relationship in 4- to 5-year-old English- and Dutch-speaking children and yielded evidence both for and against a production-precedes-comprehension asymmetry. Recent research shows a protracted developmental trajectory towards adult-like use of the full range of prosodic means for focus marking in Dutch-speaking children, suggesting a comprehension-precedes-production asymmetry. Little is known about whether the production–comprehension relationship changes with age and differs between children. To elucidate the effect of age on the production–comprehension relationship and shed initial light on individual differences in this domain, we investigated production and comprehension of the focus-to-prosody mapping in SVO sentences by 71 Dutch-speaking children aged 4 to 8 years, using picture-based production and online comprehension tasks. Multilevel modelling showed that the children’s comprehension was predictive of their production for sentence-initial focus but not for sentence-final focus across ages. However, this predictive relationship varied across children depending on whether their comprehension was adult-like. In conclusion, we have found limited evidence that children’s comprehension of the focus-to-prosody mapping supports their use of prosody to mark focus in production. The stability of individual differences across development resembles findings in other domains of language acquisition.

Keywords:

prosodic focus marking; language acquisition; production–comprehension asymmetry; individual differences; Dutch

1. Introduction

A long-standing view rooted in research on lexical and morpho-syntactic development is that children learn to understand a certain form-meaning mapping (e.g., a word and its meaning) before they can use the form to convey the intended meaning in production (Clark, 1993). On this view, comprehension is a prerequisite for production and should precede it in language development. The ultimate goal for a language-learning child may then be to coordinate these two domains, such that their production abilities gradually align with their comprehension abilities (Clark & Hecht, 1983).

However, cases of a production-precedes-comprehension asymmetry have also been reported (see Hendriks, 2014; Hendriks & Koster, 2010 for an overview). For example, English-learning 4- to 6-year-old children use pronouns such as ‘him’ or ‘her’ correctly in production and do not confuse them with reflexives such as ‘himself’ or ‘herself’. In comprehension, however, they interpret such pronouns as their reflexive counterparts when two referents of the same gender are present (e.g., interpreting ‘her’ as Mama Bear in ‘This is Mama Bear; this is Goldilocks. Is Mama Bear washing her?’), even at the age of 6 years (Chien & Wexler, 1990). Likewise, Turkish-learning children can use past-tense verb suffixes to distinguish first-hand experience of an event from knowledge of the event acquired through hearsay or inference by the age of 3 years, but are still unable to interpret these suffixes correctly in comprehension at the age of 5 years (Ünal & Papafragou, 2016). Four main explanations have been proposed for the observed production-precedes-comprehension asymmetry: experimental task effects, limitations in children’s pragmatic ability to map form to function during comprehension, general cognitive constraints, and the possibility that the grammar itself yields different form-meaning mappings in production versus comprehension (Hendriks & Koster, 2010).

The relationship between comprehension and production of prosody has not been widely studied in developmental research, and which of the two comes first appears to vary across developmental stages. Studies on prosodic abilities in the first months of life suggest a head start in perception and processing, owing to prenatal exposure to the prosodic properties of maternal speech and foetuses’ ability to learn from this input (e.g., see Nallet & Gervain, 2021 for a review). For example, newborns show language-specific grouping preferences for non-linguistic sounds (e.g., pure tones) varying in pitch, duration or intensity (Abboub et al., 2016). They can use prosodic information to segment speech in their native language (Fló et al., 2019). They also exhibit heightened sensitivity to prosodic variations associated with certain emotions specific to their native language (Mastropieri & Turkewitz, 1999; Zhang et al., 2019). In contrast, studies on cry and non-cry vocalisations of infants aged 0 to 3 months do not yield consistent evidence for language-specific patterns in the use of pitch (van Niekerk et al., to appear). Infants use non-cry vocalisations differing in prosody to express negative, positive and neutral affect in the first three months of life (Jhang & Oller, 2017), but it is not clear whether these vocalisations show influence from the maternal language. However, in the acquisition of sentence-level prosody (e.g., intonation, prosodic phrasing) among older children, production appears to precede comprehension (e.g., Hendriks, 2014; Müller et al., 2006). Children are capable of producing sentence-level prosodic patterns resembling those of adults and can employ certain prosodic features for communication by the age of 2 or 3 years. But they still do not process prosodic cues during language comprehension as efficiently as adults at the ages of 5 and 6 years, and they fail to interpret some functions of pitch contours even at the ages of 9 or 10 years (e.g., Cruttenden, 1985; Cutler & Swinney, 1987).

One aspect of sentence-level intonation that is relatively well studied in research on production–comprehension asymmetries is the mapping between prosody and focus in West Germanic languages. Focus refers to the predication on a topic in a sentence and typically contains information that is new or contrastive to the hearer (Lambrecht, 1994; Vallduví & Engdahl, 1996). In many languages, focal constituents are strongly associated with prosodic prominence, and post-focal constituents with reduced or little prosodic prominence (e.g., Baumann & Kügler, 2015; Bolinger, 1978; Kügler & Calhoun, 2020; Xu, 1999; Xu & Xu, 2005). In West Germanic languages, prosodic prominence can be varied through discrete means, such as placement of pitch accent (i.e., whether or not a word is emphasised with a distinct pitch movement associated with it) and choice of accent type (e.g., using a falling vs. a rising pitch accent to emphasise a word), and through gradient means, i.e., the phonetic realisation or implementation of a pitch accent (hereafter phonetic realisation; e.g., realising a falling pitch accent with a larger or smaller pitch range) (Chen, 2018a; Gussenhoven, 2004). Listeners take the focus-to-prosody mapping into account in online language comprehension, such that an appropriate focus-to-prosody mapping speeds up comprehension compared to an inappropriate one (Birch & Clifton, 1995; Cutler et al., 1997). Relatedly, in online reference resolution, listeners use the broad mapping between information value (new vs. given) and prosody to predict whether an upcoming referent is new or previously mentioned before the segmental information has fully unfolded (e.g., Chen et al., 2007; Dahan et al., 2002; Ito & Speer, 2008; Watson et al., 2008; Weber et al., 2006). Adult-like competence in prosodic focus marking thus entails that children can both vary prosodic prominence to encode focus in production and exploit the focus-to-prosody mapping in comprehension.

Although the relevance of comprehending the focus-to-prosody mapping for learning to produce this mapping has not yet received much attention in theories of prosodic development, it is widely believed that children acquiring a West Germanic language can use prosodic prominence to mark focus in production but fail to interpret or efficiently use the focus-to-prosody mapping in comprehension at the age of 4 or 5 years (Hendriks, 2014; Müller et al., 2006; Szendrői, 2004). Two accounts have been proposed to explain this production-precedes-comprehension asymmetry. Specifically, Cutler and Swinney (1987) argued that 4- to 5-year-olds’ success in production does not stem from linguistic knowledge of prosodic focus marking but from a universal ‘physiological reflex’ (Bolinger, 1983), whereby speakers raise their pitch out of excitement when expressing something new or important. On this account, children have acquired the focus-to-prosody mapping in neither production nor comprehension at the age of 4 or 5 years, and there is thus no production–comprehension asymmetry. However, recent cross-linguistic studies of children’s prosodic focus marking in production cast considerable doubt on the validity of the ‘physiological reflex’ account. By age 4 or 5, children acquiring different languages already use prosody to mark focus in language-specific ways, albeit not in a fully adult-like manner (see Chen, 2018a for a review). Their use of prosody is influenced by factors such as the relative importance of the prosodic cues, the acoustic salience of the prosodic cues, and the transparency of the form-meaning relationship between a prosodic cue and focus in the target language (Chen, 2018a). This indicates that 4- to 5-year-olds’ use of prosody reflects language-specific knowledge of prosodic focus marking, contradicting the cross-linguistic similarity predicted by the ‘physiological reflex’ account. Unlike Cutler and Swinney (1987), Chen (2010) and Szendrői et al. (2017) attributed the production-precedes-comprehension asymmetry to methodological limitations in past comprehension studies. First, these studies did not directly examine children’s comprehension of the focus-to-prosody mapping but were concerned with children’s ability to use prosodic prominence for purposes other than marking the focal word. Furthermore, the test materials used in these studies were usually syntactically and semantically more complex (e.g., SV + direct object + indirect object sentences with the particle ‘only’, or compound sentences containing ambiguous pronouns) than those used in the production studies (e.g., SVO or SV sentences), making the comprehension tasks cognitively more demanding. Such differences in syntactic complexity between comprehension and production materials have also been suggested as an explanation for the production-precedes-comprehension asymmetry in other areas of language development (Hendriks & Koster, 2010). In addition, some judgement tasks used in the comprehension studies (e.g., explicit truth-value judgement tasks) were cognitively demanding, such that children might not have been able to fully apply their knowledge of prosodic focus marking (Szendrői et al., 2017).

Circumventing these methodological limitations, Chen (2010) examined 4- to 5-year-olds’ and adults’ production and comprehension of the focus-to-prosody mapping in Dutch SVO sentences with focus on either the subject noun or the object noun. Production was elicited via a controlled but interactive picture-based game in which children answered the experimenter’s wh-questions with SVO sentences. Comprehension of SVO sentences produced as answers to wh-questions was tested via an adapted version of Birch and Clifton’s (1995) reaction-time paradigm. Children resembled adults in two ways: they used accentuation to mark focus in production, and they responded faster in comprehension when the focus-to-prosody mapping was appropriate than when it was not. The differences between children and adults were gradual: children deaccented post-focal constituents slightly less frequently and had generally slower reaction times than adults. These findings indicate that Dutch-learning children show linguistic knowledge of the focus-to-prosody mapping in both comprehension and production by the age of 4 or 5, with no production–comprehension asymmetry in the acquisition of this mapping. Similarly, Szendrői et al. (2017) argued that production did not precede comprehension in English-, German- and French-speaking 3- to 6-year-olds, based on evidence of adult-like comprehension of contrastive focus in an offline picture-based comprehension task. In this task, children gave a corrective response to the experimenter’s description of a picture, which carried prosodic prominence on either the subject noun or the object noun (e.g., The BIRDY has the bottle, right? vs. The birdy has the BOTTLE, right?). Correcting the prosodically prominent noun in the response was taken as evidence of adult-like comprehension.

However, Chen’s (2010) analysis of the production data focused solely on accent placement. Yet prosodic focus marking involves more than where accents occur; it also requires selecting the appropriate accent type. In addition, when the choice of accent type alone is insufficient to mark focus, variation in the phonetic realisation of the pitch accent can also be used (Chen, 2018a; Gussenhoven, 2004). In Dutch, the exact use of different prosodic means depends on the position of focus in a sentence. For example, in sentence-final position, focus and non-focus are typically distinguished by the presence vs. absence of accentuation, and the most commonly used pitch accent type is a falling pitch accent (H*L) (Chen, 2011a; Hanssen et al., 2008; Romøren, 2016). In sentence-initial position, speakers usually accent the word with an H*L accent regardless of focus condition, but they vary the phonetic realisation of the pitch accent in peak alignment, pitch scaling and duration to mark focus: in the focus condition, they align the pitch maximum earlier in the word, lower the pitch minimum of the accent, and lengthen the word (Chen, 2009).¹ More detailed analyses of Dutch-speaking children’s use of prosody have shown that 4- to 5-year-olds are adult-like in their choice of accent type in sentence-initial position but not in sentence-medial and sentence-final positions in SVO sentences (Chen, 2011a; Romøren, 2016), and that they cannot yet use phonetic realisation for focus marking (Chen, 2009). In the stimuli used in Chen’s (2010) and Szendrői et al.’s (2017) comprehension experiments, focus and non-focus were distinguished by accent placement, choice of accent type and phonetic realisation (in the case of Dutch, English and German), as in adults’ natural speech. The children’s responses may therefore reflect their comprehension of the mapping between focus and the entire set of prosodic cues. Given that the children in these studies showed adult-like reaction-time differences between appropriate and inappropriate focus-to-prosody mappings, as well as adult-like offline comprehension of contrastive focus, the more in-depth analyses of children’s use of prosody in production suggest that a typical comprehension-precedes-production asymmetry may in fact be present in 4- or 5-year-olds’ prosodic focus marking.

Furthermore, past work on the relationship between production and comprehension in prosodic focus marking has primarily examined children aged 4 to 5 years. More recent studies of children’s prosodic focus marking in production reveal continued development beyond the age of 5 across languages (e.g., Chen, 2018a; Destruel et al., 2024; Yang et al., 2024). For example, at the age of 7 or 8, Dutch-speaking children reach adult-like performance in their choice of pitch accent type in both sentence-initial and sentence-final positions, but in phonetic realisation they are adult-like only in pitch scaling in sentence-initial position (Chen, 2009, 2011a). This is consistent with the proposal that discrete prosodic means such as choice of accent type are easier to learn than gradient means, because the former are acoustically more salient and the latter demand more precise articulatory control (Chen, 2018b). By contrast, few studies have investigated children’s comprehension of the focus-to-prosody mapping beyond the age of 5, apart from work on contrastive focus (e.g., Ito et al., 2014; Szendrői et al., 2017). This gap in the literature raises the question of whether the relationship between production and comprehension undergoes age-related changes across middle childhood.

Moreover, the existing literature on the production–comprehension relationship in prosodic development is based on group-level analyses of production and comprehension. Little is known about whether the relationship between production and comprehension is the same for different children. Research on individual differences in first language acquisition is mostly concerned with early speech perception, vocabulary, and grammar development (see Kidd et al., 2018; Kidd & Donnelly, 2020 for reviews). It has been argued that individual differences are ‘large and notably stable across development … are also observed early and across all domains’ in first language acquisition (Kidd et al., 2018, p. 158). Consistent with this claim, Wells et al. (2004) found individual differences at both younger and older ages in different aspects of prosodic development in their study of 120 English-speaking 7- to 13-year-olds performing multiple production and comprehension tasks. In contrast, production data from 22 Dutch-speaking 4- to 5-year-olds and 18 7- to 8-year-olds showed noticeable individual differences in prosodic focus marking in the younger children but not in the older children (Chen, 2011a). It remains to be investigated how individual differences manifest themselves in the relationship between production and comprehension in prosodic focus marking at different ages in childhood.

In the present study, we aimed to gain a clearer understanding of the relationship between production and comprehension in the acquisition of prosodic focus marking. To this end, we examined Dutch-speaking children’s prosodic focus marking in SVO sentences across a broad age range (i.e., 4 to 8 years) in a production experiment and a comprehension experiment. Specifically, we addressed two questions: (1) How does the relationship between production and comprehension develop across different ages? (2) To what extent do individual differences shape the relationship between production and comprehension?

Building on prior findings that Dutch-speaking children’s prosodic focus marking in production continues to develop beyond age 5, with adult-like performance emerging earlier in sentence-final than in sentence-initial position, and that their comprehension appears adult-like by age 4 or 5 across sentence positions, we hypothesised that the relationship between production and comprehension would vary as a function of age (Hypothesis 1). Consistent with the view that comprehension precedes and supports production in language development, we predicted that children’s comprehension would be predictive of their production at all ages for sentence-initial focus, where production is not adult-like by the age of 8, but only at younger ages for sentence-final focus, where adult-like production emerges at 7 or 8. A ‘predictive’ relationship is present if variation in children’s comprehension ability can explain variation in their production ability in a statistically meaningful way. The presence of a predictive relationship does not directly establish whether comprehension temporally precedes production, nor whether comprehension is a necessary enabling condition for development in production. But it can serve as first evidence that the relationship between comprehension and production is in line with the comprehension-precedes-production asymmetry.
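To make the notion of a ‘predictive’ relationship concrete, the sketch below fits an ordinary least-squares line of production on comprehension and reports the slope and the share of variance in production explained by comprehension (R²). It is a deliberately simplified, single-level stand-in, assuming one comprehension score and one mean production rating per child: the study itself used multilevel modelling, and the function name `ols_fit`, the choice of comprehension score and the toy values are all hypothetical.

```python
from statistics import mean


def ols_fit(x, y):
    """Least-squares fit of y on x; returns (intercept, slope, r_squared).

    x: per-child comprehension scores (hypothetical measure)
    y: per-child mean production ratings on the 1-5 scale
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length sequences with >= 3 points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("both variables must vary across children")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Share of variance in production accounted for by comprehension.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical toy data: comprehension advantage (ms) and mean rating per child.
comprehension = [40, 120, 80, 200, 160]
production = [2.9, 3.4, 3.1, 4.0, 3.7]
intercept, slope, r2 = ols_fit(comprehension, production)
print(f"slope = {slope:.4f} rating points per ms, R^2 = {r2:.2f}")
```

A positive slope with a non-trivial R² is the kind of pattern the prediction describes. A multilevel model would estimate the same idea while accounting for repeated measurements within children and differences between them, which this sketch ignores.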

Drawing on research into vocabulary and grammatical development as well as findings reported by Wells et al. (2004), we hypothesised that individual differences in the production–comprehension relationship would be present to a similar degree at all ages (Hypothesis 2). Accordingly, we predicted no effect of age on how well variation in comprehension can explain variation in production for individual children.

If our hypotheses are supported, the findings will provide the first indirect evidence for a role of comprehension in learning to use prosody for focus-marking. They will also suggest a similarity in the nature of individual differences underlying the development of prosodic and non-prosodic skills.

2. Materials and Methods

Production data were obtained from children and an adult control group in a natural and interactive setting, using the same picture-matching game as in recent studies on children acquiring Mandarin Chinese (e.g., Yang & Chen, 2018), Korean (e.g., Yang et al., 2024) and Swedish (e.g., Romøren & Chen, 2022). The sentences produced by children and adults were subsequently evaluated via perceptual rating, in which trained adult listeners rated the appropriateness of the prosody of each sentence in its context on a five-point scale. The prosodic-appropriateness scores can reflect the integrated use of prosody, rather than the use of a specific prosodic parameter (e.g., pitch) or a specific prosodic strategy (e.g., accent placement), and can capture the perceptual relevance of production details. This makes perceptual rating an effective alternative to manual annotation and acoustic analysis for assessing children’s prosodic focus marking in production. Comprehension of the focus-to-prosody mapping was examined using Chen’s (2010) picture-based reaction-time method, which was designed to measure children’s comprehension of the focus-to-prosody mapping implicitly during real-time sentence processing. The reaction-time technique has been increasingly used in studies of sentence processing in children aged 4 and above and has proven suitable for assessing children’s speech and language comprehension, as it minimises cognitive load and does not rely on explicit metalinguistic knowledge (Clahsen, 2008).
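The text does not state how a child’s reaction times are turned into a comprehension score. One simple option, shown here purely as an assumption, is a difference score: mean reaction time on trials with an inappropriate focus-to-prosody mapping minus mean reaction time on trials with an appropriate mapping, so that a positive value reflects the adult-like advantage for appropriate mappings. The function name `mapping_effect` and the trial labels are hypothetical.

```python
from statistics import mean


def mapping_effect(trials):
    """Mean RT (ms) on inappropriate minus appropriate focus-to-prosody trials.

    trials: iterable of (mapping, rt_ms) pairs, where mapping is the
    hypothetical label "appropriate" or "inappropriate".
    A positive result means faster responses when the mapping is appropriate.
    """
    by_mapping = {"appropriate": [], "inappropriate": []}
    for mapping, rt_ms in trials:
        if mapping not in by_mapping:
            raise ValueError(f"unknown mapping label: {mapping!r}")
        by_mapping[mapping].append(rt_ms)
    if not all(by_mapping.values()):
        raise ValueError("need at least one trial per mapping")
    return mean(by_mapping["inappropriate"]) - mean(by_mapping["appropriate"])


# Hypothetical reaction times for one child.
trials = [("appropriate", 850), ("inappropriate", 990),
          ("appropriate", 910), ("inappropriate", 1010)]
print(mapping_effect(trials))  # 120
```

A raw difference score is easy to interpret but ignores trial-level variability; modelling reaction times trial by trial, with random effects for children and items, would use the same contrast more efficiently.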

Children from different age groups were tested for both production and comprehension at two test moments, about eight months apart, so the design was both cross-sectional and longitudinal. Because the speech materials in the production and comprehension experiments were comparable, we conducted the production experiment before the comprehension experiment at each testing moment, so that the prosody heard in the comprehension experiment could not influence children’s use of prosody in the production experiment. Dedicated Windows-based laptops were used to run the comprehension experiment and the perceptual rating of the sentences elicited in the production experiment. These laptops were not connected to the internet during testing or rating sessions to avoid external interference.

Three female student assistants, native speakers of Dutch studying linguistics or literature, conducted the experiments individually with the children at their schools during school hours, and with the adults in a sound-attenuated booth in the Institute for Language Sciences Labs at Utrecht University. To maximise consistency across experimenters, we trained them on both the procedure and the use of prosody using an experimental protocol before they started testing participants. The core of the training consisted of a series of dry-run role-play sessions over three to four weeks, during which the experimenters simulated interactions with participants of different ages and the protocol was fine-tuned for various scenarios.

To help the children feel at ease with the experimenters, each experimenter collected the child she was to test from the classroom and chatted with the child before the experiment. The testing schedule was arranged so that every child did both the production and comprehension experiments with the same experimenter at one or both testing moments.

2.1. Participants

Seventy-one Dutch-speaking children from four age groups participated in this study. At the start of the study, the children in the four age groups were on average 4;8, 5;6, 6;6 and 7;5 (years;months). Detailed information on age and gender can be found in Table 1, in which the four groups of children are labelled a4, a5, a6 and a7 according to their age group at the start of the study. The children were from monolingual Dutch-speaking families and were recruited from four primary schools in the Netherlands. According to their parents’ reports, they had no hearing loss and no speech or language disabilities. Written informed consent was obtained from the parents for the children to participate in this study and for their experimental sessions to be recorded and filmed.

An additional fifty-six children participated in the study, but their data were not included in the current analysis. The data of eight of these children were lost due to loss of equipment. The remaining children either did not complete both experiments at both testing moments or lacked production ratings because their responses in the focus conditions of interest could not be evaluated, for reasons such as poor articulation, stuttering, self-repair, or unexpected background noise.

Twenty-three adult native speakers of Dutch (mean age 21;5, range: 18;8–28;8, 10 men, 13 women) participated in the production experiment as the control group to provide a measure of adult-like production as established via perceptual rating. They were university students at the time of testing and were tested on the same tasks following the same procedure as the children. Prior to the experiment, they were told that the tasks were simple because they were also used with child participants. They did not take part in the comprehension experiment because the comprehension of Dutch-speaking adults with a similar background had already been assessed and reported in Chen (2010).

2.2. The Production Experiment

Following the question–answer paradigm (Roberts, 1996), fifteen question–answer dialogues were embedded in the picture-matching game to elicit fifteen SVO sentences in five focus conditions (three sentences per condition): narrow focus in sentence-initial position (initial focus), in response to who-questions; narrow focus in sentence-final position (final focus), in response to what-questions; narrow focus in sentence-medial position, in response to what-does-X-do-with-Y questions; contrastive focus in sentence-medial position, correcting the experimenter’s statement about the action; and broad focus over the whole sentence, in response to what-happens questions. Only the first two focus conditions were relevant to the current study; the other three were included for a separate study on individual differences in focus marking in production (Chen & van den Bergh, in progress). The target SVO sentences were unique combinations of five subject nouns (bakker ‘baker’, hond ‘dog’, leeuw ‘lion’, meisje ‘girl’, poes ‘cat’), three verbs (tekenen ‘draw’, koken ‘cook’, toveren ‘conjure’), and three object nouns (lepel ‘spoon’, laars ‘boot’, wortel ‘carrot’), such that each subject noun, verb and object noun occurred only once within a given focus condition. All words were highly familiar to Dutch-speaking 4-year-olds.
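The text does not list which word combination was assigned to which focus condition. As an illustration only, the sketch below generates one hypothetical assignment by rotating the word lists with modular arithmetic, and then checks the constraints described in the text: all fifteen SVO combinations are unique, and no subject noun, verb or object noun repeats within a focus condition. The condition labels and function names are shorthand introduced here; this is not the authors’ item list.

```python
from collections import Counter

SUBJECTS = ["bakker", "hond", "leeuw", "meisje", "poes"]
VERBS = ["tekenen", "koken", "toveren"]
OBJECTS = ["lepel", "laars", "wortel"]
# Hypothetical short labels for the five focus conditions.
CONDITIONS = ["initial", "final", "medial", "contrastive", "broad"]
ITEMS_PER_CONDITION = 3


def build_items():
    """Rotate nouns and verbs so that no word repeats within a condition."""
    items = []
    for c, condition in enumerate(CONDITIONS):
        for k in range(ITEMS_PER_CONDITION):
            subject = SUBJECTS[(c + k) % len(SUBJECTS)]
            verb = VERBS[k]
            obj = OBJECTS[(c + k) % len(OBJECTS)]
            items.append((condition, subject, verb, obj))
    return items


def check_items(items):
    """Return True if the design constraints described in the text hold."""
    triples = [(s, v, o) for _, s, v, o in items]
    if len(set(triples)) != len(triples):  # every SVO combination is unique
        return False
    for condition in CONDITIONS:
        rows = [item for item in items if item[0] == condition]
        for slot in (1, 2, 3):  # subject, verb and object columns
            if len({row[slot] for row in rows}) != len(rows):
                return False
    return True


items = build_items()
assert len(items) == len(CONDITIONS) * ITEMS_PER_CONDITION
assert check_items(items)
print(Counter(s for _, s, _, _ in items))  # each subject is used three times
```

With five subjects rotated over fifteen slots, each subject noun ends up in three different conditions, while the three verbs and three object nouns each appear exactly once per condition.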

To make sure that the participants would use the intended words in the picture-matching game, we asked each participant to complete a picture-naming task prior to the game during the same session. In the picture-naming task, the participants were familiarised with the characters and actions that appeared in the picture-matching game.

2.2.1. The Picture-Naming Task

In the picture-naming task, the participants first named the nouns illustrated in individual pictures. The experimenter showed one picture at a time and invited the participant to name it by saying Dit is een … ‘This is a …’. In the case of incorrect naming (e.g., calling a baker a cook), the experimenter first acknowledged that the entity might look like what the participant had in mind, then drew the participant’s attention to distinctive features in the picture (e.g., the dough on the table) and suggested the intended label. Second, the experimenter showed the participant pictures of the actions occurring in the game and explained which action each picture depicted. Finally, to check whether the participant had got all the labels right, the experimenter went through all the pictures and asked the participant to name them once more. If the participant had no difficulty naming the nouns the first time, he named only the actions a second time. The picture-naming task lasted about 5–10 min.

2.2.2. The Picture Matching Game

In the picture-matching game, the participant was to help the experimenter sort pictures into matching pairs. The experimenter first explained how the game worked and introduced two rules: never reveal his pictures to the experimenter, and always say everything he sees happening in the picture in his response.² The game consisted of fifteen trials, corresponding to the fifteen question–answer dialogues. Each trial was carried out in a fixed sequence of steps (Figure 1). First, the experimenter took a picture from her set of pictures (e.g., a picture of a girl drawing something on a piece of paper), drew the participant’s attention to it, and briefly described it by saying Kijk! Het meisje. Het lijkt alsof het meisje iets tekent ‘Look! The girl. It seems that the girl is drawing something.’ She then asked the participant a question about the picture (e.g., Wat tekent het meisje? ‘What is the girl drawing?’) or, in the case of contrastive focus, made a guess about the missing information (e.g., Ik denk dat het meisje de wortel tekent. ‘I think the girl is drawing the carrot.’). The participant then took a picture from his own set and visually identified the information requested by the experimenter. The experimenter repeated her question before the participant began to speak. The participant then answered the question in an SVO sentence (e.g., Het meisje tekent de lepel. ‘The girl is drawing the spoon.’). The experimenter thanked the participant for the information, looked for the picture of the spoon in the box, and handed both pictures to the participant for his approval.

The game proper was preceded by five practice trials. If the participant gave elided answers or full-sentence answers containing non-target words during the practice trials, the experimenter reminded the participant of the rules of the game or the intended words. This intervention turned out to be necessary only for a small number of the 4- to 5-year-olds. Most children provided full-sentence answers using the intended words right from the start.

Each session lasted about 15–25 minutes and was both filmed and audio-recorded with a Zoom H1 digital recorder (built-in microphone) at a sampling rate of 44.1 kHz and 16-bit resolution. The video recordings were used to check procedural consistency in how the game was carried out.

2.2.3. Data Annotation

The audio recording of each participant was first orthographically annotated using Praat (Boersma, 2001). Second, full-sentence responses were selected as usable if they were free of all of the following problems: self-correction, use of pronouns, use of non-target words, detectable hesitation-induced silences, responding to a non-target question, elided responses, overlap with the experimenter’s speech, or poor recording quality. Third, the usable full-sentence responses and the corresponding questions or statements were selected and extracted as individual .wav files. These annotation steps were performed by a team of student assistants.

2.2.4. Perceptual Rating

The usable full-sentence responses and corresponding questions or statements were combined into context–response dialogues, with a 300 ms interval between the question or statement and the response in each dialogue and a 1000 ms interval between dialogues. The three native speakers of Dutch who administered the production experiment served as raters and rated each response on how well its prosody fitted the context on a five-point Equal Appearing Interval scale, with 1 standing for ‘does not fit’ and 5 for ‘fits perfectly’. Prior to the rating, the raters were given sound examples illustrating what the prosody typically sounded like in each focus condition and written instructions on how to do the rating. They carried out the rating at least ten months after the second testing moment of the production experiment had been completed. The raters were familiar with child speech and with the speech material to be evaluated, and were thus expected to be able to focus on the prosody in their rating instead of being distracted by other features such as speaking rate, voice quality and articulation.
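The assembly of the rating dialogues can be sketched as plain sample concatenation with fixed silences. This is a minimal sketch, assuming mono 16-bit PCM at the 44.1 kHz recording rate held in Python `array('h')` objects; the function names are hypothetical, and the tool the authors actually used to build the dialogues is not stated. Writing the result to a .wav file (e.g., with the standard `wave` module) is left out.

```python
from array import array

# Assumption: mono, 16-bit PCM at the same 44.1 kHz rate as the recordings.
SAMPLE_RATE = 44_100

def silence(ms, rate=SAMPLE_RATE):
    """Return `ms` milliseconds of digital silence as 16-bit samples."""
    return array("h", [0] * (rate * ms // 1000))

def make_dialogue(context, response, gap_ms=300):
    """Question or statement, then a 300 ms pause, then the child's response."""
    return context + silence(gap_ms) + response

def make_rating_block(dialogues, between_ms=1000):
    """Concatenate (context, response) pairs with 1000 ms between dialogues."""
    block = array("h")
    for k, (context, response) in enumerate(dialogues):
        if k > 0:
            block += silence(between_ms)
        block += make_dialogue(context, response)
    return block

# Usage with dummy one-sample "recordings" (placeholders, not real audio).
ctx, resp = array("h", [100]), array("h", [-100])
block = make_rating_block([(ctx, resp), (ctx, resp)])
print(len(block))  # 2 * (1 + 13230 + 1) + 44100 = 70564
```

Working at the sample level keeps the 300 ms and 1000 ms intervals exact regardless of how the individual .wav files were trimmed.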

To minimise variation in the scores due to comparisons between children or between children and adults, the dialogues were presented to the raters per speaker and per age group. The rating was conducted in seven 30–40-minute sessions. The raters could listen to each dialogue up to three times over headphones before finalising the score. As only initial focus and final focus were relevant here, only the scores for these two focus conditions were included in further analysis.

2.3. The Comprehension Experiment

2.3.1. The Correct–Incorrect Answer Game

The comprehension experiment was presented to the participants as a ‘correct–incorrect answer’ game. In the game, a boy looked through some pictures with his three pets: a parrot, a chicken, and a duck. The boy wanted to know whether his pets knew the pictures well and which of them knew the pictures best. To find this out, the boy showed one picture at a time to one of his pets and asked the pet a question about the picture. The participant could follow the question–answer dialogues between the boy and the pets via headphones and see the boy, his pets and the pictures on the screen of the experiment laptop. The participant’s task was to judge whether the pets’ answers were correct or incorrect (‘goed’ or ‘fout’ in Dutch) by pressing the green response key for ‘correct’ and the red response key for ‘incorrect’ on a response device.

Two focus conditions were embedded via wh-questions in the game, i.e., sentence-initial focus and sentence-final focus. The two focus conditions were combined with two prosody conditions (appropriate focus-to-prosody mapping vs. inappropriate focus-to-prosody mapping), forming four experimental conditions: initial focus-appropriate prosody, initial focus-inappropriate prosody, final focus-appropriate prosody, and final focus-inappropriate prosody, as illustrated in Figure 2.

Twenty-four question–answer dialogues were composed; they were syntactically comparable to the dialogues occurring in the picture-matching game. The answers in the experimental dialogues were all correct answers in terms of the segmental and lexical content. In addition, twenty question–answer dialogues were included as fillers. The answers in the fillers were all wrong answers, twelve of which contained a lexical error, e.g., eend ‘duck’ instead of kip ‘chicken’, and eight of which contained a pronunciation error, e.g., jaangen instead of jongen ‘boy’. The experimental dialogues and fillers were derived from question sentences recorded by a male native speaker of Dutch in child-directed speech (Burnham et al., 2002), and answer sentences by a female native speaker of Dutch in her usual manner of speaking.

A Latin square was used to distribute the 24 experimental dialogues, 12 fillers with lexically wrong answers and 8 fillers with wrongly pronounced answers over the four experimental conditions, resulting in four lists. Each dialogue occurred only once in each list, and in each list every condition was realised in six experimental dialogues, three fillers with lexically wrong answers, and two fillers with wrongly pronounced answers. In total, every participant was presented with 44 dialogues. Two pseudo-randomised orders were created for each list, resulting in eight stimulus orders.
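The list construction can be illustrated with a simple cyclic Latin-square rotation, applied to each item type separately. This is a sketch under stated assumptions: the item labels are hypothetical placeholders, and the cyclic rotation is one standard way to build such lists, not necessarily the exact assignment the authors used. It checks the counts given above (44 dialogues per list; 6, 3 and 2 items of each type per condition).

```python
from collections import Counter

CONDITIONS = ["initial-APP", "initial-INAPP", "final-APP", "final-INAPP"]

def rotate(items, n=len(CONDITIONS)):
    """Latin-square rotation for one item type: in list k, the item at index i
    gets condition (i % n + k) % n, so across the n lists every item occurs
    once in every condition."""
    return [
        {item: CONDITIONS[(i % n + k) % n] for i, item in enumerate(items)}
        for k in range(n)
    ]

# Hypothetical item labels (the actual stimulus names are not listed here).
experimental = [f"exp{i:02d}" for i in range(24)]
lexical_fillers = [f"lex{i:02d}" for i in range(12)]
pronunciation_fillers = [f"pron{i:02d}" for i in range(8)]

def build_lists():
    lists = [{} for _ in CONDITIONS]
    # Rotate each item type separately so that every condition receives the
    # same number of items of each type in every list.
    for item_type in (experimental, lexical_fillers, pronunciation_fillers):
        for k, assignment in enumerate(rotate(item_type)):
            lists[k].update(assignment)
    return lists

def per_condition(assignment, prefix):
    """Count items whose label starts with `prefix` in each condition."""
    return Counter(c for item, c in assignment.items() if item.startswith(prefix))

lists = build_lists()
for k, assignment in enumerate(lists, start=1):
    print(f"list {k}: {len(assignment)} dialogues,",
          dict(per_condition(assignment, "exp")))
```

Rotating the three item types separately is what guarantees the per-condition counts. A single rotation over all 44 items would give 11 items per condition but could mix item types unevenly.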

2.3.2. Procedure

The children did the comprehension experiment individually in a quiet room at their schools during school time. The experimenters who administered the production experiment also conducted this experiment, using the Zep Experimental Control Application (hereafter Zep) (Veenker, 2013) on the experiment laptops. The list and stimulus order that each child received were randomly chosen by Zep. An approximately equal number of children were assigned to each stimulus order of each list. Each session lasted about 20–25 minutes, starting with a practice session. In the practice session, the children were familiarised with the task and trained to use the response keys properly, either on a pushbutton box or on the keyboard of the experiment laptops.3 Each session was filmed from behind the children and the experimenter to minimise camera interference. The recordings were used to verify consistency in procedural details.

The timeline of a trial was as follows. A target picture appeared on the screen together with an image of the boy and one of his pets. Simultaneously, the boy said Kijk ‘Look’ as an attention getter. Eight hundred milliseconds later, he named an entity in the picture (e.g., Een varken ‘A pig’). The 800 ms delay allowed the participants to take a proper look at the picture. Twelve hundred milliseconds after the naming, the boy asked a question about the picture (e.g., Wat wast het varken? ‘What is the pig washing?’). Two thousand two hundred milliseconds after the end of the question, the pet provided an answer (e.g., Het varken wast een bloes. ‘The pig is washing a blouse.’). At the end of the answer, a high-precision timer (1 ms accuracy) was automatically started. Simultaneously, a picture showing the response keys for correct and incorrect answers appeared on the screen to remind the children to respond by pressing a key.

Reaction times were automatically measured from the end of each answer sentence until a response key was pressed. The children’s correct–incorrect judgments were also automatically recorded. The children were instructed to press a response key as quickly as possible, but not before the end of the answer sentence. A timeout was set at four seconds after the end of the sentence; responses after this point were registered as late responses and excluded from statistical analysis.

3. Results

Before presenting the analysis and results on the relationship between production and comprehension and individual differences in this relationship (Section 3.2), we first report the analysis and results on production and comprehension separately (Section 3.1). This serves to validate the findings from past studies examining the production and comprehension in different groups of Dutch-speaking children, and to provide an overview of the children’s production and comprehension abilities in the current study.

3.1. Production and Comprehension as Two Separate Skills

3.1.1. Production

We computed the production score for each test item of each child in each focus condition by averaging the scores from the three raters. The production scores were nested both within participants (i.e., measurements from the same child are more alike than those from different children) and within items (i.e., measurements from the same stimulus are more alike than those from different stimuli). We thus adopted a cross-classified multilevel modelling approach to analyse the production scores (Quené & van den Bergh, 2008; Goldstein, 2011; Snijders & Bosker, 1999). The multilevel modelling was conducted using the software MLwiN (version 2.32) (Rasbash et al., 2015).

Separate modelling was performed for the production scores in the initial focus condition and the production scores in the final focus condition. In both cases, the outcome variable was the production score each child received on individual test items at each test moment. The model consisted of two parts: a fixed part and a random part. The fixed part estimated the overall average production score (i.e., the intercept of the model) and the effect of child age (age_child) on the production scores (i.e., the slope of the model). To make age effects easier to interpret, we centred age around the average age of all participants at both test moments (i.e., 78 months, or 6 years and 6 months, abbreviated as 6;6), referred to as ‘mean age’, and scaled it in units of 10 months. Thus, the model’s intercept represented the average production score of a child at the mean age, and the slope showed the change in the production scores per 10 months of age.
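The two preprocessing steps described above, averaging the three raters’ scores and centring and rescaling age, amount to simple arithmetic. The sketch below is illustrative only. The function names are hypothetical (`age_child` mirrors the variable name in the text), and ages are assumed to be given as ‘years;months’ strings.

```python
from statistics import mean

MEAN_AGE_MONTHS = 78   # 6;6, the mean age over both test moments
UNIT_MONTHS = 10       # age is scaled in units of 10 months

def production_score(ratings):
    """Average of the three raters' 1-5 scores for one response."""
    return mean(ratings)

def to_months(age):
    """Convert a 'years;months' age such as '6;6' into months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)

def age_child(age):
    """Centred, rescaled age: 0 at 6;6, +1 for every 10 months older."""
    return (to_months(age) - MEAN_AGE_MONTHS) / UNIT_MONTHS

for age in ["4;10", "6;6", "8;2"]:
    print(age, age_child(age))
```

With this coding, a child aged 4;10 has age_child = -2.0 and a child aged 8;2 has age_child = 2.0. The model intercept therefore refers to a child aged exactly 6;6.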

The random part of the model captured differences in performance that could not be explained by the fixed effect age_child alone. It included four variance components: variance between children in scores at the mean age (S²_child), variance between children in the change in scores with age (S²_{age_child}), variance between items in the average scores assigned to them (S²_item), and residual variance for the interaction between child and item (S²_error). The first two variance components quantified individual differences in production. The last variance component is usually interpreted as measurement error and will not be discussed further. The equation used for the multilevel modelling can be found in Appendix A (Equation (A1)).
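Written out, the model described in the two preceding paragraphs could take the following form. This is our own rendering from the verbal description, not a copy of Equation (A1), whose notation may differ. Here y is the production score of child j on item i at test moment t, and age_jt is the centred age in units of 10 months. The section does not say whether a covariance between u_0j and u_1j was estimated, so none is shown.

```latex
y_{(ij)t} = \beta_0 + \beta_1\,\mathrm{age}_{jt}
          + u_{0j} + u_{1j}\,\mathrm{age}_{jt} + v_i + e_{(ij)t}

u_{0j} \sim N(0, S^2_{\mathrm{child}}), \quad
u_{1j} \sim N(0, S^2_{\mathrm{age\_child}}), \quad
v_i \sim N(0, S^2_{\mathrm{item}}), \quad
e_{(ij)t} \sim N(0, S^2_{\mathrm{error}})
```

The terms β_0 and β_1 form the fixed part (intercept and age slope). The terms u_0j, u_1j and v_i are the child, child-by-age and item random effects whose variances are S²_child, S²_{age_child} and S²_item.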

Note that the effect of the variable age_child was modelled both as a fixed effect (average trend) and as a random effect (individual variation in age-related change). This was necessary because the children belonged to different age groups and the children in the same age group were tested at different ages at different testing moments.

The results of the modelling are summarised in Appendix B and illustrated in Appendix C. In the initial focus condition (left panel of Appendix B), the estimated average production score at the mean age was 3.018. It changed significantly with age (p = 0.003), rising or falling by 0.206 for every 10 months above or below the mean age. For example, the average production score was 2.6 at the age of 4;10 but 3.43 at the age of 8;2. Further, the production scores differed significantly between children at the mean age (S²_child = 0.258, p < 0.001). The change in the production scores with age also differed significantly between children (S²_{age_child} = 0.046, p < 0.005). Finally, the variance between items was significant but relatively small (S²_item = 0.025, p < 0.005).

In the final focus condition (right panel of Appendix B), the estimated average production score was 3.559 at the mean age. It did not change significantly with age (p = 0.247). Further, the production scores differed significantly between children at the mean age (S²_child = 0.200, p < 0.001). But the change in the production scores with age did not differ significantly between children (p = 0.399). Finally, the variance in the scores on different items was not significant (p = 0.399).

Compared with the adult controls, the children’s average production score in the initial focus condition was significantly lower even at the oldest age (8;8) (3.636 in children vs. 3.957 in adults; χ² = 14.24; df = 1; p < 0.001). Based on the average score at the mean age and an increase of 0.206 for every 10 months of age, the children’s average score would match that of the adults only at the age of 10;3. In the final focus condition, the children’s average production score (3.557) was similar to that of the adults (3.586) throughout the age range under investigation (χ² = 2.13; df = 1; p = 0.144).
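The age projections above follow from the fixed part of the initial-focus model alone. The sketch below reproduces the two example scores (4;10 and 8;2) and the projected age of 10;3 from the reported intercept, slope and adult mean. The helper names are hypothetical, and child-specific random effects are ignored, as in the text’s projection.

```python
# Fixed-effect estimates for initial focus, taken from the text.
INTERCEPT = 3.018       # average production score at the mean age (6;6)
SLOPE = 0.206           # change per 10 months of age
MEAN_AGE_MONTHS = 78
ADULT_SCORE = 3.957     # adult controls, initial focus

def predicted_score(age_months):
    """Fixed-effect prediction only; child-specific random effects ignored."""
    return INTERCEPT + SLOPE * (age_months - MEAN_AGE_MONTHS) / 10

def age_reaching(target):
    """Age in months at which the fixed-effect line reaches `target`."""
    return MEAN_AGE_MONTHS + 10 * (target - INTERCEPT) / SLOPE

def as_years_months(months):
    """Format a (possibly fractional) age in months as 'years;months'."""
    years, rest = divmod(int(months), 12)
    return f"{years};{rest}"

print(round(predicted_score(58), 2))    # 4;10 -> 2.61
print(round(predicted_score(98), 2))    # 8;2  -> 3.43
m = age_reaching(ADULT_SCORE)
print(round(m, 1), as_years_months(m))  # 123.6 10;3
```

The projection is a linear extrapolation beyond the tested age range. It assumes the age slope stays constant after 8;8, which the data cannot confirm.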

To sum up, the children’s production was perceived as target-like notably later, and exhibited more individual differences, in sentence-initial focus than in sentence-final focus. The projected age at which their production would be perceived as target-like in sentence-initial focus echoes findings from prosodic analyses of their peers’ production in Dutch in previous studies (Chen, 2009, 2011a). Additionally, the earlier attainment of perceived target-like production in sentence-final focus suggests that using the accent types commonly employed by adults, even without making finer distinctions between them, is sufficient for production to be perceived as target-like.

3.1.2. Comprehension

As mentioned in Section 2.3, the answers given by the pets on the experimental trials were segmentally and lexically correct. Chen (2010) found that children judged the answers to be correct on most experimental trials and rejected them on some for reasons unrelated to prosody, i.e., different interpretations of the objects depicted in the pictures or misperception of the pronunciation. Since we were interested in the effect of prosody on comprehension rather than in other factors, we included for further analysis only the reaction times on experimental trials on which the answer was judged to be correct. Furthermore, we excluded extremely short reaction times (<200 ms) on the selected experimental trials, as they fell well below the minimum typically required for sentence comprehension (e.g., Sereno & Rayner, 2003) and likely reflected accidental responses, anticipatory key presses, or lapses in attention. The final data set consisted of reaction times from 2262 trials (83% of all trials): 568 trials in the initial focus-appropriate prosody condition (85%), 556 trials in the initial focus-inappropriate prosody condition (85%), 560 trials in the final focus-appropriate prosody condition (82%), and 578 trials in the final focus-inappropriate prosody condition (80%).
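The trial selection described above combines three criteria: the answer was judged correct, the reaction time was not under 200 ms, and the response came before the 4 s timeout. The sketch below applies them to hypothetical trials. The `Trial` record and its field names are our own. Treating exactly 200 ms and exactly 4000 ms as usable is an assumption, since the text does not specify boundary handling.

```python
from dataclasses import dataclass
from typing import Optional

MIN_RT_MS = 200     # faster responses treated as accidental/anticipatory
TIMEOUT_MS = 4000   # later responses count as 'late' and are discarded

@dataclass
class Trial:
    child: str
    item: str
    condition: str            # e.g. "initial-APP"
    judged_correct: bool      # child pressed the 'correct' (goed) key
    rt_ms: Optional[float]    # None if no key press was registered

def usable(trial: Trial) -> bool:
    """Keep 'correct' judgments with 200 ms <= RT <= 4000 ms (assumed bounds)."""
    return (
        trial.judged_correct
        and trial.rt_ms is not None
        and MIN_RT_MS <= trial.rt_ms <= TIMEOUT_MS
    )

# Hypothetical experimental trials, not data from the study.
trials = [
    Trial("c01", "exp03", "initial-APP", True, 950.0),
    Trial("c01", "exp07", "initial-INAPP", False, 1100.0),  # answer rejected
    Trial("c01", "exp11", "final-APP", True, 150.0),        # too fast
    Trial("c01", "exp15", "final-INAPP", True, 4600.0),     # late
    Trial("c01", "exp19", "final-APP", True, None),         # no response
]
kept = [t for t in trials if usable(t)]
print([t.item for t in kept])  # ['exp03']
```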

As the reaction times were nested within both participants and items, like the production scores, we used cross-classified multilevel modelling in MLwiN (version 2.32) (Rasbash et al., 2015) to analyse the comprehension data (Quené & van den Bergh, 2008; Goldstein, 2011; Snijders & Bosker, 1999). In the modelling, we log-transformed the reaction times using the natural logarithm. Reaction times are typically positively skewed, with greater variability for longer responses. Their variance is therefore not constant across the range of values, which violates a key assumption of multilevel modelling. A log transformation compresses longer reaction times more than shorter ones, reducing the skew and making the variance more uniform (Baayen & Milin, 2010).
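To illustrate the effect of the natural-log transformation, the sketch below compares the moment-based skewness of a small set of reaction times before and after taking logarithms. The numbers are invented for illustration and are not from the study. Skewness is also only a rough proxy for the variance heterogeneity the transformation is meant to reduce.

```python
import math
from statistics import mean

def skewness(xs):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    mu = mean(xs)
    m2 = mean((x - mu) ** 2 for x in xs)
    m3 = mean((x - mu) ** 3 for x in xs)
    return m3 / m2 ** 1.5

# Hypothetical right-skewed reaction times in ms (not data from the study).
rts = [400, 450, 500, 550, 600, 700, 900, 2500]
ln_rts = [math.log(rt) for rt in rts]
print(round(skewness(rts), 2), round(skewness(ln_rts), 2))
```

The single slow response dominates the raw skewness. On the log scale it is pulled much closer to the other values.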

Separate modelling was performed for the trials in the initial focus-appropriate prosody condition, the initial focus-inappropriate prosody condition, the final focus-appropriate prosody condition, and the final focus-inappropriate prosody condition. In all these cases, the outcome variable was the log-transformed reaction time (hereafter (ln) reaction time) each child had on individual test items at each test moment. Each model consisted of a fixed part and a random part, like the model for the production scores. The fixed part estimated the average (ln) reaction time at the mean age (i.e., the intercept of the model) and the effect of the factor age_child on the (ln) reaction times, i.e., the change in the (ln) reaction times per 10 months of age (i.e., the slope of the model). The random part of the model included four variance components: variance between children in the (ln) reaction times at the mean age (S²_child), variance between children in the change in the (ln) reaction times with age (S²_{age_child}), variance between items in the average (ln) reaction times assigned to them (S²_item), and residual variance for the interaction between child and item (S²_error). The first two variance components quantified individual differences in comprehension. The last variance component is usually interpreted as measurement error and will not be discussed further. The equation used for the modelling can be found in Appendix A (Equation (A1)).

To assess the difference in reaction times between the appropriate prosody condition and the inappropriate prosody condition, we compared the estimates of the intercept, the fixed factor age_child and the first three variance components across these conditions, separately for initial focus and final focus. The results of the modelling and the subsequent Chi-squared tests are summarised in Appendix D, in which estimates that differ significantly between the two prosody conditions are underlined.
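All χ² statistics reported in this section have df = 1. Their p-values can therefore be checked with the identity χ²(1) = z², which gives p = erfc(√(χ²/2)). The sketch below recomputes p for several χ² values reported in Section 3. The `wald_chi2` helper shows one common way to form such a statistic from two independently estimated coefficients and their standard errors. This is an assumption: the exact test applied in MLwiN, including whether any covariance between the estimates was taken into account, is not described here.

```python
import math

def wald_chi2(est_a, se_a, est_b, se_b):
    """Chi-squared (df = 1) for the difference between two estimates,
    assuming they come from independent models (no covariance term)."""
    return (est_a - est_b) ** 2 / (se_a ** 2 + se_b ** 2)

def p_chi2_df1(chi2):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom.

    With df = 1, chi2 = z**2, so p = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))

# Chi-squared values reported in Section 3 and the p-values given for them.
reported = {6.243: 0.012, 1.611: 0.204, 0.874: 0.35, 4.308: 0.038,
            4.628: 0.031, 2.13: 0.144, 1.44: 0.230}
for chi2, p in reported.items():
    print(f"chi2 = {chi2}: p = {p_chi2_df1(chi2):.3f} (reported {p})")
```

Using `math.erfc` avoids a dependency on a statistics library. The function is exact for df = 1 only and does not generalise to other degrees of freedom.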

In the initial focus condition, the average (ln) reaction time at the mean age was significantly longer in the appropriate prosody condition (6.92) than in the inappropriate prosody condition (χ² = 6.243; df = 1; p = 0.012). The average (ln) reaction time changed with age to a similar degree in the two prosody conditions (χ² = 1.611; df = 1; p = 0.204): for every 10 months above or below the mean age, it decreased or increased by 0.119 in the appropriate prosody condition and by 0.129 in the inappropriate prosody condition. The average (ln) reaction time thus remained longer in the appropriate prosody condition than in the inappropriate prosody condition over the whole age range tested in our study (i.e., 4;2–8;8). Furthermore, the reaction times differed between children to a similar degree at the mean age in the two prosody conditions (χ² = 0.874; df = 1; p = 0.35). The change in the reaction times with age differed between children to a significantly larger degree in the appropriate prosody condition than in the inappropriate prosody condition (χ² = 64.489; df = 1; p < 0.001). Finally, the variance between items did not differ significantly between the two prosody conditions.

In the final focus condition, the average (ln) reaction time at the mean age was significantly shorter in the appropriate prosody condition (6.699) than in the inappropriate prosody condition (6.715) (χ² = 4.308; df = 1; p = 0.038). The average (ln) reaction time changed with age to a significantly larger degree in the appropriate prosody condition than in the inappropriate prosody condition (χ² = 4.628; df = 1; p = 0.031): for every 10 months above or below the mean age, it decreased or increased by 0.152 in the appropriate prosody condition but by only 0.109 in the inappropriate prosody condition. The average (ln) reaction time thus remained shorter in the appropriate prosody condition over the whole age range. The difference between the two prosody conditions also grew larger at older ages, because the reaction times decreased faster in the appropriate prosody condition. Furthermore, the reaction times differed between children at the mean age to a larger degree in the appropriate prosody condition (0.071) than in the inappropriate prosody condition (0.065) (χ² = 23.02; df = 1; p < 0.001). The variance between children in the change in reaction times with age also differed significantly between the two prosody conditions (appropriate: 0.026; inappropriate: 0.019; χ² = 106.6; df = 1; p < 0.001). Finally, the variance between items did not differ significantly between the two prosody conditions.
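To make the (ln) estimates easier to read, they can be back-transformed to milliseconds. Note that exp of a mean ln reaction time is a geometric mean, not an arithmetic mean. A slope on the ln scale also corresponds to a proportional change: a decrease of b per 10 months multiplies the reaction time by exp(-b). The sketch below applies this to the final-focus estimates reported above. It is illustrative arithmetic only and not part of the original analysis.

```python
import math

# Final focus, estimates at the mean age on the ln(ms) scale (from the text).
LN_RT = {"appropriate": 6.699, "inappropriate": 6.715}
# Decrease in ln RT per 10 months older (from the text).
SLOPE = {"appropriate": 0.152, "inappropriate": 0.109}

for cond in LN_RT:
    ms = math.exp(LN_RT[cond])            # geometric-mean RT in ms at 6;6
    speedup = 1 - math.exp(-SLOPE[cond])  # proportional drop per 10 months
    print(f"{cond}: ~{ms:.0f} ms at 6;6, about {speedup:.1%} faster per 10 months")
```

On this scale, the difference at the mean age is roughly 13 ms. The appropriate condition speeds up by roughly 14% per 10 months, compared with roughly 10% in the inappropriate condition. This is why the gap widens with age.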

To sum up, these results show that at the group level, the children responded significantly faster in the appropriate prosody condition than in the inappropriate prosody condition in sentence-final focus across the whole age range tested in this study. The opposite pattern was found in sentence-initial focus. In other words, the children were adult-like in comprehension in sentence-final focus from 4;8 onwards but were not adult-like in sentence-initial focus even at the age of 8;8. Our study thus replicates Chen’s (2010) finding, which was based on raw reaction times and a smaller sample (N = 20, age range: 4;3–5;7, mean age: 5;1), only for sentence-final focus. Furthermore, the children differed significantly from each other in the (ln) reaction times at the mean age in the final focus condition and in the change in the (ln) reaction times with age in both focus conditions. They also exhibited larger differences in the (ln) reaction times between the two prosody conditions at older ages in the final focus condition.

3.1.3. Interim Discussion

Taken together, the statistical analyses of production and comprehension as two separate skills show that the children were adult-like in both production and comprehension from 4;8 in sentence-final position but in neither production nor comprehension in sentence-initial position, even at the age of 8;8. The earlier acquisition of the use of prosody to mark focus in sentence-final than in sentence-initial position corresponds to findings by Chen (2009, 2011a) and conforms to Chen’s (2018b) proposal that discrete prosodic means are easier to learn than gradient means.4 If children have not acquired the gradient use of prosody to distinguish focus from non-focus in sentence-initial position, they should show no difference in comprehension between the appropriate and inappropriate focus-to-prosody mappings. It is therefore surprising that they processed the inappropriate mapping faster than the appropriate mapping in sentence-initial position. Half of the children in our study appeared to show this pattern, although the size of the difference in (ln) reaction times may vary between children (more on this in Section 3.2). This pattern may reflect not merely an absence of knowledge of the focus-to-prosody mapping but a processing strategy in these children. Specifically, in sentence-initial focus, the stimuli were 218 ms longer in the appropriate mapping condition than in the inappropriate mapping condition. Since the children did not necessarily need to hear the whole sentence to decide whether an answer was correct in this focus condition, they might have prepared to respond before the expected end of the sentence. A longer sentence duration (by about 200 ms) can make it harder for the listener to anticipate the end point and maintain peak readiness to respond, delaying the initiation of the motor response (Bögels et al., 2015; Obleser & Kotz, 2010).
This might have led to a longer (ln) reaction time in the appropriate focus-to-prosody mapping condition in sentence-initial focus, at least in some children, independently of how they processed the focus-to-prosody mapping. In sentence-final focus, the stimuli were 186 ms shorter in the appropriate mapping condition than in the inappropriate mapping condition. Could the children have responded more slowly in the inappropriate mapping condition for the same reason? We think this is highly unlikely because, in sentence-final focus, the children knew the answer to the question only at the end of the sentence and did not need to anticipate the end point and maintain peak readiness to respond. Future research using stimuli of equal duration is needed to separate the hypothesised effect of sentence duration on reaction time from an inability to process the focus-to-prosody mapping in sentence-initial focus, at least in some children.

3.2. The Relationship Between Production and Comprehension

The analysis reported in Section 3.1 shows that, at the group level, the children were adult-like in neither production nor comprehension in the initial focus condition throughout the age range under investigation but were adult-like in both production and comprehension in the final focus condition from the youngest age onwards (i.e., 4;8). To examine the relationship between comprehension and production across age, and the extent to which individual differences shape this relationship, we used multilevel modelling rather than correlation analysis. While correlation statistics can reveal whether two variables are associated, they do not account for potential confounding factors or allow hypothesis-driven modelling of fixed effects. In contrast, multilevel modelling allows us to test predictive relationships between comprehension and production, to treat age as a continuous developmental factor, and to incorporate random effects that handle the nested nature of the data.

To find out whether variation in children’s comprehension could explain variation in their production scores in each focus condition, we expanded the model for the production data by including a comprehension score as a second fixed factor in addition to age_child (see Appendix A, Equation (A2)). We operationalised the ‘comprehension score’ in each focus condition as the ratio between the average (ln) reaction time in the appropriate prosody condition (APP) and that in the inappropriate prosody condition (INAPP). Each child thus had two comprehension scores, one for each focus condition. If a child responded faster in the appropriate-prosody condition than in the inappropriate-prosody condition (as adults did in previous studies), he would have a comprehension score lower than 1. If a child responded more slowly in the appropriate-prosody condition than in the inappropriate-prosody condition, he would have a comprehension score higher than 1. Following findings on adult native speakers of English (Birch & Clifton, 1995) and Dutch (Chen, 2010), comprehension scores lower than 1 were thus taken to indicate adult-like comprehension, and comprehension scores higher than 1 non-adult-like comprehension. Relatedly, a lower comprehension score indicates comprehension closer to that of adults.
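In equation form, the comprehension score and the expanded production model described above could be written as follows. This is our own rendering of the verbal description, not a copy of Equation (A2). C_j is child j’s comprehension score in the focus condition being modelled. All other symbols follow a model in which y is the production score of child j on item i at test moment t, age_jt is the centred age in units of 10 months, and u_0j, u_1j, v_i and e are the child, child-by-age, item and residual random terms with variances S²_child, S²_{age_child}, S²_item and S²_error.

```latex
C_{j} = \frac{\overline{\ln \mathrm{RT}}_{j,\mathrm{APP}}}
             {\overline{\ln \mathrm{RT}}_{j,\mathrm{INAPP}}}

y_{(ij)t} = \beta_0 + \beta_1\,\mathrm{age}_{jt} + \beta_2\,C_{j}
          + u_{0j} + u_{1j}\,\mathrm{age}_{jt} + v_i + e_{(ij)t}
```

A negative β_2 means that lower (more adult-like) comprehension scores go with higher production scores. A positive β_2 means the reverse.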

We performed the modelling for children with categorically different comprehension abilities separately in order to examine individual differences in the relationship between production and comprehension. We recognised two sub-groups among the children on the basis of their comprehension scores, the ‘good’ comprehenders and the ‘poor’ comprehenders. The ‘good’ comprehenders were the children with a comprehension score smaller than 1 (N = 33 in initial focus, N = 36 in final focus); the ‘poor’ comprehenders were the children with a comprehension score equal to or larger than 1 (N = 32 in initial focus, N = 32 in final focus).
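The computation of a child’s comprehension score and the split into ‘good’ and ‘poor’ comprehenders can be sketched as follows. The function names and the ln reaction times are hypothetical. The threshold follows the text: below 1 is ‘good’, and 1 or above is ‘poor’.

```python
from statistics import mean

def comprehension_score(ln_rts_app, ln_rts_inapp):
    """Mean ln RT with appropriate prosody divided by mean ln RT with
    inappropriate prosody, for one child in one focus condition."""
    return mean(ln_rts_app) / mean(ln_rts_inapp)

def comprehender_group(score):
    """'good' if the score is below 1 (faster with appropriate prosody), else 'poor'."""
    return "good" if score < 1 else "poor"

# Hypothetical ln reaction times for one child (not data from the study).
app = [6.6, 6.7, 6.8]
inapp = [6.8, 6.9, 7.0]
score = comprehension_score(app, inapp)
print(round(score, 3), comprehender_group(score))
```

Because the ratio is taken on the ln scale, even sizeable differences in milliseconds yield scores close to 1 (here about 0.97). The direction of the deviation from 1 is what the grouping uses.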

The results of our modelling are summarised in Appendix E. As can be seen, variation in the comprehension scores could explain substantial variation in the production scores only in the initial-focus condition. Notably, this worked in different ways for the two types of comprehenders (p < 0.001 in both cases). For the ‘good’ comprehenders, the lower the comprehension score was, the higher the production score was (p < 0.001). For the ‘poor’ comprehenders, the higher the comprehension score was, the higher the production score was (p < 0.05). Further, the factor age_child had a significant effect on the production score such that the production score became higher at an older age (p < 0.001 in the case of the ‘good’ comprehenders; p < 0.05 in the case of the ‘poor’ comprehenders). There was no difference in the effect of age_child in the two types of comprehenders (χ² = 1.44; df = 1; p = 0.230).

To sum up, in the initial-focus condition, the children’s comprehension scores were predictive of their production scores in different ways for different types of comprehenders. The patterns were consistent across age. However, in the final-focus condition, the comprehension scores were not predictive of the production scores across the age range tested.

4. Discussion

Based on the production and comprehension data obtained from the same group of children of a wide age range at two testing moments, we have addressed two questions on the production–comprehension relationship in the acquisition of prosodic focus marking in Dutch: (1) How does the relationship between production and comprehension develop across different ages? (2) To what extent do individual differences shape the relationship between production and comprehension? We hypothesised that the relationship between production and comprehension would vary as a function of age (Hypothesis 1) and that individual differences in the predicted production–comprehension relationship would exist to a similar degree at all ages (Hypothesis 2).

We have found that the children’s comprehension was predictive of their production in sentence-initial focus across ages, as predicted, but not in sentence-final focus, contrary to our prediction. However, the predictive relationship between comprehension and production in sentence-initial focus differed between children depending on whether they processed the focus-to-prosody mapping in online language comprehension like adults, independently of their age. For children who did, more adult-like comprehension was associated with better production. For children who did not, less adult-like comprehension was associated with better production. Taken together, the findings did not fully support Hypothesis 1 but confirmed Hypothesis 2 on the stability of individual differences in the relationship between production and comprehension.

It is unexpected that comprehension was not predictive of production in sentence-final focus at any age under investigation. Considering that the children were adult-like in both comprehension and production in sentence-final focus from the age of 4;8 onwards in our data, we cannot rule out a potential influence of comprehension on production in younger children. Equally, it is too early to dismiss the possibility that comprehension as tested in the reaction time paradigm is not crucial for developing adult-like use of prosody in production. That is, children may develop adult-like use of prosody for focus marking in production without processing the focus-to-prosody mapping in online language comprehension in the way adults do (Birch & Clifton, 1995; Cutler et al., 1997; Chen, 2010). For example, children may work out how to use prosody to mark focus by perceiving the form-function mappings between prosodic variation and focus conditions (in different contexts) in other people’s speech. Specifically, children may initially rely on a physiologically motivated mechanism in their use of prosody to highlight new information, e.g., Bolinger’s physiological reflex, and gradually grammaticalise the form-function mapping between prosodic prominence and focus by implementing prosodic prominence in a language-specific manner (Chen, 2020). This process of grammaticalisation may be driven by perception of the association between focus and the various forms of prosodic prominence in the speech that children are exposed to. It follows that more salient form-function mappings are likely to be easier to perceive and therefore acquired earlier than less salient ones. This has indeed been shown to be the case.
For example, Dutch-speaking children’s use of accent placement and accent type to express narrow focus in final focus became adult-like earlier than their use of relatively subtle phonetic cues in initial focus (Chen, 2009, 2011b). Salience can also be in the nature of the focus condition. For example, English- and German-speaking 3- to 5-year-olds can use prosodic prominence in an adult-like way in contrastive focus before their Dutch-speaking peers can do so in (non-contrastive) narrow focus (see Chen, 2018a for a review).

The ability to associate prosodic prominence with focus at the perceptual level may not be the same as capitalising on the focus-to-prosody mapping to process sentence meaning in online language comprehension. The latter entails that the listener has a representation of how the prosody of a sentence typically unfolds in each information structure context, so that comprehension can be impeded if the prosody does not unfold as expected. It is possible that the ability to process the focus-to-prosody mapping in online language comprehension is the next developmental milestone for relatively inexperienced language learners after they have established the association between prosodic prominence and focus. This speculation ties in with early work showing that children aged between 4 and 10 seem to take little notice of prosody when following verbal instructions and performing various comprehension tasks. For example, Lahey (1974) found that 4- and 5-year-old English-speaking children were not significantly worse at acting out coordinate sentences and sentences with relative clauses when the sentences were spoken with monotonous prosody than when they were spoken with proper prosody. Bates (1976) reported that English-speaking children’s imitation of sentences was disrupted by marked word order but not by pragmatically inappropriate accent placement. Morton and Trehub (2001) found that, when prosodic cues conflicted with lexical cues in the perception of emotions such as happiness and sadness, English-speaking 4-year-olds responded primarily to the lexical content of a sentence; their reliance on lexical content declined between the ages of 5 and 10, and only at the age of 10 did they become reliant on prosody.
We thus propose that children in the multiword-utterance stage initially rely on lexical and syntactic information in language comprehension and gradually establish a network of representations of contextually appropriate and statistically more probable form-function mappings between prosodic cues and meanings. In this process, they first respond to prosodic information when it appears in the associated context. Only at a more advanced stage do they take the contextually probable form-function mappings as the default and respond negatively to contextually improbable ones.

Also, in everyday communication, listeners can use different strategies in language comprehension and need not all rely heavily on prosodic cues. It is thus possible that not all children will develop the ability to respond to the focus-to-prosody mapping in online comprehension, even though they use prosody to mark focus in production. It also follows that not all adults with normal speech and hearing will necessarily comprehend sentences faster when the prosody is contextually appropriate than when it is contextually inappropriate. In a preliminary study on individual differences in online comprehension of prosodic focus marking, Lentz and Chen (2016) indeed found that 17 of the 32 adult participants did not show the expected reaction time difference between appropriate and inappropriate prosody.

Furthermore, the relationship between comprehension and production in sentence-initial focus is intriguing for the ‘poor’ comprehenders: the less adult-like their comprehension was, the more adult-like their production was. This may suggest that these children’s production outperformed their comprehension, with the former being rated somewhat lower than adults’ production and the latter being completely unlike the typical adult response (Birch & Clifton, 1995; Cutler et al., 1997). It may also lend further support to the possibility that the comprehension ability as tested in the reaction time paradigm is not crucial to the development of adult-like use of prosody in production. Future research on children’s use of prosody to identify focus in perception tasks (e.g., choosing the appropriate response to a wh-question from renditions of the same sentence with narrow focus in different sentence positions), and on how this ability relates to their production, is needed to clarify which perception or comprehension ability supports the acquisition of prosodic focus marking in production.

Finally, this study has a number of limitations. First, individual differences in the production–comprehension relationship were found only in sentence-initial focus and remained the same across development, in line with findings in other domains of language acquisition research (Kidd et al., 2018). However, our analysis of individual differences is limited to dividing children into two types of comprehenders based on their comprehension score, and it remains unclear how the observed individual differences can be explained. Recent research on individual differences in adults’ perception of prosodic prominence has found that listeners with lower pragmatic skill (or more autistic traits) exhibit a weaker top-down effect of focus interpretation on their subjective prominence ratings in SVO sentences with either narrow focus on the object or broad focus on the verb phrase (Bishop et al., 2020). Lentz and Chen (2016) examined whether variation in perspective taking could explain why some adults were weak at both production and comprehension of focus marking in a small sample of adult native speakers of Dutch (N = 32). They found that participants who scored high on perspective taking tended to have a lower production score and a higher comprehension score, that is, they were weak in both production and comprehension. It would thus seem that they had less desire to mark focus properly in their own production and less need for others to mark it properly in comprehension. In future research, it would be useful to study the effect of signal-extrinsic factors, such as autistic traits and perspective-taking skill, on the relationship between production and comprehension of the focus-to-prosody mapping in children.

The second limitation concerns the interpretation of the predictive relationship observed between comprehension and production. Although our multilevel modelling revealed that comprehension ability statistically predicts production ability across age in sentence-initial focus, this should not be taken as evidence of temporal or causal precedence. While the study incorporated both cross-sectional and longitudinal elements, the analyses do not allow us to determine whether comprehension developmentally enables or causes improvements in production. Rather, the findings are consistent with the theoretical view that comprehension supports production but do not confirm a necessary or unidirectional relationship. To more rigorously investigate developmental directionality and potential enabling effects, future research should include longer-term longitudinal designs with multiple follow-up points (e.g., over a span of two or more years), and consider using lagged modelling to assess how changes in comprehension may predict subsequent changes in production over time. Experimental approaches such as comprehension training studies could also help clarify whether gains in comprehension facilitate development in production, providing stronger evidence for a causal or conditional relationship.
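The lagged modelling suggested above can be illustrated with a deliberately simplified Python sketch. It regresses a child’s production score at one testing moment on the same child’s production and comprehension scores at the previous testing moment, so that the comprehension coefficient reflects prediction of later production over and above earlier production. This is a single-level ordinary least squares fit written out by hand for transparency, not a multilevel or cross-lagged panel model; the function names and the noise-free demo data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b for a small square system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(design, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    return solve_linear(xtx, xty)


def lagged_model(waves):
    """waves: dict mapping a (hypothetical) child ID to a time-ordered list of
    (comprehension, production) pairs, one per testing moment.
    Returns [intercept, slope for production_t, slope for comprehension_t]
    for predicting production at t + 1."""
    design, y = [], []
    for series in waves.values():
        for (comp_t, prod_t), (_, prod_next) in zip(series, series[1:]):
            design.append([1.0, prod_t, comp_t])
            y.append(prod_next)
    return ols(design, y)


# Noise-free demo data generated from a known rule:
# production[t + 1] = 1.0 + 0.5 * production[t] + 2.0 * comprehension[t]
demo_waves = {}
for child in range(5):
    comps = [0.9 + 0.05 * child, 1.0 - 0.03 * child, 1.05]
    prods = [2.0 + 0.3 * child]
    for comp in comps[:-1]:
        prods.append(1.0 + 0.5 * prods[-1] + 2.0 * comp)
    demo_waves[child] = list(zip(comps, prods))

coefs = lagged_model(demo_waves)  # recovers approximately [1.0, 0.5, 2.0]
```

Each child contributes one lagged pair per pair of consecutive testing moments, so a design with only two testing moments per child (as in Appendix C) yields a single pair per child; more follow-up points give the model more within-child information to work with.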

In addition, while our findings in sentence-initial focus are consistent with the view that comprehension supports production developmentally, we note that this claim does not imply that comprehension is fully in place before production begins. Both skills may still be undergoing development, particularly at younger ages, and developmental “precedence” in this context should be interpreted as relative timing or ordering of acquisition of certain aspects of certain skills, rather than as an all-or-nothing staging.

5. Conclusions

To conclude, we have found limited evidence that children’s comprehension of the focus-to-prosody mapping may support their use of prosody to mark focus in production. This result raises the possibility that the ability to process the focus-to-prosody mapping in online language comprehension may not be essential for developing adult-like use of prosody in focus marking in production. Furthermore, we have observed stable individual differences in the relationship between production and comprehension in sentence-initial focus across age, similar to findings in other domains of language acquisition research. Future research on children’s perception of the focus-to-prosody mapping using tasks explicitly tapping into their use of this mapping in sentence interpretation and longitudinal studies with multiple follow-up points are needed for a more thorough understanding of the production–comprehension relationship in the acquisition of prosodic focus marking.

Author Contributions

Conceptualization, A.C.; methodology, A.C.; formal analysis, A.C., H.v.d.B.; investigation, A.C.; resources, H.v.d.B.; data curation, A.C.; writing—original draft preparation, A.C., H.v.d.B.; writing—review and editing, A.C.; visualisation, A.C., H.v.d.B.; supervision, A.C.; project administration, A.C.; funding acquisition, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a VIDI grant from the Dutch Research Council (grant number 276-89-001), awarded to A.C. The preparation of this manuscript was supported by a VICI grant from the same council (grant number VI.C.201.109), also awarded to A.C.

Institutional Review Board Statement

Ethical review and approval were not required for the study on human participants in accordance with the local legislation and institutional requirements at the time of testing.

Informed Consent Statement

Informed consent was obtained from all participants or their guardians involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Acknowledgments

We would like to express our special gratitude to the children and their parents for their full cooperation. We thank Sjef Pieters and Alex Manus for technical support and Joe Rodd for assistance with data preprocessing.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Equations used in the multilevel modelling on production, comprehension (Equation (A1)), and the relationship between production and comprehension (Equation (A2)):

\begin{aligned} Y_{(ij)t} &= b_{0} + b_{1} \cdot \mathrm{Month}_{(ij)t} + \left[ u_{0j} + u_{1j} \cdot \mathrm{Month}_{(ij)t} + v_{i0} + e_{(ij)t} \right] \\ &\quad (i = 1, 2, \dots, I_{j};\ j = 1, 2, \dots, J). \end{aligned}

(A1)

\begin{aligned} Y_{(ij)t} &= b_{0} + b_{1} \cdot \mathrm{Month}_{(ij)t} + b_{2} \cdot \frac{\mathrm{APP}_{it}}{\mathrm{INAPP}_{it}} + \left[ u_{1j} + u_{2j} \cdot \mathrm{Month}_{(ij)t} + v_{1i} + e_{1(ij)t} \right] \\ &\quad (i = 1, 2, \dots, I_{j};\ j = 1, 2, \dots, J). \end{aligned}

(A2)

In both equations, the outcome variable Y_(ij)t stands for the observed ‘score’ on the ith item of the jth child. The observed ‘score’ was the production score in the model for the production data and in the model for the relationship between production and comprehension, and the (ln) reaction time in the model for the comprehension data. Each model had two parts: the fixed effects part and the random effects part (between square brackets). In the fixed effects part, the intercept (b₀), i.e., the overall average score, and the slope (b₁), i.e., the average change in scores with the age of the child (age_child, denoted Month in the equations), were estimated. In Equation (A2), a second fixed effect was added, i.e., the comprehension score, with coefficient b₂. In the random effects part, four residual terms were distinguished for variance related to four random factors (child, age, item, and the interaction between child and item): u_0j, u_1j, v_i0, and e_(ij)t. The residual scores were assumed to be normally distributed with an expected value of 0.0 (E(u_0j) = E(u_1j) = E(v_i0) = E(e_(ij)t) = 0.0) and variances S²_u0j, S²_u1j, S²_vi0, and S²_e(ij)t, respectively.
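To make the structure of Equation (A1) concrete, the following Python sketch generates synthetic scores with the same fixed part (b₀ + b₁·Month) and random part (u_0j + u_1j·Month + v_i0 + e_(ij)t). It is only a data-generating illustration, not the authors’ MLwiN analysis (Rasbash et al., 2015). The function name, the assumption that all children share the same testing ages, and the choice to draw u_0j and u_1j independently (ignoring any covariance between them) are ours; the variance values in the demo are hypothetical. Equation (A2) would add a further fixed term, b₂ times the comprehension score.

```python
import math
import random


def simulate_a1(n_children, n_items, months, b0, b1,
                var_u0, var_u1, var_v, var_e, seed=None):
    """Generate synthetic scores with the structure of Equation (A1).

    months: centred ages (one per testing moment), assumed here to be the
    same for every child. The variances play the role of S²_u0j, S²_u1j,
    S²_vi0 and S²_e(ij)t; all residuals are drawn as independent normals.
    """
    rng = random.Random(seed)
    u0 = [rng.gauss(0.0, math.sqrt(var_u0)) for _ in range(n_children)]  # child intercept residual u_0j
    u1 = [rng.gauss(0.0, math.sqrt(var_u1)) for _ in range(n_children)]  # child age-slope residual u_1j
    v = [rng.gauss(0.0, math.sqrt(var_v)) for _ in range(n_items)]       # item residual v_i0
    rows = []
    for j in range(n_children):
        for t, month in enumerate(months):
            for i in range(n_items):
                e = rng.gauss(0.0, math.sqrt(var_e))  # child-by-item residual e_(ij)t
                fixed_part = b0 + b1 * month
                random_part = u0[j] + u1[j] * month + v[i] + e
                rows.append({"child": j, "item": i, "t": t,
                             "month": month, "y": fixed_part + random_part})
    return rows


# Hypothetical example: 3 children, 4 items, two testing moments at
# -1.0 and +0.5 age units (i.e., 68 and 83 months with 10-month units).
demo_rows = simulate_a1(3, 4, [-1.0, 0.5], b0=3.0, b1=0.2,
                        var_u0=0.25, var_u1=0.05, var_v=0.02, var_e=0.3, seed=1)
```

Setting all four variances to zero reduces every simulated score to the fixed part alone, which is a quick way to check that the fixed and random parts have been separated as in the square-bracket notation of the equation.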

Appendix B

Parameter estimates for the production scores in the initial and final focus conditions.

	Initial focus			Final focus
	Estimate	(se)	p	Estimate	(se)	p
	Fixed Parameters
Intercept (b₀)	3.016	(0.108)	<0.001	3.559	(0.067)	<0.001
Age_child (b₁)	0.206	(0.065)	0.003	0.045	(0.046)	0.247
	Random Parameters
S²_child	0.258	(0.031)	<0.001	0.200	(0.017)	<0.001
S²_age_child	0.046	(0.015)	0.004	0.000	(0.000)	0.399
S²_items	0.025	(0.008)	0.004	0.000	(0.000)	0.399
Note: The factor age_child was centred around the mean age of all children (i.e., 6;6 or 78 months). A unit of change in age was defined as an increase or decrease of 10 months relative to the mean age.
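Because age_child is centred at 78 months and measured in 10-month units, the fixed estimates above translate into age-specific predictions for an average child and item. The Python sketch below does this arithmetic with the Appendix B values; the constant and function names are ours, the random effects are set to zero, and predictions are only meaningful within the tested age range (4 to 8 years). For initial focus, for example, the predicted score rises from about 2.65 at 60 months to about 3.39 at 96 months; the final-focus slope (0.045) is not significant (p = 0.247).

```python
MEAN_AGE_MONTHS = 78   # centring point (6;6), from the note to Appendix B
AGE_UNIT_MONTHS = 10   # one unit of age_child corresponds to 10 months

# Fixed-effect estimates (b0, b1) for production scores, from Appendix B
PRODUCTION_FIXED = {
    "initial": (3.016, 0.206),
    "final": (3.559, 0.045),
}


def centred_age(age_months):
    """Convert age in months to the centred age_child scale used in the models."""
    return (age_months - MEAN_AGE_MONTHS) / AGE_UNIT_MONTHS


def predicted_production(focus, age_months):
    """Fixed-part prediction for an average child and item (random effects set to 0)."""
    b0, b1 = PRODUCTION_FIXED[focus]
    return b0 + b1 * centred_age(age_months)


for focus in ("initial", "final"):
    for age in (60, 78, 96):
        print(f"{focus} focus, {age} months: {predicted_production(focus, age):.3f}")
```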

Appendix C

Change in estimated production scores (y-axis) with age (x-axis) on average (M: dark line) and for individual children in the initial-focus (left) and final-focus (right) conditions. Each grey line represents a child at two testing moments, with the left end of the line indicating the production score at the first testing moment and the right end indicating the production score at the second testing moment. The letter ‘A’ indicates the average production score of the adults, placed at 108 months for comparison.

Appendix D

Parameter estimates for (ln-transformed) reaction times in the initial-focus and final-focus conditions.

	Initial focus						Final focus
	Appropriate prosody			Inappropriate prosody			Appropriate prosody			Inappropriate prosody
	Est.	(se)	p	Est.	(se)	p	Est.	(se)	p	Est.	(se)	p
	Fixed parameters
Intercept (b₀)	6.920	(0.041)	<0.001	6.887	(0.039)	<0.001	6.699	(0.045)	<0.001	6.715	(0.044)	<0.001
Age_child (b₁)	−0.119	(0.031)	<0.001	−0.129	(0.028)	<0.001	−0.152	(0.034)	<0.001	−0.109	(0.034)	0.030
	Random parameters
S²_child	0.054	(0.002)	<0.001	0.046	(0.002)	<0.001	0.071	(0.002)	<0.001	0.065	(0.022)	<0.001
S²_age_child	0.018	(0.001)	<0.001	0.009	(0.001)	<0.001	0.019	(0.001)	<0.001	0.026	(0.013)	0.054
S²_items	0.002	(0.000)	0.053	0.036	(0.004)	<0.001	0.001	(0.000)	0.021	0.000	(0.000)	0.261

Note 1: The factor age_child was centred around the mean age of all children (i.e., 6;6 or 78 months). A unit of change in age was defined as an increase or decrease of 10 months relative to the mean age. Note 2: The estimates differing significantly between the two prosody conditions were underscored.
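Because the comprehension models were fitted to ln reaction times, the estimates above can be read more easily after back-transformation. The Python sketch below exponentiates each intercept (the predicted reaction time for an average child and item at 78 months) and each age slope (the multiplicative change in reaction time per 10-month increase in age). It assumes, as an illustration, that reaction times were recorded in milliseconds and that ln is the natural logarithm; the names are ours. Under these assumptions the initial-focus, appropriate-prosody intercept corresponds to about 1012 ms and the final-focus one to about 812 ms, and exp(−0.119) ≈ 0.89, i.e., roughly 11% shorter reaction times per 10 months.

```python
import math

# Fixed-effect estimates (b0, b1) for ln reaction times, from Appendix D
RT_FIXED = {
    ("initial", "appropriate"): (6.920, -0.119),
    ("initial", "inappropriate"): (6.887, -0.129),
    ("final", "appropriate"): (6.699, -0.152),
    ("final", "inappropriate"): (6.715, -0.109),
}


def rt_at_mean_age(focus, prosody):
    """Back-transform the intercept: predicted RT for an average child and item at 78 months."""
    b0, _ = RT_FIXED[(focus, prosody)]
    return math.exp(b0)


def rt_factor_per_age_unit(focus, prosody):
    """Multiplicative change in RT for each 10-month increase in age."""
    _, b1 = RT_FIXED[(focus, prosody)]
    return math.exp(b1)


for focus, prosody in RT_FIXED:
    print(f"{focus}/{prosody}: {rt_at_mean_age(focus, prosody):.0f} ms at 78 months, "
          f"x{rt_factor_per_age_unit(focus, prosody):.3f} per 10 months")
```

Exponentiating a mean on the log scale gives a geometric-mean-type value rather than the arithmetic mean of the raw reaction times, so these figures are best read as typical rather than average response times.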

Appendix E

Parameter estimates for the relationship between production and comprehension in the initial and final focus conditions

	Initial focus			Final focus
	Estimate	(se)	p	Estimate	(se)	p
	‘good’ comprehenders			‘good’ comprehenders
Intercept	3.288	(0.210)	<0.001	3.400	(0.140)	<0.001
Age_child	0.370	(0.080)	<0.001	−0.810	(0.690)	0.200
Comprehension	−12.960	(2.510)	<0.001	0.120	(0.080)	0.130
	‘poor’ comprehenders			‘poor’ comprehenders
Intercept	2.970	(0.200)	<0.001	3.690	(0.150)	<0.001
Age_child	0.230	(0.110)	0.044	−0.710	(0.580)	0.189
Comprehension	6.350	(2.470)	0.015	0.020	(0.090)	0.389
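The opposite signs of the comprehension slopes for the two groups in initial focus can be illustrated with a small calculation. The comprehension score corresponds to the APP/INAPP term in Equation (A2); read against the Discussion (for ‘good’ comprehenders, whose slope is negative, more adult-like comprehension went with better production), a lower score appears to indicate more adult-like comprehension, which is our inference rather than a stated definition. Because the scale and any centring of the comprehension score are not given here, the Python sketch below computes only differences in predicted production for a hypothetical change in the comprehension score, holding age constant. A decrease of 0.01 corresponds to about +0.13 for ‘good’ comprehenders and about −0.06 for ‘poor’ comprehenders.

```python
# Comprehension slopes (b2 in Equation (A2)) from Appendix E
COMPREHENSION_SLOPE = {
    ("initial", "good"): -12.960,
    ("initial", "poor"): 6.350,
    ("final", "good"): 0.120,   # not significant (p = 0.130)
    ("final", "poor"): 0.020,   # not significant (p = 0.389)
}


def production_change(focus, group, delta_comprehension):
    """Change in predicted production score for a given change in the
    comprehension score, holding age constant (fixed part only)."""
    return COMPREHENSION_SLOPE[(focus, group)] * delta_comprehension


# Hypothetical change: the comprehension score drops by 0.01
for group in ("good", "poor"):
    print(group, round(production_change("initial", group, -0.01), 4))
```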

Notes

1	These patterns mean that within a sentence, sentence-initial focus is typically followed by post-focus deaccentuation; sentence-final focus is preceded by accentuation in sentence-initial position if the subject is realised by a noun.
2	In the Dutch instructions, the second rule was presented to the children as the following: De tweede belangrijke regel van het spel is dat je altijd álles moet zeggen wat je op het plaatje ziet gebeuren, dus een hele zin. Anders weet ik nog niet goed welke plaatjes bij elkaar horen ‘The second important rule of the game is that you should always say everything you see happening in the picture, so a whole sentence. Otherwise, I’m not sure which pictures belong together.’ The notion of a complete sentence or a full sentence was thus introduced in combination with a jargon-free explanation. While this was not strictly necessary for the older children, it proved effective for the younger ones, as demonstrated by previous studies using this approach (Yang & Chen, 2018; Romøren & Chen, 2022; Yang et al., 2024).
3	The push-button box was used during the first round of testing. During the second round, the keyboard of the experiment laptops was used instead, because the push-button box was no longer supported on these laptops by our labs. No differences in reaction time were observed between the children who did the comprehension experiment using the push-button box and those who used the keyboard.
4	An anonymous reviewer suggested that the earlier acquisition of sentence-final focus marking may reflect its status as a default accentuation pattern (consisting of prenuclear accents and a nuclear accent), one that does not involve deaccentuation, which is known to pose challenges for young children (e.g., Grünloh et al., 2015). This would imply that the children in our study were not marking focus per se, but merely producing a default pattern. However, there is limited empirical evidence that such a default is clearly dominant or unambiguous in the input received by 4- to 8-year-olds. In natural discourse, sentence-final accents may be realised with reduced prominence through pitch compression or downstep, making them less salient than subject accents earlier in the sentence (Baumann et al., 2007; Sityaev & House, 2003; Xu & Xu, 2005). Furthermore, Grünloh et al. (2015), cited by the reviewer, found that adult speakers of German deaccented given information less frequently in child-directed speech (CDS) addressed to 2-year-olds than in adult-directed speech, and observed similarity in the use of deaccentuation between 2- to 3-year-olds’ production and CDS. However, they also found a significant increase in the use of deaccentuation from age 2 to age 3, pointing to a gradual developmental shift, as also noted in children aged 4 to 8 by Chen (2011a). Moreover, children do not rely solely on default-like prosody in early production. Studies have shown that toddlers already vary accent type in accordance with information structure (Wieman, 1976; Chen, 2009), and that children from age 2 onward across languages vary their prosody to reflect changes in information structure (Chen, 2018a, 2018b; Grünloh et al., 2015). In sum, while the default pattern hypothesis is an interesting proposal, we believe it does not fully explain the nuanced developmental patterns observed in children’s focus marking.
Nonetheless, further investigation of the frequency and distribution of accentuation patterns in the input, across ages and discourse contexts, could help clarify how such patterns may shape children’s development in prosodic focus marking.

References

Abboub, N., Nazzi, T., & Gervain, J. (2016). Prosodic grouping at birth. Brain and Language, 162, 46–59. [Google Scholar] [CrossRef]
Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12–28. [Google Scholar] [CrossRef]
Bates, E. (1976). Language in context. Academic Press. [Google Scholar]
Baumann, S., Becker, J., Grice, M., & Mücke, D. (2007). Tonal and Articulatory Marking of Focus in German. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (pp. 1029–1032). University of Saarland. [Google Scholar]
Baumann, S., & Kügler, F. (2015). Prosody and information status in typological perspective—Introduction to the special issue. Lingua, 165, 179–182. [Google Scholar] [CrossRef]
Birch, S., & Clifton, C. J. (1995). Focus, accent, and argument structure: Effects on language comprehension. Language and Speech, 38(4), 365–391. [Google Scholar] [CrossRef] [PubMed]
Bishop, J., Kuo, G., & Kim, B. (2020). Phonology, phonetics, and signal-extrinsic factors in the perception of prosodic prominence: Evidence from rapid prosody transcription. Journal of Phonetics, 82, 100977. [Google Scholar] [CrossRef]
Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5(9/10), 341–345. [Google Scholar]
Bolinger, D. (1978). Intonation across languages. In J. H. Greenberg (Ed.), Universals of human language: Phonology (Vol. 2, pp. 471–524). Stanford University Press. [Google Scholar]
Bolinger, D. (1983). Intonation and gesture. American Speech, 58, 156–174. [Google Scholar] [CrossRef]
Bögels, S., Kendrick, K. H., & Levinson, S. C. (2015). Never say no … How the brain interprets the timing of speech acts. PLoS ONE, 10(6), e0131609. [Google Scholar] [CrossRef]
Burnham, D., Kitamura, C., & Vollmer-Conna, U. (2002). What’s new, Pussycat? On talking to babies and animals. Science, 296(5572), 1435. [Google Scholar] [CrossRef]
Chen, A. (2009). The phonetics of sentence-Initial topic and focus in adult and child Dutch. In M. Vigário, S. Frota, & M. J. Freitas (Eds.), Phonetics and phonology: Interactions and interrelations (pp. 91–106). Benjamins. [Google Scholar]
Chen, A. (2010). Is there really an asymmetry in the acquisition of the focus-to-accentuation mapping? Lingua, 120(8), 1926–1939. [Google Scholar] [CrossRef]
Chen, A. (2011a). Tuning information packaging: Intonational realization of topic and focus in child Dutch. Journal of Child Language, 38(5), 1055–1083. [Google Scholar] [CrossRef]
Chen, A. (2011b). The Developmental path to phonological focus-Marking in Dutch. In S. Frota, P. Prieto, & G. Elordieta (Eds.), Prosodic production, perception and comprehension (pp. 93–109). Springer. [Google Scholar] [CrossRef]
Chen, A. (2018a). Get the focus right across languages: Acquisition of prosodic focus-Marking in production. In P. Prieto, & N. Esteve-Gibert (Eds.), Prosodic development in first language acquisition (pp. 295–314). John Benjamins. [Google Scholar]
Chen, A. (2018b). Production and comprehension of prosodic granularity: A developmental perspective. In T. Cho, S. Kim, J. Choi, J. J. Kim, S. Y. Kim, & K.-J. Lee (Eds.), Proceedings of the first Hanyang international symposium on phonetics and cognitive sciences of language (pp. 11–12). Available online: https://site.hanyang.ac.kr/documents/10980062/13052079/HISPhonCog_HB_2018.pdf (accessed on 6 July 2025).
Chen, A. (2020). A sound start: Prosodic development before birth and in the first three years of life. Unpublished grant proposal. Available online: https://soundstart.sites.uu.nl/ (accessed on 6 July 2025).
Chen, A., den Os, E., & de Ruiter, J. P. (2007). Pitch accent type matters for online processing of information status: Evidence from natural and synthetic speech. The Linguistic Review, 24(2–3), 317–344. [Google Scholar] [CrossRef]
Chien, Y. C., & Wexler, K. (1990). Children’s knowledge of locality conditions on binding as evidence for the modularity of syntax and pragmatics. Language Acquisition, 1, 225–295. [Google Scholar] [CrossRef]
Clahsen, H. (2008). Behavioral methods for investigating morphological and syntactic processing in children. In I. A. Sekerina, E. M. Ferna’ ndez, & H. Clahsen (Eds.), Developmental psycholinguistics: On-line methods in children’s language processing (pp. 1–28). John Benjamins. [Google Scholar]
Clark, E. V. (1993). The Lexicon in acquisition. CUP. [Google Scholar]
Clark, E. V., & Hecht, B. F. (1983). Comprehension, production, language acquisition. Annual Review of Psychology, 34(1), 325–349. [Google Scholar] [CrossRef]
Cruttenden, A. (1985). Intonation comprehension in ten-year-olds. Journal of Child Language, 12, 643–661. [Google Scholar] [CrossRef]
Cutler, A., Dahan, D., & van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40(2), 141–201. [Google Scholar] [CrossRef]
Cutler, A., & Swinney, D. A. (1987). Prosody and the development of comprehension. Journal of Child Language, 14, 145–167. [Google Scholar] [CrossRef]
Dahan, D., Tanenhaus, M. K., & Chambers, C. G. (2002). Accent and reference resolution in spoken-Language comprehension. Journal of Memory and Language, 47, 292–314. [Google Scholar] [CrossRef]
Destruel, E., Lalande, L., & Chen, A. (2024). The development of prosodic focus marking in French. Frontiers in Psychology, 15, 1360308. [Google Scholar] [CrossRef]
Fló, A., Brusini, P., Macagno, F., Nespor, M., Mehler, J., & Ferry, A. L. (2019). Newborns are sensitive to multiple cues for word segmentation in continuous speech. Developmental Science, 22(4), e12802. [Google Scholar] [CrossRef]
Goldstein, H. (2011). Multilevel statistical models (Vol. 922). John Wiley & Sons. [Google Scholar]
Grünloh, T., Lieven, E., & Tomasello, M. (2015). Language Learning and Development Young Children’s Intonational Marking of New, Given and Contrastive Referents. Language Learning and Development, 11(2), 95–127. [Google Scholar] [CrossRef]
Gussenhoven, C. (2004). The phonology of tone and intonation. CUP. [Google Scholar]
Gussenhoven, C. (2005). Transcription of Dutch intonation. In S. Jun (Ed.), Prosodic typology and transcription: A unified approach (pp. 118–145). OUP. [Google Scholar]
Hanssen, J., Peters, J., & Gussenhoven, C. (2008). Prosodic effects of focus in Dutch declaratives. In P. A. Barbosa, S. Madureira, & C. Reis (Eds.), Proceedings of the 4th international conferences on speech prosody (pp. 609–612). Editora RG/CNPq. Campinas. [Google Scholar]
Hendriks, P. (2014). Asymmetries between language production and comprehension. Springer. [Google Scholar] [CrossRef]
Hendriks, P., & Koster, C. (2010). Production/comprehension asymmetries in language acquisition. Lingua, 120, 1887–1897. [Google Scholar] [CrossRef]
Ito, K., Bibyk, S. A., Wagner, L., & Speer, S. R. (2014). Interpretation of contrastive pitch accent in six- to eleven-year-old English-speaking children (and adults). Journal of Child Language, 41(1), 84–110. [Google Scholar] [CrossRef]
Ito, K., & Speer, S. R. (2008). Anticipatory effect of intonation: Eye movements during instructed visual search. Journal of Memory and Language, 58, 541–573. [Google Scholar] [CrossRef]
Jhang, Y., & Oller, D. K. (2017). Emergence of functional flexibility in infant vocalizations of the first 3 months. Frontiers in Psychology, 8, 300. [Google Scholar] [CrossRef] [PubMed]
Kidd, E., & Donnelly, S. (2020). Individual differences in first language acquisition. Annual Review of Linguistics, 6, 319–340. [Google Scholar] [CrossRef]
Kidd, E., Donnelly, S., & Christiansen, M. H. (2018). Individual differences in language acquisition and processing. Trends in Cognitive Sciences, 22, 154–169. [Google Scholar] [CrossRef] [PubMed]
Kügler, F., & Calhoun, S. (2020). Prosodic encoding of information structure. In C. Gussenhoven, & A. Chen (Eds.), The Oxford handbook of language prosody (pp. 453–467). Oxford University Press. [Google Scholar]
Lahey, M. (1974). Use of prosody and syntactic markers in children’s comprehension of spoken sentences. Journal of Speech and Hearing Research, 17, 656–668. [Google Scholar] [CrossRef]
Lambrecht, K. (1994). Information structure and sentence form: Topics, focus, and the representations of discourse referents. CUP. [Google Scholar]
Lentz, T., & Chen, A. (2016, July 17). Individual imbalances in prosody production and comprehension. Satellite workshop ‘Personality in speech perception & production’ at the 15th Conference on Laboratory Phonology, Ithaca, NY, USA. [Google Scholar]
Mastropieri, D., & Turkewitz, G. (1999). Prenatal experience and neonatal responsiveness to vocal expressions of emotion. Developmental Psychobiology, 35, 204–214. [Google Scholar] [CrossRef]
Morton, J. B., & Trehub, S. E. (2001). Children’s understanding of emotion in speech. Child Development, 72(3), 834–843. [Google Scholar] [CrossRef]
Müller, A., Höhle, B., Schmitz, M., & Weissenborn, J. (2006). Focus-to-stress Alignment in 4- to 5-year-old German-learning Children. In A. Belletti, E. Bennati, C. Chesi, E. Di Domenico, & I. Ferrari (Eds.), Proceedings of GALA 2005 (pp. 393–407). Cambridge Scholars Publishing. [Google Scholar]
Nallet, C., & Gervain, J. (2021). Neurodevelopmental preparedness for language in the neonatal Brain. Annual Review of Developmental Psychology, 3, 41–58. [Google Scholar] [CrossRef]
Obleser, J., & Kotz, S. A. (2010). Expectancy constraints in degraded speech modulate the language comprehension network. Cerebral Cortex, 20(3), 633–640. [Google Scholar] [CrossRef]
Quené, H., & van den Bergh, H. (2008). Examples of mixed-effects modeling with crossed random effects and with binomial data. Journal of Memory and Language, 59, 413–425. [Google Scholar] [CrossRef]
Rasbash, J., Charlton, C., Browne, W. J., Healy, M., & Cameron, B. (2015). MLwiN version 2.32 [computer software]. Centre for multilevel modelling, University of Bristol. Available online: https://www.bristol.ac.uk/cmm/software/mlwin/ (accessed on 28 February 2015).
Roberts, C. (1996). Information structure: Towards an integrated theory of formal pragmatics. In J. H. Yoon, & A. Kathol (Eds.), OSU working papers in Linguistics (Vol. 49). The Ohio State University Department of Linguistics. [Google Scholar]
Romøren, A. S. H. (2016). Hunting Highs and Lows: The acquisition of prosodic focus marking in Swedish and Dutch [Ph.D. dissertation, Utrecht University]. LOT, Netherlands Graduate School of Linguistics. (LOT Dissertation Series No. 426). [Google Scholar]
Romøren, A. S. H., & Chen, A. (2022). The acquisition of prosodic marking of narrow focus in central Swedish. Journal of Child Language, 49, 213–238. [Google Scholar] [CrossRef]
Sereno, S. C., & Rayner, K. (2003). Measuring word recognition in reading: Eye movements and event-Related potentials. Trends in Cognitive Sciences, 7(11), 489–493. [Google Scholar] [CrossRef]
Sityaev, D., & House, J. (2003). Phonetic and Phonological Correlates of Broad, Narrow and Contrastive Focus in English. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th international congress of phonetic sciences (pp. 1819–1822). Organizing Committee of the 15th ICPhS. Available online: https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2003/papers/p15_1819.pdf (accessed on 26 August 2025).
Snijders, T., & Bosker, R. (1999). Multilevel analysis: An introduction to basic and applied multilevel analysis. Sage. [Google Scholar]
Szendröi, K. (2004). Acquisition evidence for an interface theory of focus. In J. Van Kampen, & S. Baauw (Eds.), Proceedings of GALA 2003 (pp. 457–468). LOT, Netherlands Graduate School of Linguistics. [Google Scholar]
Szendrői, K., Bernard, C., Berger, F., Gervain, J., & Höhle, B. (2017). Acquisition of prosodic focus marking by English, French, and German three-, four-, five- and six-year-olds. Journal of Child Language, 45, 219–241. [Google Scholar] [CrossRef] [PubMed]
Ünal, E., & Papafragou, A. (2016). Production-Comprehension asymmetries and the acquisition of evidential morphology. Journal of Memory and Language, 89, 179–199. [Google Scholar] [CrossRef]
Vallduví, E., & Engdahl, E. (1996). The linguistic realization of information packaging. Linguistics, 34(3), 459–520. [Google Scholar] [CrossRef]
Veenker, T. J. G. (2013). The Zep experiment control application (version 1.6) [Computer software]. Available online: https://beexy.nl/zep1/wiki/doku.php?id=home (accessed on 1 February 2013).
Watson, D., Gunlogson, C., & Tanenhaus, M. (2008). Interpreting pitch accents in on-linecomprehension: H* vs. L+H*. Cognitive Science, 32, 1232–1244. [Google Scholar] [CrossRef]
Weber, A., Braun, B., & Crocker, M. (2006). Finding referents in time: Eye-tracking evidence for the role of contrastive accents. Language and Speech, 49(3), 367–392.
Wells, B., Peppé, S., & Goulandris, N. (2004). Intonation development from five to thirteen. Journal of Child Language, 31, 749–778.
Wieman, L. (1976). Stress pattern in early child language. Journal of Child Language, 3(2), 283–286.
Xu, Y. (1999). Effect of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics, 27, 55–107.
Xu, Y., & Xu, C. X. (2005). Phonetic realization of focus in English declarative intonation. Journal of Phonetics, 33, 159–197.
Yang, A., & Chen, A. (2018). The developmental path to adult-like prosodic focus-marking in Mandarin Chinese-speaking children. First Language, 38, 26–46.
Yang, A., Cho, T., Kim, S., & Chen, A. (2024). Prosodic focus marking in Seoul Korean-speaking children: The use of prosodic phrasing. Frontiers in Psychology, 15, 1352280.
Zhang, D., Chen, Y., Hou, X., & Wu, Y. J. (2019). Near-infrared spectroscopy reveals neural perception of vocal emotions in human neonates. Human Brain Mapping, 40, 2434–2448.

Figure 1. An example trial of sentence-final focus in the picture-matching game.

Figure 2. The left panel, consisting of three drawings, is a screenshot of what participants saw on the laptop screen in the trials involving the sentences Het/een varken wast een/de bloes ‘The/a pig is washing a/the blouse’. The right panel, consisting of four figures, illustrates the prosodic patterns of the sentences uttered as answers to the questions Wie wast de bloes? ‘Who is washing the blouse?’ (initial focus) and Wat wast het varken? ‘What is the pig washing?’ (final focus) in both the appropriate prosody condition and the inappropriate prosody condition. Note: The examples were transcribed following Gussenhoven (2005). H*L (read as ‘high-star-low’) represents a falling pitch accent; !H*L (read as ‘downstepped high-star-low’) indicates a downstepped falling pitch accent (i.e., a falling pitch accent with a lower peak than the preceding pitch accent). H denotes a high pitch level and L a low pitch level; the ‘*’ sign indicates which pitch level is realised in the stressed syllable. %L (read as ‘percent-L’) represents an initial low boundary tone and L% (read as ‘L-percent’) a final low boundary tone.

Table 1. Mean ages, age ranges and gender of the four groups of children at the two testing moments (time 1, time 2) in the study.

Group (time 1)	N (males)	Production: age at time 1	Production: age at time 2	Comprehension: age at time 1	Comprehension: age at time 2
a4	12 (5 m)	4;8 (4;2–4;12)	5;3 (4;10–5;7)	4;8 (4;2–5;1)	5;4 (4;10–5;8)
a5	26 (12 m)	5;6 (5;0–5;12)	6;2 (5;8–6;8)	5;7 (5;0–6;0)	6;3 (5;8–6;8)
a6	15 (7 m)	6;6 (6;1–6;12)	7;2 (6;9–7;7)	6;7 (6;1–6;12)	7;2 (6;9–7;8)
a7	18 (10 m)	7;5 (6;12–7;12)	8;1 (7;7–8;8)	7;6 (7;0–8;1)	8;1 (7;8–8;8)

Note: Ages are given as years;months, with age ranges in parentheses. Some children happened to be tested shortly before a birthday at a testing moment; their age was registered as X;12 instead of (X + 1);0.


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chen, A.; van den Bergh, H. The Production-Comprehension Relationship in the Acquisition of Prosodic Focus Marking: The Role of Age and Individual Differences. Languages 2025, 10, 234. https://doi.org/10.3390/languages10090234

