1. Introduction
A long-standing view rooted in research on lexical and morpho-syntactic development is that children learn to understand a certain form-meaning mapping (e.g., a word and its meaning) before they can use the form to convey the intended meaning in production (
Clark, 1993). Comprehension is therefore a prerequisite for production and should precede production in language development. In line with the comprehension-precedes-production view, the ultimate goal for a language-learning child may be to achieve coordination between these two domains, such that their production abilities will gradually align with comprehension abilities (
Clark & Hecht, 1983).
However, cases of a production-precedes-comprehension asymmetry have also been reported (see
Hendriks, 2014;
Hendriks & Koster, 2010 for an overview). For example, English-learning 4- to 6-year-old children can use pronouns such as ‘him’ or ‘her’ correctly and do not confuse them with reflexives such as ‘himself’ or ‘herself’ in production. But they interpret the pronouns (e.g., ‘her’ in ‘This is Mama Bear; this is Goldilocks. Is Mama Bear washing her?’) as their reflexive counterparts in comprehension in the presence of two referents of the same gender (e.g., interpreting ‘her’ as Mama Bear) even at the age of 6 years (
Chien & Wexler, 1990). Turkish-learning children can use verb suffixes in sentences in the past tense to distinguish first-hand experience of an event from acquired knowledge of an event through either hearsay or inference by the age of 3 years but are not able to correctly interpret the verb suffixes in comprehension at the age of 5 years (
Ünal & Papafragou, 2016). Four main explanations have been proposed for the observed production-precedes-comprehension asymmetry: experimental task effects, limitations in children’s pragmatic ability to map form to function during comprehension, general cognitive constraints, and the possibility that the grammar itself yields different form-meaning mappings in production versus comprehension (
Hendriks & Koster, 2010).
The relationship between comprehension and production of prosody has not been widely studied in developmental research. It appears to vary across developmental stages. Studies on prosodic abilities in the first months of life suggest a head start in perception and processing due to prenatal exposure to prosodic properties of maternal speech and foetuses’ ability to learn from this input (e.g., see
Nallet & Gervain, 2021 for a review). For example, newborns show language-specific grouping preferences for non-linguistic sounds (e.g., pure tones) varied in pitch, duration or intensity (
Abboub et al., 2016). They can use prosodic information to segment speech in their native language (
Fló et al., 2019). They also exhibit heightened sensitivity to prosodic variations associated with certain emotions specific to their native language (
Mastropieri & Turkewitz, 1999;
Zhang et al., 2019). In contrast, studies on cry and non-cry vocalisations of infants aged 0 to 3 months do not yield consistent evidence for language-specific patterns in the use of pitch (van Niekerk et al., to appear). Infants use non-cry vocalisations differing in prosody to express negative, positive and neutral affect in the first three months of life (
Jhang & Oller, 2017) but it is not clear whether they show evidence of influence from their maternal language. However, in the acquisition of sentence-level prosody (e.g., intonation, prosodic phrasing) among older children, production appears to precede comprehension (e.g.,
Hendriks, 2014;
Müller et al., 2006). Children are capable of producing sentence-level prosodic patterns resembling those of adults and can employ certain prosodic features for communication by the age of 2 or 3 years. But they do not yet process prosodic cues during language comprehension as efficiently as adults do by the ages of 5 and 6 years and fail to interpret some functions of pitch contours by the ages of 9 or 10 years (e.g.,
Cruttenden, 1985;
Cutler & Swinney, 1987).
One particular aspect of sentence-level intonation relatively well studied in research on the production–comprehension asymmetries concerns the mapping between prosody and focus in West Germanic languages. Focus refers to the predication on a topic in a sentence and typically contains new or contrastive information to the hearer (
Lambrecht, 1994;
Vallduví & Engdahl, 1996). In many languages, there is a strong association between the focal constituent and prosodic prominence and between the post-focal constituents and reduced or little prosodic prominence (e.g.,
Baumann & Kügler, 2015;
Bolinger, 1978;
Kügler & Calhoun, 2020;
Xu, 1999;
Xu & Xu, 2005). In West Germanic languages, prosodic prominence can be varied through both discrete means, such as placement of pitch accent (i.e., emphasising a word or not with distinct pitch movement associated with the word), and choice of accent type (e.g., using a falling pitch accent vs. a rising pitch accent to emphasise a word), and gradient means, i.e., phonetic realisation or implementation of a pitch accent (hereafter phonetic realisation) (e.g., realising a falling pitch accent with a larger or smaller pitch range) (
Chen, 2018a;
Gussenhoven, 2004). Listeners take the focus-to-prosody mapping into account in online language comprehension such that an appropriate focus-to-prosody mapping speeds up comprehension, compared to an inappropriate focus-to-prosody mapping (
Birch & Clifton, 1995;
Cutler et al., 1997). Relatedly, listeners use the broad mapping between information value (new vs. given) and prosody to predict whether the upcoming referent is a new referent or a previously mentioned referent in online reference resolution before the segmental information completely unfolds (e.g.,
Chen et al., 2007;
Dahan et al., 2002;
Ito & Speer, 2008;
Watson et al., 2008;
Weber et al., 2006). Adult-like competence in prosodic focus marking thus entails that children can both vary prosodic prominence to encode focus in production and exploit the focus-to-prosody mapping in comprehension.
Although the relevance of comprehending the focus-to-prosody mapping for learning to produce this mapping has not yet received much attention in theories of prosodic development, it is widely believed that children acquiring a West Germanic language can use prosodic prominence to mark focus in production but fail to interpret or efficiently use the focus-to-prosody mapping in comprehension at the age of 4 or 5 years (
Hendriks, 2014;
Müller et al., 2006;
Szendrői, 2004). Two accounts have been proposed to explain this production-precedes-comprehension asymmetry. Specifically,
Cutler and Swinney (
1987) argued that 4 to 5-year-olds’ success in production does not stem from their linguistic knowledge of prosodic focus marking but from a universal ‘physiological reflex’ (
Bolinger, 1983). According to the reflex, speakers increase pitch in response to excitement when expressing something new or important (
Bolinger, 1983). Thus, on this account, children have not acquired the focus-to-prosody mapping in either production or comprehension at the age of 4 or 5 years and there is no production–comprehension asymmetry. However, recent cross-linguistic studies of children’s prosodic focus-marking in production cast considerable doubt on the validity of the ‘physiological reflex’ account. By age 4 or 5, children acquiring different languages already use prosody to mark focus in language-specific ways, albeit not in a fully adult-like manner (see
Chen, 2018a for a review). Their use of prosody is influenced by factors such as the relative importance of the prosodic cues, the acoustic salience of the prosodic cues, and the transparency of the form-meaning relationship between a prosodic cue and focus in the target language (
Chen, 2018a). This finding indicates that 4- to 5-year-olds’ use of prosody reflects language-specific knowledge of prosodic focus marking, contradicting the prediction of cross-linguistic similarity associated with the ‘physiological reflex’ account. Unlike
Cutler and Swinney (
1987),
Chen (
2010) and
Szendrői et al. (
2017) attributed the production-precedes-comprehension asymmetry to methodological limitations in past comprehension studies. First, these studies did not directly examine children’s comprehension of the focus-to-prosody mapping but were concerned with children’s ability to use prosodic prominence for purposes other than marking the focal word. Furthermore, the test materials used in these studies were usually syntactically and semantically more complex (e.g., SV + direct Object + indirect Object sentences with the particle ‘only’ or compound sentences containing ambiguous pronouns) than the materials used in the production studies (e.g., SVO or SV sentences), making the comprehension tasks cognitively more demanding. The apparent differences in syntactic complexity between the materials used in comprehension studies and those used in production studies have also been suggested as an explanation for the production-precedes-comprehension asymmetry for other areas of language development (
Hendriks & Koster, 2010). In addition, some judgement tasks used in the comprehension studies (e.g., explicit truth-value judgement tasks) were cognitively demanding, such that children might not be able to fully apply their knowledge of prosodic focus marking (
Szendrői et al., 2017).
Circumventing the methodological limitations in past work,
Chen (
2010) examined 4- to 5-year-olds’ and adults’ production and comprehension of the focus-to-prosody mapping in Dutch SVO sentences with focus on either the subject noun or the object noun. Production was elicited via a controlled but interactive picture-based game in which children responded to the experimenter’s wh-questions with SVO sentences. Comprehension of SVO sentences produced as responses to wh-questions was tested via an adapted version of
Birch and Clifton’s (
1995) reaction time paradigm. Children showed similar patterns to adults in two ways: they used accentuation to mark focus in production and they responded faster in comprehension when the focus-to-prosody mapping was appropriate than when it was not. The differences between children and adults were gradual. That is, children used deaccentuation in post-focus constituents slightly less frequently and had generally slower reaction times than adults. These findings indicate that Dutch-learning children exhibit linguistic knowledge of the focus-to-prosody mapping in both comprehension and production by the age of 4 or 5, showing no production–comprehension asymmetry in the acquisition of this mapping. Similarly,
Szendrői et al. (
2017) argued that production did not precede comprehension in the case of English-, German- and French-speaking 3- to 6-year-olds, based on evidence for adult-like comprehension of contrastive focus in an offline picture-based comprehension task. In this task, children gave a corrective response to the experimenter’s description of a picture, which carried prosodic prominence on either the subject noun or the object noun (e.g., The BIRDY has the bottle, right? vs. The birdy has the BOTTLE, right?). Correcting the prosodically prominent noun in the response was interpreted as evidence of adult-like comprehension.
However,
Chen’s (
2010) analysis of the production data focused solely on the placement of accentuation. Yet, prosodic focus marking involves more than just where accentuation occurs; it also requires selecting the appropriate accent type. In addition, when the choice of accent type alone is insufficient to mark focus, variation in the phonetic realisation of the pitch accent can also be used (
Chen, 2018a;
Gussenhoven, 2004). In Dutch, the exact use of different prosodic means depends on the position of focus in a sentence. For example, in sentence-final position, focus and non-focus are typically distinguished via presence vs. absence of accentuation and the most commonly used pitch accent type is a falling pitch accent (H*L) (
Chen, 2011a;
Hanssen et al., 2008;
Romøren, 2016). In sentence-initial position, speakers usually accent the word with the H*L accent independent of focus conditions. But they vary the phonetic realisation of the pitch accent in peak alignment, pitch scaling and duration to mark focus: aligning the maximal pitch earlier in the word, lowering the minimal pitch of the pitch accent, and lengthening the word in the focus condition (
Chen, 2009).
1 More detailed analyses of Dutch-speaking children’s use of prosody have shown that 4- to 5-year-olds are adult-like in choice of accent type in sentence-initial position but not in sentence-medial and final positions in SVO sentences (
Chen, 2011a;
Romøren, 2016), and they cannot use phonetic realisation for focus-marking purposes (
Chen, 2009). In the stimuli used in
Chen’s (
2010) and
Szendrői et al.’s (
2017) comprehension experiments, focus and non-focus were distinguished by means of accent placement, choice of accent type and phonetic realisation (in the case of Dutch, English and German), as found in adults’ natural speech. This suggests that the children’s responses might reflect their comprehension of the mapping between focus and the entire set of prosodic cues. Children thus showed adult-like reaction-time differences between the appropriate and the inappropriate focus-to-prosody mapping and adult-like offline comprehension of contrastive focus, whereas the more in-depth prosodic analyses show that their production is not yet adult-like in choice of accent type and phonetic realisation. Taken together, these findings suggest that a typical comprehension-precedes-production asymmetry may in fact be present in 4- or 5-year-olds’ prosodic focus-marking abilities.
Furthermore, past work on the relationship between production and comprehension in prosodic focus marking has primarily examined children aged 4 to 5 years. More recent studies of children’s prosodic focus-marking in production reveal continued development beyond the age of 5 across languages (e.g.,
Chen, 2018a;
Destruel et al., 2024;
Yang et al., 2024). For example, at the age of 7 or 8 Dutch-speaking children reach adult-like performance in their choice of pitch accent type in both sentence initial and final positions but only in the pitch-scaling dimension of phonetic realisation in sentence-initial position (
Chen, 2009,
2011a), consistent with the proposal that discrete prosodic means such as choice of accent type are easier to learn than gradient means due to the acoustically more salient nature of the former and more demand on precise articulatory control in producing the latter (
Chen, 2018b). By contrast, few studies have investigated children’s comprehension of the focus-to-prosody mapping beyond the age of 5, apart from work on contrastive focus (e.g.,
Ito et al., 2014;
Szendrői et al., 2017). This gap in the literature raises the question of whether the relationship between production and comprehension undergoes age-related changes across middle childhood.
Moreover, existing literature on the production–comprehension relationship in prosodic development is based on studies examining production and comprehension at the group level. Little is known about whether the relationship between production and comprehension is the same for different children. Research on individual differences in first language acquisition is mostly concerned with early speech perception, vocabulary, and grammar development (see
Kidd et al., 2018;
Kidd & Donnelly, 2020 for a review). It has been argued that individual differences are ‘large and notably stable across development … are also observed early and across all domains’ in first language acquisition (
Kidd et al., 2018, p. 158).
Wells et al. (
2004) found that individual differences could occur at both younger and older ages in different aspects of prosodic development in their study with 120 English-speaking 7- to 13-year-olds performing multiple production and comprehension tasks, consistent with
Kidd et al.’s (
2018) claim. In contrast, production data from a sample of 22 4- to 5-year-old Dutch-speaking children and 18 7- to 8-year-olds showed that there were noticeable individual differences in prosodic focus marking in the younger children but not in the older children (
Chen, 2011a). It remains to be investigated how individual differences manifest in the relationship between production and comprehension in prosodic focus marking across different ages in childhood.
In the present study, we aimed to gain a clearer understanding of the relationship between production and comprehension in the acquisition of prosodic focus marking. To this end, we examined Dutch-speaking children’s prosodic focus marking in SVO sentences across a broad age range (i.e., 4 to 8 years) in a production experiment and a comprehension experiment. Specifically, we addressed two questions: (1) How does the relationship between production and comprehension develop across different ages? (2) To what extent do individual differences shape the relationship between production and comprehension?
Building on prior findings that Dutch-speaking children’s prosodic focus marking in production continues to develop beyond age 5, with adult-like performance emerging earlier in sentence-final than in sentence-initial position, and that their comprehension appears adult-like by age 4 or 5 across sentence positions, we hypothesised that the relationship between production and comprehension will vary as a function of age (Hypothesis 1). Consistent with the view that comprehension precedes and supports production in language development, we predicted that children’s comprehension will be predictive of their production across all ages for sentence-initial focus, where production is not adult-like by the age of 8, but only at younger ages for sentence-final focus, where adult-like production competence appears at 7 or 8. A ‘predictive’ relationship is present if variation in children’s comprehension ability can explain variation in their production ability in a statistically meaningful way. The presence of a predictive relationship does not directly establish whether comprehension temporally precedes production, nor whether comprehension is a necessary enabling condition for development in production. But it can serve as first evidence that the relationship between comprehension and production is in line with the comprehension-precedes-production asymmetry.
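To make the notion of a ‘predictive’ relationship concrete, the following minimal sketch fits a simple least-squares regression of a production score on a comprehension score and reports the proportion of variance explained. It is an illustration only and not the statistical model used in this study: the function name `simple_regression`, the two variable names and all numerical values are hypothetical.

```python
def simple_regression(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared), where r_squared is the share of
    variance in y that is explained by x.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in one of the measures")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up illustrative values, NOT data from the study.
comprehension = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical per-child comprehension score
production = [2.1, 2.4, 3.2, 3.3, 4.0]          # hypothetical per-child mean prosody rating

slope, intercept, r2 = simple_regression(comprehension, production)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

In this sketch, a slope that differs reliably from zero, together with a non-trivial R², would correspond to comprehension being ‘predictive’ of production. As noted above, such a result would not establish temporal precedence.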
Drawing on research into vocabulary and grammatical development as well as findings reported by
Wells et al. (
2004), we hypothesised that individual differences in the production–comprehension relationship will be present to a similar degree at all ages (Hypothesis 2). Accordingly, we predicted no effect of age on how well variation in comprehension can explain variation in production for individual children.
If our hypotheses are supported, the findings will provide the first indirect evidence for a role of comprehension in learning to use prosody for focus-marking. They will also suggest a similarity in the nature of individual differences underlying the development of prosodic and non-prosodic skills.
2. Materials and Methods
Production data were obtained from children and an adult control group in a natural and interactive setting using the same picture-matching game as in recent studies on children acquiring Mandarin Chinese (e.g.,
Yang & Chen, 2018), Korean (e.g.,
Yang et al., 2024) and Swedish (e.g.,
Romøren & Chen, 2022). The sentences produced by children and adults were subsequently evaluated via perceptual rating, in which trained adult listeners rated the appropriateness of the prosody of each sentence in the corresponding context on a five-point scale. The scores for prosodic appropriateness can both reflect the integrated use of prosody, instead of the use of a specific prosodic parameter (e.g., pitch) or a specific prosodic strategy (e.g., accent placement), and capture the perceptual relevance of production details. This makes perceptual rating an effective alternative to assessing children’s prosodic focus marking in production via manual annotation and acoustic analysis. Comprehension of the focus-to-prosody mapping was examined using
Chen’s (
2010) picture-based reaction-time method, which was designed to implicitly measure children’s comprehension of the focus-to-prosody mapping in real-time sentence processing. The reaction time technique has been increasingly used in studies of sentence processing in children aged 4 and above and has proven suitable for assessing children’s speech and language comprehension, as it minimises cognitive load and avoids reliance on explicit metalinguistic knowledge (
Clahsen, 2008).
Children from different age groups were tested for both production and comprehension at two test moments, spaced about eight months apart. Our design was thus both cross-sectional and longitudinal. Given the comparability of the speech materials in the production and comprehension experiments, we conducted the production experiment first and then the comprehension experiment at each testing moment to avoid potential influence of how prosody was used in the comprehension experiment on children’s use of prosody in the production experiment. Dedicated Windows-based laptops were used to conduct the comprehension experiment and the perceptual rating on the sentences elicited in the production experiment. The experiment laptops were not connected to the internet during testing or rating sessions to avoid external interference.
Three female student assistants, who were native speakers of Dutch and students of linguistics or literature, conducted the experiments with the children individually at their schools during school time and with the adults in a sound-attenuated booth in the Institute for Language Sciences Labs at Utrecht University. To maximise consistency across experimenters, we trained them on both the procedure and the use of prosody using an experiment protocol before they started testing participants. The core of the training involved a series of dry-run role-play sessions over three to four weeks, during which the experimenters simulated interactions with participants of different ages and the experimental protocol was fine-tuned for various scenarios.
To help the children to feel at ease with the experimenters, every experimenter picked up the child that she was to test from the classroom and had a chat with the child prior to the experiment. The testing schedule was made in such a way that every child did both the production and comprehension experiments with the same experimenter at one or both testing moments.
2.1. Participants
Seventy-one Dutch-speaking children from four age groups participated in this study. The children were on average 4;8, 5;6, 6;6 and 7;5 in their respective age group at the start of the study. Detailed information on age and gender can be found in
Table 1, in which the four groups of children are labelled as a4, a5, a6, and a7 according to their age group at the start of the study. The children were from monolingual Dutch-speaking families and were recruited from four primary schools in the Netherlands. They had no hearing loss or speech or language disabilities according to parents’ reports. Written informed consent was obtained from the parents for the children to participate in this study and for their experimental sessions to be recorded and filmed.
An additional fifty-six children participated in the study but their data were not included in the current analysis. The data of eight of these children were lost due to loss of equipment. The other children either did not participate in both experiments at both testing moments, or did not have production ratings because their production in the focus conditions of interest was not evaluated for reasons such as poor articulation, stuttering, self-repair, or unexpected background noise.
Twenty-three adult native speakers of Dutch (mean age 21;5, range: 18;8–28;8, 10 men, 13 women) participated in the production experiment as the control group to provide a measure of adult-like production as established via perceptual rating. They were university students at the time of testing and were tested on the same tasks following the same procedure as the children. Prior to the experiment, they were informed that the tasks were of a simple nature as they were also used on child participants. They did not take part in the comprehension experiment because comprehension of Dutch-speaking adults with a similar background was assessed and reported in
Chen (
2010).
2.2. The Production Experiment
Following the question–answer paradigm (
Roberts, 1996), fifteen question–answer dialogues were embedded in the picture-matching game to elicit fifteen SVO sentences in five focus conditions (three sentences per condition): narrow focus in sentence-initial position (initial focus), responding to who-questions; narrow focus in sentence-final position (final focus), responding to what-questions; narrow focus in sentence-medial position, responding to what-does-X-do-with-Y questions; contrastive focus in sentence-medial position, correcting the experimenter’s statement about the action; and broad focus over the whole sentence, responding to what-happens questions. Only the first two focus conditions were relevant to the current study; the other three were included for a separate study on individual differences in focus marking in production (Chen & van den Bergh, in progress). The target SVO sentences were unique combinations of five subject nouns (bakker ‘baker’, hond ‘dog’, leeuw ‘lion’, meisje ‘girl’, poes ‘cat’), three verbs (tekenen ‘draw’, koken ‘cook’, toveren ‘conjure’), and three object nouns (lepel ‘spoon’, laars ‘boot’, wortel ‘carrot’) such that each subject noun, verb and object noun occurred once in a certain focus condition. All words were highly familiar to Dutch-speaking 4-year-olds.
To make sure that the participants would use the intended words in the picture-matching game, we asked each participant to complete a picture-naming task prior to the game during the same session. In the picture-naming task, the participants were familiarised with the characters and actions that appeared in the picture-matching game.
2.2.1. The Picture-Naming Task
In the picture-naming task, the participants first named the nouns illustrated in individual pictures. The experimenter showed one picture at a time to the participant, and invited the participant to name it by saying Dit is een … ‘This is a …’. In the case of incorrect naming (e.g., calling a baker a cook), the experimenter first acknowledged that the entity might look like what the participant had in mind, and then drew the participant’s attention to distinctive features in the picture (e.g., the dough on the table) and suggested to the participant the intended label for the entity. Second, the experimenter showed the participant pictures of the actions occurring in the game, and explained to him which action each picture depicted. Finally, to check whether the participant had got all the labels right, the experimenter went through all the pictures and asked the participant to name them once more. If the participant did not have difficulty with naming the nouns the first time, he only named the actions the second time. The picture-naming task lasted about 5–10 min.
2.2.2. The Picture Matching Game
In the picture-matching game, the child was supposed to help the experimenter to put pictures in matched pairs. The experimenter first explained to the participant how the game worked, and introduced two rules of the game: Never reveal his pictures to the experimenter; always say everything he sees happening on the picture in his response.
2 The game consisted of fifteen trials, corresponding to the fifteen question–answer dialogues. Each trial was carried out in a fixed number of steps (
Figure 1). First, the experimenter took a picture from her set of pictures (e.g., a picture of a girl drawing something on a piece of paper), drew the participant’s attention to the picture, and briefly described it by saying Kijk! Het meisje. Het lijkt alsof het meisje iets tekent ‘Look! The girl. It seems that the girl is drawing something.’ She then asked the participant a question about the picture (e.g., Wat tekent het meisje? ‘What is the girl drawing?’) or made a guess about the missing information in the case of contrastive focus (e.g., Ik denk dat het meisje de wortel tekent. ‘I think the girl is drawing the carrot.’). The participant then took a picture from his own set of pictures, and visually identified the information requested by the experimenter. The experimenter repeated her question before the participant started to speak again. The participant then answered the question in an SVO sentence (e.g., Het meisje tekent de lepel. ‘The girl is drawing the spoon.’). The experimenter thanked the participant for his information, looked for the picture of the spoon in the box, and handed over both pictures to the participant for his approval.
The game proper was preceded by five practice trials. If the participant gave elided answers or full-sentence answers containing non-target words during the practice trials, the experimenter reminded the participant of the rules of the game or the intended words. This intervention turned out to be necessary only in the case of a small number of the 4- to 5-year-olds. Most children provided full-sentence answers using the intended words right from the start.
Each session lasted about 15–25 minutes and was filmed and audio-recorded using a Zoom H1 digital recorder (with a built-in microphone) at a sampling rate of 44.1 kHz with a 16-bit resolution. The film recordings were used to check procedural consistency in how the game was carried out.
2.2.3. Data Annotation
The audio recording of each participant was first orthographically annotated using Praat (
Boersma, 2001). Second, full-sentence responses were selected as usable responses if they were not affected by any of the following factors: self-correction, use of pronouns, use of non-target words, detectable hesitation-induced silences, responding to a non-target question, elided responses, overlap with the experimenter’s speech, or poor recording quality. Third, the usable full-sentence responses and the corresponding questions or statements were selected and extracted as individual .wav files. These steps of data annotation were performed by a team of student assistants.
2.2.4. Perceptual Rating
The usable full-sentence responses and corresponding questions or statements were combined into context-response dialogues with a 300 ms interval between the question or statement and the response in each dialogue and a 1000 ms interval between dialogues. The three native speakers of Dutch who administered the production experiment served as the raters and rated each response in each dialogue on how well its prosody fitted in the context on a five-point Equal Appearing Interval scale, with 1 standing for ‘does not fit’ and 5 standing for ‘fits perfectly’. Prior to the rating, the raters were given sound examples illustrating what the prosody typically sounded like in each focus condition and written instructions on how to do the rating. They did the rating at least ten months after the production experiment at the second testing moment was completed. These raters were familiar with child speech and the speech material to be evaluated, and were thus expected to be able to focus on the prosody in their rating, instead of being distracted by other features such as the speaking rate, voice quality and articulation.
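The timing of the rating stimuli described above can be sketched as follows. The two interval values are taken from the text; the function name `dialogue_timeline` and the clip durations in the example are hypothetical, and the sketch only computes onsets rather than reproducing how the stimuli were actually assembled.

```python
CONTEXT_RESPONSE_GAP_MS = 300    # interval between question/statement and response (from the text)
BETWEEN_DIALOGUE_GAP_MS = 1000   # interval between consecutive dialogues (from the text)


def dialogue_timeline(durations_ms):
    """Compute onsets for a sequence of context-response dialogues.

    durations_ms: list of (context_duration, response_duration) pairs in ms.
    Returns a list of (context_onset, response_onset, dialogue_end) in ms.
    """
    timeline = []
    cursor = 0
    for context_ms, response_ms in durations_ms:
        context_onset = cursor
        response_onset = context_onset + context_ms + CONTEXT_RESPONSE_GAP_MS
        dialogue_end = response_onset + response_ms
        timeline.append((context_onset, response_onset, dialogue_end))
        # The next dialogue starts after the between-dialogue interval.
        cursor = dialogue_end + BETWEEN_DIALOGUE_GAP_MS
    return timeline


# Hypothetical clip durations (ms), for illustration only.
example_timeline = dialogue_timeline([(1500, 2000), (1200, 1800)])
for context_onset, response_onset, dialogue_end in example_timeline:
    print(context_onset, response_onset, dialogue_end)
```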
To minimise variation in the scores due to comparisons between children or between children and adults, the dialogues were presented to the raters per speaker and per age group. The rating was conducted in seven 30–40-minute sessions. The raters could listen to each dialogue up to three times using a headphone set before finalising the score. As only initial focus and final focus were relevant to us here, only the scores for these two focus conditions were included for further analysis.
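One simple way to summarise such ratings is to average the scores per speaker and focus condition, keeping only the initial-focus and final-focus conditions. The sketch below assumes this pooling by mean across raters and sentences; the text does not state how the three raters’ scores were combined, so the aggregation, the function name `mean_ratings`, the condition labels and the example records are all hypothetical.

```python
from collections import defaultdict


def mean_ratings(ratings, conditions=("initial", "final")):
    """Average 1-5 ratings per (speaker, focus condition).

    ratings: iterable of (speaker, condition, rater, score) records.
    Records from conditions not listed in `conditions` are ignored.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (speaker, condition) -> [sum, count]
    for speaker, condition, _rater, score in ratings:
        if condition not in conditions:
            continue
        if not 1 <= score <= 5:
            raise ValueError(f"score outside the five-point scale: {score}")
        cell = totals[(speaker, condition)]
        cell[0] += score
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up records for illustration, NOT ratings from the study.
example_ratings = [
    ("child01", "initial", "rater1", 4),
    ("child01", "initial", "rater2", 3),
    ("child01", "initial", "rater3", 5),
    ("child01", "final", "rater1", 2),
    ("child01", "final", "rater2", 3),
    ("child01", "broad", "rater1", 5),  # ignored: not a condition of interest
]
summary = mean_ratings(example_ratings)
print(summary)
```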
2.3. The Comprehension Experiment
2.3.1. The Correct–Incorrect Answer Game
The comprehension experiment was presented to the participants as a ‘correct–incorrect answer’ game. In the game a boy looked through some pictures with his three pets, a parrot, a chicken, and a duck. The boy wanted to know whether his pets knew the pictures well and which of the pets knew the pictures best. To find this out, the boy showed one picture at a time to one of his pets and asked the pet a question about the picture. The participant could follow the question–answer dialogues between the boy and the pets via a headphone set and see the boy, his pets and the pictures on the screen of the experiment laptop. The participant’s task was to judge whether the pets’ answers were correct or incorrect (‘goed’ or ‘fout’ in Dutch) by pressing the green response key for ‘correct’ and the red response key for ‘incorrect’ on a response device.
Two focus conditions were embedded via wh-questions in the game, i.e., sentence-initial focus and sentence-final focus. The two focus conditions were combined with two prosody conditions (appropriate focus-to-prosody mapping vs. inappropriate focus-to-prosody mapping), forming four experimental conditions: initial focus-appropriate prosody, initial focus-inappropriate prosody, final focus-appropriate prosody, and final focus-inappropriate prosody, as illustrated in
Figure 2.
Twenty-four question–answer dialogues were composed; they were syntactically comparable to the dialogues occurring in the picture-matching game. The answers in the experimental dialogues were all correct answers in terms of the segmental and lexical content. In addition, twenty question–answer dialogues were included as fillers. The answers in the fillers were all wrong answers, twelve of which contained a lexical error, e.g., eend ‘duck’ instead of kip ‘chicken’, and eight of which contained a pronunciation error, e.g., jaangen instead of jongen ‘boy’. The experimental dialogues and fillers were derived from question sentences recorded by a male native speaker of Dutch in child-directed speech (
Burnham et al., 2002), and answer sentences by a female native speaker of Dutch in her usual manner of speaking.
A Latin square was used to distribute the 24 experimental dialogues, 12 fillers with lexically wrong answers and 8 fillers with wrongly pronounced answers over the four experimental conditions, resulting in four lists. Although each dialogue only occurred once in each list, each condition was realised in six experimental dialogues, three fillers with lexically wrong answers, and two fillers with wrongly pronounced answers. In total, every participant was presented with 44 dialogues. Two pseudo-randomised orders were created for each list, resulting in eight stimulus orders.
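The counterbalancing described above can be sketched as follows. The sketch assumes the standard rotation scheme, in which item i in list k receives condition (i + k) mod 4; the text states only that a Latin square was used, so the rotation rule, the function names, and the item indices are illustrative assumptions. The condition labels and item counts are taken from the description above.

```python
# Hypothetical sketch of Latin-square counterbalancing over four lists.
CONDITIONS = [
    ("initial focus", "appropriate prosody"),
    ("initial focus", "inappropriate prosody"),
    ("final focus", "appropriate prosody"),
    ("final focus", "inappropriate prosody"),
]


def latin_square_lists(n_items, n_conditions=len(CONDITIONS)):
    """Return one {item_index: condition_index} mapping per list.

    Item i in list k gets condition (i + k) % n_conditions, so every item
    occurs once per list and in every condition across the lists.
    """
    return [
        {item: (item + shift) % n_conditions for item in range(n_items)}
        for shift in range(n_conditions)
    ]


def build_lists():
    """Combine experimental dialogues and both filler types into four lists."""
    experimental = latin_square_lists(24)
    lexical = latin_square_lists(12)
    pronunciation = latin_square_lists(8)
    lists = []
    for k in range(len(CONDITIONS)):
        trials = [("experimental", i, c) for i, c in experimental[k].items()]
        trials += [("lexical_filler", i, c) for i, c in lexical[k].items()]
        trials += [("pronunciation_filler", i, c) for i, c in pronunciation[k].items()]
        lists.append(trials)
    return lists
```

Under this scheme each list contains 44 trials, with six experimental dialogues, three lexical fillers, and two pronunciation fillers per condition, matching the counts given above. The pseudo-randomised presentation orders are a separate step and are not modelled here.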
2.3.2. Procedure
The children did the comprehension experiment individually in a quiet room at their schools during school time. The experimenters who administered the production experiment also conducted this experiment by means of the Zep Experimental Control Application (hereafter Zep) (
Veenker, 2013) on the experiment laptops. The exact list and stimulus order that a child got were randomly chosen by Zep. An approximately equal number of children were assigned to each stimulus order of each list. Each session lasted about 20–25 minutes starting with a practice session. In the practice session, the children were familiarised with the task and trained to properly use the response keys, either on a pushbutton box or the keyboard of the experiment laptops.
Each session was filmed from behind the children and the experimenter to minimise camera interference. The recordings were used to verify consistency in procedural details.
The timeline of a trial was as follows. A target picture appeared on the screen, accompanied by an image of the boy and one of his pets. Simultaneously, the boy said Kijk ‘look’ as an attention getter. After 800 ms, he named an entity in the picture (e.g., Een varken ‘A pig’). The 800 ms delay allowed the participants to take a proper look at the picture. At 1200 ms after the naming, the boy asked a question about the picture (e.g., Wat wast het varken? ‘What is the pig washing?’). At 2200 ms after the end of the question, the pet provided an answer (Het varken wast een bloes. ‘The pig is washing a blouse.’). At the end of the answer, a high-precision timer (1 ms accuracy) was automatically activated. Simultaneously, a picture showing the response keys for correct and incorrect answers appeared on the screen to remind the children that they should respond by pressing a key.
Reaction times were measured automatically from the end of each answer sentence until a response key was pressed, and the children’s correct–incorrect judgments were recorded automatically as well. The children were instructed to press the response key as quickly as possible, but not before the end of the answer sentence. A timeout was set at four seconds after the end of the sentence; responses given after this point were registered as late responses and excluded from the statistical analysis.
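The exclusion of late responses can be illustrated with a short sketch. It assumes that each trial is stored as a dictionary with a reaction time in milliseconds measured from answer offset; the field names, the function name `split_responses`, and the treatment of a response at exactly 4000 ms as on time are illustrative assumptions rather than details of the Zep implementation.

```python
# Hypothetical sketch: separate on-time responses from late responses.
TIMEOUT_MS = 4000  # four seconds after the end of the answer sentence


def split_responses(trials, timeout_ms=TIMEOUT_MS):
    """Return (kept, late) lists of trials.

    A trial is 'late' when its reaction time exceeds the timeout;
    a response at exactly the timeout is kept (an assumption).
    """
    kept, late = [], []
    for trial in trials:
        if trial["rt_ms"] > timeout_ms:
            late.append(trial)
        else:
            kept.append(trial)
    return kept, late
```

Only the trials in `kept` would enter the statistical analysis; the `late` list is retained so that the number of excluded responses can be reported.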
4. Discussion
Based on the production and comprehension data obtained from the same group of children of a wide age range at two testing moments, we have addressed two questions on the production–comprehension relationship in the acquisition of prosodic focus marking in Dutch: (1) How does the relationship between production and comprehension develop across different ages? (2) To what extent do individual differences shape the relationship between production and comprehension? We hypothesised that the relationship between production and comprehension would vary as a function of age (Hypothesis 1) and that individual differences in the predicted production–comprehension relationship would exist to a similar degree at all ages (Hypothesis 2).
We have found that the children’s comprehension was predictive of their production in sentence-initial focus across ages, as predicted, but not in sentence-final focus, contrary to our prediction. However, the predictive relationship between comprehension and production in sentence-initial focus differed between children, depending on whether they could process the focus-to-prosody mapping in online language comprehension like adults and independently of their age. If they could, more adult-like comprehension was related to better production. If they could not, less adult-like comprehension was related to better production. Taken together, the findings did not fully support Hypothesis 1 but confirmed Hypothesis 2 on the stability of individual differences in the relationship between production and comprehension.
It is unexpected that comprehension was not predictive of production in sentence-final focus at any age under investigation. Considering that the children were adult-like in both comprehension and production in sentence-final focus from the age of 4;8 onwards in our data, we cannot rule out potential influence of comprehension on production in younger children. It is equally premature to dismiss the possibility that the comprehension ability as tested using the reaction time paradigm may not be crucial for developing adult-like use of prosody in production. That is, children may develop adult-like use of prosody in focus marking in production without processing the focus-to-prosody mapping in online language comprehension in the same way as found for adults (
Birch & Clifton, 1995;
Cutler et al., 1997;
Chen, 2010). For example, children may figure out how to use prosody to mark focus through perception of the form-function mappings between prosodic variations and focus conditions (in different contexts) in others’ production. Specifically, children may initially operate on a physiologically motivated mechanism in their use of prosody to highlight new information, e.g., Bolinger’s physiological reflex, and gradually grammaticalise the form-function mapping between prosodic prominence and focus by implementing prosodic prominence in a language-specific manner (
Chen, 2020). The process of grammaticalisation may be driven by perception of the association between focus and various forms of prosodic prominence in the speech that children are exposed to. It follows that more salient form-function mappings are likely to be easier to perceive and are therefore acquired earlier than less salient form-function mappings. This has indeed been shown to be the case. For example, Dutch-speaking children’s use of accent placement and accent type to express narrow focus in final focus became adult-like earlier than their use of relatively subtle phonetic cues in initial focus (
Chen, 2009,
2011b). Salience can also be in the nature of the focus condition. For example, English- and German-speaking 3- to 5-year-olds can use prosodic prominence in an adult-like way in contrastive focus before their Dutch-speaking peers can do so in (non-contrastive) narrow focus (see
Chen, 2018a for a review).
The ability to associate prosodic prominence with focus at the perceptual level may not be the same as capitalising on the focus-to-prosody mapping to process sentence meaning in online language comprehension, as the latter entails that the listener has a representation of how the prosody of a sentence typically unfolds in each information structure context and can be impeded in the comprehension process if the prosody does not go as expected. It is possible that the ability to process the focus-to-prosody mapping in online language comprehension is the next developmental milestone for relatively less experienced language learners after having established the association between prosodic prominence and focus. This speculation ties in with early work showing that children aged between 4 and 10 seem to take little notice of prosody when following verbal instructions and performing various comprehension tasks. For example,
Lahey (
1974) found that 4- and 5-year-old English-speaking children were not significantly worse at acting out coordinate sentences and sentences with relative clauses when the sentences were spoken with monotonous prosody than when they were spoken with proper prosody.
Bates (
1976) reported that English-speaking children’s imitation of sentences was disrupted by marked word order but not by pragmatically inappropriate accent placement.
Morton and Trehub (
2001) found that, when prosodic cues conflicted with lexical cues in the perception of emotions such as happy and sad, English-speaking children responded primarily to the lexical content of a sentence at the age of 4, that their reliance on the lexical content declined between the ages of 5 and 10, and that they became reliant on prosody only at the age of 10. We thus propose that children in the multiword-utterances stage initially rely on lexical and syntactic information in language comprehension and gradually establish a network of representations of contextually appropriate and statistically more probable form-function mappings between prosodic cues and meanings. In this process, they begin by responding to prosodic information when it appears in the associated context. Only at a more advanced stage do they take the contextually probable form-function mappings as the default and respond in a negative way to contextually improbable form-function mappings.
Also, in everyday communication, listeners can use different strategies in language comprehension and need not all rely heavily on prosodic cues. It is thus possible that not all children will develop the ability to respond to the focus-to-prosody mapping in online comprehension, even though they still use prosody in focus marking in production. It follows that not all adults with normal speaking and hearing may comprehend sentences faster when the prosody is contextually appropriate than when the prosody is contextually inappropriate. In a preliminary study on individual differences in online comprehension of prosodic focus marking,
Lentz and Chen (
2016) indeed found that 17 of the 32 adult participants in their study did not show the expected reaction time differences between appropriate prosody and inappropriate prosody.
Furthermore, the relationship between comprehension and production in sentence-initial focus is intriguing regarding the ‘poor’ comprehenders. The result showed that the less adult-like the comprehension was, the more adult-like the production was. This may suggest that the children’s production outperformed their comprehension, with the former being rated somewhat lower than adults’ production and the latter being completely unlike the typical adults’ response (
Birch & Clifton, 1995;
Cutler et al., 1997). It may lend further support to the possibility that the comprehension ability as tested in the reaction time paradigm may not be crucial to the development of adult-like use of prosody in production. Future research on children’s use of prosody to identify focus on perception tasks (e.g., choosing the appropriate response to a wh-question from renditions of the same sentence with narrow focus in different sentence positions) and how this is related to their production is needed to obtain a clearer understanding of which perception or comprehension ability can support the acquisition of prosodic focus marking in production.
Finally, this study has a number of limitations. First, individual differences in the production–comprehension relationship were only found in sentence-initial focus and remained the same across development, in line with the findings in other domains of language acquisition research (
Kidd et al., 2018). However, our analysis of individual differences is limited to dividing children into two types of comprehenders based on the comprehension score. It remains unclear how the observed individual differences can be explained. Recent research on individual differences in adults’ perception of prosodic prominence has found that listeners with lower pragmatic skill (or more autistic traits) exhibit a weaker top-down effect of focus interpretation on their subjective prominence ratings in SVO sentences with either narrow focus on the object or broad focus on the verb phrase (
Bishop et al., 2020).
Lentz and Chen (
2016) examined whether variation in perspective taking could explain why some adults were not good at both production and comprehension in focus marking in a small sample of adult native speakers of Dutch (N = 32). They found that the adult native speakers of Dutch who scored high on perspective taking tended to have a lower production score and a higher comprehension score, and were thus weak in both production and comprehension. It would thus seem that they had less desire to mark focus properly in their own production and less need for others to mark it properly in comprehension. In future research, it can be useful to study the effect of signal-extrinsic factors such as autistic traits and perspective-taking skill on the relationship between production and comprehension of the focus-to-prosody mapping in children.
The second limitation concerns the interpretation of the predictive relationship observed between comprehension and production. Although our multilevel modelling revealed that comprehension ability statistically predicts production ability across age in sentence-initial focus, this should not be taken as evidence of temporal or causal precedence. While the study incorporated both cross-sectional and longitudinal elements, the analyses do not allow us to determine whether comprehension developmentally enables or causes improvements in production. Rather, the findings are consistent with the theoretical view that comprehension supports production but do not confirm a necessary or unidirectional relationship. To more rigorously investigate developmental directionality and potential enabling effects, future research should include longer-term longitudinal designs with multiple follow-up points (e.g., over a span of two or more years), and consider using lagged modelling to assess how changes in comprehension may predict subsequent changes in production over time. Experimental approaches such as comprehension training studies could also help clarify whether gains in comprehension facilitate development in production, providing stronger evidence for a causal or conditional relationship.
In addition, while our findings in sentence-initial focus are consistent with the view that comprehension supports production developmentally, we note that this claim does not imply that comprehension is fully in place before production begins. Both skills may still be undergoing development, particularly at younger ages, and developmental “precedence” in this context should be interpreted as relative timing or ordering of acquisition of certain aspects of certain skills, rather than as an all-or-nothing staging.