Article

Preschoolers Mark Focus Types Through Multimodal Prominence: Further Evidence for the Precursor Role of Gestures

1
Department of Translation and Language Sciences, Universitat Pompeu Fabra, 08002 Barcelona, Spain
2
Faculty of Psychology and Educational Sciences, Universitat Oberta de Catalunya, 08018 Barcelona, Spain
3
Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
*
Author to whom correspondence should be addressed.
Languages 2025, 10(5), 92; https://doi.org/10.3390/languages10050092 (registering DOI)
Submission received: 31 January 2025 / Revised: 25 March 2025 / Accepted: 23 April 2025 / Published: 26 April 2025
(This article belongs to the Special Issue Advances in the Acquisition of Prosody)

Abstract

The present cross-sectional study assessed the role of multimodal cues in marking focus types during early childhood, focusing on prosodic prominence, gesture presence, and gestural prominence. A total of 116 Catalan-speaking three-, four- and five-year-olds participated in a semi-controlled interactive task eliciting words in three focus conditions: information, contrastive, and corrective. The data were coded manually using holistic assessments for all three measures. The results indicated, first, that children’s prosodic and gestural behavior was key in marking corrective focus. A significant tendency to use more gestures and increase both prosodic and gestural prominence was found in the corrective focus condition across the three age groups. Second, a developmental difference emerged in the acquisition of contrastive focus. Three-year-olds relied solely on gesture presence to encode contrastive focus, being unable to differentiate it prosodically from information focus. In turn, four- and five-year-olds used both gestures and prosody, with contrastive focus not only receiving more gestures than information focus but also increased prosodic prominence. This finding shows that gesture presence is a precursor to prosodic prominence in marking contrastive focus in Catalan, thus supporting the idea that gesture production can bootstrap the expression of focus type distinctions.

1. Introduction

Human communication is a dynamic process in which speakers continuously adapt their utterances to establish, manage, and modify common ground, defined as the shared knowledge and assumptions that speakers have during conversation (Stalnaker, 2002). Focus marking is an essential part of this process because it governs how new information is introduced to the interlocutor, as in the case of information focus; how presupposed information is emphasized to express opposition, as in the case of contrastive focus; or how certain information is replaced during the course of conversation, as in the case of corrective focus. For this reason, the ability to mark different types of focus is frequently regarded as an indicator of children’s linguistic and pragmatic competence (e.g., Ito, 2014; Wells et al., 2004), having been considered in assessment tasks that evaluate children’s socio-pragmatic development (e.g., Peppé & McCann, 2003; Pronina et al., 2023).
Children’s strategies for marking focus types have received considerable attention in the developmental literature from the perspective of prosody (see, for example, Chen, 2015). Research has shown that children acquire prosodic abilities to distinguish between different focus types during the preschool years, although the particular timing of this acquisition varies across languages. For example, children speaking Germanic languages begin using prosodic cues, albeit not necessarily in adult-like ways, during the early preschool years (e.g., Grünloh et al., 2015), whereas French-speaking children fail to employ such cues even between the ages of four and eight (e.g., Destruel et al., 2024). In contrast, gestures have rarely been examined in this context, despite their well-established role in conveying pragmatic meaning since infancy (see Hübscher & Prieto, 2019, for a review). To our knowledge, only one study has analyzed the acquisition of gestural cues to mark focus types, revealing that French-speaking four-year-olds use head gestures to distinguish different focus types despite not using the expected prosodic cues described for adults (Esteve-Gibert et al., 2022). This study not only identified head gestures as an available strategy to mark focus types in development but also revealed, for the first time, that gestures have a precursor role relative to prosody in the marking of information structure.
Given that the role of gestures in the acquisition of focus types has been investigated in a single study, this area remains significantly underexplored compared to the extensive research conducted on prosody. As a result, there is no knowledge about the developmental trajectory of gesture in the context of marking focus types, nor sufficient evidence to determine whether the observed precursor role of gestures relative to prosody in French-speaking four-year-olds reflects a language-specific phenomenon or a more general developmental pattern that can be observed in other languages. Furthermore, the development of the prosodic marking of focus types in Catalan has not been studied, with existing research heavily focused on Germanic languages. As a result, it is unclear whether the acquisition of prosodic cues for focus marking in Catalan mirrors patterns observed in other Romance languages like French, where acquisition occurs relatively late and gestural strategies take on a compensatory role, or aligns more with Germanic languages, where such cues emerge during the early preschool years. The present study investigated from a cross-sectional perspective the ability of Catalan-speaking three- to five-year-old children to signal different focus types using prosody and gestures, by assessing when and how the two modalities are employed throughout this developmental period.
The remainder of the introduction is structured as follows: The first two subsections discuss the definition of focus types and review studies on their prosodic and gestural marking in adult speech. The next two subsections provide a review of the literature concerning children’s prosodic and gestural marking of focus types. Finally, the last subsection outlines the research questions and hypotheses of the study.

1.1. Focus Types: A Definition

Focus is an information structure concept typically examined within theories of communication. According to the predominant definition in the literature, an expression in focus identifies an element from a set of alternatives and regulates how utterances are integrated into the interlocutors’ common ground (Krifka, 2008). Researchers frequently distinguish three main types of focus: information, contrastive, and corrective (see, for example, Krifka, 2008; Repp, 2010, 2016). Expressions in information focus identify an element within an open set of alternatives (i.e., alternatives that are not already part of the common ground), introduce new information to the common ground, and are considered to lack contrast (Repp, 2016). For example, consider a scenario where a speaker says “Mary brought [cake]Focus” while discussing the items their friends brought to their previous gathering. The focus on the word “cake” implies two things: first, that cake represents the new information being introduced to the discourse, specifying what Mary brought; and second, that the alternatives (other possible items Mary could have brought) are not part of the common ground, and are not relevant to the current assertion.
Contrastive focus, on the other hand, identifies an element which is already part of the common ground with the goal of expressing a contrast between it and other known elements. Consider, for example, a scenario where the previous speaker intends to specify which of the two cakes—the chocolate or the strawberry cake—was brought by Mary, and states “Mary brought the [chocolate]Focus cake”. In this context, the focus on “chocolate” identifies an element from a known set of alternatives, while highlighting the opposition with the other item in the set (i.e., the strawberry cake). When contrastivity between two alternatives in the common ground involves a correction, it is classified as corrective focus. For example, if the speaker uttering the previous sentence were incorrect, one of her interlocutors would state “No, Mary brought the [strawberry]Focus cake”. Here, “strawberry” not only contrasts with the wrongly introduced alternative element, but it also replaces it in the common ground. Corrective focus is, therefore, a subtype of contrastive focus. However, it not only contrasts two known alternatives but also explicitly rejects one of them, replacing it in the common ground. This explicit rejection in corrective focus is not present in contrastive focus and it is interpreted by certain authors to imply a higher degree of contrastivity (e.g., Repp, 2016; Umbach, 2004).
Despite the important distinctions between these three types of focus, the previous literature has not consistently differentiated them, particularly with respect to contrastive and corrective focus. This lack of differentiation is reflected in the terminology used. In particular, corrective focus has often been classified under the broader category of contrastive focus (e.g., Chen & Höhle, 2018; Vanrell et al., 2013), which can obscure the differences between contrastive focus which implies correction and that which does not. This study reserves the term contrastive focus for non-corrective types of contrast. This terminology is consistently applied in reviewing previous literature, regardless of the terms originally employed by the authors.
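The three-way distinction described in this subsection can be summarized as a small decision rule. The following Python sketch is only an illustration: the names `FocusType`, `FocusContext`, and `classify_focus` are hypothetical and do not come from the study, and reducing each focus type to two binary features is a simplifying assumption based on the definitions above.

```python
from dataclasses import dataclass
from enum import Enum


class FocusType(Enum):
    INFORMATION = "information"
    CONTRASTIVE = "contrastive"  # reserved here for non-corrective contrast
    CORRECTIVE = "corrective"


@dataclass(frozen=True)
class FocusContext:
    """Hypothetical two-feature description of a focused expression."""

    # True when the alternatives are already part of the common ground.
    alternatives_in_common_ground: bool
    # True when the utterance rejects and replaces an alternative
    # that was previously introduced.
    rejects_alternative: bool


def classify_focus(context: FocusContext) -> FocusType:
    """Map the two features onto the three focus types (simplified)."""
    if not context.alternatives_in_common_ground:
        # Open set of alternatives: new information, no contrast.
        # Assumption: rejection is ignored when no alternative is known.
        return FocusType.INFORMATION
    if context.rejects_alternative:
        return FocusType.CORRECTIVE
    return FocusType.CONTRASTIVE
```

Under these assumptions, the “strawberry” example (known alternatives, explicit rejection) would be classified as corrective, whereas the “chocolate” example (known alternatives, no rejection) would be classified as contrastive.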

1.2. The Prosodic and Gestural Marking of Focus Types in Adult Speech

Prosody and co-speech gestures are important communicative devices that display analogous functions and are used by speakers to distinguish focus types in different languages. Starting with prosody, previous research on adult speech has demonstrated that prosodic prominence is modulated differently depending on the focus type. Prosodic prominence is understood as the acoustic salience of a linguistic element and is usually correlated with a series of acoustic phenomena at the syllable or word level, such as higher F0 peaks and intensity, wider F0 range, more complex intonation contours, and longer duration (see Grice & Kügler, 2021). Crosslinguistic evidence suggests that information focus constituents are less prosodically prominent than both contrastive (e.g., for Portuguese Frota, 2002; for English Katz & Selkirk, 2011) and corrective focus constituents (e.g., for Spanish Chung, 2012; for German Kügler & Gollrad, 2015; for French and English Vander Klok et al., 2018; see Kügler & Calhoun, 2021 for a typological perspective).
In Catalan, the language under study, the prosodic marking of focus types has been frequently discussed alongside the use of syntactic means for the same purpose. In comparison to other intonational languages such as English, Catalan makes more extensive use of word order strategies such as left- and right-dislocation (see, e.g., Dufter & Gabriel, 2016; Mayol, 2007; Vallduví, 1994). These strategies entail the dislocation of the focused constituent from the main sentence into a new sentence, which is placed either at the left (left dislocation; e.g., “[the strawberry cake]Focus she brought”) or the right periphery (right dislocation; e.g., “she brought it, [the strawberry cake]Focus”) of the main sentence. Importantly, the use of these syntactic strategies is prosodically motivated, because focused constituents are placed in phrasal edges, where prosodic prominence is located in Catalan (e.g., Cruschina & Mayol, 2022; Vanrell & Fernández Soriano, 2013). As a consequence, and given that prosodic-only strategies are also possible (e.g., Adli, 2011), Catalan has been classified as a language standing halfway in a continuum between word order-only and intonation-only strategies (Vanrell & Fernández Soriano, 2013). In fact, the use of prosody to mark focus types in Catalan seems to be robust and in line with results in other languages, where the different focus types differ in terms of prosodic prominence. For example, it has been shown that information and corrective focused words differ in duration, with stressed syllables in information focused words being shorter than those in corrective focused words (Vanrell et al., 2013). Moreover, corrective focus is typically marked by a complex nuclear configuration made up of a rising pitch accent (L+H*) and a low boundary tone (L%), which is prosodically more prominent than the low nuclear pitch configurations (i.e., L* L%) associated with information focus (Prieto, 2014).
Recent research by Sánchez-Ramón et al. (2024, in press) has expanded these findings on Catalan by investigating adults’ prosodic and gestural realization of all three focus types (see below for a review of their results on gesture). The authors measured the prosodic prominence of the focus productions from a perceptual standpoint on a 0 to 3 prominence scale, which allowed for a holistic assessment of the acoustic signal. Their results showed that both contrastive and corrective focus constituents were significantly higher in prominence than information focus constituents. Their findings, therefore, not only corroborate earlier results on corrective focus being associated with greater prominence than information focus in Catalan but also provide the first evidence of contrastive focus demonstrating similar characteristics.
Alongside prosody, co-speech gestures also have a privileged role in the expression of pragmatic meaning. In this modality, adult speakers convey a broad spectrum of pragmatic functions which are also encoded in prosody (see Brown & Prieto, 2021, for a review). With respect to information structure, research has shown that accompanying referential expressions with gestures that are more frequent, more complex, and larger in size contributes to marking new information as opposed to given information (e.g., Debreslioska & Gullberg, 2019; Galati & Brennan, 2013; Holler & Stevens, 2007; Rohrer, 2022). Importantly, adult speakers have been shown to use co-speech gestures systematically to signal contrast and correction. Sánchez-Ramón et al. (2024, in press) found that, in Catalan, the use of manual and non-manual gestures increased progressively from information to contrastive focus and from contrastive to corrective focus, showing that gesture presence is a key marker of focus types in adult Catalan. Interestingly, speakers have also been shown to vary the kinematic expression of those gestures, especially to signal corrective focus. In French, it has been shown that head nods associated with corrective focus are kinematically different from those associated with contrastive focus, displaying longer duration, a wider spatial range, and a smoother and less tense motion (Carignan et al., 2024). In Sánchez-Ramón et al. (2024, in press), the authors measured the visual prominence of manual and non-manual gestures using a 0 to 3 scale, evaluating holistically various aspects of the movement, such as velocity, amplitude or the number of articulators involved. Their results point to differences in terms of gestural prominence in Catalan between corrective focus on the one hand, and information and contrastive focus on the other. They found that gestures produced in the corrective focus condition tended to display significantly higher prominence than those in the information and contrastive focus conditions.
All in all, previous research on the multimodal marking of focus types in Catalan has shown that both prosodic prominence and gesture presence play an important role in distinguishing focus types. Corrective focus consistently receives the most marking in prosody and gestures whereas information focus receives the least. Contrastive focus occupies an intermediate position: it matches corrective focus in prosodic prominence but is accompanied by fewer gestures, although still more than information focus. Notably, corrective focus is not only marked more systematically through co-speech gestures than other focus types but is also associated with higher levels of gestural prominence. These findings highlight the critical role of both prosodic and gestural cues and their parallel contribution to focus marking.

1.3. The Prosodic Marking of Focus Types in Child Speech

Previous literature on children’s marking of focus types points to the existence of important crosslinguistic differences affecting the age of acquisition of prosodic strategies within intonational languages (see Chen, 2018). One important reason behind these differences seems to be the extent to which prosody is used to mark focus in children’s input language. In languages where non-prosodic strategies to signal focus like word order are prevalent over prosodic ones (such as Italian, Hungarian and French), children have been shown to display a lesser use or even a late acquisition of the prosodic means compared to English-speaking children. For instance, MacWhinney and Bates’ (1978) cross-linguistic study revealed that three- to five-year-old Italian-speaking children used prosodic prominence to mark contrastive focus less frequently than their English-speaking peers, with Hungarian-speaking children using it even more rarely. This trend was inversely correlated with children’s use of word order. Research on French-speaking children has shown that between the ages of four and eight they are not able to use the acoustic cues employed by adults (i.e., syllable duration, word-level intensity, word-level mean pitch and word-level pitch range) in their production of focus types (Destruel et al., 2024; Esteve-Gibert et al., 2022). This result is attributed to a preference in adult French for syntactic over prosodic strategies in focus marking (see Destruel et al., 2024).
In contrast, in those languages where the systematicity of prosodic means for focus marking is prevalent in the adult model, like English, German, or Dutch, children have been shown to use prosodic prominence quite early and systematically (although not always in an adult-like fashion) to distinguish contrastive focus from information focus, as well as corrective focus from information focus. Regarding the first distinction, English-speaking four-year-olds have been shown to use “emphatic stress” (i.e., perceived enhanced word prominence) more frequently for contrastive than for information focus (Hornby & Hass, 1970), a finding that was further substantiated with acoustic data in Wonnacott and Watson (2008). This use of prosodic prominence as a marker of contrastive focus in English seems to become progressively more frequent between ages three and five (MacWhinney & Bates, 1978). Regarding corrective focus, English-speaking children aged two and a half have been shown to begin producing the most common pitch accent type reported for adult-speakers to convey corrections (i.e., L+H*), already showing sensitivity to this type of meaning in production (Thorson & Morgan, 2021). German-speaking children as early as age three have been shown to also use acoustic cues such as wider pitch range in corrective contexts as opposed to purely informative ones, a finding, however, that does not apply to the two-year-olds tested (Grünloh et al., 2015). Wider pitch range in corrective focus as opposed to information focus has been attested also in five-year-old Dutch-speaking children (Chen & Höhle, 2018).
Thus, the literature seems to distinguish two distinct trajectories in the acquisition of the prosodic marking of focus types in intonational languages: delayed acquisition of prosodic strategies, as reported for French, and early acquisition, as seen in some Germanic languages. No previous investigation has examined children’s prosodic marking of focus types in Catalan. Therefore, it remains unclear which of these paths Catalan-speaking children align with more closely. In languages where children use prosody to mark focus types, research has shown that contrastive and corrective focus are both more prosodically prominent—whether assessed perceptually or acoustically—than information focus by age three. However, the distinction between contrastive and corrective focus is rarely examined within the same study. Despite evidence of significant prosodic differences between the three focus types in adult speech (e.g., Sánchez-Ramón et al., 2024, in press; Vander Klok et al., 2018), most previous developmental studies have investigated the prosodic realization of information focus in comparison to either contrastive or corrective focus, leaving the distinction between contrastive and corrective focus unexplored. In fact, only one study has considered all three focus types in child speech, reporting null findings on prosody (i.e., Esteve-Gibert et al., 2022). Consequently, further investigation is needed to understand how children develop their modulation of prosody to distinguish between these three focus types across other languages.

1.4. The Gestural Marking of Focus Types in Child Speech

Gestures have been shown to have important pragmatic functions in development, allowing children to convey a whole range of meanings, such as knowledge state (e.g., Hübscher et al., 2019a; Kim et al., 2016), politeness (e.g., Hübscher et al., 2019b), information status (e.g., Rohrer et al., 2022) or narrative cohesion (Colletta et al., 2014). Importantly, the pragmatic meanings that children express gesturally are frequently also expressed in prosody, a fact that highlights the parallel functions of these two modalities also in childhood (see Hübscher & Prieto, 2019 for a review). Regarding children’s gesture production in the context of focus types, to our knowledge, only Esteve-Gibert et al. (2022) have specifically examined it, adopting a multimodal perspective in which the use of prosodic cues was considered as well. The authors assessed the multimodal productions of four-year-old French-speaking children in three focus conditions (namely information, contrastive and corrective), taking into account their uses of pitch and duration, together with the number of head gestures produced in each focus condition. Results showed that children did not distinguish focus types through the use of acoustic cues, but they did so through non-referential head gestures: more head gestures were produced in the contrastive focus condition than in the information focus condition, and more in the corrective focus condition than in the other two. These findings show, first, that gestures are important pragmatic devices used by children already at the age of four to five to distinguish between types of focus. Second, they show that the use of gestures has a similar effect to that of prosody in marking focus types, as reported for adult speech in Catalan, in the sense that information focus receives the least marking and corrective focus the most.
Therefore, gesture production in the context of marking focus types in development has only been considered in one language (French) and at one specific developmental stage (four-year-olds). As a result, there is no evidence from other languages to support the findings reported for French-acquiring children. There is also a lack of research examining a broader developmental period that could shed light on the developmental path of gestural strategies in this specific pragmatic context. Furthermore, gesture production in children’s marking of focus types has only been considered with respect to gestures produced with the head, disregarding the use of other gesture articulators or other aspects such as gestural prominence. Hand gestures, for instance, have been shown to contribute to marking focus types in adult speech (Sánchez-Ramón et al., 2024, in press) and to play a crucial role in signaling other information structure categories in child speech (Rohrer et al., 2022). Similarly, gestural prominence has been shown to play an important role in marking corrective focus in adult speech (e.g., Carignan et al., 2024; Sánchez-Ramón et al., 2024, in press). For this reason, this study adopts a more global perspective by considering different gesture articulators in the analysis. We follow Kendon’s (2004) definition of gesture as “a visible action of any body part, when it is used as an utterance or as part of an utterance” (p. 7). This broad definition includes meaningful, communicative movements produced not only by the hands but also by the head and other body parts. Such a comprehensive approach has been widely used in multimodal discourse studies (see, for example, Bavelas, 2022) and is particularly relevant for investigating children’s gestures, which, as initial observations of our data indicated, frequently involve the use of non-manual articulators. The study further incorporates another aspect that has not been examined in child speech: gestural prominence.
Importantly, the multimodal approach adopted in Esteve-Gibert et al. (2022) revealed that gestural strategies can be in place when prosodic ones are still to be developed, identifying a precursor role of gestures with respect to prosody in the context of marking focus types. The precursor role of gestures is well-documented in early infancy, particularly in relation to lexical and syntactic development. Infants begin using gestures in their referential communication between eight and twelve months of age, well before producing their first words, and they often combine gestures with words prior to forming two-word utterances (see Goldin-Meadow, 2015, for a review). Interestingly, recent studies have provided further evidence that gestures can precede words in the expression of other linguistic aspects later in development, especially in the domain of pragmatics. A range of pragmatic meanings, such as negation (e.g., Beaupoil-Hourdel et al., 2016; Beaupoil-Hourdel, 2022; Benazzo & Morgenstern, 2014) and politeness (Hübscher et al., 2019a), have been found to be initially expressed through gestures rather than words (see Hübscher & Prieto, 2019 for a review).
The question of whether gestural strategies precede prosodic strategies in pragmatic development has been explored in a limited number of studies. Most prior research has focused on early development, revealing that the two modalities are used in tandem from an early age. For example, early pragmatic meanings, such as informational or requestive acts, are already expressed by 14–15 months through the combination of prosodic modulations and gestures (e.g., Aureli et al., 2017; Murillo & Capilla, 2016). This joint use of gestures and prosody persists in later developmental stages. For instance, in the context of epistemic stance marking, Catalan-speaking children aged three to five years have been shown to integrate prosodic and gestural strategies to signal uncertainty. This ability precedes the full development of lexical skills and increases with age (Hübscher et al., 2019b). In contrast to these findings, Esteve-Gibert et al. (2022) presented the first evidence suggesting that children tend to rely on the use of head gestures before prosody, offering a potentially different developmental trajectory. These seemingly contrasting results highlight the need for more systematic investigations into how the expression of pragmatic meaning emerges across the two modalities. Specifically, in the marking of focus types, it remains to be determined whether gestures consistently act as precursors to prosody or whether the findings by Esteve-Gibert and colleagues instead reflect a language-specific pattern. This issue is directly addressed in the present study.

1.5. The Present Study

The central aim of this study is to investigate the development of three- to five-year-old children’s prosodic and gestural strategies for marking focus types in Catalan from a cross-sectional perspective. This age range was chosen because previous research suggests that the acquisition of prosodic cues for marking focus types emerges between ages three and four in Germanic languages, whereas in Romance languages like French, this process occurs later, with gestures emerging as precursors. Thus, we believe that the selected developmental window is appropriate for addressing our aim, as it is broad enough to observe the emergence and development of both types of strategies in the context of marking focus types.
This study addresses four research questions:
  • Do three- to five-year-old children vary prosodic prominence to distinguish between information, contrastive, and corrective focus?
  • Do three- to five-year-old children employ manual and non-manual (head, eyebrow, torso, and legs) gestures in terms of presence to distinguish between information, contrastive, and corrective focus?
  • Do three- to five-year-old children vary gestural prominence to distinguish between information, contrastive, and corrective focus?
  • Do gestural abilities to distinguish information, contrastive, and corrective focus emerge prior to prosodic abilities during the developmental period from three to five?
Regarding the first research question, we hypothesize that only five-year-old children will use prosody to distinguish focus types. The five-year-olds in our study may exhibit a more systematic use of prosody given their potentially enhanced linguistic skills compared to younger children. We thus expect them to produce contrastive and corrective focus with the greatest prominence and information focus with the least, akin to adults in Catalan (Sánchez-Ramón et al., 2024, in press). This expectation aligns with previous findings on French-acquiring children (Esteve-Gibert et al., 2022), as Catalan more closely resembles French than prosody-only languages like English in its marking of focus types (Vanrell & Fernández Soriano, 2013). For the second research question, we hypothesize that children will use the most gestures with corrective focus and the fewest with information focus, like Catalan-speaking adults in Sánchez-Ramón et al. (2024, in press) or French-acquiring children (Esteve-Gibert et al., 2022). This pattern will be observed from age three on, given that early uses of gesture for pragmatic purposes have been attested in several studies (e.g., Benazzo & Morgenstern, 2014; Hübscher et al., 2019b). Regarding the third research question, we hypothesize that children’s productions will show a similar behavior to adults’ in Catalan as shown in Sánchez-Ramón et al. (2024, in press) from the moment they start using gestures, with more prominent gestures accompanying only corrective focus. Given that in Catalan adult speech gestures produced together with information and contrastive focus do not differ significantly in terms of gestural prominence, these two conditions are not expected to differ in this respect in the child data. Finally, given that a late acquisition of prosodic cues to mark focus types is expected, we believe that children will use the gesture modality before the prosodic modality, as in Esteve-Gibert et al. (2022), with the tendency being clearer in the youngest children (i.e., age three) and older children displaying a more systematic use of both modalities. Due to the lack of prior research, no a priori predictions were made regarding how the interaction of age and type of focus might affect the timing of acquisition of gestural strategies relative to prosodic ones.
To address these research questions, an experimental study with a within-subjects design was conducted, targeting three focus conditions: information, contrastive, and corrective. A cross-sectional design was implemented wherein participants were divided into three age groups (three-, four-, and five-year-olds). Productions were elicited through an interactive task adapted from Esteve-Gibert et al. (2022). This semi-controlled task, designed as a game, encouraged naturalistic responses from participants. The adaptations to the original task eliminated the computer-based interaction, creating a more natural environment for spontaneous interaction and enabling the analysis of gestures involving articulators beyond the head (see Section 2.2 below for details). Unlike many previous studies on the acquisition of prosodic focus marking, which predominantly used repetition tasks (e.g., Arnhold et al., 2016; Chen & Höhle, 2018), this approach prioritized spontaneous, context-driven interaction. Prosodic and gestural strategies were examined in terms of prosodic prominence, gesture presence, and gestural prominence. The measures of prosodic and gestural prominence entail holistic assessments of the prosodic and visual cues, rather than acoustic or kinematic analyses of specific components of the respective signals. This methodology allows for a more comprehensive and global analysis of the respective cues, as opposed to the more compositional analyses (e.g., independent analyses of different acoustic parameters) usually performed in previous studies. The measure of gesture presence refers to the presence or absence of gestures accompanying the focused word in the participants’ productions (see Section 2.5 for details on data coding). We believe our methodological approach to be adequate as a first step in evaluating whether children are able to meaningfully use the two modalities to distinguish between focus types.

2. Materials and Methods

2.1. Participants

A total of 125 children aged between three and six years were initially enrolled in this study. The participation criteria required children to be typically developing, which was confirmed through the Core Language Score (CLS) battery of the CELF Preschool 2 (Wiig et al., 2009) administered to all children recruited, and to have no history of hearing or visual impairment according to parental report. The test results revealed that eight children had atypical language development, and their data were excluded from the study. One additional participant was excluded due to a recording error. Therefore, the final sample consisted of 116 typically developing children with no prior history of language, hearing, or visual difficulties. Due to the cross-sectional design of the study, participants were categorized into one of three groups depending on their age: three-, four-, and five-year-olds. The age-related information is presented in Table 1. We calculated the required number of participants a priori using G*Power Version 3.1.9.6 (Faul et al., 2007) with inputs of 0.05 for alpha and three groups of participants. The result showed that, to have an acceptable level of power (1 − β error probability = 0.8) and to detect small effect sizes of 0.25 with 6 experimental trials (see Section 2.3 below), a total sample size of 30 participants was needed.
Recruitment was conducted in two public schools located in Poblenou, a central neighborhood in the city of Barcelona (Catalonia). Catalan is the medium of instruction in both institutions, which ensures that the children in the sample were exposed to this language regularly. However, as the degree of proficiency in Catalan varies among individuals, information about each participant’s proficiency level and use of Catalan was collected through the LEAP-Q questionnaire (adapted for preschool children from Marian et al., 2007) administered to parents. All children were reported to speak Catalan, with 81.3% of them rated as speaking it at a level between adequate and excellent. Regarding language use, 65% of the children were reported to be exposed to Catalan 50% of the time or more on a daily basis. Parents also provided information regarding their occupational status, which was classified according to the International Socioeconomic Index (ISEI; Ganzeboom et al., 1992). An average ISEI score of 64.2 (SD = 12.54; range: 25–86) indicated that the children’s socioeconomic background was middle to upper-middle.
Prior to the experiment, parental written consent was obtained for the children’s participation. Furthermore, children were individually informed about the task and provided verbal consent. After their participation, they received a sticker as compensation.

2.2. The Train Task: Creation and Piloting Process

For this study, we adapted an elicitation task that was initially developed by Esteve-Gibert et al. (2022) to study children’s marking of focus types through prosody and head gestures. In the original version of the task, participants were seated in front of a computer screen which rested on a table, and were asked to interact with a virtual character helping her select certain objects from a bag. While effective for eliciting head gestures, the proximity to the screen and the seated position facing a table constrained the use of other gesture articulators like the hands or lower body. Given our interest in capturing a range of whole-body gestures, we modified the task to remove the computer-based interaction. Instead, we introduced puppets as interlocutors and replaced the virtual character’s interactive tool (i.e., the bag) with a real toy train. In the adapted version, children sat on the ground instead of at a table, which allowed them to gesture more freely, using both upper and lower body gestures. In this modified setup, participants collaborated with the puppets in placing objects into the toy train, giving the task its name: Train Task. This adaptation was partly inspired by Thorson and Morgan’s (2021) Spaceship Task, which was designed to study the marking of given, new and corrective information using a similar procedure (participants helped the experimenter sort objects in a spaceship) and encouraged a naturalistic interaction. The introduction of puppets also enhanced the task’s interactive quality by creating a more engaging and dynamic environment.
The modifications relative to the original version of the task by Esteve-Gibert et al. (2022) were made following two piloting phases. In the first phase, an initial version of the Train Task was tested, in which children interacted directly with the experimenter to place objects inside a train. This interaction, however, hindered the children’s ability to make corrections, as they were usually not comfortable with correcting the experimenter, who was an unfamiliar adult and likely perceived as an authority figure. To address this issue, the second piloting phase replaced the experimenter with puppets as interlocutors. This change allowed children to interact with characters they perceived as playful and on their level, rather than with an authority figure. As a result, the environment became more relaxed and engaging, and children felt more comfortable making corrections. This was reflected in an increase in children’s multimodal strategies in the pilot results. Compared to the initial version of the task without puppets, the version with puppets led to a higher proportion of gestures (from 33.3% to 80%) and of prosodically prominent words (from 14% to 70%) in corrective focus. Additionally, the use of puppets improved the pragmatic context of the task. The puppets were incorporated as passengers on the train, each waiting at a different station and needing to collect their belongings before embarking on their trip.

2.3. Materials and Experimental Conditions

The experimental materials of the Train Task consisted of a toy train, three puppets manipulated by the experimenter, a series of small objects representing everyday items (i.e., the task objects), as well as a tablet identifying some of these objects (i.e., the target objects; see Figure 1). To video-record participants, a Panasonic AG-CX7 camera with a Panasonic AG-MC200GC microphone was used.1
The context in which the task objects were introduced, as well as the number of objects displayed, varied across the three experimental conditions. In the information focus condition, one target object was presented to the participant, with no clearly defined set of alternatives in the immediate context. As a result, the target object was not directly contrasting with any other object, and the set of possible alternatives remained open. In the contrastive focus condition, two identical objects that differed only in color (e.g., a purple shoe and a blue shoe) were shown, forming a closed set: the target object and its competitor. Participants were informed that the puppets preferred the competitor object due to its color being their favorite. This was introduced only for the second and third puppets, as the first puppet was used exclusively for practice trials. The competitor object was always of the puppet’s favorite color (blue for the second puppet and yellow for the third, matching the color of their clothes), while the target object was of a different color (either purple or black). This information was provided prior to introducing each puppet. The setup emphasized the presupposition that the puppet might select the competitor object, enhancing the contrast between the two items and highlighting the need for the child to disambiguate them. Corrective focus trials always followed contrastive focus trials. After the contrastive trial, the puppet would make a mistake by selecting the wrong object, prompting the child to correct the puppet. Thus, in the corrective focus condition, the objects presented were the same as those in the contrastive focus condition.
Practice and filler trials were also incorporated into the task. The inclusion of filler items was motivated, first, by the need to mitigate potential carryover effects of correction (the type of production in which we expected more prosodic and gestural involvement) on subsequent trials. Second, we wanted to add filler items where the puppet did not make a mistake, despite having two choices of objects, and which included objects of different colors to prevent monotony in the task. While practice trials mirrored experimental trials exactly, filler items always displayed two completely different objects (e.g., a green comb and purple clock), and were placed after information or corrective focus trials. Objects presented in filler trials were never of the puppets’ favorite colors so that there was no interference with the contrastive focus condition.
Overall, the experimental design comprised six trials per condition in a within-subjects design, resulting in a total of 18 test trials per participant (6 trials × 3 experimental conditions). The task included nine practice items and ten filler items, bringing the total number of items in the task to 37.

2.4. Procedure

Children were tested individually in a quiet room located within their school. They were seated on the ground facing the experimenter (the first author of the study), with the task objects and the toy train placed in between (see Figure 2). A line was drawn on the floor to prevent participants from reaching for the objects. The camera with integrated microphone was placed behind the experimenter, filming the participant’s whole body. The target objects were depicted on a tablet manipulated remotely by the experimenter. Prior to the experimental task, participants underwent a vocabulary check in which they were asked to identify all objects and colors relevant to the task. This helped ensure that they knew the necessary vocabulary.
The Train Task was structured as a game in which participants were required to assist three puppets in collecting their luggage prior to going on a train journey by providing a verbal command. Each puppet was placed at a different stop, awaiting the train’s arrival, which made the task more dynamic as the transition from one station to the next introduced a short break from the experiment. Participants were told that the puppets were unaware of which luggage items they needed to take and required assistance in identifying them. To prompt their responses, the puppet would ask questions such as “Quin objecte agafo?” (Which object should I take?), fostering an interactive behavior throughout the task. Children were encouraged to provide verbal instructions to the puppet so that they could retrieve the correct object and place it inside one of the train’s wagons. In the corrective focus condition, the puppet would make a mistake and collect an incorrect object. In such cases, participants were encouraged to correct the puppet. The Train Task lasted 12.27 min on average.
Although participants were given no specific instructions about the form of their verbal productions, the type of prompting question used led us to expect productions with the following structure: a verb followed by a noun phrase consisting of a determiner, noun, and adjective (e.g., “agafa la sabata lila”, [pick up the purple shoe]; see Table 2). Since the action of ‘taking’ remains constant throughout the task and the verb “agafar” [to take] is explicitly mentioned in the puppet’s prompt, its meaning is contextually given, making it unnecessary to state the verb. As a result, children could produce a reduced structure without the verb (e.g., “la sabata lila”, [the purple shoe]) while still fulfilling the task requirements. Importantly, the task was designed to avoid triggering word order strategies such as left- and right-dislocation.2 In order to ensure comparability of productions across conditions, all target nouns and adjectives in the experimental trials were disyllabic words with stress on the first syllable. The choice of nouns and adjectives was also based on the CDI vocabulary inventory for Catalan (Serrat et al., 2022) and on Esteve-Gibert et al.’s (2022) task. In the information focus condition, both noun and adjective were in focus. In the contrastive and the corrective focus conditions, the only focused word was the adjective. Focused words were invariably sentence-final, which is the most common pattern for focused constituents in adult Catalan. This design was implemented to mitigate potential effects of sentence position on the realization of sentence-level prosody.

2.5. Data Coding

Children’s productions in each test trial were initially transcribed orthographically in Praat (Boersma & Weenink, 2022). Focused words were further annotated on a separate tier. These annotations formed the basis for subsequent prosodic and gesture annotations. The grammatical category of each word was identified on a third tier, and this information was used to determine the grammatical structure of the productions (see Section 2.6). Following this initial coding, prosodic prominence was annotated, also using Praat. To add prosodic prominence annotations, the position of the stressed syllable of focused words was identified and annotated. Finally, children’s productions were coded for gesture presence and gestural prominence in ELAN (The Language Archive, 2023).

2.5.1. Prosodic Prominence

The coding of prosodic prominence was conducted exclusively on the basis of the audio recordings by a trained research assistant who was a native speaker of Catalan. The assistant was unaware of the study’s research questions and hypotheses and performed the annotations blind to both experimental condition and the gesture annotations. Prior to coding, the assistant completed a four-session training program led by an expert coder (the first author of this study), with each session lasting 60 min on average. This program included an explanation of the annotation system, individual annotation practice, and joint evaluations of the outcomes.
Prosodic prominence was annotated following the DIMA system (Deutsche Intonation, Modellierung und Annotation; Kügler et al., 2015), which is characterized by three key features. First, it is a perceptual annotation system, as it relies on the perceptual judgments of trained raters. Second, it employs a holistic evaluation of prosodic prominence, which integrates multiple acoustic cues such as pitch movement, intensity, and syllable duration. Third, it proposes a granular approach in which a 4-point prominence scale ranging from 0 to 3 is used. The lowest extreme of that scale is represented by prominence level 0, which corresponds to no prosodic prominence, while the highest point in the scale corresponds to level 3, indicating extra-strong prominence. Level 1 reflects reduced prominence compared to level 2, which represents the typical strong prominence of a standard pitch-accented syllable. In DIMA, prominence is annotated at the syllable level. In this study, only the stressed syllables of focused words were annotated, as focused words constituted our unit of analysis (see Figure 3).
The procedure to annotate prosodic prominence was carried out in three sequential phases. In the first phase, the coder listened to all the participant’s productions in the test trials to become familiar with their prominence range and individual speaking style. In the second phase, the coder rated the degree of prominence of the stressed syllables in the focused words, beginning with those perceived as having the strongest prominence and those perceived as having the weakest prominence. In the third phase, the remaining focused words were annotated relative to the highest and lowest prominence levels identified in the second phase. Importantly, prominence ratings were always assigned by listening to the entire production of the participant in each trial, rather than to isolated focused words or their stressed syllables.
As prominence was rated relative to each participant’s unique speaking style and prominence range, what constituted a weak prominence for one participant could correspond to stronger prominence for another, and vice versa. This speaker-specific approach inherently ruled out the use of rigid acoustic thresholds in the evaluation process. While acoustic cues such as intensity, duration, pitch level, and pitch range were considered perceptually, the acoustic distinctions between levels of prominence were determined individually for each participant. Nevertheless, common patterns generally emerged within each prominence level. A prominence value of 0 was typically assigned in cases where the focused word was deaccented or exhibited extreme pitch lowering, often resulting in whispered speech. Prominence ratings of 1 generally lacked significant increases in pitch, duration, or intensity and often coincided with low-pitched syllables. Prominence ratings of 2 typically corresponded to a standard pitch-accented syllable, frequently displaying features such as a clearly raised or lowered pitch and an expanded pitch range as compared to syllables in level 1. Finally, prominence level 3 was usually characterized by a large pitch excursion and markedly increased intensity or duration, making the focused words highly perceptually salient compared to those at level 2. These features manifested either in combination or individually across all prominence levels.

2.5.2. Gesture Presence

Gesture presence was coded by the first author, who is an expert coder, based on visual data only (i.e., without audio playback) and blind to the experimental condition and the prosodic prominence annotations. The coding was performed following the framework and guidelines established in the M3D annotation system (The MultiModal MultiDimensional labelling system; Rohrer et al., 2023), which follows Kendon’s definition of gesture. According to this definition, a gesture is any intentional and structured bodily movement with communicative intent (see Kendon, 2004). This definition includes gestures produced by various articulators, such as the head, eyebrows, hands, torso, and legs. Body movements involving any of these articulators that fit this definition were annotated. Each articulator (i.e., head, eyebrows, hands, torso, or legs) was annotated on a separate set of tiers, as gestures produced by different articulators may overlap with one another but do not necessarily begin or end at the same point in time (see Figure 4). Annotating them separately allows for a more precise tracking of the movement produced by each articulator. The M3D system (Rohrer et al., 2023) proposes a dimensionalized approach to gesture categorization, considering form, pragmatic meaning, and prosodic characteristics as interrelated yet distinct aspects of gesture. Therefore, according to this proposal, any gesture can serve pragmatic functions. For this reason, all gestures regardless of their form (e.g., pointing, open hand, nod, tilt) or semantic category (e.g., deictic, iconic, non-referential) were annotated.
The coding of gesture presence comprised two stages. In the first stage, an initial pass was performed to annotate gesture units, which are instances of gesturing that begin when the articulator starts moving and end when it comes to a complete rest (Rohrer et al., 2023). Gesture units included both manual gestures and non-manual gestures (e.g., head, eyebrow, torso, and legs). In the second stage, the coder performed a more fine-grained annotation, identifying the stroke of each previously annotated gesture unit. The stroke represents the most meaningful phase of a gesture, encompassing the principal action or movement, and typically coincides with its kinematic peak (i.e., the moment of maximum velocity). Given that the stroke is widely regarded as the part that conveys the main semantic or pragmatic content of the gesture, we considered only the stroke annotations for the gesture analyses in this study, in line with previous research (e.g., Rohrer et al., 2022). The stroke phase has traditionally been studied in both manual and non-manual head gestures (see Wagner et al., 2014, for a review on head gestures). To ensure comparability with existing annotations and to exclude large movements unrelated to focus marking (e.g., large body movements that began while the verb was uttered but extended across multiple words, including the focused word), we applied this stroke-based annotation approach to all other non-manual articulators (i.e., eyebrows, torso, and legs). Following the definition of stroke for head gestures (see, for example, Wagner et al., 2014), we identified the stroke of other non-manual gestures as the phase leading to the articulator’s point of maximal extension, occurring just before a change in movement direction. For instance, in a forward torso movement, the stroke extended until its furthest point before the torso began moving backward. When a clearly defined stroke phase was not identifiable, the entire movement of the articulator was coded as the stroke.
The focused word served as our primary unit of gestural analysis. Therefore, for the gesture presence analysis, we considered only those gesture strokes that temporally overlapped with the focused word. If one or more gesture strokes overlapped with the focused word, gesture was considered “present”. Conversely, if no gesture stroke overlapped with the focused word, gesture was considered “absent”. The classification of gesture presence or absence was automated and performed after the annotations were completed, using R Statistical Software (R Core Team, 2024).
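The overlap rule described above can be illustrated with a minimal sketch. The authors ran this step in R; the Python version below is only an illustration, and the function names, the `(start, end)` tuple representation in seconds, and the choice to ignore intervals that merely touch at a boundary are all assumptions not stated in the text.

```python
from typing import List, Tuple

# Hypothetical representation: an annotation is a (start, end) pair in seconds.
Interval = Tuple[float, float]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share a stretch of non-zero duration.

    Assumption: intervals that only touch at a boundary do not count as overlapping.
    """
    return max(a[0], b[0]) < min(a[1], b[1])


def gesture_present(focused_word: Interval, strokes: List[Interval]) -> int:
    """Return 1 ("present") if at least one stroke overlaps the focused word, else 0 ("absent")."""
    return int(any(overlaps(focused_word, stroke) for stroke in strokes))


# Toy example (invented timings): one stroke ends before the word, one overlaps it.
word = (1.0, 1.5)
strokes_for_trial = [(0.2, 0.9), (1.4, 1.8)]
presence = gesture_present(word, strokes_for_trial)
```

Because strokes from every articulator tier feed into the same list, a single overlapping stroke from any articulator is enough for the trial to count as "present", which matches the one-or-more criterion in the text.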

2.5.3. Gestural Prominence

Gestural prominence was annotated by the first author of this study based solely on visual data, blind to both the experimental condition and the prosodic prominence ratings. The annotation process followed the newly developed proposal by Rohrer et al. (2023), which defines gestural prominence as the degree of visual salience of a gesture relative to the preceding and subsequent gestures. The degree of visual prominence typically correlates with an enhancement of the kinematic properties (i.e., the amplitude, velocity, or spatial location) and with an increase in the number of articulators involved in the movement, with usually more articulators being involved in higher prominence gestures.
As proposed in M3D (Rohrer et al., 2023), gestural prominence was rated on a four-point prominence scale analogous to that used for prosodic prominence. The lowest point of the scale, level 0, corresponds to minimal prominence, and the highest point, level 3, to extra-strong prominence. Level 1 represents weak prominence, and level 2 strong prominence. Gestural prominence was assigned to gestures produced while the focused word was being uttered. This process involved a holistic evaluation of either isolated gestures performed with a single articulator (e.g., a hand or head movement) or more complex movements comprising multiple overlapping gestures produced by different articulators (e.g., a hand gesture combined with a head nod or eyebrow raising). As a result, the entire gestural activity accompanying the production of the focused word was assigned a single prominence value. Importantly, only focused words that overlapped with one or more gesture strokes were considered in the annotation of gestural prominence.
The annotation process was conducted in three sequential phases. In the first phase, the coder familiarized themselves with each participant’s gesturing style and prominence range by observing all test trials. In the second phase, the coder identified and annotated gestures representing the lowest and highest levels of prominence for that participant. In the final phase, the remaining gestures were annotated relative to the established baseline. Notably, while gestural prominence was annotated at the level of the focused word, the assessment also considered the broader context, including other gestures produced in the entire utterance and gestures from preceding and subsequent utterances.
As with prosodic prominence, gestural prominence annotations were assigned depending on each participant’s gesturing style. As a consequence, gestures of low prominence for one participant may be of a higher prominence for another participant, and vice versa. This speaker-specific approach intrinsically rules out the use of rigid criteria in the evaluation process. While features like the number of articulators, the gesture space, and the velocity were considered perceptually in the rating process, the distinctions between levels of gestural prominence were determined individually for each participant. Nevertheless, common patterns generally emerged within each prominence level. Gestural prominence level 0 usually corresponded to gestures produced with one articulator whose movement occupied a limited gesture space and did not show high velocity. Video S1 presents an example of a gesture displaying minimal prominence, where the participant produces a very subtle lateral movement of the head that occupies little gesture space. A prominence value of 1 typically corresponded to gestures produced with one articulator performing a moderate movement. In Video S2, for instance, the participant protrudes the head, occupying a comparatively larger space and exhibiting a higher velocity than the gesture shown in Video S1. A prominence value of 2 usually corresponded to gestures involving one or more articulators and characterized by either larger movements, extending away from the body in the case of manual gestures, or more restrained movements executed with relatively high velocity for the speaker’s baseline (see Video S3, where the participant executes a gesture now involving several articulators like the head, torso, and legs, occupying significantly more space than gestures in levels 0 and 1). Finally, prominence level 3 usually corresponded to gestures that involved multiple articulators or consisted of very large movements potentially reaching extreme points in the spatial field. These gestures could also be executed with relatively high velocity as compared to the speaker’s baseline. In Video S4, which contains an example of this prominence level, the participant executes a much more rapid gesture than that in Video S3, occupying a substantial gesturing space and using several articulators such as the head, torso, legs, and now also the hands.

2.5.4. Reliability

Inter-rater reliability was calculated for prosodic prominence, gesture presence and gestural prominence with two additional coders (coder B and C). Prior to annotating the data, the additional coders participated in three training series, one for the prosody annotations, one for the gesture presence annotations and a final one for the gestural prominence annotations. Each training series had three sessions lasting between 90 and 120 min. In the initial session, the main coder introduced the annotation system and commented on examples from the data. Subsequently, coders B and C annotated three participants independently and met again with the main coder for the second session, in which they jointly assessed the performance of coders B and C, with particular emphasis on instances of disagreement. This process was repeated for the third session. The coding process for prosodic prominence was temporally separated by five months from the coding of both gesture presence and gestural prominence. The annotation of gestural prominence was performed right after the annotation of gesture presence. One of the additional coders participated in all annotation tasks, whereas the other differed between the prosodic and gestural annotations.3
Agreement between the three coders was calculated in R Statistical Software (R Core Team, 2024) with data from 30 participants, which represents approximately 26% of the entire database. For both prosodic and gestural prominence annotations, Gwet’s AC2 coefficient for ordinal data was calculated using the irrCAC package (Gwet, 2022). Agreement was very high for both prosodic prominence [AC2 = 0.89298, 95% CI: 0.881–0.904, p < 0.0001] and gestural prominence [AC2 = 0.87218, 95% CI: 0.853–0.892, p < 0.0001]. For gesture presence, Fleiss’ kappa was computed using the irr package (Gamer et al., 2019), yielding similarly high agreement results [κ = 0.838, p < 0.0001].
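For readers unfamiliar with the statistic, the sketch below spells out the standard Fleiss' kappa formula for a fixed number of raters per item. It is a plain-Python illustration, not the irr implementation the authors used; the function name and the toy ratings are invented for the example, and Gwet's AC2 (used for the ordinal prominence ratings) is not covered here.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for items each rated by the same number of raters.

    `ratings[i]` holds the category labels assigned to item i, one per rater.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item must have the same number of ratings")

    counts: List[Counter] = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item}, key=repr)

    # Observed agreement: proportion of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    p_category = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_expected = sum(p ** 2 for p in p_category)

    if math.isclose(p_expected, 1.0):
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (invented): 4 trials, 3 coders, 1 = gesture present, 0 = absent.
toy_ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
toy_kappa = fleiss_kappa(toy_ratings)
```

In the toy data two trials show full agreement and two show a 2-vs-1 split, so observed agreement is well above the chance level but far from perfect.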

2.6. Data Preparation and Statistical Analysis

Out of the initial 2088 trials (18 trials per participant × 116 participants), 97 had to be excluded, resulting in a final dataset of 1991 successful trials (information focus: N = 660, contrastive focus: N = 673, and corrective focus: N = 658). The exclusion of trials was based on two primary criteria. The first criterion addressed issues that compromised data quality or complicated the annotation process. These included trials where gestures were produced partially outside the camera frame (N = 30), where the child was unaware of the focused word or did not produce any (N = 16), and where speech disfluencies or background noise occurred (N = 14). The second criterion considered characteristics of speech productions that could potentially interfere with the coding of prosodic prominence. Specifically, trials were excluded if the focused word was not phrase-final (N = 27), including cases where children continued speaking after the focused word (e.g., “la sabata lila he dit” [the purple shoe I said]) or where adjectives appeared before nouns, which is ungrammatical in Catalan (e.g., “agafa la lila sabata” [pick up the shoe purple]). Additionally, trials where the production was uttered as a question, such as when the child was unsure of their response, were also excluded (N = 10).
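The trial counts reported in this paragraph can be cross-checked with a few lines of arithmetic. The snippet below uses only the figures given in the text; the dictionary keys are shorthand labels introduced here for illustration.

```python
# Figures taken from the text; the key names are illustrative shorthand.
participants = 116
trials_per_participant = 18
initial_trials = participants * trials_per_participant

exclusions = {
    "gesture_outside_frame": 30,
    "focused_word_unknown_or_missing": 16,
    "disfluency_or_noise": 14,
    "focused_word_not_phrase_final": 27,
    "uttered_as_question": 10,
}
retained_by_condition = {"information": 660, "contrastive": 673, "corrective": 658}

excluded_total = sum(exclusions.values())
retained_total = sum(retained_by_condition.values())

# The per-criterion exclusions and the per-condition totals should reconcile.
assert initial_trials - excluded_total == retained_total
```

The assertion passes, so the five exclusion counts account for all excluded trials and the per-condition totals sum to the final dataset size.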
The task productions were expected to feature complex noun phrases consisting of a determiner, noun, and modifying adjective in that order (e.g., “agafa la sabata lila” [pick up the purple shoe]), with the verb being optional due to contextual saliency. This expectation was met in 37.57% of the productions. Variability in grammatical structures was naturally obtained due to the experiment’s interactive and semi-controlled design (see Table A1 in Appendix A for a summary of the grammatical structures obtained in the task). We obtained productions that contained a simple noun phrase, which included either a determiner and noun (e.g., “la gorra”, [the cap]) or a pronoun alone (e.g., “aquella” [that one]). Adjective phrases were also produced, comprising a determiner and adjective only (e.g., “la lila” [the purple one]). Additionally, some responses featured two references to the target object, such as a determiner followed by a complex or simple noun phrase or two adjective phrases (e.g., “aquesta, la sabata lila” [this one, the purple shoe]; “la lila, la lila” [the purple one, the purple one]). In such cases, only the first reference to the object was retained for analysis. In a small number of trials, participants used only deictic gestures without speech. All these grammatically correct structures were included in the final dataset and then accounted for in the statistical models as a random effect.
A total of three models were run in R Statistical Software (R Core Team, 2024). In all three of them, age was included as a categorical predictor to allow for the comparison of distinct developmental stages. This approach is particularly relevant to our research questions, which aim to identify developmental shifts rather than assume a continuous, linear effect of age. To address the first research question on the relationship between Prosodic prominence, Focus condition and Age group, a cumulative link mixed model (CLMM; Model 1) was fitted using the ordinal package (Christensen, 2023). To assess the second research question on the link between Gesture presence, Focus condition and Age group, a generalized linear mixed model (GLMM) with a binomial family was fitted using the lme4 package in R (Bates et al., 2015) (Model 2). Finally, a second CLMM (Model 3) was fitted to examine the third research question, on how Gestural prominence was predicted by Focus condition and Age group with the ordinal package. The three models allowed us to also answer the more general research question four, on the timing of development of gesture and prosodic abilities in marking focus types. Factor complexity was increased in a stepwise manner and R’s anova() function was used to compare pairs of models and select the models with better fit. A random effect of Grammatical structure was incorporated in the first and third models to control for potential variability induced by the type of production. Model 2 did not include this random effect due to a convergence issue.
For Model 1, the dependent variable was Prosodic prominence of the focused word, an ordinal variable with four ordered levels (0, 1, 2, and 3). The model included fixed effects for Focus condition (three levels: Information, Contrastive, and Corrective) and Age group (three levels: three-year-olds, four-year-olds, and five-year-olds), along with their interaction. A random intercept for Participant, and a by-Participant random slope for Grammatical structure (with five levels: Complex noun phrase, Simple noun phrase, Adjective phrase, Repeated target word, and No speech) were added. Model 2 included as a dependent variable the presence or absence of gestures during the production of the focused word (with two levels: 1 = presence, 0 = absence) and a fixed effect for Focus condition (three levels: Information, Contrastive, and Corrective), along with a random intercept for Participant. Finally, Model 3 analyzed the prominence of gestures accompanying the focused word. The model included Gestural prominence (with four ordered levels: 0, 1, 2, and 3) as a dependent variable, with Focus condition (three levels: Information, Contrastive, and Corrective) as a fixed effect, a random intercept for Participant, and a by-Participant random slope for the Grammatical structure of the production (with five levels: Complex noun phrase, Simple noun phrase, Adjective phrase, Repeated target word, and No speech).
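To make the structure of the cumulative link models more concrete, the following minimal sketch shows how a cumulative logit model turns a linear predictor into probabilities for the four ordered prominence levels, using the parameterization logit P(Y ≤ j) = θ_j − η. It is an illustration under stated assumptions, not the fitted model: the threshold values are hypothetical, random effects are set to zero, information focus is assumed to be the reference level, and the two condition effects are the main-effect contrasts reported in Section 3.1.

```python
import math

def logistic(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def category_probabilities(eta, thresholds):
    """P(Y = k) for ordered levels 0..len(thresholds) under a cumulative
    logit model with logit P(Y <= j) = theta_j - eta."""
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical thresholds between levels 0|1, 1|2 and 2|3 (NOT estimates from the paper).
THRESHOLDS = (-1.0, 0.5, 2.0)
# Linear predictors on the log-odds scale; information focus assumed as reference.
ETA = {"information": 0.0, "contrastive": 0.93, "corrective": 3.27}

for condition, eta in ETA.items():
    probs = category_probabilities(eta, THRESHOLDS)
    print(condition, [round(p, 2) for p in probs])
```

Because a larger linear predictor shifts every cumulative probability downward, probability mass moves toward the higher prominence levels as one goes from information to contrastive to corrective focus, whatever thresholds are chosen.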

3. Results

3.1. The Use of Prosodic Prominence in Distinguishing Focus Types Across Age Groups

The analysis of prosodic prominence revealed a significant main effect of Focus condition (LR(2) = 601.66, p < 0.001). Focused words were significantly more likely to exhibit higher levels of prosodic prominence in the corrective focus condition compared to the contrastive (β = 2.34, SE = 0.13, z = 17.46, p < 0.001) and the information focus conditions (β = 3.27, SE = 0.15, z = 21.78, p < 0.001). Additionally, prosodic prominence was higher in the contrastive focus condition than in the information focus condition (β = 0.93, SE = 0.12, z = 7.61, p < 0.001). These findings indicate that prosodic prominence varies significantly across different focus types, with corrective focus showing the highest prominence, followed by contrastive, and then information focus. No main effect of Age group was observed on Prosodic prominence (LR(2) = 2.5199, p = 0.2837), suggesting no substantial differences in the production of prosodic prominence across age groups.
Crucially, a statistically significant two-way interaction between Focus condition and Age group was found (LR(4) = 48.846, p < 0.001). The results from the two-way interaction model revealed that in the group of three-year-olds, prosodic prominence was significantly higher in the corrective focus condition compared to both the contrastive focus condition (β = 2.28, SE = 0.23, z = 9.93, p < 0.001) and the information focus condition (β = 2.61, SE = 0.24, z = 10.79, p < 0.001). No significant differences were found between the contrastive and information focus conditions (β = 0.34, SE = 0.22, z = 1.52, p = 0.13). These results show that while three-year-olds mark corrective focus clearly with an increase in prosodic prominence, they do not distinguish information from contrastive focus using prosodic prominence. In the group of four-year-olds, significant differences in prominence were observed across all conditions. Prosodic prominence was higher in the corrective focus condition compared to the contrastive focus condition (β = 1.87, SE = 0.22, z = 8.43, p < 0.001) and the information focus condition (β = 2.65, SE = 0.23, z = 11.30, p < 0.001). Additionally, prominence was higher in the contrastive focus condition than in the information focus condition (β = 0.78, SE = 0.21, z = 3.71, p < 0.001). Regarding the five-year-olds, results revealed the existence of similar patterns to the four-year-olds. The corrective focus condition exhibited significantly higher prominence compared to the contrastive focus condition (β = 2.90, SE = 0.22, z = 13.06, p < 0.001) and the information focus condition (β = 4.48, SE = 0.24, z = 18.31, p < 0.001). Prominence was also higher in the contrastive focus condition compared to the information focus condition (β = 1.58, SE = 0.20, z = 8.00, p < 0.001). These results suggest that both four- and five-year-olds systematically distinguish between the three focus types using prosodic prominence.
The two-way interaction analysis also revealed several differences between age groups. Prosodic prominence in the corrective focus condition was higher for five-year-olds than for four-year-olds (β = 0.83, SE = 0.27, z = 3.13, p < 0.01) and three-year-olds (β = 0.72, SE = 0.27, z = 2.65, p < 0.01). In contrast, prominence in the information focus condition was significantly lower for five-year-olds than for both four-year-olds (β = −1.00, SE = 0.23, z = −4.32, p < 0.001) and three-year-olds (β = −1.16, SE = 0.24, z = −4.91, p < 0.001). No differences across groups were found in the contrastive focus condition. Figure 5 shows the predicted probabilities of prosodic prominence levels across focus conditions and age groups extracted from the two-way interaction model (i.e., Model 1 described above).
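Since the estimates of a cumulative link model are on the log-odds scale, exponentiating them gives cumulative odds ratios, which may be easier to read. The sketch below does this for the contrastive versus information contrast in each age group, using the β and SE values reported above. The 95% intervals are simple unadjusted Wald intervals computed here for illustration; they are an assumption of this sketch and need not match the intervals or any multiplicity adjustment used in the original analysis.

```python
import math

# Contrastive-vs-information estimates (beta, SE) on the log-odds scale, as reported above.
CONTRAST = {
    "three-year-olds": (0.34, 0.22),
    "four-year-olds": (0.78, 0.21),
    "five-year-olds": (1.58, 0.20),
}

def odds_ratio_with_ci(beta, se, z_crit=1.96):
    """Odds ratio with an unadjusted Wald 95% interval (illustrative assumption)."""
    return (
        math.exp(beta),
        math.exp(beta - z_crit * se),
        math.exp(beta + z_crit * se),
    )

for group, (beta, se) in CONTRAST.items():
    ratio, lower, upper = odds_ratio_with_ci(beta, se)
    print(f"{group}: OR = {ratio:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions, the interval for three-year-olds spans 1, whereas the intervals for four- and five-year-olds lie entirely above 1, which mirrors the pattern of significance described in the text.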

3.2. The Use of Gestures in Distinguishing Focus Types Across Age Groups

Regarding the analysis for gesture presence, the results indicated a significant main effect of Focus condition (χ2(2) = 252.63, p < 0.001). Focused words in the corrective focus condition were significantly more likely to be accompanied by gestures compared to those in the contrastive (β = 1.279, SE = 0.132, z = 9.676, p < 0.001) and the information focus conditions (β = 2.074, SE = 0.141, z = 14.667, p < 0.001). Additionally, focused words in the contrastive focus condition were more likely to co-occur with gestures than those in the information focus condition (β = 0.795, SE = 0.130, z = 6.097, p < 0.001). No main effect of Age group was found (χ2(2) = 2.098, p = 0.35), indicating no significant differences in the presence of gestures across age groups. Similarly, the interaction between Focus condition and Age group was not significant (χ2(4) = 2.6945, p = 0.6102), suggesting that the relationship between focus condition and gesture presence was consistent across age groups. Figure 6 shows the predicted probabilities of gesture presence across focus conditions extracted from Model 2 (see Section 2.6 above).
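The p-values of the likelihood-ratio tests above can be recovered from the reported χ2 statistics. For even degrees of freedom the upper-tail probability of the chi-square distribution has a closed form, so the sketch below needs only the standard library. The function name is ours, introduced for illustration.

```python
import math

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable for even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    if df <= 0 or df % 2:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Statistics reported above for the gesture presence model.
p_focus = chi2_upper_tail_even_df(252.63, 2)        # main effect of Focus condition
p_age = chi2_upper_tail_even_df(2.098, 2)           # main effect of Age group
p_interaction = chi2_upper_tail_even_df(2.6945, 4)  # Focus condition x Age group

print(f"{p_focus:.3g} {p_age:.4f} {p_interaction:.4f}")
```

The values obtained for Age group and for the interaction agree with the reported p = 0.35 and p = 0.6102, and the value for Focus condition is far below 0.001.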
In order to provide a more comprehensive characterization of the data, we provide, in Table A2 in Appendix A, a detailed breakdown of the number of manual and non-manual gestures together with the information regarding the type of articulators involved in each gesture.

3.3. The Use of Gestural Prominence in Distinguishing Focus Types Across Age Groups

For gestural prominence, the analysis showed a significant main effect of Focus condition (LR(2) = 64.962, p < 0.001). Gestures accompanying words in corrective focus were significantly more prominent than those accompanying contrastive (β = 1.043, SE = 0.152, z = 6.861, p < 0.001) and information focus (β = 1.0682, SE = 0.177, z = 6.052, p < 0.001). No statistically significant differences in gestural prominence were found between gestures in the contrastive and information focus conditions (β = 0.025, SE = 0.181, z = 0.139, p = 0.9894). Neither a main effect of Age group (LR(2) = 0.6614, p = 0.7184) nor an interaction between Focus condition and Age group (LR(4) = 2.7336, p = 0.6033) was observed. These results suggest that children, regardless of their age, produce highly prominent gestures in the corrective focus condition, while gestural prominence levels in contrastive and information focus are comparable. Figure 7 shows the predicted probabilities of gestural prominence levels across focus conditions extracted from Model 3 (see Section 2.6 above).

4. Discussion

This study explored children’s ability to multimodally (i.e., in both prosody and gesture) convey pragmatic meaning related to information structure in the context of marking focus types in Catalan. Using a cross-sectional design, the study compared outcomes across three age groups: three-, four-, and five-year-olds. Several novelties were incorporated relative to previous studies. First, a comprehensive approach was adopted in which three types of focus were examined and compared: information, contrastive, and corrective. Second, a semi-controlled and interactive experimental task was used: the Train Task. This task, adapted from Esteve-Gibert et al.’s (2022) focus task, proved to be successful in eliciting highly natural and spontaneous productions. It effectively captured children’s use of prosodic and gestural prominence and, importantly, allowed for the inclusion of gestures produced with multiple articulators. Finally, drawing on the DIMA (Kügler et al., 2015) and the M3D systems (Rohrer et al., 2023), holistic perceptual measures were implemented to assess children’s prosodic and gestural behavior in terms of prominence. These measures proved effective in capturing children’s multimodal behavior and yielded high reliability results, as reflected in strong inter-rater agreement. By employing and systematically testing them, our study not only provided empirical insights into children’s multimodal marking of focus types but also contributed to the methodological advancement of prominence annotation on the acoustic and visual domains, further validating the measures for future research.
The main aims of the study were the following: (1) to analyze whether and how prosodic prominence was used to distinguish focus types across the three age groups; (2) to examine whether and how the presence of gestures varied across focus types and age groups; (3) to explore differences in gestural prominence in relation to focus types across age groups; and finally (4) to assess the relationship of prosodic prominence and gesture production from the point of view of their timing of acquisition in the context of marking focus types. The results of the study are discussed in the subsections below in relation to these aims.

4.1. Prosodic Prominence Across Focus Types and Age Groups

The results of prosodic prominence revealed several findings. First, regarding corrective focus, the results indicated that prosodic prominence was significantly higher in this condition than in the information focus condition across all age groups, in tune with results on adult speech in several languages including Catalan (e.g., Sánchez-Ramón et al., 2024, in press; Vander Klok et al., 2018; Vanrell et al., 2013). Interestingly, our findings also revealed a significant increase in prosodic prominence in the corrective focus condition as compared to the contrastive focus condition in all age groups. This pattern differs from previous findings on adult speech in Catalan, where the increase in prosodic prominence between contrastive focus and corrective focus was neither statistically significant nor as pronounced as in our results (see Sánchez-Ramón et al., 2024, in press).
There are two potential reasons for this difference between the children’s and adults’ data in Catalan. First, while the experimental tasks used in both studies are similar in terms of procedure and design, their pragmatic context differs. In our study, children played a game and interacted with puppets, whereas adults communicated with a language learner in a more polite context. According to the authors of the adult study, it is possible that the task used for adult speakers triggered prosodic mitigation in the corrective focus condition, as participants tried to be polite with their interlocutor. In contrast, our task did not require children to be polite, and this might have triggered more uninhibited productions on the part of the children. Second, it is possible that children regulated their excitement in the corrective focus condition differently from adults. In this experimental condition, participants had to override their interlocutor’s contribution, which involves disagreement or reassertion. This could elicit stronger emotions like excitement, ultimately causing children’s productions to be substantially more prominent in this condition than in the contrastive focus condition. The link between emotion and focus marking had already been highlighted in Ito (2018), where it is maintained that the prosody of focus naturally carries affective connotations.
Our second main finding, regarding now contrastive focus, showed that the productions of three-year-old children were virtually identical in terms of prosodic prominence in the contrastive and information focus conditions. Therefore, three-year-olds do not seem to be yet capable of using prosodic prominence in a systematic way to distinguish this type of focus from information focus. This contrasts with the results of four- and five-year-olds in this study or adults in Catalan, as shown by Sánchez-Ramón et al. (2024, in press). Similarly to previous results on the prosodic marking of contrastive focus in English (Hornby & Hass, 1970; Wonnacott & Watson, 2008), four-year-olds in our study start to use higher levels of prominence in the contrastive focus condition as opposed to the information focus condition, demonstrating an emerging sensitivity to mark this pragmatic distinction in prosody by age four.
Lastly, two main cross-group differences were found. Differences between information and contrastive focus became even more pronounced in the five-year-old group. In this group, children produced lower levels of prosodic prominence for information focus more systematically, making this condition less prominent than in the three- and four-year-olds. Interestingly, the prominence levels in the corrective focus condition were significantly higher in the five-year-olds than in the other two age groups. As a result, the differences between the three focus conditions are magnified in the group of five-year-olds. Although four-year-olds already demonstrate an ability to distinguish between information and contrastive focus, unlike three-year-olds, the observed group differences in the use of prosodic prominence, particularly the distinction seen in five-year-olds, highlight that the development of prosodic prominence to mark focus types continues beyond age four. All in all, our results highlight a gradual development of the prosodic marking of focus types across age groups, with a more systematic use of prosodic prominence by age five.
Overall, our findings contrast with previous studies on French-speaking children, which have shown that even by the ages of four in spontaneous speech or seven–eight in repetition tasks, children do not vary acoustic cues to distinguish any focus type (Destruel et al., 2024; Esteve-Gibert et al., 2022). Consequently, the hypothesis proposed for our first research question, which posited that Catalan children would exhibit a similar developmental trajectory to French-speaking children in acquiring prosodic strategies for focus marking, is not supported. Instead, our results align with findings from German- and English-speaking children, where prosodic marking of contrastive and corrective focus has been separately reported for ages three and four (Grünloh et al., 2015; Wonnacott & Watson, 2008). If, as suggested in previous works, the timing of acquisition of prosodic cues for focus is influenced by the frequency of prosodic strategies as opposed to syntactic ones in the adult language model, the observed differences between Catalan and French could stem from French relying less on prosodic prominence to mark focus in the adult model. This aligns with the proposal by Face and D’Imperio (2005) that languages lie along a continuum from prosody-dominant to syntax-dominant focus-marking strategies, reflecting a gradual variation in the interplay of these two linguistic modalities. However, it is important to note that our study and those conducted in French by Destruel et al. (2024) and Esteve-Gibert et al. (2022) are not entirely comparable due to significant methodological differences in the analysis of prosody. Our study employed holistic perceptual measures, represented by the concept of prominence, which evaluate multiple aspects of the signal simultaneously, such as F0 modulations, intensity, or lengthening.
In contrast, the two available studies on French performed compositional analyses, evaluating separately several aspects of the acoustic signal (e.g., syllable duration, intensity, or mean pitch and pitch range). It is possible that holistic measures allow us to better capture the use of prosodic cues when there is variation from one cue to another or when children use different cues from those described for adults in the same language. The fact that these measures evaluate the use of prosody in context could also make them more sensitive to nuances that do not correspond to clear acoustic differences. Either way, future research would need both to compare differences in the use of syntactic and prosodic strategies between Catalan and French, and to explore differences between acoustic and perceptual measures in the assessment of prosody.

4.2. Gesture Presence Across Focus Types and Age Groups

Regarding the second aim of the study, and consistent with the second hypothesis, our results revealed that children across all age groups produced more gestures associated with corrective focus than in the other two conditions, and more gestures together with contrastive focused words than with information focused words. The absence of age group differences suggests that this gestural behavior is stable across the developmental period studied. These findings perfectly align with the patterns observed in Catalan adult speech where the number of gestures produced increased from information to corrective focus, with contrastive focus being in between (Sánchez-Ramón et al., 2024, in press).
From a developmental perspective, Esteve-Gibert et al.’s (2022) study on French-speaking four-year-olds had already reported this use of gestures to mark the three focus types with respect to the use of head gestures. Expanding on these findings, our study demonstrates that children as young as age three already use gestures to mark focus types, indicating that this strategy emerges at an earlier developmental stage than previously documented. Our findings contribute in a more general way to previous evidence on children’s use of gestures as early pragmatic and conversational devices (e.g., Kim et al., 2016; Hübscher et al., 2019b) and underscore the significance of gestures as a valuable measure of children’s pragmatic development. The stability of gesture presence across age groups, as revealed in our results, confirms that gestures remain a consistent tool for marking focus types even in later developmental stages. These findings align with previous research demonstrating that gestures continue to support closely related pragmatic functions, such as signaling information status, at more advanced stages of development (e.g., Rohrer et al., 2022).

4.3. Gestural Prominence in Relation to Focus Types and Age Groups

Consistent with the third hypothesis, our results demonstrated that children produced significantly more highly prominent gestures in the corrective focus condition than in the contrastive or information focus conditions, regardless of age group. This pattern mirrors findings in adult Catalan speech, where gestures in corrective focus were significantly more prominent than those accompanying the other two focus types (Sánchez-Ramón et al., 2024, in press).
We believe the use of higher levels of gestural prominence to mark corrective focus can be explained by three main factors. First, children may use more prominent gestures in this condition because they need their interlocutors, the puppets, to cooperate with them, amending their behavior. It has been shown that adult speakers tend to produce larger gestures when aiming to encourage cooperation from their interlocutor (Hostetter et al., 2011). Second, similarly to the effects on prosodic prominence discussed above, the unique characteristics of corrective focus as being a natural locus for affect expression could also play a role in enhancing the prominence of the gestures. Third, gestures in adult speech are known to vary in size depending on their role in managing common ground. For instance, previous research has shown that adult speakers tend to use larger gestures when conveying highly relevant information to the listener (e.g., Campisi & Özyürek, 2013; Galati & Brennan, 2013). Taken together, these findings provide the first evidence that children’s gesture production systematically varies in prominence as a function of pragmatic context. This suggests that gesture use is a dynamic and pragmatic-sensitive strategy for communication from an early age.

4.4. Timing of Acquisition of Prosodic and Gesture Cues in Marking Focus Types

In relation to the fourth research question concerning the emergence of gesture abilities relative to prosodic prominence, our study confirmed the hypothesis partially. We found that gestures emerge earlier than prosodic cues in marking specifically one focus type, i.e., contrastive focus. While three-year-olds were able to use gestures (in terms of presence) to distinguish all three focus types—producing significantly more gestures in the contrastive focus condition compared to the information focus condition—they were still not capable of using prosodic prominence for the same purpose. Additionally, we found that the children’s use of prosodic prominence differed across developmental stages while their gestural strategies were consistent across age groups, paralleling results on adult data in Catalan. These findings suggest that gestures stabilize earlier, at least within the measures examined here, while the use of prosodic prominence continues to develop with age. This supports the idea that gestural strategies are acquired before prosodic ones, providing further evidence for the precursor role of gestures with respect to prosody found in Esteve-Gibert et al. (2022). Our study shows, therefore, that gestures continue to serve as a primary communicative strategy during the preschool years, helping children communicate pragmatically more complex meanings and entraining their prosodic abilities.
Gestures were found to be used together with prosody already from age three for corrective focus, showing a joint use of the two modalities already. These findings point to an earlier acquisition of this focus type, evidencing that the acquisition of the ability to mark focus types is progressive in Catalan. Three main factors could explain why multimodal cues for corrective focus are acquired or expressed earlier than for contrastive focus. First, it is possible that this earlier mastery is linked to the particularities of corrective focus. This type of focus inherently conveys rejection of prior information, an aspect that is not present in the other two focus types. Research indicates that rejection emerges very early in development. For instance, children as young as one year old express rejection through vocalizations and gestural cues, with gestures clearly preceding lexical strategies (e.g., Beaupoil-Hourdel, 2022; Morgenstern et al., 2018). Given this early sensitivity to the expression of rejection, it is plausible that children generalize multimodal patterns associated with expressing rejection to mark corrective focus, facilitating its earlier acquisition in both the prosodic and gestural modalities. The close link between corrective focus and affect expression could work similarly in this respect. As argued above, the interaction between correction and emotion has likely contributed to the enhancement of prosodic and gestural prominence in this condition. In comparison, contrastive focus relies on distinguishing between alternatives in less emotionally charged contexts. The affect layer present in corrective focus may contribute, therefore, to children expressing this meaning more easily through multimodal cues.
Second, it is possible that the acquisition of highly marked uses of prosody is easier for children than more nuanced distinctions, while gesture cues are readily accessible from the outset. The difference in prosodic prominence between information focus and contrastive focus in older children as revealed in our results is relatively subtle compared to the more marked prosodic contrast observed for corrective focus. Producing these subtle modulations requires more advanced motor skills, including precise control over pitch and other prosodic parameters, which younger children may not yet have fully developed. In this sense, research has shown that children’s ability to control prosodic correlates of prominence such as pitch modulations or duration of syllables develops gradually, and younger children often struggle to produce fine-grained prosodic distinctions (Astruc et al., 2013; Chen, 2011; Grünloh et al., 2015; Snow, 1998). In contrast, more marked or exaggerated uses of prosodic prominence, such as those required for corrective focus, may be easier for younger children to produce. For example, case studies have shown that children initially use exaggerated prosodic parameters to express negation, which only become more refined with age (e.g., Dodane & Massini-Cagliari, 2010). Additionally, it has been shown that pitch range, which is a key acoustic parameter in the expression of prominence, undergoes significant developmental adjustments from ages two to six in languages such as Catalan, Spanish and English, with younger children generally exhibiting broader and more exaggerated uses of pitch range than older children (Astruc et al., 2013).
Lastly, the earlier acquisition of corrective focus may be influenced by its frequency in children’s input. Corrective contexts, where caregivers correct children’s errors, are frequent in child-directed speech (e.g., Chouinard & Clark, 2003). Such interactional contexts may provide children with repeated exposure to prosodic and gestural patterns that highlight corrections. In fact, input frequency has been shown to play a key role in language acquisition, with linguistic strategies often emerging earlier for functions that occur more frequently in the input (Anderson et al., 2021).
Overall, our findings reveal that while children have not yet developed prosodic prominence strategies to distinguish between information and contrastive focus by age three, they systematically rely on the presence of gestures to do so. Multimodal strategies to signal corrective focus, in contrast, appear to be acquired by the age of three, demonstrating an earlier mastery for this type of meaning.

4.5. Limitations and Further Research

The study has two main limitations related to the use of the measures of perceived prosodic and gestural prominence. First, while these measures provide valuable insights into children’s ability to use prosody and gesture meaningfully, they may obscure interesting acoustic and kinematic differences across participants and age groups because they are inherently normalized to each participant’s range. This means that what is perceived as prominent within one participant may not align with what is prominent in another. For example, a three-year-old child might use more exaggerated prosodic and gestural cues to mark focus types compared to a five-year-old child, who may employ subtler strategies. However, because prominence is assessed relative to each participant’s own range rather than on an absolute scale, these differences are not captured. Second, while our results for corrective focus show a clear increase in prosodic prominence that aligns with the tendency described for adults in Catalan already at age three, they may not yet reflect a fully adult-like mastery of the specific prosodic correlates of focus (e.g., pitch height or contour type). Unlike many previous studies on focus marking that have concentrated on children’s acquisition of adult-like acoustic patterns in prosody, this study has investigated children’s ability to use prosody, whether or not it conforms to the specific prosodic strategies adults use to implement prominence. The same can be said relative to the use of gestural prominence in corrective focus. Although we have identified an increase in gestural prominence in the corrective focus condition just as reflected in adult data in Catalan, children may not necessarily use the same kinematic cues in doing so as adults.
Future studies should address these two issues by exploring alternative measures, such as acoustic or kinematic measures, which are less affected by participant-dependent normalization and offer a direct way to evaluate the mapping of prosodic and gestural cues used by children relative to adult-like patterns. Additionally, apart from age, other factors influencing individual variability, such as other linguistic or broader developmental abilities, should be examined to better understand the sources of variation in prosodic and gestural behavior in the context of focus types.
While we examined the use of prosody and gestures in marking focus types, we did not explore how these two strategies interact with each other. Further research is needed to investigate whether prosodic and gestural cues operate additively in this context or whether a trade-off exists between them. Moreover, it was outside the scope of this study to examine the use of other strategies used in marking focus types, like the syntactic strategies of left and right dislocation typically described for adult Catalan. Examining how prosodic and gestural modalities interact with word-order strategies in future studies could further refine our understanding of how children integrate diverse meaning-making strategies during language acquisition. Our study identified an early multimodal marking of corrective focus, where three-year-olds already use prosodic and gestural strategies to mark it. Future research should also explore the multimodal expression of corrective focus in children younger than three, so as to establish a clearer onset of the use of both strategies.

5. Conclusions

Our findings demonstrate that, by age three, children can use higher levels of prosodic and gestural prominence to mark corrective focus in Catalan. In contrast, although three-year-olds use the presence of gestures to distinguish between contrastive and information focus, the systematic use of prosodic prominence to mark contrastive focus was observed only in children aged four and five years. These results highlight, first, the precursor role of gesture with respect to prosody in the development of multimodal focus-marking strategies, specifically in the marking of contrastive focus, and second, the developmental asynchrony in the acquisition of prosodic strategies for different focus types, with corrective focus being acquired earlier than contrastive focus. These findings show the importance of prosodic and especially gestural cues in children’s marking of focus types, which highlights the need to adopt a multimodal approach to the study of language acquisition and development.

Supplementary Materials

The following supporting information can be accessed at https://osf.io/k384c/ (accessed on 22 April 2025).

Author Contributions

Conceptualization, S.C., N.E.-G. and P.P.; methodology, S.C., N.E.-G. and P.P.; formal analysis, S.C.; investigation, S.C.; resources, S.C.; data curation, S.C.; writing—original draft preparation, S.C.; writing—review and editing, S.C., N.E.-G. and P.P.; validation, S.C., N.E.-G. and P.P.; visualization, S.C.; supervision, P.P.; project administration, P.P.; funding acquisition, N.E.-G. and P.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by grant PID2020-115385GA-I00, financed by MCIN/AEI/10.13039/501100011033 and by the European Regional Development Fund (ERDF), by the European Union. It was also supported by the grant AGAUR-FI Joan Oró 2025FI_B 00342, funded by the Agency for Management of University and Research Grants (Generalitat de Catalunya) together with the European Social Fund (ESF), by the European Union. The APC was funded by grants PID2021-123823NB-I00 and PID2020-115385GA-I00, financed by MCIN/AEI/10.13039/501100011033 and by the European Regional Development Fund (ERDF), by the European Union.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Committee for Ethical Review of Projects of the Universitat Pompeu Fabra (Approval number: 284) and by the Regional Ministry of Education from the Catalan Government.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are openly available in OSF at https://osf.io/k384c/.

Acknowledgments

Preliminary versions of this work were presented at the 45th International Child Phonology Conference, The International Max Planck Research School conference 2004, the 16th congress of the International Association for the Study of Child Language, the second International Multimodal Communication Symposium and the UR-Ling Workshop 2024. We thank all the attendees at those conferences who provided valuable feedback. We are deeply grateful to Carla Rufi for annotating the prosodic prominence data and to Paula Sánchez-Ramón and Nàdia Carbó for their assistance in data annotation for reliability. We would also like to thank Patrick L. Rohrer for his help in data collection. We also thank the staff and teachers at Escola Les Acàcies and Escola La Mar Bella for their support during data collection. Special thanks go to Sergio Prieto, principal of Escola Les Acàcies, and Maria Campi, head of studies at Escola La Mar Bella, for their collaboration and support in making this study possible. Finally, we extend our heartfelt gratitude to all the participants and their families for their involvement in this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Summary of types of grammatical structures of the productions obtained in the Train Task.
| Type of Structure | Example | Focus Condition | N | Percent |
|---|---|---|---|---|
| Complex NP | (Agafa) [la]Determiner [sabata]Noun [lila]Adjective | Total | 748 | 37.57% |
| | | Information | 247 | |
| | | Contrastive | 347 | |
| | | Corrective | 154 | |
| Simple NP | (Agafa) [la]Determiner [gorra]Noun or (Agafa) [aquella]Pronoun | Total | 421 | 21.15% |
| | | Information | 309 | |
| | | Contrastive | 101 | |
| | | Corrective | 11 | |
| AP | (Agafa) [la]Determiner [lila]Noun | Total | 754 | 37.87% |
| | | Information | 98 | |
| | | Contrastive | 296 | |
| | | Corrective | 360 | |
| Repeated focused words | (Agafa) [aquesta,]Pronoun [la]Determiner [sabata]Noun [lila]Adjective or (Agafa) [la]Determiner [lila,]Adjective, [la]Determiner [lila]Adjective | Total | 58 | 2.91% |
| | | Information | 5 | |
| | | Contrastive | 16 | |
| | | Corrective | 37 | |
| No speech | deictic gesture | Total | 10 | 0.5% |
| | | Information | 1 | |
| | | Contrastive | 3 | |
| | | Corrective | 6 | |
Table A2. Number and percentage of gestures by articulators involved in the movement.
| Articulators | N | Percent |
|---|---|---|
| Hand | 287 | 29.99 |
| Hand and head | 53 | 5.96 |
| Hand and eyebrow | 17 | 1.78 |
| Hand and torso | 24 | 2.51 |
| Hand, head, and eyebrow | 18 | 1.88 |
| Hand, head, and torso | 8 | 0.84 |
| Hand, eyebrow, and torso | 1 | 0.10 |
| Hand, eyebrow, and legs | 2 | 0.21 |
| Hand, eyebrow, torso, legs | 1 | 0.10 |
| Hand, head, eyebrow, and torso | 4 | 0.42 |
| Hand, head, eyebrow, torso, and legs | 1 | 0.10 |
| Head | 224 | 23.41 |
| Head and eyebrow | 1 | 0.10 |
| Head and torso | 86 | 8.99 |
| Head and legs | 4 | 0.42 |
| Head, eyebrow, and torso | 5 | 0.52 |
| Head, eyebrow, and legs | 1 | 0.10 |
| Head, torso and legs | 3 | 0.31 |
| Eyebrow | 23 | 2.40 |
| Torso | 143 | 14.94 |
| Torso and eyebrow | 10 | 1.04 |
| Torso and legs | 6 | 0.63 |
| Legs | 22 | 2.30 |

Notes

1. The materials used in the Train Task are available at https://osf.io/7nqke/ (accessed on 22 April 2025).
2. Only one instance in the final analyzed database involved left dislocation of the focused word.
3. The materials used to train the raters for the annotation of prosodic prominence, gesture presence, and gestural prominence are available at https://osf.io/k384c/ (accessed on 22 April 2025).

Figure 1. Materials used in the Train Task (1: toy train; 2: puppets in order of use; 3: example task objects; 4: tablet with example of target object depiction).
Figure 2. Illustration of the experimental setup in the Train Task in a contrastive focus condition (1: target object depicted; 2: trial task objects; 3: unsorted task objects).
Figure 3. Example of prosodic prominence annotation in Praat for the production agafa la sabata [lila]Focus “pick the purple shoe”, produced by a four-year-old participant.
Figure 4. Example of gesture presence annotations in ELAN for the production agafa la sabata [lila]Focus “pick the purple shoe”, produced by a three-year-old participant.
Figure 5. Predicted probabilities of prosodic prominence levels across focus conditions (Info: information focus, Cont: contrastive focus, Corr: corrective focus) and age groups. Error bars represent 95% confidence intervals.
Figure 6. Predicted probabilities of gesture presence across focus conditions (Info: information focus, Cont: contrastive focus, Corr: corrective focus). Error bars represent 95% confidence intervals.
Figure 7. Predicted probabilities of gestural prominence levels across focus conditions (Info: information focus, Cont: contrastive focus, Corr: corrective focus). Error bars represent 95% confidence intervals.
Table 1. Age-related information of the participants and total sample size per age groups.
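| Group | N | Sex | Age Range (Years; Months) | Average Age (Years; Months) |
|---|---|---|---|---|
| Total | 116 | 54 girls | 3; 3–6; 3 | 4; 10 |
| Three-year-olds | 34 | 14 girls | 3; 3–4; 0 | 3; 8 |
| Four-year-olds | 36 | 19 girls | 4; 1–5; 0 | 4; 8 |
| Five-year-olds | 46 | 21 girls | 5; 1–6; 3 | 5; 10 |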
Table 2. Expected productions in each condition with example stimuli.
| Focus Condition | Expected Production | Example Stimuli |
|---|---|---|
| Information | (Agafa) el [llibre negre]Focus. Lit. trans.¹: (Pick up) the [book black]Focus. Trans.²: (Pick up) the black book | [stimulus image] |
| Contrastive | (Agafa) la sabata [lila]Focus. Lit. trans.: (Pick up) the shoe [purple]Focus. Trans.: (Pick up) the purple shoe | [stimulus image] |
| Corrective | No, (agafa) la sabata [lila]Focus. Lit. trans.: No, (pick up) the shoe [purple]Focus. Trans.: No, (pick up) the purple shoe | [stimulus image] |

¹ Lit. trans. refers to a literal translation. ² Trans. refers to translation.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Coego, S.; Esteve-Gibert, N.; Prieto, P. Preschoolers Mark Focus Types Through Multimodal Prominence: Further Evidence for the Precursor Role of Gestures. Languages 2025, 10, 92. https://doi.org/10.3390/languages10050092
