Age Variation in First-Language Acquisition and Phonological Development: Discrimination and Repetition of Nonwords in a Group of Italian Preschoolers

Galatà, Vincenzo; Lucarini, Gaia; Palmieri, Maria; Zmarich, Claudio

doi:10.3390/languages10100249

¹ Institute of Cognitive Sciences and Technologies (ISTC), Italian National Research Council (CNR), 35137 Padua, Italy

² Department of Developmental and Social Psychology, University of Padua, 35122 Padua, Italy

³ Padova Neuroscience Center, University of Padua, 35128 Padua, Italy

⁴ Dipartimento di Neuroscienze e Riabilitazione, Università degli Studi di Ferrara, 44100 Ferrara, Italy

⁵ Center for Translational Neurophysiology of Speech and Communication, Italian Institute of Technology, 44121 Ferrara, Italy

* Author to whom correspondence should be addressed.

Languages 2025, 10(10), 249; https://doi.org/10.3390/languages10100249

Submission received: 23 September 2024 / Revised: 24 July 2025 / Accepted: 2 September 2025 / Published: 26 September 2025

(This article belongs to the Special Issue Speech Variation in Contemporary Italian)

Abstract

This contribution provides new data on Italian first language acquisition and phonological development in preschool children. In total, 104 3- to 6;4-year-old typically developing Italian children were tested with two novel nonword tasks tackling the Italian consonantal system: one for repetition (NWR) and one for discrimination (NWD). NWR data were analyzed in terms of repetition accuracy, featural characteristics, and phonological processes, while NWD was analyzed according to signal detection theory (i.e., A-prime and d-prime) and in terms of discrimination accuracy. The results show the significant role of age on children’s repetition and discrimination abilities: as the children grow older, all the scores improve and the number of errors declines. No complete overlap is found between what children can produce and what they can discriminate, which is in line with what has already been documented in other languages. The findings contribute to the state of the art on the Italian language and provide new perspectives on some methodological issues specific to this language.

Keywords: nonword discrimination; nonword repetition; Italian; preschool children; production–perception link; phonological processes; featural analysis; phonetic transcription; PHON software

1. Introduction

Newborns are born as universal listeners and can discriminate among most of the sounds present in the world’s languages even if they have never heard them before (Singh et al., 2022; Werker, 2024), although some exposure is necessary for the less acoustically salient ones (Narayan et al., 2010). During the second half of the first year, infants attune to the sounds of their native language, and from 8 to 10 months of age, they start losing the ability to discriminate almost all non-native contrasts in favor of those belonging to their native language (Werker & Tees, 1984; Kuhl et al., 2006; Werker, 2024), with some notable exceptions (Best et al., 1988; Werker et al., 1998 for 14-month-old infants; McMurray et al., 2002 for adults).

On the other side of the coin, speech production, we see a similar trend from universal to tuned-to-language abilities. Whereas in their first months of life babies all around the world produce vegetative and universally similar sounds, they later specialize in the phonological features typical of their native language (Stark, 1986; Brown, 1958), although this journey is heavily constrained by the continually changing anatomy and neurophysiology of their phono-articulatory system (Callan et al., 2000; Kent, 2024).

The different levels of linguistic competence influence each other from the earliest stages of life, suggesting a developmental cascade in language acquisition (Werker, 2018). Several works have provided evidence in support of this thesis. If we only focus on the relationship between speech perception and speech production, we see how the two abilities influence each other in different ways across early development. For instance, sensorimotor speech information already influences auditory speech processing in the first half of the first year of life (Bruderer et al., 2015; Choi et al., 2019, 2021, 2023). Later, between 4 and 6 years of age, if asked to learn new words (Seidl et al., 2018) or lipread words (Turner et al., 2015) while their articulators are blocked, typically developing children are still highly sensitive to cross-modal inputs. Generally speaking, a high sensitivity to cross-modal input seems to be present from infancy to adulthood.

The developmental path of these two skills, while interconnected and cross-modal, appears to follow developmental trajectories that are not completely overlapping. First, the mastery of the two systems is not simultaneous. On one side, from a segmental point of view, 4- to 5-year-old children have usually acquired most of the speech sounds of their language and present intelligible speech (McLeod & Crowe, 2018). Speech production abilities still develop during preschool years (Viterbori et al., 2018; Zmarich, 2008; McLeod & Crowe, 2018; Zanobini et al., 2012; Zmarich et al., 2014) and even later if we consider that some speech sounds are not yet fully acquired and mastered by the age of 7 (Tresoldi et al., 2018; Lucarini et al., 2022). However, one has to consider that segmental analysis offers a developmental picture that could hide the neuromotor attempts of children: as stated by Goffman (2015), it appears plausible that the fundamental representations supporting production are established at an early stage of development, but the child lacks the capability to execute linguistic distinctions with enough accuracy to surpass a crucial perceptual barrier. On the other side, speech perception abilities are not “adultlike” until after 12 years of age (Rvachew & Brosseau-Lapré, 2018). Second, several possible trajectories emerge. Examples such as the “fis” phenomenon (Brown & Berko, 1960) suggest a correct mental representation of sounds, but a still immature production; in fact, children are able to discriminate contrasts they are not able to produce yet. At the same time, the exact opposite pattern has also been described in typically developing children, who can produce contrasts they cannot discriminate (Nakeva von Mentzer, 2020; Pascoe et al., 2016), even if these results could be affected by methodological choices. 
Moreover, a third pattern emerged from other studies, according to which there is a correlation between discriminated and produced contrasts, both in typical and atypical populations (Edwards et al., 2002; Rvachew et al., 2004; McAllister Byun, 2012; Altvater-Mackensen et al., 2014; Hearnshaw et al., 2018). In line with this pattern, a more recent work (Hearnshaw et al., 2023) highlighted that both speech production and vocabulary significantly predict differences in speech perception ability, with receptive and expressive vocabulary accounting for more of the variance than speech production. In sum, the speech perception–production relationship appears to be complex, reflecting the developmental cascade (Werker, 2018; Swingley, 2021).

Several theories try to give an explanation for the interplay of these abilities (for a complete review, see Arjmandi & Behroozmand, 2024). For instance, traditionally, the Motor Theory of speech perception (Liberman & Mattingly, 1985) posits that listeners perceive speech not as an acoustic object, but as a motor object—specifically, as a set of phonetic articulatory gestures. More recently, Articulatory Phonology (Goldstein & Fowler, 2003) has assumed that gestures are patterns of location and constriction created by articulatory movements in the vocal tract, and not segmented sequences in a continuum. Since gestures are the units for mapping articulation to the perception of lexical items, perception involves the reconstruction of these articulatory patterns. Additionally, both the Directions Into Velocities of Articulators (DIVA) model (Guenther, 2016; Meier & Guenther, 2023) and the integrative sensorimotor model of speech (Behroozmand et al., 2018) consider the fundamental role of forward predictions and auditory feedback to integrate the auditory input and articulatory control in speech. In general, these and other theories and models agree that there is a relationship between speech perception and speech production, although its exact nature is still not fully understood, especially in its developmental path.

Achieving a full understanding of this topic is crucial for informing clinical practice. In 2019, Hearnshaw, Baker, and Munro systematically reviewed 73 articles examining the speech perception skills of children with Speech Sound Disorders (SSDs). The majority of the reviewed studies reported that preschool- and early school-age children with SSDs presented difficulty with speech perception tasks. Across the included studies, there was considerable methodological variability in terms of assessment tasks used and eligibility criteria for the SSD definition. Nonetheless, the pattern emerged, suggesting the need to consider speech perception as part of routine assessment for children with SSDs. As reported by a previous study by the same authors (Hearnshaw et al., 2018), perception abilities in preschoolers are less investigated than production abilities, and this is even more true for Italian. Hence, further studies are needed, with potentially relevant implications both for improving the research in this field (i.e., for delving into specific methodological issues) and clinical practice.

In the present study, we investigated first language acquisition and phonological development in a group of 3- to 6-year-old children who speak Italian, which is still an understudied language. Specifically, we explored the relationship between speech production and discrimination abilities by administering two novel tasks using nonwords. The first is a nonword repetition (NWR) task, which assessed children’s decoding and encoding abilities independently of lexical knowledge. The second is a nonword discrimination (NWD) task, which investigated children’s ability to perceive differences among Italian consonantal sounds. Nonwords were used because of their “potential usefulness […] in providing culturally nonbiased assessments of linguistic abilities” (Weismer et al., 2000, p. 874). Additionally, their use prevents the dependency on previous lexical knowledge, thus engaging only the phonological memory and the articulatory system, but not the lexical/semantic system (see Baddeley et al., 1998; Stokes & Klee, 2009). Concerning the Italian language, different studies have used nonwords to investigate children’s production or discrimination abilities. Nonword repetition is investigated in 3;11- to 5;8-year-old children by Dispaldro et al. (2013); in 4- to 12-year-olds by Fabbro (1999) and Marini et al. (2015); and in 5- to 12-year-olds by Bertelli and Bilancia (2006) and Vicari (2007). Moreover, Cornoldi et al. (2009) created a task that was validated on preschool and elementary school children, while Piazzalunga et al. (2018) proposed a nonword repetition test, which was normed on typically developing children aged 3;0 to 6;11 years.

Only two normative studies using nonwords in an AX, or “same-different”, paradigm discrimination task are available. The first is the CMF by Marotta et al. (2008), which is normed on children between 5 and 11 years of age and consists of 15 disyllabic word pairs and 15 disyllabic nonword pairs. The second is the BVN 5–11 by Bisiacchi et al. (2005; also in Pinton & Zanettin, 1998), which contains 37 nonword pairs and is normed on the same age range. The BVN 5–11 also includes a nonword repetition task with 15 stimuli of two to three syllables. Both discrimination tests, however, have several limitations, which are discussed in more detail in Zmarich et al. (2019). Moreover, all the above-mentioned tasks/tests present a limited set of phonemes (with the exception of Piazzalunga et al., 2018), thus differing in terms of articulatory complexity. With our two tasks, we investigated the entire set of Italian consonantal phonemes and used the same stimuli both in perception and in production. To our knowledge, this is the first such attempt in the Italian literature.

Therefore, our aim was twofold:

Studying the phonological development of Italian preschoolers in the production and discrimination of Italian consonants embedded in disyllabic nonwords by using two complementary tasks (i.e., an NWR and an NWD task).
Investigating the relationship between production and discrimination abilities as a function of children’s chronological age.

We expect that children’s accuracy in both tasks will improve with age, although we foresee this improvement to be highly variable among children.

2. Materials and Methods

All the materials used in the current study originate from previous research funded by the Italian National Research Council through the project “Migrazioni” (2009–2012). The project aimed to study the acquisition of Italian in preschoolers with a different mother tongue (i.e., Romanian, Albanian, Arabic, and Igbo and Edo from Nigeria; for further details about the inclusion of these languages in the former project and for a description of their phonological inventories, see Galatà & Zmarich, 2011a, 2011b). Italian-speaking children born to Italian parents had been recruited as control subjects, but apart from small case studies (Galatà et al., 2012, 2017), their data have not been analyzed so far; thus, they are under the spotlight of the present study.

2.1. Participants

Children were all Italian speakers and were recruited from two different kindergartens in the town of Padova (Veneto region, in north-eastern Italy). As much as possible, during the recruitment, attention was paid to balancing the sample according to age and gender. Children with diagnosed or suspected hearing, speech, or language-related problems, as reported by the head of the school, the children’s school caregivers, or a parental questionnaire, were explicitly excluded.1 Prior to their inclusion in the study, informed and written consent was obtained from all the parents.

Overall, 122 children aged 3 to 6;4 years old were recruited and divided into 6-month age groups. Each child was individually assessed at school in a separate and quiet room. One child was excluded because they had been involved in the pilot-testing; 16 children (1 in the 3;0–3;5 group, 11 in the 3;6–3;11 group, 2 in the 4;0–4;5 group, 1 in the 4;6–4;11 group, and 2 in the 5;0–5;5 group) were excluded after testing since they showed a clear lack of attention and scarce collaboration during the experiment and/or because they did not complete one of the three blocks of the NWD task; one child was excluded because they were congested due to a cold. Hence, the final sample included 104 children, a summary of which is provided in Table 1.
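The 6-month grouping can be illustrated with a short sketch. It assumes that ages are written in the years;months notation used above and that a child aged exactly 5;6 falls in the oldest group; the function names and the "5;6+" label are hypothetical and are not taken from the study's materials.

```python
def age_to_months(age: str) -> int:
    """Convert an age in 'years;months' notation (e.g. '3;11') to months."""
    years, _, months = age.partition(";")
    return int(years) * 12 + int(months or 0)


def age_group(age: str) -> str:
    """Assign a 6-month age group; the oldest group is open-ended."""
    total = age_to_months(age)
    if total < 36:
        raise ValueError("age below the 3;0 lower bound of the sample")
    if total >= 66:  # assumption: 5;6 itself belongs to the oldest group
        return "5;6+"
    start = total - total % 6  # first month of the 6-month bin
    end = start + 5            # last month of the 6-month bin
    return f"{start // 12};{start % 12}-{end // 12};{end % 12}"
```

For example, a child aged 3;11 is assigned to the 3;6-3;11 group, and the oldest child in the sample (6;4) to the open-ended group.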

2.2. Stimuli

Nonwords were used as stimuli for both the NWD and NWR.

With the aim of testing the Italian phonological consonant inventory (23 consonants, including the glides /w, j/; Bertinetto & Loporcaro, 2005), the target consonants in the nonwords were those absent in the above-mentioned languages, such as /ɲ, ʎ, ʣ/, and consonant length in Romanian; /w, ʎ/, and consonant length in Albanian; /p, g, v, ʦ, ʣ, ʧ, ʎ, ɲ/ in standard Arabic; /ʦ, ʣ, ʎ, v, r/ in Igbo; /ʦ, ʣ, ʧ, ʤ, ʎ, ɲ, ʃ/, and consonant length in Edo. In the end, the 12 targeted consonants were /p, g, v, r, ɲ, ʃ, ʎ, ʦ, ʣ, ʧ, ʤ, w/ together with the length contrast in word-medial position as indexed by the prosodic feature [+long], which holds for all Italian consonants with the exception of /ʦ, ʣ, ʃ, ʎ, ɲ/, which are inherently long in intervocalic position, while /j, w, z/ are inherently short.

A list of nonword stimuli was created following Pinton and Zanettin (1998) and addressing the following conditions:

Choice of disyllabic nonwords of the type ˈCV.CV: the structure CV is the simplest and most widespread among the world’s languages (Maddieson, 2008); children acquiring Italian as L2 are able to produce this type of syllable independently of linguistic transfer (Carlisle, 2001). Item length is an important clinical parameter (see Dispaldro et al., 2013, among others) and is known to influence children’s performance in tasks of nonword discrimination and repetition (see Dollaghan & Campbell, 1998). In order to control for the test items’ length, we decided to include only disyllabic nonwords with a ˈCV.CV structure in our tasks. Additionally, this choice aligns with the Italian version of the MacArthur-Bates CDI (Caselli et al., 2015), where most syllables are CV (63%), and disyllables represent 39.4% of the total number of words (Zmarich et al., 2011).
Combination of each of the 23 Italian consonants (C) with cardinal vowels (V) /a/ or /i/ (for a few items, the back vowel /u/ was used): the statistics available in the UPSID (UCLA Phonological Segment Inventory Database; Maddieson & Precoda, 1990) show that the vowel /a/ is present in 86.9% of world’s languages, /i/ is in 87.1%, and /u/ is in 81.8%. No consonant clusters are included.
The presence of only 1 of the 12 target consonants in each nonword, either in word-initial or word-medial position.
Every nonword is a nonword in Italian, as well as in the other target languages: native speakers of each target language have checked all the nonwords in order to exclude the presence of real lexical items and violations of the language’s phonotactic rules.2

The described procedure yielded a list of 104 nonwords, which were recorded by a professional female speech therapist in a soundproof room at 44.1 kHz 16-bit mono. The nonwords were then processed in PRAAT (Boersma & Weenink, 2001) in order to be segmented, labeled, and normalized at 65 dB with PRAAT’s scale intensity function and finally included in the nonword discrimination (NWD) and the nonword repetition (NWR) tasks.
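The intensity normalization step can be illustrated outside PRAAT. The sketch below is not PRAAT's code: it assumes that mean intensity is computed from the RMS amplitude relative to a reference pressure of 2 × 10⁻⁵ Pa and that the signal is rescaled by a single gain factor. The function names are hypothetical.

```python
import math

P_REF = 2e-5  # assumed reference pressure in Pa


def intensity_db(samples):
    """Mean intensity in dB relative to P_REF, from the RMS amplitude."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / P_REF)


def scale_to_intensity(samples, target_db=65.0):
    """Rescale samples so that their mean intensity equals target_db."""
    gain = 10 ** ((target_db - intensity_db(samples)) / 20)
    return [s * gain for s in samples]
```

Because the gain is constant across the whole stimulus, the relative amplitude envelope of each nonword is preserved; only its overall level changes. A fully silent signal has no defined level and would raise an error here.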

2.3. The Nonword Repetition (NWR) Task

The aim of the NWR task is to evaluate children’s accuracy in reproducing the Italian consonantal phoneme inventory in ˈCV.CV nonwords.

The task items were selected from the list of 104 nonwords described above. The following selection criteria were adopted:

Target stop consonants were preferred in word-initial position, followed either by /a/ or /i/ to also allow future computations of Voice Onset Time (VOT). Italian voiced consonants have a negative VOT; therefore, they are more difficult to utter, especially in word-initial position (cf. Ohala, 2011; for Italian, see also Zmarich et al., 2021). Our choice of placing target stops mostly in word-initial position made the items more challenging, avoiding the facilitation of [+voiced] stop consonants in word-medial and intervocalic positions.
All other target consonants were preferred, as far as possible, in word-medial position with at least one target consonant followed by /a/ and one by /i/.

The NWR task was organized into three separate blocks to allow the experimenter to conclude the testing session in case of a lack of collaboration from the child (see Table A1 in Appendix B). Specifically:

Block 1 contained 24 nonwords with the following stop Cs in word-initial position: /p, b, t, d, k, g/. Each of these Cs appeared twice, followed by /a/ (e.g., /ˈbaʤa/, /ˈbaza/) and twice followed by /i/ (e.g., /ˈbizi/, /ˈbina/). Within this block, the C in word-medial position was one of the other non-stop Cs, with only one of the two consonants in the nonword belonging to those identified as targets.
Block 2 contained 19 nonwords covering all the other non-stop Cs in word-initial and word-medial positions that were not included in block 1 (e.g., /ˈsiʧa/, /ˈlari/); this block also included 3 items for consonantal length (e.g., /ˈdaffi/, /ˈjalli/, /ˈbitti/), while /z/ is missing in word-initial position because of the phonotactic rules of Italian. Each target C has a minimum of two repetitions (one with /a/ and one with /i/).
Block 3 contained 12 items and was intended as an optional block to address VOT production (not discussed in the present work). It reflects block 1 and adds additional items with word-initial stop Cs (i.e., 6 stop Cs followed by /a/ and 6 stop Cs followed by /i/).

Table A2 in Appendix B provides an overview of the number of consonants by position in the word as tested in the NWR task.

2.4. The Nonword Discrimination (NWD) Task

The NWD task tests children’s ability to discriminate ˈCV.CV nonword pairs that differ in one Italian consonantal phoneme drawn from the list of targets in Section 2.2. Moreover, in each trial, the contrasting nonwords differ in only one target phoneme, with the contrast involving a maximum of 4 distinctive acoustic features (Jakobson et al., 1952; Muljačić, 1972) according to the matrix elaborated by Mioni (1983, p. 64) for the Italian phonological system. Examples of nonword pairs are:

/ˈnala/ vs /ˈnaʎa/ for the target phoneme /ʎ/ contrasted with /l/, from which it differs in two features ([-compact], [+diffuse]).
/ˈsita/ vs /ˈsiʧa/ for the target phoneme /ʧ/ contrasted with /t/, from which it differs in three features ([-compact], [+diffuse], [-strident]).
The NWD task also includes nonword pairs contrasting for voicing (i.e., [±voice] as in /ˈsita/ vs /ˈsida/) and length (i.e., [±long] as in /ˈdaffi/ vs /ˈdafi/).3

The purpose of the NWD task is to determine whether children perceive any difference in the two nonwords by asking them to indicate for each pair whether the two items are the same or different, thus following an AX, or “same-different”, paradigm. The AX paradigm requires a reduced load on auditory memory; it is easier to explain to young children, and it can be kept temporally short enough to comply with the children’s limited ability to sustain attention (Beving & Eblen, 1973; Polka et al., 1995).

The NWD task encompasses 3 independent blocks with 17 trials each: 14 test pairs (12 for target consonants, 1 for voicing, and 1 for length) for which a “different” response is expected, and 3 control pairs (or distractors) for which the “same” response is expected. The control pairs assess the participants’ responding reliability.4 The organization of the task allowed breaks after completing each block if the child showed fatigue or lack of concentration or collaboration, and, if necessary, allowed the session to be stopped without completing all the blocks (see Table A3 in Appendix B for a detailed overview of the NWD task).

2.5. Procedure

Both tasks were implemented and presented to children via PRAAT’s ExperimentMFC procedure: the two tasks were presented in random order, while the order of the three blocks in each task was kept fixed in order to guarantee data integrity during data collection in case a child stopped cooperating after one or two blocks; the trials within each block were randomized. Stimuli were presented at a comfortable level through a pair of Logitech X-140 loudspeakers placed at a distance of about 50 cm from the participant. The loudspeakers were masked by two puppets and connected to a notebook.
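The three levels of ordering (random task order, fixed block order, randomized trials within a block) can be sketched as follows. The study implemented this in PRAAT's ExperimentMFC; the Python function below is only an illustration, and its name, input layout, and seed parameter are assumptions.

```python
import random


def build_session(tasks, seed=None):
    """Return a flat presentation plan as (task, block_number, trial) tuples.

    tasks maps a task name to its list of blocks; each block is a list of trials.
    """
    rng = random.Random(seed)
    task_names = list(tasks)
    rng.shuffle(task_names)  # task order is randomized
    plan = []
    for name in task_names:
        # block order is kept fixed within each task
        for number, block in enumerate(tasks[name], start=1):
            trials = list(block)
            rng.shuffle(trials)  # trials are randomized within a block
            plan.extend((name, number, trial) for trial in trials)
    return plan
```

Keeping the block order fixed means that a session interrupted after one or two blocks always yields the same complete blocks for every child, which is the data-integrity rationale given above.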

The two tasks were presented as games, in which the participants had to use “magic” words to teach younger children to talk (adapted from Roy & Chiat, 2004). Precisely, the NWR task was presented as a sort of “parrot game” in which they had to listen and repeat single nonwords. The NWD task was explained as a “same–different”, “right–wrong”, or “good–bad” task, depending on the children’s attitude and age. The nonword stimuli of each pair were presented with an inter-stimulus interval of 500 ms and played in stereophonic mode: the first stimulus of the pair was reproduced on the right speaker (disguised as the Talking Cricket) and the second one on the left speaker (disguised as Pinocchio). Every sound emission came accompanied by a hand gesture from the experimenter in order to guide the participant towards the origin of the sound source. After each pair of stimuli, the children were asked if the left puppet correctly repeated the nonword uttered by the right one. Children responded verbally using the same labels we used to explain the task.

For each child, a pre-test session was conducted to make sure the assignments were clear. The administration of the tasks proceeded at a normal pace, also in accordance with the children’s collaboration and participation, and the presentation of the next trial in both tasks took place only after the children’s response. For each trial in each task, stimuli were played back to the children no more than twice: in the absence of a response in the NWR task, no repetition was recorded, while in the NWD task, the trial was skipped and marked with a wrong answer.5 Each testing session was audio recorded at 44.1 kHz, wav-16-bit mono by means of an AKG Perception 120 microphone placed at a distance of about 40 cm from the participant and connected to an Edirol UA-101 soundcard. Verbal feedback and reinforcement were used while administering both tasks to stimulate the children’s cooperation. When a child showed decreased concentration or lack of collaboration, we attempted to complete the block. A total of 22 children were provisionally retained in the sample described in Section 2.1 if they completed the NWR task, and later removed from the NWD dataset only, following the procedure described in Section 2.8 and better detailed in Section 3.2.

2.6. Transcription Criteria for the Productions from the NWR Task

The productions from the NWR task were first segmented and orthographically transcribed by the first author. The resulting .wav files, together with the orthographic lists of nonwords as produced by each child, were force-aligned using the tools available via The BAS WebServices (Kisler et al., 2017) to produce PRAAT’s TextGrid files which were further processed by means of custom-made PRAAT scripts in order to retrieve information on the target nonwords. At this stage, transcription guidelines were prepared after a training phase consisting of collective transcription sessions among the four authors (see transcription guidelines provided in Appendix A for more details).

Subsequently, three authors (V.G., M.P., C.Z.) further revised the TextGrids by: (a) manually correcting phone and word boundaries across the tiers, where needed, and (b) correcting the phonetic transcriptions by means of auditory feedback via headphones and visual inspection of waveform and sonogram. All productions were phonetically annotated using the International Phonetic Alphabet (IPA), and we agreed to adopt a perceptual criterion, meaning that only perceptually salient phones were to be segmented and transcribed; in contrast, perceptual alterations that could be explained by contextual coarticulatory effects were not transcribed.

Once all the transcriptions were completed and all cases of ambiguity or doubt solved, the TextGrids were (a) checked via scripting for potential typing issues (e.g., inconsistent labels, blank spaces, etc.) and (b) converted to .csv files using a custom-made PRAAT script. The transcriptions were imported via .csv files into PHON (Hedlund & Rose, 2023), where multiple repetitions, previously tagged as to be excluded, received the flag ‘exclude’ for the specific record. Similarly, after identifying all the allophones of /r/ through specific PHON queries, these were arbitrarily marked in PHON as ‘distorted’ by means of the corresponding diacritic. In some of PHON’s analyses (e.g., PCC and variants), it is possible to consider the ‘distorted’ phones as correct when analyzing children’s productions if a correspondence between the IPA Target (the expected production) and the IPA Actual (the child’s production) is established. Additional data integrity checks were carried out and, finally, potential syllabification issues were manually reviewed and fixed.

In order to account for potential transcribers’ bias in the transcriptions (Shriberg & Lof, 1991), 20% of the children (i.e., 21 out of 104) were randomly selected so as to include, where possible, at least 3 or 4 children per age group according to a 6-month age grouping criterion (i.e., 3;0–3;5, 3;6–3;11, 4;0–4;5, 4;6–4;11, 5;0–5;5, and >5;6). Their linguistic productions were transcribed by the second author (G.L.) using PHON’s ‘Blind transcription mode’ and following the same transcription guidelines. Inter-transcriber agreement for each child’s production was obtained using PHON’s dedicated ‘Inter-transcriber reliability’ function. For this computation, only consonants were considered, while diacritics were discarded; potential allophones, such as different realizations of /r/, were considered at this stage as alternative instances of /r/. The agreement between transcribers reached an average of 84.0% for the checked transcriptions (n = 21; min = 69.8%; max = 92.2%; sd = 6.81; see Table A4 in Appendix B for more detailed information about age group-related inter-transcriber reliability).
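As an illustration of what a percentage of agreement expresses, the sketch below computes simple point-by-point agreement over consonants. This is an assumption made for illustration only: it is not PHON's ‘Inter-transcriber reliability’ algorithm, and it presupposes that the two transcriptions are already aligned position by position, with diacritics removed and /r/ allophones collapsed.

```python
def percent_agreement(first, second):
    """Percentage of aligned consonant positions on which two transcribers agree."""
    if len(first) != len(second):
        raise ValueError("transcriptions must be aligned position by position")
    if not first:
        raise ValueError("no consonants to compare")
    matches = sum(a == b for a, b in zip(first, second))
    return 100 * matches / len(first)
```

For instance, two transcriptions of four consonants that differ only in the voicing of one stop would agree on three of four positions, i.e., 75%.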

2.7. Analysis of the Productions from the NWR Task

After removing the pre-test items, the analysis of the children’s NWRs was carried out in PHON using its query and analysis composer system.

Each child’s production was analyzed with specific search queries in terms of:

Percent Consonants Correct (PCC), Percent Consonants Correct without epenthesis (PCC_NoEpen), and number of deleted, substituted, and epenthesized phones; allophones, which had been marked as ‘distorted’ using PHON’s dedicated diacritic, were considered correct in the query setup window, while all other diacritics were ignored.
Percent word match, i.e., the percentage of nonwords produced that match the adult target; vowel quality, as well as all the diacritics, was not considered.
Phone similarity, which “measures how similar two phones or strings of phones are within a target-actual aligned pair based on the number of descriptive phonological matchings divided by the maximal number of potential matches” (i.e., phone similarity (%) = nr. of matched features ÷ max[nr. of target features, nr. of actual features] × 100).6 This analysis allowed us to derive a featural categorization of all the consonants’ substitutions in terms of changes in manner, place, and voicing (along with possible combinations of these three terms), as well as correct productions without epentheses (i.e., PCC_NoEpen)7 and deletions. For this analysis, substitutions are computed for each child with reference to the total amount of IPA targets, which corresponds to 113 for a complete NWR test session. Henceforth, we will use the term ‘featural analysis’ to account for this type of substitution categorization.
Repetition accuracy, in order to verify the children’s ability to correctly repeat the target consonants proposed with the nonwords. The repetition accuracy for each IPA target (consonant) was computed separately for each participant, consonant, and word position (word-initial or word-medial). Specifically, for each participant and position, the repetition accuracy for each consonant was computed as the number of times that the consonant was correctly produced divided by the number of times the consonant was proposed in the NWR task, multiplied by 100, in order to account for unequal numbers of consonants in each position, thus mitigating potential biases due to unbalanced distributions. Finally, group-level means and standard errors were derived from these participant-level percentages to reflect individual variability and differences.
Phonological processes, in order to account for the children’s behavior in modifying either (1) the phonological system or (2) the phonotactic structure (Ingram, 1976; Grunwell, 1987; Zanobini et al., 2012; Bernthal et al., 2017). Phonological processes affecting the phonological system include devoicing, voicing, fricative stopping, fricativization, lateralization, delateralization, liquid gliding, affrication, deaffrication, backing, and fronting. Phonological processes affecting the phonotactic structure include vowel epenthesis, vowel deletion, consonant epenthesis (including consonant lengthening, which was also searched separately), and consonant deletion (including degemination, which was also searched separately). These phonological processes were retrieved for each child through PHON’s ‘analysis composer’ function by means of specific queries and with reference to (a) the consonant’s position (word-initial, word-medial, and word-final) and (b) the number of targets that could potentially be affected by a certain process in the given position (a condition that makes the denominator change for each type of process; see Table A5 in Appendix B for further details).
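Two of the measures listed above, phone similarity and repetition accuracy, reduce to simple ratios and can be sketched as follows. The analyses in the study were run through PHON queries; the functions below are only an illustration, their names and input layout are hypothetical, and the feature labels in the usage note are illustrative rather than PHON's actual feature set.

```python
from collections import defaultdict


def phone_similarity(target_features, actual_features):
    """Matched features divided by the larger feature count, as a percentage."""
    target, actual = set(target_features), set(actual_features)
    matched = len(target & actual)
    return 100 * matched / max(len(target), len(actual))


def repetition_accuracy(records):
    """Percent correct per (participant, consonant, position).

    records is an iterable of (participant, consonant, position, is_correct).
    """
    proposed = defaultdict(int)
    correct = defaultdict(int)
    for participant, consonant, position, is_correct in records:
        key = (participant, consonant, position)
        proposed[key] += 1               # times the consonant was proposed
        correct[key] += int(is_correct)  # times it was correctly produced
    return {key: 100 * correct[key] / proposed[key] for key in proposed}
```

With the illustrative feature sets {stop, alveolar, voiceless} for a target and {stop, alveolar, voiced} for the child's production, two of three features match, so the similarity is about 66.7%. Group-level means and standard errors would then be derived from the participant-level percentages returned by the second function.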

2.8. Analysis of the Same–Different Responses from the NWD Task

After removing the pre-test trials, the children’s responses to the NWD task were analyzed in terms of discrimination accuracy, i.e., by computing a proportion of correct discrimination score for each testing block of the task (including both target and control/distractor test pairs).

Furthermore, the responses were analyzed according to signal detection theory in order to assess the children’s perceptual sensitivity by taking into account the perceivers’ response bias (Macmillan & Creelman, 2005). Additional scores were computed for each child’s testing block, including the following sensitivity measures:

A-prime (or A′)—a non-parametric index of discrimination performance (similar to d-prime), which allows a correction for response bias. It measures the sensitivity of a participant to correctly discriminate the test items in a task.
d-prime (or d′)—a measure that provides an assessment of a participant’s performance while accounting for the listener’s response bias (i.e., how much the subject is responding based on the differences in the test items versus using a response strategy that does not mirror the test items’ differences, e.g., a participant who may prefer ‘same’ responses to ‘different’ responses, or vice versa).

Before being able to compute d-prime and A-prime scores, each child’s response was first labeled and coded in terms of hit, miss, false alarm, and correct rejection, and then summarized within each participant’s testing block in terms of total number of hits, misses, false alarms, and correct rejections by using two distinct R functions (Tichko, 2021):

Hit, if a different target test pair was correctly identified by the child as being different.
Miss, if a different target test pair was identified by the child as being same.
False alarm, if a same distractor test pair was identified by the child as being different.
Correct rejection, if a same distractor test pair was identified by the child as being same.
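The four response categories listed above, together with the proportion-correct score, can be sketched as follows. The original analysis used R functions (Tichko, 2021); the function names, labels, and toy trials here are assumptions for illustration only.

```python
from collections import Counter


def classify(pair_type, response):
    """Label one trial; pair_type and response are 'same' or 'different'."""
    if pair_type == "different":  # target test pair
        return "hit" if response == "different" else "miss"
    # 'same' distractor test pair
    return "false_alarm" if response == "different" else "correct_rejection"


def summarize_block(trials):
    """Count outcomes in one testing block and compute proportion correct."""
    counts = Counter(classify(pair, resp) for pair, resp in trials)
    n_correct = counts["hit"] + counts["correct_rejection"]
    return counts, n_correct / len(trials)


# Toy block (made up): (pair type, child's response)
toy_block = [
    ("different", "different"),
    ("different", "same"),
    ("same", "same"),
    ("same", "different"),
    ("different", "different"),
]
block_counts, block_accuracy = summarize_block(toy_block)
```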

Based on the above procedure, the hit rate (i.e., the number of hits divided by the number of target test pairs) and the false alarm rate (i.e., the number of false alarms divided by the number of distractor test pairs) were computed. Then, sensitivity scores, i.e., d-prime and A-prime, were computed using the dprime() function from the R psycho package (v. 0.6.1; Makowski, 2018). The highest d-prime value for each testing block is 2.98 and corresponds to the ceiling performance (i.e., the child responded correctly to all of the items in the testing block).⁸ Similarly, A-prime ranges from 0 to 1, where 1 represents perfect discrimination (i.e., the participant perfectly discriminates the test items) and 0.5 or lower represents chance or a lack of sensitivity to a contrast (Grier, 1971).
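As a rough illustration of the two sensitivity indices, the sketch below computes d-prime as z(hit rate) minus z(false alarm rate) and A-prime with the commonly cited Grier (1971) formula, extended to cover hit rates below the false alarm rate. The log-linear correction (adding 0.5 to each count and 1 to each total) is an assumption made here to keep z-scores finite at ceiling; it is not claimed to reproduce the exact behavior of the R function used in the study, and the block-specific ceiling of 2.98 is not reproduced because it depends on the number of trials per block.

```python
from statistics import NormalDist


def rates(hits, misses, false_alarms, correct_rejections):
    """Raw hit rate and false alarm rate."""
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, fa_rate


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(H) - z(F), with a log-linear correction (assumption)."""
    z = NormalDist().inv_cdf
    h = (hits + 0.5) / (hits + misses + 1)
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(h) - z(f)


def a_prime(hit_rate, fa_rate):
    """Non-parametric A': 1 = perfect discrimination, 0.5 = chance."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case: mirror image of the formula above.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


# Toy block (made up): 8 hits, 2 misses, 2 false alarms, 8 correct rejections.
toy_h, toy_f = rates(8, 2, 2, 8)
toy_d = d_prime(8, 2, 2, 8)
toy_a = a_prime(toy_h, toy_f)
```

A child who answers 'different' on every trial has a hit rate and a false alarm rate of 1, so A-prime falls to 0.5 even though half of the responses count as correct; this is the kind of response bias that raw accuracy cannot reveal.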

After accounting for potentially unreliable children as indexed by the A-prime score for each child’s testing block, the remaining scores of each child were averaged across blocks. Furthermore, discrimination accuracy was investigated in terms of phonological opposition with reference to the contrasting consonants in each nonword pair (i.e., manner of articulation, place of articulation, voicing, length, and combined manner and place of articulation) and in terms of the number of distinctive acoustic features that differentiate the contrasting consonants in each nonword pair (from 1 to a maximum of 4).

2.9. Statistical Analyses

Data exploration, additional computations, and all the statistical analyses were carried out in RStudio (v. 2024.04; Posit Team, 2024) with R (v. R 4.4.0; R Core Team, 2024). All plots were produced with the ggplot2 package (Wickham, 2016).

Correlations were computed using the function tab_corr() from the package sjPlot (Lüdecke, 2024) and visually inspected by means of either the packages corrplot (Wei & Simko, 2021) or PerformanceAnalytics (Peterson & Carl, 2020). Assumptions for correlations were preliminarily verified as appropriate by means of Shapiro–Wilk tests, which, in all cases, suggested the use of Spearman’s method. Effect sizes for correlations were labeled according to Funder and Ozer (2019) as implemented in the report package (Makowski et al., 2023) used to extract statistical analyses. Significance level was set at α = 0.05, and multiple correlations were controlled for using Benjamini and Hochberg (1995) False Discovery Rate correction.
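For readers unfamiliar with the False Discovery Rate correction mentioned above, the following is a minimal sketch of the Benjamini and Hochberg (1995) step-up adjustment of a set of p-values. The function name and the example p-values are illustrative assumptions; the study itself carried out the correction in R.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values from four hypothetical correlations.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```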

Simple regression analyses were computed using the lm() function from the R stats package, and model assumptions were visually checked using the check_model() function from the package performance (Lüdecke et al., 2021).

Generalized linear mixed-effects models (i.e., mixed-effects logistic regression) were fitted using the glmer() function available in the package lme4 (Bates et al., 2015). Comparison of models against the ‘null’ model was achieved with the anova() function from the package stats available in R, while the goodness of fit of each model was assessed by means of Akaike’s Information Criterion (AIC) in order to find the best-performing model with the lowest prediction error (i.e., the lowest AIC index). p-values for fixed effects were obtained through the Anova() function from the package car (Fox & Weisberg, 2019), and model diagnostics were achieved by visual inspection using the functions available in the package DHARMa (Hartig, 2022).
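The AIC-based comparison can be made concrete with a small sketch. AIC is defined as 2k − 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; the candidate names, log-likelihoods, and parameter counts below are made up for illustration and do not come from the fitted models.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(candidates):
    """candidates: dict mapping model name -> (log_likelihood, n_params)."""
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical values: a null model and a model adding one fixed effect.
toy_models = {
    "null": (-520.0, 3),
    "age": (-510.0, 4),
}
best_name, aic_scores = best_by_aic(toy_models)
```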

The results from the fitted models and all the statistical analyses were extracted and reported in this paper with the support of the package report (Makowski et al., 2023) and the function tab_model() of the package sjPlot (Lüdecke, 2024), while graphical model predictions were plotted with the package ggeffects (Lüdecke, 2018).

3. Results

We first present the results for the NWR and the NWD tasks separately, and then we compare the performances of the children on the two tasks.

3.1. NWR Results

In this section, results are provided for 104 children. One child completed only the first two blocks, producing 43 nonwords (i.e., 89 IPA target consonants), while all other children completed all three blocks, producing all of the 55 nonwords for a total of 113 IPA target consonants per child (with the exception of 1 child producing 53 nonwords and 6 children producing 54 nonwords; that is, 109 and 111 IPA target consonants, respectively).

Overall, 5700 nonwords and a total of 11,712 IPA target consonants were transcribed.
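These totals follow directly from the per-child counts given above, as the short check below shows. The only derived figure is the number of children who produced all 55 nonwords (104 − 1 − 1 − 6 = 96); the variable names are illustrative.

```python
# (number of children, nonwords per child, IPA target consonants per child)
groups = [
    (1, 43, 89),     # child who completed only the first two blocks
    (1, 53, 109),    # child producing 53 nonwords
    (6, 54, 111),    # children producing 54 nonwords
    (96, 55, 113),   # remaining children producing all 55 nonwords
]

n_children = sum(n for n, _, _ in groups)
total_nonwords = sum(n * words for n, words, _ in groups)
total_consonants = sum(n * cons for n, _, cons in groups)
```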

3.1.1. Repetition Accuracy

Figure 1 and Figure 2 provide the percentage of correctly produced IPA targets within each IPA target and position combination for each child, pooled by age group and the position in which these consonants were proposed (in word-initial and word-medial position, respectively). It is worth noting here that for the targets tested in word-initial position (see Figure 1), independently of age, those with the lowest overall mean accuracy in production are all voiced stop consonants (i.e., /g/ 52.98%, /d/ 63.31%, /b/ 77.96%).

For word-medial position (see Figure 2), we can see that, overall, independently of age, children are less accurate in the production of the following IPA targets: /ʣ/ with 26.44% accuracy; /ʎ/ with 27.88%; /ʦ/ with 54.33%; /ʤ/ with 59.62%; and /z/ with 69.40%.

The number of matching nonwords (i.e., children’s productions matching the target items), Percent Consonants Correct (PCC), and Percent Consonants Correct without epenthesis (PCC_NoEpen) increases with children’s age (Figure 3). Conversely, the number of substitutions decreases. Additionally, consonant deletions (found in about 41% of the children) and consonant epentheses (found in about 93% of the children) are present in all age groups.

The number of matching nonwords spans from 29.09% in the 3;0–3;5 and 3;6–3;11 age groups to 96.36% in the >5;6 age group (overall mean = 61.83%, sd = 15.62%, Q1 = 50.91%, Q3 = 74.55%). Similarly, the Percent Consonants Correct (PCC) score spans from 53.45% and 52.07% in the 3;0–3;5 and 3;6–3;11 age groups, respectively, to 98.23% in the >5;6 age group (overall mean = 78.33%, sd = 10.06%, Q1 = 72.33%, Q3 = 86.13%; for more details see Table A6 in Appendix B). As expected, percent matching nonwords and PCC are highly correlated (rho = 0.98, S = 2952.96, p < 0.001).

The overall PCC improvement is also confirmed by a simple regression analysis, by which a linear model (estimated using OLS) was fitted to predict PCC with age (formula: pcc ~ age). As expected, the model explains a statistically significant and substantial proportion of variance (R² = 0.34, F(1, 102) = 52.34, p < 0.001, adj. R² = 0.33). The model’s intercept, corresponding to age in months = 0, is 46.76 (95% CI [37.95, 55.56], t(102) = 10.54, p < 0.001). Within the model, the effect of age is statistically significant and positive (β = 0.60, 95% CI [0.43, 0.76], t(102) = 7.23, p < 0.001; Std. β = 0.58, 95% CI [0.42, 0.74]).
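To give a sense of scale, the reported coefficients can be plugged into the regression equation, PCC = 46.76 + 0.60 × age in months. The sketch below uses the rounded values from the text, so its output is only approximate; the function name is an illustrative assumption.

```python
INTERCEPT = 46.76        # predicted PCC at age 0 months (extrapolated)
SLOPE_PER_MONTH = 0.60   # change in PCC per additional month of age


def predicted_pcc(age_months):
    """Point prediction from the simple linear model pcc ~ age."""
    return INTERCEPT + SLOPE_PER_MONTH * age_months


pcc_at_3_years = predicted_pcc(36)   # about 68.4
pcc_at_6_years = predicted_pcc(72)   # about 90.0
```

Under these rounded coefficients, one additional year of age corresponds to a gain of roughly 7 PCC points. The intercept lies far outside the observed age range and should not be interpreted substantively.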

3.1.2. Featural Analysis

Figure 4 reports the results of the featural analysis of the children’s substitutions (recall that in this analysis, the count of the substitutions is computed for each child with reference to the total amount of IPA targets). In general, a high variability characterizes the sample, particularly in voicing substitutions in word-initial position. Other substitutions are especially found in the youngest age group (3;0–3;5) for place and place-manner in word-initial position, while place, place-manner, and manner substitutions are present with decreasing magnitudes in all age groups for word-medial position. Irrespective of position, the substitutions analyzed in terms of features showed that all but voicing negatively and significantly correlate with age (i.e., the number of substitutions decreases as age increases) as indexed by a series of Spearman’s correlations after Benjamini–Hochberg’s correction (see Table A7 in Appendix B).

3.1.3. Phonological Processes Affecting the Phonological System

Figure 5 and Figure 6 present data for phonological processes. It can be noted that some processes are present in all the children (with a varying degree of incidence as a function of age), while others are less frequent or quasi-absent in many children. However, the reader has to bear in mind that all the processes shown here are biased by the criteria that led to the construction of the nonwords: for example, the absence of deaffrication processes in word-initial position is due to the absence of IPA targets that can be potentially affected in the nonwords (see Table A5 in Appendix B for further details).

As expected, the children in the youngest age group (3;0–3;5) are those showing more phonological processes both in word-initial and word-medial position. Going into more detail, Figure 5 shows that voicing substitutions are mainly due to devoicing phenomena (as also shown in Figure 4), especially in word-initial position (word-initial: mean = 32.27%, min = 0%, max = 100%, sd = 25.30%; word-medial: mean = 19.56%, min = 0%, max = 87.50%, sd = 18.97%), with a very limited number of cases attributed to voicing phenomena (word-initial: mean = 1.54%, min = 0%, max = 20%, sd = 3.17%; word-medial: mean = 1.75%, min = 0%, max = 17.65%, sd = 3.92%). This difficulty persists even in older children, both in word-initial position, where a negative and significant correlation with age is found (rho = −0.29, p = 0.029; see Table A8 in Appendix B for multiple corrected correlations of word-initial processes with age), and, to a lesser extent, in word-medial position, where the correlation is not significant (rho = −0.14, p = n.s.; see Table A9 in Appendix B for multiple corrected correlations of word-medial processes with age). Likewise, other processes affecting the word-initial position were found to have a negative and significant correlation with age, e.g., backing (rho = −0.40, p = 0.001) and initial vowel epenthesis (rho = −0.30, p = 0.027; see Table A8 in Appendix B for further information).

Deaffrication processes, which are only in word-medial position in our dataset (see Figure 6), have the highest incidence in the 3;0–3;5 and 3;6–3;11 age groups, with an overall mean of 30.68% (min = 0%, max = 100%, sd = 28.15%) and 20.37% (min = 0%, max = 75%, sd = 20.55%), respectively. Overall, they were found to negatively and significantly correlate with age (rho = −0.42, p < 0.001).

For other processes in word-medial position, where their presence is predominant with respect to word-initial position, in Figure 6, we highlight:

Backing processes, which show great variability across age groups (overall mean = 7.55%, min = 0%, max = 42.11%, sd = 8.95%) and are negatively but not significantly correlated with age (rho = −0.27, p = n.s.).
Fronting processes, which show a clear decrease as age increases: from a mean of 18.18% in the 3;0–3;5 age group (min = 0%, max = 50.00%, sd = 19.66%) to a mean of 1.39% in the >5;6 age group (min = 0%, max = 16.67%, sd = 4.81%). This trend is confirmed by a negative and significant association with age (rho = −0.37, p < 0.003).

3.1.4. Phonological Processes Affecting the Phonotactic Structure

Processes affecting the phonotactic structure are presented in Figure 7.

As for degemination processes, few instances were found: only 12 children were not able to correctly repeat one of the three nonwords containing a geminate consonant (i.e., 1 child in the 3;0–3;5 age group, 1 in 3;6–3;11, 5 in 4;0–4;5, and 2 in 5;0–5;5). Notably, however, a fair number of children (n = 81) showed a tendency to geminate word-medial singleton consonants: specifically, analyzing only those consonants in word-medial position that can be geminated, we find an overall percentage of substitutions of 4.44% (min = 0%, max = 17.31%, sd = 4.28%). Consonant epenthesis is attested in all positions (for 94 children in word-medial position: mean 4.65%, min = 0.88%, max = 11.72%, sd = 2.48%; for 38 children in word-initial position: mean = 5.93%, min = 1.74%, max = 11.72%, sd = 2.64%) and for a few children (n = 19) also in word-final position (mean = 4.95%, min = 0.88%, max = 8.33%, sd = 2.04%). Lastly, vowel epenthesis is even more marginal, with only 35 children for word-medial position (mean = 1.48%, min = 0.9%, max = 3.51%, sd = 0.73%) and 11 children for word-initial position (mean = 1.54%, min = 0.89%, max = 3.51%, sd = 0.79%). For correlations of these processes with age, see Table A8 and Table A9 in Appendix B for word-initial and word-medial position, respectively.

3.2. NWD Results

The analysis according to signal detection theory confirmed that some children were unreliable in the NWD task (as also observed behaviorally during testing, as mentioned in Section 2.5). Figure 8 reports A-prime scores for each child, pooled for each testing block, and shows that a fair number of children scored below chance level (A-prime = 0.50). A few extreme examples are highlighted in Figure 8 for testing block 1 and testing block 2 (e.g., children identified by it010m_54, it016m_62 and it049m_42); despite the high discrimination accuracy achieved by these children, the very low A-prime score signals that their responses are in some way biased by the way they responded to each trial (considering both test pairs and control pairs).

All those children with A-prime below chance level in block 1 were removed from the NWD dataset since we assumed they had not understood the task. Similarly, for those children with an A-prime below chance level in block 2, both the problematic block and the subsequent one (i.e., block 3) were discarded from the dataset since we assumed a decrease in attention or collaboration; in case of an A-prime below chance level in block 3, only block 3 was discarded from the dataset.
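The exclusion rule amounts to keeping, for each child, only the blocks that precede the first block with a below-chance A-prime. A minimal sketch follows, assuming a hypothetical mapping from block number to A-prime score and a strict threshold of 0.5.

```python
CHANCE = 0.5  # assumed threshold: A-prime below this value counts as unreliable


def retained_blocks(a_primes):
    """Return the block numbers kept for one child.

    `a_primes` maps block number (1, 2, 3) to that block's A-prime score.
    A failure in block 1 removes the child entirely (empty list); a failure
    in block 2 removes blocks 2 and 3; a failure in block 3 removes block 3.
    """
    kept = []
    for block in sorted(a_primes):
        if a_primes[block] < CHANCE:
            break  # drop this block and every later one
        kept.append(block)
    return kept


# Made-up scores for four hypothetical children.
child_ok = retained_blocks({1: 0.85, 2: 0.78, 3: 0.90})
child_block1_fail = retained_blocks({1: 0.30, 2: 0.80, 3: 0.80})
child_block2_fail = retained_blocks({1: 0.80, 2: 0.40, 3: 0.90})
child_block3_fail = retained_blocks({1: 0.80, 2: 0.70, 3: 0.45})
```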

Table A10 in Appendix B provides a list of all the children with testing blocks with an A-prime score ≤ 0.49 that were removed from the rest of the analyses. Thirteen children were removed from the entire dataset; for seven children, only block 2 was removed; and for two children, only block 3 was removed. This operation left a total of 91 children for block 1, 67 for block 2, and 47 for block 3.

The analysis was then re-run, and this time, d-primes and discrimination accuracy proportions were calculated for each child’s NWD block and then averaged, in order to obtain one d-prime score and one accuracy score for each child. Lower discrimination accuracy and lower d-prime scores were found for younger children in the first three age groups (3;0–3;5, 3;6–3;11, and 4;0–4;5), suggesting that they are less sensitive to the contrasting pairs presented in the NWD task: d-prime scores ranged across all age groups from 0.15 to 2.70 (mean = 1.56; sd = 0.66), while accuracy scores ranged from 0.29 to 0.96 (mean = 0.74; sd = 0.18).

Given the correlation between d-prime scores and accuracy scores (rho = 0.94, p < 0.001) and since data exploration showed a positive linear relation between d-prime scores and age, we ran a multiple linear regression analysis by fitting a linear model to predict d-prime from age and school (formula: d-prime ~ age + school). The variable school was added as a factor in order to check if background noise or the slight reverberation of the environments where the testing sessions were conducted also played a role. The overall model was found to explain a statistically significant and weak proportion of variance (R² = 0.12, F(2, 88) = 5.75, p = 0.004, adj. R² = 0.10). The effect of age was found to be statistically significant and positive (β = 0.02, t(88) = 3.37, p = 0.001), while school did not reach statistical significance (β = 0.02, t(88) = 0.13, p = 0.894).

We then analyzed the children’s responses in terms of feature change in the contrasting consonants in each pair of nonwords (i.e., length, manner, place, place and manner, and voicing) and according to the contrasting consonant’s position (i.e., word-initial and word-medial). As can be seen in Figure 9, the most challenging contrasts are those involving changes in manner and voicing in word-initial position and, to a lesser extent, in word-medial position, as well as those involving a length contrast in word-medial position. Note, however, that voicing and manner were tested in word-initial position with only one and two stimulus pairs, respectively; length with three pairs; and voicing in word-medial position with two pairs. This should also clarify why the responses are less distributed when compared to the other features on the same graph in Figure 9 (see Table A3 in Appendix B for additional details).

To further test for the effect of age (expressed in months), contrast type in terms of features (four levels: voicing, manner, place, and place–manner), and position of the contrasting consonant in the nonword (two levels: initial and medial), we analyzed the children’s raw responses to each pair in the NWD task by means of a mixed-effects binomial logistic regression using the glmer() function of the lme4 package (Bates et al., 2015).⁹ Model fitting (estimated using ML and the Nelder–Mead optimizer) followed a manual stepwise procedure by first including the random effects and then adding the predictors (i.e., fixed effects) one by one. Each resulting model was compared against the baseline model, including only the intercept as a predictor (null model formula = correct ~ 1). Interactions did not improve model fitting and were therefore discarded. The final model (i.e., Model 1; formula: correct ~ age + features + position), with subject id and stimulus pair entered as random effects, reached the lowest AIC index, yielding significant effects for all the model terms included (cf. the Anova() output in Table 2).

The statistical output, provided in Appendix B in Table A11, shows that Model 1’s total explanatory power is substantial (conditional R² = 0.58), and the part related to the fixed effects alone (marginal R²) is 0.14. Model 1’s intercept, corresponding to age in months = 0, features = manner, and position = initial, is −3.68 log-odds (95% CI [−5.84, −1.51], p < 0.001). The model shows that the effect of age is statistically significant and positive (β = 0.06, p < 0.001). The effect of features [place] is statistically significant and positive (β = 1.78, p < 0.001). The effect of features [place_manner] is statistically significant and positive (β = 1.98, p < 0.001). The effect of features [voicing] is statistically non-significant and positive (β = 0.77, p = 0.267). The effect of position [medial] is statistically significant and positive (β = 1.05, p = 0.002). Figure 10 offers a visual representation of Model 1, where the effect of age is clearly visible: the model predicts that as the age of the children increases, the probability of correct discrimination for all types of contrasts significantly increases, with the exception of voicing, which, despite its positive effect, is not statistically significant. Moreover, lower probabilities of correct discrimination are predicted for contrasts differing in manner of articulation, particularly in word-initial position (as seen in Figure 9).
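The log-odds coefficients of Model 1 can be converted into predicted probabilities with the inverse logit function. The sketch below uses the rounded fixed-effect estimates reported above and sets the random effects for subject and stimulus pair to zero, so it describes a hypothetical 'average' child and stimulus pair rather than the exact curves in Figure 10; all names are illustrative.

```python
from math import exp

INTERCEPT = -3.68   # age = 0 months, features = manner, position = initial
AGE = 0.06          # change in log-odds per month of age
FEATURES = {"manner": 0.0, "place": 1.78, "place_manner": 1.98, "voicing": 0.77}
POSITION = {"initial": 0.0, "medial": 1.05}


def predicted_probability(age_months, feature, position):
    """Fixed-effects-only probability of a correct discrimination response."""
    log_odds = (
        INTERCEPT
        + AGE * age_months
        + FEATURES[feature]
        + POSITION[position]
    )
    return 1.0 / (1.0 + exp(-log_odds))  # inverse logit


p_manner_initial = predicted_probability(48, "manner", "initial")  # about 0.31
p_place_medial = predicted_probability(48, "place", "medial")      # about 0.88
```

For a 48-month-old, the fixed effects alone predict a probability of roughly 0.31 for a word-initial manner contrast and roughly 0.88 for a word-medial place contrast, which illustrates how much harder manner contrasts in word-initial position are under this model.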

Similarly, an additional model was run (formula: correct ~ age) with subject id and stimulus pair entered as random effects, in order to test for the effect of age specifically on the discrimination of length contrasts. As impressionistically determined by inspecting the results for the length contrasts in Figure 9 above, this model revealed that the effect of age is statistically non-significant and positive (β = 0.03, 95% CI [−0.03, 0.08], p = 0.328; Std. β = 0.25, 95% CI [−0.25, 0.74]). Moreover, the model has a substantial total explanatory power (conditional R² = 0.51) and a marginal R² of 0.009 related to the fixed effects alone.

Figure 11 shows a similar analysis to the one proposed in Figure 9 by taking into account the number of differences in terms of distinctive acoustic features (i.e., 1, 2, 3, or 4) that characterize the contrasting consonants in the nonword pairs. Note that voicing, manner, and length contrasts, i.e., contrasts showing the lowest discrimination accuracies in Figure 9, are grouped here under the 1 feature difference (see Table A3 in Appendix B for more details). Looking at the whole picture, we can state that the higher the number of differences, as indexed by the number of distinctive acoustic features, the higher the children’s discrimination accuracy.

To better account for our impressionistic analysis of Figure 11, another model, i.e., Model 2, was then fitted following the same procedure described for Model 1 but replacing the predictor features with the predictor number of features (four levels: 1, 2, 3, and 4). Model 2 (formula: correct ~ age + num_features + position), with subject id and stimulus pair entered as random effects, reached the lowest AIC index, yielding significant effects for all model terms included (cf. the Anova() output in Table 3).

The statistical output, provided in Appendix B in Table A12, shows that the total explanatory power of Model 2 is substantial (conditional R² = 0.58), and the part related to the fixed effects alone (marginal R²) is 0.11. The model’s intercept, corresponding to age in months = 0, num_features = 1, and position = initial, is −3.12 (95% CI [−5.28, −0.95], p = 0.005). Within this model, the effect of age is statistically significant and positive (β = 0.06, 95% CI [0.03, 0.10], p < 0.001); the effect of num_features [2] is statistically significant and positive (β = 1.09, 95% CI [0.14, 2.04], p = 0.024); the effect of num_features [3] is statistically significant and positive (β = 1.67, 95% CI [0.66, 2.68], p = 0.001); the effect of num_features [4] is statistically non-significant and positive (β = 1.12, 95% CI [−0.24, 2.48], p = 0.106); and the effect of position [medial] is statistically significant and positive (β = 0.81, 95% CI [0.07, 1.56], p = 0.033).

The predictions of Model 2 are presented in Figure 12, where the effect of age is visible and shows an improvement in the probability of correct discrimination for all numbers of features involved, with the exception of num_features [4], which is statistically non-significant. The figure also highlights that the contrasts differing in only one feature, which are the most difficult for children to discriminate correctly, especially in word-initial position, have the lowest probability of correct discrimination.

3.3. NWD and NWR Comparison

In this section, we compare the children’s NWD and NWR overall performances by means of correlation analyses. Only those children who completed both tasks were considered (i.e., 91 children).

Since the scores for NWD accuracy and d-prime are highly and significantly correlated (rho = 0.94, p < 0.001), as are the accuracy scores for NWR PCC, PCC_NoEpen, and matching nonwords (rho between 0.96 and 0.99; all three with p < 0.001), we only report the results of a correlation analysis with respect to NWR’s PCC score and NWD’s discrimination accuracy, which showed a positive, statistically significant, large correlation (rho = 0.38, S = 77,374.26, p < 0.001).

The second multiple correlation analysis focused on the children’s ‘errors’ according to the features involved in the NWR substitutions in production, on the one hand, and the features involved in the contrasting pairs of the NWD leading to discrimination failure in perception, on the other. To carry out this analysis, we filtered out all the feature combinations in the NWR task that did not match those in the NWD task (i.e., place–voicing, manner–voicing, and place–manner–voicing).¹⁰ While all the substitutions in NWR and the failures in NWD for the considered features generally decrease with age, interesting results emerge when perception and production are correlated. For instance, no significant correlation is found for the length feature (rho = −0.06, p = n.s.; see Table 4). Children are better at producing geminate consonants (resulting in very few degemination processes) than at discriminating the length contrast as implemented in a singleton–geminate opposition in a pair of nonwords (resulting in a relatively high number of discrimination errors). Moreover, a significant positive correlation between the two tasks was found only for the features manner (rho = 0.264, p = 0.039) and voicing (rho = 0.256, p = 0.046).

4. Discussion

The present work aimed to investigate speech production and perception abilities in Italian preschool children with a nonword repetition (NWR) task and a nonword discrimination (NWD) task.

From the administration of the NWR task, at the segmental level, it emerges that anterior plosives and nasals (such as /t, d, n/) are typically acquired earlier than anterior fricatives (such as /s, z, ʃ, ʒ/), liquids, trills, and affricates (such as /ʧ, ʤ/). This result is supported by previous findings in Italian (Zanobini et al., 2012; Tresoldi et al., 2018) and English (Crowe & McLeod, 2020). Moreover, in word-initial position, the least accurate target consonants are all voiced stop consonants, i.e., /g, d, b/, which mainly undergo devoicing processes. In word-medial position, the least accurate target consonants are those mastered late in development. This aligns with Tresoldi et al. (2018), who found that /ʣ/ and /ʎ/ are mastered after 7 years, /ʦ/ after 6;6 years, /ʤ/ after 5;6 years, and /z/ after 6;6 years. Children’s speech development improves as a function of age in terms of overall accuracy (cf. Figure 2).

Concerning PCC scores, we also find analogies with previous works on Italian (see Figure 5 in Zmarich et al., 2025) and other languages (refer to the review by McLeod & Crowe, 2018). In both cases, the general trend shows that children’s PCC steadily increases as age increases.

The presence of similar results between Italian and other languages seems to support the hypothesis of a universal trend in speech and sound development (Tresoldi et al., 2015), which does, however, coexist with a considerable individual variation in the acquisition of sounds, with phenomena of reversion and revision (see also Fox-Boyer et al., 2021). This hypothesis has its roots in the continuity hypothesis between babbling and first words (Oller et al., 1976; Locke, 1983; Vihman et al., 1985; Kent, 1992, among others), which was put forward as a reaction to the prevalence of the discontinuity hypothesis supported by generative phonologists based on Jakobson’s claims (Jakobson, 1968). Importantly, it must also be remembered that the development of consonant production ability is constantly shaped by changes in the anatomy and neurophysiology of the phono-articulatory system, which is the reason cross-linguistic similarities can be observed. A universally valid order of consonant acquisition was described through the identification of four hierarchical and implicational sets of consonants. Indeed, each set includes the precise physiological characteristics that are progressively needed as the ability of motor control is refined, thus allowing us to produce increasingly more complex sounds (Kent, 1992; see also Kent, 2024). The influence of the neuromotor control of physiological structures on speech development can be beautifully exemplified by studies on the acquisition of VOT and anticipatory coarticulation. A case study on voicing contrast acquisition in Italian (Zmarich et al., 2021) found that while, at 18 months of age, the child produced only voiceless plosives, at 30 months of age, she started to produce initial voiced plosives with negative VOT values appropriate for the voicing lead VOT category of Italian. The child was facing the “Aerodynamic Voicing Constraint” (Ohala, 2011, p. 64), which does not allow vocal folds to vibrate for a sufficient time if the mouth is closed. Moreover, the child also showed intrasyllabic anticipatory CV coarticulation for consonant place. This phenomenon explains differences among different articulatory places based on the strength of anatomo-physiological constraints, as predicted by Sussman et al. (1992): the bilabial occlusion, not being anatomically binding for the tongue, would allow the largest temporal overlap of the two gestures; as for dentals, the child must learn to differentiate almost independently and coordinate the tip and the back of the tongue; in the case of velars, the biomechanical constraints are the largest (the consonant and the vowel use the same articulator, the tongue dorsum). Anatomical binding of a similar sort was also invoked by Davis and MacNeilage (1995) to explain the C-V co-occurrences in speech development, and a richer developmental framework has been recently advanced by Namasivayam et al. (2020).

Our findings on PCC are even more important if coupled with featural analysis (see Section 3.1.2). In agreement with what McLeod and Crowe (2018) suggested, both place and manner of articulation play a role in the acquisition pattern of consonant phonemes, both individually and in interaction. Contrasts involving manner are mastered relatively late; therefore, it is more likely that manner has a stronger effect on children’s NWR performance with respect to place and voicing (Howell et al., 2024). Similarly, Orso et al. (2010) found that 18- to 27-month-old Italian children were more prone to make manner substitutions than place ones. This pattern reflects the predictions by Goldstein (2003) within the framework of Articulatory Phonology (see also Best et al., 2016): the experimental hypothesis of the theory states that, in children’s substitution errors, the phonetic categories related to the place of articulation are more stable than those related to the manner of articulation. This happens because children acquire early the ability to identify the articulatory organs underlying the consonant sounds they hear and then produce; namely, they are able to analyze the acoustic pattern in its articulatory components and reassemble those components in the correct articulatory sequences necessary for the production of the sounds (Studdert-Kennedy, 2000). In contrast, the acquisition of the manner of articulation takes more time, in that children have to learn through exposure to the linguistic input how to use the articulators in different ways for the production of different sounds.

Regarding our results on phonological processes, we analyzed those affecting both the phonological system and the phonological structure. Processes affecting the phonological system are more frequent in the productions of the youngest children, especially in the first age group (i.e., 3;0–3;5). This pattern, in line with the observations by Fox-Boyer et al. (2021), shows once again that, as age increases, the phonological system stabilizes. In contrast, processes affecting the phonotactic structure seem to have a more discontinuous developmental trajectory. The interpretation of findings concerning the phonological processes and their comparisons with different studies demands a certain amount of caution because the literature lacks common guidelines. For the present project, phonological processes were investigated using custom processes available in PHON (with specific modifications to the pre-defined parameters according to our needs), which compare the IPA Target and the IPA Actual tiers (see Section 2.7). The most salient result concerns those substitutions involving the [±voice] feature, as documented by the high number of devoicing phenomena, particularly in word-initial position (cf. Figure 5). This confirms and extends previous results by Zmarich et al. (2013), who attested this difficulty in a small group of 10 Italian children from 42 to 47 months of age, and by Zmarich et al. (2021), who found that voiced stops were more difficult to produce than voiceless stops in one Italian child studied longitudinally from 1;6 to 4;0 years of age (with voicing contrast finally achieved at age 2;6). Our results also confirm and extend what was found by Lucarini et al. (2022) in a recent study involving 13 Italian children aged 72 to 78 months, who were tested in production with a new Italian phonetic test (TFPI, Test Fonetico per la Prima Infanzia; Zmarich et al., 2012; see also Zmarich et al., 2025): accuracy was reduced for voiced segments, while voicing phenomena were almost absent.

Fox-Boyer et al. (2021) investigated phonological processes in depth in Italian-speaking children aged 3;0 to 4;11 years with a naming and articulation task (BVL_4–12, Marini et al., 2015). They adopted two different threshold criteria based on the frequency of occurrence of each phonological variation in order to define them as proper phonological processes. In contrast to our results, they attested a low frequency of occurrence of deaffrication, fricative stopping, and devoicing. The difference in the attested patterns indicates that our current knowledge about the phonological processes of Italian-speaking children is still too fragmentary, in terms of both methodology and developmental trajectories. Specifically, the age threshold beyond which the presence of a certain process should be considered atypical is not clear, with major implications for clinical practice. Considering also the great individual variability of children’s production, even among those within the same age range, more studies are needed that involve children up to the age of 7 in order to clarify at what age the acquisition of the adult phonological system can be defined as complete and stable. The present study constitutes a step forward in this direction.

Moving to phonological processes affecting the phonotactic structure, quite surprisingly, our data, even if collected in Veneto where the regional variety of Italian is more prone to degemination than elsewhere (see Telmon, 1993; Zamboni, 1988; Loporcaro, 2009), show a low percentage of degemination processes in almost all age groups, with the exception of the two oldest (i.e., 4;6–4;11 and >5;6). This only partially agrees with the results by Fox-Boyer et al. (2021), where the process was recorded exclusively in the second age range (i.e., 3;6–3;11). This pattern could originate from a general articulatory tendency toward strengthened consonants, like the geminates, in the acquisition of the first vocabulary (see Kunnari et al., 2001; Vihman & Majorano, 2017). If so, the rising percentages for degemination in the last two age groups of our sample could be interpreted as the effect of both the weakening of the general articulatory tendency and the fine-tuning of the children towards the Veneto variety of Italian. Conversely, as we will see in more detail below, children’s discrimination of the length contrast is often incorrect, demonstrating a phonological categorization toward degemination (see Gósy & Horváth, 2015 for a similar result on Hungarian children). Moreover, we found that consonant deletion in word-initial position occurred with a scarce frequency at all ages, whereas Fox-Boyer et al. (2021) found a low frequency of occurrence only in the second age range (i.e., 3;6–3;11). Conversely, our findings align with Tresoldi et al. (2015), where phonemes were rarely omitted and instead produced as distorted.

Finally, our results also highlight the presence of the vowel epenthesis process, both in word-initial and word-medial positions. As we saw in Figure 7, its frequency of occurrence in both word positions does not decrease with age. This process is mostly found in children’s productions as a strategy to simplify consonant clusters (for example, the word drago for ‘dragon’ may be pronounced as [dǝˈrago]), which are absent in our test items. Normally, epenthesis can be explained as cluster simplification, as a consequence of the ongoing process of motor control development toward more controllable CV syllables, or, as in the case of initial epenthesis, as a consequence of the oral airflow leakage, which is used to facilitate the start of voicing before a voiced consonant (Ohala, 2011).¹¹ In our linguistic data, there are several cases of airflow leakage, with those involving the epenthesis of a nasal consonant outnumbering those involving a vowel segment.

Concerning the NWD task, we found lower discrimination accuracy and lower d-prime scores in younger children falling in the first three age groups (i.e., 3;0–3;5, 3;6–3;11, and 4;0–4;5). We argue here that they might be less sensitive to the contrasting pairs presented in the task, but it might also be the case that they found the task more difficult and challenging with respect to older children. This last consideration is in line with Creel (2022), who suggested that children under 4 years of age might have more difficulty in understanding how to perform auditory “same/different” tasks when it comes to nonspeech stimuli, even if they succeed in the pre-test trials. Indeed, in our study, the results of a mixed-effect binomial logistic regression showed a significant effect of age when considering the features involved in the contrasting pairs, with an exception made for the [±voice] feature. The great variability shown by the model’s predictions (Figure 10) highlights how challenging the discrimination of voicing contrasts is in both word-initial and word-medial positions. This result is in agreement with the findings of Zmarich et al. (2019), where the [±voice] feature in ˈVCV nonwords showed significantly lower discrimination accuracy scores, and is also in line with some reviews on speech discrimination in children, where voicing distinction emerges particularly late in development (Flege & Eefting, 1986; Graham & House, 1971). However, an intrinsic limitation of the study might have played a crucial role here, at least for the contrasts in the word-initial position: in the NWD task, voicing in the word-initial position was tested only once with the pair /ˈtaka/ vs /ˈdaka/, yielding discrimination accuracy scores around or below chance level for younger children. 
Similarly, the discrimination of word-initial target consonants involving a change in manner of articulation (tested with the nonword pairs /ˈʣaki/ vs /ˈdaki/ and /ˈsiba/ vs /ˈʦiba/) yielded contradicting results. In the first case, the children’s discrimination accuracy is above chance in all age groups, with an overall accuracy of 77.6% across age groups, while in the second case, the discrimination accuracy is rather low in all age groups, with an overall accuracy of 7.5% across age groups. The very low discrimination accuracy recorded for the pair /ˈsiba/ vs /ˈʦiba/ is likely attributed to the choice of contrasting consonants. In the Italian variety spoken in Veneto, and more specifically in Padova, /ʦ/ in word-initial position is always uttered as a voiced affricate (Mioni, 1990, p. 198; Telmon, 1993, p. 110), while here, we presented it as voiceless: it might well be the case that the children—not used to /ʦ/ in the word-initial position—did not hear the difference in the proposed pair at all, thus responding predominantly “same”. In the NWD task, many children also showed low discrimination accuracy scores for length contrasts (i.e., singleton vs geminate consonants in word-medial position) with an overall accuracy of 30.7% (see also Figure 9), with no effect of age (i.e., the accuracy did not improve with the children’s age). This result confirms the finding by Zmarich et al. (2019), where the degemination feature in ˈVCV nonwords showed the lowest discrimination accuracy compared to the other processes affecting the word structure. This can, again, be explained by the origin of the tested children: given that the northern varieties of Italian are known to degeminate geminate consonants in production (Telmon, 1993, p. 107), it cannot be excluded that our children show less sensitivity to such contrasts as an effect of what they hear in the surrounding environment.
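
As a hedged illustration of the signal detection indices used for the NWD task, the following Python sketch computes d-prime and A-prime from one child's responses. It assumes that a “different” response to a contrasting pair counts as a hit and a “different” response to a control pair counts as a false alarm, and it applies a log-linear correction so that rates never reach exactly 0 or 1. The correction, the response coding, and all function names are illustrative assumptions, not the procedure reported in this study.

```python
from statistics import NormalDist


def rates(hits, misses, false_alarms, correct_rejections):
    """Hit and false-alarm rates with a log-linear correction (an assumption)."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return hit_rate, fa_rate


def d_prime(hit_rate, fa_rate):
    """Parametric sensitivity: z(hit rate) minus z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def a_prime(hit_rate, fa_rate):
    """Non-parametric sensitivity; 0.5 corresponds to chance, 1.0 to perfect."""
    if hit_rate == fa_rate:
        return 0.5
    if hit_rate > fa_rate:
        diff = hit_rate - fa_rate
        return 0.5 + (diff * (1 + diff)) / (4 * hit_rate * (1 - fa_rate))
    diff = fa_rate - hit_rate
    return 0.5 - (diff * (1 + diff)) / (4 * fa_rate * (1 - hit_rate))


# Hypothetical child: 30 of 42 contrasting pairs detected, 2 of 9 controls
# wrongly judged "different".
h, f = rates(hits=30, misses=12, false_alarms=2, correct_rejections=7)
print(round(d_prime(h, f), 2), round(a_prime(h, f), 2))
```

With 14 contrasting pairs and 3 control pairs per block (Table A3), a child who completes all three blocks contributes at most 42 contrasting trials and 9 control trials, so the choice of correction matters most for the false-alarm rate.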

We also found a statistically significant effect of age on the children’s accuracy in the NWD task when considering the number of distinctive acoustic features involved in each pair, as well as the position of the contrast. The contrasts differing by only one acoustic feature were the most difficult to discriminate, especially in word-initial position, thus yielding the lowest probability of correct discrimination, with a significant improvement with age. The probability of correct discrimination further increased with age as the number of features increased, with the only exception of those pairs differing by four features, where the improvement was not statistically significant. This last result might have, as in other cases, an intrinsic methodological origin that we acknowledge among the potential limitations of the current study. In the entire NWD task, we had a total of four pairs differing in four acoustic features, which, combined with the number of blocks completed by each child, might have left too few observations for statistical modeling. Taken as a whole, however, the outcomes of our NWD analysis are comparable to those from the study by Zmarich et al. (2019) on a group of Italian children of 4;0 to 6;0 years of age, who found the highest error rates in discrimination for sound pairs differing by only one feature, and to those from the study by Graham and House (1971, p. 563, Figure 1), who tested girls of 3;0 to 4;6 years of age on a set of English consonants using a “same/different” task. They likewise found the highest error rates in discrimination for sound pairs differing by only one feature. The discrimination improved with pairs differing by two features, while no significant improvement was found for pairs differing by more than two features.
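
The sparsity of four-feature pairs can be checked directly against Table A3. The short Python sketch below tallies the contrasting pairs by number of distinctive acoustic features for any set of completed blocks. The feature counts are copied from the table (control pairs omitted); the dictionary and function names are hypothetical and serve only this illustration.

```python
from collections import Counter

# Number of distinctive acoustic features per contrasting pair, in trial order,
# copied from the "Contrast Diff. (# Acoustic Features)" column of Table A3.
# Control pairs are omitted.
FEATURES_BY_BLOCK = {
    1: [1, 3, 2, 2, 4, 1, 1, 2, 2, 2, 1, 3, 1, 3],
    2: [3, 1, 2, 2, 3, 4, 3, 1, 2, 1, 3, 1, 2, 2],
    3: [3, 1, 2, 2, 2, 3, 4, 3, 2, 4, 1, 3, 1, 2],
}


def pairs_per_feature_count(completed_blocks):
    """Count contrasting pairs by number of features for the given blocks."""
    counts = Counter()
    for block in completed_blocks:
        counts.update(FEATURES_BY_BLOCK[block])
    return dict(sorted(counts.items()))


print(pairs_per_feature_count([1, 2, 3]))
print(pairs_per_feature_count([1]))
```

A child who completed only the first block would thus contribute a single observation for the four-feature condition, which is consistent with the numerosity issue acknowledged above.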

Lastly, we compared the results of the NWR and NWD tasks. A statistically significant positive correlation was found between the PCC score in the NWR task and discrimination accuracy in the NWD task. This suggests that children with stronger overall discrimination skills also performed better in nonword repetition. When examining specific features, voicing and manner errors in discrimination and repetition were significantly correlated. However, no such correlation was observed for place or length errors.

This relationship between speech perception and production abilities seems to align with findings in both typical and atypical populations (Edwards et al., 2002; Rvachew et al., 2004; Hearnshaw et al., 2018). For instance, Edwards et al. (2002) reported that children who correctly produced more sounds across more word positions also demonstrated better word discrimination scores. Hearnshaw et al. (2018) found a significant positive correlation between speech production and speech perception scores in terms of accuracy, although they did not find a significant link between the ability to accurately perceive and produce four specific phonemes in specific words. Similarly, Rvachew et al. (2004) found that children with the best phonemic perception skills also showed the strongest articulation performance. It is important to note, however, that we have not included a vocabulary evaluation, which, according to Hearnshaw et al. (2023), might be a better predictor of speech perception than speech production. Moreover, due to the nature of our tasks, we cannot determine whether one ability facilitates the development of the other. Nevertheless, these findings underscore the importance of comprehensive clinical assessments that address both speech perception and production, as recommended by Hearnshaw et al. (2019).

The current project presents some limitations. Even if in line with previous speech acquisition studies in Italian and other languages, our data are partial and should be interpreted with caution. This is due to the fact that the stimuli were created to study the Italian consonants’ production and discrimination abilities in children with a first language other than Italian, resulting in some consonants occurring in a certain position, but not in the other. Moreover, only disyllables were tested, and no consonant clusters were included.

Ongoing and future investigations and improvements will need to take into account the highlighted issues by adopting a more systematic approach to balancing and redefining the comparisons and contrasts among consonants in order to clarify what is going on in production and perception up to the age of 7. Particular attention should also be paid to the variation that the children’s origin induces in both their speech productions and speech discrimination abilities: this would avoid penalizing the children for particular phenomena that can be explained by the features of the regional variety of Italian spoken in the rearing environment.

In ongoing work, we are addressing other major limitations that, unfortunately, characterized the two tasks presented here. These concern the need to control for the phonotactic probability of the phones and syllables making up the nonwords; the need to have the children go through an auditory screening test before running any production or discrimination task; the need to explicitly assess the children’s vocabulary to confirm that language development falls within normal limits (one possibility would be to test lexical comprehension and production with the TFL by Vicari et al., 2007); and the need for other non-linguistic tasks, for example a digit-span task, to confirm the integrity of attentional mechanisms and working memory.

5. Conclusions

The current study provides new data on the phonological development of Italian preschool children by means of two novel tasks that rely on the use of nonwords, one for production (NWR) and one for discrimination (NWD). Furthermore, we documented how preschool children vary in their production and discrimination abilities as a function of age, showing that, despite some notable exceptions, as the children grow older, all the scores improve and the number of errors declines.

Finally, throughout this paper, we have highlighted the importance of precise transcription guidelines. We hope our methodological and procedural solutions can be of help to other researchers, especially those working with the Italian language. From a phonetic and phonological point of view, the acquisition of Italian requires more investigation and documentation, possibly along a shared transcription and annotation framework such as the one provided by PHON.

Author Contributions

Conceptualization, V.G., G.L., M.P. and C.Z.; methodology, V.G., G.L., M.P. and C.Z.; formal analysis, V.G.; investigation, V.G.; data curation, V.G. and M.P.; writing—original draft preparation, V.G., G.L., M.P. and C.Z.; writing—review and editing, V.G., G.L., M.P. and C.Z.; visualization, V.G.; supervision, V.G. and C.Z.; project administration, V.G.; funding acquisition, C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

Original data collection took place between 2010 and 2014 and was funded by the Italian National Research Council through the project “Migrazioni” (2009–2012).

Institutional Review Board Statement

The original study and the data collection were conducted between 2010 and 2014 by the authors V.G. and C.Z. and were not required, at that time, to be officially evaluated and approved by an ethics committee.

Informed Consent Statement

Written informed consent was obtained from the children’s parents, who agreed for their children to take part in the study by being tested and audio recorded.

Data Availability Statement

The data that support the findings of this study are not readily available because the data are part of an ongoing study.

Acknowledgments

We are very grateful to Yvan Rose and Greg Hedlund for their invaluable support with tailored solutions and fruitful suggestions to specific issues we encountered while preparing our data in PHON. Finally, we would like to thank the three anonymous reviewers for their careful reading and insightful comments, which helped us to improve our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Complying with Howard and Heselwood (2002, p. 372) who claimed: «An enormously important first step […] is to observe (with eyes as well as ears) the speech output and then to be able to make a valid and reliable record of it which will be transparent at a later date to the investigator and to other phonetically-trained professionals», we provide the guidelines adopted for the transcription of children’s productions in the NWR task with reference to the actual phones and actual IPA tiers. The following guidelines derive from the authors’ previous and similar transcription experiences and from collective transcription sessions:

Geminate (i.e., [+long]) consonants are transcribed by reduplicating the phone symbol (e.g., [ˈbitti]) instead of using the chrono symbol (e.g., [ˈbitːi]). This was needed in order to be able to retrieve the degemination processes when importing the transcribed data into PHON. Additionally, geminate consonants were kept in the same TextGrid’s interval without splitting the phones apart: thus, in the case of perceived degemination, only one symbol was transcribed on the actual phones (and IPA) tier. Conversely, since intrinsic geminates do not give rise to phonological contrasts between singleton and geminated consonants in word-medial position, they were transcribed as singletons followed by the chrono symbol. Ambiguous segments not perceived as geminated in word-medial position were transcribed as singleton consonants. During the analysis of the transcriptions in PHON, this strategy allowed us to keep degemination processes separate from ambiguous intrinsic geminates: as a consequence, the first ones were considered errors (thus, analyzed broadly as consonant deletion processes or, more specifically, degemination processes resulting from geminate to singleton substitutions) and the second ones as allophonic and accepted realizations of intrinsic geminates (the diacritic was discarded in the analyses).
Devoiced consonants and devoicing errors are marked using the homorganic voiceless symbol. In case devoicing was present due to the absence of vocal fold activity (e.g., whispered production), the corresponding diacritic (e.g., [ d̥ ]) was used instead; the same rule was established for whispered vowels (e.g., [ ḁ ]).
Slightly interdental consonants are transcribed with a diacritic for advanced position, like in [ s̟ ].
In general, diacritics for breathiness and creakiness are used when the phenomenon affects at least 50% of the segment’s duration. For word endings, if a phenomenon is not present in at least 50% of the last vowel, it is not even annotated in the voice quality tier (e.g., glottalized vowel at the very end, creakiness, etc.).
If the child produced a sound between two sounds (e.g., [ v ] vs [ β ]) and no agreement could be found among the transcribers, the sound belonging to the Italian phonological system was preferred (e.g., [ v ]). It was also agreed that the chosen symbol could be further marked and modified with the use of specific diacritics in order not to penalize the child for slightly deviant productions with reference to the target nonword; examples include the symbol for weak articulation (e.g., [ m͉ ]), lowered (e.g., [ d̞ ]) or retracted (e.g., [ d̠ ]) phones.
In case of multiple repetitions of the same item, only one was retained for the analyses, while the others were marked on the notes tier with the tag ‘exclude’.
All cases of ambiguity or doubt for a given segment were marked on the notes tier with the label ‘check’ and were later recovered and discussed by two of the authors (V.G. and M.P.) until agreement was reached.
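
The reduplication convention for geminates described in the first guideline can be illustrated with a small Python sketch. It is not the PHON query used in the study: the vowel inventory, the function names, and the string-based comparison are assumptions made only to show why writing [ˈbitti] rather than [ˈbitːi] makes degemination retrievable as a symbol-level difference between target and actual forms.

```python
# Assumed vowel inventory for this illustration only.
VOWELS = set("aeiouɛɔ")
STRESS = "ˈ"


def geminates(ipa):
    """Indices where a consonant symbol is immediately reduplicated."""
    return [
        i
        for i in range(len(ipa) - 1)
        if ipa[i] == ipa[i + 1] and ipa[i] not in VOWELS and ipa[i] != STRESS
    ]


def is_degemination(target, actual):
    """True if `actual` equals `target` with one reduplicated consonant reduced."""
    return any(target[:i] + target[i + 1:] == actual for i in geminates(target))


print(is_degemination("ˈbitti", "ˈbiti"))   # real geminate reduced to a singleton
print(is_degemination("ˈkaʦːa", "ˈkaʦa"))  # intrinsic geminate: not counted
```

Because intrinsic geminates carry the chrono symbol instead of a reduplicated phone, dropping that diacritic is never flagged by this check, mirroring the separation between degemination errors and accepted realizations of intrinsic geminates.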

Additional guidelines were further devised on how to code particular pieces of information on the additional tiers, notes, and voice quality, for which the data are out of the scope of the current paper and are not reported here due to space restrictions. We briefly mention that the notes tier was used to track generic issues related to noise overlapping the children’s productions, words to be excluded, words with stress change, etc., while the voice quality tier was used to obtain a redundant annotation and categorization of particular voicing issues already marked with the use of pertaining diacritics on the actual IPA tier (such as creakiness, breathiness, etc.), phenomena that were very frequent in the population addressed here.

Appendix B

Table A1. Stimuli used in the NWR test. For methodological reasons, intrinsic geminates have been transcribed as singletons followed by the IPA symbol for length (e.g., /ˈkaʦːa/), while real geminates have been transcribed by reduplication of the consonant (/ˈdaffi/). Bold phonemes in the IPA stimulus column represent the target consonants that are not present in the other target languages originally tested with the NWR and NWD tasks.

Testing Block ID	Trial	Stimulus (Orthography)	Stimulus (IPA)	Stimulus (SAMPA)
1	1	tazi	/ˈtaʣːi/	090_tadzi
	2	basa	/ˈbaza/	004_baza
	3	paiu	/ˈpaju/	069_paju
	4	tala	/ˈtala/	092_tala
	5	chila	/ˈkila/	046_kila
	6	pima	/ˈpima/	071_pima
	7	dafi	/ˈdafi/	012_dafi
	8	gaia	/ˈgaja/	030_gaja
	9	gana	/ˈgana/	031_gana
	10	ghisi	/ˈgizi/	034_gizi
	11	divi	/ˈdivi/	019_divi
	12	daza	/ˈdaʣːa/	010_dadza
	13	caza	/ˈkaʦːa/	044_katsa
	14	digi	/ˈdiʤi/	018_didZi
	15	bisi	/ˈbizi/	009_bizi
	16	caia	/ˈkaja/	041_kaja
	17	bagia	/ˈbaʤa/	002_badZa
	18	bina	/ˈbina/	006_bina
	19	chira	/ˈkira/	048_kira
	20	pafa	/ˈpafa/	068_pafa
	21	piba	/ˈpiba/	070_piba
	22	ghina	/ˈgina/	033_gina
	23	tisa	/ˈtisa/	097_tisa
	24	tigna	/ˈtiɲːa/	095_tiJa
2	1	fisci	/ˈfiʃːi/	028_fiSi
	2	sagliu	/ˈsaʎːu/	077_saLu
	3	lali	/ˈlali/	050_lali
	4	naua	/ˈnawa/	061_nawa
	5	liva	/ˈliva/	053_liva
	6	signi	/ˈsiɲːi/	086_siJi
	7	miscia	/ˈmiʃːa/	057_miSa
	8	lari	/ˈlari/	051_lari
	9	naglia	/ˈnaʎːa/	062_naLa
	10	iuzi	/ˈjuʦːi/	040_jutsi
	11	fisi	/ˈfisi/	027_fisi
	12	sicia	/ˈsiʧa/	088_sitSa
	13	iaui	/ˈjawi/	038_jawi
	14	sini	/ˈsini/	085_sini
	15	daffi	/ˈdaffi/	011_daffi
	16	ialli	/ˈjalli/	037_jalli
	17	bitti	/ˈbitti/	008_bitti
	18	simi	/ˈsimi/	083_simi
	19	nici	/ˈniʧi/	066_nitSi
3	1	biti	/ˈbiti/	007_biti
	2	disi	/ˈdizi/	020_dizi
	3	padi	/ˈpadi/	067_padi
	4	piti	/ˈpiti/	072_piti
	5	cata	/ˈkata/	043_kata
	6	baiu	/ˈbaju/	003_baju
	7	tasi	/ˈtazi/	093_tazi
	8	dasa	/ˈdaza/	016_daza
	9	gada	/ˈgada/	029_gada
	10	ghiba	/ˈgiba/	032_giba
	11	tina	/ˈtina/	094_tina
	12	chima	/ˈkima/	047_kima

Table A2. Number of consonants (in IPA) tested in the NWR task by position in the word. A dash indicates that the consonant was not tested in that position.

Phoneme	Word-Initial	Word-Medial
p ¹	6	-
b	7	2
t	6	3
d	7	2
k	6	-
g ¹	6	-
f	2	2
v ¹	-	2
m	1	3
n	3	5
ɲ ¹	-	2
r ¹	-	2
l	3	3
ʎ ¹	-	2
j	3	4
w ¹	-	2
s	5	2
z	-	6
ʃ ¹	-	2
ʧ ¹	-	2
ʤ ¹	-	2
ʦ ¹	-	2
ʣ ¹	-	2
ff ²	-	1
ll ²	-	1
tt ²	-	1

¹ Target consonants that are not present in the languages other than Italian originally targeted with the current NWR task. ² Consonants used with the prosodic feature [+long] to test the gemination contrast in word-medial position; they are likewise not present in the languages other than Italian originally targeted with the current NWR task.

Table A3. Stimuli pairs used in the NWD test and associated details for each pair (pre-test items not included). Contrast difference in terms of number of distinctive acoustic features (# acoustic features) follows the matrix elaborated by Mioni (1983, p. 64).

Testing Block ID	Trial	Stimuli Pairs (SAMPA)	Contrast (IPA)	Contrast Position	Contrast Diff. (# Acoustic Features)	Contrast Diff. (Features)
1	1	011_daffi–012_dafi	ff–f	m	1	length
	2	017_didi–017_didi	control	-	-	control
	3	017_didi–018_didZi	d–ʤ	m	3	place_manner
	4	024_fafa–068_pafa	f–p	i	2	place_manner
	5	029_gada–001_bada	g–b	i	2	place
	6	036_jali–038_jawi	l–w	m	4	place_manner
	7	044_katsa–043_kata	ʦ–t	m	1	manner
	8	046_kila–048_kira	l–r	m	1	manner
	9	053_liva–104_liza	v–z	m	2	place
	10	056_misa–056_misa	control	-	-	control
	11	057_miSa–056_misa	ʃ–s	m	2	place
	12	058_nabi–058_nabi	control	-	-	control
	13	060_nala–062_naLa	l–ʎ	m	2	place
	14	074_sadza–078_saza	ʣ–z	m	1	manner
	15	087_sita–088_sitSa	t–ʧ	m	3	place_manner
	16	091_taka–014_daka	t–d	i	1	voicing
	17	094_tina–095_tiJa	n–ɲ	m	3	place
2	1	005_biba–005_biba	control	-	-	control
	2	012_dafi–013_dZafi	d–ʤ	i	3	place_manner
	3	021_dzaki–015_daki	ʣ–d	i	1	manner
	4	026_fapa–024_fafa	p–f	m	2	place_manner
	5	034_gizi–009_bizi	g–b	i	2	place
	6	035_jabi–073_rabi	j–r	i	3	place_manner
	7	039_jusi–039_jusi	control	-	-	control
	8	047_kima–047_kima	control	-	-	control
	9	050_lali–101_wali	l–w	i	4	place_manner
	10	058_nabi–059_Jabi	n–ɲ	i	3	place
	11	079_siba–099_tsiba	s–ʦ	i	1	manner
	12	081_sika–082_Sika	s–ʃ	i	2	place
	13	087_sita–080_sida	t–d	m	1	voicing
	14	096_tSini–085_sini	ʧ–s	i	3	place_manner
	15	097_tisa–098_tissa	s–ss	m	1	length
	16	102_Laki–049_laki	ʎ–l	i	2	place
	17	103_bama–100_vama	b–v	i	2	place_manner
3	1	004_baza–002_badZa	z–ʤ	m	3	place_manner
	2	007_biti–008_bitti	t–tt	m	1	length
	3	020_dizi–019_divi	z–v	m	2	place
	4	024_fafa–024_fafa	control	-	-	control
	5	028_fiSi–027_fisi	ʃ–s	m	2	place
	6	033_gina–006_bina	g–b	i	2	place
	7	054_maja–055_mara	j–r	m	3	place_manner
	8	060_nala–061_nawa	l–w	m	4	place_manner
	9	065_nisi–066_nitSi	s–ʧ	m	3	place_manner
	10	071_pima–047_kima	p–k	i	2	place
	11	075_saju–077_saLu	j–ʎ	m	4	place
	12	078_saza–076_sasa	z–s	m	1	voicing
	13	079_siba–079_siba	control	-	-	control
	14	085_sini–086_siJi	n–ɲ	m	3	place
	15	089_sitsa–087_sita	ʦ–t	m	1	manner
	16	090_tadzi–093_tazi	ʣ–z	m	2	manner
	17	092_tala–092_tala	control	-	-	control

Table A4. Inter-transcriber reliability summary statistics by age group.

Age Group	n	Mean	Min	Max	sd
3;0–3;5	3	77.8	72.6	82.8	5.1
3;6–3;11	5	82.3	69.8	91.1	10.1
4;0–4;5	4	81.3	73.8	87.1	5.9
4;6–4;11	4	89.0	87.1	92.2	2.3
5;0–5;5	3	88.8	88.2	89.1	0.5
>5;6	2	85.4	82.9	87.9	3.6

Table A5. Phonological processes analyzed via PHON’s ‘analysis composer’ function with user-defined queries for the NWR data. We provide, for each process, the number of IPA Targets (i.e., consonants) by position in the nonword (word-initial and word-medial) that can potentially be affected by each phonological process in the case of a complete NWR test session, which consists of 55 nonwords and 113 IPA targets.

Type of Process	Phonological Process	Word-Initial Position	Word-Medial Position
Sound substitution	affrication	7	16
	backing	18	19
	deaffrication	- ¹	8
	delateralization	3	7
	devoicing	20	16
	fricative stopping	17	16
	fricativization	38	9
	fronting	12	12
	lateralization	7	18
	liquid gliding	3	9
	voicing	25	17
Modification of phonotactic structure	degemination	-	3
	gemination	-	52
	consonant deletion	55	58
	consonant epenthesis	113 ²
	vowel deletion	110 ³
	vowel epenthesis	110 ²

¹ Not present since no affricate consonant was used in word-initial position in the list of nonwords used here. ² Includes all positions, i.e., word-initial, word-medial, and word-final positions. For the epenthesis process, this number is provided as a baseline with reference to the IPA Targets since “unlike the other processes IPA Actual is queried before IPA Target during the analysis. Likewise, the ‘count’ shown in tables of this analysis is the number of elements found in IPA Actual” (cf. PHON’s online manual available at https://www.phon.ca/phon-manual/analysis/phonological_processes.html?hl=epenthesis (accessed on 17 February 2024)). ³ Includes only word-medial and word-final positions.

Table A6. Descriptive statistics for consonants’ repetition accuracy by age group for the NWR task. Minimum (min), maximum (max), mean, median, standard deviation (sd), first quartile (Q1), third quartile (Q3), and inter-quartile range (IQR) are all expressed in percentages.

Age Group	Variable	Min	Max	Mean	Median	sd	Q1	Q3	IQR
3;0–3;5	matching nonwords	29.09	58.18	46.45	45.45	10.07	39.09	55.45	16.36
	PCC	53.45	76.03	66.85	66.09	7.66	62.52	75.10	12.59
	PCC_NoEpen	54.87	81.42	70.24	69.03	7.68	66.38	77.44	11.06
	substituted	17.36	39.66	27.39	29.06	6.50	21.46	30.77	9.31
	deleted	0.00	4.31	0.93	0.84	1.19	0.40	0.85	0.45
	epenthesized	1.74	10.32	4.83	3.42	2.62	3.00	6.61	3.61
3;6–3;11	matching nonwords	29.09	87.27	55.31	56.36	15.63	42.20	66.36	24.16
	PCC	52.07	92.24	74.66	75.61	10.05	68.56	81.27	12.71
	PCC_NoEpen	55.75	94.69	78.33	78.76	9.90	71.68	85.84	14.16
	substituted	4.31	39.67	19.74	19.69	8.76	13.39	23.69	10.30
	deleted	0.00	6.96	0.86	0.00	1.46	0.00	0.98	0.98
	epenthesized	0.00	11.02	4.75	5.04	2.69	2.59	6.61	4.02
4;0–4;5	matching nonwords	33.33	80.00	59.01	60.00	11.61	50.91	65.45	14.54
	PCC	55.93	89.66	77.09	78.07	8.10	72.50	81.36	8.86
	PCC_NoEpen	59.46	92.04	80.75	80.53	8.07	78.76	85.84	7.08
	substituted	7.76	38.14	17.93	17.21	7.65	13.04	20.18	7.14
	deleted	0.00	2.40	0.44	0.00	0.64	0.00	0.86	0.86
	epenthesized	0.88	11.72	4.55	4.24	2.84	2.59	5.83	3.24
4;6–4;11	matching nonwords	47.27	90.91	67.21	70.37	15.39	52.73	78.18	25.45
	PCC	68.85	95.65	81.99	82.91	8.63	75.22	86.32	11.10
	PCC_NoEpen	74.34	97.35	84.74	85.84	7.54	78.76	89.38	10.62
	substituted	2.61	23.77	14.47	13.68	6.82	10.26	19.51	9.25
	deleted	0.00	1.77	0.20	0.00	0.59	0.00	0.00	0.00
	epenthesized	0.00	8.13	3.35	3.42	2.84	1.74	3.42	1.68
5;0–5;5	matching nonwords	49.09	89.09	71.69	73.63	13.19	60.00	82.27	22.27
	PCC	70.59	94.59	83.94	85.35	7.97	78.05	91.34	13.29
	PCC_NoEpen	72.57	94.69	87.33	89.38	7.17	82.08	94.00	11.92
	substituted	4.50	25.00	11.64	9.88	6.51	5.69	15.90	10.21
	deleted	0.00	1.72	0.47	0.00	0.58	0.00	0.85	0.85
	epenthesized	0.00	8.26	3.95	3.83	2.24	2.38	5.83	3.45
>5;6	matching nonwords	67.27	96.36	76.01	74.55	8.24	70.31	77.72	7.41
	PCC	81.74	98.23	87.58	87.93	4.18	84.84	88.47	3.63
	PCC_NoEpen	83.19	98.23	90.48	90.27	4.55	87.39	94.61	7.22
	substituted	1.77	15.65	9.05	9.48	4.44	4.95	12.13	7.18
	deleted	0.00	1.72	0.22	0.00	0.54	0.00	0.00	0.00
	epenthesized	0.00	9.02	3.16	2.59	2.69	1.52	4.44	2.92

Table A7. Spearman’s correlation matrix of the NWR’s substitutions analyzed in terms of features, irrespective of position in the word.

	1	2	3	4	5	6	7	8
age
voicing	−0.203
place	−0.534 ***	0.043
place-manner	−0.473 ***	0.125	0.499 ***
manner	−0.353 ***	0.002	0.156	0.232 *
place-manner-voicing	−0.343 ***	0.238 *	0.256 *	0.391 ***	0.264 *
deletion	−0.259 *	0.157	0.166	0.385 ***	0.309 **	0.397 ***
manner-voicing	−0.251 *	0.508 ***	0.087	0.178	0.148	0.363 ***	0.298 **
place-voicing	−0.349 ***	0.272 **	0.347 ***	0.383 ***	0.085	0.503 ***	0.274 **	0.378 ***

Note: Computed correlation used the Spearman method with listwise deletion. Significance level adjusted for multiple correlations after Benjamini–Hochberg’s approach. Significance levels: p < 0.05 = *, p < 0.01 = **, p < 0.001 = ***.
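
The Benjamini–Hochberg adjustment and the star coding mentioned in the note can be illustrated with a minimal sketch in plain Python. The function names `benjamini_hochberg` and `stars` and the example p-values are illustrative assumptions. They are not the authors' code or data.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def stars(p):
    """Map a p-value to the star coding used in the table notes."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical raw p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
labels = [stars(p) for p in adj]
```

Each adjusted value is the raw p-value scaled by the number of tests over its rank, then capped by the adjusted value of the next larger p-value, so a correlation keeps its stars only if it survives the correction.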

Table A8. Spearman’s correlation matrix of the NWR’s substitutions in word-initial position analyzed in terms of phonological processes.

	1	2	3	4	5	6	7	8	9	10	11	12	13
age
affrication	−0.225
backing	−0.400 **	0.369 ***
consonant deletion	−0.178	0.225	0.287 *
consonant epenthesis	−0.222	−0.142	0.079	−0.120
delateralization	−0.183	0.061	0.045	0.152	0.141
devoicing	−0.290 *	0.071	−0.051	0.355 **	−0.136	0.002
fricative stopping	−0.101	−0.06	0.134	−0.031	−0.012	0.113	0.076
fricativization	−0.042	0.021	0.084	0.339 **	−0.063	0.183	0.069	−0.112
fronting	−0.097	0.045	0.143	0.226	0.015	−0.081	0.100	−0.180	−0.035
lateralization	−0.072	−0.037	0.113	−0.058	0.177	−0.022	−0.160	−0.049	−0.039	−0.036
liquid gliding	−0.153	−0.037	0.188	−0.058	0.188	0.438 ***	−0.127	0.220	0.280 *	−0.036	−0.010
voicing	−0.075	0.027	0.196	0.303 *	0.039	0.054	−0.081	0.049	0.211	0.136	−0.058	0.145
vowel epenthesis	−0.295 *	−0.13	0.193	0.129	0.175	0.063	−0.019	0.093	0.159	0.189	0.298 *	0.274 *	0.135

Note: Computed correlation used the Spearman method with listwise deletion. Significance level adjusted for multiple correlations after Benjamini–Hochberg’s approach. Significance levels: p < 0.05 = *, p < 0.01 = **, p < 0.001 = ***.

Table A9. Spearman’s correlation matrix of the NWR’s substitutions in word-medial position analyzed in terms of phonological processes.

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16
age
affrication	−0.189
backing	−0.267	0.200
consonant deletion	−0.200	0.042	0.009
consonant epenthesis	−0.257	0.027	0.232	−0.056
deaffrication	−0.416 ***	0.052	0.009	0.149	0.132
degemination	−0.094	−0.180	−0.122	0.650 ***	−0.056	0.056
delateralization	−0.097	0.036	0.192	−0.117	−0.011	0.036	−0.124
devoicing	−0.143	0.219	−0.033	0.079	0.152	0.087	0.094
fricative stopping	−0.222	0.080	0.272	0.236	0.059	0.092	0.102	−0.126	0.062
fricativization	−0.058	0.059	−0.075	−0.030	−0.157	0.071	0.051	0.130	0.025	0.005
fronting	−0.373 **	0.075	0.145	0.049	0.163	0.207	0.099	0.063	0.011	0.261	−0.104
gemination	−0.008	−0.083	0.148	−0.165	0.661 ***	0.012	−0.100	0.139	−0.049	−0.120	−0.224	0.001
lateralization	−0.200	−0.020	0.308 *	0.047	0.062	−0.085	−0.009	−0.058	−0.068	0.262	−0.079	0.181	0.47
liquid gliding	−0.001	0.090	0.065	0.180	−0.100	−0.073	0.250	0.302 *	0.047	0.057	−0.059	0.006	0.031	−0.003
voicing	−0.101	0.037	0.075	0.066	0.062	0.030	−0.024	−0.129	0.069	−0.041	−0.122	0.109	0.026	0.205	−0.015
vowel epenthesis	−0.191	0.005	0.118	0.178	0.034	0.212	0.092	−0.062	0.065	0.084	0.168	0.042	−0.027	−0.091	0.044	0.036

Note: Computed correlation used the Spearman method with listwise deletion. Significance level adjusted for multiple correlations after Benjamini–Hochberg’s approach. Significance levels: p < 0.05 = *, p < 0.01 = **, p < 0.001 = ***.

Table A10. Children with an A-prime score ≤ 0.49 whose blocks have been considered unreliable and have been removed from the NWD dataset for the remainder of the analyses.

Subject	Amount of Testing Blocks Completed	Unreliable Testing Block ID	A-Prime Score
it008f_46	1	1	0.00
it010m_54	2	1	0.00
it010m_54	2	2	0.23
it016m_62	1	1	0.00
it017f_46	3	3	0.22
it023f_53	3	1	0.14
it031m_49	2	2	0.47
it032f_47	2	2	0.00
it034f_42	1	1	0.23
it035f_52	2	1	0.12
it035f_52	2	2	0.30
it039f_46	2	1	0.22
it042m_48	1	1	0.00
it047m_46	2	1	0.35
it048m_49	2	2	0.12
it049f_42	1	1	0.00
it054m_56	3	3	0.19
it062f_44	2	2	0.00
it070f_49	2	2	0.47
it073m_70	2	2	0.21
it097m_40	1	1	0.00
it107f_38	1	1	0.15
it113m_38	1	1	0.21
it122m_37	2	2	0.41
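
As a hedged illustration of the exclusion criterion in Table A10, the sketch below computes A-prime from a hit rate and a false-alarm rate using Grier's (1971) formula, with the commonly used mirror-image extension for below-chance performance, and flags blocks at or below the 0.49 threshold. The function names and the example rates are assumptions made for illustration, and the authors' own computation may differ in detail.

```python
def a_prime(hit_rate, fa_rate):
    """Nonparametric sensitivity index A' (0.5 = chance, 1.0 = perfect)."""
    if hit_rate == fa_rate:
        return 0.5
    if hit_rate > fa_rate:
        diff = hit_rate - fa_rate
        return 0.5 + diff * (1 + diff) / (4 * hit_rate * (1 - fa_rate))
    # Below-chance case: mirror image of the formula above.
    diff = fa_rate - hit_rate
    return 0.5 - diff * (1 + diff) / (4 * fa_rate * (1 - hit_rate))


THRESHOLD = 0.49  # cut-off stated in the caption of Table A10


def is_unreliable(hit_rate, fa_rate, threshold=THRESHOLD):
    """Flag a testing block whose A' is at or below the threshold."""
    return a_prime(hit_rate, fa_rate) <= threshold


# Hypothetical blocks, for illustration only: (hit rate, false-alarm rate).
blocks = {"block_a": (0.8, 0.2), "block_b": (0.0, 1.0)}
flags = {name: is_unreliable(h, f) for name, (h, f) in blocks.items()}
```

Under this formula, a block in which a child never responds "different" to different pairs but always does so to identical pairs yields an A-prime of 0.00, the lowest value in the table.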

Table A11. Mixed-effects logistic regression’s output for predicted correct discrimination of nonword pairs in Model 1.

	Correct
Predictors	Log-Odds	CI	p
(Intercept)	−3.68	−5.84–−1.51	0.001
age	0.06	0.03–0.10	0.001
features [place]	1.78	0.86–2.69	<0.001
features [place_manner]	1.98	1.07–2.90	<0.001
features [voicing]	0.77	−0.59–2.13	0.267
position [medial]	1.05	0.37–1.73	0.002
Random Effects
σ²	3.29
τ_{00 id}	2.49
τ_{00 stims}	0.93
ICC	0.51
N _id	91
N _stims	39
Observations	2665
Marginal R²/Conditional R²	0.140/0.578

Note: 95% Confidence Intervals (CIs) and p-values were computed using a Wald z-distribution approximation. Significant p-values are highlighted in bold.
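
To make the Wald approximation in the note concrete, the following sketch backs out a standard error, z statistic, and two-sided p-value from a tabulated estimate and its 95% CI, and converts a log-odds coefficient into an odds ratio. The function names are illustrative assumptions, and because the table values are rounded to two decimals, the recovered quantities are only approximate.

```python
import math

Z_CRIT = 1.959964  # two-sided 95% critical value of the standard normal


def wald_from_ci(estimate, lower, upper, z_crit=Z_CRIT):
    """Back out the standard error, z statistic and two-sided p-value."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return se, z, p


def odds_ratio(log_odds):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(log_odds)


# Values as printed in Table A11 (rounded to two decimals).
se_voicing, z_voicing, p_voicing = wald_from_ci(0.77, -0.59, 2.13)
or_place = odds_ratio(1.78)
```

With the rounded values for the voicing contrast, the recovered p-value is about 0.27, in line with the reported 0.267, and the place coefficient corresponds to an odds ratio of about 5.9.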

Table A12. Mixed-effects logistic regression’s output for predicted correct discrimination of nonword pairs in Model 2.

	Correct
Predictors	Log-Odds	CI	p
(Intercept)	−3.12	−5.28–−0.95	0.005
age	0.06	0.03–0.10	0.001
num features [2]	1.09	0.14–2.04	0.024
num features [3]	1.67	0.66–2.68	0.001
num features [4]	1.12	−0.24–2.48	0.106
position [medial]	0.81	0.07–1.56	0.033
Random Effects
σ²	3.29
τ_{00 id}	2.48
τ_{00 stims}	1.15
ICC	0.52
N _id	91
N _stims	39
Observations	2665
Marginal R²/Conditional R²	0.113/0.579

Note: 95% Confidence Intervals (CIs) and p-values were computed using a Wald z-distribution approximation. Significant p-values are highlighted in bold.
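
The ICC values reported in Tables A11 and A12 appear consistent with the usual latent-variable calculation for logistic mixed models, in which the residual variance is fixed at π²/3 (about 3.29). The sketch below reproduces that arithmetic from the tabulated variance components. The function name `icc` is an illustrative assumption, and the exact routine used by the authors is not stated in this section.

```python
import math

# Residual variance of the standard logistic distribution (about 3.29).
RESIDUAL_VAR = math.pi ** 2 / 3


def icc(random_variances, residual_variance=RESIDUAL_VAR):
    """Share of total variance attributable to the random effects."""
    total_random = sum(random_variances)
    return total_random / (total_random + residual_variance)


# Random-intercept variances (tau_00) for id and stims, as tabulated.
icc_model1 = icc([2.49, 0.93])  # Table A11
icc_model2 = icc([2.48, 1.15])  # Table A12
```

Rounded to two decimals, these give 0.51 and 0.52, matching the tabulated ICCs.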

Notes

1	The parental questionnaire investigated (a) the psycho-physical development and health status (e.g., birth and post-partum pathologies, health-related issues including otitis or ear infections, dentition problems, pacifier usage, etc.); (b) the linguistic development (e.g., pointing age, walking, first words age, etc.); (c) the household composition; and (d) parental personal and socio-economic-demographic information.
2	Items and syllables in the ˈCV.CV nonwords were not controlled for phonotactic probability, as at the time of study design (2010), no databases for Italian were available. Nevertheless, we acknowledge the subsequent work of Goslin et al. (2014), which provides detailed indices for single phones and single syllables (but not for their co-occurrences or position in the words). This limitation should be taken into account when drawing conclusions from the data presented here.
3	In many cases, it was not possible to test some target phonemes against others that differed in a smaller number of features because the contrasting phoneme was not present in one of the targeted mother tongue languages (for example, it was not possible to achieve the contrast in the stimuli pair between /ʧ/ and /ʃ/, between /ʧ/ and /ʦ/, or between /ʤ/ and either /ʧ/ or /ʣ/). In these few cases, the closest phoneme differing in a higher number of features was chosen as the contrast.
4	In AX testing designs, the number of “same” trials generally equals the number of “different” trials. However, after some preliminary piloting with younger children, and following McGuire’s suggestion (McGuire, 2010), we decided to reduce the number of “same” trials in each block in order to reduce both the task’s length and difficulty.
5	Similar cases were, however, very limited for both tests.
6	For additional information on how the phone similarity measure is computed, please refer to the online software manual: https://www.phon.ca/phon-manual/report/phones/phone_similarity_report.html?hl=similarity (accessed on 11 February 2024). This analysis was carried out on a derived copy of the corpus from which all diacritics had been removed, because PHON's phone similarity analysis did not allow diacritics to be excluded in the setup window. For example, the IPA diacritics for "aspirated" and "voiceless" (in our case due to whispering) produced voicing mismatches that we did not want to count as errors; similarly, "lateral release" produced place errors that we did not want to be classified as such, and so on. Additionally, because of the same limitation in PHON's setup window, all rhotic sounds previously marked with the diacritic "distorted", which were to be treated as allophonic realizations, were replaced with a plain /r/ to avoid unwanted place or manner errors.
7	Consonant epenthesis is not captured in this analysis because the algorithm looks for all the IPA Target consonants to find a match in the IPA Actual; in the case of epenthesis, which, where it occurs, is annotated and present in the IPA Actual, no referent in the IPA Target is available.
8	“A value of d′ = 3 is close to perfect performance; a value of d′ = 0 is chance (“guessing”) performance.” (https://dictionary.apa.org/d-prime (accessed on 11 February 2024)).
9	A binomial family was chosen to fit the model because of the binary outcome of the responses in the NWD task (i.e., 0 and 1, where 1 represents the reference value for the dependent variable expressing success and indicating that the child correctly discriminated the stimuli in the proposed pair as being different, and 0 expresses failure to discriminate the pair). Control pairs were left out for this type of analysis, while length contrasts appearing only in word-medial position were analyzed with a separate model.
10	Even though this operation implied the loss of some information, it was necessary because of the characteristics of the features of the two tasks. In fact, for the NWR, the possible combinations between two or more features are derived from the substitutions produced by the children, whereas for the NWD, the set of features, as well as their combination, was determined by default by the contrasts included in the task.
11	Oral airflow leakage is an articulatory mechanism that developing or pathological subjects adopt in order to utter voiced plosive consonants in an isolated word-initial (or sentence-initial) position. Since voiced plosives require a sustained oscillation of the vocal folds starting before the articulation of the consonant itself, when the mouth is still closed (that is, they require a negative VOT), some subjects may need to slightly open the oral cavity in order to make the airflow pass from the trachea to the outside and make the vocal folds oscillate. They can do this either by lowering the velum, with the result of a nasal consonant epenthesis in word-initial position, or by opening their lips, with the result of a vowel epenthesis (generally the neutral sound [ə]) in word-initial position. In both cases, the listener’s perception of the insertions depends on their intensity and duration (see also Zmarich et al. (2021) for a quantitative analysis of atypical strategies to promote voicing in word-initial position used by a child recorded from 1;6 to 4;0 years of age).
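
The d′ benchmark values quoted in note 8 can be made concrete with a short sketch that computes d′ as the difference between the z-transformed hit and false-alarm rates. The log-linear correction (adding 0.5 to each count) is one common way of avoiding infinite z-scores when a rate is 0 or 1. It is an assumption here and not necessarily the correction applied in the study, and the trial counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical trial counts, for illustration only.
guessing = d_prime(10, 10, 10, 10)   # equal hit and false-alarm rates
accurate = d_prime(18, 2, 2, 18)     # mostly correct responses
```

Equal hit and false-alarm rates give d′ = 0 (chance), while the mostly correct response pattern gives a value above 2, approaching the near-perfect range described in note 8.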

References

Altvater-Mackensen, N., van der Feest, S. V., & Fikkert, P. (2014). Asymmetries in early word recognition: The case of stops and fricatives. Language Learning and Development, 10(2), 149–178.
Arjmandi, M. K., & Behroozmand, R. (2024). On the interplay between speech perception and production: Insights from research and theories. Frontiers in Neuroscience, 18, 1347614.
Baddeley, A., Gathercole, S., & Papagno, C. (1998). The phonological loop as a language learning device. Psychological Review, 105(1), 158–173.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Behroozmand, R., Phillip, L., Johari, K., Bonilha, L., Rorden, C., Hickok, G., & Fridriksson, J. (2018). Sensorimotor impairment of speech auditory feedback processing in aphasia. NeuroImage, 165, 102–111.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B: Statistical Methodology, 57(1), 289–300.
Bernthal, J. E., Bankson, N. W., & Flipsen, P., Jr. (2017). Articulation and phonological disorders: Speech sound disorders in children (8th ed.). Pearson.
Bertelli, B., & Bilancia, G. (2006). VAUMeLF. Batterie per la valutazione dell’attenzione uditiva e della memoria di lavoro fonologica nell’età evolutiva. Giunti O.S.
Bertinetto, P. M., & Loporcaro, M. (2005). The sound pattern of standard Italian, as compared with the varieties spoken in Florence, Milan and Rome. Journal of the International Phonetic Association, 35(2), 131–151.
Best, C. T., Goldstein, L. M., Nam, H., & Tyler, M. D. (2016). Articulating what infants attune to in native speech. Ecological Psychology, 28(4), 216–261.
Best, C. T., McRoberts, G. W., & Sithole, N. M. (1988). Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English-speaking adults and infants. Journal of Experimental Psychology: Human Perception and Performance, 14(3), 345–360.
Beving, B., & Eblen, R. E. (1973). “Same” and “different” concepts and children’s performance on speech sound discrimination. Journal of Speech and Hearing Research, 16(3), 513–517.
Bisiacchi, P. S., Cendron, M., Gugliotta, M., Tressoldi, P. E., & Vio, C. (2005). BVN 5–11—Batteria di valutazione neuropsicologica per l’età evolutiva. Edizioni Erickson.
Boersma, P., & Weenink, D. (2001). PRAAT, a system for doing phonetics by computer. Glot International, 5, 341–345.
Brown, R. (1958). How shall a thing be called? Psychological Review, 65(1), 14–21.
Brown, R., & Berko, J. (1960). Word association and the acquisition of grammar. Child Development, 31(1).
Bruderer, A. G., Danielson, D. K., Kandhadai, P., & Werker, J. F. (2015). Sensorimotor influences on speech perception in infancy. Proceedings of the National Academy of Sciences, 112(44), 13531–13536.
Callan, D. E., Kent, R. D., Guenther, F. H., & Vorperian, H. K. (2000). An auditory-feedback-based neural network model of speech production that is robust to developmental changes in the size and shape of the articulatory system. Journal of Speech, Language, and Hearing Research, 43(3), 721–736.
Carlisle, R. S. (2001). Syllable structure universals and second language acquisition. International Journal of English Studies, 1(1), 1–19.
Caselli, M. C., Bello, A., Rinaldi, P., Stefanini, S., & Pasqualetti, P. (2015). Il primo vocabolario del bambino: Gesti, parole e frasi. valori di riferimento fra 8 e 36 mesi delle forme complete e delle forme brevi del questionario MacArthur-Bates CDI. Strumenti per Il Lavoro Psico-Sociale Ed Educativo. Franco Angeli Edizioni.
Choi, D., Bruderer, A. G., & Werker, J. F. (2019). Sensorimotor influences on speech perception in pre-babbling infants: Replication and extension of Bruderer et al. (2015). Psychonomic Bulletin & Review, 26(4), 1388–1399.
Choi, D., Dehaene-Lambertz, G., Peña, M., & Werker, J. F. (2021). Neural indicators of articulator-specific sensorimotor influences on infant speech perception. Proceedings of the National Academy of Sciences, 118(20), e2025043118.
Choi, D., Yeung, H. H., & Werker, J. F. (2023). Sensorimotor foundations of speech perception in infancy. Trends in Cognitive Sciences, 27(8), 773–784.
Cornoldi, C., Miato, L., Molin, A., & Poli, S. (2009). PRCR-2/2009: Prove di prerequisito per la diagnosi delle difficoltà di lettura e scrittura. Giunti O.S.
Creel, S. C. (2022). Preschoolers have difficulty discriminating novel minimal-pair words. Journal of Speech, Language, and Hearing Research, 65(7), 2540–2553.
Crowe, K., & McLeod, S. (2020). Children’s English consonant acquisition in the United States: A review. American Journal of Speech-Language Pathology, 29(4), 2155–2169.
Davis, B. L., & MacNeilage, P. F. (1995). The articulatory basis of babbling. Journal of Speech, Language, and Hearing Research, 38(6), 1199–1211.
Dispaldro, M., Leonard, L. B., & Deevy, P. (2013). Real-word and nonword repetition in Italian-speaking children with specific language impairment: A study of diagnostic accuracy. Journal of Speech, Language, and Hearing Research, 56(1), 323–336.
Dollaghan, C., & Campbell, T. F. (1998). Nonword repetition and child language impairment. Journal of Speech, Language, and Hearing Research, 41(5), 1136–1146.
Edwards, J., Fox, R. A., & Rogers, C. L. (2002). Final consonant discrimination in children. Journal of Speech, Language, and Hearing Research, 45(2), 231–242.
Fabbro, F. (1999). Neurolinguistica e neuropsicologia dei disturbi specifici del linguaggio nel bambino: Proposta di un esame del linguaggio. Saggi. Neuropsicologia Infantile, Psicopedagogia, Riabilitazione, 25(1), 11–23.
Flege, J. E., & Eefting, W. (1986). Linguistic and developmental effects on the production and perception of stop consonants. Phonetica, 43(4), 155–171.
Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed). Sage.
Fox-Boyer, A., Lavaggi, S., & Fricke, S. (2021). Phonological variations in typically-developing Italian-speaking children aged 3;0–4;11. Clinical Linguistics & Phonetics, 36(2–3), 241–259.
Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: Sense and nonsense. Advances in Methods and Practices in Psychological Science, 2(2), 156–168.
Galatà, V., Angonese, G., & Zmarich, C. (2017). Italian as L2 in Romanian pre-schoolers: Evidence from a perception and production task. In Fattori sociali e biologici nella variazione fonetica (Vol. 3, pp. 257–280). Officinaventuno. Studi AISV.
Galatà, V., Meneguzzi, G., Conter, L., & Zmarich, C. (2012). Primi dati sull’acquisizione fonetico-fonologica dell’italiano L2 in prescolari rumeni. In M. Falcone, & A. Paoloni (Eds.), La voce nelle applicazioni (Vol. 8, pp. 35–50). Bulzoni Editore.
Galatà, V., & Zmarich, C. (2011a). Le non-parole in uno studio sulla discriminazione e sulla produzione dei suoni consonantici dell’italiano da parte di bambini pre-scolari. In B. Gili Fivela, A. Stella, L. Garrapa, & M. Grimaldi (Eds.), Contesto comunicativo e variabilità nella produzione e percezione della lingua (Vol. 7, pp. 118–129). Bulzoni Editore.
Galatà, V., & Zmarich, C. (2011b). Una proposta per valutare l’influenza fonetico-fonologica della lingua di origine dei bambini figli di immigrati sull’acquisizione dell’italiano. In G. C. Bruno, I. Caruso, M. Sanna, & I. Vellecco (Eds.), Percorsi migranti (pp. 301–317). McGraw-Hill Companies, Publishing Group Italia.
Goffman, L. (2015). Effects of language on motor processes in development. In M. A. Redford (Ed.), The handbook of speech production (1st ed., pp. 555–577). Wiley.
Goldstein, L. (2003, August 3–9). Emergence of discrete gestures. 15th International Congress of Phonetic Sciences (pp. 85–88), Barcelona, Spain.
Goldstein, L., & Fowler, C. A. (2003). Articulatory phonology: A phonology for public language use. In A. S. Meyer, & N. O. Schiller (Eds.), Phonetics and phonology in language comprehension and production: Differences and similarities (pp. 159–207). Mouton de Gruyter.
Goslin, J., Galluzzi, C., & Romani, C. (2014). PhonItalia: A phonological lexicon for Italian. Behavior Research Methods, 46(3), 872–886.
Gósy, M., & Horváth, V. (2015). Speech processing in children with functional articulation disorders. Clinical Linguistics & Phonetics, 29(3), 185–200.
Graham, L. W., & House, A. S. (1971). Phonological oppositions in children: A perceptual study. The Journal of the Acoustical Society of America, 49(2B), 559–566.
Grier, J. B. (1971). Nonparametric indexes for sensitivity and bias: Computing formulas. Psychological Bulletin, 75(6), 424–429.
Grunwell, P. (1987). Clinical phonology (2nd ed.). Williams & Wilkins.
Guenther, F. H. (2016). Neural control of speech. MIT Press.
Hartig, F. (2022). DHARMa: Residual diagnostics for hierarchical (multi-level/mixed) regression models. Available online: https://CRAN.R-project.org/package=DHARMa (accessed on 24 February 2024).
Hearnshaw, S., Baker, E., & Munro, N. (2018). The speech perception skills of children with and without speech sound disorder. Journal of Communication Disorders, 71, 61–71.
Hearnshaw, S., Baker, E., & Munro, N. (2019). Speech perception skills of children with speech sound disorders: A systematic review and meta-analysis. Journal of Speech, Language, and Hearing Research, 62(10), 3771–3789.
Hearnshaw, S., Baker, E., Pomper, R., McGregor, K. K., Edwards, J., & Munro, N. (2023). The relationship between speech perception, speech production, and vocabulary abilities in children: Insights from by-group and continuous analyses. Journal of Speech, Language, and Hearing Research, 66(4), 1173–1191.
Hedlund, G., & Rose, Y. (2023). Phon 3.5.2. Available online: https://phon.ca (accessed on 4 October 2023).
Howard, S. J., & Heselwood, B. C. (2002). Learning and teaching phonetic transcription for clinical purposes. Clinical Linguistics & Phonetics, 16(5), 371–401.
Howell, P., Sorger, C., Alsulaiman, R., Yoshikawa, K., Harris, J., & Tang, K. (2024). Factors affecting judgment accuracy when scoring children’s responses to non-word repetition stimuli in real time. International Journal of Language & Communication Disorders, 59(2), 678–697.
Ingram, D. (1976). Phonological disability in children. Edward Arnold.
Jakobson, R. (1968). Child language, aphasia and phonological universals. De Gruyter Mouton.
Jakobson, R., Fant, G., & Halle, M. (1952). Preliminaries to speech analysis: The distinctive features and their correlates. The MIT Press.
Kent, R. D. (1992). The biology of phonological development. In C. A. Ferguson, L. Menn, & C. Stoel-Gammon (Eds.), Phonological development: Models, research, implications (pp. 65–90). York Press.
Kent, R. D. (2024). The feel of speech: Multisystem and polymodal somatosensation in speech production. Journal of Speech, Language, and Hearing Research: JSLHR, 67(5), 1424–1460.
Kisler, T., Reichel, U., & Schiel, F. (2017). Multilingual processing of speech via web services. Computer Speech & Language, 45, 326–347.
Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P. (2006). Infants show a facilitation effect for native language phonetic perception between 6 and 12 months. Developmental Science, 9(2), F13–F21.
Kunnari, S., Nakai, S., & Vihman, M. (2001). Cross-linguistic evidence for acquisition of geminates. Psychology of Language and Communication, 5, 13–24.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21(1), 1–36.
Locke, J. L. (1983). Phonological acquisition and change. Academic Press.
Loporcaro, M. (2009). Profilo linguistico dei dialetti italiani (1st ed.). Manuali Laterza 275. GLF Editori Laterza.
Lucarini, G., Bovo, R., Galatà, V., Pinton, A., & Zmarich, C. (2022). Speech perception and production abilities in a group of Italian preschoolers aged 72–78 months. Hearing, Balance and Communication, 20(3), 186–195.
Lüdecke, D. (2018). Ggeffects: Tidy data frames of marginal effects from regression models. Journal of Open Source Software, 3(26), 772.
Lüdecke, D. (2024). sjPlot: Data visualization for statistics in social science. Available online: https://CRAN.R-project.org/package=sjPlot (accessed on 28 February 2024).
Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P., & Makowski, D. (2021). Performance: An R package for assessment, comparison and testing of statistical Models. The Journal of Open Source Software, 6(60), 3139.
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide (2nd ed.). Lawrence Erlbaum Associates.
Maddieson, I. (2008). Syllable structure. In M. Haspelmath, M. S. Dryer, D. Gil, & B. Comrie (Eds.), The world atlas of language structures online. Max Planck Digital Library.
Maddieson, I., & Precoda, K. (1990). Updating UPSID. The Journal of the Acoustical Society of America, 86(S1), S19.
Makowski, D. (2018). The psycho package: An efficient and publishing-oriented workflow for psychological science. The Journal of Open Source Software, 3(22), 470.
Makowski, D., Lüdecke, D., Patil, I., Thériault, R., Ben-Shachar, M. S., & Wiernik, B. M. (2023). Automated results reporting as a practical tool to improve reproducibility and methodological best practices adoption. CRAN. Available online: https://easystats.github.io/report/ (accessed on 20 February 2024).
Marini, A., Marotta, L., Bulgheroni, S., & Fabbro, F. (2015). BVL_4–12. Batteria per la valutazione del linguaggio in bambini dai 4 ai 12 anni. Giunti O.S.
Marotta, L., Ronchetti, C., Trasciani, M., & Vicari, S. (2008). Test CMF valutazione delle competenze metafonologiche. Edizioni Erickson.
McAllister Byun, T. (2012). Bidirectional perception–production relations in phonological development: Evidence from positional neutralization. Clinical Linguistics & Phonetics, 26(5), 397–413.
McGuire, G. (2010). A brief primer on experimental designs for speech perception. Laboratory Report, 77(1), 2–19.
McLeod, S., & Crowe, K. (2018). Children’s consonant acquisition in 27 languages: A cross-linguistic review. American Journal of Speech-Language Pathology, 27(4), 1546–1571.
McMurray, B., Tanenhaus, M. K., & Aslin, R. N. (2002). Gradient effects of within-category phonetic variation on lexical access. Cognition, 86(2), B33–B42.
Meier, A. M., & Guenther, F. H. (2023). Neurocomputational modeling of speech motor development. Journal of Child Language, 50(6), 1318–1335.
Mioni, A. (1983). Fonologia. In L. Croatto (Ed.), Trattato di foniatria e logopedia, aspetti linguistici della comunicazione (Vol. 2, pp. 51–87). La Garangola.
Mioni, A. (1990). La standardizzazione fonetico-fonologica a Padova e a Bolzano (stile di lettura). In M. Cortelazzo, & A. Mioni (Eds.), L’italiano regionale (pp. 193–208). Bulzoni.
Muljačić, Ž. (1972). Fonologia della lingua italiana. Il Mulino.
Nakeva von Mentzer, C. (2020). Phonemic discrimination and reproduction in 4–5-year-old children: Relations to hearing. International Journal of Pediatric Otorhinolaryngology, 133, 109981.
Namasivayam, A. K., Coleman, D., O’Dwyer, A., & van Lieshout, P. (2020). Speech sound disorders in children: An articulatory phonology perspective. Frontiers in Psychology, 10, 2998.
Narayan, C. R., Werker, J. F., & Beddor, P. S. (2010). The interaction between acoustic salience and language experience in developmental speech perception: Evidence from nasal place discrimination. Developmental Science, 13(3), 407–420.
Ohala, J. J. (2011). Accommodation to the aerodynamic voicing constraint and its phonological relevance. In W. S. Lee, & E. Zee (Eds.), Proceedings of the 17th International Congress of Phonetic Sciences (pp. 64–67). City University of Hong Kong.
Oller, D. K., Wieman, L. A., Doyle, W. J., & Ross, C. (1976). Infant babbling and speech. Journal of Child Language, 3(1), 1–11.
Orso, E., Calegaro, M., Rapa, F., Bonifacio, S., & Zmarich, C. (2010). L’emergere della fonologia nei bambini dai 18 ai 27 mesi: Analisi statistica degli errori di sostituzione. In F. Cutugno, P. Maturi, R. Savy, G. Abete, & I. Alfano (Eds.), Parlare con le persone, parlare alle macchine: La dimensione interazionale della comunicazione verbale (Vol. 6, pp. 249–278). EDK.
Pascoe, M., Rossouw, K., Fish, L., Jansen, C., Manley, N., Powell, M., & Rosen, L. (2016). Speech processing and production in two-year-old children acquiring isiXhosa: A tale of two children. South African Journal of Communication Disorders, 63(2), 15 pages.
Peterson, B. G., & Carl, P. (2020). PerformanceAnalytics: Econometric tools for performance and risk analysis. Available online: https://CRAN.R-project.org/package=PerformanceAnalytics (accessed on 21 March 2024).
Piazzalunga, S., Previtali, L., Pozzoli, R., Scarponi, L., & Schindler, A. (2018). An articulatory-based disyllabic and trisyllabic Non-Word Repetition test: Reliability and validity in Italian 3- to 7-year-old children. Clinical Linguistics & Phonetics, 33(5), 437–456.
Pinton, A., & Zanettin, F. (1998). Le abilità fonetiche e fonologiche in età prescolare: Un compito di discriminazione uditiva. In S. Frasson, L. Lena, & S. Menin (Eds.), Procedure e metodi di trattamento nei disordini della comunicazione (pp. 25–42). Edizioni del Cerro.
Polka, L., Jusczyk, P. W., & Rvachew, S. (1995). Methods for studying speech perception in infants and children. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 49–89). York Press.
Posit Team. (2024). RStudio: Integrated development environment for R. Posit Software, PBC. Available online: http://www.posit.co/ (accessed on 11 February 2024).
R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Available online: https://www.R-project.org/ (accessed on 11 February 2024).
Roy, P., & Chiat, S. (2004). A prosodically controlled word and nonword repetition task for 2- to 4-year-olds. Journal of Speech, Language, and Hearing Research, 47(1), 223–234.
Rvachew, S., & Brosseau-Lapré, F. (2018). Developmental phonological disorders: Foundations of clinical practice (2nd ed.). Plural Publishing, Inc.
Rvachew, S., Nowak, M., & Cloutier, G. (2004). Effect of phonemic perception training on the speech production and phonological awareness skills of children with expressive phonological delay. American Journal of Speech-Language Pathology, 13(3), 250–263.
Seidl, A., Brosseau-Lapré, F., & Goffman, L. (2018). The impact of brief restriction to articulation on children’s subsequent speech production. The Journal of the Acoustical Society of America, 143(2), 858–863.
Shriberg, L. D., & Lof, G. L. (1991). Reliability studies in broad and narrow phonetic transcription. Clinical Linguistics & Phonetics, 5(3), 225–279.
Singh, L., Rajendra, S. J., & Mazuka, R. (2022). Diversity and representation in studies of infant perceptual narrowing. Child Development Perspectives, 16(4), 191–199.
Stark, R. E. (1986). Prespeech segmental feature development. In P. Fletcher, & M. Garman (Eds.), Language acquisition (1st ed., pp. 149–173). Cambridge University Press.
Stokes, S. F., & Klee, T. (2009). The diagnostic accuracy of a new test of early nonword repetition for differentiating late talking and typically developing children. Journal of Speech, Language, and Hearing Research, 52(4), 872–882.
Studdert-Kennedy, M. (2000). Imitation and the emergence of segments. Phonetica, 57(2–4), 275–283.
Sussman, H. M., Hoemeke, K. A., & McCaffrey, H. A. (1992). Locus equations as an index of coarticulation for place of articulation distinctions in children. Journal of Speech, Language, and Hearing Research, 35(4), 769–781.
Swingley, D. (2021). Infants’ learning of speech sounds and word forms. In A. Papafragou, J. C. Trueswell, & L. R. Gleitman (Eds.), Oxford handbook of the mental lexicon. Oxford University Press.
Telmon, T. (1993). Varietà regionali. In A. A. Sobrero (Ed.), Introduzione all’italiano contemporaneo. La variazione e gli usi (pp. 93–149). Editori Laterza.
Tichko, P. (2021). Pipeline to calculate D-prime in R. Available online: https://ptichko.github.io/2021/02/27/Pipeline-To-Calculate-D-Prime-in-R.html (accessed on 28 March 2024).
Tresoldi, M., Ambrogi, F., Favero, E., Colombo, A., Barillari, M. R., Velardi, P., & Schindler, A. (2015). Reliability, validity and normative data of a quick repetition test for Italian children. International Journal of Pediatric Otorhinolaryngology, 79(6), 888–894.
Tresoldi, M., Barillari, M. R., Ambrogi, F., Sai, E., Barillari, U., Tozzi, E., Scarponi, L., & Schindler, A. (2018). Normative and validation data of an articulation test for Italian-speaking children. International Journal of Pediatric Otorhinolaryngology, 110, 81–86.
Turner, A. C., McIntosh, D. N., & Moody, E. J. (2015). Don’t listen with your mouth full: The role of facial motor action in visual speech perception. Language and Speech, 58(2), 267–278.
Vicari, S. (2007). PROMEA, prove di memoria e apprendimento per l’età evolutiva. Giunti O.S. [Google Scholar]
Vicari, S., Luigi, M., & Alessandra, L. (2007). TFL test fono-lessicale. Valutazione delle abilità lessicali in età prescolare. Edizioni Erickson. [Google Scholar]
Vihman, M., Macken, M. M. A., Miller, R., Simmons, H., & Miller, J. (1985). From babbling to speech: A re-assessment of the continuity issue. Language, 61(2), 397–445. [Google Scholar] [CrossRef]
Vihman, M., & Majorano, M. (2017). ‘The Role of geminates in infants’ early word production and word-form recognition’. Journal of Child Language, 44(1), 158–184. [Google Scholar] [CrossRef]
Viterbori, P., Zanobini, M., & Cozzani, F. (2018). Phonological development in children with different lexical skills. First Language, 38(5), 538–559. [Google Scholar] [CrossRef]
Wei, T., & Simko, V. (2021). R package “corrplot”: Visualization of a correlation matrix. Available online: https://github.com/taiyun/corrplot (accessed on 28 March 2024).
Weismer, S. E., Tomblin, J. B., Zhang, X., Buckwalter, P., Chynoweth, J. G., & Jones, M. (2000). Nonword repetition performance in school-age children with and without language impairment. Journal of Speech, Language, and Hearing Research, 43(4), 865–878. Available online: http://www.ncbi.nlm.nih.gov/pubmed/11386474 (accessed on 28 March 2024). [CrossRef] [PubMed]
Werker, J. F. (2018). Perceptual beginnings to language acquisition. Applied Psycholinguistics, 39(4), 703–728. [Google Scholar] [CrossRef]
Werker, J. F. (2024). Phonetic perceptual reorganization across the first year of life: Looking back. Infant Behavior & Development, 75, 101935. [Google Scholar] [CrossRef]
Werker, J. F., Shi, R., Desjardins, R. N., Pegg, J. E., Polka, L., & Patterson, M. L. (1998). Three methods for testing infant speech perception. In A. M. Slater (Ed.), Perceptual development: Visual, auditory, and speech perception in infancy (pp. 389–420). UCL Press. [Google Scholar]
Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7(1), 49–63. [Google Scholar] [CrossRef]
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. Available online: https://ggplot2.tidyverse.org (accessed on 11 February 2024).
Zamboni, A. (1988). 270. ltalienisch: Areallinguistik IV. veneto. In G. Holtus, M. Metzeltin, & C. Schmitt (Eds.), Lexicon der romanistischen linguistik (pp. 517–538). Niemeyer. [Google Scholar]
Zanobini, M., Viterbori, P., & Saraceno, F. (2012). Phonology and language development in Italian children: An analysis of production and accuracy. Journal of Speech, Language, and Hearing Research, 55(1), 16–31. [Google Scholar] [CrossRef] [PubMed]
Zmarich, C. (2008). L’emergere dei suoni dell’italiano in una prospettiva interlinguistica. In G. Marotta, & L. Costamagna (Eds.), Acquisizione linguistica e teorie fonologiche (pp. 43–65). Pacini. [Google Scholar]
Zmarich, C., Bonato, E., Bovo, R., Galatà, V., & Pinton, A. (2019). A new phonological discrimination test for children aged 48–72 months. In D. Piccardi, F. Ardolino, & S. Calamai (Eds.), Audio archives at the crossroads of speech sciences, digital humanities and digital heritage (Vol. 6, pp. 385–406). Officinaventuno. Studi AISV. [Google Scholar] [CrossRef]
Zmarich, C., Bonichini, S., Motterle, M., Palmieri, M., Sanfelici, E., & Bonifacio, S. (2025). Test fonetico per la prima infanzia (TFPI): A new instrument to assess Italian toddlers’ phonetic development. Languages, 10(1), 15. [Google Scholar] [CrossRef]
Zmarich, C., Bortone, E., Vayra, M., & Galatà, V. (2013). La coarticolazione e il VOT nello sviluppo fonetico: Studio sperimentale su bambini dai 42 ai 47 mesi d’età. In V. Galatà (Ed.), Multimodalità e multilingualità: La sfida più avanzata della comunicazione orale (Vol. 9, pp. 475–493). Bulzoni Editore. [Google Scholar]
Zmarich, C., Condi, A., Bonifacio, S., Busà, M. G., Colavolpe, B., Gaiotto, M., & Olivucci, F. (2021). Coarticulation and VOT in an Italian child from 18 to 48 months of age. In C. Bernardasci, D. Dipino, D. Garassino, S. Negrinelli, E. Pellegrino, & S. Schmid (Eds.), L’individualità del parlante nelle scienze fonetiche: Applicazioni tecnologiche e forensi (Vol. 8, pp. 413–435). Officinaventuno. Studi AISV. [Google Scholar] [CrossRef]
Zmarich, C., Dispaldro, M., Rinaldi, P., & Caselli, M. C. (2011). Caratteristiche fonetiche del “Primo Vocabolario del Bambino”. Psicologia Clinica dello Sviluppo, 15(1), 235–256. [Google Scholar]
Zmarich, C., Fava, I., Monego, G., & Bonifacio, S. (2012). Verso un “Test fonetico per la prima infanzia”. In M. Falcone, & A. Paoloni (Eds.), La voce nelle applicazioni (Vol. 8, pp. 51–66). Bulzoni Editore. [Google Scholar]
Zmarich, C., Pinton, A., & Lena, L. (2014). Lo sviluppo fonetico-fonologico nell’acquisizione di L1 e L2. In M. C. Caselli, & L. Marotta (Eds.), I disturbi di linguaggio: Caratteristiche, valutazione, trattamento (pp. 87–124). Edizioni Erickson. [Google Scholar]

Figure 1. Repetition accuracy of IPA targets in word-initial position by age group in the NWR task. Dots represent the children’s group-level mean accuracy within each IPA target; error bars indicate the standard error of the mean.

Figure 2. Repetition accuracy of IPA targets in word-medial position by age group in the NWR task. Dots represent the children’s group-level mean accuracy within each IPA target; error bars indicate the standard error of the mean.

Figure 3. Children’s NWR performances according to matching nonwords (i.e., productions matching the target items), PCC (i.e., Percent Consonants Correct), and PCC_NoEpen (i.e., a PCC score calculated without epenthesized consonants). All scores are provided as numbers on a percentage scale and grouped according to age group (in months). For each score, the boxplots in the figure provide the minimum, maximum, sample median, first quartile (Q1), and third quartile (Q3).

Figure 4. Featural analysis of group-level mean substitutions in the NWR task by position and age group. Error bars indicate standard error of the mean. Note: PCC_NoEpen (i.e., consonants produced correctly) and deletions are not included in the plot; pla-man-voi = place-manner-voicing.

Figure 5. Group-level average percent substitutions for devoicing and voicing processes affecting the phonological system, organized by position of the affected consonants and age group. Error bars indicate the standard error of the mean.

Figure 6. Group-level average percent substitutions for other processes affecting the phonological system, organized by position of the affected consonants and age group. Error bars indicate the standard error of the mean. Note: fricstop = fricative stopping; delat = delateralization; fricat = fricativization.

Figure 7. Group-level average percent substitutions for each phonological process affecting the phonotactic structure, organized by position of the affected consonants and age group. Error bars indicate the standard error of the mean.

Figure 8. A-prime scores computed for each child in block 1 (n = 104 children), block 2 (n = 79 children), and block 3 (n = 50 children) of the NWD task, plotted against discrimination accuracy (expressed as a proportion); the black dashed line represents the chance-level cut-off below which a child was considered unreliable and thus excluded from the NWD dataset. Note: data points on the plot are jittered for aesthetic reasons.
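As a reading aid for Figure 8, the following minimal Python sketch shows one conventional way to obtain A-prime and d-prime from a child's same–different responses. It is an illustration under stated assumptions, not the authors' pipeline: a "hit" is taken to be a "different" response to a different pair, a "false alarm" is a "different" response to a same pair, A-prime uses the commonly cited non-parametric formula, and the log-linear correction for extreme rates, the function names, and the example counts are all hypothetical.

```python
from statistics import NormalDist


def corrected_rates(hits, misses, false_alarms, correct_rejections):
    """Return (hit rate, false-alarm rate) with a log-linear correction.

    The correction (add 0.5 to each count, 1 to each total) is an assumption
    of this sketch; it keeps both rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return hit_rate, fa_rate


def d_prime(hit_rate, fa_rate):
    """Parametric sensitivity: z(hit rate) minus z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def a_prime(hit_rate, fa_rate):
    """Non-parametric sensitivity; 0.5 is chance, 1.0 is perfect."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance responding: mirror image of the formula above.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


# Hypothetical child: 18 hits, 2 misses, 3 false alarms, 17 correct rejections.
h, f = corrected_rates(18, 2, 3, 17)
print(f"A-prime = {a_prime(h, f):.3f}, d-prime = {d_prime(h, f):.3f}")
```

Under this formula, an A-prime of 0.5 corresponds to chance-level responding and 1.0 to perfect discrimination.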

Figure 9. Group-level mean discrimination accuracy scores by feature change (i.e., length, manner, place, place and manner, and voicing) in the contrasting consonants and by the contrasting consonant’s position in the nonword (i.e., word-initial and word-medial). Data are pooled by age group. Error bars represent the standard error of the mean.

Figure 10. Model 1’s predicted probabilities of correct discrimination for nonword pairs containing contrasts differing by a given feature (manner, place, place_manner, and voicing) and contrast position in the nonword (initial and medial) as a function of age (in months).

Figure 11. Group-level mean discrimination accuracy scores according to the number of distinctive acoustic features (i.e., 1, 2, 3, or 4) that differentiate the contrasting consonants in each pair of nonwords and according to the contrasting consonant’s position in the nonword (i.e., word-initial and word-medial). Data are pooled by age group in months. Error bars represent the standard error of the mean.

Figure 12. Model 2’s predicted probabilities of correct discrimination for nonword pairs containing contrasts differing by a given number of acoustic features (1, 2, 3, and 4) and contrast position in the nonword (initial and medial) as a function of age (in months).

Table 1. Number of children retained for the current study according to age group and gender; mean age is given in months.

Age Group	Age Group (Months)	Gender	Subjects	Mean Age (Months)	Standard Deviation (Months)
3;0–3;5	36–41	f	5	37.60	1.14
3;0–3;5	36–41	m	6	38.67	1.63
3;6–3;11	42–47	f	14	44.93	1.69
3;6–3;11	42–47	m	13	44.77	1.74
4;0–4;5	48–53	f	13	50.77	1.69
4;0–4;5	48–53	m	12	50.58	1.83
4;6–4;11	54–59	f	4	55.50	2.38
4;6–4;11	54–59	m	5	56.80	2.17
5;0–5;5	60–65	f	9	62.89	1.76
5;0–5;5	60–65	m	11	62.27	1.42
>5;6 *	>66	f	5	68.40	3.36
>5;6 *	>66	m	7	70.14	3.44
		All	104	52.77	9.79

* Includes 3 older children originally in the 6;0–6;5 age group.
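The "All" row of Table 1 can be cross-checked from the subgroup rows: the overall mean age should equal the mean of the subgroup means weighted by subgroup size. The short Python sketch below does this arithmetic with the rounded values printed in the table; the variable names are illustrative, and small discrepancies from rounding are to be expected.

```python
# (subjects, mean age in months) for each age-group-by-gender row of Table 1.
rows = [
    (5, 37.60), (6, 38.67),    # 3;0-3;5  f, m
    (14, 44.93), (13, 44.77),  # 3;6-3;11 f, m
    (13, 50.77), (12, 50.58),  # 4;0-4;5  f, m
    (4, 55.50), (5, 56.80),    # 4;6-4;11 f, m
    (9, 62.89), (11, 62.27),   # 5;0-5;5  f, m
    (5, 68.40), (7, 70.14),    # >5;6     f, m
]

total_children = sum(n for n, _ in rows)
# Weighted mean: each subgroup mean counts once per child in that subgroup.
pooled_mean_age = sum(n * mean for n, mean in rows) / total_children

print(f"children = {total_children}, pooled mean age = {pooled_mean_age:.2f} months")
```

The standard deviation in the "All" row cannot be recovered this way without also accounting for the spread between subgroup means, so it is not recomputed here.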

Table 2. Anova() output showing the significant effect of the predictors included in the final model (Model 1) fitted to predict correct discrimination in the NWD task.

Predictor	Chisq	Df	Pr(>Chisq)
(Intercept)	11.06	1	0.00088 ***
age	11.41	1	0.00073 ***
features	21.13	3	0.00010 ***
position	9.15	1	0.00249 **

Note: Significance levels: p < 0.01 = **, p < 0.001 = ***.
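The last column of Table 2 is the upper-tail probability of a chi-square distribution with the stated degrees of freedom. As a rough consistency check, the Python sketch below recomputes these probabilities from the rounded Chisq values using closed-form expressions that hold for 1 and 3 degrees of freedom only. The function name is hypothetical, and because the inputs are rounded, agreement with the table is expected only to about the last printed digit.

```python
import math


def chi2_upper_tail(chisq, df):
    """P(X > chisq) for a chi-square variable; this sketch covers df 1 and 3."""
    if chisq < 0:
        raise ValueError("chisq must be non-negative")
    # df = 1: a chi-square(1) variable is a squared standard normal.
    tail_df1 = math.erfc(math.sqrt(chisq / 2.0))
    if df == 1:
        return tail_df1
    if df == 3:
        # df = 3 adds one term from the incomplete-gamma recurrence.
        return tail_df1 + math.sqrt(2.0 * chisq / math.pi) * math.exp(-chisq / 2.0)
    raise ValueError("this sketch only covers df = 1 and df = 3")


# Rounded statistics copied from Table 2: (predictor, Chisq, Df).
table2 = [
    ("(Intercept)", 11.06, 1),
    ("age", 11.41, 1),
    ("features", 21.13, 3),
    ("position", 9.15, 1),
]

for predictor, chisq, df in table2:
    print(f"{predictor}: p = {chi2_upper_tail(chisq, df):.5f}")
```

The same function can be applied to the rows of Table 3, which also use 1 or 3 degrees of freedom.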

Table 3. Anova() output showing the significant effect of the predictors included in the final model (Model 2) fitted to predict correct discrimination in the NWD task.

Predictor	Chisq	Df	Pr(>Chisq)
(Intercept)	7.97	1	0.00477 **
age	11.35	1	0.00076 ***
num_features	10.73	3	0.01327 *
position	4.54	1	0.03313 *

Note: Significance levels: p < 0.05 = *, p < 0.01 = **, p < 0.001 = ***.

Table 4. Spearman’s correlation matrix of substitutions in the NWR task and failed discrimination in the NWD task, analyzed in terms of features.

	Age	Length NWD (%)	Manner NWD (%)	Voicing NWD (%)	Place NWD (%)	Place-Manner NWD (%)	Length NWR (%)	Manner NWR (%)	Place NWR (%)	Place-Manner NWR (%)
age
length NWD (%)	−0.177
manner NWD (%)	−0.144	0.235
voicing NWD (%)	−0.315 *	0.240	0.538 ***
place NWD (%)	−0.192	0.331 **	0.564 ***	0.460 ***
place-manner NWD (%)	−0.286 *	0.184	0.613 ***	0.532 ***	0.748 ***
length NWR (%)	−0.068	−0.060	−0.010	0.117	0.021	0.022
manner NWR (%)	−0.387 **	0.173	0.264 *	0.318 *	0.213	0.279 *	0.120
place NWR (%)	−0.536 ***	−0.021	0.026	0.122	0.105	0.134	0.005	0.173
place-manner NWR (%)	−0.441 ***	−0.153	0.126	0.225	0.170	0.202	0.026	0.243	0.483 ***
voicing NWR (%)	−0.179	0.171	0.077	0.256 *	0.140	0.151	−0.026	0.062	0.081	0.096

Note: Correlations were computed using Spearman’s method with listwise deletion. Significance levels were adjusted for multiple correlations using the Benjamini–Hochberg procedure. Significance levels: p < 0.05 = *, p < 0.01 = **, p < 0.001 = ***.
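To illustrate the Benjamini–Hochberg adjustment mentioned in the note to Table 4, here is a minimal Python sketch of the standard step-up procedure. It is not the authors' code: the function name and the example p-values are hypothetical, and the raw p-values behind Table 4 are not reported in this section. The logic is intended to mirror what R's `p.adjust(p, method = "BH")` computes.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Visit p-values from largest to smallest so that a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # ascending rank of this p-value (1 = smallest)
        candidate = p_values[index] * m / rank
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values from four correlation tests.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f} -> adjusted p = {q:.3f}")
```

A correlation is then flagged at a given level (for example 0.05) when its adjusted value falls below that level.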

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

