Perception and Interpretation of Contrastive Pitch Accent During Spoken Language Processing in Autistic Children

Pumpki Lei Su; Duane G. Watson; Stephen Camarata; James Bodfish

doi:10.3390/languages10070161

¹

Department of Speech, Language, and Hearing, School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX 75080, USA

²

Callier Center for Communication Disorders, The University of Texas at Dallas, Dallas, TX 75235, USA

³

Department of Psychology and Human Development, Peabody College, Vanderbilt University, Nashville, TN 37203, USA

⁴

Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN 37232, USA

Languages 2025, 10(7), 161; https://doi.org/10.3390/languages10070161

This article belongs to the Special Issue Advances in the Acquisition of Prosody

Abstract

Although prosodic differences in autistic individuals have been widely documented, little is known about their ability to perceive and interpret specific prosodic features, such as contrastive pitch accent, a prosodic signal that places emphasis and helps listeners distinguish between competing referents in discourse. This study addresses that gap by investigating (1) the extent to which autistic children can perceive contrastive pitch accent (i.e., discriminate contrastive pitch accent differences in speech); (2) the extent to which they can interpret contrastive pitch accent (i.e., use prosodic cues to guide real-time language comprehension); and (3) the extent to which their ability to interpret contrastive pitch accent is associated with broader language and social communication skills, including receptive prosody, pragmatic language, social communication, and autism severity. Twenty-four autistic children and 24 neurotypical children aged 8 to 14 completed an AX same–different task and a visual-world paradigm task to assess their ability to perceive and interpret contrastive pitch accent. Autistic children demonstrated the ability to perceive and interpret contrastive pitch accent, as evidenced by discrimination ability comparable to that of neurotypical peers on the AX task and real-time revision of visual attention based on prosodic cues in the visual-world paradigm. However, autistic children showed significantly slower reaction times during the AX task, and a subgroup of autistic children with language impairment showed significantly slower processing of contrastive pitch accent during the visual-world paradigm task. Additionally, speed of contrastive pitch accent processing was significantly associated with pragmatic language skills and autism symptom severity in autistic children.
Overall, these findings suggest that while autistic children as a group are able to discriminate prosodic forms and interpret the pragmatic function of contrastive pitch accent during spoken language comprehension, differences in prosody processing in autistic children may be reflected not in accuracy, but in speed of processing measures and in specific subgroups defined by language ability.

Keywords:

autism spectrum disorder; prosody; language impairment; pragmatic language; eye-tracking

1. Introduction

Prosody, the rhythm and melodic contour of speech, plays a crucial role in spoken language by signaling sentence structure, emphasizing important information, and expressing a speaker’s intent and emotion (Cole, 2015). As a key component of pragmatic language (i.e., social use of language), prosody supports social communication in several ways. It helps speakers organize information and highlight key elements in discourse and helps listeners interpret the spoken message more accurately and efficiently (Couper-Kuhlen, 2009). Because of prosody’s role in language and social communication, prosody has emerged as a particularly relevant area of study in autism. Autistic children often show expressive and receptive prosodic differences that influence their everyday social interactions (Grice et al., 2023; McCann & Peppé, 2003). While previous studies have documented how autistic children produce prosody (see Asghari et al., 2021, for a recent systematic review), fewer have examined how they perceive and interpret prosodic cues during communication (Grice et al., 2023). This study addresses that gap by focusing on autistic children’s perception and interpretation of contrastive pitch accent—a prosodic cue that signals emphasis and helps clarify a speaker’s intended meaning.

1.1. Prosody Plays an Important Role in Spoken Language Communication

Spoken language conveys not only words and sentences but also a wide range of other information, such as intonation, rhythm, stress, timing, and tone. These features are collectively referred to as prosody and are defined as suprasegmental features of language (Lehiste, 1970; Wagner & Watson, 2010). Prior research on prosody frequently distinguishes between prosodic form and prosodic function (Hirst, 2005; Peppé et al., 2007). The form aspect of prosody pertains to the perception (e.g., auditory discrimination) and production (e.g., vocal imitation) of prosodic elements. In contrast, the function aspect involves the cognitive and linguistic processing of the communicative roles that prosody fulfills in spoken language, including grammatical, pragmatic, and affective/emotional functions. Both form and function aspects of prosody contribute to spoken language comprehension and successful communication by providing cues about the speaker’s structural organization of information, intent, and affect and by shaping the listener’s interpretation of the speaker’s message.

One important pragmatic function of prosody is to enhance or modulate information beyond the literal meaning of an utterance. For instance, contrastive pitch accent (also referred to as focus or prominence) is acoustically associated with longer duration, greater amplitude, and pitch movement on the stressed syllable and is often used to direct the listener’s attention to a piece of salient information in a discourse context or to contrast a piece of information with possible alternatives (Bolinger, 1961; Pierrehumbert & Hirschberg, 1990; Watson et al., 2006). Given that social communication involves not only exchanging the content of messages but also inferring the intentions of others, the ability to perceive contrastive pitch accent at the acoustic-form level and to understand its pragmatic function is essential for effective communication. Listeners rely on pitch accents to recognize shifts in conversational focus, discern which elements of a message are most significant, develop a more nuanced understanding of the speaker’s message, and generate appropriate responses during a social interaction.

1.2. Prosodic Differences in Autism

Prosodic differences have been frequently identified in autistic children and adults (Kanner, 1943; McCann & Peppé, 2003). In the earliest description of autism by Kanner (1943), unusual speech patterns, marked by odd intonation, were highlighted in several of the eleven cases reported. Recent systematic reviews on prosody in autism painted a complex picture of prosodic characteristics in autistic individuals (Asghari et al., 2021; Grice et al., 2023; M. Zhang et al., 2022). Asghari et al. (2021) conducted a meta-analysis that includes 39 articles on expressive prosodic forms in autistic individuals’ vocal production. Findings indicate that autistic individuals exhibited a significantly higher mean pitch, larger pitch range, greater pitch variability, and longer voice duration compared to neurotypical individuals. While these findings suggest that autistic individuals may exhibit distinct prosodic forms in their speech patterns, this study did not review whether these form-level differences have any functional impact on autistic individuals’ overall social communication.

Beyond production of prosody, studies on the perception of prosody in autism offer additional insights, particularly when separating low-level acoustic-perceptual processing from higher-level linguistic comprehension. A review by O’Connor (2012) indicates that autistic individuals often perform comparably to, or even outperform, neurotypical peers on low-level pitch perception tasks such as pitch discrimination and categorization, but differences may arise when prosodic information must be integrated with higher-level linguistic comprehension or communicative interpretation. For example, autistic children were shown to be able to accurately match pitch contours to graphic representations for both speech and music stimuli but struggled when asked to answer questions about the meaning of the sentences (Järvinen-Pasley et al., 2008b). These findings suggest that observed differences in prosody perception among autistic individuals may stem not from difficulties in perceiving acoustic-perceptual features, but from challenges in linking those features to their communicative or linguistic functions.

A recent and comprehensive review by Grice et al. (2023) focused specifically on linguistic prosody and synthesized findings from 51 articles that examined autistic individuals’ ability to produce and interpret prosodic cues that serve linguistic functions. While differences in both modalities were reported across studies, the authors noted stronger evidence for differences in the interpretation of context-dependent aspects of prosody than of rule-governed aspects of prosody. Specifically, autistic individuals show less consistent differences in grammatical prosody (e.g., using pitch accent to indicate lexical stress, using intonation to indicate question vs. statement, or using pause to indicate syntactic boundary) and more pronounced differences in pragmatic prosody (e.g., using contrastive pitch accent to highlight a word) and affective prosody (e.g., using intonation to indicate intention or emotional state). This pattern echoes broader patterns observed in the language profile of autistic individuals, in which structural language abilities, such as vocabulary and grammar, are often relatively preserved, while pragmatic language skills are more vulnerable (Boucher, 2012). It is not surprising that autistic individuals have less difficulty with grammatical prosody, which is more rule-based, than with pragmatic or affective prosody, which is highly dependent on discourse and social communication contexts. As such, prosodic differences in autism are best understood not as a uniform deficit but as differences that vary with the specific function that prosody serves.

1.3. Contrastive Pitch Accent in Autistic Children and Adults

The current study focuses on autistic children’s ability to perceive (at the acoustic-form level) and interpret (at the cognitive-linguistic function level) contrastive pitch accent. Previous studies have suggested that contrastive pitch accents may be processed or used differently by autistic individuals, as they serve pragmatic functions by signaling to the listener that the accented item is intended over a possible alternative (Pierrehumbert & Hirschberg, 1990; Watson et al., 2008). The majority of studies on the production of contrastive pitch accents found significant differences in children’s ability to use contrastive pitch accent across autistic and neurotypical groups (Baltaxe, 1984; DePape et al., 2012; Paul et al., 2005a; Peppé et al., 2007). Autistic children performed significantly worse than neurotypical children in tasks that elicit the production of contrastive pitch accents (Baltaxe, 1984; Paul et al., 2005a; Peppé et al., 2007). In one study, autistic children showed the ability to produce contrastive pitch accents, but the placement of the stress was incorrect given the discourse context, and the stress produced was more ambiguous than the stress produced by neurotypical children (Peppé et al., 2007). Two other studies reported no group differences in the production of contrastive pitch accents but found that autistic children differed from neurotypical children in the specific acoustic manifestation of the stress (Diehl & Paul, 2013; Nadig & Shaw, 2015).

Fewer studies have examined the perception or interpretation of contrastive pitch accent in autism, with existing studies reporting conflicting findings. Several studies reported significantly worse performance in autistic children in receptive contrastive pitch accent tasks (Diehl & Paul, 2013; Paul et al., 2005a; Peppé et al., 2007). In contrast, Globerson et al. (2015) found no significant group differences in pitch discrimination or contrastive pitch accent interpretation tasks when autistic and neurotypical adults were matched on verbal ability. These mixed results have led some researchers to suggest that difficulties with receptive prosody may be more closely related to individual cognitive and linguistic profiles than to autism itself (Ong et al., 2024). For example, Ong et al. (2024) demonstrated that verbal and nonverbal intelligence were significantly associated with pitch perception performance and moderated group differences between autistic and neurotypical individuals. Similarly, Lyons et al. (2014) found that language ability was the strongest predictor of prosodic perception in autistic adolescents.

These findings raise important methodological considerations. Many prior studies on receptive prosody in autism have used behavioral tasks with high cognitive and linguistic demands. Participants were often required to follow complex, multi-step verbal instructions and respond to specific prompts to demonstrate their ability to comprehend prosodic cues. Given that intellectual disability and language impairment are common in autistic children (Kjelgaard & Tager-Flusberg, 2001; Shaw et al., 2025; Tager-Flusberg, 2006), it is important to consider the cognitive and linguistic demands of tasks used to measure receptive contrastive pitch accent in autistic children. A growing body of literature on the neural processing of prosody using passive listening tasks provides converging evidence that differences in prosodic processing may emerge even in the absence of such demands. Autistic children and adolescents showed differences in neural pitch tracking as measured by frequency-following responses (FFRs), including reduced pitch strength, reduced phase locking, and increased frequency and slope errors (Patel et al., 2023b; Russo et al., 2008). Other studies have reported attenuated neural responses to prosodic cues in the context of spoken language in autistic children. Autistic children showed reduced mismatch negativity amplitudes in response to lexical stress contrasts (J. Zhang et al., 2018). Similarly, Lindström et al. (2018) reported that autistic children were slower in discriminating emotional prosodic cues and exhibited diminished mismatch negativity and late discriminative negativity amplitudes.
Together, these findings suggest that prosodic differences in autism may stem from underlying processing differences that exist independently of cognitive or language abilities and underscore the importance of using experimental tasks that minimize cognitive and linguistic demands to more precisely isolate prosodic processing differences in autism—an approach adopted in the present study.

1.4. The Current Study

This study focuses on autistic children’s ability to perceive and interpret the pragmatic function that contrastive pitch accent serves during spoken language processing. Our first research question asks to what extent autistic children can discriminate contrastive pitch accent at the acoustic-form level. Discrimination of acoustic forms of contrastive pitch accent is a prerequisite for interpreting the pragmatic function it serves in communication. To assess this, we used an AX same–different discrimination task (Gerrits & Schouten, 2004), in which participants indicated whether pairs of stimuli with various contrastive pitch accent contours sounded the same or different. This task is similar to the Discrimination subtest included in the Profiling Elements of Prosodic Systems-Children (PEPS-C; Peppé & McCann, 2003), a widely used assessment in previous studies on prosody in autistic children. Including this task allows us to replicate and extend prior findings from the PEPS-C literature, which have shown that autistic children often perform similarly to neurotypical peers on basic prosodic discrimination tasks (Järvinen-Pasley et al., 2008a; Patel et al., 2023a). The AX same–different task enables us to isolate sensitivity to form-level acoustic contrasts without imposing linguistic processing demands. In addition to discrimination ability, we examined participants’ reaction time as a measure of the speed of prosodic perception. Based on empirical findings showing that autistic individuals often perform comparably to or better than neurotypical peers on low-level acoustic tasks (e.g., O’Connor, 2012) and on theoretical accounts suggesting a bias toward perceptual-level processing in autism (Järvinen-Pasley et al., 2008b; Mottron et al., 2006), we hypothesize that autistic children will not differ significantly from neurotypical peers in their discrimination ability or reaction time.

Our second research question tests the extent to which autistic children can interpret the pragmatic function of contrastive pitch accent during spoken language processing. To test this research question, we used an eye-tracking paradigm, the visual-world paradigm (Tanenhaus et al., 1995). In a typical visual-world paradigm task, participants look at an experimental display with various numbers of objects while listening to sentences. Participants’ eye movements are analyzed to understand the effect of the experimental manipulation on participants’ real-time language processing. Using the visual-world paradigm to measure prosody processing in autistic children has several advantages over other behavioral paradigms as it only requires that participants look at a target item and thus presents relatively low task demands.

Previous work with neurotypical children has shown that contextually appropriate use of contrastive pitch accent can facilitate listeners’ spoken language comprehension and accelerate their visual search for a target referent, an effect termed the anticipatory effect (Ito & Speer, 2008). In contrast, contextually inappropriate use of contrastive pitch accent has been found to mislead listeners’ processing and delay their visual search, which we call a garden-path effect (Arnold, 2008; Dahan et al., 2002; Ito & Speer, 2008). Ito et al. (2014) used a visual-world paradigm task to test neurotypical children’s ability to interpret contrastive pitch accents. Neurotypical children demonstrated both an anticipatory effect (i.e., correctly anticipating the referent, green cat, upon hearing sentences such as “Look at the pink cat. Now look at the GREEN cat.”) and a garden-path effect (i.e., looking at the wrong referent, green rabbit, upon hearing sentences such as “Look at the pink rabbit. Now look at the GREEN monkey.”). Our second research question builds on findings from Ito et al. (2014) and extends them to autistic children. In light of two related bodies of literature, we predict that autistic children may interpret contrastive pitch accent cues differently during real-time spoken language comprehension. First, prior research has shown that autistic children perform differently on receptive contrastive pitch accent tasks (Diehl & Paul, 2013; Paul et al., 2005a; Peppé et al., 2007; Segal et al., 2017). Second, other studies have reported that autistic children are less likely to use prosodic cues to guide referential interpretation (Diehl et al., 2015; Zhou et al., 2019). Based on these findings, we hypothesize that autistic children will show attenuated or absent anticipatory and garden-path effects relative to neurotypical children, operationalized as slower or absent shifts in their visual attention in response to contrastive pitch accent cues.

Our third research question examines the extent to which individual differences in autistic children’s ability to interpret contrastive pitch accent are associated with broader language and social communication skills, including receptive prosody, pragmatic language, social communication, and autism severity. Two previous studies reported significant associations between autistic children’s prosodic ability and broader skills (Patel et al., 2023a; Paul et al., 2005b). Specifically, Patel et al. (2023a) found that poorer contrastive pitch accent understanding, as measured by the PEPS-C, was linked to greater autism symptom severity and more pronounced pragmatic language difficulties. Our study builds on prior work that predominantly used behavioral tasks by asking whether similar relationships emerge using eye-tracking data, which provide a more fine-grained, real-time measure of prosody processing that may capture subtle individual differences in prosodic interpretation as they unfold. Interpreting contrastive pitch accent cues effectively supports real-time coordination between speaker intent and listener inference. Differences in using prosody during spoken language comprehension may impact not only moment-to-moment processing but also broader communicative outcomes in dynamic social contexts. As such, we hypothesize that autistic children who are more efficient at using prosodic cues to guide their visual search during spoken language comprehension will demonstrate stronger receptive prosody, pragmatic language, and social communication skills, as well as milder autism symptoms.

2. Materials and Methods

2.1. Overview of Study Design

Autistic children and neurotypical children between 8 and 14 years of age participated in two experimental tasks. The study protocol was approved by the Vanderbilt University Institutional Review Board (IRB#180227). Parental consent and child assent were obtained prior to any study-related activities. For the first research question, participants completed an AX same–different discrimination task that tested their ability to perceive and discriminate contrastive pitch accent at the acoustic-form level. For the second research question, participants completed a visual-world paradigm task that examined their ability to interpret contrastive pitch accent at the pragmatic function level. Lastly, for the third research question, participants completed a battery of clinical assessments of receptive prosody, pragmatic language, social communication, and autism severity so that we could examine the relations between participants’ ability to interpret contrastive pitch accent during spoken language comprehension and broader skills.

2.2. Participants

Forty-eight children between 8 and 14 years of age were recruited (n = 24 in each group). The inclusion criteria for neurotypical children were (a) being a native English speaker based on parent report, (b) having no existing diagnosis of visual, hearing, neurological, or cognitive impairment per parent report, and (c) scoring under 15 (the cutoff for autism) on the Social Communication Questionnaire (Rutter et al., 2003). We directly assessed neurotypical participants’ cognitive functioning using the Stanford–Binet Intelligence Scales, Fifth Edition (SB-5; Roid, 2003) and confirmed that all children in the neurotypical group demonstrated cognitive performance within the typical range. Autistic children were eligible for this study if they were native English speakers based on parent report, had a confirmed diagnosis of autism based on the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2; Lord et al., 2012), had an Intelligence Quotient (IQ) above 70 based on the SB-5, and had no visual or hearing impairment. Six of the twenty-four autistic participants were excluded because they did not meet the cognitive criterion. The final sample included 24 neurotypical children (14 males, 10 females; M_age = 11.61 years; 18 White, 1 Black, 2 Asian, 3 Biracial) and 18 autistic children (16 males, 2 females; M_age = 11.04 years; 13 White, 2 Black, 1 Asian, 1 Biracial, and 1 Not Reported).

Participants were matched on age, overall IQ, nonverbal IQ, and verbal IQ (Table 1). We also measured participants’ language ability using the Clinical Evaluation of Language Fundamentals, Fifth Edition (CELF-5; Wiig & Secord, 2013) to characterize our sample rather than as an exclusion criterion. A significant difference in language ability was detected between the neurotypical and autistic groups. Additional post hoc analyses were conducted to test the effect of language on the interpretation of contrastive pitch accent and are reported in the Results.

Table 1. Sample Characteristics.

Additionally, the sex distribution in our sample was not balanced across groups. This pattern is consistent with the broader autism diagnostic landscape, where boys are more frequently identified than girls, with an estimated ratio of approximately 3.4:1 (Shaw et al., 2025). Autistic girls are also more likely to be diagnosed later, which may limit their representation in research samples during childhood (Lockwood Estrin et al., 2021). While this imbalance reflects population trends, we conducted follow-up analyses to examine whether biological sex influenced the results.

2.3. Experimental Tasks

2.3.1. AX Same–Different Discrimination Task

To test the extent to which participants can discriminate contrastive pitch accent at the acoustic-form level, which we consider a prerequisite for understanding the pragmatic function of contrastive pitch accent, participants first completed an AX same–different discrimination task (Gerrits & Schouten, 2004). This task was programmed and implemented in PsychoPy (Peirce, 2007). Participants listened to 16 test trials, each presenting two acoustic stimuli, and were instructed to press one of two keys on a computer keyboard to indicate whether the two stimuli were the same or different. All acoustic stimuli followed the same structure: “the” + adjective + noun (e.g., the sunny morning, the hot summer). Each pair contained the same phrase, but the prenominal adjective was manipulated so that eight trials had a pair with identical contrastive pitch accent patterns (e.g., the SUNNY morning and the SUNNY morning, with capitalized words denoting the presence of a contrastive pitch accent) and eight trials had a pair with different patterns (e.g., the SUNNY morning vs. the sunny morning). Participants were provided with two examples and four practice trials before the test trials. Consistent with previous studies that used AX discrimination tasks (Gerrits & Schouten, 2004; Qin et al., 2022), we used the d-prime metric (d’) as a measure of participants’ ability to discriminate contrastive pitch accent. D’ is often considered a better metric than accuracy rates because it is not affected by a participant’s bias to answer one way or the other (Hautus et al., 2021). We also report accuracy to aid interpretation and comparability. Participants’ reaction time was used as a measure of the processing speed for perceiving prosodic form differences.
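As a rough illustration of how a bias-free sensitivity index can be derived from the 16 AX trials, the sketch below computes d’ as the difference between the z-transformed hit rate ("different" responses on different pairs) and false-alarm rate ("different" responses on same pairs). The function name, the yes/no-style formula, and the log-linear correction for perfect scores are assumptions made for illustration; the text does not state which same–different decision model or correction was applied.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Yes/no-style d' for a same-different task.

    Assumption (not from the paper): a log-linear correction adds 0.5 to
    each count and 1 to each trial total, so that rates of exactly 0 or 1
    do not produce infinite z-scores.
    """
    n_different = hits + misses                  # trials with different pairs
    n_same = false_alarms + correct_rejections   # trials with identical pairs
    hit_rate = (hits + 0.5) / (n_different + 1)
    fa_rate = (false_alarms + 0.5) / (n_same + 1)
    z = NormalDist().inv_cdf                     # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 7 of 8 "different" pairs detected,
# 1 of 8 "same" pairs wrongly judged as different.
sensitivity = d_prime(hits=7, misses=1, false_alarms=1, correct_rejections=7)
print(round(sensitivity, 2))
```

A participant who answers "different" on half of each trial type obtains a d’ of zero under this formula regardless of overall response bias, which is the property that motivates preferring d’ over raw accuracy.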

2.3.2. Visual-World Paradigm Task

To test participants’ ability to interpret contrastive pitch accent during spoken language processing, we designed a visual-world paradigm inspired by Ito et al. (2014). Participants watched a 19 min video with a total of 72 trials. In each trial, participants looked at a visual scene with 18 objects and heard two sentences (see Figure 1). Participants were instructed to look at two items. The assignment of contrastive pitch accents in the sentences was manipulated to establish either contextually appropriate or inappropriate conditions for contrastive pitch accent (elaborated in the Experimental Condition section below).

Figure 1. Schematic diagram for a sample trial in the visual-world task.

Visual Stimuli. The visual display for the 72 trials was prepared by combining 12 unique items from each of six categories (clothing items, household items, animals, furniture, office supplies, and fruit and vegetables) in four colors. For each trial, the visual display was divided into six cells, with each cell containing one unique item in three colors (see Figure 1 for an example). The six items on each slide were always drawn from the same category. Items were carefully chosen to avoid those commonly reported as being of special interest to autistic individuals (Sasson et al., 2008, 2012). Images of items were first tested in a pilot study with 24 adults and six children between 8 and 14 who did not take part in this study to confirm that the selected items were recognizable and familiar to participants. In the pilot study, participants were shown all selected images and were asked to name each one. Only images that were correctly labeled by all participants were included in the task. The combinations of items, colors, and positions were counterbalanced so that the number of appearances of each item, color, and position within the slide was the same across the entire stimulus set.

Auditory Stimuli. The auditory stimuli were recorded by a female native speaker of Mainstream American English at 44.1 kHz using Praat (Boersma & Weenink, 2023). The auditory stimuli for each trial consisted of a pair of sentences: a context sentence and a target sentence. The context sentence contained a prenominal adjective with a neutral accent, whereas the prenominal adjective in the target sentence was assigned either a contrastive pitch accent or a neutral accent, depending on the corresponding experimental condition. Acoustic analyses of the recorded stimuli were conducted to confirm that accented prenominal adjectives had significantly longer duration (M_contrastive = 475.08 ms, M_neutral = 282.66 ms, p < 0.001), higher F0 mean (M_contrastive = 228 Hz, M_neutral = 166 Hz, p < 0.001), and higher F0 peak (M_contrastive = 306 Hz, M_neutral = 187 Hz, p < 0.001) than neutral adjectives. The Tones and Break Indices (ToBI; Silverman et al., 1992) coding of recorded stimuli (see Figure 2 for examples) also confirmed that prenominal adjectives with contrastive pitch accents correspond to an L + H* annotation and prenominal adjectives that are not accented correspond to an L* annotation (Watson et al., 2008).

Figure 2. Example ToBI (tone and break indices) transcription of neutral and contrastive pitch accented prenominal adjectives. The blue line represents the pitch contour (F0) of the utterance extracted by Praat.

Once recorded, the sentences were edited so that the carrier phrases (i.e., “Look at the”) and the critical phrases (i.e., prenominal adjective and noun) were spliced out of their original context to create stimulus sentences. The same carrier phrase was used across conditions for each target item to ensure that any visual search patterns detected in this paradigm were solely due to the experimental manipulation of pitch accent patterns.

Experimental Conditions. This visual-world paradigm task has four critical conditions and one filler condition (see Table 2 for all conditions and examples of the context and target sentences): Appropriate—Accented (denoted as A), Appropriate—Neutral (B), Inappropriate—Accented (C), Inappropriate—Neutral (D), and Filler (F). In the two accented conditions, a contrastive pitch accent was assigned to the prenominal adjective in the target sentence to create either an appropriate context for the contrastive pitch accent (Condition A) or an inappropriate context (Condition C), whereas no contrastive pitch accent was assigned to the prenominal adjective in the target sentence in the two neutral control conditions (Conditions B and D). The filler trials were interspersed to prevent participants from anticipating the pitch accent patterns.

Table 2. Experimental conditions.

The 72 trials consist of 36 critical trials (9 critical trials for each experimental condition) and 36 filler trials. Half of the items in each of the six categories were randomly selected and assigned as targets in filler trials. The remaining 36 items were assigned to critical trials and were counterbalanced across four lists using a Latin Square design. Every list contained 72 unique items. The order of the trials was randomized in creating each list but was fixed for every use of that list. The presentation lists were counterbalanced across participants.
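The Latin Square counterbalancing described above can be sketched as follows. This is a minimal illustration that assumes a simple rotation scheme, in which item i in list k receives condition (i + k) mod 4; the authors' actual item-to-condition assignment is not given in the text, and the function and variable names are hypothetical.

```python
from collections import Counter

CONDITIONS = ["A", "B", "C", "D"]  # the four critical conditions


def latin_square_lists(n_items=36, conditions=CONDITIONS):
    """Rotate conditions across lists so that every critical item appears
    in every condition exactly once across the set of lists.

    Assumption: a plain cyclic rotation; other Latin Square layouts
    would satisfy the same constraints.
    """
    n_lists = len(conditions)
    lists = []
    for list_index in range(n_lists):
        assignment = {
            item: conditions[(item + list_index) % n_lists]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


lists = latin_square_lists()
# Each list holds 9 critical trials per condition.
print(Counter(lists[0].values()))
```

Under this scheme, a participant assigned to any one list hears each critical item in only one condition, while the four lists together cover every item in every condition.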

The key comparisons of interest are between Conditions A vs. B and C vs. D. Both Conditions A and B contain the same noun across the context sentence and the target sentence, thus creating an appropriate context to use a contrastive pitch accent to contrast the color modifier in Condition A. An anticipatory effect would be present if participants look at the target item faster in Condition A compared to Condition B. Conditions C and D contain different nouns across the context and target sentences, creating an inappropriate context to use a contrastive pitch accent in Condition C. A garden-path effect would be present if participants look at the target item slower in Condition C compared to Condition D.

Instrumentation. Participants sat in front of a Tobii X2 eye-tracker and a set of speakers. Participants’ eye movements were first calibrated using the Tobii Clear View 5-point calibration program. They were then instructed to look at pictures while listening to sentences that would ask them to look for specific items in each picture. Participants’ eye movements during the task were sampled at 60 Hz.

2.4. Assessment Measures

Participants completed the following assessments to examine the relations between participants’ ability to interpret contrastive pitch accent during the visual-world paradigm task and broader skills.

2.4.1. Profiling Elements of Prosody in Speech—Communication (PEPS-C)

The PEPS-C (Peppé & McCann, 2003) is a structured and computerized prosody assessment and has been used with both neurotypical and autistic children. It consists of 14 subtests, including seven expressive and seven receptive subtests. Two subtests assess prosodic ability at the level of form (auditory discrimination and imitation). Twelve subtests assess prosodic ability at the level of function, including turn-end understanding/expression, affect understanding/expression, lexical stress understanding/expression, phrasal stress understanding/expression, boundary understanding/expression, and focus understanding/expression. The receptive prosody composite, which is a sum of all receptive subtests, was used as an index of participants’ overall receptive prosodic ability.

2.4.2. Clinical Evaluation of Language Fundamentals—Fifth Edition Metalinguistics (CELF-5 Metalinguistics)

The CELF-5 Metalinguistics (Semel et al., 2014) is a standardized test that assesses an individual’s ability to make inferences, engage in discourse, and understand ambiguous or figurative language. Participants completed two subtests, Making Inferences and Conversation Skills, which were used to derive a Meta-Pragmatic Index score as a measure of their pragmatic language ability. In the Making Inferences subtest, participants were asked to interpret short paragraph-length vignettes by taking the communicative context into consideration. In the Conversation Skills subtest, participants were asked to express their intentions appropriately during a conversation given semantic and contextual constraints.

2.4.3. Social Responsiveness Scale—Second Edition (SRS-2)

The SRS-2 (Constantino & Gruber, 2012) is an extensively validated parent-report rating scale designed to characterize and quantify differences in social communication and interaction often associated with autism in children and adults. The score from the Social Communication subscale was used to index participants’ social communication.

2.4.4. Autism Diagnostic Observation Schedule—Second Edition (ADOS-2)

The ADOS-2 (Lord et al., 2012) is a semi-structured observational tool designed to diagnose autism and was used in this study to confirm autism diagnosis in autistic participants. The total score from the ADOS-2 algorithm was used as a measure of autism symptom severity for autistic participants.

2.5. Eye-Tracking Data Processing

A 250 × 250 pixel square around each target item was used as the target area of interest (AOI). Participants’ raw eye movement data were exported and coded as either 1 (look) or 0 (non-look) for each given AOI. The analysis window was selected a priori as 300–1500 ms after the prenominal adjective onset during the target sentence: the length of the window (1200 ms) falls between the window used in Diehl et al. (2015) for autistic adolescents (1000 ms) and the window used in Venker (2019) for autistic preschoolers (1600 ms). The window was offset by 300 ms because programming and executing an eye movement typically takes 200 ms in adults and 300 ms in children (Arnold, 2008; Hallet, 1986).
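
As an illustration of this coding step, the sketch below marks each gaze sample as 1 or 0 for a square AOI and keeps only samples inside the analysis window. It is a simplified sketch, not the processing pipeline used in the study: the `GazeSample` structure, the coordinate convention, and the AOI center are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

AOI_SIZE_PX = 250        # side length of the square AOI
WINDOW_START_MS = 300    # window onset relative to adjective onset
WINDOW_END_MS = 1500     # window offset relative to adjective onset


@dataclass
class GazeSample:
    t_ms: float            # time relative to adjective onset (assumed convention)
    x: Optional[float]     # gaze x in pixels, None when the sample is lost
    y: Optional[float]     # gaze y in pixels, None when the sample is lost


def in_aoi(sample, center_x, center_y, size=AOI_SIZE_PX):
    """Code a sample as 1 if it falls inside the square AOI, else 0."""
    if sample.x is None or sample.y is None:
        return 0
    half = size / 2
    inside = abs(sample.x - center_x) <= half and abs(sample.y - center_y) <= half
    return int(inside)


def code_window(samples, center_x, center_y):
    """Return (time, 0/1) pairs for samples inside the analysis window."""
    return [
        (s.t_ms, in_aoi(s, center_x, center_y))
        for s in samples
        if WINDOW_START_MS <= s.t_ms <= WINDOW_END_MS
    ]


# Two seconds of simulated 60 Hz samples, all fixating a hypothetical target at (400, 300).
samples = [GazeSample(i * 1000 / 60, 400.0, 300.0) for i in range(120)]
coded = code_window(samples, center_x=400, center_y=300)
```

At 60 Hz, the 1200 ms window yields roughly 72 to 73 samples per trial, which is the denominator for the proportion-of-looks measures.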

Prior to analyzing the eye-tracking data, eye-gaze trackloss data during the analysis window were analyzed to examine the proportion of data contributed by each group. Trackloss occurs when an eye-tracker is not able to capture valid eye movement due to blinks or excessive movements. The percentage of the tracked sample during the analysis window did not differ across conditions (p = 0.56) or groups (p = 0.98). On average, the percentage of tracked samples was 89.87% for neurotypical children and 87.11% for autistic children. Test trials with less than 50% of the tracked sample during the analysis window were eliminated as they were deemed to contain insufficient data. This data cleaning process removed 26 trials from neurotypical participants and 68 trials from autistic participants.
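
The trial-level exclusion rule can be sketched as follows. The example assumes that each sample in the analysis window carries a validity flag (True when the eye-tracker returned a usable gaze position). The function names are hypothetical.

```python
MIN_TRACKED = 0.5  # trials with less than 50% tracked samples are dropped


def tracked_proportion(valid_flags):
    """Proportion of samples in the analysis window with valid gaze data."""
    if not valid_flags:
        return 0.0
    return sum(valid_flags) / len(valid_flags)


def keep_trial(valid_flags, threshold=MIN_TRACKED):
    """Retain a trial only if its tracked proportion reaches the threshold."""
    return tracked_proportion(valid_flags) >= threshold


# Hypothetical trials with 73 samples each in the analysis window.
mostly_tracked = [True] * 40 + [False] * 33   # about 55% tracked
mostly_lost = [True] * 30 + [False] * 43      # about 41% tracked
```

Here `keep_trial(mostly_tracked)` retains the first trial and `keep_trial(mostly_lost)` drops the second.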

Additionally, given that the context sentence in each trial serves to create either an appropriate or inappropriate context for contrastive pitch accent in the target sentence, a trial where a participant did not look at the item in the context sentence does not provide meaningful information regarding whether the contrastive pitch accent was effective in influencing the participant’s visual search. Thus, we excluded trials where the participant failed to look at the context item following the onset of the noun. This data cleaning step removed 29 trials out of the total of 1728 trials from neurotypical participants (2% attrition) and 80 trials out of 1296 trials (6% attrition) from autistic participants. In the final analysis sample, on average, each neurotypical participant contributed 71 trials, and each autistic participant contributed 68 trials.

2.6. Data Analysis

To address our first research question regarding the extent to which participants were able to perceive and discriminate contrastive pitch accent at the acoustic-form level, we conducted independent t-tests to compare d’ and reaction time in the AX same–different task between neurotypical and autistic participants.
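
For readers unfamiliar with d’, the sketch below shows one common way to compute it from response counts. The source does not state which formula or correction was applied, so both the standard yes/no formula, d’ = z(hit rate) − z(false-alarm rate), and the log-linear correction for extreme rates are assumptions here. Same–different designs are sometimes analyzed with other models.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Compute d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count, 1 to each total) keeps
    the rates away from 0 and 1, where the z-transform is undefined.
    This correction is an assumption, not the study's stated method.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: a "hit" is responding "different" on a different trial.
chance_level = d_prime(hits=10, misses=10, false_alarms=10, correct_rejections=10)
near_ceiling = d_prime(hits=18, misses=0, false_alarms=0, correct_rejections=18)
```

A listener responding at chance has a d’ near 0, whereas near-perfect discrimination produces values above 3, the range observed in both groups.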

For our second research question on participants’ ability to interpret contrastive pitch accent during spoken language comprehension, we first tested whether neurotypical children demonstrated the expected anticipatory and garden-path effects. Establishing these effects in the neurotypical group provided a basis for interpreting potential group differences. We then conducted between-group comparisons using mixed-effects models. The dependent variable was participants’ proportion of looks to either the correct target or the incorrect competitor (in trials where contrastive pitch accent was used inappropriately). Fixed effects included group, condition, and their interaction, with crossed random effects for subjects and items. Additionally, because neurotypical and autistic participants differed significantly in language ability, we conducted post hoc analyses to examine the extent to which language ability relates to children’s interpretation of contrastive pitch accent.

To address our third research question on the extent to which individual differences in autistic children’s ability to interpret contrastive pitch accent are associated with broader language and social communication skills, we extracted two processing speed measures from the visual-world paradigm: (a) latency of first fixation to the correct target in Condition C (Inappropriate—Accented), indexing speed of contrastive pitch accent processing; and (b) latency of first fixation to the correct target in Condition D (Inappropriate—Neutral), indexing general linguistic processing speed. These measures were correlated with four broader skill indices: receptive prosody (PEPS-C Receptive Prosody Composite), pragmatic language (CELF-5 Metalinguistics Meta-Pragmatic Index), social communication (SRS-2 Social Communication score), and autism severity (ADOS-2 Total Score).
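
A latency-of-first-fixation measure of this kind could be derived from the binary AOI coding roughly as sketched below. The source does not specify how a fixation was operationalized (for example, whether a minimum duration was required), so this example takes the first in-AOI sample at or after the start of the analysis window. That definition and the function name are assumptions.

```python
def first_fixation_latency(coded_samples, window_start_ms=300):
    """Return the time (ms) of the first look to the target AOI.

    coded_samples: iterable of (t_ms, look) pairs sorted by time, where
    look is 1 for an in-AOI sample and 0 otherwise. Samples before the
    window start are ignored. Returns None if the target is never fixated.
    """
    for t_ms, look in coded_samples:
        if t_ms >= window_start_ms and look == 1:
            return t_ms
    return None


# Hypothetical trial: an early look before the window, then a look at 350 ms.
trial = [(0, 1), (200, 0), (300, 0), (350, 1), (400, 1)]
latency = first_fixation_latency(trial)
```

Per-participant latencies in Conditions C and D would then be averaged across trials before being correlated with the standardized measures.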

3. Results

3.1. Research Question 1: Perception of Contrastive Pitch Accent on the Acoustic-Form Level

In the AX same–different task, neurotypical participants achieved a mean d’ of 3.08 (SD = 0.30), while autistic participants achieved a mean d’ of 2.94 (SD = 0.47). A Welch’s independent-samples t-test revealed no significant difference between groups, t(27.31) = 1.16, p = 0.26, 95% CI [−0.11, 0.40]. For reference, accuracy was also high in both groups: neurotypical participants had a mean accuracy of 98.96% (SD = 4.0%, range: 87.5–100%), and autistic participants had a mean accuracy of 97.22% (SD = 8.3%, range: 72.22–100%).

In terms of reaction time, neurotypical participants responded with an average latency of 0.86 s (SD = 0.35, range: 0.39–1.79), compared to 1.13 s in the autistic group (SD = 1.13, range: 0.53–2.68). A Welch’s t-test indicated a statistically significant difference in reaction time between groups, t(21.89) = −2.23, p = 0.03, 95% CI [−0.51, −0.018], with autistic participants showing slower responses on average.

Given the unbalanced distribution of males and females across groups, we conducted follow-up analyses to test for potential sex differences. Independent-samples t-tests revealed no significant differences by sex for either d’, t(37.80) = −1.56, p = 0.13, 95% CI [−0.34, 0.04] or reaction time, t(28.84) = 1.80, p = 0.09, 95% CI [−0.02, 0.39].

3.2. Research Question 2: Interpretation of Contrastive Pitch Accent During Spoken Language Comprehension

3.2.1. Preliminary Analysis: Replications of the Anticipatory Effect and the Garden-Path Effect in Neurotypical Participants

The second research question examined autistic children’s ability to interpret contrastive pitch accent during spoken language comprehension. Specifically, we examined the extent to which autistic children show an anticipatory effect, where appropriate use of the contrastive pitch accent leads to faster looks to the correct target, and a garden-path effect, where inappropriate use of the contrastive pitch accent leads to initial looks to an incorrect competitor.

As a preliminary analysis, we assessed the extent to which neurotypical children demonstrated an anticipatory effect or a garden-path effect to confirm that our visual-world paradigm task elicited the expected patterns of interpretation. Examining these effects in the neurotypical group provided a basis for interpreting potential group differences in the subsequent analyses. We tested this both visually and statistically. Participants’ proportion of looks to the target AOI during the target sentence was grouped into 100 ms time bins and plotted as a continuous time course. Figure 3 depicts participants’ proportions of looks to the target item in Conditions A and B, where the context sentence presents an appropriate context for a contrastive pitch accent (Figure 3A), and in Conditions C and D, where the context is inappropriate for a contrastive pitch accent (Figure 3B). In Figure 3B, a clear gap was observed in the analysis window between the two lines representing Condition C and Condition D, indicating a robust garden-path effect in neurotypical participants. In other words, neurotypical participants looked at the correct target item more slowly in Condition C, when an inappropriate contrastive pitch accent was assigned, than in the control Condition D (see Supplementary Video S1 for an example of a critical test trial where the participant showed a garden-path effect). However, we did not detect an anticipatory effect in neurotypical participants. As shown in Figure 3A, participants’ fixations to the target in Conditions A and B align with each other, suggesting that participants did not visually locate the target item faster in Condition A, when an appropriate contrastive pitch accent was assigned, than in the control Condition B.

Figure 3. Mean proportion of looking to the correct target in Conditions A (Appropriate—Accented) and B (Appropriate—Neutral) (Panel A) and Conditions C (Inappropriate—Accented) and D (Inappropriate—Neutral) (Panel B) in neurotypical participants.

Statistical analyses confirmed the presence of a garden-path effect and the lack of an anticipatory effect in neurotypical participants. Mixed-effect logistic regression models were fitted using the lme4 package in R version 4.3.1 (R Core Team, 2023). This approach is commonly used in visual-world paradigm studies (Porretta et al., 2017; Rabagliati et al., 2019). We used it because it accommodates binomially distributed looking data and accounts for the clustered nature of observations from the visual-world paradigm (trials nested in participants and items) (Barr, 2008). The dependent variables in all mixed-effect logistic regression models were binary looking responses (look to the correct target or non-look) at each sampled timepoint during the critical analysis window. All models included crossed random intercepts and slopes for participants and items, which allow estimates of participant-level and item-level variability, in addition to fixed effects of condition and, for between-group analyses, group (Baayen et al., 2008).

To test the extent to which neurotypical participants showed a garden-path effect, participants’ eye-gaze data from Conditions C and D were fitted with a fixed effect of condition and crossed random effects of subjects and items. A significant fixed effect of condition confirmed the delayed visual search toward the target item in Condition C compared to Condition D (β = 0.76, SE = 0.24, Wald’s z = 3.01, p = 0.002). The odds of looking at the correct target item were 53% lower (odds ratio = 0.47, 95% CI: [0.29, 0.76]) in Condition C compared to Condition D. To test the anticipatory effect, participants’ eye-gaze data from Conditions A and B were fitted with the same model structure; looks to the target did not differ significantly by condition (β = 0.02, SE = 0.2, Wald’s z = 0.09, p = 0.93). Given that we were only able to replicate the garden-path effect in neurotypical participants, we limited the between-group analyses to the garden-path effect to understand the extent to which autistic children were able to use contrastive pitch accent during spoken language processing.
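
The relation between a logistic regression coefficient and the reported odds ratio can be made explicit with a short worked example. The sketch assumes a Wald-type interval, exp(β ± 1.96 × SE), and takes the condition coefficient as −0.76 for Condition C relative to Condition D, which is the sign consistent with an odds ratio below 1. Both the interval type and the sign convention are assumptions, since the text reports only the magnitude.

```python
import math


def odds_ratio_with_ci(beta, se, z_crit=1.96):
    """Convert a log-odds coefficient and its SE to an odds ratio and Wald CI."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    return odds_ratio, (lower, upper)


# Coefficient magnitude and SE as reported; negative sign assumed (C relative to D).
odds_ratio, (ci_lower, ci_upper) = odds_ratio_with_ci(beta=-0.76, se=0.24)

# Percentage reduction in the odds of a correct target look in Condition C.
percent_lower = (1 - odds_ratio) * 100
```

With these inputs the odds ratio is approximately 0.47, corresponding to odds about 53% lower, and the interval is close to the reported [0.29, 0.76]. Small discrepancies in the upper bound are expected because the inputs are rounded to two decimals.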

3.2.2. Testing the Garden-Path Effect in Autistic Participants

Figure 4 and Figure 5 depict the mean proportion of looks to the correct target item and the incorrect competitor item in Conditions C and D for both neurotypical and autistic participants. The competitor item is the incorrect item primed by the inappropriate use of a contrastive pitch accent. For instance, in the example given in Table 2, the competitor item for the sentences, “Look at the blue grapes. Now look at the GREEN pumpkin”, would be the green grapes. A clear gap was observed between the solid line (representing Condition C) and the dashed line (representing Condition D) during the critical analysis window for both groups, indicating a robust garden-path effect in both groups. For both neurotypical and autistic participants, their looks to the correct target rose later in Condition C compared to Condition D. Participants’ looks to the competitor item followed the opposite pattern and started rising around 300 ms post-target adjective onset in Condition C. These early steep increases in looks to the incorrect competitor item suggest that participants immediately used the contrastive pitch accent cue to anticipate that the second item would be the same type as the previously mentioned context item before hearing and processing the noun information that specified the correct target item.

Figure 4. Proportion of looks to the correct target in Conditions C (Inappropriate—Accented) and D (Inappropriate—Neutral) in neurotypical and autistic participants.

Figure 5. Proportion of looks to the incorrect competitor item in Conditions C (Inappropriate—Accented) and D (Inappropriate—Neutral) in neurotypical and autistic participants.

Two mixed-effect logistic regression models were fitted with participants’ binary looks to the correct target item or the incorrect competitor item as the dependent variable, fixed effects of group, condition, and their interaction, and crossed random effects of subjects and items. Results from both models showed a significant effect of condition with no significant group effect or group by condition interaction (Table 3), suggesting that both neurotypical and autistic children demonstrated the ability to interpret contrastive pitch accent during spoken language comprehension. Across participants, the odds of looking at the correct target item were 52% lower (odds ratio = 0.48, 95% CI: [0.31, 0.41]) in Condition C compared to Condition D. The odds of looking at the incorrect competitor item primed by the inappropriate use of the contrastive pitch accent in Condition C were 3.2 times the odds of looking at the competitor item in the neutral Condition D (odds ratio = 3.2, 95% CI: [1.84, 5.54]).

Table 3. Summary of mixed-effect logistic regression analyses (fixed effects only) for participants’ looks to the correct target item or to the incorrect competitor item.

Similar to RQ1, we conducted follow-up analyses to examine whether participants’ biological sex influenced the results. We conducted the same set of mixed-effect models with participants’ binary looks to the correct target or the incorrect competitor item as the dependent variable, with a fixed effect of sex, condition, their interaction, and crossed random effects of subjects and items. There was no significant main effect of sex (p = 0.79 for the model with correct target looks as the dependent variable; p = 0.25 for competitor looks) and no significant sex × condition interaction (p = 0.28 and p = 0.88, respectively). The garden-path effect remained statistically robust (p < 0.001 for both models). These results suggest that there was no evidence that participants’ contrastive pitch accent processing differed by sex in this task.

3.2.3. Exploratory and Post Hoc Analysis on the Effect of Language Ability on Interpretation of Contrastive Pitch Accents

Given that a significant difference was detected between neurotypical and autistic participants on language ability, we conducted two sets of additional post hoc analyses to understand the effect of language on participants’ ability to interpret contrastive pitch accent. First, we fitted a mixed-effect model with language, condition, and their interaction as fixed effects and crossed random effects of subject and item. Language was indexed by participants’ standard score from the CELF-5, centered around the mean to reduce correlation with the intercept and help with model convergence. Only random intercepts were included, as the model with random slopes did not converge. Results revealed a marginally significant fixed effect of language (β = 0.01, SE = 0.01, Wald’s z = 1.66, p = 0.1) and a significant fixed effect of condition (β = −0.63, SE = 0.03, Wald’s z = −23.237, p < 0.001), with no interaction (p = 0.74). These findings suggest that participants across groups had significantly lower odds of looking at the correct target in Condition C compared to Condition D. Additionally, higher language ability showed a slight, marginally significant trend toward increasing the odds of a correct target look.

Additionally, we assigned autistic children to two language subgroups based on their CELF-5 standard score: autistic children with a language standard score of 85 or above were placed in the autism with typical language group (Autism + TL; n = 7, mean age = 10.97, mean CELF-5 standard score = 101.63), and those with a standard score lower than 85 were placed in the autism with language impairment group (Autism + LI; n = 11, mean age = 11.16, mean CELF-5 standard score = 76). Participants’ looking patterns in all three subgroups (Neurotypical, Autism + TL, and Autism + LI) were examined visually and tested statistically in mixed-effect logistic regression models. As shown in Figure 6, children in all three groups showed a garden-path effect elicited by the inappropriate use of contrastive pitch accent in Condition C. A mixed-effect logistic regression model with subgroup, condition, and their interaction as fixed effects and random intercepts of subject and item (Table 4) revealed a significant fixed effect of condition (β = −0.21, SE = 0.02, Wald’s z = −9.113, p < 0.001) and a significant group (being in Autism + TL group as referenced to neurotypical group) by condition interaction (β = −0.21, SE = 0.05, Wald’s z = −3.8, p < 0.001). We followed up by conducting pairwise contrasts for each condition using the emmeans package. Results revealed significant differences between subgroups in Condition C but not in Condition D. Specifically, children in the Autism + LI group demonstrated significantly lower odds of looking at the correct target compared to the neurotypical group (log-odds ratio = −0.51, SE = 0.18, p = 0.01). Translated into odds ratios, this result indicates that the odds of looking at the correct target item in Condition C were 1.66 times higher for neurotypical children than for autistic children with language impairment. No significant differences were detected between the neurotypical and Autism + TL groups (log-odds ratio = 0.26, SE = 0.15, p = 0.19).

Figure 6. Proportion of looks to the correct target in Conditions C (Inappropriate—Accented) and D (Inappropriate—Neutral) shown by language subgroup (represented by color).

Table 4. Results from the mixed-effect logistic regression analysis with language subgroups.

3.3. Relation Between Speed of Processing Measures from the Visual-World Paradigm Task and Broader Skills in Autistic Participants

For our third research question, two speed of processing measures were extracted from the visual-world paradigm task. Latency of first fixation to the correct target in Condition C (Inappropriate—Accented) and latency of first fixation to the correct target in Condition D (Inappropriate—Neutral) were derived to index speed of contrastive pitch accent processing and speed of general linguistic processing, respectively. These two measures were correlated with four measures of broader skills: receptive prosody measured by the PEPS-C Receptive Prosody Composite, pragmatic language measured by the CELF-5 Metalinguistics Meta-Pragmatic Index, social communication measured by the SRS-2 Social Communication score, and autism severity measured by the ADOS-2 Total Score.

We first conducted preliminary correlation analyses to evaluate the construct validity of the two speed of processing measures. These two measures were intended to index distinct processes: contrastive pitch accent processing and general linguistic processing. Given that no prior studies have extracted speed of processing measures from a visual-world paradigm and directly correlated them with broader language and social communication skills in autistic children, the preliminary analyses served to confirm that the speed of processing measures were distinct from each other and aligned with the theoretical constructs they were intended to represent. Good concurrent validity is demonstrated if participants’ speed of contrastive pitch accent processing is associated with the Contrastive Pitch Accent Understanding subtest scores from the PEPS-C and if the speed of general linguistic processing is associated with the CELF-5 language standard score. As shown in Table 5, the latency of first fixation to the correct target in Condition C was significantly correlated with the Contrastive Pitch Accent Understanding subtest but not with language. The opposite pattern of associations was found for the speed of general linguistic processing: the latency of first fixation to the correct target in Condition D was significantly correlated with language but not with the specific measure of prosody perception.

Table 5. Correlations of visual-world speed of processing measures and broader measures in the autistic group.

Further analyses revealed that both the speed of contrastive pitch accent processing and the speed of general linguistic processing were significantly correlated with pragmatic language. The latency measure of contrastive pitch accent processing was also significantly and positively correlated with autism symptom severity: autistic participants who took longer to locate the correct target item in Condition C tended to exhibit more severe autism symptoms. Neither speed of processing measure correlated with receptive prosody or social communication. All reported p-values were adjusted for multiple comparisons using the Benjamini and Hochberg (1995) false discovery rate correction, implemented with the p.adjust function in the stats package in R.
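
For reference, the Benjamini and Hochberg adjustment can be sketched in a few lines. The version below mirrors the standard step-up procedure (each sorted p-value is multiplied by m/rank, and monotonicity is then enforced from the largest p-value downward). The p-values in the example are hypothetical and are not the values from Table 5.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four correlation tests.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
```

A correlation is then treated as significant when its adjusted p-value falls below the chosen alpha level.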

4. Discussion

This study investigated autistic children’s ability to perceive contrastive pitch accent on an acoustic-form level and interpret its pragmatic function during spoken language comprehension. Furthermore, we investigated the relations between autistic children’s ability to interpret contrastive pitch accent and broader skills, including receptive prosody, pragmatic language, social communication, and autism symptom severity. Results suggest that while autistic children as a group were able to discriminate and use prosodic information predictively during spoken language comprehension, group differences were detected in reaction time on the AX same–different task and in speed of processing among autistic children with language impairment. Additionally, the speed of contrastive pitch accent processing in autistic children was associated with pragmatic language skills and autism symptom severity. Specifically, autistic children with stronger pragmatic language skills and less severe autism symptoms were quicker to look at the correct target item in the condition involving an inappropriate contrastive pitch accent.

For our first research question, our hypothesis was partially supported. Autistic children performed comparably to neurotypical peers in their ability to discriminate contrastive pitch accent at the acoustic-form level, as reflected in similar d’ on the AX same–different task. This finding is consistent with prior research showing intact or even enhanced performance in autistic individuals on similar low-level prosodic discrimination tasks (Heaton et al., 2008; Järvinen-Pasley et al., 2008a; Patel et al., 2023a). However, contrary to our hypothesis, autistic children showed significantly slower reaction times compared to neurotypical children, suggesting that although they were able to detect prosodic differences, the processing demands of the task may have required more cognitive effort or time. Importantly, this pattern emerged despite the two groups being matched on nonverbal IQ, indicating that the slower responses were not attributable to general cognitive ability. To our knowledge, this is the first study to demonstrate that autistic children may be slower at processing contrastive pitch accent at the acoustic-form level, even when their discrimination ability, as indexed by d’, is comparable to that of neurotypical peers. This suggests that differences in prosodic perception in autistic children may lie not in perceptual sensitivity but rather in the efficiency of processing. These findings highlight the importance of examining processing speed alongside accuracy, as timing measures may reveal nuanced aspects of prosodic processing that accuracy alone would miss.

For our second research question, autistic children in this study as a group demonstrated the ability to interpret the pragmatic function of contrastive pitch accent during spoken language processing. Specifically, during the visual-world paradigm task, both neurotypical and autistic children displayed the tendency to look at the incorrect competitor item primed by the inappropriately assigned contrastive pitch accent in the target sentence, resulting in a delayed look to the correct target item. Results based on mixed-effect regression models revealed a main effect of condition without an effect of group. This lack of group differences contrasts with findings from previous studies on autistic children’s and adults’ ability to process contrastive pitch accent. Most prior research has reported that autistic individuals perform less accurately on tasks assessing receptive contrastive pitch accent. Three studies using the PEPS-C showed lower performance on the Focus Reception Task in autistic children (Diehl & Paul, 2013; Peppé et al., 2007) and autistic adults (Hesling et al., 2010). Three additional studies using similar behavioral tasks on contrastive pitch accent also reported significant differences between autistic and neurotypical groups (Grice et al., 2016; Paul et al., 2005a; Segal et al., 2017).

Notably, all aforementioned studies used offline tasks that require high behavioral demands or interactions with an experimenter. The majority of these tasks involve complicated instructions. For example, in the PEPS-C Focus Reception task, participants were instructed: “Earlier today, the person on the computer bought some socks. But when she got home, she realized she had forgotten to buy one color. If she says, ‘I wanted BLUE and black socks,’ that means she forgot to buy the blue ones, so you click on blue.” Participants then listened to sentences with contrastive pitch accents assigned to different colors and clicked on the color of the sock that was forgotten. Similarly, in Paul et al. (2005a), participants were asked to determine which sentence should logically precede the one spoken by the experimenter. For instance, after hearing, “I want CHOCOLATE ice cream,” an appropriate response would be, “Do you want vanilla?” The high language demands of these tasks make it unclear whether the performance by autistic children reflects difficulties with receptive prosody or broader language challenges. Unlike previously used offline tasks, the visual-world paradigm in this study offers improved sensitivity to the time course of spoken language processing, simple instructions, and reduced response demands from participants. This study offers the first evidence that autistic children are able to understand the pragmatic function of contrastive pitch accents and can use contrastive pitch accent predictively during spoken language processing.

Interestingly, results from our exploratory, post hoc analysis suggested that language ability may influence contrastive pitch accent processing. When analyzed as a continuous variable, language showed a marginally significant effect on participants’ looks to the correct target. Additionally, subgroup analysis revealed that autistic children with language impairment, but not autistic children with typical language, were significantly less likely to look at the correct target in Condition C when an inappropriate contrastive pitch accent was assigned. This finding closely aligns with Lyons et al. (2014), who used behavioral tasks adapted from the PEPS-C and reported a significant difference between neurotypical children and autistic children with low language ability (defined as scoring lower than 90 on the CELF-4) in the receptive contrastive pitch accent task, overall receptive prosody, and overall expressive prosody. Another eye-tracking study that examined language processing in autistic children similarly reported no significant difference between autism and neurotypical groups but detected significant differences once autistic children were reassigned into language-impaired and typical language groups (Brock et al., 2008).

In our study, autistic children with language impairment demonstrated significantly lower odds of looking at the correct target in Condition C, along with a qualitatively shallower slope in their gaze patterns and a lower peak level of fixations. This visual search pattern is consistent with previous studies using the visual-world paradigm with children with language impairments. McMurray et al. (2010) found that adolescents with Developmental Language Disorder (DLD) were slower to locate target items and showed a reduced peak fixation level during spoken word recognition tasks. These parallels suggest that autistic children with comorbid language impairment may share similar processing challenges during real-time spoken language comprehension with children with DLD.

We acknowledge that these subgroup comparisons were based on small sample sizes (n = 7 for Autism + TL and n = 11 for Autism + LI), which limit the statistical power of our models and increase the possibility of false negatives. For example, the lack of a significant difference between the Neurotypical and Autism + TL groups may reflect a Type II error due to low power. However, the significant difference observed between the Autism + LI and Neurotypical groups and the significant subgroup-by-condition interaction remain noteworthy. A statistically significant finding in an underpowered model suggests a potentially robust effect, particularly when supported by consistent trends across multiple analyses and a direction of effect that aligns with prior literature (Lyons et al., 2014; Brock et al., 2008). In light of these strengths and limitations, we interpret these results as preliminary and hypothesis-generating rather than conclusive. Nonetheless, these exploratory findings reinforce the need to consider language subgroups in autism research, as emphasized by Tager-Flusberg (2015) and Schaeffer et al. (2023).

Our third research question examined associations between speed of contrastive pitch accent processing and broader skills in the autistic group. This study is, to our knowledge, the first to extract latency-based measures of prosody processing from a visual-world paradigm and directly relate them to standardized assessments of prosody, structural language, pragmatic language, and autism symptom severity. Preliminary analyses supported the construct validity of the two latency measures derived from the task: latency of first fixation in Condition C (Inappropriate − Accented) was associated with performance on the PEPS-C Receptive Contrastive Pitch Accent subtest, while latency in Condition D (Inappropriate − Neutral) was associated with general language ability as measured by the CELF-5. Further analyses revealed significant correlations between speed of contrastive pitch accent processing and (a) pragmatic language skills and (b) autism symptom severity: consistent with our predictions, autistic participants with better pragmatic language or milder autism symptoms were quicker to look at the correct target in Condition C. Additionally, speed of general linguistic processing was positively correlated with pragmatic language. While previous studies have also documented significant associations between prosody and broader skills (Paul et al., 2005b), our findings extend this literature by showing that the speed, not just the accuracy, of prosodic processing is associated with individual variability in pragmatic functioning among autistic children. Lastly, both speed-of-processing measures were associated with pragmatic language, but only the speed of contrastive pitch accent processing was correlated with autism symptom severity. This pattern suggests that although prosody is often considered a component of language, prosodic and linguistic functions may operate somewhat independently, with prosody being more closely tied to autism symptomatology.
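To make the latency measure concrete, the following minimal Python sketch shows one way a latency of first fixation to the target could be computed from gaze samples. It is an illustration only: the sample layout (a timestamp in milliseconds relative to target-word onset paired with an AOI label), the function name `first_fixation_latency`, and the toy trial are all hypothetical and are not taken from the study's analysis pipeline.

```python
from typing import Optional, Sequence, Tuple

# Each gaze sample is (time_ms, aoi_label), with time measured relative to
# the onset of the target word. This layout is assumed for illustration.
Sample = Tuple[int, str]


def first_fixation_latency(samples: Sequence[Sample], target: str,
                           min_onset_ms: int = 0) -> Optional[int]:
    """Return the time of the earliest sample on `target` at or after
    `min_onset_ms`, or None if the target was never fixated."""
    for time_ms, aoi in sorted(samples):
        if time_ms >= min_onset_ms and aoi == target:
            return time_ms
    return None


# Toy trial: the listener first looks at the competitor, then revises.
trial = [(0, "context"), (150, "competitor"), (300, "competitor"),
         (450, "target"), (600, "target")]
latency = first_fixation_latency(trial, target="target")
print(latency)  # 450
```

A per-participant mean of such latencies within a condition is the kind of value that could then be correlated with standardized scores; how the authors aggregated trials is not specified in this section.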

Findings from this study should be interpreted in light of several limitations. First, we were able to replicate only the garden-path effect of contrastive pitch accent, not the anticipatory effect reported by previous studies (Ito et al., 2014; Ito & Speer, 2008). Although this is somewhat expected, given previous reports that the anticipatory effect tends to have a smaller effect size than the garden-path effect (Ito et al., 2014), we speculate that the lack of the anticipatory effect in our study may also stem from analytical differences compared to Ito et al. (2014). In our study, we used a more specific, item-wise AOI instead of a larger, cell-wise AOI (i.e., the cell that contains three items of different colors; see Figure 1 for an example) as in Ito et al. (2014). The sentences used to examine the anticipatory effect include the same item in different colors as the context item and the target item across context and target sentences (e.g., “Look at the blue pumpkin. Now look at the GREEN pumpkin.”). In Ito et al. (2014), it was assumed that participants would scan outside the cell containing both the context and target items after locating the context item and then return to the same cell to find the target. Trials where participants’ fixation stayed within the cell were excluded from analysis. When we re-analyzed our data using the larger, cell-wise AOIs, we found that our participants, especially autistic children, tended to keep their fixations within the cell after locating the context item and thus only needed to move a short distance to find the target item. This minimal movement may not have been sufficient to demonstrate an anticipatory effect in our design. A longer pause following the context sentence might provide participants with adequate time and encourage them to scan outside the cell.

To better elicit anticipatory effects in similar paradigms, future studies may consider additional adjustments such as increasing the distance between visual items, using a screen with a larger display area, and introducing additional brief visual stimuli to encourage participants to scan outside the cell. These modifications may be especially important when working with autistic children, who often exhibit more sustained or “sticky” fixation (difficulty in disengaging attention from an initial fixation; Elsabbagh et al., 2013; Landry & Bryson, 2004) or reduced spontaneous visual exploration (Sasson et al., 2008).
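The difference between item-wise and cell-wise AOIs can be sketched in a few lines of Python. This is a hypothetical illustration, not the coding scheme used in this study or in Ito et al. (2014): the `Rect` class, the `classify` helper, the pixel coordinates, and the item names (including the third color) are all assumptions chosen to show how the same short gaze shift is coded under each scheme.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in screen pixels (hypothetical layout)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def classify(x: float, y: float, aois: Dict[str, Rect]) -> Optional[str]:
    """Return the name of the first AOI containing the gaze point, if any."""
    for name, rect in aois.items():
        if rect.contains(x, y):
            return name
    return None


# One cell holding three pumpkins of different colors, side by side.
cell_aois = {"pumpkin_cell": Rect(0, 0, 300, 100)}
item_aois = {
    "blue_pumpkin": Rect(0, 0, 100, 100),
    "green_pumpkin": Rect(100, 0, 200, 100),
    "red_pumpkin": Rect(200, 0, 300, 100),
}

# A short gaze shift from the context item to the target item.
before, after = (50, 50), (150, 50)
print(classify(*before, cell_aois), classify(*after, cell_aois))
print(classify(*before, item_aois), classify(*after, item_aois))
```

Under the cell-wise coding both gaze points fall in the same AOI, so the shift is invisible; under the item-wise coding it registers as a move from the context item to the target item. The sketch illustrates only this difference in granularity, not the trial-exclusion rule described above.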

Our analysis sample only included autistic children without intellectual disability. We included a cognitive criterion (IQ above 70) to match neurotypical and autistic groups on cognitive ability. Although there was no inclusion or exclusion criterion based on language ability, all autistic participants in this sample demonstrated fluent and flexible use of language. One caveat of such a sample is that findings from this current study may not generalize to other subgroups of autistic children, such as minimally verbal autistic children. It is worth noting that the six autistic children excluded due to IQ scores below 70 were able to complete both the AX task and the visual-world paradigm. While these children exhibited significantly more trackloss and inattentive trials compared to autistic children without intellectual disability and neurotypical children, they still contributed an average of 46 usable trials (64% of the total number of trials). These data suggest that the visual-world paradigm has the potential to be used with autistic children with intellectual disability. Future studies with a larger sample of autistic children with intellectual disability are needed to confirm the feasibility of this paradigm and to explore its broader applicability.

In addition, the wide age range in our sample (8 to 14 years) may introduce developmental variability in language and cognitive abilities, particularly in a heterogeneous population such as autistic children. Although age was matched across groups, the relatively wide age span, combined with a modest sample size, may limit the precision with which developmental effects can be interpreted. Future studies with larger samples and narrower age bands would be valuable for clarifying age-related changes in prosody processing. Finally, the sex distribution in our sample was imbalanced, with relatively few autistic girls. Although follow-up analyses did not provide evidence that sex influenced task performance, the underrepresentation of autistic girls in our sample limits our ability to fully examine potential sex-by-group differences in contrastive pitch accent processing. Future studies should aim to include more autistic girls to strengthen the generalizability of findings across the spectrum of autistic children.

5. Conclusions

This study focused on autistic children’s ability to perceive and interpret contrastive pitch accent during spoken language comprehension. Four main findings emerged. First, while autistic children demonstrated comparable ability in discriminating contrastive pitch accent at the acoustic-form level, their responses were significantly slower than those of neurotypical children. Second, autistic children as a group were able to interpret contrastive pitch accent and used prosodic information predictively during spoken language comprehension: like neurotypical children, they showed a robust garden-path effect triggered by an inappropriately placed contrastive pitch accent. Third, language ability may influence the processing of contrastive pitch accents: autistic children with language impairments, but not autistic children with typical language, showed processing patterns that differed from those of neurotypical children. Fourth, the speed of processing contrastive pitch accents was linked to broader skills in autistic children, such as pragmatic language and autism symptom severity. Taken together, these findings highlight the value of incorporating processing-speed measures to better understand individual differences in receptive prosody ability among autistic children, the critical role of prosody in spoken language comprehension, and the broader implications of prosodic processing for understanding language and communication profiles in autism.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/languages10070161/s1, Video S1: Example of a critical test trial in Condition C with a participant’s visual pattern that demonstrates a garden-path effect.

Author Contributions

Conceptualization, P.L.S. and J.B.; methodology, P.L.S., D.G.W. and J.B.; software, P.L.S.; formal analysis, P.L.S. and J.B.; investigation, P.L.S. and J.B.; resources, J.B.; data curation, P.L.S.; writing—original draft preparation, P.L.S.; writing—review and editing, P.L.S., J.B., D.G.W. and S.C.; visualization, P.L.S.; supervision, J.B.; project administration, P.L.S.; funding acquisition, P.L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Vanderbilt Institute for Clinical and Translational Research (VR52147) and a Graduate Research Grant from the Organization for Autism Research awarded to the first author.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Vanderbilt University (protocol code #180227 and date of approval: 17 April 2018).

Informed Consent Statement

Informed consent and assent were obtained from all subjects involved in the study.

Data Availability Statement

The datasets generated and analyzed during the current study are not publicly available due to restrictions imposed by the ethics approval for this study.

Acknowledgments

We thank all of the children and families who participated in making this work possible. We would also like to thank Rita Pfeiffer for her generous assistance in recording the experimental stimuli and Dr. Ling Chen for her assistance with automating the creation of the eye-tracking stimuli used in this study.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study, the collection, analyses, or interpretation of data, the writing of the manuscript, or the decision to publish the results.

References

Arnold, J. E. (2008). THE BACON not the bacon: How children and adults understand accented and unaccented noun phrases. Cognition, 108(1), 69–99.
Asghari, S. Z., Farashi, S., Bashirian, S., & Jenabi, E. (2021). Distinctive prosodic features of people with autism spectrum disorder: A systematic review and meta-analysis study. Scientific Reports, 11(1), 1.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
Baltaxe, C. A. M. (1984). Use of contrastive stress in normal, aphasic, and autistic children. Journal of Speech and Hearing Research, 27(1), 97–105.
Barr, D. J. (2008). Analyzing “visual world” eye-tracking data using multilevel logistic regression. Journal of Memory and Language, 59(4), 457–474.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B: Statistical Methodology, 57(1), 289–300.
Boersma, P., & Weenink, D. (2023). Praat: Doing phonetics by computer. Version 6.3.10, Computer software. Available online: http://www.praat.org/ (accessed on 20 June 2025).
Bolinger, D. (1961). Contrastive accent and contrastive stress. Language, 37(1), 83–96.
Boucher, J. (2012). Research review: Structural language in autistic spectrum disorder—Characteristics and causes. Journal of Child Psychology and Psychiatry and Allied Disciplines, 53(3), 219–233.
Brock, J., Norbury, C., Einav, S., & Nation, K. (2008). Do individuals with autism process words in context? Evidence from language-mediated eye-movements. Cognition, 108(3), 896–904.
Cole, J. (2015). Prosody in context: A review. Language, Cognition and Neuroscience, 30(1–2), 1–31.
Constantino, J. N., & Gruber, C. (2012). Social responsiveness scale—Second edition (SRS-2). Western Psychological Services.
Couper-Kuhlen, E. (2009). Prosody. In S. D’hondt, J.-O. Östman, & J. Verschueren (Eds.), Handbook of pragmatics highlights (Vol. 4, pp. 174–189). John Benjamins Publishing Company.
Dahan, D., Tanenhaus, M. K., & Chambers, C. G. (2002). Accent and reference resolution in spoken-language comprehension. Journal of Memory and Language, 47(2), 292–314.
DePape, A.-M. R., Chen, A., Hall, G. B. C., & Trainor, L. J. (2012). Use of prosody and information structure in high functioning adults with autism in relation to language ability. Frontiers in Psychology, 3, 72.
Diehl, J. J., Friedberg, C., Paul, R., & Snedeker, J. (2015). The use of prosody during syntactic processing in children and adolescents with autism spectrum disorders. Development and Psychopathology, 27(3), 867–884.
Diehl, J. J., & Paul, R. (2013). Acoustic and perceptual measurements of prosody production on the profiling elements of prosodic systems in children by children with autism spectrum disorders. Applied Psycholinguistics, 34(1), 135–161.
Elsabbagh, M., Fernandes, J., Webb, S. J., Dawson, G., Charman, T., & Johnson, M. H. (2013). Disengagement of visual attention in infancy is associated with emerging autism in toddlerhood. Biological Psychiatry, 74(3), 189–194.
Gerrits, E., & Schouten, M. E. H. (2004). Categorical perception depends on the discrimination task. Perception & Psychophysics, 66(3), 363–376.
Globerson, E., Amir, N., Kishon-Rabin, L., & Golan, O. (2015). Prosody recognition in adults with high-functioning autism spectrum disorders: From psychoacoustics to cognition. Autism Research, 8(2), 153–163.
Grice, M., Krüger, M., & Vogeley, K. (2016). Adults with Asperger syndrome are less sensitive to intonation than control persons when listening to speech. Culture and Brain, 4(1), 38–50.
Grice, M., Wehrle, S., Krüger, M., Spaniol, M., Cangemi, F., & Vogeley, K. (2023). Linguistic prosody in autism spectrum disorder—An overview. Language and Linguistics Compass, 17(5), e12498.
Hallet, P. E. (1986). Eye movement. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance. Wiley.
Hautus, M. J., Macmillan, N. A., & Creelman, C. D. (2021). Detection theory: A user’s guide (3rd ed.). Routledge.
Heaton, P., Hudry, K., Ludlow, A., & Hill, E. (2008). Superior discrimination of speech pitch and its relationship to verbal ability in autism spectrum disorders. Cognitive Neuropsychology, 25(6), 771–782.
Hesling, I., Dilharreguy, B., Peppé, S., Amirault, M., Bouvard, M., & Allard, M. (2010). The integration of prosodic speech in high functioning autism: A preliminary fMRI study. PLoS ONE, 5(7), e11571.
Hirst, D. J. (2005). Form and function in the representation of speech prosody. Speech Communication, 46, 334–347.
Ito, K., Bibyk, S. A., Wagner, L., & Speer, S. R. (2014). Interpretation of contrastive pitch accent in six- to eleven-year-old English-speaking children (and adults). Journal of Child Language, 41(1), 84–110.
Ito, K., & Speer, S. R. (2008). Anticipatory effects of intonation: Eye movements during instructed visual search. Journal of Memory and Language, 58(2), 541–573.
Järvinen-Pasley, A., Peppé, S., King-Smith, G., & Heaton, P. (2008a). The relationship between form and function level receptive prosodic abilities in autism. Journal of Autism and Developmental Disorders, 38, 1328–1340.
Järvinen-Pasley, A., Wallace, G. L., Ramus, F., Happé, F., & Heaton, P. (2008b). Enhanced perceptual processing of speech in autism. Developmental Science, 11(1), 109–121.
Kanner, L. (1943). Autistic disturbances of affective contact. Nervous Child, 2, 217–250.
Kjelgaard, M. M., & Tager-Flusberg, H. (2001). An investigation of language impairment in autism: Implications for genetic subgroups. Language and Cognitive Processes, 16(2–3), 287–308.
Landry, R., & Bryson, S. E. (2004). Impaired disengagement of attention in young children with autism. Journal of Child Psychology and Psychiatry, 45(6), 1115–1122.
Lehiste, I. (1970). Suprasegmentals (1st ed.). The MIT Press.
Lindström, R., Lepistö-Paisley, T., Makkonen, T., Reinvall, O., Nieminen-von Wendt, T., Alén, R., & Kujala, T. (2018). Atypical perceptual and neural processing of emotional prosodic changes in children with autism spectrum disorders. Clinical Neurophysiology, 129(11), 2411–2420.
Lockwood Estrin, G., Milner, V., Spain, D., Happé, F., & Colvert, E. (2021). Barriers to autism spectrum disorder diagnosis for young women and girls: A systematic review. Review Journal of Autism and Developmental Disorders, 8(4), 454–470.
Lord, C., Rutter, M., DiLavore, P., Risi, S., Gotham, K., & Bishop, S. (2012). Autism diagnostic observation schedule—Second edition (ADOS-2). Western Psychological Services.
Lyons, M., Simmons, E. S., & Paul, R. (2014). Prosodic development in middle childhood and adolescence in high-functioning autism. Autism Research, 7(2), 181–196.
McCann, J., & Peppé, S. (2003). Prosody in autism spectrum disorders: A critical review. International Journal of Language & Communication Disorders, 38(4), 325–350.
McMurray, B., Samelson, V. M., Lee, S. H., & Tomblin, J. B. (2010). Individual differences in online spoken word recognition: Implications for SLI. Cognitive Psychology, 60(1), 1–39.
Mottron, L., Dawson, M., Soulières, I., Hubert, B., & Burack, J. (2006). Enhanced perceptual functioning in autism: An update, and eight principles of autistic perception. Journal of Autism and Developmental Disorders, 36(1), 27–43.
Nadig, A., & Shaw, H. (2015). Acoustic marking of prominence: How do preadolescent speakers with and without high-functioning autism mark contrast in an interactive task? Language, Cognition and Neuroscience, 30(1–2), 32–47.
O’Connor, K. (2012). Auditory processing in autism spectrum disorder: A review. Neuroscience and Biobehavioral Reviews, 36(2), 836–854.
Ong, J. H., Zhao, C., Bacon, A., Leung, F. Y. N., Veic, A., Wang, L., Jiang, C., & Liu, F. (2024). The relationship between autism and pitch perception is modulated by cognitive abilities. Journal of Autism and Developmental Disorders, 54(9), 3400–3411.
Patel, S. P., Landau, E., Martin, G. E., Rayburn, C., Elahi, S., Fragnito, G., & Losh, M. (2023a). A profile of prosodic speech differences in individuals with autism spectrum disorder and first-degree relatives. Journal of Communication Disorders, 102, 106313.
Patel, S. P., Winston, M., Guilfoyle, J., Nicol, T., Martin, G. E., Nayar, K., Kraus, N., & Losh, M. (2023b). Neural processing of speech sounds in ASD and first-degree relatives. Journal of Autism and Developmental Disorders, 53(8), 3257–3271.
Paul, R., Augustyn, A., Klin, A., & Volkmar, F. R. (2005a). Perception and production of prosody by speakers with autism spectrum disorders. Journal of Autism and Developmental Disorders, 35(2), 205–220.
Paul, R., Shriberg, L. D., McSweeny, J., Cicchetti, D., Klin, A., & Volkmar, F. (2005b). Brief report: Relations between prosodic performance and communication and socialization ratings in high functioning speakers with autism spectrum disorders. Journal of Autism and Developmental Disorders, 35(6), 861–869.
Peirce, J. W. (2007). PsychoPy-Psychophysics software in Python. Journal of Neuroscience Methods, 162(1–2), 8–13.
Peppé, S., & McCann, J. (2003). Assessing intonation and prosody in children with atypical language development: The PEPS-C test and the revised version. Clinical Linguistics and Phonetics, 17(4–5), 345–354.
Peppé, S., McCann, J., Gibbon, F., O’Hare, A., & Rutherford, M. (2007). Receptive and expressive prosodic ability in children with high-functioning autism. Journal of Speech Language and Hearing Research, 50(4), 1015–1028.
Pierrehumbert, J., & Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In Intentions in communication (pp. 271–311). MIT Press.
Porretta, V., Kyröläinen, A. J., Van Rij, J., & Järvikivi, J. (2017, May). Visual world paradigm data: From preprocessing to nonlinear time-course analysis. In International conference on intelligent decision technologies (pp. 268–277). Springer International Publishing.
Qin, Z., Jin, R., & Zhang, C. (2022). The effects of training variability and pitch aptitude on the overnight consolidation of lexical tones. Journal of Speech, Language, and Hearing Research, 65(9), 3377–3391.
Rabagliati, H., Delaney-Busch, N., Snedeker, J., & Kuperberg, G. (2019). Spared bottom-up but impaired top-down interactive effects during naturalistic language processing in schizophrenia: Evidence from the visual-world paradigm. Psychological Medicine, 49(8), 1335–1345.
R Core Team. (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Available online: https://www.R-project.org/ (accessed on 20 June 2025).
Roid, G. H. (2003). Stanford-Binet intelligence scales (5th ed.). Riverside Publishing.
Russo, N. M., Skoe, E., Trommer, B., Nicol, T., Zecker, S., Bradlow, A., & Kraus, N. (2008). Deficient brainstem encoding of pitch in children with autism spectrum disorders. Clinical Neurophysiology, 119(8), 1720–1731.
Rutter, M., Bailey, A., & Lord, C. (2003). The social communication questionnaire. Western Psychological Services.
Sasson, N. J., Dichter, G. S., & Bodfish, J. W. (2012). Affective responses by adults with autism are reduced to social images but elevated to images related to circumscribed interests. PLoS ONE, 7(8), e42457.
Sasson, N. J., Turner-Brown, L. M., Holtzclaw, T. N., Lam, K. S. L., & Bodfish, J. W. (2008). Children with autism demonstrate circumscribed attention during passive viewing of complex social and nonsocial picture arrays. Autism Research, 1(1), 31–42.
Schaeffer, J., Abd El-Raziq, M., Castroviejo, E., Durrleman, S., Ferré, S., Grama, I., Hendriks, P., Kissine, M., Manenti, M., Marinis, T., Meir, N., Novogrodsky, R., Perovic, A., Panzeri, F., Silleresi, S., Sukenik, N., Vicente, A., Zebib, R., Prévost, P., & Tuller, L. (2023). Language in autism: Domains, profiles and co-occurring conditions. Journal of Neural Transmission, 130(3), 433–457.
Segal, O., Kaplan, D., Patael, S., & Kishon-Rabin, L. (2017). Comprehension of “narrow focus” by adolescents in the autism spectrum. Folia Phoniatrica et Logopaedica: International Journal of Phoniatrics, Speech Therapy and Communication Pathology, 69(1–2), 67–77.
Semel, W., Wiig, E. H., & Secord, W. A. (2014). Clinical evaluation of language fundamentals—Metalinguistics. Pearson Assessments.
Shaw, K. A., Williams, S., Patrick, M. E., Valencia-Prado, M., Durkin, M. S., Howerton, E. M., Ladd-Acosta, C. M., Pas, E. T., & Maenner, M. J. (2025). Prevalence and early identification of autism spectrum disorder among children aged 4 and 8 years—Autism and developmental disabilities monitoring network, 16 sites, United States, 2022. MMWR. Surveillance Summaries, 74(22), 1–22.
Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C. C., Price, P., Pierrehumbert, J., & Hirschberg, J. (1992, October 13–16). ToBI: A standard for labeling English prosody. Proceedings of the 2nd International Conference on Spoken Language Processing (ICSLP 1992) (pp. 867–870), Banff, AB, Canada.
Tager-Flusberg, H. (2006). Defining language phenotypes in autism. Clinical Neuroscience Research, 6(3), 219–224.
Tager-Flusberg, H. (2015). Defining language impairments in a subgroup of children with autism spectrum disorder. Science China Life Sciences, 58(10), 1044–1052.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268(5217), 1632–1634.
Venker, C. E. (2019). Cross-situational and ostensive word learning in children with and without autism spectrum disorder. Cognition, 183, 181–191.
Wagner, M., & Watson, D. G. (2010). Experimental and theoretical advances in prosody: A review. Language and Cognitive Processes, 25(7), 905–945.
Watson, D. G., Arnold, J. E., & Tanenhaus, M. K. (2006, May 2–5). Acoustic prominence and reference accessibility in language production. Proceedings of Speech Prosody 2006, Speech Prosody, Dresden, Germany. Available online: http://www.isca-speech.org/archive (accessed on 20 June 2025).
Watson, D. G., Tanenhaus, M. K., & Gunlogson, C. A. (2008). Interpreting pitch accents in online comprehension: H* vs. L + H*. Cognitive Science, 32(7), 1232–1244.
Wiig, S. W., & Secord, W. A. (2013). Clinical evaluations of language fundamentals—Fifth edition (CELF-5). NCS Pearson.
Zhang, J., Meng, Y., Tong, X., Yuan, Z., Wu, C., & Ieong, S. L. (2018). Exploring the neural correlates of lexical stress perception in English among Chinese-English bilingual children with autism spectrum disorder: An ERP study. Neuroscience Letters, 666, 158–164.
Zhang, M., Xu, S., Chen, Y., Lin, Y., Ding, H., & Zhang, Y. (2022). Recognition of affective prosody in autism spectrum conditions: A systematic review and meta-analysis. Autism, 26(4), 798–813.
Zhou, P., Zhan, L., & Ma, H. (2019). Predictive language processing in preschool children with autism spectrum disorder: An eye-tracking study. Journal of Psycholinguistic Research, 48(2), 431–452.

Figure 1. Schematic diagram for a sample trial in the visual-world task.

Figure 2. Example ToBI (tone and break indices) transcription of neutral and contrastive pitch accented prenominal adjectives. The blue line represents the pitch contour (F0) of the utterance extracted by Praat.

Figure 3. Mean proportion of looks to the correct target in Conditions A (Appropriate—Accented) and B (Appropriate—Neutral) (Panel A) and Conditions C (Inappropriate—Accented) and D (Inappropriate—Neutral) (Panel B) in neurotypical participants.

Figure 4. Proportion of looks to the correct target in Conditions C (Inappropriate—Accented) and D (Inappropriate—Neutral) in neurotypical and autistic participants.

Figure 5. Proportion of looks to the incorrect competitor item in Conditions C (Inappropriate—Accented) and D (Inappropriate—Neutral) in neurotypical and autistic participants.

Figure 6. Proportion of looks to the correct target in Conditions C (Inappropriate − Accented) and D (Inappropriate − Neutral) shown by language subgroup (represented by color).

Table 1. Sample Characteristics.

	Neurotypical	Autism	p
Age (M, SD)	11.61 (1.97)	11.04 (1.92)	0.35
IQ (M, SD)	106.61 (11.21)	100.83 (14.44)	0.15
NVIQ ^a (M, SD)	11.26 (2.87)	10.28 (2.42)	0.25
VIQ ^b (M, SD)	10.96 (1.55)	10.00 (3.50)	0.24
Language ^c (M, SD)	108.04 (12.96)	91.67 (15.65)	<0.001 ***

Note. ^a NVIQ = non-verbal IQ measured by Stanford–Binet Intelligence Scale—Fifth Edition (SB-5; Roid, 2003); ^b VIQ = verbal IQ measured by SB-5; ^c Language measured by Clinical Evaluation of Language Fundamentals—Fifth Edition (CELF-5; Wiig & Secord, 2013). *** p < 0.001.

Table 2. Experimental conditions.

Condition	Example of a Context Sentence	Example of a Target Sentence
A. Appropriate—Accented	Look at the blue pumpkin	Now look at the GREEN pumpkin
B. Appropriate—Neutral	Look at the blue pumpkin	Now look at the green pumpkin
C. Inappropriate—Accented	Look at the blue grapes	Now look at the GREEN pumpkin
D. Inappropriate—Neutral	Look at the blue grapes	Now look at the green pumpkin
F. Filler	Look at the blue cherries	Now look at the yellow apple

Note. Capitalized words denote the presence of contrastive pitch accent.

Table 3. Summary of mixed-effect logistic regression analyses (fixed effects only) for participants’ looks to the correct target item or to the incorrect competitor item.

	Binary Looks to Target				Binary Looks to Competitor
Variable	Estimate	SE	z	p	Estimate	SE	z	p
(Intercept)	−1.68	0.21	−7.78	<0.001 ***	−3.48	0.30	−11.45	<0.001 ***
Group (Ref = Neurotypical)	−0.09	0.19	−0.52	0.60	−0.03	0.26	−0.12	0.90
Condition (Ref = D)	−0.52	0.21	−2.49	0.01 *	1.11	0.26	4.27	<0.001 ***
Group × Condition	−0.22	0.19	−1.18	0.24	0.04	0.27	−0.18	0.86

Note. * p < 0.05, *** p < 0.001.

Table 4. Results from the mixed-effect logistic regression analysis with language subgroups.

Variables	Estimate	SE	z	p
(Intercept)	−0.76	0.12	−6.1	<0.001 ***
Group Autism_TL (Ref = Neurotypical)	−0.19	0.15	−1.24	0.21
Group Autism_LI (Ref = Neurotypical)	−0.29	0.17	−1.66	0.09
Condition C (Ref = Condition D)	−0.21	0.02	−9.11	<0.001 ***
Autism_TL × Condition C	−0.08	0.04	−1.73	0.08
Autism_LI × Condition C	−0.21	0.05	−3.8	<0.001 ***

Note. *** p < 0.001.
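
The estimates in Table 4 are on the log-odds scale, which can be hard to read directly. The following minimal sketch, assuming a standard logit link and Wald z tests (the table does not state either explicitly), converts three of the reported coefficients to odds ratios and recomputes two-sided p values from the reported z statistics. The helper names `odds_ratio` and `wald_p` are illustrative and do not come from the original analysis.

```python
import math

# Fixed-effect estimates (log-odds) and z values copied from Table 4.
TABLE_4 = {
    "Condition C (Ref = Condition D)": (-0.21, -9.11),
    "Autism_TL x Condition C": (-0.08, -1.73),
    "Autism_LI x Condition C": (-0.21, -3.80),
}


def odds_ratio(log_odds: float) -> float:
    """Convert a logistic-regression coefficient (log-odds) to an odds ratio."""
    return math.exp(log_odds)


def wald_p(z: float) -> float:
    """Two-sided p value for a z statistic under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


for term, (estimate, z) in TABLE_4.items():
    print(f"{term}: OR = {odds_ratio(estimate):.2f}, p = {wald_p(z):.3f}")
```

Under these assumptions, the Condition C estimate of −0.21 corresponds to an odds ratio of about 0.81, meaning the odds of a look to the target are roughly 19% lower in Condition C than in Condition D for the neurotypical reference group. The Autism_LI × Condition C interaction multiplies that ratio by a further factor of about 0.81 for the language-impaired subgroup.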

Table 5. Correlations of visual-world speed of processing measures and broader measures in the autistic group.

Construct and Measure	Speed of Contrastive Pitch Accent Processing		Speed of General Linguistic Processing
	Latency of First Fixation to Target in Condition C		Latency of First Fixation to Target in Condition D
	r	p	r	p
Contrastive Pitch Accent Understanding (PEPS-C Contrastive Pitch Accent Understanding Subtest)	−0.33	0.03 *	−0.22	0.16
Language (CELF-5 Standard Score)	−0.22	0.16	−0.32	0.04 *
Receptive Prosody (PEPS-C Receptive Prosody Composite)	−0.21	0.18	−0.15	0.34
Pragmatic Language (CELF-5 Metalinguistic Meta-Pragmatic Index)	−0.34	0.02 *	−0.40	0.008 **
Social Communication (SRS Social Communication)	−0.02	0.93	0.18	0.47
Autism Symptom Severity (ADOS-2 Total Score)	0.62	0.006 **	−0.08	0.76

Note. * p < 0.05; ** p < 0.01. Statistically significant values are presented in bold.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

