2. Materials and Methods
2.1. Overview of Study Design
Autistic and neurotypical children between 8 and 14 years of age participated in two experimental tasks. The study protocol was approved by the Vanderbilt University Institutional Review Board (IRB#180227). Parental consent and child assent were obtained prior to any study-related activities. For the first research question, participants completed an AX same–different discrimination task that tested their ability to perceive and discriminate contrastive pitch accent at the acoustic-form level. For the second research question, participants completed a visual-world paradigm task that examined their ability to interpret contrastive pitch accent at the pragmatic-function level. Lastly, for the third research question, participants completed a battery of clinical assessments of receptive prosody, pragmatic language, social communication, and autism severity, which allowed us to examine the relations between their ability to interpret contrastive pitch accent during spoken language comprehension and broader skills.
2.2. Participants
Forty-eight children between 8 and 14 years of age were recruited (n = 24 in each group). The inclusion criteria for neurotypical children were (a) being a native English speaker based on parent report, (b) no existing diagnosis of visual, hearing, neurological, or cognitive impairment per parent report, and (c) a score below 15 (the cutoff for autism) on the Social Communication Questionnaire (Rutter et al., 2003). We directly assessed neurotypical participants’ cognitive functioning using the Stanford–Binet Intelligence Scales, Fifth Edition (SB-5; Roid, 2003) and confirmed that all children in the neurotypical group demonstrated cognitive performance within the typical range. Autistic children were eligible for this study if they were native English speakers based on parent report, had a confirmed diagnosis of autism based on the Autism Diagnostic Observation Schedule—Second Edition (ADOS-2; Lord et al., 2012), had an Intelligence Quotient (IQ) above 70 based on the SB-5, and had no visual or hearing impairment. Six of the 24 autistic participants were excluded because they did not meet the cognitive criterion. The final sample included 24 neurotypical children (14 males, 10 females; mean age = 11.61 years; 18 White, 1 Black, 2 Asian, 3 Biracial) and 18 autistic children (16 males, 2 females; mean age = 11.04 years; 13 White, 2 Black, 1 Asian, 1 Biracial, 1 Not Reported).
Participants were matched on age, overall IQ, nonverbal IQ, and verbal IQ (Table 1). We also measured participants’ language ability using the Clinical Evaluation of Language Fundamentals—Fifth Edition (CELF-5; Wiig & Secord, 2013) to characterize our sample rather than as an exclusion criterion. The neurotypical and autistic groups differed significantly in language ability. Additional post hoc analyses were therefore conducted to test the effect of language on the interpretation of contrastive pitch accent; these are reported in the Results.
Additionally, the sex distribution in our sample was not balanced across groups. This pattern is consistent with the broader autism diagnostic landscape, where boys are more frequently identified than girls, with an estimated ratio of approximately 3.4:1 (Shaw et al., 2025). Autistic girls are also more likely to be diagnosed later, which may limit their representation in research samples during childhood (Lockwood Estrin et al., 2021). While this imbalance reflects population trends, we conducted follow-up analyses to examine whether biological sex influenced the results.
2.3. Experimental Tasks
2.3.1. AX Same–Different Discrimination Task
To test the extent to which participants can discriminate contrastive pitch accent at the acoustic-form level, which we consider a prerequisite for understanding its pragmatic function, participants first completed an AX same–different discrimination task (Gerrits & Schouten, 2004). This task was programmed and implemented in PsychoPy (Peirce, 2007). Participants listened to 16 test trials, each consisting of two acoustic stimuli, and pressed one of two keys on a computer keyboard to indicate whether the two stimuli were the same or different. All acoustic stimuli followed the same structure: “the” + adjective + noun (e.g., the sunny morning, the hot summer). The two stimuli in each pair contained the same phrase, but the pitch accent on the prenominal adjective was manipulated so that eight trials had a pair with identical contrastive pitch accent patterns (e.g., the SUNNY morning and the SUNNY morning; capitalized words denote the presence of a contrastive pitch accent) and eight trials had a pair with different patterns (e.g., the SUNNY morning vs. the sunny morning). Participants were given two examples and four practice trials before the test trials. Consistent with previous studies that used AX discrimination tasks (Gerrits & Schouten, 2004; Qin et al., 2022), we used the d-prime metric (d’) as a measure of participants’ ability to discriminate contrastive pitch accent.
D’ is often considered a better metric than accuracy because it is not affected by a participant’s bias toward answering one way or the other (Hautus et al., 2021). We also report accuracy to aid interpretation and comparability. Participants’ reaction time was used as a measure of processing speed for perceiving prosodic form differences.
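The text does not state how hit or false-alarm rates of 0 or 1 were handled, which matters here because most children scored at or near ceiling. The Python sketch below shows one common way to compute d’ for this design, not necessarily the procedure used in the study. It applies the standard yes/no formula d’ = z(H) − z(FA), treats "different" pairs as signal trials, and adds a log-linear correction (0.5 added to hits and false alarms, 1 added to each denominator) so that perfect scores still yield a finite value. Detection-theory treatments of same–different tasks also describe alternative decision models that give different d’ values. The function names are hypothetical.

```python
from statistics import NormalDist


def dprime(hits, misses, false_alarms, correct_rejections):
    """Yes/no d' = z(H) - z(FA) with a log-linear correction.

    Signal trials are the 'different' pairs: a hit is a 'different'
    response to a different pair, and a false alarm is a 'different'
    response to a same pair. Adding 0.5 to each count and 1 to each
    denominator keeps H and FA strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    return z(hit_rate) - z(fa_rate)


def accuracy(hits, misses, false_alarms, correct_rejections):
    """Proportion of correct responses (hits plus correct rejections)."""
    correct = hits + correct_rejections
    return correct / (hits + misses + false_alarms + correct_rejections)


# Hypothetical participant: 8 different and 8 same trials, one miss.
print(round(dprime(7, 1, 0, 8), 2))
print(accuracy(7, 1, 0, 8))
```

Under this correction, a child who answers all 8 + 8 trials correctly receives d’ ≈ 3.19, which is the ceiling for this number of trials. Other corrections (or none) would shift that ceiling.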
2.3.2. Visual-World Paradigm Task
To test participants’ ability to interpret contrastive pitch accent during spoken language processing, we designed a visual-world paradigm task inspired by Ito et al. (2014). Participants watched a 19-min video containing 72 trials. In each trial, participants viewed a visual scene with 18 objects and heard two sentences (see Figure 1), each directing them to look at one item. The assignment of contrastive pitch accents in the sentences was manipulated to establish either contextually appropriate or inappropriate conditions for contrastive pitch accent (elaborated in the Experimental Conditions section below).
Visual Stimuli. The visual displays for the 72 trials were prepared by combining 12 unique items from each of six categories (clothing items, household items, animals, furniture, office supplies, and fruit and vegetables) in four colors. For each trial, the visual display was divided into six cells, each containing one unique item in three colors (see Figure 1 for an example). The six items on each slide were always drawn from the same category. Items were carefully chosen to avoid those commonly reported as special interests of autistic individuals (Sasson et al., 2008, 2012). Images of items were first tested in a pilot study with 24 adults and six children between 8 and 14 years of age who did not take part in this study, to confirm that the selected items were recognizable and familiar. In the pilot study, participants were shown all selected images and asked to name each one. Only images that were correctly labeled by all participants were included in the task. The combination of items, colors, and positions was counterbalanced so that each item, color, and position within the slide appeared equally often across the entire stimulus set.
Auditory Stimuli. The auditory stimuli were recorded by a female native speaker of Mainstream American English at 44.1 kHz using Praat (Boersma & Weenink, 2023). The auditory stimuli for each trial consisted of a pair of sentences: a context sentence and a target sentence. The prenominal adjective in the context sentence carried a neutral accent, whereas the prenominal adjective in the target sentence carried either a contrastive pitch accent or a neutral accent, depending on the experimental condition. Acoustic analyses of the recorded stimuli confirmed that accented prenominal adjectives had significantly longer duration (contrastive M = 475.08 ms, neutral M = 282.66 ms, p < 0.001), higher mean F0 (contrastive M = 228 Hz, neutral M = 166 Hz, p < 0.001), and higher F0 peak (contrastive M = 306 Hz, neutral M = 187 Hz, p < 0.001) than neutral adjectives. Tones and Break Indices (ToBI; Silverman et al., 1992) coding of the recorded stimuli (see Figure 2 for examples) also confirmed that prenominal adjectives with contrastive pitch accents corresponded to an L+H* annotation and neutral prenominal adjectives corresponded to an L* annotation (Watson et al., 2008).
Once recorded, the sentences were edited so that the carrier phrases (i.e., “Look at the”) and the critical phrases (i.e., prenominal adjective and noun) were spliced out of their original context and recombined to create the stimulus sentences. The same carrier phrase was used across conditions for each target item to ensure that any differences in visual search patterns detected in this paradigm could be attributed to the experimental manipulation of pitch accent patterns.
Experimental Conditions. This visual-world paradigm task had four critical conditions and one filler condition (see Table 2 for all conditions and examples of the context and target sentences): Appropriate—Accented (denoted as A), Appropriate—Neutral (B), Inappropriate—Accented (C), Inappropriate—Neutral (D), and Filler (F). In the two accented conditions, a contrastive pitch accent was assigned to the prenominal adjective in the target sentence to create either an appropriate context for the contrastive pitch accent (Condition A) or an inappropriate context (Condition C), whereas no contrastive pitch accent was assigned to the prenominal adjective in the target sentence in the two control conditions (Conditions B and D). Filler trials were interspersed to prevent participants from anticipating the pitch accent patterns.
The 72 trials consisted of 36 critical trials (9 for each critical condition) and 36 filler trials. Half of the items in each of the six categories were randomly selected and assigned as targets in filler trials. The remaining 36 items were assigned to critical trials and counterbalanced across four lists using a Latin square design. Every list contained 72 unique items. The order of trials was randomized when each list was created but was fixed for every use of that list. Presentation lists were counterbalanced across participants.
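The counterbalancing described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's stimulus script. It assumes a simple rotation rule (critical item i receives condition (i + list index) mod 4), placeholder item names, and one fixed random seed per list to reproduce the "randomized once, then fixed" ordering. It also ignores the category and color constraints on each display.

```python
import random

CONDITIONS = "ABCD"  # the four critical conditions


def build_lists(critical_items, filler_items, seed=0):
    """Build four Latin-square presentation lists.

    Critical item i appears in condition (i + list_index) mod 4, so each
    item occurs once in every condition across the four lists and each
    list holds len(critical_items) / 4 trials per condition. Fillers are
    added to every list. Each list is shuffled once with its own fixed
    seed, so replaying a list always gives the same trial order.
    """
    lists = []
    for list_index in range(len(CONDITIONS)):
        trials = [
            (item, CONDITIONS[(i + list_index) % len(CONDITIONS)])
            for i, item in enumerate(critical_items)
        ]
        trials += [(item, "F") for item in filler_items]
        rng = random.Random(seed + list_index)  # fixed order per list
        rng.shuffle(trials)
        lists.append(trials)
    return lists


# Hypothetical placeholder names for the 36 critical and 36 filler items.
critical = [f"critical_{k}" for k in range(36)]
fillers = [f"filler_{k}" for k in range(36)]
presentation_lists = build_lists(critical, fillers)
```

Because every critical item cycles through all four conditions across the four lists, each list contains 9 trials per critical condition plus the 36 fillers. This matches the 72-trial structure described above.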
The key comparisons of interest were Condition A vs. Condition B and Condition C vs. Condition D. Conditions A and B both contain the same noun in the context sentence and the target sentence, thus creating an appropriate context for a contrastive pitch accent on the color modifier in Condition A. An anticipatory effect would be present if participants looked at the target item faster in Condition A than in Condition B. Conditions C and D contain different nouns in the context and target sentences, creating an inappropriate context for a contrastive pitch accent in Condition C. A garden-path effect would be present if participants looked at the target item more slowly in Condition C than in Condition D.
Instrumentation. Participants sat in front of a Tobii X2 eye-tracker and a set of speakers. Participants’ eye movements were first calibrated using the Tobii Clear View 5-point calibration program. They were then instructed to look at pictures while listening to sentences that would ask them to look for specific items in each picture. Participants’ eye movements during the task were sampled at 60 Hz.
2.4. Assessment Measures
Participants completed the following assessments to examine the relations between participants’ ability to interpret contrastive pitch accent during the visual-world paradigm task and broader skills.
2.4.1. Profiling Elements of Prosody in Speech—Communication (PEPS-C)
The PEPS-C (Peppé & McCann, 2003) is a structured, computerized prosody assessment that has been used with both neurotypical and autistic children. It consists of 14 subtests: seven expressive and seven receptive. Two subtests assess prosodic ability at the level of form (auditory discrimination and imitation). Twelve subtests assess prosodic ability at the level of function: turn-end understanding/expression, affect understanding/expression, lexical stress understanding/expression, phrasal stress understanding/expression, boundary understanding/expression, and focus understanding/expression. The receptive prosody composite, the sum of all receptive subtests, was used as an index of participants’ overall receptive prosodic ability.
2.4.2. Clinical Evaluation of Language Fundamentals—Fifth Edition Metalinguistics (CELF-5 Metalinguistics)
The CELF-5 Metalinguistics (Semel et al., 2014) is a standardized test that assesses an individual’s ability to make inferences, engage in discourse, and understand ambiguous or figurative language. Participants completed two subtests, Making Inferences and Conversational Skills, which were used to derive a Meta-Pragmatic Index score as a measure of pragmatic language ability. In the Making Inferences subtest, participants were asked to interpret short, paragraph-length vignettes by taking the communicative context into consideration. The Conversational Skills subtest asks participants to express their intentions appropriately during a conversation given semantic and contextual constraints.
2.4.3. Social Responsiveness Scale—Second Edition (SRS-2)
The SRS-2 (Constantino & Gruber, 2012) is an extensively validated parent-report rating scale designed to characterize and quantify differences in social communication and interaction often associated with autism in children and adults. The score from the Social Communication subscale was used to index participants’ social communication.
2.4.4. Autism Diagnostic Observation Schedule—Second Edition (ADOS-2)
The ADOS-2 (Lord et al., 2012) is a semi-structured observational tool designed to diagnose autism and was used in this study to confirm the autism diagnosis of autistic participants. The total score from the ADOS-2 algorithm was used as a measure of autism symptom severity for autistic participants.
2.5. Eye-Tracking Data Processing
A 250 × 250 pixel square around each target item was used as the target area of interest (AOI). Participants’ raw eye movement data were exported and coded as either 1 (look) or 0 (no look) for each AOI. The analysis window was selected a priori as 300–1500 ms after the onset of the prenominal adjective in the target sentence. The length of the window (1200 ms) falls between the window used by Diehl et al. (2015) for autistic adolescents (1000 ms) and that used by Venker (2019) for autistic preschoolers (1600 ms). The window was offset by 300 ms because programming and executing an eye movement typically takes 200 ms in adults and 300 ms in children (Arnold, 2008; Hallet, 1986).
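The AOI coding and window selection can be illustrated with a short Python sketch. It assumes gaze samples are available as (time, x, y) records, with time measured from the onset of the prenominal adjective, and that the 250 × 250 pixel AOI is centered on the item. The record layout, function names, and AOI centering are hypothetical; real Tobii exports use their own column formats. At 60 Hz, the 1200 ms window holds 72 samples per trial.

```python
from dataclasses import dataclass

AOI_SIZE = 250                 # side length of the square AOI, in pixels
WINDOW_MS = (300, 1500)        # analysis window, ms after adjective onset
SAMPLE_PERIOD_MS = 1000 / 60   # 60 Hz eye-tracker


@dataclass
class Sample:
    t_ms: float  # time relative to prenominal adjective onset
    x: float     # gaze x coordinate (pixels)
    y: float     # gaze y coordinate (pixels)


def in_aoi(sample, center_x, center_y, size=AOI_SIZE):
    """Return 1 if the gaze point lies inside the square AOI, else 0."""
    half = size / 2
    inside = abs(sample.x - center_x) <= half and abs(sample.y - center_y) <= half
    return int(inside)


def code_window(samples, center_x, center_y, window=WINDOW_MS):
    """1/0 look codes for samples whose time falls in [start, end)."""
    start, end = window
    return [in_aoi(s, center_x, center_y) for s in samples if start <= s.t_ms < end]


# Hypothetical trial: gaze moves from (400, 300) to the target at (900, 300)
# at the 50th sample; timestamps sit mid-way through each sampling interval.
samples = [
    Sample((k + 0.5) * SAMPLE_PERIOD_MS, 400 if k < 50 else 900, 300)
    for k in range(120)
]
codes = code_window(samples, center_x=900, center_y=300)
```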
Prior to analyzing the eye-tracking data, we examined trackloss during the analysis window to compare the proportion of usable data contributed by each group. Trackloss occurs when an eye-tracker cannot capture valid eye movements because of blinks or excessive movement. The percentage of tracked samples during the analysis window did not differ across conditions (p = 0.56) or groups (p = 0.98). On average, the percentage of tracked samples was 89.87% for neurotypical children and 87.11% for autistic children. Test trials with fewer than 50% tracked samples during the analysis window were eliminated because they were deemed to contain insufficient data. This data cleaning step removed 26 trials from neurotypical participants and 68 trials from autistic participants.
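The 50% trackloss criterion amounts to a simple per-trial filter. The Python sketch below assumes that each trial's analysis-window samples are stored as a list in which a lost sample (for example, during a blink or excessive movement) is recorded as None. That representation and the function names are hypothetical.

```python
def proportion_tracked(samples):
    """Share of window samples with a valid gaze position.

    Each sample is an (x, y) tuple, or None when the tracker lost the eye.
    """
    if not samples:
        return 0.0
    valid = sum(1 for s in samples if s is not None)
    return valid / len(samples)


def keep_trial(samples, min_tracked=0.5):
    """Keep a trial only if at least half of its window samples were tracked."""
    return proportion_tracked(samples) >= min_tracked


# Hypothetical 72-sample windows (1200 ms at 60 Hz).
trials = {
    "trial_01": [(512, 384)] * 60 + [None] * 12,  # 60/72 tracked: kept
    "trial_02": [(512, 384)] * 30 + [None] * 42,  # 30/72 tracked: dropped
}
kept = [name for name, s in trials.items() if keep_trial(s)]
```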
Additionally, because the context sentence in each trial serves to create either an appropriate or inappropriate context for the contrastive pitch accent in the target sentence, a trial in which a participant did not look at the item named in the context sentence provides no meaningful information about whether the contrastive pitch accent influenced the participant’s visual search. Thus, we excluded trials in which the participant failed to look at the context item following the onset of the noun in the context sentence. This data cleaning step removed 29 of the 1728 trials from neurotypical participants (2% attrition) and 80 of the 1296 trials from autistic participants (6% attrition). In the final analysis sample, each neurotypical participant contributed 71 trials on average, and each autistic participant contributed 68 trials on average.
2.6. Data Analysis
To address our first research question regarding the extent to which participants were able to perceive and discriminate contrastive pitch accent at the acoustic-form level, we used Welch’s independent-samples t-tests to compare d’ and reaction time in the AX same–different task between neurotypical and autistic participants.
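Welch’s test does not assume equal variances, which suits groups of unequal size and spread. The sketch below computes Welch’s t and the Welch–Satterthwaite degrees of freedom from summary statistics using only the Python standard library. It is an illustration, not the analysis code used in the study. It omits the p-value, which requires the t-distribution CDF (available, for example, in SciPy).

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from group summary statistics (unequal variances allowed)."""
    v1 = sd1 ** 2 / n1  # squared standard error of the group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of the group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Rounded d' summaries from Section 3.1: neurotypical vs. autistic.
t, df = welch_t(3.08, 0.30, 24, 2.94, 0.47, 18)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With the rounded d’ summaries, the sketch gives roughly t ≈ 1.1 and df ≈ 27, close to the reported t(27.31) = 1.16. Exact agreement is not expected because the published means and SDs are rounded.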
For our second research question on participants’ ability to interpret contrastive pitch accent during spoken language comprehension, we first tested whether neurotypical children demonstrated the expected anticipatory and garden-path effects. Establishing these effects in the neurotypical group provided a basis for interpreting potential group differences. We then conducted between-group comparisons using mixed-effects logistic regression models. The dependent variable was participants’ binary looks, at each sampled timepoint in the analysis window, to either the correct target or the incorrect competitor (in trials where contrastive pitch accent was used inappropriately). Fixed effects included group, condition, and their interaction, with crossed random effects for subjects and items. Additionally, because neurotypical and autistic participants differed significantly in language ability, we conducted post hoc analyses to examine the extent to which language ability relates to children’s interpretation of contrastive pitch accent.
To address our third research question on the extent to which individual differences in autistic children’s ability to interpret contrastive pitch accent are associated with broader language and social communication skills, we extracted two processing speed measures from the visual-world paradigm task: (a) latency of first fixation to the correct target in Condition C (Inappropriate—Accented), indexing speed of contrastive pitch accent processing; and (b) latency of first fixation to the correct target in Condition D (Inappropriate—Neutral), indexing general linguistic processing speed. These measures were correlated with four broader skill indices: receptive prosody (PEPS-C Receptive Prosody Composite), pragmatic language (CELF-5 Metalinguistics Meta-Pragmatic Index), social communication (SRS-2 Social Communication score), and autism severity (ADOS-2 Total Score).
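The text does not define how a first fixation was identified in the 60 Hz sample stream. The Python sketch below shows one plausible operationalization, offered only as an assumption. A fixation is approximated as a run of at least three consecutive on-target samples (about 50 ms at 60 Hz), searching from 300 ms after adjective onset. A participant’s score is the mean latency over the trials in which the target was fixated. The run length, search start, and function names are hypothetical choices.

```python
from statistics import mean


def first_fixation_latency(samples, search_start_ms=300, min_run=3):
    """Time (ms after adjective onset) at which the first qualifying
    on-target run begins, or None if the target is never fixated.

    `samples` is a time-ordered list of (t_ms, on_target) pairs with
    on_target coded 1/0. A run of `min_run` consecutive on-target samples
    is treated as a fixation (an assumption, not the study's definition).
    """
    run_start, run_len = None, 0
    for t_ms, on_target in samples:
        if t_ms < search_start_ms:
            continue  # skip samples before the search window opens
        if on_target:
            if run_len == 0:
                run_start = t_ms  # a new on-target run begins here
            run_len += 1
            if run_len >= min_run:
                return run_start
        else:
            run_len = 0  # run broken by an off-target sample
    return None


def mean_latency(trials):
    """Mean first-fixation latency across trials that reach the target."""
    latencies = [first_fixation_latency(s) for s in trials]
    latencies = [lat for lat in latencies if lat is not None]
    return mean(latencies) if latencies else None


# Hypothetical Condition C trials, sampled every 20 ms for readability.
trial_a = [(t, int(t >= 700)) for t in range(0, 2000, 20)]
trial_b = [(300, 1), (320, 0), (340, 1), (360, 1), (380, 1)]
print(mean_latency([trial_a, trial_b]))
```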
3. Results
3.1. Research Question 1: Perception of Contrastive Pitch Accent on the Acoustic-Form Level
In the AX same–different task, neurotypical participants achieved a mean d’ of 3.08 (SD = 0.30), while autistic participants achieved a mean d’ of 2.94 (SD = 0.47). A Welch’s independent-samples t-test revealed no significant difference between groups, t(27.31) = 1.16, p = 0.26, 95% CI [−0.11, 0.40]. For reference, accuracy was also high in both groups: neurotypical participants had a mean accuracy of 98.96% (SD = 4.0%, range: 87.5–100%), and autistic participants had a mean accuracy of 97.22% (SD = 8.3%, range: 72.22–100%).
In terms of reaction time, neurotypical participants responded with an average latency of 0.86 s (SD = 0.35, range: 0.39–1.79), compared to 1.13 s in the autistic group (SD = 1.13, range: 0.53–2.68). A Welch’s t-test indicated a statistically significant difference in reaction time between groups, t(21.89) = –2.23, p = 0.03, 95% CI [−0.51, −0.018], with autistic participants showing slower responses on average.
Given the unbalanced distribution of males and females across groups, we conducted follow-up analyses to test for potential sex differences. Independent-samples t-tests revealed no significant differences by sex for either d’ (t(37.80) = −1.56, p = 0.13, 95% CI [−0.34, 0.04]) or reaction time (t(28.84) = 1.80, p = 0.09, 95% CI [−0.02, 0.39]).
3.2. Research Question 2: Interpretation of Contrastive Pitch Accent During Spoken Language Comprehension
3.2.1. Preliminary Analysis: Replications of the Anticipatory Effect and the Garden-Path Effect in Neurotypical Participants
The second research question examined autistic children’s ability to interpret contrastive pitch accent during spoken language comprehension. Specifically, we examined the extent to which autistic children show an anticipatory effect, where appropriate use of the contrastive pitch accent leads to faster looks to the correct target, and a garden-path effect, where inappropriate use of the contrastive pitch accent leads to initial looks to an incorrect competitor.
As a preliminary analysis, we assessed the extent to which neurotypical children demonstrated an anticipatory effect or a garden-path effect to confirm that our visual-world paradigm task elicited the expected patterns of interpretation. Examining these effects in the neurotypical group provided a basis for interpreting potential group differences in the subsequent analyses. We tested this both visually and statistically. Participants’ proportions of looks to the target AOI during the target sentence were aggregated into 100-ms time bins and plotted as a continuous time course.
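Binning gaze samples into 100-ms intervals can be sketched in Python as below. The sketch pools (time, look) samples across trials and computes the proportion of target looks in each bin. The text does not state whether proportions were first computed within participants and then averaged, so the pooling choice and the function name are assumptions.

```python
from collections import defaultdict

BIN_MS = 100


def bin_proportions(samples, bin_ms=BIN_MS):
    """Proportion of target looks per time bin.

    `samples` holds (t_ms, on_target) pairs pooled over trials, with t_ms
    relative to target-adjective onset and on_target coded 1/0.
    Returns {bin_start_ms: proportion}, ordered by bin start.
    """
    looks = defaultdict(int)
    totals = defaultdict(int)
    for t_ms, on_target in samples:
        bin_start = int(t_ms // bin_ms) * bin_ms  # e.g. 150 ms -> 100 ms bin
        looks[bin_start] += on_target
        totals[bin_start] += 1
    return {b: looks[b] / totals[b] for b in sorted(totals)}


# Hypothetical samples from two trials.
trial_1 = [(0, 0), (50, 1), (100, 1), (150, 1)]
trial_2 = [(210, 0)]
proportions = bin_proportions(trial_1 + trial_2)
```

Pooling weights each participant by the number of valid samples they contribute. Computing per-participant proportions first would instead give every child equal weight in the plotted time course.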
Figure 3 depicts participants’ proportions of looks to the target item in Conditions A and B, where the context sentence presents an appropriate context for a contrastive pitch accent (Figure 3A), and in Conditions C and D, where it presents an inappropriate context for a contrastive pitch accent (Figure 3B). In Figure 3B, a clear gap was observed in the analysis window between the two lines representing Condition C and Condition D, indicating a robust garden-path effect in neurotypical participants. In other words, neurotypical participants looked at the correct target item more slowly in Condition C, when an inappropriate contrastive pitch accent was assigned, than in the control Condition D (see Supplementary Video S1 for an example of a critical test trial in which the participant showed a garden-path effect). However, we did not detect an anticipatory effect in neurotypical participants. As shown in Figure 3A, participants’ fixations to the target in Conditions A and B align with each other, suggesting that participants did not visually locate the target item faster in Condition A, when an appropriate contrastive pitch accent was assigned, than in the control Condition B.
Statistical analyses confirmed the presence of a garden-path effect and the lack of an anticipatory effect in neurotypical participants. Mixed-effects logistic regression models were fitted using the lme4 package in R version 4.3.1 (R Core Team, 2023). This approach is commonly used in visual-world paradigm studies (Porretta et al., 2017; Rabagliati et al., 2019). We used this method because it accommodates binomially distributed looking data and accounts for the clustered nature of observations from the visual-world paradigm (repeated observations grouped by both participants and items) (Barr, 2008). The dependent variable in all mixed-effects logistic regression models was a binary looking response (look vs. no look to the AOI of interest) at each sampled timepoint during the critical analysis window. Unless otherwise noted, all models included crossed random intercepts and slopes for participants and items, which allow estimation of participant-level and item-level variability, in addition to fixed effects of condition and, for between-group analyses, group (Baayen et al., 2008).
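For the between-group garden-path models, the structure described above can be written as the equation below. The exact random-effects structure is not fully specified in the text, so the by-participant and by-item slopes for condition shown here are an assumption. Here y_pit is a binary look for participant p on item i at sampled timepoint t. Condition is assumed to be treatment-coded (1 = Condition C, 0 = Condition D), so exp(β2) is the C-vs.-D odds ratio for the reference group and exp(β2 + β3) is the odds ratio for the other group. In lme4 this corresponds to a call of the form glmer(look ~ group * condition + (1 + condition | participant) + (1 + condition | item), family = binomial), with hypothetical variable names.

```latex
\begin{aligned}
\operatorname{logit} \Pr(y_{pit} = 1)
  &= \beta_0 + \beta_1\,\mathrm{Group}_p + \beta_2\,\mathrm{Cond}_{pi}
     + \beta_3\,\mathrm{Group}_p \times \mathrm{Cond}_{pi} \\
  &\quad + u_{0p} + u_{1p}\,\mathrm{Cond}_{pi}
     + w_{0i} + w_{1i}\,\mathrm{Cond}_{pi}, \\
(u_{0p}, u_{1p})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma_u), \qquad
(w_{0i}, w_{1i})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma_w), \\
\mathrm{OR}_{\mathrm{C\ vs.\ D}} &= \exp(\beta_2) \quad \text{(reference group)}.
\end{aligned}
```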
To test the extent to which neurotypical participants showed a garden-path effect, their eye-gaze data from Conditions C and D were fitted with a fixed effect of condition and crossed random effects of subjects and items. A significant fixed effect of condition confirmed the delayed visual search toward the target item in Condition C compared with Condition D (β = 0.76, SE = 0.24, Wald’s z = 3.01, p = 0.002). The odds of looking at the correct target were 53% lower (odds ratio = 0.47, 95% CI: [0.29, 0.76]) in Condition C than in Condition D. To test the anticipatory effect, participants’ eye-gaze data from Conditions A and B were fitted in the same way; looks did not differ significantly by condition (β = 0.02, SE = 0.2, Wald’s z = 0.09, p = 0.93). Because we replicated only the garden-path effect in neurotypical participants, we limited the between-group analyses to the garden-path effect to understand the extent to which autistic children were able to use contrastive pitch accent during spoken language processing.
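Converting logit coefficients into the odds ratios reported here is a matter of exponentiation. In the Python sketch below, the Wald 95% interval is exp(β ± 1.96 × SE). The reported odds ratio of 0.47 equals exp(−0.76), so the sketch takes the Condition C coefficient as −0.76 (C relative to D), on the assumption that the positive sign in the text reflects the opposite reference coding. Mixed-model software may compute intervals differently (for example, by profile likelihood), so small differences from the reported interval are expected.

```python
import math


def odds_ratio_ci(beta, se, z_crit=1.96):
    """Odds ratio and Wald confidence interval for a logit coefficient."""
    return (
        math.exp(beta),                # point estimate
        math.exp(beta - z_crit * se),  # lower bound
        math.exp(beta + z_crit * se),  # upper bound
    )


def percent_change_in_odds(odds_ratio):
    """Percent change in odds implied by an odds ratio (negative = lower)."""
    return (odds_ratio - 1) * 100


# Neurotypical garden-path model, Condition C relative to Condition D.
or_c_vs_d, lower, upper = odds_ratio_ci(-0.76, 0.24)
print(f"OR = {or_c_vs_d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
print(f"{percent_change_in_odds(or_c_vs_d):.0f}% change in odds")
```

The same conversion applies to the subgroup contrasts in Section 3.2.3, where exp(0.51) ≈ 1.66.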
3.2.2. Testing the Garden-Path Effect in Autistic Participants
Figure 4 and Figure 5 depict the mean proportion of looks to the correct target item and the incorrect competitor item in Conditions C and D for both neurotypical and autistic participants. The competitor item is the incorrect item primed by the inappropriate use of a contrastive pitch accent. For instance, in the example given in Table 2, the competitor item for the sentences “Look at the blue grapes. Now look at the GREEN pumpkin” would be the green grapes. A clear gap was observed between the solid line (Condition C) and the dashed line (Condition D) during the critical analysis window for both groups, indicating a robust garden-path effect in both groups. For both neurotypical and autistic participants, looks to the correct target rose later in Condition C than in Condition D. Looks to the competitor item followed the opposite pattern and started rising around 300 ms after the onset of the target adjective in Condition C. These early, steep increases in looks to the incorrect competitor item suggest that participants immediately used the contrastive pitch accent cue to anticipate that the second item would be the same type as the previously mentioned context item, before hearing and processing the noun that specified the correct target item.
Two mixed-effects logistic regression models were fitted with participants’ binary looks to the correct target item or to the incorrect competitor item as the dependent variable, fixed effects of group, condition, and their interaction, and crossed random effects of subjects and items. Results from both models showed a significant effect of condition with no significant group effect or group × condition interaction (Table 3), suggesting that both neurotypical and autistic children were able to interpret contrastive pitch accent during spoken language comprehension. Across participants, the odds of looking at the correct target were 52% lower (odds ratio = 0.48, 95% CI: [0.31, 0.41]) in Condition C than in Condition D. The odds of looking at the incorrect competitor item primed by the inappropriate use of the contrastive pitch accent in Condition C were 3.2 times the odds of looking at the competitor item in the neutral Condition D (odds ratio = 3.2, 95% CI: [1.84, 5.54]).
As for RQ1, we conducted follow-up analyses to examine whether participants’ biological sex influenced the results. We fitted the same set of mixed-effects models with participants’ binary looks to the correct target or the incorrect competitor item as the dependent variable, fixed effects of sex, condition, and their interaction, and crossed random effects of subjects and items. There was no significant main effect of sex (p = 0.79 for the model with correct target looks as the dependent variable; p = 0.25 for competitor looks) and no significant sex × condition interaction (p = 0.28 and p = 0.88, respectively). The garden-path effect remained statistically robust (p < 0.001 for both models). These results provide no evidence that participants’ contrastive pitch accent processing differed by sex in this task.
3.2.3. Exploratory and Post Hoc Analyses of the Effect of Language Ability on Interpretation of Contrastive Pitch Accents
Given that neurotypical and autistic participants differed significantly in language ability, we conducted two sets of additional post hoc analyses to understand the effect of language on participants’ ability to interpret contrastive pitch accent. First, we fitted a mixed-effects model with language (indexed by participants’ CELF-5 standard score, centered on the mean to reduce its correlation with the intercept and aid model convergence), condition, and their interaction as fixed effects, and crossed random effects of subject and item. Only random intercepts were included because the model with random slopes did not converge. Results revealed a marginally significant fixed effect of language (β = 0.01, SE = 0.01, Wald’s z = 1.66, p = 0.1), a significant fixed effect of condition (β = −0.63, SE = 0.03, Wald’s z = −23.237, p < 0.001), and no interaction (p = 0.74). These findings suggest that participants across both groups had significantly lower odds of looking at the correct target in Condition C than in Condition D. Additionally, higher language ability showed a slight, marginally significant trend toward increased odds of a correct target look.
Additionally, we assigned autistic children to two language subgroups based on their CELF-5 standard score: autistic children with a standard score of 85 or above were placed in the autism with typical language group (Autism + TL; n = 7, mean age = 10.97, mean CELF-5 standard score = 101.63), and those with a standard score below 85 were placed in the autism with language impairment group (Autism + LI; n = 11, mean age = 11.16, mean CELF-5 standard score = 76). Looking patterns in all three subgroups (Neurotypical, Autism + TL, and Autism + LI) were examined visually and tested statistically with mixed-effects logistic regression models. As shown in Figure 6, children in all three groups showed a garden-path effect elicited by the inappropriate use of contrastive pitch accent in Condition C. A mixed-effects logistic regression model with subgroup, condition, and their interaction as fixed effects and random intercepts for subjects and items (Table 4) revealed a significant fixed effect of condition (β = −0.21, SE = 0.02, Wald’s z = −9.113, p < 0.001) and a significant group (Autism + TL relative to the neurotypical reference group) × condition interaction (β = −0.21, SE = 0.05, Wald’s z = −3.8, p < 0.001). We followed up with pairwise contrasts between subgroups within each condition using the emmeans package. Results revealed significant differences between subgroups in Condition C but not in Condition D. Specifically, children in the Autism + LI group had significantly lower odds of looking at the correct target than the neurotypical group (log-odds ratio = −0.51, SE = 0.18, p = 0.01). Expressed as an odds ratio, the odds of neurotypical children looking at the correct target item in Condition C were 1.66 times those of autistic children with language impairment. No significant difference was detected between the neurotypical and Autism + TL groups (log-odds ratio = 0.26, SE = 0.15, p = 0.19).
3.3. Relation Between Speed of Processing Measures from the Visual-World Paradigm Task and Broader Skills in Autistic Participants
For our third research question, two speed of processing measures were extracted from the visual-world paradigm task. Latency of first fixation to the correct target in Condition C (Inappropriate−Accented) and latency of first fixation to the correct target in Condition D (Inappropriate−Neutral) were derived to index, respectively, speed of contrastive pitch accent processing and speed of general linguistic processing. These two measures were correlated with four measures of broader skills: receptive prosody measured by the PEPS-C Receptive Prosody Composite, pragmatic language measured by the CELF-5 Metalinguistics Meta-Pragmatic Index, social communication measured by the SRS-2 Social Communication score, and autism severity measured by the ADOS-2 Total Score.
We first conducted preliminary correlation analyses to evaluate the construct validity of the two speed-of-processing measures, which were intended to index distinct processes: contrastive pitch accent processing and general linguistic processing. Because no prior studies have extracted speed-of-processing measures from a visual-world paradigm and directly correlated them with broader language and social communication skills in autistic children, these preliminary analyses served to confirm that the two measures were distinct from each other and aligned with the theoretical constructs they were intended to represent. Good concurrent validity would be demonstrated if participants’ speed of contrastive pitch accent processing were associated with the PEPS-C Contrastive Pitch Accent Understanding subtest scores and if their speed of general linguistic processing were associated with the CELF-5 language standard score. As shown in Table 5, the latency of first fixation to the correct target in Condition C was significantly correlated with the Contrastive Pitch Accent Understanding subtest but not with the CELF-5 language standard score. The opposite pattern was found for speed of general linguistic processing: the latency of first fixation to the correct target in Condition D was significantly correlated with the CELF-5 language standard score but not with the Contrastive Pitch Accent Understanding subtest.
Further analyses revealed that both the speed of contrastive pitch accent processing and the speed of general linguistic processing were significantly correlated with pragmatic language. Latency of first fixation in Condition C was also significantly and positively correlated with autism symptom severity: autistic participants who took longer to locate the correct target item in Condition C tended to exhibit more severe autism symptoms. Neither speed-of-processing measure was correlated with receptive prosody or social communication. All reported p-values were adjusted for multiple comparisons using the Benjamini and Hochberg (1995) false discovery rate correction, implemented with the p.adjust function in the stats package in R.
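For readers unfamiliar with the Benjamini–Hochberg procedure, the sketch below mirrors the step-up adjustment that R's p.adjust(p, method = "BH") returns: p-values are ranked, each is scaled by m/rank (m being the number of tests), and a running minimum taken from the largest p-value downward keeps the adjusted values monotone and at most 1. The function name and the input p-values are hypothetical and are not values from Table 5.

```python
from typing import Sequence


def bh_adjust(pvalues: Sequence[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    # Indices sorted from the largest to the smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four correlations.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.20])
print([round(a, 4) for a in adjusted])  # [0.04, 0.0533, 0.0533, 0.2]
```

Note that the result depends on the size of the family of tests passed in together; the sketch does not settle which correlations formed a family in this study.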
4. Discussion
This study investigated autistic children’s ability to perceive contrastive pitch accent at the acoustic-form level and to interpret its pragmatic function during spoken language comprehension. Furthermore, we investigated the relations between autistic children’s ability to interpret contrastive pitch accent and broader skills, including receptive prosody, pragmatic language, social communication, and autism symptom severity. Results suggest that although autistic children as a group were able to discriminate prosodic information and use it predictively during spoken language comprehension, group differences were detected in reaction time on the AX same–different task and in speed of processing among autistic children with language impairment. Additionally, speed of contrastive pitch accent processing in autistic children was associated with pragmatic language skills and autism symptom severity. Specifically, autistic children with stronger pragmatic language skills and less severe autism symptoms were quicker to look at the correct target item in the condition involving an inappropriate contrastive pitch accent.
For our first research question, our hypothesis was partially supported. Autistic children performed comparably to neurotypical peers in discriminating contrastive pitch accent at the acoustic-form level, as reflected in similar d′ values on the AX same–different task. This finding is consistent with prior research showing intact or even enhanced performance by autistic individuals on similar low-level prosodic discrimination tasks (Heaton et al., 2008; Järvinen-Pasley et al., 2008a; Patel et al., 2023a). However, contrary to our hypothesis, autistic children showed significantly slower reaction times than neurotypical children, suggesting that although they were able to detect prosodic differences, the task may have required more cognitive effort or processing time. Importantly, this pattern emerged even though the two groups were matched on nonverbal IQ, suggesting that the slower responses were unlikely to be attributable to general cognitive ability. To our knowledge, this is the first study to demonstrate that autistic children may be slower at processing contrastive pitch accent at the acoustic-form level even when their discrimination ability, as indexed by d′, is comparable to that of neurotypical peers. This suggests that differences in prosodic perception in autistic children may lie not in perceptual sensitivity but in processing efficiency. These findings highlight the importance of examining processing speed alongside accuracy, as timing measures may reveal aspects of prosodic processing that accuracy alone would miss.
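The d′ values compared above can be illustrated with the standard yes/no signal detection formula, d′ = z(H) − z(FA), where z is the inverse of the standard normal CDF. This section does not state how d′ was computed for the AX task, so the sketch below rests on assumptions: it treats a "different" response on a different trial as a hit and on a same trial as a false alarm, applies a common log-linear correction (adding 0.5 to each count and 1 to each trial total) so that rates of 0 or 1 do not produce infinite z-scores, and ignores the differencing-model corrections that some same–different analyses apply. The function name and trial counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Yes/no d' = z(hit rate) - z(false-alarm rate), with a log-linear correction."""
    # Log-linear correction keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical child: 20 of 24 "different" trials detected, 4 of 24 "same" trials called different.
print(f"{d_prime(20, 4, 4, 20):.2f}")  # 1.83
```

Because d′ separates sensitivity from response bias while reaction time is measured separately, two groups can show similar d′ yet differ in speed, which is the dissociation the paragraph above describes.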
For our second research question, autistic children in this study as a group demonstrated the ability to interpret the pragmatic function of contrastive pitch accent during spoken language processing. Specifically, during the visual-world paradigm task, both neurotypical and autistic children tended to look at the incorrect competitor item primed by the inappropriately assigned contrastive pitch accent in the target sentence, which delayed their looks to the correct target item. Mixed-effects regression models revealed a main effect of condition without an effect of group. This lack of group differences contrasts with findings from previous studies on autistic children’s and adults’ ability to process contrastive pitch accent, most of which reported that autistic individuals perform less accurately on tasks assessing receptive contrastive pitch accent. Three studies using the PEPS-C showed lower performance on the Focus Reception Task in autistic children (Diehl & Paul, 2013; Peppé et al., 2007) and autistic adults (Hesling et al., 2010). Three additional studies using similar behavioral tasks on contrastive pitch accent also reported significant differences between autistic and neurotypical groups (Grice et al., 2016; Paul et al., 2005a; Segal et al., 2017).
Notably, all of the aforementioned studies used offline tasks that impose high behavioral demands or require interaction with an experimenter, and most involve complicated instructions. For example, in the PEPS-C Focus Reception task, participants were instructed: “Earlier today, the person on the computer bought some socks. But when she got home, she realized she had forgotten to buy one color. If she says, ‘I wanted BLUE and black socks,’ that means she forgot to buy the blue ones, so you click on blue.” Participants then listened to sentences with contrastive pitch accents assigned to different colors and clicked on the color of the sock that was forgotten. Similarly, in Paul et al. (2005a), participants were asked to determine which sentence should logically precede the one spoken by the experimenter. For instance, after hearing “I want CHOCOLATE ice cream,” the appropriate preceding sentence would be “Do you want vanilla?” The high language demands of these tasks make it unclear whether autistic children’s performance reflects difficulties with receptive prosody or broader language challenges. Unlike these offline tasks, the visual-world paradigm used in this study offers greater sensitivity to the time course of spoken language processing, simple instructions, and reduced response demands. This study offers the first evidence that autistic children are able to understand the pragmatic function of contrastive pitch accent and can use it predictively during spoken language processing.
Interestingly, results from our exploratory, post hoc analysis suggested that language ability may influence contrastive pitch accent processing. When analyzed as a continuous variable, language showed a marginally significant effect on participants’ looks to the correct target. Additionally, subgroup analysis revealed that autistic children with language impairment, but not autistic children with typical language, had significantly lower odds of looking at the correct target in Condition C, when an inappropriate contrastive pitch accent was assigned. This finding aligns closely with Lyons et al. (2014), who used behavioral tasks adapted from the PEPS-C and reported significant differences between neurotypical children and autistic children with low language ability (defined as scoring below 90 on the CELF-4) in receptive contrastive pitch accent, overall receptive prosody, and overall expressive prosody. Another eye-tracking study of language processing in autistic children similarly reported no significant difference between autism and neurotypical groups but detected significant differences once autistic children were reassigned into language-impaired and typical-language groups (Brock et al., 2008).
In our study, autistic children with language impairment had significantly lower odds of looking at the correct target in Condition C, along with a qualitatively shallower slope in their gaze patterns and a lower peak level of fixations. This visual search pattern is consistent with previous visual-world paradigm studies of children with language impairments. McMurray et al. (2010) found that adolescents with Developmental Language Disorder (DLD) were slower to locate target items and showed a reduced peak fixation level during spoken word recognition tasks. These parallels suggest that autistic children with comorbid language impairment may share some of the processing challenges that children with DLD face during real-time spoken language comprehension.
We acknowledge that these subgroup comparisons were based on small sample sizes (n = 7 for Autism + TL and n = 11 for Autism + LI), which limit the statistical power of our models and increase the likelihood of false negatives. For example, the lack of a significant difference between the neurotypical and Autism + TL groups may reflect a Type II error due to low power. However, the significant difference observed between the Autism + LI and neurotypical groups and the significant subgroup-by-condition interaction remain noteworthy. A statistically significant finding in an underpowered model may point to a sizable effect, particularly when it is supported by consistent trends across multiple analyses and its direction aligns with prior literature (Lyons et al., 2014; Brock et al., 2008). In light of these strengths and limitations, we interpret these results as preliminary and hypothesis-generating rather than conclusive. Nonetheless, these exploratory findings reinforce the need to consider language subgroups in autism research, as emphasized by Tager-Flusberg (2015) and Schaeffer et al. (2023).
Our third research question examined associations between speed of contrastive pitch accent processing and broader skills in the autistic group. To our knowledge, this study is the first to extract latency-based measures of prosody processing from a visual-world paradigm and directly relate them to standardized assessments of prosody, structural language, pragmatic language, and autism symptom severity. Preliminary analyses supported the construct validity of the two latency measures derived from the task: latency of first fixation in Condition C (Inappropriate − Accented) was associated with performance on the PEPS-C Contrastive Pitch Accent Understanding subtest, while latency in Condition D (Inappropriate − Neutral) was associated with general language ability as measured by the CELF-5. Further analyses revealed significant correlations between speed of contrastive pitch accent processing and (a) pragmatic language skills and (b) autism symptom severity: consistent with our predictions, autistic participants with better pragmatic language or milder autism symptoms were quicker to look at the correct target in Condition C. Additionally, speed of general linguistic processing was positively correlated with pragmatic language. While previous studies have also documented significant associations between prosody and broader skills (Paul et al., 2005b), our findings extend this literature by showing that the speed, and not just the accuracy, of prosodic processing is associated with individual variability in pragmatic functioning among autistic children. Lastly, the finding that both speed-of-processing measures were associated with pragmatic language, but only speed of contrastive pitch accent processing was correlated with autism symptom severity, suggests that although prosody is often considered a component of language, prosodic and linguistic functions may operate somewhat independently, with prosody more closely tied to autism symptomatology.
Findings from this study should be interpreted in light of several limitations. First, we replicated the garden-path effect of contrastive pitch accent but not the anticipatory effect reported by previous studies (Ito et al., 2014; Ito & Speer, 2008). Although this is somewhat expected given previous findings that the anticipatory effect tends to have a smaller effect size than the garden-path effect (Ito et al., 2014), we speculate that the absence of an anticipatory effect in our study may also stem from analytical differences from Ito et al. (2014). In our study, we used more specific, item-wise AOIs instead of the larger, cell-wise AOIs (i.e., the cell that contains three items of different colors; see Figure 1 for an example) used by Ito et al. (2014). The sentences used to examine the anticipatory effect present the same item in different colors as the context item and the target item across the context and target sentences (e.g., “Look at the blue pumpkin. Now look at the GREEN pumpkin.”). In Ito et al. (2014), it was assumed that participants would scan outside the cell containing both the context and target items after locating the context item and then return to the same cell to find the target; trials in which participants’ fixations stayed within the cell were excluded from analysis. When we re-analyzed our data using the larger, cell-wise AOIs, we found that our participants, especially autistic children, tended to keep their fixations within the cell after locating the context item and thus needed to move only a short distance to find the target item. This minimal movement may not have been sufficient to demonstrate an anticipatory effect in our design. A longer pause following the context sentence might give participants adequate time and encourage them to scan outside the cell. To better elicit anticipatory effects in similar paradigms, future studies may also consider increasing the distance between visual items, using a screen with a larger display area, and introducing brief additional visual stimuli to encourage participants to scan outside the cell. These modifications may be especially important when working with autistic children, who often exhibit more sustained or “sticky” fixation (difficulty disengaging attention from an initial fixation; Elsabbagh et al., 2013; Landry & Bryson, 2004) or reduced spontaneous visual exploration (Sasson et al., 2008).
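The difference between item-wise and cell-wise AOIs can be made concrete with a small sketch. The screen coordinates, the third (orange) pumpkin, and the AOI names below are hypothetical and do not reproduce the layout in Figure 1. The point is only that a gaze shift from the context item to the target within the same cell registers as a change under item-wise AOIs but not under a cell-wise AOI, so a cell-wise rule that drops trials in which fixations never leave the cell would exclude such a trial.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AOI:
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        """True if (x, y) lies inside this rectangular AOI (right/bottom edges exclusive)."""
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def label(x: float, y: float, aois: Sequence[AOI]) -> Optional[str]:
    """Name of the first AOI containing the point, or None if it hits no AOI."""
    for aoi in aois:
        if aoi.contains(x, y):
            return aoi.name
    return None


# Hypothetical layout in pixels: one cell holding three pumpkins of different colors.
CELL_AOIS = [AOI("pumpkin_cell", 0, 0, 600, 300)]
ITEM_AOIS = [
    AOI("blue_pumpkin", 0, 0, 200, 300),
    AOI("green_pumpkin", 200, 0, 400, 300),
    AOI("orange_pumpkin", 400, 0, 600, 300),
]

# Hypothetical fixations: the context item, then directly the target in the same cell.
fixations = [(100, 150), (300, 150)]

item_labels = [label(x, y, ITEM_AOIS) for x, y in fixations]
cell_labels = [label(x, y, CELL_AOIS) for x, y in fixations]
print(item_labels)  # ['blue_pumpkin', 'green_pumpkin']
print(cell_labels)  # ['pumpkin_cell', 'pumpkin_cell']

# A cell-wise exclusion rule of the kind described above would drop this trial,
# because no fixation ever leaves the cell.
left_cell = any(lbl != "pumpkin_cell" for lbl in cell_labels)
print(left_cell)  # False
```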
Our analysis sample only included autistic children without intellectual disability. We included a cognitive criterion (IQ above 70) to match neurotypical and autistic groups on cognitive ability. Although there was no inclusion or exclusion criterion based on language ability, all autistic participants in this sample demonstrated fluent and flexible use of language. One caveat of such a sample is that findings from this current study may not generalize to other subgroups of autistic children, such as minimally verbal autistic children. It is worth noting that the six autistic children excluded due to IQ scores below 70 were able to complete both the AX task and the visual-world paradigm. While these children exhibited significantly more trackloss and inattentive trials compared to autistic children without intellectual disability and neurotypical children, they still contributed an average of 46 usable trials (64% of the total number of trials). These data suggest that the visual-world paradigm has the potential to be used with autistic children with intellectual disability. Future studies with a larger sample of autistic children with intellectual disability are needed to confirm the feasibility of this paradigm and to explore its broader applicability.
In addition, the wide age range in our sample (8 to 14 years) may introduce developmental variability in language and cognitive abilities, particularly in a heterogeneous population such as autistic children. Although age was matched across groups, the relatively wide age span, combined with a modest sample size, may limit the precision with which developmental effects can be interpreted. Future studies with larger samples and narrower age bands would be valuable for clarifying age-related changes in prosody processing. Finally, the sex distribution in our sample was imbalanced, with relatively few autistic girls. Although follow-up analyses did not provide evidence that sex influenced task performance, the underrepresentation of autistic girls in our sample limits our ability to fully examine potential sex-by-group differences in contrastive pitch accent processing. Future studies should aim to include more autistic girls to strengthen the generalizability of findings across autistic children.