Gaze-Speech Coordination During Narration in Autism Spectrum Disorder and First-Degree Relatives

Xing, Jiayin; Lau, Joseph C. Y.; Nayar, Kritika; Landau, Emily; Kumareswaran, Mitra; Grabowecky, Marcia; Losh, Molly

doi:10.3390/brainsci16010107


Jiayin Xing ¹,², Joseph C. Y. Lau ², Kritika Nayar ¹,², Emily Landau ², Mitra Kumareswaran ², Marcia Grabowecky ³ and Molly Losh ²,*

¹ Department of Child and Adolescent Psychiatry, Child Study Center, Hassenfeld Children’s Hospital at NYU Langone, New York, NY 10016, USA
² Roxelyn and Richard Pepper Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL 60201, USA
³ Department of Psychology, Northwestern University, Evanston, IL 60201, USA
* Author to whom correspondence should be addressed.

Brain Sci. 2026, 16(1), 107; https://doi.org/10.3390/brainsci16010107

Submission received: 6 September 2025 / Revised: 6 January 2026 / Accepted: 8 January 2026 / Published: 19 January 2026

(This article belongs to the Collection Editorial Board Members’ Collection Series: Autism Spectrum Conditions from Childhood to Adulthood—Current Situation and Prospects)


Abstract

Background/Objectives: Narrative differences in autism spectrum disorder (ASD), along with subtle, parallel differences among first-degree relatives of autistic individuals, suggest potential genetic liability to this critical social-communication skill. Effective social communication relies on coordinating signals across modalities, which is often disrupted in ASD. Therefore, the current study examined the coordination of two fundamental skills, gaze and speech, as a potential mechanism underlying narrative and broader pragmatic differences in autistic individuals and their first-degree relatives. Methods: Participants included 35 autistic individuals, 41 non-autistic individuals, 90 parents of autistic individuals, and 34 parents of non-autistic individuals. Participants narrated a wordless picture book presented on an eye-tracker, with gaze and speech simultaneously recorded and subsequently coded. Time series analyses quantified temporal coordination (i.e., the temporal lead of gaze over speech) and content coordination (i.e., the degree of gaze-speech content correspondence). These metrics were compared between autistic and non-autistic groups and between parent groups, and were examined in relation to narrative quality and conversational pragmatic language skills. Results: Autistic individuals showed reduced temporal coordination but increased content coordination relative to non-autistic individuals; no significant differences were found between parent groups. In autistic individuals and in the combined parent groups, increased content coordination and reduced temporal coordination were linked to reduced narrative quality and pragmatic language skills, respectively. Conclusions: Reduced temporal and increased content coordination may reflect a localized strategy of labeling items upon visualization. This pattern may indicate more limited visual, linguistic, and cognitive processing and may underlie differences in higher-level social-communicative abilities in ASD.
To our knowledge, this study is the first to identify multimodal skill coordination as a potential mechanism contributing to higher-level social-communicative differences in ASD and first-degree relatives, with implications for mechanism-based interventions to support pragmatic language skills in ASD.

Keywords:

ASD; gaze-speech coordination; narrative; family study; pragmatic language

1. Introduction

Human social-communication is a multimodal system in which different component competencies work together to support successful communication [1]. Differences in coordinating social-communicative signals in a socially informative manner have been observed in individuals with autism spectrum disorder (ASD) [2,3] and may relate to social-communication patterns that are characteristic of the diagnosis [4]. The current study examined social-communicative coordination in ASD, with a focus on differences in narrative, or storytelling, and investigated the coordination of two fundamental skills related to narration, gaze and speech [5], as a potential mechanism underlying narrative and broader pragmatic language differences. In this context, gaze was used as a measure of visual attention directed toward relevant story elements when viewing a storybook or a movie scene; speech refers to participants' coded verbal narratives describing those story elements; and gaze-speech coordination refers to how visual information is integrated into speech and language to support storytelling [6,7,8,9,10,11,12]. Further, previous family studies have demonstrated subtle but parallel subclinical social-communicative differences in first-degree relatives of autistic individuals (i.e., parents and siblings), suggesting potential genetic liability [13,14,15,16]. Therefore, the current study examined gaze-speech coordination during narration in both autistic individuals and their first-degree relatives as a potential underlying mechanism of higher-level differences in social-communication skills. These skills include narrative abilities, i.e., organizing events in temporal and causal frameworks to communicate one’s reality during social interactions [17,18], as well as conversational skills, which involve managing dynamic social exchanges in real time.

Social-communicative coordination refers to the integration of different modalities to support communication. These social-communicative behaviors can complement each other to serve communicative functions (e.g., mutual gaze during speech can communicate interest and intention) [19] and can coordinate temporally (e.g., gestures can augment speech by emphasizing specific information) [20]. This ability emerges early in child development (e.g., the coordination between reaching and gaze behaviors) [21,22] and is observed universally across cultures [1]. In individuals with ASD, differences in cortical connectivity may lead to a perseverative cognitive style, reduced flexibility, and difficulty integrating information across modalities [23,24]. Indeed, studies have documented developmental differences in social-communicative coordination in ASD [2,3], and disruptions in this coordination can further affect social and language development [3,25,26]. Differences in social-communicative coordination within complex communicative contexts (e.g., narrative and conversation) may also serve as mechanistic contributors to higher-level social-communication differences in ASD, such as differences in pragmatic language abilities. However, this association between coordination differences and higher-order communication skills has not been examined in prior work on ASD.

The current study is the first to examine such relationships in autistic individuals and their parents in the context of narrative, an important social-communication ability. Specifically, differences in the coordination of two component skills of narration, gaze and speech, were examined as potential contributors to narrative differences and broader social-communicative differences in ASD. Two primary types of gaze-speech coordination were considered: temporal coordination (i.e., the extent to which gaze precedes speech on the same stimuli) and content coordination (i.e., the degree of correspondence between gaze focus and speech content) [12,27].

1.1. Gaze-Speech Temporal Coordination

Gaze-speech temporal coordination is the time lag between gaze and speech on the same visual stimulus. It may reflect underlying cognitive, speech, and language planning processes and relate to higher-level social-communicative skills [5]. Temporal coordination has been studied in rapid automatized naming (RAN) tasks and in narrative contexts [6,10,11,12].
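As a minimal sketch of what a gaze-speech time lag means in practice, the Python snippet below computes, for each spoken word, how long after the most recent fixation on the same story element the word began (often called an eye-voice span). This is an illustrative simplification, not the study's method: the study quantifies temporal coordination with the DCRP-based Qlos measure described in Section 2.6.1. The record types `Fixation` and `Word` and the function `gaze_speech_lags` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    aoi: str       # story element fixated (hypothetical label, e.g. "dog")
    onset: float   # fixation onset, seconds from page onset


@dataclass
class Word:
    referent: str  # story element the word refers to
    onset: float   # word onset, seconds from page onset


def gaze_speech_lags(fixations: List[Fixation], words: List[Word]) -> List[Optional[float]]:
    """Per word: speech onset minus the onset of the most recent earlier fixation
    on the same referent (positive = gaze led speech); None if never fixated before."""
    lags: List[Optional[float]] = []
    for word in words:
        earlier = [f.onset for f in fixations
                   if f.aoi == word.referent and f.onset <= word.onset]
        lags.append(word.onset - max(earlier) if earlier else None)
    return lags


if __name__ == "__main__":
    fixations = [Fixation("boy", 0.0), Fixation("dog", 0.4), Fixation("frog", 1.2)]
    words = [Word("boy", 0.8), Word("dog", 1.0), Word("moon", 1.5)]
    print(gaze_speech_lags(fixations, words))  # lags of about 0.8 s and 0.6 s, then None
```

A word whose referent was never fixated beforehand yields `None` rather than a lag, which keeps "no gaze lead" distinct from a lag of zero.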

The RAN task developed by Denckla and Rudel [28] assesses the speed of naming familiar visual stimuli, including letters, numbers, colors, and objects. Some studies recorded gaze during RAN to examine potential mechanisms of verbal output [5], offering a more nuanced examination of automaticity than traditional metrics of reaction time and error rate. As such, RAN may serve as a model to study gaze-speech temporal coordination. More specifically, temporal coordination during RAN reflects the fluency of underlying cognitive processes (e.g., visual processing and working memory) and linguistic processes (e.g., phonological retrieval and articulatory planning) that support complex speech and language skills, including reading [8,29,30,31,32,33].

Unlike the RAN task, a narrative context involves enriched visual information about story plots and characters (including both social and non-social information), higher-level linguistic processing (e.g., planning sentence structure within unfolding discourse), and integrative cognitive processes, such as interpreting relationships among story elements and the overarching story theme. In narrative tasks, participants narrated from a picture or storybook presented on an eye-tracker while gaze and speech were recorded simultaneously [8,9,12]. In typical development, two stages are observed: broad scanning to extract the gist, followed by focused gaze on individual objects to describe them in speech [9,12]. This sequence allows visual-conceptual processing and speech planning, supporting enriched and cohesive narratives [27,34,35,36]. The current study extends prior work by investigating temporal coordination during narration in autistic individuals and their first-degree relatives. This approach can shed light on related visual and cognitive processing patterns during narration and their potential impact on ASD-related differences in narrative quality and pragmatic language more broadly.

1.2. Gaze-Speech Content Coordination

In addition to temporal coordination, content coordination refers to the degree of consistency, or correspondence, between what is fixated in gaze and what is described in speech during narration. In typical development, visual attention to contextual elements correlates with verbal descriptions of those elements, suggesting that gaze is used strategically to enrich narrative content [27,34,35,36,37]. However, no prior study has quantified content coordination in ASD, although doing so may elucidate mechanisms underlying narrative differences.

1.3. Gaze-Speech Coordination in First-Degree Relatives

Previous studies have revealed, in first-degree relatives of autistic individuals (i.e., parents and siblings), a constellation of subclinical features that parallel in quality the defining characteristics of ASD, including differences in narrative functions and pragmatic language [13,14,15,16,38,39,40,41,42,43,44]. These features, termed the broad autism phenotype (BAP), may reflect genetic risk for autism [15,16]. Beyond behavioral and language features, parallel differences in gaze-speech temporal coordination during RAN have been found in siblings and parents of autistic individuals [5,44], compared to non-autistic individuals and parent controls, respectively, suggesting that similar mechanisms may contribute to differences in higher-level social-communicative skills in both ASD and first-degree relatives.

1.4. The Current Study

This study examines gaze-speech temporal and content coordination during narration in autistic individuals and their parents, and how these coordination processes relate to differences in narrative quality and broader pragmatic language ability. We predicted that autistic individuals would differ in both temporal and content gaze-speech coordination, with similar but subtler differences in their parents. We also expected that reduced gaze-speech coordination would be associated with poorer narrative quality and conversational pragmatic skills in both groups.

2. Materials and Methods

2.1. Participants

Participants included 35 autistic individuals (ASD group), 41 individuals without autism (non-ASD group), 90 parents of autistic individuals (ASD parent group), and 34 parents of individuals without autism (parent control group). Parent groups included both parents when possible (ASD parent group: n = 20 dyads; parent control group: n = 2 dyads). All participants spoke English as their first language and were recruited from local registries, clinics, and advocacy groups. ASD diagnostic status was confirmed or ruled out using gold-standard tools, including the Autism Diagnostic Observation Schedule-Second Edition (ADOS-2) [45] and/or the Autism Diagnostic Interview-Revised (ADI-R) [46]. ADOS-2 calibrated severity scores were also extracted, including Overall, Social Affect, and Restricted and Repetitive Behaviors (RRB) [45,47]. The Broad Autism Phenotype Questionnaire (BAPQ) [48,49] was used to assess the highest level of education completed by parents as well as autistic and non-autistic individuals.

Participants under the age of 10 or with a verbal IQ below 80, as measured by the Wechsler Intelligence Scale for Children–Third Edition (WISC-IV), the Wechsler Abbreviated Scale of Intelligence (WASI), or the Wechsler Adult Intelligence Scale (WAIS), Third or Fourth Edition [50,51,52], were excluded to ensure sufficient language ability to produce a complex narrative. Participants with a family history of ASD-related disorders (e.g., fragile X syndrome and tuberous sclerosis) were also excluded. See Table 1 for detailed demographic information.

2.2. Procedures

Participants viewed the 24-page wordless picture book Frog, Where Are You? [53], presented on a Tobii T60 series eye-tracker (Tobii Technology AB, Danderyd, Sweden) with a resolution of 1280 × 1024 pixels. All participants were seated approximately 50–60 cm from the screen during eye-tracking tasks. The frog story follows a boy and his pet dog through a series of adventures as they search for their missing frog; it has been widely used as a well-controlled narrative elicitation task among children and adults across different languages [54], in numerous neurodevelopmental populations [55,56], and in both ASD and first-degree relatives [57]. Participants completed the task in a quiet, private space in their home, a reserved space (e.g., a hotel), or the laboratory, depending on their scheduling needs. Only the participant and the experimenter were present; parents did not accompany their children, to prevent practice effects, and sessions were scheduled at convenient times. While participants narrated each page, narrative speech was recorded with an external microphone (Blue Snowball or Logitech USB Desktop Microphone) (see Lee et al., 2020 [8] for details). Gaze and narrative data reported in prior work [5,8,57,58] were used to support the new analyses applied in the current study, including detailed hand-coding of visual attentional focus and speech content and computational time series analyses (see details below).

2.3. Existing Data Processing

2.3.1. Transcription

Narratives were transcribed by transcribers trained to ≥80% word reliability and blind to diagnostic status using conventions in either Systematic Analysis of Language Transcripts (SALT) [59] or EUDICO Linguistic Annotator (ELAN) software version 5.8 [60]. A total of 18.5% of transcripts (n = 37), randomly selected by diagnosis and sex, were double transcribed for reliability. Overall word-level reliability was 95.86%, with 95.98% for parent controls, 96.93% for ASD parents, 95.11% for non-ASD controls, and 94.67% for ASD participants.

2.3.2. Alignment

Trained coders, blind to diagnosis, marked utterance onsets and offsets using TextGrids in Praat [61] (https://www.fon.hum.uva.nl/praat/, accessed on 1 September 2022; version 6.0.29). The Forced Alignment and Vowel Extraction [62] program was then used to automatically align speech with the transcription at the word level. The onset and offset times of each word were then extracted using a Praat script (see Patel et al., 2020 [43] for details).

2.3.3. Gaze Processing

Gaze data have been reported and processed in prior work [8,57,58]. Areas of interest (AOIs), including social story characters and non-social setting elements, were pre-selected in Tobii Studio software (version 3.2.1). Fixations were defined using the I-VT fixation filter available in Tobii Studio. Participants whose ratio of spoken words to tracked eye-movement time exceeded four words per second within each story episode (i.e., setting, searching, and resolution) were excluded, because such a ratio indicates poor gaze tracking during vocalization (see Lee et al., 2020 [8] for details).
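The exclusion rule above reduces to a ratio check. The sketch below assumes that, for each story episode, the number of spoken words and the seconds of tracked gaze are already available, and that a participant is excluded if the ratio exceeds the threshold in any episode; that reading of "within each story episode", along with the function names, is an assumption made for illustration.

```python
from typing import Dict, Tuple

MAX_WORDS_PER_TRACKED_SECOND = 4.0  # threshold stated in the text


def words_per_tracked_second(n_words: int, tracked_seconds: float) -> float:
    """Spoken words divided by seconds of tracked gaze in one episode."""
    if tracked_seconds <= 0:
        return float("inf")  # no tracked gaze at all: treat as failing the check
    return n_words / tracked_seconds


def exclude_participant(episodes: Dict[str, Tuple[int, float]]) -> bool:
    """episodes maps episode name -> (words spoken, seconds of tracked gaze).
    Assumption: exclude if any single episode exceeds the threshold."""
    return any(words_per_tracked_second(words, seconds) > MAX_WORDS_PER_TRACKED_SECOND
               for words, seconds in episodes.values())


if __name__ == "__main__":
    # Hypothetical counts: 120 words over 25 s of tracked gaze is 4.8 words/s.
    print(exclude_participant({"setting": (40, 20.0),
                               "searching": (120, 25.0),
                               "resolution": (30, 12.0)}))  # True
```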

2.4. Narrative Quality

Narrative quality was assessed using a hand-coding scheme developed in prior studies [57,58]. Primary measures of narrative quality included Affect/Cognition, the percentage of descriptions of the thoughts and emotions of story characters; Story Components Present, the inclusion of key story elements in the narrative; and Causal Inferences, the percentage of causal explanations of story events and characters' actions (see Nayar et al., 2024 [57] for details). These data were reported in prior work [57,58] and were used here for correlational analyses.

2.5. New Data Processing

2.5.1. Narrative Coding

Speech content was hand-coded at the word level to determine the specific story component referenced in each spoken word. The coding scheme included key story elements related to the thematic content of each page of the story, such as the main story characters (e.g., boy, frog, and dog) and setting elements (e.g., moon and woods). Coders were trained to ≥80% reliability and blind to diagnostic status. Twenty percent of transcripts (n = 40) were randomly selected by diagnosis and sex and double-coded, with an inter-rater reliability of 98.2%.
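The text reports inter-rater reliability as a percentage without stating the formula. A common choice for word-level codes is simple percent agreement, sketched below under that assumption; a chance-corrected statistic such as Cohen's kappa would give a different (lower) number. The function name and the `"none"` label for words that reference no story element are hypothetical.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Percentage of words given the same story-component code by both coders."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same sequence of words")
    if not coder_a:
        raise ValueError("no words to compare")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * agreements / len(coder_a)


if __name__ == "__main__":
    a = ["boy", "dog", "none", "frog", "frog"]
    b = ["boy", "dog", "none", "frog", "moon"]
    print(percent_agreement(a, b))  # 80.0
```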

2.5.2. Gaze Coding

AOIs included in the existing gaze processing were further filtered to include only those relevant to the story theme, following the same coding scheme used for narrative speech. When AOIs overlapped, fixations were assigned to the AOI with greater social relevance to the story context (e.g., social stimuli over non-social stimuli and protagonists over secondary story characters). For each fixation, the onset and offset times and the AOIs fixated were extracted. The coded narrative speech and gaze time series were then aligned at 0.2 s intervals. No participants were excluded due to alignment issues.
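A minimal sketch of the two steps in this paragraph, resolving overlapping AOIs by social relevance and aligning coded gaze and speech at 0.2 s intervals, is shown below. The relevance ranking, the AOI-to-class mapping, and the rule of labeling each bin by the interval covering its midpoint are all assumptions for illustration; the text does not specify how bins were assigned. Speech intervals would come from the word-level onsets and offsets described in Section 2.3.2, each labeled with its coded story element.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical ranks (lower = higher priority): protagonists over secondary
# characters, and social characters over non-social setting elements.
RANK = {"protagonist": 0, "secondary": 1, "setting": 2}
AOI_CLASS = {"boy": "protagonist", "dog": "protagonist", "frog": "protagonist",
             "owl": "secondary", "deer": "secondary",
             "moon": "setting", "woods": "setting"}

Interval = Tuple[float, float, str]  # (onset s, offset s, coded story element)


def resolve_overlap(aois: Sequence[str]) -> str:
    """Assign a fixation that lies in several AOIs to the most relevant one
    (ties keep the AOI listed first)."""
    return min(aois, key=lambda aoi: RANK[AOI_CLASS[aoi]])


def to_bins(intervals: Sequence[Interval], duration: float,
            step: float = 0.2) -> List[Optional[str]]:
    """Label each step-second bin by the interval covering the bin midpoint;
    bins covered by no interval stay None."""
    n_bins = math.ceil(round(duration / step, 9))  # round guards float error
    bins: List[Optional[str]] = [None] * n_bins
    for i in range(n_bins):
        midpoint = (i + 0.5) * step
        for onset, offset, label in intervals:
            if onset <= midpoint < offset:
                bins[i] = label
                break
    return bins


if __name__ == "__main__":
    print(resolve_overlap(["woods", "owl", "boy"]))               # boy
    gaze = to_bins([(0.0, 0.45, "boy"), (0.45, 1.0, "dog")], 1.0)
    speech = to_bins([(0.35, 0.8, "boy")], 1.0)
    print(gaze)    # ['boy', 'boy', 'dog', 'dog', 'dog']
    print(speech)  # [None, None, 'boy', 'boy', None]
```

Applying the same `to_bins` function to both modalities guarantees that gaze and speech series have identical length and bin boundaries, which the recurrence analysis requires.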

2.5.3. Pragmatic Language Ability

The Pragmatic Rating Scale-School Age (PRS-SA) [63] and the Pragmatic Rating Scale (PRS) [64] were used to assess conversational pragmatic language abilities in the ASD and non-ASD groups and in the parent groups, respectively. Both the PRS and the PRS-SA capture components of conversational pragmatic language, such as conversational management and nonverbal communication [65]. The PRS-SA is rated from the conversational components of the ADOS-2, whereas PRS ratings are based on conversations during a life history interview, in which examiners converse with parents about topics such as their childhood, marriage, family relationships, and profession. Coders were trained to ≥80% reliability and were blind to group status. Some of the coded PRS-SA and PRS files were reported in prior work [65,66]. In the current sample, 31% (n = 17) of the PRS-SA files and 84% (n = 116) of the PRS files were double-coded for reliability, with inter-rater reliability of 76.10% and 83.52%, respectively. Discrepancies were resolved between reliability coders, and consensus codes were used in analyses.

2.6. Data Analysis

2.6.1. Gaze-Speech Coordination: Diagonal Cross Recurrence Profiles (DCRP) Analysis

Given the continuous nature of multimodal behaviors during narration, we applied diagonal cross recurrence profile (DCRP) analysis, using the CRQA package (version 2.0.7) [67] in R (version 4.4.1, 2024-06-14), to quantify content and temporal coordination between speech and gaze across time. DCRP is a recurrence-based, non-linear approach to time series analysis that measures the shared dynamics of two coupled time series; recurrence-based methods have been increasingly employed in the cognitive and social sciences [68,69]. The primary measures used to quantify gaze-speech coordination are described below (see Supplement and Figure S1 for detailed computational methods):

Recurrence rate (RR): Recurrence rate quantifies the proportion of recurrence, or correspondence, between two time series (i.e., gaze focus and narrative content referring to the same story component or visual stimulus). In the current study, a higher RR indicates greater consistency between gaze focus and speech content, representing greater gaze-speech content coordination.
Recurrence rate peak (RRpeak): Recurrence rate peak measures the highest proportion of correspondence between gaze and speech across the time lags examined. In the current study, a higher RRpeak represents a greater maximum level of content coordination, or consistency, between gaze and speech during narration.
Qlos: Qlos measures the extent to which one behavioral time series leads the other, i.e., gaze leads speech on the same story components in time. A higher Qlos represents a greater temporal lead of gaze over speech, or a higher average extent of gaze-speech temporal coordination across narration.
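To make these three measures concrete, the sketch below builds a diagonal cross-recurrence profile for two categorical series aligned in 0.2 s bins: at each lag, it reports the percentage of bin pairs in which gaze and speech name the same story element. Here RR is taken as the mean recurrence across the lag window and RRpeak as its maximum. The `lead_index` is a hypothetical stand-in for Qlos (recurrence on the gaze-leads side minus the speech-leads side, normalized), because the exact Qlos formula is given only in the Supplement. This is a simplified illustration, not the CRQA package's implementation.

```python
from typing import Dict, Optional, Sequence

Label = Optional[str]  # coded story element in one 0.2 s bin; None = nothing coded


def diagonal_profile(gaze: Sequence[Label], speech: Sequence[Label],
                     max_lag: int) -> Dict[int, float]:
    """Percent recurrence on each diagonal of the categorical cross-recurrence plot.

    Lag k > 0 pairs gaze at bin t with speech at bin t + k (gaze leading speech).
    A pair recurs when both bins are coded with the same story element.
    """
    if len(gaze) != len(speech):
        raise ValueError("gaze and speech must be aligned to the same bins")
    n = len(gaze)
    profile: Dict[int, float] = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(gaze[t], speech[t + lag]) for t in range(n) if 0 <= t + lag < n]
        hits = sum(1 for g, s in pairs if g is not None and g == s)
        profile[lag] = 100.0 * hits / len(pairs) if pairs else 0.0
    return profile


def summarize(profile: Dict[int, float]) -> Dict[str, float]:
    """RR (mean over lags), RRpeak (maximum), its lag, and a hypothetical lead index."""
    values = list(profile.values())
    peak_lag = max(profile, key=profile.get)  # ties resolve to the most negative lag
    lead = sum(v for k, v in profile.items() if k > 0)    # gaze-leads side
    follow = sum(v for k, v in profile.items() if k < 0)  # speech-leads side
    total = lead + follow
    return {
        "RR": sum(values) / len(values),
        "RRpeak": profile[peak_lag],
        "peak_lag": float(peak_lag),
        # +1: all off-diagonal recurrence has gaze first; -1: all has speech first
        "lead_index": (lead - follow) / total if total else 0.0,
    }
```

For example, if speech names each element two bins (0.4 s) after gaze reaches it, the profile peaks at lag +2 and the lead index is positive, the pattern the text associates with stronger temporal coordination.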

2.6.2. Sham Data Samples

Sham data samples were created following methods developed in prior studies [70,71] to serve as a statistical baseline. Sham samples were created by shuffling the time sequences of the gaze and speech time series (n = 10,000 iterations) and randomly pairing gaze and speech data from different individuals across diagnostic groups. Sham samples were generated separately for the ASD and non-ASD groups and for the parent groups, with sample sizes matched to the participant groups. The mean values of the three coordination measures (RR, RRpeak, and Qlos) in the sham samples were then subtracted from those of real participants to quantify temporal and content coordination relative to the randomized level.
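The sham baseline can be sketched as a surrogate loop: shuffle one person's gaze series and a different person's speech series, compute the coordination metric on the pair, and average over iterations. In the sketch below, `lag0_recurrence` is a deliberately simple placeholder metric (any of RR, RRpeak, or Qlos would be plugged in instead), and the function names and fixed seed are assumptions for illustration.

```python
import random
from statistics import mean
from typing import Callable, List, Optional, Sequence

Label = Optional[str]
Metric = Callable[[Sequence[Label], Sequence[Label]], float]


def lag0_recurrence(gaze: Sequence[Label], speech: Sequence[Label]) -> float:
    """Placeholder metric: % of aligned bins where gaze and speech name the same element."""
    hits = sum(1 for g, s in zip(gaze, speech) if g is not None and g == s)
    return 100.0 * hits / min(len(gaze), len(speech))


def sham_mean(gazes: List[List[Label]], speeches: List[List[Label]], metric: Metric,
              iterations: int = 10_000, seed: int = 0) -> float:
    """Mean metric over surrogate pairs: one person's shuffled gaze paired with
    a different person's shuffled speech."""
    rng = random.Random(seed)
    people = range(len(gazes))
    values = []
    for _ in range(iterations):
        i, j = rng.sample(people, 2)        # two different individuals
        g, s = gazes[i][:], speeches[j][:]  # copies, so the inputs stay intact
        rng.shuffle(g)                      # destroy the temporal order
        rng.shuffle(s)
        n = min(len(g), len(s))
        values.append(metric(g[:n], s[:n]))
    return mean(values)
```

A participant's baseline-corrected score is then `metric(real_gaze, real_speech) - sham_mean(...)`, so values above zero indicate coordination beyond what shuffled, mismatched data produce.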

2.6.3. Group Comparisons

A series of linear mixed-effects regression models were fit using the lmer function of the lme4 package (version 1.1-38) [72] in R (version 4.4.1, 2024-06-14) to compare groups on coordination metrics. The measure of temporal coordination (Qlos) and both measures of content coordination (RR and RRpeak) from the DCRP analyses were compared after subtracting the sham-group averages. All models included diagnosis (ASD vs. non-ASD, or ASD parents vs. parent controls) and covariates (age, gender, VIQ, and duration of speech) as fixed effects and participants as random effects. Age, gender, and verbal IQ were controlled as covariates given their potential influences on narrative. Duration of speech was included as a covariate because of its potential influence on gaze-speech content coordination across narration (i.e., the longer the narration, the greater the opportunity for correspondence between gaze focus and speech content). Given the expected subtle differences in non-clinical parent groups, both significant and marginally significant group differences were reported.

2.6.4. Correlations

Partial Pearson correlations examined the associations between main measures of gaze-speech coordination (i.e., Qlos and RR) and ASD symptom severity, narrative quality, and conversational pragmatic language abilities, with age, sex, VIQ, and duration of speech as covariates. Within-family associations between mothers and children were also examined. Only statistically significant correlations (ps < 0.05) are reported. Benjamini-Hochberg-adjusted p-values [73] were calculated with an FDR of 0.10, following prior work [8,74] in heterogeneous ASD and BAP samples, to balance the control of false positives with sensitivity to subtle, potentially meaningful effects in subclinical populations.
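Both statistical steps in this paragraph can be sketched briefly, assuming NumPy is available. The partial correlation below residualizes each variable on the covariates plus an intercept and correlates the residuals, which is one standard way to obtain a partial Pearson r; its p-value (omitted here) would use n − 2 − k degrees of freedom for k covariates. The Benjamini-Hochberg step-up adjustment follows; at the stated FDR of 0.10, correlations with adjusted p ≤ 0.10 would be retained. Function names are hypothetical.

```python
from typing import List, Sequence

import numpy as np


def _residualize(v: Sequence[float], design: np.ndarray) -> np.ndarray:
    """Residuals of v after ordinary least-squares regression on the design matrix."""
    y = np.asarray(v, dtype=float)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def partial_corr(x: Sequence[float], y: Sequence[float],
                 covariates: Sequence[Sequence[float]]) -> float:
    """Pearson r of x and y with the covariates (plus an intercept) partialled out."""
    design = np.column_stack([np.ones(len(x))] +
                             [np.asarray(c, dtype=float) for c in covariates])
    rx, ry = _residualize(x, design), _residualize(y, design)
    return float(np.corrcoef(rx, ry)[0, 1])


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With toy data where x and y both track a covariate z, the raw correlation can be strongly positive while the partial correlation is near zero or even negative, which is why covariates such as age and speech duration matter here.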

The correlations with ASD symptom severity extracted from ADOS-2 via calibrated severity scores, including domains of Overall, Social Affect, and Restricted and Repetitive Behaviors (RRBs), were conducted in the ASD group only. Correlations between gaze-speech coordination and narrative quality measures (i.e., Affect/Cognition, Story Components Present, and Causal Inferences) and conversational pragmatic language (measured by PRS-SA for ASD and non-ASD groups, and by PRS for parent groups) were conducted. The correlations were examined in both ASD and ASD parent groups individually and in ASD and non-ASD groups combined and parent groups combined, respectively, to understand the potential contributions of differences in gaze-speech coordination to differences in higher-level social-communicative skills in ASD, first-degree relatives impacted by ASD-related genetic risk, and the broader general population.

Further, within-family correlations in mother-child dyads were examined on coordination measures. Specifically, mother-child correlations were examined in families of autistic individuals only (n = 18), because of the low sample size of families of non-autistic individuals (n = 8). Correlations between father-child pairs were not conducted because of low sample size (n = 11 for ASD families and n = 4 for families of non-autistic individuals).

2.7. GenAI Use

During the preparation of this manuscript/study, the authors used ChatGPT (OpenAI, GPT-5, 2025) for the purposes of language refinement and improving clarity of expression. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

3. Results

3.1. Group Comparisons

For Qlos, the ASD group showed reduced temporal coordination compared to the non-ASD group (estimate = −0.02, t = −3.81, p < 0.001, Cohen’s d = −2.98), whereas no significant difference was found between parent groups (estimate = 0.001, t = 0.21, p = 0.84, Cohen’s d = 0.13). For gaze-speech content coordination, the autistic group demonstrated increased content coordination relative to non-autistic individuals, as measured by both RR (estimate = 4.19, t = 2.18, p = 0.03, Cohen’s d = 1.78) and RRpeak (estimate = 0.04, t = 1.96, p = 0.05, Cohen’s d = 1.6). No significant differences were found between parent groups for RR (estimate = 0.62, t = 0.35, p = 0.73, Cohen’s d = 0.22) or RRpeak (estimate = 0.008, t = 0.45, p = 0.66, Cohen’s d = 0.28) (see Figure 1). The effect sizes are relatively large because Cohen’s d scales mean differences by the pooled standard deviation [75], and DCRP-derived gaze-speech coordination measures are continuous and temporally dense, which can reduce within-group variability. Consequently, moderate absolute differences may yield large standardized effects, which should be interpreted in this measurement context.
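The point about within-group variability can be checked with the pooled-standard-deviation formula for Cohen's d. In the sketch below (illustrative numbers, not study data), two pairs of groups share the same mean difference of 2, but the second pair has one tenth of the spread, so d grows tenfold.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Standardized mean difference (a minus b) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


if __name__ == "__main__":
    print(cohens_d([1, 2, 3], [3, 4, 5]))              # -2.0 (pooled SD = 1)
    print(cohens_d([1.9, 2.0, 2.1], [3.9, 4.0, 4.1]))  # about -20 (pooled SD = 0.1)
```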

3.2. Correlations with ASD Symptom Severity

Gaze-speech coordination was not significantly correlated with ASD symptom severity (see detailed statistics in Table S1 in Supplementary Materials).

3.3. Correlations with Narrative Quality

Greater gaze-speech content coordination, as measured by RR, was significantly correlated with increased inclusion of descriptions of emotional and cognitive states and behaviors in narrative storytelling (Affect/Cognition) in the ASD group (r = 0.40, p = 0.03, adjusted p = 0.34). In parent groups combined, a greater RR was associated with the inclusion of fewer key story elements and interactions among story characters (Story Components Present) (r = −0.19, p = 0.04, adjusted p = 0.34). No other significant correlations emerged (see detailed statistics in Table S2 in Supplementary Materials).

3.4. Correlations with Pragmatic Language Ability

Reduced temporal coordination between gaze and speech, as measured by Qlos, was correlated with more conversational pragmatic language violations, reflected by higher PRS-SA scores, in the autistic and non-autistic groups combined (r = −0.31, p = 0.03, adjusted p = 0.08). This finding was driven mainly by the ASD group (r = −0.37, p = 0.05, adjusted p = 0.09) (see Figure 2). In contrast, in the parent groups combined, increased content coordination between gaze and speech, as measured by RR, was correlated with poorer conversational pragmatic language ability (r = 0.26, p = 0.005, adjusted p = 0.04) (see Figure 2); this association was driven by the ASD parent group (r = 0.26, p = 0.02, adjusted p = 0.08). No other significant correlations were found between gaze-speech content coordination and pragmatic language (see detailed statistics in Table S3 in Supplementary Materials).

3.5. Mother-Child Correlations

In the n = 18 mother-child dyads from ASD families, a significant correlation was found for gaze-speech temporal coordination: reduced Qlos in mothers was associated with reduced Qlos in their children with ASD (r = 0.57, p = 0.02, adjusted p = 0.03) (see Figure 3). No significant mother-child correlation was found for gaze-speech content coordination measured by RR.

4. Discussion

This study is the first to examine gaze-speech coordination during narration in autistic individuals and their parents. Two aspects of coordination were assessed as follows: (1) temporal coordination, representing the extent of gaze leading speech in time; and (2) content coordination, referring to the content consistency between gaze and speech. Reduced temporal and increased content gaze-speech coordination during narration were found in the ASD group compared to non-autistic controls. Although ASD parents showed a similar descriptive pattern—lower temporal coordination and higher content coordination compared to parent controls—these differences were more subtle and did not reach statistical significance, likely reflecting smaller effect sizes in subclinical populations. However, in both the ASD and ASD parent groups, differences in gaze-speech coordination were associated with differences in higher-level communicative abilities, including narrative and conversational pragmatic language abilities, providing potential insight into gaze-speech coordination as an underlying mechanism contributing to higher-level ASD-related social-communicative differences.

Reduced gaze-speech temporal coordination aligns with prior RAN studies showing shorter gaze-speech lag in autistic individuals [44]. These earlier results suggest reduced automaticity in the integration of visual and verbal information in ASD, a mechanism that may underlie narrative or higher-level social-communicative differences. The current study extends prior work from RAN tasks to a more complex and cognitively demanding linguistic context (i.e., narrative production) that mirrors language use in daily life and similarly observed reduced gaze-speech temporal coordination in ASD. Unlike RAN tasks, which involve simple visual and linguistic processing, the narrative task engages more enriched visual-conceptual processing of story plots and characters (both social and non-social information), high-level cognitive processes (e.g., extracting the gist of and relations within a scene, relating individual story elements to the overarching theme) before speech output, as well as higher-order executive-linguistic processing such as planning sentence structure within unfolding discourse [8,11,34]. Therefore, disrupted temporal coordination during narration in ASD may reduce the time available for visual, linguistic, and higher-level cognitive systems to operate in a coordinated manner and may serve as a potential mechanistic factor contributing to higher-order social-communicative differences in ASD.

Temporal coordination is a precise, quantifiable, and neurocognitively grounded measure that moves beyond observable behavior. The mother-child associations observed in temporal coordination suggest that gaze-speech coordination during narration may reflect heritable traits influenced by ASD-related molecular-genetic variability, consistent with prior literature on pragmatic language differences in the broad autism phenotype (BAP) [40,42,43,57]. Prior family studies have also revealed potential patterns of maternal linearity, in which phenotypic associations in social communication between mothers and autistic children appear more consistent and robust than those between fathers and autistic children, potentially reflecting a shared social-communicative “signature” [76]. By contrast, associations between fathers and autistic children appear more evident in the rigidity/repetitive behavior domain [5,76]. Mechanistic coordination differences of this kind may therefore represent promising heritable features linking genetic likelihood to observable behaviors, and could help reduce phenotypic heterogeneity in biological research by identifying more homogeneous family subgroups. In this context, the mother-child associations observed here may reflect shared genetic influences on the integrated visual, cognitive, and linguistic processes that support narrative communication, highlighting the potential value of temporal coordination as a window into the mechanisms supporting social-communicative functioning in ASD and the BAP. Future studies could test temporal coordination more explicitly as a phenotypic marker of heritable social-communication differences in ASD by using genetically informed designs (e.g., twin or extended family studies) and longitudinal designs that track its development from early childhood onward.
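Mother-child associations such as the one in Figure 3 are summarized as correlations, and the width of the confidence interval matters as much as the point estimate when arguing for a heritable signal. The sketch below assumes a Pearson correlation with a Fisher z-transform interval; this section does not state which correlation method the authors used, and the example scores are invented for illustration, not study data.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist
from typing import Sequence, Tuple


def pearson_with_ci(x: Sequence[float], y: Sequence[float],
                    level: float = 0.95) -> Tuple[float, Tuple[float, float]]:
    """Pearson r plus a Fisher z-transform confidence interval."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need paired samples with n >= 4")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    r = sxy / sqrt(sxx * syy)
    if abs(r) >= 1.0:                 # atanh is undefined at +/-1
        return r, (r, r)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = 1 / sqrt(n - 3)              # standard error of Fisher's z
    z = atanh(r)                      # Fisher z-transform of r
    return r, (tanh(z - z_crit * se), tanh(z + z_crit * se))


# Hypothetical mother and child temporal-coordination scores (arbitrary units).
mothers = [1, 2, 3, 4, 5]
children = [2, 4, 5, 4, 5]
r, (lo, hi) = pearson_with_ci(mothers, children)
print(f"r = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With only five hypothetical pairs, r is about 0.77 but the 95% interval runs from roughly -0.34 to 0.98, a reminder that family-level correlations need an adequate number of dyads before they can be read as evidence of shared liability.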

Although prior studies in non-autistic individuals suggest that gaze-speech content consistency reflects the integration of visual information to enrich speech and language production [27,35,37], the increased gaze-speech content coordination observed in ASD in the present study may reflect a different functional pattern. Specifically, heightened coordination may indicate a greater reliance on immediately available visual input during narration, biasing speakers toward directly labeling visually salient elements rather than integrating information across story components. This tendency may reduce inferences about relationships among events (e.g., cause-effect relations), characters, or the overarching narrative theme. This localized gaze-speech coordination pattern is consistent with Bayesian accounts of perception in ASD, which posit atypical weighting of prior expectations relative to sensory input, potentially resulting in greater reliance on moment-to-moment visual information and reduced integration of broader contextual cues [77].

An illustrative example highlights this distinction. When describing page 4 of the picture book, an individual with ASD stated, “the boy picked up his boot and looked inside and he turned over a stepstool and the dog stuck his head in the jar to try to find him there,” closely mirroring the sequence of elements this individual had fixated. In contrast, a non-autistic individual produced a higher-level summary, “he looked everywhere and the dog helped to look too,” that parsimoniously integrated the characters’ actions within the broader narrative theme of searching. A similar pattern emerged in descriptions of characters’ facial expressions: individuals with greater gaze-speech content coordination tended to label emotional or cognitive states directly, potentially without fully integrating the contextual information or event relationships that give rise to those states. Consistent with this interpretation, greater gaze-speech content coordination was associated with lower narrative quality, as reflected by fewer descriptions of interactions among key story elements and of affective and cognitive states; however, these associations were significant only before correction for multiple comparisons and should therefore be interpreted cautiously. More broadly, reliance on direct visual labeling may constrain interpretation of the wider social-communicative context and contribute to the pragmatic language differences observed in both clinical and non-clinical populations, pointing to a potentially shared cognitive or processing pattern between autistic individuals and their first-degree relatives.
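Because several correlations were tested, nominally significant p-values also had to survive a correction for multiple comparisons. Assuming that correction was the Benjamini-Hochberg false discovery rate procedure (which appears in the reference list, although this section does not state which tests it was applied to), a minimal sketch shows how nominally significant results can fail to survive. The p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q.

    Returns True for each hypothesis that is rejected after correction."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:                  # compare p_(k) with (k / m) * q
            k_max = rank                                   # keep the largest passing rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= k_max                      # reject every rank up to k_max
    return rejected


# Hypothetical p-values for six correlations (not the study's values).
p = [0.03, 0.04, 0.20, 0.45, 0.01, 0.60]
print([pv < 0.05 for pv in p])   # nominal significance
print(benjamini_hochberg(p))     # after FDR correction
```

Here three of the six hypothetical p-values fall below .05, yet none passes its rank-specific threshold (k/m)·q, so none is rejected after correction.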

Because the measures of content correspondence employed here (RR and RRpeak) capture gaze-speech content synchronization rather than a specific cognitive strategy, the interpretation of increased coordination as overreliance on immediate visual input remains tentative and requires further investigation. Further, greater gaze-speech content coordination should not be interpreted as inherently maladaptive; its functional significance is likely context-dependent. While heightened content coordination may be associated with reduced narrative integration in the present task, close alignment between visual attention and verbal output may support grounding, shared reference, and listener comprehension in other communicative contexts. Conversely, reduced gaze-speech coordination may also pose challenges, such as insufficient use of visual information to guide verbal output or weaker integration across perceptual and linguistic systems, patterns not captured by the current narrative paradigm. More broadly, effective communication may depend less on the absolute degree of synchrony than on the flexibility to shift dynamically between synchronous and asynchronous modes as task and social demands change. Future studies could therefore examine how both heightened and diminished gaze-speech coordination, as well as the dynamics of intrapersonal coordination over time, relate to social-communication skills across contexts.
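RR, RRpeak, and the temporal measure (Qlos in Figure 1) are derived from diagonal cross-recurrence profiles (DCRP); Supplementary Figure S1 states that coordination was calculated from DCRP plots. The pure-Python sketch below shows the general idea for categorical streams and is not the authors' implementation. It assumes that gaze and speech have already been coded into equal-width time bins labeled by story element (None where nothing codable occurs). The lag convention, the lag window, averaging across lags for RR, and the `lead_index` stand-in for the temporal measure are all hypothetical choices.

```python
from typing import Dict, Optional, Sequence, Tuple

Label = Optional[str]  # story element in a time bin; None = nothing codable


def diagonal_profile(gaze: Sequence[Label], speech: Sequence[Label],
                     max_lag: int) -> Dict[int, float]:
    """Recurrence rate at each lag; lag > 0 means gaze precedes speech."""
    n = min(len(gaze), len(speech))
    profile: Dict[int, float] = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = matches = 0
        for t in range(n):
            s = t + lag                      # speech bin paired with gaze bin t
            if 0 <= s < n:
                pairs += 1
                if gaze[t] is not None and gaze[t] == speech[s]:
                    matches += 1
        profile[lag] = matches / pairs if pairs else 0.0
    return profile


def summarize(profile: Dict[int, float]) -> Tuple[float, float, int, float]:
    """Return (rr, rr_peak, peak_lag, lead_index) for one profile."""
    rr = sum(profile.values()) / len(profile)        # mean recurrence over all lags
    peak_lag = max(profile, key=profile.get)         # first lag with the maximum
    rr_peak = profile[peak_lag]
    lead = sum(v for lag, v in profile.items() if lag > 0)
    trail = sum(v for lag, v in profile.items() if lag < 0)
    total = lead + trail
    lead_index = (lead - trail) / total if total else 0.0  # +1: all off-diagonal recurrence has gaze leading
    return rr, rr_peak, peak_lag, lead_index


# Toy example: each element is spoken one bin after it is fixated.
gaze = ["boy", "boy", "dog", "dog", "jar", "jar"]
speech = [None, "boy", "boy", "dog", "dog", "jar"]
print(summarize(diagonal_profile(gaze, speech, max_lag=2)))
```

In this toy example the profile peaks at lag +1 with a peak recurrence of 1.0, and the lead index is +1 because all off-diagonal recurrence lies on the gaze-leading side. With real data, the bin width, the lag window, and the handling of uncodable bins would all affect the values; the R package crqa cited in the reference list provides established DCRP routines.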

The associations detected between gaze-speech coordination during narration and higher-level pragmatic language and social-communicative skills may also have important clinical implications for mechanism-based social communication interventions in ASD. Because gaze-speech temporal coordination may reflect the integration of visual, cognitive, and linguistic processes during narration, interventions that explicitly support multimodal integration, rather than focusing narrowly on labeling or describing visual information, may help strengthen the mechanisms that contribute to cohesive storytelling and higher-level social-communicative skills. Evidence-based programs such as PEERS [78], which target pragmatic skills across specific conversational and social contexts (e.g., entering a conversation and being a good sport), may benefit from incorporating training in the coordination of gaze, speech, and cognitive processes to support effective narrative and conversational communication. Furthermore, gaze-speech coordination could serve as a mechanistic marker for tracking intervention progress, and future studies should examine whether temporal coordination is malleable and improves following social-pragmatic interventions.

The differences in gaze-speech temporal and content coordination observed in ASD may reflect variation in cortical structure and in the large-scale brain networks supporting visual, linguistic, and cognitive processes. Although neural mechanisms were not directly measured during narration, evidence from related paradigms (e.g., picture naming) suggests that such coordination relies on interactions among distributed occipital, temporal, parietal, and frontal regions involved in visual processing, semantic integration, and speech planning [79,80,81,82,83]. Altered structure and function in these networks have been reported in ASD (e.g., in the angular gyrus, superior temporal sulcus, and Wernicke’s area [84,85,86,87,88]), which may contribute to reliance on compensatory pathways and to differences in gaze-speech coordination. Future studies combining eye tracking and neuroimaging during narrative tasks are needed to examine directly how these distributed networks support gaze-speech coordination and how such mechanisms vary in autistic individuals and their first-degree relatives.

Several limitations of this study should be noted. First, although the frog story has been widely used to assess narrative ability across the lifespan, future work should examine whether the findings generalize to real-world storytelling contexts. Moreover, while narrative is a primary form of social communication, conversation may provide a more naturalistic context that mirrors language use during dynamic social interactions. Prior work on gaze during conversational speech has focused mainly on its social-pragmatic functions (e.g., turn-taking [89,90]), and little is known about the content and temporal coordination of gaze with speech in conversation. Future studies may therefore leverage technology such as head-mounted eye trackers to examine gaze-speech coordination during conversation and its contributions to social communication. Second, the current study examined content and temporal coordination across the entire narration, without a more detailed investigation of gaze patterns over time or of the dynamics of gaze-speech coordination. Prior work has found that during narration, speakers may first engage in a brief apprehension phase, or global scan of the picture, before examining each component more closely [7]. Future studies might apply more fine-grained time-course analyses to examine how gaze focus and its coordination with speech change dynamically over time. Finally, due to the linguistic demands of the task, the current autistic sample did not include individuals with lower verbal or cognitive abilities. Future studies should include more diverse autistic samples, including individuals with higher support needs and larger groups of females, to better understand gaze-speech coordination across the spectrum.

5. Conclusions

In conclusion, the current study revealed differences in temporal and content coordination between gaze and speech during narration in autistic individuals. Associations between measures of coordination and narrative and conversational skills in both autistic individuals and parents implicate variability in gaze-speech coordination as a potential mechanistic contributor to higher-level social-communication abilities. Future research should examine whether gaze-speech coordination is malleable and can improve following targeted social-pragmatic interventions, potentially serving as a mechanistic marker of intervention progress. Additionally, evidence-based programs such as PEERS could incorporate training in gaze-speech coordination to strengthen the underlying mechanisms of social communication.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/brainsci16010107/s1, Figure S1: Gaze-speech coordination was calculated based on the DCRP plots; Table S1: Correlations with ASD symptom severity; Table S2: Correlations with narrative quality; Table S3: Correlations with pragmatic language ability.

Author Contributions

Conceptualization, J.X., J.C.Y.L., K.N., and M.L.; methodology, J.X. and J.C.Y.L.; software, J.X.; validation, J.X., J.C.Y.L., and M.K.; formal analysis, J.X.; investigation, J.X., E.L., K.N., and M.K.; resources, M.L.; data curation, J.X., E.L., K.N., and M.K.; writing—original draft preparation, J.X.; writing—review and editing, J.C.Y.L., E.L., M.L. and K.N.; visualization, J.X.; supervision, M.L., M.G., J.C.Y.L. and K.N.; project administration, M.L.; funding acquisition, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by grants from the National Institutes of Health (5R21DC022031-02, R01DC010191, R03MH079998, R01MH091131, PI: Losh) and the National Science Foundation (BCS-0820394, PI: Losh) and P30DC012035.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Northwestern University (FWA 00001549, approval date: 17 September 2020).

Informed Consent Statement

Informed assent/consent was obtained from all participants and guardians (as applicable).

Data Availability Statement

The data that support the findings of this study are not publicly available due to privacy and ethical restrictions but are available from the corresponding author upon reasonable request.

Acknowledgments

We are grateful to the individuals who participated in this study and to all the staff and students who assisted with data collection and processing. During the preparation of this manuscript/study, the authors used ChatGPT (OpenAI, GPT-5, 2025) for the purposes of language refinement and improving clarity of expression. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ADI-R	Autism Diagnostic Interview—Revised
ADOS-2	Autism Diagnostic Observation Schedule, Second Edition
AOIs	Areas of Interest
ASD	Autism Spectrum Disorder
BAP	Broad Autism Phenotype
DCRP	Diagonal Cross Recurrence Profiles
ELAN	EUDICO Linguistic Annotator
FSIQ	Full-Scale Intelligence Quotient
IQ	Intelligence Quotient
LOS	Line of Synchrony
PIQ	Performance Intelligence Quotient
PRS	Pragmatic Rating Scale
PRS-SA	Pragmatic Rating Scale—School Age
RAN	Rapid Automatized Naming
RR	Recurrence Rate
RRBs	Restricted and Repetitive Behaviors
RRpeak	Recurrence Rate Peak
SALT	Systematic Analysis of Language Transcripts
STS	Superior Temporal Sulcus
VIQ	Verbal Intelligence Quotient
WAIS	Wechsler Adult Intelligence Scale—Third or Fourth Editions
WASI	Wechsler Abbreviated Scale of Intelligence
WISC-IV	Wechsler Intelligence Scale for Children—Fourth Edition

References

1. Levinson, S.C.; Holler, J. The origin of human multi-modal communication. Philos. Trans. R. Soc. B Biol. Sci. 2014, 369, 20130302.
2. de Marchena, A.; Eigsti, I.M. Conversational gestures in autism spectrum disorders: Asynchrony but not decreased frequency. Autism Res. 2010, 3, 311–322.
3. Parladé, M.V.; Iverson, J.M. The development of coordinated communication in infants at heightened risk for autism spectrum disorder. J. Autism Dev. Disord. 2015, 45, 2218–2234.
4. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM-5®); American Psychiatric Pub: Washington, DC, USA, 2013.
5. Nayar, K.; Gordon, P.C.; Martin, G.E.; Hogan, A.L.; La Valle, C.; McKinney, W.; Lee, M.; Norton, E.S.; Losh, M. Links between looking and speaking in autism and first-degree relatives: Insights into the expression of genetic liability to autism. Mol. Autism 2018, 9, 51.
6. Gordon, P.C.; Hoedemaker, R.S. Effective scheduling of looking and talking during rapid automatized naming. J. Exp. Psychol. Hum. Percept. Perform. 2016, 42, 742–760.
7. Griffin, Z.M.; Bock, K. What the eyes say about speaking. Psychol. Sci. 2000, 11, 274–279.
8. Lee, M.; Nayar, K.; Maltman, N.; Hamburger, D.; Martin, G.E.; Gordon, P.C.; Losh, M. Understanding social communication differences in autism spectrum disorder and first-degree relatives: A study of looking and speaking. J. Autism Dev. Disord. 2020, 50, 2128–2141.
9. Meyer, A.S.; Roelofs, A.; Levelt, W.J. Word length effects in object naming: The role of a response criterion. J. Mem. Lang. 2003, 48, 131–147.
10. Miyake, A.; Friedman, N.P. The Nature and Organization of Individual Differences in Executive Functions: Four General Conclusions. Curr. Dir. Psychol. Sci. 2012, 21, 8–14.
11. van der Meulen, F. Coordination of eye gaze and speech in sentence production. Trends Linguist. Stud. Monogr. 2003, 152, 39–64.
12. Griffin, Z.M.; Oppenheimer, D.M. Speakers gaze at objects while preparing intentionally inaccurate labels for them. J. Exp. Psychol. Learn. Mem. Cogn. 2006, 32, 943–948.
13. Bolton, P.; Macdonald, H.; Pickles, A.; Rios, P.; Goode, S.; Crowson, M.; Bailey, A.; Rutter, M. A case-control family history study of autism. J. Child Psychol. Psychiatry Allied Discip. 1994, 35, 877–900.
14. Landa, R.; Folstein, S.E.; Isaacs, C. Spontaneous narrative-discourse performance of parents of autistic individuals. J. Speech Hear. Res. 1991, 34, 1339–1345.
15. Landa, R.; Piven, J.; Wzorek, M.M.; Gayle, J.O.; Chase, G.A.; Folstein, S.E. Social language use in parents of autistic individuals. Psychol. Med. 1992, 22, 245–254.
16. Piven, J.; Palmer, P.; Jacobi, D.; Childress, D.; Arndt, S. Broader autism phenotype: Evidence from a family history study of multiple-incidence autism families. Am. J. Psychiatry 1997, 154, 185–190.
17. Bruner, J. Actual Minds, Possible Worlds; Harvard University Press: Cambridge, MA, USA, 1986.
18. Bruner, J. Acts of Meaning; Harvard University Press: Cambridge, MA, USA, 1990.
19. Gibson, J.J.; Pick, A.D. Perception of another person’s looking behavior. Am. J. Psychol. 1963, 76, 386–394.
20. Goldin-Meadow, S. The role of gesture in communication and thinking. Trends Cogn. Sci. 1999, 3, 419–429.
21. Crais, E.R.; Watson, L.R.; Baranek, G.T. Use of gesture development in profiling children’s prelinguistic communication skills. Am. J. Speech Lang. Pathol. 2009, 18, 95–108.
22. Stone, W.L.; Ousley, O.Y.; Yoder, P.J.; Hogan, K.L.; Hepburn, S.L. Nonverbal communication in two-and three-year-old children with autism. J. Autism Dev. Disord. 1997, 27, 677–696.
23. Hull, J.V.; Dokovna, L.B.; Jacokes, Z.J.; Torgerson, C.M.; Irimia, A.; Van Horn, J.D. Resting-state functional connectivity in autism spectrum disorders: A review. Front. Psychiatry 2017, 7, 205.
24. Rane, P.; Cochran, D.; Hodge, S.M.; Haselgrove, C.; Kennedy, D.N.; Frazier, J.A. Connectivity in autism: A review of MRI connectivity studies. Harv. Rev. Psychiatry 2015, 23, 223–244.
25. Parlade, M.V.; Messinger, D.S.; Delgado, C.E.; Kaiser, M.Y.; Van Hecke, A.V.; Mundy, P.C. Anticipatory smiling: Linking early affective communication and social outcome. Infant Behav. Dev. 2009, 32, 33–43.
26. Rowe, M.L.; Goldin-Meadow, S. Early gesture selectively predicts later language learning. Dev. Sci. 2009, 12, 182–187.
27. Senzaki, S.; Masuda, T.; Ishii, K. When is perception top-down and when is it not? Culture, narrative, and attention. Cogn. Sci. 2014, 38, 1493–1506.
28. Denckla, M.B.; Rudel, R.G. Rapid “automatized” naming of pictured objects, colors, letters, and numbers by normal children. Cortex 1974, 10, 186–202.
29. Levelt, W.J.M.; Roelofs, A.; Meyer, A.S. A theory of lexical access in speech production. Behav. Brain Sci. 1999, 22, 1–38.
30. Meyer, A.S.; Sleiderink, A.M.; Levelt, W.J. Viewing and naming objects: Eye movements during noun phrase production. Cognition 1998, 66, B25–B33.
31. Smith, M.; Wheeldon, L. High level processing scope in spoken sentence production. Cognition 1999, 73, 205–246.
32. Norton, E.S.; Wolf, M. Rapid automatized naming (RAN) and reading fluency: Implications for understanding and treatment of reading disabilities. Annu. Rev. Psychol. 2012, 63, 427–452.
33. Kane, M.J.; Engle, R.W. Working-memory capacity and the control of attention: The contributions of goal neglect, response competition, and task set to Stroop interference. J. Exp. Psychol. Gen. 2003, 132, 47–70.
34. Meyer, A.S.; Wheeldon, L.; van der Meulen, F.; Konopka, A. Effects of speech rate and practice on the allocation of visual attention in multiple object naming. Front. Psychol. 2012, 3, 39.
35. Wühr, P.; Frings, C. A case for inhibition: Visual attention suppresses the processing of irrelevant objects. J. Exp. Psychol. Gen. 2008, 137, 116–130.
36. Wühr, P.; Waszak, F. Object-based attentional selection can modulate the Stroop effect. Mem. Cogn. 2003, 31, 983–994.
37. Engle, R.W. Working memory capacity as executive attention. Curr. Dir. Psychol. Sci. 2002, 11, 19–23.
38. Di Michele, V.; Mazza, M.; Cerbo, R.; Roncone, R.; Casacchia, M. Deficits in pragmatic conversation as manifestation of genetic liability in autism. Clin. Neuropsychiatry 2007, 4, 144–151.
39. Losh, M.; Childress, D.; Lam, K.; Piven, J. Defining key features of the broad autism phenotype: A comparison across parents of multiple- and single-incidence autism families. Am. J. Med. Genet. Part B Neuropsychiatry Genet. Off. Publ. Int. Soc. Psychiatry Genet. 2008, 147B, 424–433.
40. Miller, M.; Young, G.S.; Hutman, T.; Johnson, S.; Schwichtenberg, A.J.; Ozonoff, S. Early pragmatic language difficulties in siblings of children with autism: Implications for DSM-5 social communication disorder? J. Child Psychol. Psychiatry 2015, 56, 774–781.
41. Patel, S.P.; Cole, J.; Lau, J.C.Y.; Fragnito, G.; Losh, M. Verbal entrainment in autism spectrum disorder and first-degree relatives. Sci. Rep. 2022, 12, 11496.
42. Patel, S.P.; Landau, E.; Martin, G.E.; Rayburn, C.; Elahi, S.; Fragnito, G.; Losh, M. A profile of prosodic speech differences in individuals with autism spectrum disorder and first-degree relatives. J. Commun. Disord. 2023, 102, 106313.
43. Patel, S.P.; Nayar, K.; Martin, G.E.; Franich, K.; Crawford, S.; Diehl, J.J.; Losh, M. An acoustic characterization of prosodic differences in autism spectrum disorder and first-degree relatives. J. Autism Dev. Disord. 2020, 50, 3032–3045.
44. Hogan-Brown, A.L.; Hoedemaker, R.S.; Gordon, P.C.; Losh, M. Eye-voice span during rapid automatized naming: Evidence of reduced automaticity in individuals with autism spectrum disorder and their siblings. J. Neurodev. Disord. 2014, 6, 33.
45. Lord, C.; Rutter, M.; DiLavore, P.; Risi, S.; Gotham, K.; Bishop, S. Autism Diagnostic Observation Schedule—2nd Edition (ADOS-2); Western Psychological Corporation: Los Angeles, CA, USA, 2012.
46. Lord, C.; Rutter, M.; Le Couteur, A. Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J. Autism Dev. Disord. 1994, 24, 659–685.
47. Hus, V.; Gotham, K.; Lord, C. Standardizing ADOS domain scores: Separating severity of social affect and restricted and repetitive behaviors. J. Autism Dev. Disord. 2014, 44, 2400–2412.
48. Hurley, R.S.; Losh, M.; Parlier, M.; Reznick, J.S.; Piven, J. The Broad Autism Phenotype Questionnaire. J. Autism Dev. Disord. 2007, 37, 1679–1690.
49. Sasson, N.J.; Nowlin, R.B.; Pinkham, A.E. Social cognition, social skill, and the broad autism phenotype. Autism 2012, 17, 655–667.
50. Wechsler, D. Wechsler Intelligence Scale for Children; Psychological Corporation: San Antonio, TX, USA, 1991.
51. Wechsler, D. Wechsler Adult Intelligence Scale; The Psychological Corporation: San Antonio, TX, USA, 1997.
52. Wechsler, D. Wechsler Abbreviated Scale of Intelligence (WASI); Pearson: London, UK, 1999.
53. Mayer, M. Frog, Where Are You? Penguin: New York, NY, USA, 1969.
54. Relating Events in Narrative: A Crosslinguistic Developmental Study; Berman, R.A., Slobin, D.I., Eds.; Psychology Press: Hove, UK, 2013.
55. Reilly, J. “Frog, where are you?” narratives in children with specific language impairment, early focal brain injury, and Williams syndrome. Brain Lang. 2004, 88, 229–247.
56. Miles, S.; Chapman, R.S. Narrative content as described by individuals with Down syndrome and typically developing children. J. Speech Lang. Hear. Res. 2002, 45, 175–189.
57. Nayar, K.; Landau, E.; Martin, G.E.; Stevens, C.J.; Xing, J.; Sophia, P.; Guilfoyle, J.; Gordon, P.C.; Losh, M. Narrative ability in autism and first-degree relatives. J. Autism Dev. Disord. 2024, 55, 3822–3837.
58. Landau, E.; Nayar, K.; Martin, G.E.; Stevens, C.; Xing, J.; Guilfoyle, J.; Lau, J.C.Y.; Losh, M. Context effects: Discourse structure influences narrative ability in autism and first-degree relatives. Front. Psychiatry 2025, 16, 1588429.
59. Miller, J.; Iglesias, A. Systematic Analysis of Language Transcripts (SALT), Research version; Salt Software, LLC: Middleton, WI, USA, 2008.
60. Wittenburg, P.; Brugman, H.; Russel, A.; Klassmann, A.; Sloetjes, H. ELAN: A professional framework for multimodality research. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, 22–28 May 2006; pp. 1556–1559.
61. Boersma, P.; Weenink, D. Praat: Doing Phonetics by Computer, version 6.0.29; University of Amsterdam: Amsterdam, The Netherlands, 2017.
62. Rosenfelder, I. Forced Alignment & Vowel Extraction (FAVE): An Online Suite for Automatic Vowel Analysis; University of Pennsylvania Linguistics Lab: Philadelphia, PA, USA, 2013.
63. Landa, R. (Kennedy Krieger Institute, Baltimore, MD, USA). Pragmatic rating scale for school-age children. Unpublished manuscript, 2011.
64. Landa, R. Pragmatic rating scale. In Encyclopedia of Autism Spectrum Disorders; Springer: Berlin/Heidelberg, Germany, 2013; pp. 2327–2331.
65. Klusek, J.; Losh, M.; Martin, G.E. Sex differences and within-family associations in the broad autism phenotype. Autism 2014, 18, 106–116.
66. Losh, M.; Klusek, J.; Martin, G.E.; Sideris, J.; Parlier, M.; Piven, J. Defining genetically meaningful language and personality traits in relatives of individuals with fragile X syndrome and relatives of individuals with autism. Am. J. Med. Genet. Part B Neuropsychiatry Genet. 2012, 159, 660–668.
67. Coco, M.I.; Dale, R. Cross-recurrence quantification analysis of categorical and continuous time series: An R package. Front. Psychol. 2014, 5, 510.
68. Dale, R.; Warlaumont, A.S.; Richardson, D.C. Nominal cross recurrence as a generalized lag sequential analysis for behavioral streams. Int. J. Bifurc. Chaos 2011, 21, 1153–1161.
69. Wallot, S.; Leonardi, G. Analyzing Multivariate Dynamics Using Cross-Recurrence Quantification Analysis (CRQA), Diagonal-Cross-Recurrence Profiles (DCRP), and Multidimensional Recurrence Quantification Analysis (MdRQA)—A Tutorial in R. Front. Psychol. 2018, 9, 2232.
70. Bernieri, F.; Reznick, J.; Rosenthal, R. Synchrony, pseudosynchrony, and dissynchrony: Measuring the entrainment process in mother-infant interactions. J. Personal. Soc. Psychol. 1988, 54, 243.
71. Duran, N.D.; Fusaroli, R. Conversing with a devil’s advocate: Interpersonal coordination in deception and disagreement. PLoS ONE 2017, 12, e0178140.
72. Bates, E. Language and Context: The Acquisition of Pragmatics; Academic Press: New York, NY, USA, 1976.
73. Benjamini, Y.; Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 1995, 57, 289–300.
74. Nayar, K.; Shic, F.; Winston, M.; Losh, M. A constellation of eye-tracking measures reveals social attention differences in ASD and the broad autism phenotype. Mol. Autism 2022, 13, 18.
75. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 1988.
76. Nayar, K.; Sealock, J.M.; Maltman, N.; Bush, L.; Cook, E.H.; Davis, L.K.; Losh, M. Elevated polygenic burden for autism spectrum disorder is associated with the broad autism phenotype in mothers of individuals with autism spectrum disorder. Biol. Psychiatry 2021, 89, 476–485.
77. Pellicano, E.; Burr, D. When the world becomes “too real”: A Bayesian explanation of autistic perception. Trends Cogn. Sci. 2012, 16, 504–510.
78. Laugeson, E.A.; Frankel, F.; Gantman, A.; Dillon, A.R.; Mogil, C. Evidence-based social skills training for adolescents with autism spectrum disorders: The UCLA PEERS program. J. Autism Dev. Disord. 2012, 42, 1025–1036.
79. Binder, J.R.; Desai, R.H.; Graves, W.W.; Conant, L.L. Where is the semantic system? A critical review and meta-analysis of 120 functional neuroimaging studies. Cereb. Cortex 2009, 19, 2767–2796.
80. Redcay, E. The superior temporal sulcus performs a common function for social and speech perception: Implications for the emergence of autism. Neurosci. Biobehav. Rev. 2008, 32, 123–142.
81. Hickok, G.; Poeppel, D. The cortical organization of speech processing. Nat. Rev. Neurosci. 2007, 8, 393–402.
82. Corbetta, M.; Shulman, G.L. Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 2002, 3, 201–215.
83. Salmelin, R.; Hari, R.; Lounasmaa, O.V.; Sams, M. Dynamics of brain activation during picture naming. Nature 1994, 368, 463–465.
84. Kennedy, D.P.; Courchesne, E. The intrinsic functional organization of the brain is altered in autism. J. Neurosci. 2008, 28, 788–796.
85. Liu, J.; Yao, L.; Zhang, W.; Xiao, Y.; Liu, L.; Gao, X.; Gong, Q. Gray matter abnormalities in pediatric autism spectrum disorder: A meta-analysis with signed differential mapping. Eur. Child Adolesc. Psychiatry 2017, 26, 933–945.
86. Zilbovicius, M.; Boddaert, N.; Belin, P.; Poline, J.B.; Remy, P.; Mangin, J.F.; Brunelle, F.; Samson, Y.; Régis, J. Temporal lobe dysfunction in childhood autism: A PET study. Am. J. Psychiatry 2006, 163, 1109–1117.
87. Zhan, L.; Gao, Y.; Huang, L.; Zhang, H.; Huang, G.; Wang, Y.; Sun, J.; Xie, Z.; Li, M.; Jia, X.; et al. Brain functional connectivity alterations of Wernicke’s area in individuals with autism spectrum conditions in multi-frequency bands: A mega-analysis. Heliyon 2024, 10, e26198.
88. Uddin, L.Q.; Supekar, K.; Menon, V. Reconceptualizing functional brain connectivity in autism from a developmental perspective. Front. Hum. Neurosci. 2013, 7, 458.
89. Mirenda, P.L.; Donnellan, A.M.; Yoder, D.E. Gaze behavior: A new look at an old problem. J. Autism Dev. Disord. 1983, 13, 397–409.
90. Turkstra, L.S. Looking while listening and speaking: Eye-to-face gaze in adolescents with and without traumatic brain injury. J. Speech Lang. Hear. Res. JSLHR 2005, 48, 1429–1441.

Figure 1. Gaze-Speech Coordination Across Diagnostic Groups. Gaze-speech coordination measures: (A,B) Qlos, temporal coordination, the extent to which gaze was ahead of speech in time; (C,D) RR, content coordination, the total amount of content consistency between gaze and speech; (E,F) RRpeak, the maximum content coordination across time lags. The ASD group showed reduced gaze-speech temporal coordination (A) and increased content coordination (C,E) relative to the non-ASD group, with no significant differences between the parent groups (B,D,F). Dashed lines represent the sham groups, which serve as the randomized control level. Box plots show the five-number summary of the data: the minimum, 25th percentile, median, 75th percentile, and maximum. ^ p < 0.10, * p < 0.05, *** p < 0.001.
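The measures in Figure 1 come from the diagonal cross-recurrence profile (DCRP) analysis described in Section 2.6.1. As a rough illustration only, and not the authors' pipeline, the Python sketch below computes a diagonal recurrence profile for two categorical series, assuming gaze and speech have each been coded as a sequence of content labels (for example, the picture element looked at or mentioned) on a shared time grid. Several choices are assumptions labeled here and in the code: the function names are hypothetical, "RR" is approximated as the mean recurrence across the lag window, the peak lag is a simple lead/lag index that is not claimed to match the paper's Qlos, and the sham baseline is built by shuffling speech labels, which may differ from the study's own sham procedure.

```python
import random
from typing import Dict, Optional, Sequence, Tuple


def diagonal_recurrence_profile(
    gaze: Sequence[str],
    speech: Sequence[str],
    max_lag: int,
    ignore: Optional[str] = None,
) -> Dict[int, float]:
    """Recurrence rate between two categorical series at each lag in [-max_lag, max_lag].

    A positive lag compares gaze[t] with speech[t + lag], so recurrence at
    positive lags means gazed-at content is named later in speech (gaze leads).
    """
    n = min(len(gaze), len(speech))
    profile: Dict[int, float] = {}
    for lag in range(-max_lag, max_lag + 1):
        matches = 0
        pairs = 0
        for t in range(n):
            s = t + lag
            if not 0 <= s < n:
                continue  # this lag shifts t outside the overlapping window
            pairs += 1
            # Hypothetical "ignore" label (e.g., uncoded time) never counts as a match.
            if gaze[t] == speech[s] and gaze[t] != ignore:
                matches += 1
        profile[lag] = matches / pairs if pairs else 0.0
    return profile


def summarize_profile(profile: Dict[int, float]) -> Tuple[float, float, int]:
    """Return (mean recurrence across lags, peak recurrence, lag of the peak).

    Assumption: the mean across the window stands in for RR, and the peak lag
    is only a simple index of which series leads; neither is claimed to
    reproduce the paper's exact RR or Qlos definitions.
    """
    mean_rr = sum(profile.values()) / len(profile)
    # Ties go to the smallest absolute lag, then to the more negative lag.
    peak_lag = max(profile, key=lambda lag: (profile[lag], -abs(lag)))
    return mean_rr, profile[peak_lag], peak_lag


def sham_profile(
    gaze: Sequence[str],
    speech: Sequence[str],
    max_lag: int,
    n_shuffles: int = 100,
    seed: int = 0,
    ignore: Optional[str] = None,
) -> Dict[int, float]:
    """Average profile after randomly permuting the speech labels.

    Assumption: shuffling destroys gaze-speech timing while keeping label
    frequencies, giving a chance-level baseline.
    """
    rng = random.Random(seed)
    shuffled = list(speech)
    totals = {lag: 0.0 for lag in range(-max_lag, max_lag + 1)}
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        for lag, value in diagonal_recurrence_profile(gaze, shuffled, max_lag, ignore).items():
            totals[lag] += value
    return {lag: total / n_shuffles for lag, total in totals.items()}


if __name__ == "__main__":
    gaze = ["dog", "dog", "boy", "boy", "frog", "frog"]
    speech = ["none", "dog", "dog", "boy", "boy", "frog"]
    profile = diagonal_recurrence_profile(gaze, speech, max_lag=2, ignore="none")
    print(profile)  # {-2: 0.0, -1: 0.0, 0: 0.5, 1: 1.0, 2: 0.5}
    print(summarize_profile(profile))  # (0.4, 1.0, 1)
    print(sham_profile(gaze, speech, max_lag=2))
```

In this toy example, speech names each gazed-at element one sample later, so the profile peaks at lag +1 (gaze leading speech). A real analysis would compare such profiles against a randomized baseline, which is the role the sham groups play in Figure 1.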

Figure 2. Correlations Between Gaze-Speech Coordination and Pragmatic Language Ability. (A) Reduced temporal coordination was correlated with more conversational pragmatic violations in the ASD group; (B) increased content coordination was associated with more conversational pragmatic violations in the combined parent groups. Shaded regions represent 95% confidence intervals.

Figure 3. Mother-child Correlations for Gaze-Speech Temporal Coordination. Reduced temporal coordination in mothers was correlated with reduced temporal coordination in their children with ASD. Shaded regions represent 95% confidence intervals.

Table 1. Demographic Information.

Measure	ASD	Non-ASD	ASD Parents	Parent Controls
	M (SD)	M (SD)	M (SD)	M (SD)
N (male/female) ^a	35 (29/6)	41 (20/21)	90 (32/58)	34 (17/17)
Age (years) ^b	18.79 (7.54)	18.70 (5.23)	45.59 (8.4)	41.39 (10.04)
FSIQ ^a,b	106.51 (13.42)	117.59 (12.93)	109.76 (11.87)	117.03 (11.96)
VIQ ^a,b	107.31 (14.88)	119.05 (12.39)	107.97 (12.09)	114.13 (13.06)
PIQ ^a,b	104.40 (16.36)	113.21 (14.72)	109.34 (11.76)	115.90 (11.97)

^a Significant difference between ASD and non-ASD groups; ^b Significant difference between ASD parent and parent control groups. Note. FSIQ, VIQ, and PIQ refer to full-scale IQ, verbal IQ, and performance IQ derived from the Wechsler Abbreviated Scale of Intelligence, Wechsler Adult Intelligence Scale—Third or Fourth Editions, or the Wechsler Intelligence Scale for Children—Fourth Edition.


Share and Cite

MDPI and ACS Style

Xing, J.; Lau, J.C.Y.; Nayar, K.; Landau, E.; Kumareswaran, M.; Grabowecky, M.; Losh, M. Gaze-Speech Coordination During Narration in Autism Spectrum Disorder and First-Degree Relatives. Brain Sci. 2026, 16, 107. https://doi.org/10.3390/brainsci16010107

AMA Style

Xing J, Lau JCY, Nayar K, Landau E, Kumareswaran M, Grabowecky M, Losh M. Gaze-Speech Coordination During Narration in Autism Spectrum Disorder and First-Degree Relatives. Brain Sciences. 2026; 16(1):107. https://doi.org/10.3390/brainsci16010107

Chicago/Turabian Style

Xing, Jiayin, Joseph C. Y. Lau, Kritika Nayar, Emily Landau, Mitra Kumareswaran, Marcia Grabowecky, and Molly Losh. 2026. "Gaze-Speech Coordination During Narration in Autism Spectrum Disorder and First-Degree Relatives" Brain Sciences 16, no. 1: 107. https://doi.org/10.3390/brainsci16010107

APA Style

Xing, J., Lau, J. C. Y., Nayar, K., Landau, E., Kumareswaran, M., Grabowecky, M., & Losh, M. (2026). Gaze-Speech Coordination During Narration in Autism Spectrum Disorder and First-Degree Relatives. Brain Sciences, 16(1), 107. https://doi.org/10.3390/brainsci16010107

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
