Application of Eye Movement Analysis in Medicine: A Review Across Neurodevelopmental, Neurological, and Neurodegenerative Disorders

Nurhasan, Amnaduny Akhara; Kasprowski, Paweł

doi:10.3390/app16052548

Open Access Review

by Amnaduny Akhara Nurhasan and Paweł Kasprowski *

Department of Applied Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(5), 2548; https://doi.org/10.3390/app16052548

Submission received: 24 January 2026 / Revised: 1 March 2026 / Accepted: 4 March 2026 / Published: 6 March 2026

(This article belongs to the Special Issue Eye Tracking Technology and Its Applications)

Abstract

Eye tracking has emerged as a valuable, non-invasive tool for identifying cognitive and motor abnormalities across a wide range of brain-related disorders. Recent studies have explored its utility in neurodevelopmental, neurological, and neurodegenerative conditions. This review synthesizes the findings of studies that apply eye movement analysis, including fixation patterns, saccades, scanpaths, and pupil dynamics, combined with machine learning (ML) and deep learning (DL) approaches for disease detection and classification. Particular attention is given to the design of eye-tracking tasks, feature extraction strategies, and algorithmic frameworks. Across clinical categories, models such as Support Vector Machines (SVM), random forests (RF), and Convolutional Neural Networks (CNN) have demonstrated promising diagnostic potential, with several studies reporting classification accuracies exceeding 80%, although performance varies depending on the task design, dataset characteristics, and validation methodology. These findings support the potential of eye movement-based biomarkers for early detection and clinical monitoring. Despite encouraging results, current research faces important limitations, including small sample sizes, a lack of standardization, and limited generalizability across populations. To advance clinical translation, future work should emphasize data augmentation, multimodal integration, external validation, and the use of explainable AI (XAI). Overall, eye movement analysis offers a scalable and objective pathway toward improving diagnostic precision in brain-related disorders.

Keywords:

eye tracking; machine learning; deep learning; neurological disorders; neurodevelopmental disorders; neurodegenerative disorders; oculomotor biomarkers

1. Introduction

Eye movement analysis has gained significant attention as a non-invasive and objective tool for assessing cognitive and neurological functions. It provides valuable insights into brain activity and has been widely applied in medical research to diagnose and monitor various neurological, neurodevelopmental, and neurodegenerative disorders [1]. Eye-tracking technology has evolved with advancements in hardware and computational methods, allowing for more precise and automated analysis [2,3].

The use of eye movement analysis has been particularly effective in neurodegenerative disorders such as Parkinson’s disease (PD) [4,5] and Alzheimer’s disease (AD) [6]. These diseases exhibit different abnormalities in eye movement, including saccadic impairments and deficits in smooth pursuit, which can serve as potential biomarkers for early detection [7,8]. Furthermore, rapid eye movement (REM) sleep behavior disorder, a precursor to neurodegenerative diseases, has been assessed using machine learning (ML) for early diagnosis [9,10].

In the field of neurodevelopmental disorders, eye movement analysis has been instrumental in detecting autism spectrum disorder (ASD) [11,12]. Machine learning techniques have been employed to classify individuals based on facial processing abnormalities and gaze patterns, contributing to the early diagnosis of ASD [13,14]. Similarly, attention-deficit/hyperactivity disorder (ADHD) has been studied using eye-tracking measures to identify patterns in attention and gaze behavior [15,16].

Neurological disorders such as mild traumatic brain injury (mTBI) and stroke have also been analyzed using eye-tracking technology. Machine learning models have demonstrated the ability to classify the severity of traumatic brain injury (TBI) based on saccadic eye movement [17,18]. In stroke patients, eye movement features have been explored for early detection and rehabilitation assessments [19,20]. Additionally, cerebral palsy (CP) has been investigated using ML techniques applied to eye images [21], and cerebellar ataxia has been assessed using eye-tracking data [22].

In addition to these conditions, eye movement analysis has shown potential in diagnosing locked-in syndrome [23] and obsessive–compulsive disorder (OCD) [24]. Moreover, dyslexia has been identified using gaze self-similarity plots (GSSP) [25].

The integration of ML and deep learning (DL) methods has significantly improved the accuracy of eye movement-based disease classification. Various algorithms, including SVM, Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), have been utilized for disease prediction and classification [11,26]. Decision Tree (DT) and Random Forest (RF) models have also been applied in the diagnosis of conditions such as vertigo and dyslexia [8,27]. Furthermore, eye-tracking assistive technologies have been developed for individuals with amyotrophic lateral sclerosis (ALS) to enhance communication and improve their quality of life [28,29].

Despite significant advancements, several challenges remain in the application of eye movement analysis in medical diagnosis. Issues such as variability in data collection, the lack of standardized protocols, and the need for large-scale datasets hinder the widespread adoption of this technology [3]. This review aims to provide a comprehensive overview of recent advancements in eye-tracking applications for neurological, neurodevelopmental, and neurodegenerative disorders, with a particular focus on ML and DL methodologies. The review is organized into three principal sections: disorder-specific applications; a comparative analysis of features, models, and task designs; and a discussion of the key limitations and future directions in the field.

This review was conducted using a structured literature search to identify relevant studies on the application of eye movement analysis in neurodevelopmental, neurological, and neurodegenerative disorders. Publications were primarily identified using Google Scholar, which provides broad coverage of biomedical, neuroscience, and engineering literature. The search focused mainly on studies published between 2015 and 2025, with emphasis on recent developments in eye-tracking technologies and machine learning applications, while earlier foundational studies were included when relevant. Combinations of the following keywords were used in the search: eye tracking, eye movement, machine learning, deep learning, neurological disorders, neurodevelopmental disorders, and neurodegenerative disorders, as well as specific disorder names (e.g., autism spectrum disorder, Parkinson’s disease, Alzheimer’s disease, stroke, dyslexia, and amyotrophic lateral sclerosis). Studies were included if they reported a quantitative analysis of eye movement data for the diagnosis, classification, monitoring, or assessment of brain-related disorders, particularly those employing machine learning or computational approaches. Peer-reviewed journal articles and conference papers were considered. Studies not directly related to medical applications or lacking a quantitative analysis were excluded. The selected articles were analyzed qualitatively to identify common methodologies, feature extraction strategies, machine learning techniques, and clinical applications.

2. Application in Different Disorders

2.1. Neurodegenerative Disorders

Neurodegenerative disorders, including PD, AD, mild cognitive impairment (MCI), ALS, and rapid eye movement sleep behavior disorder (RBD), are characterized by progressive damage to the nervous system, affecting motor, cognitive, and behavioral functions. These conditions are often challenging to diagnose early due to overlapping clinical features and gradual symptom onset. Eye movement abnormalities have emerged as promising, non-invasive biomarkers in this context, offering insights into both motor and cognitive deficits associated with neurodegeneration [2]. With advances in eye-tracking technology and computational analysis, particularly ML, research has increasingly focused on using oculomotor metrics to support early detection, subtype differentiation, and disease monitoring across a range of neurodegenerative conditions [6,9].

2.1.1. Parkinson’s Disease

Eye-tracking technologies and automated analyses have significantly advanced PD diagnostics and monitoring. Oculomotor parameters such as saccades, fixation stability, and smooth pursuit movements, captured via electronic tablets, have been analyzed using ML algorithms to assess cognitive function and disease severity. These studies typically employ supervised learning techniques, such as SVM, to classify patients into mild versus moderate PD severity groups based on Unified Parkinson’s Disease Rating Scale, Part III (UPDRS-III) scores, with a reported accuracy of 90% [4]. In addition to classification performance, several studies have demonstrated significant correlations between specific eye movement features, including saccade latency, fixation stability, and error rates, and clinical cognitive assessment scores. These findings suggest that eye movement abnormalities reflect underlying neural dysfunction associated with both motor and cognitive impairment in PD [5,30].

2.1.2. Alzheimer’s Disease

Eye-tracking methodologies are increasingly employed for the early detection of cognitive decline in AD. A novel 3 min eye-tracking test assessed gaze patterns and fixation durations across ten cognitive tasks, with features derived from the percentage of fixation on regions of interest (ROIs) [6]. Four subscale scores were computed: delayed recall, working memory, judgement, and visuospatial function. These features were used to train a supervised ML model, specifically an SVM with 6-fold cross-validation implemented using the scikit-learn library. The model demonstrated robust classification performance, with the following AUC values: AD vs. NC = 0.90 ± 0.08, MCI vs. NC = 0.75 ± 0.11, AD vs. MCI = 0.78 ± 0.13, and (MCI + AD) vs. NC = 0.83 ± 0.10. The test’s total score also showed a significant correlation with Mini-Mental State Examination (MMSE) scores (r = 0.57; p < 0.05), while memory recall and deductive reasoning tasks showed the highest discriminative power [6].
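
The evaluation scheme described above (an SVM scored by AUC under 6-fold cross-validation in scikit-learn) can be illustrated with a minimal sketch. The data below are synthetic stand-ins generated for illustration only; the group means, the RBF kernel, the feature scaling step, and all variable names are assumptions and do not reproduce the pipeline or results of [6].

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT from the study): 30 controls and 30 patients,
# each described by four hypothetical subscale scores.
n_per_group = 30
controls = rng.normal(loc=60.0, scale=15.0, size=(n_per_group, 4))
patients = rng.normal(loc=40.0, scale=15.0, size=(n_per_group, 4))
X = np.vstack([controls, patients])
y = np.array([0] * n_per_group + [1] * n_per_group)

# Scaling plus an RBF-kernel SVM; the kernel choice is an assumption.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Stratified 6-fold cross-validation, scored by area under the ROC curve.
cv = StratifiedKFold(n_splits=6, shuffle=True, random_state=0)
auc_per_fold = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

# Report mean and standard deviation across folds, as in "AUC = mean ± SD".
print(f"AUC = {auc_per_fold.mean():.2f} ± {auc_per_fold.std():.2f}")
```

Reporting the standard deviation across folds, as the reviewed study does, gives a rough sense of how much the estimate depends on the particular split, which matters for the small cohorts typical of this literature.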

The study affirmed the test’s sensitivity in distinguishing stages of cognitive decline, with the added advantage of being brief and language independent. However, some tasks with lower discriminative value could be streamlined, suggesting opportunities to further optimize this tool for clinical screening and early intervention in AD and MCI populations [31,32].

2.1.3. Amyotrophic Lateral Sclerosis

Applications integrating ML with biomedical signal analysis and eye-tracking technologies have demonstrated significant effectiveness in the diagnosis and monitoring of ALS. In a systematic review of 18 studies [29], supervised ML models including SVM, Artificial Neural Networks (ANN), RF, K-Nearest Neighbors (KNN), and Deep Learning Networks (DLN) were applied to various biomedical signals such as electromyography (EMG), gait rhythm (GR), electroencephalography (EEG), and magnetic resonance imaging (MRI). Feature extraction methods included Discrete Wavelet Transform (DWT), Stockwell Transform (ST), Multiscale Principal Component Analysis (MSPCA), and statistical measures like kurtosis, entropy, and signal amplitude histograms. The most accurate diagnostic result was achieved by a KNN model using EMG features, reaching 98.8% accuracy. Communication-focused systems using EEG signals and BCI interfaces achieved up to 95.25% accuracy with KNN and LDA classifiers. For survival prediction, a DLN combining MRI and clinical data attained 84.4% accuracy. These results confirm the strong potential of ML-based biomedical signal processing in supporting ALS diagnosis, monitoring, and assistive communication.

Eye-tracking assistive technologies integrated with ML have significantly enhanced communication capabilities for individuals with ALS. These systems analyze ocular features such as gaze fixation points, saccades, and dwell times to enable interaction with digital interfaces. Notably, SVM combined with electrooculography (EOG) signals and gaze data achieved an accuracy of 76.1% across the fixation, saccade, and blink categories. Additionally, a CNN-based DL model for EOG word decoding reached a mean accuracy of 90.58%, enabling voiceless communication through continuous word construction from eye movements. These findings demonstrate the viability of ML-enhanced eye-tracking systems in providing reliable, non-verbal communication methods, thereby improving quality of life for ALS patients in advanced stages of motor impairment [28]. They also support the value of combining physiological signals and eye tracking in ALS care, though practical deployment requires addressing the comfort, calibration, and accessibility of assistive interfaces for users with severe motor limitations.

2.1.4. Mild Cognitive Impairment

Eye-tracking technologies have demonstrated high sensitivity in identifying subtle cognitive changes characteristic of MCI. In a study involving 52 MCI patients, supervised ML, specifically the SVM, was applied to gaze features such as percentage of fixation duration on ROIs, collected across ten cognitive tasks [6]. The model achieved an Area Under the Curve (AUC) of 0.75 ± 0.11 for distinguishing MCI from normal controls. The most discriminative tasks included memory recall and deductive reasoning, which also contributed to subscale scores in delayed recall, working memory, judgement, and visuospatial function. The eye-tracking total scores significantly declined in MCI subjects (41.4 ± 17.9) compared to the controls (61.1 ± 19.0; p < 0.01) and correlated with MMSE scores (r = 0.57; p < 0.05), supporting convergent validity. These results highlight the diagnostic utility of eye tracking-based cognitive assessment for early MCI detection, enabling earlier intervention strategies to delay progression into AD. While the 3 min test offers a rapid, language-independent assessment, not all tasks contributed equally to classification performance. Tasks like memory recall and deductive reasoning were most effective, whereas simpler tasks such as static images or landscape viewing showed minimal discriminative power.
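
The feature named above, the percentage of fixation duration falling on an ROI, can be sketched as follows. This is an illustrative reading of that feature rather than the implementation used in [6]; the `Fixation` and `Roi` classes, the rectangular ROI shape, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # horizontal gaze position (pixels)
    y: float            # vertical gaze position (pixels)
    duration_ms: float  # fixation duration in milliseconds


@dataclass
class Roi:
    # Axis-aligned rectangle; real ROIs may have other shapes (assumption).
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fixation: Fixation) -> bool:
        return (self.left <= fixation.x <= self.right
                and self.top <= fixation.y <= self.bottom)


def roi_fixation_percentage(fixations: list[Fixation], roi: Roi) -> float:
    """Share of total fixation time spent inside the ROI, in percent."""
    total = sum(f.duration_ms for f in fixations)
    if total == 0:
        return 0.0
    inside = sum(f.duration_ms for f in fixations if roi.contains(f))
    return 100.0 * inside / total


# Hypothetical trial: two fixations land on the target region, one does not.
fixations = [
    Fixation(100, 100, 300),
    Fixation(400, 300, 200),
    Fixation(120, 90, 500),
]
target_roi = Roi(left=50, top=50, right=200, bottom=200)
pct = roi_fixation_percentage(fixations, target_roi)
print(pct)  # 800 ms of 1000 ms fall inside the ROI -> 80.0
```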

2.1.5. Rapid Eye Movement Sleep Behavior Disorder

Machine learning algorithms have shown strong potential in diagnosing RBD, particularly as a prodromal marker of Parkinsonian neurodegeneration. In [9], supervised classifiers including SVM and KNN were applied to spectral and polysomnographic features derived from REM sleep EMG recordings to detect REM Sleep Without Atonia (RSWA), a key diagnostic criterion for RBD. The KNN classifier achieved an accuracy of 86.96%, sensitivity of 93.33%, and specificity of 75% during 5-fold cross-validation. On a held-out test set, it maintained 81.82% accuracy and 85.71% sensitivity. Moreover, a continuous Dissociation Index (DI) was developed using Euclidean distances from healthy reference profiles, enabling a nuanced, graded assessment of REM sleep disruption severity.
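
A continuous index based on Euclidean distance from a healthy reference profile could, in principle, be computed along the following lines. The exact definition of the DI in [9] is not given here, so this is only a sketch under stated assumptions: features are standardized using the healthy group's mean and standard deviation, and the index is the distance from the healthy centroid. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def dissociation_index(features, healthy_reference):
    """Euclidean distance of each subject from the healthy centroid.

    Features are z-scored with the healthy group's statistics so that no
    single feature dominates the distance (an assumption of this sketch).
    """
    healthy_reference = np.asarray(healthy_reference, dtype=float)
    mean = healthy_reference.mean(axis=0)
    std = healthy_reference.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    z = (np.atleast_2d(np.asarray(features, dtype=float)) - mean) / std
    return np.linalg.norm(z, axis=1)


# Toy data (not from the study): three healthy profiles, two test subjects.
healthy = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]])
subjects = np.array([[2.0, 12.0], [4.0, 16.0]])
di = dissociation_index(subjects, healthy)
print(di)  # first subject sits on the healthy centroid, second lies farther away
```

A graded score of this kind complements a binary classifier: two subjects assigned to the same class can still be ranked by how far they deviate from the healthy profile.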

Another study employed a fully automated acoustic analysis of connected speech in 50 idiopathic RBD subjects. A quadratic discriminant classifier trained on features such as RST, DPI, DVI, and PIR successfully distinguished motor-positive from motor-negative RBD individuals based on UPDRS III scores with an accuracy of 70.0%, sensitivity of 73.9%, and specificity of 66.7% [10].

While EMG-based classifiers offer higher sensitivity in detecting RSWA, speech-based approaches provide a practical remote solution for identifying prodromal motor symptoms. Together, these findings emphasize the benefit of integrating physiological and acoustic biomarkers to improve early detection and monitoring in RBD populations at a high risk of Parkinsonian progression.

2.2. Neurodevelopmental Disorders

Neurodevelopmental disorders are a group of conditions that emerge during early brain development and affect emotional regulation, learning ability, self-control, and memory. Among the most prevalent are ADHD, ASD, schizophrenia (Sz), fetal alcohol spectrum disorder (FASD), dyslexia, and CP. ADHD, one of the most common childhood neurodevelopmental disorders, impairs attention regulation and impulse control and is associated with long-term functional impacts [15]. ASD and Sz, though differing in terms of cognitive mechanisms, both exhibit profound disruptions in social cognition and emotion recognition, particularly evident in gaze and eye movement patterns during facial expression tasks [16]. Children with FASD and ADHD share overlapping attentional deficits, yet differ in their oculomotor control and visual sensory processing, posing diagnostic challenges [33]. Dyslexia, often undetected in adulthood due to compensatory strategies, manifests primarily through impaired reading abilities and abnormal eye movement behaviors while reading [25]. In the case of CP, a non-progressive neurological disorder caused by early brain injury, visual dysfunctions such as strabismus and oculomotor abnormalities are commonly observed, and eye tracking and image processing techniques are being increasingly applied for rehabilitation monitoring [21]. The diversity and overlap of symptoms across neurodevelopmental disorders underline the importance of objective, technology-driven diagnostic tools for accurate early detection and individualized intervention.

2.2.1. Attention-Deficit/Hyperactivity Disorder

In [15], researchers investigated whether integrating eye-tracking metrics with CPTs (Continuous Performance Tests) could enhance diagnostic accuracy for ADHD in children aged 6–10 years. The features analyzed included traditional CPT indicators such as omission errors (OE), commission errors (CE), mean reaction time (RT mean), and variability in reaction time (RT SD) as well as eye-tracking features, including the fixation ratio (FR), mean fixation time (FT), central gaze ratio (CR), and standard deviation of gaze coordinates (gaze SD). Using logistic regression (LR), the study found that eye-tracking features alone yielded higher diagnostic performance (AUC = 0.856, sensitivity = 0.733, and specificity = 0.861) compared to CPT alone (AUC = 0.769, sensitivity = 0.533, and specificity = 0.931). Notably, when combining both modalities, performance improved further (AUC = 0.889, sensitivity = 0.833, and specificity = 0.862), indicating that multimodal approaches could significantly improve the identification of ADHD. Moreover, follow-up testing showed that both CPT and eye-tracking indicators significantly improved after the administration of ADHD medication, particularly stimulant drugs like methylphenidate, underscoring the potential of eye tracking not only for diagnosis, but also for treatment monitoring.
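
The comparison described above (logistic regression on CPT features alone, eye-tracking features alone, and both combined, each scored by AUC) can be sketched as follows. The data are synthetic and the effect sizes are arbitrary; the 5-fold scheme, the scaling step, and all names are assumptions for illustration, not the protocol of [15].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic stand-in cohort (NOT study data): 60 controls, 60 ADHD.
n = 120
y = np.repeat([0, 1], n // 2)
# Four hypothetical CPT features (e.g., OE, CE, RT mean, RT SD).
cpt = rng.normal(size=(n, 4)) + 0.6 * y[:, None]
# Four hypothetical eye-tracking features (e.g., FR, FT, CR, gaze SD).
eye = rng.normal(size=(n, 4)) + 0.8 * y[:, None]


def cv_auc(X, y):
    """AUC computed from out-of-fold predicted probabilities."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)


auc_by_feature_set = {
    "cpt": cv_auc(cpt, y),
    "eye": cv_auc(eye, y),
    "combined": cv_auc(np.hstack([cpt, eye]), y),
}
for name, auc in auc_by_feature_set.items():
    print(f"{name}: AUC = {auc:.3f}")
```

Scoring out-of-fold predictions, rather than predictions on the training data, keeps the three AUC values comparable when the combined model has twice as many inputs.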

Another study [33] explored a high-throughput, low-cost classification method based on eye movement patterns recorded during naturalistic video viewing. A total of 224 features were extracted from eye movement data and categorized into three groups: oculomotor-based (e.g., saccade duration, amplitude, and velocity), saliency-based (e.g., the correlation between gaze and computational saliency maps representing bottom-up visual attention), and group-based features (e.g., the similarity of gaze patterns to those of typically developing (TD) controls, reflecting top-down control). An ML pipeline, specifically Support Vector Machine Recursive Feature Elimination (SVM-RFE), was used to identify the most discriminative features for classifying ADHD, FASD, and control children. The accuracy reached 77.3% (chance = 40.4%), with pairwise classification results of 83.3% for ADHD vs. the control, 79.2% for FASD vs. the control, and 90.4% for ADHD vs. FASD. Interestingly, saliency-based features proved most effective in differentiating ADHD from the controls, indicating stronger stimulus-driven attention in ADHD. Texture processing was identified as a particularly strong discriminative feature for ADHD, suggesting heightened sensitivity to visual textures. In contrast, children with FASD exhibited impairments in both top-down and bottom-up attention, offering a distinct attentional profile compared to ADHD.
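
Recursive feature elimination with a linear SVM, the general technique named above, can be sketched with scikit-learn's `RFE` wrapper. This is a simplified binary example on synthetic data in which only three of twenty features carry signal; it is not the multi-class pipeline of [33], and the feature counts, the value of `C`, and the variable names are assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Synthetic data (NOT from the study): 80 samples, 20 features,
# of which only the first three differ between the two classes.
n_samples, n_features = 80, 20
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_features))
X[:, :3] += 2.0 * y[:, None]

# RFE repeatedly fits the linear SVM and discards the feature with the
# smallest squared weight until the requested number of features remains.
selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=3, step=1)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)
print("Selected feature indices:", selected)
print("Elimination ranking (1 = kept):", selector.ranking_)
```

In practice the selection step would be nested inside the cross-validation loop, so that features are never chosen using the samples on which accuracy is later reported.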

2.2.2. Autism Spectrum Disorder

The integration of ML and DL techniques with eye-tracking data has increasingly enabled the identification of ASD biomarkers through gaze behavior [11]. Across multiple studies, various eye movement features have been utilized as diagnostic indicators. For instance, fixation duration and frequency are often abnormal in individuals with ASD, indicating differences in attentional focus [12]. Saccadic amplitude and speed have also been found to differ, reflecting atypical visual scanning strategies [13]. Moreover, gaze trajectories, or scanpaths, have been used to visually encode gaze dynamics, allowing for classification through image-based models [14]. Lastly, attention to specific facial features or areas of interest (AOIs) such as the eyes, nose, and mouth during tasks involving facial recognition or emotional expression analysis is a well-documented discriminator between ASD and TD individuals [16].

The study in [34] combined EEG and eye-tracking data, applying SVM classifiers to distinguish children with ASD (ages 3–6) from TD peers. By selecting 32 features using the Minimum Redundancy Maximum Relevance (MRMR) method, the authors achieved an accuracy of 85.44% and an AUC of 0.93, highlighting the advantage of multimodal data fusion for diagnostic enhancement. In [14], an innovative method that transforms eye-tracking scanpaths into visual representations was proposed and used to classify ASD. Despite the simplicity of the model and a relatively limited dataset, the approach achieved a promising AUC greater than 0.9, demonstrating that visual scanpath representations can effectively support ASD diagnosis through ANN-based classification.
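
The idea of turning a scanpath into an image that an image-based model can consume can be illustrated with a simple rasterization. The encoding used in [14] is not detailed here, so the sketch below makes its own assumptions: a binary image, straight-line interpolation between consecutive gaze points, and hypothetical screen and image sizes.

```python
import numpy as np


def scanpath_to_image(gaze_xy, screen_size, image_size=(64, 64),
                      samples_per_segment=50):
    """Draw a scanpath as a binary image (rows x cols)."""
    width, height = screen_size
    rows, cols = image_size
    image = np.zeros(image_size, dtype=np.float32)
    gaze_xy = np.asarray(gaze_xy, dtype=float)

    # Interpolate along each segment between consecutive gaze points and
    # mark the image cells that the interpolated points fall into.
    t = np.linspace(0.0, 1.0, samples_per_segment)[:, None]
    for start, end in zip(gaze_xy[:-1], gaze_xy[1:]):
        points = start + t * (end - start)
        c = np.clip((points[:, 0] / width * cols).astype(int), 0, cols - 1)
        r = np.clip((points[:, 1] / height * rows).astype(int), 0, rows - 1)
        image[r, c] = 1.0
    return image


# Hypothetical scanpath on a 1000 x 800 pixel screen: right, then down.
scanpath = [(100, 100), (900, 100), (900, 700)]
image = scanpath_to_image(scanpath, screen_size=(1000, 800))
print(image.shape, int(image.sum()))
```

Richer encodings are possible, for example using color or intensity to carry velocity or time; those would be further design choices beyond this sketch.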

Liu et al. [12] focused exclusively on eye-tracking data. They used SVMs to classify children with ASD based on facial scanning patterns, particularly analyzing eye movements during a facial recognition task. Their ML framework achieved an accuracy of 88.51%, reinforcing the notion that gaze patterns on facial stimuli are strong indicators of ASD. Similarly, Iwauchi et al. [16] investigated gaze behavior during the Facial Emotion Identification Test (FEIT), employing CNNs and RF classifiers to identify ASD in both children and adults. For child participants, their model achieved an accuracy of 66.7%, a 16.7% improvement over models that did not consider facial expression types.

In adults, Yaneva et al. [13] recorded eye movements during information retrieval tasks on webpages, achieving an accuracy of approximately 74% in detecting high-functioning ASD using various ML classifiers, including SVM, LR, RF and naïve Bayes. The study emphasized the relevance of gaze-based features in naturalistic settings and suggested their potential as unobtrusive screening tools, particularly for adults who might otherwise remain undiagnosed.

A recent review [11] comprehensively analyzed ML and DL models applied to ASD detection using eye tracking. The reported accuracies differed based on the choice of model and dataset, with algorithms such as SVM, KNN, RF, CNNs, and hybrid ensemble methods consistently demonstrating strong performance across various age groups. For instance, SVM achieved a 92.31% accuracy in one study, while a DE-tuned SVM reached 100% accuracy in another, underscoring the potential of optimization techniques in improving ML models’ performance.

Across studies, ML classifiers applied to eye-tracking data have demonstrated strong diagnostic performance for ASD, with classification accuracies ranging from 66.7% to 100%, depending on the age group, task type, and model complexity [11,12,13,14,34]. While models such as SVMs and CNNs show high sensitivity in both children and adults, multimodal approaches combining EEG and eye tracking often yield superior AUC values (e.g., 0.93) [34]. However, performance tends to vary widely due to factors like small sample sizes, heterogeneous age groups, a lack of task standardization, and differing eye-tracking protocols. The current literature reveals a pressing need for harmonized datasets, age-specific benchmarks, and real-world validation to ensure that ML-based ASD diagnostics generalize across populations and settings [35].

2.2.3. Schizophrenia

Huang et al. [36] proposed a robust computer-aided method for Sz recognition by extracting both hand-crafted discriminative eye movement features (e.g., average saccade velocity, fixation skewness, outside fixation count, and pupil size dynamics) and model metric-based features derived from DL saliency models. These features were collected from a free-viewing task involving 100 images viewed by 40 patients and 30 healthy controls, using a high-frequency eye tracker. They trained two classifiers, SVM and RF, and achieved high discriminative performance. Specifically, statistically significant differences (p < 0.001) were observed in the average saccadic amplitude (5.45° in Sz vs. 6.55° in controls), total saccade amplitude (63.72° vs. 93.78°), and average saccade velocity (79.44 vs. 123.89). Model metric features derived from saliency models (e.g., CC_SAM_VGG) also revealed substantial differences (0.37 in Sz vs. 0.48 in controls), demonstrating their potential as visual biomarkers.
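
Model metric-based features of the kind mentioned above compare where a saliency model predicts people will look with where a participant actually looked. One standard metric of this family is the linear correlation coefficient (CC) between a saliency map and a fixation density map. The sketch below computes it with NumPy; it is a generic illustration, and how [36] built its fixation maps or which saliency models it paired with the metric is not reproduced here. The tiny 2 x 2 maps are hypothetical.

```python
import numpy as np


def correlation_coefficient(saliency_map, fixation_map):
    """Pearson correlation between two maps of the same shape."""
    s = np.asarray(saliency_map, dtype=float).ravel()
    f = np.asarray(fixation_map, dtype=float).ravel()
    s = s - s.mean()
    f = f - f.mean()
    denom = np.sqrt((s ** 2).sum() * (f ** 2).sum())
    # A constant map has no variance; return 0 rather than dividing by zero.
    return float((s * f).sum() / denom) if denom > 0 else 0.0


# Hypothetical maps: gaze that follows the model, and gaze that avoids it.
saliency = np.array([[0.0, 1.0], [2.0, 3.0]])
cc_following = correlation_coefficient(saliency, saliency)
cc_avoiding = correlation_coefficient(saliency, 3.0 - saliency)
print(cc_following, cc_avoiding)
```

A lower CC for a participant, as reported for the Sz group, indicates that the observed gaze distribution agrees less with the model's predicted salient regions.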

Iwauchi et al. [16] also explored eye movement patterns during the FEIT to distinguish Sz from ASD and TD individuals. Their features included fixation count, saccade count, and scanpath length, and a CNN-based heatmap classification approach yielded an accuracy of 64.5% for Sz in adults. Importantly, they found that incorporating facial expression specificity improved accuracy by 10% compared to unweighted models. The study also noted significantly fewer fixations at the eyes and longer scanpath lengths in Sz participants compared to controls, consistent with social cognition impairments in Sz.

2.2.4. Fetal Alcohol Spectrum Disorder

In a study [33], a high-throughput eye-tracking approach was developed to detect subtle attentional impairments in children with FASD during passive video viewing. From these recordings, the researchers extracted 224 features categorized into three types: oculomotor-based features (e.g., saccade amplitude, duration, and velocity), saliency-based features (correlations between gaze and computational models of visual salience, reflecting bottom-up attention), and group-based features (similarity of gaze behavior to a normative control group, reflecting top-down attention control). Feature selection and classification were performed using Multiple Support Vector Machine Recursive Feature Elimination (MSVM-RFE). Their model achieved an accuracy of 79.2% when distinguishing children with FASD from TD controls (chance level: 58.1%), and an impressive 90.4% accuracy in separating FASD from ADHD, a condition often misdiagnosed due to overlapping behavioral symptoms.

Saliency-based features were found to be particularly effective in identifying FASD-related visual processing abnormalities, achieving 77.6% accuracy (p < 0.01), suggesting weakened stimulus-driven (bottom-up) attention in children with FASD. Group-based features also contributed significantly (accuracy: 69.8%; p < 0.01), highlighting impaired volitional (top-down) attentional control. Despite these promising outcomes, limitations remain due to modest sample sizes (13 FASD participants), potential feature redundancy, and a lack of standardized video stimuli. Future work should aim to validate findings in larger, diverse cohorts and explore task optimization to reduce test duration while preserving diagnostic accuracy.

2.2.5. Dyslexia

Eye-tracking features such as fixation duration, saccade amplitude, regression count, and reading time have been widely studied to distinguish dyslexic readers from typical readers. Rello et al. [37] developed a model trained on 1135 reading samples from Spanish speakers aged 11–54 using 12 text types, and found that features like mean fixation duration and total reading time were highly predictive. Their SVM model achieved an accuracy of 80.18% in a 10-fold cross-validation, demonstrating the feasibility of automatic dyslexia detection using eye-tracking features. Similarly, El Hmimdi et al. [27] analyzed both reading and non-reading eye movement tasks, specifically saccade and vergence tests, using LED targets and reported that the Alouette reading test, a non-semantic text requiring decoding, yielded an 81.25% accuracy, the same as the saccade task, while the vergence task achieved a 77.3% accuracy, all using multiple ML classifiers. Their findings support that dyslexia-related oculomotor abnormalities exist beyond reading and can be detected in controlled tasks.

In another study, Kasprowski et al. [25] introduced a deep learning-based approach utilizing GSSP generated from raw eye movement data, bypassing traditional fixation–saccade segmentation. The plots were input to a CNN, which achieved 89% accuracy and AUC of 0.93, representing state-of-the-art performance for dyslexia detection in adults using the CopCo dataset. This method also eliminated the need for manual feature extraction, streamlining the detection pipeline. The research presented in [26] applied a deep CNN architecture to a large dataset of 4243 eye movement recordings from real clinical settings, including saccade, vergence, and reading tasks. Their model achieved precision of 80.20% and sensitivity of 75.1% on the vergence data, and 77.2% precision and 77.5% sensitivity on the saccade data, validating the robustness of DL for dyslexia detection under real-world conditions. These results highlight that combining high-resolution eye-tracking with DL enables scalable, non-invasive, and language-agnostic approaches to diagnosing dyslexia, even outside of traditional reading contexts [38].
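
To make the idea of a plot built directly from raw gaze samples more concrete, the sketch below constructs a self-similarity matrix in which entry (i, j) is the Euclidean distance between gaze samples i and j, normalized to [0, 1]. This is one plausible reading of a gaze self-similarity plot and is offered as an assumption; the precise construction and normalization used in [25] may differ. The function name and sample coordinates are hypothetical.

```python
import numpy as np


def self_similarity_plot(gaze_xy):
    """Pairwise-distance matrix of gaze samples, scaled to [0, 1]."""
    g = np.asarray(gaze_xy, dtype=float)
    # Broadcasting yields an (n, n, 2) array of coordinate differences.
    diff = g[:, None, :] - g[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    max_dist = dist.max()
    return dist / max_dist if max_dist > 0 else dist


# Three hypothetical gaze samples lying on a straight line.
gaze = [(0, 0), (3, 4), (6, 8)]
plot = self_similarity_plot(gaze)
print(plot)
```

Because the matrix is computed from raw samples, no fixation or saccade detection threshold has to be chosen first; the resulting 2D array can be passed to a CNN like any grayscale image.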

These methods highlight both reading and non-reading eye movement abnormalities in dyslexic individuals. Despite promising results, the limitations include the small sample sizes in some studies, variability in text difficulty, and task heterogeneity. There is a clear need for standardized data acquisition protocols and broader population validation to ensure generalizability in real-world dyslexia screening.

2.2.6. Cerebral Palsy

To improve the non-invasive rehabilitation assessment of CP, a study [21] proposed a computational methodology that uses eye images captured via a camera to quantify visual improvement in children with CP. The study focused on 40 children aged 3 to 11, whose eye images were collected periodically over three sessions (initial, 6th, and 12th month). From these images, the authors extracted 39 oculomotor-related features, including iris center coordinates $(X_{0}, Y_{0})$, eye corner locations $(X_{1}, Y_{1})$ and $(X_{2}, Y_{2})$, angular deviations $(\theta_{1}, \theta_{2}, \theta_{3})$, and distances between anatomical eye points ($D_{1}$–$D_{4}$), which serve as biomarkers of motor alignment and improvement in gaze control.
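To make the flavor of such geometric features concrete, the following minimal sketch derives a few distances and one angle from an iris center and two eye corners. The function name, the landmark coordinates, and the specific formulas are assumptions chosen for illustration; they are not the feature definitions used in [21].

```python
import math


def eye_geometry_features(iris, corner_a, corner_b):
    """Compute illustrative geometric features from three 2D eye landmarks.

    All definitions here are assumptions for illustration; they are not
    the feature definitions reported in [21]. Points are (x, y) in pixels.
    """
    # Distances from the iris center to each eye corner, and corner to corner.
    d_iris_a = math.dist(iris, corner_a)
    d_iris_b = math.dist(iris, corner_b)
    d_corners = math.dist(corner_a, corner_b)

    # Angle (degrees) at the iris center between the two corner directions.
    ax, ay = corner_a[0] - iris[0], corner_a[1] - iris[1]
    bx, by = corner_b[0] - iris[0], corner_b[1] - iris[1]
    cos_angle = (ax * bx + ay * by) / (d_iris_a * d_iris_b)
    angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

    # Relative iris offset between the corners: 0.5 means equidistant.
    offset = d_iris_a / (d_iris_a + d_iris_b)

    return {
        "d_iris_a": d_iris_a,
        "d_iris_b": d_iris_b,
        "d_corners": d_corners,
        "angle_deg": angle_deg,
        "offset": offset,
    }


# Hypothetical landmark coordinates for one eye image.
features = eye_geometry_features(iris=(50, 20), corner_a=(20, 20), corner_b=(80, 20))
```

Tracking how values of this kind change between sessions is the general idea behind longitudinal, image-based assessment; the sketch assumes the iris center never coincides with an eye corner.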

Three ML classifiers, SVM, RF, and Neural Network (NN), were applied to classify and assess changes in eye alignment patterns across time. Among them, the NN yielded the best performance with an accuracy of 94.17%, sensitivity of 91.65%, and specificity of 98.00%. In comparison, the SVM achieved 86.67% accuracy, and the RF reached 73.33%. This study demonstrates that tracking changes in eye position features over time can reliably reflect therapeutic outcomes, providing a low-cost, efficient alternative to more invasive and expensive systems like Visually Evoked Potential (VEP) devices.

These proposed methods confirm the feasibility of non-invasive, image-based tracking of visual improvement over time. However, the study’s relatively small cohort (40 children) and reliance on manually captured images in non-standardized lighting conditions pose limitations to scalability. Future work should explore larger, multicenter validations and real-time data acquisition to enhance clinical applicability and generalizability.

2.3. Neurological Disorders

Neurological disorders such as brain injuries, vertigo, locked-in syndrome (LIS), mTBI, post-traumatic syndrome (PTS), left visuo-spatial neglect, stroke, and OCD represent a diverse group of conditions that disrupt normal brain and nervous system function, often leading to impairments in cognition, movement, perception, and behavior. These disorders can emerge from trauma, vascular incidents, neurodevelopmental anomalies, or degenerative processes, and frequently affect visual and oculomotor control systems. For instance, vertigo is commonly associated with disruptions in vestibular–ocular pathways, while visuo-spatial neglect manifests as an attentional deficit toward one side of visual space, typically after a right-hemisphere stroke [20]. Locked-in syndrome presents a unique challenge due to preserved consciousness but complete motor paralysis, leaving eye movement as the primary means of communication [23]. Conditions such as OCD are often linked to dysfunction in the neural circuits governing motor control and executive function, manifesting through involuntary tics or compulsive behaviors [24]. Moreover, brain injuries, including mTBI and its long-term sequelae like PTS, can impair saccadic coordination and attention regulation, which are essential for day-to-day functioning [17,18]. Stroke, a leading cause of adult disability, frequently results in impairments in gaze stability and visual tracking due to focal damage in brain regions involved in eye movement control [19].

2.3.1. Vertigo

Mao et al. [8] conducted a study that employed eye movement analysis to differentiate vertigo patients from those with brain injuries and healthy controls using ML. The experimental design included optokinetic and smooth pursuit tests in which participants tracked a moving red light, allowing researchers to extract six original eye movement features: the abscissa (x), ordinate (y), and area (r) of the pupil across 250 frames. These temporal features were first processed using Long Short-Term Memory (LSTM) networks to generate evolutionary features, which were then input into Decision Trees (DTs) constructed using the C4.5 algorithm. The final classification was achieved through a Random Forest (RF) ensemble. The RF model demonstrated high robustness, achieving 96.88% accuracy in four out of six test cases and 93.75% in the other two, significantly outperforming standalone LSTM and DT classifiers.

The model’s success highlights the strength of combining temporal eye movement data with ensemble learning. However, the study was limited by a relatively small and fixed sample size (n = 96), and required strict testing conditions, including head stabilization and active participation, which may limit clinical scalability. Future research should aim for broader validation with larger and more diverse datasets, while exploring less restrictive data acquisition protocols to enhance real-world applicability.

2.3.2. Locked-In Syndrome

In the research by [39], the authors developed a robust, camera-based communication interface using 54 Euclidean distance features calculated from 30 eye landmarks extracted via the Face Mesh library. These features fed into an ANN designed to recognize four ocular gestures: looking up, looking down, eye closure, and no gesture. The ANN achieved an impressive accuracy of 99.8% and enabled users to navigate and select options from a five-item menu using only eye movements. This system eliminated the need for calibration and performed flawlessly even in a validation test with five new users, enabling accurate and intuitive communication for individuals with LIS.

Complementing this, ref. [23] proposed an objective cognitive assessment platform based on eye tracking for patients with LIS or disorders of consciousness. Their system integrated standard neuropsychological instruments like adapted Montreal Cognitive Assessment (MoCA) tests into an eye-tracking interface, allowing cognitive functions such as attention, memory, and orientation to be assessed in non-communicative patients. In a pilot study involving 56 patients, test results were consistent with diagnostic categories, with certain cognitive domains showing meaningful correlations with behavioral responsiveness (e.g., r = 0.53 in MCS+ patients), suggesting the system’s clinical relevance in both diagnosis and neurorehabilitation. Although the results were promising, limitations included small validation cohorts, the need for real-user trials in LIS populations, and variation in setup environments. Broader clinical testing and the standardization of assessment protocols are essential to ensure robustness and usability in real-world care settings.

2.3.3. Mild Traumatic Brain Injury

In the study by [17], three saccade tasks, namely Visually Guided (Step), Anti-saccade, and Go/No-Go (GNG), were administered via a portable saccadometer to assess 34 mTBI patients, 27 individuals with PTS, and 31 healthy controls. A total of 11 raw saccadic measurements per trial were collected, including latency, duration, amplitude, peak velocity, acceleration, and deceleration metrics. These features were further expanded through statistical analysis and binning, resulting in 3450 engineered features. A feature selection pipeline identified 116 optimal features, which were used to train an RF ensemble classifier, achieving 87.8% accuracy in classifying mTBI vs. PTS vs. controls and 91.1% accuracy for distinguishing TBI (mTBI + PTS) from healthy individuals.
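The expansion of a handful of raw per-trial measurements into a large engineered feature set can be pictured with the sketch below, which turns one measurement (saccade latency) into summary statistics and histogram-bin fractions. The function name, the bin edges, and the example latencies are hypothetical, and the exact statistics and bins used in [17] are not specified here.

```python
import statistics


def expand_measurement(values, bin_edges):
    """Turn one per-trial measurement into summary and histogram features."""
    features = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }
    # Fraction of trials falling in each half-open bin [lo, hi).
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        in_bin = sum(1 for v in values if lo <= v < hi)
        features[f"bin_{lo}_{hi}"] = in_bin / len(values)
    return features


# Hypothetical saccade latencies (ms) from six trials of one participant.
latencies_ms = [180, 210, 195, 250, 320, 205]
feats = expand_measurement(latencies_ms, bin_edges=[100, 200, 300, 400])
```

Repeating this for every raw measurement, task, and trial condition quickly multiplies the feature count, which is why a subsequent feature selection step becomes necessary.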

A complementary study by Cade et al. [18] evaluated a computerized eye tracker assessment (CEA) battery comprising six tasks: egocentric localization, fixation stability, smooth pursuit, pro/anti-saccades, the Stroop test, and vestibulo-ocular reflex (VOR). Eye movement metrics such as vertical gaze error, pursuit gain, and number of catch-up saccades were extracted from 55 control, 20 mTBI, and 40 persistent post-concussion syndrome (PPCS) participants. Using an Extreme Gradient Boosting with Dropout (xgbDART) model, they achieved a balanced accuracy of 0.83 (control), 0.66 (mTBI), and 0.76 (PPCS), with an overall AUC of 0.82 across the three groups. The key discriminative features included vertical gaze error during fixation, pursuit error and gain, and gaze stability in the VOR test, emphasizing the multi-dimensional nature of visual dysfunction in mTBI and PPCS.

These results underscore that ML models can effectively parse subtle oculomotor deviations post injury, although distinctions between mTBI and chronic PPCS remain challenging due to overlapping symptomatology. The limitations include the small sample sizes, variable test reliability across tasks, and the need for larger, standardized datasets to refine the model’s generalizability.

2.3.4. Stroke

A recent study [19] investigated the use of eye-tracking technology combined with traditional Chinese medicine (TCM)-based five-color visual stimuli (red, green, yellow, white, and black) to classify stroke patients using ML. A total of 16 stroke patients and 24 healthy controls were assessed using a Tobii 4C eye tracker while passively viewing the five colored images. From this task, three significant eye-movement features were extracted: the mean fixation duration, mean vectorial saccade velocity, and mean vectorial saccade amplitude. These were selected based on statistical differences between groups (p < 0.05) and served as input for several supervised learning algorithms, including RF, CatBoost, XGBoost, KNN, DT, and Gradient Boosting Classifier (GBC).

The models demonstrated robust performance, particularly under the red and green stimuli. Notably, the RF model under the red stimulus achieved the highest accuracy of 88.45%, with a sensitivity of 84.65%, precision of 86.48% and F1 score of 85.47%. Other high-performing configurations included CatBoost and XGBoost under the red and green stimuli, achieving accuracies consistently above 85%. This research supports the feasibility of using a portable eye-tracking system and simple visual stimuli for non-invasive, real-time stroke screening, offering a cost-effective alternative to imaging-based diagnostics in daily life scenarios. The study also underscores the utility of combining traditional diagnostic frameworks with modern computational techniques to improve early stroke detection.

2.3.5. Obsessive–Compulsive Disorder

Obsessive–compulsive disorder is characterized by persistent intrusive thoughts and repetitive behaviors and is often accompanied by deficits in executive function and visuospatial memory. A recent study [24] developed a deep learning-based eye-tracking system to assess these cognitive impairments during a modified Rey–Osterrieth Complex Figure Test (RCFT). The study involved 104 patients with OCD, along with psychosis patients and healthy controls, who were instructed to memorize a complex figure for three minutes while their eye movements were recorded. The system captured the fixation point coordinates, number of saccades, and fixation durations as indicators of visual attention and cognitive organization. These features were processed using an LSTM network with an attention mechanism, which modeled the temporal sequence of fixations during visual encoding. The model achieved an F1 score of 83.5% for distinguishing between normal and impaired executive function and 80.7% for classifying visuospatial memory deficits, with respective AUCs of 60.7% and 69.9%, regardless of psychiatric diagnosis. The study showed that OCD patients with impaired executive function exhibited significantly fewer fixations and saccades (p = 0.002), suggesting less effective encoding strategies. Unlike traditional RCFT scoring, this model offered a fast, objective, and direct assessment, highlighting the utility of gaze-based DL methods in evaluating executive dysfunction in OCD.

3. Comparative Analysis

3.1. Feature Usage Across Disorders

The diagnostic utility of eye tracking across neurodevelopmental, neurodegenerative, and neurological disorders depends on the types of features extracted from gaze data. Different disorders exhibit distinct gaze signatures, making certain feature categories more relevant than others. This is shown in Table 1.

While Table 1 summarizes the types of gaze features used across disorders, the diagnostic relevance of these features depends strongly on the experimental task paradigm and the underlying oculomotor and cognitive functions being assessed. Different eye-tracking tasks probe distinct neural and functional domains, including attention, executive control, and sensorimotor coordination. For example, anti-saccade tasks assess inhibitory control and executive function, which are often impaired in Parkinson’s disease, ADHD, and schizophrenia, whereas smooth pursuit tasks evaluate sensorimotor integration, commonly affected in neurodegenerative and neurological disorders. Reading and free viewing paradigms are particularly informative for disorders involving cognitive processing and attentional regulation, such as dyslexia, ASD, and Alzheimer’s disease. Table 2 provides a taxonomy linking common eye-tracking task paradigms to their associated oculomotor constructs and relevant disorders.

3.1.1. Fixation-Based Features

As shown in Table 1, fixation features, including fixation duration, count, and distribution on AOIs, are used across nearly all disorders in both clinical and research settings. In ASD, atypical fixations, especially reduced attention to the eye region when observing faces, have been used for robust classification [11,12,14]. In dyslexia, prolonged fixations on text signal difficulty in decoding [37]. AD and MCI studies show decreased fixation performance on cognitively demanding AOIs, correlating with cognitive decline [6]. In OCD, a reduced fixation count and saccadic behavior during figure-copying tasks have been linked to impaired executive function [24,40,41].

3.1.2. Saccadic Parameters

Saccadic features such as amplitude, latency, peak velocity, and directional error are particularly informative in disorders affecting motor control or neural integration. As indicated in Table 1, these features are used extensively across conditions such as PD, mTBI, stroke, and ALS. In PD, saccade metrics decline with disease severity and correlate with UPDRS-III motor scores [4,5]. mTBI and PPCS assessments rely heavily on visually guided and anti-saccade performance to detect subtle post-concussive impairments [17,18]. In stroke, abnormalities in saccade amplitude and velocity under visual stimuli have been utilized for classification using color-based tasks [19]. Similarly, in ALS, saccadic gestures serve as reliable indicators for communication models and neurophysiological monitoring [28,40,41].

3.1.3. Scanpath and Trajectory Features

Scanpath length, complexity, and directionality provide insights into how individuals visually navigate tasks, particularly in disorders like ASD, Sz, and dyslexia, where these features are frequently employed (see Table 1). In ASD and Sz, disrupted or excessively long scanpaths are observed during social tasks like emotion recognition, indicating atypical visual strategies [16,36]. In dyslexia, scanpaths are disorganized during reading, and self-similarity-based visual encodings have achieved high classification performance [25]. In RBD, while scanpath features are less commonly mentioned, emerging speech- and behavior-linked gaze profiles suggest evolving roles for these metrics in early risk classification.

3.1.4. Pupil Size and Gaze Variability

Although less widely reported across all conditions, pupil size dynamics and spatial gaze deviation offer useful biomarkers in specific disorders, as shown in Table 1. Schizophrenia studies have used pupil dynamics and fixation skewness as discriminative markers [36]. In CP, spatial gaze measurements such as iris displacement, angular deviation, and distances between anatomical eye landmarks have been used to monitor visual rehabilitation progress [21]. These features are especially valuable in tracking longitudinal improvement or capturing subtle neuromuscular impairments.

3.1.5. Task-Specific and Saliency-Based Features

Saliency-based features comparing gaze points to saliency map predictions reflect bottom-up attention and stimulus responsiveness. According to Table 1, these features are used notably in FASD, ADHD, ALS, OCD, and emerging areas like RBD. In FASD, poor alignment with saliency regions indicates impaired visual attention, while group-based metrics reflect deficits in top-down control [12]. Similar results occur in ADHD, whereby gaze behavior indicates reduced visual salience synchronization during free viewing [15,33]. In RBD, saliency features are used less directly, but cognitive control can be inferred through complementary data such as speech or behavioral profiles [9]. In ALS, gaze-based systems leverage task-directed gestures such as looking up or blinking as communicative inputs, making task-constrained spatial and temporal gaze features central to interface design [28].
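One widely used way to quantify agreement between gaze and a saliency map is the normalized scanpath saliency (NSS), the mean z-scored saliency at fixated locations. The sketch below is a minimal illustration under the assumption that the saliency map is a small 2D grid and fixations are given as (row, column) indices; whether the reviewed studies used this particular metric is not stated here.

```python
import statistics


def normalized_scanpath_saliency(saliency_map, fixations):
    """Mean z-scored saliency at fixated cells, given as (row, col) indices."""
    values = [v for row in saliency_map for v in row]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("saliency map is constant; NSS is undefined")
    z_at_fixations = [(saliency_map[r][c] - mean) / std for r, c in fixations]
    return statistics.fmean(z_at_fixations)


# Hypothetical 3x3 saliency map with a single salient cell in the center.
saliency = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
nss_on_target = normalized_scanpath_saliency(saliency, [(1, 1)])
nss_off_target = normalized_scanpath_saliency(saliency, [(0, 0)])
```

Positive values indicate that gaze landed on regions predicted to be salient, whereas values near or below zero indicate poor alignment with the saliency prediction.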

3.1.6. Hybrid and Multimodal Models

Several studies combine gaze data with other physiological modalities to enhance diagnostic accuracy. In ASD, combining EEG and eye-tracking data yielded 85.44% accuracy and an AUC of 0.93 using SVM [34]. In PD and RBD, integrating connected speech features with eye-tracking data has improved the classification of motor-positive vs. motor-negative status [10]. These multimodal approaches represent a growing trend toward more comprehensive and informative biomarker sets for complex disorders.

3.2. Task Design and Protocols

The techniques for collecting gaze data and the design of experimental tasks have a significant impact on how well eye-tracking-based ML models perform. Numerous task types have been used in the reviewed studies, each focusing on distinct cognitive, motor, or perceptual domains related to a particular disorder. However, this diversity also introduces variability that affects study comparability and generalizability.

Common Task Types

Several recurring task types have emerged as standard across specific clinical domains:

Free Viewing Tasks: Used in schizophrenia [36], ASD [14] and FASD [12], participants observe static or dynamic visual scenes without instructions. These tasks capture spontaneous gaze patterns and are useful for evaluating visual attention and scanpath strategies.
Facial Emotion Recognition and Social Tasks: Predominantly used in ASD [11,12,16] and Sz [36], these tasks assess social gaze behavior by presenting emotional facial stimuli and tracking individuals’ fixation on facial AOIs such as eyes and mouth.
Reading and Language Tasks: Widely applied in dyslexia research [25,26,37], participants read semantic or non-semantic passages, allowing for the extraction of regressions, fixation durations, and reading times. Some studies also include non-reading tasks such as saccade or vergence tests to highlight oculomotor irregularities [27].
Visually Guided Eye Movement Tasks: Disorders like PD [4,5], mTBI [17,18], and vertigo [8] often utilize saccade, anti-saccade, smooth pursuit, and go/no-go tasks to elicit precise oculomotor responses. These are valuable for detecting subtle motor or control deficits.
Cognitive Test Adaptations: Adapted neuropsychological instruments like MoCA, Stroop or figure copying have been embedded into eye-tracking platforms for conditions like AD/MCI [6], OCD [24], and LIS [23,24]. These structured assessments enable the mapping of cognitive functions like attention, working memory, and executive control through gaze metrics.
Assistive and Communication Interfaces: In LIS and ALS, task design shifts toward interface usability, requiring users to perform discrete ocular gestures such as looking up, blinking, and dwelling to control menu navigation or spelling tools [28,39].

3.3. Machine Learning and Deep Learning

The integration of ML and DL with eye-tracking technologies has rapidly expanded across the domains of neurodevelopmental, neurodegenerative, and neurological disorder research. These approaches have enabled automated, objective, and often highly accurate classification systems that support early detection, differential diagnosis, and rehabilitation monitoring.

3.3.1. Machine Learning Approaches

Machine learning remains the predominant modeling approach across eye-tracking studies. Among these, SVMs are the most frequently employed due to their robustness with high-dimensional data and small sample sizes. For example, SVM achieved 88.51% accuracy in ASD classification using facial scanning patterns [12], 80.18% accuracy in dyslexia detection [37], and 79.2% accuracy in distinguishing FASD from controls [12]. Similarly, RF and KNN have been widely used in neurological and motor-based disorders such as stroke [19], mTBI [17], and ALS [29], frequently selected for their interpretability and ability to handle imbalanced datasets.

3.3.2. Deep Learning Approaches

Recent years have seen the increased use of DL methods, particularly where gaze data are visualized or sequential. CNNs have been used to classify scanpath heatmaps in ASD [14] and dyslexia [25], with one study achieving an AUC of 0.93 and 89% accuracy using GSSP [25]. For temporally structured gaze sequences, LSTM networks have been applied, for example in vertigo diagnosis [8], where LSTM-derived features fed into an RF ensemble achieved up to 96.88% accuracy. Deep learning models have also been integrated in communication tools, such as an ANN-based “Call with Eyes” system for LIS, which achieved 99.8% gesture accuracy [39].

3.3.3. Feature Selection and Optimization Techniques

To manage high-dimensional data, various feature selection methods have been adopted. MRMR and RFE (Recursive Feature Elimination) are commonly used, with the latter applied in the FASD study to select 19 optimal features from 224 candidates [12]. Optimization techniques like Differential Evolution (DE) have also been explored, with DE-tuned SVM reaching up to 100% accuracy in ASD classification [11], although such results may reflect overfitting on limited datasets.

3.3.4. Model Evaluation and Generalizability

Model performance is evaluated using cross-validation methods (e.g., 10-fold and leave-one-out), with metrics such as accuracy, AUC, precision, and sensitivity. While internal validation shows strong results, for instance, 91.1% accuracy for mTBI vs. control classification using RF [17], few studies validate models on external test sets. The lack of shared datasets and standardized benchmarking remains a barrier to generalizability, highlighting the need for multi-site validation and cross-population testing.

3.4. Diagnostic Performance and Structured Summary

To improve transparency and reproducibility, Table 3 provides a structured overview of representative studies, including dataset characteristics, eye-tracking setup, task paradigms, machine learning models, validation strategies, and classification performance. Reported performance metrics should be interpreted cautiously, as methodological differences across studies limit direct comparability.

Among the eleven representative studies summarized in Table 3, eight employed subject-independent validation, three used trial-level validation, and none performed external validation. High classification accuracy (>80%) was reported in nine studies, including seven using subject-independent validation and two using trial-level validation. Subject-independent studies generally involved modest sample sizes (typically 36–124 participants) but provide more reliable estimates of generalization. In contrast, trial-level validation, used in stroke and cerebral palsy studies with small cohorts (n = 40–263), reported high accuracy (88.45–94.17%), which may reflect optimistic estimates due to within-subject data overlap. Overall, while the reported accuracy is consistently high, the lack of external validation and frequent reliance on small datasets limit conclusions regarding real-world clinical robustness.

4. Discussion

As already mentioned, a major limitation in the field is the lack of standardized protocols. Studies differ in terms of screen sizes, stimulus presentation durations, calibration routines, and even fixation thresholds, for instance defining the minimum fixation duration as 100 ms versus 200 ms. This heterogeneity leads to inconsistencies in feature extraction and reduces the reproducibility of findings. Additionally, details of the participant setup, such as head-free versus stabilized recording or room lighting, are often underreported even though they can significantly influence eye-tracking precision.
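The effect of the fixation threshold alone can be illustrated with a short sketch: the same set of candidate fixation events yields different fixation counts and mean durations depending on whether a 100 ms or a 200 ms minimum is applied. The function name and the list of durations are hypothetical and serve only to show the mechanism.

```python
def fixation_summary(durations_ms, min_duration_ms):
    """Count and mean duration of events that pass the minimum-duration rule."""
    kept = [d for d in durations_ms if d >= min_duration_ms]
    mean_duration = sum(kept) / len(kept) if kept else 0.0
    return {"count": len(kept), "mean_duration_ms": mean_duration}


# Hypothetical candidate fixation events (ms) from one recording.
candidate_durations_ms = [80, 120, 150, 210, 260, 400, 90, 180]

summary_100 = fixation_summary(candidate_durations_ms, min_duration_ms=100)
summary_200 = fixation_summary(candidate_durations_ms, min_duration_ms=200)
```

In this toy recording, raising the threshold from 100 ms to 200 ms halves the fixation count (6 versus 3) and increases the mean fixation duration (220 ms versus 290 ms), so features computed under different thresholds are not directly comparable.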

Although many studies report high accuracy, often exceeding 80%, these results should be interpreted cautiously due to substantial heterogeneity across studies. Differences in experimental task design (e.g., free viewing, reading tasks, smooth pursuit, or cognitive tests), dataset size, class balance, eye-tracking hardware specifications, preprocessing pipelines, and feature extraction methods can significantly influence models’ performance. Additionally, validation procedures vary widely, with many studies relying on internal cross-validation rather than independent external validation, which may overestimate generalizability. The level of data splitting also differs, with some studies using subject-independent validation while others employ trial-level splits that may introduce optimistic bias. Therefore, the reported performance metrics should not be interpreted as directly comparable across disorders or studies, but rather as indicative of the potential of eye movement-based machine learning methods under specific experimental conditions.

To ensure comparability and scalability, future research must prioritize protocol harmonization. This includes the following:

Developing common sets of eye-tracking tasks that can be used across different disorders and tested in multiple studies.
Using shared datasets and agreeing on clear definitions for AOIs in visual stimuli, so that results from different research groups can be compared.
Applying consistent methods for calibration and feature extraction, so that eye movement data is processed the same way in each study.
Clearly reporting details about the equipment used and testing conditions, such as screen size, lighting, and participant position, to make studies more transparent and repeatable.

While the integration of ML with eye-tracking technologies has demonstrated promising diagnostic capabilities across a range of neurological, neurodegenerative and neurodevelopmental disorders, the generalizability of these findings is constrained by several persistent data-related limitations.

4.1. Small Sample Sizes and Class Imbalance

One of the most common challenges is the limited sample size observed across many studies. In disorders such as FASD [33], CP [21], and even some adult ASD and Sz studies [14,36], datasets often include fewer than 50 participants per group. These small datasets not only reduce statistical power but also increase the risk of overfitting, especially when complex models such as deep neural networks are applied. Additionally, class imbalance (e.g., more healthy controls than clinical cases) can bias model performance and obscure true diagnostic potential.
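A small worked example shows how class imbalance can distort plain accuracy. In the sketch below, a hypothetical cohort of 90 controls and 10 patients is scored by a degenerate classifier that labels everyone as a control; the cohort sizes and the function name are assumptions for illustration only.

```python
def accuracy_and_balanced_accuracy(y_true, y_pred):
    """Return (accuracy, balanced accuracy) for binary labels, 1 = patient."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # recall on patients
    specificity = tn / (tn + fp)  # recall on controls
    return accuracy, (sensitivity + specificity) / 2


# Hypothetical imbalanced cohort: 90 controls (0) and 10 patients (1).
y_true = [0] * 90 + [1] * 10
# Degenerate classifier that predicts "control" for everyone.
y_pred = [0] * 100

acc, bal_acc = accuracy_and_balanced_accuracy(y_true, y_pred)
```

Here accuracy is 0.90 even though no patient is detected, while balanced accuracy is 0.50, which is why sensitivity, specificity, or balanced accuracy are more informative than accuracy alone on imbalanced data.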

4.2. Lack of Demographic Diversity

Many datasets are confined to narrow age groups, such as school-aged children (ASD, dyslexia, and ADHD) or elderly adults (AD and PD), limiting model applicability across broader populations. Few studies stratify results by age, gender, or cultural background, which are known to influence gaze behavior. Without such diversity, the trained models risk underperforming in real-world, heterogeneous clinical populations.

Recent large-scale evidence demonstrates that eye movement characteristics vary systematically across the lifespan and differ significantly between sexes, even in neurologically healthy individuals. Age-related changes have been observed in relation to saccade velocity, fixation dispersion, smooth pursuit accuracy, and error rates, while sex-specific differences affect multiple oculomotor parameters, including saccadic targeting accuracy and pursuit stability. These normative variations can be comparable in magnitude to disease-related effects, indicating that models trained on demographically homogeneous cohorts may inadvertently conflate normal developmental or sex-related variability with pathological patterns. Consequently, failure to incorporate age- and sex-specific normative baselines reduces generalizability and may introduce bias into diagnostic and prognostic models [42,43,44].

4.3. Device and Setup Variability

The hardware used for eye-tracking data collection varies widely, ranging from high-end research-grade systems to low-cost webcams and tablet-based devices. Differences in sampling rate, spatial accuracy, calibration quality, and experimental setup (e.g., head-mounted vs. remote systems) can substantially affect feature reliability and model input quality, limiting cross-study comparability and model transferability. In particular, the sampling rate determines the temporal precision with which eye movements can be measured. Lower sampling rates can reliably capture fixation-based metrics and general gaze allocation, whereas higher sampling rates are required to accurately resolve rapid oculomotor dynamics such as saccade timing, velocity profiles, and microsaccades. Consequently, insufficient temporal resolution may introduce measurement uncertainty and reduce the comparability of fine-grained features across studies.

Most wearable eye trackers operate at sampling rates between 50 and 100 Hz, which are generally sufficient for extracting fixation-based features and overall gaze trends in applied healthcare contexts [45]. However, higher sampling rates are required for the precise characterization of temporal gaze dynamics. Studies investigating reading and detailed oculomotor behavior typically employ sampling rates of at least 500 Hz, with many research-grade systems operating at 500–1000 Hz or higher to ensure the accurate detection of saccades and scanpath structure [46]. These differences have important implications for feature robustness: fixation-based summary features, such as fixation duration, fixation count, and gaze distribution, are relatively robust across devices, whereas fine-grained temporal features, including saccade dynamics and detailed scanpath patterns, require higher sampling rates for reliable measurement. The explicit reporting of hardware specifications and the alignment of feature selection with device capabilities are therefore essential to ensure reproducibility and valid cross-study comparisons.
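The practical consequence of sampling rate can be checked with simple arithmetic. The sketch below computes the sample interval and the approximate number of samples recorded during a single saccade at several sampling rates; the 30 ms saccade duration is an assumed, illustrative value, and the function names are hypothetical.

```python
def sample_interval_ms(sampling_rate_hz):
    """Time between consecutive samples, in milliseconds."""
    return 1000 / sampling_rate_hz


def samples_in_event(event_duration_ms, sampling_rate_hz):
    """Approximate number of complete samples recorded during an event."""
    return (event_duration_ms * sampling_rate_hz) // 1000


# Assumed duration of one short saccade, for illustration only.
saccade_duration_ms = 30

samples_per_saccade = {
    rate: samples_in_event(saccade_duration_ms, rate)
    for rate in (50, 100, 500, 1000)
}
```

Under this assumption, a 50 Hz tracker (20 ms between samples) captures only about one sample per saccade, too few to estimate a velocity profile, whereas a 1000 Hz system captures about 30 samples over the same movement.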

4.4. Limited External Validation

Most studies rely on cross-validation within a single dataset (e.g., k-fold), with few evaluating their models on independent external test sets. Without external validation, it is difficult to assess how well models generalize to new participants, clinical settings, or hardware platforms. This limitation is particularly critical for eye tracking-based digital biomarkers, which are sensitive to demographic factors, task design, and recording conditions, and may exhibit disease-overlapping patterns across neurodegenerative disorders. As highlighted in the recent cross-disease investigations of ML-driven digital biomarkers, the lack of external and multi-cohort validation remains a major barrier to clinical translation and robust early-stage detection. Additionally, the absence of shared benchmarks and public datasets hinders progress toward reproducible and comparable research outcomes [47].

4.5. Recommended Reporting Standards and Validation Framework

Trial-level validation can inflate performance because trials from the same participant share stable individual gaze characteristics, such as fixation patterns, saccade dynamics, and scanpath tendencies. When these trials appear in both training and validation sets, models may learn subject-specific signatures rather than disease-related features, resulting in overly optimistic accuracy estimates. Subject-independent validation prevents this leakage by ensuring the complete separation of participants between training and validation sets, thereby providing a more realistic assessment of generalization to unseen individuals. Nested cross-validation further improves methodological rigor by separating hyperparameter tuning (inner loop) from model evaluation (outer loop), preventing information leakage and ensuring unbiased performance estimation, particularly in small or high-dimensional datasets.
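The separation of participants between training and validation sets can be sketched in a few lines. The following minimal example assigns whole participants, rather than individual trials, to folds; the function name, participant identifiers, and fold count are hypothetical, and the splitter is a simplified stand-in for the group-aware splitters offered by standard ML libraries.

```python
import random


def subject_independent_folds(subject_ids, n_folds, seed=0):
    """Yield (train_idx, test_idx) so that no subject appears on both sides.

    `subject_ids` holds one subject identifier per trial.
    """
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    for k in range(n_folds):
        # Every n_folds-th subject, starting at k, forms the test group.
        test_subjects = set(subjects[k::n_folds])
        test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
        train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
        yield train_idx, test_idx


# Six hypothetical participants with three trials each.
subject_ids = [s for s in ["p1", "p2", "p3", "p4", "p5", "p6"] for _ in range(3)]
folds = list(subject_independent_folds(subject_ids, n_folds=3))
```

A trial-level split would instead shuffle the 18 trial indices directly, allowing trials from the same participant to fall on both sides of the split. For nested cross-validation, the same subject-wise splitting would be repeated inside each training set to tune hyperparameters.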

To improve the reproducibility, comparability, and clinical translation of eye tracking-based machine learning studies, it is essential to adopt standardized reporting practices and robust validation strategies. Many existing studies lack sufficient detail regarding participant characteristics, eye-tracking hardware specifications, preprocessing pipelines, and validation design, which limits their reproducibility and generalizability. Table 4 summarizes recommended minimum reporting items and a validation ladder that can support the development of clinically reliable eye movement-based digital biomarkers.

4.6. Model Selection and Validation Strategies for Small Datasets

Many eye tracking studies involve relatively small sample sizes, which increase the risk of overfitting, particularly when using complex models. In such settings, simpler models such as SVM, RF, and regularized linear models are often preferable, as they require fewer parameters and tend to generalize more reliably. These models can achieve consistent performance when combined with well-defined oculomotor features derived from fixation, saccade, and scanpath measures.

Deep learning models typically require larger datasets to achieve reliable generalization and should be used cautiously in small-N studies. When applied, their performance should be compared with simpler baseline models, and safeguards such as regularization and appropriate validation procedures should be implemented. Studies should report uncertainty measures alongside performance metrics, including variability across cross-validation folds and confidence intervals. Nested cross-validation is recommended when performing hyperparameter tuning, and validation should ideally progress from internal cross-validation to independent hold-out testing and, when possible, external multi-site validation.
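As a minimal sketch of the uncertainty reporting recommended above, the following example summarizes variability across cross-validation folds and computes a Wilson score interval for an accuracy estimate. The fold scores and counts are hypothetical, and the Wilson interval is one common choice among several; it treats predictions as independent, which is only a reasonable approximation when each participant contributes a single prediction.

```python
import math
import statistics


def fold_summary(fold_scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


def wilson_interval(n_correct, n_total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = n_correct / n_total
    denom = 1.0 + z ** 2 / n_total
    centre = (p + z ** 2 / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total + z ** 2 / (4 * n_total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical results: five outer-fold accuracies and pooled predictions
# for 40 held-out participants, 34 of them classified correctly.
fold_scores = [0.80, 0.90, 0.85, 0.75, 0.95]
mean_acc, sd_acc = fold_summary(fold_scores)
low, high = wilson_interval(n_correct=34, n_total=40)

print(f"Accuracy across folds: {mean_acc:.3f} (SD {sd_acc:.3f})")
print(f"Pooled accuracy 34/40 with 95% Wilson interval: [{low:.3f}, {high:.3f}]")
```

With only 40 participants the interval spans roughly twenty percentage points, which illustrates why a single accuracy figure from a small-N study should not be read as a precise estimate.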

Overall, selecting models appropriate for the dataset size, combined with transparent validation and uncertainty reporting, is essential for developing reliable and clinically applicable eye tracking-based classification models. These validation and reporting practices complement the recommended checklist presented in Table 4.

5. Conclusions and Future Works

This review demonstrates the growing utility of eye-tracking technologies combined with ML and DL for the diagnosis, classification, and monitoring of neurological, neurodevelopmental, and neurodegenerative disorders. Across conditions such as ASD, ADHD, dyslexia, PD, AD, mTBI, FASD, and others, eye movement data offer a rich and non-invasive window into cognitive and motor function.

A wide range of features relating to fixation durations, saccadic parameters, scanpath complexity, pupil dynamics, and saliency alignment have been applied to characterize disorder-specific gaze behavior. Supervised ML models, particularly SVM and RF, remain the most widely used due to their effectiveness on small datasets. However, DL models, especially CNN and ANN, are emerging as powerful tools for spatial and temporal gaze pattern recognition. Hybrid and multimodal approaches that integrate eye tracking with EEG, speech, or MRI further enhance diagnostic accuracy and reflect the field’s trajectory toward more comprehensive modeling.

Despite this progress, challenges remain. Many studies suffer from limited sample sizes, a lack of demographic diversity, inconsistent feature extraction methods, and the absence of standardized task protocols. These limitations hinder generalization and clinical translation. Additionally, the variability in eye-tracking hardware, calibration routines, and task design underscores the need for harmonized research practices.

To address these gaps and advance the field, future research should prioritize the following:

Standardization of Protocols
Future studies should aim to develop unified and validated eye-tracking tasks, consistent calibration routines, and shared definitions for gaze metrics and AOIs. Standardized protocols would facilitate comparability across studies and improve reproducibility.
Expansion of Benchmark Datasets
Many studies suffer from small and demographically narrow samples. There is a pressing need for large-scale, diverse, and publicly accessible datasets spanning multiple disorders, age groups, and cultural backgrounds. Such resources would support generalizable models and robust cross-study validation.
Integration of Multimodal Data
Combining eye-tracking data with complementary modalities such as EEG, MRI, speech analysis, or clinical assessments can improve accuracy and capture richer neurophysiological patterns, particularly for complex or overlapping conditions.
Application of Data Augmentation Techniques
Given the limited size of many eye-tracking datasets, applying data augmentation methods such as adding noise to saccade trajectories, jittering fixation coordinates, or generating synthetic scanpaths can enhance models’ robustness. This is especially beneficial for training DL models, improving generalization across different populations and devices.
Emphasis on Explainability and Clinical Interpretability
As ML moves closer to clinical deployment, the need for interpretable and transparent models becomes essential. Explainable AI (XAI) approaches should be incorporated to provide insights into decision-making processes and to build trust among healthcare professionals.
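Of the recommendations above, data augmentation is the most directly illustrated in code. The following sketch jitters fixation coordinates with Gaussian noise to create additional training scanpaths. The function names, the tuple layout `(x_px, y_px, duration_ms)`, and the 5-pixel noise level are assumptions made for illustration; in practice the noise magnitude would need to be chosen relative to the tracker's spatial accuracy, and augmentation should be applied only to training data within each fold so that augmented copies of a recording never reach the validation set.

```python
import random


def jitter_fixations(fixations, sigma_px=5.0, rng=None):
    """Return a copy of a scanpath with Gaussian jitter on each fixation position.

    fixations: list of (x_px, y_px, duration_ms) tuples; durations are unchanged.
    """
    rng = rng or random.Random()
    return [(x + rng.gauss(0.0, sigma_px), y + rng.gauss(0.0, sigma_px), dur)
            for x, y, dur in fixations]


def augment_scanpaths(scanpaths, labels, copies=3, sigma_px=5.0, seed=0):
    """Keep the originals and append `copies` jittered versions of each scanpath."""
    rng = random.Random(seed)
    out_paths, out_labels = list(scanpaths), list(labels)
    for path, label in zip(scanpaths, labels):
        for _ in range(copies):
            out_paths.append(jitter_fixations(path, sigma_px, rng))
            out_labels.append(label)
    return out_paths, out_labels


# Hypothetical training scanpaths: (x_px, y_px, duration_ms) per fixation.
train_paths = [
    [(100.0, 200.0, 250), (340.0, 210.0, 180), (600.0, 400.0, 300)],
    [(120.0, 180.0, 220), (500.0, 300.0, 400)],
]
train_labels = ["patient", "control"]

aug_paths, aug_labels = augment_scanpaths(train_paths, train_labels, copies=3)
print(f"{len(train_paths)} original scanpaths -> {len(aug_paths)} after augmentation")
```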

To conclude, this review shows that a great deal of work has already been undertaken on applying eye movement data to neurodevelopmental, neurological, and neurodegenerative disorders, but there is still a long way to go before these methods are established as standard medical procedures.

Author Contributions

Conceptualization, A.A.N. and P.K.; investigation, A.A.N. and P.K.; methodology, A.A.N.; project administration, A.A.N.; supervision, P.K.; validation, P.K.; writing—original draft, A.A.N.; and writing—review and editing, P.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Silesian University of Technology grant no. 02/100/BKM25/0047.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank the Department of Applied Informatics, Silesian University of Technology, for administrative and technical support during the preparation of this manuscript. This work was funded by the Department of Applied Informatics, Silesian University of Technology.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

REM	Rapid Eye Movement
ASD	Autism Spectrum Disorder
ADHD	Attention-Deficit/Hyperactivity Disorder
mTBI	Mild Traumatic Brain Injury
TBI	Traumatic Brain Injury
SVM	Support Vector Machine
CNN	Convolutional Neural Network
RNN	Recurrent Neural Network
ALS	Amyotrophic Lateral Sclerosis
PD	Parkinson’s Disease
AD	Alzheimer’s Disease
MCI	Mild Cognitive Impairment
RBD	Rapid Eye Movement Sleep Behavior Disorder
UPDRS-III	Unified Parkinson’s Disease Rating Scale, Part III
RST	Rate of Speech Timing
DPI	Duration of Pause Intervals
DUS	Duration of Unvoiced Stops
DVI	Duration of Voice Intervals
PIR	Pause Intervals per Respiration
QDA	Quadratic Discriminant Analysis
ROIs	Regions of Interest
AUC	Area Under the Curve
MMSE	Mini-Mental State Examination
ANN	Artificial Neural Network
RF	Random Forest
KNN	K-Nearest Neighbors
DLN	Deep Learning Networks
EMG	Electromyography
GR	Gait Rhythm
EEG	Electroencephalography
MRI	Magnetic Resonance Imaging
DWT	Discrete Wavelet Transform
ST	Stockwell Transform
MSPCA	Multiscale Principal Component Analysis
EOG	Electrooculography
RSWA	REM Sleep Without Atonia
DI	Dissociation Index
Sz	Schizophrenia
FASD	Fetal Alcohol Spectrum Disorder
CP	Cerebral Palsy
CPTs	Continuous Performance Tests
OE	Omission Errors
CE	Commission Errors
RT mean	Mean Reaction Time
RT SD	Variability in Reaction Time
CR	Central Gaze Ratio
gaze SD	Standard Deviation of Gaze Coordinates
SVM-RFE	Support Vector Machine Recursive Feature Elimination
ML	Machine Learning
DL	Deep Learning
AOIs	Areas of Interest
TD	Typically Developing
MRMR	Minimum Redundancy Maximum Relevance
FEIT	Facial Emotion Identification Test
LR	Logistic Regression
MSVM-RFE	Multiple Support Vector Machine Recursive Feature Elimination
GSSP	Gaze Self-Similarity Plots
NN	Neural Network
VEP	Visually Evoked Potential
PTS	Post-Traumatic Syndrome
OCD	Obsessive-Compulsive Disorder
LSTM	Long Short-Term Memory
LIS	Locked-In Syndrome
GNG	Go/No-Go
CEA	Computerized Eye Tracker Assessment
VOR	Vestibulo-Ocular Reflex
PPCS	Persistent Post-Concussion Syndrome
xgbDART	Extreme Gradient Boosting with Dropout
TCM	Traditional Chinese Medicine
DT	Decision Tree
RCFT	Rey–Osterrieth Complex Figure Test
MoCA	Montreal Cognitive Assessment
DE	Differential Evolution

References

1. Kang, J.J.; Lee, S.U.; Kim, J.M.; Oh, S.Y. Recording and interpretation of ocular movements: Saccades, smooth pursuit, and optokinetic nystagmus. Ann. Clin. Neurophysiol. 2023, 25, 55–65.
2. Band, T.G.; Bar-Or, R.Z.; Ben-Ami, E. Advancements in eye movement measurement technologies for assessing neurodegenerative diseases. Front. Digit. Health 2024, 6, 1423790.
3. Harezlak, K.; Kasprowski, P. Application of eye tracking in medicine: A survey, research issues and challenges. Comput. Med. Imaging Graph. 2018, 65, 176–190.
4. Koch, N.A.; Voss, P.; Cisneros-Franco, J.M.; Drouin-Picaro, A.; Tounkara, F.; Ducharme, S.; Guitton, D.; de Villers-Sidani, É. Eye movement function captured via an electronic tablet informs on cognition and disease severity in Parkinson’s disease. Sci. Rep. 2024, 14, 9082.
5. Liao, X.; Yao, J.; Tang, H.; Xing, Y.; Zhao, X.; Nie, D.; Luan, P.; Li, G. Deciphering Parkinson’s Disease through Eye Movements: A Promising Tool for Early Diagnosis in the Face of Cognitive Impairment. Int. J. Clin. Pract. 2024, 2024, 5579238.
6. Tadokoro, K.; Yamashita, T.; Fukui, Y.; Nomura, E.; Ohta, Y.; Ueno, S.; Nishina, S.; Tsunoda, K.; Wakutani, Y.; Takao, Y.; et al. Early detection of cognitive decline in mild cognitive impairment and Alzheimer’s disease with a novel eye tracking test. J. Neurol. Sci. 2021, 427, 117529.
7. Sekar, A.; Panouillères, M.T.; Kaski, D. Detecting abnormal eye movements in patients with neurodegenerative diseases–current insights. Eye Brain 2024, 16, 3–16.
8. Mao, Y.; He, Y.; Liu, L.; Chen, X. Disease classification based on eye movement features with decision tree and random forest. Front. Neurosci. 2020, 14, 798.
9. Rechichi, I.; Iadarola, A.; Zibetti, M.; Cicolin, A.; Olmo, G. Assessing REM sleep behaviour disorder: From machine learning classification to the definition of a continuous dissociation index. Int. J. Environ. Res. Public Health 2021, 19, 248.
10. Hlavnička, J.; Čmejla, R.; Tykalová, T.; Šonka, K.; Růžička, E.; Rusz, J. Automated analysis of connected speech reveals early biomarkers of Parkinson’s disease in patients with rapid eye movement sleep behaviour disorder. Sci. Rep. 2017, 7, 12.
11. Jeyarani, R.A.; Senthilkumar, R. Eye tracking biomarkers for autism spectrum disorder detection using machine learning and deep learning techniques. Res. Autism Spectr. Disord. 2023, 108, 102228.
12. Liu, W.; Li, M.; Yi, L. Identifying children with autism spectrum disorder based on their face processing abnormality: A machine learning framework. Autism Res. 2016, 9, 888–898.
13. Yaneva, V.; Eraslan, S.; Yesilada, Y.; Mitkov, R. Detecting high-functioning autism in adults using eye tracking and machine learning. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 1254–1261.
14. Carette, R.; Elbattah, M.; Cilia, F.; Dequen, G.; Guerin, J.L.; Bosche, J. Learning to Predict Autism Spectrum Disorder based on the Visual Patterns of Eye-tracking Scanpaths. In Proceedings of the International Conference on Health Informatics (HEALTHINF); SciTePress: Setubal, Portugal, 2019; pp. 103–112.
15. Lee, D.Y.; Shin, Y.; Park, R.W.; Cho, S.M.; Han, S.; Yoon, C.; Choo, J.; Shim, J.M.; Kim, K.; Jeon, S.W.; et al. Use of eye tracking to improve the identification of attention-deficit/hyperactivity disorder in children. Sci. Rep. 2023, 13, 14469.
16. Iwauchi, K.; Tanaka, H.; Okazaki, K.; Matsuda, Y.; Uratani, M.; Morimoto, T.; Nakamura, S. Eye-movement analysis on facial expression for identifying children and adults with neurodevelopmental disorders. Front. Digit. Health 2023, 5, 952433.
17. Tirdad, K.; Cruz, A.D.; Austin, C.; Sadeghian, A.; Nia, S.M.; Cusimano, M. Machine learning-based approach to analyze saccadic eye movement in patients with mild traumatic brain injury. Comput. Methods Programs Biomed. Update 2021, 1, 100026.
18. Cade, A.; Turnbull, P.R. Classification of short and long term mild traumatic brain injury using computerized eye tracking. Sci. Rep. 2024, 14, 12686.
19. Lu, Q.; Deng, J.; Yu, Y.; Li, Y.; Wei, K.; Han, X.; Wang, Z.; Zhang, X.; Wang, X.; Yan, C. Machine learning models for stroke detection by observing the eye-movement features under five-color visual stimuli in traditional Chinese medicine. J. Tradit. Chin. Med. Sci. 2023, 10, 321–330.
20. Franceschiello, B.; Di Noto, T.; Bourgeois, A.; Murray, M.M.; Minier, A.; Pouget, P.; Richiardi, J.; Bartolomeo, P.; Anselmi, F. Machine learning algorithms on eye tracking trajectories to classify patients with spatial neglect. Comput. Methods Programs Biomed. 2022, 221, 106929.
21. Illavarason, P.; Arokia Renjit, J.; Mohan Kumar, P. Medical diagnosis of cerebral palsy rehabilitation using eye images in machine learning techniques. J. Med. Syst. 2019, 43, 278.
22. Vodrahalli, K.; Filipkowski, M.; Chen, T.; Zou, J.; Liao, Y.J. Predicting visuo-motor diseases from eye tracking data. In Proceedings of the Pacific Symposium on Biocomputing 2022; World Scientific: Singapore, 2022; pp. 242–253.
23. Kasprowski, P.; Żurek, G.; Olejniczak, R. A novel diagnostic tool utilizing eye tracking technology to allow objective assessment of patients’ cognitive functions. In Proceedings of the 2024 Symposium on Eye Tracking Research and Applications (ETRA); Association for Computing Machinery: New York, NY, USA, 2024; pp. 1–3.
24. Kim, M.; Lee, J.; Lee, S.Y.; Ha, M.; Park, I.; Jang, J.; Jang, M.; Park, S.; Kwon, J.S. Development of an eye-tracking system based on a deep learning model to assess executive function in patients with mental illnesses. Sci. Rep. 2024, 14, 18186.
25. Kasprowski, P. Utilizing gaze self similarity plots to recognize dyslexia when reading. In Proceedings of the 2024 Symposium on Eye Tracking Research and Applications (ETRA); Association for Computing Machinery: New York, NY, USA, 2024; pp. 1–5.
26. El Hmimdi, A.E.; Kapoula, Z.; Sainte Fare Garnot, V. Deep learning-based detection of learning disorders on a large scale dataset of eye movement records. BioMedInformatics 2024, 4, 519–541.
27. El Hmimdi, A.E.; Ward, L.M.; Palpanas, T.; Kapoula, Z. Predicting dyslexia and reading speed in adolescents from eye movements in reading and non-reading tasks: A machine learning approach. Brain Sci. 2021, 11, 1337.
28. Edughele, H.O.; Zhang, Y.; Muhammad-Sukki, F.; Vien, Q.T.; Morris-Cafiero, H.; Agyeman, M.O. Eye-tracking assistive technologies for individuals with amyotrophic lateral sclerosis. IEEE Access 2022, 10, 41952–41972.
29. Fernandes, F.; Barbalho, I.; Barros, D.; Valentim, R.; Teixeira, C.; Henriques, J.; Gil, P.; Dourado Júnior, M. Biomedical signals and machine learning in amyotrophic lateral sclerosis: A systematic review. Biomed. Eng. Online 2021, 20, 61.
30. Graham, L.; Vitorio, R.; Walker, R.; Barry, G.; Godfrey, A.; Morris, R.; Stuart, S. Digital Eye-Movement Outcomes (DEMOs) as biomarkers for neurological conditions: A narrative review. Big Data Cogn. Comput. 2024, 8, 198.
31. Mukunoki, T.; Nagasawa, J.; Nakata, Y.; Hiroe, M.; Zheng, Y.; Nakayama, M.; Sonoda, Y.; Kowa, H.; Nagamatsu, T. Dementia detection by gaze using visuospatial memory task with CNN. In Proceedings of the 2025 Symposium on Eye Tracking Research and Applications (ETRA); Association for Computing Machinery: New York, NY, USA, 2025; pp. 1–3.
32. Miles, G.; Smith, M.; Zook, N.; Zhang, W. EM-COGLOAD: An investigation into age and cognitive load detection using eye tracking and deep learning. Comput. Struct. Biotechnol. J. 2024, 24, 264–280.
33. Tseng, P.H.; Cameron, I.G.; Pari, G.; Reynolds, J.N.; Munoz, D.P.; Itti, L. High-throughput classification of clinical populations from natural viewing eye movements. J. Neurol. 2013, 260, 275–284.
34. Kang, J.; Han, X.; Song, J.; Niu, Z.; Li, X. The identification of children with autism spectrum disorder by SVM approach on EEG and eye-tracking data. Comput. Biol. Med. 2020, 120, 103722.
35. Patel, R.; Jerskey, B.A.; Shannon, J.; Soares, N.; Fogler, J.M. AI-Enabled Technologies and Biomarker Analysis for the Early Identification of Autism and Related Neurodevelopmental Disorders. Children 2025, 12, 1670.
36. Huang, L.; Wei, W.; Liu, Z.; Zhang, T.; Wang, J.; Xu, L.; Chen, W.; Le Meur, O. Effective schizophrenia recognition using discriminative eye movement features and model-metric based features. Pattern Recognit. Lett. 2020, 138, 608–616.
37. Rello, L.; Ballesteros, M. Detecting readers with dyslexia using machine learning with eye tracking measures. In Proceedings of the 12th International Web for All Conference; Association for Computing Machinery: New York, NY, USA, 2015; pp. 1–8.
38. Le, L.; Nguyen, Q.T.; Duong-Trung, N.; Williams-King, D. Time-Series Grid Encoding of Eye-Tracking Data for Explainable AI in Dyslexia Detection. In Proceedings of the 2025 Symposium on Eye Tracking Research and Applications (ETRA); Association for Computing Machinery: New York, NY, USA, 2025; pp. 1–3.
39. Beltrán-Vargas, R.A.; Sandoval-Espino, J.A.; Marbán-Salgado, J.A.; Licea-Rodriguez, J.; Palillero-Sandoval, O.; Escobedo-Alatorre, J.J. Call with eyes: A robust interface based on ANN to assist people with locked-in syndrome. SoftwareX 2024, 27, 101883.
40. Montolio-Vila, A.; Argilés, M.; Sunyer-Grau, B.; Quevedo, L.; Erickson, G. Effect of action video games in eye movement behavior: A systematic review. J. Eye Mov. Res. 2024, 17, 10–16910.
41. Li, D.; Butala, A.A.; Moro-Velazquez, L.; Meyer, T.; Oh, E.S.; Motley, C.; Villalba, J.; Dehak, N. Automating the analysis of eye movement for different neurodegenerative disorders. Comput. Biol. Med. 2024, 170, 107951.
42. Carrick, F.R.; Hunfalvay, M.; Bolte, T.; Azzolino, S.F.; Abdulrahman, M.; Hankir, A.; Antonucci, M.M.; Al-Rumaihi, N. Age- and sex-based developmental biomarkers in eye movements. Brain Sci. 2024, 14, 1288.
43. Hunfalvay, M.; Bolte, T.; Singh, A.; Greenstein, E.; Murray, N.P.; Carrick, F.R. Age-based developmental biomarkers in eye movements: A retrospective analysis using machine learning. Brain Sci. 2024, 14, 686.
44. Tao, L.; Wang, Q.; Liu, D.; Wang, J.; Zhu, Z.; Feng, L. Eye tracking metrics to screen and assess cognitive impairment in patients with neurological disorders. Neurol. Sci. 2020, 41, 1697–1704.
45. Pauszek, J.R. An introduction to eye tracking in human factors healthcare research and medical device testing. Hum. Factors Healthc. 2023, 3, 100031.
46. Angele, B.; Gunes Ozkan, Z.; Serrano-Carot, M.; Duñabeitia, J.A. How low can you go? Tracking eye movements during reading at different sampling rates. Behav. Res. Methods 2025, 57, 195.
47. Chudzik, A.; Śledzianowski, A.; Przybyszewski, A.W. Machine learning and digital biomarkers can detect early stages of neurodegenerative diseases. Sensors 2024, 24, 1572.

Table 1. Eye-tracking feature types used across disorders.

Disorder	Fixation Features	Saccadic Features	Scanpath Features	Pupil/Spatial Features	Saliency/Task-Based Features
ASD	✓	✓	✓	✗	✓
ADHD	✓	✓	✗	✗	✓
Dyslexia	✓	✓	✓	✗	✗
MCI/AD	✓	✗	✗	✗	✓
Schizophrenia	✓	✓	✓	✓	✗
CP	✓	✗	✗	✓	✗
Stroke	✓	✓	✗	✗	✗
PD	✓	✓	✗	✗	✗
mTBI/PPCS	✓	✓	✗	✗	✗
RBD	✗	✗	✗	✗	✓
FASD	✓	✓	✓	✗	✓
OCD	✓	✓	✗	✗	✓
ALS	✗	✓	✗	✓	✓

Note: ✓ indicates the feature type is used for the disorder, while ✗ indicates it is not used.

Table 2. Taxonomy linking eye-tracking task paradigms, oculomotor constructs, and associated disorders.

Task	Function Assessed	Key Features	Disorders
Pro-/Anti-saccade	Executive control, inhibitory control, attention	Saccade latency, error rate, directional accuracy	PD, ADHD, Schizophrenia, OCD
Smooth pursuit	Sensorimotor integration, motor coordination	Pursuit gain, tracking error, corrective saccades	PD, Stroke, mTBI, CP
Free viewing	Visual attention, saliency processing, attentional bias	Fixation duration, fixation distribution, scanpath patterns	ASD, ADHD, Schizophrenia, FASD
Reading tasks	Language processing, cognitive processing, attention	Fixation duration, regression rate, saccade amplitude	Dyslexia, AD, MCI
Visual search tasks	Attention allocation, executive function, cognitive flexibility	Search latency, fixation count, scanpath efficiency	ADHD, PD, OCD
Fixation stability tasks	Sensorimotor control, motor stability	Fixation dispersion, microsaccade frequency	PD, CP, ALS

Table 3. Classification performance reported in representative studies. Reported metrics include accuracy and AUC, where available. These metrics should be interpreted in the context of dataset size, class balance, and validation methodology.

Disorder	Ref.	Samples	Device/Task	Model	Val./Split By	Acc/AUC
ASD	[12]	29 ASD/29 TD-Age & 29 TD-Ability	Tobii T60 (60 Hz); face recognition task	SVM	LOO-CV/Subject	88.51/0.896
ASD	[34]	49 ASD/48 TD	Tobii TX300 (300 Hz); visual task	SVM	–/Subject	85.44/0.93
ADHD	[33]	21 ADHD/18 HC	–; free view	MSVM-RFE	LOO-CV/Subject	83.3/–
Dyslexia	[25]	18 Dyslexia/18 HC	EyeLink 1000 (1000 Hz); reading text	CNN-GSSP	9-fold CV/Subject	88.9/0.93
PD	[4]	46 mild PD/12 moderate PD	Tablet tracker (60 Hz); Multi-task	SVM	5-fold stratified CV/Subject	84.0/0.94
AD	[6]	72 AD/52 HC	Gazefinder NP-100 (50 Hz); cognitive task	SVM	6-fold CV/Subject	–/0.90
Stroke	[19]	16 Stroke/24 HC	Tobii 4C (90 Hz); visual task	RF	5-fold CV/Trial	88.45/–
Vertigo	[8]	36 Vertigo/36 HC	Infrared Eye Tracker (30 Hz); pursuit task	RF	–/Subject	96.88/–
mTBI	[17]	34 mTBI/31 HC	Saccadometer (1000 Hz); saccade	RF	10-fold CV/Subject	91.11/–
CP	[21]	40 CP	Canon EOS 5D	NN	–/Trial	94.17/–
OCD	[24]	104 OCD/159 HC	EyeLink 1000 (1000 Hz); RCFT task	LSTM	–/Trial	–/0.607

HC = healthy control; CNN-GSSP = Convolutional Neural Network using Gaze Self-Similarity Plots; and LOO-CV = leave-one-out cross-validation. Split By: Subject = subject-independent validation, where data from each subject appears only in either training or validation sets, preventing data leakage and providing more reliable performance estimates; Trial = trial-level validation, where individual trials from the same subject may appear in both training and validation sets, potentially inflating performance due to subject-specific characteristics. Note: the Accuracy and AUC values are reported as provided in the original studies. The interpretation of these metrics should consider class balance, dataset size, and validation methodology.
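To illustrate why class balance matters when reading Table 3, the sketch below computes the accuracy that a trivial classifier would obtain by always predicting the larger group, using the participant counts listed for four two-group rows. This is an illustrative calculation rather than an analysis from the original studies: it assumes that accuracy was computed over participants in the proportions shown, which may not hold where classification was performed on trials or on resampled data.

```python
def majority_baseline(class_counts):
    """Accuracy of always predicting the most frequent class."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("class_counts must contain at least one sample")
    return max(class_counts.values()) / total


# Participant counts and reported accuracy (%) copied from Table 3.
table3_rows = {
    "PD [4]": ({"mild PD": 46, "moderate PD": 12}, 84.0),
    "Stroke [19]": ({"Stroke": 16, "HC": 24}, 88.45),
    "mTBI [17]": ({"mTBI": 34, "HC": 31}, 91.11),
    "ADHD [33]": ({"ADHD": 21, "HC": 18}, 83.3),
}

for study, (counts, reported_acc) in table3_rows.items():
    baseline = 100.0 * majority_baseline(counts)
    margin = reported_acc - baseline
    print(f"{study:<12} baseline {baseline:5.1f}%  reported {reported_acc:5.2f}%  "
          f"margin {margin:+5.1f} points")
```

The margin over this baseline is much narrower for strongly imbalanced samples than for balanced ones, which is one reason to report AUC or balanced accuracy alongside raw accuracy.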

Table 4. Recommended minimum reporting checklist and validation framework for eye tracking-based classification studies.

Category	Recommended Reporting Items
Participants’ Characteristics	Number of participants (patients/controls), age range, sex distribution, clinical diagnosis criteria, disease severity, medication status
Eye-Tracking Hardware	Device model, sampling rate (Hz), spatial accuracy, calibration procedure, head stabilization method
Quality Control	Calibration accuracy thresholds, exclusion criteria for poor-quality recordings, handling of missing data
Task Design	Description of experimental tasks, stimulus presentation parameters, duration, task instructions
Feature Definitions	Operational definitions of fixations, saccades, and other eye movement metrics; threshold parameters used
Preprocessing Pipeline	Filtering methods, artifact removal procedures, segmentation methods, feature extraction steps
Validation Strategy	Clear description of validation approach: internal cross-validation, hold-out testing, temporal split, or subject-independent validation
Validation Ladder	Recommended progression: internal cross-validation, independent hold-out dataset, external multi-site validation
Reproducibility	Availability of code, datasets, implementation details when possible

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nurhasan, A.A.; Kasprowski, P. Application of Eye Movement Analysis in Medicine: A Review Across Neurodevelopmental, Neurological, and Neurodegenerative Disorders. Appl. Sci. 2026, 16, 2548. https://doi.org/10.3390/app16052548