Systematic Review

The Role of Artificial Intelligence in the Detection and Diagnosis of Neurocognitive Disorders: A Systematic Review

1 Neapolisanit S.R.L. Rehabilitation Center, 80044 Ottaviano, Italy
2 Giustino Fortunato University, 82100 Benevento, Italy
* Author to whom correspondence should be addressed.
Technologies 2026, 14(3), 183; https://doi.org/10.3390/technologies14030183
Submission received: 12 February 2026 / Revised: 11 March 2026 / Accepted: 15 March 2026 / Published: 18 March 2026

Abstract

Dementia represents a major healthcare challenge, as pathological changes often occur years before overt symptoms. Early manifestations such as mild cognitive impairment (MCI) and subjective cognitive decline (SCD) represent critical transitional stages between normal aging and dementia. Thus, distinguishing these conditions (i.e., MCI and SCD) and determining their potential evolution into dementia remains crucial. However, current clinical tools, mainly neuroimaging and neuropsychological assessments, are not always clearly interpretable and are often resource-intensive. In recent years, artificial intelligence (AI), including machine learning (ML) and deep learning (DL), has demonstrated promising potential in early detection, progression prediction, and differential diagnosis of neurocognitive disorders. This systematic review aims to synthesize current evidence on the application of AI-based approaches to improve diagnostic accuracy and prognostic assessments in dementia. A comprehensive literature search of studies published between 2015 and 2025 was conducted across PubMed/MEDLINE, Scopus, and Web of Science, following PRISMA 2020 guidelines. Studies were evaluated for data modality, methodological rigor, performance metrics, and clinical applicability. Seventeen (17) studies, of which twelve (12) are primary studies and five (5) are secondary studies, examining AI applications in detecting and diagnosing neurocognitive disorders (NCDs) in adults with dementia, MCI, or SCD were included. Results indicate that AI models, particularly DL applied to neuroimaging, electrophysiological data, speech and language features, biomarkers, and digital behavioral data, achieve high diagnostic accuracy in distinguishing MCI, Alzheimer’s disease, and healthy aging. Predictive models also show potential in forecasting conversion from MCI to dementia and monitoring cognitive trajectories via wearable or smart-home technologies. 
Nonetheless, heterogeneity, limited external validation, and methodological inconsistencies hinder clinical translation. In conclusion, AI represents a rapidly evolving and promising tool for early detection and monitoring of neurocognitive disorders. Collectively, the reviewed studies underscore the need for standardized pipelines, larger multicenter datasets, and explainable AI frameworks to enable effective clinical implementation.

1. Introduction

Dementia refers to heterogeneous clinical conditions characterized by a decline in one or more cognitive domains, such as memory, attention, language, or executive function, with variable degrees of functional impairment (American Psychiatric Association [APA], 2022). Dementia has an estimated prevalence of 6.7% in Italy [1,2], and approximately 40 million people worldwide are affected [3]. Alzheimer’s disease (AD) represents one of the most pressing public health challenges of the 21st century, accounting for approximately 60–70% of all dementia cases; by broader estimates, more than 55 million people worldwide live with dementia, a number projected to increase sharply with global population aging [4]. Early and accurate diagnosis of dementia remains a critical unmet need, as neuropathological changes may precede clinical manifestations by several years. Early stages along this continuum include subjective cognitive decline (SCD) and mild cognitive impairment (MCI), which represent transitional phases between normal aging and dementia [5,6,7,8,9]. SCD refers to a self-experienced, persistent decline in cognitive functioning in the absence of objective cognitive deficits on standardized neuropsychological testing [6,10]. A multicenter study [11] involving several countries reported a prevalence of SCD of approximately 25%. SCD has also been conceived as a prodromal condition of both MCI and dementia. For example, Bessi et al. [12] reported that 24% of individuals with SCD converted to MCI and 14% to AD. MCI refers to a clinical condition characterized by objective cognitive decline that is greater than expected for an individual’s age and educational level, while largely preserving independence in activities of daily living [8,13].
MCI has been conceived as a transitional state between the cognitive changes of normal aging and very early dementia. Indeed, people with MCI are more likely to progress to dementia [14]. MCI affects a significant portion of older adults, with a global prevalence of around 12–29% in people over 60, an annual rate of progression to dementia of 10–15%, and over 40% potentially progressing within 3–5 years, especially those with amnestic multi-domain MCI, even though some stabilize or revert [15]. To date, the diagnosis of dementia and the distinction between pathological cognitive decline and normal aging have relied primarily on comprehensive clinical assessments, standardized neuropsychological testing, and established biomarkers derived from neuroimaging and cerebrospinal fluid (CSF) or plasma analyses [16,17,18,19]. These approaches, encompassing structural and functional MRI, PET amyloid and tau imaging, CSF biomarkers, and, more recently, blood-based assays [20], have significantly aided the early detection of dementia. However, the current state-of-the-art diagnostic measures are invasive (e.g., CSF sampling), expensive (e.g., neuroimaging), and time-consuming (e.g., neuropsychological assessments). Furthermore, data obtained using these methodologies may be prone to bias due to sensitivity and specificity issues. For example, neuropsychological screening batteries have not always proven sufficiently sensitive in detecting the earliest stages of cognitive decline. In a study by Ostrosky et al. [21], the authors found low sensitivity and specificity of the Mini-Mental State Examination (MMSE; [22]) in patients with low educational levels (0 to 4 years of schooling) and concluded that the MMSE has low diagnostic utility in participants with little formal education. This finding has been replicated in other studies [23,24].
This evidence has led researchers to investigate additional predictors of dementia [25], such as language impairments [26,27], motor deficits [28], spatial navigation abilities [29], or data derived from neurophysiological techniques (e.g., magnetoencephalography and transcranial magnetic stimulation). In recent years, artificial intelligence, and particularly deep learning, has introduced novel capabilities to the field by enabling the automated extraction of complex, nonlinear patterns from high-dimensional data, such as neuroimaging, EEG, speech and language samples, wearable sensor signals, and other digital biomarkers [30,31,32]. Artificial intelligence (AI) is a branch of computer science concerned with the creation of systems capable of performing tasks that typically require human intelligence, such as natural language recognition, computer vision, decision-making, and learning. Machine learning (ML) and deep learning (DL) represent subsets of AI that enable algorithms to learn autonomously from large datasets, progressively refining their performance and prediction accuracy [33]. In other words, AI is a set of computational technologies designed to make machines “intelligent,” or at least capable of performing complex cognitive operations independently. In recent years, the application of AI has attracted significant attention in the study of dementia [34]. Several studies have combined AI techniques with demographic, neuropsychological, and clinical data to improve diagnostic accuracy, screening efficiency, and the prediction of dementia onset [35,36,37,38]. By leveraging complex data structures, these models can support clinicians in identifying prodromal cognitive changes that precede dementia. Traditional diagnostic approaches, such as neuroimaging, neuropsychological testing, and biomarker analysis, are valuable but often resource-intensive, subjective, and difficult to scale effectively.
Recent advances in AI, particularly ML and DL, have opened promising avenues for enhancing the diagnosis, prognosis, and management of dementia. These algorithms can extract complex, nonlinear patterns from multimodal data sources, including magnetic resonance imaging (MRI), positron emission tomography (PET), electroencephalography (EEG), speech and language analysis, blood biomarkers, and wearable sensor data, capturing patterns that traditional statistical techniques may overlook. AI-driven approaches have demonstrated potential not only in the early detection of MCI but also in monitoring disease progression, enhancing differential diagnosis, predicting conversion to dementia, and assessing neuropsychiatric and functional outcomes. Moreover, the integration of explainable AI (XAI) techniques has improved model interpretability, fostering greater clinical trust and facilitating translation into real-world settings. Despite these advances, several challenges remain. Dataset heterogeneity, differences in preprocessing and feature selection, lack of standardized evaluation metrics, and limited external validation reduce the generalizability of current models. Ethical issues, particularly those concerning data privacy, bias, and algorithmic transparency, must also be addressed to ensure equitable and responsible implementation. The present systematic review aims to synthesize and critically appraise recent evidence on the use of AI, ML, and DL in dementia research. The objective is not to evaluate AI as a direct alternative to traditional neuropsychological tests but rather to analyze how these models are applied to different types of data (neuroimaging, biomarkers, EEG, language, digital data, and wearables), summarizing their diagnostic and prognostic performance, level of validation, and main methodological criticalities.
Specifically, the aims are to: (1) identify and categorize the AI techniques applied in the different modalities; (2) summarize the reported performance (AUC, accuracy, sensitivity, and specificity); (3) assess the level of validation (internal, external, or longitudinal); and (4) analyze the main methodological risks and barriers to clinical translation.
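For clarity, the performance metrics listed in aim (2) can be computed as in the following minimal sketch (scikit-learn; the labels and prediction scores below are synthetic and purely illustrative, not data from any reviewed study):

```python
# Illustrative sketch: AUC, accuracy, sensitivity, and specificity
# on synthetic predictions (invented values, for demonstration only).
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])                       # 0 = control, 1 = case
y_score = np.array([0.1, 0.6, 0.35, 0.2, 0.8, 0.45, 0.55, 0.9])   # model probabilities
y_pred = (y_score >= 0.5).astype(int)                             # threshold at 0.5

auc = roc_auc_score(y_true, y_score)          # threshold-independent discrimination
acc = accuracy_score(y_true, y_pred)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                  # true-positive rate
specificity = tn / (tn + fp)                  # true-negative rate

print(auc, acc, sensitivity, specificity)     # 0.875 0.75 0.75 0.75
```

Accuracy, sensitivity, and specificity depend on the chosen decision threshold, whereas the AUC summarizes discrimination across all thresholds, which is why the reviewed studies often report both.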
By systematically mapping current evidence, our study sought to provide an integrated perspective on how AI technologies can transform dementia research and clinical care, bridging the gap between algorithmic innovation and clinical implementation. In this paper, we use the term ‘Neurocognitive Disorders (NCDs)’ in accordance with the DSM-5-TR classification, which includes conditions characterized by acquired cognitive decline with varying functional impact. This category includes both forms with primary neurodegenerative etiology (e.g., Alzheimer’s disease) and forms secondary to vascular or mixed causes. The term ‘neurodegenerative disorders’ will therefore be used exclusively when referring to conditions with primary neurodegenerative pathogenesis.

2. Materials and Methods

2.1. Search Strategy

This systematic review was carried out in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [39]. A comprehensive search strategy was implemented across major scientific databases, including PubMed/MEDLINE, Scopus, and Web of Science, using keywords including “Dementia”, “Machine learning”, “Mild cognitive impairment”, and “Artificial intelligence” to identify relevant studies published between January 2015 and January 2025. The search terms combined controlled vocabulary and free-text keywords consistent with the aim of the review, namely artificial intelligence and dementia, using Boolean operators per database as follows: (“Alzheimer’s” OR “dementia” OR “mild cognitive impairment”) AND (“machine learning” OR “deep learning” OR “artificial intelligence”). This systematic review was registered on the Open Science Framework (OSF protocol number: 10.17605/OSF.IO/UP9TX). The research question is schematically reported in Table 1. The complete search strategies for each database are reported in Supplementary Table S1.

2.2. Eligibility Criteria

To ensure a rigorous and transparent selection process, the eligibility criteria for this systematic review were predefined based on the PICO framework. The inclusion and exclusion parameters were established to identify high-quality evidence concerning the application of artificial intelligence (AI), specifically machine learning (ML) and deep learning (DL), in the field of neuropsychology. This review focuses on human studies involving patients with mild cognitive impairment (MCI) or various types of dementia. The primary objective is to evaluate the contribution of these computational interventions to traditional neuropsychological methods, specifically regarding diagnostic accuracy and the prediction of cognitive performance. To maintain methodological consistency, only original research articles published in English between 1 January 2015 and 1 January 2025 that applied AI, ML, or DL models to adult populations with MCI, SCD, or major neurocognitive disorders, reporting quantitative metrics of diagnostic or prognostic performance, were considered, excluding editorials, case reports, and animal-based studies (see Table 2 for a schematic representation). Secondary studies (narrative or systematic reviews) were not included in the performance summary or in the counting of primary evidence (for further information see Appendix A). These contributions were used exclusively for contextual purposes to describe methodological trends and cross-cutting critical issues in the field. The consistency between objectives, eligibility criteria, and summaries was verified prior to the final analysis in order to provide methodological alignment and interpretative transparency.

2.3. Study Selection and Screening

Two independent reviewers screened titles and abstracts for relevance. Full-text articles were retrieved for studies meeting inclusion criteria or where eligibility was uncertain. Disagreements between reviewers were resolved through discussion and by consultation with a third reviewer. The study selection process is illustrated in the PRISMA 2020 flow diagram in Figure 1 below.

2.4. Data Extraction

A standardized extraction form was used to collect information from each included study, comprising (see Table 3):
Author(s) and year of publication;
Aim of the study;
AI/ML/DL technique(s) applied;
Data modality (e.g., MRI, PET, EEG, speech, clinical data, wearables, and biomarkers);
Main outcomes (accuracy, sensitivity, specificity, and AUC);
Key findings.

3. Results

As mentioned above, given the methodological heterogeneity of the included studies, it was not possible to conduct a quantitative meta-analysis. A narrative synthesis structured according to three cross-cutting dimensions was therefore adopted: data modality; clinical task (diagnosis, prognosis, screening, or monitoring); and level of validation (internal, external, or longitudinal). Overall, sample sizes ranged from 42 to 300 participants in the primary studies, with a prevalence of small or moderate samples and a predominance of internal validation. In addition, a comparative synthesis based on data modality was performed with the aim of highlighting differences in diagnostic and prognostic accuracy levels.

3.1. Narrative Synthesis of Results

A narrative synthesis was performed to summarize key findings across studies, considering the diversity of AI methodologies and outcome measures. As reported above, the marked heterogeneity precluded quantitative meta-analysis. The 17 included studies were grouped into four thematic categories reflecting their main objectives (some studies contribute to more than one category; see Table 4):
  • Early Diagnosis and Detection of Cognitive Impairment (7 studies);
  • Prognostic and Predictive Modeling (3 studies);
  • Screening and Classification in Clinical or Real-World Settings (2 studies);
  • Behavioral Monitoring and Digital Biomarkers (3 studies).
Across all categories, AI, particularly ML and DL, demonstrated high diagnostic accuracy and potential for early and non-invasive detection of dementia. However, most studies were exploratory and heterogeneous in design, emphasizing the need for larger, standardized, and externally validated datasets.

3.2. Early Diagnosis and Detection of Cognitive Impairment

Seven studies focused on differentiating between Alzheimer’s disease (AD), mild cognitive impairment (MCI), healthy controls, and other neurocognitive disorders. Approaches included fNIRS-based classifications [41,44]; multimodal biomarker integration [48,49]; blood-based biomarker classifications [51]; and advanced MRI analyses, such as morphometry [51,55] and ASL perfusion imaging [56]. Across modalities, the models generally demonstrated high diagnostic performance, with AUC values commonly exceeding 0.85 and some studies reporting >0.90 [48]. However, most relied on internal validation, and only a minority incorporated independent external cohorts, limiting robustness. The predominance of single-center datasets and small to moderate sample sizes further constrains generalizability. Specifically, Kim and colleagues [44] applied a random forest algorithm to fNIRS signals in a group of 168 participants with cognitive impairment at various stages of the disease, achieving an AUC (area under the receiver operating characteristic curve, a measure of a classifier’s discriminatory power) ranging from 0.90 to 0.93. These results indicate an excellent ability to discriminate between different stages of Alzheimer’s disease; however, as they are based on internal validation, external confirmation is still needed. In another study, Kim and colleagues [41] used different machine-learning algorithms applied to fNIRS signals to distinguish between controls, MCI, and AD. Despite the relatively small sample size, the models achieved high accuracy and showed the potential of neuro-optical signals as a functional biomarker; again, external validation on larger cohorts remains necessary. Chang and colleagues [48] developed several ML models for the classification and discrimination of different levels of cognitive impairment using different biomarkers.
Interestingly, their results show high sensitivity and specificity in distinguishing AD from other cognitive states. These findings suggest that a multimodal approach improves diagnostic accuracy.
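The random-forest-plus-AUC workflow described above can be sketched as follows. This is a minimal illustration on synthetic “fNIRS-like” features, not the pipeline of Kim et al.; the channel count, class shift, and internal train/test split are assumptions made for demonstration:

```python
# Hedged sketch: random forest on synthetic class-shifted features,
# evaluated with AUC on an internal hold-out split (as most reviewed
# studies do). All data here are simulated.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, n_channels = 200, 16                         # hypothetical: 16 fNIRS channels
y = rng.integers(0, 2, size=n)                  # 0 = control, 1 = impaired (synthetic)
X = rng.normal(size=(n, n_channels)) + 0.8 * y[:, None]   # class-shifted signals

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])  # internal validation only
print(f"test AUC: {auc:.2f}")
```

Note that this kind of single-split internal validation is exactly the limitation the reviewed studies share: a high hold-out AUC does not guarantee performance on an external cohort.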

3.3. Prognostic and Predictive Modeling

Three studies targeted prediction of future progression, focusing on short-term symptom dynamics or long-term conversion from MCI to AD. Wearable-based physiological modeling [30] explored the feasibility of predicting daily neuropsychiatric symptom severity, while MRI-clinical fusion models [46] discriminated stable from progressive MCI. Microstructural imaging combined with ML models [52] identified white-matter patterns associated with cognitive deterioration. These studies represent early-stage or exploratory prognostic efforts toward early detection of symptoms associated with MCI and prediction of its progression to dementia. Nonetheless, the evidence remains limited due to small or unreported sample sizes, heterogeneous prediction horizons, and scarce replication across cohorts. Rykov and colleagues [30] carried out a study exploring the use of deep-learning models, combining autoencoders and classifiers, to predict the severity of neuropsychiatric symptoms on a daily basis using physiological data collected from wearable devices. Interestingly, their model demonstrated the ability to estimate mood changes and related symptoms, suggesting potential applications in continuous monitoring. Although these results are encouraging, the lack of detail on the sample and the risk of overfitting limit the robustness of the conclusions. In another study, Zhao and colleagues [46] used several ML algorithms to distinguish stable MCI from MCI progressing to AD. By integrating MRI features and clinical data, the proposed model was able to identify relevant predictive factors, but the lack of detail on the sample and the need for replication limit its impact. Similarly, Lu and colleagues [52] used microstructural imaging techniques and ML approaches coupled with network analysis to identify white-matter patterns associated with cognitive decline.
While showing discriminatory power, the study focused on a specific context (traditional Chinese medicine), making generalization more complex. Predictive modeling studies highlight the potential of AI to forecast disease trajectories. Longitudinal designs, particularly those integrating neuroimaging and cognitive data, achieved predictive accuracies of 75–85%. These models could support clinical decision-making by identifying high-risk individuals for early intervention trials.
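The two-stage idea behind the wearable-based model of Rykov et al. (compress high-dimensional physiological signals, then classify) can be illustrated schematically. In this sketch, PCA serves as a linear stand-in for the autoencoder’s compression stage and logistic regression as the classifier; the feature counts and “wearable” signals are synthetic assumptions, not the authors’ data or architecture:

```python
# Hedged sketch: compression (PCA, standing in for an autoencoder)
# followed by a classifier predicting high vs. low symptom severity.
# All "wearable" features are simulated.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, n_features = 300, 48          # e.g., daily heart-rate / EDA / temperature summaries
y = rng.integers(0, 2, size=n)   # 0 = low, 1 = high symptom severity (synthetic)
X = rng.normal(size=(n, n_features)) + 0.6 * y[:, None]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=1)
pca = PCA(n_components=8).fit(X_tr)               # compress to an 8-dim "code"
clf = LogisticRegression().fit(pca.transform(X_tr), y_tr)
acc = accuracy_score(y_te, clf.predict(pca.transform(X_te)))
print(f"held-out accuracy: {acc:.2f}")
```

A nonlinear autoencoder would replace the PCA step; the design choice is the same in both cases: learn a compact representation first, then let a simple classifier operate on it.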

3.4. Screening and Classification in Clinical or Real-World Settings

Two studies examined scalable screening tools. One leveraged electronic health records [45] to identify individuals at elevated risk, demonstrating moderate to high predictive performance but also susceptibility to biases inherent in administrative clinical data. A second contribution systematically evaluated mobile cognitive screening applications [50], highlighting their potential for population-level deployment while also noting the lack of standardized evaluation metrics. Overall, screening-oriented studies exhibit strong scalability potential, but the lack of uniform standards and variable data quality pose significant challenges. In particular, a review by Thabtah et al. [50] systematically assessed mobile applications developed for cognitive screening, with the aim of determining their scientific validity, usability, and potential clinical utility. The authors identified available apps through major app repositories and assessed them according to predefined criteria, including target cognitive domains, assessment methodology, psychometric grounding, data management practices, and user-centered design features. The analysis revealed that while the number of cognitive-assessment apps is rapidly increasing, most applications lack rigorous clinical validation, with few providing evidence of reliability, sensitivity, or specificity relative to established neuropsychological instruments. Many apps rely on simplified task designs that may not accurately capture complex cognitive constructs, such as executive function, working memory, or processing speed. Furthermore, data privacy and security practices are inconsistently reported, raising concerns about the management of sensitive health information. The review also highlights usability issues: although apps often claim accessibility advantages, the majority offer limited adaptivity, insufficient support for older adults, and inconsistent adherence to guidelines for digital health interventions.
Only a subset demonstrates potential for integration into clinical workflows or remote monitoring paradigms. Thabtah et al. [50] concluded that mobile cognitive-screening apps represent a promising but under-regulated domain. They recommend the establishment of standardized evaluation frameworks, closer collaboration between app developers and clinical researchers, and the implementation of robust validation studies to ensure diagnostic accuracy and ethical deployment in real-world contexts. In another study [45], the authors investigated the potential of electronic health records (EHRs) combined with machine-learning techniques to support early identification of individuals at risk for cognitive decline. Their contribution outlines the challenges of traditional cognitive screening, such as time requirements, limited access to specialists, and underdiagnosis, and proposes EHR-based predictive modeling as a scalable alternative. Yadgir et al. [45] reviewed existing EHR data elements relevant to cognitive decline, including demographics, comorbidities, medication profiles, clinical notes, laboratory values, and patterns of healthcare utilization. They emphasized the heterogeneity and sparsity of EHR data, noting that cognitive impairment is often under-reported, which complicates ground-truth labeling for supervised learning. ML approaches, such as logistic regression, random forests, gradient boosting, and deep learning, were considered for their ability to detect risk signatures prior to clinical diagnosis. The authors reported that models trained on multimodal EHR features can achieve moderate to high predictive performance, particularly when longitudinal patterns (e.g., changes in vital signs, polypharmacy trajectories, or comorbidity accumulation) are incorporated.
However, they highlighted persistent issues concerning model interpretability, bias, and generalizability across healthcare systems. The authors also discussed implementation barriers, including data standardization, missingness, privacy constraints, and the need for model calibration before deployment in clinical workflows. Overall, Yadgir and colleagues conclude that EHR-based machine-learning systems represent a promising adjunct to traditional cognitive screening but require rigorous validation, transparent reporting, and interdisciplinary collaboration to ensure safe and equitable clinical use. These studies bridge the gap between AI research and clinical application. By leveraging accessible data sources, such as mobile apps, emergency department records, or smart-home monitoring, AI-based screening tools demonstrate feasibility and sensitivity comparable to traditional cognitive assessments. They highlight AI potential in population-level screening and community-based detection.
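The EHR-based risk modeling discussed above can be sketched with gradient boosting on synthetic tabular features. The variables (age, comorbidity count, medication count, visit frequency) and the rule generating the risk label are illustrative assumptions, not the features or model of Yadgir et al.:

```python
# Hedged sketch: gradient boosting on synthetic EHR-style tabular data,
# evaluated with cross-validated AUC. The risk label is generated by an
# invented rule tied to age and comorbidity burden.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 400
age = rng.normal(75, 8, n)               # years
comorbidities = rng.poisson(3, n)        # chronic conditions on record
medications = rng.poisson(5, n)          # active prescriptions
visits = rng.poisson(4, n)               # healthcare contacts per year
X = np.column_stack([age, comorbidities, medications, visits])

# Synthetic "at-risk" label loosely tied to age and comorbidity burden
risk = 0.05 * (age - 75) + 0.6 * comorbidities + rng.normal(0, 1.0, n)
y = (risk > np.median(risk)).astype(int)

scores = cross_val_score(GradientBoostingClassifier(random_state=2),
                         X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {scores.mean():.2f}")
```

The noise term keeps the labels imperfectly predictable, mimicking the “moderate to high” (rather than perfect) performance reported for real EHR cohorts.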

3.5. Behavioral Monitoring and Digital Biomarkers

Three studies focused on ecological monitoring using wearable or smart-home systems. Physiological biosignals from wearable devices [30] offered continuous tracking of symptom fluctuations, while smart-home sensors [54] demonstrated potential for detecting early behavioral markers of MCI. A pilot study integrating smart-home sensing with a prompting application [47] showed good usability and improvements in ADL support. These technologies offer the closest approximation to real-world functioning, but current evidence is preliminary, often relying on small samples and heterogeneous sensor configurations. Lussier and colleagues [54] reviewed the emerging use of smart-home sensor technologies for the early detection and monitoring of MCI. Examining 17 studies, the authors investigated a range of ambient sensing systems, such as motion sensors, door and appliance sensors, pressure mats, passive infrared detectors, and hybrid wearable/ambient setups, used to unobtrusively capture daily activity patterns in older adults. The review highlighted that subtle behavioral and functional changes, including altered mobility patterns, reduced activity variability, irregular sleep-wake rhythms, and deviations in instrumental activities of daily living, could serve as early indicators of cognitive decline. Smart-home systems enable continuous, ecologically valid monitoring of these markers, often outperforming episodic clinical assessments in sensitivity to early changes. Lussier and colleagues identified promising analytical approaches, including time-series modeling, anomaly detection, and machine-learning classifiers, which could differentiate cognitively healthy individuals from those with MCI. However, the authors noted significant challenges: small sample sizes; lack of standardized protocols; limited longitudinal validation; and concerns about privacy, data integration, and user acceptability.
Overall, the review concluded that smart-home sensing represents a viable and innovative pathway for early MCI detection but emphasized the need for larger multi-site longitudinal studies, harmonized sensor frameworks, and rigorous evaluation of clinical relevance before the technology can be integrated into routine care. In line with these findings, Schmitter-Edgecombe and colleagues [47] examined the use of smart-home prompting systems designed to support individuals with cognitive impairments in performing activities of daily living (ADL). Their contribution showed that context-aware sensor networks could detect task progress, identify errors or delays, and deliver timely, adaptive prompts that facilitate task completion. Findings indicate that such systems improved task accuracy, independence, and safety, though challenges remain regarding prompt personalization, reliability in complex home environments, and long-term user acceptance. AI applied to behavioral and physiological data provides a promising pathway for non-invasive, continuous disease monitoring. For example, deep-learning models [30] were described as able to capture fluctuations in neuropsychiatric symptoms, while adaptive AI interventions improved adherence and daily functioning in MCI patients, illustrating a transition from passive detection to active digital therapeutics.
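The anomaly-detection approach that Lussier et al. describe can be illustrated with a small sketch: an isolation forest learns a resident’s baseline daily activity profile and flags days that deviate from it. The daily features (movement events, sleep hours, kitchen visits) and all values are synthetic placeholders, not data from any reviewed study:

```python
# Hedged sketch: unsupervised anomaly detection over synthetic daily
# smart-home activity summaries. -1 marks a day flagged as anomalous.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
# 90 baseline days: [movement events, sleep hours, kitchen visits]
baseline = np.column_stack([rng.normal(120, 10, 90),
                            rng.normal(7.5, 0.5, 90),
                            rng.normal(6, 1, 90)])
# 3 synthetic anomalous days: sharply reduced activity, disrupted sleep
anomalous = np.array([[60, 4.0, 1],
                      [55, 3.5, 2],
                      [70, 4.5, 1]])

det = IsolationForest(random_state=3).fit(baseline)
labels = det.predict(anomalous)        # -1 = anomaly, +1 = normal
print(labels)
```

In a real deployment, isolated flagged days would mean little on their own; it is sustained drift in the flagged pattern over weeks or months that the reviewed studies treat as a candidate behavioral marker of decline.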

3.6. Data Modality Comparison of Results

A comparative synthesis was performed based on the data modality with the aim of juxtaposing the different accuracy levels (see Figure 2 and Table 5). Particularly, we carried out:
  • A structural comparison of MRI vs. DTI vs. ASL;
  • A blood biomarker comparison;
  • A wearable device comparison;
  • A synthesized multimodal comparison.
The effectiveness of machine learning in the diagnosis of neurocognitive disorders does not depend on the chosen algorithm alone but emerges from the nature and combination of the data-collection methodologies used. From the literature collected, a gradual increase in accuracy can be observed, ranging from conventional morphological-structural examination to multimodal assessment based on biochemical indicators and data acquired through wearable technologies, which allow continuous monitoring in everyday life contexts. The need for integration is confirmed in a study by Zhao et al. [46], which focused on predicting the conversion from MCI to Alzheimer’s disease over a two-year follow-up. The researchers highlight how diagnostic accuracy increases significantly when neuroimaging (MRI and PET) is combined with clinical data: their comprehensive multimodal system achieves an accuracy of 93.95% in initial diagnosis and 87.08% in predicting disease progression. A first data source consists of structural neuroimaging techniques, where instrumental innovation has made it possible to move from simple volumetric observation to sophisticated morphometric analyses. Although the study of connectivity using DTI (diffusion tensor imaging) [52] allows for an investigation of microstructural alterations in white matter, with an accuracy of 68%, it is with surface-based morphometry (SBM) that the most solid results have been obtained. As demonstrated by Lee et al. [55], high-resolution MRI analysis of cortical thickness achieves a specificity of 93.3% and a sensitivity of 87.1%, identifying atrophy of the medial temporal cortex and lateral temporal regions as the most reliable structural marker to distinguish healthy individuals from those with pathological conditions.
However, anatomical data alone have inherent limitations, especially in the early stages of the disease (MCI). In this context, functional methods offer a complementary, dynamic perspective. Cerebral perfusion, measured using a 3D pseudo-continuous ASL (pCASL) technique [56], proved to be a valuable tool for confirming overt dementia, achieving an accuracy of 89% in distinguishing Alzheimer’s patients from subjects with subjective cognitive decline; however, it loses accuracy drastically in screening for mild cognitive impairment (57.5%). In contrast, a particularly promising approach emerges from the use of fNIRS combined with olfactory stimulation [41]. Although this method has lower spatial resolution than MRI, it successfully captures the brain’s metabolic response to a sensory stimulus, achieving performance comparable to PET at significantly lower cost and invasiveness. The analytical comparison then moves on to even less invasive methods, such as blood biomarkers and neuropsychological tests. While blood data [51] offer high discriminative accuracy (AUC: 0.84) and are fundamental for biomedical differential diagnosis, it is the optimization of clinical data that provides the best results. As demonstrated by Wang et al. [43], applying machine learning to sensitive tests such as Delayed Recall allows neurocognitive disorders to be identified even in patients with apparently normal MMSE scores. A further innovative frontier in the diagnostic framework is the integration of digital biomarkers, which allow continuous, non-invasive monitoring of patients in their everyday environment. In this context, wearable devices introduce a longitudinal dimension of analysis. A study by Rykov et al. 
[30] highlights how data collected through sensors, such as heart rate, skin temperature, and electrodermal activity, allow highly accurate prediction of the severity of neuropsychiatric symptoms and mood disorders in patients with MCI. The effectiveness and sensitivity of these tools depend on their ability to provide a dynamic picture of the patient’s health. The integration of these behavioral and psychological data, supported by the effectiveness of deep learning [45], proposes a redefinition of classic diagnostic models: the diagnosis of neurocognitive disorders no longer relies exclusively on the visualization of structural damage but rather on an integrated monitoring process. These tools do not replace neuroimaging but complement it, acting as early-detection systems and providing an ecological assessment that reflects the actual impact of the pathology on the subject’s daily life. A study by El-Sappagh et al. [49] is emblematic of this multimodal approach: while individual techniques (MRI or PET) struggle to exceed certain critical accuracy thresholds, their combination with clinical data yields diagnostic precision exceeding 93%. The choice of data modality should therefore not be framed as a competition between technologies but as an integration: neuroimaging identifies the damage, functional methods (fNIRS/ASL) measure its impact, and biomarkers confirm its severity, offering a diagnostic picture that becomes more precise the more multidisciplinary it is.

4. Discussion

In recent years, research on the use of artificial intelligence (AI) and machine learning (ML) for the recognition and prediction of cognitive decline has undergone rapid and multidirectional development. Scientific contributions published between 2015 and 2025 show that the application of advanced computational techniques is progressively expanding to a wide range of biometric and clinical data, including structural and functional neuroimaging, physiological signals from wearable devices, blood-based biomarkers, voice data, administrative records, and environmental sensor data. Overall, the emerging landscape highlights the growing potential of AI technologies as support tools for early diagnosis, prognostic stratification, and continuous monitoring of individuals with suspected or established cognitive impairment. Evidence from experimental studies [49] using neuroimaging modalities (MRI, ASL, or PET) and optical techniques such as fNIRS demonstrates high accuracy in classifying different cognitive states [44], particularly when supervised algorithms such as random forest [51] or pattern recognition methods are employed. Additionally, the multimodal integration of clinical, imaging, and biomarker data, especially effective in large, standardized datasets, such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [57], appears to enhance model robustness and enable clinical interpretability [49]. Emerging domains such as voice analysis [42], home-based sensor monitoring [54], wearable devices [30], and the processing of administrative data [47] also show promising applications, particularly for early detection and continuous assessments in ecological settings. However, despite these results, the literature highlights several recurrent challenges. 
These include the limited sample size of many studies, the lack of external validation, methodological heterogeneity in analytical pipelines, and the high risk of overfitting [30], especially in complex models such as deep learning. Furthermore, the absence of shared standards for data collection, preprocessing, and performance evaluation hinders cross-study comparability and slows clinical translation. Thus, the available research suggests that AI holds substantial potential to improve the screening [50], diagnosis, and monitoring of cognitive decline [54], but further efforts toward standardization, multicenter validation, and real-world assessments are needed. From a clinical point of view, these approaches hold promise for supporting differential diagnosis and improving early detection [37,43], particularly in individuals with MCI [41] or SCD [5]. Prognostic models [30,46,52] further extend this potential by enabling risk stratification and predicting disease progression, including the conversion from MCI to dementia, which may inform personalized monitoring strategies and optimize patient selection for clinical trials [44]. Screening-oriented applications, including AI-based analyses of electronic health records and digital cognitive assessments [45,50], offer scalability and accessibility advantages, particularly in primary care and resource-limited settings. Furthermore, continuous monitoring systems leveraging wearable devices and smart-home technologies introduce unprecedented ecological validity by capturing real-world behavioral and functional changes over time. Emerging assistive and intervention-oriented AI systems further indicate a shift from purely diagnostic tools toward technologies that may support daily functioning and quality of life [54]. 
Despite these strengths, the evidence base is characterized by significant heterogeneity in data sources, preprocessing pipelines, feature extraction strategies, and outcome definitions, limiting comparability across studies. To appraise the methodological quality of the included studies, we used the Mixed Methods Appraisal Tool (MMAT) [58]. The MMAT was developed to appraise the quality of empirical studies [59,60] and acts as a magnifier in detecting methodologically critical elements of the studies described in this review. It allows the methodological quality of five categories of studies to be assessed: qualitative research, randomized controlled trials, non-randomized studies, quantitative descriptive studies, and mixed-methods studies [58]. For each included study, we selected the appropriate study-design category and assigned a rating of “Yes”, “No”, or “Can’t tell” to each criterion. We then rated methodological rigor as “HIGH” when “Yes” responses accounted for at least 80% of the criteria (for a schematic representation, see Figure 3).
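The 80% decision rule used in this appraisal can be expressed compactly. The sketch below is illustrative only: the review defines the “HIGH” threshold, while the label for studies falling below it (here simply "LOWER") is an assumption, not part of the MMAT itself:

```python
def mmat_rigor(ratings):
    """Classify methodological rigor from MMAT criterion ratings.

    `ratings` is a list of "Yes" / "No" / "Can't tell" answers, one per
    MMAT criterion of the selected study-design category. Following the
    rule adopted in this review, rigor is "HIGH" when "Yes" answers make
    up at least 80% of the ratings; the "LOWER" label is an assumption.
    """
    if not ratings:
        raise ValueError("at least one criterion rating is required")
    yes_share = ratings.count("Yes") / len(ratings)
    return "HIGH" if yes_share >= 0.80 else "LOWER"

# Five criteria, four answered "Yes" -> 80% -> HIGH
print(mmat_rigor(["Yes", "Yes", "Yes", "Yes", "Can't tell"]))  # HIGH
```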
Importantly, this tool not only reflects the level of risk of bias but also provides a critical assessment of the methodological quality of the included studies. Additional sources of bias arise from modality-specific constraints, such as variability in imaging acquisition protocols, linguistic and cultural bias in speech-based models, and contextual dependency in wearable and sensor-derived data. Ethical and practical concerns related to data privacy, algorithmic transparency, and fairness further complicate real-world implementation. Consequently, while AI-based approaches show clear potential to augment clinical decision-making, their current role should be framed as supportive rather than substitutive, complementing clinical expertise and established diagnostic workflows. Beyond improving diagnostic accuracy, the adoption of DL and AI-based systems has significant implications for both clinical practice and research. In the clinical setting, AI-based models enable the integration of complex, high-dimensional data derived from neuropsychological testing, neuroimaging, language analysis, and digital cognitive assessments. This integrative capability supports the shift from interpretations based on single or domain-specific tests to multidimensional cognitive profiles, potentially improving the early identification of the subtle cognitive changes characteristic of the preclinical and prodromal stages of dementia [61]. From a research perspective, these findings underscore the need for larger, multicenter, longitudinal datasets, standardized evaluation metrics, rigorous external validation, and the systematic integration of explainable AI techniques to enhance interpretability and clinician trust. 
By providing an integrated, cross-modal synthesis of diagnostic, prognostic, screening, and monitoring applications, this review seeks to clarify the current state of the field and to delineate the critical methodological and translational steps required to move AI-driven tools from experimental settings into routine clinical practice. Since this review aims to monitor the clinical translation status of AI applications in the neuropsychological field, the target indicators have been synthesized using the technology readiness level (TRL) paradigm [62], which assesses the maturity of a technology’s development. The TRL scale comprises nine levels, ascending from the observation of basic physical principles to a complete system proven in an operational environment, as follows:
  • Basic principles observed;
  • Technology concept formulated;
  • Experimental proof of concept;
  • Laboratory validation of the concept;
  • Technology validation in the relevant environment;
  • Demonstration in the relevant environment;
  • Demonstration in the operating environment;
  • Complete and qualified system;
  • Actual system proven in the operational environment.
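The nine-level scale can be encoded as a simple lookup table, which is how TRL assignments are typically operationalized in technology-monitoring exercises. This is an illustrative sketch; the level wording paraphrases the standard TRL definitions rather than quoting reference [62]:

```python
# Paraphrased TRL descriptions (levels 1-9), keyed by level number
TRL_LEVELS = {
    1: "Basic principles observed",
    2: "Technology concept formulated",
    3: "Experimental proof of concept",
    4: "Technology validated in the laboratory",
    5: "Technology validated in the relevant environment",
    6: "Demonstration in the relevant environment",
    7: "Demonstration in the operating environment",
    8: "Complete and qualified system",
    9: "Actual system proven in the operational environment",
}

def describe_trl(level):
    """Return the paraphrased description for a TRL level (1-9)."""
    if level not in TRL_LEVELS:
        raise ValueError("TRL levels range from 1 to 9")
    return TRL_LEVELS[level]

# Most AI/ML applications reviewed here sit around TRL 4-5
print(describe_trl(4))  # Technology validated in the laboratory
```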
Concerning AI/ML/DL applications, the present work estimates that the field is at a stage comparable to advanced preclinical validation (TRL 4) [52,55,56], with some applications reaching initial clinical validation (TRL 5) [30,49]. The achievement of higher TRL levels is hindered by the heterogeneity of the data currently available in the literature. A persistent methodological limitation is the predominance of internal validation procedures over multi-site external validation [45]. All things considered, AI/ML/DL applications in the early detection, diagnosis, and monitoring of different types of NCDs are in the pre-translational phase. To advance clinical translation, it is essential to conduct prospective multi-site studies, utilize harmonized datasets, ensure transparent reporting of analytical pipelines, perform calibration and fairness assessments, and verify findings in independent cohorts. Based on current evidence, artificial intelligence should be regarded as an augmentative tool to support clinicians rather than as a replacement for standard diagnostic pathways.

5. Conclusions

This systematic review provides a comprehensive synthesis of evidence published between 2015 and 2025 on the application of artificial intelligence (AI) to the detection, diagnosis, and monitoring of neurocognitive disorders. Across heterogeneous methodological approaches and data modalities, the reviewed studies consistently demonstrate that AI, particularly deep-learning architectures, can achieve higher diagnostic accuracy than traditional statistical methods in distinguishing MCI, Alzheimer’s disease, and healthy aging. Beyond diagnostic classification, a notable shift emerges toward ecologically valid and longitudinal assessment paradigms. The integration of wearable devices, smart-home technologies, and other digital biomarkers enables continuous monitoring of real-world behavior and symptom dynamics, offering novel opportunities for early detection and personalized disease tracking. Predictive models further suggest that AI-based approaches may support prognostic stratification, including the prediction of conversion from MCI to dementia, thereby informing early intervention strategies and clinical trial enrichment. Despite these promising advances, the translation of AI-driven tools into routine clinical practice remains constrained by substantial methodological and translational barriers; as discussed above, the marked heterogeneity of the evidence precluded quantitative analysis. These limitations raise concerns regarding generalizability, reproducibility, and the risk of overfitting, particularly in complex deep-learning models. Equally critical is the systematic incorporation of explainable AI methodologies to enhance model transparency, interpretability, and clinician trust. Within this framework, AI should be conceptualized as an augmentative tool designed to complement, rather than replace, clinical expertise and established neuropsychological assessment pathways. This review seeks to bridge the gap between algorithmic innovation and clinical practice. 
Through the systematic and cross-sectional integration of different data modalities (neuroimaging, biomarkers, EEG, language, wearables, and smart homes), it seeks to offer a comparative synthesis that goes beyond previous reviews focused on a single technology or application domain, such as those by Yang et al. [42] or Graham et al. [53]. Unlike these more limited contributions, this work adopts a clinical–translational perspective, jointly evaluating diagnostic performance, level of validation, methodological weaknesses, and barriers to real-world implementation, and integrates imaging, biomarkers, wearables, and EHRs within a unified validation framework, thus outlining an updated and operational roadmap for clinical translation.
In conclusion, while AI-based approaches hold substantial potential to transform dementia research and care, their clinical impact will depend on methodological rigor, ethical governance, and meaningful integration into real-world healthcare systems. By clarifying both the opportunities and the limitations of current evidence, this review aims to inform future research directions and support the responsible translation of AI technologies into neuropsychological practice.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/technologies14030183/s1.

Funding

This research received no external funding.

Acknowledgments

During the preparation of the manuscript, the authors used OpenAI’s ChatGPT-5.3 to improve the readability and clarity of the abstract, enhance fluency, and review the English of the manuscript.

Conflicts of Interest

Authors Pasqualina Perna, Alessandra Claudi and Raffaele Nappo were employed by the company Neapolisanit S.R.L. Rehabilitation Center. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Secondary Studies

Five reviews provided broader context across subdomains of AI for cognitive impairment. These include deep learning applied to voice features [42], EEG-based early detection [43], mobile health screening [50], and conceptual analyses of AI applications in neurodegeneration [53,63]. Across these reviews, recurring challenges were noted: methodological heterogeneity, limited replicability, absence of shared benchmarks, and uncertainty regarding real-world translation. More specifically, Graham and colleagues [53] provided an overview of the main approaches and applications of artificial intelligence in cognitive decline research, highlighting opportunities, methodological challenges, and future prospects. Their review did not include primary data but serves as a useful introduction to and summary of the field. The authors outlined the potential of machine-learning and data-driven models to integrate multimodal information, ranging from neuroimaging and digital biomarkers to behavioral and ecological data, to identify early risk signatures and track disease trajectories. They also highlighted key challenges, including data quality, model transparency, ethical constraints, and integration into clinical workflows, emphasizing the need for interdisciplinary frameworks to ensure responsible and clinically meaningful AI deployment in cognitive health. In another review, Wang and colleagues [43] examined the progress made in combining electroencephalography (EEG) with deep-learning (DL) methods for the early diagnosis of neurocognitive disorders. The authors pointed out that DL models, particularly CNNs, RNNs, and hybrid architectures, are capable of automatically extracting the spatiotemporal EEG features associated with early pathological changes, often outperforming traditional signal-processing approaches. 
They reported promising classification performance in tasks such as MCI detection but highlighted remaining challenges related to data heterogeneity, limited dataset sizes, model interpretability, and clinical generalizability. Furthermore, Yang et al. [42] reviewed the use of deep-learning models to analyze speech and voice signals as non-invasive biomarkers for cognitive impairment detection. Specifically, this review summarized 52 studies that applied deep-learning techniques to voice analysis with the aim of identifying or predicting cognitive decline. The authors showed that DL architectures, particularly CNNs, RNNs, and transformer-based models, could capture subtle acoustic, prosodic, and linguistic alterations associated with early cognitive decline, often outperforming handcrafted features. They also noted persistent limitations, including small and heterogeneous datasets, variability in recording conditions, limited interpretability, and challenges in clinical translation. These reviews described emerging AI-based approaches (voice analysis, EEG, and mobile health) and consistently highlighted recurrent challenges: methodological heterogeneity, lack of standardized datasets, and concerns about replicability.

References

  1. Bayen, E.; Possin, K.L.; Chen, Y.; Cleret De Langavant, L.; Yaffe, K. Prevalence of Aging, Dementia, and Multimorbidity in Older Adults with Down Syndrome. JAMA Neurol. 2018, 75, 1399–1406. [Google Scholar] [CrossRef]
  2. De Langavant, L.C.; Bayen, E.; Yaffe, K. Unsupervised Machine Learning to Identify High Likelihood of Dementia in Population-Based Surveys: Development and Validation Study. J. Med. Internet Res. 2018, 20, e10493. [Google Scholar] [CrossRef] [PubMed]
  3. Nichols, E.; Szoeke, C.E.I.; Vollset, S.E.; Abbasi, N.; Abd-Allah, F.; Abdela, J.; Aichour, M.T.E.; Akinyemi, R.O.; Alahdab, F.; Asgedom, S.W.; et al. Global, Regional, and National Burden of Alzheimer’s Disease and Other Dementias, 1990–2016: A Systematic Analysis for the Global Burden of Disease Study 2016. Lancet Neurol. 2019, 18, 88–106. [Google Scholar] [CrossRef] [PubMed]
  4. WHO. A Blueprint for Dementia Research; WHO: Geneva, Switzerland, 2022; p. 72. [Google Scholar]
  5. Glodzik-Sobanska, L.; Reisberg, B.; De Santi, S.; Babb, J.S.; Pirraglia, E.; Rich, K.E.; Brys, M.; De Leon, M.J. Subjective Memory Complaints: Presence, Severity and Future Outcome in Normal Older Subjects. Dement. Geriatr. Cogn. Disord. 2007, 24, 177–184. [Google Scholar] [CrossRef] [PubMed]
  6. Jessen, F.; Amariglio, R.E.; Van Boxtel, M.; Breteler, M.; Ceccaldi, M.; Chételat, G.; Dubois, B.; Dufouil, C.; Ellis, K.A.; Van Der Flier, W.M.; et al. A Conceptual Framework for Research on Subjective Cognitive Decline in Preclinical Alzheimer’s Disease. Alzheimer’s Dement. 2014, 10, 844–852. [Google Scholar] [CrossRef]
  7. Petersen, R.C.; Smith, G.E.; Waring, S.C.; Ivnik, R.J.; Tangalos, E.G.; Kokmen, E. Mild Cognitive Impairment: Clinical Characterization and Outcome. Arch. Neurol. 1999, 56, 303–308. [Google Scholar] [CrossRef] [PubMed]
  8. Petersen, R.C.; Caracciolo, B.; Brayne, C.; Gauthier, S.; Jelic, V.; Fratiglioni, L. Mild Cognitive Impairment: A Concept in Evolution. J. Intern. Med. 2014, 275, 214–228. [Google Scholar] [CrossRef] [PubMed]
  9. Reisberg, B.; Ferris, S.H.; de Leon, M.J.; Franssen, E.S.E.; Kluger, A.; Mir, P.; Borenstein, J.; George, A.E.; Shulman, E.; Steinberg, G.; et al. Stage-Specific Behavioral, Cognitive, and In Vivo Changes in Community Residing Subjects with Age-Associated Memory Impairment and Primary Degenerative Dementia of the Alzheimer Type. Drug Dev. Res. 1988, 15, 101–114. [Google Scholar] [CrossRef]
  10. Ribaldi, F.; Palomo, R.; Altomare, D.; Scheffler, M.; Assal, F.; Ashton, N.J.; Zetterberg, H.; Blennow, K.; Abramowicz, M.; Garibotto, V.; et al. The Taxonomy of Subjective Cognitive Decline: Proposal and First Clinical Evidence from the Geneva Memory Clinic Cohort. Neurodegener. Dis. 2024, 24, 16–25. [Google Scholar] [CrossRef] [PubMed]
  11. Röhr, S.; Pabst, A.; Riedel-Heller, S.G.; Jessen, F.; Turana, Y.; Handajani, Y.S.; Brayne, C.; Matthews, F.E.; Stephan, B.C.M.; Lipton, R.B.; et al. Estimating Prevalence of Subjective Cognitive Decline in and across International Cohort Studies of Aging: A COSMIC Study. Alzheimers Res. Ther. 2020, 12, 167. [Google Scholar] [CrossRef] [PubMed]
  12. Bessi, V.; Mazzeo, S.; Padiglioni, S.; Piccini, C.; Nacmias, B.; Sorbi, S.; Bracco, L. From Subjective Cognitive Decline to Alzheimer’s Disease: The Predictive Role of Neuropsychological Assessment, Personality Traits, and Cognitive Reserve. A 7-Year Follow-Up Study. J. Alzheimer’s Dis. 2018, 63, 1523–1535. [Google Scholar] [CrossRef]
  13. Petersen, R.C. Mild Cognitive Impairment as a Diagnostic Entity. J. Intern. Med. 2004, 256, 183–194. [Google Scholar] [CrossRef] [PubMed]
  14. Petersen, R.C.; Negash, S. Mild Cognitive Impairment: An Overview. CNS Spectr. 2008, 13, 45–53. [Google Scholar] [CrossRef] [PubMed]
  15. 2022 Alzheimer’s Disease Facts and Figures. Alzheimer’s Dement. 2022, 18, 700–789. [CrossRef]
  16. Alawode, D.O.T.; Heslegrave, A.J.; Ashton, N.J.; Karikari, T.K.; Simrén, J.; Montoliu-Gaya, L.; Pannee, J.; O’Connor, A.; Weston, P.S.J.; Lantero-Rodriguez, J.; et al. Transitioning from Cerebrospinal Fluid to Blood Tests to Facilitate Diagnosis and Disease Monitoring in Alzheimer’s Disease. J. Intern. Med. 2021, 290, 583–601. [Google Scholar] [CrossRef] [PubMed]
  17. Blennow, K. A Review of Fluid Biomarkers for Alzheimer’s Disease: Moving from CSF to Blood. Neurol. Ther. 2017, 6, 15–24. [Google Scholar] [CrossRef] [PubMed]
  18. Sabbagh, M.N.; Lue, L.F.; Fayard, D.; Shi, J. Increasing Precision of Clinical Diagnosis of Alzheimer’s Disease Using a Combined Algorithm Incorporating Clinical and Novel Biomarker Data. Neurol. Ther. 2017, 6, 83–95. [Google Scholar] [CrossRef] [PubMed]
  19. Tsolaki, M. Clinical Workout for the Early Detection of Cognitive Decline and Dementia. Eur. J. Clin. Nutr. 2014, 68, 1186–1191. [Google Scholar] [CrossRef]
  20. Gautam, G.; Singh, H. Biomarkers in Dementia Research. In Nutrition in Brain Aging and Dementia; Springer: Singapore, 2024; pp. 93–107. [Google Scholar] [CrossRef]
  21. Ostrosky-Solis, F.; Lopez-Arango, G.; Ardila, A. Sensitivity and Specificity of the Mini-Mental State Examination in a Spanish-Speaking Population. Appl. Neuropsychol. 2000, 7, 25–31. [Google Scholar] [CrossRef] [PubMed]
  22. Folstein, M.F.; Robins, L.N.; Helzer, J.E. The Mini-Mental State Examination. Arch. Gen. Psychiatry 1983, 40, 812. [Google Scholar] [CrossRef] [PubMed]
  23. Jones, R.N.; Gallo, J.J. Education Bias in the Mini-Mental State Examination. Int. Psychogeriatr. 2001, 13, 299–310. [Google Scholar] [CrossRef]
  24. Scazufca, M.; Almeida, O.P.; Vallada, H.P.; Tasse, W.A.; Menezes, P.R. Limitations of the Mini-Mental State Examination for Screening Dementia in a Community with Low Socioeconomic Status: Results from the Sao Paulo Ageing & Health Study. Eur. Arch. Psychiatry Clin. Neurosci. 2009, 259, 8–15. [Google Scholar] [CrossRef]
  25. Jiménez-Huete, A.; Villino-Rodríguez, R.; Ríos-Rivera, M.M.; Rognoni, T.; Montoya-Murillo, G.; Arrondo, C.; Zapata, C.; Rodríguez-Oroz, M.C.; Riverol, M. Clusters of Cognitive Performance Predict Long-Term Cognitive Impairment in Elderly Patients with Subjective Memory Complaints and Healthy Controls. Alzheimer’s Dement. 2024, 20, 4702–4716. [Google Scholar] [CrossRef]
  26. Slegers, A.; Chafouleas, G.; Montembeault, M.; Bedetti, C.; Welch, A.E.; Rabinovici, G.D.; Langlais, P.; Gorno-Tempini, M.L.; Brambati, S.M. Connected Speech Markers of Amyloid Burden in Primary Progressive Aphasia. Cortex 2021, 145, 160–168. [Google Scholar] [CrossRef] [PubMed]
  27. Szatloczki, G.; Hoffmann, I.; Vincze, V.; Kalman, J.; Pakaski, M. Speaking in Alzheimer’s Disease, Is That an Early Sign? Importance of Changes in Language Abilities in Alzheimer’s Disease. Front. Aging Neurosci. 2015, 7, 195. [Google Scholar] [CrossRef]
  28. Kim, J.; Jang, H.; Park, Y.H.; Youn, J.; Seo, S.W.; Kim, H.J.; Na, D.L. Motor Symptoms in Early- versus Late-Onset Alzheimer’s Disease. J. Alzheimers Dis. 2023, 91, 345–354. [Google Scholar] [CrossRef] [PubMed]
  29. Tangen, G.G.; Nilsson, M.H.; Stomrud, E.; Palmqvist, S.; Hansson, O. Spatial Navigation and Its Association With Biomarkers and Future Dementia in Memory Clinic Patients Without Dementia. Neurology 2022, 99, e2081. [Google Scholar] [CrossRef] [PubMed]
  30. Rykov, Y.G.; Patterson, M.D.; Gangwar, B.A.; Jabar, S.B.; Leonardo, J.; Ng, K.P.; Kandiah, N. Predicting Cognitive Scores from Wearable-Based Digital Physiological Features Using Machine Learning: Data from a Clinical Trial in Mild Cognitive Impairment. BMC Med. 2024, 22, 36. [Google Scholar] [CrossRef]
  31. Sakal, C.; Li, T.; Li, J.; Yang, C.; Li, X. Association Between Sleep Efficiency Variability and Cognition Among Older Adults: Cross-Sectional Accelerometer Study. JMIR Aging 2024, 7, e54353. [Google Scholar] [CrossRef] [PubMed]
  32. Wang, T.; Hong, Y.; Wang, Q.; Su, R.; Ng, M.L.; Xu, J.; Wang, L.; Yan, N. Identification of Mild Cognitive Impairment Among Chinese Based on Multiple Spoken Tasks. J. Alzheimer’s Dis. 2021, 82, 185–204. [Google Scholar] [CrossRef]
  33. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Pearson Education: Noida, India, 2010. [Google Scholar]
  34. Astell, A.J.; Bouranis, N.; Hoey, J.; Lindauer, A.; Mihailidis, A.; Nugent, C.; Robillard, J.M. Technology and Dementia: The Future Is Now. Dement. Geriatr. Cogn. Disord. 2019, 47, 131–139. [Google Scholar] [CrossRef] [PubMed]
  35. Chudzik, A.; Śledzianowski, A.; Przybyszewski, A.W. Machine Learning and Digital Biomarkers Can Detect Early Stages of Neurodegenerative Diseases. Sensors 2024, 24, 1572. [Google Scholar] [CrossRef]
  36. Kang, M.J.; Kim, S.Y.; Na, D.L.; Kim, B.C.; Yang, D.W.; Kim, E.J.; Na, H.R.; Han, H.J.; Lee, J.H.; Kim, J.H.; et al. Prediction of Cognitive Impairment via Deep Learning Trained with Multi-Center Neuropsychological Test Data. BMC Med. Inform. Decis. Mak. 2019, 19, 231. [Google Scholar] [CrossRef]
  37. Qiu, S.; Miller, M.I.; Joshi, P.S.; Lee, J.C.; Xue, C.; Ni, Y.; Wang, Y.; De Anda-Duran, I.; Hwang, P.H.; Cramer, J.A.; et al. Multimodal Deep Learning for Alzheimer’s Disease Dementia Assessment. Nat. Commun. 2022, 13, 3404. [Google Scholar] [CrossRef] [PubMed]
  38. Velmurugan, S.; Waheeda, S.; Kulanthaivel, L.; Subbaraj, G.K. Applications of Machine Learning and Multimodal Integration for the Early Diagnosis of Neurodegenerative Diseases (Review). World Acad. Sci. J. 2025, 7, 115. [Google Scholar] [CrossRef]
  39. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, 71. [Google Scholar] [CrossRef]
  40. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; Antes, G.; Atkins, D.; Barbour, V.; Barrowman, N.; Berlin, J.A.; Clark, J.; et al. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med. 2009, 6, e1000097. [Google Scholar] [CrossRef]
  41. Kim, J.; Lee, H.; Lee, J.; Rhee, S.Y.; Shin, J.I.; Lee, S.W.; Cho, W.; Min, C.; Kwon, R.; Kim, J.G.; et al. Quantification of Identifying Cognitive Impairment Using Olfactory-Stimulated Functional near-Infrared Spectroscopy with Machine Learning: A Post Hoc Analysis of a Diagnostic Trial and Validation of an External Additional Trial. Alzheimers Res. Ther. 2023, 15, 127. [Google Scholar] [CrossRef] [PubMed]
  42. Yang, Q.; Li, X.; Ding, X.; Xu, F.; Ling, Z. Deep Learning-Based Speech Analysis for Alzheimer’s Disease Detection: A Literature Review. Alzheimers Res. Ther. 2022, 14, 186. [Google Scholar] [CrossRef] [PubMed]
  43. Wang, J.; Wang, Z.; Liu, N.; Liu, C.; Mao, C.; Dong, L.; Li, J.; Huang, X.; Lei, D.; Chu, S.; et al. Random Forest Model in the Diagnosis of Dementia Patients with Normal Mini-Mental State Examination Scores. J. Pers. Med. 2022, 12, 37. [Google Scholar] [CrossRef] [PubMed]
  44. Kim, J.; Kim, S.C.; Kang, D.; Yon, D.K.; Kim, J.G. Classification of Alzheimer’s Disease Stage Using Machine Learning for Left and Right Oxygenation Difference Signals in the Prefrontal Cortex: A Patient-Level, Single-Group, Diagnostic Interventional Trial. Eur. Rev. Med. Pharmacol. Sci. 2022, 26, 7734–7741. [Google Scholar] [CrossRef] [PubMed]
  45. Yadgir, S.R.; Engstrom, C.; Jacobsohn, G.C.; Green, R.K.; Jones, C.M.C.; Cushman, J.T.; Caprio, T.V.; Kind, A.J.H.; Lohmeier, M.; Shah, M.N.; et al. Machine Learning-Assisted Screening for Cognitive Impairment in the Emergency Department. J. Am. Geriatr. Soc. 2022, 70, 831–837. [Google Scholar] [CrossRef] [PubMed]
  46. Zhao, X.; Sui, H.; Yan, C.; Zhang, M.; Song, H.; Liu, X.; Yang, J. Machine-Based Learning Shifting to Prediction Model of Deteriorative Due to Alzheimer’s Disease—A Two-Year Follow-Up Investigation. Curr. Alzheimer Res. 2022, 19, 708–715. [Google Scholar] [CrossRef]
  47. Schmitter-Edgecombe, M.; Brown, K.; Luna, C.; Chilton, R.; Sumida, C.A.; Holder, L.; Cook, D. Partnering a Compensatory Application with Activity-Aware Prompting to Improve Use in Individuals with Amnestic Mild Cognitive Impairment: A Randomized Controlled Pilot Clinical Trial. J. Alzheimer’s Dis. 2022, 85, 73–90. [Google Scholar] [CrossRef]
  48. Chang, C.H.; Lin, C.H.; Lane, H.Y. Machine Learning and Novel Biomarkers for the Diagnosis of Alzheimer’s Disease. Int. J. Mol. Sci. 2021, 22, 2761. [Google Scholar] [CrossRef]
  49. El-Sappagh, S.; Alonso, J.M.; Islam, S.M.R.; Sultan, A.M.; Kwak, K.S. A Multilayer Multimodal Detection and Prediction Model Based on Explainable Artificial Intelligence for Alzheimer’s Disease. Sci. Rep. 2021, 11, 2660. [Google Scholar] [CrossRef] [PubMed]
  50. Thabtah, F.; Peebles, D.; Retzler, J.; Hathurusingha, C. Dementia Medical Screening Using Mobile Applications: A Systematic Review with a New Mapping Model. J. Biomed. Inform. 2020, 111, 103573. [Google Scholar] [CrossRef]
  51. Lin, C.H.; Chiu, S.I.; Chen, T.F.; Jang, J.S.R.; Chiu, M.J. Classifications of Neurodegenerative Disorders Using a Multiplex Blood Biomarkers-Based Machine Learning Model. Int. J. Mol. Sci. 2020, 21, 6914. [Google Scholar] [CrossRef] [PubMed]
  52. Lu, H.; Zhang, J.; Liang, Y.; Qiao, Y.; Yang, C.; He, X.; Wang, W.; Zhao, S.; Wei, D.; Li, H.; et al. Network Topology and Machine Learning Analyses Reveal Microstructural White Matter Changes Underlying Chinese Medicine Dengzhan Shengmai Treatment on Patients with Vascular Cognitive Impairment. Pharmacol. Res. 2020, 156, 104773. [Google Scholar] [CrossRef]
  53. Graham, S.A.; Lee, E.E.; Jeste, D.V.; Van Patten, R.; Twamley, E.W.; Nebeker, C.; Yamada, Y.; Kim, H.C.; Depp, C.A. Artificial Intelligence Approaches to Predicting and Detecting Cognitive Decline in Older Adults: A Conceptual Review. Psychiatry Res. 2020, 284, 112732. [Google Scholar] [CrossRef]
  54. Lussier, M.; Lavoie, M.; Giroux, S.; Consel, C.; Guay, M.; Macoir, J.; Hudon, C.; Lorrain, D.; Talbot, L.; Langlois, F.; et al. Early Detection of Mild Cognitive Impairment with In-Home Monitoring Sensor Technologies Using Functional Measures: A Systematic Review. IEEE J. Biomed. Health Inform. 2019, 23, 838–847. [Google Scholar] [CrossRef] [PubMed]
  55. Lee, J.S.; Kim, C.; Shin, J.H.; Cho, H.; Shin, D.S.; Kim, N.; Kim, H.J.; Kim, Y.; Lockhart, S.N.; Na, D.L.; et al. Machine Learning-Based Individual Assessment of Cortical Atrophy Pattern in Alzheimer’s Disease Spectrum: Development of the Classifier and Longitudinal Evaluation. Sci. Rep. 2018, 8, 4161. [Google Scholar] [CrossRef] [PubMed]
  56. Collij, L.E.; Heeman, F.; Kuijer, J.P.A.; Ossenkoppele, R.; Benedictus, M.R.; Möller, C.; Verfaillie, S.C.J.; Sanz-Arigita, E.J.; Van Berckel, B.N.M.; Van Der Flier, W.M.; et al. Application of Machine Learning to Arterial Spin Labeling in Mild Cognitive Impairment and Alzheimer Disease. Radiology 2016, 281, 865–875. [Google Scholar] [CrossRef] [PubMed]
  57. Mueller, S.G.; Weiner, M.W.; Thal, L.J.; Petersen, R.C.; Jack, C.R.; Jagust, W.; Trojanowski, J.Q.; Toga, A.W.; Beckett, L. Ways toward an Early Diagnosis in Alzheimer’s Disease: The Alzheimer’s Disease Neuroimaging Initiative (ADNI). Alzheimer’s Dement. 2005, 1, 55–66. [Google Scholar] [CrossRef] [PubMed]
  58. Hong, Q.N.; Fàbregues, S.; Bartlett, G.; Boardman, F.; Cargo, M.; Dagenais, P.; Gagnon, M.P.; Griffiths, F.; Nicolau, B.; O’Cathain, A.; et al. The Mixed Methods Appraisal Tool (MMAT) Version 2018 for Information Professionals and Researchers. Educ. Inf. 2018, 34, 285–291. [Google Scholar] [CrossRef]
  59. Abbott, A. The Causal Devolution. Sociol. Methods Res. 1998, 27, 148–181. [Google Scholar] [CrossRef]
  60. Porta, M. A Dictionary of Epidemiology; Oxford University Press: Oxford, UK, 2008. [Google Scholar]
  61. Topol, E.J. High-Performance Medicine: The Convergence of Human and Artificial Intelligence. Nat. Med. 2019, 25, 44–56. [Google Scholar] [CrossRef] [PubMed]
  62. Mankins, J.C. Technology Readiness Assessments: A Retrospective. Acta Astronaut. 2009, 65, 1216–1223. [Google Scholar] [CrossRef]
  63. Graham, S.; Harris, K.R. Students with Learning Disabilities and the Process of Writing: A Meta-Analysis of SRSD Studies. In Handbook of Learning Disabilities; The Guilford Press: New York, NY, USA, 2003. [Google Scholar]
Figure 1. A slightly modified version of the PRISMA flow diagram [40], showing the steps of the search and selection process. * Where possible, it is recommended to state the number of records identified in each database or registry consulted (rather than the total number across all databases/registries). ** If automated tools were used, state how many records were excluded manually and how many were excluded by the automated tools.
Figure 2. A visual representation of the comparative approach to diagnostic accuracy for neurocognitive disorders.
Figure 3. A graphic representation of the methodological quality level (%) based on the Mixed Methods Appraisal Tool. When 'YES' answers accounted for 80% or more of the assessment items, we classified the methodology as 'HIGH' rigor.
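The quality-classification rule described in Figure 3 can be sketched as a short helper. This is an illustrative reading aid only, assuming a simple proportion-of-'YES' rule over the MMAT appraisal items; `mmat_rigor` is a hypothetical function name, not the authors' code.

```python
# Illustrative sketch of the review's MMAT threshold (not the authors' code):
# a study is classified as HIGH rigor when 'YES' answers make up >= 80%
# of the assessment items.
def mmat_rigor(answers):
    """answers: list of 'YES'/'NO'/'CANNOT TELL' responses to MMAT items."""
    yes_share = answers.count("YES") / len(answers)
    return "HIGH" if yes_share >= 0.80 else "LOWER"

# Example: 4 of 5 items answered YES -> 80% -> HIGH rigor.
print(mmat_rigor(["YES", "YES", "YES", "YES", "NO"]))  # prints "HIGH"
```
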
Table 1. Research question (PICO framework).
Population (P): Patients with mild cognitive impairment and patients with several types of dementia.
Intervention (I): AI, machine learning, and deep learning.
Comparison (C): Traditional neuropsychological methods.
Outcome (O): Diagnostic accuracy and cognitive performance prediction.
Study Design: Systematic reviews, clinical trials, and between-group comparisons.
Table 2. Schematic representation of eligibility criteria.
Inclusion Criteria:
- Adults diagnosed with dementia, mild cognitive impairment (MCI), or subjective cognitive decline (SCD).
- Application of AI, ML, or DL methods for diagnosis, prognosis, classification, or disease monitoring.
- Diagnostic accuracy, predictive performance, model interpretability, or clinical applicability as outcomes.
- Original research articles, including observational studies, diagnostic trials, and experimental models.
- Published in English.
Exclusion Criteria:
- Non-AI studies.
- Animal studies.
- Editorials and commentaries.
- Single-case studies.
- Non-English papers.
Table 3. Summary of data extraction from included studies investigating artificial intelligence (AI), machine learning (ML), and deep learning (DL) applications in the detection and diagnosis of neurocognitive disorders (NCDs). The table details study objectives, technical methodologies, data modalities, participant characteristics, and primary clinical findings.
N. | Author (Year) | Objective | AI/ML/DL Technique | Data Modality | Population/Sample Size | Main Outcomes | Extraction Pool | Key Findings
1 | Rykov et al. (2024) [30] | Predict the severity of mood and neuropsychiatric symptoms from digital biomarkers using wearable physiological data and deep learning | Deep learning (CNN and LSTM) | Wearable physiological data | A total of 120 participants with dementia | Prediction accuracy and RMSE | Clinical trial | DL models accurately predicted neuropsychiatric symptom severity from wearable data.
2 | Kim et al. (2023) [41] | Quantification of cognitive impairment using olfactory-stimulated fNIRS with ML | Support Vector Machine (SVM) and random forest | Functional near-infrared spectroscopy (fNIRS) | A total of 80 participants (MCI and controls) | Diagnostic accuracy | Samsung Medical Center | ML distinguished MCI from controls with >85% accuracy.
3 | Yang et al. (2022) [42] | Deep-learning-based speech analysis for Alzheimer's disease detection: a literature review | Deep learning (CNN and RNN) | Speech and voice features | Literature review | NA | - | Highlights DL advances in speech-based AD detection.
4 | Wang et al. (2022) [43] | Early diagnosis of AD and MCI based on deep learning | Deep Neural Network (DNN) and CNN | Neuropsychological data | A total of 60 AD/MCI patients and 50 controls | Classification accuracy | Memory clinic | DL models improved accuracy in early AD detection vs. traditional methods.
5 | Kim et al. (2022) [44] | Classification of AD stage using ML for prefrontal oxygenation difference signals | Random forest and SVM | fNIRS | A total of 42 patients with AD and MCI | Diagnostic accuracy | Clinical center | ML achieved 90% accuracy distinguishing AD stages.
6 | Yadgir et al. (2022) [45] | ML-assisted screening for cognitive impairment in an emergency department | Logistic Regression and Gradient Boosting | Clinical and demographic data | A total of 300 ED patients | AUC and sensitivity | Emergency departments | ML model outperformed clinical screening tools in an ED setting.
7 | Zhao et al. (2022) [46] | Machine-learning prediction of MCI-to-AD conversion (2-year follow-up) | SVM and random forest | Neuropsychological and MRI data | A total of 150 MCI patients | Conversion prediction accuracy | Department of Neurology | ML predicted AD conversion with >80% accuracy over 2 years.
8 | Schmitter-Edgecombe et al. (2022) [47] | Compensatory app and activity-aware prompting in amnestic MCI | Reinforcement Learning | Digital behavioral data | A total of 45 MCI participants | Usability and adherence | Community dwelling | AI-based prompts improved daily functioning and adherence.
9 | Chang et al. (2021) [48] | Machine learning and novel biomarkers for the diagnosis of AD | Ensemble ML | Plasma biomarkers | Review | ROC-AUC | - | ML using biomarkers achieved 0.89 AUC for AD detection.
10 | El-Sappagh et al. (2021) [49] | Multimodal explainable AI for AD detection and prediction | Explainable AI and multimodal deep learning | MRI + PET + clinical data (* from ADNI dataset) | A total of 232 MCI participants | Accuracy and interpretability | ADNI * | XAI model achieved high accuracy and provided interpretable biomarkers.
11 | Thabtah et al. (2020) [50] | Dementia medical screening using mobile applications: a systematic review | ML (various) | Mobile cognitive data | Literature synthesis | NA | - | Highlights potential of mobile-based AI screening for dementia.
12 | Lin et al. (2020) [51] | Classification of neurodegenerative disorders using multiplex blood biomarkers | Random forest and SVM | Blood biomarkers | A total of 250 participants | Classification accuracy | Memory clinics | AI achieved >85% accuracy distinguishing AD from other disorders.
13 | Lu et al. (2020) [52] | Network topology and ML analyses of white-matter changes in vascular cognitive impairment | Graph-based ML | MRI (DTI) | A total of 80 patients with vascular cognitive impairment | Connectivity patterns | Clinical trial | ML revealed microstructural network alterations linked to treatment response.
14 | Graham et al. (2020) [53] | AI approaches to predicting and detecting cognitive decline in older adults: a conceptual review | Various ML and DL techniques | Multimodal data (conceptual) | Review | NA | - | Summarized conceptual frameworks for AI in cognitive decline.
15 | Lussier et al. (2019) [54] | Early detection of MCI using in-home sensor technologies | ML (random forest and Decision Trees) | Smart-home behavioral data | Review | Sensitivity and specificity | - | AI-based monitoring detected MCI earlier than clinical evaluations.
16 | Lee et al. (2018) [55] | Individual assessment of cortical atrophy using ML | SVM and CNN | Structural MRI | A total of 210 participants | Accuracy and feature importance | Memory disorder clinic | ML identified atrophy patterns predictive of AD conversion.
17 | Collij et al. (2016) [56] | ML applied to arterial spin labeling MRI in MCI and AD | SVM | ASL MRI | A total of 60 AD/MCI patients and 30 controls | Classification accuracy | Memory clinic/Alzheimer centre | ML improved AD/MCI differentiation using ASL perfusion data.
Abbreviations: AI = artificial intelligence; ML = machine learning; DL = deep learning; CNN = Convolutional Neural Network; SVM = Support Vector Machine; LSTM = Long Short-Term Memory; fNIRS = Functional Near-Infrared Spectroscopy; MRI = Magnetic Resonance Imaging; PET = Positron Emission Tomography; MCI = mild cognitive impairment; AD = Alzheimer’s disease. ADNI = Alzheimer’s Disease Neuroimaging Initiative Standardized Dataset. The ADNI dataset provides real clinical and biological data (e.g., cognitive tests; demographic data; genetic data; neuroimaging, such as MRI and PET; and other clinical tests) on patients with different cognitive stages. The “*” refers to standardized dataset.
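Several studies in Table 3 report ROC-AUC values (e.g., 0.89 for plasma biomarkers, and 0.84 for blood biomarkers in Table 5). As a reading aid, the sketch below computes ROC-AUC for a single continuous score using the rank-sum identity AUC = P(case score > control score), with ties counted as 0.5. The scores are hypothetical toy values, not data from any included study.

```python
# Illustrative ROC-AUC computation via the Mann-Whitney rank identity
# (hypothetical scores; not data from any included study).
def roc_auc(case_scores, control_scores):
    """AUC = P(case score > control score), ties counted as 0.5."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Toy example: cases tend to score higher than controls.
print(round(roc_auc([0.9, 0.8, 0.6], [0.5, 0.4, 0.7]), 2))  # prints 0.89
```

A perfectly separating biomarker yields AUC = 1.0, and an uninformative one yields 0.5, which is why values such as 0.84 or 0.89 indicate good but imperfect discrimination.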
Table 4. Thematic categorization of included studies based on their primary research objectives within the neurocognitive disorder (NCD) continuum. Studies are grouped into five key domains: diagnostic classification, prognostic modeling, population-level screening, ecological monitoring through smart-home/wearable technologies, and comprehensive state-of-the-art reviews.
Category | Objective | Included Studies
1. Diagnosis/Classification | Distinguish between AD, MCI, healthy controls, and other neurocognitive conditions | 2, 4, 5, 10, 12, 16, 17
2. Prognosis/Prediction | Predict progression, worsening, or neurocognitive patterns | 1, 7, 13
3. Screening | Early identification of at-risk individuals in large populations | 6
4. Continuous Monitoring/Smart-Home Technologies | Monitor symptoms, ADLs, or behaviors in ecological/real-world settings | 1, 8
Table 5. Main results of data modality comparison.
Category | Study | Data Modality | Accuracy/Performance | Key Points | Limits
Structural Comparison | Lee et al. (2018) [55] | Cortical thickness (SBM) | Specificity, 93.03%; sensitivity, 87.01% | Identified medial and lateral temporal atrophy; solid structural marker | Limited sensitivity in very early stages (MCI)
Structural Comparison | Lu et al. (2020) [52] | White-matter connectivity | Accuracy, 68% | Microstructural analysis | Lower performance than morphometric MRI
Structural Comparison | Collij et al. (2016) [56] | Cerebral perfusion (pCASL) | AD vs. SCD, 89%; MCI, 57.5% | Confirmed dementia | Reduced effectiveness in MCI screening
Structural Comparison | Kim et al. (2023) [41] | Functional metabolic response | Performance comparable to PET | Low cost and non-invasive | Lower spatial resolution compared to MRI
Blood Biomarkers | Lin et al. (2020) [51] | Biochemical indicators | AUC, 0.84 | High accuracy; support for differential diagnosis | Clinical integration required
Wearable Devices | Rykov et al. (2024) [30] | HR, skin temperature, and electrodermal activity | High predictive accuracy (neuropsychiatric symptoms in MCI) | Continuous ecological monitoring | Dependence on patient compliance
Multimodal Synthesis | Zhao et al. (2022) [46] | MRI + PET + clinical data | Diagnosis, 93.95%; progression from MCI to AD, 87.08% | Significant increase in accuracy | Greater complexity and costs
Multimodal Synthesis | El-Sappagh et al. (2021) [49] | Imaging + clinical data | Accuracy > 93% | Exceeded the limits of individual techniques | Structured data integration required
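The sensitivity and specificity figures in Table 5 derive from confusion-matrix counts in the standard way. The sketch below uses hypothetical counts chosen only to echo the magnitudes reported by Lee et al. (2018); it is an illustrative definition, not a reproduction of any study's analysis.

```python
# Standard confusion-matrix definitions of the metrics reported in Table 5
# (hypothetical counts for illustration only).
def sensitivity(tp, fn):
    """True-positive rate: share of actual patients correctly detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: share of actual controls correctly cleared."""
    return tn / (tn + fp)

# Toy example: 87 of 100 patients detected, 93 of 100 controls cleared.
print(f"sensitivity={sensitivity(87, 13):.2f}, specificity={specificity(93, 7):.2f}")
# prints: sensitivity=0.87, specificity=0.93
```
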

Perna, P.; Claudi, A.; Stasolla, F.; Nappo, R. The Role of Artificial Intelligence in the Detection and Diagnosis of Neurocognitive Disorders: A Systematic Review. Technologies 2026, 14, 183. https://doi.org/10.3390/technologies14030183
