Systematic Review

Artificial Intelligence-Enabled Facial Expression Analysis for Mental Health Assessment in Older Adults: A Systematic Review and Research Agenda

by Fernando M. Runzer-Colmenares 1,2,*, Nelson Luis Cahuapaza-Gutierrez 1,2,3, Cielo Cinthya Calderon-Hernandez 1,2,3 and Christian Loret de Mola 1,4

1 School of Medicine, Universidad Científica del Sur, Lima 15842, Peru
2 CHANGE Research Working Group, Universidad Científica del Sur, Lima 15842, Peru
3 Research Department, NyC—Center of Research and Medical Excellence (CRME), Lima 15067, Peru
4 Graduate Program in Public Health, Universidade Federal do Rio Grande, Rio Grande 96201-900, Brazil
* Author to whom correspondence should be addressed.
Future Internet 2025, 17(12), 541; https://doi.org/10.3390/fi17120541
Submission received: 1 November 2025 / Revised: 21 November 2025 / Accepted: 24 November 2025 / Published: 26 November 2025

Abstract

Facial expression analysis using artificial intelligence (AI) represents an emerging approach for assessing mental health, particularly in neurocognitive disorders. This study encompassed observational investigations that assessed facial expressions in individuals aged 60 years and above. A comprehensive literature search was carried out across PubMed, Scopus, EMBASE, and Web of Science. Risk of bias and study quality were assessed using the QUADAS-2 and CLAIM tools. Descriptive analysis and meta-analysis of proportions were performed using STATA version 19. The pooled effect size (ES) was calculated using a random-effects model (DerSimonian–Laird method), and results were presented with corresponding 95% confidence intervals (CI). Six studies were analyzed, comprising a total of 433 participants aged over 60 years and representing diverse AI applications in the detection of neurocognitive disorders. The disorders evaluated included mild cognitive impairment (MCI) (37.4%), dementia (29.3%), and Alzheimer’s disease (AD) (33.3%). Most studies (83.3%) used video-based facial recordings analyzed through deep learning algorithms and emotion recognition models. The pooled meta-analysis demonstrated that AI-based facial recognition algorithms achieved high overall detection accuracy in older adults (ES = 0.84; 95% CI: 0.77–0.91), with the best performance observed in Alzheimer’s disease (ES = 0.93; 95% CI: 0.89–0.97). AI-based facial analysis is non-invasive and demonstrates high, robust accuracy for the early and differential detection of neurocognitive disorders, including MCI, dementia-related conditions, and AD, in older adults.

1. Introduction

Artificial intelligence (AI) in healthcare refers to the ability of computational systems to analyze and learn from large and diverse multimodal datasets, with the aim of identifying patterns and dynamically adapting to achieve specific clinical objectives [1]. In this context, AI is not a single technology but rather a broad field encompassing multiple algorithmic approaches [2,3]. Among these, machine learning, based on statistical prediction models, and deep learning, which employs neural networks to analyze complex data such as images or video sequences, are particularly notable [2,3,4]. These tools have demonstrated great potential in automating diagnostic tasks, optimizing clinical decision-making, and improving the accuracy of medical interventions, especially in domains that require the processing of visual data, such as facial expression analysis [5,6,7].
In recent years, the incorporation of AI into mental health research and practice has grown remarkably, with an increasing number of studies examining its usefulness for the early identification of psychiatric disorders, as well as for intervention planning, treatment optimization, and continuous patient monitoring [8]. This accelerated progress has been driven in part by the rising global prevalence of mental health conditions, a trend that was further amplified during the COVID-19 pandemic [9,10]. During this period, the demand for diagnosis and treatment of conditions such as depression, anxiety, and neurocognitive disorders, particularly among older adults, increased considerably, underscoring the pressing need for innovative and widely accessible strategies to confront emerging mental health demands [11,12]. In response, AI has emerged as a promising tool by enabling the automated analysis of behavioral, physiological, and expressive data [13].
Specifically, AI-assisted facial expression analysis for mental health assessment in older adults represents a significant innovation that integrates principles of computer vision with psychological foundations [14]. These systems rely on deep learning models specialized in dynamic visual data, such as convolutional neural networks (CNNs) for extracting spatial features and recurrent neural networks (RNNs) or long short-term memory (LSTM) architectures for temporal analysis of video sequences [15]. The methodological foundation of this approach is grounded in well-established psychological frameworks, such as Paul Ekman’s theory of basic emotions, which posits that certain facial expressions corresponding to primary emotional states (happiness, sadness, fear, anger, surprise, disgust, and contempt) are universally recognizable [16]. Additionally, this theory highlights the presence of brief, involuntary microexpressions that convey authentic affective states [17].
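To make this architecture concrete, the sketch below shows, in PyTorch, how a per-frame CNN feature extractor can feed an LSTM for temporal analysis of a video sequence. It is a minimal illustration with arbitrary layer sizes, not a reconstruction of any model used in the reviewed studies.

```python
import torch
import torch.nn as nn

class CnnLstmVideoClassifier(nn.Module):
    """Illustrative CNN + LSTM pipeline for video-based expression analysis.

    A small CNN extracts spatial features from each frame; an LSTM models
    the temporal evolution of those features across the sequence. All layer
    sizes here are arbitrary choices for illustration.
    """

    def __init__(self, num_classes: int = 2, feature_dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),            # -> (32, 4, 4) per frame
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, feature_dim),
        )
        self.lstm = nn.LSTM(feature_dim, 64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, time, channels, height, width)
        b, t, c, h, w = video.shape
        frame_feats = self.cnn(video.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (hidden, _) = self.lstm(frame_feats)  # last hidden state summarizes the clip
        return self.head(hidden[-1])

# Example: a batch of 2 clips, 16 frames each, 64x64 RGB
logits = CnnLstmVideoClassifier()(torch.randn(2, 16, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 2])
```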
The convergence of artificial intelligence and Internet-based technologies has profoundly transformed healthcare delivery by enabling interoperable digital ecosystems that support remote monitoring, automated diagnosis, and personalized clinical interventions [5,6]. For instance, sensors integrated into devices connected to the Internet of Medical Things (IoMT) can continuously collect data on vital signs, mobility, or social interactions without requiring in-person visits to healthcare facilities. Within this framework, AI-enabled facial analysis represents a key component of digital health systems, as it enables the detection of cognitive events through accessible devices such as webcams, smartphones, or videoconferencing platforms [18]. This approach is particularly relevant for the older adult population, which faces major barriers to specialized care, including geographic dispersion, a shortage of geriatric specialists, and the stigma associated with mental disorders [19].
For this population, AI-based facial recognition tools can facilitate early screening and continuous monitoring without requiring patient travel, thereby reducing logistical burdens and improving accessibility. Moreover, the distributed architecture of many of these systems, which combines edge computing, cloud-based analytics, and deep learning models, allows scalable implementation in telemedicine and home-monitoring environments [20]. Nevertheless, the current evidence regarding the accuracy, validity, and clinical applicability of these technologies remains heterogeneous, highlighting the need for systematic and critical appraisal [21].
Within this framework, this study sought to synthesize the existing evidence on AI-enabled facial expression analysis technologies to evaluate their diagnostic accuracy in detecting previously diagnosed neurocognitive disorders in older adults, particularly cognitive impairment and dementia, as well as to identify existing research gaps and propose a comprehensive research agenda.

2. Materials and Methods

2.1. Study Protocol

The protocol for this study was developed and reported in accordance with the methodological standards set forth by the statement “Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols” (PRISMA-P 2015) [22]. Additionally, the protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO) under registration code CRD420251169499. The execution and documentation of this systematic review were carried out in full accordance with the methodological principles and reporting standards specified in the PRISMA 2020 guidelines [23].

2.2. Eligibility Criteria

The eligibility criteria were defined through the structured formulation of the research question, guided by the key elements of the PICO framework (P: population; I: intervention; C: comparator; O: outcome). Observational studies (cohort and cross-sectional) were eligible for inclusion provided they examined adults aged 60 years or older with a previously diagnosed neurocognitive disorder, such as cognitive impairment or dementia, confirmed by clinical evaluation or standardized diagnostic criteria. The interventions assessed comprised all AI technologies, including algorithmic descriptions and models used to evaluate detection accuracy. The comparators were healthy control participants without cognitive impairment or dementia. The outcomes focused on assessing the diagnostic accuracy of AI systems in detecting neurocognitive disorders through facial expression analysis. Case reports, case series, letters to the editor, editorials, clinical images, letters, comments, notes, correspondences, short communications, brief reports, conference abstracts, narrative reviews, systematic reviews, books, book chapters, journalistic articles, and opinion pieces were excluded. Likewise, studies that evaluated different outcomes, populations, or interventions were omitted, particularly those involving individuals younger than 60 years or chronic neurodegenerative and/or neurological disorders, including multiple sclerosis, amyotrophic lateral sclerosis, hereditary and acquired ataxias, muscular dystrophies, and neurocutaneous syndromes.

2.3. Information Sources

The initial literature search was performed on 26 September 2025, followed by a final and comprehensive search on 25 October 2025. A targeted bibliographic search strategy was implemented across four major electronic databases: PubMed, Scopus, EMBASE, and Web of Science. To enhance the completeness of the evidence retrieval process, the reference lists of all eligible articles were examined manually, and supplementary searches for gray literature were conducted through Google Scholar to identify any additional pertinent studies.

2.4. Search Strategy

The search strategy was designed using terminology obtained from the Medical Subject Headings (MeSH) of the National Library of Medicine (NLM): “Deep learning”, “Mental health”, and “Facial Recognition”, which were combined through the Boolean operators AND and OR. No restrictions were applied regarding publication date. The search was confined to studies published in English. The complete search strategy for each database is detailed in Supplementary Material S1.
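For illustration only, a PubMed-style query combining these MeSH terms with Boolean operators might take the following shape; the actual strings executed in each database are those reported in Supplementary Material S1.

```text
("Deep Learning"[MeSH Terms] OR "deep learning"[Title/Abstract])
AND ("Facial Recognition"[MeSH Terms] OR "facial expression"[Title/Abstract])
AND ("Mental Health"[MeSH Terms] OR "cognitive impairment"[Title/Abstract])
```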

2.5. Study Selection Process

All references were exported into Rayyan QCRI for duplicate removal. Two reviewers (NLCG and CCCH) independently evaluated the titles and abstracts to identify studies meeting the inclusion criteria, after which full-text manuscripts were examined to confirm eligibility. Any discrepancies were resolved through discussion with a third reviewer (NLCG).

2.6. Data Extraction Process

Data extraction was performed using a pre-designed Microsoft Excel sheet. The most relevant study characteristics were extracted, including first author and publication year, country of origin, study design, total sample size, number of participants with neurocognitive disorders (n), age (mean/median), control group, inclusion criteria, tool used, facial component analyzed, neurological disorder, algorithm description, models used, best-performing model, detection accuracy (%), main findings, and conclusions. Any disagreements detected during the extraction process were addressed and resolved by consensus among all authors.

2.7. Risk of Bias and Quality Assessment

The methodological quality and risk of bias of the included studies were examined using two complementary instruments: the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies-2) tool [24] and the CLAIM (Checklist for Artificial Intelligence in Medical Imaging) framework [25]. QUADAS-2 provides a structured appraisal of the rigor and reliability of diagnostic accuracy research, whereas CLAIM assesses transparency and methodological rigor in AI-based medical imaging research. The combined use of these tools ensured a comprehensive and standardized evaluation, covering both the validity of clinical outcomes and the methodological robustness of AI approaches.
The QUADAS-2 instrument assesses the risk of bias across four principal domains: patient selection, index test, reference standard, and flow and timing, as well as applicability concerns in three domains (patient selection, index test, and reference standard). For each domain, responses to the signaling questions (“yes” or “no”) determine whether the domain is judged as “low risk”, “high risk”, or “unclear risk”.
In contrast, the CLAIM checklist provides 44 items that address essential components of AI-related research, including study design, dataset characteristics, reference standard, methodological details of the AI system, performance evaluation, and results presentation. Each item is rated as “yes”, “no”, or “not applicable”. “Yes” is assigned when most or all criteria are met with minor acceptable omissions; “no” is assigned when key requirements are absent, thereby affecting the study’s quality or reproducibility; and “not applicable” when the item does not pertain to the specific study design.

2.8. Statistical Analysis

A meta-analysis of proportions was conducted to derive the aggregated accuracy of AI models utilizing facial analysis for the identification of neurocognitive disorders in older adults, including MCI, dementia, and AD.
For each study, the following variables were extracted: author (study), diagnostic category (diagnosis), sample size (n), and accuracy (accuracy). The number of events (true positives) was calculated using the formula:
events_i = round(accuracy_i × n_i)
The pooled effect size (ES) was estimated using a DerSimonian–Laird random-effects model, considering the expected heterogeneity among studies. Results were expressed with their 95% CI.
Heterogeneity was evaluated using the I² statistic, which measures the proportion of total variation due to between-study heterogeneity (I² < 50% = low heterogeneity; I² > 50% = high heterogeneity), as well as the Cochran Q test, where a p-value < 0.10 was considered evidence of significant heterogeneity. A forest plot was generated to visualize individual and pooled effect sizes, including CIs and study weights.
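As a transparency aid, the pooled estimate, Cochran’s Q, and I² can be reproduced with a short script. The sketch below implements the DerSimonian–Laird procedure in Python, using the headline accuracies and sample sizes from Tables 1 and 2 purely as illustrative inputs; the published estimates were computed in STATA and may differ depending on the exact n used per diagnostic subgroup.

```python
import math

def dersimonian_laird(events, n):
    """Pooled proportion via a DerSimonian-Laird random-effects model.

    events, n: per-study counts of correctly classified cases and sample sizes.
    Returns the pooled proportion, its 95% CI, Cochran's Q, and I^2 (%).
    A minimal sketch; production analyses should use a dedicated meta-analysis package.
    """
    p = [e / ni for e, ni in zip(events, n)]
    v = [pi * (1 - pi) / ni for pi, ni in zip(p, n)]            # binomial variance per study
    w = [1 / vi for vi in v]                                    # fixed-effect (inverse-variance) weights
    p_fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)
    q = sum(wi * (pi - p_fixed) ** 2 for wi, pi in zip(w, p))   # Cochran's Q
    df = len(p) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                               # between-study variance
    w_star = [1 / (vi + tau2) for vi in v]                      # random-effects weights
    pooled = sum(wi * pi for wi, pi in zip(w_star, p)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), q, i2

# Illustrative inputs: headline accuracies and sample sizes from Tables 1 and 2.
# events_i = round(accuracy_i * n_i), as in the formula above.
accuracy = [0.925, 0.86, 0.733, 0.79, 0.9063, 0.736]
n = [238, 23, 61, 117, 186, 60]
events = [round(a * ni) for a, ni in zip(accuracy, n)]
pooled, ci, q, i2 = dersimonian_laird(events, n)
print(f"ES = {pooled:.2f}, 95% CI: {ci[0]:.2f}-{ci[1]:.2f}, Q = {q:.2f}, I² = {i2:.1f}%")
```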

3. Results

3.1. Study Selection

Targeted literature searches were conducted across four databases, retrieving a total of 879 records. After removing duplicates, 659 studies remained. Title and abstract screening excluded 648 records, leaving 11 potentially eligible manuscripts. Full-text review of these 11 manuscripts led to the exclusion of 6 studies because they had a different study design or provided insufficient data (Supplementary Material S2). The reference lists of the selected studies, along with other supplementary sources, were examined, resulting in the identification of one additional study that met the eligibility criteria. In total, six studies were ultimately included in this review. A detailed depiction of the study selection flow diagram is presented in Figure 1.

3.2. Characteristics of the Included Studies

A total of six studies were included, comprising two cohort studies and four cross-sectional designs. The publication period ranged from 2021 to 2025. The studies were conducted in Japan, Taiwan, China, Italy, and the United States. In total, 433 participants with a neurocognitive disorder were reported, including mild cognitive impairment (n = 162; 37.4%), dementia (n = 127; 29.3%), and Alzheimer’s disease (n = 144; 33.3%). The age of participants varied across studies, though all were older than 60 years. No sex stratification was performed, as most studies focused on facial expression analysis. Some included healthy control groups to improve the accuracy of their AI-based models. The primary evaluation modality involved recorded facial videos (83.3%), while only one study used static photographs. A comprehensive summary of the characteristics of all included studies is provided in Table 1.

3.3. Characteristics of Artificial Intelligence and Facial Recognition

The studies analyzed the entire face, considering aspects such as emotional expressions and specific facial regions. Collectively, they reported detection accuracies ranging from 60.9% to 92.5%, depending on the algorithm and dataset used. The highest performance was reported by Umeda-Kameyama et al. [26], who employed the deep learning network Xception combined with the Adam optimizer (accuracy: 92.5%), demonstrating the model’s ability to discriminate between facial images of individuals with cognitive impairment and healthy controls. Similarly, Sun et al. [30] reported an accuracy of 90.63% using the MC-ViViT architecture to identify cognitive impairment from facial videos. Likewise, Chen et al. [27] obtained an accuracy of 86.0% using a Facial Expression Recognition System (FERS) to predict behavioral and psychological symptoms associated with dementia.
Machine learning approaches, such as MobileNet + SVM (Fei et al. [28]) and HOG-based models (Zheng et al. [29]), achieved accuracies of 73.3% and 79.0%, respectively, supporting the feasibility of combining traditional classifiers with deep feature extraction techniques. Meanwhile, the recent study by Bergamasco et al. [31] employed multiple machine learning models (KNN, LR, SVM) to distinguish between mild cognitive impairment, dementia, and healthy controls, reaching 76.0% accuracy for mild cognitive impairment vs. controls and 73.6% for dementia vs. controls.
The algorithms used across the included studies encompassed both traditional machine learning and deep learning approaches. Among the former, Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and Logistic Regression (LR) were the most common, typically applied to facial features extracted using methods such as Histogram of Oriented Gradients (HOG) or Action Units. In contrast, deep learning approaches utilized advanced convolutional and transformer-based architectures, including Xception, ResNet50, VGG16, SENet50, MobileNet, and MC-ViViT. The detailed characteristics of the included studies are presented in Table 2. Additionally, a summary table (Table 3) was created detailing the model architectures used and the datasets employed in the included studies.
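As an illustration of the traditional pipeline described above (HOG features followed by a classical classifier), the following scikit-image/scikit-learn sketch trains an SVM on synthetic stand-ins for aligned face crops; it mirrors the general pattern, not any specific study’s implementation.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for grayscale face crops (a real pipeline would load
# detected and aligned face frames): 40 "images" of 64x64, with binary labels.
images = rng.random((40, 64, 64))
labels = rng.integers(0, 2, size=40)

# Histogram of Oriented Gradients: edge-orientation statistics per cell,
# a classical descriptor of facial structure and expression.
features = np.array([
    hog(img, orientations=8, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    for img in images
])

# A support vector machine on the HOG features, evaluated by cross-validation.
clf = SVC(kernel="rbf", C=1.0)
scores = cross_val_score(clf, features, labels, cv=5)
print(f"Mean CV accuracy: {scores.mean():.2f}")
```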

3.4. AI Accuracy for Facial Expressions in Adults

The meta-analysis pooled studies that evaluated the detection accuracy of AI based on facial expression analysis for identifying neurocognitive disorders in older adults, stratified by specific diagnosis: MCI, dementia, and AD.
In the MCI group, the pooled estimate showed a combined accuracy (ES) of 0.82 (95% CI: 0.69–0.94), indicating relatively high accuracy. In the dementia group, the combined accuracy was 0.78 (95% CI: 0.71–0.85), also significant but comparatively lower. In contrast, the AD group reached a pooled estimate of 0.93 (95% CI: 0.89–0.97), suggesting substantially higher and more consistent AI model performance in this population.
The heterogeneity analysis among groups revealed a statistically significant difference (p = 0.001), indicating that the discriminative ability of the algorithms varies according to the type of neurocognitive disorder analyzed. Overall heterogeneity was moderate to high (I2 = 73.5%, p < 0.001), suggesting variability among the included studies, likely attributable to differences in the algorithms and computational models used to determine accuracy in each study.
Taken together, the results support that AI algorithms based on facial recognition demonstrate high overall detection accuracy in older adults (ES = 0.84, 95% CI: 0.77–0.91), with the best performance observed in the AD context. Figure 2 presents the forest plot of the analyzed studies.

3.5. Subgroup and Sensitivity Analyses

Several subgroup analyses were conducted to explore variability across studies. First, according to the type of neurocognitive disorder, the estimated proportion for MCI was 0.82 (95% CI: 0.69–0.94); for dementia, 0.78 (95% CI: 0.71–0.85); and for AD, 0.93 (95% CI: 0.89–0.97), all of which were consistent with the estimates obtained in the overall analysis. Second, an analysis was performed including only studies that used recorded facial videos, excluding the Umeda-Kameyama study, which used static images. This analysis yielded an estimated proportion of 0.82 (95% CI: 0.74–0.89), with slightly lower heterogeneity (I² = 61.9%; p = 0.02). Third, an analysis was conducted based on model architecture (Convolutional Neural Network (CNN) vs. machine learning (ML)), showing a combined estimated proportion of 0.85 (95% CI: 0.70–0.95). Models based on CNN architectures demonstrated higher accuracy, reaching a proportion of 0.93 (95% CI: 0.89–0.97) (Figure 3).
Regarding the sensitivity analysis, after excluding studies with small sample sizes, the meta-analysis of proportions again showed a high overall performance, with a pooled accuracy of 0.88 (95% CI: 0.80–0.96). Accuracy was higher for the identification of MCI and AD (0.91 and 0.93, respectively), while performance was lower for dementia (0.79). However, when attempting to restrict the analysis to studies with n > 100 participants, it was not possible to estimate heterogeneity within each subgroup because only one study was available per diagnostic category (df = 0). Nevertheless, the test for heterogeneity between subgroups was significant (Q = 11.37; p < 0.001), indicating important differences in model performance according to the type of neurocognitive disorder evaluated. All additional analyses are available in Supplementary Material S3.

3.6. Risk of Bias and Quality

All six studies included in the review were appraised using the QUADAS-2 and CLAIM tools. According to the QUADAS-2 assessment, a considerable risk of bias was identified, primarily within the reference standard domain. This finding is explained by the fact that the studies employed AI models focused on detection, which were not validated as true diagnostic tools and therefore lacked a standard comparator. Moreover, because these were AI-based investigations, the reference standards employed did not align with traditional diagnostic scales (Figure 4).
In contrast, when applying the CLAIM tool, a higher level of compliance with reporting guidelines was observed, with an average above 60% (range: 59.0–79.5%). This suggests better methodological rigor and transparency in study documentation. However, the main limitations remained within the reference standard domain. The detailed assessment according to the CLAIM tool is presented in Supplementary Material S4.

4. Discussion

A considerable proportion of older adults worldwide present some degree of cognitive impairment or dementia, conditions characterized by progressive alterations in memory, language, and daily functioning. The management of these diseases entails a substantial burden of human and economic resources, both for healthcare systems and for families. Since MCI represents an intermediate and potentially reversible stage before dementia, its early detection is essential to optimize clinical management and plan preventive interventions. In this context, the present study synthesized the available evidence on the diagnostic accuracy of AI models based on facial expression analysis for detecting neurocognitive disorders in older adults. The results showed an overall high performance, with a pooled accuracy of 84%. However, significant differences were observed when stratified by specific diagnosis. In the MCI group, the pooled accuracy was 82%, demonstrating solid discriminative ability, although heterogeneous among studies. In the dementia group, accuracy was slightly lower (78%), suggesting less robustness of the algorithms in more advanced stages of cognitive decline. In contrast, models applied to patients with AD showed the greatest consistency and accuracy, with a pooled accuracy of 93%. These findings support the potential use of AI-based facial recognition as an innovative and promising approach for detecting cognitive impairment in the geriatric population, contributing to more accessible, objective, and efficient assessment in both clinical and community settings.
AI encompasses the design of computational systems capable of mimicking human cognitive abilities, including learning, reasoning, decision-making, and problem-solving [32]. In the field of mental health, AI has gained significant importance due to its ability to process large and complex datasets and identify patterns that may indicate psychiatric disorders such as depression, anxiety, or schizophrenia, among others [33]. Within this field, machine learning (ML) is an essential tool that enables the processing of large volumes of data and the discovery of hidden patterns useful for comprehensive diagnosis and personalized interventions in mental health [34]. Likewise, deep learning (DL)—a more advanced branch of ML—has demonstrated remarkable efficacy in recognizing affective states and mood disorders [35]. Through neural networks, DL allows systems to learn and solve complex problems, such as image recognition, video analysis, and natural language processing—areas of growing importance in mental health evaluation [35].
In this context, AI-based facial expression analysis has gained prominence as a valuable approach for evaluating the affective states of individuals with mental disorders [36]. Various programs and software are employed for this purpose; among the most widely used are OpenFace 2.0, an open-source tool that detects facial landmarks and identifies action units based on the Facial Action Coding System (FACS), and FaceReader 9.0, a widely validated commercial software that identifies and quantifies the intensity of basic emotions from videos or images [37,38]. These platforms process images or videos through stages of detection, facial alignment, and feature extraction, generating a quantitative map of expressive behavior that can be correlated with depressive or anxious symptoms, or with signs of cognitive impairment in older adults [39]. Subsequently, AI applies convolutional neural network models, such as AlexNet, to extract deep features from the images [40]. These features contain relevant information about facial expressions and underlying emotions, allowing the detection of subtle patterns associated with symptoms of mental disorders [39,40]. In this way, AI can classify and suggest the presence of such disorders objectively and non-invasively [34].
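A minimal sketch of this final classification stage is shown below, assuming a table of per-participant mean action-unit intensities such as a FACS tool like OpenFace 2.0 can export; the file name and column names are illustrative assumptions, not an actual dataset.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical input: one row per participant, mean FACS action-unit
# intensities plus a binary diagnostic label (1 = impairment, 0 = control).
# The file name and column names are illustrative assumptions.
df = pd.read_csv("au_features.csv")
au_cols = [c for c in df.columns if c.startswith("AU")]
X, y = df[au_cols].to_numpy(), df["label"].to_numpy()

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Inspect which action units drive the prediction -- the "quantitative map
# of expressive behavior" described above, in interpretable form.
for name, coef in sorted(zip(au_cols, clf.coef_[0]), key=lambda t: -abs(t[1])):
    print(f"{name}: {coef:+.3f}")
```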
The findings of this systematic review have substantial implications for both public health and clinical practice in the context of global population aging. From a public health perspective, implementing AI-enabled facial analysis systems could significantly contribute to early detection and population-based screening for cognitive impairment and dementia, conditions that represent a growing burden for healthcare systems worldwide and remain underdiagnosed in early stages, particularly in low- and middle-income countries with limited access to geriatric specialists [41]. The fact that these technologies are non-invasive, low-cost, and easily scalable makes them strong candidates for use in epidemiological monitoring and community-level screening initiatives [42]. Their implementation facilitates the early detection of individuals at elevated risk, allowing timely intervention before irreversible functional deterioration develops [42]. Clinically, integrating these systems into telemedicine workflows and remote geriatric consultations can enhance diagnostic efficiency, reduce waiting lists for specialized assessments, and facilitate longitudinal monitoring of patients with neurocognitive disorders through objective tracking of facial expressivity changes over time [41].
However, effective adoption requires addressing critical challenges related to external validation in culturally diverse populations, mitigation of algorithmic biases that may perpetuate health inequities (particularly “digital ageism”), ensuring privacy and protection of sensitive data, and developing regulatory frameworks that guarantee quality and safety standards in clinical implementation [42,43]. Furthermore, collaborative care models in which AI complements rather than replaces clinical judgment must be promoted, respecting the autonomy and dignity of older adults throughout diagnostic and therapeutic processes.
Recent advances in facial expression analysis reveal a clear shift toward transformer-based architectures, developed to address the complexity of affective and cognitive phenomena in clinical contexts [44]. Pure video transformers, such as ViViT, have demonstrated robust spatiotemporal modeling for video classification tasks, while extensions like MC-ViViT incorporating a multi-branch classification module have shown promising results in detecting mild cognitive impairment from facial data [30]. Complementarily, models such as CmdVIT, which incorporate explicit positional encoding and sparse attention mechanisms, improve the representation of subtle facial features and reduce inter- and intra-class similarities [36]. Hybrid approaches also stand out, such as the Attention-Enhanced Multi-Layer Transformer (AEMT), which combines a dual-branch CNN with attention modules and a multilayer transformer encoder with transfer learning, achieving greater robustness in naturalistic settings [45]. These developments reflect the evolution of the field and underscore the need to evaluate them in older adults with cognitive impairment to enhance the detection of microexpressions and relevant spatiotemporal patterns in affective computing [45].
The risk of bias and the quality of the included studies were assessed using the QUADAS-2 and CLAIM tools, selected for their ability to address both diagnostic aspects and the methodological particularities of AI-based research. The application of QUADAS-2 revealed a considerable risk of bias, particularly within the reference standard domain. This finding was expected, given that the AI models analyzed were designed for detection accuracy rather than as validated diagnostic tools and therefore lacked a standardized clinical comparator. Consequently, several studies employed different AI technologies to detect previously diagnosed neurocognitive disorders, which affects internal validity. This represents an inherent limitation of AI studies in early stages of development.
In contrast, the CLAIM assessment demonstrated an overall adequate level of compliance, with values exceeding 60%, reflecting acceptable rigor and sufficient transparency regarding the reproducibility of reporting guidelines. However, domains related to the documentation of the reference standard also showed important gaps, consistent with the limitations identified through QUADAS-2.
The combined assessment using both tools provided a comprehensive perspective: while QUADAS-2 highlighted weaknesses in diagnostic validity, CLAIM underscored strengths in transparency and reproducibility, critical aspects in the field of AI applied to health. A combined weighting of results was not performed due to substantial differences between the two tools in terms of structure, purpose, and evaluation criteria.

4.1. Ethical and Practical Considerations

AI applied to facial expression analysis represents an emerging and promising tool for assessing mental health in older adults. By enabling the early detection of emotional and cognitive alterations through the automated recognition of facial patterns, this technology could transform the diagnosis and monitoring of disorders such as depression, mild cognitive impairment, and dementia. However, its implementation in clinical practice raises significant ethical and methodological challenges. Issues such as data privacy, algorithmic bias, and model interpretability must be rigorously addressed to ensure equitable, transparent, and ethical use. Therefore, it is essential to promote collaboration among researchers, healthcare professionals, and policymakers to develop regulatory and methodological frameworks that guarantee the accessibility, validity, and benefit of these tools, particularly for the older adult population, which requires safe technological solutions tailored to its specific needs.

4.2. Future Directions

Future investigations should aim to address these existing limitations while simultaneously broadening the range of AI-driven facial recognition applications directed toward the early, preventive detection of dementia. Fields such as neurology, psychiatry, and geriatrics could benefit substantially from these advancements.
Additionally, longitudinal studies are needed to determine the long-term implications of AI-based interventions on mental health outcomes. Although this synthesis provides evidence of AI effectiveness, showing optimal detection rates for different degrees or types of dementia in controlled environments, clinical trials are indispensable to optimize its application and facilitate its integration with new scientific developments.

4.3. Strengths

This review presents several notable strengths, including its broad and integrative scope, its strict compliance with PRISMA standards, and its emphasis on the diagnostic and therapeutic implications of AI within geriatric care. It provides a systematic search and a critical analysis of various models and applications, offering a structured overview of the current evidence.

4.4. Limitations

Several limitations were identified in this study. First, due to the current focus of AI models primarily on detection rather than clinical diagnosis, it was not possible to determine their true diagnostic capacity. Although these models show promising potential for identifying patients with neurocognitive disorders, the included studies did not evaluate their performance against healthy subjects; therefore, their utility as comprehensive diagnostic tools remains undetermined. Second, despite an exhaustive search across multiple databases, some relevant studies published in bioengineering or computational engineering journals may not have been identified due to access restrictions or indexing differences. Third, the high heterogeneity observed among studies may be attributed to the diversity of algorithms and analytical approaches employed, which limits the feasibility of conducting a homogeneous comparison or a robust quantitative synthesis of a single model. Fourth, the number of included studies was relatively small (n = 6), which may reduce the precision of the estimates and limit the generalizability of the findings regarding detection accuracy. This limited number of studies may also increase the risk of publication bias, as research with inconclusive results may not have been published or may not be available in the sources consulted. Additionally, the moderate to high heterogeneity observed among the studies makes direct comparison difficult and may affect the consistency of the pooled results. Finally, the studies did not stratify older adults by age groups (e.g., 60–70, 70–80, and >80 years), instead applying a uniform model across the entire population. This limitation is noteworthy since aging is associated with progressive changes in facial expressions that could significantly influence the detection accuracy of AI models.

5. Conclusions

AI models based on facial expression analysis demonstrate high accuracy, robustness, and non-invasiveness for the detection of neurocognitive disorders in older adults, including mild cognitive impairment, dementia, and Alzheimer’s disease. Overall, the results of this study support the potential of AI as a complementary clinical assessment tool, capable of contributing to the early identification of cognitive alterations and strengthening digital screening strategies in geriatric practice.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/fi17120541/s1.

Author Contributions

Conceptualization, F.M.R.-C., N.L.C.-G., C.C.C.-H. and C.L.d.M.; methodology, F.M.R.-C., N.L.C.-G. and C.C.C.-H.; validation, N.L.C.-G. and C.L.d.M.; formal analysis, N.L.C.-G.; investigation, F.M.R.-C., N.L.C.-G., C.C.C.-H. and C.L.d.M.; data curation, F.M.R.-C. and N.L.C.-G.; writing—original draft preparation, N.L.C.-G. and C.C.C.-H.; writing—review and editing, F.M.R.-C., C.C.C.-H. and C.L.d.M.; visualization, N.L.C.-G.; supervision, F.M.R.-C., C.C.C.-H. and C.L.d.M.; project administration, F.M.R.-C.; funding acquisition, F.M.R.-C. All authors have read and agreed to the published version of the manuscript.

Funding

This study was conducted without the support of any external funding sources. Access to databases such as Embase and Scopus, as well as the use of software tools like Rayyan, was provided by Universidad Científica del Sur. This article was self-funded using research group resources.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest related to this study. All interpretations, evaluations, and conclusions presented in this paper are the sole responsibility of the authors and were not influenced by any external affiliations, financial interests, or personal relationships that could have appeared to affect the work reported.

References

  1. Renn, B.N.; Schurr, M.; Zaslavsky, O.; Pratap, A. Artificial Intelligence: An Interprofessional Perspective on Implications for Geriatric Mental Health Research and Care. Front. Psychiatry 2021, 12, 734909. [Google Scholar] [CrossRef] [PubMed]
  2. Kumar, R.D.; Prudhviraj, G.; Vijay, K.; Kumar, P.S.; Plugmann, P. Exploring COVID-19 Through Intensive Investigation with Supervised Machine Learning Algorithm. In Handbook of Artificial Intelligence and Wearables; CRC Press: Boca Raton, FL, USA, 2024. [Google Scholar]
  3. Hu, H.; Zhou, Z. Evaluation and Comparison of Ten Machine Learning Classification Models Based on the Mobile Users Experience. In Proceedings of the 2023 3rd International Conference on Electronic Information Engineering and Computer Science (EIECS), Changchun, China, 22–24 September 2023; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2023; pp. 767–771. [Google Scholar]
  4. Weitz, K.; Hassan, T.; Schmid, U.; Garbas, J.U. Deep-learned faces of pain and emotions: Elucidating the differences of facial expressions with the help of explainable AI methods. Tech. Mess. 2019, 86, 404–412. [Google Scholar] [CrossRef]
  5. Graham, S.A.; Lee, E.E.; Jeste, D.V.; Van Patten, R.; Twamley, E.W.; Nebeker, C.; Yamada, Y.; Kim, H.-C.; Depp, C.A. Artificial Intelligence Approaches to Predicting and Detecting Cognitive Decline in Older Adults: A Conceptual Review. Psychiatry Res. 2020, 284, 112732. [Google Scholar] [CrossRef]
  6. Lu, S.C.; Xu, C.; Nguyen, C.H.; Geng, Y.; Pfob, A.; Sidey-Gibbons, C. Machine Learning-Based Short-Term Mortality Prediction Models for Patients with Cancer Using Electronic Health Record Data: Systematic Review and Critical Appraisal. JMIR Med. Inform. 2022, 10, e33182. [Google Scholar] [CrossRef]
  7. Stroud, A.M.; Curtis, S.H.; Weir, I.B.; Stout, J.J.; Barry, B.A.; Bobo, W.V.; Athreya, A.P.; Sharp, R.R. Physician Perspectives on the Potential Benefits and Risks of Applying Artificial Intelligence in Psychiatric Medicine: Qualitative Study. JMIR Ment. Health 2025, 12, e64414. [Google Scholar] [CrossRef]
  8. Díaz-Guerra, D.D.; Hernández-Lugo, M.d.l.C.; Broche-Pérez, Y.; Ramos-Galarza, C.; Iglesias-Serrano, E.; Fernández-Fleites, Z. AI-assisted neurocognitive assessment protocol for older adults with psychiatric disorders. Front. Psychiatry 2025, 15, 1516065. [Google Scholar] [CrossRef]
  9. Betancourt-Ocampo, D.; Toledo-Fernández, A.; González-González, A. Mental Health Changes in Older Adults in Response to the COVID-19 Pandemic: A Longitudinal Study in Mexico. Front. Public Health 2022, 10, 848635. [Google Scholar] [CrossRef]
  10. Manca, R.; De Marco, M.; Venneri, A. The Impact of COVID-19 Infection and Enforced Prolonged Social Isolation on Neuropsychiatric Symptoms in Older Adults with and Without Dementia: A Review. Front. Psychiatry 2020, 11, 585540. [Google Scholar] [CrossRef]
  11. Bailey, L.; Ward, M.; DiCosimo, A.; Baunta, S.; Cunningham, C.; Romero-Ortuno, R.; Kenny, R.A.; Purcell, R.; Lannon, R.; McCarroll, K.; et al. Physical and mental health of older people while cocooning during the COVID-19 pandemic. QJM 2021, 114, 648–653. [Google Scholar] [CrossRef]
  12. Latoo, J.; Haddad, P.M.; Mistry, M.; Wadoo, O.; Islam, S.M.S.; Jan, F.; Iqbal, Y.; Howseman, T.; Riley, D.; Alabdulla, M. The COVID-19 pandemic: An opportunity to make mental health a higher public health priority. BJPsych Open 2021, 7, e172. [Google Scholar] [CrossRef]
  13. Cummins, N.; Matcham, F.; Klapper, J.; Schuller, B. Artificial intelligence to aid the detection of mood disorders. In Artificial Intelligence in Precision Health; Barh, D., Ed.; Academic Press: Cambridge, MA, USA, 2020; pp. 231–255. [Google Scholar]
  14. Goudarzi, N.; Taheri, Z.; Nezhad Salari, A.M.; Kazemzadeh, K.; Tafakhori, A. Recognition and classification of facial expression using artificial intelligence as a key of early detection in neurological disorders. Rev. Neurosci. 2025, 36, 479–495. [Google Scholar] [CrossRef]
  15. Ko, B.C. A brief review of facial emotion recognition based on visual information. Sensors 2018, 18, 401. [Google Scholar] [CrossRef] [PubMed]
  16. Ekman, P.; Cordaro, D. What is meant by calling emotions basic. Emot. Rev. 2011, 3, 364–370. [Google Scholar] [CrossRef]
  17. Palermo, R.; O’Connor, K.B.; Davis, J.M.; Irons, J.; McKone, E. New tests to measure individual differences in matching and labelling facial expressions of emotion, and their association with ability to recognise vocal emotions and facial identity. PLoS ONE 2013, 8, e68126. [Google Scholar] [CrossRef] [PubMed]
  18. Lifelo, Z.; Ding, J.; Ning, H.; Qurat-Ul-Ain; Dhelim, S. Artificial intelligence-enabled metaverse for sustainable smart cities: Technologies, applications, challenges, and future directions. Electronics 2024, 13, 4874. [Google Scholar] [CrossRef]
  19. Srinivasan, S.; Jones, A.B.; Hilty, D. Geriatric Telepsychiatry in Academic Settings. In Geriatric Telepsychiatry: A Clinician’s Guide; Springer International Publishing: Cham, Switzerland, 2017; pp. 55–98. [Google Scholar]
  20. Mohammed, S.A.; Ralescu, A.L. Future Internet architectures on an emerging scale—A systematic review. Future Internet 2023, 15, 166. [Google Scholar] [CrossRef]
  21. Taati, B.; Zhao, S.; Ashraf, A.B.; Asgarian, A.; Browne, M.E.; Prkachin, K.M.; Mihailidis, A.; Hadjistavropoulos, T. Algorithmic bias in clinical populations—Evaluating and improving facial analysis technology in older adults with dementia. IEEE Access 2019, 7, 25527–25534. [Google Scholar] [CrossRef]
  22. Moher, D.; Shamseer, L.; Clarke, M.; Ghersi, D.; Liberati, A.; Petticrew, M.; Shekelle, P.; Stewart, L.A.; Prisma-P Group. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst. Rev. 2015, 4, 1. [Google Scholar] [CrossRef]
  23. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  24. Whiting, P.F.; Rutjes, A.W.S.; Westwood, M.E.; Mallett, S.; Deeks, J.J.; Reitsma, J.B.; Leeflang, M.M.; Sterne, J.A.; Bossuyt, P.M.; QUADAS-2 Group. QUADAS-2: A revised tool for the quality assessment of diagnostic accuracy studies. Ann. Intern. Med. 2011, 155, 529–536. [Google Scholar] [CrossRef]
  25. Tejani, A.S.; Klontzas, M.E.; Gatti, A.A.; Mongan, J.T.; Moy, L.; Park, S.H.; Kahn, C.E., Jr.; CLAIM 2024 Update Panel. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): 2024 update. Radiol. Artif. Intell. 2024, 6, e240300. [Google Scholar] [CrossRef]
  26. Umeda-Kameyama, Y.; Kameyama, M.; Tanaka, T.; Son, B.K.; Kojima, T.; Fukasawa, M.; Iizuka, T.; Ogawa, S.; Iijima, K.; Akishita, M. Screening of Alzheimer’s disease by facial complexion using artificial intelligence. Aging 2021, 13, 1765–1772. [Google Scholar] [CrossRef]
  27. Chen, L.Y.; Tsai, T.H.; Ho, A.; Li, C.H.; Ke, L.J.; Peng, L.N.; Lin, M.-H.; Hsiao, F.-Y.; Chen, L.-K. Predicting neuropsychiatric symptoms of persons with dementia in a day care center using a facial expression recognition system. Aging 2022, 14, 1280–1291. [Google Scholar] [CrossRef]
  28. Fei, Z.; Yang, E.; Yu, L.; Li, X.; Zhou, H.; Zhou, W. A novel deep neural network-based emotion analysis system for automatic detection of mild cognitive impairment in the elderly. Neurocomputing 2022, 468, 306–316. [Google Scholar] [CrossRef]
  29. Zheng, C.; Bouazizi, M.; Ohtsuki, T.; Kitazawa, M.; Horigome, T.; Kishimoto, T. Detecting dementia from face-related features with automated computational methods. Bioengineering 2023, 10, 862. [Google Scholar] [CrossRef] [PubMed]
  30. Sun, J.; Dodge, H.H.; Mahoor, M.H. MC-ViViT: Multi-branch Classifier-ViViT to detect mild cognitive impairment in older adults using facial videos. Expert Syst. Appl. 2024, 238, 121929. [Google Scholar] [CrossRef] [PubMed]
  31. Bergamasco, L.; Coletta, A.; Olmo, G.; Cermelli, A.; Rubino, E.; Rainero, I. AI-Based Facial Emotion Analysis for Early and Differential Diagnosis of Dementia. Bioengineering 2025, 12, 1082. [Google Scholar] [CrossRef]
  32. Cappello, G.; Defeudis, A.; Giannini, V.; Mazzetti, S.; Regge, D. Artificial Intelligence in Oncologic Imaging. In Multimodality Imaging and Intervention in Oncology; Neri, E., Erba, P.A., Eds.; Springer International Publishing: Cham, Switzerland, 2023; pp. 585–597. [Google Scholar]
  33. Turcian, D.; Stoicu-Tivadar, V. Real-Time Detection of Emotions Based on Facial Expression for Mental Health. Stud. Health Technol. Inform. 2023, 309, 272–276. [Google Scholar]
  34. Iyortsuun, N.K.; Kim, S.H.; Jhon, M.; Yang, H.J.; Pant, S. A Review of Machine Learning and Deep Learning Approaches on Mental Health Diagnosis. Healthcare 2023, 11, 285. [Google Scholar] [CrossRef]
  35. Arji, G.; Erfannia, L.; Alirezaei, S.; Hemmat, M. A systematic literature review and analysis of deep learning algorithms in mental disorders. Informatics Med. Unlocked 2023, 40, 101284. [Google Scholar] [CrossRef]
  36. Ye, J.; Yu, Y.; Wang, Q.; Liu, G.; Li, W.; Zeng, A.; Zhang, Y.; Liu, Y.; Zheng, Y. CmdVIT: A Voluntary Facial Expression Recognition Model for Complex Mental Disorders. IEEE Trans. Image Process. 2025, 34, 3013–3024. [Google Scholar] [CrossRef] [PubMed]
  37. Ambrosen, K.S.; Lemvigh, C.K.; Nielsen, M.Ø.; Glenthøj, B.Y.; Syeda, W.T.; Ebdrup, B.H. Using computer vision of facial expressions to assess symptom domains and treatment response in antipsychotic-naïve patients with first-episode psychosis. Acta Psychiatr. Scand. 2025, 151, 270–279. [Google Scholar] [CrossRef] [PubMed]
  38. Nomiya, H.; Shimokawa, K.; Namba, S.; Osumi, M.; Sato, W. An Artificial Intelligence Model for Sensing Affective Valence and Arousal from Facial Images. Sensors 2025, 25, 1188. [Google Scholar] [CrossRef] [PubMed]
  39. Zhou, Y.; Han, W.; Yao, X.; Xue, J.; Li, Z.; Li, Y. Developing a machine learning model for detecting depression, anxiety, and apathy in older adults with mild cognitive impairment using speech and facial expressions: A cross-sectional observational study. Int. J. Nurs. Stud. 2023, 146, 104562. [Google Scholar] [CrossRef]
  40. Fei, Z.; Yang, E.; Li, D.D.U.; Butler, S.; Ijomah, W.; Li, X.; Zhou, H. Deep convolution network based emotion analysis towards mental health care. Neurocomputing 2020, 388, 212–227. [Google Scholar] [CrossRef]
  41. Shiwani, T.; Relton, S.; Evans, R.; Kale, A.; Heaven, A.; Clegg, A.; Todd, O. New Horizons in artificial intelligence in the healthcare of older people. Age Ageing 2023, 52, afad219. [Google Scholar] [CrossRef]
  42. Loveys, K.; Prina, M.; Axford, C.; Domènec, Ò.R.; Weng, W.; Broadbent, E.; Pujari, S.; Jang, H.; Han, Z.A.; Thiyagarajan, J.A. Artificial intelligence for older people receiving long-term care: A systematic review of acceptability and effectiveness studies. Lancet Healthy Longev. 2022, 3, e286–e297. [Google Scholar] [CrossRef]
  43. van Kolfschooten, H. The AI cycle of health inequity and digital ageism: Mitigating biases through the EU regulatory framework on medical devices. J. Law Biosci. 2023, 10, lsad031. [Google Scholar] [CrossRef]
  44. Min, S.; Yang, J.; Lim, S. Emotion Recognition Using Transformers with Random Masking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 4860–4865. [Google Scholar]
  45. Wang, J. Evaluation and analysis of visual perception using attention-enhanced computation in multimedia affective computing. Front. Neurosci. 2024, 18, 1449527. [Google Scholar] [CrossRef]
Figure 1. Selection of studies on facial recognition using AI in older adults.
Figure 2. Meta-analysis of pooled proportions. A forest plot displays the point estimates (gray squares) and 95% confidence intervals (horizontal lines) for each included study. The gray vertical lines are reference markers on the effect axis. The red dashed vertical line indicates the null value for the overall estimate. The blue diamonds correspond to the pooled estimates for each subgroup and for the overall analysis; the center of each diamond represents the combined point estimate, and its edges show the 95% confidence interval [26,27,28,29,30,31].
Figure 3. Subgroup analysis for model architectures. A forest plot displays the point estimates (gray squares) and 95% confidence intervals (horizontal lines) for each included study. The gray vertical lines are reference markers on the effect axis. The red dashed vertical line indicates the null value for the overall estimate. The blue diamonds correspond to the pooled estimates for each subgroup and for the overall analysis; the center of each diamond represents the combined point estimate, and its edges show the 95% confidence interval [26,27,29,31].
Figure 4. Risk of bias and quality using the QUADAS-2 tool (A,B). D1: Patient selection (Risk of Bias); D2: Index test (Risk of Bias); D3: Reference Standard (Risk of Bias); D4: Flow and Timing (Risk of Bias); D5: Patient selection (Applicability Concerns); D6: Index test (Applicability Concerns); D7: Reference Standard (Applicability Concerns); D8: Overall (D1 + D2 + D3 + D4 + D5 + D6 + D7). Colored spheres: green/+ = low risk; yellow/? = unclear risk; red/− = high risk [26,27,28,29,30,31].
Table 1. Characteristics of the included studies.
| Author/Date | Country | Design | Sample Size | Patients with Neurocognitive Disorder (n) | Age (Mean/Median) | Control | Inclusion Criteria | Tool Used |
|---|---|---|---|---|---|---|---|---|
| Umeda-Kameyama et al., 2021 [26] | Japan | Cohort | 238 | AD: 121 | 80.6 | Healthy: 117 | Patients diagnosed with AD | Facial images |
| Chen et al., 2022 [27] | Taiwan | Cross-sectional | 23 | AD: 23 | 83.6 | - | Patients diagnosed with AD | Recorded facial videos |
| Fei et al., 2022 [28] | China | Cohort | 61 | MCI: 36 | >65 | Healthy: 25 | Patients with MCI | Recorded facial videos |
| Zheng et al., 2023 [29] | Japan | Cross-sectional | 117 | Dementia: 117 | >65 | - | Patients participating in the PROMPT Dataset | Recorded facial videos |
| Sun et al., 2024 [30] | USA | Cross-sectional | 186 | MCI: 100 | >75 | Healthy: 86 | Patients with MCI | Recorded facial videos |
| Bergamasco et al., 2025 [31] | Italy | Cross-sectional | 60 | MCI: 26; Dementia: 10 | 68.2 | Healthy: 28 | Patients with CI | Recorded facial videos |
AD: Alzheimer’s disease; CI: cognitive impairment; MCI: mild cognitive impairment.
Table 2. Analysis of facial expressions using artificial intelligence.
| Author/Date | Facial Component | Neurological Disorder | Algorithm Description | Models | Best Performance Model | Percentage of Detection Accuracy | Main Findings | Conclusions |
|---|---|---|---|---|---|---|---|---|
| Umeda-Kameyama et al., 2021 [26] | Full face | AD | Xception; SENet50; ResNet50; VGG16; simple CNN | Deep learning network | Xception + Adam | 92.5% | Se: 87.52 ± 11.91%; Sp: 94.57 ± 10.88% | Xception AI can differentiate facial images of people with mild dementia from those without dementia |
| Chen et al., 2022 [27] | Full face | AD | FERS | Deep learning network | FERS | 86.0% | FERS: 86.0% | FERS-based AI successfully predicted behavioral and psychological symptoms of dementia |
| Fei et al., 2022 [28] | Full face | MCI | AlexNet; MobileNet + block_11_add + SVM | Deep learning network | MobileNet + block_11_add + SVM | 73.3% | SVM: 73.3%; KNN: 60.0% | The AI algorithm has good recognition accuracy |
| Zheng et al., 2023 [29] | Full face | Dementia | Face Mesh; HOG; Action Unit | Traditional machine learning; deep learning | HOG | 79.0% | Face Mesh: 66.0%; HOG: 79.0%; Action Unit: 71.0% | Computer programs have the potential to be a crucial indicator in the detection of dementia |
| Sun et al., 2024 [30] | Full face | MCI | MC-ViViT | Deep learning network | MC-ViViT | 90.63% | MC-ViViT: 90.63% | AI can detect MCI with promising accuracy |
| Bergamasco et al., 2025 [31] | Full face | MCI; Dementia | KNN; LR; SVM | Machine learning | KNN | 73.6% | MCI vs. HC: 76.0%; Dementia vs. HC: 73.6% | AI as a potential facial emotion analyzer and non-invasive tool for the early detection of CI |
AD: Alzheimer’s disease; AI: artificial intelligence; FERS: facial expression recognition systems; HC: healthy controls; HOG: Histogram of Oriented Gradients; KNN: K-Nearest Neighbors; LR: Logistic Regression; MC-ViViT: Multi-branch Classifier-Video Vision Transformer; MCI: mild cognitive impairment; Se: Sensitivity; Sp: specificity; SVM: Support Vector Machine.
Table 3. Model architectures and datasets.
| Author/Date | Model Architectures | Specific Models | Dataset Used | Modality |
|---|---|---|---|---|
| Umeda-Kameyama et al., 2021 [26] | CNN | Xception; SENet50; ResNet50; VGG16; simple CNN | Proprietary dataset collected by the researchers | Static facial images |
| Chen et al., 2022 [27] | CNN | FERS | Dataset collected in a day-care center | Facial video recordings |
| Fei et al., 2022 [28] | CNN + ML | AlexNet; MobileNet + block_11_add + SVM | Videos collected | Facial video recordings |
| Zheng et al., 2023 [29] | ML | Face Mesh; HOG; Action Unit | PROMPT dataset | Facial video recordings |
| Sun et al., 2024 [30] | Transformer-based model | MC-ViViT | I-CONECT dataset | Facial video recordings |
| Bergamasco et al., 2025 [31] | ML | KNN; LR; SVM | Dataset collected at clinical sites | Facial video recordings |
CNN: Convolutional Neural Network; FERS: Facial Expression Recognition System; HOG: Histogram of Oriented Gradients; I-CONECT: Internet-Based Conversational Engagement Clinical Trial; KNN: K-Nearest Neighbors; ML: Machine learning; LR: Logistic Regression; MC-ViViT: Multi-branch Classifier-Video Vision Transformer; SVM: Support Vector Machine.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
