Article

Explainable AI-Driven Identification of Multimodal Biomarkers for Early Prediction of Cognitive Decline

by A. H. M. Fahad 1, Masahiko Nakatsui 2,3, Takeshi Abe 2,3, Takahide Hayano 1,3, M. H. Mahbub 3,4, Ryosuke Hase 4, Natsu Yamaguchi 4, Yoshihiro Hayakawa 5, Yusuke Inohana 5, Yutaka Umakoshi 5, Ryo Yamaguchi 5, Ren Kimura 6, Hisashi Tsujimura 6, Mitsuharu Matsumoto 7, Fumiaki Higashijima 8, Takuya Yoshimoto 8, Kazuhiro Kimura 8, Tsunahiko Hirano 9, Keiji Ohishi 9, Keiko Doi 9, Kazuto Matsunaga 9, Tsuyoshi Tanabe 4 and Yoshiyuki Asai 1,2,3,*
1 Department of Systems Bioinformatics, Graduate School of Medicine, Yamaguchi University, Yamaguchi 755-8505, Japan
2 AI Systems Medicine Research and Training Center, Graduate School of Medicine, Yamaguchi University Hospital, Yamaguchi University, Yamaguchi 755-8505, Japan
3 Division of Systems Medicine and Informatics, Research Institute for Cell Design Medical Science, Yamaguchi University, Yamaguchi 755-8505, Japan
4 Department of Public Health and Preventive Medicine, Graduate School of Medicine, Yamaguchi University, Yamaguchi 755-8505, Japan
5 Shimadzu Corporation, Nishinokyo Kuwabara-cho, Nakagyo-ku, Kyoto 604-8511, Japan
6 R&D—Analytical Science Research, Kao Corporation, 2-1-3 Bunka, Sumida-ku, Tokyo 131-8501, Japan
7 Research Laboratories, Kyodo Milk Industry Co., Ltd., 20-1 Hirai, Hinode-machi, Nishitama, Tokyo 190-0182, Japan
8 Department of Ophthalmology, Graduate School of Medicine, Yamaguchi University, Yamaguchi 755-8505, Japan
9 Department of Respiratory Medicine and Infectious Disease, Graduate School of Medicine, Yamaguchi University, Yamaguchi 755-8505, Japan
* Author to whom correspondence should be addressed.
AI Med. 2026, 1(2), 12; https://doi.org/10.3390/aimed1020012
Submission received: 16 January 2026 / Revised: 8 April 2026 / Accepted: 27 April 2026 / Published: 8 May 2026

Abstract

This study developed a two-stage, explainable machine learning framework to predict 18-month MMSE-based cognitive status from baseline multimodal data in community-dwelling older adults in Japan. A hierarchical design was used in which Stage 1 distinguished cognitively Normal participants from those with any abnormality (Possible Mild Cognitive Impairment (MCI) or Impaired), and Stage 2 further separated Possible MCI from Impaired within the abnormal subgroup. Both an Imbalanced-Learn Random Forest and a penalized logistic regression baseline were trained under Leave-One-Out Cross-Validation, yielding fair discrimination in Stage 1 (Random Forest AUC = 0.72, accuracy = 0.71; logistic regression AUC = 0.71, accuracy = 0.76) and apparently strong separability in Stage 2 (Random Forest AUC = 0.95, accuracy = 0.96; logistic regression AUC = 0.82, accuracy = 0.92), in a small sample with marked class imbalance. SHapley Additive exPlanations (SHAP), with TreeExplainer for Random Forest and LinearExplainer for logistic regression, were used to identify interpretable biomarkers at each stage through feature attribution. In Stage 1, both models highlighted renal and systemic metabolic markers (e.g., creatinine, uric acid, blood urea nitrogen), amino acid and redox-related metabolites (including D-serine, D-amino acid proportions, L-asparagine, alanine, L-glutamic acid, cysteine, methionine sulfoxide), and wearable-derived activity variability (e.g., fluctuation coefficients and steps per minute), with the Simpson index of gut microbiome diversity also contributing in the logistic model.
In Stage 2, the models converged on a distinct signature involving glucose and albumin, uric acid and uridine, choline and carnitine, multiple amino acids (such as phenylalanine, proline, ornithine, tryptophan, threonine, and short-chain amino acids), oxidative/energy markers (niacinamide, methionine, methionine sulfoxide, ergothioneine), hematologic indices, and high-MET activity fluctuation metrics. Collectively, these results support a stage-dependent, multisystem view of cognitive aging in which broad renal–metabolic, amino acid, and behavioral vulnerabilities characterize early abnormality, whereas more pronounced alterations in energy metabolism, nucleotide and choline pathways, oxidative stress, and activity irregularity accompany progression from Possible MCI to Impaired status. By combining routine clinical chemistry, targeted metabolomics, gut microbiome diversity, and wearable-derived behavioral measures within an explainable AI framework, this two-stage approach illustrates a scalable, biologically grounded strategy for stage-aware risk stratification and monitoring of cognitive decline in community settings.

1. Introduction

Dementia and cognitive impairment in aging populations pose a significant and growing global health challenge. Mild cognitive impairment (MCI) is widely regarded as an intermediate clinical state between normal aging and dementia; its progression is highly heterogeneous, with 20–40% of MCI patients transitioning to Alzheimer’s disease within three years [1]. This heterogeneity makes early identification of individuals at high risk of cognitive decline particularly challenging yet clinically critical.
Researchers are continuously investigating promising biomarkers for prediction, prognosis, and prevention of cognitive decline. Despite substantial progress, no single biomarker has achieved sufficient accuracy or lead time for reliable prediction of cognitive deterioration in community-dwelling older adults.
This unpredictability stems from the fact that dementia is not a single disease entity but a heterogeneous syndrome affecting multiple physiological systems, ranging from vascular and metabolic dysregulation to neuroinflammation, and driven by complex, multifactorial causal pathways that remain largely undiscovered [2,3].
Current diagnostic standards, such as amyloid positron emission tomography (PET) and cerebrospinal fluid (CSF) analysis, provide valuable pathophysiological insights but are invasive and costly, and they restrict scalable screening because they often require specialized facilities and trained personnel [4,5]. Consequently, there remains a critical unmet need for scalable, minimally invasive, and interpretable biomarkers that can support early risk stratification and monitoring of cognitive decline.
Recent advances in artificial intelligence (AI) have offered a transformative path toward precision medicine [6,7] by enabling the integration of large-scale multimodal data, including biobanks, electronic health records, medical imaging, omics technologies, and data from wearable sensors [8]. Biomarkers are defined as measurable properties that represent biological or pathogenic processes, or responses to interventions [9]. In the era of large multimodal cohorts and high-throughput technologies, the main challenge in discovering novel biomarkers has shifted from data availability to extracting clinically meaningful and interpretable patterns from complex datasets [10]. Machine learning approaches have been increasingly used to predict cognitive decline using multidomain data.
Importantly, existing machine learning studies often rely on a single-step classification framework that directly links baseline features to a final diagnostic category, rather than modeling the stepwise clinical progression from normal cognition to mild cognitive impairment (MCI) and dementia, as outlined in established clinical frameworks [11]. Such single-step models may overlook stage-specific biological and behavioral mechanisms that emerge during cognitive decline and may limit both interpretability and clinical relevance.
Hierarchical or staged modeling approaches offer a clinically relevant alternative because they reflect real-world diagnostic processes, where individuals are first screened for abnormal cognitive status and subsequently evaluated for severity. However, relatively few studies have systematically implemented explainable, stage-aware machine learning frameworks for predicting cognitive decline using multimodal, minimally invasive data. The lack of interpretability in many existing models remains a significant barrier to clinical translation [12].
In this study, we conducted an exploratory secondary analysis of a prospective, community-based cohort of older adults in Japan to identify baseline multimodal biomarkers associated with cognitive status at 18 months. Cognitive outcomes were evaluated using the Mini-Mental State Examination (MMSE), developed by Marshal Folstein, which can provide a standardized, longitudinal measure of global cognitive function suitable for tracking neurodegeneration [13]. We developed a two-stage, explainable machine learning framework based on imbalanced-learn Random Forest and penalized logistic regression models, in which cognitively normal individuals were first distinguished from those with Abnormal cognition, followed by severity stratification between Possible MCI and Impaired cognitive states.
Baseline predictors spanned multiple biological and behavioral domains, including routine clinical chemistry, targeted metabolomics with chiral amino acid profiling, gut microbiome indices, and wearable-derived physical activity measures. To support interpretability of these machine learning models, SHapley Additive exPlanations (SHAP) analysis was applied to quantify feature contributions at each stage of classification, offering transparent insights into stage-dependent biomarker signatures [14]. This integration of explainable AI with scalable multimodal data can support biomarker discovery and the integrative analysis of complex diseases, including dementia, paving the way for accessible, non-invasive early detection systems [15].

2. Materials and Methods

2.1. Original Cohort Study Design

The present analysis utilized data from a prospective, non-randomized controlled cohort study conducted in the Ajisu region of Yamaguchi city, Japan. The primary objective of the original cohort study was to evaluate whether multidomain exercise and yogurt-based nutritional interventions could attenuate decline in cognitive and motor function over 18 months compared with a usual-care control group.
Participants were allocated to three groups (Exercise+, Yogurt+, Control) by preference and feasibility, and all procedures, assessments, and follow-up schedules were predefined in the original protocol and approved by the institutional ethics committee.
The cohort design included extensive baseline data collection across multiple biological systems, including biochemistry, metabolomics, chiral amino acids and glycine, gut microbiome, and physical activity. This comprehensive dataset provides a valuable foundation for subsequent investigations into biomarkers predictive of cognitive outcomes as a secondary endpoint.

2.2. Participants

Participants in the cohort study were recruited from Ajisu, a district in Yamaguchi city, Yamaguchi Prefecture, Japan. Eligible participants were community-dwelling older adults aged 75 to 83 years who were able to provide written informed consent after receiving a comprehensive explanation of the study procedures. All participants were required to be free from any serious medical conditions.
Exclusion criteria included: (1) suspected dementia, defined as a Mini-Mental State Examination (MMSE) score of 23 or lower; (2) certification as requiring nursing care at level 2 or higher; or (3) any condition deemed unsuitable for participation as determined by the investigators. Such conditions included inability to stand unassisted, neurological or musculoskeletal disorders, serious medical illnesses (e.g., cancer or renal failure), or regular use of antibiotics at the time of recruitment. In addition, participants in the exercise group were excluded if they had contraindications to physical activity, including epilepsy, hernia, recent surgery, or joint prostheses. Participants undergoing standard treatment for chronic conditions, including diabetes, hypertension, and dyslipidemia, remained eligible for this study.
Initially, 104 volunteers expressed interest in participating in the 18-month intervention study. Four individuals withdrew from this study prior to enrollment due to procedural difficulties (n = 2) or medical requirements (n = 2). During this study, one participant voluntarily withdrew, and one participant died, resulting in 98 individuals who completed this study. The inclusion and exclusion criteria are shown in Table 1.

2.3. Ethical Committee Review

The study protocol, informed consent forms, and related documentation were reviewed and approved by the Ethics Review Committee of Yamaguchi University. This study was conducted in accordance with the Declaration of Helsinki and the Japanese Ethical Guidelines for Medical and Health Research Involving Human Subjects, including the guidelines on human genome and genetic analysis research. This study was registered in the University Hospital Medical Information Network Clinical Trials Registry (UMIN-CTR) prior to participant recruitment.

2.4. Secondary Analysis Framework

The current study constitutes an exploratory secondary analysis using data from the original cohort. We included all participants with complete baseline predictors and Mini-Mental State Examination (MMSE) scores assessed at 18 months after baseline. The analysis aimed to identify baseline biomarkers associated with subsequent cognitive status, defined by MMSE outcomes at 18 months. Participants were categorized into three cognitive groups—Normal, Possible MCI, and Impaired—and these categories were used as outcome variables in a supervised machine learning framework.
A two-stage binary classification approach was applied. In Stage 1, the model distinguished Normal from Possible MCI or Impaired. In Stage 2, the model further classified Possible MCI versus Impaired. Classification was performed using an Imbalanced-Learn Random Forest and a penalized logistic regression. Given the relatively small sample size, we adopted Leave-One-Out Cross-Validation (LOOCV) to estimate model performance. Bayesian hyperparameter optimization was used to tune model parameters, and SHapley Additive exPlanations (SHAP) values were computed to evaluate feature importance and interpretability.
Importantly, the set of explanatory variables included not only the original study group assignments (Exercise+, Yogurt+, Control) but also a wide range of baseline multimodal features collected at month 0. This comprehensive feature set enabled the model to assess the potential contribution of both intervention exposure and individual baseline characteristics to future cognitive outcomes.

2.5. Data

We included a range of baseline predictors in the model, spanning demographic, intervention, biochemical, behavioral, and microbiome-related data.
Demographic predictors included age and sex.
Intervention-related information consisted of group assignments in the original Ajisu cohort study: Exercise+ (n = 40), Yogurt+ (n = 20), and Control (n = 40). The Exercise Group was given structured 90-min group sessions conducted once weekly at a local health and welfare center. Participants in the Yogurt Group were instructed to consume one cup (100 g) of yogurt per day. These groups were added as predictors and treated as categorical variables.
Biochemical and metabolic predictors were derived from 55 mL of fasting venous blood collected at baseline. Measurements included general blood biochemistry (liver and kidney function, lipid profile, glucose metabolism, anemia and targeted metabolites), and plasma concentrations of 24 chiral amino acids and Glycine: DL-Alanine (DL-Ala), L-Arginine (L-Arg), DL-Asparagine (DL-Asn), L-Aspartic acid (L-Asp), L-Citrulline (L-Cit), L-Glutamine (L-Gln), L-Glutamate (L-Glu), Glycine (Gly), L-Histidine (L-His), L-Isoleucine (L-Ile), L-Leucine (L-Leu), L-Lysine (L-Lys), L-Methionine (L-Met), L-Ornithine (L-Orn), L-Phenylalanine (L-Phe), DL-Proline (DL-Pro), DL-Serine (DL-Ser), L-Threonine (L-Thr), L-Tryptophan (L-Trp), L-Tyrosine (L-Tyr), and L-Valine (L-Val). These were quantified using chiral tandem liquid chromatography-tandem mass spectrometry (LC-MS/MS).
Cognitive function was evaluated using the standard Mini-Mental State Examination (MMSE) questionnaire at both baseline and 18 months.
Physical activity was recorded from 83 participants using wearable accelerometers (HJA-750C, OMRON Healthcare) worn for one month at the beginning of this study.
Gut microbiome profiles were obtained from fecal samples collected using FS-0007 collection kits (TechnoSuruga Laboratory Co., Ltd., Shizuoka, Japan). DNA libraries were sequenced across four lanes of an Illumina NovaSeq 6000 platform (Illumina, Inc., San Diego, CA, USA) with paired-end 150 bp reads.

2.6. Cognitive Outcome Classification Based on the MMSE

Cognitive status at 18 months was categorized into three groups based on MMSE scores: Normal (MMSE > 26); Possible MCI (24 < MMSE ≤ 26, or a decline of 3 or more points from baseline to the 18-month assessment); and Impaired (MMSE ≤ 23). This classification yielded 73 Normal, 20 Possible MCI, and 5 Impaired cases. These thresholds capture clinically meaningful cutoffs commonly used to characterize cognitive severity in older adults, and were used to define outcome labels for supervised machine learning (Table 2). However, the MMSE alone has limited sensitivity for subtle cognitive changes and there is no universally accepted standard for defining MCI based solely on the MMSE. Accordingly, our outcome categories should be viewed as pragmatic severity strata rather than definitive clinical diagnoses. Because the cohort may already include individuals with mild baseline impairment, the categories also do not necessarily represent incident conversion from entirely normal cognition.
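The labeling rule above can be sketched as a small function. This is only an illustration under stated assumptions, not the authors' code: the function name `mmse_category` is hypothetical, the Impaired cutoff is assumed to take precedence over the decline rule, and a follow-up score of exactly 24 (which the stated thresholds leave unassigned) is assumed to fall into Possible MCI.

```python
def mmse_category(baseline, followup, decline_threshold=3):
    """Assign an 18-month outcome label from baseline and follow-up MMSE scores.

    Assumptions (not stated in the text): Impaired takes precedence over the
    decline rule, and a follow-up score of 24 is treated as Possible MCI.
    """
    if not (0 <= baseline <= 30 and 0 <= followup <= 30):
        raise ValueError("MMSE scores range from 0 to 30")
    if followup <= 23:
        return "Impaired"
    decline = baseline - followup
    if followup <= 26 or decline >= decline_threshold:
        return "Possible MCI"
    return "Normal"


# A participant who stays at 28 is Normal; one who drops from 30 to 27
# is flagged as Possible MCI by the decline rule despite scoring above 26.
print(mmse_category(28, 28))
print(mmse_category(30, 27))
```

Keeping the decline threshold as a parameter makes it easy to check how sensitive the class counts are to that choice.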

2.7. Biological Rationale for Selected Predictors

A range of multimodal predictors, including biochemical markers, chiral amino acids, metabolites, physical activity metrics, gut microbiome alpha diversity, and β-amyloid, was selected to capture complementary biological, behavioral, and pathological pathways that may jointly influence cognitive status. These features were considered suitable for predicting MMSE-based cognitive categories due to their mechanistic relevance to cognitive decline.
Systemic metabolic dysfunction and low-grade inflammation, assessed via routine blood panels (e.g., lipids, glucose, renal function, uric acid, and leukocyte-derived ratios), are linked to amyloid burden and cognitive impairment. Their inclusion enables the identification of vascular, metabolic, and immunologic contributors to cognitive changes.
Targeted plasma concentrations of chiral amino acids and related metabolites provide insights into neurotransmitter pathways (e.g., glutamate/GABA and tryptophan-kynurenine), host microbiome co-metabolism, and oxidative stress. These pathways are directly related to cognitive and mood regulation and provide sensitive molecular signatures for cognitive phenotyping [16,17,18,19].
Physical activity metrics, such as daily steps, metabolic equivalents of task (METs), and exercise volume, are objective measures associated with preserved cognitive function and represent modifiable behavioral signals that complement static clinical measures. Physical activity also interacts with vascular risk, metabolic regulation, and inflammatory processes [20,21,22,23].
Lastly, gut microbiome alpha diversity (e.g., Shannon index) and microbial community composition have been associated with cognitive performance and future cognitive trajectories, providing a gut–brain axis perspective that complements both blood biomarkers and behavioral metrics [24].

2.8. Data Preprocessing

2.8.1. Variable Scaling

All predictor variables were standardized prior to model training to align their distributions and ensure comparability across features with different units and scales. Standardization adjusts each feature to have zero mean and unit variance, which prevents variables with large absolute magnitudes from disproportionately influencing model fitting. This procedure improves numerical stability, facilitates convergence during optimization, and enhances performance for scale-sensitive algorithms by ensuring balanced gradients.

2.8.2. Encoding of Categorical Variables

Categorical variables were encoded using one-hot encoding. This encoding is broadly compatible with tree-based machine learning models, which can inherently handle sparse and non-ordinal inputs without requiring additional normalization.

2.8.3. Missing Data Handling

Missing values were imputed using the median strategy, which is robust to the outliers and skewed distributions common in small, heterogeneous clinical datasets. Unlike the mean or model-based approaches, the median preserves central tendency without being distorted by extreme values, making it a reliable baseline for multimodal data imputation in low-sample settings.
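The three preprocessing steps in Sections 2.8.1 to 2.8.3 can be illustrated with a dependency-free sketch. The function names and toy values below are hypothetical, and the authors' pipeline (presumably built on the scikit-learn tools listed in Section 2.14) may differ in detail. Standardization here uses the population standard deviation, the same convention as scikit-learn's `StandardScaler`.

```python
from statistics import mean, median, pstdev


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a numeric column to zero mean and unit variance."""
    mu = mean(column)
    sd = pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sd for v in column]


def one_hot(labels):
    """Encode a categorical column as one indicator column per category."""
    categories = sorted(set(labels))
    rows = [[1 if lab == c else 0 for c in categories] for lab in labels]
    return categories, rows


# Toy column with one missing value and one extreme value (hypothetical data).
creatinine = [0.7, None, 0.9, 1.1, 4.0]
imputed = impute_median(creatinine)
scaled = standardize(imputed)

groups = ["Exercise+", "Control", "Yogurt+", "Control", "Exercise+"]
categories, encoded = one_hot(groups)
```

In the toy column the extreme value 4.0 pulls the mean up to about 1.68, while the median-based fill value stays at 1.0, which is the robustness property described above.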

2.9. Machine Learning Model Architecture

We developed a two-stage classification model to predict 18-month cognitive outcomes, as assessed by MMSE scores, based on baseline multimodal data. The classification was structured to reflect the clinical progression of cognitive impairment. In the first stage, the model identified cognitively normal participants from those with any abnormality (Possible MCI or Impaired). In the second stage, it further differentiated between Possible MCI and Impaired cases. This stepwise approach reduces misclassification between adjacent cognitive categories and allows each stage to be optimized for its specific clinical trade-offs.
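As a rough illustration of how the two stages partition the cohort, the sketch below derives the Stage 1 and Stage 2 training targets from the three MMSE categories. The helper names are hypothetical; the class counts are those reported in Section 2.6.

```python
from collections import Counter


def stage1_targets(categories):
    """Stage 1: collapse Possible MCI and Impaired into a single Abnormal class."""
    return ["Normal" if c == "Normal" else "Abnormal" for c in categories]


def stage2_subset(categories):
    """Stage 2: keep only the abnormal participants and their original labels."""
    rows = [i for i, c in enumerate(categories) if c != "Normal"]
    return rows, [categories[i] for i in rows]


# Class sizes reported for the 18-month outcome: 73 / 20 / 5.
categories = ["Normal"] * 73 + ["Possible MCI"] * 20 + ["Impaired"] * 5

stage1_counts = Counter(stage1_targets(categories))
rows, stage2_labels = stage2_subset(categories)
stage2_counts = Counter(stage2_labels)

print(stage1_counts)  # 73 Normal vs 25 Abnormal
print(stage2_counts)  # 20 Possible MCI vs 5 Impaired
```

The counts make the imbalance at each stage explicit: roughly 3:1 in Stage 1 and 4:1 in Stage 2, with only 25 participants available for Stage 2.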
The MMSE-derived categories exhibit ordinal structure and clinically subtle boundaries, particularly between Normal and Possible MCI. A staged formulation helps isolate this boundary uncertainty by first separating broad abnormality before learning finer distinctions within a pre-filtered subgroup. Prior studies have shown that addressing class imbalance through resampling or class weighting improves model performance in small clinical datasets with skewed label distributions.
To address the marked class imbalance, particularly the low prevalence of Impaired cases, we employed a Random Forest-based model with integrated resampling and applied class weights in the penalized logistic regression. This approach improves stability and sensitivity to minority classes, especially in small, heterogeneous datasets. In this high-dimensional, low-sample setting, overfitting and instability of model estimates are important concerns, particularly for non-linear models. To mitigate these risks, we combined tree-based ensembles with feature subsampling and class rebalancing, and we used penalized logistic regression as a complementary linear baseline to provide implicit variable selection under Leave-One-Out Cross-Validation.
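The idea behind resampling inside a forest can be sketched as follows: each tree sees a sample in which every class is drawn down to the size of the rarest class. This is a simplified illustration with a hypothetical helper name, not the imbalanced-learn implementation, whose sampling strategy and replacement behavior are configurable.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return row indices for one class-balanced sample.

    Every class is sampled with replacement to the size of the rarest class,
    so a tree trained on these rows sees equal class proportions.
    """
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    n_min = min(len(rows) for rows in by_class.values())
    sample = []
    for rows in by_class.values():
        sample.extend(rng.choices(rows, k=n_min))
    return sample


# Stage 1 class sizes from the text: 73 Normal vs 25 Abnormal.
labels = ["Normal"] * 73 + ["Abnormal"] * 25
rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = balanced_bootstrap(labels, rng)
counts = Counter(labels[i] for i in sample)
print(counts)  # 25 rows from each class
```

Because each tree draws a different balanced sample, the ensemble as a whole still uses most of the majority-class participants even though no single tree does.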

2.10. Validation Strategy

To evaluate model performance under the limited sample size, we applied Leave-One-Out Cross-Validation (LOOCV) for both stages and for both the Random Forest and logistic regression classifiers. In this framework, each iteration trains the model on all but one participant and evaluates it on the held-out individual, cycling through all samples so that every participant serves once as an independent test case. Using the same LOOCV scheme across models ensured directly comparable performance estimates and approximated the intended clinical use case, in which predictions must generalize to previously unseen individuals with similar baseline assessments.
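The LOOCV loop described above can be sketched without any libraries. The nearest-centroid classifier and the one-dimensional toy data are stand-ins chosen purely for brevity (the study used Random Forest and logistic regression), and all names are hypothetical.

```python
def loocv_predictions(X, y, fit, predict):
    """Hold out each sample once, train on the rest, and predict the held-out one."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        preds.append(predict(model, X[i]))
    return preds


def fit_centroids(X, y):
    """Toy classifier: store the mean feature value of each class."""
    sums, counts = {}, {}
    for value, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_centroid(model, value):
    """Assign the class whose stored mean is closest to the value."""
    return min(model, key=lambda label: abs(value - model[label]))


# Hypothetical single-feature data, well separated on purpose.
X = [0.1, 0.3, 0.2, 0.9, 1.1, 1.0]
y = ["Normal", "Normal", "Normal", "Abnormal", "Abnormal", "Abnormal"]

preds = loocv_predictions(X, y, fit_centroids, predict_centroid)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
```

Pooling the held-out predictions into a single list, as done here, is what allows one confusion matrix and one ROC curve to be computed over all participants.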

2.11. Hyperparameter Optimization

Bayesian optimization was used for hyperparameter tuning in both classification stages and for both model families, because it efficiently identifies high-performing configurations with fewer evaluations, which is particularly advantageous under LOOCV, where each evaluation is computationally intensive.
For the Imbalanced-Learn Random Forest, a 1000-iteration search was conducted in Stage 1; LOOCV accuracies converged around 0.77–0.78 while macro-level precision, recall, and F1 fluctuated more, and the final model used entropy splitting with max_depth = 19 and max_features = sqrt. In Stage 2, the Random Forest search converged within approximately 32 iterations, with optimization-stage LOOCV accuracy exceeding 0.96 and precision and recall approaching 0.92–0.98, leading to the selection of an ensemble (max_depth = 17, n_estimators = 256) that evaluates all features at each split. During these searches we relied on the internal resampling strategy of the BalancedRandomForestClassifier and did not specify additional class weights, allowing the algorithm’s built-in balancing to handle label skew while hyperparameters were tuned.
For the logistic regression baselines, separate Bayesian searches were run in each stage over regularization strength (C), penalty type, and class-weighting schemes. In Stage 1 this yielded an L1 (Lasso) penalized model with moderate regularization (C ≈ 1.48) and no explicit class weighting; in Stage 2 it yielded a weakly regularized L1 (Lasso) model (C ≈ 634.35) with manual up-weighting of the minority Impaired class (class_weight = {Impaired: 3.0, Possible MCI: 1.0}).
Hyperparameter search and performance estimation were both conducted within the same LOOCV framework, without additional nested cross-validation or bootstrap resampling, because further partitioning or extensive resampling of the data would have produced extremely small and unstable training folds, particularly for the small classes. The resulting LOOCV metrics therefore represent internally optimized, exploratory estimates of model performance rather than fully unbiased external validation. Optimization trajectories and the final tuned parameters for both Random Forest and logistic regression models are summarized in Table 3.
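To make the role of the regularization strength C and the class weights concrete, the objective of an L1-penalized, class-weighted logistic regression can be written as below. This assumes the scikit-learn parameterization (scikit-learn is listed in Section 2.14, but the text does not spell out the objective), with labels coded as $y_i \in \{-1, +1\}$ and $c_{y_i}$ denoting the weight of the class of participant $i$.

```latex
\min_{w,\,b}\;\; \lVert w \rVert_1 \;+\; C \sum_{i=1}^{n} c_{y_i}\,
\log\!\Bigl(1 + \exp\bigl(-y_i\,(x_i^{\top} w + b)\bigr)\Bigr)
```

Under this form, C multiplies the data-fit term, so a large value such as C ≈ 634.35 makes the penalty comparatively weak, while C ≈ 1.48 gives the L1 term more influence and drives more coefficients to exactly zero. Setting $c_{\text{Impaired}} = 3.0$ and $c_{\text{Possible MCI}} = 1.0$ makes each misclassified Impaired participant count three times as much in the loss.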

2.12. Feature Attribution via SHAP Analysis

To interpret model predictions and identify contributing features, we applied SHapley Additive exPlanations (SHAP) in both classification stages. For the Imbalanced-Learn Random Forest models, we used TreeExplainer, and for the logistic regression baselines we used LinearExplainer, allowing SHAP values to be computed in a manner tailored to each model class while retaining a common additive framework. SHAP provides feature-wise attribution scores that quantify the contribution of each input variable to individual predictions, supporting both local interpretability at the participant level and global feature importance ranking across the cohort.
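For a linear model, SHAP values have a closed form when features are treated as independent: each attribution is the coefficient times the feature's deviation from its background mean, expressed in log-odds units for logistic regression. The sketch below illustrates that form and the mean-absolute ranking used for global importance. The feature names, coefficients, and data are hypothetical, and this is not a call into the SHAP library itself.

```python
def linear_shap_values(coefs, x, background_means):
    """Attribution per feature for a linear model, assuming independent features."""
    return [b * (xi - mu) for b, xi, mu in zip(coefs, x, background_means)]


def linear_score(coefs, intercept, x):
    """Linear predictor (log-odds for a logistic regression)."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def global_importance(phi_rows, names):
    """Rank features by mean absolute attribution across participants."""
    n = len(phi_rows)
    mean_abs = [sum(abs(row[j]) for row in phi_rows) / n for j in range(len(names))]
    return sorted(zip(names, mean_abs), key=lambda pair: pair[1], reverse=True)


# Hypothetical standardized features and coefficients (one zeroed by the L1 penalty).
names = ["feature_a", "feature_b", "feature_c"]
coefs = [0.8, -0.5, 0.0]
intercept = -0.2
X = [[1.0, 2.0, 0.5], [0.0, 1.0, 1.5], [2.0, 0.0, 1.0]]

background_means = [sum(col) / len(col) for col in zip(*X)]
phi_rows = [linear_shap_values(coefs, row, background_means) for row in X]
ranking = global_importance(phi_rows, names)
```

The additive property holds by construction: for every participant, the attributions sum to the difference between that participant's score and the score at the background mean. A feature whose coefficient was shrunk to zero receives zero attribution and drops to the bottom of the ranking.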

2.13. Post Hoc Group Comparison

We conducted Games–Howell tests to compare baseline features across MMSE-defined cognitive categories (Normal, Possible MCI, Impaired), as a post hoc statistical complement to the model-derived feature importance. This test is appropriate for unequal variances and unbalanced sample sizes, providing more reliable Type-I error control than pooled-variance methods in heterogeneous clinical data.
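For reference, the Games–Howell procedure compares groups $i$ and $j$ using a Welch-type statistic with Welch–Satterthwaite degrees of freedom, where $\bar{x}$, $s^2$, and $n$ denote each group's sample mean, sample variance, and size:

```latex
t_{ij} = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{\dfrac{s_i^2}{n_i} + \dfrac{s_j^2}{n_j}}},
\qquad
\nu_{ij} = \frac{\left(\dfrac{s_i^2}{n_i} + \dfrac{s_j^2}{n_j}\right)^{2}}
{\dfrac{\left(s_i^2/n_i\right)^{2}}{n_i - 1} + \dfrac{\left(s_j^2/n_j\right)^{2}}{n_j - 1}}
```

The p-value is obtained from the studentized range distribution for $k$ groups and $\nu_{ij}$ degrees of freedom, evaluated at $q = |t_{ij}|\sqrt{2}$. Because the variances are never pooled, the standard error for comparisons involving the Impaired group ($n = 5$) is dominated by that group's own variance, and the corresponding degrees of freedom are small.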

2.14. Software Packages

All analyses were performed using the Python programming language and open-source Python packages. The software environment consisted of Python version 3.10 and the following major libraries (with versions), which were primarily used in the analyses: Pandas 2.2.2, NumPy 1.26.4, Matplotlib 3.5.1, Seaborn 0.13.2, Pingouin 0.5.5, Statsmodels 0.14.0, Scikit-bio 0.6.0, Scikit-learn 1.6.1, Scikit-optimize 0.10.2, SciPy 1.13.0, and SHAP 0.46.0.

3. Results

3.1. Stage 1 Performance: Normal vs. Abnormal

Stage 1 demonstrated moderate discriminative performance in separating Normal participants from those with an Abnormal cognitive status (Possible MCI and Impaired) in both the Imbalanced-Learn Random Forest and logistic regression models under Leave-One-Out Cross-Validation (LOOCV). The optimized Random Forest achieved an overall accuracy of 71.43% with a ROC-AUC of 0.72, while the tuned penalized logistic regression baseline reached a higher accuracy of 76.53% with a comparable ROC-AUC of 0.71, suggesting that the underlying signal does not depend strongly on model class.
For the Random Forest, the confusion matrix showed high specificity (83.56%) but limited sensitivity (36.0%) for detecting Abnormal cognition, yielding substantially stronger performance for the Normal class (precision = 79.22%, recall = 83.56%, F1 = 81.33%) than for the Abnormal class (precision = 42.86%, recall = 36.0%, F1 = 39.13%). Logistic regression improved Abnormal detection while maintaining similar specificity: sensitivity increased to 48.0%, with class metrics of precision = 84.93%, recall = 84.93%, F1 = 84.93% for Normal and precision = 52.17%, recall = 48.0%, F1 = 50.00% for Abnormal, reflecting a slightly more balanced screening profile. Overall, both models provided fair discrimination at the screening stage, with logistic regression offering modest gains in sensitivity and macro-averaged performance while Random Forest retained a marginally higher AUC (Figure 1; Table 4).
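The Stage 1 Random Forest metrics can be reproduced from a 2 × 2 confusion matrix. The cell counts below are not given in the text: they are inferred here from the reported percentages and the class sizes (25 Abnormal, 73 Normal), so treat them as an assumption. The function name is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics for the positive class of a 2 x 2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # recall of the positive class
        "specificity": tn / (tn + fp),   # recall of the negative class
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Inferred counts with Abnormal as the positive class:
# 9 of 25 Abnormal detected, 61 of 73 Normal correctly retained.
stage1_rf = binary_metrics(tp=9, fn=16, fp=12, tn=61)
percent = {name: round(100 * value, 2) for name, value in stage1_rf.items()}
print(percent)
# accuracy 71.43, sensitivity 36.0, specificity 83.56, precision 42.86, f1 39.13
```

Seen as counts, the limited sensitivity means 16 of the 25 participants with an abnormal outcome were missed at the screening stage.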

3.2. Stage 2 Performance: Possible MCI vs. Impaired

Stage 2 demonstrated excellent discriminative performance in distinguishing Possible MCI from Impaired cognitive states in both modeling approaches under LOOCV. The optimized Imbalanced-Learn Random Forest achieved an accuracy of 96.0% with a ROC-AUC of 0.95, while the tuned L1-regularized logistic regression baseline attained an accuracy of 92.0% and a ROC-AUC of 0.82, indicating that severity separation remained strong even in a simpler linear model despite the small Impaired subgroup.
The Random Forest model achieved near-perfect classification, correctly identifying all Impaired cases (recall = 100%) and 95% of Possible MCI cases, with class-specific metrics of precision = 83.33%, recall = 100%, F1 = 90.91% for Impaired and precision = 100%, recall = 95%, F1 = 97.44% for Possible MCI. Logistic regression preserved high recall for Possible MCI (95%) but lower recall for Impaired (80%), with corresponding class metrics of precision = 80.0%, recall = 80.0%, F1 = 80.0% for Impaired and precision = 95.0%, recall = 95.0%, F1 = 95.0% for Possible MCI, reflecting slightly less sharp separation yet still strong severity discrimination (Figure 2; Table 5). With only 5 Impaired participants, each misclassified Impaired case shifts recall for that class by 20 percentage points.

3.3. Feature Importance Based on SHAP Analysis

SHapley Additive exPlanations (SHAP) were applied to the optimized models in both stages to quantify the contribution of individual baseline features to classification outcomes. Feature importance was summarized using SHAP beeswarm plots, with the top 20 features ranked by mean absolute SHAP values for each stage. For the Imbalanced-Learn Random Forest we used TreeExplainer, and for logistic regression we used LinearExplainer, which allowed us to compare feature attribution patterns across non-linear and linear classifiers within a common additive framework (Figure 3 and Figure 4; Table 6).
Stage 1: In the Random Forest model, the most influential features included creatinine (CRE), uric acid (UA), and exercise-related fluctuation metrics, followed by other physical activity measures and amino acid–related variables. The top-ranked features were CRE, UA, EX fluctuation coefficient 1, and number of steps per minute, with additional contributors such as D-serine (D-Ser), D-amino acid proportions (Ser(D/(D + L) × 100), Ala(D/(D + L) × 100)), L-asparagine (L-Asn), cysteine, alanine (Ala), L-glutamic acid (L-Glu), L-methionine (L-Met), adenosine monophosphate, hemoglobin (HGB), red cell distribution width–coefficient of variation (RDW-CV), EX, Activity(Ex), walking time, total steps, and steps per minute. The logistic regression model highlighted a partially overlapping but distinct set of top features, including threonine, number of steps per minute, D-amino acid proportions, aspartic acid, L-phenylalanine, L-citrulline, glutamic acid, albumin (ALB), mean corpuscular hemoglobin (MCH), LDL-cholesterol, D-asparagine, glucose (GLU), height, glyoxylic acid, methionine sulfoxide, and the Simpson index of gut microbiome diversity, indicating that amino acid, lipid, and microbiome-related measures also contribute to linear separation of Normal vs. Abnormal cognition. Together, these analyses show that both models converge on renal function, systemic metabolism, amino acid balance, hematologic indices, and activity variability as key determinants of Stage 1 predictions (Figure 3).
Stage 2: A different pattern of feature importance was observed when distinguishing Possible MCI from Impaired cognitive states. In the Random Forest model, the highest-ranked features were glucose (GLU), albumin (ALB), uric acid, and uridine, followed by choline and carnitine, emphasizing energy metabolism, nutritional status, and nucleotide/choline pathways. Several wearable-derived activity fluctuation metrics, including 3 METs or more fluctuation coefficient 5 and 4 METs or more fluctuation coefficient 5, were also ranked among the top features, indicating that high-intensity activity variability carries information about severity. Within the top 20 Random Forest features, multiple amino acids and related metabolites were identified, including methionine sulfoxide, methionine, phenylalanine, proline, ornithine, tryptophan, threonine, 4-aminobutyric acid, and 2-aminobutyric acid, as well as hematological parameters such as lymphocyte percentage (LYMPH%) and monocyte count (MONO#), and niacinamide. The logistic regression Stage 2 model selected complementary predictors, such as EX fluctuation coefficient 2 and 3, measured height, total protein (TP), L-tryptophan, L-asparagine, red blood cell count, creatinine, adenosine monophosphate, 8 METs or higher activity, MCH, age, choline, L-histidine, 4-hydroxyproline, 2 METs-level activity, D-serine, and ergothioneine. Collectively, these SHAP results indicate that both non-linear and linear models attribute Stage 2 discrimination to a coherent multimodal signature spanning glucose and protein metabolism, amino acid and redox pathways, hematologic indices, and activity-pattern variability (Figure 4).
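The ranking step described above (mean absolute SHAP value per feature, top features kept) can be sketched in a few lines. The sketch below is illustrative only and does not use the SHAP library. It relies on the closed-form attribution for a linear model under an assumed feature-independence condition, where each attribution is the coefficient times the feature's deviation from its mean, expressed in the model's margin (log-odds) space. The tree-based attributions used for the Random Forest are not reproduced here. All names, weights, and data values are hypothetical.

```python
from statistics import fmean


def linear_shap_values(weights, X):
    """Per-sample attributions for a linear model, assuming independent features.

    phi[i][j] = w_j * (x_ij - mean_j), in the model's margin (log-odds) space.
    """
    n_features = len(weights)
    means = [fmean(row[j] for row in X) for j in range(n_features)]
    return [
        [weights[j] * (row[j] - means[j]) for j in range(n_features)]
        for row in X
    ]


def rank_by_mean_abs_shap(shap_values, feature_names, top_k=20):
    """Return the top_k (feature, mean |SHAP|) pairs, largest first."""
    n_samples = len(shap_values)
    scores = [
        (name, sum(abs(row[j]) for row in shap_values) / n_samples)
        for j, name in enumerate(feature_names)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical toy inputs (not study data): three features, three participants.
feature_names = ["CRE", "UA", "steps_per_minute"]
weights = [0.8, -0.5, 0.1]
X = [[1.0, 2.0, 3.0], [2.0, 0.0, 3.0], [3.0, 4.0, 0.0]]

phi = linear_shap_values(weights, X)
ranking = rank_by_mean_abs_shap(phi, feature_names, top_k=2)
print(ranking)
```

Because the attributions are additive, each row of `phi` sums to that participant's deviation from the average model output, which is what allows linear and tree-based attributions to be compared on a common scale.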

3.4. Post Hoc Statistical Validation

Games–Howell tests were performed to identify pairwise differences between MMSE-defined cognitive categories (Normal, Possible MCI, and Impaired) across multimodal variables, accounting for unequal variances and unbalanced sample sizes.
Several standard clinical biomarkers exhibited significant group differences. Creatine kinase (CK) differed significantly in both the Normal vs. Impaired (p < 0.001) and Possible MCI vs. Impaired (p = 0.01) comparisons. Total protein (TP), albumin (ALB), creatinine (CRE), red blood cell count (RBC), and mean corpuscular volume (MCV) differed significantly between the Normal and Possible MCI groups (p < 0.05).
Amino acid-related measures also showed cognitive category-dependent differences. L-asparagine (L-Asn) and L-threonine (L-Thr) showed significant differences between Normal and Possible MCI (p < 0.05), while glycine exhibited significant contrasts in both Normal vs. Impaired (p < 0.01) and Possible MCI vs. Impaired (p < 0.05) comparisons.
Several metabolites displayed distinct pairwise differences across cognitive categories. Carnitine levels differed significantly between Normal and Possible MCI (p < 0.05). Creatinine showed multiple significant contrasts: Normal vs. Impaired (p < 0.05) and Possible MCI vs. Impaired (p < 0.01). Kynurenine, a tryptophan metabolite linked to neuroinflammation, differed significantly between Possible MCI and Impaired (p < 0.05).
Physical activity parameters showed significant group differences, primarily between the Normal and Possible MCI categories. SMETS/METs (metabolic equivalents of task) differed significantly between Normal and Impaired (p < 0.05). In addition, total walking exercise, steps per minute, and exercise fluctuation coefficients (EX1, EX2, EX4, and EX5) showed significant differences in the Normal vs. Possible MCI comparison (p < 0.05). These differences are visualized in Figure 5. Detailed results of the Games–Howell analysis are summarized in Table 7.
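For readers unfamiliar with the Games–Howell procedure, the following minimal sketch shows the quantities it computes for each pair of groups: the Welch-type standard error, the Welch–Satterthwaite degrees of freedom, and the studentized range statistic q. It is an illustration under stated assumptions, not the analysis code used in this study. The group values are hypothetical, and the p-value, which requires the studentized range distribution, is deliberately not computed.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, variance


def games_howell_stats(groups):
    """Pairwise Games-Howell test statistics for a dict of {label: values}.

    Uses unpooled (per-group) sample variances, so unequal variances and
    unequal group sizes are handled without a homogeneity assumption.
    """
    results = {}
    for (a, xa), (b, xb) in combinations(groups.items(), 2):
        na, nb = len(xa), len(xb)
        va = variance(xa) / na  # squared standard error of mean, group a
        vb = variance(xb) / nb  # squared standard error of mean, group b
        diff = fmean(xa) - fmean(xb)
        t = diff / sqrt(va + vb)
        # Welch-Satterthwaite approximation to the degrees of freedom
        df = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
        results[(a, b)] = {"diff": diff, "t": t, "q": abs(t) * sqrt(2), "df": df}
    return results


# Hypothetical toy values (not study data), with unequal sizes and variances.
toy_groups = {
    "Normal": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Possible MCI": [2.0, 4.0, 6.0],
    "Impaired": [5.0, 7.0, 9.0, 11.0],
}
gh = games_howell_stats(toy_groups)
for pair, stats in gh.items():
    print(pair, {k: round(v, 3) for k, v in stats.items()})
```

The degrees of freedom fall between the smaller group's n − 1 and the pooled n₁ + n₂ − 2, which is why comparisons involving a very small group have limited power.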

4. Discussion

4.1. Principal Findings and Rationale of the Two-Stage Framework

This study developed and evaluated a two-stage, explainable machine learning framework to predict cognitive status at 18 months using baseline multimodal data from a community-dwelling elderly cohort. Across both the Imbalanced-Learn Random Forest and a penalized logistic regression baseline, the models consistently indicated that combinations of renal and systemic metabolic markers, amino acid and redox-related metabolites, and wearable-derived physical activity features carry informative signals about cognitive abnormality and severity. Structuring prediction into two sequential stages, first separating cognitively Normal individuals from those with any abnormality (Possible MCI or Impaired), which already represents a generalized binary classifier, and then distinguishing Possible MCI from Impaired, aligns the modeling strategy with the clinical progression of cognitive decline and allows stage-specific biomarker patterns to emerge. It is also important to note that our models were trained in a cohort defined by baseline MMSE ≥ 24, a criterion that allows the inclusion of individuals with possible mild impairment at enrollment. As a result, the classifiers primarily capture associations between baseline multimodal profiles and MMSE-defined severity status at 18 months, on top of pre-existing variability in cognitive function, rather than predicting purely de novo onset of impairment from a uniformly normal baseline. Accordingly, our findings should be interpreted in terms of stage-dependent differences in 18-month MMSE severity, not as estimates of strict conversion rates from normal cognition to MCI or dementia.
In Stage 1, both models achieved moderate discrimination: Random Forest provided slightly higher ROC-AUC values, whereas logistic regression achieved somewhat higher accuracy and sensitivity, suggesting that the underlying signal is robust to different modeling assumptions. In Stage 2, both approaches revealed strong separability between Possible MCI and Impaired cognition, with Random Forest achieving very high accuracy and AUC and logistic regression still performing well despite the small Impaired subgroup. SHAP analyses applied separately to the Random Forest and logistic regression models provided convergent evidence that the same broad biomarker groups (renal and metabolic indices, amino acid and oxidative-stress markers, and activity-variability measures) drive predictions at each stage, supporting an integrative view of cognitive aging as a multisystem process.
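The sequential structure of the two-stage framework can be illustrated with a short routing sketch. It assumes each stage exposes a probability-like score and that a 0.5 decision threshold is applied. Both are assumptions made for illustration, and the two toy scoring functions are hypothetical stand-ins with arbitrary cut-offs that carry no clinical meaning.

```python
def two_stage_predict(features, stage1_prob, stage2_prob, threshold=0.5):
    """Route one participant through two sequential binary classifiers.

    stage1_prob(features) -> score for any abnormality (Possible MCI or Impaired)
    stage2_prob(features) -> score for Impaired, given an abnormal Stage 1 result
    """
    if stage1_prob(features) < threshold:
        return "Normal"
    if stage2_prob(features) < threshold:
        return "Possible MCI"
    return "Impaired"


# Hypothetical stand-ins for fitted models; the cut-offs below are arbitrary.
def toy_stage1(features):
    return 0.9 if features["CRE"] > 1.0 else 0.1


def toy_stage2(features):
    return 0.8 if features["GLU"] > 120 else 0.2


# Hypothetical participants (not study data).
participants = [
    {"CRE": 0.8, "GLU": 95},
    {"CRE": 1.2, "GLU": 100},
    {"CRE": 1.3, "GLU": 140},
]
labels = [two_stage_predict(p, toy_stage1, toy_stage2) for p in participants]
print(labels)
```

In this arrangement Stage 2 is only consulted for participants flagged as abnormal in Stage 1, so errors in Stage 1 propagate to the final three-way label.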

4.2. Biological Signatures in Stage 1

Stage 1 of our framework, which differentiated Normal participants from those with Possible MCI or Impaired status, was primarily influenced by markers of renal function, nitrogen–purine metabolism, and systemic metabolic status in both modeling approaches. Uric acid, creatinine, and blood urea nitrogen emerged as the most influential predictors, indicating that kidney-related metabolic processes offer valuable insights into early cognitive abnormalities. The tuned logistic regression model likewise assigned substantial weight to creatinine. These routinely measured clinical markers may reflect the combined effects of vascular burden, metabolic homeostasis, and systemic clearance mechanisms that might be relevant to brain aging [25,26].
In Stage 1, amino acid-related signatures also ranked among the top contributors. Multiple chiral and proteinogenic amino acid measures, including D-serine, D-amino acid proportions, L-asparagine, alanine, and L-glutamic acid, together with adenosine monophosphate and cysteine, point toward variation in neurotransmission-relevant pools, host–microbiome co-metabolism, and antioxidant capacity. Logistic regression emphasized a partially overlapping subset, including threonine, glutamic acid, tryptophan, and methionine sulfoxide, reinforcing the importance of amino acid and redox pathways under a linear modeling assumption as well. These amino acid features can be interpreted as reflecting broader systemic metabolic and neurotransmission-relevant variation already captured by the multimodal panel, rather than being specific markers of established neurodegeneration at the screening stage [27,28]. This interpretation is consistent with prior reports indicating that circulating amino acid profiles, including alanine/asparagine panels and D-amino acid proportions, differ across the cognitive spectrum and may relate to cognitive phenotypes in a domain-dependent manner [20].
Redox- and energy-associated metabolites also contributed to Stage 1 predictions. Cysteine, a key substrate for glutathione synthesis, was an important feature, consistent with its role in maintaining antioxidant defenses and cellular redox balance [29,30,31]. Adenosine monophosphate was another influential predictor, suggesting a link between peripheral energy-state signaling and early cognitive vulnerability. These findings indicate that Stage 1 reflects a range of systemic metabolic signals rather than markers specific to established neurodegeneration [32].
In addition, the Simpson index of gut microbiome diversity appeared among the influential predictors in the Stage 1 logistic regression model, suggesting that overall microbial community structure may also contribute to baseline differences between cognitively Normal and Abnormal participants in this cohort.

4.3. Physical Activity Variability in Both Stages

Wearable-derived physical activity metrics also contributed significantly to the predictive signal across both stages, particularly those capturing variability in activity patterns, highlighting their relevance to cognitive status. In Stage 1, the Random Forest model ranked exercise fluctuation coefficients, steps per minute, total steps, and walking time among the top features, while logistic regression also selected steps per minute and MET-based indicators within its most informative variables. In Stage 2, high-MET fluctuation coefficients (e.g., 3 METs or more and 4 METs or more) remained influential in the Random Forest SHAP results, and logistic regression highlighted additional activity-related variables such as EX fluctuation coefficients and MET-derived categories.
The prominence of activity variability suggests that irregular daily movement patterns may provide an early functional signature of instability associated with cognitive decline, complementing static biochemical measures [33,34]. These wearable-derived features are objective, non-invasive, and scalable, and their consistent appearance among the leading predictors in both models underscores the value of integrating behavioral data with clinical and metabolomic biomarkers for community-based cognitive risk assessment.
These findings are consistent with previous research linking accelerometry-derived features to cognitive function [35].

4.4. Biological Signatures in Stage 2

In Stage 2, which distinguished Possible MCI from Impaired cognitive conditions, the machine learning classifier and logistic regression baseline both indicated a distinct pattern of key features compared with Stage 1. In the Random Forest model, glucose and albumin, together with uric acid and uridine, emerged as important predictors, highlighting the relevance of glycemic control, protein and nutritional status, and nucleotide-related metabolism to more advanced impairment.
Uridine and uric acid emerged as important markers in Stage 2, indicating a shift toward nucleotide-related metabolism in advanced impairment [36,37,38]. Glutamic acid was also highly ranked, underscoring its role in excitatory neurotransmission and synaptic function [39,40]. Choline also contributed to the classification, linking cholinergic pathways to cognitive processes commonly examined in dementia research [41,42].
Markers of mitochondrial and energy metabolism were also prominent. Carnitine, involved in fatty acid transport and mitochondrial energy production, and niacinamide, a precursor in NAD+ metabolism, contributed to severity classification. Oxidative stress-related metabolites, such as methionine and methionine sulfoxide, further supported the role of redox imbalance in advanced cognitive impairment. Together, these features suggest that the transition from Possible MCI to Impaired status involves coordinated changes in energy metabolism, neurotransmitter balance, and oxidative stress pathways [43,44]. Logistic regression converged on related pathways, emphasizing tryptophan, asparagine, histidine, 4-hydroxyproline, adenosine monophosphate, and ergothioneine among its top features.
Across both models, amino acid profiles, including phenylalanine, proline, ornithine, tryptophan, threonine, and short-chain amino acids, recurrently appeared, suggesting that coordinated shifts in amino acid metabolism, neurotransmitter precursors, and inflammatory or nitrogen-handling pathways accompany the transition from Possible MCI to more clearly Impaired status [45,46,47,48].

4.5. Integrative Interpretation and Clinical Implications

Overall, findings from both stages support a stage-dependent interpretation of cognitive aging as a systems-level transition captured by multimodal biomarkers. The Normal vs. Abnormal classification (Stage 1) is linked to broad metabolic and behavioral vulnerabilities, reflected in renal and systemic metabolic markers, amino acid balance, and activity variability, whereas the Possible MCI vs. Impaired classification (Stage 2) is associated with more pronounced alterations in energy metabolism, nucleotide and choline pathways, oxidative stress, and behavioral irregularity. The convergence between the Imbalanced-Learn Random Forest and logistic regression models, together with SHAP-based explanations, strengthens confidence that these biomarker groups represent persistent signals in the data rather than model-specific artefacts. Another important consideration is the balance between model complexity and sample size. The multimodal predictor space in this study is large compared with the number of participants, which increases the likelihood of overfitting and the sensitivity of feature rankings to small perturbations in the data. Although the use of LOOCV, regularized logistic regression, and ensemble methods helps to stabilize estimates, the specific feature sets highlighted by SHAP should be regarded as one plausible model-based representation of the signal in this dataset rather than as a definitive, stable list of biomarkers.
While larger cohorts, longer follow-up, and broader cognitive batteries will be needed to validate and refine these signatures, identifying feature contributions through SHAP analysis in hierarchical stages offers transparency that is vital for clinical insight and hypothesis generation. Using accessible biomarkers, such as routine blood tests and wearable-derived measures, increases the feasibility of applying similar frameworks in real-world settings.
The excellent Stage 2 performance metrics should be interpreted with particular caution given the small size of the Impaired subgroup (n = 5), which substantially increases the risk of overfitting despite the use of LOOCV and imbalanced-learning strategies. We acknowledge that overfitting cannot be definitively ruled out, and validation in larger, independent cohorts remains essential. However, when interpreted as exploratory trends rather than definitive quantitative findings, these results offer meaningful methodological insights. The convergence of biomarker signatures across two independent modeling approaches (Random Forest and Lasso logistic regression), the biological plausibility of identified pathways (renal and systemic metabolic indices, glucose metabolism, oxidative stress, amino acid dysregulation), and supporting evidence from Games–Howell post hoc comparisons collectively suggest that observed patterns may reflect genuine biological signals worthy of further investigation. The primary contribution of this work lies in demonstrating a two-stage, explainable AI framework that integrates minimally invasive multimodal data for stage-aware cognitive assessment, serving as a methodological template for hypothesis-driven validation studies rather than providing immediately actionable clinical biomarkers.
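The caution about the small Impaired subgroup can be made concrete with a worked interval. The sketch below computes a Wilson score interval for a proportion estimated from five cases. The count of correct classifications passed in (5 of 5) is a hypothetical best case chosen for illustration and is not a result taken from this study.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical best case: all 5 members of a 5-person subgroup classified correctly.
low, high = wilson_interval(5, 5)
print(f"approximate 95% interval: {low:.3f} to {high:.3f}")
```

Even in this best case the lower bound is only about 0.57, so a perfect observed sensitivity in five participants remains compatible with substantially lower true performance.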

5. Limitations of This Study

Several limitations should be considered when interpreting these results. This study was conducted as a secondary, exploratory analysis of a cohort originally designed for intervention evaluation in non-randomized subjects. The sample size was small and the classes were imbalanced; in particular, the Impaired group contained very few participants. Although Leave-One-Out Cross-Validation, imbalanced-learning, and penalized regression techniques were employed, the results should be viewed as hypothesis-generating rather than confirmatory.
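As a reminder of what Leave-One-Out Cross-Validation does with a small, imbalanced sample, the sketch below holds out each participant in turn, refits on the rest, and predicts the held-out case. A nearest-centroid rule is swapped in purely to keep the example self-contained. It is not one of the classifiers used in this study, and the one-dimensional data are hypothetical.

```python
from statistics import fmean


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared distance)."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]

    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda label: dist2(centroids[label]))


def loocv_predictions(X, y, predict):
    """Hold out each sample once, fit on the remaining n - 1, predict the held-out one."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(predict(train_X, train_y, X[i]))
    return predictions


# Hypothetical one-feature toy data (not study data) with a small minority class.
X = [[0.0], [0.2], [0.1], [1.0], [1.2]]
y = ["Possible MCI", "Possible MCI", "Possible MCI", "Impaired", "Impaired"]

preds = loocv_predictions(X, y, nearest_centroid_predict)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(preds, accuracy)
```

Each time a minority-class participant is held out, the model is trained on one fewer example of that class, so with very small subgroups the per-fold fits can differ noticeably from one another.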
Future studies should verify these outcomes in larger, independent cohorts with more balanced cognitive categories. Longitudinal analyses of biomarker changes over time may further clarify their roles in cognitive trajectories. External validation and integration with interventional studies are also needed to assess generalizability and clinical impact. Despite these limitations, this work offers a structured, interpretable framework for multimodal biomarker discovery in cognitive aging.

6. Conclusions

This study offers a two-stage, explainable machine learning framework to predict cognitive status at 18 months using baseline multimodal data. Advanced impairment is characterized by neurochemical, energetic, and oxidative metabolic changes. Key predictors include routine blood chemistry, targeted metabolomics, and variability in wearable-derived physical activity. Their contributions were interpreted using SHAP analysis.
By aligning predictive modeling with clinically meaningful stages, this framework delivers a systems-level perspective on cognitive aging that goes beyond single-domain biomarkers. While exploratory and limited by sample size, the use of accessible and minimally invasive measures supports scalable cognitive risk screening and early intervention planning. Validation in larger, independent cohorts will be necessary to confirm the generalizability of these findings and to identify a more parsimonious set of predictors with clinical utility.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/aimed1020012/s1, Figure S1: Distribution of MMSE scores at baseline (0 month) and endpoint (18 month) across intervention groups, Figure S2: Trends in mean MMSE scores over the study period (0, 6, 12, and 18 months), plotted by group, Figure S3: Distribution Plot of Top 20 parameters from SHAP analysis of Stage 1 Random Forest model according to MMSE Categories, Figure S4: Distribution Plot of Top 20 parameters from SHAP analysis of Stage 2 Random Forest model according to MMSE Categories; Table S1: Statistical Analysis of Intervention groups over 4 time point according to MMSE score, Table S2: List of Training Parameters.

Author Contributions

Conceptualization: Y.A. and A.H.M.F.; Methodology: A.H.M.F., Y.A., T.A. and M.N.; Formal analysis: A.H.M.F. and Y.A.; Investigation: A.H.M.F. and Y.A.; Data curation: A.H.M.F., R.H., N.Y., Y.A. and K.D.; Writing—original draft preparation: A.H.M.F. and Y.A.; Writing—review and editing: A.H.M.F., M.N., T.A., T.H. (Takahide Hayano), M.H.M., R.H., N.Y., Y.H., Y.I., Y.U., R.Y., R.K., H.T., M.M., F.H., T.Y., K.K., T.H. (Tsunahiko Hirano), K.O., K.D., K.M., T.T. and Y.A.; Visualization: A.H.M.F., Y.A.; Supervision: Y.A., M.N., M.H.M., N.Y., R.H. and T.T.; Project administration: R.H., N.Y., F.H., T.Y., K.D., T.H. (Tsunahiko Hirano), K.O. and T.T.; Funding acquisition: Y.H., Y.I., Y.U., R.Y., R.K., H.T. and M.M. contributed through the provision of specialized resources, technical expertise, and advisory input. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a joint project funded by Yamaguchi Prefecture, Yamaguchi City, Shimadzu Corporation, Kao Corporation, and Kyodo Milk Industry Co., Ltd.

Institutional Review Board Statement

This study was conducted in accordance with the principles of the Declaration of Helsinki, and the study protocol was approved by the institutional review board of Yamaguchi University (2020-183-3).

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

Y.H., Y.I., Y.U., and R.Y. are employees of Shimadzu Corporation; R.K. and H.T. are employees of Kao Corporation; and M.M. is an employee of Kyodo Milk Industry Co., Ltd. The funders had no role in the design of this study; in the analysis or interpretation of the data; in the preparation of the original draft of the manuscript; or in the decision to publish. The authors declare no financial conflicts of interest related to the results or interpretation of the reported findings.

Abbreviations

The following abbreviations are used in this manuscript:
AD: Alzheimer’s disease
AI: Artificial intelligence
ALB: Albumin
Asn: Asparagine
ATP: Adenosine triphosphate
AUC: Area under the ROC curve
BASO: Basophils
BMI: Body mass index
BO: Bayesian optimization
BUN: Blood urea nitrogen
CK: Creatine kinase
CRE: Creatinine
CSF: Cerebrospinal fluid
CTP: Cytidine triphosphate
Ala: Alanine
Pro: Proline
Ser: Serine
DNA: Deoxyribonucleic acid
ECM: Extracellular matrix
EI: Expected improvement
EO: Eosinophils
EX: Wearable-derived activity feature label used in this study
GABA: Gamma-aminobutyric acid
GLU: Glucose
Gly: Glycine
HbA1c: Hemoglobin A1c
HCT: Hematocrit
HGB: Hemoglobin
HJA-750C: Wearable accelerometer model used for physical activity recording
Arg: Arginine
Asp: Aspartic acid
Cit: Citrulline
Gln: Glutamine
His: Histidine
Ile: Isoleucine
Leu: Leucine
Lys: Lysine
Met: Methionine
Orn: Ornithine
Phe: Phenylalanine
Thr: Threonine
Trp: Tryptophan
Tyr: Tyrosine
Val: Valine
LC-MS/MS (LC-MSMS): Liquid chromatography–tandem mass spectrometry
LBD: Lewy body dementia
LOOCV: Leave-One-Out Cross-Validation
LYMPH: Lymphocytes
LR: Logistic regression
MCH: Mean corpuscular hemoglobin
MCHC: Mean corpuscular hemoglobin concentration
MCI: Mild cognitive impairment
MCV: Mean corpuscular volume
MET/METs: Metabolic equivalent(s) of task
ML: Machine learning
MMSE: Mini-Mental State Examination
MONOs: Monocytes
MPV: Mean platelet volume
NAD: Nicotinamide adenine dinucleotide
NEUT: Neutrophils
NIH: National Institutes of Health
NMDA: N-methyl-D-aspartate
NRBCs: Nucleated red blood cells
P-LCR: Platelet large cell ratio
PCT: Plateletcrit
PDW: Platelet distribution width
PET: Positron emission tomography
PLT: Platelet count
RBC: Red blood cell count
RDW-CV: Red cell distribution width (coefficient of variation)
RDW-SD: Red cell distribution width (standard deviation)
ROC: Receiver operating characteristic
ROC-AUC: Area under the ROC curve
SHAP: SHapley Additive exPlanations
TP: Total protein
UA: Uric acid
UCB: Upper confidence bound (acquisition function)
UMIN-CTR: University Hospital Medical Information Network Clinical Trials Registry
UTP: Uridine triphosphate
WBC: White blood cell count
WHO: World Health Organization
XAI: Explainable artificial intelligence

References

  1. Petersen, R.C.; Parisi, J.E.; Dickson, D.W.; Johnson, K.A.; Knopman, D.S.; Boeve, B.F.; Jicha, G.A.; Ivnik, R.J.; Smith, G.E.; Tangalos, E.G.; et al. Neuropathologic Features of Amnestic Mild Cognitive Impairment. Arch. Neurol. 2006, 63, 665.
  2. Beckmann, N.D.; Lin, W.-J.; Wang, M.; Cohain, A.T.; Wang, P.; Ma, W.; Wang, Y.-C.; Jiang, C.; Audrain, M.; Comella, P.; et al. Multiscale Causal Network Models of Alzheimer’s Disease Identify VGF as a Key Regulator of Disease. bioRxiv 2018.
  3. De Strooper, B.; Karran, E. The Cellular Phase of Alzheimer’s Disease. Cell 2016, 164, 603–615.
  4. D’Amore, F.M.; Moscatelli, M.; Malvaso, A.; D’Antonio, F.; Rodini, M.; Panigutti, M.; Mirino, P.; Carlesimo, G.A.; Guariglia, C.; Caligiore, D. Explainable Machine Learning on Clinical Features to Predict and Differentiate Alzheimer’s Progression by Sex: Toward a Clinician-Tailored Web Interface. J. Neurol. Sci. 2025, 468, 123361.
  5. De Vugt, M.E.; Verhey, F.R.J. The Impact of Early Dementia Diagnosis and Intervention on Informal Caregivers. Prog. Neurobiol. 2013, 110, 54–62.
  6. Topol, E.J. High-Performance Medicine: The Convergence of Human and Artificial Intelligence. Nat. Med. 2019, 25, 44–56.
  7. Krones, F.; Marikkar, U.; Parsons, G.; Szmul, A.; Mahdi, A. Review of Multimodal Machine Learning Approaches in Healthcare. Inf. Fusion 2025, 114, 102690.
  8. Acosta, J.N.; Falcone, G.J.; Rajpurkar, P.; Topol, E.J. Multimodal Biomedical AI. Nat. Med. 2022, 28, 1773–1784.
  9. Califf, R.M. Biomarker Definitions and Their Applications. Exp. Biol. Med. 2018, 243, 213–221.
  10. Winchester, L.M.; Harshfield, E.L.; Shi, L.; Badhwar, A.; Khleifat, A.A.; Clarke, N.; Dehsarvi, A.; Lengyel, I.; Lourida, I.; Madan, C.R.; et al. Artificial Intelligence for Biomarker Discovery in Alzheimer’s Disease and Dementia. Alzheimer’s Dement. 2023, 19, 5860–5871.
  11. Sperling, R.A.; Aisen, P.S.; Beckett, L.A.; Bennett, D.A.; Craft, S.; Fagan, A.M.; Iwatsubo, T.; Jack, C.R.; Kaye, J.; Montine, T.J.; et al. Toward Defining the Preclinical Stages of Alzheimer’s Disease: Recommendations from the National Institute on Aging-Alzheimer’s Association Workgroups on Diagnostic Guidelines for Alzheimer’s Disease. Alzheimer’s Dement. J. Alzheimer’s Assoc. 2011, 7, 280–292.
  12. Jack, C.R.; Knopman, D.S.; Jagust, W.J.; Shaw, L.M.; Aisen, P.S.; Weiner, M.W.; Petersen, R.C.; Trojanowski, J.Q. Hypothetical Model of Dynamic Biomarkers of the Alzheimer’s Pathological Cascade. Lancet Neurol. 2010, 9, 119–128.
  13. Folstein, M.F.; Folstein, S.E.; McHugh, P.R. “Mini-Mental State”: A practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 1975, 12, 189–198.
  14. Ponce-Bobadilla, A.V.; Schmitt, V.; Maier, C.S.; Mensing, S.; Stodtmann, S. Practical Guide to SHAP Analysis: Explaining Supervised Machine Learning Model Predictions in Drug Development. Clin. Transl. Sci. 2024, 17, e70056.
  15. Huang, H.-H.; Li, J.; Cho, W.C. Editorial: Integrative Analysis for Complex Disease Biomarker Discovery. Front. Bioeng. Biotechnol. 2023, 11, 1273084.
  16. Sears, S.M.; Hewett, S.J. Influence of Glutamate and GABA Transport on Brain Excitatory/Inhibitory Balance. Exp. Biol. Med. 2021, 246, 1069–1083.
  17. Arnone, D.; Saraykar, S.; Salem, H.; Teixeira, A.L.; Dantzer, R.; Selvaraj, S. Role of Kynurenine Pathway and Its Metabolites in Mood Disorders: A Systematic Review and Meta-Analysis of Clinical Studies. Neurosci. Biobehav. Rev. 2018, 92, 477–485.
  18. Chen, X.; Xu, D.; Yu, J.; Song, X.-J.; Li, X.; Cui, Y.-L. Tryptophan Metabolism Disorder-Triggered Diseases, Mechanisms, and Therapeutic Strategies: A Scientometric Review. Nutrients 2024, 16, 3380.
  19. Teruya, T.; Chen, Y.-J.; Fukuji, Y.; Kondoh, H.; Yanagida, M. Whole-Blood Metabolomics of Dementia Patients Reveal Classes of Disease-Linked Metabolites. Proc. Natl. Acad. Sci. USA 2021, 118, e2022857118.
  20. Kimura, N.; Sasaki, Y.; Masuda, T.; Ataka, T.; Eguchi, A.; Kakuma, T.; Matsubara, E. Lifestyle Factors That Affect Cognitive Function–a Longitudinal Objective Analysis. Front. Public Health 2023, 11, 1215419.
  21. Del Pozo Cruz, B.; Ahmadi, M.; Naismith, S.L.; Stamatakis, E. Association of Daily Step Count and Intensity With Incident Dementia in 78 430 Adults Living in the UK. JAMA Neurol. 2022, 79, 1059.
  22. Zlatar, Z.Z.; Godbole, S.; Takemoto, M.; Crist, K.; Sweet, C.M.C.; Kerr, J.; Rosenberg, D.E. Changes in Moderate Intensity Physical Activity Are Associated With Better Cognition in the Multilevel Intervention for Physical Activity in Retirement Communities (MIPARC) Study. Am. J. Geriatr. Psychiatry 2019, 27, 1110–1121.
  23. Healy, G.N. Balancing Our Day for Heart Health. Eur. Heart J. 2024, 45, 472–474.
  24. Kossowska, M.; Olejniczak, S.; Karbowiak, M.; Mosiej, W.; Zielińska, D.; Brzezicka, A. The Interplay between Gut Microbiota and Cognitive Functioning in the Healthy Aging Population: A Systematic Review. Nutrients 2024, 16, 852.
  25. Stocker, H.; Beyer, L.; Trares, K.; Perna, L.; Rujescu, D.; Holleczek, B.; Beyreuther, K.; Gerwert, K.; Schöttker, B.; Brenner, H. Association of Kidney Function with Development of Alzheimer Disease and Other Dementias and Dementia-Related Blood Biomarkers. JAMA Netw. Open 2023, 6, e2252387.
  26. Leeuw, F.A.; Tijms, B.M.; Doorduijn, A.S.; Hendriksen, H.M.A.; Rest, O.; Van Der Schueren, M.A.E.; Visser, M.; Den Heuvel, E.G.H.M.; Wijk, N.; Bierau, J.; et al. LDL Cholesterol and Uridine Levels in Blood Are Potential Nutritional Biomarkers for Clinical Progression in Alzheimer’s Disease: The NUDAD Project. Alzheimer’s Dement. Diagn. Assess. Dis. Monit. 2020, 12, e12120.
  27. Corso, G.; Cristofano, A.; Sapere, N.; la Marca, G.; Angiolillo, A.; Vitale, M.; Fratangelo, R.; Lombardi, T.; Porcile, C.; Intrieri, M.; et al. Serum Amino Acid Profiles in Normal Subjects and in Patients with or at Risk of Alzheimer Dementia. Dement. Geriatr. Cogn. Disord. Extra 2017, 7, 143–159.
  28. Youn, C.; Caillaud, M.L.; Li, Y.; Gallagher, I.A.; Strasser, B.; Fuchs, D.; Tanaka, H.; Haley, A.P. Association between large neutral amino acids and white matter hyperintensities in middle-aged adults at varying metabolic risk. Brain Imaging Behav. 2024, 18, 1448–1456.
  29. Hajjar, I.; Hayek, S.S.; Goldstein, F.C.; Martin, G.; Jones, D.P.; Quyyumi, A. Oxidative Stress Predicts Cognitive Decline with Aging in Healthy Adults: An Observational Study. J. Neuroinflamm. 2018, 15, 17.
  30. Paul, B.D.; Sbodio, J.I.; Snyder, S.H. Cysteine Metabolism in Neuronal Redox Homeostasis. Trends Pharmacol. Sci. 2018, 39, 513–524.
  31. Pham, T.K.; Buczek, W.A.; Mead, R.J.; Shaw, P.J.; Collins, M.O. Proteomic Approaches to Study Cysteine Oxidation: Applications in Neurodegenerative Diseases. Front. Mol. Neurosci. 2021, 14, 678837.
  32. Cai, Z.; Yan, L.-J.; Li, K.; Quazi, S.H.; Zhao, B. Roles of AMP-Activated Protein Kinase in Alzheimer’s Disease. NeuroMol. Med. 2012, 14, 1–14.
  33. Fan, L.-J.; Wang, F.-Y.; Zhao, J.-H.; Zhang, J.-J.; Li, Y.-A.; Tang, J.; Lin, T.; Wei, Q. From Physical Activity Patterns to Cognitive Status: Development and Validation of Novel Digital Biomarkers for Cognitive Assessment in Older Adults. Int. J. Behav. Nutr. Phys. Act. 2025, 22, 11.
  34. Jamshed, M.; Shahzad, A.; Riaz, F.; Kim, K. Exploring Inertial Sensor-Based Balance Biomarkers for Early Detection of Mild Cognitive Impairment. Sci. Rep. 2024, 14, 9829.
  35. Paolillo, E.W.; Saloner, R.; VandeBunte, A.; Lee, S.; Bennett, D.A.; Casaletto, K.B. Multimodal Lifestyle Engagement Patterns Support Cognitive Stability beyond Neuropathological Burden. Alzheimer’s Res. Ther. 2023, 15, 221.
  36. Alam, A.B.; Wu, A.; Power, M.C.; West, N.A.; Alonso, A. Associations of Serum Uric Acid with Incident Dementia and Cognitive Decline in the ARIC-NCS Cohort. J. Neurol. Sci. 2020, 414, 116866.
  37. Khan, A.A.; Quinn, T.J.; Hewitt, J.; Fan, Y.; Dawson, J. Serum Uric Acid Level and Association with Cognitive Impairment and Dementia: Systematic Review and Meta-Analysis. Age 2016, 38, 16.
  38. Baumel, B.S.; Doraiswamy, P.M.; Sabbagh, M.; Wurtman, R. Potential Neuroregenerative and Neuroprotective Effects of Uridine/Choline-Enriched Multinutrient Dietary Intervention for Mild Cognitive Impairment: A Narrative Review. Neurol. Ther. 2021, 10, 43–60.
  39. Wang, R.; Reddy, P.H. Role of Glutamate and NMDA Receptors in Alzheimer’s Disease. J. Alzheimer’s Dis. 2017, 57, 1041–1048.
  40. Hynd, M. Glutamate-Mediated Excitotoxicity and Neurodegeneration in Alzheimer’s Disease. Neurochem. Int. 2004, 45, 583–595.
  41. Karosas, T.; Wallace, T.C.; Li, M.; Pan, Y.; Agarwal, P.; Bennett, D.A.; Jacques, P.F.; Chung, M. Dietary Choline Intake and Risk of Alzheimer’s Dementia in Older Adults. J. Nutr. 2025, 155, 2322–2332.
  42. Aguree, S.; Zolnoori, M.; Atwood, T.P.; Owora, A. Association between Choline Supplementation and Alzheimer’s Disease Risk: A Systematic Review Protocol. Front. Aging Neurosci. 2023, 15, 1242853.
  43. Reiten, O.K.; Wilvang, M.A.; Mitchell, S.J.; Hu, Z.; Fang, E.F. Preclinical and Clinical Evidence of NAD+ Precursors in Health, Disease, and Ageing. Mech. Ageing Dev. 2021, 199, 111567.
  44. Chandran, S.; Binninger, D. Role of Oxidative Stress, Methionine Oxidation and Methionine Sulfoxide Reductases (MSR) in Alzheimer’s Disease. Antioxidants 2023, 13, 21.
  45. Liu, P.; Yang, Q.; Yu, N.; Cao, Y.; Wang, X.; Wang, Z.; Qiu, W.-Y.; Ma, C. Phenylalanine Metabolism Is Dysregulated in Human Hippocampus with Alzheimer’s Disease Related Pathological Changes. J. Alzheimer’s Dis. 2021, 83, 609–622.
  46. Chatterjee, P.; Cheong, Y.; Bhatnagar, A.; Goozee, K.; Wu, Y.; McKay, M.; Martins, I.J.; Lim, W.L.F.; Pedrini, S.; Tegg, M.; et al. Plasma Metabolites Associated with Biomarker Evidence of Neurodegeneration in Cognitively Normal Older Adults. J. Neurochem. 2021, 159, 389–402. [Google Scholar] [CrossRef] [PubMed]
  47. Ju, Y.H.; Bhalla, M.; Hyeon, S.J.; Oh, J.E.; Yoo, S.; Chae, U.; Kwon, J.; Koh, W.; Lim, J.; Park, Y.M.; et al. Astrocytic Urea Cycle Detoxifies Aβ-Derived Ammonia While Impairing Memory in Alzheimer’s Disease. Cell Metab. 2022, 34, 1104–1120.e8. [Google Scholar] [CrossRef] [PubMed]
  48. Fernandes, B.S.; Inam, M.E.; Enduru, N.; Quevedo, J.; Zhao, Z. The Kynurenine Pathway in Alzheimer’s Disease: A Meta-Analysis of Central and Peripheral Levels. Braz. J. Psychiatry 2023, 45, 286–297. [Google Scholar] [CrossRef]
Figure 1. Stage 1 classification performance for Normal vs. Abnormal cognitive status using two modeling approaches under Leave-One-Out Cross-Validation (LOOCV). Panels (a,b) show the confusion matrix and ROC curve, respectively, for the tuned Imbalanced-Learn Random Forest model (AUC = 0.715), while panels (c,d) show the corresponding confusion matrix and ROC curve for the tuned logistic regression baseline (AUC = 0.707). Together, these plots illustrate that both models achieve comparable overall discrimination, with logistic regression yielding slightly improved sensitivity and macro-averaged metrics, whereas the Random Forest provides marginally higher ROC-AUC.
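As a rough illustration of how a single ROC-AUC can be obtained under LOOCV, the sketch below pools one held-out score per participant and computes AUC with the rank-based (Mann–Whitney) formulation. The labels and scores are made-up placeholders, not study data, and `roc_auc` is a hypothetical helper; the caption does not state how the reported AUC values were computed.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical pooled LOOCV output: one held-out probability per participant.
held_out_labels = [0, 0, 0, 0, 1, 1, 1]  # 1 = Abnormal, 0 = Normal (assumed coding)
held_out_scores = [0.10, 0.35, 0.40, 0.80, 0.30, 0.65, 0.90]

auc = roc_auc(held_out_labels, held_out_scores)
print(f"pooled LOOCV AUC = {auc:.3f}")
```

Pooling is needed because each LOOCV fold holds out a single participant, so a per-fold AUC is undefined.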
Figure 2. Stage 2 classification performance for Impaired vs. Possible MCI cognitive status under Leave-One-Out Cross-Validation (LOOCV), comparing the tuned Imbalanced-Learn Random Forest and logistic regression models. Panels (a,b) display the confusion matrix and ROC curve, respectively, for the Random Forest model, which demonstrates excellent discrimination with AUC = 0.950 and near-perfect separation between severity categories. Panels (c,d) show the corresponding confusion matrix and ROC curve for the tuned logistic regression baseline, which achieves good overall accuracy but somewhat lower ROC-AUC and specificity for Impaired detection compared with the tree-based model.
Figure 3. SHAP beeswarm plots summarizing the top 20 features contributing to Stage 1 classification between Normal and Abnormal cognitive status for both the optimized Imbalanced-Learn Random Forest and logistic regression models under LOOCV. Panel (a) shows the Random Forest model, while panel (b) shows the logistic regression model, with features ordered on the y-axis by mean absolute SHAP value (global importance) and each point representing an individual participant. Point colors indicate the underlying feature value (red = high, blue = low), and the x-axis SHAP values quantify each feature’s impact on the predicted probability of Abnormal status, allowing the comparison of shared and model-specific multimodal signals across the two modeling approaches. The (*) in the figure represents multiplication to convert the Ratio of D-Serine to Total Serine into percentage.
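The ordering rule described in this caption (features sorted by mean absolute SHAP value) can be sketched as follows. The feature names and SHAP values are hypothetical placeholders, and `rank_by_mean_abs_shap` is an illustrative helper rather than part of any SHAP library.

```python
def rank_by_mean_abs_shap(feature_names, shap_rows):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important.

    shap_rows holds one row per participant and one column per feature.
    """
    n_rows = len(shap_rows)
    importance = {}
    for j, name in enumerate(feature_names):
        importance[name] = sum(abs(row[j]) for row in shap_rows) / n_rows
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical SHAP matrix: 3 participants x 3 features.
features = ["feature_a", "feature_b", "feature_c"]
shap_rows = [
    [0.20, -0.05, 0.01],
    [-0.30, 0.10, 0.00],
    [0.10, -0.15, -0.02],
]

ranking = rank_by_mean_abs_shap(features, shap_rows)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

Taking the absolute value first keeps positive and negative contributions from cancelling, which is why a feature can rank highly even when its signed SHAP values average near zero.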
Figure 4. Stage 2 SHAP beeswarm plots illustrating the top 20 predictive features for distinguishing Possible MCI from Impaired cognitive states for both the optimized Imbalanced-Learn Random Forest and logistic regression models under LOOCV. Panel (a) shows the Random Forest model, and panel (b) shows the logistic regression model, with features ordered by mean absolute SHAP value and each point representing an individual participant colored by the underlying feature value (red = high, blue = low). The horizontal SHAP value axis reflects the magnitude and direction of each feature’s contribution to the predicted probability of Impaired versus Possible MCI status, enabling a comparison of shared and model-specific severity-related signals across the two modeling approaches. The # after MONO/Monocyte represents absolute monocyte count during biochemical analysis of the blood.
Figure 5. Games–Howell post hoc comparisons of significant biomarkers and activity parameters (p < 0.05, p is corrected by Welch’s degrees of freedom correction and the studentized range distribution) across cognitive groups. (a) Significant biochemical parameters; (b) significant amino acid markers; (c) significant metabolic parameters; and (d) significant physical activity features (including fluctuation coefficients) are shown. Only parameters with statistically significant group differences after multiple testing correction are displayed, highlighting multi-domain signatures distinguishing Impaired, Normal, and Possible MCI groups. Please note that the activity metrics were collected from 83 individuals.
Table 1. Inclusion and exclusion criteria for this study.
Inclusion and Exclusion Criteria of the Subjects
Inclusion criteria
Subjects who:
1. Were community-dwelling older individuals aged 75 to 83 years at the time of informed consent.
2. Were able to provide voluntary written informed consent.
3. Were not suffering from any serious medical condition.
4. Participants receiving standard treatment for chronic diseases such as diabetes, hypertension, and dyslipidemia were eligible to remain in the study.
Exclusion criteria
Subjects who:
1. Had suspected dementia (an MMSE score of 23 or lower).
2. Were certified as requiring nursing care at level 2.
3. Had any condition deemed unsuitable for participation by the investigators, such as inability to stand unassisted; a neurological, musculoskeletal, or connective tissue disorder; a serious medical illness such as cancer or renal failure; or regular use of antibiotics at the time of recruitment.
4. Additionally, participants in the exercise group were excluded if they had contraindications to physical activity, including epilepsy, hernia, recent surgery, or joint prostheses.
This table summarizes the eligibility criteria used to recruit community-dwelling older adults for this study. Inclusion criteria ensured participants were aged 75–83 years, capable of providing written informed consent, and free of serious acute medical conditions, while allowing stable chronic disease management (e.g., diabetes, hypertension, dyslipidemia). Exclusion criteria removed individuals with suspected dementia (MMSE ≤ 23), higher long-term care needs (care level ≥ 2), or medical/functional conditions likely to interfere with safe participation or valid assessments (e.g., inability to stand unassisted, serious neurological/musculoskeletal disorders, cancer, renal failure, or current antibiotic use).
Table 2. MMSE category.
| MMSE Interpretation | Assigned Category | Number of Individuals |
| --- | --- | --- |
| MMSE score [‘18 month’] > 26 | Normal | 73 |
| 23 < MMSE score [‘18 month’] ≤ 26 or MMSE score [‘0 month’] − MMSE score [‘18 month’] ≥ 3 | Possible MCI | 20 |
| MMSE score [‘18 month’] ≤ 23 | Impaired | 5 |
This table defines the Mini-Mental State Examination (MMSE) thresholds used to classify participants’ cognitive status at the end of follow-up. Participants with an MMSE score [‘18 month’] > 26 were categorized as Normal (n = 73), those with 23 < MMSE score [‘18 month’] ≤ 26 or MMSE score [‘0 month’] − MMSE score [‘18 month’] ≥ 3 as Possible MCI (n = 20), and those with MMSE score [‘18 month’] ≤ 23 as Impaired (n = 5). These categories were used as this study’s cognitive outcome labels for subsequent statistical comparisons and the two-stage classification pipeline.
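The labeling rule above can be sketched as a small function. The table does not say which rule wins when two conditions apply at once, so the precedence used here is an assumption: Impaired is checked first, and a drop of 3 or more points moves an otherwise Normal score into Possible MCI. The function names are illustrative only.

```python
def mmse_category(score_0m, score_18m):
    """Assign a cognitive category from baseline and 18-month MMSE scores.

    Assumed precedence: Impaired, then Possible MCI, then Normal.
    """
    if score_18m <= 23:
        return "Impaired"
    # Possible MCI: 18-month score in (23, 26], or a decline of >= 3 points.
    if score_18m <= 26 or (score_0m - score_18m) >= 3:
        return "Possible MCI"
    return "Normal"


def stage1_label(category):
    """Collapse the three categories into the Stage 1 Normal vs. Abnormal target."""
    return "Normal" if category == "Normal" else "Abnormal"


for baseline, follow_up in [(30, 28), (28, 25), (30, 27), (27, 22)]:
    category = mmse_category(baseline, follow_up)
    print(baseline, follow_up, category, stage1_label(category))
```

Under this reading, Stage 2 is trained only on participants whose Stage 1 label is Abnormal.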
Table 3. Optimized hyperparameters for the two-stage hierarchical Imbalanced-Learn Random Forest classification model and logistic regression models.
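| Parameter | Stage 1 | Stage 2 |
| --- | --- | --- |
| Splitting criterion | entropy | gini |
| max_depth | 19 | 17 |
| max_features | sqrt | None |
| min_impurity_decrease | 0.05 | N/A |
| min_samples_leaf | 1 | 3 |
| min_samples_split | 11 | 10 |
| n_estimators | 361 | 256 |
| replacement | True | N/A |
| sampling_strategy | all | N/A |

Optimized hyperparameters for the two-stage base Logistic regression classification model

| Parameter | Stage 1 | Stage 2 |
| --- | --- | --- |
| C | 1.4828 | 634.3524 |
| penalty | l1 | l1 |
| class_weight | None | ‘Impaired’: 3.0, ‘Possible MCI’: 1.0 |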
This table reports the final optimized hyperparameter settings for both the two-stage Imbalanced-Learn Random Forest classifiers and the corresponding baseline logistic regression models at each stage of the hierarchical pipeline. For Random Forest, Stage 1 (Normal vs. Abnormal screening) used an entropy splitting criterion, deeper trees (max_depth = 19), square-root feature subsampling (max_features = sqrt), and imbalance-handling settings (replacement = True, sampling_strategy = all), whereas Stage 2 (Possible MCI vs. Impaired) used a Gini criterion, slightly shallower trees (max_depth = 17), evaluation of all features at each split (max_features = None), and larger terminal nodes (min_samples_leaf = 3) to limit overfitting within the small abnormal subset. The number of trees also differed by stage (361 vs. 256). For logistic regression, Stage 1 was tuned to an L1-regularized model with moderate regularization (C ≈ 1.48) and no explicit class weighting, while Stage 2 favored a weakly regularized L1 model (C ≈ 634.35; under the scikit-learn convention, C is the inverse of regularization strength, so a larger value means a weaker penalty) with manual up-weighting of the minority Impaired class (class_weight = {‘Impaired’: 3.0, ‘Possible MCI’: 1.0}) to improve sensitivity to more severe cognitive impairment. N/A: not applicable.
Table 4. Stage 1 classification performance metrics for Normal vs. Abnormal cognitive status.
Stage 1 Performance Metrics

| Metric | Imbalanced-Learn Random Forest | Logistic Regression |
| --- | --- | --- |
| Accuracy | 0.71 | 0.76 |
| ROC-AUC | 0.72 | 0.71 |
| Macro Precision | 0.61 | 0.67 |
| Macro Recall | 0.60 | 0.66 |
| Macro F1-Score | 0.60 | 0.67 |
| Clinical Metrics | | |
| Sensitivity (Abnormal detection) | 0.36 | 0.48 |
| Specificity (Normal detection) | 0.84 | 0.85 |
| Positive predictive value | 0.43 | 0.52 |
| Negative predictive value | 0.79 | 0.83 |
This table summarizes the predictive performance of the Stage 1 classifiers in distinguishing cognitively Normal from Abnormal participants, comparing the tuned Imbalanced-Learn Random Forest with the tuned logistic regression baseline under the same LOOCV procedure. Both models achieved moderate discriminative ability, with the Random Forest showing slightly higher ROC-AUC (0.72 vs. 0.71), while logistic regression achieved higher overall accuracy (0.76 vs. 0.71) and improved macro-averaged precision, recall, and F1-score. Clinically oriented metrics indicate that logistic regression provided higher sensitivity for Abnormal detection (0.48 vs. 0.36) and slightly higher specificity (0.85 vs. 0.84).
Table 5. Stage 2 classification performance metrics for Possible MCI vs. Impaired cognitive severity discrimination.
Stage 2 Performance Metrics

| Metric | Imbalanced-Learn Random Forest | Logistic Regression |
| --- | --- | --- |
| Accuracy | 0.96 | 0.92 |
| ROC-AUC | 0.95 | 0.82 |
| Macro Precision | 0.92 | 0.88 |
| Macro Recall | 0.98 | 0.88 |
| Macro F1-Score | 0.94 | 0.88 |
| Clinical Metrics | | |
| Sensitivity (Possible MCI detection) | 0.95 | 0.95 |
| Specificity (Impaired detection) | 1.00 | 0.80 |
| Positive predictive value | 1.00 | 0.95 |
| Negative predictive value | 0.83 | 0.80 |
This table summarizes the predictive performance of the Stage 2 classifiers in distinguishing Possible MCI from Impaired cognitive status within the abnormal subgroup, comparing the tuned Imbalanced-Learn Random Forest with the tuned logistic regression baseline under the same LOOCV procedure. Random Forest achieved very high overall accuracy (0.96) and excellent discriminative ability (ROC-AUC = 0.950), with higher macro-averaged precision, recall, and F1-score than logistic regression (0.92/0.98/0.94 vs. 0.88/0.88/0.88), indicating stronger separation between severity categories in this small sample. Clinically oriented metrics show that both models maintained high sensitivity for Possible MCI detection (0.95), while Random Forest provided perfect specificity for Impaired detection (1.00 vs. 0.80) and higher positive and negative predictive values, suggesting superior rule-in performance for Impaired status compared with the simpler logistic model.
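To make the relationship between these metrics concrete, the sketch below recomputes them from confusion-matrix counts. The counts are not taken from the paper's confusion matrices; they are inferred from the group sizes in Table 2 (20 Possible MCI, 5 Impaired) and the reported sensitivity and specificity, with Possible MCI treated as the positive class. `clinical_metrics` is a hypothetical helper.

```python
def clinical_metrics(tp, fn, tn, fp):
    """Compute binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall of the positive class
    specificity = tn / (tn + fp)   # recall of the negative class
    ppv = tp / (tp + fp)           # precision of the positive class
    npv = tn / (tn + fn)           # precision of the negative class
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "macro_recall": (sensitivity + specificity) / 2,
        "macro_precision": (ppv + npv) / 2,
    }


# Inferred counts (assumption): positive = Possible MCI, negative = Impaired.
rf = clinical_metrics(tp=19, fn=1, tn=5, fp=0)  # Random Forest
lr = clinical_metrics(tp=19, fn=1, tn=4, fp=1)  # logistic regression

for name, metrics in [("Random Forest", rf), ("Logistic regression", lr)]:
    print(name, {key: f"{value:.3f}" for key, value in metrics.items()})
```

With only five Impaired participants, one reclassified case changes specificity by 0.20, which accounts for the whole gap between the two models on that metric.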
Table 6. Descriptive statistics of top 20 parameters according to the MMSE category from Stage 1 and Stage 2 SHAP analyses of Balanced Forest models.
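| Feature | MMSE Category | Count | Mean ± SD | Median | IQR_Lower | IQR_Upper | IQR_Range |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UA | Impaired | 5 | 5.48 ± 1.12 | 6.1 | 5.3 | 6.2 | 5.30–6.20 |
| | Normal | 73 | 5.47 ± 1.28 | 5.5 | 4.8 | 6.2 | 4.80–6.20 |
| | Possible MCI | 20 | 5.72 ± 1.71 | 5.7 | 5.3 | 6.9 | 5.28–6.93 |
| CRE | Impaired | 5 | 0.72 ± 0.10 | 0.8 | 0.7 | 0.8 | 0.72–0.76 |
| | Normal | 73 | 0.89 ± 0.30 | 0.8 | 0.7 | 1.0 | 0.71–0.97 |
| | Possible MCI | 20 | 0.94 ± 0.38 | 0.8 | 0.8 | 0.9 | 0.76–0.94 |
| Number of steps per minute | Impaired | 5 | 60.10 ± 26.18 | 71.2 | 41.5 | 72.0 | 41.49–71.95 |
| | Normal | 63 | 66.83 ± 15.59 | 66.7 | 58.3 | 78.0 | 58.25–77.99 |
| | Possible MCI | 15 | 50.24 ± 12.97 | 51.0 | 40.4 | 59.4 | 40.38–59.43 |
| EX fluctuation coefficient 1 | Impaired | 5 | 0.44 ± 0.14 | 0.4 | 0.4 | 0.4 | 0.35–0.44 |
| | Normal | 63 | 0.36 ± 0.12 | 0.4 | 0.3 | 0.4 | 0.27–0.43 |
| | Possible MCI | 15 | 0.28 ± 0.11 | 0.3 | 0.2 | 0.4 | 0.19–0.36 |
| D-Ser, nmol/mL (µM) | Impaired | 5 | 2.68 ± 0.75 | 2.4 | 2.1 | 2.9 | 2.13–2.90 |
| | Normal | 73 | 2.52 ± 0.68 | 2.4 | 2.0 | 3.0 | 2.00–2.96 |
| | Possible MCI | 20 | 3.09 ± 1.31 | 2.7 | 2.4 | 3.2 | 2.40–3.21 |
| Ser (D/(D + L) × 100) | Impaired | 5 | 2.60 ± 0.54 | 2.7 | 2.1 | 3.0 | 2.11–3.01 |
| | Normal | 73 | 2.16 ± 0.52 | 2.1 | 1.8 | 2.4 | 1.76–2.45 |
| | Possible MCI | 20 | 2.85 ± 1.31 | 2.4 | 2.2 | 2.8 | 2.19–2.78 |
| Ala (D/(D + L) × 100) | Impaired | 5 | 1.11 ± 0.52 | 1.4 | 0.7 | 1.5 | 0.68–1.51 |
| | Normal | 73 | 0.80 ± 0.53 | 0.6 | 0.5 | 1.0 | 0.45–1.02 |
| | Possible MCI | 20 | 0.94 ± 0.95 | 0.6 | 0.5 | 0.9 | 0.52–0.86 |
| L-Asn, nmol/mL (µM) | Impaired | 5 | 60.09 ± 13.28 | 53.8 | 52.0 | 62.3 | 51.96–62.33 |
| | Normal | 73 | 53.02 ± 7.92 | 51.8 | 47.8 | 57.3 | 47.77–57.25 |
| | Possible MCI | 20 | 48.48 ± 6.36 | 48.5 | 45.8 | 50.7 | 45.78–50.74 |
| Cysteine | Impaired | 5 | 0.21 ± 0.08 | 0.2 | 0.2 | 0.3 | 0.16–0.27 |
| | Normal | 73 | 0.21 ± 0.08 | 0.2 | 0.2 | 0.3 | 0.17–0.26 |
| | Possible MCI | 20 | 0.20 ± 0.06 | 0.2 | 0.2 | 0.2 | 0.17–0.24 |
| Creatinine | Impaired | 5 | 49.35 ± 7.15 | 47.0 | 45.7 | 54.1 | 45.66–54.08 |
| | Normal | 73 | 61.60 ± 13.41 | 59.5 | 52.8 | 67.1 | 52.81–67.09 |
| | Possible MCI | 20 | 70.04 ± 18.62 | 64.6 | 59.3 | 75.4 | 59.25–75.37 |
| Alanine | Impaired | 5 | 52.26 ± 14.42 | 47.9 | 42.1 | 64.0 | 42.08–63.99 |
| | Normal | 73 | 50.64 ± 11.22 | 49.3 | 42.4 | 58.5 | 42.35–58.53 |
| | Possible MCI | 20 | 52.39 ± 7.74 | 54.3 | 49.0 | 57.8 | 49.01–57.76 |
| Adenosine monophosphate | Impaired | 5 | 0.38 ± 0.30 | 0.5 | 0.1 | 0.6 | 0.11–0.60 |
| | Normal | 73 | 0.39 ± 0.25 | 0.4 | 0.2 | 0.5 | 0.19–0.51 |
| | Possible MCI | 20 | 0.32 ± 0.23 | 0.3 | 0.1 | 0.4 | 0.14–0.41 |
| HGB | Impaired | 5 | 14.98 ± 2.05 | 14.6 | 14.2 | 16.4 | 14.20–16.40 |
| | Normal | 73 | 13.88 ± 1.32 | 13.9 | 13.0 | 14.8 | 13.00–14.80 |
| | Possible MCI | 20 | 13.29 ± 1.49 | 13.6 | 12.2 | 14.2 | 12.15–14.20 |
| RDW-CV | Impaired | 5 | 12.80 ± 0.35 | 12.9 | 12.8 | 13.0 | 12.80–13.00 |
| | Normal | 73 | 12.93 ± 0.75 | 12.9 | 12.5 | 13.2 | 12.50–13.20 |
| | Possible MCI | 20 | 12.87 ± 0.59 | 12.8 | 12.5 | 13.2 | 12.50–13.20 |
| EX | Impaired | 5 | 3.19 ± 2.21 | 3.9 | 1.8 | 4.4 | 1.78–4.35 |
| | Normal | 63 | 4.63 ± 2.66 | 4.2 | 2.6 | 6.2 | 2.61–6.23 |
| | Possible MCI | 15 | 4.55 ± 1.54 | 5.1 | 3.5 | 5.6 | 3.49–5.60 |
| Activity (Ex) | Impaired | 5 | 1.95 ± 1.26 | 1.8 | 1.6 | 2.8 | 1.57–2.75 |
| | Normal | 63 | 2.74 ± 1.44 | 2.4 | 1.8 | 3.4 | 1.83–3.39 |
| | Possible MCI | 15 | 3.49 ± 1.34 | 3.7 | 2.5 | 4.6 | 2.48–4.56 |
| Walking time (Minutes) | Impaired | 5 | 62.00 ± 43.10 | 76.0 | 34.7 | 76.7 | 34.67–76.67 |
| | Normal | 63 | 85.47 ± 46.97 | 76.7 | 46.3 | 118.2 | 46.33–118.17 |
| | Possible MCI | 15 | 89.11 ± 26.38 | 79.0 | 71.7 | 100.7 | 71.67–100.67 |
| Number of steps | Impaired | 5 | 4624.13 ± 4113.53 | 5408.7 | 1438.3 | 5516.3 | 1438.33–5516.33 |
| | Normal | 63 | 5770.38 ± 3584.58 | 4866.7 | 2727.5 | 7354.5 | 2727.50–7354.50 |
| | Possible MCI | 15 | 4397.16 ± 1497.80 | 4127.0 | 3457.2 | 5099.8 | 3457.17–5099.83 |
| L-Glu, nmol/mL (µM) | Impaired | 5 | 45.30 ± 17.66 | 50.1 | 43.0 | 57.6 | 43.01–57.64 |
| | Normal | 73 | 36.00 ± 15.08 | 33.9 | 24.1 | 45.4 | 24.13–45.37 |
| | Possible MCI | 20 | 37.82 ± 14.86 | 37.7 | 24.8 | 45.4 | 24.80–45.40 |
| L-Met, nmol/mL (µM) | Impaired | 5 | 28.21 ± 7.67 | 30.2 | 21.0 | 32.9 | 21.04–32.91 |
| | Normal | 73 | 26.01 ± 4.25 | 25.5 | 23.1 | 28.6 | 23.11–28.64 |
| | Possible MCI | 20 | 24.69 ± 3.52 | 24.2 | 22.6 | 27.7 | 22.61–27.67 |
| ALB | Impaired | 5 | 4.52 ± 0.11 | 4.6 | 4.4 | 4.6 | 4.40–4.60 |
| | Normal | 73 | 4.64 ± 0.30 | 4.7 | 4.5 | 4.8 | 4.50–4.80 |
| | Possible MCI | 20 | 4.42 ± 0.24 | 4.4 | 4.3 | 4.6 | 4.28–4.60 |
| GLU | Impaired | 5 | 104.00 ± 12.59 | 102.0 | 100.0 | 102.0 | 100.00–102.00 |
| | Normal | 73 | 102.41 ± 16.50 | 98.0 | 92.0 | 109.0 | 92.00–109.00 |
| | Possible MCI | 20 | 103.85 ± 13.86 | 100.0 | 96.0 | 108.3 | 96.00–108.25 |
| Uridine | Impaired | 5 | 1.06 ± 0.40 | 0.8 | 0.8 | 1.3 | 0.79–1.31 |
| | Normal | 73 | 1.27 ± 0.34 | 1.3 | 1.0 | 1.5 | 1.04–1.46 |
| | Possible MCI | 20 | 1.21 ± 0.26 | 1.1 | 1.0 | 1.3 | 1.04–1.35 |
| Uric acid | Impaired | 5 | 1.74 ± 0.41 | 1.8 | 1.4 | 2.0 | 1.36–2.00 |
| | Normal | 73 | 1.80 ± 0.41 | 1.8 | 1.5 | 2.1 | 1.52–2.11 |
| | Possible MCI | 20 | 1.95 ± 0.47 | 2.0 | 1.7 | 2.2 | 1.74–2.23 |
| Choline | Impaired | 5 | 5.60 ± 1.83 | 5.0 | 4.8 | 5.9 | 4.80–5.95 |
| | Normal | 73 | 7.41 ± 1.65 | 7.3 | 6.0 | 8.5 | 6.04–8.45 |
| | Possible MCI | 20 | 7.52 ± 1.57 | 7.3 | 6.4 | 9.0 | 6.35–8.97 |
| Carnitine | Impaired | 5 | 50.10 ± 10.09 | 51.6 | 45.7 | 56.6 | 45.74–56.57 |
| | Normal | 73 | 49.24 ± 12.14 | 50.7 | 40.9 | 57.9 | 40.89–57.88 |
| | Possible MCI | 20 | 55.75 ± 8.92 | 58.0 | 50.0 | 61.3 | 49.97–61.35 |
| 3 METs or more fluctuation coefficient 5 | Impaired | 5 | 0.47 ± 0.30 | 0.4 | 0.3 | 0.4 | 0.32–0.43 |
| | Normal | 63 | 0.86 ± 4.15 | 0.3 | 0.3 | 0.4 | 0.26–0.42 |
| | Possible MCI | 15 | 0.26 ± 0.08 | 0.3 | 0.2 | 0.3 | 0.23–0.31 |
| 4 METs or more fluctuation coefficient 5 | Impaired | 4 | 0.77 ± 0.72 | 0.4 | 0.4 | 0.8 | 0.40–0.77 |
| | Normal | 63 | 0.67 ± 1.26 | 0.5 | 0.4 | 0.7 | 0.38–0.66 |
| | Possible MCI | 15 | 0.39 ± 0.17 | 0.4 | 0.3 | 0.5 | 0.25–0.51 |
| Methionine sulfoxide | Impaired | 5 | 0.24 ± 0.07 | 0.3 | 0.2 | 0.3 | 0.19–0.28 |
| | Normal | 73 | 0.21 ± 0.09 | 0.2 | 0.2 | 0.3 | 0.15–0.26 |
| | Possible MCI | 20 | 0.24 ± 0.15 | 0.2 | 0.2 | 0.2 | 0.16–0.25 |
| Methionine | Impaired | 5 | 10.87 ± 2.69 | 11.9 | 8.0 | 12.9 | 8.00–12.94 |
| | Normal | 73 | 10.70 ± 2.31 | 10.3 | 8.9 | 12.0 | 8.91–11.97 |
| | Possible MCI | 20 | 11.11 ± 1.78 | 10.7 | 9.7 | 12.2 | 9.75–12.22 |
| Phenylalanine | Impaired | 5 | 302.99 ± 40.85 | 286.4 | 275.8 | 308.8 | 275.79–308.81 |
| | Normal | 73 | 327.05 ± 67.01 | 315.9 | 278.9 | 352.6 | 278.91–352.58 |
| | Possible MCI | 20 | 336.63 ± 51.60 | 330.5 | 310.1 | 357.1 | 310.08–357.10 |
| Proline | Impaired | 5 | 4.07 ± 1.36 | 3.7 | 3.5 | 5.1 | 3.51–5.12 |
| | Normal | 73 | 4.18 ± 1.24 | 4.0 | 3.4 | 4.5 | 3.42–4.51 |
| | Possible MCI | 20 | 4.17 ± 1.23 | 3.9 | 3.5 | 4.4 | 3.49–4.44 |
| Ornithine | Impaired | 5 | 12.61 ± 2.92 | 13.8 | 11.6 | 14.7 | 11.63–14.70 |
| | Normal | 73 | 14.22 ± 3.95 | 15.0 | 11.2 | 16.7 | 11.19–16.69 |
| | Possible MCI | 20 | 15.58 ± 7.36 | 15.1 | 11.4 | 17.3 | 11.35–17.27 |
| Niacinamide | Impaired | 5 | 0.36 ± 0.09 | 0.4 | 0.3 | 0.4 | 0.30–0.36 |
| | Normal | 73 | 0.47 ± 0.17 | 0.5 | 0.3 | 0.6 | 0.34–0.60 |
| | Possible MCI | 20 | 0.39 ± 0.20 | 0.4 | 0.3 | 0.4 | 0.28–0.40 |
| MONO# | Impaired | 5 | 5.00 ± 1.52 | 4.1 | 4.1 | 5.8 | 4.10–5.80 |
| | Normal | 73 | 4.73 ± 1.23 | 4.7 | 3.8 | 5.4 | 3.80–5.40 |
| | Possible MCI | 20 | 5.00 ± 1.72 | 4.8 | 3.9 | 5.8 | 3.90–5.80 |
| LYMPH% | Impaired | 5 | 33.92 ± 6.94 | 35.9 | 28.5 | 38.8 | 28.50–38.80 |
| | Normal | 73 | 37.24 ± 8.40 | 36.7 | 32.7 | 43.1 | 32.70–43.10 |
| | Possible MCI | 20 | 34.35 ± 7.33 | 35.2 | 29.3 | 39.7 | 29.32–39.70 |
| Threonine | Impaired | 5 | 12.12 ± 1.42 | 12.1 | 11.9 | 13.3 | 11.91–13.27 |
| | Normal | 73 | 12.96 ± 2.63 | 12.8 | 11.1 | 14.6 | 11.10–14.61 |
| | Possible MCI | 20 | 11.76 ± 2.80 | 11.5 | 10.3 | 13.2 | 10.33–13.21 |
| Tryptophan | Impaired | 5 | 82.77 ± 20.66 | 74.1 | 69.7 | 90.3 | 69.71–90.28 |
| | Normal | 73 | 81.01 ± 14.80 | 77.3 | 71.3 | 89.2 | 71.31–89.20 |
| | Possible MCI | 20 | 77.88 ± 16.86 | 76.6 | 65.7 | 87.5 | 65.68–87.47 |
| 2-Aminobutyric acid | Impaired | 5 | 3.48 ± 1.32 | 3.0 | 2.6 | 4.0 | 2.58–3.99 |
| | Normal | 73 | 4.58 ± 1.22 | 4.3 | 3.6 | 5.3 | 3.64–5.27 |
| | Possible MCI | 20 | 4.76 ± 1.57 | 4.6 | 3.9 | 5.2 | 3.93–5.15 |
| 4-Aminobutyric acid | Impaired | 5 | 0.03 ± 0.01 | 0.0 | 0.0 | 0.0 | 0.02–0.04 |
| | Normal | 73 | 0.03 ± 0.01 | 0.0 | 0.0 | 0.0 | 0.02–0.04 |
| | Possible MCI | 20 | 0.03 ± 0.01 | 0.0 | 0.0 | 0.0 | 0.02–0.04 |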
Please note that the activity metrics were collected from 83 individuals because the measurement device failed for 1 month.
Table 7. Details of Games–Howell test on cognitive categories with corrected p-values.
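Data Mode-Wise Games–Howell Multi-Comparison Test Between Cognitive Categories

Biochemical Data

| Name of the Parameter | Column A | Column B | Mean(A) | Mean(B) | Diff | SE | T | df | pval | hedges |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CK | Impaired | Normal | 61 | 116.49 | −55.49 | 8.48 | −6.54 | 21.63 | 0.00004 | −1.024 |
| | Impaired | Possible MCI | 61 | 130.3 | −69.3 | 17.47 | −3.97 | 22.06 | 0.0018 | −0.99 |
| TP | Normal | Possible MCI | 7.73 | 7.43 | 0.31 | 0.10 | 2.97 | 34.85 | 0.0144 | 0.678 |
| ALB | Normal | Possible MCI | 4.63 | 4.43 | 0.21 | 0.06 | 3.32 | 36.36 | 0.0057 | 0.74 |
| CRE | Impaired | Normal | 0.72 | 0.89 | −0.17 | 0.06 | −3.10 | 10.63 | 0.0261 | −0.59 |
| RBC | Normal | Possible MCI | 446.95 | 411.55 | 35.40 | 10.81 | 3.27 | 27.75 | 0.0077 | 0.88 |
| MCV | Normal | Possible MCI | 93.45 | 96.28 | −2.82 | 1.10 | −2.56 | 24.90 | 0.0432 | −0.76 |

Amino Acid

| Name of the Parameter | Column A | Column B | Mean(A) | Mean(B) | Diff | SE | T | df | pval | hedges |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| L-Asn | Normal | Possible MCI | 53.02 | 48.49 | 4.53 | 1.70 | 2.67 | 36.80 | 0.03 | 0.59 |
| L-Thr | Normal | Possible MCI | 118.86 | 107.54 | 11.31 | 4.39 | 2.57 | 41.95 | 0.036 | 0.53 |
| Gly | Impaired | Normal | 236.09 | 288.38 | −52.30 | 13.05 | −4.01 | 10.20 | 0.006 | −0.77 |
| | Impaired | Normal | 236.09 | 284.92 | −48.84 | 17.34 | −2.82 | 18.87 | 0.03 | −0.82 |

Metabolites

| Name of the Parameter | Column A | Column B | Mean(A) | Mean(B) | Diff | SE | T | df | pval | hedges |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Carnitine | Normal | Possible MCI | 49.24 | 55.75 | −6.51 | 2.45 | −2.66 | 40.46 | 0.0295 | −0.56 |
| Creatinine | Impaired | Normal | 49.35 | 61.60 | −12.25 | 3.56 | −3.44 | 6.14 | 0.031 | −0.92 |
| | Impaired | Possible MCI | 49.35 | 70.04 | −20.69 | 5.25 | −3.94 | 18.09 | 0.0026 | −1.16 |
| Kynurenine | Impaired | Possible MCI | 1.82 | 2.37 | −0.54 | 0.18 | −2.95 | 14.36 | 0.026 | −0.94 |

Activity

| Name of the Parameter | Column A | Column B | Mean(A) | Mean(B) | Diff | SE | T | df | pval | hedges |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5 METs~6 METs | Impaired | Normal | 1.26 | 3.50 | −2.23 | 0.70 | −3.18 | 17.35 | 0.014 | −0.55 |
| Total walking exercise (Ex) | Normal | Possible MCI | 1.89 | 1.06 | 0.83 | 0.34 | 2.44 | 46.05 | 0.045 | 0.46 |
| Number of steps per minute | Normal | Possible MCI | 66.83 | 50.24 | 16.59 | 3.88 | 4.27 | 24.63 | 0.0007 | 1.08 |
| EX fluctuation coefficient 1 | Normal | Possible MCI | 0.36 | 0.28 | 0.08 | 0.03 | 2.62 | 22.86 | 0.039 | 0.70 |
| EX fluctuation coefficient 2 | Normal | Possible MCI | 0.35 | 0.21 | 0.14 | 0.04 | 3.41 | 40.34 | 0.004 | 0.68 |
| EX fluctuation coefficient 4 | Normal | Possible MCI | 0.40 | 0.30 | 0.11 | 0.04 | 2.91 | 71.86 | 0.013 | 0.47 |
| EX fluctuation coefficient 5 | Normal | Possible MCI | 0.37 | 0.27 | 0.10 | 0.04 | 2.84 | 66.56 | 0.016 | 0.47 |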
Please note that the activity metrics were collected from 83 individuals because the measurement device failed for 1 month.
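As a worked check of how the SE, T, df, and hedges columns relate to group summary statistics, the sketch below applies the standard Welch (unequal-variance) formulas to the Creatinine values for Impaired and Normal participants from Table 6. It reproduces the corresponding Table 7 row to within rounding. The function name is hypothetical, the inputs are rounded published values rather than raw data, and the p-value step is omitted because it needs the studentized range distribution (available, for example, as `scipy.stats.studentized_range`), which the standard library does not provide.

```python
import math


def welch_pair_stats(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pairwise statistics for two groups with unequal variances."""
    var_a_n = sd_a ** 2 / n_a
    var_b_n = sd_b ** 2 / n_b
    diff = mean_a - mean_b
    se = math.sqrt(var_a_n + var_b_n)
    t = diff / se
    # Welch-Satterthwaite degrees of freedom.
    df = (var_a_n + var_b_n) ** 2 / (
        var_a_n ** 2 / (n_a - 1) + var_b_n ** 2 / (n_b - 1)
    )
    # Hedges' g: Cohen's d on the pooled SD with a small-sample correction.
    pooled_sd = math.sqrt(
        ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    )
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    g = diff / pooled_sd * correction
    return {"diff": diff, "se": se, "t": t, "df": df, "hedges_g": g}


# Creatinine, Impaired (n = 5) vs. Normal (n = 73), from Table 6.
creatinine = welch_pair_stats(49.35, 7.15, 5, 61.60, 13.41, 73)
for key, value in creatinine.items():
    print(f"{key}: {value:.2f}")
```

The small df for this comparison (about 6) reflects the five-participant Impaired group, which dominates the variance of the difference.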
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Fahad, A.H.M.; Nakatsui, M.; Abe, T.; Hayano, T.; Mahbub, M.H.; Hase, R.; Yamaguchi, N.; Hayakawa, Y.; Inohana, Y.; Umakoshi, Y.; et al. Explainable AI-Driven Identification of Multimodal Biomarkers for Early Prediction of Cognitive Decline. AI Med. 2026, 1, 12. https://doi.org/10.3390/aimed1020012