This study developed a two-stage, explainable machine learning framework to predict 18-month cognitive status, defined by the Mini-Mental State Examination (MMSE), from baseline multimodal data in community-dwelling older adults in Japan. In this hierarchical design, Stage 1 distinguished cognitively Normal participants from those with any abnormality (Possible Mild Cognitive Impairment (MCI) or Impaired), and Stage 2 separated Possible MCI from Impaired within the abnormal subgroup. An Imbalanced-Learn Random Forest and a penalized logistic regression baseline were both trained under Leave-One-Out Cross-Validation. Stage 1 showed fair discrimination (Random Forest AUC = 0.72, accuracy = 0.71; logistic regression AUC = 0.71, accuracy = 0.76). Stage 2 showed apparently strong separability (Random Forest AUC = 0.95, accuracy = 0.96; logistic regression AUC = 0.82, accuracy = 0.92), although these estimates come from a small sample with high class imbalance. SHapley Additive exPlanations (SHAP) were then used to identify interpretable biomarkers at each stage through feature attribution, with TreeExplainer for the Random Forest and LinearExplainer for the logistic regression. In Stage 1, both models highlighted renal and systemic metabolic markers (e.g., creatinine, uric acid, blood urea nitrogen), amino acid and redox-related metabolites (including D-serine, D-amino acid proportions, L-asparagine, alanine, L-glutamic acid, cysteine, methionine sulfoxide), and wearable-derived activity variability (e.g., fluctuation coefficients and steps per minute). The Simpson index of gut microbiome diversity also contributed in the logistic model.
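The hierarchical design and validation scheme can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: the label strings, the toy data, and the `centroid_fitter` scorer (a hypothetical nearest-centroid stand-in for the Imbalanced-Learn Random Forest or the penalized logistic regression) are all illustrative. The sketch shows how Stage 1 collapses Possible MCI and Impaired into one abnormal class, how Stage 2 keeps only abnormal participants, how Leave-One-Out Cross-Validation scores each held-out participant with a model fit on everyone else, and how AUC can be computed from those held-out scores.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical label strings; the study's actual coding of MMSE-based status is assumed.
NORMAL, POSSIBLE_MCI, IMPAIRED = "Normal", "Possible MCI", "Impaired"

Model = Callable[[List[float]], float]
Fitter = Callable[[List[List[float]], List[int]], Model]


def stage1_labels(status: Sequence[str]) -> List[int]:
    """Stage 1: 0 = Normal, 1 = any abnormality (Possible MCI or Impaired)."""
    return [0 if s == NORMAL else 1 for s in status]


def stage2_subset(X: List[List[float]], status: Sequence[str]) -> Tuple[List[List[float]], List[int]]:
    """Stage 2: keep only abnormal participants; 0 = Possible MCI, 1 = Impaired."""
    idx = [i for i, s in enumerate(status) if s != NORMAL]
    return [X[i] for i in idx], [1 if status[i] == IMPAIRED else 0 for i in idx]


def centroid_fitter(X: List[List[float]], y: List[int]) -> Model:
    """Hypothetical stand-in classifier: a higher score means the sample lies
    closer to the class-1 centroid than to the class-0 centroid."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def dist2(a: List[float], b: List[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return lambda x: dist2(x, c0) - dist2(x, c1)


def loocv_scores(X: List[List[float]], y: List[int], fit: Fitter) -> List[float]:
    """Leave-One-Out CV: fit on n-1 samples, then score the single held-out sample."""
    scores = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(model(X[i]))
    return scores


def auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative toy data (two features per participant), not study data.
X = [[0.1, 1.0], [0.2, 0.9], [0.0, 1.1], [1.0, 0.2], [1.1, 0.1], [2.0, -1.0], [2.1, -0.9]]
status = [NORMAL, NORMAL, NORMAL, POSSIBLE_MCI, POSSIBLE_MCI, IMPAIRED, IMPAIRED]

y1 = stage1_labels(status)
print("Stage 1 LOOCV AUC:", auc(y1, loocv_scores(X, y1, centroid_fitter)))

X2, y2 = stage2_subset(X, status)
print("Stage 2 LOOCV AUC:", auc(y2, loocv_scores(X2, y2, centroid_fitter)))
```

Any fitting function with the same shape (training rows and labels in, a scoring function out) can be passed to `loocv_scores`, so a class-balanced ensemble or a penalized logistic regression could replace the toy scorer. The toy scorer assumes every class keeps at least one member in each training fold, which Leave-One-Out guarantees only when each class has two or more participants.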
In Stage 2, the models converged on a distinct signature involving glucose and albumin, uric acid and uridine, choline and carnitine, multiple amino acids (such as phenylalanine, proline, ornithine, tryptophan, threonine, and short-chain amino acids), oxidative/energy markers (niacinamide, methionine, methionine sulfoxide, ergothioneine), hematologic indices, and high-MET activity fluctuation metrics. Collectively, these results support a stage-dependent, multisystem view of cognitive aging in which broad renal–metabolic, amino acid, and behavioral vulnerabilities characterize early abnormality, whereas more pronounced alterations in energy metabolism, nucleotide and choline pathways, oxidative stress, and activity irregularity accompany progression from Possible MCI to Impaired status. By combining routine clinical chemistry, targeted metabolomics, gut microbiome diversity, and wearable-derived behavioral measures within an explainable AI framework, this two-stage approach illustrates a scalable, biologically grounded strategy for stage-aware risk stratification and monitoring of cognitive decline in community settings.
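For the logistic regression baseline, the feature attributions discussed above have a closed form that clarifies what a SHAP value means. This is a minimal sketch assuming independent features and attributions on the log-odds (margin) scale: for a linear margin f(x) = b + Σ β_j x_j, the exact SHAP value of feature j is β_j (x_j − E[x_j]), with E[x_j] taken over a background sample. Whether the study configured LinearExplainer this way is not stated here; the coefficients, intercept, and background rows below are hypothetical.

```python
from typing import List, Sequence


def linear_margin(intercept: float, coef: Sequence[float], x: Sequence[float]) -> float:
    """Log-odds (margin) of a logistic model: b + sum_j coef_j * x_j."""
    return intercept + sum(c * xi for c, xi in zip(coef, x))


def linear_shap(coef: Sequence[float], x: Sequence[float], mean: Sequence[float]) -> List[float]:
    """Exact SHAP values for a linear margin, assuming independent features:
    phi_j = coef_j * (x_j - E[x_j])."""
    return [c * (xi - m) for c, xi, m in zip(coef, x, mean)]


# Hypothetical standardized biomarkers and fitted coefficients (illustrative only).
coef = [0.8, -0.5, 0.3]
intercept = -0.2
background = [[0.0, 1.0, -1.0], [1.0, 0.0, 1.0], [-1.0, -1.0, 0.0]]
mean = [sum(col) / len(background) for col in zip(*background)]

x = [1.5, -0.5, 2.0]
phi = linear_shap(coef, x, mean)

# Local accuracy: the base value (margin at the background mean) plus the
# SHAP values reproduces the model's margin for this participant.
base = linear_margin(intercept, coef, mean)
print("SHAP values:", phi)
print("base + sum(phi):", base + sum(phi), "margin:", linear_margin(intercept, coef, x))
```

A negative coefficient paired with a below-average feature value yields a positive attribution, which is why the sign of a SHAP value, not the sign of the coefficient alone, indicates the direction in which a biomarker pushed an individual prediction.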