1. Introduction
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder and the most common cause of dementia, representing a major global health and socioeconomic challenge. It is characterized by widespread brain atrophy, particularly within the hippocampus and medial temporal lobes, leading to progressive memory and executive function impairment. Over seven million Americans are currently living with AD, and this number is projected to exceed 13 million by 2050 [
1]. The underlying neurodegeneration prominently involves the medial temporal lobe [
2,
3]. Among temporal lobe imaging markers, hippocampal atrophy is one of the most sensitive and specific indicators of disease severity and progression [
4]. Hippocampal atrophy is widely used in research and clinical practice for diagnosis, monitoring, and clinical trial enrolment. Large-scale studies have shown that longitudinal MRI-derived hippocampal measurements capture clinically meaningful trajectories of the AD spectrum [
5,
6,
7]. Patients who converted from MCI to AD showed 1.3% higher hippocampal atrophy than those with stable MCI [
8]. Notably, an autopsy-validated study demonstrated that MRI-derived hippocampal volume is a sensitive and specific index of AD neuropathology. Hippocampal volume correlates with postmortem neuritic plaque burden and neurofibrillary tangle stage, a surrogate of neuropathological disease stage [
9].
Despite this progress, a significant unmet need persists in determining which individuals are most at risk of rapid hippocampal volume decline. Such stratification could inform visit scheduling, the timing of therapy (anti-amyloid or vascular risk management strategies), and enrolment in clinical trials [
10]. While previous prognostic models have used multimodal biomarkers such as genetics, PET, MRI, and clinical data, or imaging-based deep learning, many of these methods remain hard to implement in everyday clinical practice because of their cost and complex pipelines [
11,
12]. In addition, many high-performance ML systems are also “black boxes,” limiting transparency and clinical trust. Efforts toward quantitatively interpretable architectures aim to bridge this gap by exposing feature contributions [
13,
14]. Recent studies confirm that although many prediction models report high performance, most remain unvalidated externally, and their complexity challenges translation to routine practice [
15,
16]. This underscores the benefit of intrinsically interpretable models in high-stakes domains like neurology [
17]. Given the large patient volume and the high costs and delays associated with complex tests like PET scans and genetic analysis, there is an urgent demand for interpretable standard ML frameworks that produce biologically plausible predictions from routinely available data. Beyond the standard amyloid and tau measures, vascular and demographic features influence neurodegeneration [
18]. Age and sex are strong non-modifiable risk factors, while blood pressure has been reported to accelerate atrophy through various hemodynamic mechanisms. Women’s brain atrophy rates were roughly 1.5% higher than men’s, while younger individuals exhibited atrophy rates about 1% higher than older ones due to higher tau levels [
11,
18]. The role of blood pressure (BP) in hippocampal atrophy is highly complex. High BP in midlife has been associated with increased brain atrophy later in life. Studies on older individuals have shown the reverse, where low BP was associated with enhanced neurodegeneration [
19]. Furthermore, high blood pressure is often associated with white matter lesions (WMLs), contributing to brain atrophy [
19,
20]. These variables are readily available, which justifies their inclusion in prognostic models.
Recent evidence also implicates erythrocyte load in cerebrospinal fluid (CSF) as relevant to structural neurodegeneration. In our prior study, elevated CSF erythrocytes were associated with greater hippocampal atrophy in AD patients, suggesting that CSF erythrocyte load (CTRED) may carry prognostic information for neurodegenerative diseases [
21,
22]. This finding supports the increasing research linking erythrocyte-derived and iron-related processes to AD pathophysiology. Iron metabolism and erythrocyte balance disruptions, such as elevated hemoglobin, ferritin, and heme oxygenase-1 activity, are connected to higher amyloid levels, hippocampal shrinking, and worsening cognitive function [
23,
24,
25]. Mechanistically, erythrocytes’ oxidative stress and redox imbalance can impair oxygen transport and trigger peroxidative damage in the cerebral microvasculature [
26,
27,
28]. Dysregulated iron handling contributes to ferroptosis, microvascular fragility, and accumulation of paramagnetic iron species, which can be detected using susceptibility-based MRI techniques [
29]. Elevated CSF iron species have also been linked to dementia risk in population studies, supporting that erythrocyte-related biomarkers reflect ongoing vascular and metabolic injury rather than procedural contamination [
24]. Metabolic and antioxidant abnormalities in circulating erythrocytes can impair cerebral oxygen delivery and promote downstream neurodegeneration, positioning RBC-related measures as potential risk indicators for AD [
30]. These findings position CTRED as a biologically plausible and reliable surrogate marker of neurovascular integrity, oxidative stress, and hippocampal vulnerability in AD.
Machine learning (ML) is becoming increasingly prevalent in AD prognosis. However, interpretable, small-footprint models (penalized linear methods and SVMs) are more feasible to validate and adopt clinically than data-intensive models [
31,
32]. Explainability methods, such as Permutation Feature Importance (PFI) and SHAP, reveal the importance of each variable in risk estimation [
33].
This study aimed to determine whether a simple, interpretable machine-learning (ML) model can predict hippocampal volume decline in AD using routine clinical and laboratory data. The model integrates cerebrospinal fluid erythrocyte load (CTRED), MAPres, age, and sex to classify patients as high or low risk for ongoing hippocampal atrophy. We hypothesized that vascular and hematologic markers, particularly MAPres and CTRED, would independently improve predictions of structural decline, emphasizing the link between vascular dysregulation, microvascular health, and neurodegeneration in AD. A comprehensive literature review revealed no previous ML studies using CSF erythrocyte load to predict hippocampal volume loss in AD. Most research on CSF erythrocytes concentrates on their influence on other biomarkers rather than on structural atrophy prediction. Thus, we aim to develop and assess an interpretable ML model combining CTRED, MAPres, age, and sex to categorize AD patients by their risk of hippocampal degeneration [
34,
35].
2. Materials and Methods
2.1. Study Design and Objective
This study employed a reproducible, interpretable machine-learning (ML) pipeline to stratify Alzheimer’s disease (AD) patients by risk of hippocampal volume decline using routine clinical and cerebrospinal fluid (CSF) variables. The approach was guided by the previous literature demonstrating that small-footprint linear models can provide transparent and clinically plausible predictions in neurodegenerative research [
14,
31,
32]. In particular, we focused on standardized, low-dimensional predictors such as mean arterial pressure (MAPres), CSF erythrocyte load (CTRED), age, and sex, which have been independently associated with hippocampal atrophy and vascular dysregulation in AD [
10,
18,
19,
20,
22]. A summary of the cohort characteristics is shown in
Table 1.
Table 2 provides an overview of the main methodological stages, linking each to prior methodological precedents. All steps, from data extraction to model explainability, were implemented using scikit-learn (1.4.2) and Optuna (4.4.0), emphasizing reproducibility and interpretability over algorithmic complexity.
2.2. Data Source and Ethics
Data were obtained from the ADNI dataset (
http://adni.loni.usc.edu, accessed on 9 March 2025). The ADNI project was launched in 2003 as a public–private partnership with the primary goal of testing whether clinical, imaging, genetic, and biochemical biomarkers can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s Disease (AD). All participants gave written informed consent for data collection and sharing during enrolment. The study protocols and consent forms were approved by each participating institution’s institutional review boards (IRBs).
2.3. Eligibility and Cohort Construction
Inclusion in the study required at least two visits, each with an MRI scan, along with a lumbar puncture and recorded vital signs obtained within 6 months of the MRIs.
2.4. Outcome Definition (Annual Percentage Change, APC)
Risk of hippocampal volume decline is quantified as the per-subject Annual Percentage Change (APC) of normalized average hippocampal volume, which is defined as
The left and right hippocampus volumes and the total brain volume were measured using FreeSurfer, as described in [
22]. The 26 available subjects were separated into two groups, labeled as low and high risk, based on the APC value. Specifically, subjects above the 50th percentile were labeled as high risk, whereas those below the 50th percentile were labeled as low risk, as seen in
Figure 1. The corresponding APC value at the 50th percentile is −1.11%; low-risk subjects are denoted by label 0 and high-risk subjects by label 1.
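The outcome derivation described in this section can be illustrated with a minimal sketch. It assumes a linear annualization of the percentage change relative to baseline and assumes that "above the 50th percentile" refers to the magnitude of decline (APC values more negative than the cohort median); neither assumption is confirmed by the text above, and the function names and numeric values are hypothetical.

```python
from statistics import median


def normalized_hippocampal_volume(left_mm3, right_mm3, total_brain_mm3):
    """Average of left and right hippocampal volume divided by total brain volume."""
    return ((left_mm3 + right_mm3) / 2.0) / total_brain_mm3


def annual_percentage_change(v_baseline, v_followup, years_between):
    """Percent change per year relative to baseline (assumed linear annualization)."""
    return 100.0 * (v_followup - v_baseline) / v_baseline / years_between


def label_by_median(apc_values):
    """Return (labels, cutoff); label 1 = high risk, label 0 = low risk.

    Assumption: high risk means an APC more negative than the cohort median.
    """
    cutoff = median(apc_values)
    labels = [1 if apc < cutoff else 0 for apc in apc_values]
    return labels, cutoff


# Illustrative values only (not ADNI data).
apc_example = annual_percentage_change(0.0030, 0.0029, 2.0)
labels, cutoff = label_by_median([-0.4, -0.9, -1.3, -2.0])
```

With an even number of subjects, as in the present cohort, the median falls between two observed values, so each group contains half of the subjects unless ties occur at the cutoff.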
2.5. Predictor Variables
Baseline predictors were age, sex, cerebrospinal fluid erythrocyte load (CTRED), and mean arterial pressure (MAPres), measured at the earliest qualifying (baseline) visit within ≤6 months of the baseline MRI. The modeling objective was to classify subjects into low vs. high risk of hippocampal volume decline. Data were split into training (n = 20) and test (n = 6) sets.
2.6. Preprocessing
Continuous predictors (CTRED, age, and MAPres) were z-scaled within scikit-learn Pipelines using means/SDs that were fit only on the training split and applied unchanged to the test split to prevent information leakage. Sex was encoded as a binary indicator (male = 0, female = 1). MAPres was computed from baseline systolic and diastolic blood pressures as defined in Equation (3), using measurements recorded within ≤6 months of the baseline MRI (per eligibility criteria). No imputation was required for the final analytic cohort; participants with missing predictor values were excluded during cohort construction. No outlier removal or winsorization was performed. Feature-importance rankings remained qualitatively consistent under alternative CTRED weighting schemes, suggesting that our findings are unlikely to stem solely from procedural artifacts. This robustness is supported by ADNI’s standardized CSF collection procedures, which enforce strict sample-handling protocols to minimize erythrocyte contamination (e.g., atraumatic LP techniques and pre-analytical controls). All transformations (scaling and encoding) were encapsulated in the model Pipelines used during cross-validation and final fitting. Class balance was not altered at the preprocessing stage; model-level class_weight settings were used where applicable (see
Section 2.7). Outcome derivations (hippocampal normalization and APC) are detailed in
Section 2.4.
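The train-only scaling described above can be sketched as follows. This is an illustration under stated assumptions, not the study code: the use of a ColumnTransformer to leave the sex indicator unscaled, the column order, the synthetic data, and the conventional mean arterial pressure approximation (diastolic plus one third of the pulse pressure) are all assumptions, and Equation (3) may define MAPres differently.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler


def mean_arterial_pressure(systolic, diastolic):
    """Conventional MAP approximation (assumption; Equation (3) may differ)."""
    return diastolic + (systolic - diastolic) / 3.0


# Hypothetical column order: CTRED, age, MAPres, sex (male = 0, female = 1).
CONTINUOUS_COLS = [0, 1, 2]
SEX_COL = [3]

preprocess = ColumnTransformer(
    transformers=[
        ("zscale", StandardScaler(), CONTINUOUS_COLS),  # z-scale continuous inputs
        ("sex", "passthrough", SEX_COL),                # keep the binary indicator as-is
    ]
)


def synthetic_rows(n, rng):
    """Synthetic rows for illustration only (not ADNI data)."""
    ctred = rng.gamma(shape=2.0, scale=5.0, size=n)
    age = rng.normal(75.0, 6.0, size=n)
    mapres = mean_arterial_pressure(
        rng.normal(135.0, 15.0, size=n), rng.normal(78.0, 9.0, size=n)
    )
    sex = rng.integers(0, 2, size=n)
    return np.column_stack([ctred, age, mapres, sex])


rng = np.random.default_rng(0)
X_train, X_test = synthetic_rows(20, rng), synthetic_rows(6, rng)

Z_train = preprocess.fit_transform(X_train)  # means/SDs learned on training rows only
Z_test = preprocess.transform(X_test)        # same parameters reused, no refitting
```

Because the scaler parameters come from the training rows alone, the test rows are generally not centered at exactly zero after transformation, which is the expected behavior when leakage is avoided.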
2.7. Models
We evaluated three linear classifiers, two of which comprise a soft-voting ensemble. All models were implemented in scikit-learn within Pipelines that applied z-scaling to continuous inputs (see
Section 2.6).
- (i)
Support Vector Machine (SVM)
A linear SVM was built using SVC (kernel = “linear”) inside a Pipeline with StandardScaler. We set class_weight = “balanced” and enabled Platt scaling (probability = True) to obtain calibrated probabilities for downstream ensembling.
- (ii)
Logistic Regression
A penalized logistic regression classifier was implemented in a Pipeline with StandardScaler. Coefficients were estimated by maximum likelihood under L1 or L2 penalty, with solver chosen among liblinear or saga during tuning. Class weights were explored as either “none” or “balanced” in the search space.
- (iii)
Ridge Classifier
A Ridge Classifier was implemented in a Pipeline with StandardScaler, using class_weight = “balanced”. Because Ridge Classifier exposes a decision function rather than calibrated probabilities, it was not included in the probability-averaging ensemble; the regularization strength α was tuned in
Section 2.8.
- (iv)
Soft-Voting Ensemble
The ensemble combined the SVM and logistic regression by averaging their predicted probabilities for the positive class to produce a single ensemble probability. A single decision threshold was selected on the training set under a 10–90% predicted-positive-rate constraint; when no candidate satisfied the constraint, Youden’s J (TPR + TNR − 1) was used as a fallback (
Section 2.9).
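A minimal sketch of the two probability-producing models and their soft-voting average is given below. The synthetic data, the default regularization strengths, and the helper name soft_vote are assumptions made for illustration; the tuned hyperparameters of the study are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic training data for illustration only (20 subjects, 4 predictors).
rng = np.random.default_rng(0)
y = np.array([0, 1] * 10)
X = rng.normal(size=(20, 4))
X[:, 0] += 1.5 * y  # inject an arbitrary class signal

# Linear SVM with Platt-scaled probabilities, as described in (i).
svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="linear", class_weight="balanced", probability=True, random_state=0),
)

# L2-penalized logistic regression with the liblinear solver, one option from (ii).
logreg = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="liblinear", class_weight="balanced"),
)

svm.fit(X, y)
logreg.fit(X, y)


def soft_vote(models, X_new):
    """Average the positive-class probabilities of the fitted models."""
    return np.mean([m.predict_proba(X_new)[:, 1] for m in models], axis=0)


ensemble_proba = soft_vote([svm, logreg], X)
```

The Ridge classifier is left out of the average because it exposes only a decision function, consistent with (iii).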
2.8. Hyperparameter Optimization
Hyperparameter tuning was performed with Optuna (Tree-structured Parzen Estimator, TPE), running 500 trials per model on the training split only. During tuning, performance was estimated by stratified k-fold cross-validation on the training data; the model with the best mean CV score was refit on the full training set. Out-of-fold (OOF) probabilities from the training split were retained for decision-threshold selection (
Section 2.9).
- (i)
Linear SVM
Search space: regularization parameter C (log-uniform).
CV objective: F1-score (mean over folds).
Implementation details: SVC (kernel = “linear”, class_weight = “balanced”, probability = True) inside a Pipeline with StandardScaler.
- (ii)
Logistic Regression
Search space: penalty ∈ {L1, L2}; solver ∈ {liblinear, saga}; C (log-uniform); max_iter ∈ [500, 5000]; class_weight ∈ {None, balanced}.
CV objective: F1-score (mean over folds).
CV folds: k capped by the minority-class count (min 2, max 5).
- (iii)
Ridge Classifier
Search space: regularization strength α (log-uniform).
CV objective: F1-score (mean over folds).
Note: RidgeClassifier outputs a decision function (not calibrated probabilities) and was therefore excluded from probability averaging in the ensemble.
To ensure reproducibility, fixed random seeds were used where applicable, and all preprocessing (standardization) was embedded within the model Pipelines invoked by Optuna so that each CV fold applied identical transformations learned inside the fold.
2.9. Decision Thresholding
For logistic regression, hyperparameters were tuned by CV ROC-AUC. Out-of-fold probabilities were used to sweep thresholds t ∈ [0.01, 0.99], selecting the one that minimized the right-hand side of Equation (4) under a 10–90% positive-rate constraint; the fallback was Youden’s J, as defined in Equation (5). This fixed threshold was applied to the refit model for test evaluation.
For the linear SVM, probabilities were obtained with Platt scaling (probability = True). Results are reported using the default 0.5 cutoff, and no custom thresholding was applied.
For the soft-voting ensemble, model probabilities were averaged, and the threshold was tuned during training to maximize accuracy under the same 10–90% positive-rate constraint, again with Youden’s J as fallback.
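The ensemble thresholding rule can be sketched as below: thresholds are swept over [0.01, 0.99], accuracy is maximized among candidates whose predicted-positive rate lies within 10–90%, and Youden’s J is used when no candidate qualifies. The grid step of 0.01, the tie-breaking rule (the lowest qualifying threshold is kept), and the function name are assumptions.

```python
import numpy as np


def select_threshold(y_true, proba, min_rate=0.10, max_rate=0.90):
    """Pick a decision threshold from training (out-of-fold) probabilities."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba)
    grid = np.linspace(0.01, 0.99, 99)  # assumed step of 0.01

    best_t, best_acc = None, -1.0
    for t in grid:
        pred = (proba >= t).astype(int)
        rate = pred.mean()  # fraction of subjects predicted positive
        if min_rate <= rate <= max_rate:
            acc = (pred == y_true).mean()
            if acc > best_acc:  # strict ">" keeps the lowest tied threshold
                best_t, best_acc = t, acc
    if best_t is not None:
        return float(best_t)

    # Fallback: maximize Youden's J = TPR + TNR - 1 over the same grid.
    n_pos = max(int((y_true == 1).sum()), 1)
    n_neg = max(int((y_true == 0).sum()), 1)

    def youden_j(t):
        pred = proba >= t
        tpr = (pred & (y_true == 1)).sum() / n_pos
        tnr = (~pred & (y_true == 0)).sum() / n_neg
        return tpr + tnr - 1.0

    return float(max(grid, key=youden_j))


# Illustrative values only.
y_demo = np.array([0, 0, 0, 1, 1, 1])
p_demo = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
t_demo = select_threshold(y_demo, p_demo)
```

The positive-rate constraint prevents degenerate thresholds that assign nearly all subjects to one class, which can otherwise maximize accuracy in small or imbalanced samples.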
2.10. Evaluation and Metrics
Model performance was assessed on the held-out test set (n = 6) using threshold-independent and threshold-dependent metrics.
For transparency in a small-N setting, we additionally report training-split performance for the ensemble (AUC and accuracy) while emphasizing the held-out test results as the primary estimate of generalization.
All metrics were computed with scikit-learn confusion matrices, and class-wise metrics reflect the fixed thresholds determined on the training split.
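The distinction between threshold-dependent and threshold-independent metrics can be illustrated with a short sketch. The scores below are synthetic and hypothetical, chosen only to show that two models can yield the same confusion matrix at a fixed threshold while differing in AUC; they are not the study data.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix,
    precision_recall_fscore_support,
    roc_auc_score,
)

# Synthetic held-out labels and scores from two hypothetical models.
y_test = np.array([0, 0, 0, 1, 1, 1])
scores_a = np.array([0.10, 0.20, 0.60, 0.70, 0.80, 0.90])
scores_b = np.array([0.10, 0.20, 0.85, 0.70, 0.80, 0.90])
threshold = 0.5  # fixed threshold, assumed here for illustration

pred_a = (scores_a >= threshold).astype(int)
pred_b = (scores_b >= threshold).astype(int)

# Threshold-dependent metrics: identical for both models in this example.
tn, fp, fn, tp = confusion_matrix(y_test, pred_a).ravel()
accuracy = (tp + tn) / len(y_test)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, pred_a, average="binary"
)

# Threshold-independent discrimination: differs because the ranking differs.
auc_a = roc_auc_score(y_test, scores_a)
auc_b = roc_auc_score(y_test, scores_b)
```

In this example the misclassified negative is ranked below every positive by the first model but above two positives by the second, so only the AUC separates them.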
2.11. Software and Reproducibility
Analyses were conducted in Python (v3.12.3) using scikit-learn (v1.4.2) for model development, Optuna (v4.4.0) for hyperparameter optimization, and PFI for explainability. All continuous-feature standardization and categorical encodings were encapsulated in scikit-learn Pipelines to avoid data leakage (train-only fitting applied to test data). Random seeds were fixed where applicable to enhance reproducibility.
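Permutation Feature Importance can be computed with scikit-learn as sketched below. The data are synthetic, the signal is injected into one column arbitrarily, and the scoring metric and number of repeats are assumptions; the resulting ranking therefore says nothing about the study findings.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["CTRED", "age", "MAPres", "sex"]

# Synthetic data for illustration only; the signal placement is arbitrary.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 30)
X = rng.normal(size=(60, 4))
X[:, 2] += 3.0 * y

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Shuffle one column at a time and record the resulting drop in ROC-AUC.
result = permutation_importance(
    model, X, y, scoring="roc_auc", n_repeats=30, random_state=0
)
ranking = sorted(
    zip(FEATURES, result.importances_mean), key=lambda kv: kv[1], reverse=True
)
```

Because the whole Pipeline is passed to permutation_importance, the permuted columns go through the same fitted scaler as the original data.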
4. Discussion
Given the unmet need to stratify patients with a high risk of hippocampal decline using accessible clinical data, we employed a simple approach. Previous methods used proteomics, clinical data, and imaging features to predict AD progression or specifically focus on hippocampal atrophy [
10,
36,
37]. We used a compact, readily available feature set (CSF CTRED, age, sex, and MAPres) to develop interpretable classifiers that separate AD patients into high or low risk of future hippocampal volume decline. On the held-out test set (n = 6), all models produced identical class assignments (i.e., the same confusion matrix and class-specific Precision/Recall/F1), while threshold-independent discrimination (AUC) separated them: Ridge = 1.000, logistic regression = 0.889, soft-voting ensemble = 0.889, and linear SVM = 0.667. Ensemble performance on the training split (AUC = 0.910; accuracy = 0.850) and the test split (AUC = 0.889; accuracy = 0.833) further supports the consistency of the learned signal across thresholds. Permutation Feature Importance showed a consistent pattern: MAPres and sex had the largest influence on risk classification, CTRED added positive discriminative value across its range, and age contributed a smaller, monotonic effect.
These findings demonstrate that, using available information, a small, transparent modeling approach can generate coherent and clinically plausible risk stratification of hippocampal atrophy. However, estimates, especially the perfect Ridge AUC, should be interpreted cautiously due to the small sizes of the training and test sets.
The feature-importance pattern revealed by PFI and SHAP indicated MAPres and sex as dominant contributors, with CTRED and age showing secondary yet consistent effects of smaller magnitude. This feature pattern aligns biologically with hippocampal degeneration in AD. MAPres is a significant contributor to classification, indicating its importance in predicting atrophy risk. The relationship between BP and AD is highly complex and controversial [
38]. Prior studies suggest that elevated blood pressure and hypotension can be associated with hippocampal atrophy and cognitive decline through different mechanisms [
39]. Higher baseline BP was linked to greater subsequent hippocampal atrophy, while reduced systolic blood pressure led to faster declines in MMSE scores. Notably, no relationship was found between normotensive patients and cognitive scores [
40]. High BP damages the cerebral blood vessels’ structure and function, which can harm white matter regions critical for cognitive function. Studies suggest that cumulative BP effects on cerebral vasculature over time may worsen the situation, though evidence that antihypertensive treatment improves cognition or slows atrophy in AD remains inconclusive [
34]. Few studies have examined the relationship of low BP with AD, but population-based data indicate that low BP predisposes individuals to AD and dementia. Importantly, patients with AD can have lower BP as a consequence of degeneration in the autonomic nervous system [
35].
Sex can be a valuable variable in predicting atrophy in AD. Longitudinal studies have previously shown sex differences in brain atrophy rates and in cognitive and functional decline. Ardekani et al. (2016) showed that in MIRIAD AD patients, hippocampal atrophy rates were significantly faster in women than in men (6.61% versus 4.31%;
p = 0.008) [
18]. This difference is attributed to biological, genetic, or even social factors linked to gender, such as occupation and education level. Hormonal profiles, especially estrogens, play a central role. Estrogens promote glucose uptake, which is the brain’s primary energy source. As estrogen levels decline during menopause, this reduces glucose uptake by neurons. Consequently, neurons must switch to auxiliary sources like lipids, including white matter. Damage to white matter worsens the risk and promotes AD progression [
41,
42]. Our SHAP analysis of the Ridge classifier supported this literature, showing that female sex contributed positively to prediction risk, while male sex showed a protective (negative) contribution.
CTRED emerged as a predictor of risk classification, indicating that cerebrospinal erythrocyte signal points to hippocampal atrophy risk. In previous work, higher CTRED was linked to greater hippocampal atrophy in the AD cohort [
22]. This finding supports previous ADNI studies showing that CSF hemoglobin and ferritin levels are indicators of erythrocyte load predicting cognitive decline and faster conversion from MCI to AD, even after accounting for potential RBC contamination [
25,
43]. Several mechanisms might explain this link.
CTRED may serve as a surrogate for microvascular fragility or subtle hemorrhagic events, especially in the context of amyloid-related angiopathy in AD. Amyloid deposits weaken vessel walls, making them prone to bleeding [
44]. Alternatively, given that studies report hypertension prevalence of up to 51%, microhemorrhages can result from vessel leakage (
Figure 5) [
45]. When RBCs degrade, their products might induce neuroinflammation and neuronal stress [
46]. However, because CTRED can also be influenced by procedural factors (e.g., traumatic tap), it should be interpreted cautiously when proper procedural protocols have not been followed, taking clinical context and procedural details into account. The impact of age was modest, aligning with ADNI data showing slower atrophy with increasing age in AD/MCI and faster decline in younger patients, a pattern that reduces age-related variance in older cohorts [
10]. Multivariable analyses suggest that once pathology, like Aβ, is accounted for, age adds minimal extra information about hippocampal atrophy, especially due to range restrictions in limited age groups [
47]. Consistent with this, SHAP analysis showed that higher age continued to positively influence the predicted risk in our model, reflecting residual variance in our specific cohort rather than a strong independent biological factor. The model indicates that hippocampal decline results from multiple factors, such as vascular strain (MAPres), sex-related biology, RBC-related processes, and, to a lesser extent, age. Since these variables are often measured at baseline, an interpretable classifier that includes them can provide meaningful, pathophysiologically grounded risk predictions.
Figure 5. On the left, amyloid angiopathy is shown as a pathway of lobar microbleeds and β-amyloid vasculopathy, which weaken vessel walls and allow red blood cells to enter the CSF, thereby increasing hippocampal atrophy risk. On the right, hypertensive small-vessel disease (SVD) is depicted as a pathway involving deep or infratentorial microbleeds, chronic blood–brain barrier stress, and impaired autoregulation, which also elevates CTRED and promotes hippocampal degeneration. In both mechanisms, degradation of red blood cells and iron deposition contribute to inflammation and oxidative stress. Future work may leverage non-invasive surrogates such as DCE-MRI permeability mapping (Ktrans) and QSM/SWI-based iron measures to approximate CTRED without lumbar puncture.
The interpretability analysis used PFI and SHAP. Both identified MAPres and sex as key signals, with CTRED and age being less influential, supporting face validity, i.e., that the model behaves plausibly for clinicians. While face validity boosts confidence that results are not artifacts, it does not replace external validation. We avoid strong directional claims, interpreting MAPres as a cerebrovascular marker rather than a strict risk indicator, in line with the literature on pressure states and brain vulnerability. CTRED is an RBC/vascular integrity signal with biological and possibly procedural influences. SHAP complemented PFI by clarifying directionality, showing that female sex contributed positively to risk, high MAPres was associated with lower predicted risk, and higher age was associated with increased risk, which aligns with the prior literature. However, AD is a multifactorial process in which the direction of the MAPres and age relationships is highly complex, and the feature importance of the variables can vary between ML models.
This study presents a proof-of-concept modeling framework to explore whether routinely obtainable variables (CTRED, MAPres, age, and sex) can jointly predict the risk of hippocampal volume decline in AD. The framework demonstrates how these parameters, when available from baseline MRI, LP, and clinical records, may be integrated into an interpretable model producing binary risk classifications (high versus low) and feature-importance estimates derived from PFI. Standard quality-control steps, such as reviewing LP reports, verifying blood-pressure measurements, and assessing MRI volumetric reliability, are assumed within the analytic pipeline.
When CTRED shows a substantial contribution to the model’s predictions and traumatic tap appears unlikely based on procedural notes, the signal may reflect aspects of vascular integrity rather than contamination. In such circumstances, corresponding imaging patterns observed on microbleed-sensitive MRI sequences (e.g., SWI or T2*) could help contextualize the result. Lobar-predominant microbleeds are typically associated with amyloid-related angiopathy, while deep or infratentorial patterns align more with hypertensive small-vessel disease, acknowledging that mixed profiles frequently occur [
48,
49]. Where an amyloid-dominant mechanism is suspected, the finding may conceptually align with established literature linking amyloid pathology to microvascular fragility and hemorrhagic risk, particularly in individuals receiving antithrombotic therapy [
44,
50,
51]. Conversely, a hypertensive profile is consistent with studies emphasizing the impact of chronic blood-pressure dysregulation on hippocampal integrity [
52].
When MAPres emerges as a major explanatory variable, the direction of its coefficient offers interpretive context, suggesting whether elevated or reduced pressures dominate within the modeled relationship. Low-pressure associations could, for instance, correspond to effects of medication, dehydration, autonomic dysfunction, or endocrine disturbances (e.g., adrenal or thyroid disorders) [
53]. Age and sex operate as fixed demographic covariates that primarily modulate risk communication rather than representing direct intervention targets.
Overall, this framework (
Figure 6) should be interpreted as an early-stage, hypothesis-generating prototype illustrating the potential integration of hematologic and vascular measures into explainable ML models of structural neurodegeneration.
This study has some limitations. First, it is a proof-of-concept study with a small group of AD patients (n = 26), so its findings may not be broadly applicable and should be considered preliminary until further validation is conducted. Second, CTRED might not be accessible for all patients because it requires an LP, and its measurement can be complicated by blood contamination. While the LP protocols follow strict ADNI guidelines, a small degree of contamination cannot be completely ruled out. Additionally, the model needs volumetric measurements as input, necessitating dedicated volumetric software for MRI scans. This software can be costly, particularly for smaller medical centers.
Future research should aim at developing a non-invasive CTRED surrogate. First, CTRED should be correlated with QSM (regional susceptibility/iron) and SWI/T2* microbleed burden and patterns to determine an imaging cut-point that approximates the current CTRED threshold. Second, a classifier could be trained to predict high versus low CTRED from imaging and routine variables (QSM/SWI features, MAPres, P-tau, Aβ) to assess whether replacing actual CTRED with its predicted value maintains risk classification accuracy after calibration and external validation in larger cohorts. To explore vascular mechanisms, DCE-MRI can be used to calculate Ktrans, a standard index of BBB leakiness, and to evaluate its added prognostic value and potential interactions with MAPres status (e.g., subgroups defined by pressure signal and BBB effects).