Enhancing Early Alzheimer’s Disease Detection via Transfer Learning: From Big Structural MRI Datasets to Ethnically Distinct Small Cohorts

Lee, Minjae; Lee, Suwon; Seo, Hyeon

doi:10.3390/app16021004


by Minjae Lee ¹, Suwon Lee ¹,* and Hyeon Seo ¹,²,*

¹ Department of Computer Science and Engineering, Gyeongsang National University, Jinju 52828, Republic of Korea

² The Research Institute of Natural Science, Gyeongsang National University, Jinju 52828, Republic of Korea

* Authors to whom correspondence should be addressed.

Appl. Sci. 2026, 16(2), 1004; https://doi.org/10.3390/app16021004

Submission received: 5 December 2025 / Revised: 6 January 2026 / Accepted: 15 January 2026 / Published: 19 January 2026

(This article belongs to the Special Issue Novel Applications of Machine Learning and Bayesian Optimization, 2nd Edition)


Abstract

Deep learning-based analysis of brain magnetic resonance imaging (MRI) plays a crucial role in the early diagnosis of Alzheimer’s disease (AD). However, data scarcity and racial bias present significant challenges to the generalization of diagnostic models. Large-scale public datasets, which are predominantly composed of Caucasian individuals, often lead to performance degradation when applied to other ethnic groups owing to domain shifts. To address these issues, this study proposes a two-stage transfer learning framework. Initially, a 3D ResNet model was pretrained on a large-scale Alzheimer’s disease neuroimaging initiative (ADNI) dataset to learn structural brain features. Subsequently, the pretrained weights were transferred and fine-tuned on a small-scale Korean dataset utilizing only 30% of the data for training. The proposed model achieved superior performance in classifying mild cognitive impairment (MCI), which is crucial for early diagnosis, compared with a model trained from scratch using 70% of the Korean data. Furthermore, it effectively mitigated the significant performance degradation observed when directly applying the pretrained model, demonstrating its ability to resolve the domain-shift issue. This study explored the feasibility of transfer learning to address data scarcity and domain shift issues in AD classification, underscoring its potential for developing AI-based diagnostic systems tailored to specific ethnic populations.

Keywords:

transfer learning; Alzheimer’s disease detection; MRI-based deep learning

1. Introduction

With advancements in deep learning technologies, artificial intelligence (AI)-based analysis of medical data has become a focus of active research across various domains [1,2,3]. Alzheimer’s disease (AD), a neurodegenerative disorder, is a major concern for the large elderly population worldwide, with symptoms that begin with memory loss and progressively lead to a decline in overall cognitive function. Although emerging treatments aim to manage the symptoms and slow disease progression, they currently offer no complete cure [4]. AD progression is generally categorized into three stages: cognitively normal (CN); an intermediate stage known as mild cognitive impairment (MCI); and diseased (AD). Early detection, particularly during the MCI stage, is therefore crucial for timely intervention to delay disease progression [5]. However, diagnosing MCI is particularly difficult, as it represents a transitional phase from CN to AD, characterized by subtle brain atrophy that is hard to detect.

Deep learning models have shown significant promise in analyzing subtle neuroimaging features associated with AD. Structural alterations such as hippocampal atrophy and cortical thinning are more pronounced in magnetic resonance imaging (MRI) scans of patients with AD than in individuals who are CN [6,7]. Numerous studies have employed deep learning models such as convolutional neural networks (CNNs) [8,9] to automatically classify AD and MCI from brain MRI data, often achieving a performance comparable to that of clinical experts [10,11]. However, a fundamental challenge in developing these models is the limited availability of large-scale medical datasets owing to patient privacy regulations and the substantial time and cost associated with data collection [12].

To address these limitations in medical imaging, large-scale publicly available datasets, such as the Alzheimer’s disease neuroimaging initiative (ADNI) [13] and open access series of imaging studies (OASIS) [14], have been widely utilized for the development and validation of AI-based diagnostic models. These datasets serve as a foundation for numerous studies. For instance, some researchers have developed deep 3D CNNs to learn hierarchical feature representations directly from raw brain MRI scans for AD classification [15,16,17]. Others have focused on the more challenging task of predicting which individuals with MCI will progress to AD, often utilizing longitudinal data to model disease progression over time [18,19]. Furthermore, multimodal approaches that integrate MRI data with other data sources, such as positron emission tomography (PET) scans and cerebrospinal fluid (CSF) data, have been proposed to improve diagnostic accuracy [20]. Although the ADNI dataset has been invaluable for these advancements, it is primarily composed of Caucasian participants, which raises concerns about the generalizability of the models trained on it when applied to other populations, such as Asians, Africans, and Hispanics. Chee et al. [21] and Yuan et al. [22] demonstrated that anatomical brain structures can significantly vary according to race, suggesting that ethnicity-specific differences must be incorporated when applying such models to diverse groups. This highlights the potential risk of misdiagnosis when models trained on racially homogeneous datasets are applied to ethnically distinct groups.

This study explored the potential of transfer learning to address two key challenges in AD classification tasks: limited data availability in non-Western populations and the performance gap resulting from domain shifts between ethnically distinct cohorts. Specifically, we investigated whether a model pretrained on a large-scale, Caucasian-majority ADNI dataset can be adapted to a small-scale Korean MRI dataset employing a transfer learning strategy, with an emphasis on the classification of MCI, an early stage of AD. By applying a pretrained model to a demographically distinct and data-limited cohort, we assessed the practical feasibility of utilizing existing large-scale resources in underrepresented populations. These findings offer valuable insights into the use of publicly available models in underrepresented populations, particularly in data-constrained neuroimaging classification tasks.

The primary contribution of this research lies in providing a practical evaluation of a transfer learning framework designed to mitigate domain shift issues arising when knowledge from a large-scale Caucasian dataset is applied to an ethnically distinct East Asian cohort. Specifically, this study demonstrates that even in environments with severe data scarcity, the pre-trained model achieves higher data efficiency in classifying MCI, a critical stage for early intervention, compared to models trained from scratch. These findings provide insights into how transfer learning can facilitate the use of small-scale, high-quality medical datasets in data-constrained neuroimaging applications.

2. Datasets and Methodologies

2.1. Data

This study utilized two primary datasets: the ADNI dataset (adni.loni.usc.edu (accessed on 24 December 2024)) and the dementia diagnosis neuroimaging dataset from the AI-Hub open AI dataset project (AI-Hub) in Korea [23]. ADNI, launched in 2003 as a public–private partnership, was originally established to evaluate whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessments can be integrated to monitor the progression of MCI and early AD. The dataset comprises structural MRI scans acquired from a large cohort of participants with diverse acquisition parameters.

As a large-scale cohort, the ADNI dataset encompasses a diverse range of magnetic field strengths, volume resolutions, and voxel sizes. The magnetic field strengths are primarily 1.5T and 3.0T, and the scans were T1-weighted and acquired in the Sagittal orientation. The volume resolutions vary, with common examples including 256 × 256 × 180, 240 × 256 × 176, and 240 × 256 × 208. Similarly, the voxel sizes are not uniform, with typical dimensions such as 0.9 × 0.9 × 1.2 mm³, 1.1 × 1.1 × 1.2 mm³, and 1.0 × 1.0 × 1.0 mm³.

The Korean dataset comprises 837 scanned brain MRI volumes, all of which have a uniform resolution of 360 × 256 × 256 voxels and an isotropic voxel size of 0.5 × 0.5 × 0.5 mm³. This dataset was collected at Samsung Medical Center using a Philips Achieva 3.0T scanner, which provides high-resolution standardized images. All scans were acquired using a 3D T1-weighted MRI protocol with a fixed slice thickness of 1.0 mm.

All scans were fully anonymized and used with IRB approval from the Gyeongsang National University (GIRB-D24-NY-0090). As shown in Table 1, the ADNI dataset consists of 981 male and 1013 female subjects, totaling 1994 individuals. In contrast, specific individual-level information for the Korean dataset, such as educational attainment and genetic risk markers, was not disclosed under the data provider’s anonymization policy (AI-Hub). Instead, only the overall gender ratio of the dataset was provided. Based on this reported ratio and the total number of MRI scans, the dataset is estimated to include approximately 249 male and 588 female scans, corresponding to a male-to-female ratio of about 1:2.3.

2.2. Data Preprocessing

A primary difference between the ADNI and Korean datasets is the preprocessing of the 3D volumes. As shown on the left of Figure 1, the Korean dataset is provided in skull-stripped form, whereas the ADNI dataset includes the skull and other non-brain regions. To standardize the input data, we performed brain segmentation ($\sigma_{B}$). Specifically, we employed DeepBrain [24], a U-Net-based [25] segmentation tool, to eliminate the skull and non-brain areas from the ADNI dataset (see Figure 1a). DeepBrain was trained using the CC359 [26], NFBS [27], and ADNI datasets. Subsequently, we examined the metadata of the digital imaging and communications in medicine (DICOM) files to verify slice distance and pixel spacing. Isotropic resampling ($\rho_{Iso}$) was then applied to ensure uniform spacing of 1 mm across all axes.
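The isotropic resampling step can be illustrated with a minimal NumPy sketch. The article does not state which interpolation scheme was used, so the nearest-neighbour lookup below is an assumption chosen for brevity, and the function name `resample_isotropic` and its parameters are hypothetical.

```python
import numpy as np

def resample_isotropic(volume, spacing_mm, target_mm=1.0):
    """Nearest-neighbour resampling of a 3D volume to isotropic voxels.

    Assumption: the source does not specify the interpolation method;
    nearest-neighbour is used here only to keep the sketch short.
    """
    volume = np.asarray(volume)
    spacing = np.asarray(spacing_mm, dtype=float)
    # The physical extent along each axis is preserved; only the voxel count changes.
    new_shape = np.maximum(
        1, np.round(np.array(volume.shape) * spacing / target_mm)
    ).astype(int)
    # For every output voxel centre, find the source voxel that contains it.
    index_per_axis = [
        np.minimum(((np.arange(n) + 0.5) * target_mm / s).astype(int), size - 1)
        for n, s, size in zip(new_shape, spacing, volume.shape)
    ]
    return volume[np.ix_(*index_per_axis)]
```

For example, a volume with 1.2 mm spacing along the first axis and 0.9 mm in-plane spacing gains slices along the first axis and loses voxels in-plane, so that every voxel edge becomes 1 mm.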

Following resampling, the cubic volume data were normalized. We applied min-max normalization ($N_{[0, 1]}$) to scale the voxel values within the range of 0–1, making them suitable for input into the deep learning model. To focus on the central and more informative parts of the brain and exclude non-brain regions, we performed $n$-slice $k$-step center cropping ($C_{n, k}$). For this study, we set the number of slices $n$ to 32 to balance feature richness and computational load for 3D CNNs. We experimented with $k$-step intervals of 1, 2, and 3 mm, and set $k = 2$. The resulting brain coverage patterns corresponding to each interval are shown in Supplementary Figure S1.
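A minimal sketch of the min-max normalization and the $n$-slice $k$-step center cropping described above is given below. It assumes that the volume has already been resampled to 1 mm isotropic spacing (so a step of $k$ voxels equals $k$ mm) and that slices are taken along the first array axis; the function names and the `axis` parameter are hypothetical and not taken from the article.

```python
import numpy as np

def min_max_normalize(volume):
    """Scale voxel intensities linearly into the range [0, 1]."""
    volume = np.asarray(volume, dtype=np.float32)
    lo, hi = volume.min(), volume.max()
    if hi == lo:
        # A constant volume carries no contrast; return zeros to avoid dividing by zero.
        return np.zeros_like(volume)
    return (volume - lo) / (hi - lo)

def center_crop_slices(volume, n=32, k=2, axis=0):
    """Select n slices spaced k voxels apart, centred along the given axis."""
    size = volume.shape[axis]
    span = (n - 1) * k + 1  # voxels covered from the first to the last selected slice
    if span > size:
        raise ValueError("volume is too small for the requested n and k")
    start = (size - span) // 2
    indices = start + k * np.arange(n)
    return np.take(volume, indices, axis=axis)
```

With `n=32` and `k=2`, a volume of 180 slices yields slices 58, 60, ..., 120, which is a centered slab of 32 slices.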

2.3. Model Selection and Input Normalization

As described in Section 1, the classification of CN, MCI, and AD patient groups is based on the degree of structural changes, such as cortical thinning and hippocampal atrophy. This atrophy can be detected by observing the continuous changes in the 3D volume along the scanning direction. Therefore, we formulated the classification task by treating the 3D volume as a sequential video. We utilized a CNN-based deep learning model, specifically ResNet3D [28], which is well-known for its strong performance in video classification, to classify 3D volumetric brain MRI data.

Prior to model input, the volumes preprocessed as described in Section 2.2 were normalized to match the input distribution of the ResNet3D model, which was pretrained on ImageNet [29]. Specifically, we performed volume normalization ($N_{\mu, s}$) by setting the mean ($\mu$) to 0.45 and the standard deviation ($s$) to 0.225.
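As a brief illustration, the volume normalization can be written as a single affine transform. The sketch below assumes the input has already been scaled to the range 0–1; the function name `normalize_volume` is hypothetical.

```python
import numpy as np

def normalize_volume(volume, mean=0.45, std=0.225):
    """Shift and scale a [0, 1]-valued volume using the mean and standard
    deviation reported in the article (0.45 and 0.225)."""
    return (np.asarray(volume, dtype=np.float32) - mean) / std
```

Under this transform, an input range of 0–1 maps to approximately −2.0 to 2.44.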

2.4. Large-Scale Pretraining

Table 1 presents the demographic composition of each dataset, categorized by diagnostic class (AD, MCI, CN) and sex. Figure 2 illustrates the age distribution on a log scale, allowing for a clearer representation of the comparatively scarce Korean data. The ADNI dataset comprised 13,989 volume scans collected from 1994 patients, whereas the Korean dataset comprised 837 scans, making it substantially smaller in scale. As shown in Figure 2, the two datasets span age ranges of 51–104 years (ADNI) and 40–96 years (Korean) and exhibit a generally similar distribution.

Additionally, we conducted an analysis of the ethnic composition for the 1994 ADNI participants. Excluding 10 participants with missing data, the remaining 1984 participants were composed as follows: Asian 73 (3.68%), Black 224 (11.29%), White 1647 (83.01%), with the remaining minimal data comprising other ethnicities. This clearly establishes that the ADNI dataset is overwhelmingly Caucasian-dominant.

The direct training of a model on a small-scale dataset, such as the Korean cohort, presents fundamental limitations owing to data scarcity. Increasing the proportion of training data reduces the availability of test data to validate the generalization performance. Conversely, increasing the proportion of the test data leads to insufficient training, making it challenging to ensure the reliability of the model. Therefore, this study adopted a strategy of first pretraining a model on the large-scale ADNI dataset to comprehensively learn the features of structural brain changes and subsequently fine-tuning it using a minimal portion of the small-scale Korean dataset.

For pretraining, we utilized the ResNet3D model initialized with ImageNet-pretrained weights. The training was conducted on the ADNI volume data, which were processed with volume normalization ($N_{\mu, s}$) and split in a 7:3 ratio (train:test). This pretraining enabled the model to capture the 3D structural brain differences among the AD, MCI, and CN groups by treating the volumetric data as an image sequence over time. The hyperparameters for this pretraining phase were set as follows: the model was trained for 40 epochs with a batch size of 8, and the learning rate was initialized to 0.002.

2.5. Transfer Learning on Small Cohorts

To evaluate the effectiveness of transfer learning, we fine-tuned an ADNI-pretrained classifier using a smaller Korean cohort. Specifically, the proposed transfer learning framework consists of three stages: (1) pretraining a ResNet3D model on a large-scale source dataset (ADNI), (2) transferring the learned weights to initialize the target model, and (3) fine-tuning the model under progressively reduced training data conditions to evaluate data efficiency and robustness, as illustrated in Figure 1. In this framework, the model was first initialized with general image features and then pretrained on the large-scale ADNI dataset to learn brain-specific structures. For the Korean dataset, we updated the weights of the entire network to evaluate whether the structural differences associated with CN, MCI, and AD, learned from ADNI, remain informative for classification in a different cohort.

A data-splitting strategy was specifically designed to assess model performance under data-scarce conditions. Although the same volume normalization ($N_{\mu, s}$) was applied during preprocessing, the Korean dataset was first split into a fixed 70% test set and a 30% training set. This 3:7 ratio was intentionally chosen to rigorously evaluate performance by minimizing the reliance on training data.

In this study, a fixed 3:7 train–test split was employed to support consistent comparisons across experimental settings, particularly between zero-shot and fine-tuning scenarios. Since k-fold cross-validation cannot be readily applied when the training set is empty, the use of a shared test set enabled a fair assessment of model behavior under data-scarce conditions. By allocating 70% of the dataset (586 scans) to the test set, we aimed to ensure a sufficiently large sample for evaluating performance in the presence of cross-ethnic domain shifts.

To further investigate the dependency of the model on the training set size, we conducted a series of experiments using the 30% training data, systematically reducing the proportion utilized from 100% (full utilization) down to 75%, 50%, 25%, 20%, 15%, 10%, 5%, 2.5%, 0.63%, and 0% (zero-shot). To ensure a fair comparison across all experimental conditions, we opted not to use a separate validation set, as one could not be constructed for the 0% (zero-shot) case. To mitigate the risk of overfitting in this low-data regime, the model was trained for only 5 epochs, and the final model from the last epoch was used for evaluation on the test set.
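The splitting protocol above can be sketched in a few lines of Python. The article does not report how scans were sampled for each reduced proportion, so the seeded shuffle, the nested (prefix) subsets, the rounding rule, and the absence of class stratification are all assumptions made for illustration; the function names are hypothetical.

```python
import random

def fixed_split(scan_ids, train_fraction=0.30, seed=0):
    """Shuffle once with a fixed seed and split into (train, test)."""
    ids = list(scan_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

def nested_subsets(train_ids, percentages):
    """Return, for each percentage, a prefix of the training list.

    Assumption: subsets are nested, so every smaller subset is contained
    in every larger one. A percentage of 0 gives the zero-shot case.
    """
    subsets = {}
    for pct in percentages:
        count = 0 if pct == 0 else max(1, round(len(train_ids) * pct / 100))
        subsets[pct] = train_ids[:count]
    return subsets

# Proportions of the 30% training split examined in the article.
PERCENTAGES = [100, 75, 50, 25, 20, 15, 10, 5, 2.5, 0.63, 0]
train_ids, test_ids = fixed_split(range(837))
subsets = nested_subsets(train_ids, PERCENTAGES)
```

With 837 scans, this split yields 251 training and 586 test scans, matching the counts reported in the text. At the smallest non-zero proportion (0.63%), only one or two scans remain, which shows how extreme the low-data conditions are.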

2.6. Performance Metrics

In this study, we conducted experiments using two datasets: the ADNI dataset and the anonymized Korean dementia diagnosis neuroimaging dataset provided by AI-Hub [23]. The Korean dataset, created in 2020, includes expert-provided diagnostic annotations and was anonymized via the de-identification of personal information such as patient names and ID numbers. To evaluate the performance of the pretrained and fine-tuned models, we used the following metrics: accuracy, precision, recall, F1-score, and area under the curve (AUC).

$$\mathrm{accuracy} = \frac{TP + TN}{TP + FP + FN + TN}, \quad \mathrm{precision} = \frac{TP}{TP + FP}, \quad \mathrm{recall} = \frac{TP}{TP + FN},$$

$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}, \quad \mathrm{AUC} = \frac{\sum_{i = 1}^{p} \sum_{j = 1}^{q} I [\mathrm{score}(x_{i}^{+}) > \mathrm{score}(x_{j}^{-})]}{p \times q}$$

where

$p$: the total number of positive samples;
$q$: the total number of negative samples;
$x_{i}^{+}$: the $i$-th positive sample;
$x_{j}^{-}$: the $j$-th negative sample;
$\mathrm{score}(\cdot)$: the model’s prediction score;
$I[\cdot]$: the indicator function (1 if true, 0 otherwise).

For a more detailed assessment, the F1-score was measured for the entire test set and each class. The terms TP, FP, FN, and TN refer to true positives, false positives, false negatives, and true negatives, respectively. These terms are defined from the perspective of the MRI classification, where ‘positive’ indicates the presence of a specific condition.
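The metric definitions above translate directly into code. The following is a minimal pure-Python sketch for the binary (one-vs-rest) case, in which the class of interest is treated as positive; the function names are hypothetical, and the AUC follows the pairwise formula as written, with a strict inequality and no special treatment of tied scores.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    wins = sum(1 for sp in pos_scores for sn in neg_scores if sp > sn)
    return wins / (len(pos_scores) * len(neg_scores))
```

For a three-class problem such as CN, MCI, and AD, per-class values could be obtained by calling these functions once per class with that class relabeled as 1 and the other two as 0.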

3. Results

3.1. Baseline Performance

A prerequisite for effective transfer learning is high performance of the model pretrained on the source domain. Therefore, prior to applying transfer learning, we first verified that the model trained on the ADNI dataset achieved sufficient classification accuracy for the target task. To select an appropriate step for the $n$-slice $k$-step center cropping method described in Section 2.2, we conducted experiments at intervals of 1, 2, and 3 mm. For efficiency in deep learning resource usage, we set $n = 32$, accumulating 32 slices. Consequently, the steps of 1, 2, and 3 mm corresponded to MRI volume thicknesses of 3.2, 6.4, and 9.6 cm, respectively. As summarized in Table 2, the 2 mm step achieved the highest overall performance across all metrics, except the F1-score for the CN class, and achieved an AUC of 0.9896, demonstrating a high classification capability. Therefore, we proceeded with model training using a 2 mm step.

Next, to qualitatively evaluate the latent feature distribution of the selected model, we visualized the feature vectors extracted from the test dataset in 2D space using uniform manifold approximation and projection (UMAP) (see Figure 3a). An analysis of the UMAP plot revealed that the data for each class (AD, MCI, and CN) formed distinct clusters distributed in separate regions of space. This suggests that the pretrained model successfully learned the discriminative features necessary to distinguish between the classes. For a quantitative analysis, we present the confusion matrix, which compares the model’s predictions with the ground-truth labels in Figure 3b. As shown in the figure, the values along the main diagonal are significantly higher than those in the off-diagonal regions, indicating that the model correctly classified most test samples. Although a few misclassifications occurred, the overall classification accuracy was excellent, confirming the reliability of the model.

As a qualitative illustration of the learned representations, we applied Grad-CAM to visualize the activation patterns for each diagnostic class, as shown in Figure 4. The maps represent the cumulative response across all MRI slices. In the AD group (Figure 4a), widespread and intense activation is observed across the majority of the cortical regions, indicating that the model captures extensive structural degeneration. In contrast, the CN group (Figure 4b) shows minimal and localized activation, which is consistent with the absence of significant pathological markers. The MCI group (Figure 4c) exhibits intermediate activation patterns, reflecting a transitional state between a cognitively normal state and advanced pathology. These visual results confirm that the pretrained model successfully identifies the degree of structural changes in the brain cortex to distinguish between the clinical stages.

3.2. Transfer Learning Performance on Korean Dataset

We conducted comparative experiments to demonstrate the necessity of a pretrained model, effectiveness of the proposed transfer learning approach, and potential for efficient data utilization. Table 3 presents a performance comparison of deep learning and machine learning techniques across various conditions. Each experiment was conducted using different Korean dataset ratios for the training and test data.

First, we analyzed the deep-learning-based methods. Case (a) represents a conventional deep learning approach without transfer learning using a large portion (70%) of the entire Korean dataset for training. With approximately 600 training samples, it showed excellent performance across most metrics; however, it exhibited a poor F1-score for the MCI class. Notably, this approach has a clear limitation when medical data are limited, because it substantially reduces the number of test samples, making it difficult to trust the generalization performance of the model.

Given this limitation, we increased the proportion of the test set to 70% to evaluate the practical performance of the model. Case (b), for comparison, involves applying transfer learning under the same 70% training data condition as in Case (a); a detailed analysis of its results is provided in Section 4. Case (c) is a ‘zero-shot’ scenario wherein the model, pretrained on the ADNI dataset, was evaluated directly on the 70% test portion of the Korean dataset without any additional training. All performance metrics degraded significantly, indicating that domain shifts, particularly racial and demographic differences, negatively impact cross-population model performance.

The Inception-V3 [30]-based model, Case (h), is the benchmark published by the dataset provider, AI-Hub. Although it achieved a higher AUC of 0.7667 using 80% of the data for training, the proposed method in Case (d) achieved a comparable AUC of 0.7403 with significantly less training data. Our method thus surpassed the AUC threshold of 0.7, which the dataset provider regards as a meaningful performance benchmark [23].

Finally, Cases (e)–(g) are machine-learning-based models using random forest (RF) [31], support vector machine (SVM) [32], and logistic regression (LR) [33]. Despite being trained on an extensive amount of data (80%), these models exhibited a lower overall performance than deep learning-based models. In particular, their performance in MCI classification was significantly low, and they failed to meet the objectives of early diagnosis. Furthermore, the RF and SVM models did not achieve an AUC of 0.7, failing to secure a meaningful level of performance.

3.3. Evaluating the Robustness of Transfer Learning Across Varying Dataset Sizes

To evaluate the effect of the training dataset size, we progressively reduced the proportion of training data from 100% (all 251 samples from the 30% split, as described in Section 2.5) to 0% and compared the performance of the models with and without transfer learning. For the transfer learning case (w TL), the model was initialized using weights from the ADNI-pretrained model. The model without transfer learning (w/o TL) used only standard weights pretrained on ImageNet.

Figure 5 summarizes the results, illustrating the variations in the (a) F1-score and (b) AUC across different amounts of training data for both scenarios. As shown in Figure 5a, the F1-score for the w TL case did not exhibit a sharp decline, even as the proportion of training data decreased, showing only a modest decrease from a maximum of 0.5595 to a minimum of 0.506. Following a similar trend, Figure 5b shows that the AUC for the w TL case also demonstrates a small fluctuation, decreasing from a maximum of 0.7439 to a minimum of 0.676. By contrast, the w/o TL case exhibited significant decreases and instability in both Figure 5a,b, and this performance gap widened as the amount of training data decreased.

Notably, both models performed better when no Korean training data were used (0%) than when an extremely small amount was used. This suggests that an extremely insufficient amount of training data does not facilitate learning and instead interferes with the pretrained weights. In such settings, relying on the pretrained weights alone (ADNI in the TL setting and ImageNet in the non-TL setting) yields better and more reliable performance.

4. Discussion

This study primarily aimed to evaluate the efficacy of transfer learning for detecting Alzheimer’s disease using limited datasets. Specifically, we investigated whether a model pretrained on a large-scale, predominantly Caucasian ADNI dataset can be effectively generalized to a small-scale, homogeneous Korean dataset. This approach was designed to assess its applicability across distinct ethnic groups. In AD research, transfer learning has primarily been explored using models pretrained on large general-purpose image datasets, such as ImageNet.

For instance, Hon and Khan [34] demonstrated the high data efficiency of transfer learning by successfully applying VGG [35] and Inception models pretrained on ImageNet to small-scale MRI datasets. In addition, Liu et al. [36] successfully generalized models trained on the ADNI to an independent external dataset called the NACC [37]. However, prior studies predominantly focused on transfers between datasets with similar demographic characteristics. This study is distinguished by its focus on mitigating the domain shifts resulting from ethnic and anatomical differences, providing novel insights into cross-population generalization of diagnostic models.

The results show that, although the overall classification performance of the model on the Korean dataset was modest, it remained robust in distinguishing MCI cases under data-constrained conditions, which is a significant finding. The difficulty of MCI classification is consistent with trends observed in prior studies in this field. For instance, Liu et al. [36] reported significantly lower classification performance for the MCI class than for the CN and AD classes in both the internal and external NACC datasets. This difficulty is also evident in our zero-shot experiment (Table 3, Case (c)), wherein the ADNI-pretrained model was evaluated on the Korean dataset without fine-tuning. The model achieved a notably low F1-score of 0.3361 for the MCI class, which was substantially lower than those for AD (0.6371) and CN (0.5263). This result suggests that classifying the subtle features of the MCI stage remains challenging, irrespective of ethnic characteristics.

In our experiments (Table 3), when fine-tuning was performed using a larger amount of data (70%), the model (case (b)) performed marginally worse than the model trained from scratch (case (a)). This marginal performance decline is likely attributable to ‘transfer degradation’, that is, the possibility that robust and generalizable features learned from the source domain (ADNI) were overwritten or diluted by the extensive amount of data from the demographically different target domain (Korean).

However, the true advantages of transfer learning became evident in a data-scarce scenario. Notably, when trained using only 30% of the Korean dataset (approximately 251 scans), the transfer learning model achieved relatively robust results (case (d), MCI F1-Score: 0.4355; AUC: 0.7403). This MCI classification result surpassed that of the model trained from scratch with more than double the amount of data (case (a), F1-Score: 0.3965). This indicates that using a smaller amount of data for fine-tuning acted as a form of regularization, preserving the advantages of the pretrained weights while effectively adapting them to the new domain.

However, we clarify that the point of this finding is not to claim that the AUC of 0.7403 (case (d)) represents a clinically perfect performance. Rather, the significance of this result, while representing a mid-level performance for clinical application, is its demonstration of superior potential over training from scratch (case (a)) in an extremely data-scarce environment. As noted, we acknowledge that a higher level of performance in AUC is required to achieve actual clinical significance, which remains a key objective for future improvements to this framework.

Although this study demonstrated the potential of transfer learning, several areas must be addressed in future research. First, we adopted the $n$-slice $k$-step center-cropping method, which extracts slices at regular intervals from the central part of the 3D MRI volume. This is an intuitive approach for capturing key brain features while maintaining computational efficiency without using the entire 3D volume. However, more sophisticated methods exist, such as the entropy-based slicing proposed by Kumar et al., which selects the most informative slices in a data-driven manner [38]. Future research could explore the introduction of such information-based slice selection techniques to further enhance the model performance.

Second, because the validation in this study was based on a single data split, further research applying k-fold cross-validation would help confirm the stability of the results. While a fixed split was adopted to maintain experimental consistency with the zero-shot scenarios, future work should incorporate multi-split evaluations. Although the current test set of 586 scans provides a reasonably large sample for assessing performance, k-fold cross-validation will be essential in future studies to confirm the robustness of the findings across different data partitions.

Third, our fine-tuning strategy involved updating the entire network. This decision was based on the hypothesis that a substantial domain shift (i.e., ethnic differences) might exist even in the early layers, making a frozen feature extractor suboptimal for the target domain. However, this approach risks overfitting or the ‘transfer degradation’ observed in case (b). We acknowledge that a more systematic fine-tuning approach, such as progressive layer freezing or applying differential learning rates across layer groups, is warranted. Such methods could provide a better balance between adapting to the new domain and preserving the robust, pre-learned features.
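To make the two alternatives mentioned above concrete, the following sketch computes a differential learning-rate schedule and a progressive unfreezing schedule over layer groups. Neither was used in this study. The group names, the decay factor of 0.5, the head learning rate of 0.002 (borrowed from the pretraining setting), and the one-group-per-epoch schedule are all hypothetical choices for illustration only.

```python
# Hypothetical layer groups of a ResNet-style 3D CNN, ordered from input to output.
LAYER_GROUPS = ["stem", "layer1", "layer2", "layer3", "layer4", "head"]

def layerwise_learning_rates(group_names, head_lr=0.002, decay=0.5):
    """Differential learning rates: earlier groups receive geometrically
    smaller rates so that pretrained low-level features change slowly."""
    last = len(group_names) - 1
    return {name: head_lr * decay ** (last - i)
            for i, name in enumerate(group_names)}

def frozen_groups(group_names, epoch, unfreeze_every=1):
    """Progressive unfreezing: only the head is trainable at epoch 0, and one
    additional group (moving toward the input) is unfrozen per interval."""
    n_trainable = min(len(group_names), 1 + epoch // unfreeze_every)
    return group_names[: len(group_names) - n_trainable]

rates = layerwise_learning_rates(LAYER_GROUPS)
```

In a deep learning framework, such a mapping would typically be passed to the optimizer as per-group settings, and the frozen groups would have gradient updates disabled at the start of each epoch.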

Fourth, given that performance on the Korean dataset did not improve significantly, it is crucial to validate the generalizability of the framework by securing more samples or by testing on other ethnically homogeneous groups. One possible factor contributing to this limitation is the restricted availability of demographic information, as described in Section 2.1. Specifically, the anonymization policies of the public data provider (AI-Hub) limited individual-level analysis, and the dataset exhibited a higher proportion of female MRI scans. Given prior evidence that women may be more vulnerable to Alzheimer’s disease [39,40], this imbalance may have acted as a confounding factor. Therefore, securing more comprehensive demographic information and a more gender-balanced sample in future studies will be crucial for enhancing the clinical reliability and overall robustness of the proposed model.

Fifth, the ADNI dataset is heterogeneous: it includes scans acquired at different magnetic field strengths and on scanners from various manufacturers. Although we sought to minimize the resulting statistical differences through normalization, this study did not apply dedicated harmonization techniques. As data harmonization has the potential to reduce variance between different scanners and improve predictive performance [41,42,43], applying such methods prior to the transfer learning process is a valuable direction for future research.

Sixth, this study provides qualitative interpretability through Grad-CAM visualizations, offering insight into the structural patterns learned by the model across clinical stages. Further investigation of inter-ethnic differences in contributing brain regions and their relationship to AD pathology should be considered in future studies to enhance clinical relevance and biological interpretability.

Finally, we adopted a ResNet-based 3D CNN. Although state-of-the-art (SOTA) architectures such as Vision Transformers (ViT) [44] were considered, such large-scale models often require a massive amount of data to be effective [45,46,47]. Since our Korean dataset is small, ResNet’s strong inductive bias provides greater data efficiency, which was a crucial factor for achieving stable training and reducing overfitting. Therefore, we prioritized 3D ResNet for its stability and efficiency in training with limited data. Future studies should explore these SOTA models once larger target datasets are available, incorporating advanced techniques such as few-shot or zero-shot learning [48,49] to further improve robustness in data-scarce environments.

5. Conclusions

In this study, we proposed a 3D CNN-based transfer learning framework to bridge the gap between a large-scale public dataset and a small-scale ethnically specific dataset. Through experiments, we demonstrated that a model pretrained on the Caucasian-dominant ADNI dataset can be successfully fine-tuned using a small amount of Korean MRI data. The proposed method effectively classified the stages of Alzheimer’s disease, particularly the MCI stage that is critical for early diagnosis, even in a data-scarce environment.

The proposed approach demonstrated superiority over both conventional machine-learning methods and deep-learning models trained from scratch, achieving higher data efficiency and improved performance. Our findings are expected to serve as a crucial foundation for building reliable AI-based medical diagnostic systems for diverse and underrepresented populations.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app16021004/s1, Figure S1: A comparative visualization of brain coverage from the n-slice k-step center-cropping method, showing results for different k-step intervals. With the number of slices (n) fixed at 32, the figure compares outcomes for k values of 1, 2, and 3 mm. A 1 mm interval covers a 3.2 cm range, suitable for observing local changes. The 2 mm interval covers 6.4 cm, offering a balance between local detail and broad coverage. In contrast, a 3 mm interval spans 9.6 cm, enabling the observation of large-scale changes by making the anterior and posterior regions of the brain clearly visible.

Author Contributions

Conceptualization, M.L., S.L. and H.S.; methodology, M.L.; data curation, M.L. and H.S.; writing—original draft preparation, M.L.; writing—review and editing, S.L. and H.S.; supervision, S.L. and H.S.; funding acquisition, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (grant number RS-2023-00280241) and by the research grant of the Gyeongsang National University in 2024.

Institutional Review Board Statement

This study used fully anonymized brain MRI data from the ADNI dataset (adni.loni.usc.edu) and the dementia diagnosis neuroimaging dataset from the AI-Hub project in Korea. The use of the AI-Hub dataset was approved by the Institutional Review Board of Gyeongsang National University (Approval No. GIRB-D24-NY-0090).

Informed Consent Statement

The AI-Hub and ADNI datasets used in this study are publicly available and were provided in a fully anonymized form. Therefore, informed consent was not required.

Data Availability Statement

Data supporting the findings of this study are available from two public resources. The ADNI dataset is accessible at https://adni.loni.usc.edu (accessed on 24 December 2024) upon registration and compliance with the data use agreement. The Korean MRI dataset used in this study is available from the Open AI Dataset Project (AI-Hub, South Korea) at https://www.aihub.or.kr (accessed on 30 January 2025), subject to registration and adherence to AI-Hub’s data request procedures and access conditions. No original data were generated for this study.

Acknowledgments

Data collection and sharing for this project were funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI was funded by the National Institute on Aging, National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd. and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutic.

Conflicts of Interest

The authors declare no competing interests.

References

1. Rajpurkar, P.; Chen, E.; Banerjee, O.; Topol, E.J. AI in health and medicine. Nat. Med. 2022, 28, 31–38.
2. Kumar, Y.; Koul, A.; Singla, R.; Ijaz, M.F. Artificial intelligence in disease diagnosis: A systematic literature review, synthesizing framework and future research agenda. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 8459–8486.
3. Arya, A.D.; Verma, S.S.; Chakarabarti, P.; Chakrabarti, T.; Elngar, A.A.; Kamali, A.M.; Nami, M. A systematic review on machine learning and deep learning techniques in the effective diagnosis of Alzheimer’s disease. Brain Inform. 2023, 10, 17.
4. Cummings, J.L.; Zhou, Y.; Lee, G.; Zhong, K.; Fonseca, J.; Leisgang-Osse, A.M.; Cheng, F. Alzheimer’s disease drug development pipeline: 2025. Alzheimer’s Dement. Transl. Res. Clin. Interv. 2025, 11, e70098.
5. Rasmussen, J.; Langerman, H. Alzheimer’s disease—Why we need early diagnosis. Degener. Neurol. Neuromuscul. Dis. 2019, 9, 123–130.
6. Ávila-Villanueva, M.; Marcos Dolado, A.; Gómez-Ramírez, J.; Fernández-Blázquez, M. Brain structural and functional changes in cognitive impairment due to Alzheimer’s disease. Front. Psychol. 2022, 13, 886619.
7. Planche, V.; Manjon, J.V.; Mansencal, B.; Lanuza, E.; Tourdias, T.; Catheline, G.; Coupé, P. Structural progression of Alzheimer’s disease over decades: The MRI staging scheme. Brain Commun. 2022, 4, fcac109.
8. El-Assy, A.M.; Amer, H.M.; Ibrahim, H.M.; Mohamed, M.A. A novel CNN architecture for accurate early detection and classification of Alzheimer’s disease using MRI data. Sci. Rep. 2024, 14, 3463.
9. Uyguroğlu, F.; Toygar, Ö.; Demirel, H. CNN-based Alzheimer’s disease classification using fusion of multiple 3D angular orientations. Signal Image Video Process. 2024, 18, 2743–2751.
10. Zhou, W.; Yang, Y.; Yu, C.; Liu, J.; Duan, X.; Weng, Z.; Chen, D.; Liang, Q.; Fang, Q.; Zhou, J.; et al. Ensembled deep learning model outperforms human experts in diagnosing biliary atresia from sonographic gallbladder images. Nat. Commun. 2021, 12, 1259.
11. Li, M.D.; Huang, Z.R.; Shan, Q.Y.; Chen, S.L.; Zhang, N.; Hu, H.T.; Wang, W. Performance and comparison of artificial intelligence and human experts in the detection and classification of colonic polyps. BMC Gastroenterol. 2022, 22, 517.
12. Aung, Y.Y.; Wong, D.C.; Ting, D.S. The promise of artificial intelligence: A review of the opportunities and challenges of artificial intelligence in healthcare. Br. Med. Bull. 2021, 139, 4–15.
13. Mueller, S.G.; Weiner, M.W.; Thal, L.J.; Petersen, R.C.; Jack, C.; Jagust, W.; Trojanowski, J.Q.; Toga, A.W.; Beckett, L. The Alzheimer’s disease neuroimaging initiative. Neuroimaging Clin. 2005, 15, 869–877.
14. Marcus, D.S.; Wang, T.H.; Parker, J.; Csernansky, J.G.; Morris, J.C.; Buckner, R.L. Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J. Cogn. Neurosci. 2007, 19, 1498–1507.
15. George, A.; Abraham, B.; George, N.; Shine, L.; Ramachandran, S. An Efficient 3D CNN Framework with Attention Mechanisms for Alzheimer’s Disease Classification. Comput. Syst. Sci. Eng. 2023, 47, 2097–2118.
16. Ramani, R.; Ganesh, S.S.; Rao, S.S.; Aggarwal, N. Integrated multi-modal 3D-CNN and RNN approach with transfer learning for early detection of Alzheimer’s disease. Iran. J. Sci. Technol. Trans. Electr. Eng. 2025, 49, 383–407.
17. Turrisi, R.; Verri, A.; Barla, A. The effect of data augmentation and 3D-CNN depth on Alzheimer’s Disease detection. arXiv 2023, arXiv:2309.07192.
18. Xie, L.; Das, S.R.; Wisse, L.E.; Ittyerah, R.; de Flores, R.; Shaw, L.M.; Yushkevich, P.A.; Wolk, D.A.; Alzheimer’s Disease Neuroimaging Initiative. Baseline structural MRI and plasma biomarkers predict longitudinal structural atrophy and cognitive decline in early Alzheimer’s disease. Alzheimer’s Res. Ther. 2023, 15, 79.
19. Aberathne, I.; Kulasiri, D.; Samarasinghe, S. Detection of Alzheimer’s disease onset using MRI and PET neuroimaging: Longitudinal data analysis and machine learning. Neural Regen. Res. 2023, 18, 2134–2140.
20. Golovanevsky, M.; Eickhoff, C.; Singh, R. Multimodal attention-based deep learning for Alzheimer’s disease diagnosis. J. Am. Med. Inform. Assoc. 2022, 29, 2014–2022.
21. Chee, M.W.L.; Zheng, H.; Goh, J.O.S.; Park, D.; Sutton, B.P. Brain structure in young and old East Asians and Westerners: Comparisons of structural volume and cortical thickness. J. Cogn. Neurosci. 2011, 23, 1065–1079.
22. Yuan, C.; Linn, K.A.; Hubbard, R.A. Algorithmic fairness of machine learning models for Alzheimer disease progression. JAMA Netw. Open 2023, 6, e2342203.
23. AI-Hub. Neuroimaging Data for Dementia Diagnosis [Data Set]. AI-Hub. 2021. Available online: https://www.aihub.or.kr/aihubdata/data/view.do?dataSetSn=227 (accessed on 30 January 2025).
24. Itzcovich, I. DeepBrain: Brain Image Processing Tools Using Deep Learning Focused on Speed and Accuracy. GitHub. 2018. Available online: https://github.com/iitzco/deepbrain (accessed on 13 May 2025).
25. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
26. Warfield, S.K.; Zou, K.H.; Wells, W.M. Simultaneous truth and performance level estimation (STAPLE): An algorithm for the validation of image segmentation. IEEE Trans. Med. Imaging 2004, 23, 903–921.
27. Eskildsen, S.F.; Coupé, P.; Fonov, V.; Manjón, J.V.; Leung, K.K.; Guizard, N.; Wassef, S.N.; Østergaard, L.R.; Collins, D.L. Alzheimer’s Disease Neuroimaging Initiative. BEaST: Brain extraction based on nonlocal segmentation technique. NeuroImage 2012, 59, 2362–2373.
28. Hara, K.; Kataoka, H.; Satoh, Y. Learning spatio-temporal features with 3d residual networks for action recognition. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 3154–3160.
29. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
30. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 2818–2826.
31. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
32. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
33. Cox, D.R. The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B Stat. Methodol. 1958, 20, 215–232.
34. Hon, M.; Khan, N.M. Towards Alzheimer’s disease classification through transfer learning. In Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 13–16 November 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1166–1169.
35. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
36. Liu, S.; Masurkar, A.V.; Rusinek, H.; Chen, J.; Zhang, B.; Zhu, W.; Fernandez-Granda, C.; Razavian, N. Author Correction: Generalizable deep learning model for early Alzheimer’s disease detection from structural MRIs. Sci. Rep. 2023, 13, 16528.
37. Beekly, D.L.; Ramos, E.M.; van Belle, G.; Deitrich, W.; Clark, A.D.; Jacka, M.E.; Kukull, W.A. The national Alzheimer’s coordinating center (NACC) database: An Alzheimer disease database. Alzheimer Dis. Assoc. Disord. 2004, 18, 270–277.
38. Kumar, S.S.; Nandhini, M. Entropy slicing extraction and transfer learning classification for early diagnosis of Alzheimer diseases with sMRI. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2021, 17, 1–22.
39. Snyder, H.M.; Asthana, S.; Bain, L.; Brinton, R.; Craft, S.; Dubal, D.B.; Espeland, M.A.; Gatz, M.; Mielke, M.M.; Raber, J.; et al. Sex biology contributions to vulnerability to Alzheimer’s disease: A think tank convened by the Women’s Alzheimer’s Research Initiative. Alzheimer’s Dement. 2016, 12, 1186–1196.
40. Dubal, D.B. Sex difference in Alzheimer’s disease: An updated, balanced and emerging perspective on differing vulnerabilities. In Handbook of Clinical Neurology; Elsevier: Amsterdam, The Netherlands, 2020; Volume 175, pp. 261–273.
41. Abbasi, S.; Lan, H.; Choupan, J.; Sheikh-Bahaei, N.; Pandey, G.; Varghese, B. Deep learning for the harmonization of structural MRI scans: A survey. Biomed. Eng. OnLine 2024, 23, 90.
42. Pinto, M.S.; Paolella, R.; Billiet, T.; Van Dyck, P.; Guns, P.J.; Jeurissen, B.; Ribbens, A.; Dekker, A.J.D.; Sijbers, J. Harmonization of brain diffusion MRI: Concepts and methods. Front. Neurosci. 2020, 14, 396.
43. Dewey, B.E.; Zhao, C.; Reinhold, J.C.; Carass, A.; Fitzgerald, K.C.; Sotirchos, E.S.; Saidha, S.; Oh, J.; Pham, D.L.; Calabresi, P.A.; et al. DeepHarmony: A deep learning approach to contrast harmonization across scanner changes. Magn. Reson. Imaging 2019, 64, 160–170.
44. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110.
45. Xu, Y.; Zhang, Q.; Zhang, J.; Tao, D. Vitae: Vision transformer advanced by exploring intrinsic inductive bias. Adv. Neural Inf. Process. Syst. 2021, 34, 28522–28535.
46. Zhai, X.; Kolesnikov, A.; Houlsby, N.; Beyer, L. Scaling vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 12104–12113.
47. Ren, S.; Gao, Z.; Hua, T.; Xue, Z.; Tian, Y.; He, S.; Zhao, H. Co-advise: Cross inductive bias distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 16773–16782.
48. Gharoun, H.; Momenifar, F.; Chen, F.; Gandomi, A.H. Meta-learning approaches for few-shot learning: A survey of recent advances. ACM Comput. Surv. 2024, 56, 1–41.
49. Wang, W.; Zheng, V.W.; Yu, H.; Miao, C. A survey of zero-shot learning: Settings, methods, and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–37.

Figure 1. Overview of the proposed two-stage training framework. (a) Large-scale ADNI dataset is preprocessed through brain segmentation, isotropic resampling, normalization, and center cropping (the light green area), then used to pretrain a foundational model. (b) Small-scale Korean dataset undergoes a similar preprocessing pipeline but excludes the initial brain segmentation step. (c) Weights from the pretrained model are transferred (rainbow-colored arrow) to a new model, which is then fine-tuned on the Korean dataset to create a specialized classifier for inference. Slight color changes indicate weight variations in the model.

Figure 2. Age distribution of subjects in the ADNI and Korean datasets. The y-axis represents the subject count on a logarithmic scale.

Figure 3. Qualitative and quantitative analysis of the ADNI pretrained model. (a) UMAP visualization demonstrates that the model learns a well-structured feature space, effectively distinguishing the three clinical classes. (b) Confusion matrix quantitatively confirms the model’s strong performance, with high true positive rates for all classes.

Figure 4. Grad-CAM visualization of the ADNI pretrained model. (a) Alzheimer’s disease, (b) cognitively normal, and (c) mild cognitive impairment. The activation maps represent the cumulative response across all slices, demonstrating that the model effectively captures structural variations in the brain cortex to distinguish between different clinical groups.

Figure 5. Analysis of the effect of transfer learning according to the amount of training data. The plots of (a) F1-score and (b) AUC show that the model pretrained on ADNI (w TL) achieves significantly more stable and superior performance with less data compared with the model using only ImageNet weights (w/o TL).

Table 1. Composition of the ADNI and Korean datasets. The table details the number of patients and scans categorized by diagnostic class and gender.

Dataset	Patients (Male)	Patients (Female)	Scans (AD)	Scans (MCI)	Scans (CN)	Scans (Male)	Scans (Female)	Total Scans
ADNI	981	1013	2270	5895	5824	7529	6460	13,989
Korean	N/A	N/A	311	312	214	249	588	837

Table 2. Comparison of various performance metrics on the ADNI dataset according to step size k. The 2 mm step shows superior performance across most metrics (highlighted in bold).

$k$	Acc	F1	F1(AD)	F1(CN)	F1(MCI)	AUC	Precision	Recall
1 mm	0.9356	0.9357	0.9230	0.9314	0.9448	0.9849	0.9357	0.9356
2 mm	0.9499	0.9498	0.9360	0.9460	0.9590	0.9896	0.9502	0.9499
3 mm	0.9435	0.9434	0.9332	0.9512	0.9394	0.9892	0.9441	0.9435

Table 3. Performance comparison on the Korean dataset. This table evaluates the proposed deep learning models (Cases (a)–(d)) against traditional machine learning methods (Cases (e)–(g)) and a baseline from the original dataset authors (Case (h)). Notably, the proposed model fine-tuned with a 30% training split (Case (d)) surpassed the MCI classification performance of the model trained without transfer learning on a 70% split (Case (a)). * denotes that the model is trained under transfer learning conditions, and values in bold indicate the best in each metric.

Case	Train:Test	Acc	F1	F1(AD)	F1(CN)	F1(MCI)	AUC	Precision	Recall
(a)	7:3	0.6653	0.6595	0.7283	0.7605	0.3965	0.7929	0.6557	0.6653
(b) *	7:3	0.6080	0.5999	0.6969	0.6813	0.3500	0.7500	0.5946	0.6080
(c)	0:7	0.5111	0.5193	0.6371	0.5263	0.3361	0.6961	0.5412	0.5111
(d) *	3:7	0.5385	0.5555	0.6077	0.5865	0.4355	0.7403	0.6179	0.5385
(e)	8:2	0.5500	0.4300	0.6100	0.6500	0.0400	0.6708	0.4400	0.4900
(f)	8:2	0.5400	0.4200	0.6000	0.6700	0.0000	0.6939	0.3700	0.4900
(g)	8:2	0.5500	0.5000	0.6000	0.6300	0.2700	0.7101	0.5400	0.5200
(h)	8:2	N/A	N/A	N/A	N/A	N/A	0.7667	N/A	N/A

