1. Introduction
Population aging has made Alzheimer’s disease (AD) a major health threat to the elderly. Traditional diagnostics relying on clinical assessments and cognitive tests suffer from subjectivity and delayed detection. Deep learning advancements have enabled automated AD diagnosis through end-to-end learning from medical images, extracting high-dimensional features directly from complex imaging data.
AD progression strongly correlates with cerebral atrophy. Structural MRI (sMRI) serves as a critical biomarker in AD research due to its sensitivity to brain morphological changes [1]. sMRI quantifies tissue changes and reveals characteristic alterations such as hippocampal atrophy, demonstrating significant value for computer-aided diagnosis. Although widely used, MRI interpretation remains expertise-dependent, time-consuming, and subjective. Subtle early-stage brain changes are particularly challenging to detect manually, increasing the risk of misdiagnosis.
Deep learning advances in medical image analysis [2,3,4,5,6,7] enable automatic brain feature extraction and precise pathological quantification, facilitating accurate AD staging. While deep learning enhances diagnostic efficiency and enables early intervention, AD imaging data face challenges including high dimensionality, privacy concerns, and limited availability.
sMRI captures tissue-specific changes underlying AD-related atrophy. However, scanner variations in manufacturers, protocols, and software introduce biases that compromise data stability. These confounders introduce scanner-dependent features, obscure disease-related information, reduce model generalization, and cause domain shift. Real-world data from different sources exhibit substantial distribution discrepancies due to varying imaging conditions, device parameters, processing pipelines, and population characteristics. Direct application of source-trained models to target domains typically causes performance degradation, as features become source-domain specific with limited target generalization. Domain adaptation research consequently focuses on eliminating cross-domain distribution differences.
To address AD diagnostic performance degradation from MRI domain shifts, we integrate metric learning and adversarial learning within a joint adversarial domain adaptation framework. Our approach employs multi-scale feature aggregation to enhance multi-scale lesion and structural variation perception, generating dynamic category-clustered prototypes. We compute fused domain and category prototypes across source and target domains, proposing a metric learning method combining both feature differences for cross-domain alignment. A classification certainty maximization strategy establishes a joint adversarial mechanism with domain and classification discrepancy discriminators, creating a dual adversarial learning method. Unlike conventional approaches relying solely on classifier consistency, our method simultaneously optimizes output consistency and certainty, preventing ambiguous decisions while enhancing discriminative capability, stability, and accuracy.
For cross-domain variations in resolution, lesion size, and structural complexity, we introduce a multi-scale feature aggregation module that integrates hierarchical features, preserving semantic information while capturing spatial details. This enables simultaneous perception of fine small-lesion structures and large-lesion contours, mitigating scale-related information loss while improving generalization and robustness.
In summary, we propose a joint adversarial domain adaptation method for AD diagnosis incorporating multi-scale feature aggregation, combining metric and adversarial learning advantages. Our integrated feature alignment method and dual adversarial learning with certainty maximization enhance cross-domain decision reliability and generalization, achieving unsupervised domain adaptation for AD auxiliary diagnosis.
The main contributions of this paper are as follows:
- 1.
We designed a Multi-Scale Feature Aggregation (MSFA) module that integrates feature information from different hierarchical levels. This module enhances the model’s perception capability for diverse lesion scales and structural variations in brain regions, thereby improving its adaptability to multi-scale targets.
- 2.
We propose a metric learning module that integrates both domain and category feature differences. This approach generates dynamic prototype features based on category clustering and computes fused domain feature prototypes and category feature prototypes from the source and target domains. The proposed metric learning method achieves cross-domain feature alignment, enabling the model to better distinguish between different categories while aligning data distributions across domains, thereby significantly enhancing classification performance in cross-domain tasks.
- 3.
We introduce a classification certainty maximization strategy to construct a joint adversarial mechanism comprising a domain discriminator and a classification discrepancy discriminator. This leads to the proposal of a dual adversarial learning domain adaptation module that combines both domain and category information, achieving synergistic optimization of domain adaptation through coordinated domain adversarial and category adversarial learning.
- 4.
Extensive experiments were conducted on four benchmark datasets (ADNI-1, ADNI-2, ADNI-3, and AIBL), covering four unsupervised target-domain classification tasks: Alzheimer’s Disease (AD) versus Normal Control (NC), Mild Cognitive Impairment (MCI) versus NC, AD versus MCI, and AD versus MCI versus NC. The proposed JDC-DA method achieves the best overall performance compared with several state-of-the-art domain adaptation methods.
3. Methods
Domain discriminator-based adaptive algorithms, while successfully applied in numerous medical diagnostic tasks, exhibit significant limitations in current unsupervised domain adaptation implementations. Most existing methods concentrate exclusively on global domain alignment between source and target domains while neglecting essential category-level alignment. These approaches typically employ two adversarial components: a domain discriminator and a feature generator. Both source and target samples pass through a shared feature generator, while the discriminator distinguishes between domains, and the generator attempts to deceive the discriminator by minimizing inter-domain distribution differences.
This alignment process fundamentally fails to adequately utilize the relationship between samples and specific decision boundaries, consequently hindering the acquisition of discriminative features. In Alzheimer’s disease auxiliary diagnosis, for instance, the generator may produce ambiguous features near decision boundaries, as it primarily optimizes for domain distribution similarity without considering whether these features effectively discriminate between Normal Control (NC), Mild Cognitive Impairment (MCI), and Alzheimer’s Disease (AD) categories.
Using binary classification for illustration, feature generation relying solely on domain alignment increases diagnostic uncertainty and compromises accuracy, as shown in Figure 1a. Achieving optimal unsupervised classification performance requires simultaneous global domain alignment and category-level alignment to enable accurate cross-domain classification, as demonstrated in Figure 1b. Therefore, in Alzheimer’s disease diagnosis, global distribution alignment alone proves insufficient for extracting discriminative features; integration with task-specific decision boundaries becomes imperative for optimizing feature generation.
To address the performance degradation in Alzheimer’s disease diagnosis caused by domain shifts in MRI data from different sources and acquisition domains, we investigate unsupervised domain adaptation strategies for target domains. Building upon multi-scale feature extraction from original images, we integrate the advantages of two mainstream unsupervised domain adaptation approaches: metric learning and adversarial learning. We propose a metric learning method for feature alignment that incorporates both domain-level and category-level feature differences. Furthermore, we introduce a classification certainty maximization strategy and develop a dual adversarial learning framework based on joint domain and category alignment. Finally, we construct an unsupervised auxiliary diagnosis algorithm for Alzheimer’s disease that achieves joint domain and category adaptation through our proposed dual-domain adaptation approach.
We denote the source domain data as $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$, which represents MRI image data for training, where $x_i^s$ is a source domain sample and $y_i^s$ is its corresponding label. The target domain data is denoted as $\mathcal{D}_t = \{x_j^t\}_{j=1}^{n_t}$, which comprises MRI images from different domains for training and testing, where $x_j^t$ is a target domain sample and remains unlabeled. Let $n_s$ and $n_t$ represent the sample sizes of the source and target domains, respectively, with both domains sharing identical feature and label spaces. We design an unsupervised domain adaptation (UDA) algorithm framework that jointly utilizes labeled source domain data and unlabeled target domain data during training, thereby significantly improving the model’s generalization capability and robustness in the target domain.
3.1. Overall Model Architecture
We integrate the advantages of two mainstream UDA approaches, metric learning and adversarial learning, to develop a comprehensive framework for Alzheimer’s disease diagnosis. We employ multi-scale feature aggregation to enhance the model’s perception capability for diverse lesion scales and structural variations across brain regions. This approach effectively mitigates the impact of resolution and scale discrepancies across different domains, such as MRI data collected from various hospitals, while simultaneously improving the model’s adaptability to these variations. We generate dynamic prototype features based on category clustering using the multi-scale aggregated features. By computing and fusing both domain feature prototypes and category feature prototypes from source and target domains, we propose a novel metric learning method that integrates both domain-level and category-level feature differences. This enables effective cross-domain feature alignment at both global and category levels. Furthermore, we introduce a classification certainty maximization strategy that leverages both joint certainty among classifiers and individual classifier certainty to guide model optimization. This strategy facilitates the construction of a joint adversarial mechanism comprising a domain discriminator and a classification discrepancy discriminator, leading to the development of a dual adversarial learning method based on joint domain and category alignment.
The proposed unsupervised joint adversarial domain adaptation algorithm for Alzheimer’s disease auxiliary diagnosis consists of three core components: a multi-scale feature extraction and aggregation module, a metric learning module that integrates domain and category feature differences, and a joint domain and category dual adversarial learning module, as illustrated in Figure 2.
Our proposed framework comprises three integrated components:
- 1.
Multi-Scale Feature Extraction and Aggregation Module (MSFA): We extract and aggregate multi-scale features from both source and target domains to enhance the representational capacity and discriminative power of the fused features.
- 2.
Metric Learning Module Incorporating Feature Disparities Across Domains and Categories (ADCML): We construct dynamic prototypes for both domains and categories to compute feature disparities, enabling cross-domain feature alignment at both global and category levels.
- 3.
Joint Domain and Category Dual Adversarial Learning Module (DCDAL): We implement a classification certainty maximization strategy and establish a joint adversarial mechanism comprising a domain discriminator and a classification discrepancy discriminator.
By integrating the advantages of two mainstream UDA paradigms, metric learning and adversarial learning, we achieve joint domain and category adaptation through coordinated feature metric learning and dual adversarial learning. This integrated approach enables effective unsupervised cross-domain auxiliary diagnosis for Alzheimer’s disease, simultaneously addressing both domain-level and category-level distribution shifts.
3.2. MSFA: Multi-Scale Feature Extraction and Aggregation Module
Data distributions across different domains, such as MRI data from various hospitals, may exhibit significant variations due to differences in resolution or scale. We employ multi-scale feature extraction techniques to enhance model adaptability to these variations, thereby mitigating the negative impact of inter-domain differences on model performance. In Alzheimer’s disease diagnosis, pathological brain changes typically manifest at multiple scales: macroscopic alterations may include global structural atrophy such as hippocampal volume reduction, while microscopic changes may involve subtle local texture variations in gray and white matter signals. Consequently, single-scale feature extraction methods often fail to simultaneously capture information at these different granularities. Multi-scale feature extraction enables the concurrent capture of both large-scale structural changes and local detailed characteristics, thereby constructing more discriminative feature representations. This approach not only improves classification performance but also enhances diagnostic comprehensiveness and accuracy, providing substantial support for early detection and precise diagnosis of Alzheimer’s disease.
However, during multi-scale feature extraction, as network depth increases, the model’s capacity for extracting high-level semantic features improves at the expense of capturing low-level spatial information. Furthermore, the extended path between low-level features and high-level outputs impedes adequate optimization of low-level features. This high-level feature alignment approach often results in suboptimal performance when adapting to small-scale targets requiring detailed spatial information.
To address these limitations and enable joint adversarial learning across different feature hierarchies, we design a Multi-Scale Feature Aggregation (MSFA) module. This module integrates feature maps generated from both conv5 and conv6 layers, combining hierarchical feature information to preserve high-level semantics while fully utilizing low-level spatial details, thereby significantly enhancing domain adaptation performance. Through this design, the model achieves better balance between high-level semantic features and low-level spatial information, improves adaptability to multi-scale targets, and further optimizes cross-domain task performance.
The schematic diagram of this module is shown in Figure 3. The workflow proceeds as follows: first, we resize the low-level feature maps (from conv5) extracted by the multi-scale feature extractor to match the dimensions of the high-level feature maps (from conv6) through resampling. Subsequently, we concatenate the feature maps from the different hierarchies. The concatenated feature maps then undergo transformation through a function composed of 1 × 1 convolution and batch normalization, enabling multi-scale information interaction and correlation to complete “feature fusion”. Finally, we employ a Channel Attention Module (CAM) for “feature enhancement” to further optimize the feature representation. The channel attention module significantly enhances the effectiveness of fused features by modeling inter-dependencies between different channels. Specifically, we first perform global average pooling on the fused feature maps along the spatial dimensions to extract a global descriptor for each channel. Through processing with a nonlinear activation function followed by a sigmoid function, we generate channel attention weights. These weights reflect the importance of each channel, with higher values indicating more critical features. Finally, we perform element-wise multiplication between the generated channel weights and the original fused features, thereby suppressing irrelevant or redundant channel information while strengthening focus on key features. This channel attention mechanism enhances information beneficial for the classifier, consequently improving the model’s representational capacity and discriminative power for multi-scale fused features.
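The fusion-plus-attention workflow described above can be sketched in NumPy. This is a minimal illustration, not the paper’s implementation: the nearest-neighbour resampling, the two-layer excitation (`w1`, `w2`), and the omission of the 1 × 1 convolution/batch-norm step (stood in for by identity) are all simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fused, w1, w2):
    """Squeeze-and-excitation style channel attention on a (C, D, H, W) volume."""
    c = fused.shape[0]
    # "Squeeze": global average pooling over spatial dims -> one descriptor per channel
    desc = fused.reshape(c, -1).mean(axis=1)            # (C,)
    # "Excitation": nonlinear activation then sigmoid -> per-channel weights in (0, 1)
    weights = sigmoid(w2 @ np.maximum(w1 @ desc, 0.0))  # (C,)
    # Re-weight channels: element-wise multiplication with the fused features
    return fused * weights[:, None, None, None]

def msfa_fuse(low, high, w1, w2):
    """Fuse low-level and high-level 3D feature maps in the spirit of MSFA."""
    # Resample low-level maps to the high-level spatial size (nearest neighbour)
    idx = [np.round(np.linspace(0, l - 1, s)).astype(int)
           for s, l in zip(high.shape[1:], low.shape[1:])]
    low_rs = low[:, idx[0]][:, :, idx[1]][:, :, :, idx[2]]
    fused = np.concatenate([low_rs, high], axis=0)      # channel-wise concatenation
    return channel_attention(fused, w1, w2)             # "feature enhancement"
```

A real network would replace the plain matrix products with learned 1 × 1 convolutions and batch normalization, but the shape bookkeeping (resample, concatenate, squeeze, excite, re-weight) is the same.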
We denote the features obtained from source and target domain samples after processing through the MSFA module as $f^s$ and $f^t$, respectively. By utilizing the Multi-Scale Feature Aggregation module, we effectively aggregate and enhance multi-scale information, providing robust support for domain adaptation learning across multiple scales.
3.3. ADCML: Cross-Domain and Cross-Category Metric Learning Module
As introduced in Section 1, metric learning-based domain adaptation has emerged as an active research focus in recent years. The fundamental objective of metric learning is to learn an appropriate distance metric that directly captures similarity measurements between samples, thereby providing novel perspectives for domain adaptation tasks. The core concept involves learning a shared embedding space between source and target domains to enhance cross-domain sample consistency. Specifically, metric learning methods define distance functions or similarity metrics to learn a unified feature space, bringing the data distributions of source and target domains closer together and consequently significantly improving model generalization in the target domain.
To excavate more discriminative information from cross-domain relationships and ensure that samples from the same category are closer while those from different categories are farther apart in the feature embedding space, we propose a metric learning module that integrates both domain feature disparities and category feature discrepancies. This module combines the advantages of prototype learning and metric learning, enabling the model to better distinguish between different categories while aligning data distributions across domains, thereby substantially enhancing classification performance in cross-domain tasks. Furthermore, this module can partially alleviate the instability and gradient explosion issues commonly encountered in subsequent adversarial learning domain adaptation, consequently strengthening model stability and robustness.
The module constructs average feature prototypes for source and target domain samples to compute domain feature differences, while establishing average feature prototypes for positive and negative category samples across domains to calculate category feature discrepancies.
We compute the average feature prototypes for the source and target domains as follows:

$$\mu^s = \frac{1}{n_s}\sum_{i=1}^{n_s} f_i^s, \qquad \mu^t = \frac{1}{n_t}\sum_{j=1}^{n_t} f_j^t \quad (1)$$

where $\mu^s$ and $\mu^t$ represent the average feature prototypes of the source and target domains, respectively, and $f_i^s$ and $f_j^t$ denote the feature vectors of the $i$-th source domain sample and the $j$-th target domain sample obtained through the multi-scale feature aggregation module.

The domain feature disparity is computed as:

$$d_{dom} = \left\lVert \mu^s - \mu^t \right\rVert_2 \quad (2)$$

We calculate the domain prototype feature disparity metric loss $\mathcal{L}_{dm}$ from the domain feature disparity $d_{dom}$:

$$\mathcal{L}_{dm}(\theta_g) = d_{dom}^2 = \left\lVert \mu^s - \mu^t \right\rVert_2^2 \quad (3)$$

where $\theta_g$ denotes the network parameters of the multi-scale feature extraction and aggregation module, i.e., the feature generation network.
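The domain prototype loss above amounts to a squared Euclidean distance between two feature means. A minimal NumPy sketch, assuming features are stacked as `(n, d)` matrices:

```python
import numpy as np

def domain_metric_loss(f_src, f_tgt):
    """Squared L2 distance between source and target average feature prototypes.

    f_src: (n_s, d) source features from the MSFA module
    f_tgt: (n_t, d) target features from the MSFA module
    """
    mu_s = f_src.mean(axis=0)   # source domain prototype
    mu_t = f_tgt.mean(axis=0)   # target domain prototype
    return float(np.sum((mu_s - mu_t) ** 2))
```

Minimizing this quantity with respect to the feature generator pulls the two domain centroids together, which is the global part of the alignment.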
Through this formulation, we obtain the domain feature metric loss. Subsequently, we compute the category feature disparity metric loss by constructing sample feature prototypes for the three categories (NC, MCI, and AD) in both the source and target domains, and then calculating the category feature prototype disparity metrics between the source and target domains.
We compute the category feature prototypes for the source domain as follows:

$$\mu_c^s = \frac{1}{n_c^s} \sum_{i:\, y_i^s = c} f_i^s, \qquad c \in \{NC, MCI, AD\} \quad (4)$$

where $\mu_{NC}^s$, $\mu_{MCI}^s$, and $\mu_{AD}^s$ represent the category average feature prototypes for NC, MCI, and AD samples in the source domain during training, while $n_{NC}^s$, $n_{MCI}^s$, and $n_{AD}^s$ denote the sample counts for the three categories in the source domain.
For target domain samples lacking labels, we employ k-means clustering to obtain category average feature prototypes. The clustering procedure is summarized as follows: (1) Initialization: initialize the cluster centers with the three source domain feature prototypes $\mu_{NC}^s$, $\mu_{MCI}^s$, and $\mu_{AD}^s$; (2) Assignment: assign each feature vector from the feature extraction and aggregation module to the nearest cluster center; (3) Update: recalculate each cluster center as the mean of all points in its cluster; (4) Iteration: repeat the assignment and update steps until all target domain samples are processed. This yields the three target domain feature prototypes $\mu_{NC}^t$, $\mu_{MCI}^t$, and $\mu_{AD}^t$.
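One assignment-and-update pass of this source-initialized clustering can be sketched as follows. This is an illustrative simplification: a single pass is shown rather than iteration to convergence, and empty clusters simply keep their source-initialized center.

```python
import numpy as np

def target_prototypes(f_tgt, src_protos):
    """Assign unlabeled target features to the nearest source-initialized center,
    then recompute each center as the mean of its assigned points.

    f_tgt:      (n_t, d) target features
    src_protos: (K, d)   source category prototypes used as initial centers
    """
    centers = src_protos.copy()
    # Assignment: squared distance from every target feature to every center
    d2 = ((f_tgt[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n_t, K)
    labels = d2.argmin(axis=1)
    # Update: cluster mean; empty clusters keep their source-initialized center
    for k in range(centers.shape[0]):
        if np.any(labels == k):
            centers[k] = f_tgt[labels == k].mean(axis=0)
    return centers, labels
```

Initializing the centers from the source prototypes is what ties each target cluster to a semantic category (NC, MCI, AD) despite the absence of target labels.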
In practical implementation, due to the large size of 3D MRI data and hardware limitations, we set a relatively small batch size (8 in this work). Directly computing feature means for all samples in a category would require excessive computational resources. To address this challenge and enable end-to-end training, we adopt a dynamic computation approach for feature prototypes. Specifically, we first calculate the average feature prototypes for each category within the current batch, then compute a weighted average with the prototypes from previous iterations to obtain updated category prototypes. This moving average representation of category prototypes reduces noise impact and enhances model generalization.
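The moving-average prototype update described above reduces to a simple exponential blend of the previous prototype with the current batch mean. The momentum value below is an assumed hyperparameter for illustration; the text only specifies a weighted average across iterations.

```python
import numpy as np

def update_prototype(prev_proto, batch_proto, momentum=0.9):
    """Moving-average update of a category prototype across mini-batches.

    prev_proto:  (d,) prototype carried over from previous iterations
    batch_proto: (d,) mean feature of this category within the current batch
    momentum:    assumed blending weight; larger values change the prototype slower
    """
    return momentum * prev_proto + (1.0 - momentum) * batch_proto
```

With a batch size as small as 8, per-batch category means are noisy; the momentum term smooths that noise so the prototypes drift slowly rather than jump batch to batch.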
After obtaining category feature prototypes for both source and target domains, we compute the category feature disparities as:

$$d_c = \left\lVert \mu_c^s - \mu_c^t \right\rVert_2, \qquad c \in \{NC, MCI, AD\} \quad (5)$$

The category prototype feature disparity metric loss $\mathcal{L}_{cm}$ is calculated as:

$$\mathcal{L}_{cm}(\theta_g) = \sum_{c \in \{NC, MCI, AD\}} d_c^2 \quad (6)$$
We formulate the integrated domain and category feature disparity metric loss as:

$$\mathcal{L}_{m} = (1 - \alpha)\,\mathcal{L}_{dm} + \alpha\,\mathcal{L}_{cm} \quad (7)$$

where $\mathcal{L}_{m}$ represents the integrated feature disparity metric loss, and $\alpha$ denotes a weighting coefficient. To gradually enhance the reliability of target domain sample categories during training, we design $\alpha$ as an adaptive weighting coefficient that varies between 0 and 1 with increasing training iterations:

$$\alpha = \frac{2}{1 + \exp\!\left(-\delta \cdot E / E_{total}\right)} - 1 \quad (8)$$

where $\delta$ is a constant, $E$ indicates the current training epoch, and $E_{total}$ represents the total number of training epochs.
For the Alzheimer’s disease auxiliary diagnosis task, we propose a metric learning method that integrates both domain and category feature disparities. This approach effectively excavates complex structural information from imaging data, captures global and category-level features, and enhances distribution alignment capability between source and target domains, thereby improving diagnostic accuracy and model generalization under cross-domain conditions.
3.4. DCDAL: Joint Domain and Category Dual Adversarial Learning
Considering the critical role of domain-level knowledge and category information in successful domain adaptation, we propose a joint domain and category dual adversarial learning method. This approach achieves synergistic optimization through domain adversarial learning via a domain discriminator and category adversarial learning via a dual-classifier mechanism. Our architecture comprises three core components: a feature generator (the multi-scale feature extraction and aggregation module), a domain discriminator, and a discrepancy discriminator composed of dual task-specific label predictors. These components collectively form a minimax game for both domain and category adversarial learning.
The adversarial learning between the feature generator and domain discriminator ensures effective domain alignment, while the adversarial learning between the feature generator and discrepancy discriminator enables the generation of target features that approximate the source class support, thereby achieving category alignment.
In our designed joint domain and category dual adversarial learning module, we incorporate a feature generator $G$, a domain discriminator $D$, and two label classifiers $C_1$ and $C_2$ for the source and target domains. The generator transforms the inputs so that the source and target feature distributions become mutually similar. The domain discriminator identifies and distinguishes between the source and target domains, implementing domain adversarial learning from a global perspective. The two label classifiers perform classification on both source and target domains, and we implement a classification certainty maximization strategy to achieve category adversarial learning. Building upon the domain discriminator and dual-classifier discriminator, we establish a minimax game between these two modules, collectively realizing global domain adversarial learning and category information-based adversarial learning.
3.4.1. Domain-Adversarial Domain Adaptation
The domain discriminator $D$ is designed to distinguish the origin of samples (source or target domain). The adversarial learning between the feature generator and domain discriminator produces the domain classification loss. The domain discriminator functions as a binary classifier, where $D(G(x)) = 1$ when $x \in \mathcal{D}_s$ and $D(G(x)) = 0$ when $x \in \mathcal{D}_t$. This adversarial mechanism effectively explores and utilizes inter-domain discriminatory information, enabling the feature generator to learn domain-invariant features and reduce domain discrepancy. The domain adversarial learning loss is formulated as:

$$\mathcal{L}_{adv}(\theta_g, \theta_d) = -\frac{1}{n_s}\sum_{i=1}^{n_s} L_{ce}\big(D(G(x_i^s)),\, 1\big) - \frac{1}{n_t}\sum_{j=1}^{n_t} L_{ce}\big(D(G(x_j^t)),\, 0\big) \quad (9)$$

where $\mathcal{L}_{adv}$ represents the domain classification loss, $\theta_g$ and $\theta_d$ denote the parameters of the feature generator $G$ and domain discriminator $D$, respectively, and $L_{ce}$ indicates the cross-entropy loss operation.
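A numeric sketch of this domain adversarial loss, assuming the discriminator outputs a scalar probability per sample and using the negated-cross-entropy sign convention (so the discriminator maximizes the quantity and the generator minimizes it):

```python
import numpy as np

def bce(p, label):
    """Binary cross-entropy of discriminator probability p against a 0/1 domain label."""
    eps = 1e-12  # numerical guard against log(0)
    return -(label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps))

def domain_adversarial_loss(p_src, p_tgt):
    """Negated mean cross-entropy on source (label 1) and target (label 0) samples.

    A sharp discriminator (p_src near 1, p_tgt near 0) drives this toward 0, its
    maximum; a fooled discriminator (both near 0.5) drives it down to -2*ln(2).
    """
    return float(-(bce(p_src, 1).mean() + bce(p_tgt, 0).mean()))
```

When the generator produces domain-invariant features, the best the discriminator can do is output 0.5 everywhere, which is exactly the equilibrium the minimax game seeks.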
3.4.2. Source Domain Classification Loss
We employ two label classifiers to classify the source domain data and compute the source domain label classification loss. In this work, the two classifiers use different initialization methods; although they may therefore produce different predictions, this diversity helps mitigate the misclassification caused by insufficient feature extraction during domain alignment. The source domain label classification loss is calculated as follows:

$$\mathcal{L}_{cls}(\theta_g, \theta_{c_1}, \theta_{c_2}) = \frac{1}{n_s}\sum_{i=1}^{n_s}\Big[ L_{ce}\big(C_1(G(x_i^s)),\, y_i^s\big) + L_{ce}\big(C_2(G(x_i^s)),\, y_i^s\big) \Big] \quad (10)$$

where $\mathcal{L}_{cls}$ denotes the label classification loss, $\theta_{c_1}$ and $\theta_{c_2}$ represent the parameters of the two label classifiers $C_1$ and $C_2$, respectively, and $L_{ce}$ indicates the cross-entropy loss operation. It should be noted that since only the source domain samples possess labels, $\mathcal{L}_{cls}$ is computed exclusively using source domain samples.
Our discrepancy discriminator performs dual tasks: ensuring correct predictions for source samples while synthesizing discriminative target features near the source class support by maximizing the classification certainty information from both label predictors. This design effectively addresses the feature ambiguity and category uncertainty issues that may arise when using a single classifier during domain alignment. Specifically, while global alignment with a single classifier may lead to insufficient feature extraction and consequent misclassification, the dual-classifier mechanism provides additional discriminative information to mitigate these limitations.
3.4.3. Classification Certainty Maximization Strategy
In joint adversarial domain adaptation, we utilize two label classifiers to guide the model in learning domain-invariant features. Conventional methods primarily focus on the consistency between classifier outputs while overlooking the classification certainty of individual classifiers. Consequently, existing approaches may potentially misguide the classifiers toward ambiguous outputs that compromise discriminability. To address this limitation, we propose a classification certainty maximization strategy that considers both the joint certainty between classifiers and the individual certainty of each classifier.
The dual task-specific label predictors in joint adversarial domain adaptation operate under the premise that a model can correctly classify samples when both classifiers produce consistent outputs. However, most methods merely minimize the output discrepancy between classifiers while ignoring their output certainty, which may yield ambiguous predictions where each class receives equal probability.
For instance, when constraining the two classifiers’ outputs using the $L_1$-distance $\lVert p_1 - p_2 \rVert_1$, where $p_1$ and $p_2$ represent the probability outputs from the two classifiers, both classifiers producing predictions of [0.5, 0.5] for Alzheimer’s disease would satisfy the $L_1$-distance constraint despite failing to provide meaningful classification.
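The two-class failure mode is easy to verify numerically: an identical pair of ambiguous outputs has zero $L_1$ discrepancy, yet a low inner-product certainty score.

```python
import numpy as np

p1 = np.array([0.5, 0.5])   # classifier C1 output for one sample
p2 = np.array([0.5, 0.5])   # classifier C2 output for the same sample

l1_discrepancy = np.abs(p1 - p2).sum()   # 0.0: perfectly "consistent"
joint_certainty = float(p1 @ p2)         # 0.5: well below the maximum of 1

# Confident, consistent outputs keep zero discrepancy but score higher certainty:
q1 = np.array([0.9, 0.1])
q2 = np.array([0.9, 0.1])
assert np.abs(q1 - q2).sum() == 0.0
assert float(q1 @ q2) > joint_certainty  # 0.82 > 0.5
```

The $L_1$ constraint alone cannot separate these two situations; the dot product can, which motivates the certainty metric introduced next.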
We therefore extend this hypothesis by incorporating classification certainty, proposing that the model can correctly classify samples when both classifiers $C_1$ and $C_2$ produce consistent and certain outputs for the same sample. We maintain that certain outputs can effectively guide the model to extract discriminative features from samples, and accordingly propose a joint classification certainty metric formulated as:

$$\mathcal{L}_{jc} = \frac{1}{n_t}\sum_{j=1}^{n_t} \sum_{k=1}^{K} p_1^{k}(x_j^t)\; p_2^{k}(x_j^t) \quad (11)$$

where $\mathcal{L}_{jc}$ denotes the joint categorical certainty loss function, $p_1(x_j^t)$ and $p_2(x_j^t)$ represent the prediction outputs from classifiers $C_1$ and $C_2$ for target domain samples, and $p^{k}$ indicates the softmax probability for the $k$-th class. Notably, $\mathcal{L}_{jc}$ is upper-bounded by 1 since it averages dot products between softmax-normalized classifier outputs. Higher values indicate greater joint classification certainty between the classifiers.
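The batch-level metric can be sketched directly from its definition; rows of `P1`/`P2` are assumed to be softmax probability vectors for target samples.

```python
import numpy as np

def joint_certainty(P1, P2):
    """Mean inner product of the two classifiers' softmax outputs over a batch.

    P1, P2: (n, K) row-stochastic prediction matrices. Each per-sample score is
    at most 1, reached only when both rows are identical one-hot predictions.
    """
    return float((P1 * P2).sum(axis=1).mean())
```

The score is maximal only for agreeing, confident predictions, so maximizing it pushes both classifiers away from ambiguous outputs at once.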
The aforementioned equation serves to quantify the consistent certainty between both classifiers’ outputs. In terms of consistency, when significant discrepancies exist between the classifiers, their predictions for the same category inevitably exhibit substantial variance, resulting in diminished joint classification certainty. Regarding certainty, even with minimal divergence between classifiers, ambiguous classification can still lead to reduced joint classification certainty. Therefore, the joint classification certainty metric simultaneously considers both the consistency and certainty of the outputs from both classifiers.
Furthermore, focusing exclusively on the agreement between classifier outputs may still yield ambiguous predictions when the two classifiers substantially diverge, particularly in their highest-confidence prediction categories. For instance, with prediction values of [0.8, 0.2] and [0.1, 0.9], both classifiers’ outputs might be optimized toward [0.5, 0.5] to increase joint classification certainty, thereby compromising each classifier’s individual discriminative capability. Consequently, we maintain that the independent certainty of each classifier’s output should also be preserved, which we define as each classifier’s joint classification certainty with itself:

$$\mathcal{L}_{ic} = \frac{1}{n_t}\sum_{j=1}^{n_t} \sum_{k=1}^{K} \Big[ \big(p_1^{k}(x_j^t)\big)^2 + \big(p_2^{k}(x_j^t)\big)^2 \Big] \quad (12)$$

where $\mathcal{L}_{ic}$ represents the classifier independence loss. As evident from Equation (12), both $\mathcal{L}_{jc}$ and $\mathcal{L}_{ic}$ are necessary, since $\mathcal{L}_{jc}$ ensures the discriminative capability of the entire model on the target domain, while $\mathcal{L}_{ic}$ preserves the certainty of each classifier’s output to avoid ambiguous predictions. However, determining the appropriate weight distribution between $\mathcal{L}_{jc}$ and $\mathcal{L}_{ic}$ requires careful consideration. Excessive weighting of $\mathcal{L}_{jc}$ would undermine the purpose of introducing $\mathcal{L}_{ic}$, while disproportionate weighting of $\mathcal{L}_{ic}$ could impair the overall discriminative capability of both classifiers. Through gradient derivation, we establish an upper bound on the relative weight of $\mathcal{L}_{ic}$.
Intuitively, we focus on individual classifier certainty while ensuring that the gradient influence of $\mathcal{L}_{ic}$ does not exceed that of the joint classification certainty $\mathcal{L}_{jc}$. We therefore compute the classification certainty maximization loss as follows:

$$\mathcal{L}_{cdc} = \mathcal{L}_{jc} + \frac{1}{2}\,\mathcal{L}_{ic} \quad (13)$$

where $\mathcal{L}_{cdc}$ represents the classification certainty maximization loss, $\mathcal{L}_{jc}$ denotes the joint classification certainty loss, and $\mathcal{L}_{ic}$ indicates the individual classifier independence loss. Thus, our proposed classification certainty maximization strategy incorporates both the joint certainty among classifiers and the individual certainty of each classifier.
3.5. Overall Objective Function and Parameter Optimization
We construct an unsupervised Alzheimer’s disease auxiliary diagnosis model based on joint adversarial domain adaptation. By incorporating multi-scale feature aggregation to enhance the model’s perception of diverse lesion scales and structural variations in brain regions, we integrate the advantages of metric learning and adversarial learning. We propose a metric learning method that combines domain feature disparities and category feature discrepancies, introduce a classification certainty maximization strategy, and develop a domain adaptation approach with joint domain and category dual adversarial learning. This framework improves the reliability and generalization capability of cross-domain decision-making. The complete objective function is formulated as:
$$\mathcal{L} = \mathcal{L}_{cls} + \alpha\,\mathcal{L}_{m} + \beta\,\mathcal{L}_{d} + \gamma\,\mathcal{L}_{cc},$$
where $\mathcal{L}_{cls}$ represents the label classification loss, $\mathcal{L}_{m}$ denotes the feature metric loss, $\mathcal{L}_{d}$ indicates the domain classification loss, and $\mathcal{L}_{cc}$ represents the classification certainty loss. The coefficients $\alpha$, $\beta$, and $\gamma$ are trade-off parameters balancing the metric, domain, and certainty terms.
Based on the analysis above, we formulate the optimization objectives for the different loss components. To enhance target-domain classification accuracy, we minimize the metric loss $\mathcal{L}_{m}$ and the label classification loss $\mathcal{L}_{cls}$, as shown in Equations (15) and (16). We maximize the domain classification loss $\mathcal{L}_{d}$ so that the feature generator extracts domain-invariant features while the domain discriminator improves its discriminative capability, as expressed in Equation (17). We likewise maximize the classification certainty loss $\mathcal{L}_{cc}$ so that both classifiers achieve maximum classification certainty, as specified in Equation (18). This formulation establishes a dual adversarial domain adaptation framework with two minimax games.
We optimize the network parameters according to the following formulations:
$$\theta_{G}^{*} = \arg\min_{\theta_{G}} \mathcal{L}_{m} \quad (15)$$
$$\left(\theta_{G}^{*}, \theta_{C_1}^{*}, \theta_{C_2}^{*}\right) = \arg\min_{\theta_{G},\, \theta_{C_1},\, \theta_{C_2}} \mathcal{L}_{cls} \quad (16)$$
$$\theta_{D}^{*} = \arg\max_{\theta_{D}} \mathcal{L}_{d} \quad (17)$$
$$\left(\theta_{C_1}^{*}, \theta_{C_2}^{*}\right) = \arg\max_{\theta_{C_1},\, \theta_{C_2}} \mathcal{L}_{cc} \quad (18)$$
where $\theta_{G}^{*}$ represents the optimal parameters of the feature generator $G$: Equation (15) trains $G$ by minimizing the feature disparity metric loss between the source and target domains. $\theta_{C_1}^{*}$ and $\theta_{C_2}^{*}$ denote the optimal parameters of the two classifiers $C_1$ and $C_2$ trained with source domain data: Equation (16) trains both the feature generator $G$ and the two label classifiers $C_1$ and $C_2$ by minimizing the source domain label classification loss on the features produced by $G$. $\theta_{D}^{*}$ indicates the optimal parameters of the domain discriminator $D$: Equation (17) trains $D$ by maximizing the domain discriminator loss to distinguish between source and target data. Equation (18) trains the two label classifiers $C_1$ and $C_2$ by maximizing the classification certainty loss on target domain data.
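The alternating descent/ascent structure of this dual minimax game can be sketched with toy scalar stand-ins for the network parameters. The quadratic losses and learning rate below are purely illustrative assumptions; the real model updates deep-network parameters with the losses defined above.

```python
def train(steps=200, lr=0.1):
    """Toy alternating optimization mirroring Equations (15)-(18).

    theta_g (feature generator) and theta_c (classifiers) are updated by
    gradient DESCENT on their losses; theta_d (domain discriminator) is
    updated by gradient ASCENT. The toy losses g^2, (c - 1)^2, and -(d^2)
    stand in for the metric, label-classification, and domain losses.
    """
    theta_g, theta_c, theta_d = 5.0, -3.0, 4.0
    for _ in range(steps):
        # Descent steps (Eqs. 15-16): minimize g^2 and (c - 1)^2.
        theta_g -= lr * (2 * theta_g)
        theta_c -= lr * (2 * (theta_c - 1.0))
        # Ascent step (Eq. 17): maximize -(d^2), optimum at d = 0.
        theta_d += lr * (-2 * theta_d)
    return theta_g, theta_c, theta_d
```

Running the loop drives each toy parameter toward its respective optimum, illustrating how the minimization and maximization objectives are interleaved within each training iteration.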
Through this dual minimax joint adversarial game, the model achieves effective training. The feature extractor acquires domain-invariant features that preserve discriminative information, while the domain discriminator and two label classifiers accomplish domain adaptation through joint domain and category dual adversarial learning. This approach enables cross-domain alignment from the global level down to the category level, ultimately realizing unsupervised Alzheimer's disease diagnosis on cross-domain target data.
4. Results and Analysis
In this section, we conduct an experimental evaluation of our method and compare it with other approaches.
4.1. Datasets and Data Processing
We evaluate the proposed method using four benchmark datasets with baseline structural MRI: (1) Alzheimer’s Disease Neuroimaging Initiative (ADNI-1) [
24], (2) ADNI-2, (3) ADNI-3, and (4) Australian Imaging Biomarkers and Lifestyle Study of Aging database (AIBL) [
25]. To ensure independent evaluation, we remove subjects that appear simultaneously in ADNI-1, ADNI-2, and ADNI-3 from the latter two datasets. To prevent classification bias caused by class imbalance, we select approximately balanced numbers of samples from all three categories in both source and target domains. The demographic characteristics of the studied subjects are presented in
Table 1. In each of our cross-domain experiments, the source domain samples are labeled, while the target domain samples are unlabeled and are used exclusively for model validation. The number of source domain samples is larger than that of the target domain. A portion of the source domain data is utilized for model pre-training, and the remaining part is employed for training the domain adaptation network. For the unlabeled target domain samples, 80% are used for training the domain adaptation network, and the remaining 20% are reserved for model validation. Importantly, the data used for training are strictly excluded from validation.
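The target-domain split described above can be expressed as a small helper: 80% of the unlabeled target subjects go to domain adaptation training and the held-out 20% to validation, with no overlap. This is an illustrative sketch; the function name and seed are our own, not the authors' code.

```python
import random

def split_target(subject_ids, train_frac=0.8, seed=0):
    """Split unlabeled target-domain subject IDs into disjoint
    DA-training (default 80%) and validation (20%) sets."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)   # deterministic shuffle for repeatability
    k = int(len(ids) * train_frac)
    return ids[:k], ids[k:]
```

Because the two returned lists partition the shuffled IDs, the data used for adaptation training are strictly excluded from validation, as the protocol requires.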
For consistency across the four datasets, we merge both stable Mild Cognitive Impairment (sMCI) and progressive Mild Cognitive Impairment (pMCI) categories into a single MCI category. To prevent data leakage, we utilize only the initial examination data for each subject, specifically the first available 1.5T/3T T1-weighted structural MRI scans. Data from different time points of the same subject never appear simultaneously in both training and testing sets.
The imaging data employed in this study comprise ADNI-1 acquired on 1.5T scanners and the other three datasets acquired as 3T T1-weighted structural MRI. Original MRI scans from the ADNI and AIBL datasets exhibit varying spatial resolutions, with the most common native resolutions of T1-weighted MRI being 1 × 1 × 1.2 mm or 1.2 × 1.2 × 1.2 mm, with some scans at 1 × 1 × 1 mm, resulting in different voxel counts across scans. Additional variations arise from differences in positioning, brain size and location within images, and intensity variations across multi-center scanner models.
To mitigate the impact of these external factors on Alzheimer's disease recognition performance, we implement the following preprocessing pipeline for all original images: (1) resampling all images to a uniform spatial resolution (voxel size of 1 × 1 × 1 mm³) [26]; (2) skull stripping [27]; (3) intensity correction using the N4 algorithm [28]; (4) linear registration of MRI images to the Montreal Neurological Institute (MNI) standard brain template AAL (MNI152, resolution of 1 × 1 × 1 mm³) [29]; (5) center cropping to remove background regions (non-brain areas), reducing computational complexity while keeping the registered brain centered; and (6) intensity normalization to the range [0, 1]. The final processed MRI scans all have a spatial resolution of 182 × 218 × 182 voxels.
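Steps (5) and (6) of the pipeline can be sketched in numpy as follows. This is a minimal illustration assuming the volume is already registered and at least as large as the target shape (padding is not handled); the function names are ours.

```python
import numpy as np

def center_crop(vol, out_shape=(182, 218, 182)):
    """Center-crop a 3D volume to `out_shape`, keeping the brain centered."""
    slices = tuple(
        slice((s - o) // 2, (s - o) // 2 + o)
        for s, o in zip(vol.shape, out_shape)
    )
    return vol[slices]

def normalize_intensity(vol):
    """Min-max normalize voxel intensities to the range [0, 1]."""
    lo, hi = vol.min(), vol.max()
    return (vol - lo) / (hi - lo + 1e-8)   # epsilon guards constant volumes
```

Applied in sequence, these produce the 182 × 218 × 182 volumes with intensities in [0, 1] used for training and testing.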
Figure 4 demonstrates representative examples of original and preprocessed sections from identical spatial locations for three subjects with varying native resolutions: Normal Control (256 × 256 × 170), Mild Cognitive Impairment (240 × 256 × 176), and Alzheimer’s Disease (256 × 256 × 211). For each subject, we utilize three-dimensional structural MRI scans for both training and testing procedures.
4.2. Experimental Setup
To verify the effectiveness of the proposed method, we conducted experiments on four classification tasks: AD vs. NC, MCI vs. NC, AD vs. MCI, and AD vs. MCI vs. NC. We adopted four evaluation metrics: accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the receiver operating characteristic (ROC) curve (AUC). For each metric, a higher value indicates better classification performance. The metrics are defined as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{SEN} = \frac{TP}{TP + FN}, \qquad \mathrm{SPE} = \frac{TN}{TN + FP},$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
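These definitions translate directly into code; the following helper (our own naming) computes the three count-based metrics, while AUC additionally requires the continuous classifier scores and is therefore omitted here.

```python
def classification_metrics(tp, tn, fp, fn):
    """ACC, SEN, and SPE from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    sen = tp / (tp + fn)                   # sensitivity (true positive rate)
    spe = tn / (tn + fp)                   # specificity (true negative rate)
    return acc, sen, spe
```

For example, a run with 40 true positives, 45 true negatives, 5 false positives, and 10 false negatives yields ACC = 0.85, SEN = 0.80, and SPE = 0.90.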
Among the four datasets, ADNI-3 and AIBL contain relatively fewer samples and therefore serve exclusively as target domains in our domain adaptation experiments. We first pre-train the model using source domain samples before initiating adversarial training, which necessitates larger sample sizes in source domains compared to target domains.
We evaluate four domain adaptation configurations for each classification task: (1) ADNI-1 → ADNI-2: ADNI-1 as source domain and ADNI-2 as target domain; (2) ADNI-2 → ADNI-1: ADNI-2 as source domain and ADNI-1 as target domain; (3) ADNI-1 + ADNI-2 → ADNI-3: combined ADNI-1 and ADNI-2 as source domain and ADNI-3 as target domain; and (4) ADNI-1 + ADNI-2 → AIBL: combined ADNI-1 and ADNI-2 as source domain and AIBL as target domain.
To ensure fair comparison with state-of-the-art unsupervised domain adaptation methods, we implement all algorithms using their publicly available source code. All experiments are conducted on 4 × NVIDIA RTX 3090 Ti GPUs with 24 GB memory each. We employ the Adam optimizer with an initial learning rate of 0.001, reduced by a factor of 0.1 every 10 epochs. Training spans 100 epochs with a batch size of 8. We set all three loss weights to 1 to maintain balanced optimization. To enhance source domain classification accuracy and accelerate subsequent domain adaptation, we pre-train both the MRI multi-scale feature extraction and aggregation module and the two label classifiers for 30 epochs prior to formal domain adaptation training.
4.3. Comparison with State-of-the-Art Domain Adaptation Methods
To validate the effectiveness of the proposed JDC-DA method, we compare our approach with 3DResNet50 [
30] and four state-of-the-art domain adaptation methods: an attention-guided adversarial framework [31], ATM [
32], DCAN [
33], and RCE [
34]. These competing methods are briefly introduced as follows: (1) 3DResNet50 [
30]: We employ ResNet50 as the baseline model for supervised learning on the source domain; it is applied directly to prediction on the target domain without feature alignment. Given the three-dimensional nature of MRI data, we replace all two-dimensional convolution blocks in ResNet50 with 3D convolution blocks. (2) Attention-guided framework [31]: This attention-guided deep domain adaptation framework for multi-site MRI data employs adversarial training using a domain discriminator and feature extractor, requiring no category label information from the target data. (3) ATM [
32]: This method leverages both adversarial training and metric learning advantages. Through the Maximum Density Divergence loss function, it simultaneously minimizes inter-domain divergence and maximizes intra-class density, effectively achieving cross-domain feature alignment. (4) DCAN [
33]: This approach aligns conditional distributions by minimizing Conditional Maximum Mean Discrepancy while extracting discriminative information from the target domain through mutual information maximization between samples and their predicted labels. (5) RCE [
34]: This method employs both clean and noisy classifiers to estimate the noise transition matrix. The clean classifier assigns pseudo-labels for target data, while the noisy classifier trains on noisy target samples and derives optimal parameters through a closed-form solution, enhanced by a pre-trained domain predictor.
Table 2 presents a comparative summary of four state-of-the-art domain adaptation methods, highlighting the core ideas, contributions, and limitations associated with each method.
We evaluate all methods on cross-domain problems where source domain samples contain labels while target domain samples remain unlabeled. For experimental setup, we use a portion of source domain samples for model pre-training and the remainder for domain adaptation network training. For target domain unlabeled samples, we allocate 80% for domain adaptation network training and 20% for model validation. We conduct comparative experiments on four classification tasks: AD versus NC, MCI versus NC, AD versus MCI, and AD versus MCI versus NC.
To ensure fair comparison, we employ the identical multi-scale feature extraction module of JDC-DA across all methods while maintaining consistent training strategies, including learning rate and number of training epochs.
4.3.1. AD vs. NC Classification
Table 3 presents the performance results of different methods in the task of AD vs. NC classification. As shown in
Table 3, the proposed JDC-DA model achieves the best performance across all four domain adaptation scenarios. Furthermore, our JDC-DA demonstrates superior overall performance in the "ADNI-1 + ADNI-2 → ADNI-3/AIBL" scenarios compared to the single-source-domain scenarios.
Figure 5 shows the ROC curves of the compared algorithms for AD vs. NC classification under the four domain adaptation settings.
4.3.2. AD vs. MCI Classification
Table 4 presents the performance results of different methods in the task of AD vs. MCI classification. From
Table 4, we can see that among the compared methods, the proposed JDC-DA model achieves the best ACC, SEN, SPE, and AUC across all four domain adaptation scenarios. Compared with AD vs. NC classification, AD vs. MCI classification is more challenging. The difficulty may be attributable to the subtle and often indistinguishable neuroimaging features of MCI on structural MRI, owing to its status as an early, prodromal stage of Alzheimer's disease.
4.3.3. MCI vs. NC Classification
Table 5 presents the performance results of different methods in the task of MCI vs. NC classification. From
Table 5, we can see that the proposed JDC-DA model again achieves the best performance across all four domain adaptation scenarios.
4.3.4. AD vs. MCI vs. NC Classification
In the AD versus MCI versus NC three-class classification task,
Table 6 and
Table 7 present comparative ACC metrics for AD and MCI classifications, respectively, across the four domain adaptation scenarios. We note that the attention-guided method of [31] employs a binary classification framework for AD versus NC and consequently does not appear in these tables. The three-class classification problem presents greater complexity than binary classification, which explains the relatively lower ACC metrics observed across all four domain adaptation scenarios. From
Table 6 and
Table 7, we observe that among the compared methods, the proposed JDC-DA model achieves superior performance across all four domain adaptation configurations.
Figure 6 presents the confusion matrix comparison of four algorithms for AD versus MCI versus NC classification under the ADNI-1 → ADNI-2 domain adaptation configuration. We observe that the proposed JDC-DA model maintains superior performance among the compared methods. Particularly, misclassifying AD or MCI cases as NC represents missed diagnoses in clinical practice, and our method achieves the lowest misdiagnosis rate in this critical aspect.
Based on the comprehensive experimental results, we demonstrate that the proposed method achieves excellent performance across all four classification tasks: AD versus NC, AD versus MCI, MCI versus NC, and AD versus MCI versus NC. Specifically, our approach attains 92.16% accuracy in the AD versus NC classification task, representing a 10.08% improvement over the supervised learning baseline using 3DResNet50. For the AD versus MCI classification task, our method achieves 83.56% accuracy. The proposed method also demonstrates outstanding performance in the MCI versus NC classification task with 81.96% accuracy. In the challenging three-class classification task of AD versus MCI versus NC, our method obtains the highest classification accuracy for both AD and MCI categories while producing the fewest missed diagnoses.
Compared with existing domain adaptation methods, the proposed algorithm exhibits significant advantages across all key evaluation metrics, including accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the curve (AUC) for the classification tasks.
5. Discussion
Here, we provide further discussion and analysis of JDC-DA.
5.1. Ablation Study
We propose a joint adversarial domain adaptation framework for auxiliary diagnosis of Alzheimer’s disease. Our approach incorporates multi-scale feature aggregation to enhance the model’s perception of diverse lesion scales and structural variations across brain regions. By integrating the advantages of two mainstream unsupervised domain adaptation methodologies—metric learning and adversarial learning—we develop a feature alignment method that combines both domain and category feature disparities. Furthermore, we introduce a classification certainty maximization strategy and propose a dual adversarial learning approach based on joint domain and category alignment, thereby improving the reliability and generalization capability of cross-domain decision-making.
To validate the effectiveness of individual components in JDC-DA, we employ an adversarial domain adaptation framework as our baseline model for comparative analysis. This baseline comprises three core components: (1) a feature extractor that extracts discriminative features from input images, shared between source and target domains; (2) two classifiers that perform classification predictions on the extracted features, maximizing prediction discrepancies on target domain samples to guide the feature extractor in learning domain-invariant discriminative features; and (3) a domain discriminator that distinguishes between source and target domain samples, employing an adversarial learning mechanism to optimize the feature extractor for generating domain-invariant features. In our experiments, this baseline model utilizes six 3D convolutional layers for feature extraction, with both classifiers and the domain discriminator implemented as three-layer fully connected networks.
To verify the performance improvements achieved by our proposed modules—the Multi-Scale Feature Aggregation module (MSFA), the Across Domains and Categories Metric Learning module (ADCML), and the classification certainty maximization module (Lcc) from the joint domain and category dual adversarial learning (note that domain adversarial learning is already included in the baseline)—we design a comprehensive ablation study. Starting from the baseline model, we progressively incorporate different modules for performance comparison: (a) baseline model; (b) baseline with MSFA; (c) baseline with ADCML; (d) baseline with Lcc; and (e) baseline with all three modules (MSFA, ADCML, and Lcc). We conduct comparative experiments on three binary classification tasks (AD versus NC, AD versus MCI, MCI versus NC) and one three-class classification task (AD versus MCI versus NC) under the ADNI-1 → ADNI-2 domain adaptation setting. The comparative results are presented in
Table 8.
Through comparative analysis of the experimental results, we observe that the baseline model (a) demonstrates suboptimal performance across all evaluation metrics for the four classification tasks. Upon integrating the multi-scale feature aggregation module (b), we achieve diagnostic accuracy improvements of 3.57%, 2.94%, 2.02%, and 2.09% for the four classification tasks, respectively, compared to the baseline network, with simultaneous AUC increases of approximately 2 percentage points. These findings indicate that the multi-scale feature aggregation module effectively enhances the model's capacity to represent image structures at different scales, improves feature representation, and consequently enhances diagnostic performance.
Further incorporation of the classification certainty loss function (d) yields additional improvements in both SPE and SEN values across the three binary classification tasks. This demonstrates that the classification certainty loss reduces prediction uncertainty by increasing confidence in target domain sample predictions, thereby enhancing classifier robustness and reliability.
Finally, the combined integration of multi-scale feature aggregation, the metric learning module incorporating both domain and category feature disparities (ADCML), and the classification certainty loss function (e) with the baseline model yields optimal cross-domain diagnostic performance. Specifically, we achieve 91.03% ACC in AD versus NC classification, representing a 9.82% improvement over the baseline. Additional improvements include 10.57% for MCI versus NC, 9.69% for AD versus MCI, and 7.56% for AD versus MCI versus NC classification. The other three metrics—SEN, SPE, and AUC—show improvements of approximately 9–10 percentage points.
These results collectively demonstrate that the multi-scale feature aggregation module enhances lesion recognition capability through integration of multi-scale brain imaging features; the ADCML module improves cross-domain sample consistency and category discriminability; and the classification certainty loss function reduces classification uncertainty through increased prediction confidence. The synergistic interaction among these three components significantly enhances performance in AD, MCI, and NC classification tasks.
5.2. Grad-CAM Visualization
In Alzheimer’s disease auxiliary diagnosis research, enhancing the interpretability of diagnostic models is crucial for ensuring their reliability in clinical practice. To evaluate the practical effectiveness of our proposed method, we employ the Grad-CAM visualization technique [
35] and randomly select three cases from the target domain: NC (RID: 018_S_0425), MCI (RID: 021_S_0178), and AD (RID: 027_S_1254). The visualization results across three orthogonal planes for one representative layer are presented in
Figure 7.
The color bar adjacent to the visualization maps indicates the model’s attention intensity for each region, where dark blue represents minimal attention (activation values approaching 0.0) and red indicates regions of highest attention (activation values reaching 0.3, 0.4, or 0.75). The varying numerical ranges in the color bars for AD, MCI, and NC images reflect the model’s differential attention patterns across disease stages: NC images (0.0 to 0.3) demonstrate weak model attention; MCI images (0.0 to 0.4) show emerging attention to potential pathological regions; AD images (0.0 to 0.75) reveal heightened sensitivity to characteristic lesion areas.
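The standard Grad-CAM computation underlying these maps can be sketched in numpy for a 3D convolutional layer: channel weights are the spatially averaged gradients, and the map is the ReLU of the weighted activation sum. The rescaling of the map to [0, 1] below is one common normalization choice and is an assumption of this sketch, not necessarily the scaling used for Figure 7.

```python
import numpy as np

def grad_cam_3d(activations, gradients):
    """Grad-CAM for a 3D conv layer.

    activations, gradients: arrays of shape (C, D, H, W) holding the layer's
    feature maps and the gradients of the class score w.r.t. those maps.
    Returns a (D, H, W) attention map rescaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2, 3))                   # alpha_k per channel
    cam = (weights[:, None, None, None] * activations).sum(axis=0)
    cam = np.maximum(cam, 0)                                   # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

Overlaying such a map on the corresponding MRI slice produces the heatmaps shown in the three orthogonal planes, with high values marking the regions the model attends to.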
Figure 7 demonstrates that brain MRI reveals progressive alterations as cognitive status evolves from NC to MCI and finally to AD. Normal brains maintain structural integrity without significant atrophy, while MCI subjects begin to exhibit localized mild atrophy or density changes. AD patients display marked structural degeneration, particularly in memory-related regions such as the hippocampal area.
Concurrently, our Grad-CAM visualizations show the model’s attention regions progressively shifting from homogeneous distributions to specific high-contribution areas. The salient regions in class activation maps expand progressively with disease advancement. These visualization results not only intuitively present pathological characteristics of neurodegenerative changes but also reveal the decision-making rationale of the deep learning model during disease identification, thereby providing interpretable imaging evidence for auxiliary diagnosis.
5.3. Feature Distribution Analysis via T-SNE Visualization
To validate the effectiveness of feature alignment in our proposed method, we conduct a visual analysis of features from both the source and target domains. We select image data from the three categories (NC, MCI, and AD) across the source and target domains. These data are processed through both the pre-trained 3DResNet50 model and our proposed JDC-DA algorithm for feature extraction and alignment. We then employ the t-SNE technique [
36] to reduce the dimensionality of the extracted features for visualization. The comparative results are presented in
Figure 8.
Figure 8 demonstrates that for unlabeled target domain data, the traditional 3DResNet model exhibits significant category confusion among the feature distributions of NC, MCI, and AD classes, resulting in suboptimal classification performance. In contrast, our proposed algorithm achieves substantial feature alignment between source and target domains in the feature space. The feature distributions of all three categories (NC, MCI, and AD) show effective category alignment across domains, resulting in well-defined inter-class boundaries in the target domain that facilitate accurate classification. These findings provide compelling evidence for the superiority of our method in feature alignment, effectively reducing both domain-shift and category distribution discrepancies, thereby significantly enhancing the model’s generalization capability in cross-domain tasks.
5.4. Analysis of Model Training and Inference Speed
The loss curves of our model for the four classification tasks—AD vs. NC, MCI vs. NC, AD vs. MCI, and AD vs. MCI vs. NC—are shown in
Figure 9 (under the ADNI-1 → ADNI-2 cross-domain setting).
As observed from the training loss curves, our model achieves rapid convergence during training.
In our source-domain → target-domain cross-domain experiments, different datasets are used, and the dataset sizes vary across settings. Therefore, the total training time is not a fixed quantity and is not suitable for direct comparison. Instead, we report the per-case training time per epoch and the per-case inference time for volumes of size 182 × 218 × 182.
A comparison of computational cost between our method and existing methods is presented in
Table 9. Here,
TTime denotes the per-case training time within one epoch (minutes), and
ITime denotes the per-case inference time (seconds). All experiments were conducted on an Ubuntu workstation equipped with
NVIDIA RTX 3090 Ti GPUs (24 GB memory per GPU).
Our method has a relatively complex training pipeline, including a multi-scale feature aggregation module, a metric learning module, and an adversarial learning module; consequently, its training time is comparatively longer. However, during inference, features extracted by the multi-scale feature aggregation module are directly fed into the target-domain classifier, resulting in a streamlined inference pipeline. Among all compared methods, our model achieves the fastest inference speed, which is advantageous for rapid Alzheimer’s disease diagnosis in practical applications.
6. Conclusions
To address the performance degradation in Alzheimer’s disease diagnosis caused by domain shifts in multi-source MRI data, we integrate the advantages of two mainstream unsupervised domain adaptation approaches, metric learning and adversarial learning, and propose a Joint Adversarial Domain Adaptation framework for Alzheimer’s Disease auxiliary diagnosis (JDC-DA). The proposed framework comprises three key components: (a) a Multi-Scale Feature Aggregation module that integrates hierarchical feature information to enhance the model’s perception of diverse lesion scales and structural variations across brain regions, thereby improving adaptability to multi-scale targets; (b) an Across Domains and Categories Metric Learning module that generates dynamic prototype features based on category clustering, computes fused domain and category feature prototypes from source and target domains, and achieves cross-domain feature alignment through integrated domain and category feature disparities, enabling the model to better distinguish between categories while aligning data distributions across domains; and (c) a Joint Domain and Category Dual Adversarial Learning module that incorporates a classification certainty maximization strategy to establish a joint adversarial mechanism with domain and classification discrepancy discriminators, achieving synergistic optimization of domain adaptation through coordinated domain adversarial and category adversarial learning.
We evaluate our method on 1230 structural MRI scans from four benchmark datasets (ADNI-1, ADNI-2, ADNI-3, and AIBL) for unsupervised target domain classification tasks, including AD versus NC, MCI versus NC, AD versus MCI, and AD versus MCI versus NC. Experimental results demonstrate that the proposed JDC-DA method achieves superior overall performance compared to several state-of-the-art domain adaptation approaches. Our findings confirm the effectiveness of JDC-DA for unsupervised target domain classification of AD, MCI, and NC, suggesting its potential value in clinical applications for Alzheimer's disease diagnosis.