Article

Source-Free Domain Adaptation for Cross-Modality Abdominal Multi-Organ Segmentation Challenges

1 College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
2 Key Laboratory of Computer Vision and Machine Learning (Huaqiao University), Fujian Province University, Xiamen 361021, China
3 Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
* Author to whom correspondence should be addressed.
Information 2025, 16(6), 460; https://doi.org/10.3390/info16060460
Submission received: 8 April 2025 / Revised: 27 May 2025 / Accepted: 28 May 2025 / Published: 29 May 2025

Abstract

Abdominal organ segmentation in CT images is crucial for accurate diagnosis, treatment planning, and condition monitoring. However, the annotation process is often hindered by challenges such as low contrast, artifacts, and complex organ structures. While unsupervised domain adaptation (UDA) has shown promise in addressing these issues by transferring knowledge from a different modality (source domain), its reliance on both source and target data during training presents a practical challenge in many clinical settings due to data privacy concerns. This study aims to develop a cross-modality abdominal multi-organ segmentation model for label-free CT (target domain) data, leveraging knowledge solely from a pre-trained source domain (MRI) model without accessing the source data. To achieve this, we generate source-like images from target-domain images using a one-way image translation approach with the pre-trained model. These synthesized images preserve the anatomical structure of the target, enabling segmentation predictions from the pre-trained model. To further enhance segmentation accuracy, particularly for organ boundaries and small contours, we introduce an auxiliary translation module with an image decoder and multi-level discriminator. The results demonstrate significant improvements across several performance metrics, including the Dice similarity coefficient (DSC) and average symmetric surface distance (ASSD), highlighting the effectiveness of the proposed method.

1. Introduction

Abdominal organ segmentation is vital in clinical practice, facilitating accurate diagnosis, treatment planning, and monitoring of conditions such as tumors, organ abnormalities, and metabolic disorders [1]. CT imaging, with its high-resolution visualization of organs like the liver, kidneys, and pancreas, enables the precise delineation crucial for medical decision making. However, low contrast, artifacts, and complex organ structures complicate annotation and lead to a shortage of labeled data for training reliable segmentation models [2,3]. To address this issue, unsupervised domain adaptation (UDA) has emerged as one of the most effective strategies [4,5].
The UDA technique adapts knowledge learned from high-quality source data (e.g., MRI) to the target domain (e.g., CT) for cross-modality segmentation [6]. Traditional UDA methods require both source and target data during training, which is often impractical in clinical practice due to data privacy concerns. A promising solution to this challenge is source-free domain adaptation (SFDA), which allows adaptation of a pre-trained source model to the target domain without needing access to source-domain data [7,8,9].
Several SFDA methods [7,10] use entropy minimization on target-domain predictions to constrain the model. However, this can result in high-confidence but incorrect predictions, especially in complex tasks [11]. Other approaches [12,13,14] synthesize source-like images to generate reliable pseudo-labels for self-supervised target model training. However, the quality of pseudo-labels depends on the source-like images, and insufficient use of source information often leads to lower-quality synthesized images, impairing performance. This is particularly problematic in abdominal organ segmentation, where organ boundaries are complex and indistinct. The loss of critical details in synthesized images, such as blurred boundaries or reduced local shape features, hinders accurate segmentation, and modality differences (e.g., CT vs. MRI) exacerbate the need for detail preservation.
In this work, we propose a novel framework for source-free cross-modality abdominal multi-organ segmentation, aiming to transfer anatomical knowledge from labeled high-quality MRI scans (source domain) to label-free low-quality CT scans (target domain) by leveraging source model weights without accessing source data. Our framework consists of two stages: pre-training on the source domain and adaptive segmentation on the target domain. During pre-training, we train a source model using labeled source data. In the second stage, we synthesize source-like images from target-domain images using a one-way image translation approach with the pre-trained source model. These synthesized images retain the same anatomical structure as the original target images, enabling segmentation predictions from the pre-trained model. To enhance segmentation accuracy, particularly for organ boundaries and small organ contours, we introduce an auxiliary translation module. This module includes an image decoder and a multi-level discriminator for the source model, which aims to (i) constrain target image features across multiple spatial scales via cross-domain adversarial learning, and (ii) ensure reconstructed source-like images match the original. The experimental results on abdominal multi-organ segmentation validate the effectiveness of our method. The main contributions are summarized as follows:
1. We propose a novel source-free domain adaptation framework for multi-organ abdominal segmentation.
2. We design an auxiliary translation module to enhance segmentation accuracy on synthesized images, improving the transfer of appearance information.
3. We conduct multi-organ segmentation experiments, demonstrating the effectiveness of our SFDA framework, which outperforms state-of-the-art SFDA methods and achieves competitive results with UDA methods.

2. Related Work

2.1. UDA for Medical Image Segmentation

The goal of unsupervised domain adaptation (UDA) is to transfer knowledge from a labeled source domain to an unlabeled target domain. Recent UDA methods can be categorized into image-level alignment and feature-level alignment approaches. Image-level methods address domain discrepancies by transforming target images into source-like images through image synthesis. Jiang et al. [15] proposed PSIGAN, which minimizes the mismatch in the joint distribution of images and their segmentation probability maps, emphasizing the geometry and appearance of organs. Chen et al. [16] introduced an anatomical regularization approach to preserve anatomical information during image synthesis. Liu et al. [17] proposed a structurally constrained cross-modality translation method to ensure anatomical similarity between original and synthesized images. Wu et al. [18] proposed a filtered pseudo-label-based framework that leverages confident pseudo-labels and structure-aware regularization to enable robust unsupervised cross-modality adaptation for 3D medical image segmentation.
In contrast, feature-level methods map source and target images into the same feature space to reduce the distance between their feature distributions. Wu et al. [19] and Chen et al. [20] addressed this by employing Variational Autoencoders (VAEs) and generative adversarial networks (GANs) for feature alignment. Dou et al. [21] proposed PnP-AdaNet, a plug-and-play network that aligns feature spaces across multiple scales. In addition to network-based methods, some approaches focus on feature handling from different perspectives. For instance, Chen et al. [22] introduced a dual adversarial attention mechanism to generate spatial and class attention maps to constrain features. Sun et al. [23] decomposed features and used an orthogonal loss function to promote their independence. Ding et al. [24] proposed C3R, integrating category contrastive adaptation, consistency regularization at feature levels, and a detached training strategy to mitigate loss conflicts.
Despite significant progress, these UDA methods assume access to source data, which poses challenges in clinical applications due to patient privacy concerns and dataset limitations. Therefore, source-free domain adaptation (SFDA) methods warrant further investigation.

2.2. SFDA for Medical Image Segmentation

Source-free domain adaptation (SFDA) eliminates the need for source-data access, instead leveraging the source model’s information during the adaptation stage to transfer knowledge to the target domain. Liang et al. [7] first proposed an SFDA solution by constraining the feature extractor through entropy minimization with a fixed classifier. Building on this, Bateson et al. [10] introduced a loss function combining Shannon entropy and Kullback–Leibler divergence to align class ratios in segmentation regions with an anatomical prior. Wu et al. [11] incorporated mean prediction-based entropy minimization in their Twice Forward Pass Supervision (TFS) to improve robust learning from pseudo-labels.
However, entropy minimization alone provides insufficient supervision, so the model cannot produce reliable high-confidence predictions; the resulting low-quality pseudo-labels are unsuitable for further training. Yang et al. [12] leveraged batch normalization (BN) statistics to synthesize source-like images via Fourier transforms. However, this method struggles with complex cross-modality tasks, often failing to generate accurate content. In contrast, Hong et al. [25] used a multi-network approach to synthesize images, which works well under significant domain differences but tends to favor dominant organ classes while neglecting smaller ones as the number of segmented organs increases. Yu et al. [26] suggested using source segmenter weights as prototypes to guide target feature alignment. While this provides some knowledge transfer, it falls short in more challenging tasks.
To address these challenges, we propose a novel framework that effectively leverages source model information to generate source-like images, enabling high-precision segmentation directly on these images.

3. Methods

3.1. Overview

Our framework is depicted in Figure 1. In the SFDA setting, during the pre-training stage (Section 3.2), we pre-train the model on labeled abdominal MRI images from the source domain, {x_i^s, y_i^s}_{i=1}^{N_s}. In the adaptive segmentation stage (Section 3.3), we work with unlabeled abdominal CT images from the target domain, {x_i^t}_{i=1}^{N_t}, utilizing the pre-trained source model. To transfer knowledge from the source domain, we propose synthesizing source-like images x^{t→s} through a one-way image translation approach that leverages a contrastive learning strategy (Section 3.3.1). The weights of the pre-trained source model serve as source prototypes, which are aligned with the target image features to accurately assign organ information to the corresponding class. Furthermore, we integrate an auxiliary translation module into our framework to compensate for information loss during the translation process, thus improving the accuracy of organ segmentation (Section 3.3.2). Finally, the segmentation of target images is achieved by applying the pre-trained source model to segment the synthesized source-like images, i.e., S(E^s(x^{t→s})). The details of the network architecture and training procedures are provided in Section 3.4 and Section 3.5, respectively. The symbols used in the following sections are summarized in Table 1.
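To make the adapted inference path concrete, the following is a minimal PyTorch-style sketch: a target CT slice is translated into a source-like image and then segmented by the frozen source model. The module names (enc_t, dec_s, enc_s, seg) are illustrative placeholders for E^t, G^s, E^s, and S, not identifiers from the authors' code.

```python
# Minimal sketch of the adapted inference path S(E^s(x^{t->s})),
# where x^{t->s} = G^s(E^t(x^t)); module names are placeholders.
import torch

@torch.no_grad()
def segment_target(x_t: torch.Tensor, enc_t, dec_s, enc_s, seg) -> torch.Tensor:
    x_ts = dec_s(enc_t(x_t))   # one-way translation: target slice -> source-like image
    return seg(enc_s(x_ts))    # prediction from the frozen pre-trained source model
```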

3.2. Pre-Training on Source Domain

During the pre-training stage, we train an encoder E^s and a decoder G^s to encode and decode source-domain images, respectively. The reconstruction loss is defined as
\mathcal{L}_{rec}^{s} = \left\| x^{s} - \hat{x}^{s} \right\|_{1}
Additionally, a multi-level discriminator D is trained in an adversarial manner to learn the distribution of source features across different scales. The discrimination and adversarial loss functions are formulated as
\mathcal{L}_{dis} = \frac{1}{M} \sum_{m=1}^{M} \left( \left\| D_{m}(x^{s}) - 1 \right\|_{2}^{2} + \left\| D_{m}(\hat{x}^{s}) \right\|_{2}^{2} \right),
\mathcal{L}_{adv}^{s} = \frac{1}{M} \sum_{m=1}^{M} \left\| D_{m}(\hat{x}^{s}) - 1 \right\|_{2}^{2},
where m denotes the m-th layer of the multi-level discriminator, and x̂^s is the output of the generative model, i.e., x̂^s = G^s(E^s(x^s)). Finally, a segmenter S is trained to generate predictions using the encoded features:
\mathcal{L}_{seg} = CE\left( S(f^{s}), y^{s} \right)
where CE represents the cross-entropy loss function, and f^s denotes the source features, i.e., f^s = E^s(x^s).
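For reference, the sketch below shows how these four pre-training losses could be computed in PyTorch, following the least-squares GAN terms written above; enc_s, dec_s, seg, and disc are placeholders for E^s, G^s, S, and the multi-level discriminator D (assumed to return a list of per-level outputs), not the authors' implementation.

```python
# A sketch of the source pre-training losses (reconstruction, discriminator,
# adversarial, segmentation) under least-squares GAN assumptions.
import torch
import torch.nn.functional as F

def source_pretrain_losses(x_s, y_s, enc_s, dec_s, seg, disc):
    f_s = enc_s(x_s)                      # source features f^s = E^s(x^s)
    x_hat = dec_s(f_s)                    # reconstruction x^s_hat = G^s(E^s(x^s))

    l_rec = (x_s - x_hat).abs().mean()    # L1 reconstruction loss L^s_rec

    # discriminator loss L_dis: real outputs pushed toward 1, fakes toward 0
    d_real = disc(x_s)                    # list of per-level outputs D_m(x^s)
    d_fake = disc(x_hat.detach())         # detached so only D is updated by this term
    l_dis = sum(((r - 1.0) ** 2).mean() + (f ** 2).mean()
                for r, f in zip(d_real, d_fake)) / len(d_real)

    # adversarial loss L^s_adv: the generator pushes D_m(x^s_hat) toward 1
    l_adv = sum(((f - 1.0) ** 2).mean() for f in disc(x_hat)) / len(d_real)

    # supervised segmentation loss L_seg on the encoded features
    l_seg = F.cross_entropy(seg(f_s), y_s)
    return l_rec, l_dis, l_adv, l_seg
```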

3.3. Domain-Adaptive Segmentation

3.3.1. Source-Free One-Way Image Translation

In the domain-adaptive segmentation stage, our goal is to generate source-like images from target-domain images using a one-way image translation module. Specifically, we begin by training a target encoder E^t and decoder G^t to encode and decode target-domain images, employing the following reconstruction loss:
\mathcal{L}_{rec} = \left\| x^{t} - \hat{x}^{t} \right\|_{1}
Additionally, the encoder E^t is designed to map a target-domain image x^t to features f^t that can be interpreted by the pre-trained source decoder G^s, such that x^{t→s} = G^s(E^t(x^t)). To achieve this, we introduce a contrastive learning strategy to constrain E^t at the feature level. Since x^{t→s} should resemble the source-domain appearance, we use the pre-trained source encoder E^s to extract features for comparison. Given that shallow features capture more appearance details, we optimize the outputs of the first L layers of both encoders using the PatchNCE loss [27], encouraging E^t to maximize the mutual information between the target features f^t and the features f^{t→s} of the source-like image. This ensures that the organ structures in x^t are preserved while its appearance is aligned with x^s.
\mathcal{L}_{PatchNCE} = \sum_{l=1}^{L} \sum_{s=1}^{S_{l}} \ell\left( \hat{p}_{l}^{s},\; p_{l}^{s},\; p_{l}^{S_{l} \setminus s} \right)
where p_l and p̂_l represent the outputs from the l-th selected layer of E^t and E^s, respectively, s ∈ {1, …, S_l} indexes the spatial locations in each selected layer, and ℓ(·) is the contrastive cross-entropy loss, defined as
\ell(v, v^{+}, v^{-}) = -\log \frac{\exp\left( v \cdot v^{+} / \tau \right)}{\exp\left( v \cdot v^{+} / \tau \right) + \sum_{n=1}^{N} \exp\left( v \cdot v_{n}^{-} / \tau \right)}
where τ is a temperature parameter that scales the distances between the query sample and other samples.
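As an illustration, a simplified single-layer PatchNCE-style term can be written as below: the patch at the same spatial location is the positive, and the other sampled locations of the same feature map act as negatives. The MLP projection heads and the summation over the first L layers used in [27] are omitted, and the function and argument names are assumptions rather than the released code.

```python
# Simplified single-layer PatchNCE-style loss: positives share a spatial location,
# negatives are the other sampled locations of the same feature map.
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_q, feat_k, num_patches=64, tau=0.7):
    """feat_q: layer output of E^t on x^t; feat_k: matching layer of E^s on x^{t->s}.
    Both have shape (B, C, H, W)."""
    b, c, h, w = feat_q.shape
    n = min(num_patches, h * w)                                  # sample S_l locations
    idx = torch.randperm(h * w, device=feat_q.device)[:n]
    q = F.normalize(feat_q.flatten(2).permute(0, 2, 1)[:, idx], dim=-1)   # (B, n, C)
    k = F.normalize(feat_k.flatten(2).permute(0, 2, 1)[:, idx], dim=-1)   # (B, n, C)
    logits = torch.bmm(q, k.transpose(1, 2)) / tau               # (B, n, n) similarities
    labels = torch.arange(n, device=feat_q.device).repeat(b)     # diagonal = positives
    return F.cross_entropy(logits.reshape(-1, n), labels)
```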
We also use the weights of the segmenter S as source prototypes, which represent the features of each organ (class) in the source domain. Aligning the target features f^t with these source prototypes improves the feature representation capability of E^t. Following Yu et al. [26], for a mini-batch of B target images {x_b^t}_{b=1}^{B}, we compute a point-to-point cost from their features f_{b,i}^t to the source prototypes μ_c ∈ {μ_1, …, μ_C} as follows:
\mathcal{L}_{proto} = \frac{1}{B \times H \times W} \sum_{b=1}^{B} \sum_{i=1}^{H \times W} \sum_{c=1}^{C} d\left( \mu_{c}, f_{b,i}^{t} \right) \pi_{\theta}\left( \mu_{c} \mid f_{b,i}^{t} \right)
where d(·,·) represents the cosine distance, and π_θ(μ_c | f_{b,i}^t) denotes the probability of transitioning from f_{b,i}^t to μ_c, calculated as
\pi_{\theta}\left( \mu_{c} \mid f_{b,i}^{t} \right) = \frac{\hat{p}(\mu_{c}) \exp\left( \mu_{c}^{T} \cdot f_{b,i}^{t} / \omega \right)}{\sum_{c'=1}^{C} \hat{p}(\mu_{c'}) \exp\left( \mu_{c'}^{T} \cdot f_{b,i}^{t} / \omega \right)}
where ω is the temperature parameter, and p̂(μ_c) is the prior distribution over the C classes for the target domain. For further details on these two loss functions, please refer to [26,27].
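The sketch below shows one way to compute this prototype-alignment cost in PyTorch, assuming the segmenter's class weight vectors are supplied as a (C, C_feat) prototype tensor and a uniform class prior p̂ when none is given; it is a simplified reading of [26], not the authors' code.

```python
# Sketch of the prototype-alignment cost: expected cosine distance from each target
# feature to the source prototypes under the transition probabilities pi_theta.
import torch
import torch.nn.functional as F

def proto_loss(feat_t, prototypes, omega=1.0, prior=None):
    """feat_t: (B, C_feat, H, W) target features; prototypes: (C, C_feat)."""
    b, cf, h, w = feat_t.shape
    f = feat_t.flatten(2).permute(0, 2, 1).reshape(-1, cf)          # (B*H*W, C_feat)
    num_classes = prototypes.shape[0]
    if prior is None:                                               # uniform prior p_hat
        prior = torch.full((num_classes,), 1.0 / num_classes, device=f.device)

    cos_dist = 1.0 - F.normalize(f, dim=1) @ F.normalize(prototypes, dim=1).t()  # d(mu_c, f)
    # pi_theta(mu_c | f) = p_hat(mu_c) exp(mu_c . f / omega) / normalizer
    pi = F.softmax(f @ prototypes.t() / omega + prior.log(), dim=1)
    return (cos_dist * pi).sum(dim=1).mean()                        # average over B*H*W
```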

3.3.2. Auxiliary Translation Module

The proposed one-way image translation module translates a target-domain image into a source-like image that approximates the source appearance. However, this method primarily focuses on shallow features, often neglecting critical details such as contours and shapes. To address this, we introduce an auxiliary translation module, as shown in Figure 1c, which integrates a pre-trained multi-level discriminator D and incorporates the decoder G^s. Given that the multi-level discriminator has been trained on various multi-scale features in the source domain, we leverage its feature knowledge to enhance target-domain training:
\mathcal{L}_{adv} = \frac{1}{M} \sum_{m=1}^{M} \left\| D_{m}(x^{t \to s}) - 1 \right\|_{2}^{2}
This cross-modality discrimination approach imposes a comprehensive constraint on features, facilitating a more accurate perception of organ boundaries during segmentation.
Since the predicted results are derived from source-like images, we aim for the target features f^t to align with the features extracted by the source encoder from these source-like images, i.e., E^t(x^t) = E^s(x^{t→s}). However, directly enforcing equality between the outputs of the two encoders would impose a heavy learning burden on the network, potentially leading to instability during training. Since G^s is optimized by the reconstruction loss during source training, the following should hold for source-like images:
G^{s}(E^{s}(x^{t \to s})) = G^{s}(f^{t \to s}) = x^{t \to s}
Therefore, we use G^s within the module to synthesize fake source-like images x̂^{t→s}, achieving image-level consistency:
\mathcal{L}_{id} = \left\| x^{t \to s} - \hat{x}^{t \to s} \right\|_{1}
By leveraging G^s to refine the features, we ensure that the source-like image x^{t→s} does not include irrelevant information.
Finally, we define the total loss function as the weighted sum of the individual loss functions:
\mathcal{L}_{total} = \mathcal{L}_{rec} + \lambda_{nce} \mathcal{L}_{PatchNCE} + \lambda_{proto} \mathcal{L}_{proto} + \lambda_{adv} \mathcal{L}_{adv} + \lambda_{id} \mathcal{L}_{id}
where the λ coefficients are hyperparameters that control the relative importance of each loss term.
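Putting the pieces together, one adaptation step could compute the total objective as sketched below. The helpers patch_nce_loss and proto_loss refer to the placeholder sketches above, the source modules (enc_s, dec_s, disc) are assumed frozen, the contrastive term is applied to a single feature level for brevity rather than the first L layers, and the weights follow Section 3.5. This is an illustrative reading of the objective, not the released implementation.

```python
# Sketch of the adaptation-stage objective:
# L_total = L_rec + lam_nce*L_PatchNCE + lam_proto*L_proto + lam_adv*L_adv + lam_id*L_id
import torch

def adaptation_loss(x_t, enc_t, dec_t, enc_s, dec_s, disc, prototypes,
                    lam_nce=10.0, lam_proto=10.0, lam_adv=1.0, lam_id=1.0):
    f_t = enc_t(x_t)                              # target features E^t(x^t)
    l_rec = (x_t - dec_t(f_t)).abs().mean()       # target reconstruction L_rec

    x_ts = dec_s(f_t)                             # source-like image x^{t->s} = G^s(E^t(x^t))
    f_ts = enc_s(x_ts)                            # features of x^{t->s} from the frozen E^s

    l_nce = patch_nce_loss(f_t, f_ts, num_patches=64, tau=0.7)   # one-way translation term
    l_proto = proto_loss(f_t, prototypes, omega=1.0)             # prototype alignment term

    d_fake = disc(x_ts)                           # frozen multi-level discriminator
    l_adv = sum(((d - 1.0) ** 2).mean() for d in d_fake) / len(d_fake)  # L_adv

    l_id = (x_ts - dec_s(f_ts)).abs().mean()      # image-level identity consistency L_id

    return l_rec + lam_nce * l_nce + lam_proto * l_proto + lam_adv * l_adv + lam_id * l_id
```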

3.4. Network Architecture

The encoders and decoders in both domains share the same network architecture. The encoder consists of a stride-1 convolutional layer followed by three encoding blocks that progressively downsample the feature maps. Each encoding block includes a stride-2 convolutional layer and two residual blocks [28] to enhance the network's nonlinear modeling capability. After each convolutional layer, group normalization [29] and Swish [30] activations are applied. Similarly, each decoder consists of a stride-1 transposed convolutional layer and three decoding blocks that progressively upsample the feature maps. Each decoding block includes a stride-2 transposed convolutional layer and two residual blocks, applying the same normalization and activation functions as the encoder. Notably, both the encoder output and decoder input maintain a consistent feature dimension of 512 throughout the architecture. Since both the segmenter and the decoders decode the encoder's outputs, the segmenter's architecture mirrors that of the decoder. The discriminator starts with a stride-1 convolutional layer, followed by four downsampling blocks. Each block includes a stride-2 convolutional layer and a residual block, with group normalization and Leaky ReLU activations [31]. At each scale, a 1 × 1 convolutional layer produces a low-resolution output, and a final fully connected layer generates the overall output, enabling multi-scale feature discrimination.
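To make the layout concrete, here is a sketch of an encoder following the stated design (stride-1 stem, three stride-2 encoding blocks with two residual blocks each, group normalization, Swish/SiLU activations, and a 512-dimensional output). The intermediate channel widths, kernel sizes, and group count are assumptions, since they are not specified in the text.

```python
# Sketch of the described encoder; channel widths other than the 512-dim output are assumed.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch, groups=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.GroupNorm(groups, ch), nn.SiLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.GroupNorm(groups, ch),
        )
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(x + self.body(x))

class Encoder(nn.Module):
    """Stride-1 stem followed by three stride-2 encoding blocks (conv + 2 residual blocks)."""
    def __init__(self, in_ch=1, widths=(64, 128, 256, 512), groups=8):
        super().__init__()
        layers = [nn.Conv2d(in_ch, widths[0], 3, stride=1, padding=1),
                  nn.GroupNorm(groups, widths[0]), nn.SiLU()]
        for c_in, c_out in zip(widths[:-1], widths[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                       nn.GroupNorm(groups, c_out), nn.SiLU(),
                       ResBlock(c_out, groups), ResBlock(c_out, groups)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# e.g. Encoder()(torch.randn(1, 1, 240, 320)).shape -> torch.Size([1, 512, 30, 40])
```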

3.5. Training Details

First, we pre-trained the source model using the AdamW optimizer [32] for 100 epochs with a learning rate of 10^{-5} and a batch size of 1. Subsequently, we trained the target model for 20 epochs, also using the AdamW optimizer, with the same learning rate of 10^{-5} and a batch size of 1. Both the source and target domains were trained on 2D slices randomly extracted from the 3D volumes. We set the hyperparameters as follows: λ_nce = 10, λ_proto = 10, λ_adv = 1, and λ_id = 1. Additionally, following previous work [27,33], we set τ = 0.7 and ω = 1.
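For reference, the reported settings can be gathered into a small configuration, as sketched below with PyTorch's AdamW; model construction, data loading, and the training loops themselves are omitted.

```python
# Training settings from Section 3.5 collected in one place (sketch only).
import torch

CONFIG = {
    "pretrain_epochs": 100,   # source pre-training
    "adapt_epochs": 20,       # target adaptation
    "lr": 1e-5,
    "batch_size": 1,          # a single 2D slice per step
    "lambda_nce": 10.0,
    "lambda_proto": 10.0,
    "lambda_adv": 1.0,
    "lambda_id": 1.0,
    "tau": 0.7,
    "omega": 1.0,
}

def make_optimizer(parameters, cfg=CONFIG):
    """AdamW optimizer with the reported learning rate."""
    return torch.optim.AdamW(parameters, lr=cfg["lr"])
```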

4. Experiments

4.1. Experimental Settings

We evaluated the proposed framework on the AMOS2022 dataset [34], which addresses the challenge of cross-modality abdominal multi-organ segmentation. We used all available data, comprising 300 CT scans and 60 MRI scans, and randomly divided it into training, validation, and test sets, as detailed in Table 2.
During data preprocessing, we rigidly registered MRI and CT scans to their respective templates (randomly selected from the training set, with a standardized spacing of 1 mm × 1 mm × 1 mm) using the NiftyReg tool (available at http://cmictig.cs.ucl.ac.uk/wiki/index.php/NiftyReg, accessed on 14 December 2023). The aligned images were cropped to a size of 216 × 240 × 320, and the intensities were linearly scaled to [−1, 1]. Bias field correction was applied to the MRI images before scaling.
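A short sketch of the intensity normalization step is given below for orientation; it assumes volumes that are already registered and cropped, and min/max-based linear scaling to [−1, 1] is only one plausible reading, since the exact scaling bounds are not stated.

```python
# Illustrative intensity normalization; min/max-based scaling to [-1, 1] is an assumption.
import numpy as np

def scale_to_unit_range(volume: np.ndarray) -> np.ndarray:
    """Linearly map voxel intensities to [-1, 1]."""
    v_min, v_max = float(volume.min()), float(volume.max())
    return 2.0 * (volume - v_min) / (v_max - v_min) - 1.0
```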
The goal was to segment 13 abdominal organs, including the spleen, kidneys, and liver, using a model trained on a source domain and applied to a target domain. First, we pre-trained the model on the annotated MRI training set; as the MRI dataset is smaller than the CT dataset, we applied data augmentation by spatial transformations such as random cropping, rotation, and translation. We saved the checkpoint with the best validation performance for subsequent adaptation. Then, we performed adaptive segmentation on the label-free CT training set with the pre-trained model, and the checkpoint achieving the best performance on the validation set was selected for testing. Finally, the model performance was evaluated on the test set. Notably, for the pre-trained model we performed fine-tuning only on the validation set. All experiments were performed on an NVIDIA GeForce RTX4090 GPU (NVIDIA, Santa Clara, CA, USA) with 24 GB of memory in order to ensure fairness.
To demonstrate the superiority of our method, we compared it with both UDA and SFDA approaches. The UDA methods included FPL [18], DADASeg [22], and C3R [24], while the SFDA methods included AOS [25], Proto [26], and UPL [11]. These methods were implemented using the official code provided. We evaluated performance using two primary metrics, the Dice similarity coefficient (Dice) and average symmetric surface distance (ASSD), to quantitatively assess the segmentation results. Dice measures the overlap between the predicted and ground truth segmentations and ASSD measures the average distance between the surfaces of the predicted and ground truth segmentations.
DSC(A, B) = \frac{2 \left| A \cap B \right|}{\left| A \right| + \left| B \right|}
ASSD(A, B) = \frac{1}{\left| \partial A \right| + \left| \partial B \right|} \left( \sum_{x \in \partial A} d(x, \partial B) + \sum_{y \in \partial B} d(y, \partial A) \right)
where A and B denote the predicted and ground truth segmentations, respectively, ∂A and ∂B denote their corresponding boundary point sets, and d(x, ∂B) = min_{b ∈ ∂B} ‖x − b‖₂.
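The two metrics can be computed as sketched below, assuming binary 3D masks per organ and the 1 mm isotropic spacing used after registration; SciPy's Euclidean distance transform approximates the surface distances. This is an illustrative implementation, not the evaluation code used in the paper.

```python
# Sketch of the Dice and ASSD metrics for binary masks.
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def assd(pred, gt, spacing=(1.0, 1.0, 1.0)):
    pred, gt = pred.astype(bool), gt.astype(bool)
    surf_p = pred ^ binary_erosion(pred)          # boundary voxels of the prediction
    surf_g = gt ^ binary_erosion(gt)              # boundary voxels of the ground truth
    dt_g = distance_transform_edt(~surf_g, sampling=spacing)  # distance to GT surface
    dt_p = distance_transform_edt(~surf_p, sampling=spacing)  # distance to pred surface
    d_pg = dt_g[surf_p]                           # d(x, ∂B) for x on the pred surface
    d_gp = dt_p[surf_g]                           # d(y, ∂A) for y on the GT surface
    total = len(d_pg) + len(d_gp)
    return (d_pg.sum() + d_gp.sum()) / total if total else 0.0
```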

4.2. Quantitative and Qualitative Results

The quantitative performance results are presented in Table 3 and Table 4. Our method significantly outperforms the others across all metrics. Additionally, we conducted hypothesis testing on each segmented organ, calculating p-values for the segmentation metrics of our method versus those of the other methods; p-values < 0.05 and < 0.01 are marked with * and **, respectively, indicating statistically significant differences. For certain SFDA methods (UPL and AOS), due to domain gaps and challenging-to-segment organs, the pre-trained model fails to produce reliable results in the target domain. As a result, methods relying on self-supervised pseudo-labeling and entropy minimization for optimization perform poorly, making their results unsuitable for reference.
The representative segmentation results are shown in Figure 2, where each column displays the ground truth, the proposed method, and results from other methods such as FPL [18], DADAseg [22], C3R [24], and Proto [26]. The results for UPL [11] and AOS [25] are omitted due to their poor performance. Our method effectively segments the contours and locations of each organ, especially larger ones like the stomach, while other methods exhibit varying degrees of error. This is due to the one-way image translation process, which preserves organ structure and transfers appearance information, allowing the segmenter to better interpret source-like images. For smaller organs, such as the gall bladder and esophagus, our method’s results are closer to the ground truth thanks to the proposed auxiliary translation module, which compensates for information loss caused by the domain gap and improves segmentation accuracy.
To better illustrate the effect of the one-way image translation process, we visualize the synthesized source-like images in Figure 3. As shown, there exists a clear appearance gap between the original target image and synthetic source-like images. Although the source-like images do not fully replicate the visual characteristics of the source modality, they effectively preserve the anatomical structures of the organs to be segmented; this is highly beneficial for the segmentation task.

4.3. Model Analysis

4.3.1. Ablation Study

We conducted an ablation study to assess the impact of the different loss functions. Specifically, we sequentially trained several variants of the proposed method without L_PatchNCE, L_proto, L_adv, and L_id. The results, shown in Table 5, clearly indicate that L_PatchNCE is crucial, as it helps generate source-like images through one-way image translation. Additionally, L_adv is vital for model performance, highlighting the importance of compensating for multi-scale detail information to improve segmentation accuracy.

4.3.2. Impact of One-Way Image Translation

Since L_PatchNCE has the most significant impact on model performance, we explored its effect under different settings. Specifically, L_PatchNCE is a patch-based loss, and the number of sampled patches determines how much information the model can learn. We trained variants with different patch counts and also examined how the choice of encoder output layers affects segmentation performance. The quantitative results are shown in Table 6 and Table 7 (with the layer output labels given in Figure 4a).
On the AMOS dataset, the best performance is achieved with 64 patches. Too few patches fail to capture all organ features, while too many introduce unnecessary information that hinders learning. The model benefits from attending to more encoder output layers, as this enables learning from more diverse levels of knowledge. However, since the encoder's final-layer output is constrained by other loss functions, the performance of V_4 is lower than that of V_3.

4.3.3. Impact of Auxiliary Translation Module

The multi-level discriminator in the auxiliary translation module allows the model to focus on source-like image features at different resolutions, aligning segmented organ information with the source domain. In L_adv, too few scales may provide insufficient supervision, while too many can increase training complexity. Therefore, we trained variants with different layer combinations, with the results shown in Table 8 (output labels in Figure 4b). The segmentation performance improves progressively as more discriminator layers are added. Since the output shape of D_0 is H × W, removing it can reduce model complexity. Additionally, compared to methods like ARLGAN [16] (e.g., the V_11 results), using multi-level discriminators yields superior performance in this task.

4.3.4. Model Convergence Analysis

To ensure adequate model convergence, we monitored the variation of the key loss functions during both the pre-training stage, i.e., L_seg, and the adaptive segmentation stage, i.e., L_PatchNCE, L_proto, L_adv, and L_id. As illustrated in Figure 5, these losses exhibit a sharp decline within the initial epochs, followed by a gradual plateau. Based on this observation, we empirically set 100 epochs for pre-training and 20 for adaptive segmentation, which we found sufficient for model convergence.

5. Conclusions and Discussion

In cross-modality abdominal image segmentation, the inability to access both source- and target-domain images simultaneously is a key limitation, especially for complex multi-organ tasks. To address this, we propose an SFDA framework that consists of a pre-training stage in the source domain and a domain-adaptive segmentation stage in the target domain. Our approach synthesizes source-like images, interpretable by the pre-trained source model, using a one-way image translation method. Segmentation results are then obtained by segmenting these synthesized images. To further align organ appearance with the source domain, we introduce a prototype-based approach to align target features with source features. Additionally, an auxiliary translation module is designed to compensate for missing details during translation. The experimental results show that our framework significantly improves segmentation performance compared to other methods. In future work, we plan to extend the applicability of our framework to a wider range of clinical scenarios, and to generalize it beyond abdominal organs to other anatomical regions.

Author Contributions

Conceptualization, X.C. and X.Z.; methodology, X.C.; software, X.Z.; validation, X.Z., Y.W., D.L. and Y.H.; formal analysis, X.C.; investigation, X.Z.; writing—original draft preparation, X.Z.; writing—review and editing, X.C.; project administration, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by NSFC grant 62276105, Natural Science Foundation of Xiamen, China (3502Z20227193), and Natural Science Foundation of Fujian Province (2023J01136).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The AMOS dataset can be accessed publicly at https://amos22.grand-challenge.org (accessed on 14 December 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, R.; Lei, T.; Cui, R.; Zhang, B.; Meng, H.; Nandi, A.K. Medical image segmentation using deep learning: A survey. IET Image Process. 2022, 16, 1243–1267. [Google Scholar] [CrossRef]
  2. Alirr, O.I.; Rahni, A.A.A. Survey on liver tumour resection planning system: Steps, techniques, and parameters. J. Digit. Imaging 2020, 33, 304–323. [Google Scholar] [CrossRef] [PubMed]
  3. Chen, X.; Pang, Y.; Yap, P.T.; Lian, J. Multi-scale anatomical regularization for domain-adaptive segmentation of pelvic CBCT images. Med. Phys. 2024, 51, 8804–8813. [Google Scholar] [CrossRef]
  4. Dou, Q.; Ouyang, C.; Chen, C.; Chen, H.; Heng, P.A. Unsupervised Cross-Modality Domain Adaptation of ConvNets for Biomedical Image Segmentations with Adversarial Loss. arXiv 2018, arXiv:1804.10916. [Google Scholar]
  5. Xian, J.; Li, X.L.; Tu, D.; Zhu, S.; Zhang, C.; Liu, X.; Li, X.; Yang, X. Unsupervised Cross-Modality Adaptation via Dual Structural-Oriented Guidance for 3D Medical Image Segmentation. IEEE Trans. Med. Imaging 2023, 42, 1774–1785. [Google Scholar] [CrossRef]
  6. Ganin, Y.; Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Proceedings of the 32nd International Conference on International Conference on Machine Learning ICML’15, Lille, France, 6–11 July 2015; Volume 37, pp. 1180–1189. [Google Scholar] [CrossRef]
  7. Liang, J.; Hu, D.; Feng, J. Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020. [Google Scholar]
  8. Liu, Y.; Zhang, W.; Wang, J. Source-Free Domain Adaptation for Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 1215–1224. [Google Scholar] [CrossRef]
  9. Stan, S.; Rostami, M. Unsupervised model adaptation for source-free segmentation of medical images. Med. Image Anal. 2024, 95, 103179. [Google Scholar] [CrossRef]
  10. Bateson, M.; Kervadec, H.; Dolz, J.; Lombaert, H.; Ben Ayed, I. Source-free domain adaptation for image segmentation. Med. Image Anal. 2022, 82, 102617. [Google Scholar] [CrossRef]
  11. Wu, J.; Wang, G.; Gu, R.; Lu, T.; Chen, Y.; Zhu, W.; Vercauteren, T.; Ourselin, S.; Zhang, S. UPL-SFDA: Uncertainty-Aware Pseudo Label Guided Source-Free Domain Adaptation for Medical Image Segmentation. IEEE Trans. Med. Imaging 2023, 42, 3932–3943. [Google Scholar] [CrossRef]
  12. Yang, C.; Guo, X.; Chen, Z.; Yuan, Y. Source free domain adaptation for medical image segmentation with fourier style mining. Med. Image Anal. 2022, 79, 102457. [Google Scholar] [CrossRef]
  13. Chen, C.; Liu, Q.; Jin, Y.; Dou, Q.; Heng, P.A. Source-Free Domain Adaptive Fundus Image Segmentation with Denoised Pseudo-Labeling. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI, Strasbourg, France, 27 September–1 October 2021; pp. 225–235. [Google Scholar] [CrossRef]
  14. Huai, Z.; Ding, X.; Li, Y.; Li, X. Context-Aware Pseudo-label Refinement for Source-Free Domain Adaptive Fundus Image Segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI, Vancouver, BC, Canada, 8–12 October 2023; pp. 618–628. [Google Scholar] [CrossRef]
  15. Jiang, J.; Hu, Y.C.; Tyagi, N.; Rimner, A.; Lee, N.; Deasy, J.O.; Berry, S.; Veeraraghavan, H. PSIGAN: Joint Probabilistic Segmentation and Image Distribution Matching for Unpaired Cross-Modality Adaptation-Based MRI Segmentation. IEEE Trans. Med. Imaging 2020, 39, 4071–4084. [Google Scholar] [CrossRef]
  16. Chen, X.; Lian, C.; Wang, L.; Deng, H.; Kuang, T.; Fung, S.; Gateno, J.; Yap, P.T.; Xia, J.J.; Shen, D. Anatomy-Regularized Representation Learning for Cross-Modality Medical Image Segmentation. IEEE Trans. Med. Imaging 2021, 40, 274–285. [Google Scholar] [CrossRef] [PubMed]
  17. Liu, H.; Zhuang, Y.; Song, E.; Xu, X.; Hung, C.C. A bidirectional multilayer contrastive adaptation network with anatomical structure preservation for unpaired cross-modality medical image segmentation. Comput. Biol. Med. 2022, 149, 105964. [Google Scholar] [CrossRef] [PubMed]
  18. Wu, J.; Guo, D.; Wang, G.; Yue, Q.; Yu, H.; Li, K.; Zhang, S. FPL+: Filtered Pseudo Label-Based Unsupervised Cross-Modality Adaptation for 3D Medical Image Segmentation. IEEE Trans. Med. Imaging 2024, 43, 3098–3109. [Google Scholar] [CrossRef] [PubMed]
  19. Wu, F.; Zhuang, X. Unsupervised Domain Adaptation With Variational Approximation for Cardiac Segmentation. IEEE Trans. Med. Imaging 2021, 40, 3555–3567. [Google Scholar] [CrossRef]
  20. Chen, C.; Dou, Q.; Chen, H.; Qin, J.; Heng, P.A. Synergistic Image and Feature Adaptation: Towards Cross-Modality Domain Adaptation for Medical Image Segmentation. Proc. AAAI Conf. Artif. Intell. 2019, 33, 865–872. [Google Scholar] [CrossRef]
  21. Dou, Q.; Ouyang, C.; Chen, C.; Chen, H.; Glocker, B.; Zhuang, X.; Heng, P.A. PnP-AdaNet: Plug-and-Play Adversarial Domain Adaptation Network at Unpaired Cross-Modality Cardiac Segmentation. IEEE Access 2019, 7, 99065–99076. [Google Scholar] [CrossRef]
  22. Chen, X.; Kuang, T.; Deng, H.; Fung, S.H.; Gateno, J.; Xia, J.J.; Yap, P.T. Dual Adversarial Attention Mechanism for Unsupervised Domain Adaptive Medical Image Segmentation. IEEE Trans. Med. Imaging 2022, 41, 3445–3453. [Google Scholar] [CrossRef]
  23. Sun, Y.; Dai, D.; Xu, S. Rethinking adversarial domain adaptation: Orthogonal decomposition for unsupervised domain adaptation in medical image segmentation. Med. Image Anal. 2022, 82, 102623. [Google Scholar] [CrossRef]
  24. Ding, S.; Liu, Z.; Liu, P.; Zhu, W.; Xu, H.; Li, Z.; Niu, H.; Cheng, J.; Liu, T. C3R: Category contrastive adaptation and consistency regularization for cross-modality medical image segmentation. Expert Syst. Appl. 2025, 269, 126304. [Google Scholar] [CrossRef]
  25. Hong, J.; Zhang, Y.D.; Chen, W. Source-free unsupervised domain adaptation for cross-modality abdominal multi-organ segmentation. Knowl.-Based Syst. 2022, 250, 109155. [Google Scholar] [CrossRef]
  26. Yu, Q.; Xi, N.; Yuan, J.; Zhou, Z.; Dang, K.; Ding, X. Source-Free Domain Adaptation for Medical Image Segmentation via Prototype-Anchored Feature Alignment and Contrastive Learning. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI, Vancouver, BC, Canada, 8–12 October 2023; pp. 3–12. [Google Scholar] [CrossRef]
  27. Park, T.; Efros, A.A.; Zhang, R.; Zhu, J.Y. Contrastive Learning for Unpaired Image-to-Image Translation. In Proceedings of the Computer Vision—ECCV, Glasgow, UK, 23–28 August 2020; pp. 319–345. [Google Scholar] [CrossRef]
  28. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks. In Proceedings of the Computer Vision—ECCV, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645. [Google Scholar] [CrossRef]
  29. Wu, Y.; He, K. Group Normalization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  30. Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for Activation Functions. arXiv 2017, arXiv:1710.05941. [Google Scholar]
  31. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 17–19 June 2013; Volume 30, p. 3. [Google Scholar]
  32. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  33. Tanwisuth, K.; Fan, X.; Zheng, H.; Zhang, S.; Zhang, H.; Chen, B.; Zhou, M. A Prototype-Oriented Framework for Unsupervised Domain Adaptation. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 17194–17208. [Google Scholar]
  34. Ji, Y.; Bai, H.; Yang, J.; Ge, C.; Zhu, Y.; Zhang, R.; Li, Z.; Zhang, L.; Ma, W.; Wan, X.; et al. AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation. arXiv 2022, arXiv:2206.08023. [Google Scholar]
Figure 1. The proposed SFDA framework. (a) Pre-training on the source domain with labeled data; (b) adaptive segmentation on the target domain using source-like images from one-way image translation; (c) auxiliary translation module for enhanced appearance information transfer.
Figure 2. Visual results of abdominal multi-organ segmentation.
Figure 3. Visual results of synthetic source-like images. (a) Synthetic source-like image; (b) real target image (CT).
Figure 4. Layer output labels referenced in the model analysis. (a) Encoder output layers (E_0–E_3); (b) discriminator output layers (D_0–D_4).
Figure 5. The variations in different key losses. (a) Key losses in pre-training stage; (b) key losses in adaptive segmentation stage.
Table 1. Summary of symbols.
Symbol | Meaning
x^s, x^t | Image sets of source and target modalities
y^s | Annotation set corresponding to x^s
f | Encoder-extracted image features
E^s, E^t | Source and target encoders
G^s, G^t | Source and target generators
D | Image discriminator
S | Segmenter
Table 2. Partition of dataset.
Modality | Training Set | Validation Set | Test Set
Source domain (MRI) | 40 | 20 | –
Target domain (CT) | 200 | 20 | 80
Table 3. Dice (%) of different methods on the AMOS dataset for abdominal multi-organ segmentation. The best results are highlighted in bold and the arrow (↑) indicates that higher values represent better performance. ** indicates p-values < 0.01.
Dice (%) ↑ | FPL [18] | DADAseg [22] | C3R [24] | AOS [25] | Proto [26] | UPL [11] | Ours
Spleen | 61.09 ± 25.16 ** | 85.30 ± 11.87 | 70.22 ± 18.72 ** | 2.46 ± 1.06 ** | 60.55 ± 17.82 ** | 13.18 ± 6.44 ** | 86.86 ± 10.62
Right kidney | 74.59 ± 18.45 ** | 72.15 ± 13.09 ** | 77.71 ± 13.17 | 1.40 ± 0.39 ** | 56.25 ± 25.67 ** | - | 81.85 ± 22.41
Left kidney | 72.72 ± 24.32 ** | 82.82 ± 16.18 | 75.84 ± 12.84 | 1.04 ± 0.33 ** | 59.35 ± 23.36 ** | 0.02 ± 0.10 ** | 79.22 ± 22.47
Gall bladder | 35.44 ± 28.87 ** | 31.65 ± 18.63 ** | 29.81 ± 23.18 ** | 0.34 ± 0.31 ** | 15.53 ± 14.29 ** | - | 50.25 ± 31.20
Esophagus | 25.74 ± 21.23 ** | 45.20 ± 24.27 ** | 49.02 ± 21.66 ** | 0.13 ± 0.09 ** | 26.94 ± 17.40 ** | - | 57.70 ± 24.36
Liver | 80.89 ± 9.80 ** | 92.44 ± 5.57 | 82.58 ± 10.58 ** | 6.21 ± 0.64 ** | 77.00 ± 10.82 ** | 42.23 ± 9.70 ** | 88.58 ± 8.98
Stomach | 34.82 ± 23.93 ** | 53.98 ± 28.90 ** | 58.20 ± 21.35 | 2.64 ± 1.18 ** | 39.63 ± 15.97 ** | 6.86 ± 6.11 ** | 58.96 ± 22.47
Aorta | 62.84 ± 22.91 ** | 86.45 ± 8.92 | 78.12 ± 8.26 ** | 1.12 ± 0.45 ** | 68.76 ± 22.85 ** | 8.63 ± 4.09 ** | 82.59 ± 14.18
Postcava | 56.03 ± 14.94 ** | 75.12 ± 11.60 | 64.46 ± 12.21 | 1.00 ± 0.28 ** | 46.40 ± 17.26 ** | - | 63.28 ± 16.34
Pancreas | 45.20 ± 21.44 ** | 59.26 ± 18.73 | 57.50 ± 15.67 | 0.80 ± 0.23 ** | 30.94 ± 19.05 ** | - | 50.17 ± 19.20
Right adrenal gland | 36.97 ± 18.67 | 29.41 ± 15.86 ** | 31.67 ± 10.35 ** | 0.04 ± 0.02 ** | 17.02 ± 14.99 ** | - | 40.35 ± 21.08
Left adrenal gland | 33.24 ± 22.10 | 19.94 ± 17.85 ** | 17.11 ± 12.59 ** | 0.05 ± 0.02 ** | 16.55 ± 13.79 ** | - | 34.13 ± 24.89
Duodenum | 26.89 ± 16.14 ** | 44.31 ± 19.69 | 40.60 ± 14.53 ** | 0.60 ± 0.23 ** | 17.92 ± 11.81 ** | - | 45.17 ± 18.97
Avg | 49.73 ± 13.34 ** | 59.85 ± 8.91 ** | 56.37 ± 9.06 ** | 1.37 ± 0.19 ** | 40.99 ± 12.61 ** | 5.46 ± 1.14 ** | 63.01 ± 12.61
Table 4. ASSD (mm) of different methods on the AMOS dataset for abdominal multi-organ segmentation. The best results are highlighted in bold and the arrow (↓) indicates that lower values represent better performance. * and ** indicate p-value < 0.05 and p-value < 0.01, respectively.
ASSD (mm) ↓ | FPL [18] | DADAseg [22] | C3R [24] | AOS [25] | Proto [26] | UPL [11] | Ours
Spleen | 14.35 ± 12.73 ** | 6.07 ± 2.62 ** | 20.95 ± 16.92 ** | 54.09 ± 4.25 ** | 12.96 ± 5.65 ** | 35.72 ± 5.67 ** | 4.78 ± 5.21
Right kidney | 13.36 ± 10.07 ** | 26.66 ± 5.79 ** | 5.40 ± 3.39 ** | 54.58 ± 8.32 ** | 17.06 ± 14.54 ** | - | 3.72 ± 4.09
Left kidney | 6.96 ± 7.77 | 7.41 ± 3.75 | 5.57 ± 4.08 | 63.10 ± 5.87 ** | 11.37 ± 9.82 ** | 70.36 ± 9.43 ** | 6.23 ± 6.82
Gall bladder | 15.13 ± 23.19 | 28.09 ± 7.01 ** | 27.36 ± 17.96 ** | 54.09 ± 18.64 ** | 30.55 ± 17.37 ** | - | 8.47 ± 19.83
Esophagus | 6.51 ± 5.99 | 6.96 ± 5.74 * | 3.98 ± 3.40 | 63.33 ± 5.06 ** | 30.76 ± 14.09 ** | - | 5.09 ± 5.15
Liver | 12.27 ± 5.86 ** | 3.51 ± 2.90 | 10.41 ± 7.57 ** | 31.36 ± 3.80 ** | 11.16 ± 7.10 ** | 26.41 ± 5.53 ** | 6.47 ± 4.73
Stomach | 30.64 ± 16.10 ** | 8.31 ± 6.98 | 15.44 ± 10.45 ** | 41.02 ± 7.04 ** | 15.37 ± 6.13 ** | 45.33 ± 10.31 ** | 10.12 ± 4.62
Aorta | 5.30 ± 5.00 | 3.31 ± 1.25 | 3.12 ± 2.88 | 46.31 ± 1.98 ** | 5.58 ± 6.42 * | 24.56 ± 3.45 ** | 4.24 ± 3.64
Postcava | 5.32 ± 3.47 | 3.53 ± 1.86 | 3.99 ± 3.03 | 47.16 ± 1.12 ** | 9.58 ± 4.33 ** | - | 5.87 ± 3.16
Pancreas | 8.86 ± 7.25 | 7.19 ± 6.35 | 6.26 ± 4.34 | 45.05 ± 4.94 ** | 15.01 ± 8.97 ** | - | 11.67 ± 5.19
Right adrenal gland | 4.92 ± 5.22 | 12.10 ± 4.79 * | 9.23 ± 7.49 | 61.88 ± 2.67 ** | 17.63 ± 8.23 ** | - | 7.45 ± 19.90
Left adrenal gland | 7.45 ± 10.91 | 14.83 ± 7.31 ** | 15.34 ± 7.44 ** | 61.29 ± 3.16 ** | 40.38 ± 14.97 ** | - | 6.49 ± 8.07
Duodenum | 12.85 ± 11.08 * | 6.76 ± 7.13 | 8.69 ± 6.11 | 49.04 ± 8.67 ** | 20.02 ± 9.23 ** | - | 9.22 ± 7.65
Avg | 11.23 ± 4.87 ** | 10.35 ± 2.22 ** | 10.44 ± 4.25 ** | 51.71 ± 2.90 ** | 18.26 ± 6.39 ** | 39.62 ± 4.71 ** | 6.94 ± 4.12
Table 5. Ablation study. The arrow (↑) indicates that higher values represent better performance and the arrow (↓) indicates that lower values represent better performance.
Variant | Dice (%) ↑ | ASSD (mm) ↓
Without L_PatchNCE | 28.14 ± 11.22 | 20.33 ± 9.19
Without L_proto | 61.55 ± 13.95 | 8.11 ± 5.87
Without L_adv | 56.94 ± 15.18 | 9.04 ± 5.87
Without L_id | 61.38 ± 13.87 | 7.67 ± 5.39
Ours | 63.01 ± 12.61 | 6.94 ± 4.12
Table 6. Segmentation performance with different patch numbers. The best results are highlighted in bold. The arrow (↑) indicates that higher values represent better performance and the arrow (↓) indicates that lower values represent better performance.
Variant | Dice (%) ↑ | ASSD (mm) ↓
16 patches | 38.57 ± 15.47 | 13.25 ± 6.67
32 patches | 61.87 ± 12.82 | 7.00 ± 4.39
64 patches | 63.01 ± 12.61 | 6.94 ± 4.12
128 patches | 59.56 ± 14.67 | 8.81 ± 5.70
256 patches | 55.63 ± 16.13 | 10.09 ± 6.80
Table 7. Segmentation performance of outputs from different encoder layers. The best results are highlighted in bold. The arrow (↑) indicates that higher values represent better performance and the arrow (↓) indicates that lower values represent better performance.
Variant | Dice (%) ↑ | ASSD (mm) ↓
V_1 (E_0) | 59.42 ± 14.10 | 8.41 ± 5.25
V_2 (E_0, E_1) | 59.94 ± 14.32 | 8.58 ± 5.57
V_3 (E_0, E_1, E_2) | 63.01 ± 12.61 | 6.94 ± 4.12
V_4 (E_0, E_1, E_2, E_3) | 61.72 ± 13.93 | 7.51 ± 5.44
Table 8. Segmentation performance of outputs from different discriminator layers, with the best results highlighted in bold. The arrow (↑) indicates that higher values represent better performance and the arrow (↓) indicates that lower values represent better performance.
Variant | Dice (%) ↑ | ASSD (mm) ↓
V_5 (D_0) | 24.52 ± 15.33 | 27.89 ± 13.84
V_6 (D_0, D_1) | 41.70 ± 17.71 | 16.30 ± 9.68
V_7 (D_0, D_1, D_2) | 54.00 ± 16.84 | 10.46 ± 6.88
V_8 (D_0, D_1, D_2, D_3) | 58.94 ± 14.89 | 8.47 ± 5.73
V_9 (D_0, D_1, D_2, D_3, D_4) | 59.94 ± 14.49 | 8.38 ± 5.77
V_10 (D_1, D_2, D_3, D_4) | 63.01 ± 12.61 | 6.94 ± 4.12
V_11 (D_2, D_3, D_4) | 60.26 ± 14.11 | 8.37 ± 5.74
V_12 (D_3, D_4) | 61.20 ± 14.04 | 7.76 ± 5.80
V_13 (D_4) | 57.11 ± 15.24 | 9.72 ± 6.50