Article

Source-Free Domain Adaptation for Medical Image Segmentation via Mutual Information Maximization and Prediction Bank

School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(18), 3656; https://doi.org/10.3390/electronics14183656
Submission received: 1 August 2025 / Revised: 2 September 2025 / Accepted: 12 September 2025 / Published: 15 September 2025

Abstract

Medical image segmentation faces significant challenges due to domain shift between different clinical centers and data privacy restrictions. Current source-free domain adaptation methods for medical images suffer from three critical limitations: error accumulation caused by post hoc pseudo-label denoising, unstable training driven by noisy pseudo-labels, and poor handling of foreground–background imbalance, where critical structures such as the optic cup occupy extremely small regions. Additionally, strict privacy regulations often prevent access to source domain data during adaptation. To address these limitations, this paper proposes a source-free domain adaptation approach based on mutual information optimization for fundus image segmentation. The method incorporates a teacher–student network to ensure training stability and a mutual information maximization algorithm that naturally reduces pseudo-label noise. Furthermore, a prediction bank is constructed to handle class imbalance by leveraging dataset-wide statistics. Experimental results on fundus segmentation datasets demonstrate superior performance, achieving average Dice coefficients of 91.74% on the Drishti-GS dataset and 87.80% on the RIM-ONE-r3 dataset, outperforming current methods. This work provides a practical solution for cross-institutional medical image analysis while preserving data privacy, with significant potential for eye disease diagnosis and other medical applications requiring robust domain adaptation.

1. Introduction

Medical image segmentation is a cornerstone of computer-aided diagnosis systems, where accurate delineation of anatomical structures directly supports clinical decision-making and patient outcomes [1]. Traditional segmentation models perform exceptionally well when sufficient well-annotated training data are available, yet the high cost of annotation creates significant bottlenecks for model development across diverse clinical scenarios [2]. To address this challenge, unsupervised domain adaptation (UDA) methods have emerged as a solution, training adaptive models that combine unlabeled target domain images with labeled source domain images and thereby alleviating annotation costs to some extent. Despite the remarkable performance achieved by UDA approaches [3,4], fundamental limitations persist in medical image segmentation applications. The high sensitivity of medical images with respect to privacy and copyright protection poses substantial obstacles, as labeled source domain images are often subject to strict institutional restrictions and may be completely unavailable during the adaptation process.
This ubiquitous practical constraint, commonly encountered in real-world clinical deployments, has driven extensive research into source-free domain adaptation (SFDA) methods [5,6]. These approaches aim to adapt a source-trained model to the target domain using exclusively unlabeled target data, providing a more flexible and feasible solution for cross-institutional medical image segmentation while respecting privacy regulations and data sharing limitations. Since training on the source and target domains simultaneously is not possible, effective denoising plays a key role in this task. Existing approaches [7] rely on model-generated pseudo-labels, creating a delicate balance between leveraging confident predictions and avoiding error propagation. Recent advances have introduced various denoising mechanisms to enhance pseudo-label quality, including pixel-level filtering, class-aware thresholding, and uncertainty-based selection criteria.
Despite significant progress in source-free adaptation, three critical limitations persist that particularly affect medical image segmentation performance. First, existing methods typically denoise predictions post hoc, which can lead to error accumulation and performance degradation. Second, the absence of ground-truth supervision creates training instability, as models may converge to suboptimal solutions when relying exclusively on self-generated feedback. Third, and most critically for medical applications, existing methods inadequately address the class imbalance characteristic of anatomical structures, where clinically significant regions such as the optic cup occupy minimal image area compared to background tissue; this can bias learning toward dominant classes at the expense of diagnostic precision.
To address the aforementioned issues, we propose Mutual Information Maximization and Prediction Bank (MIMPB), a source-free domain adaptation approach for medical image segmentation. The algorithm formulates an optimization problem consisting of two key modules. First, the mutual information maximization module minimizes the relative entropy between pseudo-labels and model output probabilities, effectively reducing the discrepancy between distributions and naturally mitigating pseudo-label noise. This approach aims to fully utilize the information in high-quality labels, thereby reducing the uncertainty in the generated data. Second, the prediction bank effectively addresses the foreground–background imbalance in medical images by analyzing global statistics across the entire dataset. By storing prediction results of all pixels and calculating separate average losses for foreground and background classes, the prediction bank calibrates the loss function to ensure balanced learning between classes, significantly improving segmentation accuracy for minority classes like the optic cup. These two components work synergistically within a teacher–student framework to enhance model stability and performance for source-free domain adaptation in medical image segmentation tasks.
Our contributions can be summarized as follows:
  • First, we introduce a novel mutual information maximization approach that fundamentally addresses pseudo-label noise by formulating an optimization problem from information theory principles rather than relying on post-processing techniques.
  • Second, we develop a comprehensive prediction bank mechanism that effectively tackles the class imbalance problem in medical images by analyzing dataset-wide statistics and dynamically adjusting the loss function weights.
  • Third, we present an enhanced teacher–student framework that seamlessly integrates these aforementioned two components, enabling stable and effective source-free domain adaptation for medical image segmentation without requiring access to source domain data during training.
This paper is organized as follows. Section 2 reviews related work on domain adaptation methods and source-free domain adaptation techniques in medical image segmentation. Section 3 presents our proposed MIMPB framework, including the teacher–student network, mutual information maximization algorithm, and prediction bank mechanism. Experimental results on fundus segmentation datasets with performance comparisons, ablation studies, and parameter analysis are discussed in Section 4. Finally, Section 5 concludes this paper and discusses future research directions.

2. Related Work

Recent developments in segmenting optic disc and cup have increasingly focused on addressing domain shift challenges across different medical institutions and imaging devices. While supervised and unsupervised domain adaptation methods have shown promising results, they often require access to source domain data during the adaptation process. This requirement has led to the emergence of source-free domain adaptation approaches that can adapt pre-trained models using only unlabeled target domain data.

2.1. Domain Adaptation

Domain adaptation methods represent a class of transfer learning techniques designed to address the challenge of domain shift between source and target domains. In the medical imaging field, these methods can be broadly categorized into supervised domain adaptation (SDA) and unsupervised domain adaptation (UDA) approaches, each offering distinct advantages and addressing different scenarios of data availability.
Supervised domain adaptation methods utilize labeled data from both source and target domains to facilitate knowledge transfer. These approaches can be categorized into fine-tuning strategies and multi-step adaptation methods. Fine-tuning strategies involve adapting pre-trained models through parameter adjustment on target domain data. Recent studies [8] have demonstrated that fine-tuning convolutional neural networks pre-trained on brain MRI scans can significantly improve brain lesion segmentation performance using limited target samples. Similarly, successful adaptations have been reported for VGG networks on MRI images [9] and AlexNet architectures for lesion detection [10]. Multi-step adaptation approaches address scenarios with insufficient target samples by employing intermediate adaptation stages. Enhanced approaches [11] first fine-tune a residual network on a relatively large medical dataset then train on smaller target domain data, showing superior performance over direct transfer methods. Semi-supervised domain adaptive methods [12,13] have emerged to address the challenging scenario where both limited annotation and domain shift coexist in medical image segmentation. These approaches utilize small amounts of labeled target data alongside abundant unlabeled data to facilitate domain adaptation, offering the advantage of reduced annotation burden while maintaining robust cross-domain performance through effective knowledge transfer mechanisms.
Unsupervised domain adaptation methods address the more challenging scenario where only labeled source domain data and unlabeled target domain data are available during training. These methods primarily rely on feature alignment or image alignment strategies to mitigate domain shift. Feature alignment approaches [14,15] primarily leverage adversarial learning frameworks to learn domain-invariant representations by constraining feature distributions across domains. These methods help reduce the model’s sensitivity to image quality variations across different domains by encouraging the model to focus on task-relevant features while suppressing domain-specific variations. Researchers [16] have also focused on low-level feature training by fixing the weights of high-level networks, thereby better adapting to image assessment requirements. Image alignment strategies typically employ generative adversarial networks to achieve domain-level alignment. A noise adaptation generative adversarial network [17] has been proposed to address fundus vessel segmentation by treating domain adaptation as a noise style transfer problem, incorporating dual discriminators to ensure content similarity and noise pattern consistency. Other studies [18,19] focus on advanced generative modeling and masked autoencoding techniques to bridge domain gaps by learning shared representations across different medical imaging modalities, so as to improve existing methods and provide practical solutions for multi-institutional medical studies with data sharing restrictions.

2.2. Source-Free Domain Adaptation (SFDA)

Source-free domain adaptation methods can be categorized into entropy-based methods and output alignment-based methods. Entropy-based methods [20,21] utilize a trained model to predict a small amount of unlabeled target domain data. They update the model’s parameters for adaptation to the target domain based on the entropy of the predictions. These methods can enhance discriminability and generalization ability. However, they suffer from several issues, such as updating only the lower-layer batch normalization parameters while neglecting high-level semantic features and encountering uncertainty in supervision information due to differences between the source and target domains, which may lead to erroneous model updates. Output alignment-based methods [22] aim to mitigate uncertainty in predictions. Most approaches use denoising autoencoders or other denoising techniques to refine results. The typical process involves first training a segmentation model on the source domain with label supervision to optimize performance, then allowing the model to adapt to a small amount of unlabeled target domain images. The model is trained and updated using denoised segmentation results, mainly adjusting shallow-layer batch normalization parameters. However, these methods still focus primarily on adapting low-level features of target domain images, providing limited improvements in segmentation accuracy.
Source-free domain adaptation methods can also be broadly divided into generation-based methods [23,24] and pseudo-labeling methods [25,26]. Generation-based methods face challenges in learning to generate meaningful features and have limited applicability. In contrast, pseudo-labeling methods are relatively simpler and more general, with a strong research foundation, particularly in medical image segmentation. For instance, some researchers [27] proposed a method that minimizes an unsupervised entropy loss function defined on target domain data while incorporating domain-invariant prior information from the segmentation region to guide model optimization. This method has been applied to spinal, prostate, and cardiac segmentation tasks. Another study [28] employed pseudo-label generation for source-free domain adaptive image segmentation, integrating self-training in a two-stage adaptation process. During the specific target adaptation stage, the study introduced an ensemble entropy minimization loss and a selective voting strategy to enhance pseudo-label generation by reducing high-entropy regions. In the specific task adaptation stage, the approach leveraged a teacher–student network to further refine pseudo-labels, significantly improving segmentation performance in the target domain. Additionally, the ProSFDA method [29] was proposed as a source-free domain adaptation approach featuring both a generation phase and an adaptation phase. In the generation phase, class-specific source-like images are generated using the statistical information from a pre-trained source model and Fourier transform. In the adaptation phase, a specially designed distillation module enables feature-level adaptation. This module includes a domain distillation loss to constrain relational knowledge transfer and a domain contrastive loss to minimize domain discrepancies.

3. Methodology

Our proposed MIMPB framework comprises two key components, which together formulate the adaptation problem as an information-theoretic optimization that naturally reduces pseudo-label noise while maintaining training stability. The framework seamlessly integrates mutual information maximization with a novel prediction bank mechanism to achieve balanced learning across imbalanced anatomical structures.

3.1. Framework of MIMPB

Source-free domain adaptation for medical image segmentation aims to transfer a source model $f_s$, trained on labeled source-domain images $X^s$, to the target domain using only unlabeled target-domain images $X^t$. The ground-truth labels in the source domain are denoted as $Y_i \in \{0,1\}^{H \times W \times C}$, where $H$, $W$, and $C$ represent the image height, width, and number of categories, respectively, and $i$ indexes the $i$-th image. If the $j$-th pixel of the $i$-th image belongs to the $c$-th category, then $y_{ijc} = 1$; otherwise, $y_{ijc} = 0$.
Based on this, we aim to construct and train a deep neural network $f_\theta: x \mapsto y$. Through training and optimization of this network, the ultimate goal is to achieve accurate segmentation of the optic disc and optic cup. To effectively address key challenges such as data denoising, model training stability, and the imbalance between foreground and background information, we propose a solution based on mutual information optimization. Specifically, this method leverages the properties of mutual information to fundamentally eliminate noise interference in the results. Additionally, a teacher–student network is introduced to enhance the stability of the training process. Moreover, a prediction bank is constructed to balance foreground and background information, thereby improving overall model performance.
The primary objective of the method is to enhance the model’s performance in a specific target domain by leveraging knowledge obtained from the source domain. The model framework is shown in Figure 1. The detailed training procedure is shown in Algorithm 1.
Algorithm 1 Training procedure of MIMPB
Require: Source model $f_s$; unlabeled target data $\mathcal{D}_T = \{X_i^t\}_{i=1}^{N}$
Ensure: Adapted model parameters $\theta_s$, $\theta_t$
 1: Initialize $\theta_s, \theta_t \leftarrow f_s$
 2: Build prediction bank PB with $f_{\theta_t}(\mathcal{D}_T)$
 3: for epoch $= 1$ to $E$ do
 4:   for each batch $B \subset \mathcal{D}_T$ do
 5:     Generate $X^w \leftarrow \mathrm{WeakAug}(B)$, $X^s \leftarrow \mathrm{StrongAug}(B)$
 6:     // Teacher prediction
 7:     $p^w \leftarrow \sigma(f_{\theta_t}(X^w))$ based on Equation (1)
 8:     // MIM optimization
 9:     Optimize $W$ via MIM based on Equation (3)
10:     Compute optimal pseudo-labels $\hat{y}^{re}$ based on Equation (18)
11:     // Prediction bank weighting
12:     Calculate $\eta^{fg}$, $\eta^{bg}$ from PB based on Equations (20) and (21)
13:     // Student prediction and training
14:     $p^s \leftarrow \sigma(f_{\theta_s}(X^s))$
15:     $\mathcal{L} \leftarrow \mathcal{L}_{seg}$ (weighted by $\eta^{fg}/\eta^{bg}$) $+\, \alpha \mathcal{L}_{MI}$ based on Equations (11), (22) and (23)
16:     Update $\theta_s$ via gradient descent on $\mathcal{L}$
17:     // Teacher update
18:     $\theta_t \leftarrow \lambda \theta_t + (1-\lambda)\theta_s$ based on Equation (2)
19:     Update PB with new predictions
20:   end for
21: end for

3.2. Teacher–Student Network

To avoid error accumulation and ensure a more stable training process, we adopt a weak–strong augmentation teacher–student network. The teacher and student models share the same structure but differ in their weight update methods. Specifically, for each image X i , a weakly augmented version X i w is generated through image flipping and resizing and is fed into the teacher network. Meanwhile, a strongly augmented version X i s is generated and input into the student network. The strong augmentation methods include random erasing, contrast adjustment, and impulse noise.
Finally, the teacher network’s output is used to guide the student network, with the mathematical formulation as follows:
$\mathcal{L} = \mathbb{E}_{x^s,\,\hat{y}^w}\left[\mathcal{L}_{bce}\right], \quad (1)$
where $x^s \in X_i^s$ represents the pixels of the strongly augmented image, $\hat{y}^w$ denotes the pseudo-labels obtained from the teacher network, and $\mathcal{L}_{bce}$ is the binary cross-entropy (BCE) loss. Using this network offers two advantages. First, fundus image datasets are generally small, so insufficient training data makes the model prone to overfitting; the algorithm mitigates this by using image augmentation to increase the diversity of the training set. Second, applying different random augmentations during training introduces consistency regularization, which groups images with similar semantics into the same category and thereby encourages more discriminative feature representations.
The teacher model does not update its parameters through loss backpropagation; instead, its parameters are the exponential moving average (EMA) of the student model's parameters:
$\theta_t \leftarrow \lambda\,\theta_t + (1-\lambda)\,\theta_s, \quad (2)$
where $\theta_t$ and $\theta_s$ denote the teacher and student parameters, respectively, and $\lambda$ is the smoothing coefficient hyperparameter. This design enhances generalization, stabilizes training, and makes effective use of unlabeled data.
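For concreteness, the update in Equation (2) can be sketched in PyTorch as below. This is a minimal illustration rather than the released implementation: the stand-in backbone and the direct copying of normalization buffers are our assumptions, while $\lambda = 0.99$ follows the update-coefficient analysis in Section 4.4.

```python
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, lam: float = 0.99) -> None:
    """Equation (2): theta_t <- lam * theta_t + (1 - lam) * theta_s."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(lam).add_(s_p, alpha=1.0 - lam)
    # Normalization buffers (e.g., BN running stats) are copied directly
    # from the student; this is an assumed convention, not from the paper.
    for t_b, s_b in zip(teacher.buffers(), student.buffers()):
        t_b.copy_(s_b)

# Hypothetical usage: the teacher starts as a frozen copy of the source model.
student = nn.Conv2d(3, 2, kernel_size=3, padding=1)   # stand-in for DeepLabv3+
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
ema_update(teacher, student, lam=0.99)
```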

3.3. Mutual Information Optimization Algorithm

The algorithm formulates an optimization problem consisting of two parts. The first part minimizes the relative entropy between the pseudo-labels and the model’s output probabilities. The goal is to reduce the gap between the pseudo-label distribution and the probability distribution, thereby learning pseudo-labels that better represent the probability distribution. The second part maximizes the mutual information between the pseudo-labels and the data. This is motivated by the idea that high-quality labels can reduce uncertainty within the data.
Given a segmentation model $f_\theta$, its output is a $K$-dimensional probability vector, where $K$ is the predefined number of segmentation classes. For an input image $X = \{x_i\}_{i=1}^{B}$, the model's output probabilities are $P = \{p_i\}_{i=1}^{B} \subset \mathbb{R}^K$, where $p_i = \mathrm{softmax}(f_\theta(x_i))$. The method aims to learn optimized probability assignments $W = \{w_i\}_{i=1}^{B}$ for training the model, where $B$ is the number of pixels in the image.
Consider the optimization problem:
$W^{*} = \operatorname*{arg\,min}_{W \in \Delta^{K}} \; \frac{1}{B}\sum_{i=1}^{B} D_{KL}(w_i \,\|\, p_i) \;-\; \beta\, I(Y^{W}; \mathcal{B}), \quad (3)$
where $Y^{W} \in \{1,\dots,K\}$ represents the pseudo-labels obtained from the distribution $W = \{w_i\}_{i=1}^{B}$, $\mathcal{B} \in \{1,\dots,B\}$ denotes the pixel position index, and $\Delta^{K} := \{\, w \in \mathbb{R}_{+}^{K} \mid w^{\top}\mathbf{1}_K = 1 \,\}$ is the probability simplex. The hyperparameter $\beta \in [0,1)$ in Equation (3) is a crucial coefficient that balances the two objectives of the optimization problem: it controls the trade-off between minimizing the KL divergence and maximizing mutual information. A higher value of $\beta$ places greater emphasis on the mutual information term, encouraging diverse and well-structured pseudo-labels that prevent the model from collapsing into trivial solutions. Conversely, a lower value of $\beta$ keeps the resulting pseudo-labels $W$ closer to the model's original predictions $P$. The selection of $\beta$ thus tunes the balance between model confidence and label diversity.
Expanding Equation (3) using the definitions of KL divergence and mutual information, and writing $\hat{w}_j = \frac{1}{B}\sum_{i=1}^{B} w_{ij}$ for the marginal distribution of the assignments, we obtain:
$W^{*} = \operatorname*{arg\,min}_{W \in \Delta^{K}} \; -\frac{1}{B}\sum_{i=1}^{B}\sum_{j=1}^{K} w_{ij}\log p_{ij} \;+\; \frac{1-\beta}{B}\sum_{i=1}^{B}\sum_{j=1}^{K} w_{ij}\log w_{ij} \;+\; \beta\sum_{j=1}^{K} \hat{w}_j \log \hat{w}_j. \quad (4)$
This is a strictly convex optimization problem; therefore, the Lagrangian is constructed and solved using the Karush–Kuhn–Tucker (KKT) conditions:
$\mathcal{L}(W) = f(W) + \lambda\, g(W), \quad (5)$
where
$f(W) = -\frac{1}{B}\sum_{i=1}^{B}\sum_{j=1}^{K} w_{ij}\log p_{ij} + \frac{1-\beta}{B}\sum_{i=1}^{B}\sum_{j=1}^{K} w_{ij}\log w_{ij} + \beta\sum_{j=1}^{K} \hat{w}_j \log \hat{w}_j, \quad (6)$
$g(W) = w^{\top}\mathbf{1}_K - 1. \quad (7)$
Solving the KKT conditions yields
$w_{ij} = \dfrac{p_{ij}^{\frac{1}{1-\beta}}\, \hat{w}_j^{-\frac{\beta}{1-\beta}}}{\sum_{j'} p_{ij'}^{\frac{1}{1-\beta}}\, \hat{w}_{j'}^{-\frac{\beta}{1-\beta}}}, \quad (8)$
and
$\hat{w}_j = \left[\dfrac{1}{B}\sum_{i} \dfrac{p_{ij}^{\frac{1}{1-\beta}}}{\sum_{k} p_{ik}^{\frac{1}{1-\beta}}\, \hat{w}_k^{-\frac{\beta}{1-\beta}}}\right]^{1-\beta}. \quad (9)$
Based on Equation (9), a sequence $\{u_j^{(n)}\}$ can be constructed iteratively to obtain the optimal solution:
$u_j^{(n+1)} = \left[\dfrac{1}{B}\sum_{i} \dfrac{p_{ij}^{\frac{1}{1-\beta}}}{\sum_{k} p_{ik}^{\frac{1}{1-\beta}}\, \big(u_k^{(n)}\big)^{-\frac{\beta}{1-\beta}}}\right]^{1-\beta}. \quad (10)$
As proved in [30], Equation (10) is a fixed-point iteration for the strictly convex objective and converges to the unique optimum. Through this iterative method, the optimal pseudo-label distribution can be obtained, which is then used for model training. That is, the loss function for a single image can be defined as:
$\mathcal{L}_{MI} = -\dfrac{1}{B \times K}\sum_{i=1}^{B}\sum_{j=1}^{K} u_{ij}\log p_{ij}. \quad (11)$
It is important to note that, since this algorithm uses a teacher–student network, $u_{ij}$ is the optimal pseudo-label obtained by optimizing the probability values $p_{ij}$ output by the teacher network; the derivation is identical and is not repeated here.
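The fixed-point iteration of Equation (10), the assignment of Equation (8), and the loss of Equation (11) admit a compact implementation. The following PyTorch sketch is illustrative only; the values of $\beta$ and the iteration count are our assumptions, as the paper does not report them.

```python
import torch

def mim_pseudo_labels(p: torch.Tensor, beta: float = 0.5, n_iter: int = 30) -> torch.Tensor:
    """Solve Equation (3) via the fixed-point iteration of Equation (10).

    p: (B, K) teacher softmax probabilities over K classes for B pixels.
    Returns the optimal assignment of Equation (8), shape (B, K).
    beta and n_iter are illustrative choices, not reported values.
    """
    eps = 1e-8
    p_pow = p.clamp_min(eps) ** (1.0 / (1.0 - beta))        # p_ij^{1/(1-beta)}
    u_hat = p.mean(dim=0)                                    # initial marginal
    for _ in range(n_iter):                                  # Equation (10)
        denom = (p_pow * u_hat.clamp_min(eps) ** (-beta / (1.0 - beta))).sum(dim=1, keepdim=True)
        u_hat = ((p_pow / denom).mean(dim=0)) ** (1.0 - beta)
    # Equation (8): per-pixel assignment under the converged marginal.
    w = p_pow * u_hat.clamp_min(eps) ** (-beta / (1.0 - beta))
    return w / w.sum(dim=1, keepdim=True)

def mi_loss(u: torch.Tensor, p_student: torch.Tensor) -> torch.Tensor:
    """Equation (11): cross-entropy between the optimal pseudo-labels
    (from the teacher) and the student's output probabilities."""
    return -(u * p_student.clamp_min(1e-8).log()).mean()

# Hypothetical usage on random probabilities:
p_teacher = torch.softmax(torch.randn(1024, 2), dim=1)
u = mim_pseudo_labels(p_teacher, beta=0.5)
print(mi_loss(u, p_teacher))
```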
The above optimization minimizes the relative entropy between pseudo-labels and the model's output probabilities. During training there is often a discrepancy between the pseudo-label distribution and the output distribution; mutual information maximization effectively reduces this gap, bringing the two distributions as close as possible. In this way, the model can better exploit latent features and patterns in the data and gain a deeper understanding of its underlying structure. Finally, the method uses the teacher network's output probabilities to guide those of the student network, further enhancing training stability.

3.4. Prediction Bank

The teacher–student network and mutual information optimization methods effectively address the stability and denoising issues during model training. However, in the fundus segmentation task, there is still the challenge of foreground–background imbalance. Specifically, foreground objects (e.g., the optic cup) usually occupy very small areas in the image, causing the majority of pixels to belong to the background. If binary cross-entropy (BCE) loss is directly used to update the student model, the background class will dominate the loss calculation, weakening the supervisory signal for the foreground class. The prediction bank proposed in this method is designed to alleviate this issue to some extent. In practice, the model performs separate segmentation for the optic cup and the optic disc. When segmenting the cup, the disc and fundus are regarded as background, while when segmenting the disc, the remaining regions serve as background. Thus, the prediction bank is applied in a binary segmentation manner for each structure.
When processing images, the simplest way to address the foreground–background imbalance issue is to separately count the number of pixels belonging to the foreground and background classes in each image and then design a loss weighting function based on these statistics. In standard supervised learning tasks, this strategy may be effective since the labels are reliable. However, for pseudo-labels, performing statistical analysis based on a single image is highly risky. To mitigate this problem, this algorithm constructs a prediction bank to analyze the class imbalance across the entire dataset, using this global information to calibrate the loss for each image. Specifically, the prediction bank stores the prediction results of all pixels in all images and calculates the average loss for the foreground and background:
$\eta_k^{fg} = \dfrac{\sum_i L_{i,k}\cdot \mathbb{1}[\hat{y}_{i,k}=1]}{\sum_i \mathbb{1}[\hat{y}_{i,k}=1]}, \quad (12)$
$\eta_k^{bg} = \dfrac{\sum_i L_{i,k}\cdot \mathbb{1}[\hat{y}_{i,k}=0]}{\sum_i \mathbb{1}[\hat{y}_{i,k}=0]}. \quad (13)$
Here, $L$ is the segmentation loss, whose specific form is detailed in the next section, and $fg$ and $bg$ denote the foreground and background, respectively. The method uses the mean loss, rather than averaging by raw pixel counts, because the loss of each pixel reflects the "difficulty" of that pixel; this lets pixels with more informative features carry greater weight. The average losses computed from Equations (12) and (13) are then used to calibrate the loss as follows:
$\hat{\mathcal{L}}_{bce} = -\mathbb{E}_{x,k}\!\left[\hat{y}_k \log p_k + \dfrac{\eta_k^{fg}}{\eta_k^{bg}}\,(1-\hat{y}_k)\log(1-p_k)\right]. \quad (14)$
Additionally, in practice the vast majority of predictions are highly confident, with values extremely close to 0 or 1; from an information-entropy standpoint, such near-deterministic predictions carry little information. Therefore, to compute the average loss more accurately and to strengthen the model's learning on difficult cases, only pixels with relatively large losses are included in the calculation. This screening is implemented with a constraint threshold:
$\dfrac{\left|\,p_{i,k} - \hat{y}_{i,k}\,\right|}{\left|\,\hat{y}_{i,k} - \gamma\,\right|} > 0.2, \quad (15)$
where $\gamma$ is the probability threshold for converting soft probabilities into hard labels and $p_{i,k}$ is the output probability; predictions lying extremely close to 0 or 1 fail this test and are excluded.
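A simplified sketch of the bank statistics (Equations (12) and (13)) and the calibrated loss (Equations (14) and (15)) follows. The dense per-pixel bank and our reading of the Equation (15) filter are assumptions for illustration; only $\gamma = 0.75$ is taken from the implementation details in Section 4.1.

```python
import torch

def bank_weights(loss_bank: torch.Tensor, label_bank: torch.Tensor):
    """Equations (12)-(13): class-wise mean losses over the whole dataset.
    loss_bank / label_bank hold the per-pixel BCE losses and hard
    pseudo-labels of every image, flattened for one class k."""
    eta_fg = loss_bank[label_bank == 1].mean()
    eta_bg = loss_bank[label_bank == 0].mean()
    return eta_fg, eta_bg

def calibrated_bce(p, y_hat, eta_fg, eta_bg, gamma: float = 0.75):
    """Equation (14) with the Equation (15) filter: the background term is
    re-weighted by eta_fg / eta_bg, and near-deterministic predictions
    (values extremely close to 0 or 1) are excluded."""
    eps = 1e-8
    keep = (p - y_hat).abs() / (y_hat - gamma).abs().clamp_min(eps) > 0.2
    loss = -(y_hat * (p + eps).log()
             + (eta_fg / eta_bg) * (1 - y_hat) * (1 - p + eps).log())
    return loss[keep].mean()

# Hypothetical usage for one class with random tensors:
p = torch.rand(4, 1, 64, 64)
y_hat = (p > 0.75).float()
pixel_loss = -(y_hat * p.clamp_min(1e-8).log()
               + (1 - y_hat) * (1 - p).clamp_min(1e-8).log())
eta_fg, eta_bg = bank_weights(pixel_loss.flatten(), y_hat.flatten())
print(calibrated_bce(p, y_hat, eta_fg, eta_bg))
```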
Overall, the calibrated loss ensures fair learning across different categories, thereby alleviating the performance degradation problem caused by class imbalance. The model constructs a prediction bank for images and analyzes class imbalance in the dataset from a global perspective. It effectively avoids risks, making the model more robust and reliable. In addition, the model’s decision-making becomes more scientific and reasonable by adopting the mean loss and assigning weights to key pixels according to the pixel-wise loss.

3.5. Objective Function

This section will elaborate on the objective function of the algorithm, focusing on two main components: the segmentation loss and the loss based on mutual information optimization. It is worth noting that the loss related to mutual information optimization has been explained in detail previously, so it will not be repeated here.
The baseline way to construct pseudo-labels is to threshold the model's predictions directly:
$\hat{y}_{i,k} = \mathbb{1}\left[p_{i,k} > \gamma\right], \quad (16)$
where $\mathbb{1}[\cdot]$ is the indicator function and $\gamma$ is the probability threshold for converting soft probabilities into hard labels; $p_{i,k}$ and $\hat{y}_{i,k}$ denote the prediction and pseudo-label of the $i$-th pixel for the $k$-th class, respectively.
Most existing works tend to employ two main strategies to optimize the original methods. First, they propose calibration techniques to obtain better pseudo-labels. Second, they measure uncertainty and apply weights during the loss calculation. These improvements do enhance model performance to some extent, but they still fail to overcome stability issues. This is because the presence of noise directly interferes with model performance, and since the prediction results are used to generate pseudo-labels, errors accumulate, which further impacts the reliability of the model. Additionally, these methods generally overlook the imbalance in foreground and background pixel distributions in images, with the foreground region typically occupying a small proportion. Under these conditions, directly using binary cross-entropy loss for probability updates is highly susceptible to interference from inaccurate contextual relationships, leading to the incorrect propagation of background probabilities to the foreground, thus reducing the accuracy of foreground probabilities. To address this, the proposed algorithm not only introduces a prediction bank but also incorporates a correction mechanism for the probabilities, aiming to further enhance the model’s performance and stability. Specifically:
$p_{i,k}^{re} = \dfrac{p_{i,k}}{\max_j p_{j,k}}. \quad (17)$
That is, to achieve a more accurate probability representation, the probabilities are rescaled by the per-image maximum of each class. After calibration with Equation (17), the maximum probability output (e.g., at the center of a region) approaches 1. This correction improves the accuracy and reliability of the probability estimates and thereby the performance of the downstream algorithm.
Based on this, pseudo-labels $\hat{y}_{i,k}^{re}$ are generated:
$\hat{y}_{i,k}^{re} = \mathbb{1}\left[p_{i,k}^{re} > \gamma\right]. \quad (18)$
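As a small sketch (the (K, H, W) shape convention is our assumption), Equation (17) amounts to dividing each class's probability map by its per-image maximum, after which Equation (18) thresholds the rescaled map:

```python
import torch

def rescale_probs(p: torch.Tensor) -> torch.Tensor:
    """Equation (17): p_re = p / max_j p_{j,k}, computed per class map.
    p: (K, H, W) probabilities for one image; the most confident pixel
    of each structure is pushed to probability 1."""
    peak = p.flatten(1).max(dim=1).values.clamp_min(1e-8)   # max over pixels j
    return p / peak.view(-1, 1, 1)

# Hypothetical usage with gamma = 0.75 (Section 4.1):
p_re = rescale_probs(torch.rand(2, 64, 64) * 0.8)           # cup / disc maps
y_hat_re = (p_re > 0.75).float()                            # Equation (18)
```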
Pixels are then selected whose $p_{i,k}^{re}$ satisfies
$\dfrac{\left|\,p_{i,k}^{re} - \hat{y}_{i,k}^{re}\,\right|}{\left|\,\hat{y}_{i,k}^{re} - \gamma\,\right|} > 0.2, \quad (19)$
and $\eta_k^{fg}$ and $\eta_k^{bg}$ are recalculated accordingly:
$\eta_k^{fg} = \dfrac{\sum_i L_{i,k}\cdot \mathbb{1}[\hat{y}_{i,k}^{re}=1]}{\sum_i \mathbb{1}[\hat{y}_{i,k}^{re}=1]}, \quad (20)$
$\eta_k^{bg} = \dfrac{\sum_i L_{i,k}\cdot \mathbb{1}[\hat{y}_{i,k}^{re}=0]}{\sum_i \mathbb{1}[\hat{y}_{i,k}^{re}=0]}. \quad (21)$
Finally, the segmentation loss is updated using the BCE loss:
$\hat{\mathcal{L}}_{bce}^{re} = -\mathbb{E}_{x,k}\!\left[\hat{y}_k^{re}\log p_k^{re} + \dfrac{\eta_k^{fg}}{\eta_k^{bg}}\,(1-\hat{y}_k^{re})\log(1-p_k^{re})\right]. \quad (22)$
It is important to note that, since this algorithm utilizes a teacher–student network, $\hat{y}_k^{re}$ is a pseudo-label derived from the probability values output by the teacher network; the process is not reiterated here.
Building upon the aforementioned segmentation loss, the mutual information optimization loss is incorporated, resulting in the final objective function:
$\mathcal{L} = \hat{\mathcal{L}}_{bce}^{re} + \alpha\, \mathcal{L}_{MI}, \quad (23)$
where $\alpha$ is the weighting coefficient of the mutual information loss. A sensitivity analysis of this coefficient is conducted in the next section to explore its impact on model performance.

4. Experiments

In this section, the performance of the proposed algorithm in the retinal image segmentation task will be evaluated. The method will be compared with state-of-the-art methods, and the obtained results will be analyzed. Additionally, ablation experiments will be conducted and examined. Finally, parameter sensitivity evaluation and visualization analysis will be performed.

4.1. Experiment Settings

Datasets: Three datasets were used, described as follows:
REFUGE Dataset [31]: This dataset is designed for glaucoma assessment and contains a total of 1200 fundus images, all derived from clinical data at the Zhongshan Ophthalmic Center, Sun Yat-sen University. It includes precise annotations for optic disc and cup segmentation, as well as clinical labels for glaucoma, making it suitable for tasks such as optic disc and cup segmentation and glaucoma detection.
RIM-ONE Dataset [32]: This dataset consists of three subsets, with RIM-ONE-r3 being one of them, primarily used for research on optic disc and cup segmentation in ophthalmic images. The subset contains 159 fundus images, with segmentation annotations for the optic disc and cup, making it useful for training and evaluating segmentation models for these structures and supporting the diagnosis and study of ophthalmic diseases.
Drishti-GS Dataset [33]: This dataset is intended for retinal optic nerve head segmentation and glaucoma classification, providing a valuable resource for automated glaucoma assessment research. It includes a total of 101 retinal images. The dataset was curated by clinical researchers based on clinical findings during examinations, with glaucoma patients selected accordingly. Each image was annotated by four glaucoma experts with 3, 5, 9, and 20 years of experience, respectively, to capture inter-observer variability. This dataset can be used for tasks such as fundus optic disc and cup segmentation, as well as glaucoma and normal classification.
In the experiment, datasets collected from different clinical centers with distribution differences were used. Specifically, the REFUGE dataset’s training set is set as the source domain, while the publicly available RIM-ONE-r3 dataset and Drishti-GS dataset are used as two different target domains. The source domain contains 400 labeled training images, and the data from the two target domains are divided into training and testing sets at ratios of 99/60 and 50/51, respectively.
Evaluation Metrics: The Dice coefficient and Average Surface Distance (ASD) are used as evaluation metrics.
The Dice coefficient is a statistical measure of the similarity between two sets. In segmentation tasks, it evaluates the degree of overlap between the predicted segmentation and the ground truth. Given two sets $A$ and $B$ (where $A$ is the set of predicted foreground pixels and $B$ is the ground-truth pixel set), the Dice coefficient is calculated as:
$\mathrm{Dice} = \dfrac{2\,|A \cap B|}{|A| + |B|}, \quad (24)$
where $|A \cap B|$ is the number of pixels labeled as the target in both the segmentation result and the ground truth.
The Average Surface Distance (ASD) is a metric used to measure the surface distance between the segmentation result and the ground truth. It primarily focuses on the difference between the segmentation boundary and the actual boundary, reflecting the precision of the segmentation.
Let $S_1$ be the set of points on the segmented surface and $S_2$ the set of points on the ground-truth surface. For each point $p_1 \in S_1$, the shortest distance to $S_2$ is $d(p_1, S_2)$; similarly, for each point $p_2 \in S_2$, the shortest distance to $S_1$ is $d(p_2, S_1)$. Then
$\mathrm{ASD} = \dfrac{\sum_{p_1 \in S_1} d(p_1, S_2) + \sum_{p_2 \in S_2} d(p_2, S_1)}{|S_1| + |S_2|}. \quad (25)$
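Both metrics admit a compact NumPy/SciPy implementation. The following sketch is our own illustration for binary 2D masks and mirrors Equations (24) and (25):

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Equation (24): 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def asd(pred: np.ndarray, gt: np.ndarray) -> float:
    """Equation (25): symmetric average surface distance (non-empty masks)."""
    def surface(mask: np.ndarray) -> np.ndarray:
        # Boundary pixels: the mask minus its erosion.
        return np.logical_xor(mask, binary_erosion(mask))
    s1, s2 = surface(pred.astype(bool)), surface(gt.astype(bool))
    d_to_s2 = distance_transform_edt(~s2)   # distance of each pixel to S2
    d_to_s1 = distance_transform_edt(~s1)
    return (d_to_s2[s1].sum() + d_to_s1[s2].sum()) / (s1.sum() + s2.sum())

# Hypothetical usage on toy masks:
pred = np.zeros((64, 64), bool); pred[20:40, 20:40] = True
gt = np.zeros((64, 64), bool); gt[22:42, 22:42] = True
print(dice(pred, gt), asd(pred, gt))
```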
Implementation Details: Experiments adopt DeepLabv3+ as the backbone network. The probability threshold $\gamma$ for converting soft probabilities into hard labels is set to 0.75. During training, weak augmentation (image flipping and resizing) is applied to the input of the teacher network, while strong augmentation (random erasing, contrast adjustment, and impulse noise) is applied to the input of the student network. Perturbing the input pushes predictions away from the pseudo-labels, increasing training difficulty and encouraging the model to become more robust.
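A weak/strong pair of this kind could be approximated with modern torchvision as sketched below. The operation magnitudes are our assumptions (the paper reports only the operation types), and the paper's PyTorch 0.4.1 environment predates these tensor-transform APIs:

```python
import torch
from torchvision import transforms

# Weak augmentation for the teacher input: flipping and resizing only.
weak_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.Resize((512, 512)),
])

def impulse_noise(img: torch.Tensor, amount: float = 0.02) -> torch.Tensor:
    """Salt-and-pepper noise; `amount` is an assumed corruption rate."""
    mask = torch.rand_like(img[:1])[0]        # one (H, W) mask for all channels
    img = img.clone()
    img[:, mask < amount / 2] = 0.0           # pepper
    img[:, mask > 1 - amount / 2] = 1.0       # salt
    return img

# Strong augmentation for the student input: contrast adjustment,
# impulse noise, and random erasing (magnitudes are illustrative).
strong_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.Resize((512, 512)),
    transforms.ColorJitter(contrast=0.5),
    transforms.Lambda(impulse_noise),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.1)),
])

img = torch.rand(3, 600, 600)                 # a fundus crop in [0, 1]
x_weak, x_strong = weak_aug(img), strong_aug(img)
```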
The model is trained using the Adam optimizer with momentum parameters of 0.9 and 0.99, a learning rate of $2 \times 10^{-3}$, and a batch size of 4. The entire framework is implemented in PyTorch 0.4.1 and runs on an NVIDIA GeForce GTX 1080Ti GPU.
In the preprocessing stage, each image is cropped to a region of 512 × 512 pixels to serve as the input to the network.

4.2. Comparison with Other Methods

The proposed model is compared with several state-of-the-art domain adaptation methods. These include unsupervised domain adaptation (UDA) methods such as the BEAL (Boundary and Entropy-driven Adversarial Learning) method [34] and AdvEnt method [22]. Additionally, source-free domain adaptation (SFDA) methods are discussed, including the SRDA (Source-Relaxed Domain Adaptation) method [35], DAE method [36], DPL (Denoised Pseudo-Labeling) method [37], CBMT (Class-Balanced Mean Teacher) method [38], Crots (Cross-Domain Teacher–Student Learning) method [39], CPR (Context-Aware Pseudo-Label Refinement) method [40] and RDPL method [41]. Table 1 and Table 2 present comparison results on two datasets against various methods. In each column, the best scores are highlighted. For the evaluation metrics, the up-arrow symbol “↑” indicates that a higher score represents better performance, while the down-arrow symbol “↓” signifies that a lower score is better. The symbol “-” indicates that the method did not report corresponding results, while “±” represents the standard deviation of the sample in the dataset. The “S-F” label denotes whether the method is source-free.
During the evaluation, the "W/o DA" (without domain adaptation) results in the tables can be regarded as the lower bound of model performance, while the "Oracle" results, obtained with full supervision, serve as the upper bound. The tables show that the proposed method significantly outperforms the most advanced source-free domain adaptation methods on both metrics and, in some respects, even improves over traditional unsupervised domain adaptation methods. Notably, on the Drishti-GS dataset, the method demonstrates a significant performance improvement over previous results.
Specifically, according to the data in Table 1, the proposed method achieves the best performance in the disc and cup segmentation tasks on the RIM-ONE-r dataset. Additionally, an analysis of the variance data reveals that this method demonstrates high stability, which reflects its generalization ability, further validating the effectiveness of the teacher–student network. At the same time, the proposed method has surpassed traditional unsupervised domain adaptation methods such as BEAL, indicating that the algorithm can effectively adapt to the target domain even without access to source domain data.
Based on the data in Table 2, the method achieves the best performance on the cup segmentation task on the Drishti-GS dataset. In the disc segmentation task, its performance is slightly inferior to the CBMT method; nevertheless, its average Dice coefficient is still 1.27% higher than that of CBMT overall. The improvement in cup segmentation strongly validates the effectiveness of the prediction bank and probability calibration strategy introduced to address the foreground–background imbalance problem.
In summary, these results confirm that the denoising method optimized using mutual information is effective. This method not only enhances the model’s performance but also strengthens its stability. It helps address potential noise issues in pre-trained model results by naturally removing noise through an entropy-based approach, thereby increasing the information content.

4.3. Ablation Study Results

This section validates the effectiveness of each module and setting in the algorithm through a series of ablation experiments. Specifically, Table 3 shows the results of ablating the mutual information module and the prediction bank on the RIM-ONE-r and Drishti-GS datasets.
Comparing the first and third rows for each dataset clearly demonstrates the effectiveness of the prediction bank. Notably, on the Drishti-GS dataset, the prediction bank yields a significant 10% improvement in the Dice coefficient. This result confirms that the module not only enhances the model's performance but also alleviates the foreground–background imbalance; by encouraging the model to focus more on foreground information, performance is further improved.
Additionally, by analyzing the second and third rows of data for each model, the effectiveness of the mutual information module can be clearly observed. From an information theory perspective, the core principle of the mutual information module is to use the theory of entropy to accurately denoise the model’s output. During domain adaptation, various noise interferences inevitably occur, which can affect the accuracy and reliability of model predictions. The mutual information module can, to some extent, identify and remove these interfering signals, making the model’s output more accurate. This denoising strategy based on information entropy theory has been fully validated in the experimental data comparison, demonstrating its effectiveness in improving model performance.
A deeper analysis reveals the modules’ complementary roles. The MIM module enhances the supervisory signal by filtering noise from pseudo-labels, which is crucial for preventing error accumulation in the self-training process. Concurrently, the Prediction Bank module directly tackles severe class imbalance by re-weighting the loss based on global statistics, forcing the model to focus on vital foreground regions. This synergy ensures that the cleaner labels provided by MIM are effectively utilized. However, we also identified a trade-off in Table 3 for the RIM-ONE-r dataset: while our full model achieves the best Dice score, the MIM module alone yields a slightly better ASD for the optic cup. This suggests the combined approach offers the best overall balance between regional overlap and boundary precision.
It is particularly worth emphasizing that the prediction bank and the mutual information module create a synergistic effect when combined. This combination is not a simple addition but an interaction and joint optimization within the model: the prediction bank provides rich, targeted prior information, while the mutual information module further improves the quality of the supervisory signal, and the two complement each other. This collaboration enables the model to extract key features more accurately from complex data, thereby improving overall performance.
In addition, this section also explores different data augmentation methods. Table 4 visually presents the experimental results under various data augmentation strategies. The first row, “Weak Augmentation,” indicates that both the teacher and student networks used weak data augmentation; the second row, “Strong Augmentation,” shows that both networks applied strong data augmentation; and the method proposed in this paper uses weak data augmentation for the teacher network and strong data augmentation for the student network.
Upon analyzing the experimental results, it is evident that when both models use strong data augmentation, the performance is less than satisfactory. This is likely because strong data augmentation alters the images significantly, causing them to lose some of their original key information, which negatively affects the model’s performance. In contrast, the approach used in this paper applies weak data augmentation to the teacher network, effectively ensuring the reliability and stability of the teacher network’s output. On this basis, the student network is further guided to tackle more challenging tasks, thereby improving the student network’s ability to extract information and ultimately enhancing both the performance and stability of the model.

4.4. Parameter Analysis

This study also evaluates the sensitivity of hyperparameter weights. First, Table 5 presents the sensitivity analysis results for the teacher network update coefficient. Analyzing the average results, it is observed that the experimental outcomes remain largely consistent with different update coefficients. However, upon closer inspection, the model performs optimally when the update coefficient is set to 0.99. This result demonstrates that the model exhibits good robustness when faced with different update coefficients. It indicates that the model can flexibly adapt to changes in hyperparameters within a certain range while maintaining relatively stable performance, providing a degree of assurance for its widespread use in practical applications.
Additionally, this section performs a sensitivity analysis of the loss coefficient $\alpha$, as shown in Figure 2. The Dice coefficient here is the average of the cup and disc results. The Dice coefficients corresponding to different loss coefficients generally fall within the range of 90–92%, which demonstrates the algorithm's stability with respect to the loss coefficient. Further observation reveals that the model performs best when $\alpha$ is set to 1.0, suggesting that a balanced ratio of segmentation loss to mutual information loss yields the optimal segmentation results.

4.5. Visualization

To intuitively assess the performance of the segmentation algorithm, this section presents a visual analysis of the segmentation results. As shown in Figure 3, the visualization is presented in a four-row layout. The top two rows display the segmentation results for the Drishti-GS dataset, while the bottom two rows focus on the RIM-ONE-r dataset. Each row follows a consistent four-column structure.
The first column shows the original images, which retain their initial state in the dataset and provide a basic reference for subsequent comparative analysis. The second column displays the ground truth labels, which are annotated by professionals and serve as the standard for evaluating the accuracy of the segmentation algorithm. The third column presents the segmentation results generated by the source domain pre-trained model. The fourth column shows the segmentation results after adaptation with the method proposed in this paper.
The method proposed in this study fully considers the characteristics of medical images and provides targeted optimization for pre-trained models. From the visual results, it is clear that the segmentation accuracy has been significantly improved after processing with the proposed method. The previously blurred boundaries of the target region are now clear and well defined, closely aligning with the boundaries defined by the ground truth labels. Additionally, the segmentation edges are smoother. Furthermore, jagged lines are noticeably reduced, which not only enhances the visual quality of the segmentation results but, more importantly, provides more accurate and reliable image analysis for doctors in practical applications, such as medical diagnostic assistance systems.
In conclusion, the observation and analysis of the visualization results validate the effectiveness of the proposed method in improving segmentation accuracy and confirm its denoising capability.

5. Conclusions and Discussion

In this paper, we addressed the critical challenges of domain shift and data privacy in medical image segmentation by proposing a novel source-free unsupervised domain adaptation algorithm, MIMPB. Our method achieved state-of-the-art performance on public fundus segmentation benchmarks, demonstrating that models can be effectively adapted to new clinical data without accessing sensitive source data. The core contributions establish a strong framework for privacy-preserving AI deployment across different healthcare institutions.
The clinical relevance of this work lies in its potential to be seamlessly integrated into computer-aided diagnostic systems for ophthalmology. For instance, an automated glaucoma screening tool equipped with our method could quickly adapt to new imaging devices or patient populations within a hospital, thus providing consistent and accurate optic disc and cup segmentation. This would reduce the need for costly data re-annotation and decrease the workload of specialists. It would also facilitate large-scale, reliable disease screening. Beyond ophthalmology, the principles of our source-free approach are broadly applicable to other medical imaging modalities, such as MRI and CT scans, where domain shifts are a common obstacle. It also holds promise for non-medical fields where data is isolated due to privacy or commercial sensitivities.
Despite its promising results, this study has limitations. Its efficacy has only been validated on fundus images, and its generalization to modalities with fundamentally different characteristics remains to be explored. Moreover, the performance of the adaptation process is inherently linked to the quality of the initial source-trained model. Future research will therefore focus on validating the MIMPB framework on a wider range of medical imaging tasks, including brain tumor and organ segmentation. By addressing these challenges, we believe the framework can evolve into a truly practical and scalable solution, thereby accelerating the adoption of adaptive AI in diverse medical scenarios.

Author Contributions

Conceptualization, H.W. and Y.Z.; methodology, Y.Z.; software, H.W.; validation, H.W., Y.Z. and X.L.; formal analysis, H.W.; investigation, Y.Z.; resources, X.L.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, H.W.; visualization, Y.Z.; supervision, X.L.; project administration, H.W.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Innovation Plan of Shanghai Science and Technology Commission under Grant 22511106005.

Data Availability Statement

All datasets analyzed in this study are publicly available and can be found at the locations specified in the references [31,32,33]. The code supporting the findings of this study is not publicly available at this time due to project and funding requirements. However, it may be made available from the corresponding author upon reasonable request, pending necessary approvals.

Acknowledgments

The authors would like to thank the High Performance Computing Center of Shanghai University and Shanghai Engineering Research Center of Intelligent Computing System for providing computing resources and technical support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yao, W.; Bai, J.; Liao, W.; Chen, Y.; Liu, M.; Xie, Y. From cnn to transformer: A review of medical image segmentation models. J. Imaging Inform. Med. 2024, 37, 1529–1547. [Google Scholar] [CrossRef] [PubMed]
  2. Guan, H.; Liu, M. Domain Adaptation for Medical Image Analysis: A Survey. IEEE Trans. Biomed. Eng. 2022, 69, 1173–1185. [Google Scholar] [CrossRef] [PubMed]
  3. Yang, M.; Wu, Z.; Zheng, H.; Huang, L.; Ding, W.; Pan, L.; Yin, L. Cross-modality medical image segmentation via enhanced feature alignment and cross pseudo supervision learning. Diagnostics 2024, 14, 1751. [Google Scholar] [CrossRef] [PubMed]
  4. Shen, Y.; Sheng, B.; Fang, R.; Li, H.; Dai, L.; Stolte, S.; Qin, J.; Jia, W.; Shen, D. Domain-invariant interpretable fundus image quality assessment. Med. Image Anal. 2020, 61, 101654. [Google Scholar] [CrossRef] [PubMed]
  5. Yu, Q.; Xi, N.; Yuan, J.; Zhou, Z.; Dang, K.; Ding, X. Source-free domain adaptation for medical image segmentation via prototype-anchored feature alignment and contrastive learning. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Vancouver, BC, Canada, 8–12 October 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 3–12. [Google Scholar]
  6. Shen, W.; Wang, Q.; Jiang, H.; Li, S.; Yin, J. Unsupervised Domain Adaptation for Semantic Segmentation via Self-Supervision. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 12–16 July 2021; pp. 2747–2750. [Google Scholar]
  7. Xu, Z.; Lu, D.; Wang, Y.; Luo, J.; Wei, D.; Zheng, Y.; Tong, R.K.y. Denoising for relaxing: Unsupervised domain adaptive fundus image segmentation without source data. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 214–224. [Google Scholar]
  8. Ghafoorian, M.; Mehrtash, A.; Kapur, T.; Karssemeijer, N.; Marchiori, E.; Pesteie, M.; Guttmann, C.R.G.; de Leeuw, F.E.; Tempany, C.M.; van Ginneken, B.; et al. Transfer Learning for Domain Adaptation in MRI: Application in Brain Lesion Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention—MICCAI 2017, Quebec, QC, Canada, 10–14 September 2017; pp. 516–524. [Google Scholar]
  9. Swati, Z.N.K.; Zhao, Q.; Kabir, M.; Ali, F.; Ali, Z.; Ahmed, S.; Lu, J. Brain tumor classification for MR images using transfer learning and fine-tuning. Comput. Med. Imaging Graphs 2019, 75, 34–46. [Google Scholar] [CrossRef] [PubMed]
  10. Samala, R.K.; Chan, H.P.; Hadjiiski, L.; Helvie, M.A.; Richter, C.; Cha, K. Cross-domain and multi-task transfer learning of deep convolutional neural network for breast cancer diagnosis in digital breast tomosynthesis. In Proceedings of the Medical Imaging 2018: Computer-Aided Diagnosis, Houston, TX, USA, 12–15 February 2018; pp. 172–178. [Google Scholar]
  11. Gu, Y.; Ge, Z.; Bonnington, C.P.; Zhou, J. Progressive Transfer Learning and Adversarial Domain Adaptation for Cross-Domain Skin Disease Classification. IEEE J. Biomed. Health Inform. 2020, 24, 1379–1393. [Google Scholar] [CrossRef] [PubMed]
  12. Liu, X.; Xing, F.; Shusharina, N.; Lim, R.; Jay Kuo, C.C.; El Fakhri, G.; Woo, J. Act: Semi-supervised domain-adaptive medical image segmentation with asymmetric co-training. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 66–76. [Google Scholar]
  13. Ma, Q.; Zhang, J.; Qi, L.; Yu, Q.; Shi, Y.; Gao, Y. Constructing and exploring intermediate domains in mixed domain semi-supervised medical image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 11642–11651. [Google Scholar]
  14. Kamnitsas, K.; Baumgartner, C.; Ledig, C.; Newcombe, V.; Simpson, J.; Kane, A.; Menon, D.; Nori, A.; Criminisi, A.; Rueckert, D.; et al. Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In Proceedings of the International Conference on Information Processing in Medical Imaging, Boone, NC, USA, 25–30 June 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 597–609. [Google Scholar]
  15. Wang, S.; Yu, L.; Yang, X.; Fu, C.W.; Heng, P.A. Patch-Based Output Space Adversarial Learning for Joint Optic Disc and Cup Segmentation. IEEE Trans. Med. Imaging 2019, 38, 2485–2495. [Google Scholar] [CrossRef] [PubMed]
  16. Dou, Q.; Ouyang, C.; Chen, C.; Chen, H.; Glocker, B.; Zhuang, X.; Heng, P.A. Pnp-adanet: Plug-and-play adversarial domain adaptation network at unpaired cross-modality cardiac segmentation. IEEE Access 2019, 7, 99065–99076. [Google Scholar] [CrossRef]
  17. Zhang, T.; Cheng, J.; Fu, H.; Gu, Z.; Xiao, Y.; Zhou, K.; Gao, S.; Zheng, R.; Liu, J. Noise Adaptation Generative Adversarial Network for Medical Image Analysis. IEEE Trans. Med. Imaging 2020, 39, 1149–1159. [Google Scholar] [CrossRef] [PubMed]
  18. Ji, W.; Chung, A.C. Diffusion-based domain adaptation for medical image segmentation using stochastic step alignment. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Marrakesh, Morocco, 6–10 October 2024; pp. 188–198. [Google Scholar]
  19. Zhang, X.; Wu, Y.; Angelini, E.; Li, A.; Guo, J.; Rasmussen, J.M.; O’Connor, T.G.; Wadhwa, P.D.; Jackowski, A.P.; Li, H.; et al. MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024. [Google Scholar]
  20. Kim, Y.; Cho, D.; Han, K.; Panda, P.; Hong, S. Domain Adaptation Without Source Data. IEEE Trans. Artif. Intell. 2021, 2, 508–518. [Google Scholar] [CrossRef]
  21. Li, X.; Chen, W.; Xie, D.; Yang, S.; Yuan, P.; Pu, S.; Zhuang, Y. A free lunch for unsupervised domain adaptive object detection without source data. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 8474–8481. [Google Scholar]
  22. Vu, T.H.; Jain, H.; Bucher, M.; Cord, M.; Pérez, P. ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2512–2521. [Google Scholar]
  23. Hou, Y.; Zheng, L. Source free domain adaptation with image translation. arXiv 2020, arXiv:2008.07514. [Google Scholar]
  24. Li, R.; Jiao, Q.; Cao, W.; Wong, H.S.; Wu, S. Model Adaptation: Unsupervised Domain Adaptation Without Source Data. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9638–9647. [Google Scholar]
  25. Iqbal, J.; Ali, M. Mlsl: Multi-level self-supervised learning for domain adaptation with spatially independent and semantically consistent labeling. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 1864–1873. [Google Scholar]
  26. Li, G.; Kang, G.; Liu, W.; Wei, Y.; Yang, Y. Content-consistent matching for domain adaptive semantic segmentation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 440–456. [Google Scholar]
  27. Bateson, M.; Kervadec, H.; Dolz, J.; Lombaert, H.; Ayed, I.B. Source-free domain adaptation for image segmentation. Med. Image Anal. 2022, 82, 102617. [Google Scholar] [PubMed]
  28. VS, V.; Valanarasu, J.M.J.; Patel, V.M. Target and task specific source-free domain adaptive image segmentation. arXiv 2022, arXiv:2203.15792. [Google Scholar]
  29. Yang, C.; Guo, X.; Chen, Z.; Yuan, Y. Source free domain adaptation for medical image segmentation with fourier style mining. Med. Image Anal. 2022, 79, 102457. [Google Scholar] [CrossRef] [PubMed]
  30. Lee, D.H.; Choi, S.; Kim, H.; Chung, S.Y. Unsupervised Visual Representation Learning via Mutual Information Regularized Assignment. arXiv 2022, arXiv:2211.02284. [Google Scholar] [CrossRef]
  31. Orlando, J.I.; Fu, H.; Breda, J.B.; Van Keer, K.; Bathula, D.R.; Diaz-Pinto, A.; Fang, R.; Heng, P.A.; Kim, J.; Lee, J.; et al. Refuge challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs. Med. Image Anal. 2020, 59, 101570. [Google Scholar] [CrossRef] [PubMed]
  32. Fumero, F.; Alayon, S.; Sanchez, J.L.; Sigut, J.; Gonzalez-Hernandez, M. RIM-ONE: An open retinal image database for optic nerve evaluation. In Proceedings of the 2011 24th International Symposium on Computer-Based Medical Systems (CBMS), Bristol, UK, 27–30 June 2011; pp. 1–6. [Google Scholar]
  33. Sivaswamy, J.; Krishnadas, S.; Chakravarty, A.; Joshi, G.; Ujjwal. A comprehensive retinal image dataset for the assessment of glaucoma from the optic nerve head analysis. JSM Biomed. Imaging Data Pap. 2015, 2, 1004. [Google Scholar]
  34. Wang, S.; Yu, L.; Li, K.; Yang, X.; Fu, C.W.; Heng, P.A. Boundary and Entropy-driven Adversarial Learning for Fundus Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Shenzhen, China, 13–17 October 2019. [Google Scholar]
  35. Bateson, M.; Kervadec, H.; Dolz, J.; Lombaert, H.; Ayed, I.B. Source-Relaxed Domain Adaptation for Image Segmentation. arXiv 2020, arXiv:2005.03697v1. [Google Scholar]
  36. Karani, N.; Erdil, E.; Chaitanya, K.; Konukoglu, E. Test-time adaptable neural networks for robust medical image segmentation. Med. Image Anal. 2021, 68, 101907. [Google Scholar] [CrossRef] [PubMed]
  37. Chen, C.; Liu, Q.; Jin, Y.; Dou, Q.; Heng, P.A. Source-Free Domain Adaptive Fundus Image Segmentation with Denoised Pseudo-Labeling. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Strasbourg, France, 27 September–1 October 2021. [Google Scholar]
  38. Tang, L.; Li, K.; He, C.; Zhang, Y.; Li, X. Source-Free Domain Adaptive Fundus Image Segmentation with Class-Balanced Mean Teacher. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2023, Vancouver, BC, Canada, 8–12 October 2023; Springer: Cham, Switzerland, 2023; pp. 684–694. [Google Scholar]
  39. Luo, X.; Chen, W.; Liang, Z.; Yang, L.; Wang, S.; Li, C. Crots: Cross-Domain Teacher–Student Learning for Source-Free Domain Adaptive Semantic Segmentation. Int. J. Comput. Vis. 2024, 132, 20–39. [Google Scholar] [CrossRef]
  40. Huai, Z.; Ding, X.; Li, Y.; Li, X. Context-aware pseudo-label refinement for source-free domain adaptive fundus image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Vancouver, BC, Canada, 8–12 October 2023; Springer: Cham, Switzerland, 2023; pp. 618–628. [Google Scholar]
  41. Li, L.; Zhou, Y.; Yang, G. Robust source-free domain adaptation for fundus image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 1–10 January 2024; pp. 7840–7849. [Google Scholar]
Figure 1. The framework of MIMPB. The model is first pretrained on labeled source-domain data so that it learns the essential features and patterns of the task. The pretrained source model then initializes both the teacher and the student networks for target-domain training. The two networks are updated differently: the student is trained by backpropagation, whereas the teacher's parameters are updated as an exponential moving average of the student's. The teacher's outputs are further refined by mutual information optimization, which steers the student's training in a more precise direction, and a prediction bank is maintained to counter the imbalance between foreground and background.
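To make the interplay of these components concrete, the following is a minimal PyTorch-style sketch of one adaptation step. It is an illustrative reconstruction, not the authors' released implementation: the function names, the hard-argmax pseudo-labeling, and the weighting alpha (standing in for the loss coefficient α analyzed in Figure 2) are all assumptions.

```python
# Minimal PyTorch-style sketch of the Figure 1 adaptation step. All names
# (update_teacher, mutual_information, adaptation_step) and the hard-argmax
# pseudo-labeling are illustrative assumptions, not the released code.
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_teacher(teacher, student, ema_decay=0.99):
    # The teacher is never trained directly: its weights track an exponential
    # moving average of the student's, which stabilizes the pseudo-labels.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(ema_decay).add_(s_p, alpha=1.0 - ema_decay)

def mutual_information(probs, eps=1e-8):
    # I(input; prediction) = H(marginal) - E[H(conditional)]. Maximizing it
    # rewards confident per-pixel predictions (low conditional entropy) while
    # keeping the class marginal balanced (high marginal entropy), which
    # counteracts the trivial all-background solution.
    p = probs.permute(0, 2, 3, 1).reshape(-1, probs.size(1))   # (pixels, C)
    marginal = p.mean(dim=0)
    h_marginal = -(marginal * (marginal + eps).log()).sum()
    h_conditional = -(p * (p + eps).log()).sum(dim=1).mean()
    return h_marginal - h_conditional

def adaptation_step(student, teacher, images, optimizer, alpha=1.0):
    # alpha stands in for the loss coefficient analyzed in Figure 2.
    with torch.no_grad():
        pseudo = teacher(images).argmax(dim=1)      # (N, H, W) hard labels
    logits = student(images)
    probs = torch.softmax(logits, dim=1)
    loss = F.cross_entropy(logits, pseudo) - alpha * mutual_information(probs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_teacher(teacher, student)                # EMA update each step
```

The ema_decay default of 0.99 matches the best-performing teacher–student update coefficient in Table 5; the slowly moving teacher stabilizes training, while the mutual information term discourages both collapse onto the dominant background class and uniformly uncertain predictions.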
Figure 2. Sensitivity analysis of the loss coefficient α. The blue stars indicate the Dice coefficient values corresponding to each α.
Figure 3. Visual comparison of optic disc and cup segmentation before and after domain adaptation.
Table 1. Performance comparison on the RIM-ONE-r dataset.
| Method | S-F | Optic Disc Dice [%] ↑ | Optic Disc ASD [Pixel] ↓ | Optic Cup Dice [%] ↑ | Optic Cup ASD [Pixel] ↓ | Average Dice [%] ↑ | Average ASD [Pixel] ↓ |
|---|---|---|---|---|---|---|---|
| RIM-ONE-r Dataset | | | | | | | |
| W/o DA | – | 83.18 ± 6.46 | 24.15 ± 15.58 | 74.51 ± 16.40 | 14.44 ± 11.27 | 78.85 ± 1.43 | 19.47 ± 13.43 |
| Oracle | – | 96.80 | – | 85.60 | – | 91.20 | – |
| BEAL [34] | × | 89.80 | – | 81.00 | – | 85.40 | – |
| AdvEnt [22] | × | 89.73 ± 3.66 | 9.84 ± 3.86 | 77.99 ± 21.08 | 7.57 ± 4.24 | 83.86 ± 12.37 | 8.71 ± 4.05 |
| SRDA [35] | ✓ | 89.37 ± 2.70 | 9.91 ± 2.45 | 77.61 ± 13.58 | 10.15 ± 5.75 | 83.49 ± 6.93 | 10.03 ± 4.10 |
| DAE [36] | ✓ | 89.08 ± 3.32 | 11.63 ± 6.84 | 79.01 ± 12.82 | 10.31 ± 8.45 | 84.05 ± 8.07 | 10.97 ± 7.65 |
| DPL [37] | ✓ | 90.13 ± 3.06 | 9.43 ± 3.46 | 79.78 ± 11.05 | 9.01 ± 5.59 | 84.85 ± 7.06 | 9.22 ± 4.53 |
| CBMT [38] | ✓ | 93.36 ± 4.07 | 6.20 ± 4.79 | 81.16 ± 14.71 | 8.37 ± 6.99 | 87.26 ± 9.39 | 7.29 ± 5.89 |
| Crots [39] | ✓ | 92.93 ± 3.61 | 6.36 ± 3.38 | 80.19 ± 2.13 | 6.84 ± 4.57 | 86.56 ± 2.87 | 6.60 ± 3.98 |
| CPR [40] | ✓ | 91.72 ± 7.37 | 6.80 ± 5.19 | 78.56 ± 2.03 | 7.65 ± 5.33 | 85.14 ± 4.70 | 7.22 ± 5.26 |
| RDPL [41] | ✓ | 91.70 ± 3.88 | 7.62 ± 3.84 | 78.55 ± 2.20 | 7.48 ± 4.42 | 85.13 ± 3.04 | 7.55 ± 4.13 |
| OURS | ✓ | 93.73 ± 2.87 | 5.53 ± 2.41 | 81.86 ± 10.10 | 7.91 ± 3.78 | 87.80 ± 6.49 | 6.72 ± 3.10 |
Table 2. Performance comparison on the Drishti-GS dataset.
| Method | S-F | Optic Disc Dice [%] ↑ | Optic Disc ASD [Pixel] ↓ | Optic Cup Dice [%] ↑ | Optic Cup ASD [Pixel] ↓ | Average Dice [%] ↑ | Average ASD [Pixel] ↓ |
|---|---|---|---|---|---|---|---|
| Drishti-GS Dataset | | | | | | | |
| W/o DA | – | 93.84 ± 2.91 | 9.05 ± 7.50 | 83.36 ± 11.95 | 11.39 ± 6.30 | 86.60 ± 7.43 | 10.22 ± 6.90 |
| Oracle | – | 97.40 | – | 90.10 | – | 93.75 | – |
| BEAL [34] | × | 96.10 | – | 86.20 | – | 91.15 | – |
| AdvEnt [22] | × | 96.16 ± 1.65 | 4.36 ± 1.83 | 82.75 ± 11.08 | 11.36 ± 7.22 | 89.46 ± 6.37 | 7.86 ± 4.53 |
| SRDA [35] | ✓ | 96.22 ± 1.30 | 4.88 ± 3.47 | 80.67 ± 11.78 | 13.12 ± 6.48 | 88.45 ± 6.54 | 9.00 ± 4.98 |
| DAE [36] | ✓ | 94.04 ± 2.85 | 8.79 ± 7.45 | 83.11 ± 11.89 | 11.56 ± 6.32 | 88.58 ± 7.37 | 10.18 ± 6.89 |
| DPL [37] | ✓ | 96.39 ± 1.33 | 4.08 ± 1.49 | 83.53 ± 17.80 | 11.39 ± 10.18 | 89.96 ± 9.57 | 7.74 ± 5.84 |
| CBMT [38] | ✓ | 96.61 ± 1.45 | 3.85 ± 1.63 | 84.33 ± 11.70 | 10.30 ± 5.88 | 90.47 ± 6.58 | 7.08 ± 3.76 |
| Crots [39] | ✓ | 96.58 ± 1.78 | 3.90 ± 2.18 | 83.18 ± 11.36 | 11.22 ± 6.79 | 89.88 ± 6.57 | 7.56 ± 4.49 |
| CPR [40] | ✓ | 90.12 ± 2.42 | 10.63 ± 2.10 | 75.04 ± 12.98 | 16.69 ± 9.09 | 82.58 ± 7.70 | 13.66 ± 5.59 |
| RDPL [41] | ✓ | 96.64 ± 1.49 | 3.78 ± 1.61 | 84.33 ± 12.17 | 10.30 ± 5.98 | 90.49 ± 6.83 | 7.04 ± 3.80 |
| OURS | ✓ | 96.51 ± 1.18 | 3.90 ± 1.29 | 86.97 ± 11.88 | 8.62 ± 5.32 | 91.74 ± 6.53 | 6.26 ± 3.31 |
Table 3. Ablation study results on RIM-ONE-r and Drishti-GS.
| MIM Module | PB Module | Optic Disc Dice [%] ↑ | Optic Disc ASD [Pixel] ↓ | Optic Cup Dice [%] ↑ | Optic Cup ASD [Pixel] ↓ | Average Dice [%] ↑ | Average ASD [Pixel] ↓ |
|---|---|---|---|---|---|---|---|
| RIM-ONE-r Dataset | | | | | | | |
| ✓ | – | 93.04 ± 3.13 | 6.02 ± 2.41 | 80.45 ± 14.64 | 7.70 ± 3.96 | 86.75 ± 8.89 | 6.87 ± 3.19 |
| – | ✓ | 92.98 ± 3.19 | 6.20 ± 2.69 | 75.20 ± 21.63 | 8.92 ± 5.88 | 84.09 ± 12.41 | 7.56 ± 4.29 |
| ✓ | ✓ | 93.73 ± 2.87 | 5.53 ± 2.41 | 81.86 ± 10.10 | 7.91 ± 3.78 | 87.80 ± 6.49 | 6.72 ± 3.10 |
| Drishti-GS Dataset | | | | | | | |
| ✓ | – | 86.89 ± 3.73 | 13.96 ± 2.34 | 77.73 ± 10.75 | 12.62 ± 6.11 | 82.31 ± 7.24 | 14.29 ± 4.23 |
| – | ✓ | 89.16 ± 2.43 | 11.63 ± 2.17 | 67.62 ± 9.51 | 21.14 ± 7.71 | 78.39 ± 5.97 | 16.39 ± 4.94 |
| ✓ | ✓ | 96.51 ± 1.18 | 3.90 ± 1.29 | 86.97 ± 11.88 | 8.62 ± 5.32 | 91.74 ± 6.53 | 6.26 ± 3.31 |
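The PB column above refers to the prediction bank, which pools prediction statistics across many target batches so that the operating point for the tiny optic-cup foreground is derived from complete statistics rather than from a single, background-dominated image. Below is a hypothetical sketch of such a bank; the bounded FIFO buffer, the class-wise quantile rule, and all names are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical sketch of a prediction bank: pool the teacher's per-class
# probabilities over many target batches, then derive each class's threshold
# from the pooled statistics. The bounded buffer and the quantile rule are
# assumptions for illustration, not the paper's exact procedure.
from collections import deque
import torch

class PredictionBank:
    def __init__(self, num_classes, max_batches=64):
        # One bounded FIFO buffer of flattened probabilities per class.
        self.banks = [deque(maxlen=max_batches) for _ in range(num_classes)]

    @torch.no_grad()
    def update(self, probs):
        # probs: (N, C, H, W) teacher outputs after softmax.
        for c, bank in enumerate(self.banks):
            bank.append(probs[:, c].flatten().cpu())

    def threshold(self, c, quantile=0.8):
        # A class-wise operating point from pooled statistics, so the small
        # optic-cup class is not swamped by background pixels of one image.
        pooled = torch.cat(list(self.banks[c]))
        k = max(1, int(quantile * pooled.numel()))
        return pooled.kthvalue(k).values.item()
```

In use, the bank would be refreshed after each teacher forward pass and consulted whenever pseudo-labels are binarized, so each class keeps its own threshold instead of sharing one global cutoff.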
Table 4. Segmentation performance with different data augmentation strategies.
| Augmentation | Optic Disc Dice [%] ↑ | Optic Disc ASD [Pixel] ↓ | Optic Cup Dice [%] ↑ | Optic Cup ASD [Pixel] ↓ | Average Dice [%] ↑ | Average ASD [Pixel] ↓ |
|---|---|---|---|---|---|---|
| Drishti-GS Dataset | | | | | | |
| Weak Augmentation | 93.36 ± 1.46 | 7.65 ± 1.71 | 83.01 ± 11.82 | 11.20 ± 5.32 | 88.19 ± 6.64 | 9.42 ± 7.03 |
| Strong Augmentation | 92.95 ± 1.54 | 8.15 ± 1.79 | 84.88 ± 12.02 | 9.93 ± 5.22 | 88.92 ± 6.78 | 9.04 ± 3.51 |
| Ours | 96.51 ± 1.18 | 3.90 ± 1.29 | 86.97 ± 11.88 | 8.62 ± 5.32 | 91.74 ± 6.53 | 6.26 ± 3.31 |
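Table 4 indicates that neither uniformly weak nor uniformly strong augmentation matches the full method. A common arrangement in mean-teacher adaptation, sketched below purely as an assumption rather than the paper's confirmed recipe, feeds a weakly augmented view to the teacher (keeping pseudo-labels stable) and a strongly augmented view to the student (providing a harder learning signal); the specific transforms are illustrative.

```python
# Assumed weak/strong augmentation split for the mean-teacher setup; the
# specific transforms below are illustrative, not taken from the paper.
import torchvision.transforms as T

weak_aug = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
])

strong_aug = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
])

# teacher_input = weak_aug(image); student_input = strong_aug(image)
# Note: geometric transforms (e.g., flips) must be shared or inverted between
# the two views so the teacher's pseudo-label stays aligned with the student.
```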
Table 5. Teacher–student update coefficient sensitivity analysis.
| Update Coefficient | Optic Disc Dice [%] ↑ | Optic Disc ASD [Pixel] ↓ | Optic Cup Dice [%] ↑ | Optic Cup ASD [Pixel] ↓ | Average Dice [%] ↑ | Average ASD [Pixel] ↓ |
|---|---|---|---|---|---|---|
| Drishti-GS Dataset | | | | | | |
| 0.9 | 95.34 ± 1.55 | 5.23 ± 1.62 | 85.29 ± 11.95 | 9.79 ± 5.19 | 90.31 ± 6.75 | 7.51 ± 3.41 |
| 0.95 | 93.36 ± 1.70 | 7.75 ± 1.71 | 85.53 ± 11.70 | 9.50 ± 4.81 | 89.44 ± 6.70 | 8.50 ± 4.21 |
| 0.96 | 95.07 ± 1.43 | 5.58 ± 1.54 | 85.77 ± 13.51 | 9.42 ± 6.14 | 90.42 ± 7.47 | 7.48 ± 4.51 |
| 0.97 | 96.19 ± 1.51 | 4.28 ± 1.66 | 85.54 ± 13.77 | 9.59 ± 6.34 | 90.86 ± 7.64 | 6.94 ± 4.65 |
| 0.98 | 94.55 ± 1.64 | 6.26 ± 1.98 | 85.77 ± 13.95 | 9.39 ± 6.49 | 90.16 ± 7.80 | 7.82 ± 5.89 |
| 0.99 | 96.51 ± 1.18 | 3.90 ± 1.29 | 86.97 ± 11.88 | 8.62 ± 5.32 | 91.74 ± 6.53 | 6.26 ± 3.31 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
