Multi-Source Unsupervised Domain Adaptation with Prototype Aggregation

Huang, Min; Xie, Zifeng; Sun, Bo; Wang, Ning

doi:10.3390/math13040579


¹ School of Software Engineering, South China University of Technology (SCUT), Guangzhou 510006, China

² Institute of International Services Outsourcing, Guangdong University of Foreign Studies, Guangzhou 510006, China

³ Operation and Maintenance Center of Information and Communication, CSG EHV Power Transmission Company, Guangzhou 510663, China

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(4), 579; https://doi.org/10.3390/math13040579

Submission received: 26 December 2024 / Revised: 7 February 2025 / Accepted: 7 February 2025 / Published: 10 February 2025

(This article belongs to the Special Issue Artificial Intelligence and Artificial Neural Networks)


Abstract

Multi-source domain adaptation (MSDA) plays an important role in industrial model generalization. Recent MSDA efforts focus on enhancing multi-domain distributional alignment while overlooking three issues, i.e., class-level discrepancy quantification, the unusability of noisy pseudo labels, and source transferability discrimination, potentially resulting in suboptimal adaptation performance. We address these issues by proposing a prototype aggregation method that models the discrepancy between source and target domains at the class and domain levels. Our method achieves domain adaptation based on a group of prototypes (i.e., representative feature embeddings). A similarity score-based strategy is designed to quantify the transferability of each domain. At the class level, our method quantifies class-specific cross-domain discrepancy according to reliable target pseudo labels. At the domain level, our method establishes distributional alignment between noisy pseudo-labeled target samples and the source domain prototypes. Adaptation at the class and domain levels thus forms a complementary mechanism for obtaining accurate predictions. The results on three standard benchmarks demonstrate that our method outperforms most state-of-the-art methods. In addition, we further elaborate on the proposed method in light of the interpretable results obtained from the analysis experiments.

Keywords:

multiple sources; domain adaptation; prototype learning; prototype aggregation

MSC:

68T07

1. Introduction

Adapting a model to an unknown domain has long been a significant challenge in machine learning. Due to domain shifts [1,2], a trained model deployed on a new domain suffers significantly degraded performance.

As an effective unsupervised learning technique, domain adaptation has attracted considerable attention. It has been used in various industries where obtaining labeled data is challenging, such as aerospace [3,4] and biomedicine [5,6]. Domain adaptation aims to apply trained models to target scenarios whose data distribution differs from that of the training data. Most single-source domain adaptation (SDA) efforts assume that all source data follow a similar distributional pattern and focus primarily on transferring knowledge from one labeled domain to another unlabeled domain. In many scenarios, however, industrial data are large-scale, originate from various sources, and lack labels. Acquiring more training data does not necessarily lead to better adaptation performance [7]. It is therefore inappropriate to simply merge all labeled data into a single source domain and then apply an SDA method to establish distributional alignment.

After realizing these problems, researchers proposed multi-source domain adaptation (MSDA) to extend domain adaptation to scenarios with multiple sources. MSDA was developed to enhance multiple distributional alignments and to mitigate various domain shifts [1]. Although many efforts address MSDA, three primary issues commonly arise. First, most MSDA methods [8,9,10] quantify only the domain-level discrepancies between source and target domains while disregarding class-level discrepancies. These domain-level methods randomly sample multiple batches and then minimize the discrepancy between source and target samples. However, since the class proportions of source and target batches differ, these methods can align samples belonging to different classes. Second, aligning noisy pseudo-labeled target samples has been a long-standing problem for class-level MSDA methods [11,12]. On the one hand, incorrect pseudo labels can lead to negative transfer [13] for class-level methods [14,15]. On the other hand, excluding noisy pseudo-labeled target samples [16,17] prevents the model from extracting knowledge from these samples. Third, many MSDA efforts [16,18,19] treat all source domains indiscriminately, although the knowledge each source domain shares with the target differs. As depicted in Figure 1, when French serves as the target domain $T$, the source domains $S_1$ (e.g., Spanish) and $S_2$ (e.g., English) differ in their transferability to the target domain. For example, the Spanish word for “number” is more similar to the French word for “number” than the English word is, while the English word for “genre” is more similar to the French word for “genre” than the Spanish word is. This example suggests that adaptation at the domain level alone is imperfect and that class-level feature selection is necessary for multi-source alignment.

To tackle these issues, this paper proposes a prototype aggregation method for multi-source domain adaptation (PAMDA). Because randomly sampled mini-batches have disproportionate class distributions, a group of source prototypes (i.e., representative features), rather than individual source samples, is generated for domain alignment. Since target labels are unavailable, the PAMDA model leverages source-supervised knowledge to assign pseudo labels to target samples. Nevertheless, existing pseudo-labeling strategies inevitably produce noisy pseudo labels, which cause negative transfer in class-level domain adaptation methods. We therefore model the domain discrepancies at both the class and domain levels for target samples whose pseudo labels differ in confidence. A similarity score-based strategy is introduced to aggregate prototype features according to their similarities to the corresponding target prototypes. At the class level, we design a class–prototype aggregation discrepancy metric. The similarity score-based strategy allocates high weights to the source class prototypes that are similar to the corresponding target features, so these prototypes dominate the feature alignment. This design favors transferring the source-supervised knowledge that is most similar to the target features. At the domain level, we design a domain–prototype aggregation discrepancy metric. Similarly, through the same weighting strategy, the source domain prototypes similar to the noisy pseudo-labeled target features dominate the feature alignment. The two metrics thus form a complementary mechanism. On the one hand, minimizing the class–prototype aggregation discrepancy metric aligns high-confidence pseudo-labeled target samples, which improves the class discriminability of the model. On the other hand, minimizing the domain–prototype aggregation discrepancy metric aligns low-confidence pseudo-labeled target samples, which helps the model produce more high-confidence pseudo labels.

The contributions of PAMDA are summarized as follows:

We propose a prototype aggregation method to address three issues, i.e., class-level discrepancy quantification, the unusability of noisy pseudo labels, and source transferability discrimination. Furthermore, we are the first to propose a complementary mechanism in which class-level and domain-level alignment methods work together.
We propose a similarity score-based strategy to assess the transferability of source domains. Additionally, we design two prototype aggregation discrepancy metrics to quantify the cross-domain discrepancy at the class and domain levels.
Extensive experiments on three widely used benchmark datasets show competitive results, with superior performance compared to state-of-the-art methods.

2. Related Work

2.1. Single-Source Domain Adaptation

Over the past decade, SDA has achieved notable success. Previous SDA efforts can be categorized into three groups: discrepancy-based, adversarial-based, and reconstruction-based. Discrepancy-based efforts model the discrepancies between source and target domains and perform domain adaptation by minimizing these inter-domain differences. The most widely used discrepancy metrics include maximum mean discrepancy (MMD) [20,21,22], Kullback–Leibler divergence (K-L divergence) [23,24], and Jensen–Shannon divergence (J–S divergence) [25,26]. Adversarial-based efforts [27,28,29] train the feature extractor to fool a domain discriminator that tries to identify the domain to which each sample belongs. Conditional adversarial domain adaptation (CDAN) [27,29] adaptively conditions adversarial network training on the model's prediction scores. Reconstruction-based efforts [30,31,32] reconstruct sample features through cross-domain projection functions for heterogeneous knowledge sharing.

Although these SDA methods perform well, they are not designed for multi-source scenarios. This paper proposes two prototype aggregation discrepancies to pursue multi-source domain adaptation at the class and domain levels.

2.2. Multi-Source Domain Adaptation

MSDA extends domain adaptation algorithms to multi-source scenarios. Early efforts [33,34,35] proved that the target domain can be modeled as a combination of source domains with a guaranteed upper bound on the error. Building upon this theorem, some efforts [36,37,38] combine the target-relevant source domains and filter out irrelevant source samples. Nevertheless, these efforts neither explore class-specific semantic information nor make full use of source data. Some multi-model-based methods [8,9,18] enforce prediction alignment among multiple models. However, it is unreasonable to expect models that have learned diverse source knowledge to produce identical predictions. Wang et al. [16] proposed a graph-structured method that forces the feature discrepancy between any two classes to be consistent across all domains. Two years later, they proposed another graphical method [17], i.e., Markov Random Field for MSDA (MRF-MSDA). Both graphical methods prioritize aligning reliably pseudo-labeled target samples but filter out unreliable ones, discarding the information these samples carry.

Unlike existing methods, we propose a similarity score-based strategy to explore the transferability differences among source domains at the class and domain levels. In addition, two similarity-based prototype aggregation discrepancy metrics are designed for target samples whose pseudo labels vary in quality.

3. Method

3.1. Problem Description

In the MSDA scenario, we suppose that there are $N$ labeled source domains $S = \{S_j \mid j = 1, 2, \dots, N\}$ and one unlabeled target domain $T$. Each domain exhibits different data characteristics. The $j$-th source domain $S_j = \{(X_{S_j}^{(i)}, Y_{S_j}^{(i)}) \mid i = 1, 2, \dots, n_{S_j}\}$ is a collection of $n_{S_j}$ samples, where $Y_{S_j}^{(i)} \in \{1, 2, \dots, K\}$ ($K$ is the number of classes) is the classification label of $X_{S_j}^{(i)}$. The target domain $T = \{X_T^{(i)} \mid i = 1, 2, \dots, n_T\}$ is a collection of $n_T$ samples without observable labels.

With the development of deep learning, most recent MSDA algorithms optimize deep neural networks by minimizing the discrepancy between source and target samples in multiple randomly sampled batches. In this setting, target samples may be aligned with source samples of different classes, resulting in misclassification. In addition, the similarity between each source domain and the target domain varies. How can the knowledge shared with the target be selectively extracted from different source domains? In the following subsections, we introduce PAMDA, which is designed to address these issues.

3.2. Overall Scheme

The PAMDA is proposed to break through the existing limitations by aggregating multiple source features at the class and domain levels. As illustrated in Figure 2, the PAMDA involves three stages to realize multi-source domain adaptation.

(1) Prototype generation (Section 3.3): The PAMDA model dynamically maintains a group of class prototypes. Specifically, the target samples are divided into high-confidence and low-confidence subsets by the logistic adaptive threshold proposed in [39]. Subsequently, the reliably labeled samples (i.e., source samples and high-confidence pseudo-labeled target samples) are used to estimate the class cluster centroids. Finally, a momentum update strategy is adopted to stabilize prototype generation.

(2) Prototype aggregation (Section 3.4): Since the target pseudo labels vary in confidence, we design two similarity-based prototype aggregation discrepancy metrics for class-level and domain-level alignment. Specifically, a similarity score-based strategy is proposed to highlight relevant feature prototypes and suppress less relevant ones. At the class level, we model the cross-domain discrepancy of each class as the class–prototype aggregation discrepancy metric, weighted by the similarity score-based strategy. For target samples without reliable pseudo labels, a domain–prototype aggregation discrepancy metric is designed to quantify the discrepancy between the low-confidence target samples and the domain prototypes (i.e., the mean of all class prototypes in the same domain).

(3) Source knowledge learning (Section 3.5): To supervise source knowledge learning, we design two classification losses (i.e., a source classification loss and a prototype classification loss). The source classification loss supervises knowledge learning on source samples. Because a class prototype is the representative embedding of the samples belonging to that class, supervised learning on class prototypes can produce more discernible decision boundaries. We therefore design the prototype classification loss to supervise knowledge learning from the class prototypes.

3.3. Prototype Generation

The PAMDA model is designed for mini-batch gradient descent. In each iteration, a randomly sampled mini-batch $\{\hat{S}_1, \hat{S}_2, \dots, \hat{S}_N, \hat{T}\}$ is fed into a feature extractor $G(\cdot)$, which maps each sample to a one-dimensional feature vector in a latent embedding space. The batch size of each domain is denoted as $m$. To describe class semantic information, the PAMDA model maintains a prototype for each class in each domain. Specifically, the source cluster centroid $\hat{b}_{S_j}^{(k)}$ is defined as the average of all embeddings belonging to the $k$-th class in $\hat{S}_j$:

$$\hat{b}_{S_j}^{(k)} = \frac{1}{|\hat{S}_j^{(k)}|} \sum_{X_{S_j} \in \hat{S}_j^{(k)}} G(X_{S_j}), \tag{1}$$

where $\hat{S}_j^{(k)}$ is the set of samples belonging to the $k$-th class in $\hat{S}_j$, $|\hat{S}_j^{(k)}|$ is the number of samples in $\hat{S}_j^{(k)}$, and $G(\cdot)$ is the feature extraction backbone.
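The batch centroid in Eq. (1) is a per-class mean of embeddings. The following Python sketch computes it for one source mini-batch, assuming the embeddings $G(x)$ have already been computed and are given as plain lists of floats; the function name `class_centroids` is illustrative, not taken from the paper. Classes that do not appear in the batch simply get no entry, because Eq. (1) is undefined when $|\hat{S}_j^{(k)}| = 0$ (how PAMDA handles this case is not stated in the text).

```python
def class_centroids(embeddings, labels):
    """Per-class mean embedding of one mini-batch, as in Eq. (1).

    embeddings: list of feature vectors G(x), each a list of floats
    labels: list of integer class labels, aligned with embeddings
    Returns {k: centroid}; classes absent from the batch are omitted.
    """
    sums, counts = {}, {}
    for z, y in zip(embeddings, labels):
        if y not in sums:
            sums[y] = [0.0] * len(z)
            counts[y] = 0
        # Accumulate the embedding into its class sum
        sums[y] = [s + v for s, v in zip(sums[y], z)]
        counts[y] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


print(class_centroids([[1.0, 0.0], [3.0, 2.0], [0.0, 5.0]], [0, 0, 1]))
# → {0: [2.0, 1.0], 1: [0.0, 5.0]}
```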

Since target labels are unavailable, we introduce a confidence-aware scheme that allows PAMDA to allocate pseudo labels to target samples. The classification probabilities of the source and target samples are obtained from $G(\cdot)$ and the classifier $F(\cdot)$. We then use the logistic adaptive threshold $\gamma$ proposed in [39] for sample selection, defined as $\gamma = \frac{1}{1 + e^{-3C}}$, where $C$ is the classification accuracy on the source samples.
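As a quick sanity check on the threshold, the sketch below evaluates $\gamma = 1/(1 + e^{-3C})$, assuming $C$ is expressed as a fraction in $[0, 1]$ rather than a percentage (the text does not state the scale). Under that assumption, $\gamma$ rises from 0.5 for an untrained model toward about 0.953 at perfect source accuracy, so the selection of high-confidence target samples becomes stricter as source training progresses. The function name is illustrative.

```python
import math


def logistic_adaptive_threshold(source_accuracy):
    """gamma = 1 / (1 + exp(-3 C)); source_accuracy C is assumed to lie in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-3.0 * source_accuracy))


for c in (0.0, 0.5, 0.9, 1.0):
    print(f"C = {c:.1f} -> gamma = {logistic_adaptive_threshold(c):.3f}")
```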

The pseudo label of a target sample and its confidence are obtained as $\hat{Y}_T^{(i)} = \arg\max_k p(k \mid X_T^{(i)})$ and $f_T^{(i)} = \max_k p(k \mid X_T^{(i)})$, respectively, where $p(k \mid x)$ is the probability that $x$ belongs to the $k$-th class.

According to $\gamma$ and $f_T^{(i)}$, $\hat{T}$ is divided into a high-confidence subset $\hat{T}_1$ and a low-confidence subset $\hat{T}_2$. Specifically, if $f_T^{(i)} > \gamma$, the corresponding target sample is allocated to $\hat{T}_1$; if $f_T^{(i)} < \gamma$, it is allocated to $\hat{T}_2$. To avoid negative transfer from noisy pseudo labels, the target cluster centroid $\hat{b}_T^{(k)}$ is computed as the average of all embeddings belonging to the $k$-th class in $\hat{T}_1$:

$$\hat{b}_T^{(k)} = \frac{1}{|\hat{T}_1^{(k)}|} \sum_{X_T \in \hat{T}_1^{(k)}} G(X_T), \tag{2}$$

where $\hat{T}_1^{(k)}$ is the set of samples pseudo-labeled as the $k$-th class in $\hat{T}_1$, and $|\hat{T}_1^{(k)}|$ is the number of samples in $\hat{T}_1^{(k)}$.
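Pseudo-labeling, the confidence split, and Eq. (2) can be combined into one step. The sketch below is a minimal pure-Python illustration, assuming the class probabilities $p(k \mid x)$ and the embeddings $G(x)$ of the target mini-batch are already available as lists; the function name `split_and_centroids` is hypothetical. A sample whose confidence equals $\gamma$ exactly is placed in the low-confidence subset here, which is an assumption because the text only covers the strict inequalities.

```python
def split_and_centroids(probs, embeddings, gamma):
    """Pseudo-label target samples, split them by confidence, and
    compute the high-confidence class centroids of Eq. (2).

    probs: per-sample lists of class probabilities p(k | x)
    embeddings: per-sample feature vectors G(x)
    gamma: confidence threshold
    Returns (high, low, centroids): high/low hold (index, pseudo_label)
    pairs; centroids maps each pseudo label k to the mean over T1^(k).
    """
    high, low = [], []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # argmax_k p(k | x)
        confidence = p[label]                           # max_k p(k | x)
        # Assumption: ties (confidence == gamma) go to the low-confidence subset
        if confidence > gamma:
            high.append((i, label))
        else:
            low.append((i, label))

    sums, counts = {}, {}
    for i, k in high:  # only reliable pseudo labels contribute to Eq. (2)
        z = embeddings[i]
        sums[k] = [s + v for s, v in zip(sums.get(k, [0.0] * len(z)), z)]
        counts[k] = counts.get(k, 0) + 1
    centroids = {k: [s / counts[k] for s in sums[k]] for k in sums}
    return high, low, centroids


probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
embeddings = [[1.0, 1.0], [5.0, 5.0], [2.0, 0.0]]
print(split_and_centroids(probs, embeddings, gamma=0.7))
```

In this toy batch, the second sample (confidence 0.6) falls below $\gamma = 0.7$, so it is excluded from the centroid computation even though it still receives a pseudo label.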

The PAMDA model operates on successive mini-batches, so the cluster centroids estimated in different iterations may vary significantly. A momentum update strategy is therefore adopted for prototype generation:

$$b_{S_j}^{(k)} \leftarrow \eta\, b_{S_j}^{(k)} + (1 - \eta)\, \hat{b}_{S_j}^{(k)}, \quad j = 1, 2, \dots, N, \tag{3}$$

$$b_T^{(k)} \leftarrow \eta\, b_T^{(k)} + (1 - \eta)\, \hat{b}_T^{(k)}, \tag{4}$$

where $\eta$ is the momentum coefficient, set to 0.7 in all experiments. Analogous strategies have been widely adopted to stabilize prototype generation [16,17].
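Eqs. (3) and (4) are an exponential moving average over per-batch centroids. The sketch below applies them to a dictionary of prototypes with $\eta = 0.7$. Two behaviors are assumptions not stated in the text: a prototype is initialized with its first batch estimate, and a class missing from the current batch keeps its previous prototype. The helper name `update_prototypes` is hypothetical.

```python
def update_prototypes(prototypes, batch_centroids, eta=0.7):
    """Momentum update b <- eta * b + (1 - eta) * b_hat (Eqs. (3)-(4)).

    prototypes: {k: running prototype b^(k)}
    batch_centroids: {k: batch centroid b_hat^(k)} for classes in the batch
    Returns a new dict; the input dict is not modified.
    """
    updated = dict(prototypes)
    for k, b_hat in batch_centroids.items():
        if k in updated:
            updated[k] = [eta * b + (1.0 - eta) * bh
                          for b, bh in zip(updated[k], b_hat)]
        else:
            # Assumption: the first estimate of a class initializes its prototype
            updated[k] = list(b_hat)
    # Classes absent from batch_centroids keep their previous prototypes
    return updated


protos = update_prototypes({0: [1.0, 0.0]}, {0: [0.0, 1.0], 1: [2.0, 2.0]})
print(protos)
```

With $\eta = 0.7$, each new batch contributes 30% of the updated prototype, which damps the batch-to-batch fluctuation described above.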

3.4. Prototype Aggregation

Due to the diverse quality of the target pseudo labels, we designed two prototype aggregation discrepancy metrics to quantify the discrepancy between source and target domains.

(1) Class–prototype aggregation: As mentioned in Section 1, the class features of different source domains differ significantly in transferability. Prioritizing the source class features that are compatible with the target domain during alignment is more conducive to transferring the source supervisory signal to the target domain. Therefore, we define a class–prototype similarity weight for each class $k$:

$$w_{S_j}^{(k)} = \frac{\exp\big(\langle b_{S_j}^{(k)}, \hat{b}_T^{(k)} \rangle / \tau_c\big)}{\sum_{n=1}^{N} \exp\big(\langle b_{S_n}^{(k)}, \hat{b}_T^{(k)} \rangle / \tau_c\big)}, \tag{5}$$

where $\tau_c$ is a class-level temperature hyperparameter and $\langle x, y \rangle = \frac{x^{\top} y}{\|x\| \, \|y\|}$ is the cosine similarity.
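Eq. (5) is a temperature-scaled softmax over cosine similarities, taken across the $N$ source domains; Eq. (8) has the same form with domain prototypes and $\tau_d$. The Python sketch below is illustrative (the function names are ours) and subtracts the maximum logit before exponentiating, a standard numerical-stability step that does not change the result.

```python
import math


def cosine(x, y):
    """<x, y> = x^T y / (||x|| ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def similarity_weights(source_protos, target_proto, tau):
    """Softmax over sources of <b_Sj, b_T> / tau, as in Eq. (5)."""
    logits = [cosine(b, target_proto) / tau for b in source_protos]
    m = max(logits)  # stability shift; cancels in the normalization
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


sources = [[1.0, 0.0], [0.0, 1.0]]  # b_S1^(k), b_S2^(k)
target = [1.0, 0.0]                 # b_T^(k)
print(similarity_weights(sources, target, tau=0.1))   # sharp weights
print(similarity_weights(sources, target, tau=10.0))  # near-uniform weights
```

With the settings reported in Section 4.3, $\tau_c = 0.1$ turns a cosine gap of 1 into a logit gap of 10, so the most similar source almost entirely dominates (weight ≈ 0.99995), whereas $\tau_d = 10$ yields weights of about 0.525 and 0.475.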

Inspired by the maximum mean discrepancy [20], we design a class–prototype aggregation discrepancy to establish category alignment for each class. The class–prototype aggregation discrepancy $D_c^{(k)}$ is defined as

$$D_c^{(k)} = \Big\| \sum_{j=1}^{N} w_{S_j}^{(k)} \phi\big(b_{S_j}^{(k)}\big) - \frac{1}{|\hat{T}_1^{(k)}|} \sum_{X_T \in \hat{T}_1^{(k)}} \phi\big(G(X_T)\big) \Big\|^2, \tag{6}$$

where $\phi(\cdot)$ is the feature map induced by a Gaussian kernel and $\|\cdot\|$ is the norm in the corresponding feature space. The overall class–prototype aggregation discrepancy $D_c$ is

$$D_c = \frac{1}{K} \sum_{k=1}^{K} D_c^{(k)}. \tag{7}$$
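Because $\phi$ is an implicit Gaussian-kernel feature map, Eq. (6) is normally evaluated with the kernel trick: expanding the squared norm gives $\sum_{i,j} w_i w_j \kappa(b_i, b_j) - \frac{2}{n}\sum_{j,t} w_j \kappa(b_j, z_t) + \frac{1}{n^2}\sum_{t,t'} \kappa(z_t, z_{t'})$, where $n = |\hat{T}_1^{(k)}|$, $z_t = G(X_T)$, and $\kappa(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$. The sketch below assumes a single Gaussian bandwidth $\sigma = 1$ (the text does not specify the kernel width or whether multiple kernels are used) and lets a class with no high-confidence target samples contribute zero to Eq. (7), which is also an assumption. Function names are illustrative.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """kappa(x, y) = exp(-||x - y||^2 / (2 sigma^2)); sigma is an assumed bandwidth."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def weighted_kernel_discrepancy(src, weights, tgt, sigma=1.0):
    """||sum_j w_j phi(src_j) - mean_t phi(tgt_t)||^2 via the kernel trick."""
    n = len(tgt)
    ss = sum(wi * wj * gaussian_kernel(a, b, sigma)
             for wi, a in zip(weights, src) for wj, b in zip(weights, src))
    st = sum(wj * gaussian_kernel(a, t, sigma)
             for wj, a in zip(weights, src) for t in tgt) / n
    tt = sum(gaussian_kernel(s, t, sigma) for s in tgt for t in tgt) / (n * n)
    return ss - 2.0 * st + tt


def class_prototype_discrepancy(source_protos, weights, high_conf_feats, sigma=1.0):
    """D_c of Eq. (7): the average of D_c^(k) from Eq. (6) over K classes.

    source_protos[k]: the N source prototypes b_Sj^(k)
    weights[k]: the N weights w_Sj^(k) from Eq. (5)
    high_conf_feats[k]: embeddings G(x) of T1 samples pseudo-labeled k
    """
    K = len(source_protos)
    total = 0.0
    for k in range(K):
        if high_conf_feats[k]:  # assumption: empty classes contribute 0
            total += weighted_kernel_discrepancy(
                source_protos[k], weights[k], high_conf_feats[k], sigma)
    return total / K


# One class, two sources whose prototypes coincide with the only target feature
print(class_prototype_discrepancy([[[1.0, 0.0], [1.0, 0.0]]], [[0.5, 0.5]], [[[1.0, 0.0]]]))
# → 0.0
```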

(2) Domain–prototype aggregation: Because their pseudo labels are unreliable, the $\hat{T}_2$ samples can introduce negative transfer into class-level alignment. We therefore design a domain–prototype aggregation discrepancy for these target samples. The domain–prototype similarity weight of each source domain is defined as

$$e_{S_j} = \frac{\exp\big(\langle v_{S_j}, v_T \rangle / \tau_d\big)}{\sum_{n=1}^{N} \exp\big(\langle v_{S_n}, v_T \rangle / \tau_d\big)}, \tag{8}$$

$$v_{S_j} = \frac{1}{K} \sum_{k=1}^{K} b_{S_j}^{(k)}, \qquad v_T = \frac{1}{|\hat{T}_2|} \sum_{X_T \in \hat{T}_2} G(X_T), \tag{9}$$

where $\tau_d$ is a domain-level temperature hyperparameter, $v_{S_j}$ is the source domain prototype of $S_j$, and $v_T$ is the target domain prototype of $T$. The domain–prototype aggregation discrepancy $D_d$ is defined as

$$D_d = \Big\| \sum_{j=1}^{N} \sum_{k=1}^{K} e_{S_j} \phi\big(b_{S_j}^{(k)}\big) - \frac{1}{|\hat{T}_2|} \sum_{X_T \in \hat{T}_2} \phi\big(G(X_T)\big) \Big\|^2. \tag{10}$$
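Eqs. (8)–(10) can be sketched the same way. The code below is a self-contained illustration under the same assumptions as the class-level sketch (one Gaussian kernel with $\sigma = 1$, plain Python lists, illustrative names). Following Eq. (10) literally, each of the $K$ class prototypes of $S_j$ carries the weight $e_{S_j}$, so the source-side weights sum to $K$ rather than 1; the sketch keeps that form instead of normalizing it.

```python
import math


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def gaussian_kernel(x, y, sigma=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def domain_prototype_discrepancy(class_protos, low_conf_feats, tau_d=10.0, sigma=1.0):
    """Return (e, D_d) following Eqs. (8)-(10).

    class_protos[j][k]: class prototype b_Sj^(k) of source j
    low_conf_feats: embeddings G(x) of the low-confidence subset T2
    """
    v_s = [mean(protos) for protos in class_protos]  # Eq. (9): source domain prototypes
    v_t = mean(low_conf_feats)                       # Eq. (9): target domain prototype
    logits = [cosine(v, v_t) / tau_d for v in v_s]
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    e = [x / total for x in exps]                    # Eq. (8)

    # Eq. (10) as written: every class prototype of S_j is weighted by e_Sj
    pts = [b for protos in class_protos for b in protos]
    wts = [e[j] for j, protos in enumerate(class_protos) for _ in protos]
    n = len(low_conf_feats)
    ss = sum(wi * wj * gaussian_kernel(a, b, sigma)
             for wi, a in zip(wts, pts) for wj, b in zip(wts, pts))
    st = sum(wi * gaussian_kernel(a, t, sigma)
             for wi, a in zip(wts, pts) for t in low_conf_feats) / n
    tt = sum(gaussian_kernel(s, t, sigma)
             for s in low_conf_feats for t in low_conf_feats) / (n * n)
    return e, ss - 2.0 * st + tt


protos = [[[1.0, 0.0], [1.0, 0.0]],   # source S1, K = 2 classes
          [[0.0, 1.0], [0.0, 1.0]]]   # source S2
e, d_d = domain_prototype_discrepancy(protos, [[1.0, 0.0]])
print(e, d_d)
```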

The total prototype aggregation discrepancy is then

$$D = D_c + D_d. \tag{11}$$

3.5. Objective Construction

Two optimization objectives are established for training the PAMDA model. The first is for the model to learn supervised information from the source domains effectively; to this end, we design a classification loss $L_{cls}$ to supervise model training. The second is to transfer the supervised knowledge to the target domain; to this end, the total prototype aggregation discrepancy is minimized during training.

The classification loss $L_{cls}$ comprises two parts. The first is the source classification loss $L_{cls}^{s}$, which facilitates source-supervised knowledge learning. The second is the prototype classification loss $L_{cls}^{p}$. Since the samples used for prototype generation are reliably labeled, the prototypes embody the most representative supervised knowledge, and each class prototype is labeled with its corresponding class. The classification loss is defined as

$$L_{cls} = L_{cls}^{s} + L_{cls}^{p}, \tag{12}$$

$$L_{cls}^{s} = \frac{1}{N} \sum_{j=1}^{N} \mathbb{E}_{(X_{S_j}, Y_{S_j}) \in \hat{S}_j} \, L_{ce}\big(F(G(X_{S_j})), Y_{S_j}\big), \tag{13}$$

$$L_{cls}^{p} = \frac{1}{NK} \sum_{j=1}^{N} \sum_{k=1}^{K} L_{ce}\big(F(b_{S_j}^{(k)}), k\big) + \frac{1}{K} \sum_{k=1}^{K} L_{ce}\big(F(b_T^{(k)}), k\big), \tag{14}$$

where $L_{ce}(\cdot, \cdot)$ is the cross-entropy loss function.
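For concreteness, Eqs. (12)–(14) are sketched below in pure Python. The classifier $F$ is passed in as any callable that maps a feature vector to a list of logits, and the source features are assumed to be precomputed as $G(X_{S_j})$; the toy identity classifier in the usage lines is hypothetical and only makes the numbers easy to check. The cross-entropy is computed from logits with the usual log-sum-exp shift.

```python
import math


def cross_entropy(logits, label):
    """L_ce = -log softmax(logits)[label], computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def source_cls_loss(F, source_batches):
    """Eq. (13): source_batches[j] is a list of (G(x), y) pairs from domain j."""
    per_domain = [sum(cross_entropy(F(z), y) for z, y in batch) / len(batch)
                  for batch in source_batches]
    return sum(per_domain) / len(source_batches)


def prototype_cls_loss(F, source_protos, target_protos):
    """Eq. (14): source_protos[j][k] = b_Sj^(k), target_protos[k] = b_T^(k)."""
    N, K = len(source_protos), len(target_protos)
    src = sum(cross_entropy(F(source_protos[j][k]), k)
              for j in range(N) for k in range(K)) / (N * K)
    tgt = sum(cross_entropy(F(target_protos[k]), k) for k in range(K)) / K
    return src + tgt


def classification_loss(F, source_batches, source_protos, target_protos):
    """Eq. (12): L_cls = L_cls^s + L_cls^p."""
    return (source_cls_loss(F, source_batches)
            + prototype_cls_loss(F, source_protos, target_protos))


def identity(z):
    """Hypothetical toy classifier: the features are used directly as logits."""
    return z


batches = [[([2.0, 0.0], 0), ([0.0, 2.0], 1)]]
s_protos = [[[2.0, 0.0], [0.0, 2.0]]]
t_protos = [[2.0, 0.0], [0.0, 2.0]]
print(classification_loss(identity, batches, s_protos, t_protos))
```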

The overall optimization objectives are

$$\min_{G} \; L_{cls} + \alpha D, \qquad \min_{F} \; L_{cls}, \tag{15}$$

where $\alpha$ is a trade-off parameter.

Algorithm 1 summarizes the procedure of PAMDA.

Algorithm 1: Algorithm of PAMDA

3.6. Theoretical Error Analysis

Before the experimental verification, we theoretically analyze the target classification error of the PAMDA algorithm. Let $P_S$ and $P_T$ denote the source and target distributions, respectively. Building upon Theorem 2 of [33], the target error can be bounded as follows:

$$\sigma_T(h) = \frac{1}{|T|} \sum_{(X_T, Y_T) \in T} \mathbb{1}\big[Y_T \neq \arg\max_k p(k \mid X_T)\big] \leq \sigma_S(h) + \frac{1}{2} d_{H\Delta H}(P_S, P_T) + \beta, \tag{16}$$

where $H$ is a hypothesis space, $\mathbb{1}[\cdot]$ is the indicator function, $h \in H$ is a labeling function (i.e., $F \circ G$), and $Y_T$ is the true label of $X_T$.

$\sigma_S(h)$ is the source error, which is constrained by the source classification loss. $d_{H\Delta H}(P_S, P_T)$ is the $H\Delta H$-divergence between the source and target domains, which is constrained by the class–prototype and domain–prototype aggregation discrepancy metrics. $\beta$ is the combined error term, defined as $\beta = \min_{h \in H} \big[\epsilon_T(h, f_T) + \sum_{j=1}^{N} \epsilon_{S_j}(h, f_{S_j})\big]$, where $f_X$ is the true labeling function of domain $X$ ($X \in \{T, S_1, S_2, \dots, S_N\}$), and $\epsilon_X$ is the labeling function discrepancy on domain $X$.

$\sum_{j=1}^{N} \epsilon_{S_j}(h, f_{S_j})$ corresponds to the source classification loss, which is constrained by our optimization objectives (i.e., Equation (15)). Based on [11,33], $\epsilon_T(h, f_T)$ can be bounded as follows:

$$\epsilon_T(h, f_T) \leq \epsilon_T(h, \hat{f}_T) + \epsilon_T(f_T, \hat{f}_T), \tag{17}$$

where $\hat{f}_T$ is a pseudo-labeling function of $T$. In PAMDA, $\hat{f}_T = h = F \circ G$; therefore, $\epsilon_T(h, \hat{f}_T) = 0$. Furthermore, $\epsilon_T(f_T, \hat{f}_T)$ is constrained by the class–prototype and domain–prototype aggregation discrepancy metrics. On the one hand, the class–prototype aggregation discrepancy metric quantifies cross-domain discrepancies at the class level; minimizing it enhances the model's class discriminability on target samples, which helps PAMDA produce more high-quality pseudo labels. On the other hand, the domain–prototype aggregation discrepancy metric quantifies cross-domain discrepancies at the domain level; minimizing it establishes common domain-feature alignment, which helps PAMDA transfer source-supervised knowledge to unreliably pseudo-labeled target samples.

Therefore, when PAMDA is optimized with the objectives in Equation (15), the terms of the target error bound in Equation (16) are minimized jointly.

4. Experiments

4.1. Datasets

(1) Digit-5 is a large-scale collection of five digit datasets: MNIST (mt) [40], MNIST-M (mm) [41], USPS (up) [42], SynthDigits (syn) [41], and SVHN (sv) [43]. We adopted the same data preprocessing procedure as Ltc-MSDA [16].

(2) Office_caltech_10 [44] is a public benchmark comprising four office-supply image datasets: Amazon (A), Webcam (W), DSLR (D), and Caltech (C). Office_caltech_10 includes 9000 images across ten classes.

(3) Office-31 [3] is also an office supply image benchmark, including three sub-datasets: Amazon (A), Webcam (W), and DSLR (D). Office-31 contains a total of 4110 images across 31 classes.

4.2. Comparison with Other Algorithms

To demonstrate the competitiveness of PAMDA, we compared it with other algorithms under the following three experimental setups. (1) Single Best: we report the best result achieved by the SDA algorithms using a single source domain for each transfer task. (2) Source Combination: all source domains are combined into one large single-source domain, and the transfer tasks are accomplished directly with the SDA algorithms, ignoring the domain shifts among the source domains. (3) Multiple Source: except for PAMDA, the results of the MSDA algorithms are taken from the publications in which they were originally reported.

For the Single Best and Source Combination setups, four representative SDA algorithms (i.e., JAN [45], MCD [46], DAN [47], and ADDA [48]) were selected for comparison with PAMDA. In addition, we included ten recent MSDA algorithms for the Multiple Source setup: DRT [49], MDAN [19], MLAN [8], $M^3SDA$ [9], DCTN [50], MCD [46], MDDA [10], Ltc-MSDA [16], and MRF-MSDA [17]. Source Only is a target-agnostic model trained only with source data.

4.3. Experimental Setups

All domain adaptation experiments use mini-batch training and were implemented with the PyTorch framework [51]. All experiments were run on the same machine, equipped with an Intel(R) Core(TM) i7-11700K @ 3.60 GHz CPU with eight processors and one NVIDIA Corporation Device 2208 (RTX 3080 Ti) GPU (Operation and Maintenance Center of Information and Communication, CSG EHV Power Transmission Company, Guangzhou, China). Our PAMDA model comprises a classifier $F(\cdot)$ and a feature extraction backbone $G(\cdot)$. We adopted two fully connected layers as the classifier in all experiments. On Digit-5, the feature extraction backbone is the 3 conv–2 fc network [9,16], whose parameters were randomly initialized. On Office_caltech_10 [44], the backbone is the ResNet101 network [52], pretrained on ImageNet. On Office-31 [3], the backbone is the AlexNet [53] network [52], whose initial parameters were loaded from DCTN [50]. The maximum number of training rounds $Max\_Round$ was fixed to 200 to ensure that the algorithms converged [9,16]. The hyperparameters $\tau_c$ and $\tau_d$ were set to 0.1 and 10, respectively. To mitigate source noise during early training, we gradually increased $\alpha$ from 0 to 1 through a transitional strategy [54]: $\alpha = \frac{2}{1 + \exp(-10t / Max\_Round)} - 1$, where $t$ is the index of the current round. To ensure result stability, we report the average result of five repetitions, a protocol widely adopted in [7,9,49].
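The ramp-up schedule for $\alpha$ can be checked numerically. Since $\frac{2}{1 + e^{-x}} - 1 = \tanh(x/2)$, $\alpha$ starts at 0, reaches about 0.46 at 10% of $Max\_Round$, and approaches $\tanh(5) \approx 0.9999$ in the final round rather than exactly 1. The sketch below uses $Max\_Round = 200$ as stated; the function name is illustrative.

```python
import math


def alpha_schedule(t, max_round=200):
    """alpha = 2 / (1 + exp(-10 t / Max_Round)) - 1, ramping from 0 toward 1."""
    return 2.0 / (1.0 + math.exp(-10.0 * t / max_round)) - 1.0


for t in (0, 20, 50, 100, 200):
    print(t, round(alpha_schedule(t), 4))
```

The steep early rise means the discrepancy term $D$ is damped only during roughly the first quarter of training.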

4.4. Result

(1) Results on Digits-5: Table 1 presents the comparative results on the digit recognition tasks. PAMDA achieves the best mean classification accuracy of 94.2%, surpassing the state-of-the-art MSDA algorithm MRF-MSDA [17] by 0.5%. Even on the hard-to-transfer “→mm” task [49], PAMDA achieves an absolute gain of 4.5% over MRF-MSDA. These competitive results validate the efficacy of PAMDA. We attribute its advantage over state-of-the-art MSDA algorithms to two main reasons. First, PAMDA integrates class-specific knowledge from multiple source domains through class-level prototype aggregation. Second, aligning noisy pseudo-labeled target samples provides additional semantic feature information to the model.

(2) Results on Office_caltech_10: The results of PAMDA and the competing algorithms are shown in Table 2. On this dataset, PAMDA continues to perform well, achieving a 4.5% accuracy improvement over the Source Only model. PAMDA also outperforms the state-of-the-art MSDA algorithm $M^3SDA$ [9] by 1.2%, consistently demonstrating its effectiveness.

(3) Results on Office-31: Table 3 shows that PAMDA achieved the best classification accuracy on the three transfer tasks. Compared with the state-of-the-art algorithm MDDA [10], PAMDA obtained an average accuracy improvement of 0.2%. The favorable results on three datasets demonstrate the stable generalization of PAMDA. On Office-31, all MSDA algorithms performed comparably on the three transfer tasks, for two reasons. First, the performance of the MSDA algorithms has saturated on the “→D” and “→W” tasks: all MSDA algorithms reported in Table 3 achieved a classification accuracy higher than 95% on these two tasks. Second, when Amazon served as the target domain, the two source domains (i.e., Webcam and DSLR) were very similar, leaving little domain gap between them. This significantly limited the advantages of MSDA algorithms, including PAMDA.

4.5. Ablation Analysis

In this section, we conducted ablation experiments on Digits-5 to evaluate the efficacy of each component of our PAMDA model. From Table 4, we can summarize the following four points. First, class–prototype aggregation effectively exploits the correlation of class-specific semantic features, thereby establishing a favorable basis for outstanding performance: adding $D_{c}$ raised the average accuracy from 83.0% (1st row) to 93.4% (2nd row). Second, since the domain–prototype aggregation of the model in the 3rd row only exploits the $\hat{T}_{2}$ samples, it provided a much smaller performance gain (88.3% on average) than the class–prototype aggregation. Third, the supervised knowledge of the prototypes benefits the class discriminability of the model: removing $L_{cls}^{p}$ (4th row) lowered the average accuracy from 94.2% to 93.7%. Finally, the class–prototype aggregation worked in conjunction with the domain–prototype aggregation, bringing a further performance improvement (94.2%, 5th row).
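The four observations above amount to pairwise differences between rows of Table 4. The following Python sketch stores each row's average accuracy under the set of enabled components and computes the gain attributable to adding one component. The short keys (`Ls`, `Lp`, `Dc`, `Dd`) are hypothetical abbreviations for $L_{cls}^{s}$, $L_{cls}^{p}$, $D_{c}$ and $D_{d}$.

```python
# Average accuracy (%) on Digits-5 for each configuration in Table 4.
# Keys list the enabled components: Ls = L_cls^s, Lp = L_cls^p, Dc = D_c, Dd = D_d.
TABLE4_AVG = {
    ("Ls", "Lp"): 83.0,
    ("Ls", "Lp", "Dc"): 93.4,
    ("Ls", "Lp", "Dd"): 88.3,
    ("Ls", "Dc", "Dd"): 93.7,
    ("Ls", "Lp", "Dc", "Dd"): 94.2,
}

def gain(config, reference):
    """Change in average accuracy (percentage points) of `config` relative to `reference`."""
    return round(TABLE4_AVG[config] - TABLE4_AVG[reference], 1)

base = ("Ls", "Lp")
full = ("Ls", "Lp", "Dc", "Dd")
print(gain(("Ls", "Lp", "Dc"), base))  # 10.4  (adding D_c)
print(gain(("Ls", "Lp", "Dd"), base))  # 5.3   (adding D_d alone)
print(gain(full, ("Ls", "Dc", "Dd")))  # 0.5   (adding L_cls^p)
print(gain(full, ("Ls", "Lp", "Dc")))  # 0.8   (adding D_d on top of D_c)
```

Reading the table as differences against a fixed reference row makes explicit which comparison supports each of the four points.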

4.6. Component Comparison

In this part, we further evaluate the class–prototype and domain–prototype aggregation discrepancy metrics. As reported in the 3rd row of Table 4, we evaluated $D_{d}$ on Digits-5. However, $D_{d}$ only measures the discrepancy between the $\hat{T}_{2}$ samples and the source domain prototypes. Therefore, we designed a new domain–prototype aggregation discrepancy $D_{d}^{'}$, defined in Equation (18), which compares the weighted source prototypes with the target samples in $\hat{T}$. Class-Only uses the configuration in the 2nd row of Table 4, and Domain-Only is the model optimized by Equation (19). The results in Table 5 reveal two key points. First, class–prototype aggregation provided a larger performance improvement than domain–prototype aggregation (93.4% vs. 92.5% on average) because it explores the correlation of class-specific semantic features more deeply. Second, class–prototype aggregation in synergy with domain–prototype aggregation further improved performance (94.2%).

$$D_{d}^{'} = \Big\| \sum_{j=1}^{N} \sum_{k=1}^{K} e_{S_{j}}\, \phi\big(b_{S_{j}}^{(k)}\big) - \frac{1}{|\hat{T}|} \sum_{X_{T} \in \hat{T}} \phi\big(G(X_{T})\big) \Big\|^{2} \tag{18}$$

$$\min_{G}\; L_{cls} + \alpha D_{d}^{'}, \qquad \min_{F}\; L_{cls} \tag{19}$$
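To make Equation (18) concrete, the NumPy sketch below evaluates $D_{d}^{'}$ term by term for given arrays. It is a minimal illustration under stated assumptions: the prototypes $b_{S_{j}}^{(k)}$, the weights $e_{S_{j}}$ and the target embeddings $G(X_{T})$ are passed in as precomputed arrays, and $\phi$ defaults to the identity because its exact form is not restated in this section. The function name is hypothetical.

```python
import numpy as np

def domain_prototype_discrepancy(source_prototypes, source_weights, target_features,
                                 phi=lambda z: z):
    """Sketch of D_d' from Equation (18).

    source_prototypes: (N, K, d) array, b_{S_j}^{(k)} for N source domains and K classes.
    source_weights:    (N,) array, the domain weights e_{S_j}.
    target_features:   (M, d) array, G(X_T) for every X_T in T-hat.
    phi:               map applied along the last axis; identity is an assumption.
    """
    b = np.asarray(source_prototypes, dtype=float)
    e = np.asarray(source_weights, dtype=float)
    f = np.asarray(target_features, dtype=float)
    # First term: sum over sources j and classes k of e_{S_j} * phi(b_{S_j}^{(k)}).
    source_term = np.einsum("j,jkd->d", e, phi(b))
    # Second term: mean of phi(G(X_T)) over the target set T-hat.
    target_term = phi(f).mean(axis=0)
    diff = source_term - target_term
    # Squared Euclidean norm of the difference.
    return float(diff @ diff)
```

In actual training, the same computation would be written with differentiable tensors so that, as in Equation (19), the term $\alpha D_{d}^{'}$ is back-propagated into the feature extractor $G$ only, while the classifier $F$ is updated with $L_{cls}$ alone.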

4.7. Hyperparameter Analysis

This part presents an experimental analysis of the hyperparameters $\tau_{c}$ in Equation (5) and $\tau_{d}$ in Equation (8). As presented in Figure 3, when $\tau_{d}$ was fixed at 10, the optimal performance was achieved at $\tau_{c} = 0.1$; when $\tau_{c}$ was fixed at 0.1, the performance peaked at $\tau_{d} = 10$. According to Figure 3, PAMDA is not sensitive to $\tau_{c}$ and $\tau_{d}$, which substantiates the robustness of our PAMDA algorithm.
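The analysis above follows a one-at-a-time protocol: one hyperparameter is swept while the other is held at its default value ($\tau_{c} = 0.1$, $\tau_{d} = 10$). The Python sketch below expresses that protocol. The `evaluate` callable, which would train the model and return its average target accuracy, and the candidate grids are hypothetical placeholders; this section does not list the exact values tested in Figure 3.

```python
from typing import Callable, Dict, Sequence, Tuple

def one_at_a_time_sweep(
    evaluate: Callable[[float, float], float],
    tau_c_grid: Sequence[float],
    tau_d_grid: Sequence[float],
    tau_c_fixed: float = 0.1,
    tau_d_fixed: float = 10.0,
) -> Tuple[Dict[float, float], Dict[float, float]]:
    """Sweep tau_c with tau_d fixed, then tau_d with tau_c fixed.

    `evaluate(tau_c, tau_d)` is a hypothetical callable that trains the model with the
    given hyperparameters and returns its average target accuracy.
    """
    by_tau_c = {tc: evaluate(tc, tau_d_fixed) for tc in tau_c_grid}
    by_tau_d = {td: evaluate(tau_c_fixed, td) for td in tau_d_grid}
    return by_tau_c, by_tau_d

def best_value(scores: Dict[float, float]) -> float:
    """Hyperparameter value with the highest accuracy in one sweep."""
    return max(scores, key=scores.get)

def spread(scores: Dict[float, float]) -> float:
    """Max minus min accuracy in one sweep; a small spread indicates low sensitivity."""
    return max(scores.values()) - min(scores.values())
```

A one-at-a-time sweep needs only `len(tau_c_grid) + len(tau_d_grid)` training runs instead of their product, at the cost of not exploring interactions between the two hyperparameters. Reporting `spread` for each sweep is one way to quantify the insensitivity claimed above.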

4.8. Visualization

(1) Feature Visualization: Figure 4 depicts the feature distributions of Source Only and PAMDA on the “→mm” task of Digits-5. We projected the feature embeddings into a two-dimensional space with t-SNE [55] and visualized the result. The Source Only scatter plot in Figure 4 shows a clear cross-domain distributional discrepancy, whereas in the PAMDA scatter plot the target features are better aligned with the source features. These results suggest that our PAMDA model yields more discriminative target features than the Source Only model, which is consistent with the results in Table 1.
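A visualization of this kind can be reproduced with scikit-learn's t-SNE implementation. The sketch below is an assumption about tooling, not the authors' script. It embeds source and target features in a single t-SNE run, because t-SNE has no `transform` for new points and separately fitted embeddings would not share coordinates. The perplexity and seed are illustrative defaults.

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_2d(source_features, target_features, perplexity=30.0, seed=0):
    """Jointly project source and target embeddings to 2-D with t-SNE.

    Both sets go through one t-SNE run so their positions are comparable.
    Returns (source_2d, target_2d) with shapes (n_source, 2) and (n_target, 2).
    """
    feats = np.concatenate([np.asarray(source_features, dtype=float),
                            np.asarray(target_features, dtype=float)], axis=0)
    coords = TSNE(n_components=2, perplexity=perplexity, init="pca",
                  random_state=seed).fit_transform(feats)
    n_src = len(source_features)
    return coords[:n_src], coords[n_src:]
```

The two returned arrays can then be drawn as a scatter plot in two colors, one for the source domains and one for the target domain. Note that perplexity must be smaller than the total number of points.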

(2) Visualization of weight distribution: To validate the efficacy of the similarity score-based strategy, we visualized the class weight distributions on the “→mm” task of Digits-5. For clarity, we only display two classes, i.e., 6 and 7. As illustrated in Figure 5, our PAMDA model assigned high weights to the source prototypes that are structurally similar to the corresponding target prototypes. For example, the handwriting style of the digit 6 in MNIST-M is closer to that in MNIST and USPS than to that in SVHN and SynthDigits; accordingly, the prototypes of the digit 6 from MNIST and USPS were allocated high weights, while those from SVHN and SynthDigits were allocated low weights. These results demonstrate that our similarity score-based strategy selects source prototypes according to their structural similarity to the target prototypes.
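The behavior in Figure 5, where more similar source prototypes receive higher weights, can be illustrated with a simple similarity-to-weight mapping. The NumPy sketch below is one plausible instantiation and an explicit assumption: it scores each source class prototype by its cosine similarity to the matching target class prototype and normalizes the scores over sources with a temperature-scaled softmax. It is not necessarily the paper's Equation (5), which is not restated in this section, and `tau` here is an illustrative temperature, not a claim about $\tau_{c}$.

```python
import numpy as np

def class_weights(source_protos, target_protos, tau=0.1):
    """Hypothetical similarity-score weights w[j, k] for source j and class k.

    source_protos: (N, K, d) class prototypes of N source domains.
    target_protos: (K, d) class prototypes of the target domain (from pseudo labels).
    Returns an (N, K) array; each column (one class) sums to 1 over the sources.
    """
    s = source_protos / np.linalg.norm(source_protos, axis=-1, keepdims=True)
    t = target_protos / np.linalg.norm(target_protos, axis=-1, keepdims=True)
    sim = np.einsum("jkd,kd->jk", s, t)          # cosine similarity per (source, class)
    logits = sim / tau
    logits -= logits.max(axis=0, keepdims=True)  # subtract column max for numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=0, keepdims=True)      # softmax over sources, separately per class
```

A smaller `tau` makes the weighting more selective, concentrating almost all weight on the most similar source, while a larger `tau` spreads the weight more evenly across sources.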

4.9. Discussion

Based on the above experimental results, we draw the following observations:

The PAMDA algorithm demonstrated superior performance on three benchmarks, which validates its efficacy.
Ablation results show that each component of the PAMDA model contributes to the performance improvement.
Component comparison results reveal two points. First, class–prototype aggregation explores the correlation of class-specific semantic features in greater depth than domain–prototype aggregation; however, its reliance on high-confidence pseudo labels leaves the features of low-confidence pseudo-labeled target samples misaligned. Second, combining class–prototype aggregation with domain–prototype aggregation aligns both the class features of high-confidence pseudo-labeled target samples and the domain features of low-confidence pseudo-labeled target samples.
Hyperparameter analysis results show that PAMDA is robust and stable under hyperparameter variation.
The visualization results demonstrate that our PAMDA model establishes a better distributional alignment than the Source Only model and that our similarity score-based strategy captures the class-level structural features.
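The complementary mechanism described in the third observation hinges on splitting pseudo-labeled target samples by confidence. The NumPy sketch below shows one common way to do this, thresholding the maximum softmax probability. The threshold value and the mapping of the two subsets to $\hat{T}_{1}$ (high confidence, class level) and $\hat{T}_{2}$ (low confidence, domain level) are assumptions made for illustration; this section does not specify the paper's exact selection rule.

```python
import numpy as np

def split_by_confidence(probs, threshold=0.9):
    """Split target samples by pseudo-label confidence (hypothetical rule).

    probs: (M, K) softmax outputs of the classifier on target samples.
    Returns (pseudo_labels, high_idx, low_idx). High-confidence samples would feed the
    class-level term and low-confidence ones the domain-level term.
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)       # confidence = largest class probability
    pseudo_labels = probs.argmax(axis=1) # pseudo label = most probable class
    high = np.flatnonzero(confidence >= threshold)
    low = np.flatnonzero(confidence < threshold)
    return pseudo_labels, high, low
```

Every target sample lands in exactly one subset, so the class-level and domain-level terms together cover the whole target set. This matches the discussion's point that neither term alone aligns all target features.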

Despite the demonstrated effectiveness of PAMDA, two limitations remain. First, our prototype generation treats all source embeddings equally, whereas prioritizing the source embeddings that are similar to target class features could promote distributional alignment. Second, PAMDA does not semantically segment the samples (i.e., images) to filter out background information, which can negatively affect domain alignment.

5. Conclusions

Previous MSDA efforts overlook three issues: class-level discrepancy quantification, the unavailability of noisy pseudo labels, and source transferability discrimination. This paper proposes a prototype aggregation method (i.e., PAMDA) to address these issues. The PAMDA algorithm is built on a group of prototypes. Since noisy pseudo labels are inevitable, we quantify the discrepancies among multiple domains at the class and domain levels for target samples whose pseudo labels have different confidence levels. Specifically, we propose a similarity score-based strategy to assess the source transferability at the class and domain levels. Based on the weights produced by this strategy, we design class–prototype and domain–prototype aggregation discrepancy metrics to quantify domain discrepancy. As the prototype aggregation discrepancies are minimized, the discriminability of the PAMDA model for target embeddings improves. Our PAMDA model achieved 94.2%, 97.3%, and 84.4% average accuracy on three popular public datasets, Digits-5, Office_caltech_10, and Office-31, respectively. In addition, further experiments demonstrate that the PAMDA model is robust and generalizes stably. In many scenarios, however, different domains may contain different categories, and the sample proportions of each class may vary significantly. Extending our algorithm to address these issues is our future work.

Author Contributions

Conceptualization, M.H. and Z.X.; methodology, M.H., Z.X., N.W. and B.S.; software, Z.X.; validation, M.H. and Z.X.; writing—original draft preparation, Z.X., N.W. and B.S.; writing—review and editing, M.H.; funding acquisition, M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangdong Natural Science Foundation Project (Grant No. 2022A1515011370).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Ning Wang was employed by the CSG EHV Power Transmission Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MSDA	Multi-Source Domain Adaptation
SDA	Single-Source Domain Adaptation
MMD	Maximum Mean Discrepancy
KL	Kullback–Leibler
JS	Jensen–Shannon
MRF-MSDA	Markov Random Field for Multi-Source Domain Adaptation

References

Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014, 27, 3320–3328.
Quinonero-Candela, J.; Sugiyama, M.; Schwaighofer, A.; Lawrence, N.D. Dataset Shift in Machine Learning; MIT Press: Cambridge, MA, USA, 2008.
Saenko, K.; Kulis, B.; Fritz, M.; Darrell, T. Adapting visual category models to new domains. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Greece, 5–11 September 2010; Proceedings, Part IV 11. Springer: Berlin/Heidelberg, Germany, 2010; pp. 213–226.
Ong, J.; Waisberg, E.; Masalkhi, M.; Kamran, S.A.; Lowry, K.; Sarker, P.; Zaman, N.; Paladugu, P.; Tavakkoli, A.; Lee, A.G. Artificial intelligence frameworks to detect and investigate the pathophysiology of spaceflight associated neuro-ocular syndrome (SANS). Brain Sci. 2023, 13, 1148.
Kumari, S.; Singh, P. Deep learning for unsupervised domain adaptation in medical imaging: Recent advancements and future perspectives. Comput. Biol. Med. 2024, 170, 107912.
Buonocore, T.M.; Crema, C.; Redolfi, A.; Bellazzi, R.; Parimbelli, E. Localizing in-domain adaptation of transformer-based biomedical language models. J. Biomed. Inform. 2023, 144, 104431.
Yang, L.; Balaji, Y.; Lim, S.N.; Shrivastava, A. Curriculum manager for source selection in multi-source domain adaptation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XIV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 608–624.
Xu, Y.; Kan, M.; Shan, S.; Chen, X. Mutual learning of joint and separate domain alignments for multi-source domain adaptation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 1890–1899.
Peng, X.; Bai, Q.; Xia, X.; Huang, Z.; Saenko, K.; Wang, B. Moment matching for multi-source domain adaptation. In Proceedings of the IEEE/CVF international conference on computer vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1406–1415.
Zhao, S.; Wang, G.; Zhang, S.; Gu, Y.; Li, Y.; Song, Z.; Xu, P.; Hu, R.; Chai, H.; Keutzer, K. Multi-source distilling domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–8 February 2020; Volume 34, pp. 12975–12983.
Zhou, L.; Li, N.; Ye, M.; Zhu, X.; Tang, S. Source-free domain adaptation with class prototype discovery. Pattern Recognit. 2024, 145, 109974.
Zhou, C.; Wang, Z.; Du, B.; Luo, Y. Cycle Self-Refinement for Multi-Source Domain Adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 17096–17104.
Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359.
Cheng, Z.; Wang, S.; Yang, D.; Qi, J.; Xiao, M.; Yan, C. Deep joint semantic adaptation network for multi-source unsupervised domain adaptation. Pattern Recognit. 2024, 151, 110409.
Li, Z.; Cai, R.; Chen, G.; Sun, B.; Hao, Z.; Zhang, K. Subspace identification for multi-source domain adaptation. Adv. Neural Inf. Process. Syst. 2024, 36. Available online: https://proceedings.neurips.cc/paper_files/paper/2023/hash/6cb7246003d556c4d1cbf9c17c392ee3-Abstract-Conference.html (accessed on 8 January 2025).
Wang, H.; Xu, M.; Ni, B.; Zhang, W. Learning to combine: Knowledge aggregation for multi-source domain adaptation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part VIII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 727–744.
Xu, M.; Wang, H.; Ni, B. Graphical modeling for multi-source domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 46, 1727–1741.
Zhu, Y.; Zhuang, F.; Wang, D. Aligning domain-specific distribution and classifier for cross-domain classification from multiple sources. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5989–5996.
Zhao, H.; Zhang, S.; Wu, G.; Moura, J.M.; Costeira, J.P.; Gordon, G.J. Adversarial multiple source domain adaptation. Adv. Neural Inf. Process. Syst. 2018, 31, 8568–8579.
Qian, Q.; Wang, Y.; Zhang, T.; Qin, Y. Maximum mean square discrepancy: A new discrepancy representation metric for mechanical fault transfer diagnosis. Knowl.-Based Syst. 2023, 276, 110748.
Simon-Gabriel, C.J.; Barp, A.; Schölkopf, B.; Mackey, L. Metrizing weak convergence with maximum mean discrepancies. J. Mach. Learn. Res. 2023, 24, 1–20.
Ge, P.; Ren, C.X.; Xu, X.L.; Yan, H. Unsupervised domain adaptation via deep conditional adaptation network. Pattern Recognit. 2023, 134, 109088.
Zhang, Y.; Pan, J.; Li, L.K.; Liu, W.; Chen, Z.; Liu, X.; Wang, J. On the properties of Kullback-Leibler divergence between multivariate Gaussian distributions. Adv. Neural Inf. Process. Syst. 2024, 36, 58152–58165.
Cui, J.; Tian, Z.; Zhong, Z.; Qi, X.; Yu, B.; Zhang, H. Decoupled kullback-leibler divergence loss. arXiv 2023, arXiv:2305.13948.
Chen, L.; Deng, Y.; Cheong, K.H. Permutation Jensen–Shannon divergence for random permutation set. Eng. Appl. Artif. Intell. 2023, 119, 105701.
Meng, Z.; He, H.; Cao, W.; Li, J.; Cao, L.; Fan, J.; Zhu, M.; Fan, F. A novel generation network using feature fusion and guided adversarial learning for fault diagnosis of rotating machinery. Expert Syst. Appl. 2023, 234, 121058.
Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional adversarial domain adaptation. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; pp. 1647–1657.
Luo, X.; Chen, W.; Liang, Z.; Li, C.; Tan, Y. Adversarial style discrepancy minimization for unsupervised domain adaptation. Neural Netw. 2023, 157, 216–225.
Dayal, A.; Aishwarya, M.; Abhilash, S.; Mohan, C.K.; Kumar, A.; Cenkeramaddi, L.R. Adversarial unsupervised domain adaptation for hand gesture recognition using thermal images. IEEE Sens. J. 2023, 23, 3493–3504.
Li, J.; Yu, Z.; Du, Z.; Zhu, L.; Shen, H.T. A comprehensive survey on source-free domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5743–5762.
Chen, Z.; Pan, Y.; Xia, Y. Reconstruction-driven dynamic refinement based unsupervised domain adaptation for joint optic disc and cup segmentation. IEEE J. Biomed. Health Inform. 2023, 27, 3537–3548.
Zhu, W.; Shi, B.; Feng, Z.; Tang, J. An unsupervised domain adaptation method for intelligent bearing fault diagnosis based on signal reconstruction by cycle-consistent adversarial learning. IEEE Sens. J. 2023, 23, 18477–18485.
Ben-David, S.; Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.; Vaughan, J.W. A theory of learning from different domains. Mach. Learn. 2010, 79, 151–175.
Crammer, K.; Kearns, M.; Wortman, J. Learning from Multiple Sources. J. Mach. Learn. Res. 2008, 9, 1757–1774.
Mansour, Y.; Mohri, M.; Rostamizadeh, A. Domain adaptation with multiple sources. Adv. Neural Inf. Process. Syst. 2008, 21, 1041–1048.
Wen, J.; Greiner, R.; Schuurmans, D. Domain aggregation networks for multi-source domain adaptation. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 10214–10224.
Shui, C.; Li, Z.; Li, J.; Gagné, C.; Ling, C.X.; Wang, B. Aggregating from multiple target-shifted sources. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 9638–9648.
Chen, Q.; Marchand, M. Algorithm-dependent bounds for representation learning of multi-source domain adaptation. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Valencia, Spain, 25–27 April 2023; pp. 10368–10394.
Zhang, W.; Ouyang, W.; Li, W.; Xu, D. Collaborative and adversarial network for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3801–3809.
LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 2016, 17, 1–35.
Hull, J.J. A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 550–554.
Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A.Y. Reading digits in natural images with unsupervised feature learning. In Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Granada, Spain, 12–17 December 2011.
Gong, B.; Shi, Y.; Sha, F.; Grauman, K. Geodesic flow kernel for unsupervised domain adaptation. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 2066–2073.
Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep transfer learning with joint adaptation networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 2208–2217.
Saito, K.; Watanabe, K.; Ushiku, Y.; Harada, T. Maximum classifier discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3723–3732.
Long, M.; Cao, Y.; Wang, J.; Jordan, M. Learning transferable features with deep adaptation networks. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 97–105.
Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7167–7176.
Li, Y.; Yuan, L.; Chen, Y.; Wang, P.; Vasconcelos, N. Dynamic transfer for multi-source domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 10998–11007.
Xu, R.; Chen, Z.; Zuo, W.; Yan, J.; Lin, L. Deep cocktail network: Multi-source unsupervised domain adaptation with category shift. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3964–3973.
Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in pytorch. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
Ganin, Y.; Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 1180–1189.
Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.

Figure 1. Different transferability of source domains.

Figure 2. Overview of PAMDA. The PAMDA is established on a group of prototypes. (i) Prototype Generation: Each class prototype is generated according to reliable labels. (ii) Prototype Aggregation: We design two discrepancy metrics for diverse confidence pseudo-labeled target samples. At the class level, a class–prototype aggregation discrepancy is adopted for class alignment across multiple domains. At the domain level, we adopt a domain–prototype aggregation discrepancy for cross-domain alignment based on a group of domain prototypes (i.e., the mean of all class prototypes in the same domain). (iii) Source Knowledge Learning: Last but not least, we design a source classification loss and a prototype classification loss to drive the model to learn the supervised knowledge from the source data and class prototypes, respectively.
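The caption above describes prototype generation in words only. The NumPy sketch below illustrates it under one assumption, common in prototype-based methods: a class prototype is the mean embedding of the samples carrying that (reliable) label. The domain prototype follows the caption's definition as the mean of all class prototypes in the same domain. The function names are hypothetical.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean embedding per class, shape (num_classes, d).

    Assumes `labels` are the reliable (source or high-confidence pseudo) labels.
    Classes without any sample get a row of NaN.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    protos = np.full((num_classes, features.shape[1]), np.nan)
    for k in range(num_classes):
        mask = labels == k
        if mask.any():
            protos[k] = features[mask].mean(axis=0)
    return protos

def domain_prototype(class_protos):
    """Domain prototype: mean of all (non-missing) class prototypes of one domain."""
    return np.nanmean(class_protos, axis=0)
```

Marking empty classes with NaN and averaging with `np.nanmean` keeps a class that is missing from a mini-batch from silently pulling the domain prototype toward zero.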

Figure 3. Hyperparameter analysis of $\tau_{c}$ and $\tau_{d}$ on Digits-5.

Figure 4. Feature distributions on the “→mm” task of Digits-5.

Figure 5. Class weight distributions on the “→mm” task of Digits-5.

Table 1. Classification accuracy (mean ± std%) on Digits-5. The best results are marked in bold.

Standards	Models	$\to mt$	$\to mm$	$\to syn$	$\to sv$	$\to up$	Avg
Single Best	Source Only	97.2 ± 0.6	59.1 ± 0.6	84.6 ± 0.8	77.7 ± 0.7	84.7 ± 1.0	80.7
Single Best	ADDA [48]	97.9 ± 0.8	71.6 ± 0.5	86.5 ± 0.6	75.5 ± 0.5	92.8 ± 0.7	84.8
Single Best	DAN [47]	96.3 ± 0.5	63.8 ± 0.7	85.4 ± 0.8	62.5 ± 0.7	94.2 ± 0.9	80.4
Source Combination	Source Only	90.2 ± 0.8	63.4 ± 0.8	82.4 ± 0.7	62.9 ± 0.9	88.8 ± 0.8	77.5
Source Combination	DAN [47]	97.5 ± 0.6	67.9 ± 0.8	86.9 ± 0.5	67.8 ± 0.6	93.5 ± 0.8	82.7
Source Combination	MCD [46]	96.2 ± 0.8	72.5 ± 0.7	87.5 ± 0.7	78.9 ± 0.8	95.3 ± 0.7	86.1
Multiple Source	DCTN [50]	96.2 ± 0.8	70.5 ± 1.2	86.8 ± 0.8	77.6 ± 0.4	92.8 ± 0.3	84.8
Multiple Source	DRT [49]	99.3 ± 0.1	81.0 ± 0.3	93.8 ± 0.3	77.6 ± 0.4	98.4 ± 0.1	91.8
Multiple Source	$M^{3} S D A$ [9]	98.4 ± 0.7	72.8 ± 1.1	89.6 ± 0.6	81.3 ± 0.9	96.1 ± 0.8	87.7
Multiple Source	Ltc-MSDA [16]	99.0 ± 0.4	85.6 ± 0.8	93.0 ± 0.5	83.2 ± 0.6	98.3 ± 0.4	91.8
Multiple Source	MLAN [8]	98.6 ± 0.0	86.3 ± 0.3	93.0 ± 0.3	82.8 ± 0.1	97.5 ± 0.2	91.6
Multiple Source	MRF-MSDA [17]	99.2 ± 0.2	90.7 ± 0.7	94.7 ± 0.5	85.8 ± 0.7	98.5 ± 0.4	93.7
Multiple Source	PAMDA (ours)	99.1 ± 0.0	95.2 ± 0.3	95.3 ± 0.2	82.7 ± 0.4	98.8 ± 0.1	94.2

Table 2. Classification accuracy (mean ± std%) on Office_caltech_10. The best results are marked in bold.

Models	$\to W$	$\to D$	$\to C$	$\to A$	Avg
Source Only	99.0	98.3	87.8	86.1	92.8
DCTN [50]	99.4	99.0	90.2	92.7	95.3
MCD [46]	99.5	99.1	91.5	92.1	95.6
JAN [45]	99.4	99.4	91.2	91.8	95.5
$M^{3} S D A$ [9]	99.4	99.2	91.5	94.1	96.1
PAMDA (ours)	99.3 ± 0.3	100.0 ± 0.0	94.6 ± 0.1	95.2 ± 0.1	97.3

Table 3. Classification accuracy (mean ± std%) on Office-31. The best results are marked in bold.

Standards	Models	$\to W$	$\to D$	$\to A$	Avg
Single Best	Source Only	95.3	99.2	50.3	81.6
Single Best	ADDA [48]	95.3	99.4	54.6	83.1
Single Best	DAN [47]	96.0	99.0	54.0	83.0
Source Combination	Source Only	93.2	97.7	51.6	80.8
Source Combination	DAN [47]	96.2	98.8	54.9	83.3
Source Combination	JAN [45]	95.9	99.4	54.6	83.3
Source Combination	MCD [46]	96.2	99.5	54.4	83.4
Source Combination	ADDA [48]	96.0	99.2	55.9	83.7
Multiple Source	MDAN [19]	95.4	99.2	55.2	83.3
Multiple Source	$M^{3} S D A$ [9]	96.2	99.4	55.4	83.7
Multiple Source	MDDA [10]	97.1	99.2	56.2	84.2
Multiple Source	DCTN [50]	96.9	99.6	54.9	83.8
Multiple Source	PAMDA (ours)	97.2 ± 0.1	99.6 ± 0.0	56.5 ± 0.2	84.4

Table 4. Component analysis on Digits-5.

$L_{cls}^{s}$	$L_{cls}^{p}$	$D_{c}$	$D_{d}$	$\to mt$	$\to mm$	$\to syn$	$\to sv$	$\to up$	Avg
√	√			99.1	65.3	83.5	71.3	96.7	83.0
√	√	√		99.1	94.8	95.2	79.0	98.8	93.4
√	√		√	98.4	82.2	88.3	77.1	95.3	88.3
√		√	√	98.8	94.3	94.5	82.7	98.2	93.7
√	√	√	√	99.1	95.2	95.3	82.7	98.8	94.2

Table 5. Effect of the prototype aggregations on Digits-5.

Models	$\to mt$	$\to mm$	$\to syn$	$\to sv$	$\to up$	Avg
Class-Only	99.1	94.8	95.2	79.0	98.8	93.4
Domain-Only	98.9	92.1	93.6	81.1	96.6	92.5
PAMDA	99.1	95.2	95.3	82.7	98.8	94.2

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Huang, M.; Xie, Z.; Sun, B.; Wang, N. Multi-Source Unsupervised Domain Adaptation with Prototype Aggregation. Mathematics 2025, 13, 579. https://doi.org/10.3390/math13040579

AMA Style

Huang M, Xie Z, Sun B, Wang N. Multi-Source Unsupervised Domain Adaptation with Prototype Aggregation. Mathematics. 2025; 13(4):579. https://doi.org/10.3390/math13040579

Chicago/Turabian Style

Huang, Min, Zifeng Xie, Bo Sun, and Ning Wang. 2025. "Multi-Source Unsupervised Domain Adaptation with Prototype Aggregation" Mathematics 13, no. 4: 579. https://doi.org/10.3390/math13040579

APA Style

Huang, M., Xie, Z., Sun, B., & Wang, N. (2025). Multi-Source Unsupervised Domain Adaptation with Prototype Aggregation. Mathematics, 13(4), 579. https://doi.org/10.3390/math13040579
