1. Introduction
In machine learning (ML) applications such as object detection and image classification, a vast amount of labeled data is needed to build a model that performs well on test data drawn from the same distribution as the training data. Several practical applications, such as medical image diagnosis [1], require an enormous quantity of labeled data to be collected, and such efforts are time- and money-consuming.
To handle this issue, the ML paradigm known as “Domain Adaptation” (DA) [2,3] transfers knowledge from a labeled source domain (SD) to an unlabeled target domain (TD), assuming that both domains share the same classification task (see Figure 1). We focus on the unsupervised DA (UDA) scenario in this study, where all target samples are assumed to be unlabeled [4]. UDA predicts labels for data in a TD that shares the label space of the SD but whose distribution diverges from it [5,6]. Typically, because of these distribution shifts, models trained on the SD fail on the TD. Numerous methods address this by matching the source and target distributions, and UDA is becoming increasingly popular due to its applications across computer vision (CV) [7,8,9,10]. UDA is concerned with situations in which a labeled SD and an unlabeled TD with differing distributions are available at training time [11]. Several methods, including [12,13], suggest matching the two distributions in a latent subspace in order to categorize the unlabeled target data. Reference [12] proposed projecting samples from both domains onto a new subspace to minimize the distribution shift, accounting for both marginal and conditional distributions. Building on [12], J. Wang et al. [14] suggested adaptively weighting the marginal and conditional distributions to further reduce the distribution shift.
The current approaches fall into four categories [13]: (a) instance-based adaptation, which decreases domain discrepancy by adjusting sample weights in the SD or in both domains [15]; (b) feature-representation-based adaptation, which learns feature representations that reduce the learning-task error, the domain shift, or both [13]; (c) classifier-based adaptation [16], which uses training data from both domains to develop a model that lowers the generalization error in the TD; and (d) hybrid knowledge-based adaptation, which combines several knowledge types, such as joint instance- and feature-representation-based adaptation [12], joint instance- and classifier-based adaptation, or joint feature-representation- and classifier-based adaptation. Despite the classifier’s apparently high performance, most of the techniques above train a straightforward classifier (e.g., a support vector machine or nearest-neighbor classifier) on the projected source data, assign each target sample a pseudo-label, and then exploit the classifier’s outputs. Because this classification procedure is independent of the target data, a considerable distribution shift may remain.
This study introduces a novel technique, known as the cluster matching-based improved kernel Fisher criterion (CMIKFC). To reduce the distribution shift, CMIKFC aligns the conditional and marginal distributions of the SD and TD using cluster matching and the Fisher criterion. In particular, K-means clustering is used to cluster the samples of both domains in the latent subspace [17]. We then assign pseudo-labels to each target cluster by comparing its centroid with the SD class centroids, so that the distributions of both domains are considered when assigning pseudo-labels. Additionally, we use the improved kernel Fisher criterion (IKFC) to reduce intra-class discrepancy while increasing inter-class discrepancy in both domains, which promotes cluster matching and the minimization of distribution shifts. The IKFC is introduced in the training phase so that the adapted image class transitions and the semantic structures are preserved. The main contributions of our paper are listed below:
The proposed method, CMIKFC, clusters the data from both domains using the K-means technique in a learned subspace.
The cluster centroids of the SD and TD are matched with one another in order to assign the target samples’ pseudo-labels more accurately. In this manner, the SD and TD distributions are both considered while assigning pseudo-labels.
In the training phase of cluster matching, the IKFC increases inter-class discrepancy in both domains while decreasing intra-class discrepancy. The KFC is improved by utilizing normalized parameters.
Experimental results on three benchmark datasets show that CMIKFC outperforms state-of-the-art UDA approaches.
The remaining sections of the paper are arranged as follows: Section 2 reviews the literature; Section 3 presents the proposed method; Section 4 discusses the experimental results, which demonstrate that the proposed technique is both effective and efficient; Section 5 concludes the paper.
2. Related Work
Domain adaptation (DA) employs labeled source data with a distribution different from the TD in an effort to improve target learning. Due to recent developments in CV, several UDA techniques based on deep learning (DL) have been developed. Li et al. [18] provided a brand-new UDA technique that produces target image–label pairings on the fly and imposes a semantic loss conditional on randomly selected labels. In addition, it uses adversarial training in a GAN to match the target style; the labels of the output images must match those of the input images. Owing to the reliable target-style training data, the model performs well in the TD. Our model performs better when dealing with problems having high domain disagreement because it avoids direct distribution alignment between the two domains.
Das et al. [19] suggested a form of UDA that takes into account the presence of unlabeled statistics in the TD. The method focuses on connections between samples from the SD and TD. To find these correlations, the source and target samples are both treated as graphs and paired using a convex criterion. The measures are class-based regularization and first- and second-order similarity between the graphs. Additionally, they created a computationally efficient convex optimization procedure, making the suggested strategy broadly applicable. However, the approach still needs to be tested on conventional DA datasets for structured data.
Guan et al. [20] proposed “Cross-Domain Minimization with Deep Autoencoder” (CDMDA) for UDA. CDMDA implements a multi-task learning strategy in which, within a single architecture, CORAL-aligned sharable feature representations are used to simultaneously train label prediction on the SD and input reconstruction on the TD. Additionally, the cluster assumption can be supported by attaching a label discriminator to the adversarial training procedure, which pushes the predicted target label distribution toward the category distribution. Several domain adaptation experiments on visual as well as non-visual datasets demonstrated that the developed method consistently outperformed competing UDA methods. These methods have yet to be applied to several other domain-adaptable tasks, such as semantic segmentation and classification.
Tian et al. [21] suggested a brand-new DA strategy that thoroughly investigates the structure of the TD data distribution and, in particular, treats samples belonging to the same cluster as a whole rather than as individuals. The strategy assigns pseudo-labels to each target cluster through class centroid matching. A self-learning technique for local manifolds was also included to fully leverage the structural information of the target data and to adaptively capture the local connectedness that the target samples naturally possess. The objective function is solved by a powerful iterative optimization technique with a theoretical convergence guarantee. A more comprehensive design of semi-supervised algorithms needs further investigation.
Some recent UDA techniques employ pseudo-labeling to take advantage of the semantic information of the target samples. Pseudo-labels were used by Xie et al. [22] to estimate class centroids for the TD and to compare them to those of the SD. A self-training system that alternately executes model training and pseudo-label refinement was proposed by Zou et al. [23]. Recent studies [24,25] have demonstrated the superiority of clustering-based pseudo-labeling approaches and how they may be successfully applied to DA.
Kernel Fisher discriminant analysis (KFDA) is a nonlinear technique for two-class and multi-class problems with origins in FDA [26]. KFDA works by transforming the low-dimensional sample space into a high-dimensional feature space, where FDA is then carried out. To simplify the computation, [27] replaced the kernel matrix with its sub-matrix. Liu et al. suggested a new KFDA criterion that maximizes the uniformity of class-pair separabilities, assessed by the entropy of the normalized class-pair separabilities [28]. Optimal kernel selection is one of the theoretical study fields that has received much recent interest. Based on FDA’s quadratic programming formulation, Fung et al. devised an iterative technique [29]. Using second-order cone programming, Khemchandani et al. studied the problem of locating the data-dependent “optimal” kernel function [30]. With its strong nonlinear feature extraction ability, KFDA is increasingly being used to solve identification and classification problems and has been applied extensively and successfully in numerous fields.
In this context, Deng et al. [31] proposed a deep clustering method using a Fisher-like criterion-based loss to align the feature distributions of the SD and TD. However, they only used target clustering as an incremental strategy to strengthen explicit feature alignment, whereas our proposed method uses cluster matching to take into account both domains’ distributions and intra-domain category information. Chang et al. [32] proposed a discriminative feature learning method to estimate inter-class separability in UDA. They calculated inter-class separability using the distances between pairs of class centers, whereas our proposed method utilizes preserved semantic structure and class transitions. Meng et al. [33] proposed a method that utilizes pseudo-labels and iterative clustering to incorporate label structure information in UDA. Their approach aims to improve the accuracy of the pseudo-labels generated for the TD, while our proposed method focuses on generating an accurate pseudo-label in a latent discriminative subspace for each target sample. The majority of modern approaches explicitly align feature distributions between the two domains while ignoring the target distribution and intra-domain category information. Published works first train a classifier on the labeled domain to generate pseudo-labels for the unlabeled samples, then compute the unlabeled class distribution from the pseudo-labels before aligning the distributions. Unfortunately, if the classifier ignores the unlabeled distribution, it may fail to learn the TD, and mislabeled data then introduces large errors into the estimate of the unlabeled domain distribution. To solve this issue, this research proposes the novel CMIKFC to produce a precise pseudo-label in a latent discriminative subspace for each target sample while taking both domain distributions into account.
3. Proposed Method
We start with the basic definitions of the UDA problem before providing an overview and delving into the details of the proposed method.
3.1. Problem Statement
We start with the formal concept of DA [6]. The terminology definitions and notations used are listed in Table 1 for clarification.
We focus on UDA for image classification, where the SD has sufficient labeled data ${\mathcal{D}}_{s}=\{{\mathcal{X}}_{s},{Y}_{s}\}={\{{\mathbf{x}}_{i},{\mathbf{y}}_{i}\}}_{i=1}^{{n}_{s}}$ and the TD has unlabeled data ${\mathcal{D}}_{t}=\left\{{\mathcal{X}}_{t}\right\}={\left\{{\mathbf{x}}_{i}\right\}}_{i=1}^{{n}_{t}}$, where ${\mathbf{x}}_{i}\in {\mathbb{R}}^{d\times 1}$ denotes a feature vector and ${\mathbf{y}}_{i}\in \{1,...,K\}$ its label, with K classes in total. It is assumed that the SD and TD share the same feature and label spaces. The goal of DA is to find a mapping h that maximizes the consistency between the mapped SD data $h\left({\mathcal{X}}_{s}\right)$ and TD data $h\left({\mathcal{X}}_{t}\right)$. The model can then build a more effective classifier $f(\cdot)$ on the labeled SD samples to predict the labels of the TD, i.e., ${\mathbf{x}}_{t}\to {\mathbf{y}}_{t}$, ${\mathbf{y}}_{t}\in {Y}_{t}$.
3.2. CMIKFC
The proposed CMIKFC focuses on creating a discriminative subspace that reduces the distribution shift between the two domains. As seen in Figure 2, our model has two main components, namely the K-means algorithm and the IKFC. In CMIKFC, we first use the K-means algorithm [17] to cluster the data from the two domains in a learned subspace. We then assign pseudo-labels to the target samples by comparing the cluster centroids of the TD with those of the SD; in this way, the distributions of both domains are considered when deciding on pseudo-labels. Finally, the KFC is improved by utilizing normalized parameters and weighted schemes to rebuild the between-class and within-class scatter matrices, which makes it possible to modify the kernel scatter difference discriminant function. CMIKFC applies the IKFC constraint to increase inter-class discrepancy in both domains while decreasing intra-class discrepancy, which encourages distribution shift reduction and cluster matching.
3.3. Domain Clustering
The K-means technique is utilized to cluster the samples of both domains and attach the associated labels. The number of clusters equals the number of classes K. Using Euclidean distance as the similarity metric, an initial centroid describes each of the K classes. For a data set X with n multidimensional data points to be partitioned into K classes, the clustering objective minimizes the sum of squared distances between the data points and their cluster centers [34].
In this context, if a data point ${p}_{x}$ belongs to cluster k, then ${w}_{ik}$ is set to 1; otherwise, it is set to 0. The variable ${m}_{z}$ represents the center of the cluster, i indexes the data points, K denotes the number of clusters, and ${\left({p}_{x}-{m}_{z}\right)}^{2}$ represents the squared distance between a data point ${p}_{x}$ and the cluster center ${m}_{z}$.
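The objective itself did not survive extraction; from the symbol definitions above, the standard K-means sum-of-squares objective these symbols describe can be reconstructed as (a reconstruction, with the index naming harmonized):

```latex
e = \sum_{x=1}^{n} \sum_{z=1}^{K} w_{xz} \, \left( p_{x} - m_{z} \right)^{2}
```

Minimizing e jointly over the assignments and the centers recovers the two-step procedure described next.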
This is a two-step minimization problem. We first minimize e with respect to ${w}_{ik}$ while treating ${m}_{z}$ as fixed; then we minimize e with respect to ${m}_{z}$ while regarding ${w}_{ik}$ as fixed. Technically, we differentiate e with respect to ${w}_{ik}$ first and update the cluster assignments. After that, we differentiate e with respect to ${m}_{z}$ and recompute the centroids based on the cluster assignments from the first phase, as follows:
This means that each data point ${p}_{x}$ should be assigned to the cluster whose centroid is closest to it in terms of squared distance, using the following formula:
The centroids of all clusters are then recalculated so that they reflect the new assignments.
Initially, the cluster centers are chosen at random from the sample set. Every sample point is assigned to the cluster whose center is nearest to it, and each center is then recomputed as the mean of its own cluster. These steps are repeated until the cluster centers are stable or the specified number of iterations has been completed. The output depends on the initial centers, which can cause instability: the chosen K and the initial centers directly affect whether the clustering result is locally or globally optimal.
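The alternating update described above can be sketched as follows (a minimal NumPy illustration of standard K-means, not the authors’ implementation; the function and variable names are ours):

```python
import numpy as np

def kmeans(X, K, n_iter=20, seed=0):
    """Minimal K-means: alternate nearest-centroid assignment and mean updates."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling K distinct data points (random init,
    # hence the sensitivity to initialization noted in the text).
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Step 1: fix centroids, assign each point to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: fix assignments, recompute each centroid as its cluster mean.
        for k in range(K):
            if np.any(labels == k):  # keep the old centroid if a cluster empties
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids
```

On two well-separated blobs this converges in a few iterations regardless of which points are drawn as the initial centers.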
3.4. Cluster Matching
To estimate the conditional distribution in UDA, each target sample must be assigned a pseudo-label. In contrast to traditional methods, in which a classifier trained on the SD allocates the pseudo-labels, our approach takes both domain distributions into account. In this way, pertinent information about the sample distribution structure of the target data can be obtained. Several pre-existing clustering algorithms are suitable for this objective; without sacrificing the generality of our approach, we chose the well-established K-means algorithm to obtain the cluster prototypes. The resulting formulation can be expressed as follows [21]:
where $\mathbf{H}\in {\mathbb{R}}^{m\times d}$ represents the projection matrix, $\mathbf{F}\in {\mathbb{R}}^{d\times C}$ denotes the cluster centroids of the target data, and ${\mathbf{L}}_{t}\in {\mathbb{R}}^{{n}_{t}\times C}$ is the cluster indicator matrix for the target data, defined as ${\left({\mathbf{L}}_{t}\right)}_{ij}=1$ if the cluster label of ${\mathbf{x}}_{ti}$ is j, and ${\left({\mathbf{L}}_{t}\right)}_{ij}=0$ otherwise.
Next, we compute the class centroids of the source data and match the class centroids of the SD and TD. Once the cluster prototypes of the target data are available, the distribution discrepancy reduction problem can be reframed as a class centroid matching problem. Note that the class centroids of the source data can be computed exactly as the mean of the sample features belonging to each class. In this study, we used the straightforward and efficient nearest-neighbor search to solve the class centroid matching problem: for each target cluster centroid, we looked for the closest source class centroid, and we minimized the total distance between the matched pairs of centroids. The formula for class centroid matching between the two domains reads as follows [21]:
where ${\mathbf{E}}_{s}\in {\mathbb{R}}^{{n}_{s}\times C}$ is a fixed matrix used to calculate the class centroids of the source data in the transformed space, with each element ${\mathbf{E}}_{ij}=1/{n}_{s}^{j}$ if ${y}_{si}=j$, and ${\mathbf{E}}_{ij}=0$ otherwise.
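The nearest-neighbor centroid matching and pseudo-label propagation described above can be sketched as follows (our illustrative NumPy code operating on generic feature vectors, not the paper’s implementation; the function name and arguments are ours):

```python
import numpy as np

def assign_pseudo_labels(source_feats, source_labels, target_feats, target_clusters):
    """Match each target cluster centroid to its nearest source class centroid
    and propagate that class as the pseudo-label of the whole cluster."""
    classes = np.unique(source_labels)
    # Source class centroids: mean feature vector per class (exact, as in the text).
    src_centroids = np.stack(
        [source_feats[source_labels == c].mean(axis=0) for c in classes]
    )
    pseudo = np.empty(len(target_feats), dtype=classes.dtype)
    for k in np.unique(target_clusters):
        centroid = target_feats[target_clusters == k].mean(axis=0)
        # Nearest-neighbor matching in Euclidean distance.
        nearest = np.linalg.norm(src_centroids - centroid, axis=1).argmin()
        pseudo[target_clusters == k] = classes[nearest]
    return pseudo
```

Because the label is decided at the cluster level, every sample in a target cluster receives the same pseudo-label, which is exactly what lets the method treat a cluster “as a whole rather than as individuals”.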
3.5. KFDA
KFDA is an effective nonlinear extension of FDA in which a kernel function handles the nonlinear optimization. Owing to its effectiveness, KFDA is widely used in several fields. Its numerical principle is as follows.
Suppose $S=\left\{{x}_{1},{x}_{2},\dots ,{x}_{N}\right\}$ is the dataset, consisting of K classes in the d-dimensional real space ${R}^{d}$, where ${N}_{j}$ samples belong to the j-th class $\left(N={N}_{1}+{N}_{2}+\cdots +{N}_{K}\right)$.
FDA is employed to find the optimal projection vectors w that maximize the between-class scatter while minimizing the within-class scatter. FDA is given by the vector $w\in {R}^{d}$ that optimizes the Fisher discriminant function, where ${M}_{w}$ represents the within-class scatter matrix and ${M}_{b}$ represents the between-class scatter matrix. FDA primarily relies on linear techniques, posing a challenge when attempting to separate samples that are only nonlinearly separable.
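The Fisher discriminant function referred to here was lost in extraction; with the two scatter matrices just defined, it takes the standard form

```latex
J(w) = \frac{w^{\mathrm{T}} M_b \, w}{w^{\mathrm{T}} M_w \, w}
```

Maximizing $J(w)$ favors projection directions along which classes are far apart (large $w^{\mathrm{T}} M_b w$) and internally compact (small $w^{\mathrm{T}} M_w w$).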
The kernel trick used in KFDA notably enhances the classification capability of FDA on nonlinearly separable samples. To accommodate nonlinear scenarios, the function $\phi \left(x\right)$ transforms the samples from the low-dimensional space into a higher-dimensional feature space. Note that $\varphi \left({\mu}_{i}^{j}\right)\left(i=1,2,\dots ,{N}_{j},\ j=1,2,\dots ,K\right)$ is the i-th projection value in class ${\omega}_{j}$, ${m}^{\varphi}$ denotes the mean vector of the entire population, and ${m}_{j}^{\varphi}$ represents the mean vector of class ${\omega}_{j}$. In the feature space F, we define the total scatter matrix ${M}_{t}$, the within-class scatter matrix ${M}_{w}$, and the between-class scatter matrix ${M}_{b}$ as [35]:
The development of KFDA can be traced back to the following kernel Fisher criterion function:
where v denotes an optimal projection vector. Directly calculating the optimal discriminant vector v is not feasible, because the feature space F is high (possibly infinite) dimensional. Applying the kernel technique is one approach to resolve this issue, as shown below [36]:
By the theory of reproducing kernels [26], any solution v must lie inside the feature space F spanned by $\varphi \left({x}_{1}\right),\varphi \left({x}_{2}\right),\dots ,\varphi \left({x}_{N}\right)$, as follows:
By projecting any test sample onto w in F, the following equation is obtained:
The kernel between-class scatter matrix ${H}_{y}$ and the kernel within-class scatter matrix ${H}_{u}$ in F may be defined as follows [37]:
We know that ${H}_{y}$, ${H}_{u}$, and ${H}_{t}$ are semi-definite symmetric matrices. The Fisher criterion function in the feature space F is defined as [38]:
In accordance with the properties of the generalized Rayleigh quotient, the optimal solution vector w may be found by maximizing the criterion function in (19), which is equivalent to solving the generalized characteristic equation in the following manner:
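The generalized characteristic equation itself did not survive extraction; in the kernel notation of this section it is the standard generalized eigenproblem (a reconstruction):

```latex
H_y \, \alpha = \lambda \, H_u \, \alpha
```

The eigenvectors associated with the largest eigenvalues $\lambda$ yield the expansion coefficients of the optimal discriminant directions.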
3.6. IKFC
The scatter difference discriminant function [39] was constructed in this study in response to the ill-posed nature of the discriminant criterion function (19):
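In place of the ratio in (19), the maximum scatter difference criterion optimizes a difference of the two scatter terms, which never requires inverting the within-class matrix (a reconstruction of the omitted formula in the notation of this section):

```latex
J(v) = v^{\mathrm{T}} H_y \, v - v^{\mathrm{T}} H_u \, v
     = v^{\mathrm{T}} \left( H_y - H_u \right) v
```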
This approach substantially resolves the singularity issue of the within-class scatter matrix.
The basic concept of weighted kernel maximum scatter difference discriminant analysis is that a class is designated a margin class if it is significantly distant from the center. In this case, the optimal discriminant vectors produced by maximizing the kernel scatter difference criterion function distinguish the margin class from the other classes, since the class variance is greatest in this direction. A class with a greater distance from the center thus plays an outsized role when maximizing the kernel scatter difference criterion function. The resulting projection vectors not only fail to separate the classes other than the margin class, but also cause neighboring classes to overlap in the feature space. In response to this issue, the between-class scatter matrix is redefined as follows [38]:
where ${d}_{i}=\sqrt{{\left(\varphi \left({\mu}_{i}\right)-\varphi \left({\mu}_{0}\right)\right)}^{\mathrm{T}}\left(\varphi \left({\mu}_{i}\right)-\varphi \left({\mu}_{0}\right)\right)}$ is the distance between the i-th class and the center, and $\varpi \left(\cdot \right)$ represents the weighting function. In order to reduce the impact of margin classes, we assigned a smaller weight to a larger value of $\Vert {\mu}_{i}-{\mu}_{0}\Vert $; to achieve this, we defined the weighting function $\varpi \left({d}_{i}\right)={d}_{i}^{-3}$. Additionally, we assigned a weight $\alpha =1/\left(1+{\sum}_{i=1}^{K}{\sum}_{j=1}^{{N}_{i}}{d}_{j}^{i}\right)$ to the within-class scatter matrix, where ${d}_{j}^{i}=\sqrt{{\left(\varphi \left({x}_{j}^{i}\right)-\varphi \left({\mu}_{0}\right)\right)}^{\mathrm{T}}\left(\varphi \left({x}_{j}^{i}\right)-\varphi \left({\mu}_{0}\right)\right)}$. By employing these weights, we obtained the following result.
The weighting aims to bring all training samples of a given class closer to that class’s center, thereby further reducing the large overlaps across classes. The weighted kernel scatter difference discriminant function in the feature space is defined as follows:
The column vector ${\alpha}^{*}$ has dimension N. By applying the reproducing kernel theory and Equations (14)–(22), we obtain the following result:
In the given context, ${d}_{i}^{*}=\sqrt{{\left({u}_{i}-{u}_{0}\right)}^{\mathrm{T}}\left({u}_{i}-{u}_{0}\right)}$, and ${K}_{T}$, ${K}_{W}$, and ${K}_{B}$ are referred to as the weighted kernel total, within-class, and between-class scatter matrices, respectively. These matrices are semi-definite and symmetric, with dimensions $N\times N$. Equation (22) can then be equivalently expressed in the following forms:
The expression $\frac{{\alpha}^{*\mathrm{T}}\left({K}_{B}-{K}_{W}\right){\alpha}^{*}}{{\alpha}^{*\mathrm{T}}{\alpha}^{*}}$ can be rewritten as $\frac{{\alpha}^{*\mathrm{T}}\left({K}_{B}-{K}_{W}\right){\alpha}^{*}}{{\alpha}^{*\mathrm{T}}\mathbf{I}{\alpha}^{*}}$, where $\mathbf{I}$ represents the identity matrix. Using the extended Rayleigh quotient, we derive the following.
The optimal solutions ${\alpha}^{*}$ correspond to the solutions of Equation (26). By utilizing ${\alpha}^{*}$ and Equation (26), we derive the nonlinear discriminant vectors in the feature space F, denoted ${v}^{*}=\left({v}_{1}^{*},{v}_{2}^{*},\cdots ,{v}_{k}^{*}\right)$, where ${v}_{i}^{*}={\sum}_{k=1}^{K-1}{\alpha}_{k}^{*i}\varphi \left({x}_{k}\right)$, $i=1,2,\cdots ,N$.
3.7. Selecting a Normalized Parameter
A normalized parameter was used because the characteristic equation is difficult to solve. If ${H}_{y}$ is a non-singular matrix, then the optimal vector $\alpha $ obtained by maximizing Equation (19) corresponds to the eigenvectors of the top m greatest eigenvalues, as described in [40]. Equation (20) may also be written as
where ${H}_{y}$ is singular. In that case, Equation (32) cannot always be used directly; moreover, employing a regularized method improves the numerical stability:
where $\delta $ represents a small positive number and $\mu $ denotes the identity matrix. Equation (20) can then be written as
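Numerically, the regularized problem amounts to a generalized eigenproblem with a ridge term added to one of the kernel scatter matrices. A sketch (our NumPy code, not the paper’s implementation; the function name and the choice of a Cholesky whitening step are ours) of solving $H_y\,a=\lambda\,(H_u+\delta I)\,a$:

```python
import numpy as np

def discriminant_directions(H_y, H_u, delta=0.2, m=1):
    """Top-m directions of the regularized generalized eigenproblem
    H_y a = lambda (H_u + delta*I) a, via a symmetric whitening step."""
    n = H_u.shape[0]
    H_reg = H_u + delta * np.eye(n)           # ridge-regularized within-class matrix
    # Whitening: H_reg = L L^T; substituting a = L^{-T} b turns the problem
    # into an ordinary symmetric eigenproblem for L^{-1} H_y L^{-T}.
    L = np.linalg.cholesky(H_reg)
    Linv = np.linalg.inv(L)
    M = Linv @ H_y @ Linv.T
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:m]     # keep the m largest eigenvalues
    A = Linv.T @ eigvecs[:, order]            # map eigenvectors back: a = L^{-T} b
    return eigvals[order], A
```

Even when $H_u$ itself is singular, $H_u+\delta I$ is positive definite for any $\delta > 0$, so the Cholesky factorization and the whitening step are always well defined.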
When KFDA is employed to solve applied problems, this parameter is usually determined from experience or experimental results. Here, a normalized parameter was used and the value ${H}_{y}+\delta \mu $ was regarded as a function of $\delta $:
where the function f is stable; when the value of the function tends to zero in Equation (36), $\delta $ is the best classification parameter.
In Algorithm 1, the whole optimization of CMIKFC is presented.
Algorithm 1 CMIKFC
Input: source data $\{{\mathcal{X}}_{s},{Y}_{s}\}$, target data $\left\{{\mathcal{X}}_{t}\right\}$, initial target label matrix ${\mathbf{L}}_{t}$, subspace dimensionality d = 100, maximum iteration number i = 10, parameters $\gamma $ = 5, $\alpha $ and $\beta $
Output: target label matrix ${\mathbf{L}}_{t}$
1: while convergence condition not satisfied do
2:   Calculate the source class centers from the source features
3:   Initialize the target cluster centers with the source class centers
4:   Discover the target samples’ pseudo-labels using the K-means technique (Section 3.4)
5:   Update $\mathbf{H}$ by Equation (4)
6:   Update ${\alpha}^{*}$ with Equation (26)
7:   Update $f\left(\delta \right)$ with Equation (35)
8: end while
9: Output: target label matrix ${\mathbf{L}}_{t}$
4. Experiments and Analysis
We evaluated the CMIKFC method on object classification benchmarks, including Office-31, ImageCLEF, and Office-Home, and compared it with state-of-the-art DA methods. To enable a fair comparison, results were taken from the original or state-of-the-art papers. The findings show how well the proposed CMIKFC handles the DA problem.
4.1. Benchmark Datasets
The research used three well-known benchmark datasets: Office-31, ImageCLEF, and Office-Home. Each dataset is briefly described below, and Table 2 summarizes their specifics.
Office-31 [41], a frequently used evaluation dataset for visual DA, is made up of 4652 images and 31 classes from three different domains, collected under various conditions from (1) the Amazon site (Amazon domain), (2) digital SLR photos (DSLR domain), and (3) web cameras (Webcam domain), as shown in Figure 3. The dataset’s distribution across domains is shown in Table 2, with 2817 images belonging to the Amazon domain, 795 to the Webcam domain, and 498 to the DSLR domain.
Office-Home [42] poses significant challenges for the evaluation of DA models. Its 65 categories contain more than 15,500 images of everyday objects in homes and offices. The four distinct domains are artistic images (Ar), clip art (Cl), product images (Pr), and real-world images (Rw), as shown in Figure 4. Backgrounds and appearances differ greatly across these images, and transferring between domains is harder because there are many more categories than in Office-31. We examined each technique on each of the 12 adaptation tasks.
The ImageCLEF [43] dataset has 31 categories and 3 domains, namely Pascal (P), Caltech (C) and ImageNet (I), as shown in Figure 5. We set up six DA tasks: I→C, I→P, P→I, C→I, P→C, and C→P.
4.2. Implementation Details
The model was implemented using the PyTorch framework. In every experiment, the projection matrix dimension was 100, and at most twenty iterations were carried out. We used the K-means algorithm for clustering, so K was set according to the number of classes in each dataset: 31 for Office-31, 65 for Office-Home, and 31 for ImageCLEF. Furthermore, we set the parameter $\delta $ = 0.2 analytically. The following section analyzes the impact of the trade-off parameter.
4.3. Baseline Methods
To evaluate the effectiveness of our proposed strategy in an experimental setting, we looked at both conventional and stateoftheart DA methodologies, such as DAN [
44], DANN [
45], CyCADA [
46], JDDA [
47], CDAN [
48], HAFN [
15], SAFN [
15], DMP [
49], ADDA [
50], JAN [
51] and rRevGrad+ CAT [
31]. We evaluated the method with other domain adaptation techniques. For the purpose of classifying target samples, ResNet50 [
52] simply utilizes the classification model trained on the SD. Multimode structures are used by MADA [
53] to achieve proper alignment of different data distributions, based on numerous domain discriminators.
4.4. Results and Analysis
To demonstrate the efficacy of CMIKFC, this section presents the classification accuracy achieved on the benchmark datasets, along with several experiments investigating how and why our CMIKFC model successfully addresses domain adaptation. Bold values in the tables represent the highest accuracy achieved for each task.
The experimental data confirmed the model's efficacy. The classification results of our model on the six UDA tasks from the Office-31 dataset are shown in
Table 3. The source-only model performed satisfactorily on the D⇆W tasks because the domain gap was minimal. Compared to other models, our method's accuracy improved more significantly on the challenging A⇆W and A⇆D adaptation tasks. For instance, our approach considerably outperformed all the current adaptation techniques on A→D, W→A, and D→A, especially when compared to CyCADA [46], which uses an equal number of class-based discriminators. Our proposed CMIKFC model outperformed other approaches on the majority of domain adaptation tasks and also improved average performance by 2.47%. Across the six experiments, CMIKFC produced four of the best results and none of the poorest. This demonstrates how well K-means cluster matching and the IKFC work for UDA.
In this study, we describe how the use of the IKFC enhanced the performance of our proposed approach, CMIKFC, relative to the base KFC. To demonstrate this improvement, we compared CMIKFC with the KFC-based method without the IKFC extension. The results presented in
Table 3 demonstrate that the IKFC criterion significantly enhanced performance by increasing inter-class variability and decreasing intra-class variation. As a result, we observed a substantial improvement in the accuracy of the pseudo-labels for the TD samples. These findings explain how the IKFC criterion was integrated into the KFC to improve its effectiveness. Specifically, the inclusion of a normalized parameter enabled the IKFC criterion to solve the characteristic equation more accurately, leading to a significant improvement in the classification performance of our proposed approach.
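The pseudo-labeling step discussed above relies on matching target clusters to source classes. The sketch below shows one simplified way such cluster matching could assign class pseudo-labels, greedily pairing each target cluster with its nearest unused source centroid; the greedy scheme and function names are our own illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def match_clusters(source_centroids, target_centroids):
    """Greedily pair each target cluster with its closest unused source
    (class-labeled) centroid, so that raw target cluster ids can be
    translated into class pseudo-labels."""
    # Pairwise Euclidean distances: rows = source classes, cols = target clusters.
    cost = np.linalg.norm(
        source_centroids[:, None, :] - target_centroids[None, :, :], axis=2
    )
    mapping, used = {}, set()
    # Resolve the cheapest (source, target) pairs first.
    for s, t in sorted(np.ndindex(cost.shape), key=lambda p: cost[p]):
        if t not in mapping and s not in used:
            mapping[t] = s
            used.add(s)
    return mapping

def pseudo_labels(target_assignments, mapping):
    """Translate raw target cluster ids into class pseudo-labels."""
    return np.array([mapping[c] for c in target_assignments])
```

With the mapping in hand, every target sample inherits the class of the source centroid its cluster was matched to, which is then refined by the IKFC criterion during training.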
The experimental results for the Office-Home dataset are presented in
Table 4. CMIKFC outperformed several traditional methods and performed better on 8 of the 12 cross-domain tasks. The average classification accuracy obtained by CMIKFC was 68.50%; compared with the baseline model, this is an average improvement of 1.3%, showing that the incremental hardening of prediction labels enriches the discriminative information therein. Overall, the majority of tasks achieved excellent results.
On the ImageCLEF dataset,
Table 5 displays the experimental results of this method and the related comparison methods. On domain adaptation tasks such as I→P, P→I, C→I, and P→C, CMIKFC achieved the four best accuracies, and it produced none of the poorest results across the six domain adaptation trials. This demonstrates that CMIKFC yields greater improvements on challenging DA tasks. The proposed model achieved an average accuracy of 89.8%, an improvement of 0.7% over the other baseline methods. Across the three datasets, we found that better results were achieved with fewer classes: the average accuracies on Office-31 and ImageCLEF were higher than on Office-Home.
The proposed method is well-suited to UDA in image classification tasks; however, it may not be applicable to other types of data. Its performance can be affected by high-dimensional input data, in which case feature selection or dimensionality reduction techniques may be necessary. Additionally, the method requires similar feature distributions between the SD and TD, as large differences may compromise its performance. These limitations should be taken into consideration when applying the method.
4.5. Effectiveness of the Parameter Tuning
Finally, we examined the effect of the parameter $\delta$. To assess parametric sensitivity, we chose four transfer tasks from each experiment and tuned the parameter over the range {0.1, 0.2, 0.3, 0.5, 0.8, 1.0}. The outcomes are displayed in
Figure 6. It is clear that the results differed across trials. In the Office-31 and Office-Home experiments, performance rose gradually, peaked when the trade-off parameter $\delta$ was around 0.2, and then declined as $\delta$ grew. In the ImageCLEF experiment, performance was best when $\delta$ was around 0.3 and degraded as the parameter increased. This pattern shows that the ideal $\delta$ varies with the transfer task; for instance, the ideal $\delta$ for task A→W in the Office-31 experiment was 0.2, whereas it was around 0.3 for other tasks within the same experiment. In our experimental settings, we selected 0.2 for Office-Home and Office-31 and 0.3 for the ImageCLEF experiment, as these values suited most tasks.
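The tuning protocol above can be sketched as a simple grid sweep. In the snippet below, `evaluate` is a placeholder standing in for running one adaptation task and returning its target accuracy; it and `tune_delta` are illustrative names of our own, not part of the original implementation.

```python
# Candidate values for the trade-off parameter, as used in the sweep above.
DELTA_GRID = [0.1, 0.2, 0.3, 0.5, 0.8, 1.0]

def tune_delta(evaluate, grid=DELTA_GRID):
    """Evaluate every candidate delta and return the best-scoring one
    together with the full accuracy curve."""
    scores = {d: evaluate(d) for d in grid}
    best = max(scores, key=scores.get)
    return best, scores
```

For example, a curve that peaks at 0.2 (as reported for Office-31 and Office-Home) would make `tune_delta` return 0.2 along with the per-value accuracies for plotting a sensitivity figure.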
5. Conclusions
The purpose of UDA is to reduce the distribution discrepancy when transferring data from a labeled SD to an unlabeled TD. The proposed CMIKFC approach creates a proper pseudo-label for every target sample. Specifically, the samples in both domains are clustered using K-means clustering. The clusters in the TD are then aligned via cluster matching, and this is extended in the training phase by an IKFC. This ensures that the updated images' semantic structures and class transitions are preserved. Furthermore, because the characteristic equation is difficult to solve, a normalized parameter is utilized to improve the KFC. Across all domains, CMIKFC minimizes intra-class variability while boosting inter-class variance. The results of extensive experiments showed that CMIKFC was superior to state-of-the-art UDA methods on a variety of image benchmark datasets.