Cross-Domain Residual Learning for Shared Representation Discovery

Zhao, Baoqi; Pan, Jie; Zhang, Zhijie; Yang, Fang

doi:10.3390/info16100852

Open AccessArticle

Cross-Domain Residual Learning for Shared Representation Discovery

¹

School of Artificial Intelligence, Ningbo Polytechnic University, Ningbo 315800, China

²

School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China

³

Ningbo University Health Science Center, Ningbo University, Ningbo 315211, China

^*

Author to whom correspondence should be addressed.

Information 2025, 16(10), 852; https://doi.org/10.3390/info16100852

Submission received: 11 June 2025 / Revised: 26 July 2025 / Accepted: 28 July 2025 / Published: 2 October 2025

(This article belongs to the Special Issue Machine Learning in Image Processing and Computer Vision)

Download

Browse Figures

Versions Notes

Abstract

In order to solve the problem of inconsistent data distribution in machine learning, domain adaptation based on feature representation methods extracts features from the source domain, and transfers them to the target domain for classification. The existing feature representation-based methods mainly solve the problem of inconsistent feature distribution between the source domain data and the target domain data, but only a few methods analyze the correlation of cross-domain features between the original space and shared latent space, which reduces the performance of domain adaptation. To this end, we propose a domain adaptation method with a residual module, the main ideas of which are as follows: (1) transfer the source domain data features to the target domain data through the shared latent space to achieve features sharing; (2) build a cross-domain residual learning model using the latent feature space as the residual connection of the original feature space, which improves the propagation efficiency of features; (3) use a regular feature space to sparse feature representation, which can improve the robustness of the model; and (4) give an optimization algorithm, and the experiments on the public visual datasets (Office31, Office-Caltech, Office-Home, PIE, MNIST-UPS, COIL20) results show that our method achieved 92.7% accuracy on Office-Caltech and 83.2% on PIE and achieved the highest recognition accuracy in three datasets, which verify the effectiveness of the method.

Keywords:

shared space; residual model; domain adaptation; visual recognition

1. Introduction

In machine learning, it is necessary to have complete and sufficient training data for parameter tuning to achieve good performance. Therefore, the amount of data and the distribution of data features become an important factor. Traditional machine learning assumes that the training data and test data satisfy the condition of being independent and identically distributed. Under this assumption, most models work well. However, in practical applications, it is usually difficult to obtain a large amount of annotated data for a specific task, and data annotation is time-consuming and laborious, which leads to insufficient data in machine learning models. Moreover, there is a problem of inconsistent distribution of data between different data domains, which is called sample selection bias [1] or covariance displacement [2,3]. The above issues will reduce the robustness and reliability of traditional machine learning models, leading to a decrease in the performance of the model.

To avoid repetitive data annotation and improve the performance of the model, currently, domain adaptation (DA) methods that can achieve knowledge transfer between different domains, solving the problems of different data distributions and lack of train data in different domains effectively [4,5,6]. Domain adaptation can be mainly divided into instance-based methods, feature representation-based methods, and classifier-based methods. This paper focuses on the feature representation-based methods, which assumes that the relevant domain data has shared implicit features [7]. That is, the marginal distribution between data in different domains is matched [8]. Then, feature space information is used to align the domain data in the feature space in a mapping way to find the shared features between different domains. Therefore, a reliable machine learning model can be constructed using source domain data with sufficient labels [9]. By transferring source and target domain data features, common features between different domain data are utilized to expand the features of target domain data, reducing the strict constraint requirements of the model on data and improving the performance of the model [10].

The initial domain adaptation methods based on feature representation use feature representation methods to transform source domain data and target domain data into latent space, and construct a classifier through the latent feature representation of the target domain. The principle of these methods is summarized in Figure 1a, which employs feature extraction to construct a feature invariant latent space and then design a classifier for the target domain [11]. Gheisari et al. [12] demonstrated that minimizing classification error while maximizing manifold consistency in a shared space led to improved classification accuracy on target domains compared to non-adaptive baselines. Zheng et al. [13] showed that finding a dimensionality reduction technique minimizing distribution distance in the latent space effectively enabled feature transfer, resulting in measurable performance gains. Blitzer et al. [14] established that modeling feature correlations across domains via structural correspondence learning identified pivot features crucial for cross-domain discrimination, enhancing adaptation robustness. Jiang et al. [15] extended this concept effectively to multi-view data, showing that incorporating latent space features across views significantly addressed the unique challenges posed by multi-perspective domain differences. Xu et al. [16] leveraged distributionally robust optimization for feature extraction under weak supervision, demonstrating enhanced robustness to distributional uncertainties and achieving competitive results.

Later methods incorporate the original space information, which can guide target domain classifier construction by combining it with latent space features. This joint information is then utilized to build the classifier [17,18], as shown in Figure 1b. Dong et al. [19] proved that embedding a shared low-dimensional latent space into an SVM (support vector machine) framework, constrained by source-target latent space alignment, substantially improved SVM performance in domain adaptation tasks compared to standard SVMs. Yao et al. [20] successfully expanded this concept to the more complex multi-source adaptation scenario, showing that constraining the predicted label matrix across sources further enhanced adaptation effectiveness and robustness. Zhang et al. [21] demonstrated that combining latent space features with target pseudo-labels effectively mined richer domain-invariant information, yielding state-of-the-art results on several benchmarks. However, the above methods do not consider the distribution of latent space features from the perspective of distribution consistency, and the differences in distribution will reduce the effectiveness of shared features, thereby reducing the performance of domain adaptation.

Inspired by Residual Network [22], we propose a shared latent space domain adaptation with residual model (LRDA) to make full use of the relationship between original feature and latent space feature in domains, and address the problem of mismatched distribution of latent space features in domain adaptation. Specifically, mapping the source and target domain data into a shared latent feature space, mining common features between source and target domains, to achieve the goal of fully utilizing the source domain data to enhance the feature expression of the target domain data. Subsequently, minimizing the distribution differences between the shared feature space of the source and target domain, ensuring latent space features have a more consistent feature distribution, and reducing model performance degradation caused by inconsistent feature distribution. Finally, by combining the original spatial features with the the latent space features in both source and target domain, a residual model is formed to reduce the difficulty of fitting the model and obtain a better classifier. Following the above description, the principle of LRDA is shown in Figure 1c.

The main contributions of this paper are:

1.: We introduce a shared latent space by source domain feature and target domain feature, constructing a latent space domain adaptation method. Measuring the differences in latent space feature distribution, further constraining the consistent distribution of shared feature spaces in the domain to align the feature distribution in the shared latent space.
2.: We build a residual model using the original feature space and latent feature space, optimizing residual function to reduce the difficulty of feature transfer and improve model performance.
3.: We adopt the $l_{2, 1}$ -norm for feature selection to sparsely represent the original feature, increasing the robustness of the model for outliers and noise naturally existing in dataset.
4.: Experiments verify that our method has better performance and can effectively recognize shallow and deep features, effectively improve the performance of cross-domain visual recognition tasks.

2. LRDA

2.1. Notation & Definition

We denote with small and capital letters respectively column vectors and matrices. We denote

X^{s} = [x_{1}^{s}, x_{2}^{s}, \dots, x_{n_{s}}^{s}] \in R^{d \times n_{s}}

is the matrix of source domain data

x_{i}^{s}

, where d means d-dimension feature and

n_{s}

is the number of source domain samples. We denote

Y^{s} = [y_{1}^{s}, y_{2}^{s}, \dots, y_{n_{s}}^{s}] \in R^{n_{s} \times c}

is source domain label matrix, where

y_{i}^{s} = {0, 1}^{1 \times c}

is one-hot label for source domain data, and c is the number of categories. Source domain is denoted as

D_{s} = \{(x_{i}^{s}, y_{i}^{s}) \in X^{s} \times Y^{s} : i = 1, 2, \dots, n_{s}\}

. In the same way, given

D_{t} = \{(x_{j}^{t}, y_{j}^{t}) \in X^{t} \times Y^{t} : j = 1, 2, \dots, n_{t}\}

as target domain,

X^{t} = [x_{1}^{t}, x_{2}^{t}, \dots, x_{n_{t}}^{t}] \in R^{d \times n_{t}}

and

Y^{t} = [y_{1}^{t}, y_{2}^{t}, \dots, y_{n_{t}}^{t}] \in R^{n_{t} \times c}

as target domain data matrix and target domain label matrix respectively. It is worth noting that the target domain data usually lacks complete label, but source and target domains share the same c-cardinality label set.

Assuming the distribution of the data is known, then the marginal probability distribution of the source domain data and the target domain data can be represented by

P (X^{s})

and

Q (X^{t})

respectively. The goal of domain adaptation is to design a robust model that can predict the labels of target domain data using samples from the source domain. As a matter of fact,

D_{s}

and

D_{t}

are neither identical nor uncorrelated, so their edge probability distributions are different. For domain adaptation problems, the source domain data and target domain data come from different domains. Therefore, it is assumed that

P (X^{s}) \neq Q (X^{t})

. However, the distribution of category conditions for the source and target domain data is consistent, i.e.,

P (X^{s} | Y^{s}) = Q (X^{t} | Y^{t})

.

2.2. Formulation of LRDA

We design an objective function for latent space residual domain adaptation, which can simultaneously achieve the following goals: (1) finding a shared latent space to ensure that there are more common feature between the source domain data and the target domain data; (2) regularization the models of source domain and target domain to reduce the complexity of model structure; and (3) constraining in the latent space, the distribution of the source and target domains is consistent. Due to the lack of sufficient label in the target domain, we denote a label matrix

F = [f_{1}^{t}, f_{2}^{t}, \dots, f_{n_{t}}^{t}] \in R^{n_{t} \times c}

corresponding to the target domain samples

X^{t}

. If the sample has a label, set

f_{i}^{t} = y_{i}^{t}

; otherwise

f_{i}^{t}

is a pseudo label. Therefore, we can express our model using the following objective functions:

\begin{matrix} R & = α (L (f_{s} (X^{s}), Y^{s}) + L (f_{t} (X^{t}), F)) \\ + β (Ω_{r} (f_{s}) + Ω_{r} (f_{t})) \\ + γ Ω_{d} (X^{s}, X^{t}), \end{matrix}

(1)

where

f_{s}

and

f_{t}

are source and target domain classifiers, respectively.

L (\cdot)

comprises regression losses in the source and target domains.

Ω_{r}

is the regularization term of the source domain and the target domain to reduce the generalization error of model.

Ω_{d}

is used to align the distribution of data between the source and target domains.

α

,

β

,

γ

are regularization hyper parameter to adjust the weight.

Definition 1

(Residual model). Assuming data is x, the objective function of the residual model can be expressed as

f (x) = x + F (x)

, where

F (x) = f (x) - x

is residual function. The structure of the residual model is shown in Figure 2a.

Ref. [22] proved that it is less difficult to optimize residuals

F (x)

than optimization functions

f (x)

. Therefore, the objective function can be transformed into the following: let model

F (x)

approximate the residual function

f (x) - x

and use

x + F (x)

to express the objective function.

Inspired by the residual model, we use the feature of the source and target domains to construct a shared latent feature space to constrain the classifier to integrate the common feature of the source and target domain. The latent feature space and the original feature space are constructed as a residual model, reducing the difficulty of fitting the model and improving the performance, as shown in Figure 2b. Specifically, the classifier of the shared feature space consists of two parts: the decision function of the original data and the decision function of the shared feature, which can be represented as follows:

\begin{matrix} f_{s} (X^{s}) = {W_{s}}^{T} X^{s} + V_{s}^{T} θ^{T} X^{s} \\ f_{t} (X^{t}) = {W_{t}}^{T} X^{t} + V_{t}^{T} θ^{T} X^{t}, \end{matrix}

(2)

where

W_{s}

and

W_{t}

are the classification model vectors of the source domain and the target domain in the original space.

θ \in R^{r \times d}

comprises shared latent spatial transformation matrices used to map the source domain and the target domain to the same feature space. r is the dimension of the shared feature space,

V_{s}

and

V_{t}

are the classification model vectors of

θ X^{s}

and

θ X^{t}

, respectively. The source domain and target domain information are shared by

θ

.

Then the regression loss of the residual model classifier is:

\begin{matrix} L (f_{s} (X^{s}), Y^{s}) = {∥{W_{s}}^{T} X^{s} + V_{s}^{T} θ^{T} X^{s} - Y^{s}∥}^{2} \\ L (f_{t} (X^{t}), F) = {∥{W_{t}}^{T} X^{t} + V_{t}^{T} θ^{T} X^{s} - F∥}^{2} . \end{matrix}

(3)

In order to improve the robustness of the model, we conduct sparse regression on the feature extracted from the source domain and the target domain [23]. We use

l_{2, 1}

-norms [24] to sparse the classification model vectors of the original space to obtain the function of

Ω_{r}

:

\begin{matrix} Ω_{r} (f_{s}) = {∥W_{s}∥}_{2, 1} + {∥W_{s} + θ V_{s}∥}^{2} \\ Ω_{r} (f_{t}) = {∥W_{t}∥}_{2, 1} + {∥W_{t} + θ V_{t}∥}^{2} . \end{matrix}

(4)

Due to the source and target domain commonly being

P (X^{s}) \neq Q (X^{t})

, that is, inconsistent distribution between source and target domain. If simply applying a source domain classifier to target domain, it may lead to deterioration of model performance due to misaligned domain distribution. How to minimize the distribution gap between different domains is a key issue for DA. Therefore, we use the MMD (maximum mean discrepancy) [25] to calculate the average vector distance of data in the reproducing kernel Hilbert space, so as to measure the distribution difference between different domains. Therefore, minimizing Equation (5) can align the feature distribution differences in the shared space, and the transformed source and target domains have common latent space distribution feature:

\begin{matrix} Ω_{d} (X^{s}, X^{t}) & = {∥\frac{1}{n_{s}} θ^{T} X^{s} - \frac{1}{n_{t}} θ^{T} X^{t}∥}^{2} \\ = t r (θ^{T} X D X^{T} θ) \\ s . t . θ^{T} θ & = I_{r \times r}, \end{matrix}

(5)

where

D_{i, j} = \{\begin{matrix} \begin{matrix} \frac{1}{n_{s}^{2}}, & w h e n & x_{i}, x_{j} \in X^{s} \end{matrix} \\ \begin{matrix} \frac{1}{n_{t}^{2}}, & w h e n & x_{i}, x_{j} \in X^{t} \end{matrix} \\ \frac{- 1}{n_{s} n_{t}}, o t h e r s \end{matrix}

,

X = [X^{s}, X^{t}]

, r is the dimension of the shared hidden space according to matrix theory. By minimizing Equation (5), the feature in latent space has a similarly distribution, also ensuring both discriminant and domain invariance of the model.

2.3. Final Formulation

In summary, we propose a latent space domain adaptation with residual model. The algorithm schematic is shown in Figure 1c. Combining Equations (3)–(5), the objective function is as follows:

\begin{matrix} R_{1} = & arg min_{W_{s}, W_{t}, V_{s}, V_{t}, F, θ} \\ + & α ({∥{X^{s}}^{T} W_{s} + X^{s T} V_{s} θ - Y^{s}∥}^{2} \\ + & {∥{X^{s}}^{T} W_{t} + X^{s T} V_{t} θ - F∥}^{2}) \\ + & β ({∥W_{s}∥}_{2, 1} + {∥W_{t}∥}_{2, 1} \\ + & {∥W_{s} + θ V_{s}∥}^{2} + {∥W_{t} + θ V_{t}∥}^{2}) \\ + & γ t r (θ^{T} X D X^{T} θ) . \\ s . t . θ^{T} θ = & I_{r \times r} \\ F F^{T} = & I_{r \times r} \end{matrix}

(6)

The objective function is solved by minimize the

W_{s}

,

W_{t}

,

V_{s}

,

V_{t}

, F and

θ

, and we apply these parameters to the decision functions

f_{s} (X^{s})

and

f_{t} (X^{t})

. Thus we fuse decision functions linearly as final decision functions:

\begin{matrix} m = max_{m} {(y_{i}^{t} = φ f^{s} (x_{i}^{t}) + (1 - φ) f^{t} (x_{i}^{t}))}_{m} . \end{matrix}

(7)

where

φ \in [0, 1]

is hyper parameter to adjust the weight between source and the target domain classifiers. To simplify, we set

φ = 0.5

.

For the convenience of subsequent solutions, this paper converts Equation (6) to

\begin{matrix} R_{2} = & arg min_{U_{s}, U_{t}, V_{s}, V_{t}, F, θ} \\ + & α ({∥{X^{s}}^{T} U_{s} - Y^{s}∥}^{2} + {∥{X^{t}}^{T} U_{t} - F∥}^{2}) \\ + & β ({∥U_{s}∥}^{2} + {∥U_{t}∥}^{2} \\ + & {∥U_{s} - θ V_{s}∥}_{2, 1} + {∥U_{t} - θ V_{t}∥}_{2, 1}) \\ + & γ t r (θ^{T} X D X^{T} θ), \end{matrix}

(8)

where

U_{s} = W_{s} + θ V_{s}

,

U_{t} = W_{t} + θ V_{t}

, are auxiliary variables for simplify the calculation.

3. Optimization Algorithm

To facilitate optimization, we refer to parameter alternation optimization methods in fuzzy clustering; we divide six variables into four groups, namely {

U_{s}

,

U_{t}

}, {

V_{s}

,

V_{t}

}, {F}, and {

θ

}. We iteratively optimize these sets of parameters until the variables converge or the objective function is less than the threshold. Next, we optimize each part of the functions in Equation (8).

3.1. Optimization Procedure

3.1.1. Optimize $V_{s}$ , $V_{t}$ by Fixing $U_{s}$ , $U_{t}$ , F and $θ$

When

V_{s} = θ^{T} U_{s}

and

V_{t} = θ^{T} U_{t}

, the objective function

R_{2}

can obtain the minimum value regarding the variable

V_{s}

and

V_{s}

.

Proof.

Assuming that the variables

U_{s}

,

U_{t}

, F and

θ

are known, Equation (8) can be transformed into:

\begin{matrix} R_{3} = arg min_{V_{s}, V_{t}} β ({∥U_{s} - θ V_{s}∥}_{2, 1} + {∥U_{t} - θ V_{t}∥}_{2, 1}) . \end{matrix}

(9)

Let:

\begin{matrix} L_{1} = arg min_{V_{s}} β t r ({(U_{s} - θ V_{s})}^{T} M_{s} (U_{s} - θ V_{s})), \\ L_{2} = arg min_{V_{t}} β t r ({(U_{t} - θ V_{t})}^{T} M_{t} (U_{t} - θ V_{t})) . \end{matrix}

(10)

In terms of the definition in Nie et al. [26], for matrix

A \in R^{n \times d}

, we can derive

{∥A∥}_{2, 1} = 2 t r (A^{T} M A)

, where

M_{i i} = \frac{1}{2 {∥A_{i, :}∥}_{2}}

. By solving the derivative of Equation (10) w.r.t.

V_{s}

,

V_{t}

and letting it equal to zero:

\begin{matrix} \frac{\partial L_{1}}{\partial V_{s}} = 0 \Rightarrow V_{s} = θ^{T} U_{s}, \\ \frac{\partial L_{2}}{\partial V_{t}} = 0 \Rightarrow V_{t} = θ^{T} U_{t} . \end{matrix}

(11)

□

3.1.2. Optimize $U_{s}$ , $U_{t}$ by Fixing $V_{s}$ , $V_{t}$ , F and $θ$

After obtaining the optimal value obtained from Section 3.1.1, when

U_{s} = {A_{s}}^{- 1} B_{s}

and

U_{t} = {A_{t}}^{- 1} B_{t}

, the objective function

R_{2}

can obtain the minimum value regarding the variable

U_{s}

and

U_{s}

.

Proof.

It is obvious that Equation (8) with fixed

V_{s}

,

V_{t}

, F and

θ

is equivalent to

\begin{matrix} R_{4} = & arg min_{U_{s}, U_{t}} \\ + & α ({∥{X^{s}}^{T} U_{s} - Y^{s}∥}^{2} + {∥{X^{t}}^{T} U_{t} - F∥}^{2}) \\ + & β ({∥U_{s}∥}^{2} + {∥U_{t}∥}^{2} \\ + & {∥U_{s} - θ V_{s}∥}_{2, 1} + {∥U_{t} - θ V_{t}∥}_{2, 1}) . \end{matrix}

(12)

Let

\begin{matrix} L_{3} = & arg min_{U_{s}} α ({∥{X^{s}}^{T} U_{s} - Y^{s}∥}^{2}) \\ + & β ({∥U_{s}∥}^{2} + {∥U_{s} - θ V_{s}∥}_{2, 1}), \\ L_{4} = & arg min_{U_{t}} α ({∥{X^{t}}^{T} U_{t} - F∥}^{2}) \\ + & β ({∥U_{t}∥}^{2} + {∥U_{t} - θ V_{t}∥}_{2, 1}) . \end{matrix}

(13)

by solving the derivative of Equation (13) w.r.t.

V_{s}

,

V_{t}

and letting it equal zero. We have

\begin{matrix} U_{s} = {A_{s}}^{- 1} B_{s}, \\ U_{t} = {A_{t}}^{- 1} B_{t} . \end{matrix}

(14)

where

A_{s} = α X^{s} {X^{s}}^{T} + β + β M_{s}

,

B_{s} = α X^{s} Y^{s} + β M_{s} θ V_{s}

,

A_{t} = α X^{t} {X^{t}}^{T} + β + β M_{t}

and

B_{t} = α X^{t} F + β M_{t} θ V

. □

3.1.3. Optimize F by Fixing $U_{s}$ , $U_{t}$ , $V_{s}$ , $V_{t}$ and $θ$

After obtaining the optimal values from Section 3.1.1 and Section 3.1.2, the objective function

R_{2}

can obtain the minimum value when

F = {X^{t}}^{T} U_{t}

Proof.

Assuming known variables

U_{s}

,

U_{t}

,

V_{s}

,

V_{t}

and

θ

, then Equation (8) can be simplified to be:

\begin{matrix} R_{5} = arg min_{F} α ({∥{X^{t}}^{T} U_{t} - F∥}^{2}) . \end{matrix}

(15)

By setting the derivative of Equation (14) w.r.t. F as 0, the prediction labels F are obtained by

F = {X^{t}}^{T} U_{t}

. □

3.1.4. Optimize $θ$ by Fixing $U_{s}$ , $U_{t}$ , $V_{s}$ , $V_{t}$ and F

When the variable

U_{s}

,

U_{t}

,

V_{s}

,

V_{t}

and F obtain the optimal solution obtained from Section 3.1.1, Section 3.1.2 and Section 3.1.3, the solution of

θ

can be transformed into a singular value decomposition (SVD) problem for the matrix O.

Proof.

Then with fixed

U_{s}

,

U_{t}

and F, we can get the optimal

θ

by solving

\begin{matrix} R_{6} = & arg min_{V_{s}, V_{t}, θ} \\ β ({∥U_{s} - θ V_{s}∥}_{2, 1} + {∥U_{t} - θ V_{t}∥}_{2, 1}) \\ + & γ t r (θ^{T} X D X^{T} θ) \end{matrix}

(16)

To solve the minimum value of Equation (16), let

V_{s} = θ^{T} U_{s}

,

V_{t} = θ^{T} U_{t}

and then Equation (16) can be rewritten as solving the generalized eigen-decomposition problem:

\begin{matrix} R_{7} = & arg min_{θ} 2 β t r (- θ^{T} U U^{T} N θ - θ^{T} N U U^{T} θ) \\ + & γ t r (θ^{T} X D X^{T} θ) \\ = & arg max_{θ} t r (θ^{T} O O^{T} θ), \\ s . t . θ^{T} θ = & I_{r \times r} \end{matrix}

(17)

where

U = [U_{s}, U_{t}]

. Let

O O^{T} = 2 β U U^{T} N + 2 β N U U^{T} - γ X D X^{T}

. According to matrix theory, the solution of the above equation can be obtained by singular value decomposition of the matrix O. Let

O = E Σ G^{T}

, by descending the diagonal elements in the diagonal matrix

Σ

, the first r rows of

E^{T}

are the optimal solutions for

θ

. □

3.2. Algorithm Description

According to the above optimization rules, the method adopts iterative learning strategy for parameter optimization to achieve parameter update and optimization. The algorithm description is shown in Algorithm 1.

Algorithm 1 Latent Space Domain Adaptation with Residual Model

Require: Source domain data and labels

X^{s} \times Y^{s}

, target source domain data and labels

X^{t} \times Y^{t}

, regularization hyper-parameter

α

,

β

,

γ

, shared feature dimension r, the maximal iteration number N.

1:: Set v = 0, set $U_{s}$ , $U_{t}$ , $V_{s}$ , $V_{t}$ , $θ$ as uniform distribution matrix in $U [0, 1]$ , set threshold $ε$ of objective function
2:: while $v > N$ or $R_{2} < ε$ do
3:: Compute $V_{s} = θ^{T} U_{s}$ and $V_{t} = θ^{T} U_{t}$
4:: Compute $A_{s} = α X^{s} {X^{s}}^{T} + β + β M_{s}$ and $B_{s} = α X^{s} Y^{s} + β M_{s} θ V_{s}$
5:: Compute $A_{t} = α X^{t} {X^{t}}^{T} + β + β M_{t}$ and $B_{t} = α X^{t} F + β M_{t} θ V_{t}$
6:: Compute $U = A^{- 1} B$ and get $U_{s}$ , $U_{t}$
7:: Compute $F = {X^{t}}^{T} U_{t}$
8:: Compute $O O^{T} = 2 β U U^{T} N + 2 β N U U^{T} - γ X D X^{T}$ , let $O = E Σ G^{T}$ , by descending the diagonal elements in the diagonal matrix $Σ$ , the first r rows of $E^{T}$ are the optimal solutions for $θ$ .
9:: Calculate the objective function $R_{2}$ according to Equation (8).
10:: Let $v = v + 1$
11:: end while

4. Experiment and Analysis

In order to verify the effectiveness of LRDA, we experiment on 6 visual databases as benchmark domain databases, and compare with existing related methods.

4.1. Datasets

Table 1 visually presents detailed information about the six datasets used in the experiment. Next, we will introduce each dataset.

1.: Office31 [27] contains images from Amazon (A), DSLR (D), and Webcam (W). There are 31 categories. Referring to [28], we use the characteristics of fine-tuning on AlexNet-FC7.
2.: Office-Caltech contains 10 categories of images between Caltech-256 (C) [29] and Office31, and each type contains more than 80 images. Among them, the number of images from sources C, A, W, and D are 1123, 958, 295, and 157. Office-Caltech provides SURF features and DeCAF features [30].
3.: Office-Home [31] includes the four following domains: art (sketching, painting, decoration, and other forms, A), clipart (collection of clipping images, Cl), products (object images without a background, Pr), and real-world (object images taken with ordinary cameras, RW). Each domain consists of 65 categories.
4.: PIE [32] involves the pose, illumination, and expressions of faces, including 68 people with 13 attitudes. Referring to [33], we select C05 (left), C07 (upward), C09 (down), C27 (front), and C29 (right).
5.: MNIST-UPS is composed of handwritten digital datasets MNIST [34] and UPS [35]. The experiments in this article are consistent with [35]; we select 2000 and 1800 images from MNIST and UPS, and the size of each image is 16 × 16.
6.: COIL20 [36] contains 20 objects, each rotating $360^{°}$ horizontally and capturing an image every $5^{°}$ , obtaining 72 images per object. According to the direction of the shooting target, the database is divided into two subsets, namely COIL1 and COIL2. Specifically, COIL1 includes all images captured along the [0; 85] and [180; 265] directions, while COIL2 includes the remaining directions.

4.2. Benchmark Methods & Experimental Settings

The proposed algorithm can be applied to both shallow and deep models, and it is a universal optimization paradigm. Therefore, classic traditional algorithms and domain adaptation methods with good performance in recent years have been selected for comparison to verify the effectiveness of the algorithm. Specifically, we compare LRDA with shallow and deep unsupervised domain adaptation methods. The shallow methods are: SVM, GFK [37], JDA [33], DIP [38], CDDA [39], JGSA [30], SA [40], CORAL [28], ATI [41], DICE [42]. The deep methods are: DDC [43], DAN [44], DANN [45], DRCN [46], RTN [47], WDAN [48], JAN [49], ADDA [50], AutoDIAL [47], CAN [51], SFDA [9], GVB-GD [52], GSDA [53], SRDC [54], and DGA-DA [55]. In addition, we rerun public code of JGSA [30] and CORAL [28] and replicate the code of DGA-DA. The original experimental results of other methods were obtained from corresponding papers. We run the CORAL source code in LIBSVM and selected the optimal variant DICESVM method for its performance.

Parameter Setting: In the method of LRDA, there are three hyper parameters need to be set, which are used to balance the importance of residual classifiers, feature selection, and feature distribution alignment. Therefore, these three parameters have a crucial impact on the final performance of the algorithm. How to set is still an unresolved issue in machine learning. Referring to the work of Tao et al. [56], we use grid search strategy to solve this problem. Specifically, we adjust hyper parameters within the range of grids {

10^{- 6}

,

10^{- 5}

,…,

10^{5}

,

10^{6}

} to find the parameter combination with the highest accuracy. The iteration limit was fixed at 100. At the same time, when the result of the objective function is less than the threshold

ε = 0.01

, the model training is completed and optimization is stopped. In order to achieve optimal results, all experimental algorithm parameters will be searched and adjusted.

Performance Evaluation: Due to target domain data without label, standard cross validation cannot be performed. To address this issue, we conduct p-fold cross validation on labeled source domain data, which trains using p-1 fold source domain data and target domain data, then calculates the average accuracy of all category data on 1-fold source domain data. The optimal parameters are obtained when the accuracy is highest. We repeated each experiment three times and calculated the average accuracy.

4.3. Results of Cross-Domain Recognition

In this section, we compare LRDA with other unsupervised adaptive peer methods. In the experimental results, the left side of → represents the source domain and the right side represents the target domain.

4.3.1. Results and on the Office31 Dataset

The comparison between LRDA and shallow domain adaptation method on the Office31 dataset is shown in Table 2. LRDA can basically achieve the optimal recognition accuracy in subtasks. The results of deep domain adaptation methods are shown in Table 3. It is worth noting that whether compared with shallow models or deep models, the recognition accuracies of LRDA in tasks A→D and W→D are slightly lower than comparison methods. The reason is LRDA needs a certain amount of target domain data to provide information for model decision-making, resulting in a greater preference for source domain data in the model. Overall, LRDA generally achieves good performance on both shallow and deep models, achieving 76.6% and 77.8% respectively, indicating its good applicability. In the comparison of deep models, LRDA does not have obvious advantages. However, deep models need a lot of extra training images and computing resource. This is not available in many practical applications and advanced computing platforms.

4.3.2. Results on the Office-Caltech Dataset

This section conducts cross-domain recognition experiments on Office-Caltech, with reference settings [8,9]. The results with SURF and DeCAF6 features are shown in Table 4 and Table 5. The LRDA achieves the highest recognition accuracy among 10 cross-domain recognition tasks with SURF features, only on C→D and W→D tasks are slightly lower than DICE and DGA-DA. In addition, LRDA achieves the highest average accuracy, demonstrating the effectiveness of the method. In experiments with DeCAF6 features, accuracy of LRDA is above 85% in all settings and achieves the best results in average accuracy particularly. Comparing with RTN based on the deep learning model AlexNet, LRDA achieves higher accuracy in all five experimental settings, with a higher accuracy of 5.4% in the D→C compared to RTN, resulting in significant performance improvement. This is due to the significant distribution differences between images of different concepts, it is difficult to align the fields. LRDA can further explore common features between different fields, thereby enhancing the ability of discriminators.

4.3.3. Results on the Office-Home Dataset

Table 6 shows the cross-domain recognition results on Office-Home dataset with ResNet101-P5 features. We fine-tune the parameters on the ResNet101 and obtain the results by inputting the extracted fifth feature pooling layer into the unsupervised domain for adaptive learning. Notice that LRDA achieves the best recognition performance in five domain recognition tasks, with an average recognition accuracy rate 0.2% lower than DICE. We consider that DICE excludes the features of different types of data while incorporating the distribution of similar data. Table 7 shows the cross-domain recognition results on Office-Home with VGG-F7 features. This experiment compares both shallow and deep models, and LRDA is the second-best performing method in terms of average accuracy, being only inferior to JAN. Counter-intuitively, LRDA achieves better result even than deep methods (e.g., DAN and DANN). In addition, the performance of GSDA, GVB-GD, SRDC, and SFDA methods is superior to that of LRDA methods in some tasks. The results indicate that above methods consider the differences between different categories of data or maintain the distribution structure of intra class data when extracting domain features, thereby improving the performance of the above methods. Nevertheless, LRDA still has good performance, especially achieving better performance than deep methods, indicating that shallow methods have advantages in domain adaptation tasks, reducing algorithm complexity and improving model efficiency while ensuring algorithm accuracy.

4.3.4. Results on the PIE Dataset

In the task of cross-domain facial recognition, this experiment uses two data preprocessing methods (

l_{2}

normalization and z-score standardization) [42]. The experimental settings divide into two groups, and the recognition accuracies are shown in Table 8 and Table 9. In the experiment using the

l_{2}

normalization (Table 8), we can clearly find that LRDA achieves the highest recognition accuracy in four tasks compared to shallow and deep methods. Moreover, LRDA has the highest average accuracy. In the experiment of z-score standardization, LRDA achieves recognition performance comparable to DICE and far superior to other recognition methods. Affected by the features in the background area in facial data, LRDA focuses on how to effectively convert features between the source and target domains, possibly aligning interference features in the background area. However, the recognition accuracy of LRDA on the PIE dataset is still comparable.

4.3.5. Results on the MNIST-USPS & COIL20 Dataset

In order to further demonstrate the advantages of the proposed method in cross-domain adaptation tasks of digital images, Table 10 shows the cross-domain recognition performance on the MNIST-UPS dataset and COIL20. As can be seen from the first three rows, LRDA achieves the highest average accuracy, as well as U→M task. Nevertheless, the performance of LRDA is slightly lower than that of DGA-DA and DICE on the M→U task, possibly due to the significant differences between these two fields. In the M→U task, the feature distribution is difficult to close, and the feature projection methods of DGA-DA and DICE are more suitable for this type of data. The comparison results in last three rows of Table 9 indicate that the performance of LRDA is close to 100%, only slightly lower than DGA-DA, with a recognition accuracy reduction of approximately 0.5%. This is due to the exclusion of heterogeneous data by the DGA-DA method. Overall, LRDA is still significantly superior to other methods.

4.4. Convergence Analysis

In this section, we explore the convergence of the algorithm. LRDA adapts iterative learning strategy for parameter optimization to achieve parameter update optimization. Experiments use a computer with a memory of 64GB and an Intel i5-9600k CPU, use MATLAB (R2024a) as experimental platform. Figure 3 shows the convergence curve of LRDA. Three sets of experiments use Amazon (A), DSLR (D) and Webcam (W) as the source and target domains in Office31 dataset. The result shows that after 40–50 iterations of optimization, the algorithm achieves convergence in all three sets of experiments, indicating that the objective function of the model can be minimized through parameter iterative learning. It proves that the LRDA method has good convergence and can be optimized through parameter iteration.

4.5. Complexity Comparison

The domain adaptation method requires the extraction of features from two data domains, which lead to an increase in computation time. Table 11 compares the computational time of 8 domain adaptation methods on the Office13 dataset, with the experimental setup being the same as Section 4.4. The experimental result indicates that because of LRDA’s need for iterative optimization, its time cost is relatively high, which also occurs in DICE, CDDA, and CORAL. Because of its closed form, the computation time of GFK, DIP, and SA are relatively low. JDA needs feature decomposition of data; its computation time is increasing with the increase in data volume. However, according to the experiment in Section 4.3, although LRDA requires a longer computation time, its performance is higher than other methods.

5. Conclusions

In this study, we explore the mismatched distribution of latent space features and neglect the association between the original space and latent space. In response to these problems, we propose a latent space domain adaptation with residual model. LRDA reduces the difference in feature distribution in the shared latent space to enhance the feature expression. The objective function constructs the original feature space and the shared feature space as residual model to facilitate parameter optimization, improve model performance. The experiments compare LRDA with the-state-of-art shallow and deep domain adaptation algorithms on six open datasets, verify the proposed method is comparable.

From a broader perspective of cross-domain learning, this study provides a new solution to the multi-domain adaptive problem. Although the current model is mainly aimed at two domain adaptive scenarios, its residual modeling idea can be extended to multi-domain transfer learning, which is expected to realize knowledge transfer from multiple source domains to target domains by constructing hierarchical residual connections or designing multi spatial alignment mechanisms. In addition, in multi-modal data scenarios (such as vision text cross-modal learning), this method can provide the possibility of knowledge transfer between heterogeneous modes through mode-specific feature space decoupling and shared potential space modeling. It should be noted that due to the need to calculate the inter domain decision function, this method increases the time cost of the algorithm, and the balance between accuracy and time consumption is still a contradiction to be solved. In follow-up studies, we will further examine (1) how to batch train adaptive models in the field of large-scale data to reduce the time cost; (2) explore the automatic optimization scheme of super parameter settings (such as the meta learning strategy) to reduce the burden of grid search; and (3) expand the universality of the model in multi-domain migration and multi-modal scenarios, as well as promote cross-domain learning to more complex practical applications.

Author Contributions

Conceptualization, B.Z. and J.P.; data curation, B.Z.; formal analysis, B.Z. and J.P.; funding acquisition, B.Z. and J.P.; investigation, B.Z. and F.Y.; methodology, B.Z. and F.Y.; project administration, B.Z. and J.P.; resources, J.P. and Z.Z.; software, B.Z. and Z.Z.; supervision, F.Y.; validation, J.P.; visualization, Z.Z. and J.P.; writing—original draft, B.Z.; writing—review and editing, B.Z., J.P. and Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the key projects of Ningbo Education Science Planning in 2025 no. 2025YZD023 and Ningbo Natural Science Foundation no. 2023J242.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study is openly available in Office31, Caltech-256, Office-Home, PIE, MNIST-UPS and COIL20, reference number [27,29,31,32,34,35,36].

Conflicts of Interest

The authors declare no conflicts of interest.

References

Huang, J.; Smola, A.; Gretton, A.; Borgwardt, K.; Scholkopf, B. Correcting sample selection bias by unlabeled data. In Advances in Neural Information Processing Systems 19: Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4–7 December 2006; MIT Press: Cambridge, MA, USA, 2007. [Google Scholar]
Sugiyama, M.; Storkey, A.J. Mixture regression for covariate shift. In Advances in Neural Information Processing Systems 19: Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4–7 December 2006; MIT Press: Cambridge, MA, USA, 2007. [Google Scholar]
Sugiyama, M.; Nakajima, S.; Kashima, H.; Buenau, P.; Kawanabe, M. Direct importance estimation with model selection and its application to covariate shift adaptation. In Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; Curran Associates, Inc.: Red Hook, NY, USA, 2008. [Google Scholar]
Ghifary, M.; Balduzzi, D.; Kleijn, W.B.; Zhang, M. Scatter component analysis: A unified framework for domain adaptation and domain generalization. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1414–1430. [Google Scholar] [CrossRef]
Evgeniou, T.; Micchelli, C.A.; Pontil, M.; Shawe-Taylor, J. Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 2005, 4, 615–637. [Google Scholar]
Duan, L.; Tsang, I.W.; Xu, D. Domain transfer multiple kernellearning. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 465–479. [Google Scholar] [CrossRef]
Wang, S.; Wang, B.; Zhang, Z.; Heidari, A.A.; Chen, H. Class-aware sample reweighting optimal transport for multi-source domain adaptation. Neurocomputing 2024, 523, 213–223. [Google Scholar] [CrossRef]
Rostami, M.; Rostami, M.; Bose, D.; Narayanan, S.; Galstyan, A. Domain adaptation for sentiment analysis using robust internal representations. In Findings of the Association for Computational Linguistics: EMNLP; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 11484–11498. [Google Scholar]
Ding, N.; Xu, Y.; Tang, Y.; Xu, C.; Wang, Y.; Tao, D. Source-free domain adaptation via distribution estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7212–7222. [Google Scholar]
Ge, C.; Huang, R.; Xie, M.; Lai, Z.; Song, S.; Li, S.; Huang, G. Domain adaptation via prompt learning. IEEE Trans. Neural Netw. Learn. Syst. 2023, 36, 1160–1170. [Google Scholar] [CrossRef] [PubMed]
Kay, J.; Haucke, T.; Stathatos, S.; Deng, S.; Young, E.; Perona, P.; Beery, S.; Van Horn, G. Align and distill: Unifying and improving domain adaptive object detection. arXiv 2024, arXiv:2403.12029. [Google Scholar] [CrossRef]
Gheisari, M.; Baghshah, M.S. Unsupervised domain adaptation via representation learning and adaptive classifier learning. Neurocomputing 2015, 165, 300–311. [Google Scholar] [CrossRef]
Zheng, V.W.; Pan, S.J.; Yang, Q.; Pan, J.J. Transferring Multi-device Localization Models using Latent Multi-task Learning. In Proceedings of the 23rd National Conference on Artificial Intelligence, Chicago, IL, USA, 13–17 July 2008; pp. 1427–1432. [Google Scholar]
Blitzer, J.; McDonald, R.; Pereira, F. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, Sydney, Australia, 22–23 July 2006; pp. 120–128. [Google Scholar]
Jiang, L.; Hauptmann, A.G.; Xiang, G. Leveraging high-level and low-level features for multimedia event detection. In Proceedings of the 20th ACM international conference on Multimedia, Nara, Japan, 29 October–2 November 2012; pp. 449–458. [Google Scholar]
Xu, H.; Guo, H.; Yi, L.; Ling, C.; Wang, B.; Yi, G. Revisiting Source-Free Domain Adaptation: A New Perspective via Uncertainty Control. In Proceedings of the The Thirteenth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024. [Google Scholar]
Tao, J.; Xu, H. Discovering domain-invariant subspace for depression recognition by jointly exploiting appearance and dynamics feature representations. IEEE Access 2019, 7, 186417–186436. [Google Scholar] [CrossRef]
Wu, Y.; Li, Z.; Wang, C.; Zheng, H.; Zhao, S.; Li, B.; Tao, D. Domain re-modulation for few-shot generative domain adaptation. Adv. Neural Inf. Process. Syst. 2024, 36, 57099–57124. [Google Scholar]
Aimei, D.; Shitong, W. A Shared Latent Subspace Transfer Learning Algorithm Using SVM. Acta Autom. Sin. 2014, 40, 2276–2287. [Google Scholar]
Yao, Z.; Tao, J. Multi-source adaptation multi-label classification framework via joint sparse feature selection and shared subspace learning. Comput. Eng. Appl. 2017, 53, 88–96. [Google Scholar]
Zhang, Y.; Tao, J.; Yan, L. Domain-Invariant Label Propagation With Adaptive Graph Regularization. IEEE Access 2024, 12, 190728–190745. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Shi, X.; Guo, Z.; Lai, Z.; Yang, Y.; Bao, Z.; Zhang, D. A framework of joint graph embedding and sparse regression for dimensionality reduction. IEEE Trans. Image Process. 2015, 24, 1341–1355. [Google Scholar] [CrossRef] [PubMed]
Ma, Z.; Nie, F.; Yang, Y.; Uijlings, J.R.R.; Sebe, N. Web image annotation via subspace-sparsity collaborated feature selection. IEEE Trans. Multimed. 2012, 14, 1021–1030. [Google Scholar] [CrossRef]
Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Bernhard, S.; Smola, A.J. A kernel method for the two-sample-problem. In Advances in Neural Information Processing Systems 19: Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4–7 December 2006; MIT Press: Cambridge, MA, USA, 2007. [Google Scholar]
Nie, F.; Huang, H.; Cai, X.; Ding, C.H.Q. Efficient and Robust Feature Selection via Joint L2,1-Norms Minimization. In Proceedings of the 24th International Conference on Neural Information Processing Systems, Vancouver, BC, USA, 6–9 December 2010; pp. 1813–1821. [Google Scholar]
Saenko, K.; Kulis, B.; Fritz, M.; Darrell, T. Adapting visual category models to new domains. In Computer VisionECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010, Proceedings, Part IV 11; Springer: Berlin/Heidelberg, Germany, 2010; pp. 213–226. [Google Scholar]
Sun, B.; Feng, J.; Saenko, K. Return of frustratingly easy domain adaptation. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
Griffin, G.; Holub, A.; Perona, P. Caltech-256 Object Category Dataset; California Institute of Technology: Pasadena, CA, USA, 2007. [Google Scholar]
Zhang, J.; Li, W.; Ogunbona, P. Joint geometrical and statistical alignment for visual domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1859–1867. [Google Scholar]
Venkateswara, H.; Eusebio, J.; Chakraborty, S.; Panchanathan, S. Deep hashing network for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5018–5027. [Google Scholar]
Sim, T.; Baker, S.; Bsat, M. The CMU pose, illumination and expression database. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1615–1618. [Google Scholar] [CrossRef]
Long, M.; Wang, J.; Ding, G.; Sun, J.; Yu, P.S. Transfer feature learning with joint distribution adaptation. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2200–2207. [Google Scholar]
LeCun, Y.; Bottou, L. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
Hull, J.J. A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 550–554. [Google Scholar] [CrossRef]
Nene, S.A.; Nayar, S.K.; Murase, H. Columbia Object Image Library (Coil-20). Technical Report CUCS-005-96. 1996. Available online: https://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php (accessed on 27 July 2025).
Gong, B.; Shi, Y.; Sha, F.; Grauman, K. Geodesic flow kernel for unsupervised domain adaptation. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 2066–2073. [Google Scholar]
Baktashmotlagh, M.; Harandi, M.T.; Lovell, B.C.; Salzmann, M. Unsupervised domain adaptation by domain invariant projection. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 769–776. [Google Scholar]
Luo, L.; Wang, X.; Hu, S.; Wang, C.; Tang, Y.; Chen, L. Close yet distinctive domain adaptation. arXiv 2017, arXiv:1704.04235. [Google Scholar] [CrossRef]
Fernando, B.; Habrard, A.; Sebban, M.; Tuytelaars, T. Unsupervised visual domain adaptation using subspace alignment. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2014; pp. 2960–2967. [Google Scholar]
Busto, P.P.; Gall, J. Open set domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 754–763. [Google Scholar]
Liang, J.; He, R.; Sun, Z.; Tan, T. Aggregating randomized clustering-promoting invariant projections for domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1027–1042. [Google Scholar] [CrossRef]
Tzeng, E.; Hoffman, J.; Zhang, N.; Saenko, K.; Darrell, T. Deep domain confusion: Maximizing for domain invariance. arXiv 2014, arXiv:1412.3474. [Google Scholar] [CrossRef]
Long, M.; Wang, J. Learning transferable features with deep adaptation networks. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 97–105. [Google Scholar]
Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 2016, 17, 2030–2096. [Google Scholar]
Ghifary, M.; Kleijn, W.B.; Zhang, M.; Balduzzi, D.; Li, W. Deep reconstruction-classification networks for unsupervised domain adaptation. In Computer VisionECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Proceedings, Part IV 14; Springer: Cham, Switzerland, 2016; pp. 597–613. [Google Scholar]
Long, M.; Wang, J.; Jordan, M.I. Unsupervised domain adaptation with residual transfer networks. In Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
Yan, H.; Ding, Y.; Li, P.; Wang, Q.; Xu, Y.; Zuo, W. Mind the class weight bias: Weighted maximum mean discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2272–2281. [Google Scholar]
Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep transfer learning with joint adaptation networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 2208–2217. [Google Scholar]
Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7167–7176. [Google Scholar]
Kang, G.; Jiang, L.; Wei, Y.; Yang, Y.; Hauptmann, A. Contrastive adaptation network for single-and multisource domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1793–1804. [Google Scholar] [CrossRef] [PubMed]
Cui, S.; Wang, S.; Zhuo, J.; Su, C.; Tian, Q. Gradually vanishing bridge for adversarial domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12455–12464. [Google Scholar]
Hu, L.; Kan, M.; Shan, S.; Chen, X. Unsupervised domain adaptation with hierarchical gradient synchronization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4043–4052. [Google Scholar]
Tang, H.; Chen, K.; Jia, K. Unsupervised domain adaptation via structurally regularized deep clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8725–8735. [Google Scholar]
Luo, L.; Chen, L.; Hu, S.; Lu, Y.; Wang, X. Discriminative and geometry aware unsupervised domain adaptation. IEEE Trans. Cybern. 2020, 50, 3914–3927. [Google Scholar] [CrossRef]
Tao, J.; Dan, Y.; Zhou, D.; He, S. Robust latent multi-source adaptation for encephalogram based emotion recognition. Front. Neurosci. 2022, 16, 850–906. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Comparison diagram with the existing latent space domain adaptive algorithm. LRDA achieves feature reuse by sharing latent space transfer features, constructs a cross-domain residual learning model with latent space as residual connections to improve feature propagation efficiency, and introduces feature space normalization to generate sparse representations to enhance model robustness.

Figure 2. Schematic diagram of residual model. (a) Residual network, (b) latent space residual model.

Figure 3. Convergence curve of LRDA algorithm.

Table 1. Details of six domain adaptation datasets.

Database		Subset		Image Feature	Classes
Office31	Amazon	A	2817		31
	DSLR	D	498	AlexNet-FC7 (4096)
	Webcam	W	795
Office-Caltech	Amazon	A	958		10
	Caltech	C	1123	SURF (800)
	DSLR	D	157	DeCAF6 (4096)
	Webcam	W	295
Office-Home	Art	A	2421		65
	Clipart	Cl	4379	ResNet101-P5
	Product	Pr	4428	VGG-FC7
	Real-World	Rw	4357
PIE	C05	P1	3332		68
	C06	P2	1629
	C07	P3	1632	Pixel (1024)
	C08	P4	3329
	C09	P5	1632
MNIST-USPS	MNIST	M	2000	VGG-FC7	10
MNIST-USPS	USPS	U	1800	VGG-FC7	10
COIL20	COIL1	C1	720	VGG-FC7	20
COIL20	COIL2	C2	720	VGG-FC7	20

Table 2. Comparison of results with shallow DA methods on the Office31.

DATA	SVM	GFK	JDA	DIP	CDDA	JGSA	SA	CORAL	ATI	DICE	LRDA
A→D	57.8	58.2	66.5	56	64.1	67.5	59.4	60.4	70.3	68.5	67.8
A→W	56.9	59.4	68.8	51.9	65.2	62.3	57.7	57	68.7	72.5	74.5
D→A	47.2	45.9	56.3	44	55	55.6	47.2	47.6	55.3	58.1	59
D→W	95.8	95.6	97.7	95.3	97.2	98.1	95.1	96.2	95	97.2	98.4
W→A	45.5	43.8	53.5	42.3	53.8	52	46.5	46.3	56.9	60.3	60.8
W→D	98.6	98.6	99.6	98.8	99.8	99.8	99	99	98.7	100	98.9
Avg.	67.0	66.9	73.7	64.7	72.5	72.5	67.5	67.8	74.2	76.1	76.6

Table 3. Comparison of results with deep DA methods on the Office31.

DATA	DDC	DAN	DANN	DRCN	RTN	WDAN	JAN	ADDA	AutoDIAL	CAN	GVB-GD	GSDA	SRDC	SFDA	LRDA
A→D	64.4	67	72.3	66.8	71	64.5	71.8	72.4	73.6	74.6	75	74.8	75.8	76	74.4
A→W	61.8	68.5	73	68.7	73.3	66.8	74.9	75.1	75.5	79.3	74.8	75.7	75.7	74.2	76.8
D→A	52.1	54	53.4	56	50.5	53.8	58.3	58.8	58.1	60.1	53.4	53.5	56.7	56.6	58.7
D→W	95	96	96.4	96.4	96.8	95.9	96.6	97	96.6	97	98.7	99.1	99.2	98.5	99.1
W→A	52.2	53.1	51.2	54.9	51	52.7	55	57.3	59.4	58.7	53.7	54.9	57.1	55.5	58.8
W→D	98.5	99	99.2	99	99.6	98.7	99.5	99.6	99.5	97	100	100	100	99.8	99.5
Avg.	70.6	72.9	74.3	73.6	73.7	72.1	76	76.7	77.1	77.8	75.9	76.3	77.4	76.8	77.9

Table 4. Comparison of results on Office-Caltech with SURF feature.

DATA	GFK	JDA	DIP	OTGL	CDDA	JGSA	SA	CORAL	DICE	DGA-DA	LRDA
A→C	43.6	39.4	40	34.6	39.6	41.5	44.3	45.1	44.3	38.0	45.2
A→D	44.6	39.5	36.9	38.9	38.2	45.2	36.3	39.5	52.2	42.0	52.1
A→W	45.1	38.0	35.9	37	46.1	45.1	38.3	44.4	53.2	52.2	54.1
C→A	51.6	44.8	40.8	44.2	49.4	53.1	54.8	54.3	53.8	50.3	54.6
C→D	43.3	45.2	45.2	44.5	49.7	48.4	45.2	36.3	51.6	52.9	50.9
C→W	44.1	41.7	37.3	38.9	38.6	48.5	44.4	38.6	54.9	45.8	55.2
D→A	39.1	33.1	31.5	37.2	32.8	38.7	39.4	37.7	42.6	37.1	43.7
D→C	31.7	31.5	30.6	32.4	33.7	30.3	34.3	33.8	34.2	33.5	34.5
D→W	85.4	89.5	83.7	81.1	89.8	93.2	85.1	84.7	84.1	91.2	91.2
W→A	31.8	32.8	27.6	39.4	36.7	40.8	36.3	35.9	39.5	32.4	40.5
W→C	34.3	31.2	28.8	36.0	32	33.6	33.2	33.7	38.6	31.2	38.7
W→D	87.9	89.2	91.7	84.0	91.1	88.5	83.4	86.6	87.3	91.7	88.7
Avg.	48.5	46.3	44.2	45.7	48.1	50.6	47.9	47.6	53.0	49.8	54.1

Table 5. Comparison of results on Office-Caltech with DeCAF6 feature.

DATA	GFK	JDA	DIP	OTGL	CDDA	JGSA	DGA-DA	SA	CORAL	DICE	JDOT	ATI	RTN	LRDA
A→C	82.1	85	78.9	85.5	85.8	85	86.7	85	83.6	87.6	85.2	86.5	87.8	86.8
A→D	82.2	86.6	80.9	85	86.6	85.4	89.2	87.3	84.7	91.1	87.9	92.8	92.9	91.5
A→W	73.2	83.1	69.8	83.1	81.4	84.8	86.4	76.6	74.2	88.1	84.8	88.7	93.8	92.1
C→A	92.1	91	89.8	92.2	91.6	91.8	92.3	92.2	91.2	93.4	91.5	93.8	93.2	93.5
C→D	92.4	87.9	84.7	87.3	91.7	92.4	92.4	88.5	87.9	95.5	89.8	89.6	93.9	93.4
C→W	84.4	82.4	72.2	84.2	88.8	85.1	89.8	82	80.7	95.3	88.8	93.6	96.6	96.9
D→A	88.4	91	85.3	92.3	92.1	92.3	92.4	85.6	83.8	92.5	88.1	93.4	93.6	93.2
D→C	80.3	85.1	75.5	84.1	86.3	85.8	86.5	76.9	71.6	88.5	84.3	85.9	83.4	88.8
D→W	99	99.7	98.6	96.3	99	98.6	99	96.9	97.6	99	96.6	98.9	98.6	98.4
W→A	84.3	91.1	72.4	90.6	90.4	91.4	90.7	83.6	72.1	91.1	90.7	93.6	92.7	92.9
W→C	76.5	85.3	70.3	81.5	85.5	84.7	85.6	74.3	67.4	88	82.6	86.3	84.8	85.4
W→D	100	100	99.4	96.3	100	100	100	99.4	100	100	98.1	100	100	100
Avg.	86.2	89.1	81.5	88.2	89.9	89.8	90.9	85.7	82.9	92.5	89	91.9	92.6	92.7

Table 6. Comparison of results with ResNet101-P5 features on the Office-Home.

DATA	GFK	JDA	DIP	CDDA	JGSA	DGA-DA	CNN	SA	CORAL	DICE	LRDA
Ar→Cl	35.8	40.5	35.5	40.8	40.8	40.8	36	36.7	36.3	42.6	42.5
Ar→Pr	54.4	58.9	54.3	57.7	58.2	57.7	53.7	54.7	54.1	61.1	60.9
Ar→Rw	65	67.5	64.9	66.3	67.5	66.3	64.6	65.3	65.3	68.3	68.6
Cl→Ar	39.2	40.8	39.5	41.3	40.8	41.3	39.2	39.9	39.2	43.3	43.1
Cl→Pr	48.4	51.9	48	51.7	52	51.7	48.9	48.3	47.9	54.3	54.7
Cl→Rw	51.8	55.2	51.9	53.9	54.6	53.9	51.7	51.4	51.5	57.1	57.5
Pr→Ar	42.3	45.1	41.9	46.1	45.3	46.1	41.2	41.4	41.5	48.3	48.4
Pr→Cl	32.5	33.3	32.1	35.4	33.5	35.4	32.9	33	32.7	35.9	35.5
Pr→Rw	64.1	67.2	63.7	66	66.4	66	63.2	64.4	64.1	69.2	68.8
Rw→Ar	58.1	58.8	58.2	59.1	58.5	59.1	57.9	57.8	57.7	60.2	59.3
Rw→Cl	39.5	44.2	39.2	45.3	43.8	45.3	39.5	40	39.5	46.2	46.0
Rw→Pr	69.6	72.4	69.6	71.6	72.4	71.6	68.9	69.5	69.3	73.5	72.7
Avg.	50.1	53	49.9	52.9	52.8	52.9	49.8	50.2	49.9	55	54.8

Table 7. Comparison of results with VGG-F7 features on the Office-Home.

DATA	GSDA	GVB-GD	SRDC	SFDA	JAN	CDDA	JGSA	DICE	DAN	DANN	LRDA
Ar→Cl	61.3	57	52.3	59.7	45.9	42.6	41.9	43.5	43.6	45.6	44.4
Ar→Pr	76.1	74.7	76.3	79.5	61.2	62.3	63.1	64.2	57	59.3	64.2
Ar→Rw	79.4	79.8	81	82.4	68.9	69.2	70.1	70.5	67 9	70.1	70.6
Cl→Ar	65.4	64.6	69.5	69.7	50.4	46.3	46.5	46.7	45.8	47	46.1
Cl→Pr	73.3	74.1	76.2	78.6	59.7	52.9	54.1	54.6	56.5	58.5	56.9
Cl→Rw	74.3	74.6	78	79.2	61	57.8	59	59.4	60.4	60.9	59.5
Pr→Ar	65	65.2	68.7	66.1	45.8	50.5	50.4	51.8	44	46.1	50.7
Pr→Cl	53.2	55.1	53.8	57.2	43.4	43	42	43.9	43.6	43.7	44.1
Pr→Rw	80	81	81.7	82.6	70.3	69.6	70.5	71.8	67.7	68.5	69.8
Rw→Ar	72.2	74.6	76.3	73.9	63.9	63.1	62.3	64.1	63.1	63.2	63.7
Rw→Cl	60.6	59.7	57.1	60.8	52.4	47.9	46.2	48	51.5	51.8	52.4
Rw→Pr	83.1	84.3	85	85.5	76.8	73.8	73.1	74.3	74.3	76.8	74.7
Avg.	70.3	70.4	71.3	72.9	58.3	56.6	56.6	57.7	56.3	57.6	58.1

Table 8. Recognition accuracies on the PIE Dataset with

l_{2}

regularization.

Table 8. Recognition accuracies on the PIE Dataset with

l_{2}

regularization.

DATA	DIP	OTGL	CDDA	JGSA	SA	CORAL	DGA-DA	GFK	JDA	CDDA	DICE	LRDA
P1→P2	29.9	59.4	76.3	62.2	32.8	31.8	76.4	42.2	76.1	77.5	85.1	83.6
P1→P3	32.8	58.7	72.3	60	34.5	31.9	72.5	53.7	74.1	77	86.9	86.4
P1→P4	36.7	0	92.1	80.6	43.5	41.8	92.1	69.8	92.8	90.3	95.2	95.9
P1→P5	12.7	48.4	60.7	45.1	22.5	19.9	60.8	43.9	70.8	67.8	71.8	72.7
P2→P1	25.8	61.9	77	68.2	27.7	26.6	77	43.2	79.8	77.3	76.8	77.1
P2→P3	53.4	64.4	77.5	64.9	37.3	35	77.5	54	80.2	81.1	79.1	78.6
P2→P4	50.1	0	87.1	77.6	58.5	59.7	87.1	69.1	90.4	88.5	93.5	93.6
P2→P5	29.5	52.7	64.3	52.3	27.1	25.9	63.6	42.6	68.3	70.4	72.3	71.6
P3→P1	22.7	57.9	80.8	62.9	29.1	25.1	80.8	51.6	78.3	81.2	80.1	80.7
P3→P2	36.3	64.7	72.2	60.3	37	36.5	72.2	52	81.2	82	77.7	80.4
P3→P4	45.8	0	84.7	71	54.8	54	84.7	72.1	92.3	91.7	95.1	94.8
Avg.	34.2	42.6	76.8	64.1	36.8	35.3	76.8	54.0	80.4	80.4	83.2	83.2

Table 9. Recognition accuracies on the PIE Dataset with z-score standardization.

DATA	DIP	OTGL	CDDA	JGSA	SA	CORAL	DGA-DA	GFK	JDA	CDDA	DICE	LRDA
P3→P5	20.2	52.8	64.3	51.2	30.5	26	64.5	50.9	70.5	79.9	78.1	78.6
P4→P1	31.4	0	93.6	84.4	52.4	48.3	93.4	72.6	95.1	89.7	96.8	96.1
P4→P2	67.5	0	93.2	83.5	70	69.7	93.2	75.8	94.7	94.8	96.6	95.7
P4→P3	76.8	0	92.2	80.8	72.7	72.7	92.2	80.9	92.4	92.1	94.5	94.0
P4→P5	36.5	0	74	65.9	48.6	48.5	74	61.1	80.8	85.1	90.4	89.8
P5→P1	14.2	45.7	68.1	53.5	34.5	32	67.7	45.3	64.1	67.3	79.4	78.8
P5→P2	29.3	51.3	65.1	57.5	30.9	30.4	65.4	38.9	74.2	74.5	71.4	72.2
P5→P3	31.7	52.6	70.5	54.3	31.9	32.6	71.6	47.7	75.3	79.3	82.7	82.3
P5→P4	26.3	0	79.7	62.3	45.1	44.5	79.7	59.3	81.7	80.3	89.5	88.9
Avg.	37.1	22.5	77.9	65.9	46.3	45.0	78.0	59.2	81.0	82.3	86.6	86.3

Table 10. Recognition accuracies on the MNIST-USPS and COIL20 dataset.

DATA	GFK	SA	JDA	DIP	CDDA	ILS	JGSA	DGA-DA	CORAL	DICE	CPDM	LRDA
M→U	68.6	67.8	70.6	67.1	76.2	71.2	80.4	82.3	35.8	79.7	82.1	80.6
U→M	50.1	48.8	60	46.3	62.1	54.9	68.2	70.8	36.4	59.8	68.8	71.1
MU-Avg.	59.3	58.3	65.3	56.7	69.1	63	74.3	76.5	36.1	69.8	75.4	75.8
C1→C2	86.4	86.8	94.7	84.6	91.5	86.9	95.4	99.6	82.1	92.5	98.5	99.2
C2→C1	85	85	93.5	84	93.9	86.9	93.9	99.7	81.8	94.4	98.7	99.3
C1C2-Avg.	85.7	85.9	94.1	84.3	92.7	86.9	94.7	99.7	81.9	93.5	98.6	99.3

Table 11. Computation time(s) of each domain adaptation method on Office31.

Task	JDA	DIP	SA	GFK	DICE	CDDA	CORAL	LRDA
A→D	71.2	65.1	74.7	77.6	126.3	116.6	64.2	114.8
D→W	45.5	54.4	59.6	723	71.33	96.4	48.4	92.8
W→A	147.1	220.3	135.5	234.7	240.9	319.1	238.2	225.9

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhao, B.; Pan, J.; Zhang, Z.; Yang, F. Cross-Domain Residual Learning for Shared Representation Discovery. Information 2025, 16, 852. https://doi.org/10.3390/info16100852

AMA Style

Zhao B, Pan J, Zhang Z, Yang F. Cross-Domain Residual Learning for Shared Representation Discovery. Information. 2025; 16(10):852. https://doi.org/10.3390/info16100852

Chicago/Turabian Style

Zhao, Baoqi, Jie Pan, Zhijie Zhang, and Fang Yang. 2025. "Cross-Domain Residual Learning for Shared Representation Discovery" Information 16, no. 10: 852. https://doi.org/10.3390/info16100852

APA Style

Zhao, B., Pan, J., Zhang, Z., & Yang, F. (2025). Cross-Domain Residual Learning for Shared Representation Discovery. Information, 16(10), 852. https://doi.org/10.3390/info16100852

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Cross-Domain Residual Learning for Shared Representation Discovery

Abstract

1. Introduction

2. LRDA

2.1. Notation & Definition

2.2. Formulation of LRDA

2.3. Final Formulation

3. Optimization Algorithm

3.1. Optimization Procedure

3.1.1. Optimize V s , V t by Fixing U s , U t , F and θ

3.1.2. Optimize U s , U t by Fixing V s , V t , F and θ

3.1.3. Optimize F by Fixing U s , U t , V s , V t and θ

3.1.4. Optimize θ by Fixing U s , U t , V s , V t and F

3.2. Algorithm Description

4. Experiment and Analysis

4.1. Datasets

4.2. Benchmark Methods & Experimental Settings

4.3. Results of Cross-Domain Recognition

4.3.1. Results and on the Office31 Dataset

4.3.2. Results on the Office-Caltech Dataset

4.3.3. Results on the Office-Home Dataset

4.3.4. Results on the PIE Dataset

4.3.5. Results on the MNIST-USPS & COIL20 Dataset

4.4. Convergence Analysis

4.5. Complexity Comparison

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.1.1. Optimize $V_{s}$ , $V_{t}$ by Fixing $U_{s}$ , $U_{t}$ , F and $θ$

3.1.2. Optimize $U_{s}$ , $U_{t}$ by Fixing $V_{s}$ , $V_{t}$ , F and $θ$

3.1.3. Optimize F by Fixing $U_{s}$ , $U_{t}$ , $V_{s}$ , $V_{t}$ and $θ$

3.1.4. Optimize $θ$ by Fixing $U_{s}$ , $U_{t}$ , $V_{s}$ , $V_{t}$ and F