MeTa Learning-Based Optimization of Unsupervised Domain Adaptation Deep Networks
Abstract
1. Introduction
2. Related Work and Key Concepts
2.1. Domain Adaptation
2.2. RKHS, Kernels, and the Kernel Trick
2.3. Maximum Mean Discrepancy (MMD)
2.4. The Maximum Mean Discrepancy with a Deep Kernel
2.5. Class-Wise Maximum Mean Discrepancy
2.6. Discriminative Class-Wise MMD Based on Euclidean Distance
2.7. Discriminative Class-Wise MMD Based on Gaussian Kernels
3. The Proposed Method
3.1. Deep Kernel Training Network
Algorithm 1 Training the MMDDK
Input: source set D_s, target set D_t, learning rate η; Initialize the deep-kernel parameters ω;
repeat until convergence
    B_s ← mini-batch from D_s; B_t ← mini-batch from D_t;
    M ← MMD²_ω(B_s, B_t); # using (11)
    σ ← variance estimate for M; # using (12) with λ
    J ← M / σ; # using (10)
    # update parameters:
    ω ← ω + η∇_ω J; # maximizing J
end repeat
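To make the training loop concrete, the following is a minimal PyTorch sketch of one MMDDK step under the test-power criterion of Liu et al. [23], i.e., gradient ascent on J = MMD²/σ as in (10)–(12). The helper names, the fixed Gaussian bandwidth sigma, the equal mini-batch sizes, and the regularizer lam are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def gaussian_kernel(x, y, sigma):
    # Gaussian kernel matrix on (deep) features.
    d2 = torch.cdist(x, y) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2_and_variance(kss, ktt, kst):
    # Biased MMD^2 estimate and the variance proxy used by the
    # test-power criterion (cf. Liu et al. [23]); assumes equal batch sizes.
    m = kss.shape[0]
    mmd2 = kss.mean() + ktt.mean() - 2 * kst.mean()
    h = kss + ktt - kst - kst.t()  # matrix of h-statistics
    var = 4.0 / m**3 * (h.sum(dim=1) ** 2).sum() - 4.0 / m**4 * h.sum() ** 2
    return mmd2, var

def mmddk_step(feat, opt, xs, xt, sigma=1.0, lam=1e-8):
    # One gradient-ascent step on J = MMD^2 / sigma-hat, cf. (10).
    fs, ft = feat(xs), feat(xt)
    kss = gaussian_kernel(fs, fs, sigma)
    ktt = gaussian_kernel(ft, ft, sigma)
    kst = gaussian_kernel(fs, ft, sigma)
    mmd2, var = mmd2_and_variance(kss, ktt, kst)  # cf. (11), (12) with lam
    J = mmd2 / torch.sqrt(var.clamp_min(0) + lam)
    opt.zero_grad()
    (-J).backward()  # ascend J by descending -J
    opt.step()
    return float(J)
```

In practice the bandwidth can itself be made a trainable part of the deep kernel rather than a fixed constant.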
3.2. Meta-Learning of Maximum Mean Discrepancy
Algorithm 2 Training MTMMD
Input: D_s, D_t, number of steps T, learning rates η_ω and η_ψ;
Initialize ω and ψ: sets of parameters for the MMDDK model and the MTMMD model, respectively;
for t ← 0 to T do
    B_s ← mini-batch from D_s; B_t ← mini-batch from D_t;
    M ← MMD²_ω(B_s, B_t); # using (11)
    σ ← variance estimate for M; # using (12) with λ
    J ← M / σ; # using (10)
    # alternately update parameters ω and ψ:
    if t is even then
        J̃ ← meta-learned objective computed from M and σ by the MTMMD model;
        ψ ← ψ + η_ψ∇_ψ J̃; # maximizing J̃
    else
        ω ← ω + η_ω∇_ω J; # maximizing J
end for
Algorithm 3 MTMMD Inferencing
Input: trained parameters ω and ψ;
B_s ← mini-batch from D_s; B_t ← mini-batch from D_t;
compute MMD²_ω(B_s, B_t) and feed it to the MTMMD model; # using (10)–(12)
return the resulting discrepancy score
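The sketch below mirrors the alternating schedule of Algorithms 2 and 3, reusing gaussian_kernel and mmd2_and_variance from the previous sketch: even steps update the meta-model parameters ψ, odd steps update the kernel parameters ω. The MetaLoss module and the regression target used to fit ψ are toy stand-ins for the paper's meta-learned objective (cf. learned-loss meta-learning [35]), not its actual form.

```python
import torch
import torch.nn as nn

class MetaLoss(nn.Module):
    # Toy meta-model: maps the (MMD^2, variance) statistics to a scalar
    # objective; a stand-in for the paper's meta-learned discrepancy.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, mmd2, var):
        return self.net(torch.stack([mmd2, var]).view(1, 2)).squeeze()

def train_mtmmd(feat, meta, batches, T, eta=1e-4, sigma=1.0, lam=1e-8):
    opt_w = torch.optim.Adam(feat.parameters(), lr=eta)  # omega
    opt_p = torch.optim.Adam(meta.parameters(), lr=eta)  # psi
    for t in range(T):
        xs, xt = next(batches)  # paired source/target mini-batches
        fs, ft = feat(xs), feat(xt)
        kss = gaussian_kernel(fs, fs, sigma)
        ktt = gaussian_kernel(ft, ft, sigma)
        kst = gaussian_kernel(fs, ft, sigma)
        mmd2, var = mmd2_and_variance(kss, ktt, kst)  # cf. (11), (12)
        opt_w.zero_grad(); opt_p.zero_grad()
        if t % 2 == 0:
            # even step: fit the meta-model so its output tracks J, cf. (10)
            J = (mmd2 / torch.sqrt(var.clamp_min(0) + lam)).detach()
            ((meta(mmd2.detach(), var.detach()) - J) ** 2).backward()
            opt_p.step()
        else:
            # odd step: ascend the meta-learned objective w.r.t. omega
            (-meta(mmd2, var)).backward()
            opt_w.step()
```

At inference time (Algorithm 3), the trained feat and meta networks are applied to held-out source and target mini-batches and the scalar output of meta is returned as the discrepancy score.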
3.3. MeTa Discriminative Class-Wise Maximum Mean Discrepancy
Algorithm 4 Training the MCWMMD model
Input: labeled source set D_s, unlabeled target set D_t; Initialize the feature-extractor parameters θ and the classifier parameters φ;
# train the model parameters θ and φ on D_s and D_t
repeat until convergence
    B_s ← mini-batch from D_s; B_t ← mini-batch from D_t;
    F_s ← features of B_s; F_t ← features of B_t;
    # generate pseudo labels:
    ŷ_t ← classifier output on F_t; # classify target samples
    # obtain pseudo labels from ŷ_t
    # evaluate losses:
    L_cls ← classification loss on B_s; # using (41)
    L_intra ← class-wise alignment loss; # using (44)
    L_inter ← class-separation loss; # using (45)
    L ← combined loss; # using (42)
    # update θ and φ to minimize L:
    θ ← θ − η∇_θ L; φ ← φ − η∇_φ L;
end repeat
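The following sketch mirrors one MCWMMD update: pseudo-label the target batch with the current classifier, then combine the source classification loss with class-wise alignment and separation terms. Here linear_mmd2 is a simplified mean-difference surrogate for the kernel class-wise MMD of (44)–(45), the last-line weighting is a simplified stand-in for (42), and lam/beta play the role of the balancing hyperparameters λ and β.

```python
import torch
import torch.nn.functional as F

def linear_mmd2(a, b):
    # Squared distance between sample means (a linear-kernel MMD surrogate).
    return ((a.mean(0) - b.mean(0)) ** 2).sum()

def mcwmmd_step(feat, clf, opt, xs, ys, xt, num_classes, lam=1.0, beta=0.1):
    # opt jointly optimizes the feature extractor (theta) and classifier (phi).
    fs, ft = feat(xs), feat(xt)
    cls_loss = F.cross_entropy(clf(fs), ys)      # source loss, cf. (41)
    with torch.no_grad():
        yt_hat = clf(ft).argmax(dim=1)           # pseudo-label target samples
    feats = torch.cat([fs, ft])
    labels = torch.cat([ys, yt_hat])
    intra, inter, means = fs.new_zeros(()), fs.new_zeros(()), []
    for c in range(num_classes):
        ms, mt = ys == c, yt_hat == c
        if ms.any() and mt.any():
            # pull source/target samples of class c together, cf. (44)
            intra = intra + linear_mmd2(fs[ms], ft[mt])
        if (labels == c).any():
            means.append(feats[labels == c].mean(0))
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            # push distinct class means apart, cf. (45); the paper's term
            # is bounded/normalized rather than this raw distance.
            inter = inter + ((means[i] - means[j]) ** 2).sum()
    total = cls_loss + lam * intra - beta * inter  # simplified stand-in for (42)
    opt.zero_grad(); total.backward(); opt.step()
    return float(total)
```

As in the algorithm, the pseudo labels are recomputed from the current classifier at every iteration, so the class-wise alignment targets improve as training proceeds.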
4. Experimental Results
4.1. Data Preparation
4.2. Experimental Setting
4.3. Results
5. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Lin, Y.; Chen, J.; Cao, Y.; Zhou, Y.; Zhang, L.; Tang, Y.Y.; Wang, S. Cross-domain recognition by identifying joint subspaces of source domain and target domain. IEEE Trans. Cybern. 2017, 47, 1090–1101.
- Khan, S.; Guo, Y.; Ye, Y.; Li, C.; Wu, Q. Mini-batch dynamic geometric embedding for unsupervised domain adaptation. Neural Process. Lett. 2023, 55, 2063–2080.
- Zhang, W.; Ouyang, W.; Li, W.; Xu, D. Collaborative and adversarial network for unsupervised domain adaptation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain adversarial training of neural networks. J. Mach. Learn. Res. 2016, 17, 1–35.
- Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial discriminative domain adaptation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2962–2971.
- Long, M.; Cao, Y.; Wang, J.; Jordan, M.I. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; Volume 37, pp. 97–105.
- Vettoruzzo, A.; Bouguelia, M.-R.; Rögnvaldsson, T.S. Meta-learning for efficient unsupervised domain adaptation. Neurocomputing 2024, 574, 127264.
- Liu, X.; Yoo, C.; Xing, F.; Oh, H.; El Fakhri, G.; Kang, J.-W.; Woo, J. Deep unsupervised domain adaptation: A review of recent advances and perspectives. arXiv 2022, arXiv:2208.07422v1.
- Zhang, Y.; Chen, S.; Jiang, W.; Zhang, Y.; Lu, J.; Kwok, J.T. Domain-guided conditional diffusion model for unsupervised domain adaptation. arXiv 2023, arXiv:2309.14360v1.
- Wang, R.; Wu, Z.; Weng, Z.; Chen, J.; Qi, G.-J.; Jiang, Y.-G. Cross-domain contrastive learning for unsupervised domain adaptation. arXiv 2022, arXiv:2106.05528v2.
- Luo, Y.-W.; Ren, C.-X.; Dai, D.-Q.; Yan, H. Unsupervised domain adaptation via discriminative manifold propagation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1653–1669.
- Hoffman, J.; Tzeng, E.; Park, T.; Zhu, J.Y.; Isola, P.; Saenko, K. CyCADA: Cycle-consistent adversarial domain adaptation. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1989–1998.
- Saito, K.; Watanabe, K.; Ushiku, Y.; Harada, T. Maximum classifier discrepancy for unsupervised domain adaptation. arXiv 2018, arXiv:1712.02560v4.
- Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive Faster R-CNN for object detection in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3339–3348.
- Si, S.; Tao, D.; Geng, B. Bregman divergence based regularization for transfer subspace learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 929–942.
- Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.; Wortman, J. Learning bounds for domain adaptation. In Proceedings of the Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 129–136.
- Ding, Z.; Fu, Y. Robust transfer metric learning for image classification. IEEE Trans. Image Process. 2017, 26, 660–670.
- Gretton, A.; Borgwardt, K.; Rasch, M.; Schölkopf, B.; Smola, A. A kernel two-sample test. J. Mach. Learn. Res. 2012, 13, 723–773.
- Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q. Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 2011, 22, 199–210.
- Song, L.; Gretton, A.; Bickson, D.; Low, Y.; Guestrin, C. Kernel belief propagation. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 707–715.
- Park, M.; Jitkrittum, W.; Sejdinovic, D. K2-ABC: Approximate Bayesian computation with kernel embeddings. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Cadiz, Spain, 9–11 May 2016; pp. 398–407.
- Li, Y.; Swersky, K.; Zemel, R.S. Generative moment matching networks. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1718–1727.
- Liu, F.; Xu, W.; Lu, J.; Zhang, G.; Gretton, A.; Sutherland, D. Learning deep kernels for non-parametric two-sample tests. arXiv 2020, arXiv:2002.09116.
- Long, M.; Wang, J.; Ding, G.; Sun, J.; Yu, P.S. Transfer feature learning with joint distribution adaptation. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2200–2207.
- Wang, W.; Li, H.; Ding, Z.; Wang, Z. Rethink maximum mean discrepancy for domain adaptation. arXiv 2020, arXiv:2007.00689.
- Devroye, L.; Lugosi, G. Combinatorial Methods in Density Estimation; Springer: New York, NY, USA, 2001.
- Baraud, Y.; Birgé, L. Rho-estimators revisited: General theory and applications. Ann. Statist. 2018, 46, 3767–3804.
- Lin, H.-W.; Tsai, Y.; Lin, H.J.; Yu, C.-H.; Liu, M.-H. Unsupervised domain adaptation deep network based on discriminative class-wise MMD. AIMS Math. 2024, 9, 6628–6647.
- Andrychowicz, M.; Denil, M.; Colmenarejo, S.G.; Hoffman, M.W.; Pfau, D.; Schaul, T.; de Freitas, N. Learning to learn by gradient descent by gradient descent. arXiv 2016, arXiv:1606.04474v2.
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
- Aronszajn, N. Theory of reproducing kernels. Trans. Am. Math. Soc. 1950, 68, 337–404.
- Zheng, S.; Ding, C.; Nie, F.; Huang, H. Harmonic mean linear discriminant analysis. IEEE Trans. Knowl. Data Eng. 2019, 31, 1520–1531.
- Borgwardt, K.M.; Gretton, A.; Rasch, M.J.; Kriegel, H.-P.; Schölkopf, B.; Smola, A.J. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 2006, 22, e49–e57.
- Gao, W.; Shao, M.; Shu, J.; Zhuang, X. Meta-BN Net for few-shot learning. Front. Comput. Sci. 2023, 17, 171302.
- Bechtle, S.; Molchanov, A.; Chebotar, Y.; Grefenstette, E.; Righetti, L.; Sukhatme, G.; Meier, F. Meta-learning via learned loss. arXiv 2019, arXiv:1906.05374.
- Müller, R.; Kornblith, S.; Hinton, G. When does label smoothing help? In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Hull, J.J. A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 550–555.
- SVHN Dataset. Available online: https://www.openml.org/search?type=data&sort=runs&id=41081&status=active (accessed on 1 March 2024).
- Saenko, K.; Kulis, B.; Fritz, M.; Darrell, T. Adapting visual category models to new domains. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6314, pp. 213–226.
- Office-Home Dataset. Available online: https://www.hemanthdv.org/officeHomeDataset.html (accessed on 1 March 2024).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Saito, K.; Ushiku, Y.; Harada, T.; Saenko, K. Adversarial dropout regularization. arXiv 2018, arXiv:1711.01575.
- Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional adversarial domain adaptation. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 1647–1657.
- Lee, C.Y.; Batra, T.; Baig, M.H.; Ulbricht, D. Sliced Wasserstein discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 10285–10295.
- Liang, J.; Hu, D.; Feng, J. Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020; Volume 119, pp. 6028–6039.
- Pei, Z.; Cao, Z.; Long, M.; Wang, J. Multi-adversarial domain adaptation. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. Available online: http://www.jmlr.org/papers/v9/vandermaaten08a.html (accessed on 1 March 2024).
Symbol | Meaning |
---|---|
D_s | Set of samples from the source domain. |
D_t | Set of samples from the target domain. |
D_s^c | Set of samples of class c from the source domain. |
D_t^c | Set of samples of class c from the target domain. |
D^c = D_s^c ∪ D_t^c | Union of source and target samples for class c. |
D = D_s ∪ D_t | Union of all samples from source and target domains. |
x^s | A single sample from the source domain. |
x^t | A single sample from the target domain. |
F_s | Set of feature vectors of samples from the source domain. |
F_t | Set of feature vectors of samples from the target domain. |
f^s | Feature vector of sample x^s from the source domain. |
f^t | Feature vector of sample x^t from the target domain. |
n_s, n_t | Number of samples in the source and target domains, respectively. |
n_s^c, n_t^c | Number of samples of class c in the source and target domains, respectively. |
μ_s, μ_t | Mean of samples in the source and target domains, respectively. |
μ_s^c, μ_t^c | Mean of samples of class c in the source and target domains, respectively. |
k_ω | Deep kernel function mapping features into latent space. |
θ | Set of parameters of the feature extractor network. |
λ, β | Hyperparameters for balancing loss components. |
η | Learning rate for optimization. |
Methods | U→M | M→U | S→M | Average |
---|---|---|---|---|
Source-only | 69.6 | 82.2 | 67.1 | 73.0 |
ADDA [5] | 90.1 | 89.4 | 76.0 | 85.2 |
ADR [43] | 93.1 | 93.2 | 95.0 | 93.8 |
CDAN [44] | 98.0 | 95.6 | 89.2 | 94.3 |
CyCADA [12] | 96.5 | 95.6 | 90.4 | 94.2 |
SWD [45] | 97.1 | 98.1 | 98.9 | 98.0 |
SHOT [46] | 97.8 | 97.6 | 99.0 | 98.1 |
DCWMMD [28] | 98.0 | 98.2 | 98.8 | 98.3 |
MCWMMD | 98.5 | 98.3 | 98.9 | 98.6 |
Target-supervised | 98.9 | 99.4 | 99.4 | 99.2 |
Methods | A→D | A→W | D→A | D→W | W→A | W→D | Average |
---|---|---|---|---|---|---|---|
Source-only | 68.90 | 68.40 | 62.50 | 96.70 | 60.70 | 99.30 | 76.10 |
Wang et al. [25] | 90.80 | 88.90 | 75.48 | 98.50 | 75.20 | 99.80 | 88.10 |
DAN [6] | 78.60 | 80.50 | 63.60 | 97.10 | 62.80 | 99.60 | 80.40 |
DANN [4] | 79.70 | 82.00 | 68.20 | 96.90 | 67.40 | 99.10 | 82.20 |
ADDA [5] | 77.80 | 86.20 | 69.50 | 96.20 | 68.90 | 98.40 | 82.90 |
MADA [47] | 87.80 | 90.00 | 70.30 | 97.40 | 66.40 | 99.60 | 85.20 |
SHOT [46] | 93.90 | 90.10 | 75.30 | 98.70 | 75.00 | 99.90 | 88.80 |
CAN [3] | 95.00 | 94.50 | 78.00 | 99.10 | 77.00 | 99.80 | 90.60 |
MDGE [2] | 90.60 | 89.40 | 69.50 | 98.90 | 68.40 | 99.80 | 86.10 |
DACDM [9] | 95.31 | 95.51 | 78.26 | 98.58 | 78.43 | 99.93 | 91.01 |
CDCL [10] | 96.00 | 96.00 | 77.20 | 99.20 | 75.50 | 100 | 90.60 |
DMP [11] | 91.00 | 93.00 | 71.40 | 99.00 | 70.20 | 100 | 87.40 |
DCWMMD [28] | 96.30 | 94.90 | 77.90 | 99.50 | 76.50 | 99.60 | 90.80 |
MCWMMD | 96.70 | 96.60 | 78.40 | 99.60 | 78.60 | 99.83 | 91.62 |
Target-supervised | 98.00 | 98.70 | 86.00 | 98.70 | 86.00 | 98.00 | 94.30 |
Methods | Ar→Cl | Ar→Pr | Ar→Rw | Cl→Ar | Cl→Pr | Cl→Rw | Pr→Ar | Pr→Cl | Pr→Rw | Rw→Ar | Rw→Cl | Rw→Pr | Average |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Source-only | 28.07 | 38.30 | 42.05 | 26.15 | 40.57 | 39.14 | 25.94 | 28.40 | 46.61 | 27.10 | 30.12 | 55.35 | 35.65 |
Wang et al. [25] | 58.44 | 77.79 | 79.32 | 61.60 | 72.81 | 73.03 | 62.71 | 55.33 | 78.91 | 70.42 | 60.09 | 83.24 | 69.47 |
DAN [6] | 43.60 | 57.00 | 67.90 | 45.80 | 56.50 | 60.40 | 44.00 | 43.60 | 67.70 | 63.10 | 51.50 | 74.30 | 56.28 |
DANN [4] | 45.60 | 59.30 | 70.10 | 47.00 | 58.50 | 60.90 | 46.10 | 43.70 | 68.50 | 63.20 | 51.80 | 76.80 | 57.63 |
DACDM [9] | 60.94 | 79.27 | 83.34 | 69.67 | 81.53 | 80.40 | 65.06 | 58.97 | 83.45 | 75.90 | 65.61 | 85.99 | 74.18 |
DMP [11] | 59.00 | 81.20 | 86.30 | 68.10 | 72.80 | 78.80 | 71.20 | 57.60 | 84.90 | 77.30 | 61.50 | 82.90 | 73.50 |
DCWMMD [28] | 59.69 | 80.23 | 81.31 | 70.24 | 79.45 | 82.65 | 69.20 | 57.87 | 85.12 | 74.80 | 64.20 | 83.14 | 73.99 |
MCWMMD | 62.21 | 81.46 | 83.92 | 71.45 | 79.98 | 83.42 | 71.08 | 59.12 | 85.64 | 76.10 | 65.32 | 85.48 | 75.43 |
Target-supervised | 93.24 | 92.35 | 92.08 | 91.16 | 92.35 | 92.08 | 91.16 | 93.24 | 92.08 | 91.16 | 93.24 | 92.35 | 92.21 |