MeTa Learning-Based Optimization of Unsupervised Domain Adaptation Deep Networks
Abstract
1. Introduction
2. Related Work and Key Concepts
2.1. Domain Adaptation
2.2. RKHS, Kernels, and the Kernel Trick
2.3. Maximum Mean Discrepancy (MMD)
2.4. The Mean Discrepancy with a Deep Kernel
2.5. Class-Wise Maximum Mean Discrepancy
2.6. Discriminative Class-Wise MMD Based on Euclidean Distance
2.7. Discriminative Class-Wise MMD Based on Gaussian Kernels
3. The Proposed Method
3.1. Deep Kernel Training Network
Algorithm 1 Training the MMDDK
Input: source set $X_S$, target set $X_T$, learning rate $\eta$; Initialize the deep-kernel parameters $\omega$
repeat until convergence
  Sample a mini-batch $B_S$ from $X_S$ and a mini-batch $B_T$ from $X_T$;
  Compute the MMD estimate $\widehat{M}$ on $(B_S, B_T)$;  # using (11)
  Compute the variance estimate $\hat{\sigma}$;  # using (12) with $\widehat{M}$
  $J \leftarrow \widehat{M} / \hat{\sigma}$;  # using (10)
  # update parameters:
  $\omega \leftarrow \omega + \eta \nabla_{\omega} J$;  # maximizing J
end repeat
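To make the training loop concrete, the following is a minimal PyTorch sketch of Algorithm 1, assuming a Gaussian kernel on learned features in the spirit of Liu et al. [23]. The `Featurizer` architecture, the bandwidth `sigma`, the batch sizes, and the simplified variance proxy inside `power_criterion` are illustrative assumptions standing in for Equations (10)–(12); this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class Featurizer(nn.Module):
    """Feature extractor inside the deep kernel (hypothetical architecture)."""
    def __init__(self, dim_in=784, dim_hid=128, dim_out=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, dim_hid), nn.ReLU(),
                                 nn.Linear(dim_hid, dim_out))

    def forward(self, x):
        return self.net(x)

def gaussian_gram(a, b, sigma=1.0):
    # Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))
    return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))

def mmd2_biased(fs, ft, sigma=1.0):
    # biased (V-statistic) estimate of MMD^2 between two feature batches
    return (gaussian_gram(fs, fs, sigma).mean()
            + gaussian_gram(ft, ft, sigma).mean()
            - 2 * gaussian_gram(fs, ft, sigma).mean())

def power_criterion(fs, ft, sigma=1.0, eps=1e-8):
    # J = MMD^2 / sigma_hat as in (10); the variance proxy below (variance of
    # the row means of the cross Gram matrix) is a deliberate simplification
    # of the full U-statistic variance estimator of Liu et al. [23]
    mmd2 = mmd2_biased(fs, ft, sigma)
    var = gaussian_gram(fs, ft, sigma).mean(dim=1).var() + eps
    return mmd2 / var.sqrt()

# training loop mirroring Algorithm 1: gradient ascent on J over mini-batches
torch.manual_seed(0)
phi = Featurizer()
opt = torch.optim.Adam(phi.parameters(), lr=1e-4)
source = torch.randn(256, 784)        # placeholder stand-in for X_S
target = torch.randn(256, 784) + 0.5  # placeholder stand-in for X_T
for step in range(100):
    bs = source[torch.randint(0, 256, (64,))]  # mini-batch from X_S
    bt = target[torch.randint(0, 256, (64,))]  # mini-batch from X_T
    j = power_criterion(phi(bs), phi(bt))
    opt.zero_grad()
    (-j).backward()  # minimizing -J maximizes J
    opt.step()
```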
3.2. Meta-Learning of Maximum Mean Discrepancy
Algorithm 2 Training MTMMD
Input: source set $X_S$, target set $X_T$, learning rates $\eta_{\omega}$ and $\eta_{\psi}$, number of steps $T$; Initialize $\omega$ and $\psi$: the sets of parameters for the MMDDK model and the MTMMD model
for $t \leftarrow 0$ to $T$ do
  Sample a mini-batch $B_S$ from $X_S$ and a mini-batch $B_T$ from $X_T$;
  # alternately update the parameters $\omega$ and $\psi$:
  Compute the MMD estimate $\widehat{M}$ on $(B_S, B_T)$;  # using (11)
  Compute the variance estimate $\hat{\sigma}$;  # using (12) with $\widehat{M}$
  $J \leftarrow \widehat{M} / \hat{\sigma}$;  # using (10)
  if $t$ is even then
    Compute the meta objective $J_{\psi}$ with the meta model;
    $\psi \leftarrow \psi + \eta_{\psi} \nabla_{\psi} J_{\psi}$;  # maximizing $J_{\psi}$
  else
    $\omega \leftarrow \omega + \eta_{\omega} \nabla_{\omega} J$;  # maximizing J
end for
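A minimal sketch of the alternating update in Algorithm 2 follows, reusing `Featurizer`, `power_criterion`, `source`, and `target` from the previous listing. The meta model `MetaWeigher` is a hypothetical placeholder for the MTMMD model, and for brevity the same criterion `j` serves as both objectives, whereas the algorithm above maximizes a separate meta objective on even steps.

```python
import torch
import torch.nn as nn

class MetaWeigher(nn.Module):
    # hypothetical meta model with parameters psi: rescales the extracted
    # features before they enter the kernel
    def __init__(self, dim=64):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, f):
        return f * self.scale

phi, meta = Featurizer(), MetaWeigher()
opt_omega = torch.optim.Adam(phi.parameters(), lr=1e-4)  # omega updates
opt_psi = torch.optim.Adam(meta.parameters(), lr=1e-4)   # psi updates
for t in range(200):
    bs = source[torch.randint(0, 256, (64,))]
    bt = target[torch.randint(0, 256, (64,))]
    j = power_criterion(meta(phi(bs)), meta(phi(bt)))
    if t % 2 == 0:  # even step: update the meta parameters psi
        opt_psi.zero_grad(); (-j).backward(); opt_psi.step()
    else:           # odd step: update the kernel parameters omega
        opt_omega.zero_grad(); (-j).backward(); opt_omega.step()
```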
Algorithm 3 MTMMD Inferencing
Input: trained parameters $\omega$ and $\psi$; a mini-batch $B_S$ from $X_S$ and a mini-batch $B_T$ from $X_T$;
Compute the MMD estimate on $(B_S, B_T)$ with the trained deep kernel;
return the estimated discrepancy
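In code, inference under the same assumed components as the sketches above is a single frozen forward pass:

```python
# with the trained omega and psi frozen, one pair of mini-batches yields
# the discrepancy estimate of Algorithm 3
with torch.no_grad():
    bs, bt = source[:64], target[:64]
    mmd_hat = mmd2_biased(meta(phi(bs)), meta(phi(bt)))
print(float(mmd_hat))
```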
3.3. MeTa Discriminative Class-Wise Maximum Mean Discrepancy
Algorithm 4 Training the MCWMMD model
Input: labeled source set $X_S$, unlabeled target set $X_T$; Initialize the feature-extractor and classifier parameters
# train the model parameters on $X_S$ and $X_T$
repeat until convergence
  Sample a mini-batch $B_S$ from $X_S$ and a mini-batch $B_T$ from $X_T$;
  Extract the feature sets $F_S$ and $F_T$;
  # generate pseudo labels:
  Classify the target samples;
  Obtain pseudo labels from the predicted classes;
  # evaluate the losses:
  Compute the loss term of Equation (41);
  Compute the loss term of Equation (44);
  Compute the loss term of Equation (45);
  Combine them into the total loss of Equation (42);
  # update the parameters to minimize the total loss
end repeat
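The sketch below illustrates Algorithm 4's inner loop under stated assumptions, reusing `phi`, `source`, `target`, and `mmd2_biased` from the earlier listings. The linear head `clf`, the argmax pseudo-label rule, the plain class-wise MMD term, and the weight 0.1 are illustrative stand-ins for the discriminative class-wise terms of Equations (41)–(45), not the authors' exact losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 10
clf = nn.Linear(64, num_classes)  # classifier head on top of the features
opt = torch.optim.Adam(list(phi.parameters()) + list(clf.parameters()), lr=1e-4)
ys = torch.randint(0, num_classes, (256,))  # placeholder source labels
for step in range(100):
    idx_s = torch.randint(0, 256, (64,))
    idx_t = torch.randint(0, 256, (64,))
    fs, ft = phi(source[idx_s]), phi(target[idx_t])
    # generate pseudo labels: classify the target samples, keep the argmax
    pseudo = clf(ft).argmax(dim=1).detach()
    # class-wise MMD: align source and target features class by class,
    # skipping classes with too few samples in the current mini-batch
    loss_cw = sum(mmd2_biased(fs[ys[idx_s] == c], ft[pseudo == c])
                  for c in range(num_classes)
                  if (ys[idx_s] == c).sum() > 1 and (pseudo == c).sum() > 1)
    loss_cls = F.cross_entropy(clf(fs), ys[idx_s])  # supervised source loss
    loss = loss_cls + 0.1 * loss_cw  # 0.1 is an assumed balancing weight
    opt.zero_grad(); loss.backward(); opt.step()
```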
4. Experimental Results
4.1. Data Preparation
4.2. Experimental Setting
4.3. Results
5. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Lin, Y.; Chen, J.; Cao, Y.; Zhou, Y.; Zhang, L.; Tang, Y.Y.; Wang, S. Cross-domain recognition by identifying joint subspaces of source domain and target domain. IEEE Trans. Cybern. 2017, 47, 1090–1101.
2. Khan, S.; Guo, Y.; Ye, Y.; Li, C.; Wu, Q. Mini-batch dynamic geometric embedding for unsupervised domain adaptation. Neural Process. Lett. 2023, 55, 2063–2080.
3. Zhang, W.; Ouyang, W.; Li, W.; Xu, D. Collaborative and adversarial network for unsupervised domain adaptation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
4. Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 2016, 17, 1–35.
5. Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial discriminative domain adaptation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2962–2971.
6. Long, M.; Cao, Y.; Wang, J.; Jordan, M.I. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; Volume 37, pp. 97–105.
7. Vettoruzzo, A.; Bouguelia, M.-R.; Rögnvaldsson, T.S. Meta-learning for efficient unsupervised domain adaptation. Neurocomputing 2024, 574, 127264.
8. Liu, X.; Yoo, C.; Xing, F.; Oh, H.; El Fakhri, G.; Kang, J.-W.; Woo, J. Deep unsupervised domain adaptation: A review of recent advances and perspectives. arXiv 2022, arXiv:2208.07422.
9. Zhang, Y.; Chen, S.; Jiang, W.; Zhang, Y.; Lu, J.; Kwok, J.T. Domain-guided conditional diffusion model for unsupervised domain adaptation. arXiv 2023, arXiv:2309.14360.
10. Wang, R.; Wu, Z.; Weng, Z.; Chen, J.; Qi, G.-J.; Jiang, Y.-G. Cross-domain contrastive learning for unsupervised domain adaptation. arXiv 2022, arXiv:2106.05528.
11. Luo, Y.-W.; Ren, C.-X.; Dai, D.-Q.; Yan, H. Unsupervised domain adaptation via discriminative manifold propagation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1653–1669.
12. Hoffman, J.; Tzeng, E.; Park, T.; Zhu, J.Y.; Isola, P.; Saenko, K. CyCADA: Cycle-consistent adversarial domain adaptation. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1989–1998.
13. Saito, K.; Watanabe, K.; Ushiku, Y.; Harada, T. Maximum classifier discrepancy for unsupervised domain adaptation. arXiv 2018, arXiv:1712.02560.
14. Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive Faster R-CNN for object detection in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3339–3348.
15. Si, S.; Tao, D.; Geng, B. Bregman divergence-based regularization for transfer subspace learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 929–942.
16. Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.; Wortman, J. Learning bounds for domain adaptation. In Proceedings of the Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 129–136.
17. Ding, Z.; Fu, Y. Robust transfer metric learning for image classification. IEEE Trans. Image Process. 2017, 26, 660–670.
18. Gretton, A.; Borgwardt, K.; Rasch, M.; Schölkopf, B.; Smola, A. A kernel two-sample test. J. Mach. Learn. Res. 2012, 13, 723–773.
19. Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q. Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 2011, 22, 199–210.
20. Song, L.; Gretton, A.; Bickson, D.; Low, Y.; Guestrin, C. Kernel belief propagation. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 707–715.
21. Park, M.; Jitkrittum, W.; Sejdinovic, D. K2-ABC: Approximate Bayesian computation with kernel embeddings. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Cadiz, Spain, 9–11 May 2016; pp. 398–407.
22. Li, Y.; Swersky, K.; Zemel, R.S. Generative moment matching networks. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1718–1727.
23. Liu, F.; Xu, W.; Lu, J.; Zhang, G.; Gretton, A.; Sutherland, D. Learning deep kernels for non-parametric two-sample tests. arXiv 2020, arXiv:2002.09116.
24. Long, M.; Wang, J.; Ding, G.; Sun, J.; Yu, P.S. Transfer feature learning with joint distribution adaptation. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2200–2207.
25. Wang, W.; Li, H.; Ding, Z.; Wang, Z. Rethink maximum mean discrepancy for domain adaptation. arXiv 2020, arXiv:2007.00689.
26. Devroye, L.; Lugosi, G. Combinatorial Methods in Density Estimation; Springer: New York, NY, USA, 2001.
27. Baraud, Y.; Birgé, L. Rho-estimators revisited: General theory and applications. Ann. Stat. 2018, 46, 3767–3804.
28. Lin, H.-W.; Tsai, Y.; Lin, H.J.; Yu, C.-H.; Liu, M.-H. Unsupervised domain adaptation deep network based on discriminative class-wise MMD. AIMS Math. 2024, 9, 6628–6647.
29. Andrychowicz, M.; Denil, M.; Colmenarejo, S.G.; Hoffman, M.W.; Pfau, D.; Schaul, T.; de Freitas, N. Learning to learn by gradient descent by gradient descent. arXiv 2016, arXiv:1606.04474.
30. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
31. Aronszajn, N. Theory of reproducing kernels. Trans. Am. Math. Soc. 1950, 68, 337–404.
32. Zheng, S.; Ding, C.; Nie, F.; Huang, H. Harmonic mean linear discriminant analysis. IEEE Trans. Knowl. Data Eng. 2019, 31, 1520–1531.
33. Borgwardt, K.M.; Gretton, A.; Rasch, M.J.; Kriegel, H.-P.; Schölkopf, B.; Smola, A.J. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics 2006, 22, e49–e57.
34. Gao, W.; Shao, M.; Shu, J.; Zhuang, X. Meta-BN Net for few-shot learning. Front. Comput. Sci. 2023, 17, 171302.
35. Bechtle, S.; Molchanov, A.; Chebotar, Y.; Grefenstette, E.; Righetti, L.; Sukhatme, G.; Meier, F. Meta-learning via learned loss. arXiv 2019, arXiv:1906.05374.
36. Müller, R.; Kornblith, S.; Hinton, G. When does label smoothing help? In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019.
37. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
38. Hull, J.J. A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 550–555.
39. SVHN Dataset. Available online: https://www.openml.org/search?type=data&sort=runs&id=41081&status=active (accessed on 1 March 2024).
40. Saenko, K.; Kulis, B.; Fritz, M.; Darrell, T. Adapting visual category models to new domains. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6314, pp. 213–226.
41. Office-Home Dataset. Available online: https://www.hemanthdv.org/officeHomeDataset.html (accessed on 1 March 2024).
42. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
43. Saito, K.; Ushiku, Y.; Harada, T.; Saenko, K. Adversarial dropout regularization. arXiv 2018, arXiv:1711.01575.
44. Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional adversarial domain adaptation. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 1647–1657.
45. Lee, C.Y.; Batra, T.; Baig, M.H.; Ulbricht, D. Sliced Wasserstein discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 10285–10295.
46. Liang, J.; Hu, D.; Feng, J. Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020; Volume 119, pp. 6028–6039.
47. Pei, Z.; Cao, Z.; Long, M.; Wang, J. Multi-adversarial domain adaptation. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
48. van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. Available online: http://www.jmlr.org/papers/v9/vandermaaten08a.html (accessed on 1 March 2024).
Symbol | Meaning
---|---
 | Set of samples from the source domain.
 | Set of samples from the target domain.
 | Set of samples of class c from the source domain.
 | Set of samples of class c from the target domain.
 | Union of source and target samples for class c.
 | Union of all samples from the source and target domains.
 | A single sample from the source domain.
 | A single sample from the target domain.
 | Set of feature vectors of samples from the source domain.
 | Set of feature vectors of samples from the target domain.
 | Feature vector of a sample from the source domain.
 | Feature vector of a sample from the target domain.
 | Numbers of samples in the source and target domains, respectively.
 | Numbers of samples of class c in the source and target domains, respectively.
 | Means of samples in the source and target domains, respectively.
 | Means of samples of class c in the source and target domains, respectively.
 | Deep kernel function mapping features into the latent space.
 | Set of parameters of the feature extractor network.
 | Hyperparameters balancing the loss components.
 | Learning rate for optimization.
Methods | U→M | M→U | S→M | Average
---|---|---|---|---
Source-only | 69.6 | 82.2 | 67.1 | 73.0 |
ADDA [5] | 90.1 | 89.4 | 76.0 | 85.2 |
ADR [43] | 93.1 | 93.2 | 95.0 | 93.8 |
CDAN [44] | 98.0 | 95.6 | 89.2 | 94.3 |
CyCADA [12] | 96.5 | 95.6 | 90.4 | 94.2 |
SWD [45] | 97.1 | 98.1 | 98.9 | 98.0 |
SHOT [46] | 97.8 | 97.6 | 99.0 | 98.1 |
DCWMMD [28] | 98.0 | 98.2 | 98.8 | 98.3 |
MCWMMD | 98.5 | 98.3 | 98.9 | 98.6 |
Target-supervised | 98.9 | 99.4 | 99.4 | 99.2 |
Methods | A→D | A→W | D→A | D→W | W→A | W→D | Average
---|---|---|---|---|---|---|---
Source-only | 68.90 | 68.40 | 62.50 | 96.70 | 60.70 | 99.30 | 76.10 |
Wang et al. [25] | 90.80 | 88.90 | 75.48 | 98.50 | 75.20 | 99.80 | 88.10 |
DAN [6] | 78.60 | 80.50 | 63.60 | 97.10 | 62.80 | 99.60 | 80.40 |
DANN [4] | 79.70 | 82.00 | 68.20 | 96.90 | 67.40 | 99.10 | 82.20 |
ADDA [5] | 77.80 | 86.20 | 69.50 | 96.20 | 68.90 | 98.40 | 82.90 |
MADA [47] | 87.80 | 90.00 | 70.30 | 97.40 | 66.40 | 99.60 | 85.20 |
SHOT [46] | 93.90 | 90.10 | 75.30 | 98.70 | 75.00 | 99.90 | 88.80 |
CAN [3] | 95.00 | 94.50 | 78.00 | 99.10 | 77.00 | 99.80 | 90.60 |
MDGE [2] | 90.60 | 89.40 | 69.50 | 98.90 | 68.40 | 99.80 | 86.10 |
DACDM [9] | 95.31 | 95.51 | 78.26 | 98.58 | 78.43 | 99.93 | 91.01 |
CDCL [10] | 96.00 | 96.00 | 77.20 | 99.20 | 75.50 | 100 | 90.60 |
DMP [11] | 91.00 | 93.00 | 71.40 | 99.00 | 70.20 | 100 | 87.40 |
DCWMMD [28] | 96.30 | 94.90 | 77.90 | 99.50 | 76.50 | 99.60 | 90.80 |
MCWMMD | 96.70 | 96.60 | 78.40 | 99.60 | 78.60 | 99.83 | 91.62 |
Target-supervised | 98.00 | 98.70 | 86.00 | 98.70 | 86.00 | 98.00 | 94.30 |
Methods | Ar→Cl | Ar→Pr | Ar→Rw | Cl→Ar | Cl→Pr | Cl→Rw | Pr→Ar | Pr→Cl | Pr→Rw | Rw→Ar | Rw→Cl | Rw→Pr | Average
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Source-only | 28.07 | 38.30 | 42.05 | 26.15 | 40.57 | 39.14 | 25.94 | 28.40 | 46.61 | 27.10 | 30.12 | 55.35 | 35.65
Wang et al. [25] | 58.44 | 77.79 | 79.32 | 61.60 | 72.81 | 73.03 | 62.71 | 55.33 | 78.91 | 70.42 | 60.09 | 83.24 | 69.47 |
DAN [6] | 43.60 | 57.00 | 67.90 | 45.80 | 56.50 | 60.40 | 44.00 | 43.60 | 67.70 | 63.10 | 51.50 | 74.30 | 56.28 |
DANN [4] | 45.60 | 59.30 | 70.10 | 47.00 | 58.50 | 60.90 | 46.10 | 43.70 | 68.50 | 63.20 | 51.80 | 76.80 | 57.63 |
DACDM [9] | 60.94 | 79.27 | 83.34 | 69.67 | 81.53 | 80.40 | 65.06 | 58.97 | 83.45 | 75.90 | 65.61 | 85.99 | 74.18 |
DMP [11] | 59.00 | 81.20 | 86.30 | 68.10 | 72.80 | 78.80 | 71.20 | 57.60 | 84.90 | 77.30 | 61.50 | 82.90 | 73.50 |
DCWMMD [28] | 59.69 | 80.23 | 81.31 | 70.24 | 79.45 | 82.65 | 69.20 | 57.87 | 85.12 | 74.80 | 64.20 | 83.14 | 73.99 |
MCWMMD | 62.21 | 81.46 | 83.92 | 71.45 | 79.98 | 83.42 | 71.08 | 59.12 | 85.64 | 76.10 | 65.32 | 85.48 | 75.43 |
Target-supervised | 93.24 | 92.35 | 92.08 | 91.16 | 92.35 | 92.08 | 91.16 | 93.24 | 92.08 | 91.16 | 93.24 | 92.35 | 92.21